I think we are hitting an old bug. I tried it with:

Hadoop 3.1.1, Hive 3.1.1, Spark 3.1.1

Create an ORC transactional table in Hive (PySpark):

CREATE TABLE IF NOT EXISTS test.randomDataDelta (
    ID INT,
    CLUSTERED INT,
    SCATTERED INT,
    RANDOMISED INT,
    RANDOM_STRING VARCHAR(50),
    SMALL_VC VARCHAR(50),
    PADDING VARCHAR(40)
)
STORED AS ORC
TBLPROPERTIES (
    "transactional" = "true",
    "orc.create.index" = "true",
    "orc.bloom.filter.columns" = "ID",
    "orc.bloom.filter.fpp" = "0.05",
    "orc.compress" = "SNAPPY",
    "orc.stripe.size" = "16777216",
    "orc.row.index.stride" = "10000"
)

Then populate it through Spark with random data. That works, and I can read
the table back through Spark (starting at ID = 218, ending at ID = 236):

Schema of delta table
root
 |-- ID: long (nullable = true)
 |-- CLUSTERED: double (nullable = true)
 |-- SCATTERED: double (nullable = true)
 |-- RANDOMISED: double (nullable = true)
 |-- RANDOM_STRING: string (nullable = true)
 |-- SMALL_VC: string (nullable = true)
 |-- PADDING: string (nullable = true)

+-----+-----+
|minID|maxID|
+-----+-----+
|    1|  236|
+-----+-----+

Finished at 14/06/2021 19:02:43.43

Now I am trying to read it in Hive:

0: jdbc:hive2://rhes75:10099/default> desc test.randomDataDelta;
+----------------+--------------+----------+
|    col_name    |  data_type   | comment  |
+----------------+--------------+----------+
| id             | int          |          |
| clustered      | int          |          |
| scattered      | int          |          |
| randomised     | int          |          |
| random_string  | varchar(50)  |          |
| small_vc       | varchar(50)  |          |
| padding        | varchar(40)  |          |
+----------------+--------------+----------+
7 rows selected (0.169 seconds)

0: jdbc:hive2://rhes75:10099/default> select count(1) from test.randomDataDelta;
Error: Error while processing statement: FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
ORC split generation failed with exception: java.lang.NoSuchMethodError:
org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
(state=08S01,code=1)

A Google search turned up the same error I raised three years ago:

https://user.hive.apache.narkive.com/Td3He6Vj/failed-execution-error-return-code-1-from-org-apache-hadoop-hive-ql-exec-mr-mapredtask-orc-split

So it has not been fixed yet!

HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On Mon, 14 Jun 2021 at 16:29, Suryansh Agnihotri <sagnihotri2...@gmail.com> wrote:

> No, this also does not work.
> Steps I followed:
>
> spark-sql:
> CREATE TABLE students (id int, name string, marks int) STORED AS ORC
> TBLPROPERTIES ('transactional' = 'true');
>
> hive-cli:
> Created a students_copy table, inserted some values into it, and ran
> "INSERT OVERWRITE TABLE students SELECT * FROM default.students_copy;"
> I am able to query both tables from hive-cli, but not from Spark (the
> table students is created using Spark).
>
> Thanks
>
> On Mon, 14 Jun 2021 at 20:07, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Ok, there were issues in the past with reading ORC tables through Spark.
>>
>> If the ORC table is created through Spark, I believe it will work.
>>
>> Do a test. Create the ORC table through Spark first.
>>
>> Then do an insert overwrite into that table through the Hive CLI from
>> your Hive-created ORC table, and see if you can access the data in the
>> new table through Spark.
>>
>> HTH
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> *Disclaimer:* Use it at your own risk.
>> Any and all responsibility for any loss, damage or destruction of data
>> or any other property which may arise from relying on this email's
>> technical content is explicitly disclaimed. The author will in no case
>> be liable for any monetary damages arising from such loss, damage or
>> destruction.
>>
>> On Mon, 14 Jun 2021 at 15:19, Suryansh Agnihotri <sagnihotri2...@gmail.com> wrote:
>>
>>> The table was created by Hive (hive-cli); the format is ORC. I am able
>>> to get data from hive-cli (Hive returns rows).
>>> But spark-sql/spark-shell does not return any rows.
>>>
>>> On Mon, 14 Jun 2021 at 19:26, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> How was the table created in the first place, Spark or Hive?
>>>>
>>>> Is this table an ORC table, and does Spark or Hive return rows?
>>>>
>>>> HTH
>>>>
>>>> On Mon, 14 Jun 2021 at 14:33, Suryansh Agnihotri <sagnihotri2...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>> Does Spark support querying Hive tables which are transactional?
>>>>> I am using Spark 3.0.2 / Hive metastore 3.1.2 and trying to query the
>>>>> table, but I am not able to see the data from the table, although *show
>>>>> tables* does list the table from the Hive metastore and desc table works
>>>>> fine, but *select * from table* gives an *empty result*.
>>>>> Does a later version of Spark have the fix, or is there another way
>>>>> to query?
>>>>> Thanks
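For what it's worth, the NoSuchMethodError quoted earlier names the
FileStatus.compareTo(FileStatus) signature, which only exists in newer
Hadoop releases (older Hadoop 2.x shipped compareTo(Object)), so a stale
hadoop-common jar somewhere on the Hive or Spark classpath is a plausible
culprit. A minimal diagnostic sketch (pure Python; the example directory
paths in the comment are assumptions about a typical install, not paths
from this thread):

```python
import glob
import os

def hadoop_common_jars(*lib_dirs):
    """Return every hadoop-common jar found under the given directories.

    Seeing more than one version in the combined list (e.g. a 2.x jar
    alongside a 3.x jar) is a classic source of NoSuchMethodError at
    runtime, because whichever jar loads first wins.
    """
    jars = []
    for d in lib_dirs:
        jars.extend(sorted(glob.glob(os.path.join(d, "hadoop-common-*.jar"))))
    return jars

# Example usage (paths are assumptions -- point these at your own
# $HIVE_HOME/lib and $SPARK_HOME/jars):
# for jar in hadoop_common_jars("/opt/hive/lib", "/opt/spark/jars"):
#     print(jar)
```

If the scan turns up a single, consistent hadoop-common version everywhere,
the mismatch is likely elsewhere on the classpath (e.g. an uber jar bundling
its own Hadoop classes).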
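The thread never shows the code that populated randomDataDelta, so as a
hedged illustration only, here is a pure-Python sketch of rows shaped like
that schema (the per-column generation logic is entirely an assumption);
rows like these would then be turned into a Spark DataFrame and written to
the table:

```python
import random
import string

def random_rows(start_id, end_id):
    """Generate dicts matching the randomDataDelta column layout.

    Column logic is illustrative guesswork, not the original generator.
    """
    rows = []
    for i in range(start_id, end_id + 1):
        rows.append({
            "ID": i,
            "CLUSTERED": (i - 1) / 10.0,            # assumed: slowly increasing
            "SCATTERED": float((i - 1) % 10),        # assumed: cycling values
            "RANDOMISED": float(random.randint(1, 100)),
            "RANDOM_STRING": "".join(random.choices(string.ascii_letters, k=50)),
            "SMALL_VC": str(i).rjust(50),            # ID right-justified to width 50
            "PADDING": "x" * 40,
        })
    return rows

rows = random_rows(218, 236)  # the ID range the thread reports reading back
```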