[ https://issues.apache.org/jira/browse/SPARK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suchintak Patnaik updated SPARK-29234:
--------------------------------------
    Description:
When we create a bucketed table as follows, its input and output formats are displayed as SequenceFile. Physically, however, the files are created in HDFS in the format specified by the user, e.g. orc, parquet, etc.

df.write.format("orc").bucketBy(4,"order_status").saveAsTable("OrdersExample")

In Hive, DESCRIBE FORMATTED OrdersExample;

describe formatted ordersExample;
OK
# col_name              data_type               comment
col                     array<string>           from deserializer

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

Querying the same table in Hive gives an error.

select * from OrdersExample;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc not a SequenceFile

Reading the same table in Spark also gives an error.

df = spark.

  was:
When we create a bucketed table as follows, its input and output formats are displayed as SequenceFile. Physically, however, the files are created in HDFS in the format specified by the user, e.g. orc, parquet, etc.

df.write.format("orc").bucketBy(4,"order_status").saveAsTable("OrdersExample")

In Hive, DESCRIBE FORMATTED OrdersExample;

describe formatted ordersExample;
OK
# col_name              data_type               comment
col                     array<string>           from deserializer

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

Querying the same table in Hive gives an error.
select * from OrdersExample;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc not a SequenceFile


> bucketed table created by Spark SQL DataFrame is in SequenceFile format
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29234
>                 URL: https://issues.apache.org/jira/browse/SPARK-29234
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Suchintak Patnaik
>            Priority: Major
>
> When we create a bucketed table as follows, its input and output formats are
> displayed as SequenceFile. Physically, however, the files are created in HDFS
> in the format specified by the user, e.g. orc, parquet, etc.
> df.write.format("orc").bucketBy(4,"order_status").saveAsTable("OrdersExample")
> In Hive, DESCRIBE FORMATTED OrdersExample;
> describe formatted ordersExample;
> OK
> # col_name              data_type               comment
> col                     array<string>           from deserializer
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Querying the same table in Hive gives an error.
> select * from OrdersExample;
> OK
> Failed with exception java.io.IOException:java.io.IOException:
> hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc
> not a SequenceFile
> Reading the same table in Spark also gives an error.
> df = spark.
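
For readers trying to confirm the mismatch described in the report, the following is a minimal sketch, not part of the original issue. It assumes the "not a SequenceFile" error arises because the reader expects the file to begin with the bytes `SEQ`, while an ORC file begins with `ORC` (and a Parquet file with `PAR1`). The helper name `sniff_format` is hypothetical, and it works only on a local copy of a part file (for example one fetched with `hdfs dfs -get`), not on an `hdfs://` URL directly.

```python
import os
import tempfile

# Leading magic bytes of the formats mentioned in the report.
MAGIC = {
    b"SEQ": "SequenceFile",
    b"ORC": "ORC",
    b"PAR1": "Parquet",
}


def sniff_format(path):
    """Return the format implied by a local file's first bytes, or 'unknown'.

    Hypothetical helper for illustration; it only inspects the header and
    does not validate the rest of the file.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    # Stand-in file: only the header bytes matter for this check.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "part-00000.snappy.orc")
        with open(sample, "wb") as f:
            f.write(b"ORC" + b"\x00" * 16)
        print(sniff_format(sample))
```

If a part file under the table directory reports ORC while DESCRIBE FORMATTED lists SequenceFileInputFormat, that would be consistent with the behaviour reported above: the data is in the requested format and only the table metadata visible to Hive disagrees.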