[ https://issues.apache.org/jira/browse/SPARK-28099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212281#comment-17212281 ]
Zsombor Fedor commented on SPARK-28099:
---------------------------------------

It is not ORC specific:

{code:java}
val testData = List(1,2,3,4,5)
val dataFrame = testData.toDF()
dataFrame
  .coalesce(1)
  .write
  .format("parquet")
  .save("user/hive/warehouse/test/dir1=1/")

spark.sql("CREATE EXTERNAL TABLE test (val INT) STORED AS PARQUET LOCATION '/user/hive/warehouse/test/'")

val queryResponse = spark.sql("SELECT * FROM test")
// java.lang.AssertionError: assertion failed
//   at scala.Predef$.assert(Predef.scala:156)
//   at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:214)
{code}

> Assertion when querying unpartitioned Hive table with partition-like naming
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-28099
>                 URL: https://issues.apache.org/jira/browse/SPARK-28099
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Douglas Drinka
>            Priority: Major
>
> {code:java}
> val testData = List(1,2,3,4,5)
> val dataFrame = testData.toDF()
> dataFrame
>   .coalesce(1)
>   .write
>   .mode(SaveMode.Overwrite)
>   .format("orc")
>   .option("compression", "zlib")
>   .save("s3://ddrinka.sparkbug/testFail/dir1=1/dir2=2/")
>
> spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.testFail")
> spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.testFail (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/testFail/'")
>
> val queryResponse = spark.sql("SELECT * FROM ddrinka_sparkbug.testFail")
> // Throws AssertionError
> //   at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:214)
> {code}
> It looks like the native ORC reader is creating virtual columns named dir1 and dir2, which don't exist in the Hive table. [The assertion|https://github.com/apache/spark/blob/c0297dedd829a92cca920ab8983dab399f8f32d5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L257] checks that the number of columns matches, and it fails because of the extra virtual partition columns.
> Actually getting data back from this query will depend on SPARK-28098, which tracks support for reading subdirectories in Hive queries at all.
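Building on the reproductions above, here is a minimal workaround sketch, not a fix, assuming Spark 2.4.x and the unpartitioned `test` table from the comment (the table name `test_partitioned` and the warehouse path are just placeholders taken from that reproduction). Disabling the Parquet/ORC metastore conversion makes Spark read through the Hive SerDe path, so `HiveMetastoreCatalog.convertToLogicalRelation` and its column-count assertion are never reached; alternatively, declaring the directory levels as real partition columns keeps the inferred schema and the metastore schema the same width. Whether rows under the subdirectories are actually returned still depends on SPARK-28098.

{code:java}
// Sketch 1 (workaround, not a fix): fall back to the Hive SerDe read path so
// convertToLogicalRelation is never invoked. Both configs exist in Spark 2.4.x.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
spark.sql("SELECT * FROM test").show()

// Sketch 2 (workaround, not a fix): declare the directory level as a real partition
// column so the file-index schema (val, dir1) matches the metastore schema.
// `test_partitioned` is a hypothetical table name; adjust LOCATION to your layout.
spark.sql(
  """CREATE EXTERNAL TABLE test_partitioned (val INT)
    |PARTITIONED BY (dir1 INT)
    |STORED AS PARQUET
    |LOCATION '/user/hive/warehouse/test/'""".stripMargin)
spark.sql("MSCK REPAIR TABLE test_partitioned")  // discover the dir1=1 partition
spark.sql("SELECT * FROM test_partitioned").show()
{code}

Both are only workarounds for affected users; the mismatch between the inferred partition columns and the declared table schema in convertToLogicalRelation still needs a proper fix.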