[ 
https://issues.apache.org/jira/browse/SPARK-28099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212281#comment-17212281
 ] 

Zsombor Fedor commented on SPARK-28099:
---------------------------------------

This is not ORC-specific; the same assertion fails with Parquet:


val testData = List(1,2,3,4,5)
val dataFrame = testData.toDF()
dataFrame
  .coalesce(1)
  .write
  .format("parquet")
  .save("/user/hive/warehouse/test/dir1=1/")
spark.sql("CREATE EXTERNAL TABLE test (val INT) STORED AS PARQUET LOCATION '/user/hive/warehouse/test/'")

val queryResponse = spark.sql("SELECT * FROM test")
// java.lang.AssertionError: assertion failed
//   at scala.Predef$.assert(Predef.scala:156)
//   at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:214)


 

> Assertion when querying unpartitioned Hive table with partition-like naming
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-28099
>                 URL: https://issues.apache.org/jira/browse/SPARK-28099
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Douglas Drinka
>            Priority: Major
>
> {code:java}
> val testData = List(1,2,3,4,5)
> val dataFrame = testData.toDF()
> dataFrame
> .coalesce(1)
> .write
> .mode(SaveMode.Overwrite)
> .format("orc")
> .option("compression", "zlib")
> .save("s3://ddrinka.sparkbug/testFail/dir1=1/dir2=2/")
> spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.testFail")
> spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.testFail (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/testFail/'")
> val queryResponse = spark.sql("SELECT * FROM ddrinka_sparkbug.testFail")
> //Throws AssertionError
> //at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:214)
> {code}
> It looks like the native ORC reader is creating virtual columns named dir1 
> and dir2, which don't exist in the Hive table. [The 
> assertion|https://github.com/apache/spark/blob/c0297dedd829a92cca920ab8983dab399f8f32d5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L257]
> checks that the number of columns matches, which fails because of the virtual 
> partition columns.
> Actually getting data back from this query depends on SPARK-28098, 
> which covers supporting subdirectories for Hive queries at all.
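
The mismatch described above can be illustrated with a small stand-alone
sketch. It is a simplified model of partition discovery, not Spark's actual
implementation: the object name PartitionInferenceSketch, the helper
inferPartitionColumns, and the file name part-00000.orc are all hypothetical
and exist only for illustration. It treats every name=value path segment
below the table location as a partition column and compares the resulting
column count with the schema declared in the Hive table.

```scala
// Simplified model of partition discovery; NOT Spark's actual implementation.
// All names here are hypothetical and chosen only for illustration.
object PartitionInferenceSketch {

  /** Returns the column names implied by `name=value` segments below the table root. */
  def inferPartitionColumns(tableRoot: String, filePath: String): Seq[String] = {
    require(filePath.startsWith(tableRoot), s"$filePath is not under $tableRoot")
    filePath
      .stripPrefix(tableRoot)
      .split("/")
      .toSeq
      .filter(_.contains("="))      // keep only partition-like segments
      .map(_.takeWhile(_ != '='))   // "dir1=1" becomes "dir1"
  }

  def main(args: Array[String]): Unit = {
    // Schema declared by CREATE EXTERNAL TABLE ... (val INT)
    val declaredColumns = Seq("val")

    // The data file name is an assumption; only the directories come from the report.
    val inferred = inferPartitionColumns(
      "s3://ddrinka.sparkbug/testFail/",
      "s3://ddrinka.sparkbug/testFail/dir1=1/dir2=2/part-00000.orc")

    val discoveredColumns = declaredColumns ++ inferred

    println(s"declared:   ${declaredColumns.mkString(", ")}")
    println(s"discovered: ${discoveredColumns.mkString(", ")}")
    println(s"column counts match: ${declaredColumns.size == discoveredColumns.size}")
  }
}
```

In this model the declared schema has one column and the discovered schema
has three (val, dir1, dir2), so a check that the two column counts are equal
fails, which is the kind of condition the reported assertion appears to
enforce.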


