Ryan Blue created SPARK-27960:
---------------------------------

             Summary: DataSourceV2 ORC implementation doesn't handle schemas correctly
                 Key: SPARK-27960
                 URL: https://issues.apache.org/jira/browse/SPARK-27960
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.3
            Reporter: Ryan Blue

While testing SPARK-27919 (#[24768|https://github.com/apache/spark/pull/24768]), I tried to use the v2 ORC implementation to validate a v2 catalog that delegates to the session catalog. The ORC implementation fails the following test case because it cannot infer a schema (there is no data), but it should be using the schema used to create the table.

Test case:
{code}
test("CreateTable: test ORC source") {
  spark.conf.set("spark.sql.catalog.session", classOf[V2SessionCatalog].getName)

  spark.sql(s"CREATE TABLE table_name (id bigint, data string) USING $orc2")

  val testCatalog = spark.catalog("session").asTableCatalog
  val table = testCatalog.loadTable(Identifier.of(Array(), "table_name"))

  assert(table.name == "orc ") // <-- should this be table_name?
  assert(table.partitioning.isEmpty)
  assert(table.properties == Map(
    "provider" -> orc2,
    "database" -> "default",
    "table" -> "table_name").asJava)
  assert(table.schema == new StructType().add("id", LongType).add("data", StringType)) // <-- fail

  val rdd = spark.sparkContext.parallelize(table.asInstanceOf[InMemoryTable].rows)
  checkAnswer(spark.internalCreateDataFrame(rdd, table.schema), Seq.empty)
}
{code}

Error:
{code}
Unable to infer schema for ORC. It must be specified manually.;
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
	at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$5(FileTable.scala:61)
	at scala.Option.getOrElse(Option.scala:138)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:61)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:54)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:65)
	at org.apache.spark.sql.sources.v2.DataSourceV2SQLSuite.$anonfun$new$5(DataSourceV2SQLSuite.scala:82)
{code}
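
To illustrate the behaviour this report asks for, here is a minimal sketch in plain Scala of the resolution order: use the schema the table was created with when there is one, fall back to inference from files otherwise, and fail only when neither is available. The names `Field`, `SchemaResolution`, and `resolveSchema` are hypothetical and are not part of Spark; this is not the actual `FileTable` code, and the exception type is a stand-in for `AnalysisException`.

```scala
// Hypothetical stand-in for a column definition: (name, type name).
final case class Field(name: String, dataType: String)

object SchemaResolution {
  // Sketch only: prefer the declared schema, then try inference, else fail.
  // `inferFromFiles` is a hypothetical callback that returns None when
  // there are no data files to read a schema from.
  def resolveSchema(
      declared: Option[Seq[Field]],
      inferFromFiles: () => Option[Seq[Field]],
      formatName: String): Seq[Field] =
    declared
      // orElse takes its argument by name, so inference only runs
      // when no declared schema exists.
      .orElse(inferFromFiles())
      .getOrElse(throw new IllegalStateException(
        s"Unable to infer schema for $formatName. It must be specified manually."))
}

object SchemaResolutionExample {
  // Mirrors the empty table from the test case: a declared schema, no data files.
  val declared: Seq[Field] = Seq(Field("id", "bigint"), Field("data", "string"))
  val noFiles: () => Option[Seq[Field]] = () => None

  val resolved: Seq[Field] =
    SchemaResolution.resolveSchema(Some(declared), noFiles, "ORC")
}
```

With a declared schema and no data files, as in the test case above, this sketch returns the declared schema without attempting inference. The error in the stack trace corresponds to the last branch, where neither a declared nor an inferred schema is available.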