Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20511#discussion_r168833177

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
           }
         }
       }
    +
    +  test("SPARK-23340 Empty float/double array columns raise EOFException") {
    +    Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df =>
    +      withTempPath { path =>
--- End diff --

Ur, I think you are still confusing two things.

First of all, we have five ORC readers. We don't explicitly test the `ORC MR reader` and `ORC Vectorized Copy`; we usually test 1, 2, and 3.

1. Hive Serde
2. Hive OrcFileFormat
3. Apache ORC Vectorized Wrapper
4. Apache ORC Vectorized Copy
5. Apache ORC MR

This PR already adds tests for 1, 2, and 3 (3 is the vectorized wrapper reader):

1. Hive Serde: `HiveOrcQuerySuite.test("SPARK-23340 Empty float/double array columns raise EOFException")`
2. Hive OrcFileFormat: `OrcSourceSuite` <= `OrcSuite.test("SPARK-23340 Empty float/double array columns raise EOFException")`
3. Apache ORC Vectorized Wrapper: `HiveOrcSourceSuite` <= `OrcSuite.test("SPARK-23340 Empty float/double array columns raise EOFException")`

Second, this test schema includes complex types, so configuration 3 (the vectorized wrapper reader) also falls back to the ORC MR reader path; in other words, case 5. Please note that Apache Spark supports vectorization for atomic types only, in both Parquet and ORC.
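For readers following along: the choice among the reader families above is driven by Spark SQL configuration, not by separate APIs. The following is a hypothetical sketch (not code from this PR) illustrating the two relevant config keys, assuming a Spark 2.3+ session is available; the config key names are real, but the object and its structure are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: selecting between the ORC reader implementations
// discussed above via Spark SQL configuration (Spark 2.3+ keys).
object OrcReaderSelectionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-reader-selection")
      .getOrCreate()

    // "native" selects the Apache ORC based OrcFileFormat (cases 3-5);
    // "hive" selects the Hive based OrcFileFormat (case 2).
    spark.conf.set("spark.sql.orc.impl", "native")

    // With the native impl, the vectorized reader is used only when this
    // flag is true AND the schema consists of atomic types only. A schema
    // with array<float> / array<double> columns falls back to the ORC MR
    // reader path (case 5) regardless of this setting.
    spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

    spark.stop()
  }
}
```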