Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20511#discussion_r168833177

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
           }
         }
       }
    +
    +  test("SPARK-23340 Empty float/double array columns raise EOFException") {
    +    Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df =>
    +      withTempPath { path =>
--- End diff --

Ur, I think you are still confusing two things.

First of all, we have five ORC readers. We don't explicitly test the `ORC MR reader` and `ORC Vectorized Copy`; we usually test 1, 2, and 3.

1. Hive Serde
2. Hive OrcFileFormat
3. Apache ORC Vectorized Wrapper
4. Apache ORC Vectorized Copy
5. Apache ORC MR

This PR already adds tests for 1, 2, and 3 (3 is the vectorized wrapper reader):

1. Hive Serde: `HiveOrcQuerySuite.test("SPARK-23340 Empty float/double array columns raise EOFException")`
2. Hive OrcFileFormat: `OrcSourceSuite` <= `OrcSuite.test("SPARK-23340 Empty float/double array columns raise EOFException")`
3. Apache ORC Vectorized Wrapper: `HiveOrcSourceSuite` <= `OrcSuite.test("SPARK-23340 Empty float/double array columns raise EOFException")`

Second, this test schema includes complex types, so configuration 3 (the vectorized wrapper reader) also falls back to the ORC MR reader path; in other words, case 5. Please note that Apache Spark supports vectorization for atomic types only, in both Parquet and ORC.
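For readers following along: the choice among the reader families above is driven by Spark SQL configuration, not by separate APIs. The following is a hypothetical sketch (not code from this PR) illustrating the two relevant config keys, assuming a Spark 2.3+ session is available; the config key names are real, but the object and its structure are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: selecting between the ORC reader implementations
// discussed above via Spark SQL configuration (Spark 2.3+ keys).
object OrcReaderSelectionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-reader-selection")
      .getOrCreate()

    // "native" selects the Apache ORC based OrcFileFormat (cases 3-5);
    // "hive" selects the Hive based OrcFileFormat (case 2).
    spark.conf.set("spark.sql.orc.impl", "native")

    // With the native impl, the vectorized reader is used only when this
    // flag is true AND the schema consists of atomic types only. A schema
    // with array<float> / array<double> columns falls back to the ORC MR
    // reader path (case 5) regardless of this setting.
    spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

    spark.stop()
  }
}
```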