[GitHub] spark pull request #19928: [SPARK-22267][SQL][TEST] Spark SQL incorrectly re...

dongjoon-hyun Fri, 08 Dec 2017 03:25:22 -0800

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/19928


    [SPARK-22267][SQL][TEST] Spark SQL incorrectly reads ORC files when column order is different

    ## What changes were proposed in this pull request?
    
    Until 2.2.1, with the default configuration, Apache Spark returns incorrect results when the column order of an ORC file's schema differs from that of the metastore schema. This is due to the Hive 1.2.1 library and some issues with the `convertMetastoreOrc` option.
    
    ```scala
    scala> Seq(1 -> 2).toDF("c1", "c2").write.format("orc").mode("overwrite").save("/tmp/o")
    scala> sql("CREATE EXTERNAL TABLE o(c2 INT, c1 INT) STORED AS orc LOCATION '/tmp/o'")
    scala> spark.table("o").show    // This is wrong.
    +---+---+
    | c2| c1|
    +---+---+
    |  1|  2|
    +---+---+
    scala> spark.read.orc("/tmp/o").show  // This is correct.
    +---+---+
    | c1| c2|
    +---+---+
    |  1|  2|
    +---+---+
    ```
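    
    The wrong result above is what one would see if file columns were matched to table columns by position instead of by name. The following is a toy sketch of that difference using plain Scala collections only; `ColumnResolution`, `readByPosition`, and `readByName` are hypothetical names for illustration and are not part of Spark, Hive, or ORC, nor a description of their actual read paths.
    
    ```scala
    // Toy model only: no Spark, Hive, or ORC APIs are involved.
    object ColumnResolution {
      // Positional matching: the i-th table column takes the i-th file value,
      // whatever name the file gave that column.
      def readByPosition(row: Seq[Int], tableSchema: Seq[String]): Seq[(String, Int)] =
        tableSchema.zip(row)
    
      // Name-based matching: each table column looks up the file column
      // that carries the same name.
      def readByName(
          fileSchema: Seq[String],
          row: Seq[Int],
          tableSchema: Seq[String]): Seq[(String, Int)] = {
        val valueByName = fileSchema.zip(row).toMap
        tableSchema.map(name => name -> valueByName(name))
      }
    
      def main(args: Array[String]): Unit = {
        val fileSchema  = Seq("c1", "c2") // order written to /tmp/o
        val row         = Seq(1, 2)
        val tableSchema = Seq("c2", "c1") // order declared in CREATE EXTERNAL TABLE
    
        // c2 gets 1 and c1 gets 2: same shape as the wrong output above.
        println(readByPosition(row, tableSchema))
        // c2 gets 2 and c1 gets 1: values follow their column names.
        println(readByName(fileSchema, row, tableSchema))
      }
    }
    ```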
    
    After [SPARK-22279](https://github.com/apache/spark/pull/19499), the default configuration no longer has this bug. Although the Hive 1.2.1 library code path still has the problem, we should add test coverage for the current behavior to prevent future regressions.
    
    ## How was this patch tested?
    
    Pass Jenkins with the newly added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-22267

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19928.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19928
    
----
commit ea75bce40b6b6b27168e439e4866f859db35ba71
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2017-12-08T10:57:51Z

    [SPARK-22267][SQL][TEST] Spark SQL incorrectly reads ORC files when column order is different

----


