[GitHub] spark pull request #22287: [SPARK-25135][SQL] FileFormatWriter should respec...

wangyum Thu, 30 Aug 2018 11:15:27 -0700

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22287


    [SPARK-25135][SQL] FileFormatWriter should respect the schema of Hive

    ## What changes were proposed in this pull request?
    
    This PR makes `FileFormatWriter`'s `dataSchema` respect the schema of 
Hive. Otherwise, two issues arise:
    
    1. An exception is thrown (this can be reproduced with the added test case):
    ```scala
    java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:347)
        at scala.None$.get(Option.scala:345)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$3$$anonfun$4.apply(FileFormatWriter.scala:87)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$3$$anonfun$4.apply(FileFormatWriter.scala:87)
    ```
    2. The schema of the Hive table differs from the schema of the 
Parquet file.
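    
    As a rough illustration of how a `None.get` like the one in item 1 can 
arise, the sketch below resolves a column by name against a schema whose names 
do not match. It is plain Scala and does not use Spark; `Column`, 
`SchemaLookupSketch`, `resolveUnsafe`, and `resolveSafe` are hypothetical names 
invented for this example, and the assumption that the lookup is by exact name 
is mine, not something confirmed by the stack trace.

    ```scala
    // Hypothetical stand-in for a schema field; this is not a Spark class.
    final case class Column(name: String, dataType: String)

    object SchemaLookupSketch {
      // Unsafe lookup: Option.get throws NoSuchElementException
      // when no column with the requested name exists.
      def resolveUnsafe(schema: Seq[Column], name: String): Column =
        schema.find(_.name == name).get

      // Safer lookup: return a descriptive error instead of throwing.
      def resolveSafe(schema: Seq[Column], name: String): Either[String, Column] =
        schema.find(_.name == name).toRight(s"column '$name' not found in schema")

      def main(args: Array[String]): Unit = {
        // Assumed scenario: the query output and the table declare
        // the same column under different names.
        val querySchema = Seq(Column("COL1", "int"))

        // The table-side name is missing from the query-side schema.
        println(scala.util.Try(resolveUnsafe(querySchema, "col1")).isFailure)
        println(resolveSafe(querySchema, "col1").isLeft)
      }
    }
    ```

    Returning an `Either` here is only one design choice; the point is that a 
lookup against a mismatched schema should fail with the missing name rather 
than with a bare `None.get`.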
    
    ## How was this patch tested?
    
    - Unit tests verifying that FileFormatWriter respects the schema of Hive.
    - Manual tests confirming that the UI issues fixed by 
[SPARK-22834](https://issues.apache.org/jira/browse/SPARK-22834) are not 
reintroduced:
    
![image](https://user-images.githubusercontent.com/5399861/44870021-94ce1700-acc1-11e8-8ef7-d7a8ba3c435d.png)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25135-view

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22287
    
----
commit b54953a8224aa0a7759289a83e876e3bfc166cb6
Author: Yuming Wang <yumwang@...>
Date:   2018-08-30T17:46:02Z

    FileFormatWriter should respect the input query schema in HIVE

----


