GitHub user koertkuipers opened a pull request:

    https://github.com/apache/spark/pull/22123

    [SPARK-25134][SQL] Csv column pruning with checking of headers throws incorrect error

    ## What changes were proposed in this pull request?
    
    When column pruning is turned on, the CSV header check should only cover the fields in the requiredSchema, not the dataSchema, because with column pruning only the requiredSchema is read.
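The header check described above can be modelled in plain Scala as a minimal sketch. This is not Spark's implementation. Schemas are reduced to sequences of field names, and `HeaderCheckSketch`, `readHeader`, `checkHeader` and `schemaToCheck` are hypothetical names. The sketch assumes, as stated above, that with column pruning only the requiredSchema columns of the header are read, and it compares header names to schema fields by position.

```scala
// Minimal sketch, NOT Spark's code: schemas are plain Seq[String] of field
// names, and every name in this object is hypothetical.
object HeaderCheckSketch {

  // Simulates reading the CSV header line. With pruning on, only the tokens
  // for the required columns are kept (assumption mirroring "only
  // requiredSchema is read"). requiredSchema is assumed to be a subset of
  // dataSchema.
  def readHeader(line: String, dataSchema: Seq[String],
                 requiredSchema: Seq[String], pruning: Boolean): Seq[String] = {
    val tokens = line.split(",").toSeq.map(_.trim)
    if (pruning) requiredSchema.map(f => tokens(dataSchema.indexOf(f)))
    else tokens
  }

  // Positional comparison of header names against schema field names.
  // Returns None when they conform, or Some(error message) otherwise.
  def checkHeader(header: Seq[String], schemaFields: Seq[String]): Option[String] =
    if (header.length != schemaFields.length)
      Some(s"header has ${header.length} columns but schema has ${schemaFields.length} fields")
    else
      header.zip(schemaFields).collectFirst {
        case (h, f) if h != f => s"header column '$h' does not match schema field '$f'"
      }

  // The proposed behaviour: check against requiredSchema when pruning is on,
  // otherwise against the full dataSchema.
  def schemaToCheck(dataSchema: Seq[String], requiredSchema: Seq[String],
                    pruning: Boolean): Seq[String] =
    if (pruning) requiredSchema else dataSchema
}
```

With pruning on and only column `b` selected from a header `a,b,c`, the pruned header passes when checked against requiredSchema but fails against the full dataSchema (1 column versus 3 fields), which corresponds to the incorrect error described in the title. With pruning off, the full header is checked against dataSchema as before.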
    
    ## How was this patch tested?
    
    Added 2 unit tests, one with column pruning turned on and one with it turned off, in which the CSV headers are checked against the schema.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tresata-opensource/spark feat-csv-column-pruning-and-check-header

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22123
    
----
commit dcd9ac45673af31e59dcfb633a2b87f76f2bee03
Author: Koert Kuipers <koert@...>
Date:   2018-08-16T15:35:16Z

    if csv column-pruning is turned on header should be checked with requiredSchema not dataSchema

commit c4179a9f0a85b412178323e6cb881385fa644051
Author: Koert Kuipers <koert@...>
Date:   2018-08-16T15:52:02Z

    update jira reference in unit test

----


