GitHub user koertkuipers opened a pull request: https://github.com/apache/spark/pull/22123

    [SPARK-25134][SQL] Csv column pruning with checking of headers throws incorrect error

    ## What changes were proposed in this pull request?

    When column pruning is turned on, the header check for the CSV should cover only the fields in requiredSchema, not dataSchema, because with column pruning only requiredSchema is read.

    ## How was this patch tested?

    Added 2 unit tests in which column pruning is turned on/off and the CSV headers are checked against the schema.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tresata-opensource/spark feat-csv-column-pruning-and-check-header

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22123

----
commit dcd9ac45673af31e59dcfb633a2b87f76f2bee03
Author: Koert Kuipers <koert@...>
Date:   2018-08-16T15:35:16Z

    if csv column-pruning is turned on header should be checked with requiredSchema not dataSchema

commit c4179a9f0a85b412178323e6cb881385fa644051
Author: Koert Kuipers <koert@...>
Date:   2018-08-16T15:52:02Z

    update jira reference in unit test

----
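
The idea in the description can be illustrated with a small, self-contained sketch. This is not Spark's code: `check_header` and `schema_for_header_check` are hypothetical helpers, schemas are modeled as plain lists of column names, and the sketch assumes that with column pruning on, the reader only sees the header names of the selected columns.

```python
from typing import List


def check_header(header_columns: List[str], schema_fields: List[str],
                 case_sensitive: bool = False) -> None:
    """Raise ValueError unless the header matches the schema position by position.

    Hypothetical stand-in for a CSV header check; not Spark's implementation.
    """
    if len(header_columns) != len(schema_fields):
        raise ValueError(
            "Number of columns in CSV header (%d) is not equal to number of "
            "fields in the schema (%d)" % (len(header_columns), len(schema_fields))
        )
    for index, (header, field) in enumerate(zip(header_columns, schema_fields)):
        left, right = (header, field) if case_sensitive else (header.lower(), field.lower())
        if left != right:
            raise ValueError(
                "CSV header does not conform to the schema at position %d: "
                "header %r, schema %r" % (index, header, field)
            )


def schema_for_header_check(data_schema: List[str], required_schema: List[str],
                            column_pruning: bool) -> List[str]:
    """Pick the schema to validate against: requiredSchema when pruning is on."""
    return required_schema if column_pruning else data_schema


if __name__ == "__main__":
    data_schema = ["a", "b", "c"]   # all columns in the file
    required_schema = ["a", "c"]    # columns the query actually selects
    pruned_header = ["a", "c"]      # assumed: only selected header names are seen

    # Checking the pruned header against the full data schema fails misleadingly.
    try:
        check_header(pruned_header, data_schema)
    except ValueError as error:
        print("check against dataSchema failed:", error)

    # Checking against the schema chosen for the pruning mode succeeds.
    check_header(pruned_header,
                 schema_for_header_check(data_schema, required_schema, True))
    print("check against requiredSchema passed")
```

In this model, validating against the full list of file columns reports a column-count mismatch even though the file is fine, which is the kind of incorrect error the title refers to; choosing the schema by pruning mode avoids it.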