GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22676

    [SPARK-25684][SQL] Organize header related codes in CSV datasource

    ## What changes were proposed in this pull request?

    1. Move `CSVDataSource.makeSafeHeader` to `CSVUtils.makeSafeHeader` (as is).

        Rationale:
        - When I first did this refactoring, I intended `CSVUtils` to hold all CSV-specific handling, such as options, filtering, and header extraction.
        - See `JsonDataSource`. `CSVDataSource` is now quite consistent with `JsonDataSource`. Since the CSV code path is quite complicated, we should keep the two matched as closely as we can.

    2. Move `CSVDataSource.checkHeaderColumnNames` to `CSVHeaderChecker.checkHeaderColumnNames` (as is).

        Rationale:
        - Similar reasons to 1 above.

    3. Put the `enforceSchema` logic into `CSVHeaderChecker`.

        - The header checking and column pruning were added (per https://github.com/apache/spark/pull/20894 and https://github.com/apache/spark/pull/21296), but some of the code, such as that from https://github.com/apache/spark/pull/21296, is duplicated.
        - The header-checking code is also scattered across several places, which is quite error-prone. We should put it in a single place. See https://github.com/apache/spark/pull/22656.

    ## How was this patch tested?

    Existing tests should cover this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark refactoring-csv

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22676

----
commit 56906680ab7d5d63be04bac2c3a19bb52baa3025
Author: hyukjinkwon <gurwls223@...>
Date:   2018-10-09T07:26:08Z

    Organize header related codes in CSV datasource

----
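
For readers unfamiliar with the motivation behind item 3, the following is a minimal sketch of what "header checking in a single place" could look like. It is not the code in this pull request: `HeaderCheckerSketch`, `findMismatch`, and `check` are hypothetical names, header values are assumed to be non-null, and a plain `Console.err` message stands in for Spark's logging. The only behaviour it tries to illustrate is that a mismatch is reported as a warning when `enforceSchema` is true and as an error otherwise.

```scala
// Hypothetical, simplified sketch; NOT the Spark implementation.
// It only illustrates keeping all header checks behind one class.
final class HeaderCheckerSketch(
    fieldNames: Seq[String],   // column names from the user schema
    enforceSchema: Boolean,    // true: warn on mismatch; false: fail
    caseSensitive: Boolean) {

  // Normalize names so the comparison respects case sensitivity.
  // Assumes names are non-null.
  private def normalize(name: String): String =
    if (caseSensitive) name else name.toLowerCase

  /** Returns a description of the first mismatch, if any. */
  def findMismatch(header: Seq[String]): Option[String] =
    if (header.length != fieldNames.length) {
      Some(s"Header has ${header.length} columns but schema has ${fieldNames.length}")
    } else {
      header.zip(fieldNames).zipWithIndex.collectFirst {
        case ((h, f), i) if normalize(h) != normalize(f) =>
          s"Column $i: header '$h' does not match schema field '$f'"
      }
    }

  /** Warns when the schema is enforced; throws otherwise. */
  def check(header: Seq[String]): Unit =
    findMismatch(header).foreach { msg =>
      if (enforceSchema) Console.err.println(s"WARN: $msg")
      else throw new IllegalArgumentException(msg)
    }
}
```

With a single entry point like `check`, every code path that reads a header (for example, one per input partition or one per file) would call the same method instead of repeating the comparison and the `enforceSchema` branch.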