GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/23091
[SPARK-26122][SQL] Support encoding for multiLine in CSV datasource

## What changes were proposed in this pull request?

In this PR, I propose passing the CSV option `encoding`/`charset` to the `uniVocity` parser so that CSV files in different encodings can be parsed when `multiLine` is enabled. The value of the option is passed to the `beginParsing` method of `CsvParser`.

## How was this patch tested?

Added a new test to `CSVSuite` that covers different encodings, with the header both enabled and disabled.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 csv-miltiline-encoding

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23091

----
commit 1a7a0cb4430f847ac95c0c764393003581415103
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T20:51:04Z

    Added a test

commit cd57ec5833bbfb5f0b33d63a56b48d25924f6be1
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:07:41Z

    Test multiple encodings

commit 1c76f8944979df8a7b9b8181ebfa38933c3f2c00
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:09:04Z

    Pass encoding to uniVocity parser

commit 16eb14c73f3fad8d83fee41d5665b52f180daf73
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:22:23Z

    Test with header and without it

----
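
To illustrate why the `encoding` option has to reach the parser in `multiLine` mode, here is a minimal sketch in plain Python. It is not Spark or uniVocity code: the helper `parse_csv_stream` and the sample data are hypothetical, and the standard `csv` module stands in for the real parser. The idea it shows is the one described in this PR: when a record can span several lines, the parser reads the input as a raw byte stream, so it must be told which charset to use when decoding it.

```python
import csv
import io


def parse_csv_stream(raw: bytes, encoding: str = "utf-8"):
    """Decode a raw byte stream with the given charset and parse it as CSV.

    Hypothetical helper for illustration only; it is an analogue of handing
    an input stream plus an encoding name to a CSV parser.
    """
    # newline="" leaves line breaks untouched so that the csv module can
    # keep the ones that appear inside quoted fields.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.reader(text))


# A CSV document with a header and one record that spans two lines.
sample = 'id,comment\r\n1,"first line\nsecond line"\r\n2,"naïve"\r\n'

# Simulate a file stored in a non-default encoding.
raw_utf16 = sample.encode("utf-16le")

# Parsing succeeds only because the matching encoding is passed through.
rows = parse_csv_stream(raw_utf16, encoding="utf-16le")
for row in rows:
    print(row)
```

Decoding the same bytes with the default charset would either fail or produce corrupted fields, which is the situation the `encoding`/`charset` option is meant to prevent when `multiLine` is enabled.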