GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23091

    [SPARK-26122][SQL] Support encoding for multiLine in CSV datasource

    ## What changes were proposed in this pull request?
    
    In this PR, I propose passing the CSV option `encoding`/`charset` to the 
`uniVocity` parser so that CSV files in different encodings can be parsed when 
`multiLine` is enabled. The value of the option is passed to the `beginParsing` 
method of `CSVParser`.
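
As a rough illustration of the idea only (this is not Spark or uniVocity code), the standard-library Python sketch below decodes a raw byte stream with a caller-supplied charset before a CSV parser sees it, so that quoted fields spanning several lines are read correctly with or without a header. The function name `parse_multiline_csv` and the sample data are hypothetical and chosen for this example.

```python
import csv
import io


def parse_multiline_csv(raw: bytes, encoding: str, header: bool):
    """Parse CSV bytes whose quoted fields may contain line breaks.

    Hypothetical helper: it only mimics the idea of handing the charset
    to the parser together with the input stream.
    """
    # Decode the stream with the caller-supplied charset. newline="" keeps
    # embedded line breaks untouched so csv.reader can treat them as part
    # of a quoted field.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    rows = list(csv.reader(text))
    if header:
        # Use the first record as column names for the remaining records.
        names, rows = rows[0], rows[1:]
        return [dict(zip(names, row)) for row in rows]
    return rows


# One record contains a line break inside a quoted field.
sample = 'id,comment\r\n1,"first line\nsecond line"\r\n2,"Grüße"\r\n'

for enc in ("UTF-8", "UTF-16LE", "UTF-32BE"):
    records = parse_multiline_csv(sample.encode(enc), enc, header=True)
    print(enc, records)
```

The same input parses to identical records for every encoding as long as the charset used for decoding matches the one the file was written in, which is the behaviour this PR aims for in `multiLine` mode.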
    
    ## How was this patch tested?
    
    Added a new test to `CSVSuite` that covers different encodings, with the 
header both enabled and disabled.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 csv-miltiline-encoding

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23091
    
----
commit 1a7a0cb4430f847ac95c0c764393003581415103
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T20:51:04Z

    Added a test

commit cd57ec5833bbfb5f0b33d63a56b48d25924f6be1
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:07:41Z

    Test multiple encodings

commit 1c76f8944979df8a7b9b8181ebfa38933c3f2c00
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:09:04Z

    Pass encoding to uniVocity parser

commit 16eb14c73f3fad8d83fee41d5665b52f180daf73
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-11-19T21:22:23Z

    Test with header and without it

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
