GitHub user smurakozi opened a pull request:

    https://github.com/apache/spark/pull/19906

    [SPARK-22516][SQL] Bump up Univocity version to 2.5.9

    ## What changes were proposed in this pull request?
    
    A bug in the Univocity parser library causes the failure reported in SPARK-22516.
    Upgrading the library from version 2.5.4 to 2.5.9 fixes it:
    
    **Executing**
    ```
    spark.read.option("header", "true").option("inferSchema", "true").option("multiLine", "true").option("comment", "g").csv("test_file_without_eof_char.csv").show()
    ```
    **Before**
    ```
    ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 6)
    com.univocity.parsers.common.TextParsingException: java.lang.IllegalArgumentException - Unable to skip 1 lines from line 2. End of input reached
    ...
    Internal state when error was thrown: line=3, column=0, record=2, charIndex=31
        at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339)
        at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:475)
        at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anon$1.next(UnivocityParser.scala:281)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    ```
    **After**
    ```
    +-------+-------+
    |column1|column2|
    +-------+-------+
    |    abc|    def|
    +-------+-------+
    ```
    
    ## How was this patch tested?
    The existing `CSVSuite.commented lines in CSV data` test was extended to
    also parse the file in multiline mode. The test input file was modified to
    include a comment on its last line.
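
The contents of `test_file_without_eof_char.csv` are not shown in the description. As a minimal sketch for reproducing the case, the snippet below writes a file with assumed contents: a header, one data row, and a last line that starts with the comment character `g` and has no trailing newline. The three lines are hypothetical. Their total of 31 characters agrees with the `charIndex=31` in the reported error, but the actual file may differ.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Assumed contents (not taken from the pull request): header, one data row,
// and a final commented line ("g" is the comment character) with no newline
// after it.
val contents = "column1,column2\nabc,def\nghi,jkl"
val path = Paths.get("test_file_without_eof_char.csv")
Files.write(path, contents.getBytes(StandardCharsets.UTF_8))

// In spark-shell, the command from the description can then be run on it:
// spark.read.option("header", "true").option("inferSchema", "true")
//   .option("multiLine", "true").option("comment", "g")
//   .csv(path.toString).show()
```

With the older parser version the read is expected to fail as in the **Before** output. With 2.5.9 it should print the single `abc`/`def` row shown under **After**.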

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/smurakozi/spark SPARK-22516

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19906.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19906
    
----
commit 8bc6a9ce9f6eeb854261d26dabaf04052eb8b5b2
Author: smurakozi <smurak...@gmail.com>
Date:   2017-11-27T08:30:25Z

    [SPARK-22516][SQL] Bump up Univocity version to 2.5.9

----

