[ 
https://issues.apache.org/jira/browse/SPARK-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271964#comment-15271964
 ] 

Apache Spark commented on SPARK-15148:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/12923

>  Upgrade Univocity library from 2.0.2 to 2.1.0
> ----------------------------------------------
>
>                 Key: SPARK-15148
>                 URL: https://issues.apache.org/jira/browse/SPARK-15148
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> It looks like a new release of the Univocity CSV library has been published: 
> https://github.com/uniVocity/univocity-parsers/releases.
> It contains the following improvements:
> {quote}
> 1. Performance improvements for parsing/writing CSV and TSV. CSV writing and 
> parsing got 30-40% faster.
> 2. Deprecated methods setParseUnescapedQuotes and 
> setParseUnescapedQuotesUntilDelimiter of class CsvParserSettings in favor of 
> the new setUnescapedQuoteHandling method that takes values from the 
> UnescapedQuoteHandling enumeration.
> 3. Default behavior of the CSV parser when unescaped quotes are found on the 
> input changed to parse until a delimiter character is found, i.e. 
> UnescapedQuoteHandling.STOP_AT_DELIMITER. The old default of trying to find a 
> closing quote (i.e. UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE) can be 
> problematic when no closing quote is found, making the parser accumulate all 
> characters into the same value, until the end of the input.
> {quote}
> This matters for Spark in two ways.
> First, Spark uses this library for its CSV data source, so the upgrade should 
> affect CSV parsing and writing performance.
> Second, Spark uses {{setParseUnescapedQuotesUntilDelimiter}}, which is 
> deprecated in this version in favor of {{setUnescapedQuoteHandling}}, a 
> method that offers more options for parsing unescaped quotes. This does not 
> directly affect Spark right now, but we might have to consider switching to 
> the new method in the future.
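
The difference between the two unescaped-quote modes described in the release notes can be illustrated with a small toy parser. This is only a simplified sketch in Python, not Univocity's implementation: the functions `parse` and `_read_value` are hypothetical, the enum merely borrows the two names from the release notes, and the exact values the real library returns (for example, whether the opening quote is kept) may differ.

```python
from enum import Enum


class UnescapedQuoteHandling(Enum):
    # Names borrowed from the release notes; this enum is a stand-in only.
    STOP_AT_DELIMITER = 1
    STOP_AT_CLOSING_QUOTE = 2


def _read_value(text, i, handling, delimiter, quote):
    """Read one value starting at index i; return (value, next index)."""
    n = len(text)
    out = []
    if i < n and text[i] == quote:
        i += 1  # skip the opening quote
        while i < n:
            ch = text[i]
            if ch != quote:
                out.append(ch)
                i += 1
                continue
            nxt = text[i + 1] if i + 1 < n else ""
            if nxt == quote:
                # Doubled quote: an escaped quote inside the value.
                out.append(quote)
                i += 2
            elif nxt in ("", delimiter, "\n"):
                # Proper closing quote: the value ends here.
                return "".join(out), i + 1
            elif handling is UnescapedQuoteHandling.STOP_AT_DELIMITER:
                # Unescaped quote: keep it and finish the value as unquoted.
                out.append(quote)
                i += 1
                break
            else:
                # Unescaped quote: keep it and keep looking for a closing quote.
                out.append(quote)
                i += 1
        else:
            # End of input reached without finding a closing quote.
            return "".join(out), i
    # Unquoted value (or fallback): read up to the next delimiter or newline.
    while i < n and text[i] not in (delimiter, "\n"):
        out.append(text[i])
        i += 1
    return "".join(out), i


def parse(text, handling, delimiter=",", quote='"'):
    """Parse text into rows of values using the toy rules above."""
    rows, row, i, n = [], [], 0, len(text)
    while i < n:
        value, i = _read_value(text, i, handling, delimiter, quote)
        row.append(value)
        if i < n and text[i] == delimiter:
            i += 1  # more values follow on this row
        else:
            rows.append(row)  # newline or end of input closes the row
            row = []
            i += 1
    if row:
        rows.append(row)
    return rows


# The quote after "ab" is unescaped and no closing quote ever follows.
data = '1,"ab"cd,2\n3,4,5'
print(parse(data, UnescapedQuoteHandling.STOP_AT_DELIMITER))
print(parse(data, UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE))
```

In this model, STOP_AT_DELIMITER ends the damaged value at the next comma and still yields two rows, while STOP_AT_CLOSING_QUOTE swallows everything up to the end of the input into a single value, which is the problem the release notes describe.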


