[ https://issues.apache.org/jira/browse/SPARK-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271964#comment-15271964 ]
Apache Spark commented on SPARK-15148:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/12923

> Upgrade Univocity library from 2.0.2 to 2.1.0
> ----------------------------------------------
>
>                 Key: SPARK-15148
>                 URL: https://issues.apache.org/jira/browse/SPARK-15148
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> It looks like a new release of the Univocity CSV library has been published:
> https://github.com/uniVocity/univocity-parsers/releases.
> It contains the following improvements:
> {quote}
> 1. Performance improvements for parsing/writing CSV and TSV. CSV writing and
> parsing got 30-40% faster.
> 2. Deprecated methods setParseUnescapedQuotes and
> setParseUnescapedQuotesUntilDelimiter class CsvParserSettings in favor of the
> new setUnescapedQuoteHandling method that takes values from the
> UnescapedQuoteHandling enumeration.
> 3. Default behavior of the CSV parser when unescaped quotes are found on the
> input changed to parse until a delimiter character is found, i.e.
> UnescapedQuoteHandling.STOP_AT_DELIMITER. The old default of trying to find a
> closing quote (i.e. UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE) can be
> problematic when no closing quote is found, making the parser accumulate all
> characters into the same value, until the end of the input.
> {quote}
> For Spark, this matters in two ways.
> First, Spark uses this library for the CSV data source, so the upgrade should
> affect CSV read and write performance.
> Second, Spark uses {{setParseUnescapedQuotesUntilDelimiter}}, which is
> deprecated in this version, apparently because the new API offers more options
> for handling unescaped quotes. This is not directly related to Spark, but we
> might have to consider using the new method in the future.
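To make the difference described in item 3 of the quoted release notes concrete, here is a minimal toy sketch of the two policies. It is not univocity-parsers code and does not call the library: the class `UnescapedQuoteSketch`, its `Policy` enum and `parseLine` are hypothetical names made up for illustration, and the model assumes a single line, `,` as the delimiter, `"` as the quote, and no quote escaping. Details such as whether the real parser keeps the opening quote in the value are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT univocity-parsers code. It illustrates the two
// policies named in the release notes under simplifying assumptions.
class UnescapedQuoteSketch {

    // Names mirror the two UnescapedQuoteHandling values quoted above.
    enum Policy { STOP_AT_CLOSING_QUOTE, STOP_AT_DELIMITER }

    // Splits one line on ','. Inside a quoted value, a quote counts as
    // "closing" only if it is followed by ',' or the end of the line;
    // any other quote is treated as an unescaped quote.
    static List<String> parseLine(String line, Policy policy) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            StringBuilder value = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                boolean quoted = true;
                while (i < n) {
                    char c = line.charAt(i);
                    if (quoted && c == '"') {
                        boolean closes = (i + 1 == n) || line.charAt(i + 1) == ',';
                        if (closes) {
                            i++; // consume the closing quote
                            break;
                        }
                        // Unescaped quote: under STOP_AT_DELIMITER the rest
                        // of the value is read as unquoted text.
                        if (policy == Policy.STOP_AT_DELIMITER) {
                            quoted = false;
                        }
                        value.append(c);
                        i++;
                    } else if (!quoted && c == ',') {
                        break; // delimiter ends the value
                    } else {
                        value.append(c);
                        i++;
                    }
                }
            } else {
                // Plain unquoted value: read up to the next delimiter.
                while (i < n && line.charAt(i) != ',') {
                    value.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(value.toString());
            i++; // skip the delimiter, or step past the end of the line
        }
        return fields;
    }

    public static void main(String[] args) {
        String broken = "\"a \"b,c"; // opening quote, stray quote, no closing quote
        // Old default: keeps looking for a closing quote, so everything
        // up to the end of the input lands in a single value.
        System.out.println(parseLine(broken, Policy.STOP_AT_CLOSING_QUOTE).size());
        // New default: stops at the next delimiter, so "c" is its own field.
        System.out.println(parseLine(broken, Policy.STOP_AT_DELIMITER).size());
    }
}
```

In this model, well-formed input such as `"x,y",z` parses identically under both policies; the two only diverge once an unescaped quote appears, which is the case the release notes call out.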