[
https://issues.apache.org/jira/browse/FLINK-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201903#comment-14201903
]
ASF GitHub Bot commented on FLINK-1223:
---------------------------------------
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/incubator-flink/pull/187#discussion_r20003612
--- Diff: flink-core/src/main/java/org/apache/flink/types/parser/StringParser.java ---
@@ -30,7 +34,14 @@
private static final byte WHITESPACE_TAB = (byte) '\t';
private static final byte QUOTE_DOUBLE = (byte) '"';
-
+
+ private static final Set<Byte> trailingCheckSet = Sets.newHashSet(
--- End diff --
The parsers come under heavy load when reading large CSV files, so we
pay close attention to performance here.
I am wondering whether a hash set is the best choice for checking these
three elements. Every lookup means computing the hash, a table lookup, an
entry lookup, a comparison, ...
It might be cheaper to just hardwire the check for those three values.
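
To make the suggestion concrete, here is a minimal sketch of the two variants side by side. It is not the code from the pull request: the diff above is truncated, so the three byte values are placeholders, and the class and method names (TrailingByteCheck, inSet, hardwired) are hypothetical. The sketch also uses java.util.HashSet rather than Guava's Sets.newHashSet to stay self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; names and byte values are assumptions, not taken from the PR.
final class TrailingByteCheck {

    // Placeholder values: the diff is cut off, so the real set contents are unknown.
    static final byte FIRST = (byte) ' ';
    static final byte SECOND = (byte) '\t';
    static final byte THIRD = (byte) '"';

    static final Set<Byte> CHECK_SET = new HashSet<>(Arrays.asList(FIRST, SECOND, THIRD));

    // Set-based variant: each call boxes the byte to a Byte, hashes it,
    // finds the bucket, and compares entries with equals().
    static boolean inSet(byte b) {
        return CHECK_SET.contains(b);
    }

    // Hardwired variant: at most three primitive comparisons, no boxing.
    static boolean hardwired(byte b) {
        return b == FIRST || b == SECOND || b == THIRD;
    }
}
```

Both methods return the same result for every byte value; which one is actually faster in the parser's inner loop would have to be confirmed with a benchmark.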
> Allow value escaping in CSV files
> ---------------------------------
>
> Key: FLINK-1223
> URL: https://issues.apache.org/jira/browse/FLINK-1223
> Project: Flink
> Issue Type: Bug
> Components: Java API, Scala API
> Affects Versions: 0.8-incubating
> Reporter: Johannes
> Priority: Minor
>
> The CSV parser currently does not interpret escaped values.
> The example from
> http://en.wikipedia.org/wiki/Comma-separated_values#Example
> {code}
> Year,Make,Model,Description,Price
> 1997,Ford,E350,"ac, abs, moon",3000.00
> 1999,Chevy,"Venture ""Extended Edition""","",4900.00
> {code}
> does not work currently.
> Here, the escaping inside the string field generates an error.
> For reference, an interesting post about the pitfalls that can be
> encountered when parsing CSV files:
> [http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/]
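
For illustration, the escaping rule that the example in the issue relies on (fields wrapped in double quotes, with a literal quote written as two consecutive quotes) can be sketched as follows. This is a simplified, String-based sketch under stated assumptions, not Flink's byte-oriented parser: the class and method names (QuotedCsvLineSketch, splitLine) are hypothetical, the delimiter is fixed to a comma, and quoted fields spanning multiple lines are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of quote handling for a single CSV line.
final class QuotedCsvLineSketch {

    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        // Doubled quote inside a quoted field: emit one literal quote.
                        current.append('"');
                        i++;
                    } else {
                        // Single quote closes the quoted section.
                        inQuotes = false;
                    }
                } else {
                    // Commas inside quotes are ordinary characters.
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quoted field");
        }
        fields.add(current.toString());
        return fields;
    }
}
```

Applied to the third line of the example, splitLine yields five fields, with the third being Venture "Extended Edition" and the fourth being the empty string.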