[ https://issues.apache.org/jira/browse/NIFI-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234544#comment-16234544 ]
ASF GitHub Bot commented on NIFI-4496:
--------------------------------------

Github user andrewmlim commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2245#discussion_r148347427

    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java ---
    @@ -54,6 +54,26 @@
                 "The first non-comment line of the CSV file is a header line that contains the names of the columns. The schema will be derived by using the "
                     + "column names in the header and assuming that all columns are of type String.");
    +    // CSV parsers
    +    public static final AllowableValue APACHE_COMMONS_CSV = new AllowableValue("commons-csv", "Apache Commons CSV",
    +        "The CSV parser implementation from the Apache Commons CSV library.");
    +
    +    public static final AllowableValue JACKSON_CSV = new AllowableValue("jackson-csv", "Jackson CSV",
    +        "The CSV parser implementation from the Jackson Dataformats library");
    --- End diff --

    Need a period (.) after library to be consistent.


> Improve performance of CSVReader
> --------------------------------
>
>                 Key: NIFI-4496
>                 URL: https://issues.apache.org/jira/browse/NIFI-4496
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>
> During some throughput testing, it was noted that the CSVReader was not as
> fast as desired, processing less than 50k records per second. A look at [this
> benchmark|https://github.com/uniVocity/csv-parsers-comparison] implies that
> the Apache Commons CSV parser (used by CSVReader) is quite slow compared to
> others.
> From that benchmark it appears that CSVReader could be enhanced by using a
> different CSV parser under the hood. Perhaps Jackson is the best choice, as
> it is fast when values are quoted, and is a mature and maintained codebase.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
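The review suggestion can be illustrated with a small, dependency-free Java sketch. `ParserOption` is a hypothetical stand-in for NiFi's `AllowableValue` (value, display name, description), used so the example compiles without NiFi on the classpath. `CsvParserOptions`, `fromValue` and `descriptionsArePunctuated` are likewise illustrative names, not part of the pull request. The values and display names are copied from the diff, and both descriptions end with a period, as the reviewer asked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the CSV parser options from the diff (not NiFi code). */
final class CsvParserOptions {

    /** Hypothetical stand-in for org.apache.nifi.components.AllowableValue. */
    static final class ParserOption {
        private final String value;
        private final String displayName;
        private final String description;

        ParserOption(String value, String displayName, String description) {
            this.value = value;
            this.displayName = displayName;
            this.description = description;
        }

        String getValue() { return value; }
        String getDisplayName() { return displayName; }
        String getDescription() { return description; }
    }

    // Values and display names as in the diff; both descriptions now end with a period.
    static final ParserOption APACHE_COMMONS_CSV = new ParserOption("commons-csv", "Apache Commons CSV",
            "The CSV parser implementation from the Apache Commons CSV library.");

    static final ParserOption JACKSON_CSV = new ParserOption("jackson-csv", "Jackson CSV",
            "The CSV parser implementation from the Jackson Dataformats library.");

    // Declared after the constants so they are initialized before this list is built.
    static final List<ParserOption> ALL = Arrays.asList(APACHE_COMMONS_CSV, JACKSON_CSV);

    /** Hypothetical helper: resolve a configured property value to its option, if any. */
    static Optional<ParserOption> fromValue(String value) {
        return ALL.stream().filter(o -> o.getValue().equals(value)).findFirst();
    }

    /** Hypothetical consistency check mirroring the review: every description ends with '.'. */
    static boolean descriptionsArePunctuated() {
        return ALL.stream().allMatch(o -> o.getDescription().endsWith("."));
    }

    private CsvParserOptions() {
    }
}
```

A check like `descriptionsArePunctuated()` could live in a unit test, which would catch a missing period mechanically instead of during review.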