[jira] [Commented] (NIFI-4496) Improve performance of CSVReader

ASF GitHub Bot (JIRA) Wed, 01 Nov 2017 11:39:19 -0700

    [ https://issues.apache.org/jira/browse/NIFI-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234544#comment-16234544 ]


ASF GitHub Bot commented on NIFI-4496:
--------------------------------------

Github user andrewmlim commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2245#discussion_r148347427
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java ---
    @@ -54,6 +54,26 @@
             "The first non-comment line of the CSV file is a header line that contains the names of the columns. The schema will be derived by using the "
                 + "column names in the header and assuming that all columns are of type String.");
     
    +    // CSV parsers
    +    public static final AllowableValue APACHE_COMMONS_CSV = new AllowableValue("commons-csv", "Apache Commons CSV",
    +            "The CSV parser implementation from the Apache Commons CSV library.");
    +
    +    public static final AllowableValue JACKSON_CSV = new AllowableValue("jackson-csv", "Jackson CSV",
    +            "The CSV parser implementation from the Jackson Dataformats library");
    --- End diff --
    
    Need a period (.) after "library" to be consistent with the Apache Commons CSV description above.


> Improve performance of CSVReader
> --------------------------------
>
>                 Key: NIFI-4496
>                 URL: https://issues.apache.org/jira/browse/NIFI-4496
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>
> During some throughput testing, it was noted that the CSVReader was not as 
> fast as desired, processing fewer than 50k records per second. A look at [this 
> benchmark|https://github.com/uniVocity/csv-parsers-comparison] implies that 
> the Apache Commons CSV parser (used by CSVReader) is quite slow compared to 
> others.
> From that benchmark it appears that CSVReader could be enhanced by using a 
> different CSV parser under the hood. Perhaps Jackson is the best choice, as 
> it is fast when values are quoted, and is a mature and maintained codebase.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
