[ https://issues.apache.org/jira/browse/NIFI-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292819#comment-16292819 ]
ASF GitHub Bot commented on NIFI-4496:
--------------------------------------

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/2245

    @jdye64 I think I fixed the issue you were seeing. We have to do most of the schema resolution/management manually; Jackson's methods for handling that don't seem to work for what we need. So I removed the setting of column names on the parser, because having the column names set made the parser expect an actual array, with [] surrounding the line (weird, right?). Then, for files without headers, I needed to make sure we used the schema field names, so I had to adjust the logic where "rawFieldNames" is generated. Mind taking a look at this latest version? Please and thanks!

> Improve performance of CSVReader
> --------------------------------
>
>                 Key: NIFI-4496
>                 URL: https://issues.apache.org/jira/browse/NIFI-4496
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>
> During some throughput testing, it was noted that the CSVReader was not as
> fast as desired, processing less than 50k records per second. A look at [this
> benchmark|https://github.com/uniVocity/csv-parsers-comparison] implies that
> the Apache Commons CSV parser (used by CSVReader) is quite slow compared to
> others.
> From that benchmark it appears that CSVReader could be enhanced by using a
> different CSV parser under the hood. Perhaps Jackson is the best choice, as
> it is fast when values are quoted, and is a mature and maintained codebase.
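
For readers following the comment above, the field-name handling it describes (use the header line when the file has one, otherwise fall back to the schema's field names) might look roughly like the sketch below. This is not the code from the pull request: the class `FieldNameResolver`, the method `resolveRawFieldNames`, and the naive comma split are all assumptions made for illustration, and the sketch deliberately uses no Jackson API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not NiFi's actual implementation.
final class FieldNameResolver {

    // headerLine: the first line of the file when it has a header, otherwise null.
    // schemaFieldNames: field names from the record schema, in column order.
    static List<String> resolveRawFieldNames(String headerLine, List<String> schemaFieldNames) {
        if (headerLine == null) {
            // No header in the file: fall back to the schema's field names.
            return new ArrayList<>(schemaFieldNames);
        }
        // Header present: split naively on commas (no quote handling) and trim each name.
        List<String> names = new ArrayList<>();
        for (String part : headerLine.split(",", -1)) {
            names.add(part.trim());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name");
        System.out.println(resolveRawFieldNames("id, name", schema)); // names taken from the header
        System.out.println(resolveRawFieldNames(null, schema));       // names taken from the schema
    }
}
```

Resolving the names outside the parser means the parser itself never needs to be told the column names, which is the configuration the comment reports removing.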