[ https://issues.apache.org/jira/browse/CAMEL-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Önder Sezgin resolved CAMEL-12698.
----------------------------------
    Resolution: Fixed

> Unmarshaling a CSV file with the NEL (next line) character will cause Bindy
> to misread the entire file
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-12698
>                 URL: https://issues.apache.org/jira/browse/CAMEL-12698
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-bindy
>    Affects Versions: 2.22.0
>            Reporter: Jason Black
>            Assignee: Önder Sezgin
>            Priority: Minor
>             Fix For: 2.23.0
>
>
> I am using Apache Camel to process a lot of large CSV files, and relying on
> Bindy to assist with unmarshalling them into POJOs.
> We have an upstream data bug which causes one of our records to contain the
> Unicode character
> [NEL|http://www.fileformat.info/info/unicode/char/85/index.htm]. While
> we're working through the cause of that, I was curious about what Bindy
> is actually doing with it. We rely on the unmarshal process to perform a
> batch insert, and because our POJO is missing certain fields, we started
> observing that the
> Bindy relies on Scanner to read lines in a large file; however, Scanner
> also does some parsing of its own: if it sees the NEL character (U+0085),
> it treats it as a line separator. The modern Files API does not do this and
> splits lines only on the traditional line terminators (\n, \r, or \r\n).
> There are two ways to fix this from what I've been able to smoke test:
> * Change the Scanner implementation to use a delimiter made up of the more
> traditional newline characters
> * Use Java 8's Files API and stream the file in
> I would personally want to use the Files API to handle this since it's more
> robust and capable of higher performance, but I'll explore both approaches
> and see where I end up.
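
For illustration only, here is a minimal Java sketch of the behaviour described in the report and of the two proposed fixes. It is not Bindy's code and not the change that was committed for this issue; the class name NelLineSplitDemo, its method names, and the sample CSV string are hypothetical. It reads from an in-memory string rather than a file, on the assumption that BufferedReader.lines() splits on the same terminators as the Files API (\n, \r, \r\n).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of camel-bindy.
class NelLineSplitDemo {

    // Default Scanner line handling: NEL (U+0085) is treated as a line separator.
    static List<String> scannerLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Option 1: keep Scanner, but split only on \r\n, \r, or \n.
    static List<String> scannerWithDelimiter(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("\r\n|\r|\n")) {
            while (scanner.hasNext()) {
                lines.add(scanner.next());
            }
        }
        return lines;
    }

    // Option 2: stream lines through BufferedReader, which ignores NEL.
    static List<String> readerLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two CSV records; the first contains a stray NEL inside a field.
        String csv = "1,Al\u0085ice\n2,Bob\n";
        System.out.println(scannerLines(csv).size());         // 3
        System.out.println(scannerWithDelimiter(csv).size()); // 2
        System.out.println(readerLines(csv).size());          // 2
    }
}
```

With the default Scanner the first record is cut in two at the NEL, so every later record is shifted by one line; both alternatives keep the record intact and leave the NEL inside the field for later handling.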