[ 
https://issues.apache.org/jira/browse/CAMEL-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611641#comment-16611641
 ] 

ASF GitHub Bot commented on CAMEL-12698:
----------------------------------------

MakotoTheKnight commented on a change in pull request #2454: CAMEL-12698: Use 
the Stream API to read files instead of Scanner
URL: https://github.com/apache/camel/pull/2454#discussion_r216906514
 
 

 ##########
 File path: 
components/camel-bindy/src/main/java/org/apache/camel/dataformat/bindy/csv/BindyCsvDataFormat.java
 ##########
 @@ -16,6 +16,19 @@
  */
 package org.apache.camel.dataformat.bindy.csv;
 
+import org.apache.camel.Exchange;
 
 Review comment:
   I may have anticipated that this would fail the build.  I've seen the errors 
and have corrected them with the latest commits.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unmarshaling a CSV file with the NEL (next line) character will cause Bindy 
> to misread the entire file
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-12698
>                 URL: https://issues.apache.org/jira/browse/CAMEL-12698
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-bindy
>    Affects Versions: 2.22.0
>            Reporter: Jason Black
>            Priority: Major
>
> I am using Apache Camel to process a lot of large CSV files, and relying on 
> Bindy to assist with unmarshalling them into POJOs.
> We have an upstream data bug which causes a record of ours to contain the 
> Unicode character 
> [NEL|http://www.fileformat.info/info/unicode/char/85/index.htm], but while 
> we're working through the cause of that, I found it curious as to what Bindy 
> is actually doing with it.  We rely on the unmarshal process to perform a 
> batch insert, and because our POJO is missing certain fields, we started 
> observing that the 
> Bindy is relying on Scanner to read lines in a large file; however, Scanner 
> itself also does some parsing of the line with the assumption that, if it 
> sees the NEL character, it will regard it as a newline character.  The modern 
> Files API does not make this distinction and reads to a newline designation 
> only (e.g \n, \r, or \r\n).
> There are two ways to fix this from what I've been able to smoke test:
>  * Change the Scanner implementation to use a delimeter of the more 
> traditional newline characters
>  * Use Java 8's Files API and stream the file in
> I would personally want to use the Files API to handle this since it's more 
> robust and capable of higher performance, but I'll explore both approaches 
> and see where I end up.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to