[ 
https://issues.apache.org/jira/browse/CAMEL-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569794#comment-16569794
 ] 

ASF GitHub Bot commented on CAMEL-12698:
----------------------------------------

onderson commented on a change in pull request #2454: CAMEL-12698: Use the 
Stream API to read files instead of Scanner
URL: https://github.com/apache/camel/pull/2454#discussion_r207786126
 
 

 ##########
 File path: 
components/camel-bindy/src/main/java/org/apache/camel/dataformat/bindy/csv/BindyCsvDataFormat.java
 ##########
 @@ -138,58 +144,74 @@ public Object unmarshal(Exchange exchange, InputStream 
inputStream) throws Excep
         // List of Pojos
         List<Map<String, Object>> models = new ArrayList<>();
 
-        // Pojos of the model
-        Map<String, Object> model;
         InputStreamReader in = null;
-        Scanner scanner = null;
         try {
             if (checkEmptyStream(factory, inputStream)) {
                 return models;
             }
     
             in = new InputStreamReader(inputStream, 
IOHelper.getCharsetName(exchange));
-    
-            // Scanner is used to read big file
-            scanner = new Scanner(in);
-    
+
             // Retrieve the separator defined to split the record
             String separator = factory.getSeparator();
             String quote = factory.getQuote();
             ObjectHelper.notNull(separator, "The separator has not been 
defined in the annotation @CsvRecord or not instantiated during initModel.");
-    
-            int count = 0;
-            
-            // If the first line of the CSV file contains columns name, then we
-            // skip this line
-            if (factory.getSkipFirstLine()) {
-                // Check if scanner is empty
-                if (scanner.hasNextLine()) {
-                    scanner.nextLine();
+            AtomicInteger count = new AtomicInteger(0);
+
+            // Use a Stream to stream a file across.
+            try (Stream<String> lines = new BufferedReader(in).lines()) {
 
 Review comment:
   yes i agree, but this is for backward compatibility. some parts may be 
dropped or cleaned up in camel 3 or maybe your change may be only introduced in 
2.23 unless earlier versions would require the same thing. @oscerd wdyt?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unmarshaling a CSV file with the NEL (next line) character will cause Bindy 
> to misread the entire file
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-12698
>                 URL: https://issues.apache.org/jira/browse/CAMEL-12698
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-bindy
>    Affects Versions: 2.22.0
>            Reporter: Jason Black
>            Priority: Major
>
> I am using Apache Camel to process a lot of large CSV files, and relying on 
> Bindy to assist with unmarshalling them into POJOs.
> We have an upstream data bug which causes a record of ours to contain the 
> Unicode character 
> [NEL|http://www.fileformat.info/info/unicode/char/85/index.htm], but while 
> we're working through the cause of that, I found it curious as to what Bindy 
> is actually doing with it.  We rely on the unmarshal process to perform a 
> batch insert, and because our POJO is missing certain fields, we started 
> observing that the 
> Bindy is relying on Scanner to read lines in a large file; however, Scanner 
> itself also does some parsing of the line with the assumption that, if it 
> sees the NEL character, it will regard it as a newline character.  The modern 
> Files API does not make this distinction and reads to a newline designation 
> only (e.g \n, \r, or \r\n).
> There are two ways to fix this from what I've been able to smoke test:
>  * Change the Scanner implementation to use a delimeter of the more 
> traditional newline characters
>  * Use Java 8's Files API and stream the file in
> I would personally want to use the Files API to handle this since it's more 
> robust and capable of higher performance, but I'll explore both approaches 
> and see where I end up.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to