fhueske commented on a change in pull request #6823: [FLINK-10134] UTF-16 
support for TextInputFormat bug refixed
URL: https://github.com/apache/flink/pull/6823#discussion_r225553146
 
 

 ##########
 File path: flink-core/src/main/java/org/apache/flink/api/common/io/DelimitedInputFormat.java
 ##########
 @@ -62,26 +64,41 @@
        // Charset is not serializable
        private transient Charset charset;
 
+       /**
+        * The charset of bom in the file to process.
+        */
+       private transient Charset bomCharset;
+
+       /**
+        * The Map to record the BOM encoding of all files.
+        */
+       private transient final Map<String, Charset> fileBomCharsetMap;
+
+       /**
+        * The stepSize to record different encoding formats.
+        */
+       protected transient int charsetStepSize = 1;
 
 Review comment:
   UTF-16 is a variable-length encoding. Hence, the number of bytes per character 
depends on the character, not only on the charset. I'd remove this variable and 
move it to `TextInputFormat`, which uses it in a specific way.
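
To illustrate the point about variable length, here is a minimal standalone sketch (not code from this PR; the class name `Utf16WidthDemo` and the helper `encodedLength` are made up for the example). It encodes one character from the Basic Multilingual Plane and one supplementary character with UTF-16LE, which is chosen only because it does not prepend a BOM, and compares the byte counts:

```java
import java.nio.charset.StandardCharsets;

public class Utf16WidthDemo {

    // Hypothetical helper: number of bytes the string occupies in UTF-16LE.
    // UTF-16LE is used so that no BOM is added to the encoded output.
    static int encodedLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        // 'A' (U+0041) is in the Basic Multilingual Plane: one 16-bit code unit.
        String bmp = "A";
        // U+1F600 lies outside the BMP and is encoded as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(encodedLength(bmp));           // 2
        System.out.println(encodedLength(supplementary)); // 4
    }
}
```

Both strings hold a single code point, yet they differ in encoded size, so a single per-charset step size cannot describe every character in a UTF-16 file.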
