XuQianJin-Stars commented on a change in pull request #6823: [FLINK-10134] 
UTF-16 support for TextInputFormat bug refixed
URL: https://github.com/apache/flink/pull/6823#discussion_r226516362
 
 

 ##########
 File path: 
flink-core/src/main/java/org/apache/flink/api/common/io/DelimitedInputFormat.java
 ##########
 @@ -472,6 +498,7 @@ public void open(FileInputSplit split) throws IOException {
 
                this.offset = splitStart;
                if (this.splitStart != 0) {
+                       setBomFileCharset(split);
 
 Review comment:
   I have two points about this commit:
   On the first suggestion: users often do not know a file's encoding 
accurately. For example, if the file is encoded as `UTF-16LE` with a BOM header 
and the user specifies `UTF-16BE`, an error is reported. I also believe that 
most UTF-encoded files carry a BOM. So I think it is necessary to run BOM 
detection first, which gives a better user experience.
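
   To illustrate what "BOM detection first" could look like, here is a minimal sketch. The class `BomCharsetSketch` and the method `detectCharset` are hypothetical names used only for illustration; this is not the `setBomFileCharset` code from the PR. It assumes the first few bytes of the file are already available, and it handles only the UTF-8 and UTF-16 BOMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the PR's setBomFileCharset.
final class BomCharsetSketch {

    private BomCharsetSketch() {}

    /**
     * Returns the charset implied by a leading BOM in {@code head} (the first
     * bytes of the file), or {@code configured} when no known BOM is present.
     * UTF-32 BOMs are not handled here, so a UTF-32LE file (FF FE 00 00)
     * would be reported as UTF-16LE by this sketch.
     */
    static Charset detectCharset(byte[] head, Charset configured) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        // No BOM found: fall back to what the user configured.
        return configured;
    }

    // Compares the leading bytes of data (as unsigned values) with prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

   With this ordering, a `UTF-16LE` file with a BOM is decoded as `UTF-16LE` even when the user specified `UTF-16BE`; the configured charset is used only when no BOM is found.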
   On the fourth suggestion: `GenericCsvInputFormat` cannot seek back to 
position 0. It calls the `seek` method of `InputStreamFSInputWrapper`, which 
currently cannot seek to position 0.
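
   The limitation can be pictured with a forward-only wrapper around a plain `InputStream`. The sketch below is a hypothetical stand-in written for this comment (`ForwardOnlySeekSketch` and `ForwardOnlyStream` are made-up names), not the actual `InputStreamFSInputWrapper` source. It assumes the wrapper can only advance by skipping, so once the BOM bytes have been read, a seek back to 0 is rejected.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for illustration; not Flink's InputStreamFSInputWrapper.
final class ForwardOnlySeekSketch {

    private ForwardOnlySeekSketch() {}

    /** Tracks the read position of a plain InputStream; can only move forward. */
    static final class ForwardOnlyStream {
        private final InputStream in;
        private long pos;

        ForwardOnlyStream(InputStream in) {
            this.in = in;
        }

        /** Reads one byte and advances the position if a byte was available. */
        int read() throws IOException {
            int b = in.read();
            if (b >= 0) {
                pos++;
            }
            return b;
        }

        /** Moves forward to {@code desired}; backward seeks are rejected. */
        void seek(long desired) throws IOException {
            if (desired < pos) {
                throw new IllegalArgumentException(
                        "cannot seek backwards: pos=" + pos + ", desired=" + desired);
            }
            while (pos < desired) {
                long skipped = in.skip(desired - pos);
                if (skipped <= 0) {
                    // Simplification: a skip of 0 is treated as end of stream.
                    throw new EOFException("end of stream before position " + desired);
                }
                pos += skipped;
            }
        }

        long getPos() {
            return pos;
        }
    }
}
```

   In this model, reading two BOM bytes leaves the position at 2, and a following `seek(0)` throws, so the BOM has to be inspected without relying on a rewind.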

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
