[jira] [Commented] (FLINK-3921) StringParser not specifying encoding to use

ASF GitHub Bot (JIRA) Tue, 06 Dec 2016 13:14:37 -0800

    [ https://issues.apache.org/jira/browse/FLINK-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726729#comment-15726729 ]


ASF GitHub Bot commented on FLINK-3921:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/2901
  
    The byte-level implementation, as in `IntParser`, was originally used to 
avoid creating String objects. Such an implementation was not possible for 
Double because it led to very imprecise results, so we chose the String 
approach there. You are of course right that we would need to use the String 
approach as well if a charset is used that is not compatible with the current 
byte-level parsing.
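
    To illustrate the two approaches mentioned above, here is a minimal sketch. 
The class and method names are hypothetical and this is not Flink's actual 
`IntParser` or double parser code; it only assumes that the input bytes are in an 
ASCII-compatible encoding, and it omits overflow checks for brevity.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not taken from the Flink code base.
final class ByteLevelParsingSketch {

    // Byte-level approach: reads decimal digits directly from the byte array
    // and never allocates a String. Assumes one byte per digit (ASCII-compatible).
    static int parseIntFromBytes(byte[] bytes, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty field");
        }
        int pos = offset;
        final int end = offset + length;
        boolean negative = false;
        if (bytes[pos] == '-') {
            negative = true;
            pos++;
            if (pos == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        int value = 0;
        for (; pos < end; pos++) {
            int digit = bytes[pos] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("invalid byte at position " + pos);
            }
            value = value * 10 + digit; // overflow checks omitted in this sketch
        }
        return negative ? -value : value;
    }

    // String approach: decodes the field with an explicit charset first and
    // delegates to the JDK, which handles floating-point rounding correctly.
    static double parseDoubleViaString(byte[] bytes, int offset, int length) {
        String text = new String(bytes, offset, length, StandardCharsets.US_ASCII);
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        byte[] line = "-123|0.1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseIntFromBytes(line, 0, 4));
        System.out.println(parseDoubleViaString(line, 5, 3));
    }
}
```

    In an encoding such as UTF-16, where each digit occupies two bytes, the digit 
loop above would reject the field, which is the case where the String approach 
would be needed for integers as well.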
    
    IMO, it makes sense to open a JIRA for that and solve it as a follow-up 
to this issue.
    Feel free to merge this PR.


> StringParser not specifying encoding to use
> -------------------------------------------
>
>                 Key: FLINK-3921
>                 URL: https://issues.apache.org/jira/browse/FLINK-3921
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Tatu Saloranta
>            Assignee: Rekha Joshi
>            Priority: Trivial
>
> Class `flink.types.parser.StringParser` has javadocs indicating that contents 
> are expected to be ASCII, similar to `StringValueParser`. That makes sense, 
> but when the actual instance is constructed, no encoding is specified; e.g. 
> on line 66:
>    this.result = new String(bytes, startPos+1, i - startPos - 2);
> which leads to using whatever the platform default encoding is. If contents 
> really are always ASCII (I would not count on that, as the parser is used from 
> the CSV reader), this is not a big deal, but it can lead to the usual 
> Latin-1-vs-UTF-8 issues.
> So I think that the encoding should be explicitly specified, whichever is to 
> be used: the javadocs claim ASCII, so it could be "us-ascii", but it could well 
> be UTF-8 or even ISO-8859-1.
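
The difference described in the issue can be sketched as follows. This is an 
illustration only: the class and the `decodeField` helper are hypothetical names, 
not part of Flink, and the choice of charset is left to the caller because the 
issue does not settle which one should be used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not taken from the Flink code base.
final class ExplicitCharsetSketch {

    // Decodes a field with a caller-supplied charset, so the result does not
    // depend on the platform default encoding of the JVM running the job.
    static String decodeField(byte[] bytes, int offset, int length, Charset charset) {
        return new String(bytes, offset, length, charset);
    }

    public static void main(String[] args) {
        // "café" written with a Unicode escape so the source file encoding does not matter.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Same bytes, two different charsets: only the first round-trips correctly.
        String asUtf8 = decodeField(utf8Bytes, 0, utf8Bytes.length, StandardCharsets.UTF_8);
        String asLatin1 = decodeField(utf8Bytes, 0, utf8Bytes.length, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals("caf\u00e9"));   // true
        System.out.println(asLatin1.equals("caf\u00e9")); // false: the two UTF-8 bytes of 'é' become two characters
    }
}
```

With `new String(bytes, offset, length)` and no charset argument, which of these 
two results is produced depends on the platform default, which is the 
Latin-1-vs-UTF-8 problem the issue describes.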



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
