[
https://issues.apache.org/jira/browse/FLINK-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899233#comment-15899233
]
ASF GitHub Bot commented on FLINK-5824:
---------------------------------------
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3468
I would prefer to keep the value in `ConfigConstants`, just because it
clearly defines in one place what we want to use as the default (rather than
passing `StandardCharsets.UTF_8` everywhere).
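
To make the "one place" argument concrete, here is a minimal sketch. The class name `CharsetDefaultSketch` and the constant name `DEFAULT_CHARSET` are placeholders chosen for illustration, not the names discussed in the pull request. The point is only that all conversions go through a single shared constant, and that a reader assuming a different charset sees different text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDefaultSketch {

    /** Hypothetical stand-in for the constant that would live in ConfigConstants. */
    static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;

    /** Encodes with the shared default rather than the JVM default charset. */
    static byte[] encode(String value) {
        return value.getBytes(DEFAULT_CHARSET);
    }

    /** Decodes with the same shared default. */
    static String decode(byte[] bytes) {
        return new String(bytes, DEFAULT_CHARSET);
    }

    public static void main(String[] args) {
        String original = "Z\u00fcrich"; // contains one non-ASCII character
        byte[] bytes = encode(original);

        // Same charset on both sides: the round trip is lossless.
        System.out.println(decode(bytes).equals(original));

        // A reader that assumes a different charset gets different text,
        // which is what can happen when JVM default encodings differ.
        String misread = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(misread.equals(original));
    }
}
```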
As for the name, I think both suggestions are fine in the end. It is
unlikely that we will ever switch the charset, especially since checkpoints and
high-availability storage encode data in that charset.
What would be good to have is a test that checks that we never use a byte
conversion without a charset. Maybe a checkstyle rule. If that does not work,
we can try and do a `Reflections` test (see `RpcCompletenessTest` for an example
of how to reflectively analyze code in tests).
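
As a rough illustration of what such a check could look for, the sketch below scans source text for no-argument `getBytes()` calls. It is neither a checkstyle rule nor a `Reflections` test, just a plain regex scan in the spirit of the `grep` suggested in the issue; the class and method names are hypothetical. It would miss `new String(byte[])` and other charset-less conversions, and it would also flag matches inside comments or string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CharsetUsageScanSketch {

    // Matches a no-argument getBytes() call, i.e. one that falls back
    // to the JVM default charset.
    private static final Pattern NO_ARG_GET_BYTES =
            Pattern.compile("\\.getBytes\\(\\s*\\)");

    /** Returns the 1-based line numbers that contain a charset-less getBytes() call. */
    static List<Integer> findViolations(String source) {
        List<Integer> violations = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (NO_ARG_GET_BYTES.matcher(lines[i]).find()) {
                violations.add(i + 1);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        String sample =
                "byte[] a = name.getBytes();\n"
                        + "byte[] b = name.getBytes(StandardCharsets.UTF_8);\n"
                        + "byte[] c = name.getBytes( );\n";
        // Lines 1 and 3 use the JVM default charset; line 2 is explicit.
        System.out.println(findViolations(sample));
    }
}
```

A real test would read the project's source files and fail when the returned list is non-empty.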
> Fix String/byte conversions without explicit encoding
> -----------------------------------------------------
>
> Key: FLINK-5824
> URL: https://issues.apache.org/jira/browse/FLINK-5824
> Project: Flink
> Issue Type: Bug
> Components: Python API, Queryable State, State Backends,
> Checkpointing, Webfrontend
> Reporter: Ufuk Celebi
> Assignee: Dawid Wysakowicz
> Priority: Blocker
>
> In a couple of places we convert Strings to bytes and bytes back to Strings
> without explicitly specifying an encoding. This can lead to problems when
> client and server default encodings differ.
> The task of this JIRA is to go over the whole project and look for
> conversions where we don't specify an encoding and fix it to specify UTF-8
> explicitly.
> For starters, we can {{grep -R 'getBytes()' .}}, which already reveals many
> problematic places.