Pavel Zeger created FLINK-39805:
-----------------------------------

             Summary: FlinkConfigBuilder uses platform-default charset when 
writing log/pod-template files
                 Key: FLINK-39805
                 URL: https://issues.apache.org/jira/browse/FLINK-39805
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
            Reporter: Pavel Zeger


In `flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java`,
three calls encode a string with the no-arg `getBytes()` before writing it to a file:
{code:java}
File log4jConfFile = new File(tmpDir.getAbsolutePath(), CONFIG_FILE_LOG4J_NAME);
Files.write(log4jConfFile.toPath(), log4jConf.getBytes());

File logbackConfFile = new File(tmpDir.getAbsolutePath(), CONFIG_FILE_LOGBACK_NAME);
Files.write(logbackConfFile.toPath(), logbackConf.getBytes());

final File tmp = File.createTempFile(GENERATED_FILE_PREFIX + "podTemplate_", ".yaml");
Files.write(tmp.toPath(), Serialization.asYaml(podTemplate).getBytes());
{code}
`String.getBytes()` (no-arg) encodes using the JVM’s `Charset.defaultCharset()`,
which is environment-dependent. On most modern Linux containers it happens to
be UTF-8, but:
 # On older Linux base images and on container runtimes that don’t set
LANG=*UTF-8, the default can fall back to US-ASCII or ISO-8859-1.
 # On Windows hosts the default is typically windows-1252 or another local code
page.
 # In a JVM run with -Dfile.encoding=..., the result depends on whatever the
operator was started with.

The first two cases apply to JDK 17 and earlier. JDK 18 and later default to
UTF-8 (JEP 400) unless file.encoding is overridden.

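When this happens, any non-ASCII character in the user’s log4j.properties,
logback.xml, or podTemplate.yaml (a UTF-8 emoji in a comment, an
internationalised label key, an annotation containing a CJK character,
non-breaking spaces in YAML, etc.) is corrupted. Characters the default charset
cannot represent are typically replaced with `?`, and characters it can
represent are written in a single-byte encoding that is not valid UTF-8 for
whatever reads the file afterwards.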
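
The corruption described above can be reproduced without changing the JVM default, by passing the problematic charsets explicitly. The following is a minimal sketch, not code from the operator: the class name `DefaultCharsetDemo`, the helper `writeThenReadAsUtf8`, and the sample value are hypothetical and exist only for illustration. It encodes a string with a given charset and then decodes the bytes as UTF-8, which is how a UTF-8 consumer of the written file would see them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of flink-kubernetes-operator.
class DefaultCharsetDemo {

    /**
     * Encodes the text with writerCharset (standing in for the JVM default),
     * then decodes the resulting bytes as UTF-8, as a UTF-8 reader would.
     */
    static String writeThenReadAsUtf8(String text, Charset writerCharset) {
        byte[] onDisk = text.getBytes(writerCharset);
        return new String(onDisk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00EB is "e with diaeresis"; escaped so the source file itself
        // does not depend on the compiler's source encoding.
        String value = "owner: Zo\u00EB";

        System.out.println("JVM default charset: " + Charset.defaultCharset());

        Charset[] candidates = {
            StandardCharsets.UTF_8,
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1
        };
        for (Charset cs : candidates) {
            String seen = writeThenReadAsUtf8(value, cs);
            System.out.println(cs + " round-trips cleanly: " + seen.equals(value));
        }
    }
}
```

Only the UTF-8 case round-trips. With US-ASCII the character is replaced by `?` at encode time, and with ISO-8859-1 the single byte written is not a valid UTF-8 sequence, so the reader sees a replacement character instead.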

The pod template case is the most concerning. Users frequently add annotations,
labels, or env values containing non-ASCII characters (legitimate use cases:
internationalised tenant labels, owner names with diacritics, region tags,
etc.). The corrupted YAML written to the temp file is then passed to Kubernetes,
which either rejects it (best case) or silently accepts a corrupted value
(worst case).

 

*Proposed fix*
 # Always use UTF-8 explicitly in all three calls, e.g.
`getBytes(StandardCharsets.UTF_8)`.
 # Add the SpotBugs DM_DEFAULT_ENCODING rule to the project to prevent
recurrence.
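
The first item could look roughly like the sketch below. It is not the actual patch: the class `Utf8FileWriter` and the helper `writeUtf8` are hypothetical names, and the YAML content is made up for the example. The only change relative to the calls quoted above is passing `StandardCharsets.UTF_8` to `getBytes`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of flink-kubernetes-operator.
class Utf8FileWriter {

    /** Writes content as UTF-8 regardless of the JVM default charset. */
    static Path writeUtf8(File target, String content) throws IOException {
        return Files.write(target.toPath(), content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("podTemplate_", ".yaml");
        tmp.deleteOnExit();

        // Made-up pod template fragment with a non-ASCII annotation value.
        String yaml = "metadata:\n  annotations:\n    owner: \"Zo\u00EB\"\n";
        writeUtf8(tmp, yaml);

        // Read the file back as UTF-8 and confirm nothing was lost.
        String readBack = new String(Files.readAllBytes(tmp.toPath()), StandardCharsets.UTF_8);
        System.out.println("content preserved: " + readBack.equals(yaml));
    }
}
```

On Java 11 and later, `Files.writeString(path, content)` is an alternative that also encodes as UTF-8 by default. Whether a shared helper or three inline changes fits the codebase better is a judgment for the maintainers.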



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
