Mathias Fußenegger created FLINK-24702:
------------------------------------------
Summary: toUpperCase/toLowerCase calls may cause problems with
some system locales
Key: FLINK-24702
URL: https://issues.apache.org/jira/browse/FLINK-24702
Project: Flink
Issue Type: Technical Debt
Reporter: Mathias Fußenegger
I'm currently exploring the code base and saw several toUpperCase & toLowerCase
calls on strings that don't pass an explicit Locale.
These calls use the system default locale, which can lead to surprising
behavior; see [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6208680]:
> String.toLowerCase or String.toUpperCase sometimes fails to work when run in
> a Turkish or Azeri environment [...] The reason is that Turkish and Azeri have
> dotted and dotless "i"s, and conversion of these characters leads to results
> that aren't adequate for strings in other languages
I didn't investigate whether the current calls are actually problematic, but
there could be bugs if there is a .equals() check following a
toUpperCase/toLowerCase or when these strings are used in map lookups, etc.
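To illustrate the kind of bug I mean, here is a minimal sketch. It is not taken from the Flink code base: the class name, helper methods, and the "file" key are made up for the example. It passes a Turkish locale explicitly instead of changing the JVM default, which should be equivalent to what the no-argument calls do on a machine whose default locale is Turkish.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class, not part of Flink.
public class LocaleCaseDemo {

    // Locale-sensitive lower-casing. s.toLowerCase() without arguments
    // behaves like s.toLowerCase(Locale.getDefault()).
    static String lowerIn(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Locale-independent lower-casing, suitable for identifiers and keys.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        Map<String, String> options = new HashMap<>();
        options.put("file", "some value");

        String key = "FILE";

        // In Turkish, 'I' lower-cases to dotless 'ı' (U+0131),
        // so the key becomes "fıle" and the lookup misses.
        System.out.println(options.containsKey(lowerIn(key, turkish)));

        // With Locale.ROOT the key becomes "file" and the lookup hits.
        System.out.println(options.containsKey(normalize(key)));
    }
}
```

Passing Locale.ROOT (or Locale.ENGLISH) at every call site that handles identifiers, config keys, or enum names would avoid this, while user-facing text could keep using the default locale on purpose.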
Projects like Lucene use [https://github.com/policeman-tools/forbidden-apis] to
prevent these methods from being used and so avoid these potential problems.