Mathias Fußenegger created FLINK-24702:
------------------------------------------

             Summary: toUpperCase/toLowerCase calls may cause problems with 
some system locales
                 Key: FLINK-24702
                 URL: https://issues.apache.org/jira/browse/FLINK-24702
             Project: Flink
          Issue Type: Technical Debt
            Reporter: Mathias Fußenegger


I'm currently exploring the code base and saw several toUpperCase & toLowerCase 
calls on strings that do not pass an explicit Locale.

Without a Locale argument, these methods use the system default locale, which 
can lead to surprising behavior; see 
[https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6208680]

> String.toLowerCase or String.toUpperCase sometimes fails to work when run in 
> a Turkish or Azeri environment [...] The reason is that Turkish and Azeri have 
> dotted and dotless "i"s, and conversion of these characters leads to results 
> that aren't adequate for strings in other languages

I didn't investigate whether the current calls are actually problematic, but 
there could be bugs if there is a .equals() check following a 
toUpperCase/toLowerCase or when these strings are used in map lookups, etc.
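
To illustrate the kind of failure I mean, here is a minimal sketch. It is not 
taken from the Flink code base; the class name, method names and the "TITLE" 
key are made up for illustration. It passes the Turkish locale explicitly 
instead of changing the system default, which should behave the same as a 
no-argument toUpperCase() call on a JVM whose default locale is tr-TR.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class, not part of Flink.
public class LocaleCaseDemo {

    // Locale-sensitive upper-casing; equivalent to s.toUpperCase() when
    // the given locale is the JVM default.
    static String upperWithLocale(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    // Locale-independent upper-casing, suitable for identifiers and keys.
    static String normalizeKey(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        Map<String, Integer> options = new HashMap<>();
        options.put("TITLE", 1);

        // Under Turkish rules 'i' maps to U+0130 (capital I with dot above),
        // so the result is not equal to the ASCII string "TITLE".
        String turkishKey = upperWithLocale("title", turkish);
        System.out.println(turkishKey.equals("TITLE"));   // false
        System.out.println(options.get(turkishKey));      // null

        // With Locale.ROOT the mapping does not depend on the environment.
        String rootKey = normalizeKey("title");
        System.out.println(rootKey.equals("TITLE"));      // true
        System.out.println(options.get(rootKey));         // 1
    }
}
```

Whether Locale.ROOT or Locale.ENGLISH is the better choice for a given call 
site probably depends on what the string is used for; for internal keys and 
identifiers either should give stable results.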

 

Projects like Lucene use [https://github.com/policeman-tools/forbidden-apis] to 
prevent these methods from being used and so avoid these potential problems.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
