New submission from Steven D'Aprano <steve+pyt...@pearwood.info>:

str.capitalize appears to uppercase the first character of the string, which is okay for ASCII but not for non-English letters.

For example, the letter NJ in Croatian appears as Nj at the start of words when the first character is capitalized: Njemačka ('Germany'), not NJemačka. (In ASCII, that's Njemacka not NJemacka.)

https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet#Digraphs

But using any of:

U+01CA LATIN CAPITAL LETTER NJ
U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
U+01CC LATIN SMALL LETTER NJ

we get the wrong result with capitalize:

py> 'Ǌemačka'.capitalize()
'Ǌemačka'
py> 'ǋemačka'.capitalize()
'Ǌemačka'
py> 'ǌemačka'.capitalize()
'Ǌemačka'

I believe that the correct behaviour is to titlecase the first code point and lowercase the rest, which is what the Apache library here does:

https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#capitalize-java.lang.String-

----------
messages: 339568
nosy: steven.daprano
priority: normal
severity: normal
status: open
title: str.capitalize should titlecase the first character not uppercase

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36549>
_______________________________________
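
For illustration only, the behaviour proposed above (titlecase the first code point, lowercase the rest) could be sketched roughly as follows. The name capitalize_titlecase is hypothetical and is not part of str or of any library; the sketch assumes that calling str.title() on a one-character string yields that character's titlecase mapping.

```python
def capitalize_titlecase(s: str) -> str:
    """Hypothetical helper: titlecase the first code point, lowercase the rest."""
    if not s:
        return s
    # str.title() applied to a single character gives its titlecase form,
    # which differs from the uppercase form for digraphs such as U+01CA..U+01CC.
    return s[0].title() + s[1:].lower()


# The three NJ digraph code points from the report, written as escapes
# so they cannot be confused with the two-letter ASCII sequences.
for first in ("\u01ca", "\u01cb", "\u01cc"):
    word = first + "emačka"
    # Each input is expected to start with U+01CB (the titlecase digraph).
    print(ascii(word), "->", ascii(capitalize_titlecase(word)))
```

Under that assumption all three inputs come out starting with U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J, and plain ASCII input is unaffected because uppercase and titlecase coincide there.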