New submission from Steven D'Aprano <steve+pyt...@pearwood.info>:

str.capitalize appears to uppercase the first character of the string, which is 
okay for ASCII but wrong for letters whose titlecase form differs from their 
uppercase form.

For example, the letter Ǌ in Croatian appears as ǋ at the start of words when 
the first character is capitalized:

ǋemačka ('Germany'), not Ǌemačka.

(In ASCII, that's Njemacka not NJemacka.)

https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet#Digraphs

But using any of:

U+01CA LATIN CAPITAL LETTER NJ
U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
U+01CC LATIN SMALL LETTER NJ 

we get the wrong result with capitalize:


py> 'Ǌemačka'.capitalize()  # starts with U+01CA
'Ǌemačka'
py> 'ǋemačka'.capitalize()  # starts with U+01CB
'Ǌemačka'
py> 'ǌemačka'.capitalize()  # starts with U+01CC
'Ǌemačka'
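
To make the uppercase/titlecase distinction concrete, here is a small sketch 
that lists the three case forms of each code point above. The helper name 
case_forms is my own, made up for illustration; it uses only the built-in 
str.upper, str.title and str.lower methods and the unicodedata module.

```python
import unicodedata

# The three code points of the NJ digraph, as listed in the report.
UPPER_NJ = "\u01ca"  # LATIN CAPITAL LETTER NJ
TITLE_NJ = "\u01cb"  # LATIN CAPITAL LETTER N WITH SMALL LETTER J
LOWER_NJ = "\u01cc"  # LATIN SMALL LETTER NJ


def case_forms(ch):
    """Return (uppercase, titlecase, lowercase) for a single character.

    Hypothetical helper for illustration only.
    """
    return ch.upper(), ch.title(), ch.lower()


if __name__ == "__main__":
    for ch in (UPPER_NJ, TITLE_NJ, LOWER_NJ):
        forms = [f"U+{ord(f):04X}" for f in case_forms(ch)]
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {forms}")
```

Each of the three code points should map to the same triple: U+01CA for 
uppercase, U+01CB for titlecase, U+01CC for lowercase.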


I believe that the correct behaviour is to titlecase the first code point and 
lowercase the rest. The Apache library here also titlecases the first 
character, although as far as I can tell it leaves the rest unchanged:

https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#capitalize-java.lang.String-
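
A minimal sketch of the behaviour I am proposing, written in pure Python. The 
function name capitalize_titlecase is hypothetical and this is not how 
str.capitalize is implemented; it relies on str.title applying the titlecase 
mapping when given a one-character string.

```python
def capitalize_titlecase(s):
    """Titlecase the first code point and lowercase the rest.

    Hypothetical helper sketching the proposed behaviour; not the
    actual str.capitalize implementation.
    """
    # s[:1] is empty for an empty string, so no special case is needed.
    # str.title() on a single character applies its titlecase mapping.
    return s[:1].title() + s[1:].lower()


if __name__ == "__main__":
    # All three spellings of the digraph should come out starting with U+01CB.
    for first in ("\u01ca", "\u01cb", "\u01cc"):
        result = capitalize_titlecase(first + "EMACKA")
        print(f"U+{ord(first):04X} -> U+{ord(result[0]):04X} + {result[1:]!r}")
```

Note that lowercasing the tail separately from the first character is a 
simplification: context-sensitive rules (such as Greek final sigma) see only 
the slice, not the whole string.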

----------
messages: 339568
nosy: steven.daprano
priority: normal
severity: normal
status: open
title: str.capitalize should titlecase the first character not uppercase

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36549>
_______________________________________