[issue33108] Unicode char 304 in lowercase has len = 2

Steven D'Aprano Tue, 20 Mar 2018 09:13:20 -0700

Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:

Neither upper() nor lower() has ever been guaranteed to preserve string 
length in Unicode. For example, some characters map to a base character plus 
combining characters when their case changes, and ligatures can expand to 
several letters. See the Unicode FAQ for more details:


https://unicode.org/faq/casemap_charprop.html
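
As a rough illustration of the point above, here is a minimal sketch, 
assuming a recent Python 3 where str.upper() applies Unicode full case 
mappings. The variable names are only for illustration.

```python
# Case conversion can change string length under Unicode full case mappings.
# Assumes a recent Python 3; the dictionary keys are illustrative names only.
examples = {
    "sharp_s_upper": "\u00df".upper(),      # LATIN SMALL LETTER SHARP S
    "fi_ligature_upper": "\ufb01".upper(),  # LATIN SMALL LIGATURE FI
}

# Each input is a single code point; compare with the length of the result.
lengths = {name: len(value) for name, value in examples.items()}
print(examples)
print(lengths)
```

Both inputs are one code point long, and both results should be two 
characters long.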


However, this example surprises me. In Python 2, I get the result I expected:

py> import unicodedata
py> c = unichr(304)
py> unicodedata.name(c)
'LATIN CAPITAL LETTER I WITH DOT ABOVE'
py> unicodedata.name(c.lower())
'LATIN SMALL LETTER I'


If I am reading the UnicodeData.txt file correctly, I think that the right 
behaviour is for LATIN CAPITAL LETTER I WITH DOT ABOVE to lowercase to LATIN 
SMALL LETTER I, as it did in Python 2.

ftp://ftp.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
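
For comparison, here is a minimal sketch of the same check in Python 3, 
which is the behaviour the issue title describes. It assumes a recent 
Python 3, where chr() replaces unichr(). The result looks consistent with 
the full case mapping in SpecialCasing.txt and not the simple mapping in 
UnicodeData.txt, though that reading is an assumption and is not stated 
in this message.

```python
import unicodedata

# Python 3 spelling of unichr(304): LATIN CAPITAL LETTER I WITH DOT ABOVE.
c = chr(304)
lowered = c.lower()

# Name every code point in the lowercased result.
names = [unicodedata.name(ch) for ch in lowered]
print(len(lowered), names)
```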

----------
nosy: +steven.daprano

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33108>
_______________________________________
