Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:

It has never been the case that upper() or lower() are guaranteed to preserve string length in Unicode. For example, some characters decompose into a base plus combining characters. Ligatures are another example. See here for more details:
https://unicode.org/faq/casemap_charprop.html

However, this example surprises me. In Python 2, I get the result I expected:

py> import unicodedata
py> c = unichr(304)
py> unicodedata.name(c)
'LATIN CAPITAL LETTER I WITH DOT ABOVE'
py> unicodedata.name(c.lower())
'LATIN SMALL LETTER I'

If I am reading the UnicodeData.txt file correctly, I think that the right behaviour is for LATIN CAPITAL LETTER I WITH DOT ABOVE to lowercase to LATIN SMALL LETTER I, as it did in Python 2.

ftp://ftp.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

----------
nosy: +steven.daprano

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33108>
_______________________________________
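
For reference, the point about case conversion not preserving string length can be checked with a minimal Python 3 sketch. The helper name `case_lengths` and the choice of sample characters are mine, purely for illustration, and are not taken from the tracker issue. The last two lines show what a recent Python 3 returns for U+0130, which is the behaviour that differs from the Python 2 session in the comment.

```python
import unicodedata


def case_lengths(s):
    """Return (len(s), len(s.upper()), len(s.lower())) for a string.

    Hypothetical helper, for illustration only.
    """
    return len(s), len(s.upper()), len(s.lower())


# One character whose uppercase expands, a ligature, and the character
# discussed in the comment.
print(case_lengths("\u00df"))   # LATIN SMALL LETTER SHARP S  -> (1, 2, 1)
print(case_lengths("\ufb01"))   # LATIN SMALL LIGATURE FI     -> (1, 2, 1)
print(case_lengths("\u0130"))   # LATIN CAPITAL LETTER I WITH DOT ABOVE -> (1, 1, 2)

# Name each code point produced by lowercasing U+0130 in Python 3.
lowered_names = [unicodedata.name(ch) for ch in "\u0130".lower()]
print(lowered_names)  # ['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']
```

In other words, code that assumes len(s) == len(s.lower()) can break on inputs like these, so offsets computed on the original string should not be applied to the case-converted one.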