Steve D'Aprano <steve+pyt...@pearwood.info> writes:

> From time to time, people discover that Python's string algorithms work on
> code points rather than "real characters", which can lead to anomalies like
> the following:
>
>     s = 'xäex'
>     s = unicodedata.normalize('NFD', s)
>     print(s)
>     print(s[::-1])
>
> which results in:
>
>     xäex
>     xëax
>
> If you're interested in this issue

Note that it depends on the difference between two apparently identical
strings::

    >>> import unicodedata
    >>> s1 = 'xäex'
    >>> s2 = unicodedata.normalize('NFD', s1)
    >>> s1, s2
    ('xäex', 'xäex')

The strings are different, and the items you get when iterating them are
different::

    >>> len(s1), len(s2)
    (4, 5)
    >>> [unicodedata.name(c) for c in s1]
    ['LATIN SMALL LETTER X', 'LATIN SMALL LETTER A WITH DIAERESIS',
     'LATIN SMALL LETTER E', 'LATIN SMALL LETTER X']
    >>> [unicodedata.name(c) for c in s2]
    ['LATIN SMALL LETTER X', 'LATIN SMALL LETTER A', 'COMBINING DIAERESIS',
     'LATIN SMALL LETTER E', 'LATIN SMALL LETTER X']

which explains why they're different when reversed::

    >>> [unicodedata.name(c) for c in reversed(s1)]
    ['LATIN SMALL LETTER X', 'LATIN SMALL LETTER E',
     'LATIN SMALL LETTER A WITH DIAERESIS', 'LATIN SMALL LETTER X']
    >>> "".join(reversed(s1))
    'xeäx'
    >>> [unicodedata.name(c) for c in reversed(s2)]
    ['LATIN SMALL LETTER X', 'LATIN SMALL LETTER E', 'COMBINING DIAERESIS',
     'LATIN SMALL LETTER A', 'LATIN SMALL LETTER X']
    >>> "".join(reversed(s2))
    'xëax'

-- 
Ben Finney
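
One way to make the reversal keep each accent on its base letter is to group
every combining mark with the character before it and then reverse the groups.
The following is a minimal sketch under stated assumptions, not a full
grapheme-cluster implementation: `reverse_keeping_marks` is a hypothetical
helper name, and it only recognises characters for which
`unicodedata.combining()` returns a non-zero class, so other multi-code-point
sequences are not handled.

```python
import unicodedata


def reverse_keeping_marks(text):
    """Reverse text, keeping each combining mark after its base character.

    Hypothetical helper for illustration only; it treats a character as a
    combining mark when unicodedata.combining() returns a non-zero class.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Attach the mark to the cluster started by the previous base.
            clusters[-1] += ch
        else:
            # Start a new cluster with this base character.
            clusters.append(ch)
    return "".join(reversed(clusters))


s2 = unicodedata.normalize("NFD", "x\u00e4ex")

# Naive code-point reversal moves the diaeresis onto the 'e'.
print(ascii(s2[::-1]))                   # 'xe\u0308ax'

# Cluster-aware reversal keeps the diaeresis on the 'a'.
print(ascii(reverse_keeping_marks(s2)))  # 'xea\u0308x'
```

Normalizing the result back to NFC gives the same string as reversing the
precomposed `s1` directly.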