Matthew Barnett added the comment: The re module works with codepoints, it doesn't understand canonical equivalence.
For example, it doesn't recognise that "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}" is equivalent to "\N{LATIN CAPITAL LETTER E WITH ACUTE}". This is true for Python in general, except for identifiers, which are normalised: >>> "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}" 'É' >>> É = 0 >>> "\N{LATIN CAPITAL LETTER E WITH ACUTE}" 'É' >>> É 0 This also means that, say '.' will match only 1 _codepoint_. ---------- nosy: +mrabarnett _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue31193> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com