On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> This is where my knowledge about Unicode gets fuzzy. Isn't it the case that
> some grapheme clusters (or whatever the right word is) can't be normalized
> down to a single code point? Characters can accept many accents, for
> example.

You can't normalize everything down to a single code point, but you can
normalize the other way by breaking out everything that can be broken out.

>>> import unicodedata
>>> print(ascii(unicodedata.normalize("NFKC", "ä")))
'\xe4'
>>> print(ascii(unicodedata.normalize("NFKD", "ä")))
'a\u0308'

ChrisA
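
To make both directions concrete, here is a minimal sketch, assuming Python 3 and the standard unicodedata module. It uses the canonical forms NFC/NFD rather than the compatibility forms NFKC/NFKD from the session above; for these particular strings the results are the same. The helper name code_points and the sample string "q" + COMBINING TILDE are made up for illustration; the latter was picked as a cluster that, as far as I know, has no precomposed code point.

```python
import unicodedata

def code_points(s):
    """Return the code points of s as a list of 'U+XXXX' strings (illustrative helper)."""
    return ["U+%04X" % ord(ch) for ch in s]

# "a" + COMBINING DIAERESIS has a precomposed equivalent (U+00E4),
# so composing yields one code point and decomposing yields two.
a_umlaut = "a\u0308"
composed = unicodedata.normalize("NFC", a_umlaut)
decomposed = unicodedata.normalize("NFD", a_umlaut)

# "q" + COMBINING TILDE: assumed to have no precomposed form,
# so composing should leave it as two code points.
q_tilde = "q\u0303"
q_composed = unicodedata.normalize("NFC", q_tilde)

print(code_points(composed))
print(code_points(decomposed))
print(code_points(q_composed))
```

So decomposition is the direction that works uniformly: anything that has a decomposition gets broken out, while composition only collapses the sequences for which a precomposed code point happens to exist.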