On 12/02/2013 01:23 PM, Chris Angelico wrote:
> On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder <n...@nedbatchelder.com> wrote:
>> This is where my knowledge about Unicode gets fuzzy. Isn't it the case that
>> some grapheme clusters (or whatever the right word is) can't be normalized
>> down to a single code point? Characters can accept many accents, for
>> example.
>
> You can't normalize everything down to a single code point, but you
> can normalize the other way by breaking out everything that can be
> broken out.
>
> >>> import unicodedata
> >>> print(ascii(unicodedata.normalize("NFKC", "ä")))
> '\xe4'
> >>> print(ascii(unicodedata.normalize("NFKD", "ä")))
> 'a\u0308'

Well, Stephen was right then! There's room for a library to handle this
situation. Or is there one already?
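
To make the problem concrete, here is a rough sketch of what such a library
would have to do, using only the standard library's unicodedata module. The
helper name naive_clusters is made up for illustration, and the grouping rule
(attach every combining mark to the preceding base character) is an
assumption that only approximates real grapheme cluster segmentation; it
ignores cases such as Hangul jamo and emoji sequences.

```python
import unicodedata


def naive_clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: an approximation only, not the full Unicode
    text segmentation rules.
    """
    clusters = []
    # Compose first, so anything that has a precomposed form becomes one code point.
    for ch in unicodedata.normalize("NFC", text):
        # combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "a" + COMBINING DIAERESIS has a precomposed form (U+00E4).
print(ascii(unicodedata.normalize("NFC", "a\u0308")))
# "q" + COMBINING DIAERESIS has none, so NFC leaves two code points.
print(ascii(unicodedata.normalize("NFC", "q\u0308")))
# Both still count as one user-visible character each.
print([ascii(c) for c in naive_clusters("a\u0308q\u0308")])
```

The point is that len() on the normalized string still reports three code
points for what a reader sees as two characters, so normalization alone
doesn't settle the question.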
--
~Ethan~