Re: Python Unicode handling wins again -- mostly

Chris Angelico Mon, 02 Dec 2013 13:25:03 -0800

On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder <[email protected]> wrote:
> This is where my knowledge about Unicode gets fuzzy.  Isn't it the case that
> some grapheme clusters (or whatever the right word is) can't be normalized
> down to a single code point?  Characters can accept many accents, for
> example.


You can't normalize everything down to a single code point, but you
can normalize the other way, decomposing everything that can be
decomposed (NFD, or NFKD if you also want compatibility characters
folded):

>>> import unicodedata
>>> print(ascii(unicodedata.normalize("NFKC", "ä")))
'\xe4'
>>> print(ascii(unicodedata.normalize("NFKD", "ä")))
'a\u0308'
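
To make the "can't always get down to one code point" half concrete,
here is a rough sketch rather than anything definitive. The helper
names (code_points, compose, decompose) are made up for illustration;
only unicodedata.normalize comes from the standard library. It relies
on "q" plus a combining diaeresis having no precomposed form, so NFC
has to leave it as two code points.

```python
import unicodedata


def code_points(text):
    """Hypothetical helper: list the code points of text as 'U+XXXX'."""
    return ["U+%04X" % ord(ch) for ch in text]


def compose(text):
    """Hypothetical helper: canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)


def decompose(text):
    """Hypothetical helper: canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", text)


a_diaeresis = "a\u0308"  # has a precomposed form, U+00E4
q_diaeresis = "q\u0308"  # assumed to have no precomposed form

print(code_points(compose(a_diaeresis)))  # ['U+00E4']
print(code_points(compose(q_diaeresis)))  # ['U+0071', 'U+0308']
print(code_points(decompose("\xe4")))     # ['U+0061', 'U+0308']
```

So composing is best effort, while decomposing gives you a consistent
form for both strings.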

ChrisA
