Re: Python Unicode handling wins again -- mostly

Ethan Furman Mon, 02 Dec 2013 13:49:21 -0800

On 12/02/2013 01:23 PM, Chris Angelico wrote:

On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder <[email protected]> wrote:

This is where my knowledge about Unicode gets fuzzy.  Isn't it the case that
some grapheme clusters (or whatever the right word is) can't be normalized
down to a single code point?  Characters can accept many accents, for
example.
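Here is a quick illustration of the point raised above. It is a sketch that uses only the standard `unicodedata` module, and the variable names are illustrative. An 'a' carrying both a diaeresis and an acute accent is one user-perceived character. NFC can only compose part of it, because Unicode has no single precomposed code point for the whole cluster.

```python
import unicodedata

# 'a' + COMBINING DIAERESIS + COMBINING ACUTE ACCENT:
# one user-perceived character spelled with three code points.
s = "a\u0308\u0301"

nfc = unicodedata.normalize("NFC", s)
nfd = unicodedata.normalize("NFD", s)

# NFC composes what it can ('a' + diaeresis -> U+00E4), but there is no
# precomposed "a with diaeresis and acute", so two code points remain.
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+00E4', 'U+0301']
print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0061', 'U+0308', 'U+0301']
```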


You can't normalize everything down to a single code point, but you
can normalize the other way by breaking out everything that can be
broken out.

>>> import unicodedata
>>> print(ascii(unicodedata.normalize("NFKC", "ä")))
'\xe4'
>>> print(ascii(unicodedata.normalize("NFKD", "ä")))
'a\u0308'
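Normalizing both sides to the same form is what makes composed and decomposed spellings compare equal. The sketch below also shows how the canonical form (NFD) differs from the compatibility form (NFKD) used above. The K forms additionally fold compatibility characters such as the 'ﬁ' ligature, which may or may not be what you want. `same_text` is a hypothetical helper, not a standard-library function.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFD") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "\u00e4"      # 'ä' as a single code point
decomposed = "a\u0308"   # 'a' + COMBINING DIAERESIS
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

print(composed == decomposed)                  # False: different code points
print(same_text(composed, decomposed))         # True: canonically equivalent
print(same_text(ligature, "fi"))               # False: NFD keeps the ligature
print(same_text(ligature, "fi", form="NFKD"))  # True: NFKD folds it to 'fi'
```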


Well, Stephen was right then!  There's room for a library to handle this 
situation.  Or is there one already?
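Here is a rough sketch of what such a library might do: split text into user-perceived characters. It assumes that a cluster is one base code point followed by any combining marks (Unicode general category Mn, Mc or Me). This is a deliberate simplification. The full rules for extended grapheme clusters are in Unicode Standard Annex #29, and they cover cases this sketch ignores, such as Hangul syllables spelled with jamo and emoji joined with ZERO WIDTH JOINER. `naive_graphemes` is a hypothetical name.

```python
import unicodedata

def naive_graphemes(text: str) -> list[str]:
    """Approximate user-perceived characters (NOT full UAX #29).

    Decomposes with NFD, then attaches every combining mark
    (general category Mn, Mc or Me) to the preceding cluster.
    Returned clusters are in NFD form.
    """
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # mark belongs to the previous base character
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

word = "a\u0308\u0301bc"  # 'a' with diaeresis and acute, then 'b', 'c'
print(len(word))                   # 5 code points
print(len(naive_graphemes(word)))  # 3 user-perceived characters
# Reversing by cluster keeps both accents on the 'a':
print(ascii("".join(reversed(naive_graphemes(word)))))  # 'cba\u0308\u0301'
```

Reversing `word` code point by code point would instead move the accents onto the wrong base character. That is the kind of mistake a grapheme-aware library would prevent.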

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list
