Re: [Python-Dev] UCS2/UCS4 default

Jeroen Ruigrok van der Werven Wed, 02 Jul 2008 11:22:21 -0700

-On [20080702 19:42], Guido van Rossum ([EMAIL PROTECTED]) wrote:
>Yes. At least in the sense that \Uxxxxxxxx gets translated to a
>surrogate pair, and that the UTF-8 codec supports surrogate pairs in
>both directions. It's been like this for a long time. What else would
>you expect from UTF-16 support?


Well, unless I misunderstand things, a Python 3 compiled with the default
Unicode option gives this:

>>> len("\N{MUSICAL SYMBOL G CLEF}")
2

Whereas a Python 3 built with --with-wide-unicode gives:

>>> len("\N{MUSICAL SYMBOL G CLEF}")
1

This, of course, causes problems with slicing, splitting, finding, and so on,
since a single character outside the BMP then counts as two. So that means
that a Python 3 built with only 2-byte Unicode support is not to be
used/recommended for Unicode outside of the BMP.
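
To make the length difference concrete, here is a rough sketch (not anything
from the interpreter source) that counts UTF-16 code units and computes the
surrogate pair by hand. The helper names utf16_code_units and surrogate_pair
are made up for illustration, and the sketch assumes an interpreter where
len() counts code points, so it does not depend on how Python was built.

```python
def utf16_code_units(text):
    """Count the 16-bit code units needed to store text as UTF-16."""
    # utf-16-le writes no BOM, so every 2 bytes is exactly one code unit.
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(ch):
    """Return the (high, low) surrogates for one code point outside the BMP."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("code point is inside the BMP, no surrogates needed")
    offset = cp - 0x10000
    # Top 10 bits go into the high surrogate, bottom 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


clef = "\N{MUSICAL SYMBOL G CLEF}"  # U+1D11E, outside the BMP

print(utf16_code_units(clef))                 # code units a 2-byte build stores
print([hex(u) for u in surrogate_pair(clef)]) # the two halves of the pair
```

The code unit count is what a 2-byte build reports as the length, which is
why indexing or slicing there can land between the two halves of a pair.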

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Tomorrow's battle is won during today's practice...
