-On [20080702 19:42], Guido van Rossum ([EMAIL PROTECTED]) wrote: >Yes. At least in the sense that \Uxxxxxxxx gets translated to a >surrogate pair, and that the UTF-8 codec supports surrogate pairs in >both directions. It's been like this for a long time. What else would >you expect from UTF-16 support?
Well, unless I misunderstand things, a Python 3 compiled with the default Unicode option gives this: >>> len("\N{MUSICAL SYMBOL G CLEF}") 2 Whereas a Python 3 with --with-wide-unicode gives: >>> len("\N{MUSICAL SYMBOL G CLEF}") 1 This, of course, causes problems with splitting, finding, and so on. So that means that a Python 3 with only 2 byte Unicode support is not to be used/recommended for Unicode outside of the BMP. -- Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Tomorrow's battle is won during today's practice... _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com