Hi And.

In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] writes:

and-google> Akihiro KAYAMA wrote:
and-google> > As the character set is wider than UTF-16(U+10FFFF), I can't use
and-google> > Python's native unicode string class.
and-google>
and-google> Have you tried using Python compiled in Wide Unicode mode
and-google> (--enable-unicode=ucs4)? You get native UTF-32/UCS-4 strings then,
and-google> which should be enough for most purposes.

From my quick survey, Python's Unicode support is intentionally restricted
to the UTF-16 range (U+0000...U+10FFFF), regardless of the
--enable-unicode=ucs4 option.

> Python 2.4.1 (#2, Sep 3 2005, 22:35:47)
> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"\U0010FFFF"
> u'\U0010ffff'
> >>> len(u"\U0010FFFF")
> 1
> >>> u"\U00110000"
> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9:
> illegal Unicode character

A simple patch to unicodeobject.c that disables the Unicode range check
could solve this, but I don't want to maintain a specialized Python
binary for my project.

-- kayama