Re: unicode data - accessing codepoints > FFFF on narrow python builds

Gabriel Genellina Wed, 18 Apr 2007 12:33:28 -0700

On Wed, 18 Apr 2007 06:37:56 -0300, <[EMAIL PROTECTED]> wrote:

> Hi all,
> I'd like to ask about the use of Unicode data on a narrow Python build.
> Unicode string literals with \N{name} work even without an (explicit) import
> of unicodedata, and they also correctly handle the "wider" Unicode
> planes, above FFFF:
>
>>>>  u"\N{LATIN SMALL LETTER E}"
> u'e'
>>>>  u"\N{GOTHIC LETTER AHSA}"
> u'\U00010330'
>
> The unicodedata functions work analogously in the basic plane, but
> behave differently otherwise:
>
>>>>  unicodedata.lookup("LATIN SMALL LETTER E")
> u'e'
>>>> unicodedata.lookup("GOTHIC LETTER AHSA")
> u'\u0330'
>
> (0001 gets trimmed)
>
> Is it a bug in unicodedata, or is this the expected behaviour on a  
> narrow build?


Looks like a bug, but I'm not sure whether in unicodedata or in general  
Unicode support:

py> x=u"\N{GOTHIC LETTER AHSA}"
py> ord(x)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
py> unicodedata.name(x)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: need a single Unicode character as parameter
py> len(x)
2
py> list(x)
[u'\ud800', u'\udf30']

That looks like a UTF-16 surrogate pair (?), but seen as two characters
instead of one.
Probably on a narrow build Python should refuse to use such a character (and
limit Unicode support to the basic plane?) (or not?) (if not, what's the
point of sys.maxunicode?) (enough parentheses for now).
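
For reference, here is a minimal sketch of the UTF-16 arithmetic that would
produce the two values seen in list(x) above. The helper names are made up
for illustration, and it is written for a current Python 3 rather than the
narrow 2.x build discussed here. The last line is only a guess at why
lookup() returned u'\u0330' (the code point being cut down to 16 bits); I
have not checked that against the unicodedata source.

```python
# Sketch of UTF-16 surrogate-pair arithmetic.
# to_surrogates / from_surrogates are hypothetical helper names.

def to_surrogates(code_point):
    """Split a code point above 0xFFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


AHSA = 0x10330  # GOTHIC LETTER AHSA, as shown by the \N{...} literal

high, low = to_surrogates(AHSA)
print(hex(high), hex(low))               # 0xd800 0xdf30
print(hex(from_surrogates(high, low)))   # 0x10330

# Assumption: keeping only the low 16 bits reproduces the "trimmed" value.
print(hex(AHSA & 0xFFFF))                # 0x330
```

So the pair in list(x) is consistent with a correct UTF-16 encoding of
U+10330, while the lookup() result matches what plain truncation would give.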

Anyway a better place for bug reports is  
http://sourceforge.net/tracker/?group_id=5470

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list
