On Wed, 18 Apr 2007 06:37:56 -0300, <[EMAIL PROTECTED]> wrote:

> Hi all,
> I'd like to ask about the usage of Unicode data on a narrow Python build.
> Unicode string literals \N{name} work even without the (explicit) import
> of unicodedata, and they also correctly handle the "wider" Unicode
> planes, above U+FFFF:
>
>>>> u"\N{LATIN SMALL LETTER E}"
> u'e'
>>>> u"\N{GOTHIC LETTER AHSA}"
> u'\U00010330'
>
> The unicodedata functions work analogously in the basic plane, but
> behave differently otherwise:
>
>>>> unicodedata.lookup("LATIN SMALL LETTER E")
> u'e'
>>>> unicodedata.lookup("GOTHIC LETTER AHSA")
> u'\u0330'
>
> (0001 gets trimmed)
>
> Is it a bug in unicodedata, or is this the expected behaviour on a
> narrow build?
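The numbers in the quoted session can be checked by hand. A minimal sketch, assuming a modern Python 3 interpreter (where a str holds full code points, so it does not reproduce the narrow-build behaviour itself): it splits U+10330 into the UTF-16 surrogate pair a narrow build stores, recombines it, and shows that keeping only the low 16 bits drops the plane number ("0001") and leaves U+0330. The helpers `to_surrogates` and `from_surrogates` are hypothetical names for illustration, not part of `unicodedata`.

```python
import unicodedata

def to_surrogates(cp):
    """Hypothetical helper: split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)     # top 10 bits of the 20-bit offset
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits of the 20-bit offset
    return high, low

def from_surrogates(high, low):
    """Hypothetical helper: recombine a UTF-16 surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# In Python 3 this returns a single character, so ord() works on it.
cp = ord(unicodedata.lookup("GOTHIC LETTER AHSA"))
print(hex(cp))                              # 0x10330

high, low = to_surrogates(cp)
print(hex(high), hex(low))                  # 0xd800 0xdf30
print(hex(from_surrogates(high, low)))      # 0x10330

# Cross-check the pair against the built-in UTF-16 (big-endian) codec.
encoded = chr(cp).encode("utf-16-be")
assert (int.from_bytes(encoded[:2], "big"),
        int.from_bytes(encoded[2:], "big")) == (high, low)

# Keeping only the low 16 bits discards the plane ("0001"),
# which is consistent with the u'\u0330' returned on the narrow build.
truncated = cp & 0xFFFF
print(hex(truncated))                       # 0x330
print(unicodedata.name(chr(truncated)))     # COMBINING TILDE BELOW
```

The truncated value is a different, unrelated character (a combining mark), which is why the trimmed result is easy to miss: it is still a valid code point rather than an error.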
Looks like a bug, but I'm not sure whether it is in unicodedata or in the general Unicode support:

py> x=u"\N{GOTHIC LETTER AHSA}"
py> ord(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
py> unicodedata.name(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: need a single Unicode character as parameter
py> len(x)
2
py> list(x)
[u'\ud800', u'\udf30']

That looks like UTF-16 (a surrogate pair), but seen as two characters instead of one.
Probably on a narrow build Python should refuse to use such characters (and limit Unicode support to the basic plane?) (or not?) (if not, what's the point of sys.maxunicode?) (enough parentheses for now).

Anyway, a better place for bug reports is http://sourceforge.net/tracker/?group_id=5470

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list