Hi, thanks for the answer,
> From: Gabriel Genellina <[EMAIL PROTECTED]>
> Subj: Re: unicode data - accessing codepoints > FFFF on narrow python builds
> Date: 18.4.2007 21:33:11
> ----------------------------------------
>
> py> x=u"\N{GOTHIC LETTER AHSA}"
> py> ord(x)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: ord() expected a character, but string of length 2 found
> py> unicodedata.name(x)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: need a single Unicode character as parameter
> py> len(x)
> 2
> py> list(x)
> [u'\ud800', u'\udf30']
>
> That looks like UTF-16 (?) but seen as two characters instead of one.
> Probably in a 32bits build Python should refuse to use such character (and
> limit Unicode support to the basic plane?) (or not?) (if not, what's the
> point of sys.maxunicode?) (enough parenthesis for now).
>
> --
> Gabriel Genellina
>
Yes, this is a UTF-16 surrogate pair, which is, as far as I know, the usual way
characters outside the basic plane are handled on narrow Python builds.
There are some problems with it, but most of the things I need to do with
non-basic-plane characters work this way (GUI display, saving text as UTF-8),
so I wouldn't be happy if this support were removed.
The problem is access to unicodedata, which requires "a string of length
1"; I thought it might also accept the codepoint number, but that doesn't seem
to be possible.
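
For reference, the codepoint number itself can be recovered from the two
surrogates with the standard UTF-16 arithmetic. The following is only a
minimal sketch; the names surrogate_pair_to_codepoint and codepoint_of are
made up for illustration and are not part of Python or unicodedata.

```python
def surrogate_pair_to_codepoint(high, low):
    """Combine two UTF-16 surrogate code units (ints) into one codepoint."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the pair encodes (codepoint - 0x10000).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def codepoint_of(s):
    """Hypothetical helper: like ord(), but also accepts a surrogate pair."""
    if len(s) == 2:
        return surrogate_pair_to_codepoint(ord(s[0]), ord(s[1]))
    return ord(s)


# The two code units from the list(x) output quoted above.
x = u"\ud800\udf30"
print(hex(codepoint_of(x)))  # 0x10330, i.e. GOTHIC LETTER AHSA
```

This only yields the number. On a narrow build it still cannot be handed to
unicodedata, since unichr() refuses values above 0xFFFF there and the
unicodedata functions expect a single character.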
Thanks again.
vbr - Vlastimil Brom