"Alexandre Vassalotti" <[EMAIL PROTECTED]> wrote: > Thanks for explanation. Anyway, it certainly much simpler to deal with > surrogate pairs than with variable-width characters.

I don't know, I really liked my tree overlay that could handle
variable-width characters of any internal encoding (utf-7, utf-8, utf-16).
Of course it takes an extra O(n/logn) space and O(logn) time to access
arbitrary characters in the worst case, but such is the case with
time/space tradeoffs.

 - Josiah

> On 6/1/07, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> >
> > "Alexandre Vassalotti" <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I was doing some testing on the new _string_io module, since I was
> > > slightly skeptical on my handling of wide Unicode characters (32-bit
> > > of length, instead of the usual 16-bit in UTF-16). So, I ran this
> > > little test:
> > >
> > > >>> s = _string_io.StringIO()
> > > >>> s.write(u'晉')
> > > >>> s.tell()
> > > 2
> > >
> > > Like I expected, wide Unicode characters count for two. However, I was
> > > surprised that Python treats them as two characters as well:
> > >
> > > >>> len(u'晉')
> > > 2
> > > >>> u'晉'
> > > u'\ud87e\udccd'
> > >
> > > Is it a bug, or only an implementation choice?
> >
> > If your Python is compiled as a UTF-16 build, then any character in the
> > extended plane will be seen as two characters by Python. If you are
> > using a UCS-4 build (it's the same as UTF-32), then you should be seeing
> > the single wide character as a single wide character. The only
> > exception to this rule is if you enter the wide character as a surrogate
> > pair, in which case Python doesn't normalize it into the single wide
> > character. To get a real wide character, you would need to use a proper
> > escape, or decode from an encoded string.
> >
> >
> >  - Josiah
> >
> >
>
>
> --
> Alexandre Vassalotti
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
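
The u'\ud87e\udccd' result in the quoted session is plain UTF-16 surrogate arithmetic, and it can be sketched in a few lines. This is only an illustration, written for current Python 3 (where a str is indexed by code point, so the narrow-build behaviour discussed in the thread no longer shows up in len()). The function names combine_surrogates, split_code_point and utf16_length are hypothetical helpers, not part of _string_io or any module mentioned in the thread.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_code_point(cp):
    """Split a code point above U+FFFF into its surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf16_length(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


cp = combine_surrogates(0xD87E, 0xDCCD)
print(hex(cp))                    # 0x2f8cd
print(split_code_point(cp) == (0xD87E, 0xDCCD))
print(utf16_length(chr(cp)))      # 2 code units, as tell() and len() reported
```

On a narrow build, len() effectively reported what utf16_length() computes here, which is why the single character counted as two.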

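
The space/time tradeoff Josiah mentions, O(n/logn) extra space for O(logn) access, can be sketched with a much simpler structure than a tree. The thread does not show his tree overlay, so the following is not his implementation. It is a flat sampled offset table, under these assumptions: the data is valid UTF-8 only, it is immutable, and the class name SampledUtf8Index is made up for illustration. It records the byte offset of every k-th character, with k about log2(n), then scans forward at most k - 1 characters per lookup.

```python
import math


class SampledUtf8Index:
    """Character-indexed access into UTF-8 bytes (hypothetical sketch).

    Stores one byte offset per `step` characters, so the table holds
    roughly n / log2(n) entries for n characters.
    """

    def __init__(self, data):
        self.data = data
        # A character starts at every byte that is not a continuation
        # byte (continuation bytes look like 10xxxxxx).
        self.length = sum(1 for b in data if b & 0xC0 != 0x80)
        self.step = max(1, int(math.log2(self.length))) if self.length > 1 else 1
        self.offsets = []
        char_index = 0
        for pos, b in enumerate(data):
            if b & 0xC0 != 0x80:
                if char_index % self.step == 0:
                    self.offsets.append(pos)
                char_index += 1

    @staticmethod
    def _width(lead):
        """Byte width of a UTF-8 sequence, judged from its lead byte."""
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        return 4

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("character index out of range")
        # Jump to the nearest sampled offset, then walk the remainder.
        pos = self.offsets[index // self.step]
        for _ in range(index % self.step):
            pos += self._width(self.data[pos])
        end = pos + self._width(self.data[pos])
        return self.data[pos:end].decode("utf-8")


text = "a\u00e9\u20ac\U0002f8cdz" * 5   # 1-, 2-, 3- and 4-byte characters
index = SampledUtf8Index(text.encode("utf-8"))
print(len(index), index[3] == "\U0002f8cd")
```

A tree would additionally allow cheap inserts and deletes. The flat table only covers the read-only case, which is enough to show where the stated bounds come from.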