Nicholas Bastin wrote:
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
>
>>>Nicholas Bastin wrote:
>>>
>>>>"This type represents the storage type which is used by Python
>>>>internally as the basis for holding Unicode ordinals. Extension module
>>>>developers should make no assumptions about the size of this type on
>>>>any given platform."
>>>
>>>But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>>>So the documentation should explicitly say "it depends".
>>
>>On a related note, it would be help if the documentation provided a
>>little more background on unicode encoding. Specifically, that UCS-2 is
>>not the same as UTF-16, even though they're both two bytes wide and most
>>of the characters are the same. UTF-16 can encode 4 byte characters,
>>while UCS-2 can't. A Py_UNICODE is either UCS-2 or UCS-4. It took me
>
> I'm not sure the Python documentation is the place to teach someone
> about unicode. The ISO 10646 pretty clearly defines UCS-2 as only
> containing characters in the BMP (plane zero). On the other hand, I
> don't know why python lets you choose UCS-2 anyhow, since it's almost
> always not what you want.

You've got that wrong: Python lets you choose UCS-4 - UCS-2 is the
default.

Note that Python's Unicode codecs UTF-8 and UTF-16 are surrogate aware
and thus support non-BMP code points regardless of the build type: a
UCS2 build of Python will store a non-BMP code point as a UTF-16
surrogate pair in the Py_UNICODE buffer, while a UCS4 build will store
it as a single value. Decoding is surrogate aware too, so a UTF-16
surrogate pair in a UCS2 build will get treated as a single Unicode
code point.

Ideally, the Python programmer should not really need to know all
this, and I think we've achieved that up to a certain point (Unicode
can be complicated - there's nothing to hide there).

However, the C programmer using the Python C API to interface to some
other Unicode implementation will need to know these details.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 06 2005)
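
For reference, the surrogate-pair arithmetic behind the UCS2-build
behaviour described above can be sketched in a few lines. This is only
an illustration of the UTF-16 rule, not the code Python uses
internally; the function names to_surrogate_pair and
from_surrogate_pair are made up for this example and are not part of
Python or its C API.

```python
# Illustrative sketch only: hypothetical helper names, plain integer
# arithmetic, independent of whether the interpreter is a UCS2 or UCS4 build.

def to_surrogate_pair(code_point):
    """Split a non-BMP code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a UTF-16 surrogate pair back into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
high, low = to_surrogate_pair(0x1D11E)
print("%04X %04X" % (high, low))
print("%X" % from_surrogate_pair(high, low))
```

The two values from to_surrogate_pair correspond to the two Py_UNICODE
slots a UCS2 build would use for that character, while a UCS4 build
would hold the single value returned by from_surrogate_pair.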