On 2007-03-23 19:18, Jason Orendorff wrote:
> Scheme is adding Unicode support in an upcoming standard:
> (DRAFT) http://www.r6rs.org/document/lib-html/r6rs-lib-Z-H-3.html
>
> I have two questions for the python-dev team about Python's Unicode
> experiences. If it's convenient, please take a moment to reply.
> Thanks in advance.
>
> 1. In hindsight, what do you think about PEP 261, the Py_UNICODE_WIDE
> build option? On balance, has this been good, bad, or indifferent?
> What's good/bad about it?
Having narrow and wide builds introduces a level of complexity that seems unnecessary. Few people ever use non-BMP code points, and the ones who do can easily get away with UTF-16 surrogates.

Most Unixes have chosen UCS4 as their storage format, so you have little choice if you want to take advantage of mapping directly to wchar_t on Unix. Windows has chosen UTF-16 as its internal storage format, and wchar_t is 16-bit on that platform.

You may also want to consider looking at PEP 263:

    http://www.python.org/dev/peps/pep-0263

Source code encoding is a great thing! You can now write native Unicode in Python source code. The only downside is the extra complexity that comes from the Py2 tokenizer working on 8-bit characters. Because of this, we had to decode the source code to Unicode, encode it to UTF-8, pass it to the tokenizer, and then decode the UTF-8 string literals for Unicode back into Unicode again. Ideally, the tokenizer in Py3k should be rewritten to work directly on Unicode.

> 2. The idea of multiple string representations has come up (that is,
> where all strings are Unicode, but in memory some are 8-bit, some
> 16-bit, and some 32-bit--each string uses the narrowest possible
> representation). This has been discussed here for Python 3000. My
> question is: Is this for real? How far along is it? How likely is
> it?

My suggestion for Scheme is not to go down that route. It adds complexity for little added value and also makes the implementation slower (due to the frequent conversion from one internal format to another).

Can't comment on Py3k - I'm out of that loop.
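To make the two storage questions above a bit more concrete, here is a minimal sketch, assuming a current Python 3 interpreter. The helper names utf16_code_units and narrowest_width are hypothetical and only for illustration; they are not CPython APIs and say nothing about how any interpreter stores strings internally. The first shows the surrogate pair a UTF-16 (narrow-style) representation needs for a non-BMP code point, the second shows the per-string width a "narrowest possible representation" scheme would have to pick.

```python
# Illustrative sketch only: both helpers are hypothetical, not CPython APIs.

def utf16_code_units(text):
    """Return the 16-bit code units that encode `text` in UTF-16."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def narrowest_width(text):
    """Bytes per code point if the whole string uses one fixed width."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1   # fits an 8-bit representation
    if highest < 0x10000:
        return 2   # fits a 16-bit representation (BMP only)
    return 4       # needs a 32-bit representation


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a non-BMP code point

# One code point becomes two UTF-16 code units (a surrogate pair).
print([hex(unit) for unit in utf16_code_units(clef)])

# A single non-BMP character forces the widest representation for the
# whole string under a narrowest-fixed-width scheme.
print(narrowest_width("abc"), narrowest_width("\u20ac"), narrowest_width("abc" + clef))
```

Note how appending one non-BMP character to an 8-bit string changes the width of the entire string; that kind of re-encoding is the conversion cost referred to above.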
If you want to know more about how Unicode was added to Python 2.x and how it can be used, I suggest you read the following:

Unicode integration (one of the first PEPs ever written :-):

    http://www.python.org/dev/peps/pep-0100

Unicode in Python:

    http://www.egenix.com/files/python/EuroPython2002-Python-and-Unicode.pdf

Designing Unicode-aware Applications in Python:

    http://www.egenix.com/files/python/EPC2006-Developing-Unicode-aware-applications-in-Python.pdf

Hope that helps,
--
Marc-Andre Lemburg
eGenix.com