On Sat, Feb 21, 2009 at 9:45 PM, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
>>> Indeed. As Python *can* encode all characters even in 2-byte mode
>>> (since PEP 261), it seems clear that Python's Unicode representation
>>> is *not* strictly UCS-2 anymore.
>>
>> Since we're already discussing this, I'm curious - why was UCS-2
>> chosen over plain UTF-16 or UTF-8 in the first place for Python's
>> internal storage?
>
> You mean, originally? Originally, the choice was only between UCS-2
> and UCS-4; choice was in favor of UCS-2 because of size concerns.
> UTF-8 was ruled out easily because it doesn't allow constant-size
> indexing; UTF-16 essentially for the same reason (plus there was
> no point to UTF-16, since there were no assigned characters outside
> the BMP).

Yes, I failed to realise how long ago the unicode data type was
implemented originally. :-) Thanks for the explanation.

-- 
Denis Kasak
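
For anyone reading this in the archive, here is a rough sketch of the
"constant-size indexing" point from the quoted explanation. It counts how
many code units each encoding needs for three sample characters, and
splits a non-BMP code point into a UTF-16 surrogate pair. It is written
in Python 3 syntax (an assumption on my part; the thread predates it
being common), and the helper names code_units and surrogate_pair are
made up for illustration, not anything from the interpreter.

```python
# Three sample characters: ASCII, a non-ASCII BMP character,
# and one outside the BMP (U+1D11E MUSICAL SYMBOL G CLEF).
samples = ["a", "\u00f6", "\U0001D11E"]


def code_units(ch, encoding, unit_size):
    """Return how many code units of unit_size bytes encode ch.

    Hypothetical helper for illustration only.
    """
    return len(ch.encode(encoding)) // unit_size


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for ch in samples:
    print(
        "U+%04X" % ord(ch),
        "utf-8 units:", code_units(ch, "utf-8", 1),
        "utf-16 units:", code_units(ch, "utf-16-le", 2),
        "utf-32 units:", code_units(ch, "utf-32-le", 4),
    )

high, low = surrogate_pair(0x1D11E)
print("surrogates: %04X %04X" % (high, low))
```

Only the 4-byte encoding uses one unit per character for all three
samples, so only there does the n-th character sit at a fixed offset.
UTF-8 varies from the second sample on, and UTF-16 varies as soon as a
character outside the BMP shows up.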