>> Indeed. As Python *can* encode all characters even in 2-byte mode
>> (since PEP 261), it seems clear that Python's Unicode representation
>> is *not* strictly UCS-2 anymore.
>
> Since we're already discussing this, I'm curious - why was UCS-2
> chosen over plain UTF-16 or UTF-8 in the first place for Python's
> internal storage?

You mean, originally? Originally, the choice was only between UCS-2
and UCS-4; UCS-2 won because of size concerns. UTF-8 was ruled out
easily because it doesn't allow constant-size indexing; UTF-16 was
ruled out for essentially the same reason (plus there was no point to
UTF-16, since there were no assigned characters outside the BMP at
the time).

Regards,
Martin
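
To illustrate the constant-size indexing argument, here is a minimal
sketch in Python 3. It is not how CPython is implemented; the helper
names (char_at_fixed_width, char_at_utf8, utf8_len) are hypothetical
and exist only for this example. It assumes valid input and, for the
2-byte case, BMP-only text, so that UTF-16-LE bytes coincide with UCS-2.

```python
def char_at_fixed_width(buf: bytes, index: int, width: int = 2) -> str:
    """Fixed-width storage (e.g. UCS-2): one multiplication finds the character."""
    start = index * width
    return chr(int.from_bytes(buf[start:start + width], "little"))


def utf8_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4  # assumes the input is valid UTF-8


def char_at_utf8(buf: bytes, index: int) -> str:
    """Variable-width storage: every preceding character must be skipped first."""
    pos = 0
    for _ in range(index):
        pos += utf8_len(buf[pos])
    return buf[pos:pos + utf8_len(buf[pos])].decode("utf-8")


text = "a\u00e9\u20acz"              # 1-, 2-, 3- and 1-byte characters in UTF-8
ucs2 = text.encode("utf-16-le")      # BMP-only, so identical to UCS-2
utf8 = text.encode("utf-8")

print(char_at_fixed_width(ucs2, 2))  # constant-time lookup
print(char_at_utf8(utf8, 2))         # linear scan over the first two characters
```

Both calls return the same character, but the UTF-8 lookup has to
walk over every earlier character, whereas the fixed-width lookup is
a single offset computation. UTF-16 with surrogate pairs has the same
problem as UTF-8 once characters outside the BMP appear.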