Raymond Hettinger writes:

 > Neither UTF-16 nor UCS-2 is exactly correct anyway.

From a standards lawyer point of view, UCS-2 is exactly correct, as far as I can tell upon rereading ISO 10646-1, especially Annexes H ("retransmitting devices") and Q ("UTF-16"). Annex Q makes it clear that UTF-16 was intentionally designed so that Python-style processing could be done in a UCS-2 context.

 > For the "wide" build, the entire range of unicode is encoded at
 > 4 bytes per character and slicing/len operate correctly since
 > every character is the same length. This used to be called UCS-4
 > and is now UTF-32.

That's inaccurate, I believe. UCS-4 is not a UTF, and doesn't satisfy the range restrictions of a UTF.

 > So, with "wide" builds there isn't much confusion (except perhaps
 > unfamiliar terminology). The real issue seems to be that for
 > "narrow" builds, none of the usual encoding names is exactly
 > correct.

I disagree. I do see a problem with "UCS-2", because it fails to tell us that Python implements a large number of features that make it easy to do a very good job of working with non-BMP data in 16-bit builds of Python, with no extra effort. Python is not perfect, and (rarely) some of the imperfections may be very distressing. But it's very good, and deserves to be advertised as such.

However, I don't see how "narrow" tells us more than "UCS-2" does. If "UCS-2" is equally (or more) informative, I prefer it because it is the technically precise, already well-defined, term.

 > From a users point-of-view, the actual encoding or encoding name
 > doesn't matter much. They just need to be able to predict the relevant
 > behaviors (memory consumption and len/slicing behavior).

"UCS-2" indicates those behaviors precisely and concisely. The problems are (a) the lack of familiarity of users with this term, if David is reasonably representative, and (b) the fact that it fails to advertise Python's UTF-16 capabilities.

"Narrow" suffers from both of those problems, and further from the fact that it has no independent standard definition.
Furthermore, "wide" has a very widespread, platform-dependent meaning derived from wchar_t. If we have to document what the terms we choose mean anyway, why not document the existing terms and reduce entropy, rather than invent new ones and increase entropy?
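
For anyone unfamiliar with the terms, here is a minimal sketch of the len behavior under discussion. It is not how Python implements anything. It assumes an interpreter where str is indexed by code point (CPython 3.3 or later, which is an assumption on my part and postdates the builds discussed here), and it merely emulates what a 16-bit build would count by looking at UTF-16 code units. The helper names (utf16_units, is_high_surrogate, is_low_surrogate) are made up for illustration.

```python
# Sketch only: emulate the 16-bit ("UCS-2") view of a string by counting
# UTF-16 code units. All helper names are hypothetical.

def utf16_units(text):
    """Return the 16-bit code units of the UTF-16 form of text."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

def is_high_surrogate(unit):
    # First half of a surrogate pair.
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit):
    # Second half of a surrogate pair.
    return 0xDC00 <= unit <= 0xDFFF

sample = "a\U00010400b"          # U+10400 lies outside the BMP
units = utf16_units(sample)

# What a 16-bit build would report for len(): one entry per code unit.
narrow_len = len(units)

# What a 4-byte-per-character build would report: one entry per code point.
wide_len = len(sample.encode("utf-32-le")) // 4

print(narrow_len, wide_len)
print([hex(u) for u in units])
```

For this sample the 16-bit view has four code units, because the non-BMP character becomes the surrogate pair 0xD801, 0xDC00, while the 4-byte view has three.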