On 2/12/2012 5:14 PM, Chris Angelico wrote:
> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy<tjre...@udel.edu>  wrote:
>> The situation before ascii is like where we ended up *before* unicode.
>> Unicode aims to replace all those byte encodings and character sets with
>> *one* byte encoding for *one* character set, which will be a great
>> simplification. It is the idea of ascii applied on a global rather than
>> local basis.

> Unicode doesn't deal with byte encodings; UTF-8 is an encoding,

The Unicode Standard specifies 3 UTF storage formats* and 8 UTF byte-oriented transmission formats. UTF-8 is the most common of all encodings for web pages. (And ascii pages are valid utf-8 also.) It is the only one of the 8 that most of us need to bother with much. Look here for the list
http://www.unicode.org/glossary/#U
and for details look in various places in
http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf

> but so are UTF-16, UTF-32.
> and as many more as you could hope for.

All the non-UTF 'as many more as you could hope for' encodings are not part of Unicode.
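
To make the point about the UTF formats concrete, here is a minimal sketch, assuming Python 3.6 or later. The sample string and the variable names are my own illustration; the codec names are the ones Python's standard codec registry accepts. It encodes one string with several UTF codecs and checks that pure-ascii text gives the same bytes under ascii and utf-8.

```python
# Arbitrary sample text: "naive" with a diaeresis, a space, and a euro sign
# (7 characters in total).
text = "na\u00efve \u20ac"

# Codec names Python accepts for the UTF-8, UTF-16 and UTF-32 variants.
codecs = ["utf-8", "utf-16", "utf-16-le", "utf-16-be",
          "utf-32", "utf-32-le", "utf-32-be"]

sizes = {}
for name in codecs:
    data = text.encode(name)
    assert data.decode(name) == text  # each UTF round-trips the text losslessly
    sizes[name] = len(data)
    print(f"{name:10} {len(data):3} bytes")

# Pure-ascii text encodes to identical bytes under ascii and utf-8.
ascii_is_utf8 = "plain ascii".encode("ascii") == "plain ascii".encode("utf-8")
print("ascii bytes are utf-8 bytes:", ascii_is_utf8)
```

The plain "utf-16" and "utf-32" codecs come out one code unit longer than their -le/-be counterparts because Python prepends a byte order mark when the byte order is not named explicitly.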

* The new internal unicode scheme for 3.3 is pretty much a mixture of the 3 storage formats (I am, of course, skipping some details), using the narrowest one that can hold every character of each string. The advantage is avoiding problems with each of the three. The disadvantage is greater internal complexity, but that should be hidden from users. They will not need to care about the internals. They will be able to forget about 'narrow' versus 'wide' builds and the possible requirement to code differently for each. There will only be one scheme that works the same on all platforms. Most apps should require less space and about the same time.
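
A rough way to see the per-string width from Python code, assuming CPython 3.3 or later: sys.getsizeof reports an implementation detail, so treat this as a sketch and not a guarantee. The helper name bytes_per_char is hypothetical.

```python
import sys

def bytes_per_char(ch, n=1000):
    # Compare two string lengths so the fixed object header cancels out,
    # leaving the marginal storage cost of one character.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

widths = {
    "ascii": bytes_per_char("a"),             # fits in 1 byte per character
    "bmp": bytes_per_char("\u0394"),          # Greek capital delta, needs 2 bytes
    "astral": bytes_per_char("\U0001F600"),   # outside the BMP, needs 4 bytes
}
print(widths)

# With the new scheme an astral character is one code point on every build,
# not a surrogate pair as on the old 'narrow' builds.
astral_len = len("\U0001F600")
print(astral_len)
```

On CPython 3.3 or later I would expect widths of 1, 2 and 4 and a length of 1; other implementations may store strings differently.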

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
