Re: A few questions about encoding

MRAB Thu, 20 Jun 2013 10:21:07 -0700

On 20/06/2013 17:37, Chris Angelico wrote:

On Fri, Jun 21, 2013 at 2:27 AM,  <[email protected]> wrote:

And all these coding schemes have something in common,
they all work with a unique set of code points, more
precisely a unique set of encoded code points (not
the set of implemented code points (byte)).


Just what the flexible string representation is not
doing, it artificially divides Unicode into subsets and tries
to handle each subset differently.



UTF-16 divides Unicode into two subsets: BMP characters (encoded using
one 16-bit unit) and astral characters (encoded using two 16-bit units
in the D800::/5 netblock, or equivalent thereof). Your beloved narrow
builds are guilty of exactly the same crime as the hated 3.3.
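
For anyone who wants to see the two UTF-16 subsets for themselves, here is a minimal sketch, assuming Python 3. The helper name utf16_units and the two sample characters are just picked for illustration; nothing here is specific to narrow builds or to 3.3.

```python
def utf16_units(ch):
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # utf-16-le writes no byte-order mark, so bytes // 2 == code units.
    return len(ch.encode("utf-16-le")) // 2

bmp_char = "\u20ac"         # EURO SIGN, inside the BMP
astral_char = "\U0001F600"  # GRINNING FACE, outside the BMP

print(utf16_units(bmp_char))     # one 16-bit unit
print(utf16_units(astral_char))  # two 16-bit units (a surrogate pair)

# Split the astral character into its two code units and check that
# they fall in the surrogate ranges D800..DBFF and DC00..DFFF.
raw = astral_char.encode("utf-16-be")
high = int.from_bytes(raw[:2], "big")
low = int.from_bytes(raw[2:], "big")
print(hex(high), hex(low))
```

The D800..DFFF range is exactly 2048 code points, which is where the "/5 netblock" joke comes from: the top 5 bits of the 16-bit unit are fixed.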

UTF-8 divides Unicode into subsets which are encoded in 1, 2, 3, or 4
bytes, and those who previously used ASCII still need only 1 byte per
codepoint!
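
The same kind of check works for UTF-8. This is only a sketch, assuming Python 3; utf8_len and the sample characters are made-up names and choices for illustration.

```python
def utf8_len(ch):
    """Return how many bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))

samples = [
    "A",           # U+0041, ASCII range
    "\u00e9",      # U+00E9, LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # U+20AC, EURO SIGN
    "\U0001F600",  # U+1F600, outside the BMP
]

lengths = [utf8_len(ch) for ch in samples]
for ch, n in zip(samples, lengths):
    print("U+%04X -> %d byte(s)" % (ord(ch), n))

# Pure-ASCII text is byte-for-byte identical in ASCII and UTF-8.
text = "plain old ASCII"
same_bytes = text.encode("ascii") == text.encode("utf-8")
print(same_bytes)
```

So the four subsets are visible directly in the byte counts, and ASCII users pay nothing extra.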

--
http://mail.python.org/mailman/listinfo/python-list
