On Fri, Jun 21, 2013 at 3:17 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 20/06/2013 17:37, Chris Angelico wrote:
>>
>> On Fri, Jun 21, 2013 at 2:27 AM, <wxjmfa...@gmail.com> wrote:
>>>
>>> And all these coding schemes have something in common,
>>> they work all with a unique set of code points, more
>>> precisely a unique set of encoded code points (not
>>> the set of implemented code points (byte)).
>>>
>>> Just what the flexible string representation is not
>>> doing, it artificially divides unicode in subsets and tries
>>> to handle each subset differently.
>>>
>>
>>
>> UTF-16 divides Unicode into two subsets: BMP characters (encoded using
>> one 16-bit unit) and astral characters (encoded using two 16-bit units
>> in the D800::/5 netblock, or equivalent thereof). Your beloved narrow
>> builds are guilty of exactly the same crime as the hated 3.3.
>>
> UTF-8 divides Unicode into subsets which are encoded in 1, 2, 3, or 4
> bytes, and those who previously used ASCII still need only 1 byte per
> codepoint!

Yes, but there's never (AFAIK) been a Python implementation that
represents strings in UTF-8; UTF-16 was one of two options for Python
2.2 through 3.2, and is the one that jmf always seems to be measuring
against.

ChrisA
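
For anyone who wants to check the subsets discussed in this thread, here is a minimal sketch for a Python 3 prompt. The four sample characters and the helper names utf8_len and utf16_units are just my picks for illustration; it only counts code units per code point and says nothing about how any interpreter stores strings internally.

```python
# Count how many code units each encoding needs for one code point.
# Samples (arbitrary choices): ASCII, Latin-1 range, rest of BMP, astral.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def utf8_len(ch):
    # Number of bytes in the UTF-8 encoding of a single character.
    return len(ch.encode("utf-8"))

def utf16_units(ch):
    # Number of 16-bit units in UTF-16; "utf-16-le" avoids the BOM.
    return len(ch.encode("utf-16-le")) // 2

for ch in samples:
    print("U+%04X  UTF-8 bytes: %d  UTF-16 units: %d"
          % (ord(ch), utf8_len(ch), utf16_units(ch)))
```

The UTF-8 column steps through 1, 2, 3 and 4 bytes, while the UTF-16 column stays at 1 unit until the astral character, which needs a surrogate pair.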