On Sun, Aug 19, 2012 at 11:50 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> Note that this only describes the structure of "compact" string
> objects, which I have to admit I do not fully understand from the PEP.
>  The wording suggests that it only uses the PyASCIIObject structure,
> not the derived structures.  It then says that for compact ASCII
> strings "the UTF-8 data, the UTF-8 length and the wstr length are the
> same as the length of the ASCII data."  But these fields are part of
> the PyCompactUnicodeObject structure, not the base PyASCIIObject
> structure, so they would not exist if only PyASCIIObject were used.
> It would also imply that compact non-ASCII strings are stored
> internally as UTF-8, which would be surprising.

Oh, now I get it.  I had missed the part where it says "character data
immediately follow the base structure".  And the bit about the "UTF-8
data, the UTF-8 length and the wstr length" is not describing the
contents of those fields, but rather where the equivalent data can be
found instead, since the fields themselves don't exist for compact
ASCII strings.
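
For what it's worth, the difference between the two layouts can be
observed from Python itself.  This is only a rough sketch, assuming
CPython 3.3 or later (where PEP 393 is implemented) and relying on
sys.getsizeof, whose absolute numbers vary between versions and builds.
The helper names per_char_bytes and header_bytes are mine, not from the
PEP:

```python
import sys

def per_char_bytes(ch, n=100):
    """Bytes of storage added by one more copy of ch (the character width)."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

def header_bytes(ch, n=100):
    """Approximate size of the fixed structure that precedes the character data.

    Assumes the compact layout: header, then n characters plus one
    terminating NUL character of the same width.
    """
    width = per_char_bytes(ch, n)
    return sys.getsizeof(ch * n) - width * (n + 1)

samples = [
    ("ASCII", "a"),
    ("Latin-1, non-ASCII", chr(0xE9)),
    ("BMP", chr(0x20AC)),
    ("non-BMP", chr(0x1F600)),
]

for label, ch in samples:
    print(label, "width:", per_char_bytes(ch), "header:", header_bytes(ch))
```

On the CPython builds I would expect this to run on, the ASCII header
comes out smaller than the header for the other three cases, which fits
the reading that compact ASCII strings use only PyASCIIObject while
other compact strings use the larger PyCompactUnicodeObject.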
-- 
http://mail.python.org/mailman/listinfo/python-list
