On Sun, Aug 19, 2012 at 11:50 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> Note that this only describes the structure of "compact" string
> objects, which I have to admit I do not fully understand from the PEP.
> The wording suggests that it only uses the PyASCIIObject structure,
> not the derived structures. It then says that for compact ASCII
> strings "the UTF-8 data, the UTF-8 length and the wstr length are the
> same as the length of the ASCII data." But these fields are part of
> the PyCompactUnicodeObject structure, not the base PyASCIIObject
> structure, so they would not exist if only PyASCIIObject were used.
> It would also imply that compact non-ASCII strings are stored
> internally as UTF-8, which would be surprising.

Oh, now I get it. I had missed the part where it says "character data immediately follow the base structure". And the bit about the "UTF-8 data, the UTF-8 length and the wstr length" is not describing the contents of those fields, but rather where the equivalent data can be found instead, since the fields don't exist.
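
For anyone who wants to see this from the Python side, here is a rough sketch rather than anything authoritative. It assumes CPython 3.3 or later, where sys.getsizeof() reports the struct header plus the character data that follows it; the exact byte counts are an implementation detail and vary by version and platform, so the code only compares them. The helper name per_char_overhead is made up for this illustration.

```python
import sys


def per_char_overhead(ch, n=100):
    """Return (fixed_bytes, bytes_per_char) for strings made only of `ch`.

    Hypothetical helper for illustration; relies on CPython's
    sys.getsizeof(), which counts the header struct plus the
    character data stored immediately after it.
    """
    one = sys.getsizeof(ch * 1)
    many = sys.getsizeof(ch * (n + 1))
    per_char = (many - one) // n
    # Fixed cost: the header struct plus the terminating null character.
    fixed = one - per_char
    return fixed, per_char


# Compact ASCII string: only the base PyASCIIObject header is used.
ascii_fixed, ascii_width = per_char_overhead("a")

# Compact non-ASCII (Latin-1) string: uses the larger
# PyCompactUnicodeObject header, which has the extra UTF-8 fields.
latin1_fixed, latin1_width = per_char_overhead("\xe9")

print("ASCII:   fixed", ascii_fixed, "bytes, per char", ascii_width)
print("Latin-1: fixed", latin1_fixed, "bytes, per char", latin1_width)
```

Both kinds of string should come out at one byte per character, but the ASCII one should show a smaller fixed cost, which fits the reading that the extra fields are simply absent for compact ASCII strings.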