Re: [Python-Dev] Difference between PyUnicode_IS_ASCII and PyUnicode_IS_COMPACT_ASCII ?

Victor Stinner Tue, 20 Dec 2011 11:34:44 -0800

On 20/12/2011 09:54, Antoine Pitrou wrote:


Hello,

The include file (unicodeobject.h) seems to imply that some pure ASCII
strings can be non-compact, but I don't understand how that can happen.

If you create a string from Py_UNICODE* or wchar_t* (using the legacy API), PyUnicode_READY() may create a non-compact but ASCII string.


Such a string would be in the following state (extract from unicodeobject.h):

       - legacy string, ready:

         * structure = PyUnicodeObject structure
         * test: !PyUnicode_IS_COMPACT(op) && kind != PyUnicode_WCHAR_KIND
         * kind = PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or
           PyUnicode_4BYTE_KIND
         * compact = 0
         * ready = 1
         * data.any is not NULL
         * utf8 is shared and utf8_length = length with data.any if ascii = 1
         * utf8_length = 0 if utf8 is NULL
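
To make the distinction concrete, here is a minimal sketch in plain C. It does not use the real CPython headers: struct model_state, the MODEL_* constants and the model_* functions are hypothetical stand-ins for the state bits listed above. It assumes that PyUnicode_IS_COMPACT_ASCII means "ascii and compact", which is how I read unicodeobject.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the string state described above.
   This is NOT the real CPython structure layout. */
enum model_kind {
    MODEL_WCHAR_KIND = 0,
    MODEL_1BYTE_KIND = 1,
    MODEL_2BYTE_KIND = 2,
    MODEL_4BYTE_KIND = 4
};

struct model_state {
    enum model_kind kind;
    bool compact;
    bool ascii;
    bool ready;
};

/* Mirrors the quoted test for "legacy string, ready":
   !PyUnicode_IS_COMPACT(op) && kind != PyUnicode_WCHAR_KIND */
bool model_is_legacy_ready(const struct model_state *s)
{
    assert(s != NULL);
    return !s->compact && s->kind != MODEL_WCHAR_KIND;
}

/* Stand-in for PyUnicode_IS_ASCII: looks at the ascii bit only. */
bool model_is_ascii(const struct model_state *s)
{
    assert(s != NULL);
    return s->ascii;
}

/* Stand-in for PyUnicode_IS_COMPACT_ASCII (assumed: ascii AND compact). */
bool model_is_compact_ascii(const struct model_state *s)
{
    assert(s != NULL);
    return s->ascii && s->compact;
}
```

With kind = MODEL_1BYTE_KIND, compact = 0, ascii = 1 and ready = 1, the first two functions return true and the third returns false, which is exactly the "non-compact but ASCII" case.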

Besides, the following comment also seems wrong:

        - compact:

          * structure = PyCompactUnicodeObject
          * test: PyUnicode_IS_ASCII(op) && !PyUnicode_IS_COMPACT(op)

I added the "test" lines recently because I always forget how to get the structure type. The correct test should be:


       - compact:

         * structure = PyCompactUnicodeObject
         * test: PyUnicode_IS_COMPACT(op) && !PyUnicode_IS_ASCII(op)
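
As a rough illustration of how the two flags select the structure type, here is a small self-contained sketch. The names model_flags, model_struct and model_structure_for are hypothetical and nothing here comes from the CPython headers. The PyASCIIObject case for compact ASCII strings is my reading of unicodeobject.h and is not quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: only the two flags that decide the structure type. */
struct model_flags {
    bool compact;
    bool ascii;
};

/* Hypothetical labels for the three structure types. */
enum model_struct {
    MODEL_PYASCIIOBJECT,          /* compact and ASCII (assumed from the header) */
    MODEL_PYCOMPACTUNICODEOBJECT, /* compact, not ASCII */
    MODEL_PYUNICODEOBJECT         /* not compact (legacy), ASCII or not */
};

/* Pick the structure type from the flags, following the corrected test:
   PyUnicode_IS_COMPACT(op) && !PyUnicode_IS_ASCII(op) for the compact case. */
enum model_struct model_structure_for(const struct model_flags *f)
{
    assert(f != NULL);
    if (f->compact && f->ascii)
        return MODEL_PYASCIIOBJECT;
    if (f->compact)
        return MODEL_PYCOMPACTUNICODEOBJECT;
    return MODEL_PYUNICODEOBJECT;
}
```

Note that the wrong test (ASCII and not compact) would select the legacy ASCII string from the first part of this mail, which in this model maps to MODEL_PYUNICODEOBJECT, not to the compact structure.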

Victor