[issue14419] Faster ascii decoding

Serhiy Storchaka Tue, 27 Mar 2012 03:34:31 -0700

Serhiy Storchaka <[email protected]> added the comment:

> q is not the address of the Unicode string, but the address of the
> data following the Unicode structure in memory. Strings created by
> PyUnicode_New() are composed on one unique memory block: {structure,
> data}.


I know all that.

#define _PyUnicode_COMPACT_DATA(op)                     \
    (PyUnicode_IS_ASCII(op) ?                   \
     ((void*)((PyASCIIObject*)(op) + 1)) :              \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

q is ((void*)((PyASCIIObject*)(op) + 1)). (PyASCIIObject*)(op) + 1 is a
pointer to PyASCIIObject and has the same alignment as PyASCIIObject.
PyASCIIObject is aligned to sizeof(void *) because it starts with a
void * field. Consequently, q is aligned to sizeof(void *). This does not
depend on the number or size of the fields in PyASCIIObject, except for
the first one.
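
A minimal sketch of the alignment argument, not taken from the CPython
sources: header_t is a hypothetical stand-in for PyASCIIObject, and
data_is_pointer_aligned() is an illustrative helper. It assumes C11 and a
platform where pointers can be converted to uintptr_t.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyASCIIObject: only the leading pointer
   field matters for the alignment argument. */
typedef struct {
    void *first;         /* forces alignment of at least alignof(void *) */
    unsigned char flag;  /* later fields cannot weaken that alignment */
} header_t;

/* The struct is at least as strictly aligned as its first field. */
static_assert(alignof(header_t) >= alignof(void *),
              "header must be at least pointer-aligned");

/* Returns 1 if the data area right after the header is pointer-aligned.
   h + 1 is a header_t pointer, so it carries header_t's alignment. */
static int data_is_pointer_aligned(const header_t *h)
{
    const void *q = (const void *)(h + 1);
    return ((uintptr_t)q % alignof(void *)) == 0;
}
```

For any header_t obtained from malloc(), data_is_pointer_aligned() should
return 1, whatever fields follow the first one.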

Of course, if the definition of _PyUnicode_COMPACT_DATA changes, this will
cease to be true. In that case, apply my first patch, which may be a bit
less effective for short strings (performance for short strings is hard to
measure from Python). However, for short strings, we can put a size limit:

if (size >= 2 * SIZEOF_LONG &&
    ((size_t) p & LONG_PTR_MASK) == ((size_t) q & LONG_PTR_MASK)) {
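
To show how such a guard might fit into a decoding loop, here is a rough
sketch, not the actual patch. The names ascii_copy, WORD_SIZE, WORD_MASK
and HIGH_BITS are hypothetical (they play the roles of SIZEOF_LONG and
LONG_PTR_MASK above). The word-at-a-time path is taken only when the input
is long enough and p and q share the same alignment modulo the word size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_SIZE sizeof(unsigned long)
#define WORD_MASK (WORD_SIZE - 1)
/* 0x80 repeated in every byte of an unsigned long */
#define HIGH_BITS ((unsigned long)-1 / 0xFF * 0x80)

/* Copies leading ASCII bytes from src to dst and returns how many bytes
   were copied; stops at the first byte with the high bit set. */
static size_t ascii_copy(const unsigned char *src, unsigned char *dst,
                         size_t size)
{
    const unsigned char *p = src;
    const unsigned char *end = src + size;
    unsigned char *q = dst;

    if (size >= 2 * WORD_SIZE &&
        ((uintptr_t)p & WORD_MASK) == ((uintptr_t)q & WORD_MASK)) {
        /* Byte loop until p (and therefore q) is word-aligned. */
        while (((uintptr_t)p & WORD_MASK) != 0) {
            if (*p & 0x80)
                return (size_t)(p - src);
            *q++ = *p++;
        }
        /* Word loop: test all high bits of a word with one AND. */
        while ((size_t)(end - p) >= WORD_SIZE) {
            unsigned long w;
            memcpy(&w, p, WORD_SIZE);
            if (w & HIGH_BITS)
                break;
            memcpy(q, &w, WORD_SIZE);
            p += WORD_SIZE;
            q += WORD_SIZE;
        }
    }
    /* Tail, short inputs, and mismatched alignment: one byte at a time. */
    while (p < end && !(*p & 0x80))
        *q++ = *p++;
    return (size_t)(p - src);
}
```

The 2 * WORD_SIZE threshold guarantees that the alignment loop cannot run
past the end of the input, and it keeps short strings on the plain byte
loop, where the setup cost of the fast path would not pay off.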

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue14419>
_______________________________________
