Serhiy Storchaka <storch...@gmail.com> added the comment:

> q is not the address of the Unicode string, but the address of the
> data following the Unicode structure in memory. Strings created by
> PyUnicode_New() are composed on one unique memory block: {structure,
> data}.

I know all that.

    #define _PyUnicode_COMPACT_DATA(op)                     \
        (PyUnicode_IS_ASCII(op) ?                           \
         ((void*)((PyASCIIObject*)(op) + 1)) :              \
         ((void*)((PyCompactUnicodeObject*)(op) + 1)))

q is ((void*)((PyASCIIObject*)(op) + 1)). (PyASCIIObject*)(op) + 1 is a
pointer to PyASCIIObject and has the same alignment as PyASCIIObject.
PyASCIIObject is aligned to sizeof(void *) because it starts with a
void * field. Consequently, q is aligned to sizeof(void *). This does not
depend on the number or the size of the fields in PyASCIIObject, except
for the first one.

Of course, if the definition of _PyUnicode_COMPACT_DATA is changed, this
will cease to be true. In that case, apply my first patch, which may be a
bit less efficient for short strings (performance for short strings is
hard to measure from Python). However, for short strings we can add a
size limit:

    if (size >= 2 * SIZEOF_LONG
        && ((size_t) p & LONG_PTR_MASK) == ((size_t) q & LONG_PTR_MASK)) {

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14419>
_______________________________________
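
The alignment argument in the comment above can be checked in isolation. The following is only a rough sketch, not the patch itself: fake_header is a hypothetical stand-in for PyASCIIObject (only its leading void * field matters here), FAKE_COMPACT_DATA mirrors the ASCII branch of _PyUnicode_COMPACT_DATA, and MY_LONG_PTR_MASK is assumed to be sizeof(long) - 1. It assumes a C11 compiler for _Alignof and static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyASCIIObject. Only the first field matters:
   it forces the struct alignment to be at least that of void *. */
typedef struct {
    void *first;
    size_t length;
    unsigned char flags;
} fake_header;

static_assert(_Alignof(fake_header) >= _Alignof(void *),
              "a struct is at least as aligned as its first member");

/* Same shape as the ASCII branch of _PyUnicode_COMPACT_DATA. */
#define FAKE_COMPACT_DATA(op) ((void *)((fake_header *)(op) + 1))

/* Assumed definition: low bits of an address relative to a long boundary. */
#define MY_LONG_PTR_MASK ((uintptr_t)(sizeof(long) - 1))

/* Allocate {structure, data} as one block and report whether the data
   pointer q is aligned like the header. Returns 1 if aligned, 0 if not,
   -1 if the allocation failed. */
int data_is_aligned(size_t payload)
{
    fake_header *op = malloc(sizeof(fake_header) + payload);
    if (op == NULL)
        return -1;
    void *q = FAKE_COMPACT_DATA(op);
    int ok = ((uintptr_t)q % _Alignof(fake_header)) == 0;
    free(op);
    return ok;
}

/* The guard proposed in the comment: take the word-at-a-time path only
   when the buffer is long enough and p and q sit at the same offset
   from a long boundary. */
int can_use_word_loop(const void *p, const void *q, size_t size)
{
    return size >= 2 * sizeof(long)
        && ((uintptr_t)p & MY_LONG_PTR_MASK) == ((uintptr_t)q & MY_LONG_PTR_MASK);
}
```

data_is_aligned() should return 1 for any payload size, because sizeof(fake_header) is a multiple of its alignment and malloc() returns suitably aligned memory. can_use_word_loop() rejects both short inputs and pointer pairs whose offsets from a long boundary differ.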