New submission from STINNER Victor <victor.stin...@haypocalc.com>:

A lot of code is duplicated in unicodeobject.c to manipulate ("encode/decode") surrogates. Each function has between one and three different implementations, and the new decode_ucs4() function adds yet another one. The attached patch replaces this code with macros.
I think that only the implementations of IS_HIGH_SURROGATE and IS_LOW_SURROGATE matter for speed. On my CPU (Atom Z520 @ 1.3 GHz), ((ch & 0xFFFFFC00UL) == 0xD800) (taken from decode_ucs4) is *slightly* faster than (0xD800 <= ch && ch <= 0xDBFF): running test_unicode 4 times takes ~54 sec instead of ~57 sec (~3 sec less, about -5%).

These three macros need to be checked; I wrote the first one:

    #define IS_SURROGATE(ch) (((ch) & 0xFFFFF800UL) == 0xD800)
    #define IS_HIGH_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xD800)
    #define IS_LOW_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xDC00)

I added a cast to Py_UCS4 in COMBINE_SURROGATES to avoid integer overflow when Py_UNICODE is 16 bits (narrow build). It may be unnecessary.

    #define COMBINE_SURROGATES(ch1, ch2) \
        (((((Py_UCS4)(ch1) & 0x3FF) << 10) | ((Py_UCS4)(ch2) & 0x3FF)) + 0x10000)

HIGH_SURROGATE and LOW_SURROGATE require that their ordinal argument has been preprocessed to fit in [0; 0xFFFF]; I documented this requirement in the comment of these macros. It would be better to have a single macro doing both operations, but because "*p++" (dereference and increment) is usually used, I prefer to avoid a single macro (I don't like passing *p++ to a macro that uses its argument more than once). Alternatively, we could add a third macro built on HIGH_SURROGATE and LOW_SURROGATE.

I rewrote the main loop of PyUnicode_EncodeUTF16() to avoid a useless test on ch2 on narrow builds. I also added an IS_NONBMP macro simply because I prefer macros over hardcoded constants.
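A minimal standalone sketch for checking the macros above: it compares the mask-based tests against the plain range comparisons for every code point, and round-trips every non-BMP code point through a surrogate pair. The Py_UCS4 typedef, the definitions of HIGH_SURROGATE, LOW_SURROGATE and IS_NONBMP, and the helper names (is_high_range, is_low_range, masks_match_ranges, round_trip_ok, encode_utf16) are assumptions for illustration only; they are not taken from the patch, whose own definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Py_UCS4 modelled as a 32-bit unsigned integer for this sketch. */
typedef uint32_t Py_UCS4;

/* Mask-based tests, as proposed in the message. */
#define IS_SURROGATE(ch)      (((ch) & 0xFFFFF800UL) == 0xD800)
#define IS_HIGH_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xD800)
#define IS_LOW_SURROGATE(ch)  (((ch) & 0xFFFFFC00UL) == 0xDC00)

/* Join a high/low surrogate pair into a single code point. */
#define COMBINE_SURROGATES(ch1, ch2) \
    (((((Py_UCS4)(ch1) & 0x3FF) << 10) | ((Py_UCS4)(ch2) & 0x3FF)) + 0x10000)

/* Hypothetical definitions: the argument must already have had 0x10000
   subtracted (the "preprocessing" step), so it lies in [0; 0xFFFFF]. */
#define HIGH_SURROGATE(ch) (0xD800 | ((ch) >> 10))
#define LOW_SURROGATE(ch)  (0xDC00 | ((ch) & 0x3FF))

/* Hypothetical: true for code points outside the BMP. */
#define IS_NONBMP(ch) ((ch) >= 0x10000)

/* Range-based tests, for comparison with the mask versions. */
static int is_high_range(Py_UCS4 ch) { return 0xD800 <= ch && ch <= 0xDBFF; }
static int is_low_range(Py_UCS4 ch)  { return 0xDC00 <= ch && ch <= 0xDFFF; }

/* Returns 1 if the mask macros agree with the range tests on [0; 0x10FFFF]. */
static int masks_match_ranges(void)
{
    for (Py_UCS4 ch = 0; ch <= 0x10FFFF; ch++) {
        if (IS_HIGH_SURROGATE(ch) != is_high_range(ch)) return 0;
        if (IS_LOW_SURROGATE(ch) != is_low_range(ch)) return 0;
        if (IS_SURROGATE(ch) != (is_high_range(ch) || is_low_range(ch))) return 0;
    }
    return 1;
}

/* Returns 1 if every non-BMP code point survives split + COMBINE_SURROGATES. */
static int round_trip_ok(void)
{
    for (Py_UCS4 ch = 0x10000; ch <= 0x10FFFF; ch++) {
        Py_UCS4 v = ch - 0x10000;           /* preprocessing step */
        Py_UCS4 hi = HIGH_SURROGATE(v);
        Py_UCS4 lo = LOW_SURROGATE(v);
        if (!IS_NONBMP(ch) || !IS_HIGH_SURROGATE(hi) || !IS_LOW_SURROGATE(lo))
            return 0;
        if (COMBINE_SURROGATES(hi, lo) != ch)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: writes ch as UTF-16 code units into out and returns
   the number of units written (1 or 2). The macros only ever receive a
   plain variable, so evaluating an argument twice has no side effect. */
static int encode_utf16(Py_UCS4 ch, uint16_t out[2])
{
    assert(ch <= 0x10FFFF);
    if (IS_NONBMP(ch)) {
        Py_UCS4 v = ch - 0x10000;
        out[0] = (uint16_t)HIGH_SURROGATE(v);
        out[1] = (uint16_t)LOW_SURROGATE(v);
        return 2;
    }
    out[0] = (uint16_t)ch;
    return 1;
}
```

For example, encode_utf16(0x1F600, out) writes the pair 0xD83D 0xDE00, and COMBINE_SURROGATES(0xD83D, 0xDE00) gives back 0x1F600. Copying *p++ into a local variable first, as encode_utf16 does with v, is one way to keep the multiple-evaluation concern out of the macros.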
----------
files: unicode_macros.patch
keywords: patch
messages: 142108
nosy: benjamin.peterson, ezio.melotti, haypo, lemburg, loewis, pitrou, tchrist, terry.reedy
priority: normal
severity: normal
status: open
title: Use macros for surrogates in unicodeobject.c
versions: Python 3.3
Added file: http://bugs.python.org/file22901/unicode_macros.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12751>
_______________________________________