STINNER Victor added the comment:
Yes, PyUnicode_READ() *is* slow. It should not be used in a loop, and
unicode_compare() uses PyUnicode_READ() in a loop.

An improvement would be to write a specialized version for each
combination of Unicode kinds:

(UCS1, UCS2), (UCS1, UCS4),
(UCS2, UCS1), (UCS2, UCS2), (UCS2, UCS4)
(UCS4, UCS1), (UCS4, UCS2), (UCS4, UCS4)
# (UCS1, UCS1) uses memcmp()

But I am not convinced that the gain would be visible, and I don't know
how to factor the code. We should probably use a huge macro.

2013/4/4 Neil Hodgson <rep...@bugs.python.org>:
>
> Neil Hodgson added the comment:
>
> For 32-bit Windows, the code generated for unicode_compare is quite slow.
>
> There are either 1 or 2 kind checks in each call to PyUnicode_READ and 2
> calls to PyUnicode_READ inside the loop. A compiler may decide to move the
> kind checks out of the loop and specialize the loop but MSVC 2010 appears to
> not do so. The assembler (32-bit build) for each PyUnicode_READ looks like
>
>     mov ecx, DWORD PTR _kind1$[ebp]
>     cmp ecx, 1
>     jne SHORT $LN17@unicode_co@2
>     lea ecx, DWORD PTR [ebx+eax]
>     movzx edx, BYTE PTR [ecx+edx]
>     jmp SHORT $LN16@unicode_co@2
> $LN17@unicode_co@2:
>     cmp ecx, 2
>     jne SHORT $LN15@unicode_co@2
>     movzx edx, WORD PTR [ebx+edi]
>     jmp SHORT $LN16@unicode_co@2
> $LN15@unicode_co@2:
>     mov edx, DWORD PTR [ebx+esi]
> $LN16@unicode_co@2:
>
> The kind1/kind2 variables aren't even going into registers and at least
> one test+branch and a jump are executed for every character. Two tests for 2
> and 4 byte kinds. len1 and len2 don't get to go into registers either.
>
> My system isn't set up for 64-bit MSVC 2010 but looking at the code from
> 64-bit MSVC 2012 shows that all the variables have been moved into registers
> but the kind checking is still inside the loop. This accounts for better
> results with 64-bit Python 3.3 on Windows but isn't as good as Unix or Python
> 3.2.
>
> ; 10431: c1 = PyUnicode_READ(kind1, data1, i);
>
>     cmp rsi, 1
>     jne SHORT $LN17@unicode_co
>     lea rax, QWORD PTR [r9+rcx]
>     movzx r8d, BYTE PTR [rax+rbx]
>     jmp SHORT $LN16@unicode_co
> $LN17@unicode_co:
>     cmp rsi, 2
>     jne SHORT $LN15@unicode_co
>     movzx r8d, WORD PTR [r9+r11]
>     jmp SHORT $LN16@unicode_co
> $LN15@unicode_co:
>     mov r8d, DWORD PTR [r9+r10]
> $LN16@unicode_co:
>
> Attached the 32-bit assembler listing.
>
> ----------
> Added file: http://bugs.python.org/file29673/unicode_compare.asm
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue17615>
> _______________________________________
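
A rough sketch of the "huge macro" idea mentioned in the reply, not a patch: the macro generates one comparison loop per pair of element types, and a small table selects the loop once per call, so no kind check runs per character. This is standalone C rather than CPython code. The names sketch_compare, DEFINE_COMPARE, the cmp_* helpers and the KIND_* constants are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the three storage kinds; each value is the
   per-character width in bytes. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Generate one comparison function per (T1, T2) pair. The element types
   are fixed at compile time, so the loop body has no kind checks. */
#define DEFINE_COMPARE(NAME, T1, T2)                                  \
    static int NAME(const void *d1, size_t len1,                      \
                    const void *d2, size_t len2)                      \
    {                                                                 \
        const T1 *p1 = (const T1 *)d1;                                \
        const T2 *p2 = (const T2 *)d2;                                \
        size_t n = len1 < len2 ? len1 : len2;                         \
        for (size_t i = 0; i < n; i++) {                              \
            uint32_t c1 = p1[i], c2 = p2[i];                          \
            if (c1 != c2)                                             \
                return c1 < c2 ? -1 : 1;                              \
        }                                                             \
        return (len1 > len2) - (len1 < len2);                         \
    }

DEFINE_COMPARE(cmp_1_2, uint8_t,  uint16_t)
DEFINE_COMPARE(cmp_1_4, uint8_t,  uint32_t)
DEFINE_COMPARE(cmp_2_1, uint16_t, uint8_t)
DEFINE_COMPARE(cmp_2_2, uint16_t, uint16_t)
DEFINE_COMPARE(cmp_2_4, uint16_t, uint32_t)
DEFINE_COMPARE(cmp_4_1, uint32_t, uint8_t)
DEFINE_COMPARE(cmp_4_2, uint32_t, uint16_t)
DEFINE_COMPARE(cmp_4_4, uint32_t, uint32_t)

/* (1-byte, 1-byte): compare the common prefix with memcmp(). */
static int cmp_1_1(const void *d1, size_t len1,
                   const void *d2, size_t len2)
{
    size_t n = len1 < len2 ? len1 : len2;
    int r = memcmp(d1, d2, n);
    if (r != 0)
        return r < 0 ? -1 : 1;
    return (len1 > len2) - (len1 < len2);
}

typedef int (*compare_fn)(const void *, size_t, const void *, size_t);

/* Indexed by kind >> 1, which maps the widths 1, 2, 4 to 0, 1, 2. */
static const compare_fn compare_table[3][3] = {
    { cmp_1_1, cmp_1_2, cmp_1_4 },
    { cmp_2_1, cmp_2_2, cmp_2_4 },
    { cmp_4_1, cmp_4_2, cmp_4_4 },
};

/* Pick the specialized loop once, outside the per-character loop.
   Returns -1, 0 or 1. */
int sketch_compare(int kind1, const void *data1, size_t len1,
                   int kind2, const void *data2, size_t len2)
{
    assert(kind1 == 1 || kind1 == 2 || kind1 == 4);
    assert(kind2 == 1 || kind2 == 2 || kind2 == 4);
    return compare_table[kind1 >> 1][kind2 >> 1](data1, len1, data2, len2);
}
```

Whether this layout gives a visible gain is unmeasured here; it only shows how the nine combinations could share one loop body.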