On Wed, Oct 12, 2016 at 2:19 PM, Elliot Gorokhovsky
<elliot.gorokhov...@gmail.com> wrote:
[...]
> So that was the motivation for all this. Actually, if I wrote this for
> python 2, I might be able to get even better numbers (at least for strings),
> since we can't use strcmp in python 3. (Actually, I've heard UTF-8 strings
> are strcmp-able, so maybe if we go through and verify all the strings are
> UTF-8 we can strcmp them? I don't know enough about how PyUnicode stuff
> works to do this safely). My string special case currently just bypasses the
> typechecks and goes to unicode_compare(), which is still wayyy overkill for
> the common case of ASCII or Latin-1 strings, since it uses a for loop to go
> through and check characters, and strcmp uses compiler magic to do it in
> like, negative time or something. I even PyUnicode_READY the strings before
> comparing; I'm not sure if that's really necessary, but that's how
> PyUnicode_Compare does it.
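
On the "UTF-8 strings are strcmp-able" point: as far as I know, comparing
UTF-8 encodings byte by byte gives the same order as comparing code points,
which is how Python 3 str comparison works. The catch with strcmp
specifically is that it stops at the first NUL byte, and Python strings may
contain embedded NULs. A rough sanity check in Python, not a proof and not
how CPython does anything internally (utf8_key and c_strcmp_key are made-up
helper names for illustration):

```python
# Illustrative sketch only; helper names are hypothetical, not CPython APIs.

def utf8_key(s):
    """Return the UTF-8 bytes of s. Python compares bytes objects
    lexicographically by byte value, much like memcmp plus a length check."""
    return s.encode("utf-8")


def c_strcmp_key(s):
    """Emulate what C strcmp would see: the bytes up to the first NUL."""
    return utf8_key(s).split(b"\x00", 1)[0]


samples = ["", "a", "ab", "z", "é", "zebra", "\u20ac", "\U0001f600",
           "a\x00b", "a\x00c"]

# Code-point order (how str compares) agrees with UTF-8 byte order
# on this sample, including 2-, 3-, and 4-byte sequences.
assert sorted(samples) == sorted(samples, key=utf8_key)

# strcmp cannot distinguish strings that differ only after an embedded NUL.
assert "a\x00b" != "a\x00c"
assert c_strcmp_key("a\x00b") == c_strcmp_key("a\x00c")
```

So a length-aware byte comparison (memcmp-style) looks like the safer
primitive here than strcmp, under the assumption that both strings are
already available as valid UTF-8.
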
It looks like PyUnicode_Compare already has a special case to use
memcmp when both of the strings fit into latin1:

    https://github.com/python/cpython/blob/cfc517e6eba37f1bd61d57bf0dbece9843bff9c8/Objects/unicodeobject.c#L10855-L10860

I suppose the for loops that are used for multibyte strings could
potentially be sped up with SIMD or something, but that gets
complicated fast, and modern compilers might even be doing it already.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/