On 01/15/2016 05:58 PM, Bernd Schmidt wrote:

> One question Richard posed in the comments: why aren't we optimizing
> small constant size memcmps other than size 1 to *s == *q? The reason is
> the return value of memcmp, which implies a byte-sized operation
> (incidentally, the use of SImode in the cmpmem/cmpstr patterns is really
> odd). It's possible to work around this, but expansion becomes a little
> more tricky (subtract after bswap, maybe).
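
For concreteness, a rough C model of that workaround for a constant size of 4 (just a sketch; the helper name is invented here, and a real expansion would of course be emitted by the compiler rather than written in source):

#include <stdint.h>
#include <string.h>

/* Hypothetical expansion of memcmp (s, q, 4): load both words, byte-swap
   so that memory order becomes most-significant-first, then derive the
   sign without a truncating subtraction.  */
static int
memcmp4_inline (const void *s, const void *q)
{
  uint32_t a, b;
  memcpy (&a, s, 4);            /* unaligned-safe loads */
  memcpy (&b, q, 4);
  a = __builtin_bswap32 (a);    /* assumes a little-endian target */
  b = __builtin_bswap32 (b);
  /* (int) (a - b) could wrap, so use comparisons; for sizes narrower
     than int, a plain subtraction after the swap would do.  */
  return (a > b) - (a < b);
}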

When I did this (big-endian conversion, wide subtract, sign) to the tail difference check in glibc's x86_64 memcmp, it was actually a bit faster than isolating the differing byte and returning its difference, even for non-random data such as that encountered during qsort.
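
Roughly, in C (only a sketch: the actual glibc routine is assembly, the sign is shown here with an unsigned comparison rather than a wide subtract, and both variants assume little-endian loads of inputs that are known to differ):

#include <stdint.h>

/* Variant 1: isolate the first differing byte and return its
   difference.  */
static int
tail_diff_bytewise (uint64_t a, uint64_t b)
{
  unsigned int shift = __builtin_ctzll (a ^ b) & ~7u; /* first differing byte */
  return (int) ((a >> shift) & 0xff) - (int) ((b >> shift) & 0xff);
}

/* Variant 2: byte-swap both words so that an unsigned comparison of the
   whole words yields the memcmp sign directly.  */
static int
tail_diff_bswap (uint64_t a, uint64_t b)
{
  a = __builtin_bswap64 (a);
  b = __builtin_bswap64 (b);
  return (a > b) - (a < b);
}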

>  * Expand __memcmp_eq for small constant sizes with loads and
>    comparison, fall back to a memcmp call.
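
A minimal sketch of what that expansion could look like for a constant size of 8, assuming __memcmp_eq only needs to return zero for equal inputs and nonzero otherwise:

#include <stdint.h>
#include <string.h>

/* Hypothetical inline form of __memcmp_eq (s, q, 8): equality only, so
   neither byte swapping nor sign computation is needed.  */
static int
memcmp_eq8_inline (const void *s, const void *q)
{
  uint64_t a, b;
  memcpy (&a, s, 8);   /* unaligned-safe loads */
  memcpy (&b, q, 8);
  return a != b;       /* 0 if equal, nonzero otherwise */
}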

Should we export such a function from glibc? I expect the equality-only case is fairly common, and computing the tail difference costs a few cycles.

It may also make sense to call a streamlined implementation if you have interesting alignment information (for x86_64, that would be an alignment of at least 16 bytes on one or both inputs, so it's perhaps not easy to come by).

Florian
