On Fri, Jan 15, 2016 at 5:58 PM, Bernd Schmidt <bschm...@redhat.com> wrote:
> PR43052 is a PR complaining about how the rep cmpsb expansion that gcc uses
> for memcmp is slower than the library function. As is so often the case, if
> you investigate a bit, you can find a lot of issues with the current
> situation in the compiler.
>
> This PR was accidentally fixed by a patch by Nick which disabled the use of
> cmpstrnsi for memcmp expansion, on the grounds that cmpstrnsi could stop
> looking after seeing a null byte, which would be invalid for memcmp, so only
> cmpmemsi should be used. This fix was for an out-of-tree target.
>
> I believe the rep cmpsb sequence used by i386 would actually be valid, so we
> could duplicate the cmpstrn pattern to also match cmpmem and be done - but
> that would then again cause the performance problem described in the PR, so
> it's probably not a good idea.
>
> One question Richard posed in the comments: why aren't we optimizing small
> constant size memcmps other than size 1 to *s == *q? The reason is the
> return value of memcmp, which implies byte-sized operation (incidentally,
> the use of SImode in the cmpmem/cmpstr patterns is really odd). It's
> possible to work around this, but expansion becomes a little more tricky
> (subtract after bswap, maybe). Still, the current code generation is lame.
>
> So, for gcc-6, I think we shouldn't do anything. The PR is fixed, and
> there's no easy bug-fix that can be done to improve matters. Not sure
> whether to keep the PR open or create a new one for the remaining issues.
> For the next stage1, I'm attaching a proof-of-concept patch that does the
> following:
>  * notice if memcmp results are only used for equality comparison
>    against zero
>  * if so, replace with a different builtin __memcmp_eq
>  * Expand __memcmp_eq for small constant sizes with loads and
>    comparison, fall back to a memcmp call.
>
> The whole thing could be extended to work for sizes larger than an int,
> along the lines of memcpy expansion controlled by move ratio etc. Thoughts?
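
To make the return-value point in the quoted text concrete, here is a rough C sketch of the two cases for a constant size of 4. It is only an illustration, not the attached patch: the names memcmp_eq4, load_be32 and memcmp_order4 are made up here. An equality-only compare can use plain word loads, while an ordered result needs the first differing byte to be the most significant one, which is the effect a bswap has on a little-endian target. The sketch uses a comparison rather than a literal subtraction because a 32-bit difference would not fit in an int.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: equality-only compare of 4 bytes.
   Returns zero iff the blocks are equal; byte order is irrelevant. */
static int memcmp_eq4(const void *p, const void *q)
{
    uint32_t a, b;
    memcpy(&a, p, sizeof a);   /* unaligned-safe word load */
    memcpy(&b, q, sizeof b);
    return a != b;
}

/* Hypothetical helper: load 4 bytes so that the byte at the lowest
   address ends up most significant (what bswap gives on little-endian). */
static uint32_t load_be32(const unsigned char *s)
{
    return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16)
         | ((uint32_t)s[2] << 8)  |  (uint32_t)s[3];
}

/* Hypothetical helper: ordered compare of 4 bytes with the same sign
   as memcmp (p, q, 4). */
static int memcmp_order4(const void *p, const void *q)
{
    uint32_t a = load_be32(p);
    uint32_t b = load_be32(q);
    return (a > b) - (a < b);  /* -1, 0 or 1 without overflow */
}
```

The equality form needs no byte swap at all, which is why separating it from the ordered form pays off for small constant sizes.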

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 - the inline
expansion for small sizes and equality compares should be done on GIMPLE.
Today the strlen pass might be an appropriate place to do this given its
superior knowledge about string lengths.

The idea of turning eq feeding memcmp into a special memcmp_eq is good
but you have to avoid doing that too early - otherwise you'd lose on

  res = memcmp (p, q, sz);
  if (memcmp (p, q, sz) == 0)
   ...

that is, you have to make sure CSE got the chance to common the two
calls. This is why I think this kind of transform needs to happen in
specific places (like during strlen opt) rather than in generic folding.

Richard.

>
> Bernd
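
For reference, a self-contained version of the case where an early rewrite would lose. The function name compare_twice and the is_equal out-parameter are invented for this illustration. Both calls have identical arguments, so CSE can reduce the second one to a test of res. If the second call had already been turned into a separate equality-only builtin, the two calls would presumably no longer look alike and both would survive.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: the same memcmp is used once for its full
   ordering result and once only for equality against zero. */
int compare_twice(const void *p, const void *q, size_t sz, int *is_equal)
{
    int res = memcmp(p, q, sz);           /* ordering result is needed */
    *is_equal = (memcmp(p, q, sz) == 0);  /* after CSE: res == 0 */
    return res;
}
```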