On April 12, 2018 4:31:12 PM GMT+02:00, Jakub Jelinek <ja...@redhat.com> wrote:
>On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
>> Well, but that wouldn't be a fix for a regression and IMHO there's
>> no reason for a really lame mempcpy.  If targets disagree, well,
>
>It is a regression as well: in the past we've emitted mempcpy when the user
>wrote mempcpy, now we don't.
>
>E.g.
>extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
>void bar (void *, void *, void *);
>
>void
>foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
>{
>  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
>}
>
>On x86_64-linux with -O2, 7.x uses the 3 mempcpy calls and 90 bytes in
>foo, while the trunk uses 3 memcpy calls and 96 bytes in foo.
>
>For -Os that is an easily measurable regression; for -O2 it depends on the
>relative speed of memcpy vs. mempcpy and whether one or both of them
>are in the I-cache or not.
Well, then simply never generate a libcall from the move expander?

>> then they get what they deserve.
>>
>> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
>> the default non-stupid one kicks in (it exactly looks like my C
>> version)
>
>	Jakub
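For readers following along: the "default non-stupid" generic mempcpy being referred to is, in effect, just memcpy that returns the end of the destination instead of its start. A minimal sketch of that identity (an illustration, not glibc's actual source; `my_mempcpy` is a hypothetical name):

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the generic mempcpy: identical to memcpy except that it
   returns dest + n, so consecutive copies can be chained without
   recomputing offsets.  mempcpy(d, s, n) == (char *)memcpy(d, s, n) + n. */
static void *my_mempcpy(void *dest, const void *src, size_t n)
{
    return (char *)memcpy(dest, src, n) + n;
}
```

This is what makes the foo() example above attractive at -Os: each mempcpy's return value feeds the next use directly, whereas lowering to memcpy forces the caller to re-add n to each destination pointer, costing the extra bytes measured in the thread.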