On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
> Well, but that wouldn't be a fix for a regression and IMHO there's
> no reason for a really lame mempcpy.  If targets disagree well,

It is a regression as well: in the past we emitted mempcpy when the user
wrote mempcpy, now we don't.

E.g.
extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
void bar (void *, void *, void *);

void
foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
{
  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
}

is compiled on x86_64-linux at -O2 by 7.x into 3 mempcpy calls and 90 bytes
in foo, while the trunk uses 3 memcpy calls and 96 bytes in foo.

For -Os that is an easily measurable regression, for -O2 it depends on the
relative speed of memcpy vs. mempcpy and whether one or both of them are in
I-cache or not.
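
To illustrate where the extra bytes presumably come from: once the call is
emitted as memcpy, the caller has to compute dest + n itself, because memcpy
returns dest rather than the end of the copy.  A minimal sketch of that
equivalence follows; sketch_mempcpy and sketch_concat are hypothetical names
used only for illustration, and this is neither glibc's implementation nor
the exact expansion GCC performs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mempcpy expressed in terms of memcpy.
   memcpy returns dst, so the end pointer needs an extra addition.  */
static void *
sketch_mempcpy (void *dst, const void *src, size_t n)
{
  return (char *) memcpy (dst, src, n) + n;
}

/* Hypothetical usage: append two pieces back to back by chaining the
   returned end pointer.  Returns the number of bytes written, including
   the terminating NUL copied by the second call.  */
static size_t
sketch_concat (char *buf)
{
  char *p = sketch_mempcpy (buf, "abc", 3);
  p = sketch_mempcpy (p, "def", 4);
  return (size_t) (p - buf);
}
```

With a real mempcpy call the addition happens inside the library function, so
each call site saves the instruction that adds n to the result.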

> then they get what they deserve.
> 
> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
> the default non-stupid one kicks in (it exactly looks like my C
> version)

        Jakub
