https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120398
Bug ID: 120398
Summary: vectorization emits shuffles followed by scalar adds
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*
Another variant of PR 109892. GCC manages to emit vector multiplications at
-O2, but the corresponding additions are all scalar, and there are tons of
shuffles in between. At the same time, on AArch64 this loop is vectorized
properly.
static float muladd(float x, float y, float z)
{
    return x * y + z;
}

float g(float x[], long n)
{
    float r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
    }
    return r0 + r1;
}
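
For comparison, here is a hand-written sketch of the shape one might hope for, using GCC's generic vector extensions. It is an illustration only, not actual compiler output: the names g_vec and v2sf are made up for this example. It keeps r0 and r1 as the two lanes of a single accumulator, so each lane performs the same operations in the same order as the scalar code and no reassociation should be needed.

```c
#include <string.h>

/* Hypothetical two-lane float vector type (GCC/Clang vector extension). */
typedef float v2sf __attribute__((vector_size(8)));

/* Hand-vectorized counterpart of g(): lane 0 plays the role of r0,
   lane 1 the role of r1. */
float g_vec(const float *x, long n)
{
    v2sf acc = {0.0f, 0.0f};
    for (; n; x += 2, n--) {
        v2sf v;
        memcpy(&v, x, sizeof v);  /* unaligned load of x[0] and x[1] */
        acc = v * v + acc;        /* one vector multiply, one vector add */
    }
    return acc[0] + acc[1];       /* final horizontal add, done once */
}
```

With this formulation the only cross-lane operation is the final sum after the loop, so no shuffles would be expected inside the loop body.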