https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117344
Bug ID: 117344
Summary: Suboptimal use of movprfx in SVE intrinsics code
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: aarch64-sve, missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
I'm not sure how bad this is in real code, but I spotted it in this testcase:
#include <arm_sve.h>
svint32_t foo(svbool_t pg, svint32_t a, svint32_t b)
{
  b = svadd_m (pg, b, a);
  return b;
}
With e.g. -O2 -march=armv9-a this generates:
foo:
        mov     z31.d, z0.d
        movprfx z0, z1
        add     z0.s, p0/m, z0.s, z31.s
ret
but LLVM avoids the extra mov by adding into z1 directly, needing only two instructions:
foo:
        add     z1.s, p0/m, z1.s, z0.s
        mov     z0.d, z1.d
ret