https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116075
Bug ID: 116075
Summary: Inefficient SVE INSR codegen
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: aarch64-sve, missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
I'm using the testcase:
#include <stdint.h>

#define N 32000
uint8_t in[N];
uint8_t in2[N];

uint32_t
foo (void)
{
  uint32_t res = 0;
  for (int i = 0; i < N; i++)
    res += in[i];
  return res;
}
compiled with -Ofast -mcpu=neoverse-v2.
Ignoring the vector loop for now, in the preamble I see the generated code:
mov z31.b, #0
movprfx z30, z31
insr z30.s, wzr
which seems inefficient: the mov already zeroes z31, the movprfx copies that zero into z30, and the insr then inserts wzr (zero) into the bottom element of an already-zero vector, so the whole sequence just zeroes out z30.
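For comparison, a minimal sketch of what the preamble could presumably be reduced to, assuming the insr of wzr into an already-zero vector is redundant (illustrative, not compiler output):

mov z30.b, #0    // zero the reduction accumulator directly; z31 and the insr are unneeded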