https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117557
--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
The codegen in GCC 15 is:
ld1b z30.h, p6/z, [x0]
lsl z30.h, z30.h, #2
uunpkhi z29.s, z30.h
uunpklo z30.s, z30.h
ld1w z31.s, p3/z, [x23, z29.s, sxtw]
ld1w z29.s, p7/z, [x23, z30.s, sxtw]
st1w z29.s, p7, [x24, z12.s, sxtw]
st1w z31.s, p7, [x24, z12.s, sxtw]
but in GCC 14:
ld1w {z31.s}, p5/z, [x23, z30.s, sxtw]
ld1w {z29.s}, p4/z, [x23, z28.s, sxtw]
st1w {z29.s}, p4, [x24, z12.s, sxtw]
st1w {z31.s}, p5, [x3, z12.s, sxtw]
It looks like the incorrect mask is used in some cases: when it has to unpack
a vector, it uses the same mask for every entry rather than the unpacked mask.
Here the second st1w is predicated on p7 rather than p3.
It also stores to the wrong address: it's storing to x24 twice, rather than to
x24 and then x24 + VL.
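
As a rough illustration of the two problems above, here is a scalar model in
Python. It is only a sketch: the helper names, the hi_delta parameter and the
use of element indices instead of byte addresses are all assumptions made for
the example, not anything taken from the vectorizer or from the SVE
instructions themselves.

```python
def scatter_store(mem, base, offsets, values, mask):
    """Store values[i] to mem[base + offsets[i]] for every active lane i."""
    for off, val, active in zip(offsets, values, mask):
        if active:
            mem[base + off] = val


def store_unpacked(mem, base, offsets, lo, hi, mask, hi_delta, buggy=False):
    """Store the two unpacked halves of a wide vector.

    mask covers the wide vector; each half needs its own half of the mask.
    hi_delta is a hypothetical element offset for the high half (the role
    played by the addvl in the expected code).
    """
    half = len(mask) // 2
    lo_mask, hi_mask = mask[:half], mask[half:]
    scatter_store(mem, base, offsets, lo, lo_mask)
    if buggy:
        # Models the GCC 15 output: same mask and same base for both halves.
        scatter_store(mem, base, offsets, hi, lo_mask)
    else:
        # Models the expected output: unpacked mask and advanced base.
        scatter_store(mem, base + hi_delta, offsets, hi, hi_mask)


good, bad = {}, {}
args = ([0, 1], [10, 11], [12, 13], [True, True, True, False], 2)
store_unpacked(good, 0, *args)
store_unpacked(bad, 0, *args, buggy=True)
print(good)  # {0: 10, 1: 11, 2: 12}
print(bad)   # {0: 12, 1: 13}
```

In the buggy variant the high half overwrites the low half, and the value from
the inactive last lane (13) is stored even though its mask bit is clear.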
The GCC 15 code should be:
ld1w z31.s, p3/z, [x23, z29.s, sxtw]
ld1w z29.s, p7/z, [x23, z30.s, sxtw]
st1w z29.s, p7, [x24, z12.s, sxtw]
addvl x3, x24, #2
st1w z31.s, p3, [x3, z12.s, sxtw]
Looking at where we mess it up, it looks like the vectorizer is using the wrong
defs.
I'm running cvise to reduce a testcase.