https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105219
Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
   Last reconfirmed|                            |2022-4-11

--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Smaller reproducer that gets rid of the loop nest and simplifies the inner
condition.

int a;
char b[60];
short d[19];
long long f;

__attribute__ ((noinline, noipa))
void e(int g, int h, short k[19]) {
  for (signed j = 1; j < h + 14; j++) {
    int i = 0;
    b[j] = 1;
    i = 2;
    b[i * 14 + j] = 1;
    a = g ? k[j] : 0;
  }
}

__attribute__ ((optimize("O0")))
int main() {
  e(9, 1, d);
  for (long l = 0; l < 6; ++l)
    for (long m = 0; m < 4; ++m)
      f ^= b[l + m * 4];
  if (f)
    __builtin_abort ();
}

The -mtune=thunderx option doesn't seem to be needed. I can reproduce with
just -O3 -march=armv8.2-a+sve -msve-vector-bits=128.

The write to `a` is significant not for the result but because it blocks the
loop from being optimized into a memset.
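For reference, here is a minimal scalar sketch of what a correctly compiled
reproducer should compute. It is not part of the testcase: ref_b, ref_e and
ref_checksum are hypothetical names chosen for illustration, and the write to
`a` is left out because it does not affect the checked result. With h = 1 the
loop runs for j = 1..14, so the stores land in b[1..14] and b[29..42]. main()
only reads b[0..17], and 20 of its 24 reads hit an element set to 1, so the
XOR folds to 0 and __builtin_abort () should not be reached.

```c
#include <string.h>

/* Hypothetical scalar model of the reproducer's stores to b. */
static char ref_b[60];

/* Mirror the stores that e(g, h, k) makes to b, without the write to a. */
static void ref_e(int h)
{
    memset(ref_b, 0, sizeof ref_b);
    for (int j = 1; j < h + 14; j++) {
        ref_b[j] = 1;            /* b[j] = 1 */
        ref_b[2 * 14 + j] = 1;   /* b[i * 14 + j] = 1 with i == 2 */
    }
}

/* Recompute the XOR checksum that main() folds into f. */
static long long ref_checksum(void)
{
    long long f = 0;
    for (long l = 0; l < 6; ++l)
        for (long m = 0; m < 4; ++m)
            f ^= ref_b[l + m * 4];
    return f;
}
```

Calling ref_e(1) and then ref_checksum() gives the value that f is expected to
hold after the loops in main().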