https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187
--- Comment #25 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---

A quick status update. I've managed to reduce the testcase to the latest attachment. The program is heavily reduced (so some bits likely don't make much sense), but the test still 'passes' when compiled with -fno-strict-aliasing and fails with the same error when that option is omitted.

Looking at the assembler output of

  void hwy::N_EMU128::TestMulAdd::operator()<float, hwy::N_EMU128::Simd<float, 4u, 0> >(float, hwy::N_EMU128::Simd<float, 4u, 0>) [clone .isra.0]

we see (correct on left, incorrect on right):

     add      r3, sp, #148               add      r3, sp, #148
     vmov.f32 s14, #3.0e+0               vmov.f32 s14, #3.0e+0
[1]  mov      r6, r4                     mov      r6, r4
     vmov.f32 s15, #2.0e+0               vmov.f32 s15, #2.0e+0
     add      r8, sp, #100               add      r8, sp, #100
     add      lr, sp, #132               add      lr, sp, #132
     ldm      r3, {r0, r1, r2, r3}       ldm      r3, {r0, r1, r2, r3}
     vstr.32  s14, [sp, #152]            vstr.32  s14, [sp, #152]
     vmov.f32 s14, #4.0e+0               vmov.f32 s14, #4.0e+0
[2]  stm      r4, {r0, r1, r2, r3}    |  stm      r5, {r0, r1, r2, r3}
     add      ip, sp, #116               add      ip, sp, #116
     vstr.32  s14, [sp, #156]            vstr.32  s14, [sp, #156]
     vmov.f32 s14, #5.0e+0               vmov.f32 s14, #5.0e+0
     stm      r5, {r0, r1, r2, r3}    <
     add      r5, sp, #36                add      r5, sp, #36
     add      r10, sp, #196              add      r10, sp, #196
     vstr.32  s14, [sp, #160]            vstr.32  s14, [sp, #160]
     add      r9, sp, #152               add      r9, sp, #152
[3]  vldr.32  s14, [r6]                  vldr.32  s14, [r6]
[4]  stm      r8, {r0, r1, r2, r3}    |  stm      r4, {r0, r1, r2, r3}
     vmul.f32 s15, s14, s15              vmul.f32 s15, s14, s15
                                      >  stm      r8, {r0, r1, r2, r3}

At [1] we see that r6 and r4 hold the same value. At [3], s14 is loaded from memory using r6 as the base address. In the good code on the left, the STM to r4 is at [2], before that load; in the incorrect code it does not occur until [4], i.e. immediately after the load at [3]. Since r6 == r4, the load at [3] now reads that location before the STM has written to it.

I need to dig a bit deeper into this specific function to see whether the alias information is correct, or whether it has somehow been lost or corrupted during compilation.
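To make the dependence between the STM and the VLDR concrete, here is a minimal C++ sketch of the pattern the listing implies: a 16-byte vector store through one pointer, followed by a float load through a copy of that pointer. It is not the reduced testcase from the attachment; the names `Vec4`, `store_then_load` and `check_store_then_load` are hypothetical, and the mapping onto r4/r6 is only an analogy. Because every access here is to float data, -fstrict-aliasing does not permit the store and the load to be reordered, so a correct compile must return the value that was just stored.

```cpp
#include <cassert>

// Hypothetical 4-lane float vector, standing in for the 16 bytes moved by
// `ldm r3, {r0-r3}` / `stm rN, {r0-r3}` in the listing above.
struct Vec4 {
    float lanes[4];
};

// Store a whole vector into *dst, then read lane 0 back through a second
// pointer to the same storage. `alias` plays the role of r6 (a copy of r4),
// the struct assignment plays the role of the STM, and the final read plays
// the role of the VLDR. All accesses are to float objects, so the aliasing
// rules give the compiler no licence to move the store after the load.
float store_then_load(const Vec4& src, Vec4* dst) {
    const float* alias = dst->lanes;  // same address, different "register"
    *dst = src;                       // 16-byte (four-float) store
    return *alias;                    // must observe src.lanes[0]
}

// Self-check: the value read back is the one just stored, not whatever was
// in *dst beforehand (which is what a load scheduled before the store sees).
void check_store_then_load() {
    Vec4 src{{1.0f, 2.0f, 3.0f, 4.0f}};
    Vec4 dst{{0.0f, 0.0f, 0.0f, 0.0f}};
    assert(store_then_load(src, &dst) == 1.0f);
}
```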