https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187

--- Comment #25 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
A quick status update.

I've managed to reduce the testcase to the latest attachment.  The program is
heavily reduced (so some bits likely don't make much sense), but the test still
'passes' when compiled with -fno-strict-aliasing and fails with the same error
when that option is omitted.

Looking at the assembler output of void
hwy::N_EMU128::TestMulAdd::operator()<float, hwy::N_EMU128::Simd<float, 4u, 0>
>(float, hwy::N_EMU128::Simd<float, 4u, 0>) [clone .isra.0]

we see (correct on left, incorrect on right):


        add     r3, sp, #148                    add     r3, sp, #148
        vmov.f32        s14, #3.0e+0            vmov.f32        s14, #3.0e+0
[1]     mov     r6, r4                          mov     r6, r4
        vmov.f32        s15, #2.0e+0            vmov.f32        s15, #2.0e+0
        add     r8, sp, #100                    add     r8, sp, #100
        add     lr, sp, #132                    add     lr, sp, #132
        ldm     r3, {r0, r1, r2, r3}            ldm     r3, {r0, r1, r2, r3}
        vstr.32 s14, [sp, #152]                 vstr.32 s14, [sp, #152]
        vmov.f32        s14, #4.0e+0            vmov.f32        s14, #4.0e+0
[2]     stm     r4, {r0, r1, r2, r3}  |         stm     r5, {r0, r1, r2, r3}
        add     ip, sp, #116                    add     ip, sp, #116
        vstr.32 s14, [sp, #156]                 vstr.32 s14, [sp, #156]
        vmov.f32        s14, #5.0e+0            vmov.f32        s14, #5.0e+0
        stm     r5, {r0, r1, r2, r3}  <
        add     r5, sp, #36                     add     r5, sp, #36
        add     r10, sp, #196                   add     r10, sp, #196
        vstr.32 s14, [sp, #160]                 vstr.32 s14, [sp, #160]
        add     r9, sp, #152                    add     r9, sp, #152
[3]     vldr.32 s14, [r6]                       vldr.32 s14, [r6]
[4]     stm     r8, {r0, r1, r2, r3}  |         stm     r4, {r0, r1, r2, r3}
        vmul.f32        s15, s14, s15           vmul.f32        s15, s14, s15
                                      >         stm     r8, {r0, r1, r2, r3}

At [1] we see that r6 and r4 hold the same value.  We also see that at [3] a
register is loaded using r6 as the base address.  In the good code on the left,
the STM to r4 is at [2], but in the incorrect code it does not occur until [4],
i.e. immediately after the load at [3], so the load reads the location before
the store has written it.
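
To make the effect of the reordering concrete, here is a minimal sketch in
plain C++.  It is not the reduced testcase and not what the compiler actually
generates: the names Slot, store_then_load and load_then_store are
hypothetical, and the four-word block is modelled as four floats purely for
illustration.  It only shows that once two pointers refer to the same
location, swapping a store and a load changes the value that is read.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of the 16-byte stack slot that both r4 and r6 point at.
using Slot = std::array<float, 4>;

// Ordering in the good code: the store through r4 happens at [2],
// before the load through r6 at [3].
float store_then_load(Slot& slot, const Slot& incoming) {
    Slot* r4 = &slot;
    Slot* r6 = r4;          // [1] mov r6, r4
    *r4 = incoming;         // [2] stm r4, {r0, r1, r2, r3}
    return (*r6)[0];        // [3] vldr.32 s14, [r6]
}

// Ordering in the bad code: the store through r4 is delayed to [4],
// so the load at [3] still sees the old contents of the slot.
float load_then_store(Slot& slot, const Slot& incoming) {
    Slot* r4 = &slot;
    Slot* r6 = r4;          // [1] mov r6, r4
    float s14 = (*r6)[0];   // [3] vldr.32 s14, [r6]
    *r4 = incoming;         // [4] stm r4, {r0, r1, r2, r3}
    return s14;
}

// Small self-check: both orderings leave the same data in memory,
// but only the first one returns the newly stored value.
void self_check() {
    const Slot incoming{{1.0f, 2.0f, 3.0f, 4.0f}};
    Slot good{};
    Slot bad{};
    assert(store_then_load(good, incoming) == 1.0f);
    assert(load_then_store(bad, incoming) == 0.0f);
    assert(good == bad);
}
```

Both functions are well-defined C++ and a compiler must keep their ordering as
written.  The point is only that the right-hand assembly behaves like the
second function where the source required the first, which is why the alias
information for the two accesses is the thing to check.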

I need to dig a bit deeper now on this specific function to see if the alias
information is correct, or if it has somehow been lost/corrupted during the
compilation.
