https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231
--- Comment #19 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
This is another problem with (I suspect) incorrect aliasing information.

If I compile with -fno-strict-aliasing, I get

  88:  f4432a1f  vst1.8   {d18-d19}, [r3 :64]      // {>E} SP+96/16
  8c:  f4420a1f  vst1.8   {d16-d17}, [r2 :64]      // {>A} SP+32/16
  90:  e893000f  ldm      r3, {r0, r1, r2, r3}     // {<E} SP+96/16
  94:  e884000f  stm      r4, {r0, r1, r2, r3}     // {>G} SP+128/16
  98:  eddd0b20  vldr     d16, [sp, #128] ; 0x80   // {<G.l} SP+128/8
  9c:  eddd1b22  vldr     d17, [sp, #136] ; 0x88   // {<G.h} SP+136/8
  a0:  e88c000f  stm      ip, {r0, r1, r2, r3}     // {>B} SP+48/16
  a4:  e28dc040  add      ip, sp, #64 ; 0x40
  a8:  e885000f  stm      r5, {r0, r1, r2, r3}     // {>F} SP+112/16
  ac:  f2d80570  vshl.s16 q8, q8, #8
  b0:  f3f503e0  vneg.s16 q8, q8
  b4:  edcd0b20  vstr     d16, [sp, #128] ; 0x80   // {>G.l} SP+128/8
  b8:  edcd1b22  vstr     d17, [sp, #136] ; 0x88   // {>G.h} SP+136/8
  bc:  e894000f  ldm      r4, {r0, r1, r2, r3}     // {<G} SP+128/16
  c0:  e88c000f  stm      ip, {r0, r1, r2, r3}     // {>C} SP+64/16
  c4:  e28dc050  add      ip, sp, #80 ; 0x50
  c8:  e88c000f  stm      ip, {r0, r1, r2, r3}     // {>D} SP+80/16
  cc:  e885000f  stm      r5, {r0, r1, r2, r3}     // {>F} SP+112/16

I've annotated each memory access with its stack address and labeled each
16-byte slot from A to G.

With -fstrict-aliasing this becomes:

  88:  f4420a1f  vst1.8   {d16-d17}, [r2 :64]      // {>A} SP+32/16
  8c:  eddd0b20  vldr     d16, [sp, #128] ; 0x80   // {<G.l} SP+128/8 !
  90:  eddd1b22  vldr     d17, [sp, #136] ; 0x88   // {<G.h} SP+136/8 !
  94:  f4432a1f  vst1.8   {d18-d19}, [r3 :64]      // {>E} SP+96/16
  98:  e893000f  ldm      r3, {r0, r1, r2, r3}     // {<E} SP+96/16
  9c:  e88c000f  stm      ip, {r0, r1, r2, r3}     // {>B} SP+48/16
  a0:  e28dc040  add      ip, sp, #64 ; 0x40
  a4:  f2d80570  vshl.s16 q8, q8, #8
  a8:  e884000f  stm      r4, {r0, r1, r2, r3}     // {>G} SP+128/16 !
  ac:  e885000f  stm      r5, {r0, r1, r2, r3}     // {>F} SP+112/16
  b0:  f3f503e0  vneg.s16 q8, q8
  b4:  edcd0b20  vstr     d16, [sp, #128] ; 0x80   // {>G.l} SP+128/8
  b8:  edcd1b22  vstr     d17, [sp, #136] ; 0x88   // {>G.h} SP+136/8
  bc:  e894000f  ldm      r4, {r0, r1, r2, r3}     // {<G} SP+128/16
  c0:  e88c000f  stm      ip, {r0, r1, r2, r3}     // {>C} SP+64/16
  c4:  e28dc050  add      ip, sp, #80 ; 0x50
  c8:  e88c000f  stm      ip, {r0, r1, r2, r3}     // {>D} SP+80/16
  cc:  e885000f  stm      r5, {r0, r1, r2, r3}     // {>F} SP+112/16

And we see that the initial store to G has been moved after the reads from it.

I'm still digging, but it may be pertinent that the reads have been split into
two separate instructions; perhaps when the split was done the alias sets
weren't copied correctly.
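
For anyone wanting to check the reordering mechanically, here is a rough sketch (not part of the original analysis) that replays the annotated accesses up to the first store to G in each schedule. The tuples are transcribed from the SP+offset/size annotations in the listings; the names NO_STRICT, STRICT and loads_without_prior_store are made up for illustration. One assumption to keep in mind: the model only knows about stores that appear in the excerpt, so "no prior store" means "no prior store within this listing", not that the slot was never written earlier in the function.

```python
# Each access: (code offset, kind, SP offset, size in bytes),
# copied from the annotations in the two listings.
NO_STRICT = [
    (0x88, "store", 96, 16),   # >E
    (0x8c, "store", 32, 16),   # >A
    (0x90, "load", 96, 16),    # <E
    (0x94, "store", 128, 16),  # >G
    (0x98, "load", 128, 8),    # <G.l
    (0x9c, "load", 136, 8),    # <G.h
]

STRICT = [
    (0x88, "store", 32, 16),   # >A
    (0x8c, "load", 128, 8),    # <G.l
    (0x90, "load", 136, 8),    # <G.h
    (0x94, "store", 96, 16),   # >E
    (0x98, "load", 96, 16),    # <E
    (0x9c, "store", 48, 16),   # >B
    (0xa8, "store", 128, 16),  # >G
]


def loads_without_prior_store(schedule):
    """Return code offsets of loads that read bytes which no earlier
    store in the same schedule has written (hypothetical helper)."""
    written = set()   # SP-relative byte offsets stored so far
    flagged = []
    for pc, kind, sp_off, size in schedule:
        span = set(range(sp_off, sp_off + size))
        if kind == "store":
            written |= span
        elif not span <= written:
            # Some byte of this load has not been stored yet.
            flagged.append(pc)
    return flagged


if __name__ == "__main__":
    print([hex(pc) for pc in loads_without_prior_store(NO_STRICT)])
    print([hex(pc) for pc in loads_without_prior_store(STRICT)])
```

On the -fno-strict-aliasing schedule nothing is flagged; on the -fstrict-aliasing schedule the two vldr instructions at 8c and 90 are flagged, which matches the lines marked with "!".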