https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91333
--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> ---
With trunk (master?), compiling with -O3, h gives

        movapd  %xmm1, %xmm3
        addsd   %xmm3, %xmm1
        movapd  %xmm0, %xmm2
        addsd   %xmm2, %xmm0
        addsd   %xmm1, %xmm0

which looks good (the asm prevents doing addsd %xmm1, %xmm1 directly).
However, if I add -mavx, I get

        vmovapd %xmm0, %xmm2
        vmovapd %xmm1, %xmm4
        vmovapd %xmm1, %xmm0
        vaddsd  %xmm0, %xmm4, %xmm0
        vmovapd %xmm2, %xmm3
        vaddsd  %xmm2, %xmm3, %xmm2
        vaddsd  %xmm0, %xmm2, %xmm0

That's 2 extra moves compared to the non-AVX version, which seems wrong since
AVX gives more freedom to the RA. Those initial moves look quite similar to
the ones I get for f with gcc-9 -O3 -mno-avx, so the optimization looks
fragile.
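
The testcase is not quoted in this comment. For readers without it at hand, here is a minimal sketch of the kind of source that could lead to the listings above. It is an assumption, not the code attached to the bug: the helper name `opaque` is hypothetical, and the bodies given to `f` and `h` are guesses based only on the shape of the generated code. It assumes "the asm" is an empty inline asm taking its operand as read-write in an SSE register ("+x"), so the compiler must treat each use as a separate, modified value and cannot fold `a + a` into a single `addsd %xmm0, %xmm0`.

```c
/* Hypothetical reconstruction; NOT the testcase attached to the bug. */

/* Pass x through an empty asm as a read-write operand. The optimizer
   cannot see through the asm, so the result is treated as a new value. */
static inline double opaque(double x)
{
#if defined(__SSE2__)
    /* "x" is the x86 constraint for any SSE register. */
    __asm__ volatile("" : "+x"(x));
#else
    /* Fallback so the sketch still builds on other targets. */
    __asm__ volatile("" : "+m"(x));
#endif
    return x;
}

/* Each argument goes through the asm twice, so two copies of each
   value must be live in distinct registers before the additions. */
static inline double f(double a, double b)
{
    return (opaque(a) + opaque(a)) + (opaque(b) + opaque(b));
}

/* Out-of-line entry point whose assembly would be inspected. */
double h(double a, double b)
{
    return f(a, b);
}
```

Compiling something like this with `gcc -O3 -S` and again with `gcc -O3 -mavx -S` is one way to compare the move counts discussed above. The exact output depends on the compiler version.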