https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2019-05-23 00:00:00         |2025-2-12
                 CC|                            |konstantinos.eleftheriou@vr
                   |                            |ull.eu, law at gcc dot gnu.org
--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
Assembly with -O3 -march=skylake is still
loop:
.LFB0:
        .cfi_startproc
        movslq  %edi, %rdi
        vbroadcastsd    %xmm0, %ymm1
        vmovddup        %xmm0, %xmm0
        vmulpd  a(,%rdi,8), %ymm1, %ymm1
        vxorpd  %xmm4, %xmm4, %xmm4
        vmovupd %ymm1, r(%rip)          <--- Offsetted full store
        vmulpd  a+32(,%rdi,8), %xmm0, %xmm0
        vmovupd %xmm0, r+32(%rip)       <--- Store upper half
        vmovupd r+16(%rip), %ymm2       <--- STLF fail
        vextractf128    $0x1, %ymm2, %xmm3
        vunpckhpd       %xmm3, %xmm3, %xmm0
        vaddsd  %xmm4, %xmm0, %xmm0
        vunpckhpd       %xmm2, %xmm2, %xmm4
        vaddsd  %xmm3, %xmm0, %xmm0
        vunpckhpd       %xmm1, %xmm1, %xmm3
        vaddsd  %xmm4, %xmm0, %xmm0
        vaddsd  %xmm2, %xmm0, %xmm0
        vaddsd  %xmm3, %xmm0, %xmm0
        vaddsd  %xmm0, %xmm1, %xmm0
        vzeroupper
        ret
when you enable -favoid-store-forwarding this is split as
        vmulpd  a(,%rdi,8), %ymm1, %ymm1
        vmovupd %ymm1, r(%rip)
        vmulpd  a+32(,%rdi,8), %xmm0, %xmm0
        vmovupd r+16(%rip), %ymm5
        vmovapd %ymm5, -32(%rsp)
        vmovapd %xmm0, -16(%rsp)
        vmovapd -32(%rsp), %ymm6
        vmovupd %xmm0, r+32(%rip)
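but that is even worse now: the offset load from r+16 is still there, and the
two stack stores (32 bytes to -32(%rsp), then 16 bytes to -16(%rsp)) don't
forward to the 32-byte reload from -32(%rsp). So we replaced one STLF fail
with two. Ugh.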
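
To make the overlap argument concrete, here is a rough sketch in C of a deliberately simplified forwarding rule: a load can be forwarded only if the youngest pending store that overlaps it also contains it entirely. That rule is an assumption for illustration, not a description of the exact Skylake behaviour, and the names (struct access, stlf_ok, the case_* helpers) are hypothetical. The byte offsets are taken from the two assembly listings above.

```c
#include <stdbool.h>
#include <stddef.h>

/* A byte range [off, off + size) relative to some base (r or %rsp). */
struct access {
    long off;
    long size;
};

/* True if the two byte ranges share at least one byte. */
static bool overlaps(struct access a, struct access b)
{
    return a.off < b.off + b.size && b.off < a.off + a.size;
}

/* True if `inner` lies entirely within `outer`. */
static bool contains(struct access outer, struct access inner)
{
    return outer.off <= inner.off &&
           inner.off + inner.size <= outer.off + outer.size;
}

/*
 * Simplified model (an assumption, not any specific core's rules):
 * walk the pending stores from youngest to oldest; the first store that
 * overlaps the load must contain it completely, otherwise forwarding fails.
 * `stores` is in program order, oldest first.
 */
static bool stlf_ok(const struct access *stores, size_t n, struct access load)
{
    for (size_t i = n; i-- > 0;) {
        if (overlaps(stores[i], load))
            return contains(stores[i], load);
    }
    return true; /* no pending store involved */
}

/* -O3 code: 32-byte store to r, 16-byte store to r+32, 32-byte load from r+16. */
static bool case_original(void)
{
    const struct access stores[] = { { 0, 32 }, { 32, 16 } };
    const struct access load = { 16, 32 };
    return stlf_ok(stores, 2, load);
}

/* Split code: only the 32-byte store to r precedes the 32-byte load from r+16. */
static bool case_split_offset_load(void)
{
    const struct access stores[] = { { 0, 32 } };
    const struct access load = { 16, 32 };
    return stlf_ok(stores, 1, load);
}

/* Split code: 32-byte store to -32(%rsp), 16-byte store to -16(%rsp),
   then a 32-byte reload from -32(%rsp). */
static bool case_split_stack_reload(void)
{
    const struct access stores[] = { { -32, 32 }, { -16, 16 } };
    const struct access load = { -32, 32 };
    return stlf_ok(stores, 2, load);
}
```

Under this model all three helpers return false: the original sequence has one failing load, while the split sequence has two (the offset load from r+16 and the stack reload), which matches the count above.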