https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987
--- Comment #7 from Haochen Jiang ---
Furthermore, when I build with GCC11, the codegen is much better:
vaddps 0xc0(%rsp),%ymm5,%ymm2
vaddps 0xe0(%rsp),%ymm4,%ymm1
vmovaps %ymm2,0x80(%rsp)
--- Comment #6 from Hongtao Liu ---
> I tried to move "vmovdqa %xmm1,0xd0(%rsp)" before "vmovdqa %xmm0,0xe0(%rsp)"
> and rebuilt the binary and it will save half the regression.
57.93 │200: vaddps 0xc0(%rsp),%ymm3,%ymm5
--- Comment #5 from Haochen Jiang ---
What I have found is that binaries built with GCC 13 and GCC 14 regress on
Cascade Lake and Skylake.
But when I copied the same binary to Ice Lake, it did not regress. It seems Ice Lake
might have fixed this with microarchitecture tuning.
Richard Biener changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Summary|[14/15