https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987
--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> I tried to move "vmovdqa %xmm1,0xd0(%rsp)" before "vmovdqa %xmm0,0xe0(%rsp)"
> and rebuilt the binary, and that recovers half of the regression.
57.93 │200: vaddps 0xc0(%rsp),%ymm3,%ymm5
11.11 │ vaddps 0xe0(%rsp),%ymm2,%ymm6
...
3.22 │ vmovdqa %xmm1,0xc0(%rsp)
│ vmovdqa %xmm5,0xd0(%rsp)
3.52 │ vmovdqa %xmm0,0xe0(%rsp)
│ vmovdqa %xmm6,0xf0(%rsp)
I guess the SKX microarchitecture has specific patterns for store-to-load
forwarding (STLF); the main difference here is the order of those xmm stores.
From the compiler side, the worthwhile thing to do is PR107916.
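As a rough aid for reading the profile, the sketch below checks which of the listed 16-byte stores each 32-byte vaddps memory operand depends on. It assumes the stores at the end of one iteration feed the loads at label 200 in the next, and it uses a deliberately simplified forwarding rule (a load is forwardable only when a single earlier store contains all of its bytes), not SKX's exact rules. The names struct access, can_forward and overlapping_stores are hypothetical, introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* One stack access from the profile: offset from %rsp and width in bytes. */
struct access {
    unsigned off;
    unsigned size;
};

/* The four 16-byte vmovdqa stores shown in the profile. */
static const struct access profile_stores[] = {
    {0xc0, 16}, {0xd0, 16}, {0xe0, 16}, {0xf0, 16},
};

/* The two 32-byte vaddps memory operands at label 200. */
static const struct access profile_loads[] = {
    {0xc0, 32}, {0xe0, 32},
};

/* True if store `st` writes every byte that load `ld` reads. */
static bool store_covers_load(struct access st, struct access ld)
{
    return st.off <= ld.off && ld.off + ld.size <= st.off + st.size;
}

/* True if two accesses touch at least one common byte. */
static bool overlaps(struct access a, struct access b)
{
    return a.off < b.off + b.size && b.off < a.off + a.size;
}

/* Simplified model (assumption): the load can be forwarded only if one
   store covers it entirely. Alignment rules, store order and other
   SKX-specific behaviour are deliberately not modelled. */
static bool can_forward(const struct access *stores, size_t n,
                        struct access ld)
{
    for (size_t i = 0; i < n; i++)
        if (store_covers_load(stores[i], ld))
            return true;
    return false;
}

/* Number of stores the load needs bytes from; 2 or more means the
   load spans several stores. */
static size_t overlapping_stores(const struct access *stores, size_t n,
                                 struct access ld)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (overlaps(stores[i], ld))
            count++;
    return count;
}
```

With the offsets from the profile, each 32-byte load overlaps exactly two 16-byte stores and no single store covers either one, so can_forward returns false for both, while a 16-byte load at 0xd0 would be forwardable. The model says nothing about why reordering the stores helps; that is the SKX-specific part noted above.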