https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
--- Comment #18 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #17)
> Not marking as fixed for GCC 15 yet, as there is the scalar cost computation
> issue unresolved.
There is also an issue with how the final result for SFmode is constructed. It
can be seen when compiling with -ffast-math:
vmovshdup %xmm0, %xmm4
vmovss %xmm0, -8(%rsp)
vmovss %xmm4, -4(%rsp)
vmovq -8(%rsp), %xmm0
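This will result in a store-forwarding stall: the 8-byte vmovq load spans the
two preceding 4-byte vmovss stores, so it typically cannot be forwarded from a
single store.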
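
For illustration only, here is a minimal C sketch of the data movement in the sequence above, next to a register-only way of getting the same value. It assumes an x86-64 target with SSE2. The helper names pack_via_stack and pack_in_register are hypothetical; this is neither what GCC emits for the testcase nor a proposed fix, and a compiler may optimize the intrinsics differently from the instructions named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86-64 target */

/* Hypothetical helper mirroring the quoted sequence: two 4-byte stores
   to adjacent stack slots, then one 8-byte load covering both. */
static __m128 pack_via_stack(__m128 v)
{
    float buf[2];
    /* Same effect on element 0 as vmovshdup: element 1 moves to element 0. */
    __m128 hi = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 1, 1));
    _mm_store_ss(&buf[0], v);   /* corresponds to vmovss %xmm0, -8(%rsp) */
    _mm_store_ss(&buf[1], hi);  /* corresponds to vmovss %xmm4, -4(%rsp) */
    /* Corresponds to vmovq -8(%rsp), %xmm0: load 8 bytes, zero the upper half. */
    return _mm_castsi128_ps(_mm_loadl_epi64((const __m128i *)buf));
}

/* Hypothetical register-only equivalent: keep the low 64 bits of the
   vector and zero the upper 64 bits, without a round trip through memory. */
static __m128 pack_in_register(__m128 v)
{
    return _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(v)));
}
```

Both helpers return the low two floats of the input with the upper two elements zeroed; only the first one goes through the stack.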