https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494

--- Comment #19 from Peter Cordes <pcordes at gmail dot com> ---
Do you have an Intel machine you can test on?  AMD handles 4-operand
instructions like VBLENDVPD more efficiently than Intel, and looking over the
comments, I suspected that might have been relevant.  The initial report was
on Intel Haswell.

I have a Skylake but I don't have old and new GCC versions installed.  I could
run an x86-64 GNU/Linux binary if anyone wants to send one.

But since you could reproduce it getting slower with GCC 9.5 and then fast
again with GCC 15, that's probably good enough.  (Unless the speedup from 9.5
to 15 was something unrelated, and there's still something improvable...  But
without a specific thing to look for in the asm, we should probably leave that
for a potential new bug report and leave this closed.)
