https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494
--- Comment #19 from Peter Cordes <pcordes at gmail dot com> ---
Do you have an Intel machine you can test on? AMD handles 4-operand instructions like VBLENDVPD more efficiently than Intel, and looking over the comments, I suspected that might have been relevant. The initial report was on Intel Haswell.

I have a Skylake, but I don't have old and new GCC versions installed. I could run an x86-64 GNU/Linux binary if anyone wants to send one.

But since you could reproduce it getting slower with GCC 9.5 and then fast again with GCC 15, that's probably good. (Unless the speedup from 9.5 to 15 was something unrelated, and there's still something improvable... But without a specific thing to look for in the asm, we should probably leave that for a potential new bug report and leave this closed.)
