https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #14 from Michael_S <already5chosen at yahoo dot com> ---
I tested the smaller test bench from Comment 3 with gcc trunk on godbolt. The issue appears to be only partially fixed. The -Ofast result is no longer the horror it was before, but it is still not as good as -O3 or -O2. -Ofast code generation is still strange, and there are a few vblendpd instructions that serve no useful purpose. And -O2/-O3 output is still not as good as it should be, or as good as icc's. But, as mentioned in my original post, over-aggressive load+op combining is a separate problem.