https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
That works only for a single operation and doesn't really scale. I think we want to expose the permutes at the GIMPLE level via ix86_gimple_fold_builtin. We already handle IX86_BUILTIN_SHUFPD there but not IX86_BUILTIN_SHUFPS for some reason.
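
To illustrate what exposing the SHUFPS permute would involve, here is a minimal standalone sketch. It is not taken from GCC's sources. It assumes the documented SHUFPS semantics (the low two result lanes are picked from the first operand, the high two from the second, each by a 2-bit field of imm8) and decodes imm8 into a constant selector that indexes the concatenation of both operands, which is the form a VEC_PERM_EXPR selector takes. The helper names shufps_imm_to_selector and apply_selector are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: decode a SHUFPS imm8 into a 4-lane selector.
   Indices 0..3 refer to lanes of the first operand (a), indices 4..7
   to lanes of the second operand (b).  */
static void
shufps_imm_to_selector (unsigned imm8, unsigned sel[4])
{
  assert (imm8 <= 0xff);
  sel[0] = imm8 & 3;              /* lane 0: from a, bits 1:0 */
  sel[1] = (imm8 >> 2) & 3;       /* lane 1: from a, bits 3:2 */
  sel[2] = 4 + ((imm8 >> 4) & 3); /* lane 2: from b, bits 5:4 */
  sel[3] = 4 + ((imm8 >> 6) & 3); /* lane 3: from b, bits 7:6 */
}

/* Hypothetical helper: apply such a selector to two 4-float vectors,
   mimicking a two-input constant permute.  */
static void
apply_selector (const float a[4], const float b[4],
                const unsigned sel[4], float out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
}
```

Under these assumptions, imm8 = 0x1B decodes to the selector { 3, 2, 5, 4 }, and imm8 = 0xE4 decodes to { 0, 1, 6, 7 }. Once the shuffle is in this form, generic permute handling can reason about several operations together rather than one builtin at a time.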