https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
typedef int v4si __attribute__((vector_size(16)));
v4si bar (v4si a, v4si b)
{
  return __builtin_shuffle (a, b, (v4si) { 0, 1, 4, 5 });
}

works just fine using shufps.  It might be that the generic optabs code
should try using larger element modes if the permutation allows that.  It
already tries QImode as fallback.

OTOH, as

v8hi baz (v8hi a, v8hi b)
{
  return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 12, 13, 14, 15 });
}

works and uses shufpd, the x86 backend seems to be somewhat prepared for the
above.
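
The "try larger element modes if the permutation allows that" idea can be
sketched in plain C.  This is only an illustration under assumptions, not
GCC's optabs code: widen_selector is a hypothetical helper, and the selector
is modelled as a plain array of unsigned indices in the two-operand
__builtin_shuffle numbering (0..N-1 pick from a, N..2N-1 pick from b).  A
selector can be expressed on elements of twice the width when every adjacent
pair of indices picks one whole, aligned wide element.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: try to rewrite a selector over n
   narrow elements as a selector over n/2 elements of twice the width.
   On success the n/2 wide indices are stored in wide[] and true is
   returned; otherwise false is returned and wide[] may be partly written.  */
bool widen_selector(const unsigned *sel, size_t n, unsigned *wide)
{
    if (n == 0 || n % 2 != 0)
        return false;

    for (size_t i = 0; i < n; i += 2) {
        /* The pair must start on an even index (aligned to a wide element)
           and the second index must be the very next narrow element.  */
        if (sel[i] % 2 != 0 || sel[i + 1] != sel[i] + 1)
            return false;
        wide[i / 2] = sel[i] / 2;
    }
    return true;
}
```

With this sketch, { 0, 1, 4, 5 } on four 32-bit elements widens to { 0, 2 } on
two 64-bit elements.  The v8hi selector { 0, 1, 2, 3, 12, 13, 14, 15 } widens
once to { 0, 1, 6, 7 } and a second time to { 0, 3 }, i.e. the low half of a
and the high half of b, which is the kind of two-element 64-bit shuffle that
shufpd performs.  A selector such as { 1, 2, 4, 5 } is rejected because its
first pair straddles two wide elements.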