https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918
--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Optimal shuffle is next to impossible on architectures like x86 where you have dozens of different permutation instructions and often you need not just one, but 2, 3, 4 or 5 of them depending on exact ISA and permutation. GCC has over 20k lines of source for choosing reasonable constant permutations just on this architecture.

This PR is not about __builtin_shuffle emitting bad code, but about the vector lshift + rshift ored not even trying to emit it as permutation and comparing that to what one gets from those 3 operations if there is no native rotate. Though, sure, one could also derive from it that perhaps some constant permutations would be in some cases best emitted as 2 shifts + or, guess we don't try that among 3 insn cases yet.

The current ones are
i386-expand.cc:expand_vec_perm_movs (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_insertps (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_blend (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vpermil (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_1 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_shufps_shufps (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_punpckldq_pshuf (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool single_insn_only_p)
i386-expand.cc:expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_interleave2 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vpermq_perm_1 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vperm2f128 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_interleave3 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vperm2f128_vblend (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_2perm_interleave (struct expand_vec_perm_d *d, bool two_insn)
i386-expand.cc:expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d *d, bool two_insn)
i386-expand.cc:expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_2vperm2f128_vshuf (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vpshufb2_vpermq_even_odd (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_pand_pandn_por (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_pslldq_psrldq_por (struct expand_vec_perm_d *d, bool pandn)
i386-expand.cc:expand_vec_perm_even_odd_pack (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_even_odd_trunc (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_even_odd_1 (struct expand_vec_perm_d *d, unsigned odd)
i386-expand.cc:expand_vec_perm_even_odd (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_broadcast (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vpermt2_vpshub2 (struct expand_vec_perm_d *d)
i386-expand.cc:expand_vec_perm_vpshufb4_vpermq2 (struct expand_vec_perm_d *d)
where several of them use the RTL machine description and match various insns in the md (e.g. expand_vec_perm_1 does try whatever matches a single insn, etc.).
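
To make the lshift + rshift + or versus permutation point concrete, here is a minimal sketch in GNU C. It is not the PR's testcase: the type and function names (v8hi, v16qi, rotate8_shifts, rotate8_permute, rotate_matches_permute) and the rotate count of 8 on 16-bit lanes are assumptions picked for illustration. Rotating a 16-bit lane by 8 just swaps its two bytes, so the same result can be written as a constant byte permutation. The __builtin_shuffle path is used only when compiling with GCC; other compilers fall back to a plain loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GNU C vector types (illustrative names): eight 16-bit lanes and
   sixteen 8-bit lanes, both 16 bytes wide.  */
typedef uint16_t v8hi __attribute__ ((vector_size (16)));
typedef uint8_t v16qi __attribute__ ((vector_size (16)));

/* Rotate each 16-bit lane by 8 bits, written as lshift + rshift + or.  */
v8hi
rotate8_shifts (v8hi x)
{
  return (x << 8) | (x >> 8);
}

/* The same operation as a constant byte permutation: swap the two
   bytes of every 16-bit lane.  Swapping is symmetric, so this does
   not depend on endianness.  */
v8hi
rotate8_permute (v8hi x)
{
  v16qi bytes, out;
  v8hi r;

  assert (sizeof (v8hi) == sizeof (v16qi));
  memcpy (&bytes, &x, sizeof bytes);
#if defined (__GNUC__) && !defined (__clang__)
  {
    const v16qi mask = { 1, 0, 3, 2, 5, 4, 7, 6,
                         9, 8, 11, 10, 13, 12, 15, 14 };
    out = __builtin_shuffle (bytes, mask);
  }
#else
  /* Fallback for compilers without __builtin_shuffle.  */
  for (int i = 0; i < 16; i++)
    out[i] = bytes[i ^ 1];
#endif
  memcpy (&r, &out, sizeof r);
  return r;
}

/* Returns 1 when both forms agree on a sample input.  */
int
rotate_matches_permute (void)
{
  v8hi x = { 0x0102, 0x1234, 0xabcd, 0x00ff,
             0xff00, 0x8001, 0x0000, 0xffff };
  v8hi a = rotate8_shifts (x);
  v8hi b = rotate8_permute (x);
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Comparing the assembly of rotate8_shifts and rotate8_permute (e.g. with -O2 -S and different -m ISA options) is one way to see the difference discussed here; what each form actually compiles to depends on the GCC version and target flags.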
