https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123175
--- Comment #10 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > Could do so if you want?
>
> The all_from_input_p work if nelts is correct, so this fix seems wrong.  For
> the particular pattern I think just initializing nelts from op0 is correct.
>

Hmm, yeah I needed all_in_range_p for something else (updates to
simplify_vector_constructor) and ended up using it here too.  So yeah agreed
it's overcomplicated for this pattern.

> But as said, I wonder if it was really intended to relax VEC_PERM_EXPR this
> much.  I wonder if we even ever get those on non-VLA targets?

We do, all my optimizations are for Adv. SIMD.

> Going forward I'd like to see a vec_perm_indices CTOR from gassign *
> and tree (for match.pd if the tree one handles SSA name by looking at
> the definition would be convenient) to avoid such issues.
>
> Do you have a non-GIMPLE testcase that shows the issue you are fixing above?

Well one of the things my patch optimizes is that expansions of 64-bit permutes
are zero extended to 128-bit types today because of the old restrictions of
VEC_PERM_EXPR.

So GCC generates unneeded zero extensions in all these cases
https://godbolt.org/z/W8MnYP9cr

In GIMPLE we get

  <bb 2> [local count: 1073741824]:
  _3 = {a_2(D), { 0, 0, 0, 0, 0, 0, 0, 0 }};
  _5 = {b_4(D), { 0, 0, 0, 0, 0, 0, 0, 0 }};
  _6 = VEC_PERM_EXPR <_3, _5, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 }>;
  return _6;

which is really unneeded.  One of the patches in the patch series teaches
__builtin_shufflevector that if the target supports 64 -> 128 permutes to not
zero extend it.

Though Richard made the point before that perhaps __builtin_shufflevector
should never zero extend and veclower should legitimize it by zero extending
then.  In essence we'd have the simplest form in GIMPLE then.
