https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123175

--- Comment #10 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > Could do so if you want?
> 
> The all_from_input_p work if nelts is correct, so this fix seems wrong.  For
> the particular pattern I think just initializing nelts from op0 is correct.
> 

Hmm, yeah I needed all_in_range_p for something else (updates to
simplify_vector_constructor) and ended up using it here too.  So yeah agreed
it's overcomplicated for this pattern.

> But as said, I wonder if it was really intended to relax VEC_PERM_EXPR this
> much.  I wonder if we even ever get those on non-VLA targets?

We do, all my optimizations are for Adv. SIMD.

> Going forward I'd like to see a vec_perm_indices CTOR from gassign *
> and tree (for match.pd if the tree one handles SSA name by looking at
> the definition would be convenient) to avoid such issues.
> 
> Do you have a non-GIMPLE testcase that shows the issue you are fixing above?

Well, one of the things my patch optimizes is that 64-bit permutes are
currently zero extended to 128-bit types when expanded, because of the old
restrictions on VEC_PERM_EXPR.

So GCC generates unneeded zero extensions in all these cases:
https://godbolt.org/z/W8MnYP9cr

In GIMPLE we get

  <bb 2> [local count: 1073741824]:
  _3 = {a_2(D), { 0, 0, 0, 0, 0, 0, 0, 0 }};
  _5 = {b_4(D), { 0, 0, 0, 0, 0, 0, 0, 0 }};
  _6 = VEC_PERM_EXPR <_3, _5, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 }>;
  return _6;

which is really unneeded.

One of the patches in the series teaches __builtin_shufflevector not to zero
extend when the target supports 64 -> 128 permutes.  Though Richard made the
point before that perhaps __builtin_shufflevector should never zero extend, and
veclower should legitimize it by zero extending instead.  In essence we'd then
have the simplest form in GIMPLE.
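
To make the relation between the two forms concrete, here is a small model of
how a selector changes when both inputs get zero extended.  This is only a
sketch of the index arithmetic; remap_selector is a hypothetical helper, not
a GCC function and not how veclower is implemented.

```c
#include <stddef.h>

/* Hypothetical helper: rewrite a selector written against two n-element
   inputs so that it addresses the same elements after both inputs have
   been zero padded to m elements (m >= n).  */
static void
remap_selector (const unsigned *sel, unsigned *out, size_t count,
                unsigned n, unsigned m)
{
  for (size_t i = 0; i < count; i++)
    /* Indices below n select from the first input and stay unchanged.
       The second input now starts at m instead of n.  */
    out[i] = sel[i] < n ? sel[i] : sel[i] - n + m;
}
```

With n = 8 and m = 16, the selector { 0, 8, 1, 9, ... } becomes
{ 0, 16, 1, 17, ... }, which matches the VEC_PERM_EXPR in the dump above.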
