Richard Biener <rguent...@suse.de> writes:
> This adds, incrementally on top of moving VEC_PERM_EXPR folding
> to match.pd (though not as an incremental patch - sorry...), folding
> of single-element insert permutations to BIT_INSERT_EXPR.
>
> Things are quite awkward with the new poly-int vec-perm stuff
> so effectively this doesn't work for SVE and is still very
> ugly.  I wonder how to make it generic enough so SVE would
> be happy and / or how to make the code prettier.
>
> I also can't find a helper to read a specific vector element
> from a VECTOR_CST/CONSTRUCTOR (can I even do that "generally"
> with a poly-int index?!), but there surely must be one.

Yeah, would be nice to have a helper that handles both VECTOR_CST
and CONSTRUCTOR, even if just for constant indices.
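
Something along these lines might do as a sketch, at least for constant
indices (completely untested, and vec_elt_at is just a made-up name):

  /* Hypothetical helper: return element IDX of the VECTOR_CST or
     CONSTRUCTOR T, or NULL_TREE if we can't tell what it is.  Assumes
     the CONSTRUCTOR has one scalar element per lane.  */
  static tree
  vec_elt_at (tree t, unsigned HOST_WIDE_INT idx)
  {
    if (TREE_CODE (t) == VECTOR_CST)
      return VECTOR_CST_ELT (t, idx);
    if (TREE_CODE (t) == CONSTRUCTOR)
      {
        if (idx < CONSTRUCTOR_NELTS (t))
          return CONSTRUCTOR_ELT (t, idx)->value;
        /* Trailing elements of a CONSTRUCTOR are implicitly zero.  */
        return build_zero_cst (TREE_TYPE (TREE_TYPE (t)));
      }
    return NULL_TREE;
  }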

CONSTRUCTORs are still very much fixed-length, so it wouldn't really
make sense to fold a poly_int index at compile time.  The only case we
could handle is when the index is known to be beyond the last nonzero
element.

We could fold some poly_int VECTOR_CST_ELT indices to poly_ints, but
it'd depend on the values involved.

Indexing specific poly_int elements of a single vector is a bit dubious
in length-agnostic code though.  Not saying it'll never be useful, but
it's certainly not a native SVE operation.  So I think even for SVE,
constant VECTOR_CST/CONSTRUCTOR elements are the only interesting case
for now.

> [...]
> @@ -11774,61 +11777,7 @@ fold_ternary_loc (location_t loc, enum t
>         poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
>         bool single_arg = (op0 == op1);
>         vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
> -
> -       /* Check for cases that fold to OP0 or OP1 in their original
> -          element order.  */
> -       if (sel.series_p (0, 1, 0, 1))
> -         return op0;
> -       if (sel.series_p (0, 1, nelts, 1))
> -         return op1;
> -
> -       if (!single_arg)
> -         {
> -           if (sel.all_from_input_p (0))
> -             op1 = op0;
> -           else if (sel.all_from_input_p (1))
> -             {
> -               op0 = op1;
> -               sel.rotate_inputs (1);
> -             }
> -         }

Since this isn't an incremental patch... :-)

One of the holes of the current code is that we still allow two
permute indices for the same permutation, e.g.  { 4, 1, 2, 3 } and
{ 0, 5, 6, 7 }.  IMO we should canonicalize it so that the first index
selects from the first vector.  So maybe the above should be:

          if (!single_arg)
            {
              if (known_ge (sel[0], nelts))
                {
                  std::swap (op0, op1);
                  sel.rotate_inputs (1);
                }
              if (sel.all_from_input_p (0))
                op1 = op0;
            }
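
That way the two encodings in the example collapse to one, e.g.:

      VEC_PERM_EXPR <a, b, { 4, 1, 2, 3 }>
  ->  VEC_PERM_EXPR <b, a, { 0, 5, 6, 7 }>

and later code only has to recognise the form in which the first index
selects from the first vector.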

> [...]
> +     /* See if the permutation is performing a single element
> +        insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
> +        in that case.  */
> +     unsigned HOST_WIDE_INT cnelts;
> +        if ((TREE_CODE (cop0) == VECTOR_CST
> +          || TREE_CODE (cop0) == CONSTRUCTOR
> +          || TREE_CODE (cop1) == VECTOR_CST
> +          || TREE_CODE (cop1) == CONSTRUCTOR)
> +         && nelts.is_constant (&cnelts))
> +          {
> +         unsigned first = 0, first_oo = 0, first_i;
> +         unsigned second = 0, second_oo = 0, second_i;
> +         HOST_WIDE_INT idx;
> +         for (unsigned HOST_WIDE_INT i = 0; i < cnelts; ++i)
> +           if (!sel[i].is_constant (&idx))
> +             {
> +               first = second = 0;
> +               break;
> +             }
> +           else if ((unsigned HOST_WIDE_INT)idx < cnelts)
> +             {
> +               first_i = i;
> +               first++;
> +               first_oo += (unsigned HOST_WIDE_INT)idx != i;
> +             }
> +           else
> +             {
> +               second_i = i;
> +               second++;
> +               second_oo += (unsigned HOST_WIDE_INT)idx != i + cnelts;
> +             }

This won't handle the case in which the inserted element comes from
the same vector.

If we add the extra canonicalization above, we'd only ever be inserting
into the second vector at element 0.  The test for that would be:

   if (sel.series_p (1, 1, nelts + 1, 1))
     // insert sel[0] into index 0 of the second vector
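
Assuming we manage to pull the inserted element out of the first vector
(see below) into some tree "elt", the fold itself would presumably just
be something like (untested):

   return fold_build3_loc (loc, BIT_INSERT_EXPR, type, op1, elt,
                           bitsize_zero_node);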

I think the SVE-friendly way of doing the check for the first vector
would be:

   unsigned int encoded_nelts = sel.encoding ().encoded_nelts ();
   unsigned int i = 0;
   for (; i < encoded_nelts; ++i)
     if (maybe_ne (sel[i], i))
       break;
   if (i < encoded_nelts && sel.series_p (i + 1, 1, i + 1, 1))
     // insert sel[i] into index i of the first vector

Although that's two searches, one of them will halt very early.

The SVE-friendly way of getting the VECTOR_CST/CONSTRUCTOR element would be:

  poly_uint64 idx = sel[i];
  unsigned HOST_WIDE_INT const_idx;
  if (known_lt (idx, nelts) && idx.is_constant (&const_idx))
    // const_idx from first vector
  else if (known_ge (idx, nelts) && (idx - nelts).is_constant (&const_idx))
    // const_idx from second vector
  else
    // bail out

...which is a bit ugly, I admit.
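
Given the element ("elt" as above), the first-vector case would then be
the obvious counterpart of the insert further up, roughly (again untested;
the element size is constant even for variable-length vectors, so this
should be OK):

  unsigned HOST_WIDE_INT elt_bits
    = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)));
  return fold_build3_loc (loc, BIT_INSERT_EXPR, type, op0, elt,
                          bitsize_int (i * elt_bits));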

Thanks,
Richard
