Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Thu, 10 Aug 2023 at 21:27, Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> >> static bool
>> >> is_simple_vla_size (poly_uint64 size)
>> >> {
>> >>   if (size.is_constant ())
>> >>     return false;
>> >>   for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
>> >>     if (size[i] != (i <= 1 ? size[0] : 0))
>> > Just wondering if this should be (i == 1 ? size[0] : 0) since i is
>> > initialized to 1?
>>
>> Both work.  I prefer <= 1 because it doesn't depend on the micro
>> optimisation to start at coefficient 1.  In a theoretical 3-indeterminate
>> poly_int, we want the first 2 coefficients to be nonzero and the rest to
>> be zero.
>>
>> > IIUC, is_simple_vla_size should return true for first-degree polynomials
>> > whose coefficients are equal, like 4 + 4x?
>>
>> FWIW, poly_int only supports first-degree polynomials at the moment.
>> coeffs>2 means there is more than one indeterminate, rather than a
>> higher power.
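To make the coefficient layout concrete, here is a standalone sketch that models a poly_uint64-style value as an array of coefficients, c[0] + c[1]*x1 + c[2]*x2 + ..., where each extra coefficient is another runtime indeterminate. The poly_model type is hypothetical and only mimics the shape of GCC's poly_int; the check follows is_simple_vla_size as quoted above (the later patch additionally requires C to be a power of 2, which this sketch omits).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for poly_uint64: value = c[0] + c[1]*x1 + c[2]*x2 ...
// Each coefficient beyond c[0] multiplies a separate indeterminate,
// not a higher power of the same one.
template <std::size_t N>
struct poly_model
{
  std::array<std::uint64_t, N> coeffs;

  bool is_constant () const
  {
    for (std::size_t i = 1; i < N; ++i)
      if (coeffs[i] != 0)
        return false;
    return true;
  }
};

// Accept C + Cx: the first two coefficients equal, any others zero.
// Writing "i <= 1" instead of "i == 1" keeps the test correct even if
// the loop started at coefficient 0 (c[0] is then compared with itself).
template <std::size_t N>
bool
is_simple_vla_size (const poly_model<N> &size)
{
  if (size.is_constant ())
    return false;
  for (std::size_t i = 1; i < N; ++i)
    if (size.coeffs[i] != (i <= 1 ? size.coeffs[0] : 0))
      return false;
  return true;
}
```

With three coefficients, {2, 2, 0} is accepted while {2, 2, 1} is rejected, matching the "first two coefficients nonzero, the rest zero" rule described above.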
> Oh OK, thanks for the clarification.
>>
>> >>       return false;
>> >>   return true;
>> >> }
>> >>
>> >>
>> >>   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
>> >>     {
>> >>       auto nunits = GET_MODE_NUNITS (mode);
>> >>       if (!is_simple_vla_size (nunits))
>> >>         continue;
>> >>       if (nunits[0] ...)
>> >>         test_... (mode);
>> >>       ...
>> >>
>> >>     }
>> >>
>> >> test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the
>> >> loop structure above, I think we can apply the test_vnx4si and
>> >> test_vnx16qi to more cases.  So the classification isn't the
>> >> exact number of elements, but instead a limit.
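The "limit rather than exact count" classification can be sketched as a small dispatch function over C, the first coefficient of a C + Cx mode. tests_for_min_nunits is a hypothetical name; its thresholds mirror the test_nunits_min_2/4/8 and test_nunits_max_4 conditions in the test () function of the updated patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: for a mode with C + Cx elements, list which groups
// of fold_vec_perm_cst tests apply.  Each group is gated on a lower or
// upper bound on C rather than on one exact element count.
std::vector<std::string>
tests_for_min_nunits (std::uint64_t c)
{
  std::vector<std::string> groups { "all_nunits" };
  if (c >= 2)
    groups.push_back ("min_2");
  if (c >= 4)
    groups.push_back ("min_4");
  if (c >= 8)
    groups.push_back ("min_8");
  if (c <= 4)
    groups.push_back ("max_4");
  return groups;
}
```

A 4 + 4x mode thus runs every group except min_8, while a 16 + 16x mode runs all the lower-bound groups but not max_4.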
>> >>
>> >> I think the nunits[0] conditions for test_vnx4si are as follows
>> >> (inspection only, so could be wrong):
>> >>
>> >> > +/* Test cases where result and input vectors are VNx4SI  */
>> >> > +
>> >> > +static void
>> >> > +test_vnx4si (machine_mode vmode)
>> >> > +{
>> >> > +  /* Case 1: mask = {0, ...} */
>> >> > +  {
>> >> > +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> >> > +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> >> > +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> >> > +
>> >> > +    vec_perm_builder builder (len, 1, 1);
>> >> > +    builder.quick_push (0);
>> >> > +    vec_perm_indices sel (builder, 2, len);
>> >> > +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> >> > +
>> >> > +    tree expected_res[] = { vector_cst_elt (res, 0) };
>> > This should be { vector_cst_elt (arg0, 0) }; will fix in next patch.
>> >> > +    validate_res (1, 1, res, expected_res);
>> >> > +  }
>> >>
>> >> nunits[0] >= 2 (could be all nunits if the inputs had nelts_per_pattern==1,
>> >> which I think would be better)
>> > IIUC, the vectors that can be used for a particular test should have
>> > nunits[0] >= res_npatterns,
>> > where res_npatterns is as computed in fold_vec_perm_cst without the
>> > canonicalization?
>> > For the above test, res_npatterns = max(2, max (2, 1)) == 2, so we
>> > require nunits[0] >= 2?
>> > This implies we can use the above test for vectors with length 2 + 2x,
>> > 4 + 4x, etc.
>>
>> Right, that's what I meant.  With the inputs as they stand it has to be
>> nunits[0] >= 2.  We need that to form the inputs correctly.  But if the
>> inputs instead had nelts_per_pattern == 1, the test would work for all
>> nunits.
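A sketch of the applicability rule agreed here, assuming (from this exchange, not from fold_vec_perm_cst's source) that the pre-canonicalization res_npatterns is the maximum of the two input npatterns and the selector npatterns. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model: res_npatterns before canonicalization is the
// largest npatterns among arg0, arg1 and the selector.
unsigned
res_npatterns (unsigned sel_np, unsigned arg0_np, unsigned arg1_np)
{
  return std::max (arg0_np, std::max (arg1_np, sel_np));
}

// A test can only be run on a mode with C + Cx elements if C is at
// least res_npatterns; otherwise the inputs cannot be formed.
bool
test_applies_to (std::uint64_t c, unsigned sel_np, unsigned arg0_np,
                 unsigned arg1_np)
{
  return c >= res_npatterns (sel_np, arg0_np, arg1_np);
}
```

For the test above (inputs with npatterns 2, selector with npatterns 1), res_npatterns is 2, so the test applies to 2 + 2x, 4 + 4x, etc., but not to 1 + 1x.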
> In the attached patch, I have reordered the tests based on a min or max limit.
> For tests where sel_nelts_per_pattern < 3 (i.e. a dup sequence), I have kept
> input npatterns = 1, so that we can test more vector modes.  Input npatterns
> matters only for a stepped sequence in sel, since for a dup pattern we don't
> enforce the constraint of selecting elements from the same input pattern.
> Does it look OK?
>
> For the following tests with input vectors having shape (1, 3)
> sel = {0, 1, 2, ...}  // (1, 3)
> res = { arg0[0], arg0[1], arg0[2], ... } // (1, 3)
>
> and sel = {len, len + 1, len + 2, ... }  // (1, 3)
> res = { arg1[0], arg1[1], arg1[2], ... } // (1, 3)
>
> Although res_npatterns = 1, I suppose these will need to be tested with
> vectors with length >= 4 + 4x,
> since index 2 can be ambiguous for length 2 + 2x?
> (In the patch, these are cases 2 and 3 in test_nunits_min_4)
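The following sketch replays the arithmetic from the Case 4 comment in test_nunits_min_4 for a single selector pattern { a0, a1, a2, ... }: with S = a2 - a1 and esel = len / sel_npatterns, the last element is ae = a1 + (esel - 2) * S, and the stepped part stays within one input only if a1 / len == ae / len. This is a hypothetical model, not fold_vec_perm_cst itself: GCC must prove the quotients equal for every runtime x, whereas this only samples x = 0..4, and it assumes non-negative indices. a0 is deliberately ignored, since only the stepped part is constrained.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding of a selector index as BASE + LENS * len, so that
// elements such as "len + 1" can be modelled (BASE = 1, LENS = 1).
struct sel_index
{
  std::int64_t base;
  std::int64_t lens;
};

static std::int64_t
eval_index (sel_index idx, std::int64_t len)
{
  return idx.base + idx.lens * len;
}

// Return true if the stepped part { a1, a2, ... } of one selector pattern
// selects from a single input vector for each sampled length
// len = C + C * x, x = 0 .. MAX_X.  Assumes non-negative indices, since
// C++ integer division truncates toward zero.
static bool
stays_in_one_input (std::int64_t c, sel_index a1, sel_index a2,
                    std::int64_t sel_npatterns, int max_x = 4)
{
  for (int x = 0; x <= max_x; ++x)
    {
      std::int64_t len = c + c * x;
      std::int64_t v1 = eval_index (a1, len);
      std::int64_t step = eval_index (a2, len) - v1;  // S = a2 - a1
      std::int64_t esel = len / sel_npatterns;        // elements per pattern
      std::int64_t ae = v1 + (esel - 2) * step;       // last element
      if (v1 / len != ae / len)                       // q1 != qe
        return false;
    }
  return true;
}
```

For C = 4, {0, 1, 2, ...} stays within arg0 and {len, len + 1, len + 2, ...} within arg1, while {len, 0, 2, ...} reaches ae = 2(C - 2) + 2Cx and crosses from arg0 into arg1, as the Case 4 comment derives.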

Ah, yeah, fair point.  I guess that means:

+      /* Case 3: mask = {len, 0, 1, ...} // (1, 3)
+        Test that stepped sequence of the pattern selects from arg0.
+        res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
+      {
+       tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+       tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
+       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+       vec_perm_builder builder (len, 1, 3);
+       poly_uint64 mask_elems[] = { len, 0, 1 };
+       builder_push_elems (builder, mask_elems);
+
+       vec_perm_indices sel (builder, 2, len);
+       tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+       tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
+       validate_res (1, 3, res, expected_res);
+      }

needs to move to test_nunits_min_2 after all.

Also:

> +/* Helper routine to push multiple elements into BUILDER.  */
> +
> +static void
> +builder_push_elems (vec_perm_builder& builder, poly_uint64 *elems)
> +{
> +  for (unsigned i = 0; i < builder.encoded_nelts (); i++)
> +    builder.quick_push (elems[i]);
> +}

I think it'd be safer to make this:

template<unsigned N>
static void
builder_push_elems (vec_perm_builder& builder, poly_uint64 (&elems)[N])
{
  for (unsigned i = 0; i < N; i++)
    builder.quick_push (elems[i]);
}

so that we only push elements that are in the array.
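A self-contained illustration of why the array-reference form is safer: N is deduced from the array argument, so the loop cannot read past the end of ELEMS even when the array is shorter than encoded_nelts (). mock_builder is a hypothetical stand-in exposing only encoded_nelts () and quick_push (); it is not vec_perm_builder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for vec_perm_builder with just the members the
// helper touches.  encoded_nelts () was the loop bound of the old helper.
struct mock_builder
{
  std::size_t npatterns;
  std::size_t nelts_per_pattern;
  std::vector<std::uint64_t> elts;

  std::size_t encoded_nelts () const { return npatterns * nelts_per_pattern; }
  void quick_push (std::uint64_t v) { elts.push_back (v); }
};

// N is deduced from the array type, so exactly N elements are pushed,
// independent of what the builder's encoding claims.
template <std::size_t N>
static void
builder_push_elems (mock_builder &builder, const std::uint64_t (&elems)[N])
{
  for (std::size_t i = 0; i < N; i++)
    builder.quick_push (elems[i]);
}
```

For example, pushing a two-element array into a builder with npatterns = 2 and nelts_per_pattern = 2 pushes exactly two elements, whereas a loop bounded by encoded_nelts () would have read four.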

OK for trunk with those changes, thanks.

Richard

> +
> +#define ARG0(index) vector_cst_elt (arg0, index)
> +#define ARG1(index) vector_cst_elt (arg1, index)
> +
> +/* Test cases where result is VNx4SI and input vectors are V4SI.  */
> +
> +static void
> +test_vnx4si_v4si (machine_mode vnx4si_mode, machine_mode v4si_mode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1:
> +      sel = { 0, 4, 1, 5, ... }
> +      res = { arg0[0], arg1[0], arg0[1], arg1[1], ...} // (4, 1)  */
> +      {
> +     tree arg0 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +     tree arg1 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +
> +     tree inner_type
> +       = lang_hooks.types.type_for_mode (GET_MODE_INNER (vnx4si_mode), 1);
> +     tree res_type = build_vector_type_for_mode (inner_type, vnx4si_mode);
> +
> +     poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +     vec_perm_builder builder (res_len, 4, 1);
> +     poly_uint64 mask_elems[] = { 0, 4, 1, 5 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, res_len);
> +     tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +     validate_res (4, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: Same as case 1, but contains an out-of-bounds index, which
> +      should wrap around.
> +      sel = {0, 8, 4, 12, ...} (4, 1)
> +      res = { arg0[0], arg0[0], arg1[0], arg1[0], ... } (4, 1).  */
> +      {
> +     tree arg0 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +     tree arg1 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +
> +     tree inner_type
> +       = lang_hooks.types.type_for_mode (GET_MODE_INNER (vnx4si_mode), 1);
> +     tree res_type = build_vector_type_for_mode (inner_type, vnx4si_mode);
> +
> +     poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +     vec_perm_builder builder (res_len, 4, 1);
> +     poly_uint64 mask_elems[] = { 0, 8, 4, 12 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, res_len);
> +     tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG0(0), ARG1(0), ARG1(0) };
> +     validate_res (4, 1, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test cases where result is V4SI and input vectors are VNx4SI.  */
> +
> +static void
> +test_v4si_vnx4si (machine_mode v4si_mode, machine_mode vnx4si_mode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1:
> +      sel = { 0, 1, 2, 3}
> +      res = { arg0[0], arg0[1], arg0[2], arg0[3] }.  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +     tree arg1 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +
> +     tree inner_type
> +       = lang_hooks.types.type_for_mode (GET_MODE_INNER (v4si_mode), 1);
> +     tree res_type = build_vector_type_for_mode (inner_type, v4si_mode);
> +
> +     poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +     vec_perm_builder builder (res_len, 4, 1);
> +     poly_uint64 mask_elems[] = {0, 1, 2, 3};
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, res_len);
> +     tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG0(1), ARG0(2), ARG0(3) };
> +     validate_res_vls (res, expected_res, 4);
> +      }
> +
> +      /* Case 2: Same as Case 1, but crossing input vectors.
> +      sel = {0, 2, 4, 6}
> +      In this case, the index 4 is ambiguous since len = 4 + 4x.
> +      Since we cannot determine which vector to choose from at
> +      compile time, fold_vec_perm_cst should return NULL_TREE.  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +     tree arg1 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +
> +     tree inner_type
> +       = lang_hooks.types.type_for_mode (GET_MODE_INNER (v4si_mode), 1);
> +     tree res_type = build_vector_type_for_mode (inner_type, v4si_mode);
> +
> +     poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +     vec_perm_builder builder (res_len, 4, 1);
> +     poly_uint64 mask_elems[] = {0, 2, 4, 6};
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, res_len);
> +     const char *reason;
> +     tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel, &reason);
> +
> +     ASSERT_TRUE (res == NULL_TREE);
> +     ASSERT_TRUE (!strcmp (reason,
> +                           "cannot divide selector element by arg len"));
> +      }
> +    }
> +}
> +
> +/* Test all input vectors.  */
> +
> +static void
> +test_all_nunits (machine_mode vmode)
> +{
> +  /* Test with 10 different inputs.  */
> +  for (int i = 0; i < 10; i++)
> +    {
> +      tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +      tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +      poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +      /* Case 1: mask = {0, ...} // (1, 1)
> +      res = { arg0[0], ... } // (1, 1)  */
> +      {
> +     vec_perm_builder builder (len, 1, 1);
> +     builder.quick_push (0);
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +     tree expected_res[] = { ARG0(0) };
> +     validate_res (1, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: mask = {len, ...} // (1, 1)
> +      res = { arg1[0], ... } // (1, 1)  */
> +      {
> +     vec_perm_builder builder (len, 1, 1);
> +     builder.quick_push (len);
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG1(0) };
> +     validate_res (1, 1, res, expected_res);
> +      }
> +
> +      /* Case 3: mask = {len, 0, 1, ...} // (1, 3)
> +      Test that stepped sequence of the pattern selects from arg0.
> +      res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 1, 3);
> +     poly_uint64 mask_elems[] = { len, 0, 1 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> +     validate_res (1, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at least 2 elements.  */
> +
> +static void
> +test_nunits_min_2 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: mask = { 0, len, ... }  // (2, 1)
> +      res = { arg0[0], arg1[0], ... } // (2, 1)  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 2, 1);
> +     poly_uint64 mask_elems[] = { 0, len };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG1(0) };
> +     validate_res (2, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: mask = { 0, len, 1, len+1, ... } // (2, 2)
> +      res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 2, 2);
> +     poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +     validate_res (2, 2, res, expected_res);
> +      }
> +
> +      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
> +      Test that the stepped sequence of the pattern selects from the
> +      same input pattern.  Since the input vectors have npatterns = 2
> +      and the step (a2 - a1) = 1 is not a multiple of npatterns,
> +      fold_vec_perm_cst should return NULL_TREE.  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 1, 3);
> +     poly_uint64 mask_elems[] = { 0, 0, 1 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     const char *reason;
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> +                                   &reason);
> +     ASSERT_TRUE (res == NULL_TREE);
> +     ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at least 4 elements.  */
> +
> +static void
> +test_nunits_min_4 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: mask = { 0, len, 1, len+1, ... } // (4, 1)
> +      res: { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (4, 1)  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 4, 1);
> +     poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +     validate_res (4, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: sel = {0, 1, 2, ...}  // (1, 3)
> +      res: { arg0[0], arg0[1], arg0[2], ... } // (1, 3) */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +     poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (arg0_len, 1, 3);
> +     poly_uint64 mask_elems[] = {0, 1, 2};
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, arg0_len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +     tree expected_res[] = { ARG0(0), ARG0(1), ARG0(2) };
> +     validate_res (1, 3, res, expected_res);
> +      }
> +
> +      /* Case 3: sel = {len, len+1, len+2, ...} // (1, 3)
> +      res: { arg1[0], arg1[1], arg1[2], ... } // (1, 3) */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 1, 3);
> +     poly_uint64 mask_elems[] = {len, len + 1, len + 2};
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +     tree expected_res[] = { ARG1(0), ARG1(1), ARG1(2) };
> +     validate_res (1, 3, res, expected_res);
> +      }
> +
> +      /* Case 4:
> +     sel = { len, 0, 2, ... } // (1, 3)
> +     This should return NULL_TREE because the stepped sequence
> +     crosses the input vectors:
> +     Let's assume len = C + Cx
> +     a1 = 0
> +     S = 2
> +     esel = arg0_len / sel_npatterns = C + Cx
> +     ae = 0 + (esel - 2) * S
> +        = 0 + (C + Cx - 2) * 2
> +        = 2(C-2) + 2Cx
> +
> +     For C >= 4:
> +     Let q1 = a1 / arg0_len = 0 / (C + Cx) = 0
> +     Let qe = ae / arg0_len = (2(C-2) + 2Cx) / (C + Cx) = 1
> +     Since q1 != qe, we cross input vectors.
> +     So return NULL_TREE.  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +     poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (arg0_len, 1, 3);
> +     poly_uint64 mask_elems[] = { arg0_len, 0, 2 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, arg0_len);
> +     const char *reason;
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> +                                   &reason);
> +     ASSERT_TRUE (res == NULL_TREE);
> +     ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
> +      }
> +
> +      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> +      mask = { 0, len, 1, len + 1, ...} // (2, 2)
> +      res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
> +
> +      Note that fold_vec_perm_cst will set
> +      res_npatterns = max(4, max(4, 2)) = 4.
> +      However, after canonicalizing, we will end up with shape (2, 2).  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 4, 1);
> +     tree arg1 = build_vec_cst_rand (vmode, 4, 1);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 2, 2);
> +     poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +     tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +     validate_res (2, 2, res, expected_res);
> +      }
> +
> +      /* Case 6: Test a combination in sel, where one pattern is a dup and
> +      the other is a stepped sequence.
> +      sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
> +      res = { arg0[0], arg0[0], arg0[0],
> +              arg0[1], arg0[0], arg0[2], ... } // (2, 3)  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 2, 3);
> +     poly_uint64 mask_elems[] = { 0, 0, 0, 1, 0, 2 };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG0(0), ARG0(0),
> +                             ARG0(1), ARG0(0), ARG0(2) };
> +     validate_res (2, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at least 8 elements.  */
> +
> +static void
> +test_nunits_min_8 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: sel_npatterns (4) > input npatterns (2)
> +      sel: { 0, 0, 1, len, 2, 0, 3, len, 4, 0, 5, len, ...} // (4, 3)
> +      res: { arg0[0], arg0[0], arg0[1], arg1[0],
> +             arg0[2], arg0[0], arg0[3], arg1[0],
> +             arg0[4], arg0[0], arg0[5], arg1[0], ... } // (4, 3)  */
> +      {
> +     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 2);
> +     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 2);
> +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +     vec_perm_builder builder (len, 4, 3);
> +     poly_uint64 mask_elems[] = { 0, 0, 1, len, 2, 0, 3, len,
> +                                  4, 0, 5, len };
> +     builder_push_elems (builder, mask_elems);
> +
> +     vec_perm_indices sel (builder, 2, len);
> +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +     tree expected_res[] = { ARG0(0), ARG0(0), ARG0(1), ARG1(0),
> +                             ARG0(2), ARG0(0), ARG0(3), ARG1(0),
> +                             ARG0(4), ARG0(0), ARG0(5), ARG1(0) };
> +     validate_res (4, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test vectors for which nunits[0] <= 4.  */
> +
> +static void
> +test_nunits_max_4 (machine_mode vmode)
> +{
> +  /* Case 1: mask = {0, 4, ...} // (1, 2)
> +     This should return NULL_TREE because the index 4 may choose
> +     from either arg0 or arg1 depending on vector length.  */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 1, 2);
> +    poly_uint64 mask_elems[] = {0, 4};
> +    builder_push_elems (builder, mask_elems);
> +
> +    vec_perm_indices sel (builder, 2, len);
> +    const char *reason;
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> +                                  &reason);
> +    ASSERT_TRUE (res == NULL_TREE);
> +    ASSERT_TRUE (reason != NULL);
> +    ASSERT_TRUE (!strcmp (reason,
> +                          "cannot divide selector element by arg len"));
> +  }
> +}
> +
> +#undef ARG0
> +#undef ARG1
> +
> +/* Return true if SIZE is of the form C + Cx and C is a power of 2.  */
> +
> +static bool
> +is_simple_vla_size (poly_uint64 size)
> +{
> +  if (size.is_constant ()
> +      || !pow2p_hwi (size.coeffs[0]))
> +    return false;
> +  for (unsigned i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
> +    if (size.coeffs[i] != (i <= 1 ? size.coeffs[0] : 0))
> +      return false;
> +  return true;
> +}
> +
> +/* Execute fold_vec_perm_cst unit tests.  */
> +
> +static void
> +test ()
> +{
> +  machine_mode vnx4si_mode = E_VOIDmode;
> +  machine_mode v4si_mode = E_VOIDmode;
> +
> +  machine_mode vmode;
> +  FOR_EACH_MODE_IN_CLASS (vmode, MODE_VECTOR_INT)
> +    {
> +      /* Obtain modes corresponding to VNx4SI and V4SI,
> +      to call mixed mode tests below.
> +      FIXME: Is there a better way to do this ?  */
> +      if (GET_MODE_INNER (vmode) == SImode)
> +     {
> +       poly_uint64 nunits = GET_MODE_NUNITS (vmode);
> +       if (is_simple_vla_size (nunits)
> +           && nunits.coeffs[0] == 4)
> +         vnx4si_mode = vmode;
> +       else if (known_eq (nunits, poly_uint64 (4)))
> +         v4si_mode = vmode;
> +     }
> +
> +      if (!is_simple_vla_size (GET_MODE_NUNITS (vmode))
> +       || !targetm.vector_mode_supported_p (vmode))
> +     continue;
> +
> +      poly_uint64 nunits = GET_MODE_NUNITS (vmode);
> +      test_all_nunits (vmode);
> +      if (nunits.coeffs[0] >= 2)
> +     test_nunits_min_2 (vmode);
> +      if (nunits.coeffs[0] >= 4)
> +     test_nunits_min_4 (vmode);
> +      if (nunits.coeffs[0] >= 8)
> +     test_nunits_min_8 (vmode);
> +
> +      if (nunits.coeffs[0] <= 4)
> +     test_nunits_max_4 (vmode);
> +    }
> +
> +  if (vnx4si_mode != E_VOIDmode && v4si_mode != E_VOIDmode
> +      && targetm.vector_mode_supported_p (vnx4si_mode)
> +      && targetm.vector_mode_supported_p (v4si_mode))
> +    {
> +      test_vnx4si_v4si (vnx4si_mode, v4si_mode);
> +      test_v4si_vnx4si (v4si_mode, vnx4si_mode);
> +    }
> +}
> +} // end of test_fold_vec_perm_cst namespace
> +
>  /* Verify that various binary operations on vectors are folded
>     correctly.  */
>  
> @@ -16943,6 +17693,7 @@ fold_const_cc_tests ()
>    test_arithmetic_folding ();
>    test_vector_folding ();
>    test_vec_duplicate_folding ();
> +  test_fold_vec_perm_cst::test ();
>  }
>  
>  } // namespace selftest
