Sigh, I knew I should have waited until the morning to proof-read
and send this.

Richard Sandiford <richard.sandif...@arm.com> writes:
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 40767736389..00fce4945a7 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10743,27 +10743,37 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>    unsigned res_npatterns, res_nelts_per_pattern;
>    unsigned HOST_WIDE_INT res_nelts;
>  
> -  /* (1) If SEL is a suitable mask as determined by
> -     valid_mask_for_fold_vec_perm_cst_p, then:
> -     res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> -     res_nelts_per_pattern = max of nelts_per_pattern between
> -                          ARG0, ARG1 and SEL.
> -     (2) If SEL is not a suitable mask, and TYPE is VLS then:
> -     res_npatterns = nelts in result vector.
> -     res_nelts_per_pattern = 1.
> -     This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> -    {
> -      res_npatterns
> -     = std::max (VECTOR_CST_NPATTERNS (arg0),
> -                 std::max (VECTOR_CST_NPATTERNS (arg1),
> -                           sel.encoding ().npatterns ()));
> +  /* First try to implement the fold in a VLA-friendly way.
> +
> +     (1) If the selector is simply a duplication of N elements, the
> +      result is likewise a duplication of N elements.
> +
> +     (2) If the selector is N elements followed by a duplication
> +      of N elements, the result is too.
>  
> -      res_nelts_per_pattern
> -     = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> -                 std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> -                           sel.encoding ().nelts_per_pattern ()));
> +     (3) If the selector is N elements followed by an interleaving
> +      of N linear series, the situation is more complex.
>  
> +      valid_mask_for_fold_vec_perm_cst_p detects whether we
> +      can handle this case.  If we can, then each of the N linear
> +      series either (a) selects the same element each time or
> +      (b) selects a linear series from one of the input patterns.
> +
> +      If (b) holds for one of the linear series, the result
> +      will contain a linear series, and so the result will have
> +      the same shape as the selector.  If (a) holds for all of
> +      the lienar series, the result will be the same as (2) above.

linear

> +
> +      (b) can only hold if one of the inputs pattern has a

input patterns

Sorry for the typos.

Richard

> +      stepped encoding.  */
> +  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> +    {
> +      res_npatterns = sel.encoding ().npatterns ();
> +      res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> +      if (res_nelts_per_pattern == 3
> +       && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
> +       && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
> +     res_nelts_per_pattern = 2;
>        res_nelts = res_npatterns * res_nelts_per_pattern;
>      }
>    else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
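
For readers following the quoted hunk, the shape selection in the new branch can be sketched in isolation roughly as below. This is only an illustration under stated assumptions: encoding_shape, result_shape and encoded_nelts are hypothetical names that do not exist in GCC, and the plain unsigned parameters stand in for VECTOR_CST_NELTS_PER_PATTERN (arg0) and VECTOR_CST_NELTS_PER_PATTERN (arg1).

```cpp
#include <cassert>

// Hypothetical stand-in for the (npatterns, nelts_per_pattern) pair that
// describes a vector constant's encoding; this is not a GCC type.
struct encoding_shape
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
};

// Mirrors the new branch in the quoted patch: start from the selector's
// shape, then drop a stepped (3-element) encoding to 2 elements per
// pattern when neither input has a stepped encoding, i.e. when case (a)
// of the comment must hold for every linear series.
encoding_shape
result_shape (encoding_shape sel, unsigned arg0_nelts_per_pattern,
              unsigned arg1_nelts_per_pattern)
{
  encoding_shape res = sel;
  if (res.nelts_per_pattern == 3
      && arg0_nelts_per_pattern < 3
      && arg1_nelts_per_pattern < 3)
    res.nelts_per_pattern = 2;
  return res;
}

// Number of result elements that have to be computed explicitly,
// matching res_nelts = res_npatterns * res_nelts_per_pattern.
unsigned
encoded_nelts (encoding_shape s)
{
  return s.npatterns * s.nelts_per_pattern;
}
```

With a selector of 2 patterns and 3 elements per pattern, two non-stepped inputs give a 2x2 result encoding (4 encoded elements), while a stepped arg0 keeps the selector's 2x3 shape (6 encoded elements).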
