Hi Juzhe,

the general method seems sane and useful (it's not very complicated).
I was just distracted by

> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
> common expression:
> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }
> 
> For this selector, we can use vmsltu + vmerge to optimize the codegen.

because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.

Because of the ascending/monotonic(?) selector structure, where element i
is either i or nunits + i, we can use vmerge instead of vrgather.
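
To illustrate what I mean, here is a small standalone sketch in plain C++
for fixed-length vectors. It is not GCC code: vec_perm, merge_mask and
merge_by_mask are made-up helper names, and the model ignores the
variable-length (poly_int) case the patch handles. It only shows that a
selector whose element i is i or nunits + i gives the same result as a
masked merge, and that { 0, nunits + 1, 1, nunits + 2, ... } does not
qualify.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Model of VEC_PERM_EXPR (v0, v1, sel): indices below nunits select from
// v0, indices from nunits upwards select from v1.
std::vector<int> vec_perm(const std::vector<int>& v0,
                          const std::vector<int>& v1,
                          const std::vector<std::size_t>& sel)
{
  const std::size_t nunits = v0.size();
  std::vector<int> out(sel.size());
  for (std::size_t i = 0; i < sel.size(); ++i)
    out[i] = sel[i] < nunits ? v0[sel[i]] : v1[sel[i] - nunits];
  return out;
}

// Hypothetical helper: return the merge mask if every sel[i] is either
// i (take v0) or nunits + i (take v1), otherwise std::nullopt.
std::optional<std::vector<bool>>
merge_mask(const std::vector<std::size_t>& sel, std::size_t nunits)
{
  std::vector<bool> mask(sel.size());
  for (std::size_t i = 0; i < sel.size(); ++i)
    {
      if (sel[i] == i)
        mask[i] = false;
      else if (sel[i] == nunits + i)
        mask[i] = true;
      else
        return std::nullopt;
    }
  return mask;
}

// Element-wise select, the effect a mask-driven merge would have.
std::vector<int> merge_by_mask(const std::vector<int>& v0,
                               const std::vector<int>& v1,
                               const std::vector<bool>& mask)
{
  std::vector<int> out(v0.size());
  for (std::size_t i = 0; i < v0.size(); ++i)
    out[i] = mask[i] ? v1[i] : v0[i];
  return out;
}
```

With nunits = 16 and the selector from your description, merge_mask
succeeds and merge_by_mask matches vec_perm; with { 0, 5, 1, 6 } and
nunits = 4 it is rejected at element 2.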

> +/* Recognize the patterns that we can use merge operation to shuffle the
> +   vectors. The value of Each element (index i) in selector can only be
> +   either i or nunits + i.
> +
> +   E.g.
> +   v = VEC_PERM_EXPR (v0, v1, selector),
> +   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }

Same.

> +
> +   We can transform such pattern into:
> +
> +   v = vcond_mask (v0, v1, mask),
> +   mask = { 0, 1, 0, 1, 0, 1, ... }.  */
> +
> +static bool
> +shuffle_merge_patterns (struct expand_vec_perm_d *d)
> +{
> +  machine_mode vmode = d->vmode;
> +  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
> +  int n_patterns = d->perm.encoding ().npatterns ();
> +  poly_int64 vec_len = d->perm.length ();
> +
> +  for (int i = 0; i < n_patterns; ++i)
> +    if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
> +      return false;
> +
> +  for (int i = n_patterns; i < n_patterns * 2; i++)
> +    if (!d->perm.series_p (i, n_patterns, i, n_patterns)
> +     && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
> +      return false;

Maybe add a comment that we check that the pattern is actually monotonic
or however you prefer to call it?

I didn't go through all tests in detail but skimmed several.  All in all
looks good to me.

Regards
 Robin
