Hi Juzhe, the general method seems sane and useful (it's not very complicated). I was just distracted by

> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the
> common expression:
> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }
>
> For this selector, we can use vmsltu + vmerge to optimize the codegen.

because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.

Because of the ascending/monotonic? selector structure we can use vmerge
instead of vrgather.

> +/* Recognize the patterns that we can use merge operation to shuffle the
> +   vectors. The value of Each element (index i) in selector can only be
> +   either i or nunits + i.
> +
> +   E.g.
> +   v = VEC_PERM_EXPR (v0, v1, selector),
> +   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }

Same.

> +
> +   We can transform such pattern into:
> +
> +   v = vcond_mask (v0, v1, mask),
> +   mask = { 0, 1, 0, 1, 0, 1, ... }.  */
> +
> +static bool
> +shuffle_merge_patterns (struct expand_vec_perm_d *d)
> +{
> +  machine_mode vmode = d->vmode;
> +  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
> +  int n_patterns = d->perm.encoding ().npatterns ();
> +  poly_int64 vec_len = d->perm.length ();
> +
> +  for (int i = 0; i < n_patterns; ++i)
> +    if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
> +      return false;
> +
> +  for (int i = n_patterns; i < n_patterns * 2; i++)
> +    if (!d->perm.series_p (i, n_patterns, i, n_patterns)
> +	&& !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
> +      return false;

Maybe add a comment that we check that the pattern is actually monotonic
or however you prefer to call it?

I didn't go through all tests in detail but skimmed several.  All in all
looks good to me.

Regards
 Robin
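
P.S. To make the "either i or nunits + i" condition concrete, here is a
small standalone sketch in plain C++. It is not GCC code and not what the
patch does internally: the helper names (merge_mask_for_selector, permute,
merge) are made up for illustration, and it assumes a fixed-length selector
rather than the poly_int/encoded form used in expand_vec_perm_d.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: if every selector element at index i is either
// i (take v0[i]) or nunits + i (take v1[i]), fill MASK with the equivalent
// merge mask (false = v0, true = v1) and return true.  Otherwise return
// false; MASK is then unspecified.
bool
merge_mask_for_selector (const std::vector<std::size_t> &sel,
			 std::vector<bool> &mask)
{
  const std::size_t nunits = sel.size ();
  mask.assign (nunits, false);
  for (std::size_t i = 0; i < nunits; ++i)
    {
      if (sel[i] == i)
	mask[i] = false;
      else if (sel[i] == nunits + i)
	mask[i] = true;
      else
	return false;
    }
  return true;
}

// Reference two-input permute: SEL indexes the concatenation of v0 and v1.
std::vector<int>
permute (const std::vector<int> &v0, const std::vector<int> &v1,
	 const std::vector<std::size_t> &sel)
{
  const std::size_t nunits = v0.size ();
  assert (v1.size () == nunits && sel.size () == nunits);
  std::vector<int> out (nunits);
  for (std::size_t i = 0; i < nunits; ++i)
    {
      assert (sel[i] < 2 * nunits);
      out[i] = sel[i] < nunits ? v0[sel[i]] : v1[sel[i] - nunits];
    }
  return out;
}

// Element-wise merge: lane i comes from v1 where the mask is set,
// else from v0.  No cross-lane movement is involved.
std::vector<int>
merge (const std::vector<int> &v0, const std::vector<int> &v1,
       const std::vector<bool> &mask)
{
  const std::size_t nunits = v0.size ();
  assert (v1.size () == nunits && mask.size () == nunits);
  std::vector<int> out (nunits);
  for (std::size_t i = 0; i < nunits; ++i)
    out[i] = mask[i] ? v1[i] : v0[i];
  return out;
}
```

With nunits = 4, the selector { 0, 5, 2, 7 } (i.e. { 0, nunits + 1, 2,
nunits + 3 }) is accepted and the merge gives the same result as the
permute, while { 0, 5, 1, 6 } (i.e. { 0, nunits + 1, 1, nunits + 2 }) is
rejected, which is why the expression in the comment confused me.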