This is not a fix for the case { 1, 2, 3, ..., 31, 0 }. This patch extends expand_vec_perm_palignr to the AVX2 case. The { 1, 2, 3, ..., 31, 0 } case should be handled by a separate function/pattern. I would prefer a split, since it is similar to two cases that are already handled: the SSE byte rotate { 1, 2, 3, ..., 15, 0 } (ssse3_palignr<mode>_perm) and the AVX2 split *avx_vperm_broadcast_<mode>.
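The following shows why { 1, 2, 3, ..., 31, 0 } falls outside this patch. It is a minimal byte-level model in Python. The helper names palignr128 and vpalignr256 are hypothetical and are not GCC or intrinsic names. The model assumes the documented instruction semantics. PALIGNR shifts the 32-byte concatenation hi:lo right by n bytes and keeps the low 16 bytes. AVX2 VPALIGNR does the same thing independently within each 128-bit lane. A single palignr of a vector with itself gives the 16-byte rotate { 1, ..., 15, 0 }. The 256-bit form only rotates each lane separately.

```python
def palignr128(hi, lo, n):
    """Model of SSSE3 PALIGNR on byte lists: concatenate hi:lo
    (lo in the low half), shift right by n bytes, keep the low 16."""
    assert len(hi) == len(lo) == 16 and 0 <= n <= 32
    concat = list(lo) + list(hi)
    shifted = concat[n:] + [0] * n  # bytes shifted in from the top are zero
    return shifted[:16]


def vpalignr256(hi, lo, n):
    """Model of AVX2 VPALIGNR: PALIGNR applied to each 128-bit lane on its own."""
    assert len(hi) == len(lo) == 32
    return palignr128(hi[:16], lo[:16], n) + palignr128(hi[16:], lo[16:], n)


x16 = list(range(16))
print(palignr128(x16, x16, 1))   # the SSE byte rotate { 1, 2, ..., 15, 0 }

x32 = list(range(32))
print(vpalignr256(x32, x32, 1))  # per-lane rotate, not { 1, 2, ..., 31, 0 }
```

With one instruction, byte 0 can never cross into the upper lane. That is why the 32-byte rotate needs its own pattern.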
On Wed, Oct 1, 2014 at 2:35 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
>> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstu...@gmail.com>
>> wrote:
>> > Getting back to initial patch, is it ok?
>>
>> IMO, we should start with Jakub's proposed patch [1]
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html
>
> That doesn't compile, will post a new version; got interrupted when
> I found that in
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> one test is miscompiled even with unpatched compiler, debugging that now.
>
> That said, my patch will not do anything about the
> case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation,
> for that we can't do vpalignr followed by vpshufb or similar,
> but need to do some permutation first and then vpalignr on
> the result. So it would need a new routine. It is still a 2
> insn permutation, not 6, and needs different algorithm, so sharing
> the same routine for that is undesirable.
>
> Jakub
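Below is a sketch of the kind of 2-insn sequence described above: "some permutation first and then vpalignr on the result". It assumes the first permutation is a 128-bit lane swap. Here that swap is modelled as VPERM2I128 with immediate 0x01, and VPERMQ with immediate 0x4E would swap the lanes the same way. The swap is followed by VPALIGNR. This is a byte-level Python model with hypothetical helper names, not the pattern GCC actually emits. The zeroing bits of the VPERM2I128 immediate are ignored.

```python
def palignr128(hi, lo, n):
    """Model of PALIGNR on byte lists: (hi:lo) >> n bytes, low 16 bytes kept."""
    concat = list(lo) + list(hi)
    return (concat[n:] + [0] * n)[:16]


def vpalignr256(hi, lo, n):
    """Model of AVX2 VPALIGNR: PALIGNR applied to each 128-bit lane on its own."""
    return palignr128(hi[:16], lo[:16], n) + palignr128(hi[16:], lo[16:], n)


def vperm2i128(a, b, imm):
    """Model of VPERM2I128 (zeroing bits 3 and 7 ignored): each destination
    lane is selected from a.lo, a.hi, b.lo, b.hi by a 2-bit field of imm."""
    lanes = [a[:16], a[16:], b[:16], b[16:]]
    return lanes[imm & 3] + lanes[(imm >> 4) & 3]


def rotate_left_bytes_avx2(x, n):
    """Hypothetical 2-insn rotate of a 32-byte vector left by n bytes (0 <= n <= 16):
    swap the 128-bit lanes, then VPALIGNR the swapped copy against the original."""
    assert len(x) == 32 and 0 <= n <= 16
    swapped = vperm2i128(x, x, 0x01)   # { x.hi, x.lo }
    return vpalignr256(swapped, x, n)  # each lane now sees the neighbouring lane's bytes


x = list(range(32))
print(rotate_left_bytes_avx2(x, 1))  # the { 1, 2, 3, ..., 31, 0 } permutation
```

After the swap, the low lane's PALIGNR input is x.hi:x.lo and the high lane's input is x.lo:x.hi. One VPALIGNR then produces bytes that cross the lane boundary, which a single VPALIGNR on x alone cannot do.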