On Fri, May 13, 2022 at 1:43 PM liuhongt <hongtao....@intel.com> wrote:
>
> When d->perm[i] == d->perm[i-1] + 1 and d->perm[i] == nelt, it's not
> continuous. It should fail if there's more than 2 continuous areas.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
>
> gcc/ChangeLog:
>
>	PR target/105587
>	* config/i386/i386-expand.cc
>	(expand_vec_perm_pslldq_psrldq_por): Fail when (d->perm[i] ==
>	d->perm[i-1] + 1) && d->perm[i] == nelt && start != -1.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.target/i386/pr105587.c: New test.

LGTM, probably even obvious patch.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.cc           |  9 ++-------
>  gcc/testsuite/gcc.target/i386/pr105587.c | 11 +++++++++++
>  2 files changed, 13 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105587.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 0fd3028c205..806e1f5aaa3 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -20963,7 +20963,8 @@ expand_vec_perm_pslldq_psrldq_por (struct
> expand_vec_perm_d *d, bool pandn)
>    start1 = d->perm[0];
>    for (i = 1; i < nelt; i++)
>      {
> -      if (d->perm[i] != d->perm[i-1] + 1)
> +      if (d->perm[i] != d->perm[i-1] + 1
> +          || d->perm[i] == nelt)
>          {
>            if (start2 == -1)
>              {
> @@ -20973,12 +20974,6 @@ expand_vec_perm_pslldq_psrldq_por (struct
> expand_vec_perm_d *d, bool pandn)
>            else
>              return false;
>          }
> -      else if (d->perm[i] >= nelt
> -               && start2 == -1)
> -        {
> -          start2 = d->perm[i];
> -          end1 = d->perm[i-1];
> -        }
>      }
>
>    clear_op0 = end1 != nelt - 1;
> diff --git a/gcc/testsuite/gcc.target/i386/pr105587.c
> b/gcc/testsuite/gcc.target/i386/pr105587.c
> new file mode 100644
> index 00000000000..a5b6ab2a016
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105587.c
> @@ -0,0 +1,11 @@
> +#/* { dg-do compile } */
> +/* { dg-options "-O3 -msse2 -mno-ssse3" } */
> +
> +extern short arr_108[][4][2][24][12], arr_110[][4][2][24][12];
> +void test() {
> +  for (unsigned a = 0; a < 2; a += 2)
> +    for (unsigned b = 4; b < 22; b++)
> +      for (int c = 1; c < 11; c++)
> +        arr_110[0][0][a][b][c] = (unsigned char)arr_108[0][0][a][b][c];
> +}
> +
> --
> 2.18.1
>
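The fix hinges on one observation from the patch description: in a two-operand permutation, index nelt - 1 is the last element of op0 and index nelt is the first element of op1, so a step from one to the other is numerically consecutive but still starts a new area. Below is a minimal standalone model in C, not GCC code. The function names old_split_two_runs and split_two_runs and the plain-array interface are hypothetical stand-ins for the loop over d->perm and d->nelt in expand_vec_perm_pslldq_psrldq_por; the rest of that function (the byte shifts, masking and final por) is left out.

```c
#include <stdbool.h>

/* Hypothetical standalone model (not GCC code) of the run-splitting
   loop in expand_vec_perm_pslldq_psrldq_por.  PERM indexes the
   concatenation of two NELT-element operands: 0 .. NELT-1 select from
   op0, NELT .. 2*NELT-1 select from op1.  On success *START2 is the
   first index of the second run (-1 if there is none) and *END1 is the
   last index of the first run.  Returns false for three or more runs.  */

/* The loop as it was before the PR target/105587 fix.  */
static bool
old_split_two_runs (const unsigned *perm, unsigned nelt,
                    int *start2_out, unsigned *end1_out)
{
  int start2 = -1;
  unsigned end1 = (unsigned) -1;

  for (unsigned i = 1; i < nelt; i++)
    {
      if (perm[i] != perm[i - 1] + 1)
        {
          if (start2 == -1)
            {
              start2 = (int) perm[i];
              end1 = perm[i - 1];
            }
          else
            return false;
        }
      /* Treats any consecutive step landing in op1 (perm[i] >= NELT) as
         a break, but only while no break has been seen yet, so a later
         NELT-1 -> NELT step is silently absorbed into the second run.  */
      else if (perm[i] >= nelt && start2 == -1)
        {
          start2 = (int) perm[i];
          end1 = perm[i - 1];
        }
    }

  *start2_out = start2;
  *end1_out = end1;
  return true;
}

/* The loop after the fix: landing on index NELT always ends a run,
   because NELT-1 (last element of op0) and NELT (first element of op1)
   are numerically consecutive but come from different operands.  */
static bool
split_two_runs (const unsigned *perm, unsigned nelt,
                int *start2_out, unsigned *end1_out)
{
  int start2 = -1;
  unsigned end1 = (unsigned) -1;

  for (unsigned i = 1; i < nelt; i++)
    {
      if (perm[i] != perm[i - 1] + 1 || perm[i] == nelt)
        {
          if (start2 == -1)
            {
              start2 = (int) perm[i];
              end1 = perm[i - 1];
            }
          else
            return false;  /* A third run: reject.  */
        }
    }

  *start2_out = start2;
  *end1_out = end1;
  return true;
}
```

For nelt == 8, {5, 6, 7, 8, 9, 10, 11, 12} splits into an op0 tail {5, 6, 7} and an op1 head {8, ..., 12} under both versions (start2 == 8, end1 == 7). {2, 3, 6, 7, 8, 9, 10, 11} really has three areas ({2, 3} and {6, 7} from op0, {8, ..., 11} from op1): the old loop accepts it as two runs with start2 == 6 and end1 == 3, while the fixed loop rejects it.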