On Wed, Aug 28, 2019 at 8:45 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> The following two testcases FAIL to be vectorized, because SSE2 doesn't have
> many permutation instructions and the ones that actually work (whole vector
> shifts) aren't enabled for V4SFmode.
>
> The following patch fixes it by enabling those optabs also for V4SFmode (and
> V2DFmode).  Strictly speaking, we need it only for the VI_128 modes plus
> V4SFmode, but I'm not sure it is worth adding yet another iterator for
> VI_128 + V4SF and the instructions actually do work for V2DFmode too, just
> there are also other permutation instructions that handle V2DFmode.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-08-28  Jakub Jelinek  <ja...@redhat.com>
>
>       PR libgomp/91530
>       * config/i386/sse.md (vec_shl_<mode>, vec_shr_<mode>): Use
>       V_128 iterator instead of VI_128.
>
>       * testsuite/libgomp.c/scan-21.c: New test.
>       * testsuite/libgomp.c/scan-22.c: New test.
OK.

(We already use integer shifts in floating-point context, e.g. signbit<mode>2
expander in sse.md.)

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj 2019-08-27 12:26:25.385089103 +0200
> +++ gcc/config/i386/sse.md 2019-08-27 13:50:42.594849445 +0200
> @@ -12047,9 +12047,9 @@ (define_insn "<shift_insn><mode>3<mask_n
>  (define_expand "vec_shl_<mode>"
>    [(set (match_dup 3)
>         (ashift:V1TI
> -        (match_operand:VI_128 1 "register_operand")
> +        (match_operand:V_128 1 "register_operand")
>          (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> -   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> +   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
>    "TARGET_SSE2"
>  {
>    operands[1] = gen_lowpart (V1TImode, operands[1]);
> @@ -12060,9 +12060,9 @@ (define_expand "vec_shl_<mode>"
>  (define_expand "vec_shr_<mode>"
>    [(set (match_dup 3)
>         (lshiftrt:V1TI
> -        (match_operand:VI_128 1 "register_operand")
> +        (match_operand:V_128 1 "register_operand")
>          (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> -   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> +   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
>    "TARGET_SSE2"
>  {
>    operands[1] = gen_lowpart (V1TImode, operands[1]);
> --- libgomp/testsuite/libgomp.c/scan-21.c.jj 2019-08-27 22:56:03.805127837 +0200
> +++ libgomp/testsuite/libgomp.c/scan-21.c 2019-08-27 22:58:26.347043679 +0200
> @@ -0,0 +1,6 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-require-effective-target avx_runtime } */
> +/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
> +/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
> +
> +#include "scan-13.c"
> --- libgomp/testsuite/libgomp.c/scan-22.c.jj 2019-08-27 22:56:51.034437425 +0200
> +++ libgomp/testsuite/libgomp.c/scan-22.c 2019-08-27 22:59:01.978522645 +0200
> @@ -0,0 +1,6 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-require-effective-target avx_runtime } */
> +/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
> +/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
> +
> +#include "scan-17.c"
>
> Jakub
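
For readers following the thread, here is a minimal, portable C sketch of why a whole-vector shift can serve as the permutation for a four-float scan. It is only a model under stated assumptions, not the code GCC emits: the names v4sf_model, model_vec_shl and model_inclusive_scan are hypothetical, float is assumed to be 4 bytes with all-zero bytes meaning 0.0, and a byte-array copy stands in for the 128-bit register shift. The shift itself only moves bytes, so it treats float lanes exactly like integer lanes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a 128-bit register holding four float lanes. */
typedef struct { float lane[4]; } v4sf_model;

_Static_assert(sizeof(v4sf_model) == 16, "model assumes 4-byte float");

/* Model of a whole-vector left shift: move all 16 bytes up by bits/8
   positions and fill the vacated low bytes with zero.  With lanes in
   memory order, lane i ends up in lane i + bits/32.  */
static v4sf_model model_vec_shl(v4sf_model v, unsigned bits)
{
    unsigned char in[16];
    unsigned char out[16] = {0};
    unsigned bytes;

    /* Shift counts are whole bytes, mirroring the multiple-of-8 operand. */
    assert(bits % 8 == 0 && bits <= 128);
    bytes = bits / 8;

    memcpy(in, &v, sizeof in);
    if (bytes < 16)
        memcpy(out + bytes, in, sizeof in - bytes);
    memcpy(&v, out, sizeof out);
    return v;
}

/* Inclusive prefix sum over four lanes using only shift-and-add steps:
   shift by one lane and add, then shift by two lanes and add.  */
static v4sf_model model_inclusive_scan(v4sf_model v)
{
    unsigned bits;
    int i;

    for (bits = 32; bits < 128; bits *= 2) {
        v4sf_model t = model_vec_shl(v, bits);
        for (i = 0; i < 4; i++)
            v.lane[i] += t.lane[i];
    }
    return v;
}
```

In this model, scanning {1, 2, 3, 4} first adds {0, 1, 2, 3} and then {0, 0, 1, 3}, giving {1, 3, 6, 10}; no permutation other than the whole-vector shift is needed.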