On Wed, Aug 28, 2019 at 8:45 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> The following two testcases FAIL to be vectorized, because SSE2 doesn't have
> many permutation instructions and the ones that actually work (whole vector
> shifts) aren't enabled for V4SFmode.
>
> The following patch fixes it by enabling those optabs also for V4SFmode (and
> V2DFmode).  Strictly speaking, we need it only for the VI_128 modes plus
> V4SFmode, but I'm not sure it is worth adding yet another iterator for
> VI_128 + V4SF, and the instructions actually do work for V2DFmode too; it is
> just that other permutation instructions already handle V2DFmode.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-08-28  Jakub Jelinek  <ja...@redhat.com>
>
>         PR libgomp/91530
>         * config/i386/sse.md (vec_shl_<mode>, vec_shr_<mode>): Use
>         V_128 iterator instead of VI_128.
>
>         * testsuite/libgomp.c/scan-21.c: New test.
>         * testsuite/libgomp.c/scan-22.c: New test.

OK.

(We already use integer shifts in floating-point context, e.g.
signbit<mode>2 expander in sse.md.)
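
To illustrate the point, here is a minimal sketch using SSE2 intrinsics (not
the code GCC generates for these testcases): a whole-vector byte shift is an
integer instruction, but it can be applied to a V4SF value through a
bit-preserving cast.  The helper names shift_up_one_lane and inclusive_scan4
are made up for this example; the second one shows why such shifts are handy
for an in-register scan (prefix sum).

```c
#include <emmintrin.h>  /* SSE2 intrinsics (pulls in the SSE ones too) */

/* Shift four floats up by one lane (4 bytes), filling lane 0 with zero.
   The shift itself is the integer PSLLDQ instruction; the casts only
   reinterpret the bits and generate no code.  */
static inline __m128
shift_up_one_lane (__m128 v)
{
  return _mm_castsi128_ps (_mm_slli_si128 (_mm_castps_si128 (v), 4));
}

/* In-register inclusive prefix sum of four floats:
   {a, b, c, d} -> {a, a+b, a+b+c, a+b+c+d}.  */
static inline __m128
inclusive_scan4 (__m128 v)
{
  /* Step 1: add the vector shifted by one lane.  */
  v = _mm_add_ps (v, shift_up_one_lane (v));
  /* Step 2: add the partial sums shifted by two lanes (8 bytes).  */
  v = _mm_add_ps (v, _mm_castsi128_ps (_mm_slli_si128 (_mm_castps_si128 (v),
                                                       8)));
  return v;
}
```

Both helpers need only -msse2, which is the configuration the new tests
exercise with -msse2 -mno-sse3.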

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2019-08-27 12:26:25.385089103 +0200
> +++ gcc/config/i386/sse.md      2019-08-27 13:50:42.594849445 +0200
> @@ -12047,9 +12047,9 @@ (define_insn "<shift_insn><mode>3<mask_n
>  (define_expand "vec_shl_<mode>"
>    [(set (match_dup 3)
>         (ashift:V1TI
> -        (match_operand:VI_128 1 "register_operand")
> +        (match_operand:V_128 1 "register_operand")
>          (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> -   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> +   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
>    "TARGET_SSE2"
>  {
>    operands[1] = gen_lowpart (V1TImode, operands[1]);
> @@ -12060,9 +12060,9 @@ (define_expand "vec_shl_<mode>"
>  (define_expand "vec_shr_<mode>"
>    [(set (match_dup 3)
>         (lshiftrt:V1TI
> -        (match_operand:VI_128 1 "register_operand")
> +        (match_operand:V_128 1 "register_operand")
>          (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> -   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> +   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
>    "TARGET_SSE2"
>  {
>    operands[1] = gen_lowpart (V1TImode, operands[1]);
> --- libgomp/testsuite/libgomp.c/scan-21.c.jj    2019-08-27 22:56:03.805127837 +0200
> +++ libgomp/testsuite/libgomp.c/scan-21.c       2019-08-27 22:58:26.347043679 +0200
> @@ -0,0 +1,6 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-require-effective-target avx_runtime } */
> +/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
> +/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
> +
> +#include "scan-13.c"
> --- libgomp/testsuite/libgomp.c/scan-22.c.jj    2019-08-27 22:56:51.034437425 +0200
> +++ libgomp/testsuite/libgomp.c/scan-22.c       2019-08-27 22:59:01.978522645 +0200
> @@ -0,0 +1,6 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-require-effective-target avx_runtime } */
> +/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
> +/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
> +
> +#include "scan-17.c"
>
>         Jakub
