Re: [PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-26 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 27, 2021 at 10:03 AM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
>
> For avx512f_scattersi, the mask operand currently affects only the set
> source. We need to refine the pattern so that GCC knows the mask register
> also affects the destination, so we put the mask operand into
> UNSPEC_VSIBADDR.
>
> Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
> Ok for master?
Ok.
>
> gcc/ChangeLog:
>
> PR target/101472
> * config/i386/sse.md: (scattersi): Add mask operand to
> UNSPEC_VSIBADDR.
> (scattersi): Likewise.
> (*avx512f_scattersi): Merge mask operand to set_dest.
> (*avx512f_scatterdi): Likewise
>
> gcc/testsuite/ChangeLog:
>
> PR target/101472
> * gcc.target/i386/avx512f-pr101472.c: New test.
> * gcc.target/i386/avx512vl-pr101472.c: New test.
> ---
>  gcc/config/i386/sse.md| 20 +++--
>  .../gcc.target/i386/avx512f-pr101472.c| 49 
>  .../gcc.target/i386/avx512vl-pr101472.c   | 79 +++
>  3 files changed, 140 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 03fc2df1fb0..a3055dbd316 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -24205,8 +24205,9 @@
>"TARGET_AVX512F"
>  {
>operands[5]
> -= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
> -   operands[4]), UNSPEC_VSIBADDR);
> += gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
> +   operands[4], operands[1]),
> +   UNSPEC_VSIBADDR);
>  })
>
>  (define_insn "*avx512f_scattersi"
> @@ -24214,10 +24215,11 @@
>   [(unspec:P
>  [(match_operand:P 0 "vsib_address_operand" "Tv")
>   (match_operand: 2 "register_operand" "v")
> - (match_operand:SI 4 "const1248_operand" "n")]
> + (match_operand:SI 4 "const1248_operand" "n")
> + (match_operand: 6 "register_operand" "1")]
>  UNSPEC_VSIBADDR)])
> (unspec:VI48F
> - [(match_operand: 6 "register_operand" "1")
> + [(match_dup 6)
>(match_operand:VI48F 3 "register_operand" "v")]
>   UNSPEC_SCATTER))
> (clobber (match_scratch: 1 "="))]
> @@ -24243,8 +24245,9 @@
>"TARGET_AVX512F"
>  {
>operands[5]
> -= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
> -   operands[4]), UNSPEC_VSIBADDR);
> += gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
> +   operands[4], operands[1]),
> +   UNSPEC_VSIBADDR);
>  })
>
>  (define_insn "*avx512f_scatterdi"
> @@ -24252,10 +24255,11 @@
>   [(unspec:P
>  [(match_operand:P 0 "vsib_address_operand" "Tv")
>   (match_operand: 2 "register_operand" "v")
> - (match_operand:SI 4 "const1248_operand" "n")]
> + (match_operand:SI 4 "const1248_operand" "n")
> + (match_operand:QI 6 "register_operand" "1")]
>  UNSPEC_VSIBADDR)])
> (unspec:VI48F
> - [(match_operand:QI 6 "register_operand" "1")
> + [(match_dup 6)
>(match_operand: 3 "register_operand" "v")]
>   UNSPEC_SCATTER))
> (clobber (match_scratch:QI 1 "="))]
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
> new file mode 100644
> index 000..89c6603c2ff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
> @@ -0,0 +1,49 @@
> +/* PR target/101472 */
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vpscatterqd\[ \\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpscatterdd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpscatterqq\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpscatterdq\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vscatterqps\[ \\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vscatterdps\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vscatterqpd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
> +/* { 

[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-26 Thread Kong, Lingling via Gcc-patches
Hi,

For avx512f_scattersi, the mask operand currently affects only the set
source. We need to refine the pattern so that GCC knows the mask register
also affects the destination, so we put the mask operand into
UNSPEC_VSIBADDR.

Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

gcc/ChangeLog:

PR target/101472
* config/i386/sse.md: (scattersi): Add mask operand to
UNSPEC_VSIBADDR.
(scattersi): Likewise.
(*avx512f_scattersi): Merge mask operand to set_dest.
(*avx512f_scatterdi): Likewise

gcc/testsuite/ChangeLog:

PR target/101472
* gcc.target/i386/avx512f-pr101472.c: New test.
* gcc.target/i386/avx512vl-pr101472.c: New test.
---
 gcc/config/i386/sse.md| 20 +++--
 .../gcc.target/i386/avx512f-pr101472.c| 49 
 .../gcc.target/i386/avx512vl-pr101472.c   | 79 +++
 3 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 03fc2df1fb0..a3055dbd316 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -24205,8 +24205,9 @@
   "TARGET_AVX512F"
 {
   operands[5]
-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]), 
+   UNSPEC_VSIBADDR);
 })
 
 (define_insn "*avx512f_scattersi"
@@ -24214,10 +24215,11 @@
  [(unspec:P
 [(match_operand:P 0 "vsib_address_operand" "Tv")
  (match_operand: 2 "register_operand" "v")
- (match_operand:SI 4 "const1248_operand" "n")]
+ (match_operand:SI 4 "const1248_operand" "n")
+ (match_operand: 6 "register_operand" "1")]
 UNSPEC_VSIBADDR)])
(unspec:VI48F
- [(match_operand: 6 "register_operand" "1")
+ [(match_dup 6)
   (match_operand:VI48F 3 "register_operand" "v")]
  UNSPEC_SCATTER))
(clobber (match_scratch: 1 "="))]
@@ -24243,8 +24245,9 @@
   "TARGET_AVX512F"
 {
   operands[5]
-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]), 
+   UNSPEC_VSIBADDR);
 })
 
 (define_insn "*avx512f_scatterdi"
@@ -24252,10 +24255,11 @@
  [(unspec:P
 [(match_operand:P 0 "vsib_address_operand" "Tv")
  (match_operand: 2 "register_operand" "v")
- (match_operand:SI 4 "const1248_operand" "n")]
+ (match_operand:SI 4 "const1248_operand" "n")
+ (match_operand:QI 6 "register_operand" "1")]
 UNSPEC_VSIBADDR)])
(unspec:VI48F
- [(match_operand:QI 6 "register_operand" "1")
+ [(match_dup 6)
   (match_operand: 3 "register_operand" "v")]
  UNSPEC_SCATTER))
(clobber (match_scratch:QI 1 "="))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
new file mode 100644
index 000..89c6603c2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
@@ -0,0 +1,49 @@
+/* PR target/101472 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vpscatterqd\[ \\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterdd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterqq\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterdq\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterqps\[ \\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterdps\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterqpd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterdpd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+
+#include 
+
+void two_scatters_epi32(void* addr, __mmask8 k1, __mmask8 k2, __m512i vindex, 
+__m256i a, __m512i b)
+{
+ 

Re: [PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-25 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 25, 2021 at 2:14 PM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
>
> For avx512f_scattersi, the mask operand currently affects only the set
> source. We need to refine the pattern so that GCC knows the mask register
> also affects the destination, so we put the mask operand into
> UNSPEC_VSIBADDR.
>
> Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
> Ok for master?
>
> gcc/ChangeLog:
>
> *config/i386/sse.md (scattersi): Add mask operand to
> UNSPEC_VSIBADDR.
> (scattersi): Likewise.
> (*avx512f_scattersi): Merge mask operand
> to set_dest.
> (*avx512f_scatterdi): Likewise
>
> gcc/testsuite/ChangeLog:
>
> *gcc.target/i386/avx512f-pr101472.c: New test.
> *gcc.target/i386/avx512vl-pr101472.c: Ditto.
Please follow the GCC coding conventions for ChangeLog entries, which are
described at https://gcc.gnu.org/codingconventions.html#ChangeLogs.

-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]), UNSPEC_VSIBADDR);
Lines shall be at most 80 columns.
 })

 (define_insn "*avx512f_scattersi"
@@ -24214,10 +24214,11 @@
  [(unspec:P
 [(match_operand:P 0 "vsib_address_operand" "Tv")
  (match_operand: 2 "register_operand" "v")
- (match_operand:SI 4 "const1248_operand" "n")]
+ (match_operand:SI 4 "const1248_operand" "n")
+ (match_operand: 6 "register_operand" "1")]
 UNSPEC_VSIBADDR)])
(unspec:VI48F
- [(match_operand: 6 "register_operand" "1")
+ [(match_dup 6)
   (match_operand:VI48F 3 "register_operand" "v")]
  UNSPEC_SCATTER))
(clobber (match_scratch: 1 "="))]
@@ -24243,8 +24244,8 @@
   "TARGET_AVX512F"
 {
   operands[5]
-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]), UNSPEC_VSIBADDR);
Ditto.
 })

-- 
BR,
Hongtao


[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-25 Thread Kong, Lingling via Gcc-patches
Hi,

For avx512f_scattersi, the mask operand currently affects only the set
source. We need to refine the pattern so that GCC knows the mask register
also affects the destination, so we put the mask operand into
UNSPEC_VSIBADDR.

Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

gcc/ChangeLog:

*config/i386/sse.md (scattersi): Add mask operand to
UNSPEC_VSIBADDR.
(scattersi): Likewise.
(*avx512f_scattersi): Merge mask operand
to set_dest.
(*avx512f_scatterdi): Likewise

gcc/testsuite/ChangeLog:

*gcc.target/i386/avx512f-pr101472.c: New test.
*gcc.target/i386/avx512vl-pr101472.c: Ditto.


0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch
Description: 0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch