Re: [PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]
On Fri, Aug 27, 2021 at 10:03 AM Kong, Lingling via Gcc-patches wrote:
>
> Hi,
>
> For avx512f_scattersi, mask operand only affect set src, we need
> to refine the pattern to let gcc know mask register also affect the dest.
> So we put mask operand into UNSPEC_VSIBADDR.
>
> Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
> Ok for master?

Ok.
[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]
Hi,

For avx512f_scattersi, mask operand only affect set src, we need to refine the
pattern to let gcc know mask register also affect the dest. So we put mask
operand into UNSPEC_VSIBADDR.

Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

gcc/ChangeLog:

	PR target/101472
	* config/i386/sse.md: (scattersi): Add mask operand to
	UNSPEC_VSIBADDR.
	(scattersi): Likewise.
	(*avx512f_scattersi): Merge mask operand to set_dest.
	(*avx512f_scatterdi): Likewise

gcc/testsuite/ChangeLog:

	PR target/101472
	* gcc.target/i386/avx512f-pr101472.c: New test.
	* gcc.target/i386/avx512vl-pr101472.c: New test.
---
 gcc/config/i386/sse.md                    | 20 +++--
 .../gcc.target/i386/avx512f-pr101472.c    | 49
 .../gcc.target/i386/avx512vl-pr101472.c   | 79 +++
 3 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 03fc2df1fb0..a3055dbd316 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -24205,8 +24205,9 @@
   "TARGET_AVX512F"
 {
   operands[5]
-    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-                                        operands[4]), UNSPEC_VSIBADDR);
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+                                        operands[4], operands[1]),
+                                        UNSPEC_VSIBADDR);
 })
 
 (define_insn "*avx512f_scattersi"
@@ -24214,10 +24215,11 @@
          [(unspec:P
             [(match_operand:P 0 "vsib_address_operand" "Tv")
              (match_operand: 2 "register_operand" "v")
-             (match_operand:SI 4 "const1248_operand" "n")]
+             (match_operand:SI 4 "const1248_operand" "n")
+             (match_operand: 6 "register_operand" "1")]
             UNSPEC_VSIBADDR)])
        (unspec:VI48F
-         [(match_operand: 6 "register_operand" "1")
+         [(match_dup 6)
           (match_operand:VI48F 3 "register_operand" "v")]
          UNSPEC_SCATTER))
    (clobber (match_scratch: 1 "=&Yk"))]
@@ -24243,8 +24245,9 @@
   "TARGET_AVX512F"
 {
   operands[5]
-    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-                                        operands[4]), UNSPEC_VSIBADDR);
+    = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+                                        operands[4], operands[1]),
+                                        UNSPEC_VSIBADDR);
 })
 
 (define_insn "*avx512f_scatterdi"
@@ -24252,10 +24255,11 @@
          [(unspec:P
             [(match_operand:P 0 "vsib_address_operand" "Tv")
              (match_operand: 2 "register_operand" "v")
-             (match_operand:SI 4 "const1248_operand" "n")]
+             (match_operand:SI 4 "const1248_operand" "n")
+             (match_operand:QI 6 "register_operand" "1")]
             UNSPEC_VSIBADDR)])
        (unspec:VI48F
-         [(match_operand:QI 6 "register_operand" "1")
+         [(match_dup 6)
           (match_operand: 3 "register_operand" "v")]
          UNSPEC_SCATTER))
    (clobber (match_scratch:QI 1 "=&Yk"))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
new file mode 100644
index 000..89c6603c2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
@@ -0,0 +1,49 @@
+/* PR target/101472 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vpscatterqd\[ \\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterdd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterqq\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterdq\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterqps\[ \\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterdps\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterqpd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterdpd\[ \\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ \\t\]+#)" 2 } } */
+
+#include
+
+void two_scatters_epi32(void* addr, __mmask8 k1, __mmask8 k2, __m512i vindex,
+                        __m256i a, __m512i b)
Re: [PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]
On Wed, Aug 25, 2021 at 2:14 PM Kong, Lingling via Gcc-patches wrote:
>
> Hi,
>
> For avx512f_scattersi, mask operand only affect set src, we
> need to refine the pattern to let gcc know mask register also affect the dest.
> So we put mask operand into UNSPEC_VSIBADDR.
>
> Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
> Ok for master?
>
> gcc/ChangeLog:
>
> *config/i386/sse.md (scattersi): Add mask operand to
> UNSPEC_VSIBADDR.
> (scattersi): Likewise.
> (*avx512f_scattersi): Merge mask operand
> to set_dest.
> (*avx512f_scatterdi): Likewise
>
> gcc/testsuite/ChangeLog:
>
> *gcc.target/i386/avx512f-pr101472.c: New test.
> *gcc.target/i386/avx512vl-pr101472.c: Ditto.

Please follow the GCC coding conventions for ChangeLog entries, which are
described at https://gcc.gnu.org/codingconventions.html#ChangeLogs.

> -    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
> -                                        operands[4]), UNSPEC_VSIBADDR);
> +    = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
> +                                        operands[4], operands[1]), UNSPEC_VSIBADDR);

Lines shall be at most 80 columns.

>  })
>
>  (define_insn "*avx512f_scattersi"
> @@ -24214,10 +24214,11 @@
>           [(unspec:P
>              [(match_operand:P 0 "vsib_address_operand" "Tv")
>               (match_operand: 2 "register_operand" "v")
> -             (match_operand:SI 4 "const1248_operand" "n")]
> +             (match_operand:SI 4 "const1248_operand" "n")
> +             (match_operand: 6 "register_operand" "1")]
>              UNSPEC_VSIBADDR)])
>         (unspec:VI48F
> -         [(match_operand: 6 "register_operand" "1")
> +         [(match_dup 6)
>            (match_operand:VI48F 3 "register_operand" "v")]
>           UNSPEC_SCATTER))
>     (clobber (match_scratch: 1 "=&Yk"))]
> @@ -24243,8 +24244,8 @@
>    "TARGET_AVX512F"
>  {
>    operands[5]
> -    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
> -                                        operands[4]), UNSPEC_VSIBADDR);
> +    = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
> +                                        operands[4], operands[1]), UNSPEC_VSIBADDR);

Ditto.

>  })

--
BR,
Hongtao
[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]
Hi,

For avx512f_scattersi, mask operand only affect set src, we need to refine the
pattern to let gcc know mask register also affect the dest. So we put mask
operand into UNSPEC_VSIBADDR.

Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

gcc/ChangeLog:

	*config/i386/sse.md (scattersi): Add mask operand to
	UNSPEC_VSIBADDR.
	(scattersi): Likewise.
	(*avx512f_scattersi): Merge mask operand to set_dest.
	(*avx512f_scatterdi): Likewise

gcc/testsuite/ChangeLog:

	*gcc.target/i386/avx512f-pr101472.c: New test.
	*gcc.target/i386/avx512vl-pr101472.c: Ditto.

Attachment: 0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch
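
For readers less familiar with masked scatters, the point about the mask also affecting the destination can be illustrated with a small scalar model. This is only a sketch under stated assumptions: the names model_mask_scatter_epi32 and model_two_scatters are hypothetical, the model covers just the 8-lane, 64-bit-index, 32-bit-element case, and it says nothing about which GCC pass performed the wrong optimization. It only shows that two scatters through the same base and index vector can write different sets of locations when their masks differ, so the first one cannot be treated as overwritten by the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked scatter of eight 32-bit elements
   with 64-bit indices: lane i stores src[i] to base + idx[i] * scale only
   when bit i of mask is set.  Lanes with a clear mask bit leave memory
   untouched, so the set of written locations depends on the mask.  */
static void
model_mask_scatter_epi32 (void *base, uint8_t mask, const int64_t idx[8],
                          const int32_t src[8], int scale)
{
  /* The patterns in the patch restrict the scale to 1, 2, 4 or 8
     (const1248_operand); the model enforces the same restriction.  */
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);

  for (int i = 0; i < 8; i++)
    if (mask & (1u << i))
      {
        int32_t *p = (int32_t *) ((char *) base + idx[i] * scale);
        *p = src[i];
      }
}

/* Two back-to-back scatters that share the base and the index vector but
   use different masks.  The address expression is identical, yet the first
   scatter is only redundant if every bit set in k1 is also set in k2.  */
static void
model_two_scatters (void *base, uint8_t k1, uint8_t k2,
                    const int64_t idx[8], const int32_t a[8],
                    const int32_t b[8])
{
  model_mask_scatter_epi32 (base, k1, idx, a, 4);
  model_mask_scatter_epi32 (base, k2, idx, b, 4);
}
```

With k1 = 0x0F and k2 = 0xF0 and an identity index vector, lanes 0 to 3 end up holding values from a and lanes 4 to 7 values from b. Dropping the first scatter would lose the writes to lanes 0 to 3, which is why the destination has to be described as depending on the mask.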