Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
On Wed, Jan 06, 2021 at 11:34:32AM +0800, Hongtao Liu via Gcc-patches wrote: > > >> > > >> Note there's a data dependency between them. insn 7 feeds insn 9. When > > >> there's a data dependency, combiner patterns are usually the better > > >> choice than peepholes. I think you'd be looking to match something > > >> likethis (from the . combine dump): > > >> > > Using combiner patterns, details is discussed in PR98348 > > Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk. > gcc/ChangeLog: > > PR target/96891 > PR target/98348 > * config/i386/sse.md (VI_128_256): New mode iterator. > (*avx_cmp3_1, *avx_cmp3_2, *avx_cmp3_3, > *avx_cmp3_4, *avx2_eq3, *avx2_pcmp3_1, > *avx2_pcmp3_2, *avx2_gt3): New > define_insn_and_split to lower avx512 vector comparison to avx > version when dest is vector. > (*_cmp3,*_cmp3,*_ucmp3): > define_insn_and_split for negating the comparison result. > * config/i386/predicates.md (float_vector_all_ones_operand): > New predicate. > * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use > general NOT operator without UNSPEC_MASKOP. > > gcc/testsuite/ChangeLog: > > PR target/96891 > PR target/98348 > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-2.c: New test. > * gcc.target/i386/avx512f-pr96891-3.c: New test. > * g++.target/i386/avx512f-pr96891-1.C: New test. > * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase. Ok for trunk. I'd prefer not to backport it to GCC 10. Jakub
Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
> >> > >> Note there's a data dependency between them. insn 7 feeds insn 9. When > >> there's a data dependency, combiner patterns are usually the better > >> choice than peepholes. I think you'd be looking to match something > >> likethis (from the . combine dump): > >> Using combiner patterns, details is discussed in PR98348 Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk. gcc/ChangeLog: PR target/96891 PR target/98348 * config/i386/sse.md (VI_128_256): New mode iterator. (*avx_cmp3_1, *avx_cmp3_2, *avx_cmp3_3, *avx_cmp3_4, *avx2_eq3, *avx2_pcmp3_1, *avx2_pcmp3_2, *avx2_gt3): New define_insn_and_split to lower avx512 vector comparison to avx version when dest is vector. (*_cmp3,*_cmp3,*_ucmp3): define_insn_and_split for negating the comparison result. * config/i386/predicates.md (float_vector_all_ones_operand): New predicate. * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use general NOT operator without UNSPEC_MASKOP. gcc/testsuite/ChangeLog: PR target/96891 PR target/98348 * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. * gcc.target/i386/avx512f-pr96891-3.c: New test. * g++.target/i386/avx512f-pr96891-1.C: New test. * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase. > > Jeff > -- BR, Hongtao From 240c830b3d35f7571da876a21aa71e263c3abe80 Mon Sep 17 00:00:00 2001 From: liuhongt Date: Fri, 18 Dec 2020 15:56:06 +0800 Subject: [PATCH] Lower AVX512 vector comparison to AVX version when dest is vector. gcc/ChangeLog: PR target/96891 PR target/98348 * config/i386/sse.md (VI_128_256): New mode iterator. (*avx_cmp3_1, *avx_cmp3_2, *avx_cmp3_3, *avx_cmp3_4, *avx2_eq3, *avx2_pcmp3_1, *avx2_pcmp3_2, *avx2_gt3): New define_insn_and_split to lower avx512 vector comparison to avx version when dest is vector. (*_cmp3,*_cmp3,*_ucmp3): define_insn_and_split for negating the comparison result. * config/i386/predicates.md (float_vector_all_ones_operand): New predicate. * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use general NOT operator without UNSPEC_MASKOP. gcc/testsuite/ChangeLog: PR target/96891 PR target/98348 * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. * gcc.target/i386/avx512f-pr96891-3.c: New test. * g++.target/i386/avx512f-pr96891-1.C: New test. * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase. --- gcc/config/i386/i386-expand.c | 14 +- gcc/config/i386/predicates.md | 47 gcc/config/i386/sse.md| 261 +- .../g++.target/i386/avx512f-pr96891-1.C | 37 +++ .../gcc.target/i386/avx512bw-pr96891-1.c | 75 + .../gcc.target/i386/avx512f-pr96891-1.c | 40 +++ .../gcc.target/i386/avx512f-pr96891-2.c | 30 ++ .../gcc.target/i386/avx512f-pr96891-3.c | 39 +++ .../gcc.target/i386/bitwise_mask_op-3.c | 1 - 9 files changed, 531 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/g++.target/i386/avx512f-pr96891-1.C create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 6e08fd32726..b4f8b275718 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -3568,17 +3568,11 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false) ? force_reg (mode, op_false) : op_false); if (op_true == CONST0_RTX (mode)) { - rtx (*gen_not) (rtx, rtx); - switch (cmpmode) - { - case E_QImode: gen_not = gen_knotqi; break; - case E_HImode: gen_not = gen_knothi; break; - case E_SImode: gen_not = gen_knotsi; break; - case E_DImode: gen_not = gen_knotdi; break; - default: gcc_unreachable (); - } rtx n = gen_reg_rtx (cmpmode); - emit_insn (gen_not (n, cmp)); + if (cmpmode == E_DImode && !TARGET_64BIT) + emit_insn (gen_knotdi (n, cmp)); + else + emit_insn (gen_rtx_SET (n, gen_rtx_fmt_e (NOT, cmpmode, cmp))); cmp = n; /* Reverse op_true op_false. */ std::swap (op_true, op_false); diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index be5aaa4d76f..0bb0729e933 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1069,6 +1069,53 @@ (define_predicate "zero_extended_scalar_load_operand" return true; }) +/* Return true if operand is a float vector constant that is all ones. */
Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
On 11/16/20 8:10 PM, Hongtao Liu wrote: > On Tue, Nov 17, 2020 at 8:05 AM Jeff Law wrote: >> >> On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote: >>> Hi: >>> Add define_peephole2 to eliminate potential redundant conversion >>> from mask to vector. >>> Bootstrap is ok, regression test is ok for i386/x86-64 backend. >>> Ok for trunk? >>> >>> gcc/ChangeLog: >>> PR target/96891 >>> * config/i386/sse.md (VI_128_256): New mode iterator. >>> (define_peephole2): Lower avx512 vector compare to avx version >>> when dest is vector. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/i386/avx512bw-pr96891-1.c: New test. >>> * gcc.target/i386/avx512f-pr96891-1.c: New test. >>> * gcc.target/i386/avx512f-pr96891-2.c: New test. >> Aren't these the two insns in question: >> >> >> (insn 7 4 8 2 (set (reg:QI 86) >> (unspec:QI [ >> (reg:V8SF 90) >> (reg:V8SF 89) >> (const_int 2 [0x2]) >> ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3} >> (expr_list:REG_DEAD (reg:V8SF 90) >> (expr_list:REG_DEAD (reg:V8SF 89) >> (nil >> (note 8 7 9 2 NOTE_INSN_DELETED) >> (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ]) >> (vec_merge:V8SI (const_vector:V8SI [ >> (const_int -1 [0x]) repeated x8 >> ]) >> (const_vector:V8SI [ >> (const_int 0 [0]) repeated x8 >> ]) >> (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si} >> (expr_list:REG_DEAD (reg:QI 86) >> (nil))) >> >> >> Note there's a data dependency between them. insn 7 feeds insn 9. When >> there's a data dependency, combiner patterns are usually the better >> choice than peepholes. I think you'd be looking to match something >> likethis (from the . combine dump): >> >> (set (reg:V8SI 82 [ _2 ]) >> (vec_merge:V8SI (const_vector:V8SI [ >> (const_int -1 [0x]) repeated x8 >> ]) >> (const_vector:V8SI [ >> (const_int 0 [0]) repeated x8 >> ]) >> (unspec:QI [ >> (reg:V8SF 90) >> (reg:V8SF 89) >> (const_int 2 [0x2]) >> ] UNSPEC_PCMP))) >> >> >> Jeff >> > Yes, as discussed in [1], maybe it's better to refactor avx512 integer > mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx > could handle vector comparison more effectively. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4 Thanks for the pointer. I didn't realize this patch was essentially abandoned. Jeff
Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
On Tue, Nov 17, 2020 at 8:05 AM Jeff Law wrote: > > > On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote: > > Hi: > > Add define_peephole2 to eliminate potential redundant conversion > > from mask to vector. > > Bootstrap is ok, regression test is ok for i386/x86-64 backend. > > Ok for trunk? > > > > gcc/ChangeLog: > > PR target/96891 > > * config/i386/sse.md (VI_128_256): New mode iterator. > > (define_peephole2): Lower avx512 vector compare to avx version > > when dest is vector. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > > * gcc.target/i386/avx512f-pr96891-1.c: New test. > > * gcc.target/i386/avx512f-pr96891-2.c: New test. > > Aren't these the two insns in question: > > > (insn 7 4 8 2 (set (reg:QI 86) > (unspec:QI [ > (reg:V8SF 90) > (reg:V8SF 89) > (const_int 2 [0x2]) > ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3} > (expr_list:REG_DEAD (reg:V8SF 90) > (expr_list:REG_DEAD (reg:V8SF 89) > (nil > (note 8 7 9 2 NOTE_INSN_DELETED) > (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ]) > (vec_merge:V8SI (const_vector:V8SI [ > (const_int -1 [0x]) repeated x8 > ]) > (const_vector:V8SI [ > (const_int 0 [0]) repeated x8 > ]) > (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si} > (expr_list:REG_DEAD (reg:QI 86) > (nil))) > > > Note there's a data dependency between them. insn 7 feeds insn 9. When > there's a data dependency, combiner patterns are usually the better > choice than peepholes. I think you'd be looking to match something > likethis (from the . combine dump): > > (set (reg:V8SI 82 [ _2 ]) > (vec_merge:V8SI (const_vector:V8SI [ > (const_int -1 [0x]) repeated x8 > ]) > (const_vector:V8SI [ > (const_int 0 [0]) repeated x8 > ]) > (unspec:QI [ > (reg:V8SF 90) > (reg:V8SF 89) > (const_int 2 [0x2]) > ] UNSPEC_PCMP))) > > > Jeff > Yes, as discussed in [1], maybe it's better to refactor avx512 integer mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx could handle vector comparison more effectively. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4 -- BR, Hongtao
Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote: > Hi: > Add define_peephole2 to eliminate potential redundant conversion > from mask to vector. > Bootstrap is ok, regression test is ok for i386/x86-64 backend. > Ok for trunk? > > gcc/ChangeLog: > PR target/96891 > * config/i386/sse.md (VI_128_256): New mode iterator. > (define_peephole2): Lower avx512 vector compare to avx version > when dest is vector. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-2.c: New test. Aren't these the two insns in question: (insn 7 4 8 2 (set (reg:QI 86) (unspec:QI [ (reg:V8SF 90) (reg:V8SF 89) (const_int 2 [0x2]) ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3} (expr_list:REG_DEAD (reg:V8SF 90) (expr_list:REG_DEAD (reg:V8SF 89) (nil (note 8 7 9 2 NOTE_INSN_DELETED) (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ]) (vec_merge:V8SI (const_vector:V8SI [ (const_int -1 [0x]) repeated x8 ]) (const_vector:V8SI [ (const_int 0 [0]) repeated x8 ]) (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si} (expr_list:REG_DEAD (reg:QI 86) (nil))) Note there's a data dependency between them. insn 7 feeds insn 9. When there's a data dependency, combiner patterns are usually the better choice than peepholes. I think you'd be looking to match something likethis (from the . combine dump): (set (reg:V8SI 82 [ _2 ]) (vec_merge:V8SI (const_vector:V8SI [ (const_int -1 [0x]) repeated x8 ]) (const_vector:V8SI [ (const_int 0 [0]) repeated x8 ]) (unspec:QI [ (reg:V8SF 90) (reg:V8SF 89) (const_int 2 [0x2]) ] UNSPEC_PCMP))) Jeff
Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
On Wed, Sep 2, 2020 at 2:33 AM Hongtao Liu via Gcc-patches wrote: > > Hi: > Add define_peephole2 to eliminate potential redundant conversion > from mask to vector. > Bootstrap is ok, regression test is ok for i386/x86-64 backend. > Ok for trunk? > > gcc/ChangeLog: > PR target/96891 > * config/i386/sse.md (VI_128_256): New mode iterator. > (define_peephole2): Lower avx512 vector compare to avx version > when dest is vector. > > gcc/testsuite/ChangeLog: Missing PR target/96891 > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-2.c: New test. > > -- > BR, > Hongtao -- H.J.
[PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector
Hi: Add define_peephole2 to eliminate potential redundant conversion from mask to vector. Bootstrap is ok, regression test is ok for i386/x86-64 backend. Ok for trunk? gcc/ChangeLog: PR target/96891 * config/i386/sse.md (VI_128_256): New mode iterator. (define_peephole2): Lower avx512 vector compare to avx version when dest is vector. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. -- BR, Hongtao From ba76432c08f47e4ecc1f355c0dfdea8908aaf9f4 Mon Sep 17 00:00:00 2001 From: liuhongt Date: Wed, 2 Sep 2020 17:14:39 +0800 Subject: [PATCH] Lower AVX512 vector compare to AVX version when dest is vector. gcc/ChangeLog: PR target/96891 * config/i386/sse.md (VI_128_256): New mode iterator. (define_peephole2): Lower avx512 vector compare to avx version when dest is vector. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. --- gcc/config/i386/sse.md| 93 +++ .../gcc.target/i386/avx512bw-pr96891-1.c | 36 +++ .../gcc.target/i386/avx512f-pr96891-1.c | 40 .../gcc.target/i386/avx512f-pr96891-2.c | 30 ++ 4 files changed, 199 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8250325e1a3..31e0dc2a600 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -629,6 +629,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI]) ;; All 256bit vector integer modes (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI]) +;; All 128 and 256bit vector integer modes +(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI]) + ;; Various 128bit vector integer mode combinations (define_mode_iterator VI12_128 [V16QI V8HI]) (define_mode_iterator VI14_128 [V16QI V4SI]) @@ -6703,6 +6706,96 @@ (define_insn "*_cvtmask2" (set_attr "prefix" "evex") (set_attr "mode" "")]) +/* Lower avx512 parallel floating compare to avx compare when dst is vector. */ +(define_peephole2 + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_operand:VF_128_256 1 "register_operand") + (match_operand:VF_128_256 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_31_operand")] + UNSPEC_PCMP)) + (set (match_operand: 4 "register_operand") + (vec_merge: + (match_operand: 5 "vector_all_ones_operand") + (match_operand: 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 7) + (unspec:VF_128_256 + [(match_dup 1) + (match_dup 2) + (match_dup 3)] UNSPEC_PCMP))] + "operands[7] = gen_rtx_REG (mode, REGNO (operands[4]));") + +/* Lower avx512 parallel integral compare to avx compare when dst is vector. */ +(define_peephole2 + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand")] + UNSPEC_MASKED_EQ)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 4) + (eq:VI_128_256 + (match_dup 1) + (match_dup 2)))]) + +(define_peephole2 + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand")] + UNSPEC_MASKED_GT)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGN