Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]
On Thu, Dec 3, 2020 at 2:22 AM Jakub Jelinek wrote: > > On Tue, Dec 01, 2020 at 12:49:03PM +0800, Hongtao Liu via Gcc-patches wrote: > > +bool neq_p = INTVAL (operands[4]) >> 2; > > +/* LE: 2, NLT: 5, NLE: 6, LT: 1 */ > > +rtx cmp_predicate = neq_p ? GEN_INT (6) : GEN_INT (2); > > +if (MEM_P (operands[1])) > > + { > > + std::swap (operands[1], operands[2]); > > + cmp_predicate = neq_p ? GEN_INT (1) : GEN_INT (5); > > + } > > +emit_insn (gen__ucmp3 (operands[0], operands[1], > > + operands[2], cmp_predicate)); > > I'd suggest instead: > +/* LE: 2, NLT: 5, NLE: 6, LT: 1 */ > +int cmp_predicate = 2; /* LE */ > +if (MEM_P (operands[1])) > + { > + std::swap (operands[1], operands[2]); > + cmp_predicate = 5; /* NLT (GE) */ > + } > +if ((INTVAL (operands[4]) & 4) != 0) > + cmp_predictate ^= 4; /* Invert the comparison to NLE (GT) or LT. */ > +emit_insn (gen__ucmp3 (operands[0], operands[1], > operands[2], > +GEN_INT (cmp_predicate))); > so that you don't create the rtx CONST_INTs in 4 places and don't do that > unnecessarily when you will need another constant. Thanks for the review,committed. > > Otherwise LGTM, thanks. > > Jakub > -- BR, Hongtao
Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]
On Tue, Dec 01, 2020 at 12:49:03PM +0800, Hongtao Liu via Gcc-patches wrote: > +bool neq_p = INTVAL (operands[4]) >> 2; > +/* LE: 2, NLT: 5, NLE: 6, LT: 1 */ > +rtx cmp_predicate = neq_p ? GEN_INT (6) : GEN_INT (2); > +if (MEM_P (operands[1])) > + { > + std::swap (operands[1], operands[2]); > + cmp_predicate = neq_p ? GEN_INT (1) : GEN_INT (5); > + } > +emit_insn (gen__ucmp3 (operands[0], operands[1], > + operands[2], cmp_predicate)); I'd suggest instead: +/* LE: 2, NLT: 5, NLE: 6, LT: 1 */ +int cmp_predicate = 2; /* LE */ +if (MEM_P (operands[1])) + { + std::swap (operands[1], operands[2]); + cmp_predicate = 5; /* NLT (GE) */ + } +if ((INTVAL (operands[4]) & 4) != 0) + cmp_predictate ^= 4; /* Invert the comparison to NLE (GT) or LT. */ +emit_insn (gen__ucmp3 (operands[0], operands[1], operands[2], +GEN_INT (cmp_predicate))); so that you don't create the rtx CONST_INTs in 4 places and don't do that unnecessarily when you will need another constant. Otherwise LGTM, thanks. Jakub
Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]
On Mon, Nov 30, 2020 at 9:46 PM Jakub Jelinek wrote: > > On Mon, Nov 30, 2020 at 09:11:10PM +0800, Hongtao Liu wrote: > > +;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw. > > +(define_split > > + [(set (match_operand: 0 "register_operand") > > +(unspec: > > + [(us_minus:VI12_AVX512VL > > + (match_operand:VI12_AVX512VL 1 "vector_operand") > > + (match_operand:VI12_AVX512VL 2 "vector_operand")) > > + (match_operand:VI12_AVX512VL 3 "const0_operand") > > + (match_operand:SI 4 "const0_operand")] > > + UNSPEC_PCMP))] > > + "TARGET_AVX512BW && ix86_binary_operator_ok (US_MINUS, mode, > > operands)" > > Too long line, please wrap it. > Also, INTVAL (operands[4]) == 0 is EQ comparison, can't we handle also > NE (i.e. INTVAL (operands[4]) == 4? > I.e. replace the "const0_operand" in there with "const_0_to_7_operand" > and check in conditions that (INTVAL (operands[4]) & 3) == 0. > > > + [(const_int 0)] > > + { > > +/* LE: 2, NLT: 5. */ > > +rtx cmp_predicate = GEN_INT (2); > > +if (MEM_P (operands[1])) > > + { > > +std::swap (operands[1], operands[2]); > > +cmp_predicate = GEN_INT (5); > > For INTVAL (operands[4]) == 4 it would then be cmp_predictate NLE: 4 resp. > LT: 3 I think. > > Also, this handles only UNSPEC_PCMP, can't we handle UNSPEC_UNSIGNED_PCMP > too? I mean, for equality comparisons it doesn't really matter if we have > signed or unsigned == or !=. And for unsigned > x == 0U is equivalent to x <= 0U, and x != 0U equivalent to x > 0U. > > Jakub > Yes, Update patch. +(define_int_iterator UNSPEC_PCMP_ITER + [UNSPEC_PCMP UNSPEC_UNSIGNED_PCMP]) + +(define_int_attr pcmp_signed_mask + [(UNSPEC_PCMP "3") (UNSPEC_UNSIGNED_PCMP "1")]) + +;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw. +;; For signed comparison, handle EQ 0: NEQ 4, +;; for unsigned comparison extra handle LE:2, NLE:6, equivalent to EQ and NEQ. + +(define_split + [(set (match_operand: 0 "register_operand") + (unspec: + [(us_minus:VI12_AVX512VL +(match_operand:VI12_AVX512VL 1 "vector_operand") +(match_operand:VI12_AVX512VL 2 "vector_operand")) + (match_operand:VI12_AVX512VL 3 "const0_operand") + (match_operand:SI 4 "const_0_to_7_operand")] + UNSPEC_PCMP_ITER))] + "TARGET_AVX512BW + && ix86_binary_operator_ok (US_MINUS, mode, operands) + && (INTVAL (operands[4]) & ) == 0" + [(const_int 0)] + { +bool neq_p = INTVAL (operands[4]) >> 2; +/* LE: 2, NLT: 5, NLE: 6, LT: 1 */ +rtx cmp_predicate = neq_p ? GEN_INT (6) : GEN_INT (2); +if (MEM_P (operands[1])) + { + std::swap (operands[1], operands[2]); + cmp_predicate = neq_p ? GEN_INT (1) : GEN_INT (5); + } +emit_insn (gen__ucmp3 (operands[0], operands[1], + operands[2], cmp_predicate)); +DONE; + }) + -- BR, Hongtao From e3eb61066ee665325cba8e231b991f9a1dda07df Mon Sep 17 00:00:00 2001 From: liuhongt Date: Mon, 30 Nov 2020 13:27:16 +0800 Subject: [PATCH] Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnleuw [PR96906] For signed comparisons, it handles cases that are eq or neq to 0. For unsigned comparisons, it additionaly handles cases that are le or gt to 0(equivilent to eq or neq to 0). Transform case eq to leu, case neq to gtu. .i.e. for -mavx512bw -mavx512vl transform eq case code from vpsubusw%xmm1, %xmm0, %xmm0 vpxor %xmm1, %xmm1, %xmm1 vpcmpeqw %xmm1, %xmm0, %k0 to vpcmpleuw %xmm1, %xmm0, %k0 .i.e. for -mavx512bw -mavx512vl transform neq case code from vpsubusw%xmm1, %xmm0, %xmm0 vpxor %xmm1, %xmm1, %xmm1 vpcmpneqw %xmm1, %xmm0, %k0 to vpcmpnleuw %xmm1, %xmm0, %k0 gcc/ChangeLog PR target/96906 * config/i386/sse.md (_ucmp3): Add a new define_split after this insn. gcc/testsuite/ChangeLog * gcc.target/i386/avx512bw-pr96906-1.c: New test. * gcc.target/i386/pr96906-1.c: Add -mno-avx512f. --- gcc/config/i386/sse.md| 37 ++ .../gcc.target/i386/avx512bw-pr96906-1.c | 68 +++ gcc/testsuite/gcc.target/i386/pr96906-1.c | 2 +- 3 files changed, 106 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4aad462f882..7a4dafea1ed 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3006,6 +3006,43 @@ (define_insn "_ucmp3" (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_int_iterator UNSPEC_PCMP_ITER + [UNSPEC_PCMP UNSPEC_UNSIGNED_PCMP]) + +(define_int_attr pcmp_signed_mask + [(UNSPEC_PCMP "3") (UNSPEC_UNSIGNED_PCMP "1")]) + +;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw. +;; For signed comparison, handle EQ 0: NEQ 4, +;; for unsigned comparison extra handle LE:2, NLE:6, equivalent to EQ
Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]
On Mon, Nov 30, 2020 at 09:11:10PM +0800, Hongtao Liu wrote: > +;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw. > +(define_split > + [(set (match_operand: 0 "register_operand") > +(unspec: > + [(us_minus:VI12_AVX512VL > + (match_operand:VI12_AVX512VL 1 "vector_operand") > + (match_operand:VI12_AVX512VL 2 "vector_operand")) > + (match_operand:VI12_AVX512VL 3 "const0_operand") > + (match_operand:SI 4 "const0_operand")] > + UNSPEC_PCMP))] > + "TARGET_AVX512BW && ix86_binary_operator_ok (US_MINUS, mode, > operands)" Too long line, please wrap it. Also, INTVAL (operands[4]) == 0 is EQ comparison, can't we handle also NE (i.e. INTVAL (operands[4]) == 4? I.e. replace the "const0_operand" in there with "const_0_to_7_operand" and check in conditions that (INTVAL (operands[4]) & 3) == 0. > + [(const_int 0)] > + { > +/* LE: 2, NLT: 5. */ > +rtx cmp_predicate = GEN_INT (2); > +if (MEM_P (operands[1])) > + { > +std::swap (operands[1], operands[2]); > +cmp_predicate = GEN_INT (5); For INTVAL (operands[4]) == 4 it would then be cmp_predictate NLE: 4 resp. LT: 3 I think. Also, this handles only UNSPEC_PCMP, can't we handle UNSPEC_UNSIGNED_PCMP too? I mean, for equality comparisons it doesn't really matter if we have signed or unsigned == or !=. And for unsigned x == 0U is equivalent to x <= 0U, and x != 0U equivalent to x > 0U. Jakub
[PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]
Hi: This patch is quite similar like what jakub did in https://gcc.gnu.org/pipermail/gcc-patches/2020-November/560151.html but for target avx512bw. .i.e. for -mavx512bw -mavx512vl transform code from vpsubusw%xmm1, %xmm0, %xmm0 vpxor %xmm1, %xmm1, %xmm1 vpcmpw $0, %xmm1, %xmm0, %k0 to vpcmpleuw %xmm1, %xmm0, %k0 Bootstrapped/regtested on x86_64-linux is ok. gcc/ChangeLog PR target/96906 * config/i386/sse.md (_ucmp3): Add a new define_split after this insn. gcc/testsuite/ChangeLog * gcc.target/i386/avx512bw-pr96906-1.c: New test. * gcc.target/i386/pr96906-1.c: Add -mno-avx512f. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4aad462f882..eebc3750584 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3006,6 +3006,30 @@ (define_insn "_ucmp3" (set_attr "prefix" "evex") (set_attr "mode" "")]) +;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw. +(define_split + [(set (match_operand: 0 "register_operand") +(unspec: + [(us_minus:VI12_AVX512VL + (match_operand:VI12_AVX512VL 1 "vector_operand") + (match_operand:VI12_AVX512VL 2 "vector_operand")) + (match_operand:VI12_AVX512VL 3 "const0_operand") + (match_operand:SI 4 "const0_operand")] + UNSPEC_PCMP))] + "TARGET_AVX512BW && ix86_binary_operator_ok (US_MINUS, mode, operands)" + [(const_int 0)] + { +/* LE: 2, NLT: 5. */ +rtx cmp_predicate = GEN_INT (2); +if (MEM_P (operands[1])) + { +std::swap (operands[1], operands[2]); +cmp_predicate = GEN_INT (5); + } +emit_insn (gen__ucmp3 (operands[0], operands[1], +operands[2], cmp_predicate)); +DONE; + }) + (define_insn "avx512f_vmcmp3" [(set (match_operand: 0 "register_operand" "=k") (and: diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c new file mode 100644 index 000..ae7ec7abed1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c @@ -0,0 +1,80 @@ +/* PR target/96906 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512bw -mavx512vl -masm=att" } */ +/* { dg-final { scan-assembler-times {(?n)vpcmpub[ \t]*\$2} 6 } } */ +/* { dg-final { scan-assembler-times {(?n)vpcmpuw[ \t]*\$2} 6 } } */ + + +#include + +__mmask8 +ff1 (__m128i x, __m128i y) +{ + return _mm_cmp_epi16_mask (_mm_subs_epu16 (x, y), _mm_setzero_si128 (), 0); +} + +__mmask8 +ff2 (__m128i x, __m128i y) +{ + return _mm_cmple_epu16_mask (x, y); +} + +__mmask16 +ff3 (__m128i x, __m128i y) +{ + return _mm_cmp_epi8_mask (_mm_subs_epu8 (x, y), _mm_setzero_si128 (), 0); +} + +__mmask16 +ff4 (__m128i x, __m128i y) +{ + return _mm_cmple_epu8_mask (x, y); +} + +__mmask16 +ff5 (__m256i x, __m256i y) +{ + return _mm256_cmp_epi16_mask (_mm256_subs_epu16 (x, y), _mm256_setzero_si256 (), 0); +} + +__mmask16 +ff6 (__m256i x, __m256i y) +{ + return _mm256_cmple_epu16_mask (x, y); +} + +__mmask32 +ff7 (__m256i x, __m256i y) +{ + return _mm256_cmp_epi8_mask (_mm256_subs_epu8 (x, y), _mm256_setzero_si256 (), 0); +} + +__mmask32 +ff8 (__m256i x, __m256i y) +{ + return _mm256_cmple_epu8_mask (x, y); +} + +__mmask32 +ff9 (__m512i x, __m512i y) +{ + return _mm512_cmp_epi16_mask (_mm512_subs_epu16 (x, y), _mm512_setzero_si512 (), 0); +} + +__mmask32 +ff10 (__m512i x, __m512i y) +{ + return _mm512_cmple_epu16_mask (x, y); +} + +__mmask64 +ff11 (__m512i x, __m512i y) +{ + return _mm512_cmp_epi8_mask (_mm512_subs_epu8 (x, y), _mm512_setzero_si512 (), 0); +} + +__mmask64 +ff12 (__m512i x, __m512i y) +{ + return _mm512_cmple_epu8_mask (x, y); +} diff --git a/gcc/testsuite/gcc.target/i386/pr96906-1.c b/gcc/testsuite/gcc.target/i386/pr96906-1.c index 9d836eb2bdd..b1b41bf522d 100644 --- a/gcc/testsuite/gcc.target/i386/pr96906-1.c +++ b/gcc/testsuite/gcc.target/i386/pr96906-1.c @@ -1,6 +1,6 @@ /* PR target/96906 */ /* { dg-do compile } */ -/* { dg-options "-O2 -mavx2" } */ +/* { dg-options "-O2 -mavx2 -mno-avx512f" } */ /* { dg-final { scan-assembler-times "\tvpminub\[^\n\r]*xmm" 2 } } */ /* { dg-final { scan-assembler-times "\tvpminuw\[^\n\r]*xmm" 2 } } */ /* { dg-final { scan-assembler-times "\tvpminub\[^\n\r]*ymm" 2 } } */ -- 2.18.1