Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]

2020-12-02 Thread Hongtao Liu via Gcc-patches
On Thu, Dec 3, 2020 at 2:22 AM Jakub Jelinek  wrote:
>
> On Tue, Dec 01, 2020 at 12:49:03PM +0800, Hongtao Liu via Gcc-patches wrote:
> > +bool neq_p = INTVAL (operands[4]) >> 2;
> > +/* LE: 2, NLT: 5, NLE: 6, LT: 1  */
> > +rtx cmp_predicate = neq_p ? GEN_INT (6) : GEN_INT (2);
> > +if (MEM_P (operands[1]))
> > +  {
> > + std::swap (operands[1], operands[2]);
> > + cmp_predicate = neq_p ? GEN_INT (1) : GEN_INT (5);
> > +  }
> > +emit_insn (gen__ucmp3 (operands[0], operands[1],
> > + operands[2], cmp_predicate));
>
> I'd suggest instead:
> +/* LE: 2, NLT: 5, NLE: 6, LT: 1  */
> +int cmp_predicate = 2; /* LE  */
> +if (MEM_P (operands[1]))
> +  {
> +   std::swap (operands[1], operands[2]);
> +   cmp_predicate = 5; /* NLT (GE)  */
> +  }
> +if ((INTVAL (operands[4]) & 4) != 0)
> +  cmp_predictate ^= 4; /* Invert the comparison to NLE (GT) or LT.  */
> +emit_insn (gen__ucmp3 (operands[0], operands[1], 
> operands[2],
> +GEN_INT (cmp_predicate)));
> so that you don't create the rtx CONST_INTs in 4 places and don't do that
> unnecessarily when you will need another constant.
Thanks for the review,committed.
>
> Otherwise LGTM, thanks.
>
> Jakub
>


-- 
BR,
Hongtao


Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]

2020-12-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Dec 01, 2020 at 12:49:03PM +0800, Hongtao Liu via Gcc-patches wrote:
> +bool neq_p = INTVAL (operands[4]) >> 2;
> +/* LE: 2, NLT: 5, NLE: 6, LT: 1  */
> +rtx cmp_predicate = neq_p ? GEN_INT (6) : GEN_INT (2);
> +if (MEM_P (operands[1]))
> +  {
> + std::swap (operands[1], operands[2]);
> + cmp_predicate = neq_p ? GEN_INT (1) : GEN_INT (5);
> +  }
> +emit_insn (gen__ucmp3 (operands[0], operands[1],
> + operands[2], cmp_predicate));

I'd suggest instead:
+/* LE: 2, NLT: 5, NLE: 6, LT: 1  */
+int cmp_predicate = 2; /* LE  */
+if (MEM_P (operands[1]))
+  {
+   std::swap (operands[1], operands[2]);
+   cmp_predicate = 5; /* NLT (GE)  */
+  }
+if ((INTVAL (operands[4]) & 4) != 0)
+  cmp_predictate ^= 4; /* Invert the comparison to NLE (GT) or LT.  */
+emit_insn (gen__ucmp3 (operands[0], operands[1], operands[2],
+GEN_INT (cmp_predicate)));
so that you don't create the rtx CONST_INTs in 4 places and don't do that
unnecessarily when you will need another constant.

Otherwise LGTM, thanks.

Jakub



Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]

2020-11-30 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 30, 2020 at 9:46 PM Jakub Jelinek  wrote:
>
> On Mon, Nov 30, 2020 at 09:11:10PM +0800, Hongtao Liu wrote:
> > +;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw.
> > +(define_split
> > +  [(set (match_operand: 0 "register_operand")
> > +(unspec:
> > +  [(us_minus:VI12_AVX512VL
> > + (match_operand:VI12_AVX512VL 1 "vector_operand")
> > + (match_operand:VI12_AVX512VL 2 "vector_operand"))
> > +   (match_operand:VI12_AVX512VL 3 "const0_operand")
> > +   (match_operand:SI 4 "const0_operand")]
> > +  UNSPEC_PCMP))]
> > +  "TARGET_AVX512BW && ix86_binary_operator_ok (US_MINUS, mode, 
> > operands)"
>
> Too long line, please wrap it.
> Also, INTVAL (operands[4]) == 0 is EQ comparison, can't we handle also
> NE (i.e. INTVAL (operands[4]) == 4?
> I.e. replace the "const0_operand" in there with "const_0_to_7_operand"
> and check in conditions that (INTVAL (operands[4]) & 3) == 0.
>
> > +  [(const_int 0)]
> > +  {
> > +/* LE: 2, NLT: 5.  */
> > +rtx cmp_predicate = GEN_INT (2);
> > +if (MEM_P (operands[1]))
> > +  {
> > +std::swap (operands[1], operands[2]);
> > +cmp_predicate = GEN_INT (5);
>
> For INTVAL (operands[4]) == 4 it would then be cmp_predictate NLE: 4 resp.
> LT: 3 I think.
>
> Also, this handles only UNSPEC_PCMP, can't we handle UNSPEC_UNSIGNED_PCMP
> too?  I mean, for equality comparisons it doesn't really matter if we have
> signed or unsigned == or !=.  And for unsigned
> x == 0U is equivalent to x <= 0U, and x != 0U equivalent to x > 0U.
>
> Jakub
>

Yes, Update patch.

+(define_int_iterator UNSPEC_PCMP_ITER
+  [UNSPEC_PCMP UNSPEC_UNSIGNED_PCMP])
+
+(define_int_attr pcmp_signed_mask
+  [(UNSPEC_PCMP "3") (UNSPEC_UNSIGNED_PCMP "1")])
+
+;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw.
+;; For signed comparison, handle EQ 0: NEQ 4,
+;; for unsigned comparison extra handle LE:2, NLE:6, equivalent to EQ and NEQ.
+
+(define_split
+  [(set (match_operand: 0 "register_operand")
+   (unspec:
+ [(us_minus:VI12_AVX512VL
+(match_operand:VI12_AVX512VL 1 "vector_operand")
+(match_operand:VI12_AVX512VL 2 "vector_operand"))
+  (match_operand:VI12_AVX512VL 3 "const0_operand")
+  (match_operand:SI 4 "const_0_to_7_operand")]
+ UNSPEC_PCMP_ITER))]
+  "TARGET_AVX512BW
+  && ix86_binary_operator_ok (US_MINUS, mode, operands)
+  && (INTVAL (operands[4]) & ) == 0"
+  [(const_int 0)]
+  {
+bool neq_p = INTVAL (operands[4]) >> 2;
+/* LE: 2, NLT: 5, NLE: 6, LT: 1  */
+rtx cmp_predicate = neq_p ? GEN_INT (6) : GEN_INT (2);
+if (MEM_P (operands[1]))
+  {
+   std::swap (operands[1], operands[2]);
+   cmp_predicate = neq_p ? GEN_INT (1) : GEN_INT (5);
+  }
+emit_insn (gen__ucmp3 (operands[0], operands[1],
+   operands[2], cmp_predicate));
+DONE;
+  })
+


-- 
BR,
Hongtao
From e3eb61066ee665325cba8e231b991f9a1dda07df Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Mon, 30 Nov 2020 13:27:16 +0800
Subject: [PATCH]  Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnleuw
 [PR96906]

For signed comparisons, it handles cases that are eq or neq to 0.
For unsigned comparisons, it additionaly handles cases that are le or
gt to 0(equivilent to eq or neq to 0). Transform case eq to leu,
case neq to gtu.

.i.e. for -mavx512bw -mavx512vl transform eq case code from

	vpsubusw%xmm1, %xmm0, %xmm0
	vpxor   %xmm1, %xmm1, %xmm1
	vpcmpeqw  %xmm1, %xmm0, %k0
to
	vpcmpleuw   %xmm1, %xmm0, %k0

.i.e. for -mavx512bw -mavx512vl transform neq case code from

	vpsubusw%xmm1, %xmm0, %xmm0
	vpxor   %xmm1, %xmm1, %xmm1
	vpcmpneqw  %xmm1, %xmm0, %k0
to
	vpcmpnleuw   %xmm1, %xmm0, %k0

gcc/ChangeLog
	PR target/96906
	* config/i386/sse.md
	(_ucmp3): Add a new
	define_split after this insn.

gcc/testsuite/ChangeLog

	* gcc.target/i386/avx512bw-pr96906-1.c: New test.
	* gcc.target/i386/pr96906-1.c: Add -mno-avx512f.
---
 gcc/config/i386/sse.md| 37 ++
 .../gcc.target/i386/avx512bw-pr96906-1.c  | 68 +++
 gcc/testsuite/gcc.target/i386/pr96906-1.c |  2 +-
 3 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4aad462f882..7a4dafea1ed 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3006,6 +3006,43 @@ (define_insn "_ucmp3"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+(define_int_iterator UNSPEC_PCMP_ITER
+  [UNSPEC_PCMP UNSPEC_UNSIGNED_PCMP])
+
+(define_int_attr pcmp_signed_mask
+  [(UNSPEC_PCMP "3") (UNSPEC_UNSIGNED_PCMP "1")])
+
+;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw.
+;; For signed comparison, handle EQ 0: NEQ 4,
+;; for unsigned comparison extra handle LE:2, NLE:6, equivalent to EQ 

Re: [PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]

2020-11-30 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 30, 2020 at 09:11:10PM +0800, Hongtao Liu wrote:
> +;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw.
> +(define_split
> +  [(set (match_operand: 0 "register_operand")
> +(unspec:
> +  [(us_minus:VI12_AVX512VL
> + (match_operand:VI12_AVX512VL 1 "vector_operand")
> + (match_operand:VI12_AVX512VL 2 "vector_operand"))
> +   (match_operand:VI12_AVX512VL 3 "const0_operand")
> +   (match_operand:SI 4 "const0_operand")]
> +  UNSPEC_PCMP))]
> +  "TARGET_AVX512BW && ix86_binary_operator_ok (US_MINUS, mode, 
> operands)"

Too long line, please wrap it.
Also, INTVAL (operands[4]) == 0 is EQ comparison, can't we handle also
NE (i.e. INTVAL (operands[4]) == 4?
I.e. replace the "const0_operand" in there with "const_0_to_7_operand"
and check in conditions that (INTVAL (operands[4]) & 3) == 0.

> +  [(const_int 0)]
> +  {
> +/* LE: 2, NLT: 5.  */
> +rtx cmp_predicate = GEN_INT (2);
> +if (MEM_P (operands[1]))
> +  {
> +std::swap (operands[1], operands[2]);
> +cmp_predicate = GEN_INT (5);

For INTVAL (operands[4]) == 4 it would then be cmp_predictate NLE: 4 resp.
LT: 3 I think.

Also, this handles only UNSPEC_PCMP, can't we handle UNSPEC_UNSIGNED_PCMP
too?  I mean, for equality comparisons it doesn't really matter if we have
signed or unsigned == or !=.  And for unsigned
x == 0U is equivalent to x <= 0U, and x != 0U equivalent to x > 0U.

Jakub



[PATCH] i386: Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw[PR96906]

2020-11-30 Thread Hongtao Liu via Gcc-patches
Hi:
  This patch is quite similar like what jakub did in
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/560151.html
but for target avx512bw.

.i.e. for -mavx512bw -mavx512vl transform code from

vpsubusw%xmm1, %xmm0, %xmm0
vpxor   %xmm1, %xmm1, %xmm1
vpcmpw  $0, %xmm1, %xmm0, %k0
to
vpcmpleuw   %xmm1, %xmm0, %k0

   Bootstrapped/regtested on x86_64-linux is ok.

gcc/ChangeLog
PR target/96906
 * config/i386/sse.md
(_ucmp3): Add a new
define_split after this insn.

gcc/testsuite/ChangeLog

* gcc.target/i386/avx512bw-pr96906-1.c: New test.
* gcc.target/i386/pr96906-1.c: Add -mno-avx512f.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4aad462f882..eebc3750584 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3006,6 +3006,30 @@ (define_insn
"_ucmp3"
(set_attr "prefix" "evex")
(set_attr "mode" "")])

+;; PR96906 - optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnltuw.
+(define_split
+  [(set (match_operand: 0 "register_operand")
+(unspec:
+  [(us_minus:VI12_AVX512VL
+ (match_operand:VI12_AVX512VL 1 "vector_operand")
+ (match_operand:VI12_AVX512VL 2 "vector_operand"))
+   (match_operand:VI12_AVX512VL 3 "const0_operand")
+   (match_operand:SI 4 "const0_operand")]
+  UNSPEC_PCMP))]
+  "TARGET_AVX512BW && ix86_binary_operator_ok (US_MINUS, mode, operands)"
+  [(const_int 0)]
+  {
+/* LE: 2, NLT: 5.  */
+rtx cmp_predicate = GEN_INT (2);
+if (MEM_P (operands[1]))
+  {
+std::swap (operands[1], operands[2]);
+cmp_predicate = GEN_INT (5);
+  }
+emit_insn (gen__ucmp3 (operands[0], operands[1],
+operands[2], cmp_predicate));
+DONE;
+  })
+
 (define_insn "avx512f_vmcmp3"
   [(set (match_operand: 0 "register_operand" "=k")
 (and:
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c
b/gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c
new file mode 100644
index 000..ae7ec7abed1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96906-1.c
@@ -0,0 +1,80 @@
+/* PR target/96906 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpcmpub[ \t]*\$2} 6 } } */
+/* { dg-final { scan-assembler-times {(?n)vpcmpuw[ \t]*\$2} 6 } } */
+
+
+#include
+
+__mmask8
+ff1 (__m128i x, __m128i y)
+{
+  return _mm_cmp_epi16_mask (_mm_subs_epu16 (x, y), _mm_setzero_si128 (), 0);
+}
+
+__mmask8
+ff2 (__m128i x, __m128i y)
+{
+  return _mm_cmple_epu16_mask (x, y);
+}
+
+__mmask16
+ff3 (__m128i x, __m128i y)
+{
+  return _mm_cmp_epi8_mask (_mm_subs_epu8 (x, y), _mm_setzero_si128 (), 0);
+}
+
+__mmask16
+ff4 (__m128i x, __m128i y)
+{
+  return _mm_cmple_epu8_mask (x, y);
+}
+
+__mmask16
+ff5 (__m256i x, __m256i y)
+{
+  return _mm256_cmp_epi16_mask (_mm256_subs_epu16 (x, y),
_mm256_setzero_si256 (), 0);
+}
+
+__mmask16
+ff6 (__m256i x, __m256i y)
+{
+  return _mm256_cmple_epu16_mask (x, y);
+}
+
+__mmask32
+ff7 (__m256i x, __m256i y)
+{
+  return _mm256_cmp_epi8_mask (_mm256_subs_epu8 (x, y),
_mm256_setzero_si256 (), 0);
+}
+
+__mmask32
+ff8 (__m256i x, __m256i y)
+{
+  return _mm256_cmple_epu8_mask (x, y);
+}
+
+__mmask32
+ff9 (__m512i x, __m512i y)
+{
+  return _mm512_cmp_epi16_mask (_mm512_subs_epu16 (x, y),
_mm512_setzero_si512 (), 0);
+}
+
+__mmask32
+ff10 (__m512i x, __m512i y)
+{
+  return _mm512_cmple_epu16_mask (x, y);
+}
+
+__mmask64
+ff11 (__m512i x, __m512i y)
+{
+  return _mm512_cmp_epi8_mask (_mm512_subs_epu8 (x, y),
_mm512_setzero_si512 (), 0);
+}
+
+__mmask64
+ff12 (__m512i x, __m512i y)
+{
+  return _mm512_cmple_epu8_mask (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr96906-1.c
b/gcc/testsuite/gcc.target/i386/pr96906-1.c
index 9d836eb2bdd..b1b41bf522d 100644
--- a/gcc/testsuite/gcc.target/i386/pr96906-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr96906-1.c
@@ -1,6 +1,6 @@
 /* PR target/96906 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -mavx2" } */
+/* { dg-options "-O2 -mavx2 -mno-avx512f" } */
 /* { dg-final { scan-assembler-times "\tvpminub\[^\n\r]*xmm" 2 } } */
 /* { dg-final { scan-assembler-times "\tvpminuw\[^\n\r]*xmm" 2 } } */
 /* { dg-final { scan-assembler-times "\tvpminub\[^\n\r]*ymm" 2 } } */
-- 
2.18.1