Re: [PATCH][PR target/97770] x86: Add missing popcount<mode>2 expander

2020-11-30 Thread Hongyu Wang via Gcc-patches
> OK.  Presumably once this is applied Richi is going to look at the
> higher level issues in the vectorizer which inhibit creating the HI/QI
> vector popcounts?
>

Yes, this is the prerequisite for looking at the vectorization issue. I'll
ask Hongtao to help check in this patch. Thanks for the approval.

Jeff Law wrote on Tue, Dec 1, 2020 at 12:17 AM:

> OK.  Presumably once this is applied Richi is going to look at the
> higher level issues in the vectorizer which inhibit creating the HI/QI
> vector popcounts?
>
> Jeff
>


Re: [PATCH][PR target/97770] x86: Add missing popcount<mode>2 expander

2020-11-30 Thread Jeff Law via Gcc-patches



On 11/11/20 6:54 PM, Hongyu Wang via Gcc-patches wrote:
> Hi,
>
> According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770, the x86
> backend needs a popcount<mode>2 expander so that __builtin_popcount can be
> auto-vectorized for AVX512BITALG/AVX512VPOPCNTDQ targets.
>
> For DImode the middle-end vectorizer cannot generate the expected code,
> and for QI/HImode there is no corresponding IFN, so xfails are added for
> these tests.
>
> Bootstrap/regression test for x86 backend is OK.
>
> OK for master?
>
> gcc/ChangeLog
>
> PR target/97770
> * gcc/config/i386/sse.md (popcount<mode>2): New expander
> for SI/DI vector modes.
> (popcount<mode>2): Likewise for QI/HI vector modes.
>
> gcc/testsuite/ChangeLog
>
> PR target/97770
> * gcc.target/i386/avx512bitalg-pr97770-1.c: New test.
> * gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Likewise.
> * gcc.target/i386/avx512vpopcntdq-pr97770-2.c: Likewise.
> * gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c: Likewise.
>
>
> 0001-Add-popcount-mode-expander-to-enable-popcount-auto-v.patch
>
OK.  Presumably once this is applied Richi is going to look at the
higher level issues in the vectorizer which inhibit creating the HI/QI
vector popcounts?

Jeff
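
The thread does not spell out what the higher-level vectorizer issues are. One
relevant fact is that __builtin_popcount takes an unsigned int, so a char or
short element is promoted before the bits are counted. The sketch below only
illustrates that promotion; the function names are hypothetical and it assumes
a 32-bit int.

```c
#include <assert.h>

/* Hypothetical helper: the argument is promoted to int (sign-extended)
   and then converted to unsigned int before the bits are counted.  */
int
popcount_after_promotion (signed char c)
{
  return __builtin_popcount (c);
}

/* Hypothetical helper: masks to the low 8 bits first, which is what a
   byte-wise vector popcount would compute per element.  */
int
popcount_of_byte (signed char c)
{
  return __builtin_popcount ((unsigned char) c);
}
```

With a 32-bit int, a negative byte counts the sign-extension bits in the first
helper but not in the second, so the two only agree for non-negative inputs.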



[PATCH][PR target/97770] x86: Add missing popcount<mode>2 expander

2020-11-11 Thread Hongyu Wang via Gcc-patches
Hi,

According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770, the x86
backend needs a popcount<mode>2 expander so that __builtin_popcount can be
auto-vectorized for AVX512BITALG/AVX512VPOPCNTDQ targets.

For DImode the middle-end vectorizer cannot generate the expected code,
and for QI/HImode there is no corresponding IFN, so xfails are added for
these tests.
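
As a rough illustration of the kind of loop the SImode expander is meant to
cover, here is a minimal sketch. The function names, the element count of 16
and the self-check are assumptions for illustration and are not taken from the
patch. Whether a given compiler actually emits vpopcntd for it (for example
with -O3 -mavx512vpopcntdq -mavx512vl) depends on the GCC version.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: element-wise popcount over 32-bit elements.  */
void
popcount_u32x16 (uint32_t *restrict dest, const uint32_t *restrict src)
{
  for (int i = 0; i != 16; i++)
    dest[i] = (uint32_t) __builtin_popcount (src[i]);
}

/* Portable reference: clears the lowest set bit until none remain.  */
static unsigned
popcount_ref (uint32_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* Returns 1 if the loop above agrees with the reference on 16 inputs.  */
int
popcount_u32x16_selftest (void)
{
  uint32_t src[16], dest[16];
  for (int i = 0; i != 16; i++)
    src[i] = 0x9E3779B9u * (uint32_t) (i + 1);
  popcount_u32x16 (dest, src);
  for (int i = 0; i != 16; i++)
    assert (dest[i] == popcount_ref (src[i]));
  return 1;
}
```

The self-check only confirms the scalar semantics; checking for the vector
instruction still means inspecting the generated assembly.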

Bootstrap/regression test for x86 backend is OK.

OK for master?

gcc/ChangeLog

PR target/97770
* gcc/config/i386/sse.md (popcount<mode>2): New expander
for SI/DI vector modes.
(popcount<mode>2): Likewise for QI/HI vector modes.

gcc/testsuite/ChangeLog

PR target/97770
* gcc.target/i386/avx512bitalg-pr97770-1.c: New test.
* gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Likewise.
* gcc.target/i386/avx512vpopcntdq-pr97770-2.c: Likewise.
* gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c: Likewise.

-- 
Regards,

Hongyu, Wang
From b809052b0bab5d80dd0a1b1ffbd55faa8179a416 Mon Sep 17 00:00:00 2001
From: Hongyu Wang 
Date: Wed, 11 Nov 2020 09:41:13 +0800
Subject: [PATCH] Add popcount<mode> expander to enable popcount auto
 vectorization under AVX512BITALG/AVX512VPOPCNTDQ target.

gcc/ChangeLog

	PR target/97770
	* gcc/config/i386/sse.md (popcount<mode>2): New expander
	for SI/DI vector modes.
	(popcount<mode>2): Likewise for QI/HI vector modes.

gcc/testsuite/ChangeLog

	PR target/97770
	* gcc.target/i386/avx512bitalg-pr97770-1.c: New test.
	* gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Likewise.
	* gcc.target/i386/avx512vpopcntdq-pr97770-2.c: Likewise.
	* gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c: Likewise.
---
 gcc/config/i386/sse.md| 12 
 .../gcc.target/i386/avx512bitalg-pr97770-1.c  | 60 ++
 .../i386/avx512vpopcntdq-pr97770-1.c  | 63 +++
 .../i386/avx512vpopcntdq-pr97770-2.c  | 39 
 .../i386/avx512vpopcntdqvl-pr97770-1.c| 14 +
 5 files changed, 188 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8437ad27087..8566b2ccda2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22678,6 +22678,12 @@ (define_insn "avx5124vnniw_vp4dpwssds_maskz"
 (set_attr ("prefix") ("evex"))
 (set_attr ("mode") ("TI"))])
 
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
+	(popcount:VI48_AVX512VL
+	  (match_operand:VI48_AVX512VL 1 "nonimmediate_operand")))]
+  "TARGET_AVX512VPOPCNTDQ")
+
 (define_insn "vpopcount<mode><mask_name>"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
 	(popcount:VI48_AVX512VL
@@ -22722,6 +22728,12 @@ (define_insn "*restore_multiple_leave_return"
   "TARGET_SSE && TARGET_64BIT"
   "jmp\t%P1")
 
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
+	(popcount:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512BITALG")
+
 (define_insn "vpopcount<mode><mask_name>"
   [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
 	(popcount:VI12_AVX512VL
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
new file mode 100644
index 00000000000..c83a477045c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
@@ -0,0 +1,60 @@
+/* PR target/97770 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bitalg -mavx512vl -mprefer-vector-width=512" } */
+/* Add xfail since no IFN for QI/HImode popcount */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
+
+#include <immintrin.h>
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountb_128 (char * __restrict dest, char* src)
+{
+  for (int i = 0; i != 16; i++)
+    dest[i] = __builtin_popcount (src[i]);
+}
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountw_128 (short* __restrict dest, short* src)
+{
+  for (int i = 0; i != 8; i++)
+    dest[i] = __builtin_popcount (src[i]);
+}
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountb_256 (char * __restrict dest, char* src)
+{
+  for (int i = 0; i != 32; i++)
+    dest[i] = __builtin_popcount (src[i]);
+}
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountw_256 (short* __re