Re: [PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread Robin Dapp
> By the way, could you mention this PR:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
> and add the test from this PR?

Commented in that PR.  This patch does not help there.

Regards
 Robin


Re: Re: [PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread 钟居哲

By the way, could you mention this PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
and add the test from this PR?

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-18 21:51
To: juzhe.zh...@rivai.ai; gcc-patches; palmer; kito.cheng; jeffreyalaw
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add popcount fallback expander.

I didn't push this yet because it would have introduced an UNRESOLVED that
my summary script didn't catch.  Normally I go with just contrib/test_summary
but that only filters out FAIL and XPASS.  I should really be using
compare_testsuite_log.py from riscv-gnu-toolchain/scripts.

It was caused by a typo:

-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp2" } } */

Regards
Robin


Re: [PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread Robin Dapp
I didn't push this yet because it would have introduced an UNRESOLVED that
my summary script didn't catch.  Normally I go with just contrib/test_summary
but that only filters out FAIL and XPASS.  I should really be using
compare_testsuite_log.py from riscv-gnu-toolchain/scripts.
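
To illustrate the gap described above, here is a minimal C sketch of a summary filter that also flags UNRESOLVED results.  It assumes DejaGnu .sum lines of the form "STATUS: test name"; the function names are hypothetical and this is not how contrib/test_summary or compare_testsuite_log.py are implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Result kinds worth flagging when summarizing a testsuite run.
   Assumption: each result line in a .sum file starts with "STATUS: ".  */
static const char *const flagged[] = { "FAIL: ", "XPASS: ", "UNRESOLVED: " };

/* Return 1 if LINE reports a result that should appear in a summary.
   "XFAIL: " and "PASS: " lines do not match because the comparison is
   anchored at the start of the line.  */
int
is_flagged_result (const char *line)
{
  for (size_t i = 0; i < sizeof flagged / sizeof flagged[0]; i++)
    if (strncmp (line, flagged[i], strlen (flagged[i])) == 0)
      return 1;
  return 0;
}

/* Count flagged results in the .sum file at PATH.  Return -1 if the file
   cannot be opened.  */
long
count_flagged_results (const char *path)
{
  assert (path != NULL);

  FILE *f = fopen (path, "r");
  if (!f)
    return -1;

  char buf[4096];
  long n = 0;
  while (fgets (buf, sizeof buf, f))
    n += is_flagged_result (buf);

  fclose (f);
  return n;
}
```

Comparing the count from a baseline run with the count from a patched run would have exposed the extra UNRESOLVED entry.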

It was caused by a typo:

-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp2" } } */

Regards
 Robin


[PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread juzhe.zh...@rivai.ai
LGTM popcount patch.



juzhe.zh...@rivai.ai


Re: [PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread Robin Dapp
> I saw you didn't extend VI -> V_VLSI.  I guess SLP will fail on popcount.

Added VLS modes and your test in v2.
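
For context, SLP looks at groups of independent, isomorphic statements such as the block below, and fixed-length (VLS) vector modes are what let a group of exactly eight elements map onto one vector operation.  This is only an illustrative sketch: the contents of the test added in v2 are not shown in this thread, the names are hypothetical, and `__builtin_popcount` is assumed to be available (GCC or Clang).  Whether SLP actually triggers depends on the target and flags.

```c
#include <assert.h>
#include <stdint.h>

uint32_t src[8];
uint32_t dst[8];

/* Eight independent popcounts on adjacent elements: a candidate group for
   SLP vectorization.  */
void
popcount_block (void)
{
  dst[0] = __builtin_popcount (src[0]);
  dst[1] = __builtin_popcount (src[1]);
  dst[2] = __builtin_popcount (src[2]);
  dst[3] = __builtin_popcount (src[3]);
  dst[4] = __builtin_popcount (src[4]);
  dst[5] = __builtin_popcount (src[5]);
  dst[6] = __builtin_popcount (src[6]);
  dst[7] = __builtin_popcount (src[7]);
}

/* Bit-by-bit reference, used only to cross-check the block above.  */
static unsigned
popcount_ref (uint32_t x)
{
  unsigned n = 0;
  for (; x; x >>= 1)
    n += x & 1;
  return n;
}

/* Run the block and compare every element against the reference.  */
void
check_popcount_block (void)
{
  popcount_block ();
  for (int i = 0; i < 8; i++)
    assert (dst[i] == popcount_ref (src[i]));
}
```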

Testsuite looks unchanged on my side (vect, dg, rvv).

Regards
 Robin

Subject: [PATCH v2] RISC-V: Add popcount fallback expander.

I didn't manage to get back to the generic vectorizer fallback for
popcount so I figured I'd rather create a popcount fallback in the
riscv backend.  It uses the WWG algorithm from libgcc.

gcc/ChangeLog:

* config/riscv/autovec.md (popcount<mode>2): New expander.
* config/riscv/riscv-protos.h (expand_popcount): Define.
* config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount
with the WWG algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.
---
 gcc/config/riscv/autovec.md                   |   14 +
 gcc/config/riscv/riscv-protos.h               |    1 +
 gcc/config/riscv/riscv-v.cc                   |   71 +
 .../riscv/rvv/autovec/unop/popcount-1.c       |   20 +
 .../riscv/rvv/autovec/unop/popcount-2.c       |   19 +
 .../riscv/rvv/autovec/unop/popcount-run-1.c   |   49 +
 .../riscv/rvv/autovec/unop/popcount.c         | 1464 +
 7 files changed, 1638 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c5b1e52cbf9..80910ba3cc2 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1484,6 +1484,20 @@ (define_expand "xorsign<mode>3"
   DONE;
 })
 
+;; ---
+;; - [INT] POPCOUNT.
+;; ---
+
+(define_expand "popcount<mode>2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_popcount (operands);
+  DONE;
+})
+
+
 ;; -
 ;;  [INT] Highpart multiplication
 ;; -
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 49bdcdf2f93..4aeccdd961b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -515,6 +515,7 @@ void expand_fold_extract_last (rtx *);
 void expand_cond_unop (unsigned, rtx *);
 void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
+void expand_popcount (rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 21d86c3f917..8b594b7127e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4152,4 +4152,75 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
   emit_vec_cvt_x_f (op_0, op_1, UNARY_OP_FRM_RDN, vec_fp_mode);
 }
 
+/* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
+   well.  */
+void
+expand_popcount (rtx *ops)
+{
+  rtx dst = ops[0];
+  rtx src = ops[1];
+  machine_mode mode = GET_MODE (dst);
+  scalar_mode imode = GET_MODE_INNER (mode);
+  static const uint64_t m5 = 0x5555555555555555ULL;
+  static const uint64_t m3 = 0x3333333333333333ULL;
+  static const uint64_t mf = 0x0F0F0F0F0F0F0F0FULL;
+  static const uint64_t m1 = 0x0101010101010101ULL;
+
+  rtx x1 = gen_reg_rtx (mode);
+  rtx x2 = gen_reg_rtx (mode);
+  rtx x3 = gen_reg_rtx (mode);
+  rtx x4 = gen_reg_rtx (mode);
+
+  /* x1 = src - ((src >> 1) & 0x555...);  */
+  rtx shift1 = expand_binop (mode, lshr_optab, src, GEN_INT (1), NULL, true,
+                             OPTAB_DIRECT);
+
+  rtx and1 = gen_reg_rtx (mode);
+  rtx ops1[] = {and1, shift1, gen_int_mode (m5, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+                   ops1);
+
+  x1 = expand_binop (mode, sub_optab, src, and1, NULL, true, OPTAB_DIRECT);
+
+  /* x2 = (x1 & 0x3333333333333333ULL) + ((x1 >> 2) & 0x3333333333333333ULL);
+   */
+  rtx and2 = gen_reg_rtx (mode);
+  rtx ops2[] = {and2, x1, gen_int_mode (m3, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+                   ops2);
+
+  rtx shift2 = expand_binop (mode, lshr_optab, x1, GEN_INT (2), NULL, true,
+                             OPTAB_DIRECT);
+
+  rtx and22 = gen_reg_rtx (mode);
+  rtx ops22[] = {and22, shift2, gen_int_mode (m3, imode)};
+  

Re: [PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread Robin Dapp
> I saw you didn't extend VI -> V_VLSI.  I guess SLP will fail on
> popcount.
Hehe, right, I just copied and pasted the expander from my old
patch.  Will adjust it and add the test.

Regards
 Robin


[PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread Robin Dapp
Hi,

Since I didn't manage to get back to the generic vectorizer fallback for
popcount in time (the generic costing problem is still open), I figured
I'd rather implement the popcount fallback in the RISC-V backend.
It uses the Wilkes-Wheeler-Gill (WWG) algorithm from libgcc.
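
For readers unfamiliar with the algorithm, a scalar C model of one element may help.  This is a sketch under assumptions rather than the expander itself: the function name and the `bits` parameter are hypothetical, the mask truncation only mimics what `gen_int_mode (mask, imode)` does per element in the patch, and the final multiply-and-shift step follows the textbook formulation of the algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the Wilkes-Wheeler-Gill popcount for a single element
   that is BITS wide (8, 16, 32 or 64).  Bits of X above BITS are ignored.  */
unsigned
popcount_wwg (uint64_t x, unsigned bits)
{
  assert (bits == 8 || bits == 16 || bits == 32 || bits == 64);

  /* Truncate the 64-bit masks to the element width.  */
  const uint64_t ones = bits == 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
  const uint64_t m5 = UINT64_C (0x5555555555555555) & ones;
  const uint64_t m3 = UINT64_C (0x3333333333333333) & ones;
  const uint64_t mf = UINT64_C (0x0F0F0F0F0F0F0F0F) & ones;
  const uint64_t m1 = UINT64_C (0x0101010101010101) & ones;

  x &= ones;

  /* Each 2-bit field now holds the number of bits set in that field.  */
  uint64_t x1 = x - ((x >> 1) & m5);

  /* Add adjacent 2-bit counts into 4-bit fields.  */
  uint64_t x2 = (x1 & m3) + ((x1 >> 2) & m3);

  /* Add adjacent 4-bit counts into bytes.  */
  uint64_t x3 = (x2 + (x2 >> 4)) & mf;

  /* Multiplying by 0x0101... sums all byte counts into the top byte of the
     element, which is then shifted down.  */
  uint64_t x4 = ((x3 * m1) & ones) >> (bits - 8);

  return (unsigned) x4;
}
```

For example, `popcount_wwg (0xF0, 8)` yields 4 and `popcount_wwg (UINT64_MAX, 64)` yields 64.  The vector expander performs the same steps on every element at once, using vector shifts, ANDs with a scalar mask, an add, and a multiply.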

rvv.exp results are unchanged; the vect and dg.exp testsuites are currently
running.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (popcount<mode>2): New expander.
* config/riscv/riscv-protos.h (expand_popcount): Define.
* config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount
with the WWG algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.
---
 gcc/config/riscv/autovec.md                   |   14 +
 gcc/config/riscv/riscv-protos.h               |    1 +
 gcc/config/riscv/riscv-v.cc                   |   71 +
 .../riscv/rvv/autovec/unop/popcount-1.c       |   20 +
 .../riscv/rvv/autovec/unop/popcount-run-1.c   |   49 +
 .../riscv/rvv/autovec/unop/popcount.c         | 1464 +
 6 files changed, 1619 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c5b1e52cbf9..dfe836f705d 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1484,6 +1484,20 @@ (define_expand "xorsign<mode>3"
   DONE;
 })
 
+;; ---
+;; - [INT] POPCOUNT.
+;; ---
+
+(define_expand "popcount<mode>2"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_popcount (operands);
+  DONE;
+})
+
+
 ;; -
 ;;  [INT] Highpart multiplication
 ;; -
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 49bdcdf2f93..4aeccdd961b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -515,6 +515,7 @@ void expand_fold_extract_last (rtx *);
 void expand_cond_unop (unsigned, rtx *);
 void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
+void expand_popcount (rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 21d86c3f917..8b594b7127e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4152,4 +4152,75 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
   emit_vec_cvt_x_f (op_0, op_1, UNARY_OP_FRM_RDN, vec_fp_mode);
 }
 
+/* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
+   well.  */
+void
+expand_popcount (rtx *ops)
+{
+  rtx dst = ops[0];
+  rtx src = ops[1];
+  machine_mode mode = GET_MODE (dst);
+  scalar_mode imode = GET_MODE_INNER (mode);
+  static const uint64_t m5 = 0x5555555555555555ULL;
+  static const uint64_t m3 = 0x3333333333333333ULL;
+  static const uint64_t mf = 0x0F0F0F0F0F0F0F0FULL;
+  static const uint64_t m1 = 0x0101010101010101ULL;
+
+  rtx x1 = gen_reg_rtx (mode);
+  rtx x2 = gen_reg_rtx (mode);
+  rtx x3 = gen_reg_rtx (mode);
+  rtx x4 = gen_reg_rtx (mode);
+
+  /* x1 = src - ((src >> 1) & 0x555...);  */
+  rtx shift1 = expand_binop (mode, lshr_optab, src, GEN_INT (1), NULL, true,
+                             OPTAB_DIRECT);
+
+  rtx and1 = gen_reg_rtx (mode);
+  rtx ops1[] = {and1, shift1, gen_int_mode (m5, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+                   ops1);
+
+  x1 = expand_binop (mode, sub_optab, src, and1, NULL, true, OPTAB_DIRECT);
+
+  /* x2 = (x1 & 0x3333333333333333ULL) + ((x1 >> 2) & 0x3333333333333333ULL);
+   */
+  rtx and2 = gen_reg_rtx (mode);
+  rtx ops2[] = {and2, x1, gen_int_mode (m3, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+                   ops2);
+
+  rtx shift2 = expand_binop (mode, lshr_optab, x1, GEN_INT (2), NULL, true,
+                             OPTAB_DIRECT);
+
+  rtx and22 = gen_reg_rtx (mode);
+  rtx ops22[] = {and22, shift2, gen_int_mode (m3, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+                   ops22);
+
+  x2 = expand_binop (mode, add_optab, and2, and22, NULL, true, OPTAB_DIRECT);
+
+  /* x3 = (x2 + (x2 >> 4)) & 0x0f0f0f0f0f0f0f0fULL;  */
+  rtx shift3 = expand_binop (mode, lshr_optab, x2, GEN_INT (4), NULL, true,
+