from:"钟居哲"

Re: [RISC-V] Combining vfmv and .vv instructions into a .vf instruction

2024-07-24 Thread 钟居哲

I think Demin is working on it. And Robin is reviewer of this stuff.

juzhe.zh...@rivai.ai

From: Artemiy Volkov
Date: 2024-07-25 01:25
To: juzhe.zh...@rivai.ai; demin@starfivetech.com; jeffreya...@gmail.com
CC: gcc@gcc.gnu.org
Subject: [RISC-V] Combining vfmv and .vv instructions into a .vf instruction
Hi Juzhe, Demin, Jeff,

This email is intended to continue the discussion started in
https://marc.info/?l=gcc-patches&m=170927452922009&w=2 about combining vfmv.v.f
and vfmxx.vv instructions into the scalar-vector form vfmxx.vf.

That thread mentioned that the late-combine pass (added last month in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792f97b44ffc5e6a967292b3747fd835e99396e7)
might be useful for this transformation. However, when I tried it out with my
testcase at https://godbolt.org/z/o8oPzo7qY, I found it unable to handle these
complex post-split1 patterns for broadcast and vfmacc:

(insn 129 128 130 3 (set (reg:RVVM4SF 168 [ _61 ])
(if_then_else:RVVM4SF (unspec:RVVMF8BI [
(const_vector:RVVMF8BI [
(const_int 1 [0x1]) repeated x16
])
(const_int 16 [0x10])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(vec_duplicate:RVVM4SF (mem:SF (reg:SI 143 [ ivtmp.21 ]) [1 MEM[(float *)_145]+0 S4 A32]))
(unspec:RVVM4SF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":19:53 4019 {*pred_broadcastrvvm4sf_zvfh}
 (nil))
[ ... ]
(insn 131 130 34 3 (set (reg:RVVM4SF 139 [ D__lsm.10 ])
(if_then_else:RVVM4SF (unspec:RVVMF8BI [
(const_vector:RVVMF8BI [
(const_int 1 [0x1]) repeated x16
])
(const_int 16 [0x10])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(const_int 7 [0x7])
(reg:SI 66 vl)
(reg:SI 67 vtype)
(reg:SI 69 frm)
] UNSPEC_VPREDICATE)
(plus:RVVM4SF (mult:RVVM4SF (reg/v:RVVM4SF 135 [ row ])
(reg:RVVM4SF 168 [ _61 ]))
(reg:RVVM4SF 139 [ D__lsm.10 ]))
(unspec:RVVM4SF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":19:36 15007 {*pred_mul_addrvvm4sf_undef}
 (nil))

I'm no expert on this, but what's stopping us from adding some vector-scalar
split patterns alongside vector-vector ones in autovec.md to fix this? For
instance, the addition of fma4_scalar insn_and_split like this:

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d5793ac..bf54d71 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1229,2 +1229,22 @@

+(define_insn_and_split "fma4_scalar"
+  [(set (match_operand:V_VLSF 0 "register_operand")
+(plus:V_VLSF
+ (mult:V_VLSF
+   (vec_duplicate:V_VLSF (match_operand:SF 1 "direct_broadcast_operand"))
+   (match_operand:V_VLSF 2 "register_operand"))
+ (match_operand:V_VLSF 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+ operands[0]};
+riscv_vector::emit_vlmax_insn (code_for_pred_mul_scalar (PLUS, mode),
+  riscv_vector::TERNARY_OP_FRM_DYN, ops);
+DONE;
+  }
+  [(set_attr "type" "vector")])
+
;; -

does lead to vfmacc.vf instructions being emitted instead of vfmacc.vv's for the
testcase linked above.
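
For readers who cannot open the Godbolt link: the actual testcase is not reproduced in this message, but a loop of roughly the following shape is the kind of source where a scalar is broadcast and then fed into a multiply-add. This is only a hypothetical sketch; the function name `fma_scalar` and its signature are illustrative assumptions, not the code behind the link.

```c
#include <stddef.h>

/* Hypothetical kernel (not the linked testcase): acc[i] += row[i] * s.
   When vectorized, the scalar s is broadcast to a vector; the hope is
   that the broadcast and the multiply-add end up as a single
   vector-scalar instruction instead of a broadcast plus a
   vector-vector one.  */
void
fma_scalar (float *acc, const float *row, float s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] += row[i] * s;
}
```

Whether a given compiler revision actually emits the .vf form for this loop depends on the patterns discussed in this thread, so the sketch only shows the source-level shape.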

What do you think about this approach to implement this optimization? Am I
missing anything important? Maybe split1 is too early to determine the final
instruction format (.vf vs .vv) and we should strive to recombine during
late-combine2?

Also, is there anyone working on this optimization at the present moment?

Many thanks in advance,
Artemiy

Re: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-03 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-07-03 22:17
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]
From: Pan Li 
 
According to the ISA, the zvfhmin sub-extension should only contain
conversion insns.  Thus, a vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.
 
This patch fixes it by splitting the pred_broadcast define_insn
into zvfhmin and zvfh parts.  Given the example below:
 
void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}
 
when compiled with -march=rv64gcv_zfh_zvfhmin
 
Before this patch:
test:
  vsetivli zero,2,e16,mf4,ta,ma
  vfmv.v.f v1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret
 
After this patch:
test:
  addi sp,sp,-16
  fsh  fa0,14(sp)
  addi a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi sp,sp,16
  jr   ra
 
PR target/115763
 
gcc/ChangeLog:
 
* config/riscv/vector.md (*pred_broadcast): Split into
zvfh and zvfhmin part.
(*pred_broadcast_zvfh): New define_insn for zvfh part.
(*pred_broadcast_zvfhmin): Ditto but for zvfhmin.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/vector.md| 49 +--
.../gcc.target/riscv/rvv/base/pr115763-1.c|  9 
.../gcc.target/riscv/rvv/base/pr115763-2.c| 10 
.../gcc.target/riscv/rvv/base/scalar_move-5.c |  4 +-
.../gcc.target/riscv/rvv/base/scalar_move-6.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-7.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-8.c |  6 +--
7 files changed, 64 insertions(+), 26 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-2.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fe18ee5b5f7..d9474262d54 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2080,31 +2080,50 @@ (define_insn_and_split "*pred_broadcast"
   [(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
(set_attr "mode" "")])
-(define_insn "*pred_broadcast"
-  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr, vr, vr, vr, vr, vr, vr, vr")
- (if_then_else:V_VLSF_ZVFHMIN
+(define_insn "*pred_broadcast_zvfh"
+  [(set (match_operand:V_VLSF 0 "register_operand"  "=vr,  vr,  vr,  vr")
+ (if_then_else:V_VLSF
  (unspec:
- [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1,Wc1, vm, vm,Wc1,Wc1,Wb1,Wb1")
-  (match_operand 4 "vector_length_operand"  " rK, rK, rK, rK, rK, rK, rK, rK")
-  (match_operand 5 "const_int_operand"  "  i,  i,  i,  i,  i,  i,  i,  i")
-  (match_operand 6 "const_int_operand"  "  i,  i,  i,  i,  i,  i,  i,  i")
-  (match_operand 7 "const_int_operand"  "  i,  i,  i,  i,  i,  i,  i,  i")
+ [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1, Wc1, Wb1, Wb1")
+  (match_operand  4 "vector_length_operand" " rK,  rK,  rK,  rK")
+  (match_operand  5 "const_int_operand" "  i,   i,   i,   i")
+  (match_operand  6 "const_int_operand" "  i,   i,   i,   i")
+  (match_operand  7 "const_int_operand" "  i,   i,   i,   i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (vec_duplicate:V_VLSF_ZVFHMIN
- (match_operand: 3 "direct_broadcast_operand"   " f,  f,Wdm,Wdm,Wdm,Wdm,  f,  f"))
-   (match_operand:V_VLSF_ZVFHMIN 2 "vector_merge_operand" "vu,  0, vu,  0, vu,  0, vu,  0")))]
+   (vec_duplicate:V_VLSF
+ (match_operand: 3 "direct_broadcast_operand"  "  f,   f,   f,   f"))
+   (match_operand:V_VLSF  2 "vector_merge_operand"  " vu,   0,  vu,   0")))]
   "TARGET_VECTOR"
   "@
vfmv.v.f\t%0,%3
vfmv.v.f\t%0,%3
+   vfmv.s.f\t%0,%3
+   vfmv.s.f\t%0,%3"
+  [(set_attr "type" "vfmov,vfmov,vfmovfv,vfmovfv")
+   (set_attr "mode" "")])
+
+(define_insn "*pred_broadcast_zvfhmin"
+  [(set (match_operand:V_VLSF_ZVFHMIN   0 "register_operand"  "=vr,  vr,  vr,  vr")
+ (if_then_else:V_VLSF_ZVFHMIN
+   (unspec:
+ [(match_operand: 1 "vector_broadcast_mask_operand" " vm,  vm, Wc1, Wc1")
+  (match_operand 4 "vector_length_operand" " rK,  rK,  rK,  rK")
+  (match_operand 5 "const_int_operand" "  i,   i,   i,   i")
+  (match_operand 6 "const_int_operand"

Re: [PATCH v2] RISC-V: Remove integer vector eqne pattern

2024-06-20 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-06-20 11:28
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw; rdapp.gcc
Subject: [PATCH v2] RISC-V: Remove integer vector eqne pattern
We can unify eqne and other comparison operations.
 
Tested on RV32 and RV64.
 
gcc/ChangeLog:
 
* config/riscv/predicates.md (comparison_except_eqge_operator): Only
  exclude ge
(comparison_except_ge_operator): Ditto
* config/riscv/riscv-string.cc (expand_rawmemchr): Use cmp pattern
(expand_strcmp): Ditto
* config/riscv/riscv-vector-builtins-bases.cc: Remove eqne cond
* config/riscv/vector.md (@pred_eqne_scalar): Remove eqne
  patterns
(*pred_eqne_scalar_merge_tie_mask): Ditto
(*pred_eqne_scalar): Ditto
(*pred_eqne_scalar_narrow): Ditto
(*pred_eqne_extended_scalar_merge_tie_mask): Ditto
(*pred_eqne_extended_scalar): Ditto
(*pred_eqne_extended_scalar_narrow): Ditto
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/integer-cmp-eqne.c: New test.
 
Signed-off-by: demin.han 
---
v2 changes:
  1. add test
 
gcc/config/riscv/predicates.md|   4 +-
gcc/config/riscv/riscv-string.cc  |   4 +-
.../riscv/riscv-vector-builtins-bases.cc  |   3 -
gcc/config/riscv/vector.md| 279 +-
.../riscv/rvv/base/integer-cmp-eqne.c |  66 +
5 files changed, 81 insertions(+), 275 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer-cmp-eqne.c
 
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0fb5729fdcf..9971fabc587 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -568,8 +568,8 @@ (define_predicate "ltge_operator"
(define_predicate "comparison_except_ltge_operator"
   (match_code "eq,ne,le,leu,gt,gtu"))
-(define_predicate "comparison_except_eqge_operator"
-  (match_code "le,leu,gt,gtu,lt,ltu"))
+(define_predicate "comparison_except_ge_operator"
+  (match_code "eq,ne,le,leu,gt,gtu,lt,ltu"))
(define_predicate "ge_operator"
   (match_code "ge,geu"))
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 83e7afbd693..4702001bd9b 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1342,7 +1342,7 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx haystack, rtx needle,
   /* Compare needle with haystack and store in a mask.  */
   rtx eq = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, needle), vec);
   rtx vmsops[] = {mask, eq, vec, needle};
-  emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+  emit_nonvlmax_insn (code_for_pred_cmp_scalar (vmode),
  riscv_vector::COMPARE_OP, vmsops, cnt);
   /* Find the first bit in the mask.  */
@@ -1468,7 +1468,7 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
 = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, CONST0_RTX (mode)),
  vec1);
   rtx vmsops1[] = {mask0, eq0, vec1, CONST0_RTX (mode)};
-  emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+  emit_nonvlmax_insn (code_for_pred_cmp_scalar (vmode),
  riscv_vector::COMPARE_OP, vmsops1, cnt);
   /* Look for vec1 != vec2 (includes vec2[i] == 0).  */
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 596b88cc8a3..6483faba39c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -718,9 +718,6 @@ public:
  if (CODE == GE || CODE == GEU)
return e.use_compare_insn (CODE, code_for_pred_ge_scalar (
   e.vector_mode ()));
-   else if (CODE == EQ || CODE == NE)
- return e.use_compare_insn (CODE, code_for_pred_eqne_scalar (
-e.vector_mode ()));
  else
return e.use_compare_insn (CODE, code_for_pred_cmp_scalar (
   e.vector_mode ()));
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f8fae6557d9..fe18ee5b5f7 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4704,7 +4704,7 @@ (define_expand "@pred_cmp_scalar"
 (match_operand 8 "const_int_operand")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (match_operator: 3 "comparison_except_eqge_operator"
+   (match_operator: 3 "comparison_except_ge_operator"
 [(match_operand:V_VLSI_QHS 4 "register_operand")
  (vec_duplicate:V_VLSI_QHS
(match_operand: 5 "register_operand"))])
@@ -4722,7 +4722,7 @@ (define_insn "*pred_cmp_scalar_merge_tie_mask"
 (match_operand 7 "const_int_operand"  "  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (match_operator: 2 "comparison_except_eqge_operator"
+   (match_operator: 2 "comparison_except_ge_operator"
 [(match_operand:V_VLSI_QHS 3 "register_operand"   " vr")
  (vec_duplicate:V_VLSI_QHS
(match_operand: 4 "register_operand"  "  r"))])
@@ -4747,7 +4747,7 @@ (define_insn "*pred_cmp_scalar"
 (match_operand 8 "const_int_operand" "i,

Re: [PATCH v1] RISC-V: Refine the SAT_ARITH test help header files [NFC]

2024-06-14 Thread 钟居哲

LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-06-15 10:44
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Refine the SAT_ARITH test help header files [NFC]
From: Pan Li 

Separate the vector part of the code into a standalone header file, which
is independent of the scalar part.
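
As a reading aid for the macros being moved, the branchless unsigned saturating add they expand to can be written out for one concrete type. This is a minimal sketch assuming `uint8_t` as the element type; the names `sat_u_add_u8` and `vec_sat_u_add_u8` are illustrative and do not appear in the patch.

```c
#include <stdint.h>

/* Branchless unsigned saturating add.  If x + y wraps around, the
   truncated sum is smaller than x, so (sum < x) is 1 and negating it
   yields an all-ones mask that is OR-ed in, clamping the result to the
   maximum value.  Without wrap-around the mask is 0.  */
static inline uint8_t
sat_u_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | (uint8_t) -(uint8_t) (sum < x));
}

/* Element-wise loop of the same shape as the vector test helper.  */
void
vec_sat_u_add_u8 (uint8_t *out, const uint8_t *op_1, const uint8_t *op_2,
		  unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    out[i] = sat_u_add_u8 (op_1[i], op_2[i]);
}
```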

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Leverage
the new header file for vector part.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-4.c: Ditto.
* gcc.target/riscv/sat_arith.h: Move vector part out.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: New test.

Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/binop/vec_sat_arith.h   | 59 +++
.../riscv/rvv/autovec/binop/vec_sat_u_add-1.c |  2 +-
.../riscv/rvv/autovec/binop/vec_sat_u_add-2.c |  2 +-
.../riscv/rvv/autovec/binop/vec_sat_u_add-3.c |  2 +-
.../riscv/rvv/autovec/binop/vec_sat_u_add-4.c |  2 +-
.../rvv/autovec/binop/vec_sat_u_add-run-1.c   |  2 +-
.../rvv/autovec/binop/vec_sat_u_add-run-2.c   |  2 +-
.../rvv/autovec/binop/vec_sat_u_add-run-3.c   |  2 +-
.../rvv/autovec/binop/vec_sat_u_add-run-4.c   |  2 +-
.../riscv/rvv/autovec/binop/vec_sat_u_sub-1.c |  2 +-
.../riscv/rvv/autovec/binop/vec_sat_u_sub-2.c |  2 +-
.../riscv/rvv/autovec/binop/vec_sat_u_sub-3.c |  2 +-
.../riscv/rvv/autovec/binop/vec_sat_u_sub-4.c |  2 +-
.../rvv/autovec/binop/vec_sat_u_sub-run-1.c   |  2 +-
.../rvv/autovec/binop/vec_sat_u_sub-run-2.c   |  2 +-
.../rvv/autovec/binop/vec_sat_u_sub-run-3.c   |  2 +-
.../rvv/autovec/binop/vec_sat_u_sub-run-4.c   |  2 +-
gcc/testsuite/gcc.target/riscv/sat_arith.h| 57 ++
18 files changed, 80 insertions(+), 68 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
new file mode 100644
index 000..450f0fbbc72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -0,0 +1,59 @@
+#ifndef HAVE_VEC_SAT_ARITH
+#define HAVE_VEC_SAT_ARITH
+
+#include 
+
+/**/
+/* Saturation Add (unsigned and signed) */
+/**/
+#define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  out[i] = (x + y) | (-(T)((T)(x + y) < x)); \
+}\
+}
+
+#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
+
+/**/
+/* Saturation Sub (Unsigned and Signed) */
+/**/
+#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_sub_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{

Re: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-05-17 Thread 钟居哲

I think it should be backported to GCC-14 since it fixes a bug.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:24
To: gcc-patches
CC: palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.
Hi,
 
vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch
splits the insn pattern in the same way vfwadd.wf was split.
 
It also adds two patterns to recognize extended scalars.  In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.  If somebody
considers the patterns excessive, I'd be open to not adding them.
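
To make the "extended scalar" case concrete, here is a hypothetical source-level shape: a wide vector operand plus a narrow scalar that is sign-extended before the add. The function name and types are assumptions chosen for illustration, and whether this exact loop is matched by the new patterns is not claimed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop: the vector operand is already wide (int32_t) and
   the scalar is narrow (int16_t), so it is extended before the add.
   This is the single-widening "wide vector, narrow scalar" shape.  */
void
widen_add_scalar (int32_t *dst, const int32_t *wide, int16_t scalar, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = wide[i] + (int32_t) scalar;
}
```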
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and
add extended_scalar patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx
tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
---
gcc/config/riscv/vector.md| 62 ---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 24 +--
.../gcc.target/riscv/rvv/base/pr115068.c  | 26 
.../gcc.target/riscv/rvv/base/vwaddsub-1.c| 47 ++
4 files changed, 127 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vwaddsub-1.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 107914afa3a..248461302dd 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3900,27 +3900,71 @@ (define_insn "@pred_single_widen_add"
(set_attr "mode" "")])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
(if_then_else:VWEXTI
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
- (match_operand:VWEXTI 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr")
(any_extend:VWEXTI
  (vec_duplicate:
- (match_operand: 4 "reg_or_0_operand"   "   rJ,   rJ"
-   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ"
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (plus:VWEXTI
+ (vec_duplicate:VWEXTI
+   (any_extend:
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ")))
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,

Re: [PATCH] RISC-V: Add vector popcount, clz, ctz.

2024-05-17 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:26
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Add vector popcount, clz, ctz.
Hi,
 
this patch adds the zvbb vcpop, vclz and vctz to the autovec machinery
as well as tests for them.  It also changes several non-VLS iterators
to V_VLS iterators for consistency.
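
For illustration, the kind of loops these expanders target look roughly like the following. This is a sketch using the GCC builtins `__builtin_popcount`, `__builtin_clz` and `__builtin_ctz`; the function names are hypothetical and the actual test templates in the patch may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits per element.  */
void
popcount_u32 (uint32_t *dst, const uint32_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_popcount (src[i]);
}

/* Count leading zeros per element.  The builtin is undefined for 0,
   so callers are assumed to pass non-zero inputs.  */
void
clz_u32 (uint32_t *dst, const uint32_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_clz (src[i]);
}

/* Count trailing zeros per element; also undefined for 0.  */
void
ctz_u32 (uint32_t *dst, const uint32_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_ctz (src[i]);
}
```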
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ctz2): New expander.
(clz2): Ditto.
* config/riscv/generic-vector-ooo.md: Add bitmanip ops to insn
reservation.
* config/riscv/vector-crypto.md: Add VLS modes to insns.
* config/riscv/vector.md: Add bitmanip ops to mode_idx and other
attributes.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: Adjust check
for zvbb.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-template.h: New test.
---
gcc/config/riscv/autovec.md   | 30 +-
gcc/config/riscv/generic-vector-ooo.md|  2 +-
gcc/config/riscv/vector-crypto.md | 93 ++-
gcc/config/riscv/vector.md| 14 +--
.../gcc.target/riscv/rvv/autovec/unop/clz-1.c |  8 ++
.../riscv/rvv/autovec/unop/clz-run.c  | 36 +++
.../riscv/rvv/autovec/unop/clz-template.h | 21 +
.../gcc.target/riscv/rvv/autovec/unop/ctz-1.c |  8 ++
.../riscv/rvv/autovec/unop/ctz-run.c  | 36 +++
.../riscv/rvv/autovec/unop/ctz-template.h | 21 +
.../riscv/rvv/autovec/unop/popcount-1.c   |  4 +-
.../riscv/rvv/autovec/unop/popcount-2.c   |  4 +-
.../riscv/rvv/autovec/unop/popcount-3.c   |  8 ++
.../riscv/rvv/autovec/unop/popcount-run-1.c   |  3 +-
.../rvv/autovec/unop/popcount-template.h  | 21 +
15 files changed, 250 insertions(+), 59 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-template.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..a9391ed146c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1566,7 +1566,7 @@ (define_expand "xorsign3"
})
;; 
---
-;; - [INT] POPCOUNT.
+;; - [INT] POPCOUNT, CTZ and CLZ.
;; 
---
(define_expand "popcount2"
@@ -1574,10 +1574,36 @@ (define_expand "popcount2"
(match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
{
-  riscv_vector::expand_popcount (operands);
+  if (!TARGET_ZVBB)
+riscv_vector::expand_popcount (operands);
+  else
+{
+  riscv_vector::emit_vlmax_insn (code_for_pred_v (POPCOUNT, mode),
+  riscv_vector::CPOP_OP, operands);
+}
   DONE;
})
+(define_expand "ctz2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (CTZ, mode),
+riscv_vector::CPOP_OP, operands);
+DONE;
+})
+
+(define_expand "clz2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (CLZ, mode),
+riscv_vector::CPOP_OP, operands);
+DONE;
+})
+
;; -
;;  [INT] Highpart multiplication
diff --git a/gcc/config/riscv/generic-vector-ooo.md b/gcc/config/riscv/generic-vector-ooo.md
index 96cb1a0be29..5e933c83841 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -74,7 +74,7 @@ (define_insn_reservation "vec_fmul" 6
;; Vector crypto, assumed to be a generic operation for now.
(define_insn_reservation "vec_crypto" 4
-  (eq_attr "type" "crypto")
+  (eq_attr "type" "crypto,vclz,vctz,vcpop")
   "vxu_ooo_issue,vxu_ooo_alu")
;; Vector crypto, AES
diff

Re: [PATCH] RISC-V: Add vandn combine helper.

2024-05-17 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:26
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Add vandn combine helper.
Hi,
 
this patch adds a combine pattern for vandn as well as tests for it.
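
As a sketch of the source shape the combine pattern is meant to catch (a bitwise NOT feeding a bitwise AND), consider the loop below. The function name `vandn_u32` and its signature are assumptions for illustration, loosely modelled on how the run test in this patch calls its helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] = a[i] & ~b[i]: a vector NOT followed by a vector AND, which
   the new pattern is intended to merge into a single and-not.  */
void
vandn_u32 (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] & ~b[i];
}
```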
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md (*vandn_): New pattern.
* config/riscv/vector.md: Add vandn to mode_idx.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vandn-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-template.h: New test.
---
gcc/config/riscv/autovec-opt.md   | 18 +++
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/autovec/binop/vandn-1.c |  8 +++
.../riscv/rvv/autovec/binop/vandn-run.c   | 54 +++
.../riscv/rvv/autovec/binop/vandn-template.h  | 38 +
5 files changed, 119 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-template.h
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 06438f9e2f7..07372d965b0 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1559,3 +1559,21 @@ (define_insn_and_split "*vwsll_zext1_trunc_scalar_"
 DONE;
   }
   [(set_attr "type" "vwsll")])
+
+;; vnot + vand = vandn.
+(define_insn_and_split "*vandn_"
+ [(set (match_operand:V_VLSI 0 "register_operand" "=vr")
+   (and:V_VLSI
+(not:V_VLSI
+  (match_operand:V_VLSI  2 "register_operand"  "vr"))
+(match_operand:V_VLSI1 "register_operand"  "vr")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vandn (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vandn")])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c6a3845dc13..dafcd7d9bf9 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -743,7 +743,7 @@ (define_attr "mode_idx" ""
vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,\
vfcvtitof,vfncvtitof,vfncvtftoi,vfncvtftof,vmalu,vmiota,vmidx,\
vimovxv,vfmovfv,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,\
- vgather,vcompress,vmov,vnclip,vnshift")
+ vgather,vcompress,vmov,vnclip,vnshift,vandn")
   (const_int 0)
   (eq_attr "type" "vimovvx,vfmovvf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
new file mode 100644
index 000..3bb5bf8dd5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+/* { dg-final { scan-assembler-times {\tvandn\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
new file mode 100644
index 000..243c5975068
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-require-effective-target "riscv_zvbb_ok" } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+#include <assert.h>
+
+#define SZ 512
+
+#define RUN(TYPE, VAL) \
+  TYPE a##TYPE[SZ]; \
+  TYPE b##TYPE[SZ]; \
+  for (int i = 0; i < SZ; i++) \
+{ \
+  a##TYPE[i] = 123; \
+  b##TYPE[i] = VAL; \
+} \
+  vandn_##TYPE (a##TYPE, a##TYPE, b##TYPE, SZ); \
+  for (int i = 0; i < SZ; i++) \
+assert (a##TYPE[i] == (TYPE) (123 & ~VAL));
+
+#define RUN2(TYPE, VAL) \
+  TYPE as##TYPE[SZ]; \
+  for (int i = 0; i < SZ; i++) \
+as##TYPE[i] = 123;

Re: [PATCH] RISC-V: Use widening shift for scatter/gather if applicable.

2024-05-17 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:25
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Use widening shift for scatter/gather if applicable.
Hi,
 
with the zvbb extension we can emit a widening shift for scatter/gather
index preparation in case we need to multiply by 2 and zero extend.
 
The patch also adds vwsll to the mode_idx attribute and removes the
mode from shift-count operand of the insn pattern.
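
To illustrate the case described above, here is a hypothetical gather whose index preparation needs exactly "zero extend and multiply by 2": 16-bit elements indexed by unsigned 32-bit offsets on a 64-bit target. The names `byte_offset_x2` and `gather_u16` are assumptions for this sketch, and whether the vectorizer chooses a gather for this loop is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-element index preparation: zero-extend a 32-bit index to 64 bits
   and shift left by 1 (multiply by the 2-byte element size).  A
   widening shift performs both steps in one operation.  */
static inline uint64_t
byte_offset_x2 (uint32_t idx)
{
  return (uint64_t) idx << 1;
}

/* Gather of 2-byte elements through unsigned 32-bit indices.  */
void
gather_u16 (uint16_t *dst, const uint16_t *src, const uint32_t *idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[idx[i]];
}
```

The helper shows why the extension matters: without widening first, shifting the largest 32-bit index would drop its top bit.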
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_gather_scatter): Use vwsll if
applicable.
* config/riscv/vector-crypto.md: Remove mode from vwsll shift
count operator.
* config/riscv/vector.md: Add vwsll to mode iterator.
 
gcc/testsuite/ChangeLog:
 
* lib/target-supports.exp: Add zvbb.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c: New test.
---
gcc/config/riscv/riscv-v.cc   |  42 +--
gcc/config/riscv/vector-crypto.md |   4 +-
gcc/config/riscv/vector.md|   4 +-
.../gather-scatter/gather_load_64-12-zvbb.c   | 113 ++
gcc/testsuite/lib/target-supports.exp |  48 +++-
5 files changed, 193 insertions(+), 18 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 814c5febabe..8b41b9c7774 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4016,7 +4016,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
{
   rtx ptr, vec_offset, vec_reg;
   bool zero_extend_p;
-  int scale_log2;
+  int shift;
   rtx mask = ops[5];
   rtx len = ops[6];
   if (is_load)
@@ -4025,7 +4025,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[1];
   vec_offset = ops[2];
   zero_extend_p = INTVAL (ops[3]);
-  scale_log2 = exact_log2 (INTVAL (ops[4]));
+  shift = exact_log2 (INTVAL (ops[4]));
 }
   else
 {
@@ -4033,7 +4033,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[0];
   vec_offset = ops[1];
   zero_extend_p = INTVAL (ops[2]);
-  scale_log2 = exact_log2 (INTVAL (ops[3]));
+  shift = exact_log2 (INTVAL (ops[3]));
 }
   machine_mode vec_mode = GET_MODE (vec_reg);
@@ -4043,9 +4043,12 @@ expand_gather_scatter (rtx *ops, bool is_load)
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   bool is_vlmax = is_vlmax_len_p (vec_mode, len);
+  bool use_widening_shift = false;
+
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
 {
+  use_widening_shift = TARGET_ZVBB && zero_extend_p && shift == 1;
   /* 7.2. Vector Load/Store Addressing Modes.
If the vector offset elements are narrower than XLEN, they are
zero-extended to XLEN before adding to the ptr effective address. If
@@ -4054,8 +4057,8 @@ expand_gather_scatter (rtx *ops, bool is_load)
raise an illegal instruction exception if the EEW is not supported for
offset elements.
- RVV spec only refers to the scale_log == 0 case.  */
-  if (!zero_extend_p || scale_log2 != 0)
+ RVV spec only refers to the shift == 0 case.  */
+  if (!zero_extend_p || shift)
{
  if (zero_extend_p)
inner_idx_mode
@@ -4064,19 +4067,32 @@ expand_gather_scatter (rtx *ops, bool is_load)
inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
  machine_mode new_idx_mode
= get_vector_mode (inner_idx_mode, nunits).require ();
-   rtx tmp = gen_reg_rtx (new_idx_mode);
-   emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
-   zero_extend_p ? true : false));
-   vec_offset = tmp;
+   if (!use_widening_shift)
+ {
+   rtx tmp = gen_reg_rtx (new_idx_mode);
+   emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
+   zero_extend_p ? true : false));
+   vec_offset = tmp;
+ }
  idx_mode = new_idx_mode;
}
 }
-  if (scale_log2 != 0)
+  if (shift)
 {
-  rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
-   gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
-   OPTAB_DIRECT);
+  rtx tmp;
+  if (!use_widening_shift)
+ tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
+ gen_int_mode (shift, Pmode), NULL_RTX, 0,
+ OPTAB_DIRECT);
+  else
+ {
+   tmp = gen_reg_rtx (idx_mode);
+   insn_code icode = code_for_pred_vwsll_scalar (idx_mode);
+   rtx ops[] = {tmp, vec_offset, const1_rtx};
+   emit_vlmax_insn (icode, BINARY_OP, ops);
+ }
+
   vec_offset = tmp;
 }
diff --git a/gcc/config/riscv/vector-crypto.md 
b/gcc/config/riscv/vector-crypto.md
index 24822e2712c..0ddc2f3f3c6 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ -295,7 +295,7 @@ (define_insn "@pred_vwsll"
(ashift:VWEXTI
  (zero_extend:VWEXTI
(match_operand: 3 "register_operand" "vr"))
- (match_operand: 4 "register_operand"

Re: [PATCH] RISC-V: Add vwsll combine helpers.

2024-05-17 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:25
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Add vwsll combine helpers.
Hi,
 
this patch enables the usage of vwsll in autovec context by adding the
necessary combine patterns and tests.
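
As a rough illustration, a loop of the following shape is the kind of source
the combine patterns are aimed at.  This is only a sketch: the function name
is hypothetical, and whether a given GCC build actually emits vwsll for it
depends on flags and cost decisions not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero-extend a[i] to 16 bits and shift it left by b[i].  The shift
   counts are assumed to be below 16; the assert documents that
   assumption and disappears with -DNDEBUG.  */
void
widening_shift_left (uint16_t *restrict dst, const uint8_t *restrict a,
		     const uint8_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (b[i] < 16);
      dst[i] = (uint16_t) ((uint16_t) a[i] << b[i]);
    }
}
```

In C both operands are promoted to int before the shift and the result is
truncated back to 16 bits, which matches the extend/shift/truncate idiom the
*_trunc_ patterns are described as handling.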
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md (*vwsll_zext1_): New
pattern.
(*vwsll_zext2_): Ditto.
(*vwsll_zext1_scalar_): Ditto.
(*vwsll_zext1_trunc_): Ditto.
(*vwsll_zext2_trunc_): Ditto.
(*vwsll_zext1_trunc_scalar_): Ditto.
* config/riscv/vector-crypto.md: Make pattern similar to other
narrowing/widening patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vwsll-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-template.h: New test.
---
gcc/config/riscv/autovec-opt.md   | 123 ++
gcc/config/riscv/vector-crypto.md |   2 +-
.../riscv/rvv/autovec/binop/vwsll-1.c |  10 ++
.../riscv/rvv/autovec/binop/vwsll-run.c   |  67 ++
.../riscv/rvv/autovec/binop/vwsll-template.h  |  49 +++
5 files changed, 250 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-template.h
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..06438f9e2f7 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,126 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; vzext.vf2 + vsll = vwsll.
+(define_insn_and_split "*vwsll_zext1_"
+  [(set (match_operand:VWEXTI 0 "register_operand""=vr ")
+  (ashift:VWEXTI
+ (zero_extend:VWEXTI
+   (match_operand: 1 "register_operand"" vr "))
+   (match_operand: 2 "vector_shift_operand" "vrvk")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_"
+  [(set (match_operand:VWEXTI 0 "register_operand""=vr ")
+  (ashift:VWEXTI
+ (zero_extend:VWEXTI
+   (match_operand: 1 "register_operand"" vr "))
+ (zero_extend:VWEXTI
+   (match_operand: 2 "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+
+(define_insn_and_split "*vwsll_zext1_scalar_"
+  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr")
+  (ashift:VWEXTI
+ (zero_extend:VWEXTI
+   (match_operand: 1 "register_operand"   " vr"))
+   (match_operand:   2 "vector_scalar_shift_operand" " rK")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+if (GET_CODE (operands[2]) == SUBREG)
+  operands[2] = SUBREG_REG (operands[2]);
+insn_code icode = code_for_pred_vwsll_scalar (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+;; For
+;;   uint16_t dst;
+;;   uint8_t a, b;
+;;   dst = vwsll (a, b)
+;; we seem to create
+;;   aa = (int) a;
+;;   bb = (int) b;
+;;   dst = (short) vwsll (aa, bb);
+;; The following patterns help to combine this idiom into one vwsll.
+
+(define_insn_and_split "*vwsll_zext1_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+ (zero_extend:VQEXTI
+   (match_operand: 1   "register_operand"" vr "))
+ (match_operand:VQEXTI 2   "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+ (zero_extend:VQEXTI
+   (match_operand: 1   "register_operand"" vr "))
+ (zero_extend:VQEXTI
+   (match_operand: 2   "vector_shift_operand" "vrvk")]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext1_trunc_scalar_"
+  [(set (match_operand: 0

Re: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-05-17 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:24
To: gcc-patches
CC: palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.
Hi,
 
vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch
splits the insn pattern in the same way vfwadd.wf was split.
 
It also adds two patterns to recognize extended scalars.  In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.  If somebody
considers the patterns excessive, I'd be open to not adding them.
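
To make the "extended scalar" case concrete, here is a minimal C sketch of a
loop in which a narrow scalar is extended once and then added to every wide
element.  The function name is hypothetical, and it is an assumption (not
verified against the patch) that GCC would match the new patterns for exactly
this source.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a 32-bit scalar to each 64-bit element.  The scalar is
   sign-extended to 64 bits before the addition, i.e. the extension
   applies to the scalar rather than to a broadcast vector.  */
void
add_extended_scalar (int64_t *dst, const int64_t *src, int32_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + (int64_t) x;
}
```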
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and
add extended_scalar patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx
tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
---
gcc/config/riscv/vector.md| 62 ---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 24 +--
.../gcc.target/riscv/rvv/base/pr115068.c  | 26 
.../gcc.target/riscv/rvv/base/vwaddsub-1.c| 47 ++
4 files changed, 127 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vwaddsub-1.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 107914afa3a..248461302dd 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3900,27 +3900,71 @@ (define_insn 
"@pred_single_widen_add"
(set_attr "mode" "")])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
(if_then_else:VWEXTI
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
- (match_operand:VWEXTI 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr")
(any_extend:VWEXTI
  (vec_duplicate:
- (match_operand: 4 "reg_or_0_operand"   "   rJ,   rJ"
-   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ"
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  
0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (plus:VWEXTI
+ (vec_duplicate:VWEXTI
+   (any_extend:
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ")))
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  
0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI

Re: Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-05-16 Thread 钟居哲

LGTM for this patch (fix for vfwadd.wf).

And here is a simple case to reproduce the same bug for vwadd.wx:

https://compiler-explorer.com/z/4rP9Yvdq1

#include <stddef.h>
#include <riscv_vector.h>

vint64m8_t test_vwadd_wx_i64m8_m(vbool8_t vm, vint64m8_t vs2, int rs1, size_t vl) {
  return __riscv_vwadd_wx_i64m8_m(vm, vs2, rs1, vl);
}

char global_memory[1024];
void *fake_memory = (void *)global_memory;

int main ()
{
  asm volatile("fence":::"memory");
  long x;
  asm volatile("":"=r"(x)::"memory");
  vint64m8_t vwadd_wx_i64m8_m_vd = test_vwadd_wx_i64m8_m(
__riscv_vreinterpret_v_i8m1_b8(__riscv_vundefined_i8m1()), 
__riscv_vundefined_i64m8(), x, __riscv_vsetvlmax_e64m8());
  asm volatile(""::"vr"(vwadd_wx_i64m8_m_vd):"memory");

  return 0;
}

main:
        fence
        vsetvli a4,zero,e32,m4,ta,ma
        vwadd.wx        v0,v8,a5,v0.t   <- vd and vm are both v0, which is wrong.
        li      a0,0
        ret


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-16 03:31
To: 钟居哲; gcc-patches
CC: rdapp.gcc; palmer; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].
> I saw that vwadd.wx/vwsub.wx have the same issue. Could you change them and add tests too?
 
Yes, will do.  At first I didn't manage to reproduce it because we
seem to be lacking a combine-opt pattern for it.  I'm going to post
it separately.
 
Regards
Robin

[PATCH] RISC-V: add option -m(no-)autovec-segment

2024-05-13 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai

Re: Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-13 Thread 钟居哲

>> Seems a bit odd on first sight.  If all we want to do is to
>> select between two masks why do we need a large Pmode mode?

Because we lower final mask = vcond_mask_len (mask, 1s, 0s, len, bias)
into:

vid.v v1
vcmp v2
vmsltu.vx  v2, v1, len, TUMU

len is in Pmode, so we only allow lowering vcond_mask_len when a vector mode
with Pmode elements exists.

>> So that's basically a mask-move with length?  Can't this be done
>> differently?  If not, please describe, maybe this is already
>> the shortest way.

We are implementing: final mask = mask[i] && i < len ? 1 : 0
This is a mask move with length, but with the TUMU (tail undisturbed, mask
undisturbed) policy.  I believe the current approach is the optimal way.
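
A scalar reference model of that formula, as a sketch only: the function
name is hypothetical, the bias operand is ignored, and elements at or beyond
len are simply written as 0 here, which does not model the undisturbed
policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* result[i] = mask[i] && i < len, for each of the nunits elements.  */
void
mask_and_len (bool *result, const bool *mask, size_t nunits, size_t len)
{
  for (size_t i = 0; i < nunits; i++)
    result[i] = mask[i] && i < len;
}
```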



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-14 05:14
To: pan2.li; gcc-patches
CC: rdapp.gcc; juzhe.zhong; kito.cheng; richard.guenther; Tamar.Christina; 
richard.sandiford
Subject: Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len
Hi Pan,
 
thanks for working on this.
 
In general the patch looks reasonable to me but I'd rather
have some more comments about the high-level idea.
E.g. cbranch is implemented like aarch64 by xor'ing the
bitmasks and comparing the result against zero (so we branch
based on mask equality).
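
In scalar terms, the xor-and-compare idea amounts to something like the
sketch below.  It assumes a mask that fits in 64 bits, and the function name
is hypothetical; it only models the comparison, not the RTL expansion.

```c
#include <stdbool.h>
#include <stdint.h>

/* The xor is zero exactly when both masks are bit-for-bit equal, so
   comparing it against zero yields the branch condition.  */
bool
masks_differ (uint64_t mask_a, uint64_t mask_b)
{
  return (mask_a ^ mask_b) != 0;
}
```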
 
> +;; vcond_mask_len
 
High-level description here instead please.
 
> +(define_insn_and_split "vcond_mask_len_"
> +  [(set (match_operand:VB 0 "register_operand")
 
> +(unspec: VB [
> + (match_operand:VB 1 "register_operand")
> + (match_operand:VB 2 "const_1_operand")
 
I guess it works like that because operand[2] is just implicitly
used anyway but shouldn't that rather be an all_ones_operand?
 
> +   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
> (mode)).exists ()"
 
Seems a bit odd on first sight.  If all we want to do is to
select between two masks why do we need a large Pmode mode?
 
> +rtx ops[] = {operands[0], operands[1], operands[1], cmp, reg, 
> operands[4]};
 
So that's basically a mask-move with length?  Can't this be done
differently?  If not, please describe, maybe this is already
the shortest way.
 
Regards
Robin

Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-05-13 Thread 钟居哲

Hi, Robin.

I saw that vwadd.wx/vwsub.wx have the same issue. Could you change them and add tests too?

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-14 04:15
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].
Hi,
 
this patch splits the vfw...wf pattern so we do not emit
e.g. vfwadd.wf v0,v8,fa5,v0.t anymore.
 
Regtested on rv64gcv_zvfh.
 
Regards
Robin
 
gcc/ChangeLog:
 
PR target/115068
 
* config/riscv/vector.md:  Split vfw.wf pattern.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068-run.c: New test.
* gcc.target/riscv/rvv/base/pr115068.c: New test.
---
gcc/config/riscv/vector.md| 20 ++---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 28 ++
.../gcc.target/riscv/rvv/base/pr115068.c  | 29 +++
3 files changed, 67 insertions(+), 10 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 2a54f78df8e..e408baa809c 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7178,24 +7178,24 @@ (define_insn "@pred_single_widen_sub"
(symbol_ref "riscv_vector::get_frm_mode (operands[9])"))])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTF 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTF 0 "register_operand""=vd, vd, 
vr, vr")
(if_then_else:VWEXTF
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
-  (match_operand 9 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"  " vm, vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand" " rK, rK, rK, rK")
+  (match_operand 6 "const_int_operand" "  i,  i,  i,  i")
+  (match_operand 7 "const_int_operand" "  i,  i,  i,  i")
+  (match_operand 8 "const_int_operand" "  i,  i,  i,  i")
+  (match_operand 9 "const_int_operand" "  i,  i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTF
- (match_operand:VWEXTF 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTF 3 "register_operand"" vr, vr, vr, vr")
(float_extend:VWEXTF
  (vec_duplicate:
- (match_operand: 4 "register_operand"   "f,f"
-   (match_operand:VWEXTF 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "register_operand"  "  f,  f,  f,  f"
+   (match_operand:VWEXTF 2 "vector_merge_operand"  " vu,  0, vu,  
0")))]
   "TARGET_VECTOR"
   "vfw.wf\t%0,%3,%4%p1"
   [(set_attr "type" "vf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
new file mode 100644
index 000..95ec8e06021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+__riscv_vreinterpret_v_i8m1_b8 (__riscv_vundefined_i8m1 ()),
+__riscv_vundefined_f64m8 (), 1.0, __riscv_vsetvlmax_e64m8 ());
+  asm volatile ("" ::"vr"(vfwadd_wf_f64m8_m_vd) : "memory");
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
new file mode 100644
index 000..6d680037aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+

Re: Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-05-08 Thread 钟居哲

Thanks Dim.

We noticed there is a regression in the aarch64 CI.
We will address your comments and fix the aarch64 CI regression.



juzhe.zh...@rivai.ai
 
From: Dimitar Dimitrov
Date: 2024-05-08 23:57
To: 陈硕
CC: 丁乐华; gcc-patches; 钟居哲; 夏晋; vmakarov; richard.sandiford
Subject: Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem
On Wed, May 08, 2024 at 11:34:48AM +0800, 陈硕 wrote:
> Hi Dimitar
> 
> 
> I sent a patch just now, modified accordingly.
> 
> 
> some comments:
> 
> 
> Nit: Should have two spaces after the dot, per GNU coding style. 
> I'd suggest
> to run the contrib/check_GNU_style.py script on your patches.
> Do you mean "star" by "dot", i.e. "/*" should be "/* "?
 
No, I was referring to the following paragraph from
https://www.gnu.org/prep/standards/standards.html :
   "Please put two spaces after the end of a sentence in your comments, ..."
 
To fix, simply add a second space after the dot, e.g.:
  -   Like DF_LR, but include tracking subreg liveness. Currently used to 
provide
  +   Like DF_LR, but include tracking subreg liveness.  Currently used to 
provide
 
 
For reference, here is the output from the style checker:
  $ git show | ./contrib/check_GNU_style.py -
  === ERROR type #4: dot, space, space, new sentence (24 error(s)) ===
  ...
  gcc/df-problems.cc:1350:52:   Like DF_LR, but include tracking subreg 
liveness.█Currently used to provide
 
> 
> 
> These names seem a bit too short for global variables. Perhaps tuck
> them in a namespace?
> 
> Also, since these must remain empty, shouldn't they be declared as const?
> 
> namespace df {
>  const bitmap_head empty_bitmap;
>  const subregs_live empty_live;
> }
> 
> 
> 
> Maybe it would be better if "namespace df" contained all DF-related code?
> As a minor modification, I added a "df_" prefix to the variables.
> Meanwhile, const seems inappropriate here, since the object is returned as a
> normal pointer rather than a const pointer in some functions.
> 
> Changing it to const would break the return type check, and a const_cast
> would make the const meaningless.
> 
> 
> See the patch for more details.
 
Thanks for considering my suggestion.
 
Regards,
Dimitar
> 
> 
> regards
> Shuo
> 
> 
>

Re: Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-05-08 Thread 钟居哲

Thanks Vlad.

I noticed there is devel/subreg-coalesce branch.

We are working on supporting subreg coalesce in IRA/LRA based on the latest 
version of the subreg DF patch.

We will send the follow-up patches.

Thanks.


juzhe.zh...@rivai.ai
 
From: Vladimir Makarov
Date: 2024-05-09 00:29
To: Lehua Ding
CC: richard.sandiford; juzhe.zhong; gcc-patches
Subject: Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data
 
On 5/7/24 23:01, Lehua Ding wrote:
> Hi Vladimir,
>
> I'll send V3 patches based on these comments. Note that these four 
> patches only support subreg liveness tracking and apply to IRA and LRA 
> pass. Therefore, no performance changes are expected before we support 
> subreg coalesce. There will be new patches later to complete the 
> subreg coalesce functionality. Support for subreg coalesce requires 
> support for subreg copy i.e. modifying the logic for conflict detection.
>
>
Thank you for your clarification that the current batch of patches does 
not change the performance.  I hope the next batch of patches will be 
added to devel/subreg-coalesce branch too for their easier evaluation.

Re: Re: [PATCH v1] RISC-V: Revert RVV wv instructions overlap and xfail tests

2024-04-22 Thread 钟居哲

Apologies that we didn't post our (me, Kito and Li Pan) discussions.

This is the story:
We found that my previous patches, which support highpart register overlap
with a register filter for instructions like vwadd.wv, cause the ICE
reported in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714
This is obviously a regression (no ICE on GCC 13.2, but an ICE on GCC 14).

We tried several fixes to work around this ICE, but they all failed.
I also found that my previous patches are quite wrong and are not the right
solution for supporting register group overlap for vwadd.wv.
So we finally decided to revert those patches.

Kito knows the details of this story and can share more in the GNU patch
meeting.

Thanks.


juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2024-04-23 01:20
To: Li, Pan2; Robin Dapp; gcc-patches@gcc.gnu.org
CC: juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Revert RVV wv instructions overlap and xfail 
tests
Hi Pan,
I'm not sure I'm following.  Did we miss something that should have been
covered?  Like only an overlap on the srcs but not the dest?
Are there testcases that fail?  If so we should definitely have one.
Can you give some additional information on why these reverts are needed?
+1 to the request for a failing testcase if there is one.
Patrick
If something is broken then indeed we should revert it.
Yes, we may need to support these in gcc-15.
... why not just revert everything and xfail all the tests in a
follow up?  Your patch is essentially a revert but doesn't look like
it.  I'd rather we let a revert be a revert and adjust the tests
separately so it becomes clear.
Sure, will revert b3b2799b872 and then file the patch for the xfail tests.
Pan

Re: Re: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d and e65aaf8efe1

2024-04-22 Thread 钟居哲

I think the revert patch exposes a latent bug; Li Pan will look into it.



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2024-04-23 03:55
To: pan2.li; gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc
Subject: Re: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d 
and e65aaf8efe1
Hi Pan,
 
I was running the testsuite for this and noticed an ICE scroll by when 
this patch is applied to cacc55a4c0be8d0bc7417b6a28924eadbbe428e3 for 
rv64gcv:
 
FAIL: gfortran.dg/graphite/pr29832.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal 
compiler error: in extract_insn, at recog.cc:2812)
 
I'll send the full list of new failures once the runs finish.
 
Thanks,
Patrick
 
On 4/22/24 06:47, pan2...@intel.com wrote:
> From: Pan Li 
>
> After we reverted the 2 commits below, the references to the attr need
> some adjustment, as group_overlap is no longer available.
>
> * RISC-V: Robostify the W43, W86, W87 constraint enabled attribute
> * RISC-V: Rename vconstraint into group_overlap
>
> The tests below pass with this patch.
>
> * The full rv64gcv regression tests.
>
> gcc/ChangeLog:
>
> * config/riscv/vector-crypto.md:
>
> Signed-off-by: Pan Li 
> ---
>   gcc/config/riscv/vector-crypto.md | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/vector-crypto.md 
> b/gcc/config/riscv/vector-crypto.md
> index 519c6a10d94..23dc549e5b8 100755
> --- a/gcc/config/riscv/vector-crypto.md
> +++ b/gcc/config/riscv/vector-crypto.md
> @@ -322,7 +322,7 @@ (define_insn "@pred_vwsll_scalar"
> "vwsll.v%o4\t%0,%3,%4%p1"
> [(set_attr "type" "vwsll")
>  (set_attr "mode" "")
> -   (set_attr "group_overlap" 
> "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")])
> +   (set_attr "vconstraint" 
> "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,no,no")])
>   
>   ;; vbrev.v vbrev8.v vrev8.v
>   (define_insn "@pred_v"

Re: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d and e65aaf8efe1

2024-04-22 Thread 钟居哲

lgtm



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-04-22 21:47
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Adjust overlap attr after revert d3544cea63d and 
e65aaf8efe1
From: Pan Li 
 
After we reverted the 2 commits below, the references to the attr need
some adjustment, as group_overlap is no longer available.

* RISC-V: Robostify the W43, W86, W87 constraint enabled attribute
* RISC-V: Rename vconstraint into group_overlap

The tests below pass with this patch.

* The full rv64gcv regression tests.
 
gcc/ChangeLog:
 
* config/riscv/vector-crypto.md:
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/vector-crypto.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
 
diff --git a/gcc/config/riscv/vector-crypto.md 
b/gcc/config/riscv/vector-crypto.md
index 519c6a10d94..23dc549e5b8 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ -322,7 +322,7 @@ (define_insn "@pred_vwsll_scalar"
   "vwsll.v%o4\t%0,%3,%4%p1"
   [(set_attr "type" "vwsll")
(set_attr "mode" "")
-   (set_attr "group_overlap" 
"W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")])
+   (set_attr "vconstraint" 
"W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,no,no")])
;; vbrev.v vbrev8.v vrev8.v
(define_insn "@pred_v"
-- 
2.34.1

Re: [PATCH v1] RISC-V: Add xfail test case for highpart register overlap of vx/vf widen

2024-04-21 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-04-21 13:01
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Add xfail test case for highpart register overlap 
of vx/vf widen
From: Pan Li 
 
We reverted the patch below for register group overlap; add the related
insn tests and mark them as xfail.  We will remove the xfail
after we support register overlap in GCC-15.

a23415d7572 RISC-V: Support highpart register overlap for widen vx/vf instructions

The test suite below passes.
* The full rv64gcv regression test.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr112431-22.c: New test.
* gcc.target/riscv/rvv/base/pr112431-23.c: New test.
* gcc.target/riscv/rvv/base/pr112431-24.c: New test.
* gcc.target/riscv/rvv/base/pr112431-25.c: New test.
* gcc.target/riscv/rvv/base/pr112431-26.c: New test.
* gcc.target/riscv/rvv/base/pr112431-27.c: New test.
 
Signed-off-by: Pan Li 
---
.../gcc.target/riscv/rvv/base/pr112431-22.c   | 188 ++
.../gcc.target/riscv/rvv/base/pr112431-23.c   | 119 +++
.../gcc.target/riscv/rvv/base/pr112431-24.c   |  86 
.../gcc.target/riscv/rvv/base/pr112431-25.c   | 104 ++
.../gcc.target/riscv/rvv/base/pr112431-26.c   |  68 +++
.../gcc.target/riscv/rvv/base/pr112431-27.c   |  51 +
6 files changed, 616 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-22.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-23.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-24.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-25.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-26.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-27.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-22.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-22.c
new file mode 100644
index 000..ac56703c75c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-22.c
@@ -0,0 +1,188 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+size_t __attribute__ ((noinline))
+sumation (size_t sum0, size_t sum1, size_t sum2, size_t sum3, size_t sum4,
+   size_t sum5, size_t sum6, size_t sum7, size_t sum8, size_t sum9,
+   size_t sum10, size_t sum11, size_t sum12, size_t sum13, size_t sum14,
+   size_t sum15)
+{
+  return sum0 + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + sum8 + sum9
+ + sum10 + sum11 + sum12 + sum13 + sum14 + sum15;
+}
+
+size_t
+foo (char const *buf, size_t len)
+{
+  size_t sum = 0;
+  size_t vl = __riscv_vsetvlmax_e8m8 ();
+  size_t step = vl * 4;
+  const char *it = buf, *end = buf + len;
+  for (; it + step <= end;)
+{
+  vint8m1_t v0 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v1 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v2 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v3 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v4 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v5 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v6 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v7 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v8 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v9 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v10 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v11 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v12 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v13 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v14 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  vint8m1_t v15 = __riscv_vle8_v_i8m1 ((void *) it, vl);
+  it += vl;
+  
+  asm volatile("nop" ::: "memory");
+  vint16m2_t vw0 = __riscv_vwadd_vx_i16m2 (v0, 33, vl);
+  vint16m2_t vw1 = __riscv_vwadd_vx_i16m2 (v1, 33, vl);
+  vint16m2_t vw2 = __riscv_vwadd_vx_i16m2 (v2, 33, vl);
+  vint16m2_t vw3 = __riscv_vwadd_vx_i16m2 (v3, 33, vl);
+  vint16m2_t vw4 = __riscv_vwadd_vx_i16m2 (v4, 33, vl);
+  vint16m2_t vw5 = __riscv_vwadd_vx_i16m2 (v5, 33, vl);
+  vint16m2_t vw6 = __riscv_vwadd_vx_i16m2 (v6, 33, vl);
+  vint16m2_t vw7 = __riscv_vwadd_vx_i16m2 (v7, 33, vl);
+  vint16m2_t vw8 = __riscv_vwadd_vx_i16m2 (v8, 33, vl);
+  vint16m2_t vw9 = __riscv_vwadd_vx_i16m2 (v9, 33, vl);
+  vint16m2_t vw10 = __riscv_vwadd_vx_i16m2 (v10, 33, vl);
+  vint16m2_t vw11 = __riscv_vwadd_vx_i16m2 (v11, 33, vl);
+  vint16m2_t vw12 = __riscv_vwadd_vx_i16m2 (v12, 33, vl);
+  vint16m2_t vw13 = __riscv_vwadd_vx_i16m2 (v13, 33, vl);
+  vint16m2_t vw14 =

Re: [PATCH v1] RISC-V: Add xfail test case for incorrect overlap on v0

2024-04-20 Thread 钟居哲

lgtm



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-04-20 23:21
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Add xfail test case for incorrect overlap on v0
From: Pan Li 
 
We reverted the patch below for register group overlap, so add the related
insn test and mark it as xfail.  We will remove the xfail
after we support register overlap in GCC-15.
 
018ba3ac952 RISC-V: Fix overlap group incorrect overlap on v0
 
The test suites below passed.
* The rv64gcv full regression test.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr112431-34.c: New test.
 
Signed-off-by: Pan Li 
---
.../gcc.target/riscv/rvv/base/pr112431-34.c   | 101 ++
1 file changed, 101 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-34.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-34.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-34.c
new file mode 100644
index 000..286185aa01e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-34.c
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+size_t __attribute__ ((noinline))
+sumation (size_t sum0, size_t sum1, size_t sum2, size_t sum3, size_t sum4,
+   size_t sum5, size_t sum6, size_t sum7, size_t sum8, size_t sum9,
+   size_t sum10, size_t sum11, size_t sum12, size_t sum13, size_t sum14,
+   size_t sum15)
+{
+  return sum0 + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + sum8 + sum9
+ + sum10 + sum11 + sum12 + sum13 + sum14 + sum15;
+}
+
+size_t
+foo (char const *buf, size_t len)
+{
+  size_t sum = 0;
+  size_t vl = __riscv_vsetvlmax_e8m8 ();
+  size_t step = vl * 4;
+  const char *it = buf, *end = buf + len;
+  for (; it + step <= end;)
+{
+  vuint8m1_t v0 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v1 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v2 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v3 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v4 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v5 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v6 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v7 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v8 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v9 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v10 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v11 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v12 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v13 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v14 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  vuint8m1_t v15 = __riscv_vle8_v_u8m1 ((void *) it, vl);
+  it += vl;
+  
+  asm volatile("nop" ::: "memory");
+  vint16m2_t vw0 = __riscv_vluxei8_v_i16m2 ((void *) it, v0, vl);
+  vint16m2_t vw1 = __riscv_vluxei8_v_i16m2 ((void *) it, v1, vl);
+  vint16m2_t vw2 = __riscv_vluxei8_v_i16m2 ((void *) it, v2, vl);
+  vint16m2_t vw3 = __riscv_vluxei8_v_i16m2 ((void *) it, v3, vl);
+  vint16m2_t vw4 = __riscv_vluxei8_v_i16m2 ((void *) it, v4, vl);
+  vint16m2_t vw5 = __riscv_vluxei8_v_i16m2 ((void *) it, v5, vl);
+  vint16m2_t vw6 = __riscv_vluxei8_v_i16m2 ((void *) it, v6, vl);
+  vint16m2_t vw7 = __riscv_vluxei8_v_i16m2 ((void *) it, v7, vl);
+  vint16m2_t vw8 = __riscv_vluxei8_v_i16m2 ((void *) it, v8, vl);
+  vint16m2_t vw9 = __riscv_vluxei8_v_i16m2 ((void *) it, v9, vl);
+  vint16m2_t vw10 = __riscv_vluxei8_v_i16m2 ((void *) it, v10, vl);
+  vint16m2_t vw11 = __riscv_vluxei8_v_i16m2 ((void *) it, v11, vl);
+  vint16m2_t vw12 = __riscv_vluxei8_v_i16m2 ((void *) it, v12, vl);
+  vint16m2_t vw13 = __riscv_vluxei8_v_i16m2 ((void *) it, v13, vl);
+  vint16m2_t vw14 = __riscv_vluxei8_v_i16m2 ((void *) it, v14, vl);
+  vbool8_t mask = *(vbool8_t*)it;
+  vint16m2_t vw15 = __riscv_vluxei8_v_i16m2_m (mask, (void *) it, v15, vl);
+
+  asm volatile("nop" ::: "memory");
+  size_t sum0 = __riscv_vmv_x_s_i16m2_i16 (vw0);
+  size_t sum1 = __riscv_vmv_x_s_i16m2_i16 (vw1);
+  size_t sum2 = __riscv_vmv_x_s_i16m2_i16 (vw2);
+  size_t sum3 = __riscv_vmv_x_s_i16m2_i16 (vw3);
+  size_t sum4 = __riscv_vmv_x_s_i16m2_i16 (vw4);
+  size_t sum5 = __riscv_vmv_x_s_i16m2_i16 (vw5);
+  size_t sum6 = __riscv_vmv_x_s_i16m2_i16 (vw6);
+  size_t sum7 = __riscv_vmv_x_s_i16m2_i16 (vw7);
+  size_t sum8 = __riscv_vmv_x_s_i16m2_i16 (vw8);
+  size_t sum9 = __riscv_vmv_x_s_i16m2_i16 (vw9);
+  size_t sum10 = __riscv_vmv_x_s_i16m2_i16 (vw10);
+  size_t

Re: [PATCH] RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads [PR114200].

2024-03-06 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-03-06 21:44
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai
CC: rdapp.gcc; jeffreyalaw
Subject: [PATCH] RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads 
[PR114200].
Hi,
 
three-operand instructions like vmacc are modeled with an implicit
output reload when the output does not match one of the operands.  For
this we use vmv.v.v which is subject to length masking.
 
In a situation where the current vl is less than the full vlenb
and the fma's result value is used as input for a vector reduction
(which is never length masked) we effectively only reduce vl
elements.  The masked-out elements are relevant for the
reduction, though, leading to a wrong result.
 
This patch replaces the vmv reloads by full-register reloads.
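The difference between the two kinds of copy can be modelled in plain C.  This is only a sketch under stated assumptions: VLMAX of 8, a tail-undisturbed copy, and all function names here are hypothetical and not taken from GCC or the patch.

```c
#include <assert.h>
#include <stddef.h>

#define VLMAX 8 /* assumed number of elements in one vector register */

/* Models a vl-limited copy such as vmv.v.v: only the first VL elements are
   written, the tail of DST keeps whatever it held before (assumption).  */
static void
copy_vl (int *dst, const int *src, size_t vl)
{
  assert (vl <= VLMAX);
  for (size_t i = 0; i < vl; i++)
    dst[i] = src[i];
}

/* Models a whole-register move such as vmv1r.v: vl is ignored.  */
static void
copy_whole (int *dst, const int *src)
{
  for (size_t i = 0; i < VLMAX; i++)
    dst[i] = src[i];
}

/* Models a consumer that reads every element regardless of vl.  */
static int
reduce_all (const int *v)
{
  int sum = 0;
  for (size_t i = 0; i < VLMAX; i++)
    sum += v[i];
  return sum;
}

/* Reloads SRC into a register with stale contents, then reduces it.  */
static int
reload_then_reduce (int use_whole_copy, size_t vl)
{
  int src[VLMAX] = {1, 1, 1, 1, 1, 1, 1, 1};
  int dst[VLMAX] = {9, 9, 9, 9, 9, 9, 9, 9}; /* stale register contents */

  if (use_whole_copy)
    copy_whole (dst, src);
  else
    copy_vl (dst, src, vl);
  return reduce_all (dst);
}
```

With vl = 4 the vl-limited reload leaves four stale elements in the destination, so the full-width reduction sees them; the whole-register reload does not have this problem.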
 
Regtested on rv64, rv32 is running.
 
Regards
Robin
 
gcc/ChangeLog:
 
PR target/114200
PR target/114202
 
* config/riscv/vector.md: Use vmv[1248]r.v instead of vmv.v.v.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr114200.c: New test.
* gcc.target/riscv/rvv/autovec/pr114202.c: New test.
---
gcc/config/riscv/vector.md| 96 +--
.../gcc.target/riscv/rvv/autovec/pr114200.c   | 18 
.../gcc.target/riscv/rvv/autovec/pr114202.c   | 20 
3 files changed, 86 insertions(+), 48 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114200.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114202.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f89f9c2fa86..8b1c24c5d79 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -5351,10 +5351,10 @@ (define_insn "*pred_mul_plus_undef"
   "@
vmadd.vv\t%0,%4,%5%p1
vmacc.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%3,%4%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vv\t%0,%3,%4%p1
vmadd.vv\t%0,%4,%5%p1
vmacc.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%5\;vmacc.vv\t%0,%3,%4%p1"
+   vmv%m5r.v\t%0,%5\;vmacc.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
@@ -5378,9 +5378,9 @@ (define_insn "*pred_madd"
   "TARGET_VECTOR"
   "@
vmadd.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1
+   vmv%m2r.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1
vmadd.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1"
+   vmv%m2r.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "2")
@@ -5409,9 +5409,9 @@ (define_insn "*pred_macc"
   "TARGET_VECTOR"
   "@
vmacc.vv\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1
vmacc.vv\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5462,9 +5462,9 @@ (define_insn "*pred_madd_scalar"
   "TARGET_VECTOR"
   "@
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5494,9 +5494,9 @@ (define_insn "*pred_macc_scalar"
   "TARGET_VECTOR"
   "@
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5562,9 +5562,9 @@ (define_insn "*pred_madd_extended_scalar"
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
+   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
+   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5595,9 +5595,9 @@ (define_insn "*pred_macc_extended_scalar"
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5649,10 +5649,10 @@ (define_insn "*pred_minus_mul_undef"
   "@
vnmsub.vv\t%0,%4,%5%p1
vnmsac.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1
+   vmv%m3r.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1
vnmsub.vv\t%0,%4,%5%p1
vnmsac.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1"
+   vmv%m3r.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
@@ -5676,9 +5676,9 @@ (define_insn "*pred_nmsub"
   "TARGET_VECTOR"
   "@

Re: Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-03-01 Thread 钟居哲

+  /* Segment load/store permute cost.  */
+  const int segment_permute_2;
+  const int segment_permute_4;
+  const int segment_permute_8;

Why do we only have 2/4/8?  I think we should have 2/3/4/5/6/7/8.
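One possible shape for a cost table covering every NF from 2 to 8, as a minimal sketch.  The struct and function names are hypothetical and are not taken from the patch or from GCC:

```c
#include <assert.h>

#define MIN_NF 2
#define MAX_NF 8

/* Hypothetical cost table: one permute cost per segment count NF = 2..8.  */
struct segment_costs
{
  int permute[MAX_NF - MIN_NF + 1];
};

/* Returns the permute cost for NF fields, or 0 when NF is outside 2..8
   (i.e. not a segment access in this model).  */
static int
segment_permute_cost (const struct segment_costs *c, int nf)
{
  assert (c != 0);
  if (nf < MIN_NF || nf > MAX_NF)
    return 0;
  return c->permute[nf - MIN_NF];
}
```

Indexing by NF avoids one switch case per group size, at the price of a less self-describing tuning struct.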


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-02-28 05:27
To: juzhe.zh...@rivai.ai; gcc-patches; palmer; kito.cheng
CC: rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.
> This patch looks odd to me.
> I don't see memrefs in the trunk code.
 
It's on top of the vle/vse offset handling patch from
a while back that I haven't committed yet.
 
> Also, I would prefer to list all costs in the cost tune info for NF = 2 ~ 8, like ARM SVE
> does:
I don't mind having separate costs for each but I figured they
scale anyway with the number of vectors already.  Attached v2
is more similar to aarch64.
 
Regards
Robin
 
Subject: [PATCH v2] RISC-V: Add initial cost handling for segment
loads/stores.
 
This patch makes segment loads and stores more expensive.  It adds
segment_permute_2 (as well as 4 and 8) cost fields to the common vector
costs and adds handling to adjust_stmt_cost.
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (struct common_vector_cost): Add
segment_permute cost.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
Handle segment loads/stores.
* config/riscv/riscv.cc: Initialize segment_permute_[248] to 1.
---
gcc/config/riscv/riscv-protos.h|   5 +
gcc/config/riscv/riscv-vector-costs.cc | 139 +
gcc/config/riscv/riscv.cc  |   6 ++
3 files changed, 108 insertions(+), 42 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 80efdf2b7e5..9b737aca1a3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -218,6 +218,11 @@ struct common_vector_cost
   const int gather_load_cost;
   const int scatter_store_cost;
+  /* Segment load/store permute cost.  */
+  const int segment_permute_2;
+  const int segment_permute_4;
+  const int segment_permute_8;
+
   /* Cost of a vector-to-scalar operation.  */
   const int vec_to_scalar_cost;
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index adf9c197df5..c8178d71101 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1043,6 +1043,25 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
   return vector_costs::better_main_loop_than_p (other);
}
+/* Returns the group size i.e. the number of vectors to be loaded by a
+   segmented load/store instruction.  Return 0 if it is no segmented
+   load/store.  */
+static int
+segment_loadstore_group_size (enum vect_cost_for_stmt kind,
+   stmt_vec_info stmt_info)
+{
+  if (stmt_info
+  && (kind == vector_load || kind == vector_store)
+  && STMT_VINFO_DATA_REF (stmt_info))
+{
+  stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+  if (stmt_info
+   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+ return DR_GROUP_SIZE (stmt_info);
+}
+  return 0;
+}
+
/* Adjust vectorization cost after calling riscv_builtin_vectorization_cost.
For some statement, we would like to further fine-grain tweak the cost on
top of riscv_builtin_vectorization_cost handling which doesn't have any
@@ -1067,55 +1086,91 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, 
loop_vec_info loop,
 case vector_load:
 case vector_store:
{
-   /* Unit-stride vector loads and stores do not have offset addressing
-  as opposed to scalar loads and stores.
-  If the address depends on a variable we need an additional
-  add/sub for each load/store in the worst case.  */
-   if (stmt_info && stmt_info->stmt)
+   if (stmt_info && stmt_info->stmt && STMT_VINFO_DATA_REF (stmt_info))
{
-   data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
-   class loop *father = stmt_info->stmt->bb->loop_father;
-   if (!loop && father && !father->inner && father->superloops)
+   /* Segment loads and stores.  When the group size is > 1
+ the vectorizer will add a vector load/store statement for
+ each vector in the group.  Here we additionally add permute
+ costs for each.  */
+   /* TODO: Indexed and ordered/unordered cost.  */
+   int group_size = segment_loadstore_group_size (kind, stmt_info);
+   if (group_size > 1)
+ {
+   switch (group_size)
+ {
+ case 2:
+   if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+ stmt_cost += costs->vla->segment_permute_2;
+   else
+ stmt_cost += costs->vls->segment_permute_2;
+   break;
+ case 4:
+   if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+ stmt_cost += costs->vla->segment_permute_4;
+   else
+ stmt_cost += costs->vls->segment_permute_4;
+   break;
+ case 8:
+   if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+ stmt_cost += costs->vla->segment_permute_8;
+   else
+ stmt_cost +=

Re:[PATCH 5/5] RISC-V: Support vmsxx.vx for autovec comparison of vec and imm

2024-02-29 Thread 钟居哲

Hi, han. My comment on this patch is the same as for

[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm



--Original--
From: "demin.han"

Re:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm

2024-02-29 Thread 钟居哲

Hi, han. I understand you are trying to optimize vector-splat_vector
into vector-scalar in the "expand" stage, that is,


vv -> vx or vv -> vf.


This is an issue we have known about for a long time.


This patch transforms vv -> vf when the splat vector is duplicated
from a constant (by recognizing that it is a CONST_VECTOR in the expand stage),
but it can't transform vv -> vf when the splat vector is duplicated from a
register.


For example, in a[i] = b[i]  x ? c[i] : d[i], x is a register, so this
case cannot be optimized by your patch.


Actually, we have a solution that does all possible transformations (including the
case I mentioned above) from vv to vx or vf: the late-combine pass
contributed by Richard Sandiford of ARM:
https://patchwork.ozlabs.org/project/gcc/patch/mptr0ljn9eh@arm.com/
You can apply that patch and experiment with it locally.


I believe it will land in GCC-15, so I don't think we need this patch
to do the optimization.


Thanks.

--Original--
From: "demin.han"

Re:[PATCH 4/5] RISC-V: Remove integer vector eqne pattern

2024-02-29 Thread 钟居哲

Hi, han. My review comment on this patch is the same as what I said in:



[PATCH 1/5] RISC-V: Remove float vector eqne pattern



--Original--
From: "demin.han"

Re:[PATCH 2/5] RISC-V: Refactor expand_vec_cmp

2024-02-29 Thread 钟居哲

LGTM. But please add [NFC] to the title of this patch when committing:


RISC-V: Refactor expand_vec_cmp [NFC]


--Original--
From: "demin.han"

Re:[PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-02-29 Thread 钟居哲

Hello, han. Thanks for trying to optimize the code.


But I believe the vector-scalar patterns (eq/ne) you remove in this patch are
necessary.


This is the story:
1. For commutative RTL codes in GCC such as plus, eq, ne, etc.,
  we know that semantically both (eq: (reg) (vec_duplicate ...)) and
(eq: (vec_duplicate ...) (reg)) are right.
  However, as I remember, GCC prefers this order: (eq:
(vec_duplicate ...) (reg)).


2. Before this patch, the operand order of the comparisons is as follows (taking eq and lt as
an example):
 
  1). (eq: (vec_duplicate ...) (reg)) -- commutative
  2). (lt: (reg) (vec_duplicate ...)) -- non-commutative
 
 The operand orders of these patterns are different.
 
 That is why we have dedicated patterns (which look like duplicates)
for vector-scalar eq/ne, whereas we unify eq/ne with the other comparisons for
vector-vector instructions.
 If we unify eq/ne with the other comparisons for vector-scalar
instructions (as your patch does), we will end up with:
 
 (eq: (reg) (vec_duplicate ...)) [after this patch] instead of (eq:
(vec_duplicate ...) (reg)) [before this patch].


So, I think this patch may not be right.
Robin/Jeff/kito, feel free to correct me if I am wrong.


--Original--
From: "demin.han"

Re: Re: [PATCH v3] RISC-V: Introduce gcc option mrvv-vector-bits for RVV

2024-02-28 Thread 钟居哲

I think it makes more sense to remove --param=riscv-autovec-preference and add 
-mrvv-vector-bits



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2024-02-28 20:56
To: pan2.li
CC: gcc-patches; juzhe.zhong; yanzhang.wang; rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH v3] RISC-V: Introduce gcc option mrvv-vector-bits for RVV
Taking one more look, I think this option should work and integrate with
--param=riscv-autovec-preference=, since they do similar but
slightly different jobs.
 
We have 3 values for --param=riscv-autovec-preference=: none, scalable
and fixed-vlmax.
 
-mrvv-vector-bits=scalable works like
--param=riscv-autovec-preference=scalable and
-mrvv-vector-bits=zvl works like
--param=riscv-autovec-preference=fixed-vlmax.
 
So I think we need to do some conflict checks, like:
 
-mrvv-vector-bits=zvl can't work with --param=riscv-autovec-preference=scalable
-mrvv-vector-bits=scalable can't work with
--param=riscv-autovec-preference=fixed-vlmax
 
but it may not be just an alias, since there are some useful combinations like:
 
-mrvv-vector-bits=zvl with --param=riscv-autovec-preference=none:
NO auto vectorization but intrinsic code still could benefit from the
-mrvv-vector-bits=zvl option.
 
-mrvv-vector-bits=scalable with --param=riscv-autovec-preference=none
Should still work for VLS code gen, but just disable auto
vectorization per the option semantic.
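The combinations listed above could be checked along these lines.  This is only a sketch: the enum and function names are hypothetical stand-ins for the two options and do not correspond to identifiers in GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of -mrvv-vector-bits=.  */
enum vector_bits { BITS_SCALABLE, BITS_ZVL };

/* Hypothetical model of --param=riscv-autovec-preference=.  */
enum autovec_pref { PREF_NONE, PREF_SCALABLE, PREF_FIXED_VLMAX };

/* Returns true for the two contradictory pairs named above; every
   combination with PREF_NONE is accepted.  */
static bool
options_conflict (enum vector_bits bits, enum autovec_pref pref)
{
  assert (bits == BITS_SCALABLE || bits == BITS_ZVL);
  if (bits == BITS_ZVL && pref == PREF_SCALABLE)
    return true;
  if (bits == BITS_SCALABLE && pref == PREF_FIXED_VLMAX)
    return true;
  return false;
}
```

A driver would presumably report an error when options_conflict returns true and otherwise let the two options combine.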
 
However, there is something here that needs fixing, since
--param=riscv-autovec-preference=none still disables VLS code gen for
now; you can see an example here:
https://godbolt.org/z/fMTr3eW7K
 
But I think it's really the right behavior here; this part might need
to be fixed in vls_mode_valid_p and some other places.
 
 
Anyway, I think we need to check all use sites of RVV_FIXED_VLMAX and
RVV_SCALABLE, and make sure all use sites of RVV_FIXED_VLMAX
are also checked with RVV_VECTOR_BITS_ZVL.
 
 
 
> -/* Return the VLEN value associated with -march.
> +static int
> +riscv_convert_vector_bits (int min_vlen)
 
Not sure if we really need this function, it seems it always returns min_vlen?
 
> +{
> +  int rvv_bits = 0;
> +
> +  switch (rvv_vector_bits)
> +{
> +  case RVV_VECTOR_BITS_ZVL:
> +  case RVV_VECTOR_BITS_SCALABLE:
> +   rvv_bits = min_vlen;
> +   break;
> +  default:
> +   gcc_unreachable ();
> +}
> +
> +  return rvv_bits;
> +}
> +
> +/* Return the VLEN value associated with -march and -mrvv-vector-bits.

Re: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

2024-02-27 Thread 钟居哲

>> I don't think it's that simple.  On some uarchs vsetvls are nearly free
>>while on others they can be fairly expensive.  It's not clear (to me)
>>yet if one approach or the other is going to be the more common.

That's uarch-dependent, which is not what I am talking about.
What I want to say is that this patch breaks the testcases I added for
VSETVL PASS testing.
And those testcases are uarch-independent.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2024-02-27 23:22
To: juzhe.zh...@rivai.ai; Robin Dapp; Edwin Lu; gcc-patches
CC: gnu-toolchain; pan2.li
Subject: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler 
change
 
 
On 2/26/24 18:21, juzhe.zh...@rivai.ai wrote:
> If the scheduling model increases the vsetvls, we shouldn't set it as 
> default scheduling model
I don't think it's that simple.  On some uarchs vsetvls are nearly free 
while on others they can be fairly expensive.  It's not clear (to me) 
yet if one approach or the other is going to be the more common.
 
jeff

Re: [PATCH] RISC-V: Adjust vec unit-stride load/store costs.

2024-02-16 Thread 钟居哲

Can memrefs be computed in analyze_loop_vinfo?

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-02-13 21:42
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Adjust vec unit-stride load/store costs.
Hi,

scalar loads provide offset addressing while unit-stride vector
instructions cannot.  The offset must be loaded into a general-purpose
register before it can be used.  In order to account for this, this
patch adds an address arithmetic heuristic that keeps track of data
reference operands.  If we haven't seen the operand before we add the
cost of a scalar statement.
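The "seen before" bookkeeping can be sketched in a few lines of C.  This is an illustration under assumptions only: the fixed-size array, the names, and the use of a plain pointer as the operand key are hypothetical; the patch itself works on tree operands inside the vectorizer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 16 /* assumed capacity, enough for the illustration */

/* Hypothetical record of the address operands already accounted for.  */
struct seen_refs
{
  const void *refs[MAX_REFS];
  size_t count;
};

/* Returns the extra cost for a vector load/store based on BASE: the
   cost of one scalar statement the first time BASE is seen, 0 afterwards.  */
static int
address_cost (struct seen_refs *seen, const void *base, int scalar_stmt_cost)
{
  for (size_t i = 0; i < seen->count; i++)
    if (seen->refs[i] == base)
      return 0;

  assert (seen->count < MAX_REFS);
  seen->refs[seen->count++] = base;
  return scalar_stmt_cost;
}
```

Each distinct operand is charged once, which matches the idea that the address computation can be shared by later accesses through the same operand.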

This helps to get rid of an lbm regression when vectorizing (roughly
0.5% fewer dynamic instructions).  gcc5 improves by 0.2% and deepsjeng
by 0.25%.  wrf and nab degrade by 0.1%.  This is because we now
adjust the cost of SLP as well as loop-vectorized instructions, whereas
we would only adjust loop-vectorized instructions before.
Considering higher scalar_to_vec costs (3 vs 1) for all vectorization
types causes some snippets not to get vectorized anymore.  Given these
costs the decisions look correct but appear worse when just counting
dynamic instructions.

In total SPECint 2017 has 4 bn fewer dynamic instructions and SPECfp 0.7
bn fewer, so not a whole lot.

Regtested on riscv64.

Regards
Robin

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Move...
(costs::adjust_stmt_cost): ... to here and add vec_load/vec_store
offset handling.
(costs::add_stmt_cost): Also adjust cost for statements without
stmt_info.
* config/riscv/riscv-vector-costs.h: Define zero constant.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-2.c: New test.
---
gcc/config/riscv/riscv-vector-costs.cc| 86 ---
gcc/config/riscv/riscv-vector-costs.h | 10 +++
.../vect/costmodel/riscv/rvv/vse-slp-1.c  | 51 +++
.../vect/costmodel/riscv/rvv/vse-slp-2.c  | 53 
4 files changed, 190 insertions(+), 10 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vse-slp-1.c
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/vse-slp-2.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 7c9840df4e9..adf9c197df5 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
#include "backend.h"
#include "tree-data-ref.h"
#include "tree-ssa-loop-niter.h"
+#include "tree-hash-traits.h"
/* This file should be included last.  */
#include "riscv-vector-costs.h"
@@ -1047,18 +1048,81 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
top of riscv_builtin_vectorization_cost handling which doesn't have any
information on statement operation codes etc.  */
-static unsigned
-adjust_stmt_cost (enum vect_cost_for_stmt kind, tree vectype, int stmt_cost)
+unsigned
+costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, loop_vec_info loop,
+ stmt_vec_info stmt_info,
+ slp_tree, tree vectype, int stmt_cost)
{
   const cpu_vector_cost *costs = get_vector_costs ();
   switch (kind)
 {
 case scalar_to_vec:
-  return stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->FR2VR
-   : costs->regmove->GR2VR);
+  stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->FR2VR
+ : costs->regmove->GR2VR);
+  break;
 case vec_to_scalar:
-  return stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->VR2FR
-   : costs->regmove->VR2GR);
+  stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->VR2FR
+ : costs->regmove->VR2GR);
+  break;
+case vector_load:
+case vector_store:
+ {
+   /* Unit-stride vector loads and stores do not have offset addressing
+  as opposed to scalar loads and stores.
+  If the address depends on a variable we need an additional
+  add/sub for each load/store in the worst case.  */
+   if (stmt_info && stmt_info->stmt)
+ {
+   data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+   class loop *father = stmt_info->stmt->bb->loop_father;
+   if (!loop && father && !father->inner && father->superloops)
+ {
+   tree ref;
+   if (TREE_CODE (dr->ref) != MEM_REF
+   || !(ref = TREE_OPERAND (dr->ref, 0))
+   || TREE_CODE (ref) != SSA_NAME)
+ break;
+
+   if (SSA_NAME_IS_DEFAULT_DEF (ref))
+ break;
+
+   if (memrefs.contains ({ref, cst0}))
+ break;
+
+   memrefs.add ({ref, cst0});
+
+   /* In case we have not seen REF before and the base address
+  is a pointer operation try a bit harder.  */
+   tree base = DR_BASE_ADDRESS (dr);
+   if (TREE_CODE (base) == POINTER_PLUS_EXPR
+   || TREE_CODE (base) == POINTER_DIFF_EXPR)
+ {
+   /* Deconstruct BASE's first operand.  If it is a binary
+ operation, i.e. a base and an "offset" store this
+

Re: Re: [Committed] RISC-V: Add regression test for vsetvl bug pr113429

2024-01-26 Thread 钟居哲

newlib rv32gcv



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2024-01-27 08:38
To: juzhe.zh...@rivai.ai; gcc-patches
CC: kito.cheng; law; rdapp; vineetg
Subject: Re: [Committed] RISC-V: Add regression test for vsetvl bug pr113429
What target/config are these failures on?
I tried rv64gcv, rv64gc, rv32gcv, and rv32gc with RUNTESTFLAGS="rvv.exp" and 
don't see these failures.

Thanks,
Patrick
On 1/25/24 23:20, juzhe.zh...@rivai.ai wrote:
This patch causes the following regression:

FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O3 -g  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -Os  (test for excess errors)

I suggest you add:

/* { dg-require-effective-target rv64 } */
/* { dg-require-effective-target riscv_v } */



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2024-01-24 09:20
To: juzhe.zh...@rivai.ai; gcc-patches
CC: kito.cheng; law; rdapp; vineetg
Subject: [Committed] RISC-V: Add regression test for vsetvl bug pr113429
The reduced testcase for pr113429 (cam4 failure) needed additional
modules so it wasn't committed.
The fuzzer found a C testcase that was also fixed with pr113429's fix.
Adding it as a regression test.
PR target/113429
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr113429.c: New test.
Signed-off-by: Patrick O'Neill 
---
 .../gcc.target/riscv/rvv/vsetvl/pr113429.c| 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
new file mode 100644
index 000..05c3eeecb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+
+long a;
+int b, c, d, e, f, g;
+short h, i, j;
+static int k = 3;
+static int l = 6;
+int m[5][7];
+signed char n;
+int *const o = 
+
+signed char(p)(signed char p1, signed char q) {
+  return p1 / q;
+}
+
+void s(unsigned p1) {
+  b = (b ^ p1) & 255;
+}
+
+static long t() {
+  long u;
+  signed char v;
+  d = 1;
+  for (; d <= 4; d++) {
+j = 0;
+for (; j <= 4; j++) {
+  v = 0;
+  for (; v <= 4; v++) {
+if (m[v][v])
+  continue;
+c = 0;
+for (; c <= 4; c++) {
+  n = 0;
+  for (; n <= 4; n++) {
+int *w = 
+long r = v;
+u = r == 0 ? a : a % r;
+h |= u;
+*w = g;
+--m[n][c];
+f &= *o;
+  }
+}
+if (p((i < 3) ^ 9, k))
+  ;
+else if (v)
+  return 0;
+  }
+}
+  }
+  return 1;
+}
+
+static char x() {
+  for (;;) {
+t();
+if (l)
+  return 0;
+  }
+}
+
+int main() {
+  x();
+  s(e & 255);
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
-- 
2.34.1

Re: Re: [PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

2024-01-23 Thread 钟居哲

>> Why that change?  Was no-schedule necessary before and is not anymore?
>> Is it a result from the changes?  I'd hope not.
Yes, it is a result of the changes, but the new behavior is reasonable, so
adapting the testcase is enough.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-01-24 05:12
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]
> The SPEC 2017 wrf benchmark exposes unreasonable memory usage of the VSETVL
> pass: the pass consumes over 33 GB of memory, which makes it impossible
> to compile SPEC 2017 wrf on a laptop.
> 
> The root cause is wasting-memory variables:

LGTM. The new code matches compute_lcm_local_properties more
closely which makes sense to me.

One separate thing, nothing to do with this patch - I find
bitmap_union_of_preds_with_entry not wrong but weirdly written.
Probably because it was copied from somewhere and slightly
adjusted?  If you touch more code anyway, would you mind fixing it?

  for (ix = 0; ix < EDGE_COUNT (b->preds); ix++)
    {
      e = EDGE_PRED (b, ix);
      bitmap_copy (dst, src[e->src->index]);
      break;
    }
  if (ix == EDGE_COUNT (b->preds))
    bitmap_clear (dst);

The whole idea seems to be to _not_ skip the entry block.  So something
like if (EDGE_COUNT () == 0) {...} else { bitmap_copy (...) } should
be sufficient?  If the input is assumed to be empty we could even
skip the copy.
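
The equivalence Robin describes can be checked with a small stand-alone
model.  This is only a sketch, not GCC code: Bitmap, Block, init_with_loop
and init_direct are hypothetical stand-ins for sbitmap, basic_block and the
opening lines of bitmap_union_of_preds_with_entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for GCC's sbitmap and basic_block.
using Bitmap = std::vector<bool>;

struct Block
{
  std::vector<std::size_t> preds; // indices of the predecessor blocks
};

// Shape of the quoted code: a loop that always breaks on its first
// iteration, followed by a check for "the loop never ran".
void
init_with_loop (Bitmap &dst, const std::vector<Bitmap> &src, const Block &b)
{
  std::size_t ix;
  for (ix = 0; ix < b.preds.size (); ix++)
    {
      assert (b.preds[ix] < src.size ());
      dst = src[b.preds[ix]];
      break;
    }
  if (ix == b.preds.size ())
    dst.assign (dst.size (), false);
}

// Suggested shape: test for "no predecessors" directly.
void
init_direct (Bitmap &dst, const std::vector<Bitmap> &src, const Block &b)
{
  if (b.preds.empty ())
    dst.assign (dst.size (), false);
  else
    {
      assert (b.preds[0] < src.size ());
      dst = src[b.preds[0]];
    }
}
```

Both functions leave dst cleared for a block without predecessors and
otherwise copy the first predecessor's bitmap, so the loop form adds nothing
beyond the direct test.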

> -/* { dg-options "--param=riscv-autovec-preference=scalable -march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2 -fno-tree-vectorize" } */
> +/* { dg-options "--param=riscv-autovec-preference=scalable -march=rv32gcv -mabi=ilp32 -fno-tree-vectorize" } */

Why that change?  Was no-schedule necessary before and is not anymore?
Is it a result from the changes?  I'd hope not.

Regards
Robin

Re: Re: [Committed] RISC-V: Suppress warning

2024-01-19 Thread 钟居哲

OK. I saw the other arguments there:

tree fntype ATTRIBUTE_UNUSED,
rtx libname ATTRIBUTE_UNUSED,

So I followed these and added ATTRIBUTE_UNUSED to 'fndecl'.

Maybe it's better to remove all arguments of riscv_init_cumulative_args that
are unused, as you suggested.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2024-01-20 08:52
To: Juzhe-Zhong; gcc-patches
CC: pan2.li; schwab
Subject: Re: [Committed] RISC-V: Suppress warning
 
 
On 1/19/24 17:27, Juzhe-Zhong wrote:
> ../../gcc/config/riscv/riscv.cc: In function 'void 
> riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)':
> ../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl' 
> [-Werror=unused-parameter]
> 4879 | tree fndecl,
>| ~^~
> ../../gcc/config/riscv/riscv.cc: In function 'bool 
> riscv_vector_mode_supported_any_target_p(machine_mode)':
> ../../gcc/config/riscv/riscv.cc:10537:56: error: unused parameter 'mode' 
> [-Werror=unused-parameter]
> 10537 | riscv_vector_mode_supported_any_target_p (machine_mode mode)
>|   ~^~~~
> cc1plus: all warnings being treated as errors
> make[3]: *** [Makefile:2559: riscv.o] Error 1
> 
> Suppress these warnings.
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv.cc (riscv_init_cumulative_args): Suppress warning.
> (riscv_vector_mode_supported_any_target_p): Ditto.
There's actually more cleanup to do in there ;-) One of the arguments 
currently marked as unused is actually used.  And the better way to 
handle unused arguments is to just drop their name (like you did with 
riscv_vector_mode_supported_any_target_p).
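
For reference, the two ways of silencing -Wunused-parameter mentioned here
look roughly like this.  It is a stand-alone sketch with made-up function
names (scale_annotated, scale_unnamed), not the riscv.cc code, and it uses
the standard C++17 [[maybe_unused]] attribute where GCC's sources would use
the ATTRIBUTE_UNUSED macro.

```cpp
#include <cassert>

// Option 1: keep the parameter name and annotate it as possibly unused.
int
scale_annotated (int value, [[maybe_unused]] int flags)
{
  return value * 2;
}

// Option 2: drop the parameter name.  The signature is unchanged, and the
// compiler cannot warn about a name that does not exist.
int
scale_unnamed (int value, int /* flags */)
{
  return value * 2;
}

// Both spellings behave identically; only the diagnostics differ.
void
check_equivalent ()
{
  assert (scale_annotated (21, 0) == scale_unnamed (21, 0));
}
```

One practical difference: an annotated parameter can still be used by
accident without any diagnostic, which is how an argument marked as unused
can end up being used after all, whereas an unnamed parameter cannot be
referenced.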
 
I'm actually in the process of bootstrapping and regression testing the 
additional fixes to riscv_init_cumulative_args.
 
jeff

Re: [PATCH] RISC-V: fix some vsetvl debug info in pass's Phase 2 code [NFC]

2024-01-16 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Vineet Gupta
Date: 2024-01-17 05:41
To: gcc-patches; Robin Dapp; juzhe . zhong @ rivai . ai
CC: Jeff Law; kito.cheng; gnu-toolchain; Vineet Gupta
Subject: [PATCH] RISC-V: fix some vsetvl debug info in pass's Phase 2 code [NFC]
When staring at VSETVL pass for PR/113429, spotted some minor
improvements.
 
1. For readability, remove some redundant condition checks in the Phase 2
   function earliest_fuse_vsetvl_info ().
2. Add iteration count in debug prints in same function.
 
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (earliest_fuse_vsetvl_info):
Remove redundant checks in else condition for readability.
(earliest_fuse_vsetvl_info): Print iteration count in debug
prints.
(earliest_fuse_vsetvl_info): Fix misleading vsetvl info
dump details in certain cases.
 
Signed-off-by: Vineet Gupta 
---
gcc/config/riscv/riscv-vsetvl.cc | 20 ++--
1 file changed, 10 insertions(+), 10 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 78a2f7b38faf..41d4b80648f6 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2343,7 +2343,7 @@ public:
   void compute_lcm_local_properties ();
   void fuse_local_vsetvl_info ();
-  bool earliest_fuse_vsetvl_info ();
+  bool earliest_fuse_vsetvl_info (int iter);
   void pre_global_vsetvl_info ();
   void emit_vsetvl ();
   void cleaup ();
@@ -2961,7 +2961,7 @@ pre_vsetvl::fuse_local_vsetvl_info ()
bool
-pre_vsetvl::earliest_fuse_vsetvl_info ()
+pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
{
   compute_avl_def_data ();
   compute_vsetvl_def_data ();
@@ -2984,7 +2984,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  fprintf (dump_file, "\n  Compute LCM earliest insert data:\n\n");
+  fprintf (dump_file, "\n  Compute LCM earliest insert data (lift %d):\n\n",
+    iter);
   fprintf (dump_file, "Expression List (%u):\n", num_exprs);
   for (unsigned i = 0; i < num_exprs; i++)
{
@@ -3032,7 +3033,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  fprintf (dump_file, "Fused global info result:\n");
+  fprintf (dump_file, "Fused global info result (lift %d):\n", iter);
 }
   bool changed = false;
@@ -3142,8 +3143,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (src_block_info.has_info ())
src_block_info.probability += dest_block_info.probability;
}
-   else if (src_block_info.has_info ()
-&& !m_dem.compatible_p (prev_info, curr_info))
+   else
{
  /* Cancel lift up if probabilities are equal.  */
  if (successors_probability_equal_p (eg->src))
@@ -3151,11 +3151,11 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file,
-"  Change empty bb %u to from:",
+"  Reset bb %u:",
   eg->src->index);
  prev_info.dump (dump_file, "");
  fprintf (dump_file,
-"to (higher probability):");
+" due to (same probability):");
  curr_info.dump (dump_file, "");
}
  src_block_info.set_empty_info ();
@@ -3170,7 +3170,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file,
-"  Change empty bb %u to from:",
+"  Change bb %u from:",
   eg->src->index);
  prev_info.dump (dump_file, "");
  fprintf (dump_file,
@@ -3627,7 +3627,7 @@ pass_vsetvl::lazy_vsetvl ()
 {
   if (dump_file)
fprintf (dump_file, "  Try lift up %d.\n\n", fused_count);
-  changed = pre.earliest_fuse_vsetvl_info ();
+  changed = pre.earliest_fuse_vsetvl_info (fused_count);
   fused_count += 1;
   } while (changed);
-- 
2.34.1

Re: [PATCH v2] RISC-V: RVV: add toggle to control vsetvl pass behavior

2024-01-16 Thread 钟居哲

LGTM.

juzhe.zh...@rivai.ai

From: Vineet Gupta
Date: 2024-01-17 05:41
To: gcc-patches; Robin Dapp; juzhe . zhong @ rivai . ai
CC: Jeff Law; kito.cheng; gnu-toolchain; Vineet Gupta
Subject: [PATCH v2] RISC-V: RVV: add toggle to control vsetvl pass behavior
RVV requires VSET?VL? instructions to dynamically configure VLEN at
runtime. There's a custom pass to do that which has a simple mode
which generates a VSETVL for each V insn and a lazy/optimal mode which
uses LCM dataflow to move VSETVL around, identify/delete the redundant
ones.

Currently simple mode is the default for !optimize invocations, while lazy
mode is the default otherwise.

This patch allows simple mode to be forced via a toggle independent of
the optimization level. A lot of gcc developers are currently doing this
in some form in their local setups, as in the initial phase of autovec
development issues are expected. It makes sense to provide this facility
upstream. It could potentially also be used by distro builders as a
quick workaround for future autovec bugs.
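
The selection logic described above can be modelled in a few lines.  This is
a sketch only: the enum values follow the patch, while vsetvl_mode and
select_vsetvl_mode are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Values as in the patch's vsetvl_strategy_enum.
enum vsetvl_strategy_enum
{
  VSETVL_SIMPLE = 1,
  VSETVL_OPT = 2
};

// Hypothetical result type: which flavour of the pass would run.
enum class vsetvl_mode
{
  simple,
  lazy
};

// Simple mode is used when not optimizing, or when it is forced through the
// strategy parameter regardless of the optimization level.
vsetvl_mode
select_vsetvl_mode (int optimize, int vsetvl_strategy)
{
  assert (vsetvl_strategy == VSETVL_SIMPLE || vsetvl_strategy == VSETVL_OPT);
  if (!optimize || (vsetvl_strategy & VSETVL_SIMPLE))
    return vsetvl_mode::simple;
  return vsetvl_mode::lazy;
}
```

Assuming the usual spelling of GCC parameters, the toggle would be passed as
--param=vsetvl-strategy=simple, for example together with -O3.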

gcc/ChangeLog:
* config/riscv/riscv.opt: New -param=vsetvl-strategy.
* config/riscv/riscv-opts.h: New enum vsetvl_strategy_enum.
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::pre_global_vsetvl_info): Use vsetvl_strategy.
(pass_vsetvl::execute): Use vsetvl_strategy.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
  - Dropped OPTIM_NO_DEL
---
gcc/config/riscv/riscv-opts.h|  9 +
gcc/config/riscv/riscv-vsetvl.cc |  2 +-
gcc/config/riscv/riscv.opt   | 14 ++
3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff4406ab8eaf..ca57dddf1d9a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -116,6 +116,15 @@ enum stringop_strategy_enum {
   STRATEGY_AUTO = STRATEGY_SCALAR | STRATEGY_VECTOR
};
+/* Behavior of VSETVL Pass.  */
+enum vsetvl_strategy_enum {
+  /* Simple: Insert a vsetvl* instruction for each Vector instruction.  */
+  VSETVL_SIMPLE = 1,
+  /* Optimized: Run LCM dataflow analysis to reduce vsetvl* insns and
+ delete any redundant ones generated in the process.  */
+  VSETVL_OPT = 2
+};
+
#define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))
/* Bit of riscv_zvl_flags will set contintuly, N-1 bit will set if N-bit is
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index df7ed149388a..78a2f7b38faf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3671,7 +3671,7 @@ pass_vsetvl::execute (function *)
   if (!has_vector_insn (cfun))
 return 0;
-  if (!optimize)
+  if (!optimize || vsetvl_strategy & VSETVL_SIMPLE)
 simple_vsetvl ();
   else
 lazy_vsetvl ();
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 44ed6d69da29..fd4f1a4df206 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -546,6 +546,20 @@ Target Undocumented Bool Var(riscv_vector_abi) Init(0)
Enable the use of vector registers for function arguments and return value.
This is an experimental switch and may be subject to change in the future.
+Enum
+Name(vsetvl_strategy) Type(enum vsetvl_strategy_enum)
+Valid arguments to -param=vsetvl-strategy=:
+
+EnumValue
+Enum(vsetvl_strategy) String(simple) Value(VSETVL_SIMPLE)
+
+EnumValue
+Enum(vsetvl_strategy) String(optim) Value(VSETVL_OPT)
+
+-param=vsetvl-strategy=
+Target Undocumented RejectNegative Joined Enum(vsetvl_strategy) Var(vsetvl_strategy) Init(VSETVL_OPT)
+-param=vsetvl-strategy= Set the optimization level of VSETVL insert pass.
+
Enum
Name(stringop_strategy) Type(enum stringop_strategy_enum)
Valid arguments to -mstringop-strategy=:
-- 
2.34.1

Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲

>> (1) Fall back to the generic cost model if the tune model didn't
>>     specify one, i.e. make sure we always use the generic cost
>>     model rather than the default one.
>> (2) Change this generic (fallback) cost model so we don't have
>>     regressions on the current trunk, as it's now always used.
>> (3) Adjust it piece by piece.

>> Sure this makes sense and is also what I had in mind.

Yes, that's my plan.

Sent V2:
[PATCH V2] RISC-V: Switch RVV cost model.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 23:36
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model
> Current generic cost model makes dynamic-lmul2-7.c generate inferior codegen.
> 
> I found that if I tweak the cost a little bit, the dynamic-lmul2-7.c codegen
> can be recovered.
> However, that makes other tests fail.
> It's a complicated story.
 
Ok, makes sense.  So the plan seems to be:
 
(1) Fall back to the generic cost model if the tune model didn't
 specify one, i.e. make sure we always use the generic cost
 model rather than the default one.
(2) Change this generic (fallback) cost model so we don't have
 regressions on the current trunk, as it's now always used.
(3) Adjust it piece by piece.
 
Sure this makes sense and is also what I had in mind.
 
> It's true that we can keep the current cost model,
> default_builtin_vectorization_cost,
> and tweak the generic cost model, for example by adding a testcase for
> SHA256 that uses -mtune=generic-ooo.
 
> But the question is: how do you know whether there is a regression in the
> current testsuite with -mtune=generic-ooo?
 
That's a valid question and not easily solved.  Ideally the
generic model is generic enough to be a good base for most
uarchs.  Then the uarchs would only do minor adjustments and
have their own tests for that while the bulk of the generic
tests would still pass.
 
Generally, normal tests should be pretty independent of the
cost model with the exception of checking instruction sequences.
Those that are not should either specify their own -mtune and/or
disable scheduling.  Of course that's easier said than done...
 
Back to the patch:
 
I would suggest either renaming generic_vl[sa]_vector_cost to
rvv_vl[sa]_vector_cost (I find generic a bit too close to default)
and/or add comments that those are supposed to be the vector cost models
used by default if no other cost model was specified.
 
After understanding (2) of the plan the patch is OK to me with
that changed.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲

The current generic cost model makes dynamic-lmul2-7.c generate inferior codegen.

I found that if I tweak the cost a little bit, the dynamic-lmul2-7.c codegen
can be recovered.
However, that makes other tests fail.
It's a complicated story.

So I'd rather set it to the default costs and switch to it.
Then we can tune the costs gradually. That not only fixes the issues we faced
(e.g. SHA256), but also means that no matter how we tweak the costs later, it
won't hurt the codegen of the current tests.

It's true that we can keep the current cost model,
default_builtin_vectorization_cost, and tweak the generic cost model, for
example by adding a testcase for SHA256 that uses -mtune=generic-ooo.
But the question is: how do you know whether there is a regression in the
current testsuite with -mtune=generic-ooo?

Note that we can easily tweak the generic vector cost model to fix the SHA256
issue, but we should also make sure we don't have regressions in the current
testsuite with the new cost model.  That is why I switch the cost model.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-01-10 23:04
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model
On 1/10/24 15:40, 钟居哲 wrote:
> I need to add these costs for segment load/stores:
> 
> /* Generic costs for VLA vector operations.  */
> static const scalable_vector_cost generic_vla_vector_cost = {
>   {
> 1,/* int_stmt_cost  */
> 1,/* fp_stmt_cost  */
> 1,/* gather_load_cost  */
> 1,/* scatter_store_cost  */
> 1,/* vec_to_scalar_cost  */
> 1,/* scalar_to_vec_cost  */
> 1,/* permute_cost  */
> 1,/* align_load_cost  */
> 1,/* align_store_cost  */
> 2,/* unalign_load_cost  */
> 2,/* unalign_store_cost  */
>   },
>   2,/* vlseg2_vsseg2_permute_cost  */
>   2,/* vlseg3_vsseg3_permute_cost  */
>   3,/* vlseg4_vsseg4_permute_cost  */
>   3,/* vlseg5_vsseg5_permute_cost  */
>   4,/* vlseg6_vsseg6_permute_cost  */
>   4,/* vlseg7_vsseg7_permute_cost  */
>   4,/* vlseg8_vsseg8_permute_cost  */
> };
> 
> to fix the SLP issues in the following patches.
> 
> If you don't allow me to switch to the generic vector cost model and tune it,
> how can I fix the FAILs of the slp-*.c cases?
> 
> Currently, I mark all slp-*.c tests as XFAIL, which is definitely incorrect.

Of course we don't want those XFAILs.  It's not a matter of "allowing"
or not but rather that I'd like to understand the reasoning.  The patch
itself seems reasonable to me apart from not really getting the
intention.

Your main point seems to be

> +  const cpu_vector_cost *costs = tune_param->vec_costs;
> +  if (!costs)
> +return _vector_cost
and that is fine.  What's not clear is whether changing the actual
costs is a temporary thing or whether it is supposed to be another
fallback.  If they are going to be changed anyway, why do we need
to revert to the default model now?  As discussed yesterday
increased permute costs and vec_to_scalar costs make sense, to first
order.  Is that because of dynamic-lmul2-7.c?
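
The fallback under discussion boils down to a null check on the tune model's
cost table.  The following stand-alone sketch shows only that pattern: the
struct and function names (vector_cost_sketch, tune_param_sketch,
fallback_vector_cost, get_vector_costs_sketch) and the cost values are
invented for illustration and are not the riscv.cc definitions.

```cpp
#include <cassert>

// Hypothetical, heavily reduced cost table.
struct vector_cost_sketch
{
  int vec_to_scalar_cost;
  int permute_cost;
};

// Hypothetical tune model: vec_costs is null when the model does not
// provide its own vector cost table.
struct tune_param_sketch
{
  const vector_cost_sketch *vec_costs;
};

// Table used whenever a tune model does not specify one.
static const vector_cost_sketch fallback_vector_cost = { 1, 1 };

const vector_cost_sketch *
get_vector_costs_sketch (const tune_param_sketch &tune)
{
  const vector_cost_sketch *costs = tune.vec_costs;
  if (!costs)
    return &fallback_vector_cost;
  return costs;
}

// Usage: a model without a table gets the fallback, a model with its own
// table keeps it.
void
example_usage ()
{
  static const vector_cost_sketch tuned_costs = { 2, 2 };
  tune_param_sketch without_table = { nullptr };
  tune_param_sketch with_table = { &tuned_costs };
  assert (get_vector_costs_sketch (without_table) == &fallback_vector_cost);
  assert (get_vector_costs_sketch (with_table)->permute_cost == 2);
}
```

With this shape, changing the fallback table affects every tune model that
leaves vec_costs unset, which is why its values matter for the existing
testsuite.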

Generally we need to make the costs dependent on the
type or mode of course, just as we started to do with the latencies.
Permute is particularly sensitive as you already gathered.

Regards
Robin

Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲

I need to add these costs for segment load/stores:

/* Generic costs for VLA vector operations.  */
static const scalable_vector_cost generic_vla_vector_cost = {
  {
1, /* int_stmt_cost  */
1, /* fp_stmt_cost  */
1, /* gather_load_cost  */
1, /* scatter_store_cost  */
1, /* vec_to_scalar_cost  */
1, /* scalar_to_vec_cost  */
1, /* permute_cost  */
1, /* align_load_cost  */
1, /* align_store_cost  */
2, /* unalign_load_cost  */
2, /* unalign_store_cost  */
  },
  2, /* vlseg2_vsseg2_permute_cost  */
  2, /* vlseg3_vsseg3_permute_cost  */
  3, /* vlseg4_vsseg4_permute_cost  */
  3, /* vlseg5_vsseg5_permute_cost  */
  4, /* vlseg6_vsseg6_permute_cost  */
  4, /* vlseg7_vsseg7_permute_cost  */
  4, /* vlseg8_vsseg8_permute_cost  */
};

to fix the SLP issues in the following patches.

If you don't allow me to switch to the generic vector cost model and tune it,
how can I fix the FAILs of the slp-*.c cases?

Currently, I mark all slp-*.c tests as XFAIL, which is definitely incorrect.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 22:11
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model
Hi Juzhe,
 
> The reason we want to switch to generic vector cost model is the default
> cost model generates inferior codegen for various benchmarks.
> 
> For example, PR113247, we have performance bug that we end up having over 70%
> performance drop of SHA256.  Currently, no matter how we adapt cost model,
> we are not able to fix the performance bug since we always use default cost 
> model by default.
> 
> Also, tweak the generic cost model back to default cost model since we have 
> some FAILs in
> current tests.
 
So to recap:
 
- Our current default tune model is rocket which does not have a vector
   cost model.  No other tune model except generic-ooo has one.
 
- We want tune models with no vector cost model to fall back to the
   default vector cost model for now, later possibly the generic RVV
   cost model.
 
- You're seeing inferior codegen for dynamic-lmul2-7.c with our generic
   RVV (not default) vector cost model (built with -mtune=generic-ooo?).
 
Therefore the suggestion is to start afresh with the default
vector cost model?
 
>  /* Generic costs for VLA vector operations.  */
> @@ -374,13 +374,13 @@ static const scalable_vector_cost generic_vla_vector_cost = {
>  1, /* fp_stmt_cost  */
>  1, /* gather_load_cost  */
>  1, /* scatter_store_cost  */
> -2, /* vec_to_scalar_cost  */
> +1, /* vec_to_scalar_cost  */
>  1, /* scalar_to_vec_cost  */
> -2, /* permute_cost  */
> +1, /* permute_cost  */
>  1, /* align_load_cost  */
>  1, /* align_store_cost  */
> -1, /* unalign_load_cost  */
> -1, /* unalign_store_cost  */
> +2, /* unalign_load_cost  */
> +2, /* unalign_store_cost  */
>},
>  };
 
So is the idea here to just revert the values to the defaults for now
and change them again soon?  And not to keep this as another default
and add others?
 
I'm a bit confused here :)  How does this help?  Can't we continue to
fall back to the default vector cost model when a tune model does not
specify a vector cost model?  If generic-ooo using the generic vector
cost model is the problem, then let's just change it to NULL for now?
 
I suppose at some point we will not want to fall back to the default
vector cost model anymore but always use the generic RVV cost model.
Once we reach the costing part we need to fall back to something
if nothing was defined and generic RVV is supposed to always be better 
than default.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Switch RVV cost model to generic vector cost model

2024-01-10 Thread 钟居哲

>> So is the idea here to just revert the values to the defaults for now
>> and change them again soon?  And not to keep this as another default
>> and add others?

My idea is to revert to the default values for now. Then we can refine the costs gradually.

>> I'm a bit confused here :)  How does this help?  Can't we continue to
>> fall back to the default vector cost model when a tune model does not
>> specify a vector cost model?  If generic-ooo using the generic vector
>> cost model is the problem, then let's just change it to NULL for now?

If you still want to fall back to the default vector cost model,
could you tell me how to fix the XFAILs of the slp-*.c tests?



juzhe.zh...@rivai.ai
 

Re: Re: [PATCH v5] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-10 Thread 钟居哲

>> For the other insns, I wonder if we could get away with not really
>> disabling the newly added early-clobber alternatives for RVV but
>> just disparaging ("?") them?  That way we could re-use "full" for
>> the thv-disabled alternatives and "none" for the newly added ones.
>> ("none" will still be misleading then, though :/)

I prefer to disable the early-clobber alternatives added for theadvector when
targeting RVV, since disparaging them still makes it possible for RA to reach
the early-clobber alternatives.

>> If this doesn't work or others feel the separation is not strict
>> enough, I'd prefer a separate attribute rather than overloading
>> group_overlap.  Maybe something like "spec_restriction" or similar
>> with two values "rvv" and "thv"?

I like this idea; it makes more sense to me. So I think it's better to add an
attribute to disable alternatives for either theadvector or RVV 1.0.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-10 21:36
To: Jun Sha (Joshua); gcc-patches
CC: rdapp.gcc; jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v5] RISC-V: Fix register overlap issue for some 
xtheadvector instructions
Hi Joshua,
 
> For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
> and floating-point compare instructions, an illegal instruction
> exception will be raised if the destination vector register overlaps
> a source vector register group.
> 
> To handle this issue, we use "group_overlap" and "enabled" attribute
> to disable some alternatives for xtheadvector.
 
>  ;; Widening instructions have group-overlap constraints.  Those are only
>  ;; valid for certain register-group sizes.  This attribute marks the
>  ;; alternatives not matching the required register-group size as disabled.
> -(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
> +(define_attr "group_overlap" 
> "none,W21,W42,W84,W43,W86,W87,W0,thv_disabled,rvv_disabled"
>(const_string "none"))
 
I realize there have been some discussions before but I find the naming
misleading.  The group_overlap attribute is supposed to specify whether
groups overlap (and mark the respective alternatives accepting
only this overlap).
Then we check if the groups overlap and disable all non-matching
alternatives.  "none" i.e. "no overlap" always matches.
 
Your first goal seems to be to disable existing non-early-clobber
alternatives for thv.  For this, maybe "full", "same" (or "any"?) would
work?  Please also add a comment in group_overlap_valid then that we
need not actually check for register equality.
 
For the other insns, I wonder if we could get away with not really
disabling the newly added early-clobber alternatives for RVV but
just disparaging ("?") them?  That way we could re-use "full" for
the thv-disabled alternatives and "none" for the newly added ones.
("none" will still be misleading then, though :/)
 
If this doesn't work or others feel the separation is not strict
enough, I'd prefer a separate attribute rather than overloading
group_overlap.  Maybe something like "spec_restriction" or similar
with two values "rvv" and "thv"?
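
To make the proposed semantics concrete, here is a small model of how such an
attribute could decide whether an alternative is enabled.  It is a sketch
under one assumption that the mail does not spell out: each value names the
spec an alternative is restricted to, with an extra "none" meaning
unrestricted.  The names spec_restriction and alternative_enabled are
illustrative; the real mechanism would live in the machine description, not
in C++.

```cpp
#include <cassert>

// Assumed meaning: "rvv" = valid only for standard RVV, "thv" = valid only
// for XTheadVector, "none" = valid for both.
enum class spec_restriction
{
  none,
  rvv,
  thv
};

bool
alternative_enabled (spec_restriction restriction, bool target_xtheadvector)
{
  switch (restriction)
    {
    case spec_restriction::rvv:
      return !target_xtheadvector;
    case spec_restriction::thv:
      return target_xtheadvector;
    case spec_restriction::none:
      break;
    }
  return true;
}

// Usage: an early-clobber alternative tagged "thv" is invisible to RVV,
// and the existing alternative tagged "rvv" is invisible to XTheadVector.
void
example_usage ()
{
  assert (!alternative_enabled (spec_restriction::thv, false));
  assert (!alternative_enabled (spec_restriction::rvv, true));
}
```

Unlike disparaging with "?", a disabled alternative can never be picked by
the register allocator, which is the stricter separation asked for in the
reply.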
 
Regards
Robin

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-09 Thread 钟居哲

Yes. I agree with you that we should wait until all the theadvector patches are accepted.

Thanks.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2024-01-10 01:49
To: 钟居哲; cooper.joshua; gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; 
Cooper Qu
Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.

On 1/8/24 16:04, 钟居哲 wrote:
> This patch looks ok from myside.
Likewise.

So I think the only question for this specific patch is whether or not 
it makes sense to include it now or wait for more of the thead bits to 
get to acceptance.

I tend to think it should wait since I don't think it has any value 
without the rest of the thead vector changes and it's not 100% clear if 
those changes are going to make it into gcc-14 or not.

Jeff

Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-08 Thread 钟居哲

This patch looks OK from my side.



juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2024-01-03 14:08
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.
This patch adds the th. prefix to all XTheadVector instructions by
implementing new assembly output functions. We only check whether the
opcode starts with 'v', so no extra attribute is needed.
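
The check described above can be illustrated with a string-based model.  This
is a sketch only: emit_opcode_sketch is a hypothetical helper that returns
the text that would be emitted, whereas the patch's th_asm_output_opcode
writes the prefix to the assembly file, returns the unchanged opcode pointer
and additionally tests current_output_insn.

```cpp
#include <cassert>
#include <string>

// Returns the mnemonic as it would appear in the assembly output.
std::string
emit_opcode_sketch (const std::string &opcode, bool xtheadvector)
{
  // Only the first character is inspected: anything starting with 'v' is
  // treated as a vector instruction.
  if (xtheadvector && !opcode.empty () && opcode[0] == 'v')
    return "th." + opcode;
  return opcode;
}

// Usage: vector mnemonics gain the prefix, scalar ones are left alone.
void
example_usage ()
{
  assert (emit_opcode_sketch ("vadd.vv", true) == "th.vadd.vv");
  assert (emit_opcode_sketch ("add", true) == "add");
}
```

The trade-off of this design is that it relies on no non-vector mnemonic
starting with 'v'; in exchange, no per-pattern attribute is required.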
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (riscv_asm_output_opcode):
New function to add assembler insn code prefix/suffix.
(th_asm_output_opcode):
Thead function to add assembler insn code prefix/suffix.
* config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise
* config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
* config/riscv/thead.cc (th_asm_output_opcode): Likewise
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/xtheadvector/prefix.c: New test.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config/riscv/riscv-protos.h |  2 ++
gcc/config/riscv/riscv.cc   | 11 +++
gcc/config/riscv/riscv.h|  4 
gcc/config/riscv/thead.cc   | 13 +
.../gcc.target/riscv/rvv/xtheadvector/prefix.c  | 12 
5 files changed, 42 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 31049ef7523..71724dabdb5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -102,6 +102,7 @@ struct riscv_address_info {
};
/* Routines implemented in riscv.cc.  */
+extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p);
extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
extern int riscv_float_const_rtx_index_for_fli (rtx);
@@ -717,6 +718,7 @@ extern void th_mempair_prepare_save_restore_operands (rtx[4], bool,
  int, HOST_WIDE_INT,
  int, HOST_WIDE_INT);
extern void th_mempair_save_restore_regs (rtx[4], bool, machine_mode);
+extern const char *th_asm_output_opcode (FILE *asm_out_file, const char *p);
#ifdef RTX_CODE
extern const char*
th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1cbc5cb5f..51878797287 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5636,6 +5636,17 @@ riscv_get_v_regno_alignment (machine_mode mode)
   return lmul;
}
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  if (TARGET_XTHEADVECTOR)
+return th_asm_output_opcode (asm_out_file, p);
+
+  return p;
+}
+
/* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
'h' Print the high-part relocation associated with OP, after stripping
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6df9ec73c5e..c33361a254d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -826,6 +826,10 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   asm_fprintf ((FILE), "%U%s", (NAME)); \
   } while (0)
+#undef ASM_OUTPUT_OPCODE
+#define ASM_OUTPUT_OPCODE(STREAM, PTR) \
+  (PTR) = riscv_asm_output_opcode(STREAM, PTR)
+
#define JUMP_TABLES_IN_TEXT_SECTION 0
#define CASE_VECTOR_MODE SImode
#define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 20353995931..dc3aed3904d 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -883,6 +883,19 @@ th_output_move (rtx dest, rtx src)
   return NULL;
}
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+th_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  /* We need to add th. prefix to all the xtheadvector
+     instructions here.*/
+  if (current_output_insn != NULL && p[0] == 'v')
+    fputs ("th.", asm_out_file);
+
+  return p;
+}
+
/* Implement TARGET_PRINT_OPERAND_ADDRESS for XTheadMemIdx.  */
bool
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
new file mode 100644
index 000..eee727ef6b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector -mabi=ilp32 -O0" } */
+
+#include "riscv_vector.h"
+
+vint32m1_t
+prefix (vint32m1_t vx, vint32m1_t vy, size_t vl)
+{
+  return __riscv_vadd_vv_i32m1 (vx, vy, vl);
+}
+
+/* { dg-final { scan-assembler {\mth\.v\M} } } */
-- 
2.17.1
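
Editorial sketch of the prefixing logic in the patch above. This is a minimal stand-alone illustration, not the GCC implementation: the variables target_xtheadvector and current_output_insn below are hypothetical stand-ins for GCC's TARGET_XTHEADVECTOR and current_output_insn, and sketch_opcode_prefix and format_opcode are invented helper names. The only behaviour taken from the patch is the check itself: when XTheadVector is enabled, an instruction is being output, and the opcode starts with 'v', the text "th." is emitted in front of the opcode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for GCC's TARGET_XTHEADVECTOR and
   current_output_insn; they are not the real GCC objects.  */
static bool target_xtheadvector = true;
static const void *current_output_insn = "insn";

/* Same test as th_asm_output_opcode in the patch, but returning the
   prefix instead of writing it to the assembly stream.  */
static const char *
sketch_opcode_prefix (const char *p)
{
  if (target_xtheadvector
      && current_output_insn != NULL
      && p[0] == 'v')
    return "th.";

  return "";
}

/* Hypothetical helper: build the opcode text as it would appear in
   the assembly output once the prefix has been emitted.  */
static void
format_opcode (char *buf, size_t size, const char *opcode)
{
  snprintf (buf, size, "%s%s", sketch_opcode_prefix (opcode), opcode);
}
```

With the stand-ins set as above, format_opcode turns "vadd.vv" into "th.vadd.vv" and leaves "addi" unchanged. The check is purely lexical, which matches the patch description ("We only check the prefix is 'v', so that no extra attribute is needed").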

Re: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-08 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-08 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.
Patch v8: Resubmitted after fixing the rtl-checking issue. Passed all the riscv 
regression tests.
Patch v7: Add newline at the end of file.
Patch v6: Move intrinsic tests into rvv/base.
Patch v5: Rebase
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c
 
This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test.
* gcc.target/riscv/zvkb.c: New test.
---
.../riscv/rvv/base/zvbb-intrinsic.c   | 179 ++
.../riscv/rvv/base/zvbb_vandn_vx_constraint.c |  15 ++
.../riscv/rvv/base/zvbc-intrinsic.c   |  62 ++
.../riscv/rvv/base/zvbc_vx_constraint-1.c |  14 ++
.../riscv/rvv/base/zvbc_vx_constraint-2.c |  14 ++
.../riscv/rvv/base/zvkg-intrinsic.c   |  24 +++
.../riscv/rvv/base/zvkned-intrinsic.c | 104 ++
.../riscv/rvv/base/zvknha-intrinsic.c |  33 
.../riscv/rvv/base/zvknhb-intrinsic.c |  33 
.../riscv/rvv/base/zvksed-intrinsic.c |  33 
.../riscv/rvv/base/zvksh-intrinsic.c  |  24 +++
gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
12 files changed, 548 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
new file mode 100644
index 000..b7e25bfe819
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include "riscv_vector.h"
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t 
vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t 
rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, 
vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, 
vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, 
vuint32m4_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) {
+  return

Re: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.

2024-01-08 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-08 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.
Patch v7:Resubmitted after fixing the rtl-checking issue. Passed all the riscv regression 
tests.
Patch v6:Remove unused code.
Patch v5:Rebase.
Patch v4:Merge crypto vector function.def into vector.
Patch v3:Define a shape for vaesz and merge vector-crypto-types.def
 into riscv-vector-builtins-types.def.
Patch v2:Optimize function_shape class for crypto_vector.
 
This patch adds the intrinsic functions of crypto vector based on the
intrinsic doc
(https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto. 
(class b_reverse):Ditto. 
(class vwsll):   Ditto. 
(class clmul):   Ditto. 
(class vg_nhab):  Ditto. 
(class crypto_vv):Ditto. 
(class crypto_vi):Ditto. 
(class vaeskf2_vsm3c):Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Processing size_t uimm for C overloaded 
func.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
---
.../riscv/riscv-vector-builtins-bases.cc  | 264 +-
.../riscv/riscv-vector-builtins-bases.h   |  28 ++
.../riscv/riscv-vector-builtins-functions.def |  94 +++
.../riscv/riscv-vector-builtins-shapes.cc |  87 +-
.../riscv/riscv-vector-builtins-shapes.h  |   4 +
.../riscv/riscv-vector-builtins-types.def |  25 ++
gcc/config/riscv/riscv-vector-builtins.cc | 133 -
gcc/config/riscv/riscv-vector-builtins.def|   1 +
8 files changed, 633 insertions(+), 3 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..d12bb89f91c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,212 @@ public:
   }
};
+/* Below implements are vector crypto */
+/* Implements vandn.[vv,vx] */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+      {
+      case OP_TYPE_vv:
+        return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+      case OP_TYPE_vx:
+        return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+      default:
+        gcc_unreachable ();
+      }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template<rtx_code CODE>
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+      {
+      case OP_TYPE_v:
+      case OP_TYPE_vv:
+        return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ()));
+      case OP_TYPE_vx:
+        return e.use_exact_insn (code_for_pred_v_scalar (CODE, e.vector_mode ()));
+      default:
+        gcc_unreachable ();
+      }
+  }
+};
+
+/* Implements vbrev/vbrev8/vrev8.  */
+template
+class b_reverse : public

Re: RE: Loop vectorizer optimization questions

2024-01-08 Thread 钟居哲

Oh, it's nice to see that you already support min/max index reduction.

I know your patch can handle the following:

int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {
    max = x;
    idx = i;
  }
}

But I wonder whether your patch can handle this:
int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max <= x) {
    max = x;
    idx = i;
  }
}

Will you continue to work on min/max with index?
Or do you want me to continue this work based on your patch?

I have an initial patch which roughly implements LLVM's approach, but it turns out 
Richi doesn't want me to apply LLVM's approach, so your patch may be more 
reasonable than LLVM's approach.
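
Editorial sketch of why the two loops above are different reductions. This is a scalar illustration only, written under the assumption that the interesting case is an array with tied maxima; the function names max_index_strict and max_index_nonstrict are invented here and do not come from either patch. With the strict comparison the first index of the maximum is kept, while with the non-strict comparison the last index wins.

```c
/* Strict comparison: a later element equal to the current maximum
   does not update idx, so the earliest index is kept.  */
static int
max_index_strict (const int *a, int n, int mm, int ii)
{
  int idx = ii;
  int max = mm;
  for (int i = 0; i < n; ++i)
    {
      int x = a[i];
      if (max < x)
        {
          max = x;
          idx = i;
        }
    }
  return idx;
}

/* Non-strict comparison: a later element equal to the current maximum
   does update idx, so the latest index is kept.  */
static int
max_index_nonstrict (const int *a, int n, int mm, int ii)
{
  int idx = ii;
  int max = mm;
  for (int i = 0; i < n; ++i)
    {
      int x = a[i];
      if (max <= x)
        {
          max = x;
          idx = i;
        }
    }
  return idx;
}
```

For the array { 3, 7, 7, 1 } with mm below every element, the strict variant yields index 1 and the non-strict variant yields index 2, so a vectorized form has to preserve the tie-breaking direction of the original loop.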

Thanks.


juzhe.zh...@rivai.ai
 
From: Tamar Christina
Date: 2024-01-09 01:50
To: 钟居哲; gcc
CC: rdapp.gcc; richard.guenther
Subject: RE: Loop vectorizer optimization questions
> 
> Also, another question is that I am working on min/max reduction with index, I
> believe it should be in GCC-15, but I wonder
> whether I can pre-post for review in stage 4, or I should post patch (min/max
> reduction with index) when GCC-15 is open.
> 
 
FWIW, We tried to implement this 5 years ago 
https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534518.html
and you'll likely get the same feedback if you aren't already doing so.
 
I think Richard would prefer to have a general framework for these kinds of 
operations.  We never got around to doing so
and it's still on my list but if you're taking care of it 
 
Just thought I'd point out the previous feedback.
 
Cheers,
Tamar
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai

Loop vectorizer optimization questions

2024-01-08 Thread 钟居哲

Hi, Richard.

I saw the following code:

  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
    {
      if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
                                          OPTIMIZE_FOR_SPEED))
        return false;
      else
        vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
    }

This is for early break. The current early-break support is not sufficient for targets with 
length-based partial vectors, so we are not able to enable early break for RVV.

If I want to support this in the middle-end, is that allowed in GCC-14, or 
should I defer it to GCC-15?

Another question: I am working on min/max reduction with index. I 
believe it belongs in GCC-15, but I wonder
whether I can pre-post it for review during stage 4, or whether I should post the patch (min/max 
reduction with index) once GCC-15 is open.

Thanks.


juzhe.zh...@rivai.ai

Re: Re: [Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

2024-01-07 Thread 钟居哲

In a previous review, Robin asked me to change std::max into MAX,
so I thought the policy was to prefer MAX over std::max.

I changed the code to make it consistent, but it seems I was wrong.

So is it reasonable for me to change all RVV-related code back to 
std::max/min?

If yes, I can send a patch that adapts all of the RVV-related code.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2024-01-08 03:11
To: Juzhe-Zhong; gcc-patches
Subject: Re: [Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

On 1/6/24 17:36, Juzhe-Zhong wrote:
> Obvious fix, Committed.
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-vsetvl.cc: replace std::max by MAX.
Curious why you made this change -- in general we're moving to 
std::{min,max,swap} and away from macro-ized min/max/swap.

Jeff
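
Editorial sketch of one commonly cited difference between a function-like MAX macro and a real function such as std::max. The thread itself does not state the reason for the preference, so this is an assumption about the motivation, not something Jeff said. The example is written in C, with a static inline function standing in for std::max; next_value, max_via_macro and max_via_function are invented names.

```c
/* Function-like macro in the usual MAX style: each argument can be
   evaluated twice.  */
#define MAX(X, Y) ((X) > (Y) ? (X) : (Y))

/* Counts how many times next_value has been called.  */
static int calls;

/* Returns 10, 20, 30, ... on successive calls.  */
static int
next_value (void)
{
  ++calls;
  return calls * 10;
}

/* Function form: both arguments are evaluated exactly once.  */
static inline int
int_max (int x, int y)
{
  return x > y ? x : y;
}

/* The macro expands to two calls of next_value when the first
   result is larger than FLOOR.  */
static int
max_via_macro (int floor)
{
  return MAX (next_value (), floor);
}

/* The function form calls next_value once.  */
static int
max_via_function (int floor)
{
  return int_max (next_value (), floor);
}
```

Starting from calls == 0, max_via_macro (5) calls next_value twice and returns the second value, while max_via_function (5) calls it once. Arguments without side effects behave the same in both forms.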

Re: Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]

2024-01-05 Thread 钟居哲

Thanks Robin.

is_gimple_constant makes more sense. Committed with your comments addressed.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-01-05 17:54
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift 
amount[Dynamic LMUL]
> 1). We not only have vashl_optab,vashr_optab,vlshr_optab which vectorize 
> shift with vector shift amount,
> that is, vectorization of 'a[i] >> x[i]', the shift amount is loop variant.
> 2). But also, we have ashl_optab, ashr_optab, lshr_optab which can vectorize 
> shift with scalar shift amount,
> that is, vectorization of 'a[i] >> x', the shift amount is loop invariant.
> 

> +static bool
> +loop_invariant_op_p (class loop *loop,
> +  tree op)
> +{
> +  if (is_gimple_min_invariant (op))
> +    return true;
> +  if (SSA_NAME_IS_DEFAULT_DEF (op)
> +      || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op))))
> +    return true;
> +  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
> +}
> +

Looks like this is straight from tree-ssa-loop-ch.  Do we need
is_gimple_min_invariant (is_gimple_constant could be sufficient?)
and DEFAULT_DEF for our case?  The rhs of a shift should never contain
a default def?

I'm not entirely happy about the "loop invariant" heuristic/proxy
of the shift amount being vectorizable.  That seems like something
that could bite us in the future in case we do slp-like vectorization
on loop-invariant (but varying) data.

As it helps for now and is not a correctness issue I'd still tend to
go forward with it.

Regards
Robin
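
Editorial sketch of the two source-level shapes the patch description distinguishes. These are plain scalar loops, shown only to make the terms concrete; the function names are invented, unsigned elements are used so the right shift is well defined, and whether a given compiler vectorizes either loop depends on the target and options. In the first loop the shift amount is loop variant (a[i] >> x[i]); in the second it is loop invariant (a[i] >> x), which is the case the patch teaches the liveness computation about.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop-variant shift amount: every element has its own count, so a
   vectorized form needs the counts in a vector as well.  */
static void
shift_variant (uint32_t *dst, const uint32_t *a, const uint32_t *x, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = a[i] >> x[i];
}

/* Loop-invariant shift amount: one scalar count is shared by all
   elements, so no vector of counts has to be kept live.  */
static void
shift_invariant (uint32_t *dst, const uint32_t *a, uint32_t x, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = a[i] >> x;
}
```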

Re: [committed] RISC-V: Clean up testsuite for multi-lib testing [NFC]

2024-01-05 Thread 钟居哲

Hi, Kito.

This patch causes the following regression FAILs:

FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)

spawn -ignore SIGHUP 
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/build/dev-rv64gcv-lp64d-medany-newlib-spike-release-m1-scalable/build-gcc-newlib-stage2/gcc/xgcc
 
-B/work/home/jzzhong/work/docker/riscv-gnu-toolchain/build/dev-rv64gcv-lp64d-medany-newlib-spike-release-m1-scalable/build-gcc-newlib-stage2/gcc/
 
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c
 -march=rv64gcv -mabi=lp64d -mcmodel=medany -fdiagnostics-plain-output 
-ftree-vectorize -O2 --param riscv-autovec-lmul=m1 --param 
riscv-autovec-preference=scalable -lm -o ./single_rgroup_run-3.exe
In file included from /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.c:4,
 from /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c:4:
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c: In function 'main':
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:108:9: error: implicit declaration of function 'assert' [-Wimplicit-function-declaration]
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:174:3: note: in expansion of macro 'run_6'
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c:16:3: note: in expansion of macro 'TEST_ALL'
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:108:9: note: 'assert' is defined in header '<assert.h>'; this is probably fixable by adding '#include <assert.h>'
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:174:3: note: in expansion of macro 'run_6'
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c:16:3: note: in expansion of macro 'TEST_ALL'
compiler exited with status 1
FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
excess errors)
Excess errors:
/work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:108:9:
 error: implicit declaration of function 'assert' 
[-Wimplicit-function-declaration]

UNRESOLVED: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c 
compilation failed to produce executable


Could you fix it?



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2024-01-05 16:39
To: gcc-patches; kito.cheng; juzhe.zhong
CC: Kito Cheng
Subject: [committed] RISC-V: Clean up testsuite for multi-lib testing [NFC]
- Drop unnecessary including for stdlib.h and math.h
- Drop assert.h / assert, use __builtin_abort instead.
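
Editorial sketch of the pattern the second bullet describes: a run-time check that needs no header at all, because __builtin_abort is a compiler builtin in GCC (and Clang). The CHECK macro and the functions sum_to and run_checks are invented for illustration and are not taken from the testsuite files in this commit.

```c
/* Hypothetical check macro: abort on failure without needing
   <assert.h> or <stdlib.h>.  */
#define CHECK(COND)           \
  do                          \
    {                         \
      if (!(COND))            \
        __builtin_abort ();   \
    }                         \
  while (0)

/* Sum of 1..n, used only as something to check.  */
static int
sum_to (int n)
{
  int sum = 0;
  for (int i = 1; i <= n; ++i)
    sum += i;
  return sum;
}

/* Returns 0 if every check passes; aborts otherwise.  */
static int
run_checks (void)
{
  CHECK (sum_to (4) == 10);
  CHECK (sum_to (0) == 0);
  return 0;
}
```

Because nothing is included, a test written this way does not depend on the C library headers being available for every multilib configuration, which appears to be the point of the clean-up.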
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h:
Use __builtin_abort instead of assert.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Drop math.h.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: Ditto.
*

Re: Re: [committed] RISC-V: Add crypto vector builtin function.

2024-01-05 Thread 钟居哲

Thanks Jeff.

Yeah, I agree we are not doing anything terribly wrong, but Palmer requested a revert 
of the vector-crypto patch,
so I reverted it (actually, I asked Li Pan to revert it).

Actually, Wang Feng has fixed the issue:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641903.html 
It was just a simple typo that caused the ICE.

vector-crypto will be committed again soon.

The Eswin guys are working on various vector extensions (vector crypto, BF16 vector, 
etc.).
I have told them that only vector-crypto can be accepted for the GCC-14 release and 
that BF16 vector is deferred to GCC-15.

So I believe we won't have any more new features until the GCC-14 release.

Thanks.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2024-01-05 23:50
To: Palmer Dabbelt; juzhe.zhong
CC: gcc-patches; Kito Cheng; Kito.cheng
Subject: Re: [committed] RISC-V: Add crypto vector builtin function.

On 1/4/24 20:24, Palmer Dabbelt wrote:
> On Thu, 04 Jan 2024 19:17:21 PST (-0800), juzhe.zh...@rivai.ai wrote:
>> Hi, Wang Feng.
>>
>> Your patch has some ICEs:
>> FAIL: gcc.target/riscv/rvv/base/zvbc-intrinsic.c (internal compiler 
>> error: RTL check: expected code 'const_int', have 'reg' in 
>> vlmax_avl_type_p, at config/riscv/riscv-v.cc:4930)
>> FAIL: gcc.target/riscv/rvv/base/zvbc-intrinsic.c (test for excess errors)
>> FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c (internal 
>> compiler error: RTL check: expected code 'const_int', have 'reg' in 
>> vlmax_avl_type_p, at config/riscv/riscv-v.cc:4930)
>> FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c (test for 
>> excess errors)
>> FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c (internal 
>> compiler error: RTL check: expected code 'const_int', have 'reg' in 
>> vlmax_avl_type_p, at config/riscv/riscv-v.cc:4930)
>> FAIL: gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c (test for 
>> excess errors)
> 
> So let's just revert it, it doesn't even look like it was reviewed. 
> We've set a really bad precedent here where we're just merging a bunch 
> of unreviewed code and sorting out the regressions in trunk, that's not 
> the right way to do things.
> 
>>
>> I suspect you didn't enable rtl check in the regression:
>>
>> ../../configure --enable-gcc-checking=rtl.
>> Plz enable rtl check in the regression tests.
We haven't ever required folks to test with RTL checking enabled due to 
its compile-time cost.  So I don't think Feng did anything wrong here.

IIRC, Jakub's standard practice over in the x86 world is to do a 
bootstrap and regression test with RTL checking enabled in the spring as 
we get closer to the release to weed out these kinds of things that can 
slip through.

Clearly there's a bug and we should fix it, but it's not a sign that 
anything has gone terribly wrong.

jeff

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-01 Thread 钟居哲

This is OK from my side.
But before committing this patch, I think we need this patch first:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html 

I will be back at work, so I will take a look at the other patches today.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2024-01-01 01:43
To: Jun Sha (Joshua); gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
juzhe.zhong; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.
 
 
On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> This patch adds th. prefix to all XTheadVector instructions by
> implementing new assembly output functions. We only check the
> prefix is 'v', so that no extra attribute is needed.
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> New function to add assembler insn code prefix/suffix.
> * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> 
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
> ---
>   gcc/config/riscv/riscv-protos.h|  1 +
>   gcc/config/riscv/riscv.cc  | 14 ++
>   gcc/config/riscv/riscv.h   |  4 
>   .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
>   4 files changed, 31 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> 
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 31049ef7523..5ea54b45703 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -102,6 +102,7 @@ struct riscv_address_info {
>   };
>   
>   /* Routines implemented in riscv.cc.  */
> +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char 
> *p);
>   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
>   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
>   extern int riscv_float_const_rtx_index_for_fli (rtx);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 0d1cbc5cb5f..ea1d59d9cf2 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
> return lmul;
>   }
>   
> +/* Define ASM_OUTPUT_OPCODE to do anything special before
> +   emitting an opcode.  */
> +const char *
> +riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
> +{
> +  /* We need to add th. prefix to all the xtheadvector
> + insturctions here.*/
> +  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
> +  p[0] == 'v')
> +fputs ("th.", asm_out_file);
> +
> +  return p;
Just a formatting nit. The GNU standards break lines before the 
operator, not after.  So
   if (TARGET_XTHEADVECTOR
   && current_output_insn != NULL
   && p[0] == 'v')
 
Note that current_output_insn is "extern rtx_insn *", so use NULL, not 
NULL_RTX.
 
Neither of these nits require a new version for review.  Just fix them.
 
If Juzhe is fine with this, so am I.  We can refine it if necessary later.
 
jeff

Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-23 Thread 钟居哲

I suggest you first send the patch that supports theadvector by only adding the 
"th." prefix.
After that is done, we can talk about the rest.



juzhe.zh...@rivai.ai
 
From: joshua
Date: 2023-12-23 11:37
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,

Sorry but I'm not quite familiar with the group_overlap framework. Could you 
take this pattern as an example to show how to disable an alternative in some 
target?

Joshua

--
From: juzhe.zh...@rivai.ai
Date: 2023-12-22 (Friday) 18:32
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; 
"christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

Yeah.

(define_insn "@pred_msbc<mode>"
  [(set (match_operand:<VM> 0 "register_operand"        "=vr, vr, &vr")
        (unspec:<VM>
           [(minus:VI
             (match_operand:VI 1 "register_operand"     "  0, vr,  vr")
             (match_operand:VI 2 "register_operand"     " vr,  0,  vr"))
            (match_operand:<VM> 3 "register_operand"    " vm, vm,  vm")
            (unspec:<VM>
              [(match_operand 4 "vector_length_operand" " rK, rK,  rK")
               (match_operand 5 "const_int_operand"     "  i,  i,   i")
               (reg:SI VL_REGNUM)
               (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)] UNSPEC_VMSBC))]
  "TARGET_VECTOR"
  "vmsbc.vvm\t%0,%1,%2,%3"
  [(set_attr "type" "vicalu")
   (set_attr "mode" "<MODE>")
   (set_attr "vl_op_idx" "4")
   (set (attr "avl_type_idx") (const_int 5))])

You should use an attribute to disable the constraints of alternative 0 and 
alternative 1.


juzhe.zh...@rivai.ai
 
From: joshua
Date: 2023-12-22 18:29
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; jinma; cooper.qu
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,
What xtheadvector needs to handle is just that the destination vector register 
cannot overlap the source vector register group for instructions like vmadc/vmsbc. 
That is not what group_overlap means. We need to add "&" to the registers in 
the corresponding xtheadvector patterns, while RVV 1.0 doesn't have this 
constraint.

(define_insn "@pred_th_msbc<mode>"
  [(set (match_operand:<VM> 0 "register_operand"        "=&vr")
        (unspec:<VM>
           [(minus:VI
             (match_operand:VI 1 "register_operand"     "  vr")
             (match_operand:VI 2 "register_operand"     " vr"))
            (match_operand:<VM> 3 "register_operand"    " vm")
            (unspec:<VM>
              [(match_operand 4 "vector_length_operand" " rK")
               (match_operand 5 "const_int_operand"     "  i")
               (reg:SI VL_REGNUM)
               (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)] UNSPEC_VMSBC))]
  "TARGET_XTHEADVECTOR"
  "vmsbc.vvm\t%0,%1,%2,%3"
  [(set_attr "type" "vicalu")
   (set_attr "mode" "<MODE>")
   (set_attr "vl_op_idx" "4")
   (set (attr "avl_type_idx") (const_int 5))])

Joshua







--
From: juzhe.zh...@rivai.ai
Date: 2023-12-22 (Friday) 16:07
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; 
"christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

Do you mean that theadvector doesn't want the current RVV 1.0 register overlap 
rules, which are as follows?
The destination EEW is smaller than the source EEW and the overlap is in the 
lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi 
v0, v0, 3 is legal, but a destination of v1 is not).
The destination EEW is greater than the source EEW, the source EMUL is at least 
1, and the overlap is in the highest-numbered part of the destination register 
group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or 
v4 is not).

If yes, I suggest disabling the overlap constraint with an attribute. You can learn 
more details from

(set_attr "group_overlap"
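
Editorial sketch of the two RVV 1.0 overlap rules quoted above, reduced to register-number arithmetic. This is a simplified model and not GCC or assembler code: register groups are described by a starting register number and a whole number of registers, fractional EMUL is represented only by a flag, and the function names are invented. It reproduces the two examples in the quoted text.

```c
#include <stdbool.h>

/* True if register groups [a, a+an) and [b, b+bn) share a register.  */
static bool
groups_overlap (int a, int an, int b, int bn)
{
  return a < b + bn && b < a + an;
}

/* Narrowing case (destination EEW smaller than source EEW): an
   overlap is allowed only in the lowest-numbered part of the source
   group, i.e. the destination starts where the source starts.  */
static bool
narrowing_overlap_ok (int dest, int dest_regs, int src, int src_regs)
{
  if (!groups_overlap (dest, dest_regs, src, src_regs))
    return true;
  return dest == src;
}

/* Widening case (destination EEW greater than source EEW): an
   overlap is allowed only if the source EMUL is at least 1 and the
   source sits in the highest-numbered part of the destination group,
   i.e. both groups end at the same register.  */
static bool
widening_overlap_ok (int dest, int dest_regs, int src, int src_regs,
                     bool src_emul_at_least_1)
{
  if (!groups_overlap (dest, dest_regs, src, src_regs))
    return true;
  return src_emul_at_least_1 && src + src_regs == dest + dest_regs;
}
```

In this model, "vnsrl.wi v0, v0, 3" at LMUL=1 is narrowing_overlap_ok (0, 1, 0, 2), and "vzext.vf4 v0, v6" at LMUL=8 is widening_overlap_ok (0, 8, 6, 2, true). The XTheadVector requirement described in the thread is stricter: for vmadc/vmsbc the destination must not overlap the sources at all.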


juzhe.zh...@rivai.ai
 
From: joshua
Date: 2023-12-22 11:33
To: 钟居哲; gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Jeff Law; Christoph 
Müllner; jinma; Cooper Qu
Subject: Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
Hi Juzhe,

Thank you for your comprehensive comments.

Classifying theadvector intrinsics into 3 kinds is really important to make our 
patchset more organize

Re: Re: [PATCH] RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis

2023-12-22 Thread 钟居哲

Committed. Thanks Jeff.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-12-23 00:58
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Make PHI initial value occupy live V_REG in 
dynamic LMUL cost model analysis
 
 
On 12/22/23 02:51, Juzhe-Zhong wrote:
> Consider this following case:
> 
> foo:
>  ble a0,zero,.L11
>  lui a2,%hi(.LANCHOR0)
>  addi sp,sp,-128
>  addi a2,a2,%lo(.LANCHOR0)
>  mv  a1,a0
>  vsetvli a6,zero,e32,m8,ta,ma
>  vid.v   v8
>  vs8r.v  v8,0(sp) ---> spill
> .L3:
>  vl8re32.v   v16,0(sp)---> reload
>  vsetvli a4,a1,e8,m2,ta,ma
>  li  a3,0
>  vsetvli a5,zero,e32,m8,ta,ma
>  vmv8r.v v0,v16
>  vmv.v.x v8,a4
>  vmv.v.i v24,0
>  vadd.vv v8,v16,v8
>  vmv8r.v v16,v24
>  vs8r.v  v8,0(sp)---> spill
> .L4:
>  addiw   a3,a3,1
>  vadd.vv v8,v0,v16
>  vadd.vi v16,v16,1
>  vadd.vv v24,v24,v8
>  bne a0,a3,.L4
>  vsetvli zero,a4,e32,m8,ta,ma
>  sub a1,a1,a4
>  vse32.v v24,0(a2)
>  slli    a4,a4,2
>  add a2,a2,a4
>  bne a1,zero,.L3
>  li  a0,0
>  addi    sp,sp,128
>  jr  ra
> .L11:
>  li  a0,0
>  ret
> 
> Pick unexpected LMUL = 8.
> 
> The root cause is we didn't involve PHI initial value in the dynamic LMUL 
> calculation:
> 
># j_17 = PHI---> # 
> vect_vec_iv_.8_24 = PHI <_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }(5)>
> 
> We didn't count { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } in consuming vector register but it does 
> allocate a vector register group for it.
Yup.  There's analogues in the scalar space.  Depending on the context 
we might consider the value live on the edge, at the end of e->src or at 
the start of e->dest.
 
In the scalar space we commonly have multiple constant values and we try 
to account for them as best as we can as each distinct constant can 
result in a constant load.  We also try to find pseudos that happen to 
already have the value we want so that they participate in the 
coalescing process.  I doubt either of these cases are particularly 
important for vector though.
 
 
> 
> This patch fixes this missing count. Then after this patch we pick up perfect 
> LMUL (LMUL = M4)
> 
> foo:
> ble a0,zero,.L9
> lui a4,%hi(.LANCHOR0)
> addi a4,a4,%lo(.LANCHOR0)
> mv a2,a0
> vsetivli zero,16,e32,m4,ta,ma
> vid.v v20
> .L3:
> vsetvli a3,a2,e8,m1,ta,ma
> li a5,0
> vsetivli zero,16,e32,m4,ta,ma
> vmv4r.v v16,v20
> vmv.v.i v12,0
> vmv.v.x v4,a3
> vmv4r.v v8,v12
> vadd.vv v20,v20,v4
> .L4:
> addiw a5,a5,1
> vmv4r.v v4,v8
> vadd.vi v8,v8,1
> vadd.vv v4,v16,v4
> vadd.vv v12,v12,v4
> bne a0,a5,.L4
> slli a5,a3,2
> vsetvli zero,a3,e32,m4,ta,ma
> sub a2,a2,a3
> vse32.v v12,0(a4)
> add a4,a4,a5
> bne a2,zero,.L3
> .L9:
> li a0,0
> ret
> 
> Tested on --with-arch=gcv no regression. Ok for trunk ?
> 
> PR target/113112
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Refine dump 
> information.
> (preferred_new_lmul_p): Make PHI initial value into live regs calculation.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.
OK assuming you've done the necessary regression testing.
 
jeff
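
The effect of the missing PHI initial value described in this message can be illustrated with a toy model. This is a hedged sketch, not the code in riscv-vector-costs.cc: the live ranges, the assumption that every value occupies one register group of the chosen LMUL, and the helper names are all made up for the example, and the real analysis is more involved.

```python
def max_live(ranges):
    """Peak number of simultaneously live values.

    Each range is an inclusive (start, end) pair of program points.
    """
    events = []
    for start, end in ranges:
        events.append((start, 1))
        events.append((end + 1, -1))
    live = peak = 0
    # Sorting puts the -1 events before the +1 events at the same point,
    # so a value ending at point p does not clash with one starting at p + 1.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


def pick_lmul(ranges, num_vregs=32):
    """Largest LMUL whose register groups all fit in the vector register file."""
    peak = max_live(ranges)
    for lmul in (8, 4, 2, 1):
        if peak * lmul <= num_vregs:
            return lmul
    return 1


# Hypothetical live ranges for four vector values in the loop.
loop_values = [(0, 10), (0, 10), (2, 10), (3, 10)]
# The PHI initial value ({ 0, 0, ... }) also needs a register group.
phi_initial = [(0, 10)]

print(pick_lmul(loop_values))                # PHI initial value not counted
print(pick_lmul(loop_values + phi_initial))  # PHI initial value counted
```

With four live groups, LMUL = 8 appears to fit exactly into 32 registers; counting the fifth group for the PHI initial value pushes the model down to LMUL = 4, which mirrors the spill-free code shown above.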

Re: Re: [PATCH v1] RISC-V: XFail the signbit-5 run test for RVV

2023-12-21 Thread 钟居哲

Maybe use riscv_v ?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-12-22 03:16
To: pan2.li; gcc-patches
CC: juzhe.zhong; yanzhang.wang; kito.cheng; richard.guenther; tamar.christina
Subject: Re: [PATCH v1] RISC-V: XFail the signbit-5 run test for RVV
 
 
On 12/20/23 19:25, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to XFail the signbit-5 run test case for
> the RVV.  Given the case has one limitation like "This test does not
> work when the truth type does not match vector type." in the beginning
> of the test file.  Aka, the RVV vector truth type is not integer type.
> 
> The target board of riscv-sim like below will pick up `-march=rv64gcv`
> when building the run test elf. Thus, the RVV cannot bypass this test
> case like aarch64_sve with additional option `-march=armv8-a`.
> 
>riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow
> 
> For RVV, we leverage dg-xfail-run-if for this case like `amdgcn`.
But isn't that just going to turn this into an XPASS when vector is not 
enabled?
 
Looking at a recent rv64gc run of mine:
 
> PASS: gcc.dg/signbit-5.c (test for excess errors)
> PASS: gcc.dg/signbit-5.c execution test
 
 
Ideally we'd find a way to handle with and without vector.
 
jeff
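
Jeff's concern can be spelled out with a small model of the four DejaGnu outcomes. This is an illustrative sketch only: the function names are hypothetical, and the assumption that signbit-5 fails at run time with RVV and passes without it is taken from the messages in this thread.

```python
def dejagnu_outcome(test_passed, xfail_expected):
    """Map a test result and an xfail expectation to a DejaGnu status."""
    if xfail_expected:
        return "XPASS" if test_passed else "XFAIL"
    return "PASS" if test_passed else "FAIL"


def signbit5_outcome(vector_enabled, xfail_only_with_vector):
    """Outcome for signbit-5 under an unconditional or a vector-only xfail."""
    # Assumption from the thread: the run test fails with RVV, passes without.
    test_passed = not vector_enabled
    xfail_expected = vector_enabled if xfail_only_with_vector else True
    return dejagnu_outcome(test_passed, xfail_expected)


for vector in (False, True):
    for conditional in (False, True):
        print(vector, conditional, signbit5_outcome(vector, conditional))
```

An xfail that applies to every RISC-V configuration turns the rv64gc PASS into an XPASS, while an xfail keyed on vector being enabled (for example through an effective target such as riscv_v, as suggested above) keeps both configurations clean.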

Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-20 Thread 钟居哲

Btw, testing only rv32/rv64gc or rv32/rv64gcv is not enough.

We need full coverage testing, since we only commit a patch after it shows no 
regressions in full coverage testing

with the following configurations:

-march=rv[32/64]gc_zve32f_zvfh_zfh
-march=rv[32/64]gc_zve64d_zvfh_zfh
-march=rv[32/64]gcv_zvfh_zfh
-march=rv[32/64]gcv_zvl256b_zvfh_zfh
-march=rv[32/64]gcv_zvl512b_zvfh_zfh
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh

-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=m2
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=m4
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=m8
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=dynamic
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=m2
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=m4
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=m8
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=dynamic
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=m2
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=m4
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=m8
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=dynamic
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=m2
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=m4
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=m8
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=dynamic
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=m2
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=m4
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=m8
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=dynamic
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh --param=riscv-autovec-lmul=m2
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh --param=riscv-autovec-lmul=m4
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh --param=riscv-autovec-lmul=m8
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh --param=riscv-autovec-lmul=dynamic
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve32f_zvfh_zfh --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gc_zve64d_zvfh_zfh --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvfh_zfh --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl256b_zvfh_zfh --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl512b_zvfh_zfh --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh --param=riscv-autovec-lmul=m2 --param=riscv-autovec-preference=fixed-vlmax
-march=rv[32/64]gcv_zvl1024b_zvfh_zfh --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax
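
The configuration list above follows a regular pattern, so it can be generated rather than maintained by hand. The following Python sketch is only an illustration: the list in this archive is cut off after the zvl1024b m4 entry, and the sketch assumes the m8 and dynamic entries follow the same pattern as for the other architectures. The helper name is hypothetical.

```python
from itertools import product

ARCHES = [
    "gc_zve32f_zvfh_zfh",
    "gc_zve64d_zvfh_zfh",
    "gcv_zvfh_zfh",
    "gcv_zvl256b_zvfh_zfh",
    "gcv_zvl512b_zvfh_zfh",
    "gcv_zvl1024b_zvfh_zfh",
]
LMULS = [None, "m2", "m4", "m8", "dynamic"]   # None means the default LMUL
PREFERENCES = [None, "fixed-vlmax"]           # None means the default preference


def configurations():
    """Expand every arch/LMUL/preference combination for rv32 and rv64."""
    configs = []
    for pref, arch, lmul, xlen in product(PREFERENCES, ARCHES, LMULS, (32, 64)):
        flags = [f"-march=rv{xlen}{arch}"]
        if lmul:
            flags.append(f"--param=riscv-autovec-lmul={lmul}")
        if pref:
            flags.append(f"--param=riscv-autovec-preference={pref}")
        configs.append(" ".join(flags))
    return configs


configs = configurations()
print(len(configs))
print(configs[0])
```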

Re: [PATCH v3 0/6] RISC-V: Support XTheadVector extension

2023-12-20 Thread 钟居哲

Hi, Joshua.

Thanks for working hard on cleaning up the code and for the large amount of 
work on theadvector.

After fully reviewing this patch, I understand that you have 3 kinds of 
theadvector intrinsics relative to the current RVV1.0 GCC codebase:

1). instructions that can leverage all the current RVV1.0 intrinsic code by 
simply adding the "th." prefix directly.
2). instructions that leverage the current MD patterns but with some tweaks and 
copied patterns, since they are not simply prefixed with "th.".
3). new instructions that current RVV1.0 doesn't have, like the vlb instructions.

Overall, 1) and 3) look reasonable to me. But for 2) I need some time to figure 
out a better way to do it (the pattern copying in the current patch is not an 
approach I like).

So, I hope you can break this big patch into 3 separate patch series:

1. Support the theadvector instructions that can be taken directly from 
current RVV1.0 by simply adding the "th." prefix.
2. Support theadvector instructions that have totally different names but share 
the same patterns as RVV1.0 instructions.
3. Support new theadvector instructions like vlb, etc.

I think the separate patches for 1 and 3 can be merged quickly once I have 
reviewed them in more detail and approved them in the next version you send 
(e.g. V4).

For 2, it's a bit more complicated, but I think we can do what ARM and 
other targets do: use an ASM target hook to rewrite the whole instruction 
string.
For example, for strided load/store, you can identify these instructions from 
the attribute:
(set_attr "type" "vlds")






juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2023-12-20 20:20
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v3 0/6] RISC-V: Support XTheadVector extension
This patch series presents gcc implementation of the XTheadVector
extension [1].
 
[1] https://github.com/T-head-Semi/thead-extension-spec/
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in order not to
generate instructions that xtheadvector does not support,
causing 36 changes in vector.md.
 
For the th. prefix issue, we use current_output_insn and
the ASM_OUTPUT_OPCODE hook instead of directly modifying
patterns in vector.md.
 
We have run the GCC test suite and can confirm that there
are no regressions.
 
All the test results can be found in the following links,
Run without xtheadvector:
https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803686.html
 
Run with xtheadvector:
https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803687.html
 
Furthermore, we have run the tests in 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/main/examples, 
and all the tests passed.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
 
RISC-V: Refactor riscv-vector-builtins-bases.cc
RISC-V: Split csr_operand in predicates.md for vector patterns
RISC-V: Introduce XTheadVector as a subset of V1.0.0
RISC-V: Adds the prefix "th." for the instructions of XTheadVector
RISC-V: Handle differences between XTheadvector and Vector
RISC-V: Add support for xtheadvector-specific intrinsics
 
---
gcc/common/config/riscv/riscv-common.cc   |   23 +
gcc/config.gcc|4 +-
gcc/config/riscv/autovec.md   |2 +-
gcc/config/riscv/predicates.md|8 +-
gcc/config/riscv/riscv-c.cc   |8 +-
gcc/config/riscv/riscv-protos.h   |1 +
gcc/config/riscv/riscv-string.cc  |3 +
gcc/config/riscv/riscv-v.cc   |   13 +-
.../riscv/riscv-vector-builtins-bases.cc  |   18 +-
.../riscv/riscv-vector-builtins-bases.h   |   19 +
.../riscv/riscv-vector-builtins-shapes.cc |  149 +
.../riscv/riscv-vector-builtins-shapes.h  |3 +
.../riscv/riscv-vector-builtins-types.def |  120 +
gcc/config/riscv/riscv-vector-builtins.cc |  315 +-
gcc/config/riscv/riscv-vector-builtins.h  |5 +-
gcc/config/riscv/riscv-vector-switch.def  |  150 +-
gcc/config/riscv/riscv.cc |   46 +-
gcc/config/riscv/riscv.h  |4 +
gcc/config/riscv/riscv.opt|2 +
gcc/config/riscv/riscv_th_vector.h|   49 +
gcc/config/riscv/t-riscv  |   16 +
.../riscv/thead-vector-builtins-functions.def |  659 
gcc/config/riscv/thead-vector-builtins.cc |  887 ++
gcc/config/riscv/thead-vector-builtins.h  |  123 +
gcc/config/riscv/thead-vector.md  | 2827 +
gcc/config/riscv/vector-iterators.md  |  186 +-
gcc/config/riscv/vector.md|   44 +-
.../riscv/predef-__riscv_th_v_intrinsic.c |   11 +
.../gcc.target/riscv/rvv/base/abi-1.c |2 +-
.../gcc.target/riscv/rvv/base/pragma-1.c  |2 +-
.../gcc.target/riscv/rvv/xtheadvector.c   |   13 +
.../riscv/rvv/xtheadvector/prefix.c   |   12 +

Re: Re: [PATCH v3 4/6] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2023-12-20 Thread 钟居哲

>> So rather than looking at the mode, would it make more sense to have an
>> attribute (or re-use an existing attribute) to identify which opcodes
>> are going to need prefixing?  We've got access to the INSN via
>> current_output_insn.  So we can lookup attributes trivially.

Yes, I totally agree with Jeff's idea. We have added many attributes to each 
RVV instruction.
For example, the VSETVL pass depends heavily on those attributes to do its 
optimizations.

Btw, I have reviewed the full patch and I am going to give more comprehensive 
comments on the cover letter.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-12-21 02:22
To: Jun Sha (Joshua); gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
juzhe.zhong; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v3 4/6] RISC-V: Adds the prefix "th." for the instructions 
of XTheadVector.
 
 
On 12/20/23 05:32, Jun Sha (Joshua) wrote:
> This patch adds th. prefix to all XTheadVector instructions by
> implementing new assembly output functions.
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-protos.h
> (riscv_asm_output_opcode): New function.
> * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> 
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
> ---
>   gcc/config/riscv/riscv-protos.h   |  1 +
>   gcc/config/riscv/riscv.cc | 26 +++
>   gcc/config/riscv/riscv.h  |  4 +++
>   .../riscv/rvv/xtheadvector/prefix.c   | 12 +
>   4 files changed, 43 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> 
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index eaee53ce94e..f0eee71a18a 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -101,6 +101,7 @@ struct riscv_address_info {
>   };
>   
>   /* Routines implemented in riscv.cc.  */
> +extern void riscv_asm_output_opcode(FILE *asm_out_file, const char *p);
>   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
>   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
>   extern int riscv_float_const_rtx_index_for_fli (rtx);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 8ae65760b6e..d3010bed8d8 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5595,6 +5595,32 @@ riscv_get_v_regno_alignment (machine_mode mode)
> return lmul;
>   }
>   
> +void
> +riscv_asm_output_opcode(FILE *asm_out_file, const char *p)
Needs a function comment.  There's several examples in this file you can 
use to see the style we commonly use.  And a minor formatting nit, 
always put a space between a function name and an open paren.
 
 
> +{
> +  if (!TARGET_XTHEADVECTOR)
> +return;
> +
> +  if (current_output_insn == NULL_RTX)
> +return;
> +
> +  /* We need to handle the 'vset' special case here since it cannot
> + be controlled by vector mode. */
> +  if (!strncmp (p, "vset", 4))
> +{
> +  fputs ("th.", asm_out_file);
> +  return;
> +}
> +
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (current_output_insn), ALL)
> +if (*iter && riscv_v_ext_mode_p (GET_MODE (*iter)) && p[0] == 'v')
> +  {
> + fputs ("th.", asm_out_file);
> + return;
> +  }
> +}
So rather than looking at the mode, would it make more sense to have an 
attribute (or re-use an existing attribute) to identify which opcodes 
are going to need prefixing?  We've got access to the INSN via 
current_output_insn.  So we can lookup attributes trivially.
 
This is a question, not a demand -- I'm looking for a solution that's 
going to be reliable with minimal effort going forward.
 
jeff
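
The difference between the mode-based check in the patch and the attribute-based lookup suggested above can be modelled in a few lines. This is a hedged sketch and not GCC code: the set of "type" attribute values is a hypothetical subset chosen for illustration (only "vlds" appears in this thread), and the function name is made up.

```python
# Hypothetical subset of "type" attribute values that mark vector instructions.
PREFIXED_TYPES = {"vlds", "vsetvl", "vialu"}


def output_opcode(insn_type, opcode, xtheadvector=True):
    """Return the opcode text, prefixed with "th." when the attribute says so."""
    if xtheadvector and insn_type in PREFIXED_TYPES:
        return "th." + opcode
    return opcode


print(output_opcode("vlds", "vlse32.v"))     # vector strided load
print(output_opcode("vsetvl", "vsetvli"))    # no special case needed for vset*
print(output_opcode("arith", "add"))         # scalar instruction, untouched
```

The point of the design is that the decision depends only on data already attached to each pattern, so the 'vset' special case and the scan over sub-rtx modes are no longer needed in this model.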

Re: Re: [PATCH] RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR

2023-12-20 Thread 钟居哲

I thought the commit log was quite clear, but I don't mind giving you more 
details behind this patch.

First, this patch is not an optimization patch; as the commit log says, it is a 
fix for fatal ICEs and run-time bugs.

Before this patch, we enabled more aggressive VLS modes whose size is larger 
than MAX_LMUL * MIN_VLEN, which
caused many ICEs and run-time FAILs because the middle-end doesn't allow them.
This has been a long-standing issue on my TODO list; I was hoping to fix it in 
another way, but I failed to do that.

So I disabled those aggressive VLS modes, which fixed over 1K ICEs and run-time 
FAILs in full coverage testing
as well as the PR. I think you should read the PR discussion; then you will 
know the full picture, for example:

Robin's comment #8 in this PR:

112853 – RISC-V: RVV: SPEC2017 525.x264 regression (gnu.org)

Robin Dapp 2023-12-06 10:19:06 UTC
With Juzhe's latest fix that disables VLS modes >= 128 bit for zvl128b x264 
runs without issues here and some of the additional execution failures are gone.
Will post the current comparison later.

I have said clearly that dump FAILs have much lower priority than those lethal 
ICEs and run-time execution FAILs.

We have made a quick analysis of those dump FAILs: some of them just need test 
adaptation, and some of them may need
new auto-vectorization patterns to recover the codegen, since this patch 
disallows aggressive VLS modes, which hurts
performance in very few situations.

So we made the decision here: ignore those very few bogus dump FAILs introduced 
by this patch, and first fix all the lethal ICEs and run-time FAILs caused by 
aggressive VLS modes,
so that testing can continue to expose other lethal bugs.  Then, after full 
coverage testing is stable, I will come back and revisit them.

Btw, almost all of the dump FAILs introduced by this patch are fixed, except 
this one: slp-reduc-sad-2.c 

And full coverage testing is stable now (no lethal ICEs or run-time FAILs except:

FAIL: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/signbit-5.c execution test )

These 2 FAILs may trigger undefined behavior, and such behavior differs 
between RISC-V and other targets. (Li Pan is working on it.)

So I will come back and revisit slp-reduc-sad-2.c today (unless some other 
lethal ICEs or run-time execution FAILs are raised in bugzilla; to me, the 
highest priority is always fixing ICEs and run FAILs),
which may need vec_unpack/vec_widen_xxx patterns.
I am still investigating whether we can fix this dump FAIL without adding new 
patterns.

Thanks.

juzhe.zh...@rivai.ai

From: Palmer Dabbelt
Date: 2023-12-21 01:44
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; Robin Dapp; juzhe.zhong
Subject: Re: [PATCH] RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and 
BITS_PER_RISCV_VECTOR
On Tue, 05 Dec 2023 04:57:27 PST (-0800), juzhe.zh...@rivai.ai wrote:
> This patch fixes the ICEs mentioned in PR112851 and PR112852.
> Actually, these ICEs happen many times in full coverage testing.
>
> The ICE happens on:
>
> bug.c:84:1: internal compiler error: in partial_subreg_p, at rtl.h:3187
>84 | }
>   | ^
> 0x11a7271 partial_subreg_p(machine_mode, machine_mode)
> ../../../../gcc/gcc/rtl.h:3187
>
> gcc_checking_assert (ordered_p (outer_prec, inner_prec));
>
> outer_prec is the PRECISION of RVVM1SImode
> inner_prec is the PRECISION of V64SImode
>
> when it is zvl512b.
>
> outer_prec is VLA mode with size (512, 512)
> inner_prec is VLS mode with size (2048, 0)
>
> Their precision/size relationship is not certain.
> So block VLS modes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR, 
> so that we never reach
> the situation of comparing the precision/size of a VLA mode with a VLS mode 
> whose size > coeffs[0] of the VLA mode.
>
> Note this patch causes the following regressions:
>
> FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize  
> scan-assembler-not vset
> FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize  
> scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2
>
> FAIL: gcc.target/riscv/rvv/base/cpymem-1.c check-function-bodies f3
> FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f2
> FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f3
>
> 1. The cpymem check FAILs should be fixed in the testcase, since the test is 
> fragile and should be made more robust.
>
> 2. pr111751.c is a vector cost model issue, and I will fix it in a following 
> patch.
>
> For now, we should land this patch first (highest priority) since it 
> fixes ICEs.
>
> PR target/112851
> PR target/112852
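
The failing ordered_p assertion quoted above can be illustrated with a two-coefficient model of the sizes involved. This is a simplified Python sketch, assuming a size (c0, c1) stands for c0 + c1 * x with an unknown runtime x >= 0; the function names mirror the GCC helpers but the code is not taken from GCC.

```python
def value(size, x):
    """Concrete size for a given runtime multiple x >= 0."""
    return size[0] + size[1] * x


def known_le(a, b):
    """True if a <= b for every x >= 0 (coefficient-wise comparison)."""
    return a[0] <= b[0] and a[1] <= b[1]


def ordered_p(a, b):
    """True if the two sizes compare the same way for every x >= 0."""
    return known_le(a, b) or known_le(b, a)


vla_rvvm1si = (512, 512)   # VLA mode size on zvl512b, from the message above
vls_v64si = (2048, 0)      # VLS mode size, from the message above

print(ordered_p(vla_rvvm1si, vls_v64si))
print(value(vla_rvvm1si, 0), value(vls_v64si, 0))   # VLA smaller at x = 0
print(value(vla_rvvm1si, 4), value(vls_v64si, 4))   # VLA larger at x = 4
```

A VLS mode no larger than coeffs[0] of the VLA mode, for example (512, 0), is always ordered against (512, 512), which is why blocking the larger VLS modes avoids the assertion.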

I know I'm pretty late here, but this has happened a bunch of times 
before and I keep getting stuck on other stuff and thus don't get the 
time to say anything.  So I figured I'd say something anyway:

Please stop committing code that introduces new test failures, even if 
you don't think those failures are important.  We've got a lot of people 
trying to push through the test failures with the hope of getting larger 
code bases

Re: Re: [PATCH] RISC-V: Fix calculation of max live vregs

2023-12-20 Thread 钟居哲

OK. Thanks, Jeff, for reminding me.
I will be more careful next time.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-12-20 23:28
To: juzhe.zh...@rivai.ai; demin.han; gcc-patches
CC: pan2.li
Subject: Re: [PATCH] RISC-V: Fix calculation of max live vregs

On 12/20/23 04:17, juzhe.zh...@rivai.ai wrote:
> I see. LGTM. Thanks for explanation.
> 
> I will ask Li Pan commit it for you.
The patch from Demin didn't specify if it had been regression tested.

All patches must be regression tested and an indication that the test 
passed and on what target must be included in the patch email thread.

Please don't ACK patches that haven't followed this policy. It's OK with 
conditions like "OK after verifying this patch doesn't cause regressions 
in the testsuite on rv64gc" or something similar.

jeff

Re: RE: [PATCH] Regression FIX: Remove vect_variable_length XFAIL from some tests

2023-12-19 Thread 钟居哲

Do you mean that, for ARM SVE, these tests need to be told to use SVE only?

Actually, RVV is in the same situation as ARM. We are using VLS modes 
(fixed-length vectors) to vectorize these cases, which is why they XPASS.

The difference between RVV and ARM is that variable-length and fixed-length 
vectors are both valid on RVV, using the same RVV ISA,
whereas on ARM, variable-length vectors use the SVE ISA but fixed-length vectors 
use the NEON ISA.




juzhe.zh...@rivai.ai
 
From: Tamar Christina
Date: 2023-12-19 20:29
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de
Subject: RE: [PATCH] Regression FIX: Remove vect_variable_length XFAIL from 
some tests
Hi Juzhe,
 
> -Original Message-
> From: Juzhe-Zhong 
> Sent: Tuesday, December 19, 2023 11:19 AM
> To: gcc-patches@gcc.gnu.org
> Cc: rguent...@suse.de; Tamar Christina ; Juzhe-
> Zhong 
> Subject: [PATCH] Regression FIX: Remove vect_variable_length XFAIL from some
> tests
> 
> Hi, this patch fixes these following regression FAILs on RVV:
> 
> XPASS: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
> XPASS: gcc.dg/vect/bb-slp-43.c -flto -ffat-lto-objects  scan-tree-dump-not 
> slp2
> "vector operands from scalars"
> XPASS: gcc.dg/vect/bb-slp-43.c scan-tree-dump-not slp2 "vector operands from
> scalars"
> XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects  
> scan-tree-dump-
> times slp2 "optimized: basic block" 2
> XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized:
> basic block" 2
> 
> Since vect_variable_length are available for ARM SVE and RVV, I just use 
> compiler
> explorer to confirm ARM SVE same as
> RVV.
> 
> Hi, @Tamar. Could you double check whether this patch fix is reasonable to 
> you ?
> 
 
Hmm I would be surprised if this is working correctly for RVV since as far as I 
know we don't have
variable length support in SLP i.e. SLP can't predicate operation during build 
so the
current vectorizer only supports fixed length vector SLP, unless Richi did some 
magic?
 
For SVE the reason this XPASS is because the compiler will fallback to NEON 
unless it's
told it can't.  But that's not actually testing VLA SLP.
 
i.e. https://godbolt.org/z/5n5fWahxh  just using `+sve` isn't enough and it has 
to be told
it can only use SVE.  Is it perhaps something similar for RVV?
 
If RVV has a similar param, perhaps the correct fix is to append it to the 
tests so they
XFAIL correctly?
 
Regards,
Tamar
 
> And.
> 
> Hi, @Richard. Is this patch Ok for trunk if this patch fixes regression for 
> both RVV
> and ARM SVE.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/tree-ssa/pr84512.c: Remove vect_variable_length XFAIL.
> * gcc.dg/vect/bb-slp-43.c: Ditto.
> * gcc.dg/vect/bb-slp-subgroups-3.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr84512.c| 2 +-
>  gcc/testsuite/gcc.dg/vect/bb-slp-43.c  | 2 +-
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c 
> b/gcc/testsuite/gcc.dg/tree-
> ssa/pr84512.c
> index 496c78b28dc..3c027012670 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> @@ -13,4 +13,4 @@ int foo()
>  }
> 
>  /* Listed targets xfailed due to PR84958.  */
> -/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { 
> amdgcn*-*-* ||
> vect_variable_length } } } } */
> +/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { 
> amdgcn*-*-* } } }
> } */
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-43.c 
> b/gcc/testsuite/gcc.dg/vect/bb-
> slp-43.c
> index dad2d24262d..40bd2e0dfbf 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-43.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-43.c
> @@ -14,4 +14,4 @@ f (int *restrict x, short *restrict y)
>  }
> 
>  /* { dg-final { scan-tree-dump-not "mixed mask and nonmask" "slp2" } } */
> -/* { dg-final { scan-tree-dump-not "vector operands from scalars" "slp2" { 
> target {
> { vect_int && vect_bool_cmp } && { vect_unpack && vect_hw_misalign } } xfail {
> vect_variable_length && { ! vect256 } } } } } */
> +/* { dg-final { scan-tree-dump-not "vector operands from scalars" "slp2" { 
> target {
> { vect_int && vect_bool_cmp } && { vect_unpack && vect_hw_misalign } } } } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> index fb719915db7..3f0d45ce4a1 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> @@ -42,7 +42,7 @@ main (int argc, char **argv)
>  /* Because we disable the cost model, targets with variable-length
> vectors can end up vectorizing the store to a[0..7] on its own.
> With the cost model we do something sensible.  */
> -/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { 
> target { !
> amdgcn-*-* } xfail vect_variable_length } } } */
> +/* { dg-final { scan-tree-dump-times

Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].

2023-12-18 Thread 钟居哲

Thanks, Robin, for sending the initial patch to fix this ICE bug.

CC to Richard S, Richard B, and Andrew.

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-19 03:50
To: gcc-patches
CC: rdapp.gcc; Li, Pan2; juzhe.zh...@rivai.ai
Subject: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors 
[PR112971].
Hi,
 
found in PR112971, this patch adds folding support for bitwise operations
of const duplicate zero vectors and stepped vectors.
On riscv we have the situation that a folding would perpetually continue
without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
not fold to {0, 0, 0, ...}.
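
The fold described above can be checked on concrete values. The following Python sketch is only an illustration: it encodes a stepped vector as a hypothetical (base, step) pair, which is much simpler than GCC's actual VECTOR_CST encoding, and the function names are made up.

```python
import operator

OPS = {"and": operator.and_, "ior": operator.or_, "xor": operator.xor}


def expand(encoding, nelts):
    """Materialize {base, base + step, base + 2 * step, ...}."""
    base, step = encoding
    return [base + i * step for i in range(nelts)]


def fold_with_zero(code, stepped):
    """Fold `stepped OP {0, 0, 0, ...}` directly on the (base, step) encoding."""
    if code == "and":
        return (0, 0)    # x & 0 == 0 for every element
    return stepped       # x | 0 == x and x ^ 0 == x


stepped = (7, -1)        # {7, 6, 5, ...} as in the example above
for code, op in OPS.items():
    elementwise = [op(x, 0) for x in expand(stepped, 8)]
    folded = expand(fold_with_zero(code, stepped), 8)
    print(code, elementwise == folded)
```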
 
Bootstrapped and regtested on x86 and aarch64, regtested on riscv.
 
I won't be available to respond quickly until next year.  Pan or Juzhe,
as discussed, feel free to continue with possible revisions.
 
Regards
Robin
 
 
gcc/ChangeLog:
 
PR middle-end/112971
 
* fold-const.cc (const_binop): Handle
zerop@1  AND/IOR/XOR  VECT_CST_STEPPED_P@2
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr112971.c: New test.
---
gcc/fold-const.cc  | 14 +-
.../gcc.target/riscv/rvv/autovec/pr112971.c| 18 ++
2 files changed, 31 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
 
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index f5d68ac323a..43ed097bf5c 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1653,8 +1653,20 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
 {
   tree type = TREE_TYPE (arg1);
   bool step_ok_p;
+
+  /* AND, IOR as well as XOR with a zerop can be handled directly.  */
   if (VECTOR_CST_STEPPED_P (arg1)
-   && VECTOR_CST_STEPPED_P (arg2))
+   && VECTOR_CST_DUPLICATE_P (arg2)
+   && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
+ step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
+   || code == BIT_XOR_EXPR;
+  else if (VECTOR_CST_STEPPED_P (arg2)
+&& VECTOR_CST_DUPLICATE_P (arg1)
+&& integer_zerop (VECTOR_CST_ELT (arg1, 0)))
+ step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
+   || code == BIT_XOR_EXPR;
+  else if (VECTOR_CST_STEPPED_P (arg1)
+&& VECTOR_CST_STEPPED_P (arg2))
/* We can operate directly on the encoding if:
  a3 - a2 == a2 - a1 && b3 - b2 == b2 - b1
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
new file mode 100644
index 000..816ebd3c493
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
@@ -0,0 +1,18 @@
+/* { dg-do compile }  */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" 
}  */
+
+int a;
+short b[9];
+char c, d;
+void e() {
+  d = 0;
+  for (;; d++) {
+if (b[d])
+  break;
+a = 8;
+for (; a >= 0; a--) {
+  char *f = 
+  *f &= d == (a & d);
+}
+  }
+}
-- 
2.43.0

Re: [PATCH] RISC-V: Add -fno-vect-cost-model to pr112773 testcase

2023-12-14 Thread 钟居哲

LGTM

juzhe.zh...@rivai.ai

From: Patrick O'Neill
Date: 2023-12-15 05:32
To: gcc-patches
CC: rdapp.gcc; juzhe.zhong; Patrick O'Neill
Subject: [PATCH] RISC-V: Add -fno-vect-cost-model to pr112773 testcase
The testcase for pr112773 started passing after r14-6472-g8501edba91e
which was before the actual fix. This patch adds -fno-vect-cost-model
which prevents the testcase from passing due to the vls change.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/pr112773.c: Add
-fno-vect-cost-model.

Signed-off-by: Patrick O'Neill 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
index 5f7374b0040..57104c9ebec 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3" } */
+/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3 -fno-vect-cost-model" } */
long long a;
int b, c;
-- 
2.42.0

Re: Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-14 Thread 钟居哲

Thanks a lot for the clarification.

I sent a patch to remove the memory address cost for decrement IV/SELECT_VL:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640595.html

I have tested various cases, and they all get better codegen on RVV.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-12-14 20:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; jeffreyalaw
Subject: Re: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
vectorization COST model
On Thu, 14 Dec 2023, juzhe.zh...@rivai.ai wrote:
 
> Thanks Richard.
> 
> Let me clarify again to make sure I understand your comments correctly:
> 
> Do you suggest not modeling address cost here, as with the other partial 
> vectorization styles (while_ult, avx512, etc.), and then setting COST = 1, since 
> we only have SELECT_VL from the beginning? In the various cases we saw, COST = 1 
> is better than COST = 2.
 
I suggest to not model with address cost in mind since nothing else in
the vectorizer does that and thus you're "comparing" apples with
oranges then.
 
If address cost is important to decide between SELECT_VL and not
SELECT_VL then we'd need to start modeling address cost _at all_.
 
Richard.
 
> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-12-14 18:46
> To: juzhe.zhong
> CC: gcc-patches; richard.sandiford; jeffreyalaw
> Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial 
> vectorization COST model
> 
> 
> Am 14.12.2023 um 09:28 schrieb juzhe.zh...@rivai.ai:
> 
>  
> Hi, Richard.
> 
> I have a question about the decrement IV costing since I find the reduction 
> case is generating inferior codegen.
> 
> reduc_plus_int:
> mv a3,a0
> ble a1,zero,.L7
> addiw a5,a1,-1
> li a4,2
> bleu a5,a4,.L8
> vsetivli zero,4,e32,m1,ta,ma
> srliw a4,a1,2
> vmv.v.i v1,0
> slli a4,a4,4
> add a4,a4,a0
> mv a5,a0
> .L4:
> vle32.v v2,0(a5)
> addi a5,a5,16
> vadd.vv v1,v1,v2
> bne a5,a4,.L4
> li a5,0
> vmv.s.x v2,a5
> andi a5,a1,-4
> vredsum.vs v1,v1,v2
> vmv.x.s a0,v1
> beq a1,a5,.L12
> .L3:
> subw a1,a1,a5
> slli a5,a5,32
> srli a5,a5,32
> slli a1,a1,32
> vsetvli a4,zero,e32,m1,ta,ma
> slli a5,a5,2
> srli a1,a1,32
> vmv.v.i v1,0
> add a3,a3,a5
> vsetvli a1,a1,e8,mf4,ta,ma
> vle32.v v3,0(a3)
> li a5,0
> vsetivli zero,1,e32,m1,ta,ma
> vmv.s.x v2,a5
> vsetvli zero,a1,e32,m1,tu,ma
> vmv.v.v v1,v3
> vsetvli a4,zero,e32,m1,ta,ma
> vredsum.vs v1,v1,v2
> vmv.x.s a5,v1
> addw a0,a0,a5
> ret
> .L12:
> ret
> .L7:
> li a0,0
> ret
> .L8:
> li a5,0
> li a0,0
> j .L3
> 
> This patch adjusts length_update_cost from 3 (the original cost) to 2, which 
> fixes the conversion case (the test appended in this patch)
> but does not fix the reduction case.
> 
> Then I adjusted it to COST = 1:
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 19e38b8637b..50c99b1fe79 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4877,7 +4877,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>  processed in current iteration, and a SHIFT operation to
>  compute the next memory address instead of adding 
> vectorization
>  factor.  */
> - length_update_cost = 2;
> + length_update_cost = 1;
> else
>   /* For increment IV stype, Each may need two MINs and one MINUS 
> to
>  update lengths in body for next iteration.  */
> 
> Then the codegen is reasonable now:
> 
> reduc_plus_int:
> ble a1,zero,.L4
> vsetvli a5,zero,e32,m1,ta,ma
> vmv.v.i v1,0
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma
> vle32.v v2,0(a0)
> slli a4,a5,2
> sub a1,a1,a5
> add a0,a0,a4
> vadd.vv v1,v2,v1
> bne a1,zero,.L3
> li a5,0
> vsetivli zero,1,e32,m1,ta,ma
> vmv.s.x v2,a5
> vsetvli a5,zero,e32,m1,ta,ma
> vredsum.vs v1,v1,v2
> vmv.x.s a0,v1
> ret
> .L4:
> li a0,0
> ret
> 
> The reason I set COST = 2 instead of 1 in this patch is as follows.
> 
> One COST is for SELECT_VL.
> 
> The other is for the memory address calculation, since we don't update the 
> memory address by adding VF directly.
> Instead:
> 
> We shift the result of SELECT_VL and then add it to the memory IV as 
> follows:
> 
> SSA_1 = SELECT_VL --> SSA_1 is element-wise
> SSA_2 = SSA_1 << 1 (if the element is INT16, this makes it byte-wise)
> next iteration memory address = current iteration memory address + SSA_2.
> 
> If the element is INT8, then the shift operation (SSA_2 = SSA_1 << 1) is not 
> needed, since SSA_1 is already byte-wise.
> 
> So my question is whether the COST should be 1 or 2.
> It seems that COST = 1 is better for
> using SELECT_VL.
> 
> We are not modeling address cost, so trying to account for this here is only 
> going to be a heuristic that's wrong as many times as it is correct.  If 
> the difference between SELECT_VL and not is so small then you'll have a hard 
> time modeling this here.
> 
>  
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-12-13 18:17
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford; jeffreyalaw
> Subject: Re: [PATCH]
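
The address update discussed in this thread (SELECT_VL result, shifted to bytes, added to the memory IV) can be modeled in scalar C. This is only a sketch under assumptions: `select_vl_sketch` is a hypothetical stand-in that returns min (remaining, vf), whereas real hardware may choose other legal values, and the element type is fixed to 32-bit integers, so the shift is by 2 rather than the 1 used for INT16 in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for SELECT_VL: number of elements to process this iteration.
   Assumption: min (remaining, vf).  */
static size_t
select_vl_sketch (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Sum N int32 elements the way a decrement-IV loop walks memory.  */
int32_t
reduc_plus_int_sketch (const int32_t *a, size_t n, size_t vf)
{
  assert (vf > 0);
  const unsigned char *addr = (const unsigned char *) a;
  int32_t sum = 0;

  while (n != 0)
    {
      size_t vl = select_vl_sketch (n, vf);  /* SSA_1: element-wise.  */
      size_t bytes = vl << 2;                /* SSA_2: byte-wise for int32.  */

      const int32_t *p = (const int32_t *) addr;
      for (size_t i = 0; i < vl; i++)
        sum += p[i];

      addr += bytes;  /* Next address = current address + SSA_2.  */
      n -= vl;        /* Decrement IV.  */
    }
  return sum;
}
```

The loop body contains one SELECT_VL and one shift plus add for the address, which are the two operations the COST = 2 proposal counted.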

Re: Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-13 Thread 钟居哲

Thanks Richard.

LGTM for RISC-V part.

Thanks Robin for fixing it.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-12-13 22:05
To: Robin Dapp
CC: Richard Biener; gcc-patches; juzhe.zhong\@rivai.ai
Subject: Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].
Robin Dapp  writes:
> @@ -1758,16 +1759,19 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
> bitsize, poly_uint64 bitnum,
>if (VECTOR_MODE_P (outermode) && !MEM_P (op0))
>  {
>scalar_mode innermode = GET_MODE_INNER (outermode);
>enum insn_code icode
>  = convert_optab_handler (vec_extract_optab, outermode, innermode);
>poly_uint64 pos;
> -  if (icode != CODE_FOR_nothing
> -   && known_eq (bitsize, GET_MODE_BITSIZE (innermode))
> -   && multiple_p (bitnum, GET_MODE_BITSIZE (innermode), &pos))
> +  if ((icode != CODE_FOR_nothing
> +&& known_eq (bitsize, GET_MODE_PRECISION (innermode))
> +&& multiple_p (bitnum, GET_MODE_PRECISION (innermode), &pos)))
 
This adds an extra, unnecessary layer of bracketing.  OK for the
target-independent parts without that.
 
Thanks,
Richard
 
>  {
>class expand_operand ops[3];
>  
> -   create_output_operand (&ops[0], target, innermode);
> +   create_output_operand (&ops[0], target,
> + insn_data[icode].operand[0].mode);
>ops[0].target = 1;
>create_input_operand (&ops[1], op0, outermode);
>create_integer_operand (&ops[2], pos);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
> new file mode 100644
> index 000..5f7374b0040
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3" } */
> +
> +long long a;
> +int b, c;
> +int *d;
> +void e(unsigned f) {
> +  for (;; ++c)
> +if (f) {
> +  a = 0;
> +  for (; a <= 3; a++) {
> +f = 0;
> +for (; f <= 0; f++)
> +  if ((long)a)
> +break;
> +  }
> +  if (b)
> +*d = f;
> +}
> +}

Re: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread 钟居哲

I think we should leave it to the user's choice:

--param=riscv-autovec-lmul=m1/m2/m4/m8/dynamic.

So using TARGET_MAX_LMUL would be more reasonable.



juzhe.zh...@rivai.ai
 
From: Sergei Lewis
Date: 2023-12-11 22:58
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Robin Dapp; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension
The thinking here is that using the largest possible LMUL when we know the 
operation will fit in fewer registers potentially leaves performance on the 
table - indirectly, due to the unnecessarily increased register pressure, and 
also directly, depending on the implementation.

On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai  
wrote:
Hi, Thanks for contributing this.

+/* Select appropriate LMUL for a single vector operation based on
+   byte size of data to be processed.
+   On success, return true and populate lmul_out.
+   If length_in is too wide for a single vector operation, return false
+   and leave lmul_out unchanged.  */
+
+static bool
+select_appropriate_lmul (HOST_WIDE_INT length_in,
+HOST_WIDE_INT &lmul_out)
+{
I don't think we need this; you only need to use TARGET_MAX_LMUL.




juzhe.zh...@rivai.ai
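
To make the two positions in this exchange concrete, here is a rough sketch of a "smallest LMUL that fits" selection with a cap. It is not the code from the patch: `select_lmul_sketch`, `vlen_bytes` (the assumed minimum vector register size in bytes) and `max_lmul` are hypothetical names. Passing the user's maximum LMUL as `max_lmul` bounds the result, while the loop still prefers a smaller LMUL when the data fits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: pick the smallest power-of-two LMUL (1, 2, 4, 8)
   not exceeding MAX_LMUL such that LENGTH bytes fit in LMUL registers of
   VLEN_BYTES each.  On success store it in *LMUL_OUT and return true;
   otherwise return false and leave *LMUL_OUT unchanged.  */
bool
select_lmul_sketch (long length, long vlen_bytes, int max_lmul, int *lmul_out)
{
  assert (vlen_bytes > 0 && max_lmul >= 1);

  for (int lmul = 1; lmul <= max_lmul; lmul *= 2)
    if (length <= (long) lmul * vlen_bytes)
      {
        *lmul_out = lmul;
        return true;
      }
  return false;
}
```

With 16-byte registers and a cap of 8, a 17-byte operation gets LMUL 2 rather than 8, which is the register-pressure argument made in the reply.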

Re: Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-09 Thread 钟居哲

I didn't use any special configuration:

--with-arch=rv64gcv_zvl256b --with-abi=lp64d --test --jobs=64 --with-sim=qemu 
--enable-gcc-checking=yes,assert,extra,rtlflag,rtl,gimple



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-09 22:07
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add vectorized strcmp.
> rv64gcv
 
With -minline-strcmp I assume?
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-09 Thread 钟居哲

rv64gcv



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-09 21:51
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add vectorized strcmp.
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
 
Thanks, which config?  For me everything under builtin passes on rv64gcv
and rv32gcv:
 
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
 
Regards
Robin

Re: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.

2023-12-09 Thread 钟居哲

It's more reasonable to fix this in vec_perm_const than in the 
middle-end.

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-09 21:18
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.
Hi,
 
we currently try to recognize various forms of stepped (const_vector)
sequence variants in expand_const_vector.  Because of complications with
canonicalization and encoding it is easier to identify such patterns
in expand_vec_perm_const_1 already where perm.series_p () is available.
 
This patch introduces shuffle_series as new permutation pattern and
tries to recognize series like [base0 base1 base1 + step ...].  If such
a series is found the series is expanded by expand_vec_series and a
gather is emitted.
 
On top of that, the patch fixes the step recognition in expand_const_vector,
where such stepped series ended up before.
 
This fixes several execution failures when running code compiled for a
scalable vector size of 128 on a target with vlen = 256 or higher.
The problem was only noticed there because the encoding for a reversed
[2 2]-element vector ("3 2 1 0") is { [1 2], [0 2], [1 4] }.
Some testcases that failed were:
 
vect-alias-check-18.c
vect-alias-check-1.F90
pr64365.c
 
On a 128-bit target, only the first two elements are used.  The
third element causing the complications only comes into effect at
vlen = 256.
 
With this patch the testsuite results are similar with vlen = 128 and
vlen = 256 (when built with -march=rv64gcv_zvl128b).
 
Regards
Robin
 
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_const_vector):  Fix step
calculation.
(modulo_sel_indices): Also perform modulo for variable-length
constants.
(shuffle_series): Recognize series permutations.
(expand_vec_perm_const_1): Add shuffle_series.
---
gcc/config/riscv/riscv-v.cc | 66 +++--
1 file changed, 64 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9b99d0aca84..fd6ef0660a2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1378,12 +1378,15 @@ expand_const_vector (rtx target, rtx src)
 { base0, base1, base1 + step, base1 + step * 2, ... }  */
  rtx base0 = builder.elt (0);
  rtx base1 = builder.elt (1);
-   rtx step = builder.elt (2);
+   rtx base2 = builder.elt (2);
+
+   scalar_mode elem_mode = GET_MODE_INNER (mode);
+   rtx step = simplify_binary_operation (MINUS, elem_mode, base2, base1);
+
  /* Step 1 - { base1, base1 + step, base1 + step * 2, ... }  */
  rtx tmp = gen_reg_rtx (mode);
  expand_vec_series (tmp, base1, step);
  /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... }  */
-   scalar_mode elem_mode = GET_MODE_INNER (mode);
  if (!rtx_equal_p (base0, const0_rtx))
base0 = force_reg (elem_mode, base0);
@@ -3395,6 +3398,63 @@ shuffle_extract_and_slide1up_patterns (struct 
expand_vec_perm_d *d)
   return true;
}
+static bool
+shuffle_series (struct expand_vec_perm_d *d)
+{
+  if (!d->one_vector_p || d->perm.encoding ().npatterns () != 1)
+return false;
+
+  poly_int64 el1 = d->perm[0];
+  poly_int64 el2 = d->perm[1];
+  poly_int64 el3 = d->perm[2];
+
+  poly_int64 step1 = el2 - el1;
+  poly_int64 step2 = el3 - el2;
+
+  bool need_insert = false;
+  bool have_series = false;
+
+  /* Check for a full series.  */
+  if (known_ne (step1, 0) && d->perm.series_p (0, 1, el1, step1))
+have_series = true;
+
+  /* Check for a series starting at the second element.  */
+  else if (known_ne (step2, 0) && d->perm.series_p (1, 1, el2, step2))
+{
+  have_series = true;
+  need_insert = true;
+}
+
+  if (!have_series)
+return false;
+
+  /* Get a vector int-mode to be used for the permute selector.  */
+  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
+  insn_code icode = optab_handler (vec_shl_insert_optab, sel_mode);
+
+  /* We need to be able to insert an element and shift the vector.  */
+  if (need_insert && icode == CODE_FOR_nothing)
+return false;
+
+  /* Success! */
+  if (d->testing_p)
+return true;
+
+  /* Create the series.  */
+  machine_mode eltmode = Pmode;
+  rtx series = gen_reg_rtx (sel_mode);
+  expand_vec_series (series, gen_int_mode (need_insert ? el2 : el1, eltmode),
+  gen_int_mode (need_insert ? step2 : step1, eltmode));
+
+  /* Insert the remaining element if necessary.  */
+  if (need_insert)
+emit_insn (GEN_FCN (icode) (series, series, gen_int_mode (el1, eltmode)));
+
+  emit_vlmax_gather_insn (d->target, d->op0, series);
+
+  return true;
+}
+
/* Recognize the pattern that can be shuffled by generic approach.  */
static bool
@@ -3475,6 +3535,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
return true;
  if (shuffle_extract_and_slide1up_patterns (d))
return true;
+   if (shuffle_series (d))
+ return true;
  if (shuffle_generic_patterns (d))
return true;
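
The series recognition in shuffle_series can be tried out on a plain index array. The sketch below is an illustration only, with hypothetical names (`series_from`, `classify_series`) and fixed-length `long` indices instead of poly_int64 and perm.series_p (). It distinguishes a full series from a series that starts at the second element, the case where the patch inserts the first element separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if perm[start..n-1] is base, base + step, base + 2 * step...  */
static bool
series_from (const long *perm, size_t n, size_t start, long base, long step)
{
  for (size_t i = start; i < n; i++)
    if (perm[i] != base + (long) (i - start) * step)
      return false;
  return true;
}

/* Classify PERM: 0 = no series, 1 = full series, 2 = series starting at
   the second element (the first element has to be inserted separately).  */
int
classify_series (const long *perm, size_t n)
{
  assert (n >= 3);
  long step1 = perm[1] - perm[0];
  long step2 = perm[2] - perm[1];

  /* Check for a full series.  */
  if (step1 != 0 && series_from (perm, n, 0, perm[0], step1))
    return 1;

  /* Check for a series starting at the second element.  */
  if (step2 != 0 && series_from (perm, n, 1, perm[1], step2))
    return 2;

  return 0;
}
```

A reversed selector such as 3 2 1 0 is a full series with step -1, while 0 4 6 8 only forms a series from the second element on.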

Re: Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-08 Thread 钟居哲

FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-09 00:25
To: gcc-patches; palmer; kito.cheng; Jeff Law; 钟居哲
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add vectorized strcmp.
Ah, I forgot to attach the current v2 that also enables strncmp.
It was additionally tested with -minline-strncmp on rv64gcv.
 
Regards
Robin
 
Subject: [PATCH v2] RISC-V: Add vectorized strcmp and strncmp.
 
This patch adds vectorized strcmp and strncmp implementations and
tests.  Similar to strlen, expansion is still guarded by
-minline-str(n)cmp.
 
gcc/ChangeLog:
 
PR target/112109
 
* config/riscv/riscv-protos.h (expand_strcmp): Declare.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
strategy handling and delegation to scalar and vector expanders.
(expand_strcmp): Vectorized implementation.
* config/riscv/riscv.md: Add TARGET_VECTOR to strcmp and strncmp
expander.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp.c: New test.
---
gcc/config/riscv/riscv-protos.h   |   1 +
gcc/config/riscv/riscv-string.cc  | 161 +-
gcc/config/riscv/riscv.md |   6 +-
.../riscv/rvv/autovec/builtin/strcmp-run.c|  32 
.../riscv/rvv/autovec/builtin/strcmp.c|  13 ++
.../riscv/rvv/autovec/builtin/strncmp-run.c   | 136 +++
.../riscv/rvv/autovec/builtin/strncmp.c   |  13 ++
7 files changed, 357 insertions(+), 5 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strncmp.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c7b5789a4b3..20bbb5b859c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -558,6 +558,7 @@ void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6cde1bf89a0..11c1f74d0b3 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,12 +511,19 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
 return false;
   alignment = UINTVAL (align_rtx);
-  if (TARGET_ZBB || TARGET_XTHEADBB)
+  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
 {
-  return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
- ncompare);
+  bool ok = riscv_vector::expand_strcmp (result, src1, src2,
+  bytes_rtx, alignment,
+  ncompare);
+  if (ok)
+ return true;
 }
+  if ((TARGET_ZBB || TARGET_XTHEADBB) && stringop_strategy & STRATEGY_SCALAR)
+return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
+ncompare);
+
   return false;
}
@@ -1092,4 +1099,152 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx 
haystack, rtx needle,
 }
}
+/* Implement cmpstr using vector instructions.  The ALIGNMENT and
+   NCOMPARE parameters are unused for now.  */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+unsigned HOST_WIDE_INT, bool)
+{
+  gcc_assert (TARGET_VECTOR);
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  bool with_length = nbytes != NULL_RTX;
+
+  if (with_length
+  && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+return false;
+
+  if (with_length && CONST_INT_P (nbytes))
+nbytes = force_reg (Pmode, nbytes);
+
+  machine_mode mode = E_QImode;
+  unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+  int lmul = TARGET_MAX_LMUL;
+  poly_int64

Re: Re: [PATCH] RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR

2023-12-05 Thread 钟居哲

Hi, Robin.

>> Wouldn't maybe_gt on the mode precision already suffice?  I.e. do we need
>> the ordered_p and the exclusion for masks?  (Sure, masks never exceed
>> one register anyway.)

Currently, I don't see mask modes causing an assertion ICE.

>> Couldn't we exclude all VLS modes that exceed our minimum vector size?
>> Or will this exclude too many?

I think the VLS modes are excluded exactly as we expect.
For example, with zvl128b and LMUL = 1,
we allow VLS modes <= 128 bits and exclude VLS modes > 128 bits.
This is the same behavior as ARM SVE.

>> And could we move this to vls_mode_valid_p?  We already do similar
>> checks for fixed_vlmax there.
This check is already in vls_mode_valid_p.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-05 22:34
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and 
BITS_PER_RISCV_VECTOR
Yes, makes sense. Just one clarifying question.
 
> +{
> +  if (GET_MODE_CLASS (vls_mode) != MODE_VECTOR_BOOL
> +   && !ordered_p (TARGET_MAX_LMUL * BITS_PER_RISCV_VECTOR,
> + GET_MODE_PRECISION (vls_mode)))
> + /* We enable VLS modes which are aligned with TARGET_MAX_LMUL and
> +BITS_PER_RISCV_VECTOR.
> +
> +e.g. When TARGET_MAX_LMUL = 1 and BITS_PER_RISCV_VECTOR = (128,128).
> +We enable VLS modes have fixed size <= 128bit.  Since ordered_p is
> +false between VLA modes with size = (128, 128) bits and VLS mode
> +with size = 128 bits, we will end up with multiple ICEs in
> +middle-end generic codes.  */
> + return false;
> +  return true;
> +}
 
Wouldn't maybe_gt on the mode precision already suffice?  I.e. do we need
the ordered_p and the exclusion for masks?  (Sure, masks never exceed
one register anyway.)
Couldn't we exclude all VLS modes that exceed our minimum vector size?
Or will this exclude too many?
 
And could we move this to vls_mode_valid_p?  We already do similar
checks for fixed_vlmax there.
 
Regards
Robin
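
The effect of the ordered_p test can be seen in a simplified model. The sketch assumes a two-coefficient size c0 + c1 * x with an unknown runtime x >= 0, and takes "known less-or-equal" to mean that both coefficients compare less-or-equal; the names `poly2`, `known_le_sketch`, `ordered_p_sketch` and `vls_mode_allowed_sketch` are hypothetical and mask modes are ignored. Under these assumptions the check behaves as described in the reply: with a limit of (128, 128) bits, fixed sizes up to 128 bits are kept and larger ones are rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-coefficient size c0 + c1 * x, where x >= 0 is a
   runtime value that is unknown at compile time.  */
struct poly2 { long c0, c1; };

/* A <= B for every x >= 0.  */
static bool
known_le_sketch (struct poly2 a, struct poly2 b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* True if A and B compare the same way for every x >= 0.  */
static bool
ordered_p_sketch (struct poly2 a, struct poly2 b)
{
  return known_le_sketch (a, b) || known_le_sketch (b, a);
}

/* Model of the quoted check for non-mask modes: a fixed-size mode of
   VLS_BITS is kept only if it is ordered against max_lmul * vector_bits.  */
bool
vls_mode_allowed_sketch (int max_lmul, struct poly2 vector_bits, long vls_bits)
{
  assert (max_lmul >= 1);
  struct poly2 limit = { max_lmul * vector_bits.c0, max_lmul * vector_bits.c1 };
  struct poly2 vls = { vls_bits, 0 };
  return ordered_p_sketch (limit, vls);
}
```

A 256-bit mode is rejected for LMUL = 1 because 128 + 128 * x is below 256 for x = 0 and above it for x = 2, so no compile-time ordering exists.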

Re: Re: [PATCH] RISC-V: Support highest-number regno overlap for widen ternary vx instructions

2023-12-04 Thread 钟居哲

I adapted the patch in V2 to write the constraints explicitly in the pattern:
[V2] RISC-V: Support highest-number regno overlap for widen ternary - Patchwork 
(sourceware.org)

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-04 20:13
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support highest-number regno overlap for widen 
ternary vx instructions
> +(define_mode_attr widen_ternop_dest_constraint [
> +  (RVVM8QI "=vd, vr, vd, vr, vd, vr, ?")
> +  (RVVM4QI "=vd, vr, vd, vr, vd, vr, ?")
> +  (RVVM2QI "=vd, vr, vd, vr, vd, vr, ?")
> +  (RVVM1QI "=vd, vr, vd, vr, vd, vr, ?")
> +  (RVVMF2QI "=vd, vr, vd, vr, vd, vr, ?")
> +  (RVVMF4QI "=vd, vr, vd, vr, vd, vr, ?")
> +  (RVVMF8QI "=vd, vr, vd, vr, vd, vr, ?")
> +  (RVVM8HI "=vd, vr, vd, vr, vd, vr, ?")
[...]
 
I'm fine with avoiding the overlap but I'm not sure this is
easily maintainable because the constraints don't actually
depend on the mode?  I suppose this is for easy re-use across
different insns but there are only six(?) widening patterns
so we don't even save lines of code by this?
 
I guess I would prefer the normal approach of writing it out
explicitly in the pattern.  Maybe add a different replacement
method like define_subst in the future to simplify such
situations?
 
 
+  "vwmacc.vx\t%0,%z3,%4%p1"
 
Why the z here?  For canonicalization?
 
Regards
Robin

Re: Re: [PATCH 2/6] c: Turn int-conversion warnings into permerrors

2023-12-01 Thread 钟居哲

newlib/newlib/libm/complex/catanl.c:55:20:
 error: implicit declaration of function 'atan2l'; did you mean 'atan2f'? 
[-Wimplicit-function-declaration]
   55 | t = 0.5L * atan2l(2.0L * x, a);
  |^~
  |atan2f
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-linux-spike-debug/../../newlib/newlib/libm/complex/catanl.c:65:26:
 error: implicit declaration of function 'logl'; did you mean 'logf'? 
[-Wimplicit-function-declaration]
   65 | w = w + (0.25L * logl(a)) * I;
  |  ^~~~
  |  logf
make[4]: *** [Makefile:43354: libm/complex/libm_a-csinhl.o] Error 1
make[4]: *** [Makefile:43368: libm/complex/libm_a-csinl.o] Error 1
make[4]: *** [Makefile:43382: libm/complex/libm_a-catanl.o] Error 1
make[4]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-linux-spike-debug/build-newlib-nano/riscv64-unknown-elf/newlib'
make[3]: *** [Makefile:5283: all] Error 2
make[3]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-linux-spike-debug/build-newlib-nano/riscv64-unknown-elf/newlib'
make[2]: *** [Makefile:8492: all-target-newlib] Error 2
make[2]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-linux-spike-debug/build-newlib-nano'
make[1]: *** [Makefile:879: all] Error 2
make[1]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-linux-spike-debug/build-newlib-nano'
make: *** [Makefile:641: stamps/build-newlib-nano] Error 2

I confirm that newlib/glibc/musl definitely cannot be compiled with trunk GCC.



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-12-02 09:10
To: 钟居哲; gcc-patches
CC: thomas; fweimer
Subject: Re: [PATCH 2/6] c: Turn int-conversion warnings into permerrors
Hi Juzhe,

I can confirm the failure on Newlib.
I'm not seeing any issues on glibc 2.37.
I haven't tried to build musl.

Since this patch promotes warnings to errors breakages were probably expected.
The fix may require changes to newlib to remove the errors.
I've hacked together a series of patches on top of newlib 4.3.0 that resolves 
these issues (but I think they'd need more work to be upstream-able):
https://github.com/patrick-rivos/riscv-gnu-toolchain/tree/35d8e8c486bd2f6e3e2e673db8d2b979309a6de4/fixups/newlib

@Thomas @Florian am I right in assuming that breakages were expected/the fix 
should come from fixing the warnings?

Thanks,
Patrick
On 12/1/23 16:33, 钟居哲 wrote:
Hi, this patch causes errors when building newlib/glibc/musl on the RISC-V port:

/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_access.c:8:40:
 error: passing argument 3 of 'syscall_errno' makes integer from pointer 
without a cast [-Wint-conversion]
8 |   return syscall_errno (SYS_access, 2, file, mode, 0, 0, 0, 0);
  |^~~~
  ||
  |const char *
In file included from 
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_access.c:2:
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/internal_syscall.h:66:38:
 note: expected 'long int' but argument is of type 'const char *'
   66 | syscall_errno(long n, int argc, long _a0, long _a1, long _a2, long _a3, 
long _a4, long _a5)
  | ~^~~
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_utime.c:5:39:
 warning: 'struct utimbuf' declared inside parameter list will not be visible 
outside of this definition or declaration
5 | _utime(const char *path, const struct utimbuf *times)
  |   ^~~
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_faccessat.c:
 In function '_faccessat':
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_faccessat.c:7:50:
 error: passing argument 4 of 'syscall_errno' makes integer from pointer 
without a cast [-Wint-conversion]
7 |   return syscall_errno (SYS_faccessat, 4, dirfd, file, mode, flags, 0, 
0);
  |  ^~~~
  |  |
  |  const char *
In file included from 
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_faccessat.c:2:
/work/home/jzzhong/work/too
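
For readers hitting the same -Wint-conversion errors, one common way to make such a call site acceptable is an explicit cast at the point where a pointer is handed to a `long` parameter. The sketch below is not the newlib code or the fixup patches linked above; `syscall_shim_sketch` and `access_sketch` are hypothetical, and the shim returns a dummy value so that the call can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a syscall shim whose arguments are all long.  */
static long
syscall_shim_sketch (long n, long a0, long a1)
{
  (void) n;
  return a0 != 0 ? a1 : -1;  /* Dummy behavior so the call can be checked.  */
}

long
access_sketch (const char *file, int mode)
{
  assert (mode >= 0);
  /* Passing `file` directly is the pattern that is now rejected; the
     explicit cast states the pointer-to-integer conversion.  */
  return syscall_shim_sketch (0, (long) (intptr_t) file, mode);
}
```

The cast through `intptr_t` assumes `long` is wide enough to hold a pointer, which holds for the lp64 and ilp32 ABIs mentioned in this thread.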

Re: [PATCH v4] RISC-V: Bugfix for legitimize move when get vec mode in zve32f

2023-12-01 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-12-02 08:59
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v4] RISC-V: Bugfix for legitimize move when get vec mode in 
zve32f
From: Pan Li 
 
If we want to extract a 64-bit value but ELEN < 64, we use an RVV
vector mode with EEW = 32 to extract the highpart and lowpart.
However, this approach doesn't handle DFmode in the movdf pattern
with ZVE32F, and so it results in an ICE on zve32f.
 
This patch reuses the approach with some additional
handling. Since the lowpart bits are meaningless for an FP mode, we need
one int reg as a bridge here. For example:
 
rtx tmp = gen_reg_rtx (DImode)
reg:DI = reg:DF (fmv.x.d) // Move DF reg to DI
...
perform the extract for high and low parts
...
reg:DF = reg:DI (fmv.d.x) // Move DI reg back to DF after all done
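
The "integer register as a bridge" idea can be sketched in portable C. This is an illustration, not the RTL emitted by the patch: `double_from_halves` is a hypothetical name, and the example assumes a 64-bit IEEE 754 `double`. The two 32-bit parts are combined with a shift and an OR in a 64-bit integer, and only then are the bits moved into the floating-point value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild a double from two 32-bit halves through a 64-bit integer
   "bridge", mirroring the shift-and-or sequence in the description.  */
double
double_from_halves (uint32_t low, uint32_t high)
{
  uint64_t bits = (uint64_t) low;   /* Low part.  */
  bits |= (uint64_t) high << 32;    /* High part shifted into place.  */

  double result;
  assert (sizeof result == sizeof bits);
  memcpy (&result, &bits, sizeof result);  /* Integer-to-FP bit move.  */
  return result;
}
```

Working on the integer value avoids taking a 32-bit lowpart of a floating-point register, which is the step that has no meaning for an FP mode.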
 
PR target/112743
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_legitimize_move): Take the
exist (U *mode) and handle DFmode like DImode when EEW is
32bits for ZVE32F.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr112743-2.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv.cc | 63 +--
.../gcc.target/riscv/rvv/base/pr112743-2.c| 52 +++
2 files changed, 95 insertions(+), 20 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a4fc858fb50..84512dcdc68 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2605,41 +2605,64 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
   unsigned int nunits = vmode_size > mode_size ? vmode_size / mode_size : 
1;
   scalar_mode smode = as_a <scalar_mode> (mode);
   unsigned int index = SUBREG_BYTE (src).to_constant () / mode_size;
-  unsigned int num = smode == DImode && !TARGET_VECTOR_ELEN_64 ? 2 : 1;
+  unsigned int num = known_eq (GET_MODE_SIZE (smode), 8)
+ && !TARGET_VECTOR_ELEN_64 ? 2 : 1;
+  bool need_int_reg_p = false;
   if (num == 2)
{
  /* If we want to extract 64bit value but ELEN < 64,
 we use RVV vector mode with EEW = 32 to extract
 the highpart and lowpart.  */
+   need_int_reg_p = smode == DFmode;
  smode = SImode;
  nunits = nunits * 2;
}
-  vmode = riscv_vector::get_vector_mode (smode, nunits).require ();
-  rtx v = gen_lowpart (vmode, SUBREG_REG (src));
-  for (unsigned int i = 0; i < num; i++)
+  if (riscv_vector::get_vector_mode (smode, nunits).exists (&vmode))
{
-   rtx result;
-   if (num == 1)
- result = dest;
-   else if (i == 0)
- result = gen_lowpart (smode, dest);
-   else
- result = gen_reg_rtx (smode);
-   riscv_vector::emit_vec_extract (result, v, index + i);
+   rtx v = gen_lowpart (vmode, SUBREG_REG (src));
+   rtx int_reg = dest;
-   if (i == 1)
+   if (need_int_reg_p)
{
-   rtx tmp
- = expand_binop (Pmode, ashl_optab, gen_lowpart (Pmode, result),
- gen_int_mode (32, Pmode), NULL_RTX, 0,
- OPTAB_DIRECT);
-   rtx tmp2 = expand_binop (Pmode, ior_optab, tmp, dest, NULL_RTX, 0,
-OPTAB_DIRECT);
-   emit_move_insn (dest, tmp2);
+   int_reg = gen_reg_rtx (DImode);
+   emit_move_insn (int_reg, gen_lowpart (GET_MODE (int_reg), dest));
}
+
+   for (unsigned int i = 0; i < num; i++)
+ {
+   rtx result;
+   if (num == 1)
+ result = int_reg;
+   else if (i == 0)
+ result = gen_lowpart (smode, int_reg);
+   else
+ result = gen_reg_rtx (smode);
+
+   riscv_vector::emit_vec_extract (result, v, index + i);
+
+   if (i == 1)
+ {
+   rtx tmp = expand_binop (Pmode, ashl_optab,
+   gen_lowpart (Pmode, result),
+   gen_int_mode (32, Pmode), NULL_RTX, 0,
+   OPTAB_DIRECT);
+   rtx tmp2 = expand_binop (Pmode, ior_optab, tmp, int_reg,
+NULL_RTX, 0,
+OPTAB_DIRECT);
+   emit_move_insn (int_reg, tmp2);
+ }
+ }
+
+   if (need_int_reg_p)
+ emit_move_insn (dest, gen_lowpart (GET_MODE (dest), int_reg));
+   else
+ emit_move_insn (dest, int_reg);
}
+  else
+ gcc_unreachable ();
+
   return true;
 }
   /* Expand
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-2.c
new file mode 100644
index 000..fdb35fd70f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-2.c
@@ -0,0 +1,52 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zve32f_zvfh_zfh -mabi=lp64 -O2" } */
+
+#include 
+
+union double_union
+{
+  double d;
+  __uint32_t i[2];
+};
+
+#define word0(x)  (x.i[1])
+#define word1(x)  (x.i[0])
+
+#define P 53
+#define Exp_shift 20
+#define Exp_msk1  ((__uint32_t)0x100000L)
+#define Exp_mask  ((__uint32_t)0x7ff00000L)
+
+double ulp (double _x)
+{
+  union double_union x, a;
+  register int L;
+
+  x.d = _x;
+  L = (word0 (x) & Exp_mask) - (P - 1) * Exp_msk1;
+
+  if (L > 0)
+{
+  L |= Exp_msk1 >> 4;
+  word0 (a) =

[PATCH 2/6] c: Turn int-conversion warnings into permerrors

2023-12-01 Thread 钟居哲

Hi, this patch causes errors when building newlib/glibc/musl for the RISC-V port:

/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_access.c:8:40:
 error: passing argument 3 of 'syscall_errno' makes integer from pointer 
without a cast [-Wint-conversion]
8 |   return syscall_errno (SYS_access, 2, file, mode, 0, 0, 0, 0);
  |^~~~
  ||
  |const char *
In file included from 
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_access.c:2:
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/internal_syscall.h:66:38:
 note: expected 'long int' but argument is of type 'const char *'
   66 | syscall_errno(long n, int argc, long _a0, long _a1, long _a2, long _a3, 
long _a4, long _a5)
  | ~^~~
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_utime.c:5:39:
 warning: 'struct utimbuf' declared inside parameter list will not be visible 
outside of this definition or declaration
5 | _utime(const char *path, const struct utimbuf *times)
  |   ^~~
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_faccessat.c:
 In function '_faccessat':
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_faccessat.c:7:50:
 error: passing argument 4 of 'syscall_errno' makes integer from pointer 
without a cast [-Wint-conversion]
7 |   return syscall_errno (SYS_faccessat, 4, dirfd, file, mode, flags, 0, 
0);
  |  ^~~~
  |  |
  |  const char *
In file included from 
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_faccessat.c:2:
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/internal_syscall.h:66:48:
 note: expected 'long int' but argument is of type 'const char *'
   66 | syscall_errno(long n, int argc, long _a0, long _a1, long _a2, long _a3, 
long _a4, long _a5)
  |   ~^~~
make[5]: *** [Makefile:3315: riscv/riscv_libgloss_a-sys_access.o] Error 1
make[5]: *** Waiting for unfinished jobs
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_open.c:
 In function '_open':
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_open.c:8:38:
 error: passing argument 3 of 'syscall_errno' makes integer from pointer 
without a cast [-Wint-conversion]
8 |   return syscall_errno (SYS_open, 3, name, flags, mode, 0, 0, 0);
  |  ^~~~
  |  |
  |  const char *
In file included from 
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_open.c:2:
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/internal_syscall.h:66:38:
 note: expected 'long int' but argument is of type 'const char *'
   66 | syscall_errno(long n, int argc, long _a0, long _a1, long _a2, long _a3, 
long _a4, long _a5)
  | ~^~~
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_openat.c:
 In function '_openat':
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_openat.c:7:47:
 error: passing argument 4 of 'syscall_errno' makes integer from pointer 
without a cast [-Wint-conversion]
7 |   return syscall_errno (SYS_openat, 4, dirfd, name, flags, mode, 0, 0);
  |   ^~~~
  |   |
  |   const char *
In file included from 
/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/../../newlib/libgloss/riscv/sys_openat.c:2:

Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-01 Thread 钟居哲

lgtm



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-01 23:23
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add vectorized strcmp.
Hi,
 
this patch adds a vectorized strcmp implementation and tests.  Similar
to strlen, expansion is still guarded by -minline-strcmp.  I just
realized I forgot to make it a series but this one is actually
dependent on the NFC patch and the rawmemchr fix before.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (expand_strcmp): Declare.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
strategy handling and delegation to scalar and vector expanders.
(expand_strcmp): Vectorized implementation.
* config/riscv/riscv.md: Add TARGET_VECTOR to strcmp expander.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
---
gcc/config/riscv/riscv-protos.h   |   1 +
gcc/config/riscv/riscv-string.cc  | 161 +-
gcc/config/riscv/riscv.md |   3 +-
.../riscv/rvv/autovec/builtin/strcmp-run.c|  32 
.../riscv/rvv/autovec/builtin/strcmp.c|  13 ++
5 files changed, 206 insertions(+), 4 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c94c82a9973..5878a674413 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -558,6 +558,7 @@ void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6cde1bf89a0..11c1f74d0b3 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,12 +511,19 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
 return false;
   alignment = UINTVAL (align_rtx);
-  if (TARGET_ZBB || TARGET_XTHEADBB)
+  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
 {
-  return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
- ncompare);
+  bool ok = riscv_vector::expand_strcmp (result, src1, src2,
+  bytes_rtx, alignment,
+  ncompare);
+  if (ok)
+ return true;
 }
+  if ((TARGET_ZBB || TARGET_XTHEADBB) && stringop_strategy & STRATEGY_SCALAR)
+return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
+ncompare);
+
   return false;
}
@@ -1092,4 +1099,152 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx 
haystack, rtx needle,
 }
}
+/* Implement cmpstr using vector instructions.  The ALIGNMENT and
+   NCOMPARE parameters are unused for now.  */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+unsigned HOST_WIDE_INT, bool)
+{
+  gcc_assert (TARGET_VECTOR);
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  bool with_length = nbytes != NULL_RTX;
+
+  if (with_length
+  && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+return false;
+
+  if (with_length && CONST_INT_P (nbytes))
+nbytes = force_reg (Pmode, nbytes);
+
+  machine_mode mode = E_QImode;
+  unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+  int lmul = TARGET_MAX_LMUL;
+  poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+  machine_mode vmode;
+  if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode), nunits)
+ .exists (&vmode))
+gcc_unreachable ();
+
+  machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
+
+  /* Prepare addresses.  */
+  rtx src_addr1 = copy_addr_to_reg (XEXP (src1, 0));
+  rtx vsrc1 = change_address (src1, vmode, src_addr1);
+
+  rtx src_addr2 = copy_addr_to_reg (XEXP (src2, 0));
+  rtx vsrc2 = change_address (src2, vmode, src_addr2);
+
+  /* Set initial pointer bump to 0.  */
+  rtx cnt = gen_reg_rtx (Pmode);
+  emit_move_insn (cnt, CONST0_RTX (Pmode));
+
+  rtx sub = gen_reg_rtx (Pmode);
+  emit_move_insn (sub, CONST0_RTX (Pmode));
+
+  /* Create source vectors.  */
+  rtx vec1 = gen_reg_rtx (vmode);
+  rtx vec2 = gen_reg_rtx (vmode);
+
+  rtx done = gen_label_rtx ();
+  rtx loop = gen_label_rtx ();
+  emit_label (loop);
+
+  /* Bump the pointers.  */
+  emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, cnt)));
+  emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, cnt)));
+
+  rtx vlops1[] = {vec1, vsrc1};
+  rtx vlops2[] = {vec2, vsrc2};
+
+  if (!with_length)
+{
+  emit_vlmax_insn (code_for_pred_fault_load (vmode),
+

Re: [PATCH] RISC-V: Add vectorized strlen.

2023-12-01 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-01 23:21
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add vectorized strlen.
Hi,
 
this patch implements a vectorized strlen by re-using and slightly
adjusting the rawmemchr implementation.  Rawmemchr returns the address
of the needle while strlen returns the difference between needle address
and start address.
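
A scalar sketch of that relationship, under the same assumption the patch comment states (the needle is known to be in the haystack). raw_find and strlen_via_raw_find are illustrative names, not functions from the patch:

```c
#include <stddef.h>

/* Scalar model of rawmemchr: return the address of the first NEEDLE.
   Behavior is undefined if NEEDLE does not occur.  */
static const char *
raw_find (const char *haystack, char needle)
{
  while (*haystack != needle)
    haystack++;
  return haystack;
}

/* strlen is the distance from the start address to the NUL byte.  */
static size_t
strlen_via_raw_find (const char *s)
{
  return (size_t) (raw_find (s, '\0') - s);
}
```

This is why the expander only needs to remember the start address and subtract it at the end.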
 
As before, strlen expansion is guarded by -minline-strlen.
 
While testing with -minline-strlen I encountered a vsetvl problem in
memcpy-chk.c where we didn't insert a vsetvl at the proper spot (after
a setjmp).  This needs to be fixed separately and I figured I'd post
this patch as-is.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (expand_rawmemchr): Add strlen
parameter.
* config/riscv/riscv-string.cc (riscv_expand_strlen): Call
rawmemchr.
(expand_rawmemchr): Add strlen handling.
* config/riscv/riscv.md: Add TARGET_VECTOR to strlen expander.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen.c: New test.
---
gcc/config/riscv/riscv-protos.h   |  2 +-
gcc/config/riscv/riscv-string.cc  | 41 ++-
gcc/config/riscv/riscv.md |  8 +---
.../riscv/rvv/autovec/builtin/strlen-run.c| 37 +
.../riscv/rvv/autovec/builtin/strlen.c| 12 ++
5 files changed, 83 insertions(+), 17 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 695ee24ad6f..c94c82a9973 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -557,7 +557,7 @@ void expand_cond_unop (unsigned, rtx *);
void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
-void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 594ff49fc5a..6cde1bf89a0 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -588,9 +588,16 @@ riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
bool
riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
{
+  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
+{
+  riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+   /* strlen */ true);
+  return true;
+}
+
   gcc_assert (search_char == const0_rtx);
-  if (TARGET_ZBB || TARGET_XTHEADBB)
+  if ((TARGET_ZBB || TARGET_XTHEADBB) && stringop_strategy & STRATEGY_SCALAR)
 return riscv_expand_strlen_scalar (result, src, align);
   return false;
@@ -979,12 +986,13 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
}
-/* Implement rawmemchr using vector instructions.
+/* Implement rawmemchr and strlen using vector instructions.
It can be assumed that the needle is in the haystack, otherwise the
behavior is undefined.  */
void
-expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+expand_rawmemchr (machine_mode mode, rtx dst, rtx haystack, rtx needle,
+   bool strlen)
{
   /*
 rawmemchr:
@@ -1005,6 +1013,9 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, 
rtx pat)
   */
   gcc_assert (TARGET_VECTOR);
+  if (strlen)
+gcc_assert (mode == E_QImode);
+
   unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
   int lmul = TARGET_MAX_LMUL;
   poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
@@ -1028,12 +1039,13 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, 
rtx pat)
  return a pointer to the matching byte.  */
   unsigned int shift = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
-  rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+  rtx src_addr = copy_addr_to_reg (XEXP (haystack, 0));
+  rtx start_addr = copy_addr_to_reg (XEXP (haystack, 0));
   rtx loop = gen_label_rtx ();
   emit_label (loop);
-  rtx vsrc = change_address (src, vmode, src_addr);
+  rtx vsrc = change_address (haystack, vmode, src_addr);
   /* Bump the pointer.  */
   rtx step = gen_reg_rtx (Pmode);
@@ -1052,8 +1064,8 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, 
rtx pat)
 emit_insn (gen_read_vldi_zero_extend (cnt));
   /* Compare needle with haystack and store in a mask.  */
-  rtx eq = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, pat), vec);
-  rtx vmsops[] = {mask, eq, vec, pat};
+  rtx eq = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, needle), 
vec);
+  rtx vmsops[] = {mask, eq, vec, needle};
   emit_nonvlmax_insn (code_for_pred_eqne_scalar

Re: [PATCH] RISC-V: Rename and unify stringop strategy handling [NFC].

2023-12-01 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-01 23:21
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Rename and unify stringop strategy handling [NFC].
Hi,
 
now split into multiple patches.
 
In preparation for the vectorized strlen and strcmp support this NFC
patch unifies the stringop strategy handling a bit.  The "auto"
strategy now is a combination of scalar and vector and an expander
should try the strategies in their preferred order.
 
For the block_move expander this patch does just that.
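
A small sketch of how such a bitmask strategy can be consulted. The enum values are the ones from the patch below; pick_expansion and its two boolean parameters are hypothetical and only model "the target can do this", and the vector-first order is an assumption taken from the strlen/strcmp patches:

```c
#include <stdbool.h>

enum stringop_strategy
{
  STRATEGY_LIBCALL = 1,
  STRATEGY_SCALAR = 2,
  STRATEGY_VECTOR = 4,
  STRATEGY_AUTO = STRATEGY_SCALAR | STRATEGY_VECTOR
};

/* Try the permitted strategies in preferred order and report which one
   was used.  STRATEGY_LIBCALL means nothing was expanded inline.  */
static int
pick_expansion (int strategy, bool vector_ok, bool scalar_ok)
{
  if ((strategy & STRATEGY_VECTOR) && vector_ok)
    return STRATEGY_VECTOR;
  if ((strategy & STRATEGY_SCALAR) && scalar_ok)
    return STRATEGY_SCALAR;
  return STRATEGY_LIBCALL;
}
```

Because "auto" is just the OR of the scalar and vector bits, each expander can test the bits it cares about without a separate case for "auto".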
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum):
Rename...
(enum stringop_strategy_enum): ... to this.
* config/riscv/riscv-string.cc (riscv_expand_block_move): New
wrapper expander handling the strategies and delegation.
(riscv_expand_block_move_scalar): Rename function and make
static.
(expand_block_move): Remove strategy handling.
* config/riscv/riscv.md: Call expander wrapper.
* config/riscv/riscv.opt: Rename.
---
gcc/config/riscv/riscv-opts.h | 18 ++--
gcc/config/riscv/riscv-string.cc  | 92 +++
gcc/config/riscv/riscv.md |  4 +-
gcc/config/riscv/riscv.opt| 18 ++--
.../riscv/rvv/base/cpymem-strategy-1.c|  2 +-
.../riscv/rvv/base/cpymem-strategy-2.c|  2 +-
.../riscv/rvv/base/cpymem-strategy-3.c|  2 +-
.../riscv/rvv/base/cpymem-strategy-4.c|  2 +-
.../riscv/rvv/base/cpymem-strategy-5.c|  2 +-
9 files changed, 78 insertions(+), 64 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index e6e55ad7071..30efebbf07b 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -104,15 +104,15 @@ enum riscv_entity
};
/* RISC-V stringop strategy. */
-enum riscv_stringop_strategy_enum {
-  /* Use scalar or vector instructions. */
-  USE_AUTO,
-  /* Always use a library call. */
-  USE_LIBCALL,
-  /* Only use scalar instructions. */
-  USE_SCALAR,
-  /* Only use vector instructions. */
-  USE_VECTOR
+enum stringop_strategy_enum {
+  /* No expansion. */
+  STRATEGY_LIBCALL = 1,
+  /* Use scalar expansion if possible. */
+  STRATEGY_SCALAR = 2,
+  /* Only vector expansion if possible. */
+  STRATEGY_VECTOR = 4,
+  /* Use any. */
+  STRATEGY_AUTO = STRATEGY_SCALAR | STRATEGY_VECTOR
};
#define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && 
TARGET_64BIT))
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 80e3b5981af..f3a4d3ddd47 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -707,51 +707,68 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length,
/* Expand a cpymemsi instruction, which copies LENGTH bytes from
memory reference SRC to memory reference DEST.  */
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
{
-  if (riscv_memcpy_strategy == USE_LIBCALL
-  || riscv_memcpy_strategy == USE_VECTOR)
+  if (!CONST_INT_P (length))
 return false;
-  if (CONST_INT_P (length))
-{
-  unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
-  unsigned HOST_WIDE_INT factor, align;
+  unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+  unsigned HOST_WIDE_INT factor, align;
-  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
-  factor = BITS_PER_WORD / align;
+  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+  factor = BITS_PER_WORD / align;
-  if (optimize_function_for_size_p (cfun)
-   && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
- return false;
+  if (optimize_function_for_size_p (cfun)
+  && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+return false;
-  if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+  if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+{
+  riscv_block_move_straight (dest, src, INTVAL (length));
+  return true;
+}
+  else if (optimize && align >= BITS_PER_WORD)
+{
+  unsigned min_iter_words
+ = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+  unsigned iter_words = min_iter_words;
+  unsigned HOST_WIDE_INT bytes = hwi_length;
+  unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+  /* Lengthen the loop body if it shortens the tail.  */
+  for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
{
-   riscv_block_move_straight (dest, src, INTVAL (length));
-   return true;
+   unsigned cur_cost = iter_words + words % iter_words;
+   unsigned new_cost = i + words % i;
+   if (new_cost <= cur_cost)
+ iter_words = i;
}
-  else if (optimize && align >= BITS_PER_WORD)
- {
-   unsigned min_iter_words
- = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
-   unsigned iter_words =

Re: [PATCH] RISC-V: Fix rawmemchr implementation.

2023-12-01 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-01 23:20
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Fix rawmemchr implementation.
Hi,
 
this fixes a bug in the rawmemchr implementation by incrementing the
source address by vl * element_size instead of just vl.
 
This is normally harmless as we will just scan the same region more than
once but, in combination with an older qemu version, would lead to
an execution failure in SPEC2017.
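
To make the byte-versus-element distinction concrete, here is a scalar model for 32-bit elements. It is only a sketch: rawmemchr_u32 and its vl parameter are illustrative, and the real expander uses a fault-only-first load rather than element-wise reads:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan HAYSTACK in chunks of VL 32-bit elements and return the address
   of the first element equal to NEEDLE.  As with rawmemchr, the needle
   must be present.  */
static const void *
rawmemchr_u32 (const void *haystack, uint32_t needle, size_t vl)
{
  const unsigned char *p = haystack;
  const unsigned shift = 2;   /* log2 of the element size in bytes.  */

  for (;;)
    {
      for (size_t i = 0; i < vl; i++)
        {
          uint32_t elt;
          memcpy (&elt, p + (i << shift), sizeof elt);
          if (elt == needle)
            return p + (i << shift);
        }
      /* Bump by vl * element_size bytes, not by vl bytes.  */
      p += vl << shift;
    }
}
```

With a bump of only vl bytes the loop would re-read most of the previous chunk, which matches the "scan the same region more than once" behavior described above.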
 
Regards
Robin
 
 
gcc/ChangeLog:
 
* config/riscv/riscv-string.cc (expand_rawmemchr): Increment
source address by vl * element_size.
---
gcc/config/riscv/riscv-string.cc | 13 +++--
1 file changed, 7 insertions(+), 6 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index f3a4d3ddd47..594ff49fc5a 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1017,6 +1017,8 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, 
rtx pat)
   machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
   rtx cnt = gen_reg_rtx (Pmode);
+  emit_move_insn (cnt, CONST0_RTX (Pmode));
+
   rtx end = gen_reg_rtx (Pmode);
   rtx vec = gen_reg_rtx (vmode);
   rtx mask = gen_reg_rtx (mask_mode);
@@ -1033,6 +1035,11 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, 
rtx pat)
   rtx vsrc = change_address (src, vmode, src_addr);
+  /* Bump the pointer.  */
+  rtx step = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (step, gen_rtx_ASHIFT (Pmode, cnt, GEN_INT (shift))));
+  emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, step)));
+
   /* Emit a first-fault load.  */
   rtx vlops[] = {vec, vsrc};
   emit_vlmax_insn (code_for_pred_fault_load (vmode),
@@ -1055,16 +1062,10 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, 
rtx pat)
   emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
  riscv_vector::CPOP_OP, vfops, cnt);
-  /* Bump the pointer.  */
-  emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
-
   /* Emit the loop condition.  */
   rtx test = gen_rtx_LT (VOIDmode, end, const0_rtx);
   emit_jump_insn (gen_cbranch4 (Pmode, test, end, const0_rtx, loop));
-  /*  We overran by CNT, subtract it.  */
-  emit_insn (gen_rtx_SET (src_addr, gen_rtx_MINUS (Pmode, src_addr, cnt)));
-
   /*  We found something at SRC + END * [1,2,4,8].  */
   emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
   emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
-- 
2.43.0

Re: Re: [PATCH] RISC-V: Support highpart register overlap for vwcvt

2023-11-29 Thread 钟居哲

>> overlap or group_overlap.  Then change "no" to "none" and rename
>> "vconstraint_enabled" to "group_overlap_valid" (or without the group).

>> Add a comment to group_overlap_valid:

>> ; Widening instructions have group-overlap constraints.  Those are only
>> ; valid for certain register-group sizes.  This attribute marks the
>> ; alternatives not matching the required register-group size as disabled.

Ok will send a patch to fix them. Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-29 21:30
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Support highpart register overlap for vwcvt
>>> I can't really match spec and code.  For the lmul = 2 case sure,
>>> but W84 e.g. allows v4 and not v6?  What actually is "highest-numbered 
>>> part"?
> Yes.
> 
> For vwcvt, LMUL 4 -> LMUL 8. 
> We allow overlap  vwcvt v0 (occupy v0 - v7), v4 (occupy v4 - v7)
> This patch support the overlap above.
 
Ok thanks, that way it makes sense.  The allowed overlap size is the
size of the source group which is determined by the "extension factor".
 
But don't we allow e.g. v2 and v4 with W82?  Shouldn't it be % 8 == 6
and % 8 == 7 for W82 and W81? Or for W41, % 4 == 3?  At least when looking
at the given spec example that would correspond to W82?
 
> This is kito's code. Could you suggest another name ? I can modify it.
 
overlap or group_overlap.  Then change "no" to "none" and rename
"vconstraint_enabled" to "group_overlap_valid" (or without the group).
 
Add a comment to group_overlap_valid:
 
; Widening instructions have group-overlap constraints.  Those are only
; valid for certain register-group sizes.  This attribute marks the
; alternatives not matching the required register-group size as disabled.
 
 
> I experiment with many tests, turns out adding ? generate better codegen.
> You can try it (remove ?) and testing it on case (I added in this patch).
 
It looks like we spill without it but I don't get why.  Well, as long
as it works, I guess we can defer that question.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Support highpart register overlap for vwcvt

2023-11-29 Thread 钟居哲

>> But don't we allow e.g. v2 and v4 with W82?  Shouldn't it be % 8 == 6
>> and % 8 == 7 for W82 and W81? Or for W41, % 4 == 3?  At least when looking
>> at the given spec example that would correspond to W82?

I think you are right.  It should be W86 for vsext.vf4 (LMUL2 -> LMUL8),
W87 for vsext.vf8 (LMUL1 -> LMUL8), and
W43 for vsext.vf4 (LMUL1 -> LMUL4).

This patch only uses W21, W42, and W84.
I will adapt the rest in follow-up patches.
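
The arithmetic behind those names can be sketched as follows. This is one reading of the "highest-numbered part" wording in the spec, not code from the patch, and overlap_residue / overlap_allowed are hypothetical helpers:

```c
/* DEST_LMUL and SRC_LMUL are register-group sizes (1, 2, 4 or 8) with
   SRC_LMUL < DEST_LMUL.  The source group may only overlap the top
   SRC_LMUL registers of the destination group, so its register number
   must have this residue modulo DEST_LMUL.  */
static int
overlap_residue (int dest_lmul, int src_lmul)
{
  return dest_lmul - src_lmul;
}

/* Nonzero if SRC_REGNO sits in the highest-numbered part of some
   DEST_LMUL-aligned destination group.  */
static int
overlap_allowed (int src_regno, int dest_lmul, int src_lmul)
{
  return src_regno % dest_lmul == overlap_residue (dest_lmul, src_lmul);
}
```

For example, with a destination group of 8 and a source group of 2 the residue is 6, which agrees with the spec example that vzext.vf4 v0, v6 is legal at LMUL=8 while v2 and v4 are not.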



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-29 21:30
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Support highpart register overlap for vwcvt
>>> I can't really match spec and code.  For the lmul = 2 case sure,
>>> but W84 e.g. allows v4 and not v6?  What actually is "highest-numbered 
>>> part"?
> Yes.
> 
> For vwcvt, LMUL 4 -> LMUL 8. 
> We allow overlap  vwcvt v0 (occupy v0 - v7), v4 (occupy v4 - v7)
> This patch support the overlap above.
 
Ok thanks, that way it makes sense.  The allowed overlap size is the
size of the source group which is determined by the "extension factor".
 
But don't we allow e.g. v2 and v4 with W82?  Shouldn't it be % 8 == 6
and % 8 == 7 for W82 and W81? Or for W41, % 4 == 3?  At least when looking
at the given spec example that would correspond to W82?
 
> This is kito's code. Could you suggest another name ? I can modify it.
 
overlap or group_overlap.  Then change "no" to "none" and rename
"vconstraint_enabled" to "group_overlap_valid" (or without the group).
 
Add a comment to group_overlap_valid:
 
; Widening instructions have group-overlap constraints.  Those are only
; valid for certain register-group sizes.  This attribute marks the
; alternatives not matching the required register-group size as disabled.
 
 
> I experiment with many tests, turns out adding ? generate better codegen.
> You can try it (remove ?) and testing it on case (I added in this patch).
 
It looks like we spill without it but I don't get why.  Well, as long
as it works, I guess we can defer that question.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Support highpart register overlap for vwcvt

2023-11-29 Thread 钟居哲

>> Looks like this already went in while I was looking at it...
I committed it after Kito approved it.

>> I can't really match spec and code.  For the lmul = 2 case sure,
>> but W84 e.g. allows v4 and not v6?  What actually is "highest-numbered part"?
Yes.

For vwcvt, LMUL 4 -> LMUL 8:
we allow the overlap in vwcvt v0 (occupies v0 - v7), v4 (occupies v4 - v7).
This patch supports the overlap above.
Without this patch, vwcvt allows no overlap between destination and source registers.

>>The name vconstraint doesn't say anything.
This is Kito's code. Could you suggest another name? I can change it.

>>Regarding earlyclobber.  We still don't consume all inputs before
>>writing them (there is overlap after all).  Is that the gist of
>>the change?  Circumventing the earlyclobber by "strategically"
>>selecting registers?
 
>>Why is the disparage necessary?  What happens if it's not there?

I experimented with many tests, and it turns out that adding ? generates better code.
You can try it (remove the ?) and test it on the cases I added in this patch.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-29 19:01
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support highpart register overlap for vwcvt
Looks like this already went in while I was looking at it...
 
In general it looks ok to me but I would have really hoped
for some more comments.
 
> +;; These following constraints are used by RVV instructions with dest EEW > 
> src EEW.
> +;; RISC-V 'V' Spec 5.2. Vector Operands:
> +;; The destination EEW is greater than the source EEW, the source EMUL is at 
> least 1,
> +;; and the overlap is in the highest-numbered part of the destination 
> register group.
> +;; (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or 
> v4 is not).
> +(define_register_constraint "W21" "TARGET_VECTOR ? V_REGS : NO_REGS"
> +  "A vector register has register number % 2 == 1." "regno % 2 == 1")
> +
> +(define_register_constraint "W42" "TARGET_VECTOR ? V_REGS : NO_REGS"
> +  "A vector register has register number % 4 == 2." "regno % 4 == 2")
> +
> +(define_register_constraint "W84" "TARGET_VECTOR ? V_REGS : NO_REGS"
> +  "A vector register has register number % 8 == 4." "regno % 8 == 4")
> +
> +(define_register_constraint "W41" "TARGET_VECTOR ? V_REGS : NO_REGS"
> +  "A vector register has register number % 4 == 1." "regno % 4 == 1")
> +
> +(define_register_constraint "W81" "TARGET_VECTOR ? V_REGS : NO_REGS"
> +  "A vector register has register number % 8 == 1." "regno % 8 == 1")
> +
> +(define_register_constraint "W82" "TARGET_VECTOR ? V_REGS : NO_REGS"
> +  "A vector register has register number % 8 == 2." "regno % 8 == 2")
> +
 
I can't really match spec and code.  For the lmul = 2 case sure,
but W84 e.g. allows v4 and not v6?  What actually is "highest-numbered part"?
 
> +(define_attr "vconstraint" "no,W21,W42,W84,W41,W81,W82"
> +  (const_string "no"))
> +
> +(define_attr "vconstraint_enabled" "no,yes"
> +  (cond [(eq_attr "vconstraint" "no")
> + (const_string "yes")
> +
> + (and (eq_attr "vconstraint" "W21")
> +   (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) != 
> 2"))
> + (const_string "no")
> +
> + (and (eq_attr "vconstraint" "W42,W41")
> +   (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) != 
> 4"))
> + (const_string "no")
> +
> + (and (eq_attr "vconstraint" "W84,W81,W82")
> +   (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) != 
> 8"))
> + (const_string "no")
> +]
> +   (const_string "yes")))
> +
 
The name vconstraint doesn't say anything.
 
>  ;; vwcvt.x.x.v
>  (define_insn "@pred_"
> -  [(set (match_operand:VWEXTI 0 "register_operand"  "=&vr,&vr")
> +  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   vr,   vr,   vr,  vr,    vr, ?&vr, ?&vr")
 
Regarding earlyclobber.  We still don't consume all inputs before
writing them (there is overlap after all).  Is that the gist of
the change?  Circumventing the earlyclobber by "strategically"
selecting registers?
 
Why is the disparage necessary?  What happens if it's not there?
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Optimize a special case of VLA SLP

2023-11-23 Thread 钟居哲

I don't think the loop vectorizer can do more optimization here.

GCC passes vec_perm <,,(nunits - 1, nunits , nunits + 1, )> to the
vec_perm_const target hook to handle that. It's very target dependent, so we
can't do more about that.

For RVV, it's better to transform this case into vec_extract + vec_shl_insert.
However, for ARM SVE, it's not. ARM SVE has a dedicated instruction to handle
that (trn), so it's better to pass this permute index to vec_perm_const for
ARM SVE.
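
The equivalence behind the RVV transformation can be checked on plain lists. The following is a rough Python model, not the riscv-v.cc implementation: vec_perm models a generic two-input permute, and extract_and_slide1up is a hypothetical helper that models vec_extract of the last element followed by vec_shl_insert.

```python
def vec_perm(op0, op1, sel):
    """Generic two-input permute: index i selects from op0 followed by op1."""
    concat = op0 + op1
    return [concat[i] for i in sel]


def extract_and_slide1up(op0, op1):
    """Extract the last element of op0, shift op1 up by one element and
    insert the extracted scalar at element 0."""
    scalar = op0[-1]
    return [scalar] + op1[:-1]


nunits = 4
op0 = [0, 1, 2, 3]
op1 = [10, 11, 12, 13]
# Selector (nunits - 1, nunits, nunits + 1, ...) from the message above.
sel = [nunits - 1 + i for i in range(nunits)]

print(vec_perm(op0, op1, sel))
print(extract_and_slide1up(op0, op1))
```

Both calls produce the same vector for this selector, which is why the permute can be replaced by an extract plus a slide on a target without a dedicated instruction.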



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-23 22:58
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize a special case of VLA SLP
LGTM (and harmless enough) but I'd rather wait for a second look or a
maintainer's OK as we're past stage 1 and it's not a real bugfix.
(On top, it's Thanksgiving so not many people will even notice).
 
On a related note, this should probably be a middle-end optimization
but before a variable-index vec extract most likely nobody bothered. 
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Optimize a special case of VLA SLP

2023-11-23 Thread 钟居哲

Thanks Robin.

Sent V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638033.html
with the ChangeLog added, since I noticed a ChangeLog issue in V1:
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns):
(expand_vec_perm_const_1):


Tested on zvl128b/zvl256b/zvl512b/zvl1024b on both RV32 and RV64 with no
regressions.

Hope we can land it in GCC-14.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-23 22:58
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize a special case of VLA SLP
LGTM (and harmless enough) but I'd rather wait for a second look or a
maintainer's OK as we're past stage 1 and it's not a real bugfix.
(On top, it's Thanksgiving so not many people will even notice).
 
On a related note, this should probably be a middle-end optimization
but before a variable-index vec extract most likely nobody bothered. 
 
Regards
Robin

Re: Re: [PATCH 0/5] Add support for operand-specific alignment requirements

2023-11-22 Thread 钟居哲

Hi, Richard.

Currently define_mode_attr can only map a mode to a single attribute value.
I wonder whether we can map a mode to multiple values?

E.g. (define_mode_attr dest_constraint [(V16QI "")])

But I want it to be:

(define_mode_attr dest_constraint [(V16QI (TARGET_MIN_VLEN <= 128 "vr") 
(TARGET_MIN_VLEN > 128 "")) ])

It seems that we can't achieve this for now. Would it be possible to extend it
in GCC-15?

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-11-22 18:08
To: juzhe.zhong@rivai.ai
CC: gcc-patches; vmakarov@redhat.com; kito.cheng
Subject: Re: [PATCH 0/5] Add support for operand-specific alignment requirements
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>
> Thanks for supporting register filter in IRA/LRA.
> I found it is useful for RVV since we have a set of widen operations that 
> allow source register overlap highpart of dest register group
>
> For example, if vsext.vf2 v0(dest consume reg v0 and reg v1), v1 (source 
> consume v1 only)
> I want to support the highpart overlap above. (Currently, we don't any 
> overlap between source and dest in such instructions).
>
> So, I wonder whether we can pass "machine_mode" into register filter. Ok, I 
> think it's too late since stage 1 closes. I wonder we can add it in GCC-15?

I think adding a mode would add too much overhead.  The mode would be
the mode of the operand, but with subregs, the mode of the operand can
be different from the mode of the RA allocno.  So it would no longer
be enough for the RA to calculate a bitmask of filters.  It would need
to remember which modes are used with those filters.

We'd also need to turn the current HARD_REG_SETs into [MAX_MACHINE_MODE]
arrays of HARD_REG_SETs.  (And there are now more than 256 machine modes
for riscv.)

The pattern that uses the constraints should already "know" the mode.
So if possible, I think it would be better to use different constraints
for different modes, using define_mode_attrs.

Thanks,
Richard

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-22 Thread 钟居哲

I prefer ASM_OUTPUT_OPCODE or the assembler dialect to %^, and I don't want to
see any change to vector.md.

%^ will be a heavy burden for future maintenance.

Besides, ASM_OUTPUT_OPCODE can see the whole opcode string. My patch is just a
draft. We can add exclusions: for example, for zvbb we can skip prepending
"th." to the vrev.v instruction.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-23 06:27
To: Christoph Müllner; 钟居哲
CC: gcc-patches; kito.cheng; kito.cheng; cooper.joshua; rdapp.gcc; 
philipp.tomsich; Cooper Qu; jinma; Nelson Chu
Subject: Re: RISC-V: Support XTheadVector extensions
 
 
On 11/22/23 07:24, Christoph Müllner wrote:
> On Wed, Nov 22, 2023 at 2:52 PM 钟居哲  wrote:
>>
>> I am totally ok to approve theadvector on GCC-14 before stage 3 close
>> as long as it doesn't touch the current RVV codes too much and binutils 
>> supports theadvector.
>>
>> I have provided the draft approach:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637349.html
>> which turns out doesn't need to change any codes of vector.md.
>> I strongly suggest follow this draft. I can be actively review theadvector 
>> during stage 3.
>> And hopefully can help you land theadvector on GCC-14.
> 
> I see now two approaches:
> 1) Let GCC emit RVV instructions for XTheadVector for instructions
> that are in both
> 2) Use the ASM_OUTPUT_OPCODE hook to output "th." for these instructions
> 
> No doubt, the ASM_OUTPUT_OPCODE hook approach is better than our
> format-string approach, but would 1) not be the even better
> solution? It would also mean, that not a single test case is required
> for these overlapping instructions (only a few tests that ensure that
> we don't emit RVV instructions that are not available in
> XTheadVector). Besides that, letting GCC emit RVV instructions for
> XTheadVector is a very clever idea, because it fully utilizes the
> fact that both extensions overlap to a huge degree.
> 
> The ASM_OUTPUT_OPCODE approach could lead to an issue if we enable
> XTheadVector with any other vector extension, say Zvfoo. In this case the
> Zvfoo instructions will all be prefixed as well with "th.". I know that it
> is not likely to run into this problem (such a machine does not exist
> in real hardware), but it is possible to trigger this issue easily
> and approach 1) would not have this potential issue.
I'm not a big fan of the ASM_OUTPUT_OPCODE approach.  While it is
simple, I worry a bit about it from a long term maintenance standpoint.
As you note we could well end up at some point with an extension that
has a mnemonic starting with "v" that would blow up.  But I certainly
see the appeal of such a simple test to support thead vector.
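
The concern can be illustrated with a toy model of the opcode rewrite. This is a Python sketch only, not the hook itself: prefix_all_v and prefix_with_exclusions are hypothetical names, the rule "starts with v" is the simple test discussed in this thread, and the exclusion set uses the vrev.v example mentioned elsewhere in the thread.

```python
def prefix_all_v(mnemonic):
    """Naive rule: prepend "th." to every mnemonic that starts with 'v'."""
    if mnemonic.startswith("v"):
        return "th." + mnemonic
    return mnemonic


# Hypothetical exclusion list for mnemonics from other vector extensions.
EXCLUDED = {"vrev.v"}


def prefix_with_exclusions(mnemonic):
    """Same rule, but skip mnemonics that must keep their original name."""
    if mnemonic in EXCLUDED:
        return mnemonic
    return prefix_all_v(mnemonic)


for m in ["vmulh.vv", "add", "vrev.v"]:
    print(m, "->", prefix_all_v(m), "/", prefix_with_exclusions(m))
```

The naive rule also rewrites vrev.v, so every new extension with a "v" mnemonic would need an entry in such an exclusion list, which is the long-term maintenance cost being weighed here.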
 
Given there are at least 3 approaches that can fix that problem (%^, 
assembler dialect or ASM_OUTPUT_OPCODE), maybe we could set that 
discussion aside in the immediate term and see if there are other issues 
that are potentially more substantial.
 
 
 
 
--
 
 
 
More generally, I think I need to soften my prior statement about 
deferring this to gcc-15.  This code was submitted in time for the 
gcc-14 deadline, so it should be evaluated just like we do anything else 
that makes the deadline.  There are various criteria we use to evaluate 
if something should get integrated and we should just work through this 
series like we always do and not treat it specially in any way.
 
 
jeff

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-22 Thread 钟居哲

I am totally OK with approving theadvector for GCC-14 before stage 3 closes,
as long as it doesn't touch the current RVV code too much and binutils
supports theadvector.

I have provided a draft approach:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637349.html
which turns out not to need any changes to vector.md.
I strongly suggest following this draft. I can actively review theadvector
during stage 3, and hopefully help you land theadvector in GCC-14.

Thanks.



juzhe.zh...@rivai.ai
 
From: Christoph Müllner
Date: 2023-11-22 18:07
To: juzhe.zh...@rivai.ai
CC: gcc-patches; kito.cheng; Kito.cheng; cooper.joshua; Robin Dapp; 
jeffreyalaw; Philipp Tomsich; Cooper Qu; Jin Ma; Nelson Chu
Subject: Re: RISC-V: Support XTheadVector extensions
Hi Juzhe,
 
Sorry for the late reply, but I was not on CC, so I missed this email.
 
On Fri, Nov 17, 2023 at 2:41 PM juzhe.zh...@rivai.ai
 wrote:
>
> Ok. I just read the theadvector extension.
>
> https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadvector.adoc
>
> Theadvector is not custom extension. Just a uarch to disable some of the 
> RVV1.0 extension
> Theadvector can be considered as subextension of 'V' extension with disabling 
> some of the
> instructions and adding some new thead vector target load/store (This is 
> another story).
>
> So, for disabling the instruction that theadvector doesn't support.
> You don't need to touch such many codes.
>
> Here is a much simpler approach to do (I think it's definitely working):
> 1. Don't change any codes in vector.md and keep GCC generates ASM with "th." 
> prefix.
> 2. Add !TARGET_THEADVECTOR into vector-iterator.md to disable the mode you 
> don't want.
> For example , theadvector doesn't support fractional vector.
>
> Then it's pretty simple:
>
> RVVMF2SI "TARGET_VECTOR && !TARGET_THEADVECTOR".
>
> 3. Remove all the tests you add in this patch.
> 4. You can add theadvector specific load/store for example, th.vlb 
> instructions they are allowed.
> 5. Modify binutils, and make th.vmulh.vv as the pseudo instruction of vmulh.vv
> 6. So with compile option "-S", you will still see ASM as  "vmulh.vv". but 
> with objdump, you will see th.vmulh.vv.
 
Yes, all these points sound reasonable, to minimize the patchset size.
I believe in point 1 you meant "without th. prefix".
 
I've added Jin Ma (who is the main author of the Binutils patchset) so
he is also aware
of the proposal to use pseudo instructions to avoid duplication in Binutils.
 
Thank you very much!
Christoph
 
 
>
> After this change, you can send V2, then I can continue to review on GCC-15.
>
> Thanks.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: juzhe.zh...@rivai.ai
> Date: 2023-11-17 19:39
> To: gcc-patches
> CC: kito.cheng; kito.cheng; cooper.joshua; Robin Dapp; jeffreyalaw
> Subject: RISC-V: Support XTheadVector extensions
> 90% theadvector extension reusing current RVV 1.0 instructions patterns:
> Just change ASM, For example:
>
> @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
>   (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] VMULH)
>(match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
>"TARGET_VECTOR"
> -  "vmulh.vx\t%0,%3,%z4%p1"
> +  "%^vmulh.vx\t%0,%3,%z4%p1"
>[(set_attr "type" "vimul")
> (set_attr "mode" "")])
>
> +  if (letter == '^')
> +{
> +  if (TARGET_XTHEADVECTOR)
> + fputs ("th.", file);
> +  return;
> +}
>
>
> For almost all patterns, you just simply append "th." in the ASM prefix.
> like change "vmulh.vv" -> "th.vmulh.vv"
>
> Almost all theadvector instructions are not new features,  all same as RVV1.0.
> Why do you invent the such ISA doesn't include any features that RVV1.0 
> doesn't satisfy ?
>
> I am not explicitly object this patch. But I should know the reason.
>
> Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as long 
> as all other RISC-V maintainers agree.
>
>
> 
> juzhe.zh...@rivai.ai

Re: [PATCH] RISC-V: testsuite: Do not set default arch for RVV.

2023-11-20 Thread 钟居哲

LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-21 00:26
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: testsuite: Do not set default arch for RVV.
Hi,
 
as per recent discussion and in order to fix inconsistencies
between spike and qemu this patch removes gcc_march and gcc_mabi
arguments from the default CFLAGS in the testsuite invocation for
some sub directories.
 
Juzhe reported that this helps for him.
 
Regards
Robin
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp:  Remove -march and -mabi from
default CFLAGS.
---
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 9 +
1 file changed, 1 insertion(+), 8 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 237a20e11aa..1d5041b0c8c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -30,18 +30,11 @@ if ![info exists DEFAULT_CFLAGS] then {
 set DEFAULT_CFLAGS " -ansi -pedantic-errors"
}
-set gcc_march "rv64gcv_zfh"
-set gcc_mabi  "lp64d"
-if [istarget riscv32-*-*] then {
-  set gcc_march "rv32gcv_zfh"
-  set gcc_mabi  "ilp32d"
-}
-
# Initialize `dg'.
dg-init
# Main loop.
-set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
+set CFLAGS "$DEFAULT_CFLAGS -O3"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
"" $CFLAGS
gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
-- 
2.42.0

Re: [PATCH] RISC-V: testsuite: Add rv64 requirement for bug-9 and bug-14.

2023-11-20 Thread 钟居哲

/* { dg-do run { target { { {riscv_v} && {rv64} } } } } */

It seems you should remove rv64 here, since I think it is redundant.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-21 00:26
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: testsuite: Add rv64 requirement for bug-9 and bug-14.
Hi,
 
this adds an effective target requirement to compile the tests.  Since
we disabled 64-bit indices on rv32 targets those tests should be
unsupported on rv32.
 
Regards
Robin
 
gcc/testsuite/ChangeLog:
 
* g++.target/riscv/rvv/base/bug-14.C: Add
dg-require-effective-target rv64.
* g++.target/riscv/rvv/base/bug-9.C: Ditto.
---
gcc/testsuite/g++.target/riscv/rvv/base/bug-14.C | 1 +
gcc/testsuite/g++.target/riscv/rvv/base/bug-9.C  | 1 +
2 files changed, 2 insertions(+)
 
diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/bug-14.C 
b/gcc/testsuite/g++.target/riscv/rvv/base/bug-14.C
index bf0c7bd3a36..0d35f2056c6 100644
--- a/gcc/testsuite/g++.target/riscv/rvv/base/bug-14.C
+++ b/gcc/testsuite/g++.target/riscv/rvv/base/bug-14.C
@@ -1,5 +1,6 @@
/* { dg-do run { target { { {riscv_v} && {rv64} } } } } */
/* { dg-options "-O2" } */
+/* { dg-require-effective-target rv64 } */
#include
#include
diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/bug-9.C 
b/gcc/testsuite/g++.target/riscv/rvv/base/bug-9.C
index 8d17883bb57..4241f940d63 100644
--- a/gcc/testsuite/g++.target/riscv/rvv/base/bug-9.C
+++ b/gcc/testsuite/g++.target/riscv/rvv/base/bug-9.C
@@ -1,5 +1,6 @@
/* { dg-do run { target { { {riscv_v} && {rv64} } } } } */
/* { dg-options "-O2" } */
+/* { dg-require-effective-target rv64 } */
#include
#include
-- 
2.42.0

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-18 Thread 钟居哲

I am currently starting to work on full coverage testing (running the GCC
testsuite with different compile options) and on fixing bugs, which is
definitely the highest priority.

I am not able to find the time to review this patch for GCC-14 right now.

So, conservatively, let's postpone it to GCC-15.

If we are lucky and I stabilize RVV support quickly, we still have a chance to
land it in GCC-14.
It all depends on my review.

But no worries, I will review it eventually.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-11-18 18:32
To: Philipp Tomsich
CC: Jeff Law; juzhe.zh...@rivai.ai; gcc-patches; kito.cheng; cooper.joshua; 
Robin Dapp; jkridner
Subject: Re: RISC-V: Support XTheadVector extensions
I guess it would be worth stating my thoughts publicly:
 
I *support* adding the T-Head vector (a.k.a. vector 0.7) to upstream
GCC since T-Head vector already ships on a large enough number of boards;
also, it's not really T-Head's problem, as Palmer described in another
mail.
 
My biggest concern before was that the T-Head folks weren't much involved in
community work, so accepting this would definitely increase the work for
maintainers. However, I see that the T-Head folks are now trying to
contribute to upstream, so this may no longer be a concern. I also
believe accepting this patch will encourage them to work more with upstream,
which benefits everyone.
 
Back to one of the biggest issues for the patch set: GCC 14 or GCC
15. My general thought is that it should be OK for GCC 14 if it's not too
invasive, but I don't have a strong opinion since,
as you know, I am not the main developer of the vector part. So I will
let Ju-Zhe make the final decision, because he is the one who
contributes most to RISC-V vector support in GCC.

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲

>> I suspect it's going to be even worse if you we have multiple patterns
>> with the same underlying RTL, but just different output strings.
No. We don't need to add (duplicate) any new patterns.
I know RVV GCC very well. I know how to do that.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-18 08:01
To: 钟居哲; palmer
CC: gcc-patches; kito.cheng; kito.cheng; cooper.joshua; rdapp.gcc
Subject: Re: RISC-V: Support XTheadVector extensions
 
 
On 11/17/23 16:16, 钟居哲 wrote:
> >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
> >> may also need to add '^' to the punct_valid_p hook.  But yes, this is
> >> the preferred way to go when all we need to do is prefix the instruction
> >> with "th.".
> 
> No. I don't think we need to add '^' . I don't want theadvector to touch 
> any codes
> of vector.md.
> Mixing up theadvector with RVV1.0 is a nighmare for RVV maintain.
> People like me don't want to touch any thing related to Thead.
> But anyway, I will take care of that in GCC-15.
I suspect it's going to be even worse if we have multiple patterns 
with the same underlying RTL, but just different output strings.
 
The standard way to handle that has been with an output modifier and/or 
ASSEMBLER_DIALECT.  If you look at the PA port for example, the 
assembler syntax changed dramatically between the PA1.0/PA1.1 era and 
the PA2.0 era.  But we support both variants trivially without 
duplicating all the patterns.
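
Both mechanisms can be sketched as simple template expansion. This is a Python model only, not GCC's output code: expand_modifier mimics the '%^' punctuation from the hunk quoted in this thread, expand_dialect mimics the {a|b} alternatives of GCC's assembler-dialect template syntax, and the "{|th.}vmulh.vx" template is a hypothetical example.

```python
import re


def expand_modifier(template, xtheadvector):
    """Expand '%^' to "th." when XTheadVector is enabled, else to nothing."""
    return template.replace("%^", "th." if xtheadvector else "")


def expand_dialect(template, dialect):
    """Pick alternative number `dialect` from every {a|b|...} group."""
    def pick(match):
        return match.group(1).split("|")[dialect]
    return re.sub(r"\{([^{}]*)\}", pick, template)


print(expand_modifier("%^vmulh.vx", True))
print(expand_modifier("%^vmulh.vx", False))
print(expand_dialect("{|th.}vmulh.vx", 0))
print(expand_dialect("{|th.}vmulh.vx", 1))
```

Either way a single pattern yields both spellings, so no pattern has to be duplicated; the difference is only in where the choice is written down.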
 
But we've got time to sort this out.  I don't think the code in question 
was targeted towards gcc-14.
 
 
jeff

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲

>> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
>> may also need to add '^' to the punct_valid_p hook.  But yes, this is
>> the preferred way to go when all we need to do is prefix the instruction
>> with "th.".

No. I don't think we need to add '^'. I don't want theadvector to touch any
code in vector.md.
Mixing theadvector with RVV 1.0 is a nightmare for RVV maintenance.
People like me don't want to touch anything related to T-Head.
But anyway, I will take care of that in GCC-15.

juzhe.zh...@rivai.ai

From: Palmer Dabbelt
Date: 2023-11-18 01:11
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; cooper.joshua; rdapp.gcc; jeffreyalaw
Subject: Re: RISC-V: Support XTheadVector extensions
On Fri, 17 Nov 2023 03:39:48 PST (-0800), juzhe.zh...@rivai.ai wrote:
> 90% theadvector extension reusing current RVV 1.0 instructions patterns:
> Just change ASM, For example:
> 
> @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
>   (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] VMULH)
>(match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
>"TARGET_VECTOR"
> -  "vmulh.vx\t%0,%3,%z4%p1"
> +  "%^vmulh.vx\t%0,%3,%z4%p1"
>[(set_attr "type" "vimul")
> (set_attr "mode" "")])
> +  if (letter == '^')
> +{
> +  if (TARGET_XTHEADVECTOR)
> + fputs ("th.", file);
> +  return;
> +}
> 
> For almost all patterns, you just simply append "th." in the ASM prefix.
> like change "vmulh.vv" -> "th.vmulh.vv"
> 
> Almost all theadvector instructions are not new features,  all same as RVV1.0.
> Why do you invent the such ISA doesn't include any features that RVV1.0 
> doesn't satisfy ?
> 
> I am not explicitly object this patch. But I should know the reason.

There's some more in the later threads, but with the top posting it kind 
of got lost so I'm just replying here.

This really isn't T-Head's fault: we announced V-0.7 as a stable draft 
that was being implemented, and then T-Head went and implemented it.  
Most of that history has been scrubbed by RVI, but you can still find 
some stuff like this old talk on YouTube 
.

In general we've just figured out a way to make things work when HW 
vendors end up in a grey area in RISC-V land.  That obviously results in 
a bunch of pain for the SW people, but this stuff is only useful if we 
can run on real HW and that always involves some amount of pain.  
Hopefully we can get to a point where we make fewer problems for 
ourselves, but we've got a long history to dig out from and there's 
going to be a lot more of this in the future.

So I don't like this XTHeadV stuff, but I think we're best to take it: 
these guys tried to do the right thing and got thrown under the bus by 
RVI, we should help them.  This is almost certainly going to be a lot 
more pain than we're used to, just given the size of the extensions in 
question, but I still think it's the right way to go.

The other option is to essentially just tell them to fork the ISA, which 
isn't good for anyone.

> Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as long 
> as all other RISC-V maintainers agree.

I agree this is gcc-15 material: there's a lot of subtle differences in 
behavior between 0.7 and 1.0, even when the mnemonics are the same.  
We're already pretty buried in testing for 14, so trying to pick up 
another target is going to be a huge headache (particularly one that's a 
bit special).

> 
> 
> 
> 
> juzhe.zh...@rivai.ai

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread 钟居哲

>> Yeah, just noticed that myself.  Anyway will do some more tests,
>> maybe my initial VLS analysis was somehow flawed.

You can check binop_vx_constraint-167.c ~ binop_vx_constraint-174.c

This patch is pre-approved if you make the change I suggested.
I am going to sleep, so I won't be able to review it again.
Feel free to commit it after making that change.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-17 23:13
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
> It must be correct. We already have test (intrinsic codes) for it.
 
Yeah, just noticed that myself.  Anyway will do some more tests,
maybe my initial VLS analysis was somehow flawed.
> Condition should be put into iterators (Add a new iterator for
> indexed load store).
 
Ah, that's what you meant.  Sure.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread 钟居哲

>> I'm wondering whether the VLA modes in the iterator are correct.
>> Looks dubious to me but unsure, will need to create some tests
>> before continuing.

It must be correct. We already have tests (intrinsic code) for it.

>> What's the problem with those?  We probably won't reach there
>> because the indexed is considered invalid before but we could,
>> theoretically, still combine them?
The condition should be put into the iterators (add a new iterator for indexed
loads and stores), like you did for the RATIO iterators by adding
!TARGET_64BIT.
It's easier to maintain since the iterator code only appears once,
whereas this patch adds
GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)
in many places.
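
The maintenance argument can be sketched as "filter once" versus "check at every use". The Python below is only an illustration, not machine-description code: index_mode_ok mirrors the quoted GET_MODE_BITSIZE comparison, and build_index_iterator is a hypothetical stand-in for a conditional iterator entry.

```python
def index_mode_ok(index_elem_bits, pmode_bits):
    """Mirror of: GET_MODE_BITSIZE (GET_MODE_INNER (mode))
    <= GET_MODE_BITSIZE (Pmode)."""
    return index_elem_bits <= pmode_bits


def build_index_iterator(candidate_elem_bits, pmode_bits):
    """Apply the condition once, where the list of index modes is defined."""
    return [b for b in candidate_elem_bits if index_mode_ok(b, pmode_bits)]


candidates = [8, 16, 32, 64]
print("rv32:", build_index_iterator(candidates, 32))
print("rv64:", build_index_iterator(candidates, 64))
```

With the filter applied where the iterator is defined, every pattern that uses the iterator never sees a 64-bit index on rv32, so no per-pattern condition is needed.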


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-17 22:55
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
> OK. Make sense。
 
I'm wondering whether the VLA modes in the iterator are correct.
Looks dubious to me but unsure, will need to create some tests
before continuing.
 
> LGTM as long as you remove  all
> GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)
 
What's the problem with those?  We probably won't reach there
because the indexed is considered invalid before but we could,
theoretically, still combine them?
 
Regards
Robin

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread 钟居哲

OK. Makes sense.
LGTM as long as you remove all
GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-11-16 04:30
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
On 11/15/23 15:29, 钟居哲 wrote:
> Could you show me the example ?
> 
> It's used by handling SEW = 64 on RV32. I don't know why this patch touch 
> this code.

Use gather_load_run-1.c with the 64-bit index patterns disabled
on rv32.  We insert (mem:DI (reg:SI)) into a vector so use the
SEW = 64 demote handler.  There we set vl = vl * 2 (which is correct)
but the mode (i.e. vector) just changes from DI to SI while
keeping the number of elements the same.  Then we e.g. go
from V8DI to V8SI and slide down 16 elements, losing the lower
half.
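
The arithmetic can be written down as a small model. This is a Python sketch, not the demote handler: modes are modelled as (element count, element bits) pairs, demote_as_described follows the behaviour described above, and demote_doubled is my reading of the intended fix (double the element count as well), labelled here as an assumption.

```python
def demote_as_described(nunits, elem_bits, vl):
    """Halve the element width and double vl, but keep the element count."""
    return nunits, elem_bits // 2, vl * 2


def demote_doubled(nunits, elem_bits, vl):
    """Assumed fix: also double the element count so the total vector size
    in bits stays the same."""
    return nunits * 2, elem_bits // 2, vl * 2


# V8DI with vl = 8.
print(demote_as_described(8, 64, 8))   # element count 8, but vl is 16
print(demote_doubled(8, 64, 8))        # element count 16 matches vl 16
```

In the first case vl exceeds the number of elements the mode can hold, which corresponds to the V8DI to V8SI example; in the second the mode becomes a 16-element SI vector and the bit size is preserved.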

Regards
Robin

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread 钟居哲

Could you show me an example?

It's used for handling SEW = 64 on RV32. I don't know why this patch touches
this code.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-15 22:27
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
> Looks wrong. Recover back.
 
When we demote we use two elements where there was one before.
Therefore the vector needs to be able to hold twice as many
elements.  We adjust vl correctly but the mode is not here.
 
Regards
Robin

Re: Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-15 Thread 钟居哲

Hi, Kito. Could you take a look at this issue?

The -march parser is inconsistent between non-linux and linux.

You can simply verify it with these cases:

FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c -std=c99 
-O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-run.c -O3 -ftree-vectorize (test for 
excess errors)

These cases fail on the non-linux toolchain, but pass on the linux toolchain.
This inconsistency is caused by your previous multilib patch, as Lehua said:
https://github.com/gcc-mirror/gcc/commit/17d683d 




juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-11-13 19:27
To: kito.cheng; Robin Dapp
CC: juzhe.zh...@rivai.ai; gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.
Hi Kito,
 
On 2023/11/13 19:13, Lehua Ding wrote:
> Hi Robin,
> 
> On 2023/11/13 18:33, Robin Dapp wrote:
>>> On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:
>>>> If there is a difference between them. I think we should fix 
>>>> riscv-common.cc.
>>>> Since I think "zvfh_zfh" should not be different with "zfh_zvfh"
>>>
>>> It's possible. Let me debug it and see if there's a problem.
>>
>> I don't think it is different.  Just checked and it still works for me.
>>
>> Could you please tell me how you invoke the testsuite?
> 
> This looks to be the difference between the linux and elf versions of 
> gcc. The elf version of gcc we are build will have this problem, the 
> linux version of gcc will not. I think the linux version of gcc has a 
> wrong behavior.:
> 
> ➜  riscv-gnu-toolchain-push git:(tintin-dev) 
> ./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc
>  -march=rv32gcv_zfh 
> build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
> riscv32-unknown-elf-gcc: fatal error: Cannot find suitable multilib set 
> for 
> '-march=rv32imafdcv_zicsr_zifencei_zfh_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=ilp32d'
> compilation terminated.
> ➜  riscv-gnu-toolchain-push git:(tintin-dev) 
> ./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-linux-spike-debug/install/bin/riscv32-unknown-linux-gnu-gcc
>  -march=rv32gcv_zfh 
> build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
> 
 
It looks like this commit[1] from you makes the difference between elf 
and linux. Can you help to see if it makes sense to behave differently 
now? The elf version's --with-arch is rv32gcv_zvfh_zfh, and the user will get 
an error with -march=rv32gcv_zfh. The linux version will not.
 
[1] https://github.com/gcc-mirror/gcc/commit/17d683d
 
-- 
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: Re: [PATCH] RISC-V: vsetvl: Refine REG_EQUAL equality.

2023-11-13 Thread 钟居哲

I just checked your test. It won't be brittle in the future,
since it should always be 4 vsetvls with e16m1 for SLP AVL/VL toggling,
and there is no scheduling.  The middle-end MIN_EXPR SLP always produces 4
AVL/VL togglings as long as we don't schedule the instructions.

So it won't be a problem.

So, LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-13 21:28
To: juzhe.zh...@rivai.ai; gcc-patches; palmer; kito.cheng; jeffreyalaw
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: vsetvl: Refine REG_EQUAL equality.
On 11/13/23 11:36, juzhe.zh...@rivai.ai wrote:
> --- /dev/null
> +++ 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_zbb_run-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-march=rv64gcv_zbb --param 
> riscv-autovec-preference=fixed-vlmax" } */
> 
> Could you add compile test (with assembly check) instead of run test ?
 
I found it a bit difficult to create a proper test, hopefully the attached
is not too brittle.
 
My impression is that it would be easier to have such tests if there were
vsetvl statistics of how many vsetvls we merged, fused and for what
reasons etc.
Maybe that's a good learning exercise to get familiar with the pass for
somebody?
 
Regards
Robin
 
Subject: [PATCH v3] RISC-V: vsetvl: Refine REG_EQUAL equality.
 
This patch enhances the equality check for REG_EQUAL notes in the vsetvl
pass by using the == operator instead of rtx_equal_p.  With that, in
situations like the following, a5 and a7 are not considered equal
anymore.
 
(insn 62 60 63 4 (set (reg:DI 17 a7 [orig:154 loop_len_54 ] [154])
(umin:DI (reg:DI 15 a5 [orig:174 _100 ] [174])
(reg:DI 30 t5 [219]))) 442 {umindi3}
 (expr_list:REG_EQUAL (umin:DI (reg:DI 15 a5 [orig:174 _100 ] [174])
(const_int 8 [0x8]))
(nil)))
(insn 63 62 65 4 (set (reg:DI 15 a5 [orig:175 _103 ] [175])
(minus:DI (reg:DI 15 a5 [orig:174 _100 ] [174])
(reg:DI 17 a7 [orig:154 loop_len_54 ] [154]))) 11 {subdi3}
 (nil))
(insn 65 63 66 4 (set (reg:DI 16 a6 [orig:153 loop_len_53 ] [153])
(umin:DI (reg:DI 15 a5 [orig:175 _103 ] [175])
(reg:DI 30 t5 [219]))) 442 {umindi3}
 (expr_list:REG_EQUAL (umin:DI (reg:DI 15 a5 [orig:175 _103 ] [175])
(const_int 8 [0x8]))
(nil)))
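
The difference between the two checks can be illustrated with a small
standalone model.  This is only a sketch: struct note, note_equal_p,
note_value and replay are hypothetical stand-ins written for illustration,
not GCC's rtx, rtx_equal_p or anything in riscv-vsetvl.cc.  It replays insns
62, 63 and 65 from the dump above and shows that the two notes are
structurally equal while describing different values, because a5 is
redefined in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32

/* Toy stand-in for a REG_EQUAL note of the form (umin (reg N) (const C)).  */
struct note
{
  int regno;
  uint64_t cst;
};

/* Structural comparison, playing the role of rtx_equal_p in this model.  */
static bool
note_equal_p (const struct note *a, const struct note *b)
{
  return a->regno == b->regno && a->cst == b->cst;
}

/* Value the note describes, given the current register contents.  */
static uint64_t
note_value (const struct note *n, const uint64_t *regs)
{
  assert (n->regno >= 0 && n->regno < NUM_REGS);
  uint64_t r = regs[n->regno];
  return r < n->cst ? r : n->cst;
}

/* Replay insns 62, 63 and 65 from the dump, starting with a5 = START.
   Stores the values computed for a7 and a6.  */
static void
replay (uint64_t start, uint64_t *a7, uint64_t *a6)
{
  uint64_t regs[NUM_REGS] = { 0 };
  struct note note62 = { 15, 8 };
  struct note note65 = { 15, 8 };

  regs[15] = start;
  *a7 = note_value (&note62, regs);   /* insn 62: a7 = umin (a5, 8) */
  regs[15] -= *a7;                    /* insn 63: a5 = a5 - a7 */
  *a6 = note_value (&note65, regs);   /* insn 65: a6 = umin (a5, 8) */

  /* The notes compare equal structurally although they are distinct
     objects describing different values of a5.  */
  assert (note_equal_p (&note62, &note65) && &note62 != &note65);
}
```

In this model, calling replay with a starting a5 of 11 yields different
values for a7 and a6 even though note_equal_p reports the notes as equal,
which is the situation the pointer comparison is meant to reject.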
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (source_equal_p): Use pointer
equality for REG_EQUAL.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_zbb.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  | 12 +-
.../rvv/autovec/partial/multiple_rgroup_zbb.c | 23 +++
2 files changed, 34 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_zbb.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 3fa25a6404d..63f966f2f3a 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -561,7 +561,17 @@ source_equal_p (insn_info *insn1, insn_info *insn2)
   rtx note1 = find_reg_equal_equiv_note (rinsn1);
   rtx note2 = find_reg_equal_equiv_note (rinsn2);
   if (note1 && note2 && rtx_equal_p (note1, note2))
-    return true;
+    {
+      /* REG_EQUIVs are invariant at function scope.  */
+      if (REG_NOTE_KIND (note2) == REG_EQUIV)
+        return true;
+
+      /* REG_EQUAL are not so in order to consider them similar the RTX they
+         point to must be identical.  We could also allow "rtx_equal"
+         REG_EQUALs but would need to check if no insn between them modifies
+         any of their sources.  */
+      return note1 == note2;
+    }
   return false;
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_zbb.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_zbb.c
new file mode 100644
index 000..15178a2c848
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_zbb.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zbb -mabi=lp64d -O2 --param riscv-autovec-preference=fixed-vlmax -fno-schedule-insns -fno-schedule-insns2" } */
+
+#include 
+
+void __attribute__ ((noipa))
+test (uint16_t *__restrict f, uint32_t *__restrict d, uint64_t *__restrict e,
+      uint16_t x, uint16_t x2, uint16_t x3, uint16_t x4, uint32_t y,
+      uint32_t y2, uint64_t z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      f[i * 4 + 0] = x;
+      f[i * 4 + 1] = x2;
+      f[i * 4 + 2] = x3;
+      f[i * 4 + 3] = x4;
+      d[i * 2 + 0] = y;
+      d[i * 2 + 1] = y2;
+      e[i] = z;
+    }
+}
+
+/* { dg-final { scan-assembler-times "vsetvli\tzero,\s*\[a-z0-9\]+,\s*e16,\s*m1,\s*ta,\s*ma" 4 } } */
-- 
2.41.0

Re: Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-11 Thread 钟居哲

Hi, Richard.

>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>> (But AArch64 might need to lower lane operations more than it does now if
>> we want gimple to handle it.)

We were trying to address this issue at the GIMPLE level at the beginning.
Tracking subreg lanes of tuple types may be enough for aarch64, since aarch64
only has tuple types.
However, for RVV, that's not enough to address all the issues.
Consider the following situation:
https://godbolt.org/z/fhTvEjvr8

You can see that, compared with LLVM, GCC emits many redundant move
instructions ("vmv1r.v"), because GCC is not able to track subreg liveness,
whereas LLVM can.

The reasons why tracking sub-lanes in GIMPLE cannot address these redundant
move issues for RVV:

1. RVV has tuple types like "vint8m1x2_t", which is exactly the same as
aarch64 "svint8x2_t".
It is used by segment load/store, which is similar to the "ld2r"
instruction in ARM SVE (vec_load_lanes/vec_store_lanes).
Supporting sub-lane tracking in GIMPLE can fix this situation for both RVV
and ARM SVE.

2. However, we do not only have "vint8m1x2_t"; we also have "vint8m2_t"
(LMUL = 2), which also occupies 2 registers but is not a tuple type.
Instead, it is a simple vector type. Such types are used by all simple
operations.
For example, "vadd" with vint8m1_t does a PLUS operation on a single vector
register, whereas the same instruction "vadd" with vint8m2_t does a PLUS
operation on 2 vector registers.  We can't define such types as tuple types
for the following reasons:
1). We also have tuple types for LMUL > 1; for example, "vint8m2x2_t" is a
 tuple type. If we defined "vint8m2_t" as a tuple type, what about
 "vint8m2x2_t"? A tuple of tuples, or an array of arrays? It would make
 the type very strange.
2). The RVV intrinsic doc defines vint8m2x2_t as a tuple type, but
 vint8m2_t as not a tuple type. We are not able to change the documents.
3). Clang has supported RVV intrinsics for 3 years. vint8m2_t has not been
 a tuple type for those 3 years and is widely used, so changing the type
 definition would destroy the ecosystem.  So for compatibility, we are
 not able to define LMUL > 1 types as tuple types.

For these reasons, we should be able to access the high part and the low
part of vint8m2_t; we provide vget to generate subreg accesses of the
vector mode.
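
To make the distinction concrete, here is a minimal sketch of why tracking
liveness per whole register is too coarse for an LMUL = 2 value.  Everything
in it (struct pseudo, the bitmask encoding, reserved_whole, reserved_subreg,
part_free_p) is a hypothetical model written for illustration, not code from
IRA/LRA or from the patch series.

```c
#include <assert.h>
#include <stdbool.h>

/* A pseudo register occupying NREGS consecutive vector registers
   (NREGS == LMUL for a plain vector type such as vint8m2_t).
   Bit I of LIVE_PARTS is set when part I is still read later.  */
struct pseudo
{
  unsigned nregs;
  unsigned live_parts;
};

/* Registers that must stay reserved when liveness is tracked per pseudo:
   if any part is live, the whole group is treated as live.  */
static unsigned
reserved_whole (const struct pseudo *p)
{
  assert (p->nregs >= 1 && p->nregs <= 8);
  unsigned all = (1u << p->nregs) - 1;
  return (p->live_parts & all) ? all : 0;
}

/* Registers that must stay reserved when liveness is tracked per part.  */
static unsigned
reserved_subreg (const struct pseudo *p)
{
  assert (p->nregs >= 1 && p->nregs <= 8);
  return p->live_parts & ((1u << p->nregs) - 1);
}

/* True if part PART of a register group can be reused for another value,
   given the mask RESERVED of parts that must be kept.  */
static bool
part_free_p (unsigned reserved, unsigned part)
{
  return (reserved & (1u << part)) == 0;
}
```

For a vint8m2_t-like pseudo of which only the high part is still read (for
example after a vget of the high part), { 2, 0x2 } in this encoding, the low
register is reported free only by reserved_subreg.  That is the kind of
information subreg liveness tracking is meant to expose to the allocator.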

So, at the discussion stage, we decided to address subpart access of vector
modes in a more generic way, which is to support subreg liveness tracking at
the RTL level. That way it can address not only the issues that happen on
ARM SVE, but also the issues for LMUL > 1.

3. After we decided to support subreg liveness tracking in RTL, we studied
LLVM.
Actually, LLVM has a standalone pass, called the register coalescer, right
before their linear scan RA (greedy).
So, the first draft of our solution was to support register coalescing
before RA by simulating the LLVM solution. It is open source:
riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next ・ 
riscv-collab/riscv-gcc (github.com)
However, we didn't think that solution was elegant, and we consulted Vlad.
Vlad suggested we should enhance IRA/LRA with subreg liveness tracking,
which turns out to be a more reasonable and elegant approach.

So, after several experiments and investigations, Lehua dedicated himself to
producing this series of patches.
We think Lehua's approach should be a generic and optimal solution to fix
these generic subreg problems.

Thanks.

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-11-11 23:33
To: Jeff Law
CC: Lehua Ding; gcc-patches; vmakarov; juzhe.zhong
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
Jeff Law  writes:
> On 11/8/23 02:40, Richard Sandiford wrote:
>> Lehua Ding  writes:
>>> Hi,
>>>
>>> These patches try to support the subreg coalesce feature in
>>> register allocation passes (ira and lra).
>> 
>> Thanks a lot for the series.  This is definitely something we've
>> needed for a while.
>> 
>> I probably won't be able to look at it in detail for a couple of weeks
>> (and the real review should come from Vlad anyway), but one initial
>> comment:
> Absolutely agreed on the above.
>
> The other thing to ponder.  Jivan and I have been banging on Joern's 
> sub-object tracking bits for a totally different problem in the RISC-V 
> space.  But there may be some overlap.
>
> Essentially Joern's code tracks liveness for a few chunks in registers. 
> bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
> propagating liveness from the destination through to the sources.  So 
> for example if we have
>
> (set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))
>
> If we had previously determined that only bits 0..15 were live in DEST, 
> then we'll propagate that into the source registers.
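
A minimal sketch of that propagation step for PLUS, using the four chunks
listed above.  The enum names and plus_source_live are hypothetical and
written only for illustration; this is not Joern's code.  The one assumption
built in is that carries in an addition only travel upwards, so a live
result chunk depends on the same chunk and every lower chunk of each source.

```c
#include <assert.h>

/* One bit per tracked chunk, lowest chunk first.  */
enum
{
  CHUNK_0_7   = 1 << 0,
  CHUNK_8_15  = 1 << 1,
  CHUNK_16_31 = 1 << 2,
  CHUNK_32_63 = 1 << 3,
  CHUNK_ALL   = 0xf
};

/* Chunks of each source of a PLUS that can affect the live chunks of the
   destination, given the destination's live mask DEST_LIVE.  */
static unsigned
plus_source_live (unsigned dest_live)
{
  assert ((dest_live & ~(unsigned) CHUNK_ALL) == 0);
  unsigned live = 0;
  /* Find the highest live chunk of the destination.  */
  for (unsigned bit = CHUNK_32_63; bit != 0; bit >>= 1)
    if (dest_live & bit)
      {
        /* That chunk and every chunk below it are live in the sources.  */
        live = bit | (bit - 1);
        break;
      }
  return live;
}
```

With only bits 0..15 live in the destination, the function returns the same
two chunks for the sources, matching the example in the quoted text.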
>
> The goal is to ultimately transform something like
>
> (set (dest:mode) (any_extend:mode (reg:narrower_mode)))
>
> into
>
> (set (dest:mode) (subreg:mode (reg:narrower_mode)))
>
> Where the

Re: Re: [PATCH] RISC-V: Add combine optimization by slideup for vec_init vectorization

2023-11-10 Thread 钟居哲

Thanks. Robin. Committed.

>> The test patterns are a bit unwieldy but not a blocker
>>IMHO.  Could probably be done shorter using macro magic?
I have no idea. But I think we can revisit it and refine tests when we have 
time.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-10 20:47
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add combine optimization by slideup for vec_init 
vectorization
Hi Juzhe,
 
LGTM.  The test patterns are a bit unwieldy but not a blocker
IMHO.  Could probably be done shorter using macro magic?
 
Regards
Robin
