Re: [PATCH v1] RISC-V: Support FP irintf auto vectorization

2023-10-11 Thread juzhe.zh...@rivai.ai
LGTM. Thanks.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-12 09:52
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP irintf auto vectorization
From: Pan Li 
 
This patch would like to support the FP irintf auto vectorization.
 
* int irintf (float)
 
Because the vectorizer only allows source and destination data types of
the same size, the standard name lrintmn2 only acts on SF => SI.
 
Given we have code like:
 
void
test_irintf (int *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_irintf (in[i]);
}
 
Before this patch:
.L3:
  ...
  flw  fa5,0(a1)
  fcvt.w.s a5,fa5,dyn
  sw   a5,-4(a0)
  ...
  bne  a1,a4,.L3
 
After this patch:
.L3:
  ...
  vle32.v v1,0(a1)
  vfcvt.x.f.v v1,v1
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
 
The remaining cases, such as DF => SI and HF => SI, will be covered by the
hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lrint2): Rename from.
(lrint2): Rename to.
* config/riscv/vector-iterators.md: Rename and remove TARGET_64BIT.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-irint-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   |  9 ++-
gcc/config/riscv/vector-iterators.md  | 74 +--
.../riscv/rvv/autovec/unop/math-irint-0.c | 14 
.../riscv/rvv/autovec/unop/math-irint-run-0.c | 63 
.../riscv/rvv/autovec/vls/math-irint-0.c  | 30 
5 files changed, 149 insertions(+), 41 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-irint-0.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index dc76a01d82c..c3a51e22ceb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2240,6 +2240,7 @@ (define_expand "avg3_ceil"
;; - trunc/truncf
;; - roundeven/roundevenf
;; - lrint/lrintf
+;; - irintf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2311,12 +2312,12 @@ (define_expand "roundeven2"
   }
)
-(define_expand "lrint2"
-  [(match_operand: 0 "register_operand")
-   (match_operand:V_VLS_FCONVERTL 1 "register_operand")]
+(define_expand "lrint2"
+  [(match_operand:0 "register_operand")
+   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
   "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
-riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
   }
)
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index bb0c46ea30a..96ddd34c958 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -3281,8 +3281,8 @@ (define_mode_attr vnnconvert [
   (V512DI "v512hf")
])
-;; L indicates convert to long
-(define_mode_attr VLCONVERT [
+;; Convert to int, long and long long
+(define_mode_attr V_I_L_LL_CONVERT [
   (RVVM8SF "RVVM8SI") (RVVM4SF "RVVM4SI") (RVVM2SF "RVVM2SI")
   (RVVM1SF "RVVM1SI") (RVVMF2SF "RVVMF2SI")
@@ -3298,7 +3298,7 @@ (define_mode_attr VLCONVERT [
   (V512DF "V512DI")
])
-(define_mode_attr vlconvert [
+(define_mode_attr v_i_l_ll_convert [
   (RVVM8SF "rvvm8si") (RVVM4SF "rvvm4si") (RVVM2SF "rvvm2si")
   (RVVM1SF "rvvm1si") (RVVMF2SF "rvvmf2si")
@@ -3314,40 +3314,40 @@ (define_mode_attr vlconvert [
   (V512DF "v512di")
])
-(define_mode_iterator V_VLS_FCONVERTL [
-  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVM4SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVM2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT")
-  (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT && TARGET_MIN_VLEN > 
32")
-
-  (RVVM8DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-  (RVVM4DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-  (RVVM2DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-  (RVVM1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT")
-
-  (V1SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32 &

Re: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions

2023-10-11 Thread juzhe.zh...@rivai.ai
Please revert it. It blocks development on all targets.



juzhe.zh...@rivai.ai
 
From: Andrew Pinski
Date: 2023-10-12 09:03
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; Kito.cheng; kito.cheng; Robin Dapp
Subject: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions
On Wed, Oct 11, 2023 at 6:01 PM juzhe.zh...@rivai.ai
 wrote:
>
> ../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector 
> Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX 
> Built-in Functions' differ
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in 
> Functions' is next for `CORE-V Built-in Functions' in menu but not in 
> sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector 
> Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in 
> sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in 
> Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector 
> Intrinsics' differ
> ../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' 
> lacks menu item for `CORE-V Built-in Functions' despite being its Up target
> ../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in 
> Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V 
> Vector Intrinsics' differ
> In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
> ../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to 
> hold all values of ‘enum rtx_code’
>  #define RTX_CODE_BITSIZE 8
>   ^
> ../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro 
> ‘RTX_CODE_BITSIZE’
>ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
>  ^~~~
>
> make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
> make[2]: *** Waiting for unfinished jobs
> rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod 
> cpp.pod gcov.pod lto-dump.pod
> make[2]: Leaving directory 
> '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
> make[1]: *** [Makefile:4648: all-gcc] Error 2
> make[1]: Leaving directory 
> '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
> make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2
 
This is also recorded as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 . It breaks more
than just RISCV; it depends on the version of texinfo that is
installed too.
 
Thanks,
Andrew
 
>
> 
> juzhe.zh...@rivai.ai
 


RISC-V: Support CORE-V XCVMAC and XCVALU extensions

2023-10-11 Thread juzhe.zh...@rivai.ai
../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector 
Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX Built-in 
Functions' differ
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in 
Functions' is next for `CORE-V Built-in Functions' in menu but not in sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector 
Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in 
sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in 
Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector 
Intrinsics' differ
../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' 
lacks menu item for `CORE-V Built-in Functions' despite being its Up target
../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in 
Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V Vector 
Intrinsics' differ
In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to hold 
all values of ‘enum rtx_code’
 #define RTX_CODE_BITSIZE 8
  ^
../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro ‘RTX_CODE_BITSIZE’
   ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
 ^~~~

make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
make[2]: *** Waiting for unfinished jobs
rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod cpp.pod 
gcov.pod lto-dump.pod
make[2]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
make[1]: *** [Makefile:4648: all-gcc] Error 2
make[1]: Leaving directory 
'/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2



juzhe.zh...@rivai.ai


Re: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread juzhe.zh...@rivai.ai
Oh, yes.

Comment addressed in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632623.html 

It now uses: if (inner_offsize < BITS_PER_WORD)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-11 17:50
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter
Hi Juzhe,
 
good that you noticed it now,  I should have caught that
in the review back then...
 
One thing, though:
 
> +  if (inner_offsize < GET_MODE_BITSIZE (GET_MODE (ptr)).to_constant ())
 
Shouldn't ptr always be Pmode i.e. the bitsize == XLEN?
 
Rest LGTM.
 
Regards
Robin
 


Re: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter

2023-10-11 Thread juzhe.zh...@rivai.ai
Refined the code in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632619.html 



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-11 17:03
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter
I suddenly realized I made a mistake that, luckily, had not been exposed.
 
https://godbolt.org/z/c3jzrh7or
 
GCC is using 32 bit index offset:
 
vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v  v1,(a1),v1
 
This is wrong since v1 may overflow 32 bits after vsll.vi.
 
After this patch:
 
vsext.vf2 v8,v4
vsll.vi v8,v8,2
vluxei64.v v8,(a1),v8
 
Same as Clang.
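
The overflow described above can be reproduced in scalar C. This is only a minimal sketch, not code from the patch: the helper names byte_offset_32 and byte_offset_64 are made up for illustration, the shift amount 2 assumes 4-byte elements, and unsigned (zero-extending) arithmetic is used so that the wraparound is well defined, whereas the vector sequence above sign-extends with vsext.vf2.

```c
#include <assert.h>
#include <stdint.h>

/* Scale INDEX by 2^LOG2_SIZE entirely in 32 bits, as a shift on e32
   elements does.  High bits are lost: the result wraps modulo 2^32.  */
static uint64_t
byte_offset_32 (uint32_t index, unsigned log2_size)
{
  assert (log2_size < 32);
  uint32_t scaled = index << log2_size;
  return scaled;
}

/* Widen INDEX to 64 bits first and scale afterwards, so no bits are lost.
   (Hypothetical helper; it mirrors the extend-then-shift order only.)  */
static uint64_t
byte_offset_64 (uint32_t index, unsigned log2_size)
{
  assert (log2_size < 32);
  return (uint64_t) index << log2_size;
}
```

With index 0x40000000 and a shift of 2, the 32-bit variant yields 0 while the 64-bit variant yields 0x100000000, so a gather using the narrow offset would read the wrong address.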
 
Regression tests passed. OK for trunk?
 
gcc/ChangeLog:
 
* config/riscv/autovec.md: Fix offset bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix offset bug.
(gather_scatter_valid_offset_p): New function.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.
 
---
gcc/config/riscv/autovec.md   | 28 +--
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 16 +--
.../autovec/gather-scatter/offset_extend-1.c  | 14 ++
4 files changed, 42 insertions(+), 17 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 41bff3a318f..07607bff71e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -59,7 +59,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -74,7 +74,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -89,7 +89,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -104,7 +104,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -119,7 +119,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -134,7 +134,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -153,7 +153,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -172,7 +172,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands, false);
   DONE;
@@ -187,7 +187,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
{
   riscv_vector::expand_gather_scatter (operands,

Re: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization

2023-10-11 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-11 16:49
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization
From: Pan Li 
 
This patch would like to support the FP lrint/lrintf auto vectorization.
 
* long lrint (double) for rv64
* long lrintf (float) for rv32
 
Because the vectorizer only allows source and destination data types of
the same size, the standard name lrintmn2 only acts on DF => DI for
rv64, and SF => SI for rv32.
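
The same-size rule can be illustrated with a small C sketch. It is an illustration only, not part of the patch: same_size_fp_mode is a hypothetical helper, and the sketch assumes float is 4 bytes (SF) and double is 8 bytes (DF), which the static assertions check.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: float is SF-sized and double is DF-sized.  */
_Static_assert (sizeof (float) == 4, "float is expected to be 4 bytes");
_Static_assert (sizeof (double) == 8, "double is expected to be 8 bytes");

/* Return the name of the floating-point mode whose element size equals
   an integer result of INT_BYTES bytes, or NULL if neither SF nor DF
   has that size.  */
static const char *
same_size_fp_mode (size_t int_bytes)
{
  assert (int_bytes > 0);
  if (int_bytes == sizeof (float))
    return "SF";
  if (int_bytes == sizeof (double))
    return "DF";
  return NULL;
}
```

Calling same_size_fp_mode (sizeof (long)) gives "DF" where long is 8 bytes (as on rv64) and "SF" where long is 4 bytes (as on rv32), matching the two cases listed above.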
 
Given we have code like:
 
void
test_lrint (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrint (in[i]);
}
 
Before this patch:
.L3:
  ...
  fld  fa5,0(a1)
  fcvt.l.d a5,fa5,dyn
  sd   a5,-8(a0)
  ...
  bne  a1,a4,.L3
 
After this patch:
.L3:
  ...
  vsetvli a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli zero,a2,e64,m1,ta,ma
  vse32.v v1,0(a0)
  ...
  bne a2,zero,.L3
 
The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI, will
be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (lrint2): New pattern
for lrint/lrintf.
* config/riscv/riscv-protos.h (expand_vec_lrint): New func decl
for expanding lrint.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl
for vfcvt.x.f.v.
(expand_vec_lrint): New function impl for expanding lrint.
* config/riscv/vector-iterators.md: New mode attr and iterator.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for
CVT like test case.
* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 +++
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 20 ++
gcc/config/riscv/vector-iterators.md  | 69 +++
.../riscv/rvv/autovec/unop/math-lrint-0.c | 14 
.../riscv/rvv/autovec/unop/math-lrint-1.c | 14 
.../riscv/rvv/autovec/unop/math-lrint-run-0.c | 63 +
.../riscv/rvv/autovec/unop/math-lrint-run-1.c | 63 +
.../riscv/rvv/autovec/unop/test-math.h| 24 +++
.../gcc.target/riscv/rvv/autovec/vls/def.h|  9 +++
.../riscv/rvv/autovec/vls/math-lrint-0.c  | 30 
.../riscv/rvv/autovec/vls/math-lrint-1.c  | 30 
12 files changed, 348 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 53e9d34eea1..dc76a01d82c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2239,6 +2239,7 @@ (define_expand "avg3_ceil"
;; - round/roundf
;; - trunc/truncf
;; - roundeven/roundevenf
+;; - lrint/lrintf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2309,3 +2310,13 @@ (define_expand "roundeven2"
 DONE;
   }
)
+
+(define_expand "lrint2"
+  [(match_operand: 0 "register_operand")
+   (match_operand:V_VLS_FCONVERTL 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 43426a5326b..f6bd15b47b0 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -474,6 +474,7 @@ void expand_vec_rint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_round (rtx, rtx, machine_mode, machine_mode);
void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c72e411f125..64f99d85d91 100644
---

Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-11 Thread juzhe.zh...@rivai.ai
Hi, Maciej.

I have enabled all vectorization tests on RVV; the patch has been committed:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632598.html 

However, I guarded every test with:
+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v])
As you said, you think we don't need to add check_effective_target_riscv_v 
every time.

So feel free to adjust it (remove check_effective_target_riscv_v) and send a 
patch. 
But I hope you can adjust each set of tests carefully to keep everything 
consistent.

Thanks.


juzhe.zh...@rivai.ai
 
From: Maciej W. Rozycki
Date: 2023-10-11 05:35
To: juzhe.zhong
CC: gcc-patches; jeffreyalaw; Robin Dapp; Kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
On Tue, 10 Oct 2023, juzhe.zh...@rivai.ai wrote:
 
> It's weird. Could you give me the FAILs report?
 
I keep forgetting that I have a piece of code in my board description 
files that makes the testsuite leave output files in place, which helps 
much when debugging failures (although it's not a perfect solution for 
test cases like those verified at different optimisation levels where the 
output filename is reused and consequently subsequent outputs overwrite 
earlier ones; something to improve perhaps).  Unfortunately the presence 
of output files confuses some test cases and makes them fail; arguably a 
test case bug.  None of the offending test cases are directly related to 
RISC-V development, so I just ignore the presence of these failures and 
only focus on regressions and progressions between testsuite runs.
 
Here are fresh results with the testsuite output tree made tidy:
 
=== gcc Summary ===
 
# of expected passes 194602
# of unexpected failures 145
# of unexpected successes 11
# of expected failures 1631
# of unresolved testcases 120
# of unsupported tests 3828
 
It probably makes no sense to clutter the mailing list with my FAIL and 
UNRESOLVED results; I can send them off-list if you find them useful.
 
  Maciej
 


Re: Re: [PATCH] RISC-V: Enable full coverage vect tests

2023-10-11 Thread juzhe.zh...@rivai.ai
Thanks. Committed.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-11 14:54
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enable full coverage vect tests
Hi Juzhe,
 
seems OK to me.  We don't support most of the patterns directly
but as we can and want to vectorize them it makes sense to enable
the tests.
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-10 Thread juzhe.zh...@rivai.ai
It's weird. Could you give me the FAILs report?



juzhe.zh...@rivai.ai
 
From: Maciej W. Rozycki
Date: 2023-10-10 18:18
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
On Mon, 9 Oct 2023, Maciej W. Rozycki wrote:
 
> > Btw, could you rebase to the trunk and run regression again?
> 
>  Full regression-testing takes roughly 40 hours here and I do not normally
> update the tree midway through my work so as not to add variables and end 
> up chasing a moving target, especially with such an unstable state that we 
> have ended up with recently with the RISC-V port.  Since I'm done with 
> this part I can refresh and schedule another run if you are curious as to 
> how it looks like from my side.  For the C subset alone it'll take less.
 
After 10 hours I have now got:
 
=== gcc Summary ===
 
# of expected passes 194576
# of unexpected failures 600
# of unexpected successes 11
# of expected failures 1631
# of unresolved testcases 120
# of unsupported tests 3828
 
as at commit cc5033721553 ("Fixes for profile count/probability 
maintenance"), which is slightly better, but still far from your 92 FAILs.  
NB I ran this testing with `--param=riscv-autovec-preference=scalable'; I 
guess I could have mentioned it.
 
  Maciej
 


Re: [PATCH v2 0/4] RISC-V target attribute

2023-10-10 Thread juzhe.zh...@rivai.ai
LGTM on my side.
IMHO, we need to support the attribute (rvv_vector_bits), which depends on this 
patch; am I right?

If yes, will you support this feature in the GCC 14 release?



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-10-10 12:13
To: gcc-patches; kito.cheng; palmer; jeffreyalaw; rdapp; juzhe.zhong
Subject: [PATCH v2 0/4] RISC-V target attribute
This patch set implements the target attribute for the RISC-V target. Similar 
to other targets such as x86 or ARM, it lets users apply some settings locally, 
per function, without changing the global settings.
 
We support arch, tune and cpu first, and will support other target attributes 
later. This version DOES NOT include multi-version function support yet; that 
is future work, probably for GCC 15.
 
The full proposal is in the RISC-V C-API document[1], which has been discussed 
with the RISC-V LLVM community, so we have consistent syntax and semantics. 
 
[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
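
As a rough illustration of the per-function setting described above, here is a hedged C sketch. It is not taken from the patch set: the function add_arrays and the macro RVV_ARCH are hypothetical, the "arch=+v" string follows the extension-list form described in the C-API proposal, and the version guard (GCC 14 or later on a RISC-V target) is an assumption. On any other compiler or target the macro expands to nothing, so the code still builds.

```c
#include <assert.h>

/* Apply the attribute only where it is assumed to be understood:
   a RISC-V target built with GCC 14 or later (an assumption).  */
#if defined (__riscv) && defined (__GNUC__) && __GNUC__ >= 14
# define RVV_ARCH __attribute__ ((target ("arch=+v")))
#else
# define RVV_ARCH
#endif

/* Hypothetical function compiled with the vector extension enabled
   locally, while the rest of the file keeps the global -march.  */
RVV_ARCH static void
add_arrays (int *out, const int *a, const int *b, unsigned count)
{
  assert (out && a && b);
  for (unsigned i = 0; i < count; i++)
    out[i] = a[i] + b[i];
}
```

The tune and cpu settings would be spelled the same way inside the attribute string; the exact accepted values are defined by the proposal in [1], not by this sketch.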
 
v2 changelog:
- Resolve awk multi-dimensional issue.
- Tweak code format
- Tweak testcases
 
 
 


Re: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

2023-10-10 Thread juzhe.zh...@rivai.ai
Great! I am going to wait for Richi's approval.



juzhe.zh...@rivai.ai
 
From: Andrew Stubbs
Date: 2023-10-10 17:40
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV
On 10/10/2023 02:39, Juzhe-Zhong wrote:
> Here is the reference comparing dump IR between ARM SVE and RVV.
> 
> https://godbolt.org/z/zqess8Gss
> 
> We can see RVV has one more dump IR:
> optimized: basic block part vectorized using 128 byte vectors
> since RVV has 1024 bit vectors.
> 
> The codegen is reasonably good.
> 
> However, I saw that GCN also has 1024-bit vectors.
> Might this patch cause this case to FAIL on the GCN port?
> 
> Hi, GCN folks, could you check this patch on the GCN port for me?
 
This patch *fixes* an existing test fail on GCN. :)
 
It's probably one of the many I've never had time to analyze (and 
optimizing more than expected makes it low priority).
 
LGTM
 
Andrew
 


Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

2023-10-09 Thread juzhe.zh...@rivai.ai
Oh. I realize this patch reintroduces a FAIL that I recently fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632247.html 

It fails because RVV doesn't have vec_pack_trunc_optab (the loop vectorizer 
fails the first time but succeeds the second time), 
so RVV dumps FOLD_EXTRACT_LAST 4 times instead of 2 (ARM SVE dumps it 2 times 
because it has vec_pack_trunc_optab).

I think the root cause of RVV failing multiple "vect" tests is that we 
don't enable the vec_pack/vec_unpack/... patterns. 
We still succeed at vectorizing them and we want to enable those tests 
(mostly we just use a different approach to vectorize, which causes the dump 
FAILs, because of some changes I made previously in the middle-end).

So enabling "vec_pack" for RVV will fix some FAILs but introduce some other 
FAILs.

CC'ing Richi for more reasonable suggestions.



juzhe.zh...@rivai.ai
 
From: Maciej W. Rozycki
Date: 2023-10-10 06:38
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
On Tue, 10 Oct 2023, 钟居哲 wrote:
 
> Btw, could you rebase to the trunk and run regression again?
 
Full regression-testing takes roughly 40 hours here and I do not normally
update the tree midway through my work so as not to add variables and end 
up chasing a moving target, especially with such an unstable state that we 
have ended up with recently with the RISC-V port.  Since I'm done with 
this part I can refresh and schedule another run if you are curious as to 
how it looks like from my side.  For the C subset alone it'll take less.
 
  Maciej
 


Re: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen

2023-10-09 Thread juzhe.zh...@rivai.ai
LGTM now.

Thanks.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-09 21:09
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen
From: Pan Li 
 
Update in v2
 
* Remove emit helper functions.
* Take expand_binop instead.
 
Original log:
 
This patch would like to refine the code gen for the bswap16.
 
We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It will generate about 9 instructions in the
loop as below, no matter whether it is bswap16, bswap32 or bswap64.
 
  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub a4,a4,a5
7 vse16.v v1,0(a3)
8 add a0,a0,a2
9 add a3,a3,a2
  bne a4,zero,.L2
 
But for bswap16 we may have an even simpler code gen, which
has only 7 instructions in the loop as below.
 
  .L5:
1 vle8.v  v2,0(a5)
2 addi a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi a4,a4,32
  bne a5,a6,.L5
 
Unfortunately, this approach makes the instruction count in the loop grow to
13 and 24 for bswap32 and bswap64, respectively. Thus, we refine the code
gen for bswap16 only, and leave both bswap32 and bswap64
as is.
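
The shift-and-or approach for bswap16 can be written out in scalar C. This is a minimal sketch for illustration, not the code added to riscv-v.cc: the function name bswap16_shift_or is hypothetical, and each loop iteration mirrors the vsrl.vi / vsll.vi / vor.vv steps on one 16-bit element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of each 16-bit element with a logical shift right,
   a shift left and an OR.  */
static void
bswap16_shift_or (uint16_t *out, const uint16_t *in, size_t count)
{
  assert (out != NULL && in != NULL);
  for (size_t i = 0; i < count; i++)
    {
      uint16_t hi_to_lo = (uint16_t) (in[i] >> 8); /* high byte moves down */
      uint16_t lo_to_hi = (uint16_t) (in[i] << 8); /* low byte moves up */
      out[i] = (uint16_t) (hi_to_lo | lo_to_hi);
    }
}
```

For a 16-bit element two shifts and one OR are enough, which is why the count stays low; wider elements need more shift/mask/or steps per element, consistent with the growth to 13 and 24 instructions noted above.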
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
for shuffle bswap.
(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 91 +++
.../riscv/rvv/autovec/unop/bswap16-0.c| 17 
.../riscv/rvv/autovec/unop/bswap16-run-0.c| 44 +
.../riscv/rvv/autovec/vls/bswap16-0.c | 34 +++
.../gcc.target/riscv/rvv/autovec/vls/perm-4.c |  4 +-
5 files changed, 188 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 23633a2a74d..c72e411f125 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3030,6 +3030,95 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   return true;
}
+static bool
+shuffle_bswap_pattern (struct expand_vec_perm_d *d)
+{
+  HOST_WIDE_INT diff;
+  unsigned i, size, step;
+
+  if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff)
+return false;
+
+  step = diff + 1;
+  size = step * GET_MODE_UNIT_BITSIZE (d->vmode);
+
+  switch (size)
+{
+case 16:
+  break;
+case 32:
+case 64:
+  /* We will have VEC_PERM_EXPR after rtl expand when invoking
+ __builtin_bswap. It will generate about 9 instructions in
+ loop as below, no matter it is bswap16, bswap32 or bswap64.
+.L2:
+ 1 vle16.v v4,0(a0)
+ 2 vmv.v.x v2,a7
+ 3 vand.vv v2,v6,v2
+ 4 sllia2,a5,1
+ 5 vrgatherei16.vv v1,v4,v2
+ 6 sub a4,a4,a5
+ 7 vse16.v v1,0(a3)
+ 8 add a0,a0,a2
+ 9 add a3,a3,a2
+bne a4,zero,.L2
+
+ But for bswap16 we may have an even simpler code gen, which
+ has only 7 instructions in loop as below.
+.L5:
+ 1 vle8.v  v2,0(a5)
+ 2 addi a5,a5,32
+ 3 vsrl.vi v4,v2,8
+ 4 vsll.vi v2,v2,8
+ 5 vor.vv  v4,v4,v2
+ 6 vse8.v  v4,0(a4)
+ 7 addi a4,a4,32
+bne a5,a6,.L5
+
+ Unfortunately, the instructions in loop will grow to 13 and 24
+ for bswap32 and bswap64. Thus, we will leverage vrgather (9 insn)
+ for both the bswap64 and bswap32, but take shift and or (7 insn)
+ for bswap16.
+   */
+default:
+  return false;
+}
+
+  for (i = 0; i < step; i++)
+if (!d->perm.series_p (i, step, diff - i, step))
+  return false;
+
+  if (d->testing_p)
+return true;
+
+  machine_mode vhi_mode;
+  poly_uint64 vhi_nunits = exact_div (GET_MODE_NUNITS (d->vmode), 2);
+
+  if (!get_vector_mode (HImode, vhi_nunits).exists (&vhi_mode))
+return false;
+
+  /* Step-1: Move op0 to src with VHI mode.  */
+  rtx src = gen_reg_rtx (vhi_mode);
+  emit_move_insn (src, gen_lowpart (vhi_mode, d->op0));
+
+  /* Step-2: Shift right 8 bits to dest.  */
+  rtx dest = expand_binop (vhi_mode, lshr_optab, src, gen_int_mode (8, Pmode),
+NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-3: Shift left 8 bits to src.  */
+  src = expand_binop (vhi_mode, ashl_optab, src, gen_int_mode (8, Pmode),
+   NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-4: Logic Or dest and src to dest.  */
+  dest = expand_binop (vhi_mode, ior_optab, dest, src,
+NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-5: Move src to target with VQI mode.  */
+  emit_move_insn (d->target, gen_lowpart 

Re: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai
>> OK.  
Thanks.  Committed.

>> Note load/store-lanes is specifically pre-empting SLP if all
>> loads/stores of an SLP instance can support that.  Not sure if this
>> heuristic is good for load/store lanes with high stride?

Yeah, I understand your concern.
Hmm, I am not sure either.
But the RVV ISA defines lane loads/stores from 2 to 8, and LLVM already supports them.
I think we can fully support them, and then let the RISC-V cost model decide whether 
it is profitable or not.

Also, I found RVV can vectorize a TSVC case with stride = 5 
lane_load/lane_store:

tsvc-s353.c:

-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! riscv_v 
} } } } */

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632213.html

So, overall, I think it is beneficial to support high-stride lane loads/stores, 
which can help us vectorize more cases.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-09 20:41
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for 
RVV
On Mon, 9 Oct 2023, Juzhe-Zhong wrote:
 
> Reference: https://godbolt.org/z/G9jzf5Grh
> 
> RVV is able to vectorize this case using SLP. However, with 
> -fno-vect-cost-model, RVV vectorize it by vec_load_lanes with stride 6.
 
OK.  Note load/store-lanes is specifically pre-empting SLP if all
loads/stores of a SLP instance can support that.  Not sure if this
heuristic is good for load/store lanes with high stride?
 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c 
> b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> index 7c7acd5bab6..96751faae7f 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> @@ -18,4 +18,4 @@ foo (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { ! vect_strided6 } } } } */
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 


Re: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai
Thanks, Robin. Could you send V3 to Richi, and commit it if Richi is OK with 
it?



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-09 18:26
To: Andreas Schwab; juzhe.zhong
CC: rdapp.gcc; gcc-patches; rguenther; jeffreyalaw
Subject: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
On 10/9/23 09:32, Andreas Schwab wrote:
> On Oct 09 2023, juzhe.zh...@rivai.ai wrote:
> 
>> Turns out COND(_LEN)?_ADD can't work.
> 
> It should work though.  Tcl regexps are a superset of POSIX EREs.
> 
 
The problem is that COND(_LEN)?_ADD matches two times against
COND_LEN_ADD and a scan-tree-dump-times 1 will fail.  So for those
checks in vect-cond-arith-6.c we either need to switch to
scan-tree-dump or change the pattern to "\.(?:COND|COND_LEN)_ADD".
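 
The double counting described above can be sketched in C++. This is only an
emulation under an assumption: that the count is taken as the length of the
flattened list returned by Tcl's "regexp -inline -all", where every match
contributes the whole match plus one entry per capture group. The helper names
flattened_match_count and check_cond_len_add are hypothetical, and std::regex
(ECMAScript syntax) stands in for Tcl's regexp engine.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: emulates the length of a flattened
// "match + capture groups" list.  Each match adds 1 (whole match)
// plus the number of capture groups in the pattern.
std::size_t
flattened_match_count (const std::string &pattern, const std::string &text)
{
  const std::regex re (pattern);
  std::size_t count = 0;
  for (std::sregex_iterator it (text.begin (), text.end (), re), end;
       it != end; ++it)
    count += it->size ();  // size () is 1 + number of capture groups.
  return count;
}

// Self-check on a single illustrative dump line.
void
check_cond_len_add ()
{
  const std::string dump = "  _5 = .COND_LEN_ADD (mask, a, b, c, len, bias);";

  // One textual match, but the capturing group adds a second list entry.
  assert (flattened_match_count ("\\.COND_(LEN_)?ADD", dump) == 2);

  // A non-capturing group keeps the count at one entry per match.
  assert (flattened_match_count ("\\.(?:COND|COND_LEN)_ADD", dump) == 1);
}
```

With a plain scan-tree-dump the count does not matter, which is why switching
away from scan-tree-dump-times also avoids the problem.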
 
Juzhe, something like the attached works for me.
 
Regards
Robin
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe642a0..7d26dbedc5e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -52,8 +52,8 @@ main (void)
   return 0;
}
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target 
vect_double_cond_arith } } } */
/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target 
vect_double_cond_arith } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
index ec3d9db4202..f7daa13685c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
@@ -54,8 +54,8 @@ main (void)
   return 0;
}
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target { 
vect_double_cond_arith && vect_masked_store } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
index 2aeebd44f83..a80c30a50b2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
@@ -56,8 +56,8 @@ main (void)
}
/* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" { 
target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target 
vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target 
vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target 
vect_double_cond_arith } } } */
/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target 
vect_double_cond_arith } } } */
 


Re: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen

2023-10-09 Thread juzhe.zh...@rivai.ai
Remove these functions:

+static void
+emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx sll_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sll_ops);
+}
+
+static void
+emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx srl_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, srl_ops);
+}
+
+static void
+emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx or_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred (IOR, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, or_ops);
+}
+

Instead, 

For sll, you should use:
rtx tmp
  = expand_binop (Pmode, ashl_optab, op_1,
                  gen_int_mode (8, Pmode), NULL_RTX, 0,
                  OPTAB_DIRECT);

For srl, you should use:
rtx tmp
  = expand_binop (Pmode, lshr_optab, op_1,
                  gen_int_mode (8, Pmode), NULL_RTX, 0,
                  OPTAB_DIRECT);


For or, you should use:
expand_binop (Pmode, ior_optab, tmp, dest, NULL_RTX, 0,
              OPTAB_DIRECT);



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-09 16:51
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen
From: Pan Li 
 
This patch refines the code gen for bswap16.
 
We get a VEC_PERM_EXPR after RTL expand when invoking
__builtin_bswap. It generates about 9 instructions in the
loop as below, regardless of whether it is bswap16, bswap32 or bswap64.
 
  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub a4,a4,a5
7 vse16.v v1,0(a3)
8 add a0,a0,a2
9 add a3,a3,a2
  bne a4,zero,.L2
 
But for bswap16 we can have an even simpler code gen, which
has only 7 instructions in the loop as below.
 
  .L5
1 vle8.v  v2,0(a5)
2 addi a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi a4,a4,32
  bne a5,a6,.L5
 
Unfortunately, this approach makes the loop grow to 13 and 24
instructions for bswap32 and bswap64 respectively. Thus, we refine the
code gen for bswap16 only, and leave both bswap32 and bswap64
as is.
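 
For reference, a minimal scalar sketch of the shift-and-or form that the
7-instruction loop corresponds to. The names bswap16_shift_or and
check_bswap16 are hypothetical and not part of the patch; the check compares
against __builtin_bswap16, which GCC provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Swap the two bytes of every 16-bit element using two shifts and an or,
// mirroring the vsrl.vi / vsll.vi / vor.vv sequence.
void
bswap16_shift_or (uint16_t *out, const uint16_t *in, std::size_t count)
{
  for (std::size_t i = 0; i < count; i++)
    {
      uint16_t hi = static_cast<uint16_t> (in[i] >> 8);  // High byte moves down.
      uint16_t lo = static_cast<uint16_t> (in[i] << 8);  // Low byte moves up.
      out[i] = static_cast<uint16_t> (hi | lo);
    }
}

// Self-check against the builtin on a few sample values.
void
check_bswap16 ()
{
  const uint16_t in[3] = {0x1234, 0x00ff, 0xabcd};
  uint16_t out[3] = {0, 0, 0};

  bswap16_shift_or (out, in, 3);
  for (std::size_t i = 0; i < 3; i++)
    assert (out[i] == __builtin_bswap16 (in[i]));
}
```

The same trick needs more shift/mask/or steps per element for 32-bit and
64-bit swaps, which is consistent with the instruction counts quoted above.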
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (emit_vec_sll_scalar): New help func
impl for emit vsll.vi/vsll.vx
(emit_vec_srl_scalar): Likewise for vsrl.vi/vsrl.vx.
(emit_vec_or): Likewise for vor.vv.
(shuffle_bswap_pattern): New func impl for shuffle bswap.
(expand_vec_perm_const_1): Add shuffle bswap pattern.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 117 ++
.../riscv/rvv/autovec/unop/bswap16-0.c|  17 +++
.../riscv/rvv/autovec/unop/bswap16-run-0.c|  44 +++
.../riscv/rvv/autovec/vls/bswap16-0.c |  34 +
.../gcc.target/riscv/rvv/autovec/vls/perm-4.c |   4 +-
5 files changed, 214 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 23633a2a74d..3e3b5f2e797 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -878,6 +878,33 @@ emit_vlmax_decompress_insn (rtx target, rtx op0, rtx op1, 
rtx mask)
   emit_vlmax_masked_gather_mu_insn (target, op1, sel, mask);
}
+static void
+emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx sll_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sll_ops);
+}
+
+static void
+emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx srl_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, srl_ops);
+}
+
+static void
+emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx or_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred (IOR, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, or_ops);
+}
+
/* Emit merge instruction.  */
static machine_mode
@@ -3030,6 +3057,94 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   return true;
}
+static bool
+shuffle_bswap_pattern (struct expand_vec_perm_d *d)
+{
+  HOST_WIDE_INT diff;
+  unsigned i, size, step;
+
+  if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff)
+return false;
+
+  step = diff + 1;
+  size = step * GET_MODE_UNIT_BITSIZE

Re: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai
Thanks Richi.

I will try to figure out a better way to adapt the tests without adding 
riscv*-specific target variants.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-09 16:17
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV
On Sun, 8 Oct 2023, Juzhe-Zhong wrote:
 
> Even though RVV doesn't enable vec_unpack/vec_pack, it succeed on outer loop 
> vectorizations.
 
How so?  I think this maybe goes with the other similar change.
 
That is, when we already have specific target checks adding riscv-*-* 
looks sensible but when we don't we should figure if there's a capability
we can (add and) test instead.
 
> Fix these following XPASS FAILs:
> 
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/no-scevccp-outer-16.c: Fix XPASS for RVV.
> * gcc.dg/vect/no-scevccp-outer-17.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-19.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-21.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> index c7c2fa8a504..12179949e00 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> @@ -59,4 +59,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> index ba904a6c03e..86554a98169 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> index 5cd4049d08c..624b54accf4 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> @@ -49,4 +49,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> index 72e53c2bfb0..b30a5d78819 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> @@ -59,4 +59,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! { vect_pack_trunc } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_pack_trunc } } && { ! {riscv_v } } } } } } */
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 


Re: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes

2023-10-09 Thread juzhe.zh...@rivai.ai
>> But you gobble the "or .." into an existing -mstrict-align flag - are
>> you sure all implementations are
>> self-consistent with handling non-vector memory instructions and
>> vector memory instructions here?
>> At least the above wording doesn't seem to impose such requirement.

RVV ISA: 
"Support for misaligned vector memory accesses is independent of an 
implementation’s support for misaligned scalar memory accesses."
So support for misaligned vector memory accesses is independent of support for misaligned scalar memory accesses.
I think this patch (using -mno-strict-align) is not appropriate, which means I 
need an additional compile option.
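
As a small illustration of what "naturally aligned to the size of the element"
means in the quoted spec text, here is a sketch with a hypothetical helper
is_naturally_aligned. It only classifies addresses; whether a misaligned
element access succeeds or traps is up to the implementation, as the ISA text
says.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when ADDR is a multiple of ELEM_SIZE bytes,
// i.e. an element of that size stored at ADDR is naturally aligned.
bool
is_naturally_aligned (const void *addr, std::size_t elem_size)
{
  return reinterpret_cast<std::uintptr_t> (addr) % elem_size == 0;
}
```

For example, with an 8-byte aligned buffer buf, buf + 4 is naturally aligned
for 4-byte elements but not for 8-byte elements, and buf + 1 is aligned only
for 1-byte elements.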




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-09 16:01
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes
On Sun, Oct 8, 2023 at 9:22 AM Juzhe-Zhong  wrote:
>
> Previously, I removed the movmisalign pattern to fix the execution FAILs in 
> this commit:
> https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
>
> I was thinking that RVV doesn't allow misaligned at the beginning so I 
> removed that pattern.
> However, after deep investigation && reading RVV ISA again and experiment on 
> SPIKE,
> I realized I was wrong.
>
> RVV ISA reference: 
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints
>
> "If an element accessed by a vector memory instruction is not naturally 
> aligned to the size of the element,
>  either the element is transferred successfully or an address misaligned 
> exception is raised on that element."
 
But you gobble the "or .." into an existing -mstrict-align flag - are
you sure all implementations are
self-consistent with handling non-vector memory instructions and
vector memory instructions here?
At least the above wording doesn't seem to impose such requirement.
 
> It's obvious that RVV ISA does allow misaligned vector load/store.
>
> And experiment and confirm on SPIKE:
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike
>  --isa=rv64gcv --varch=vlen:128,elen:64 
> ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64
>   a.out
> bbl loader
> z   ra 00010158 sp 003ffb40 gp 
> 00012c48
> tp  t0 000110da t1 000f t2 
> 
> s0 00013460 s1  a0 00012ef5 a1 
> 00012018
> a2 00012a71 a3 000d a4 0004 a5 
> 00012a71
> a6 00012a71 a7 00012018 s2  s3 
> 
> s4  s5  s6  s7 
> 
> s8  s9  sA  sB 
> 
> t3  t4  t5  t6 
> 
> pc 00010258 va/inst 020660a7 sr 80026620
> Store/AMO access fault!
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike
>  --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 
> ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64
>   a.out
> bbl loader
>
> We can see SPIKE can pass previous *FAILED* execution tests with specifying 
> --misaligned to SPIKE.
>
> So, to honor RVV ISA SPEC, we should add movmisalign pattern back base on the 
> investigations I have done since
> it can improve multiple vectorization tests and fix dumple FAILs.
>
> This patch fixes these following dump FAILs:
>
> FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimi

Re: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai
Yes. We do have, and enable, the char -> long conversion (vsext.vf8/vzext.vf8).

Thanks for the comment, I will adapt the test as you suggested.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-10-09 15:31
To: Jeff Law
CC: Juzhe-Zhong; gcc-patches; richard.sandiford
Subject: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV
On Sun, 8 Oct 2023, Jeff Law wrote:
 
> 
> 
> On 10/8/23 05:35, Juzhe-Zhong wrote:
> > RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this
> > case well.
> > So, adjust dump check for RVV.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >  * gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.
> I'd hoped to avoid a bunch of risc-v special casing in the generic part of the
> testsuite.  Basically the more we have target specific conditionals rather
> than conditionals using properties, the more likely we are to keep revisiting
> this stuff over time and possibly for other architectures as well.
> 
> What is it about risc-v's vector support that allows it to optimize this case?
> Is it the same property that allows us to handle the outer loop vectorization
> tests that you changed in another patch?
 
I suspect for VLA vectorization we can use direct conversion from
char to long long here?  I also notice the testcase uses 'char',
not specifying its sign.  So either of [sz]extVxyzDIVxyzQI is possibly
provided by RISCV?  (or possibly via some intermediate types in a
multi-step conversion)
 
For non-VLA and with the single vector size restriction we'd need
unpacking.
 
So it might be better
 
{ target { vect_unpack || { vect_vla && vect_sext_char_longlong } } }
 
where I think neither vect_vla nor vect_sext_char_longlong exists.
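 
To make the conversion concrete, here is a hypothetical pair of loops (not the
contents of vect-multitypes-16.c) that widen 8-bit elements directly to
64-bit ones. The signedness of the source type decides between sign and zero
extension, which is why the unspecified sign of plain 'char' matters. The
function names are assumptions for illustration only.

```cpp
#include <cstddef>

// Sign-extending widening: each signed 8-bit value becomes a long long.
void
widen_sext (long long *out, const signed char *in, std::size_t count)
{
  for (std::size_t i = 0; i < count; i++)
    out[i] = in[i];
}

// Zero-extending widening: each unsigned 8-bit value becomes a long long.
void
widen_zext (long long *out, const unsigned char *in, std::size_t count)
{
  for (std::size_t i = 0; i < count; i++)
    out[i] = in[i];
}
```

A target with a direct 8-bit to 64-bit extension can handle each loop with one
conversion per vector, while a target restricted to a single vector size would
need unpacking steps.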
 
Richard - didn't you run into similar things with SVE?
 
Richard.
 
 
> Neither an ACK nor NAK right now.
> 
> Jeff
> 
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 


Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-09 Thread juzhe.zh...@rivai.ai
Hi, Richi and Robin.

It turns out that COND(_LEN)?_ADD does not work.

Is this patch OK? Or do you have another solution to change the dump check for 
RVV?

Thanks.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-08 09:33
To: gcc-patches
CC: rguenther; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
\\.COND_SUB" 1
 
For RVV, the expected dump IR is the COND_LEN_* pattern.
 
Also, we are still failing at this check:
 
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = 
\\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
optimized " = \\.COND_LEN_SUB"
 
This is due to a known bug in GIMPLE_FOLD that Robin is working on.
 
@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug 
fix patch.
 
OK for trunk?
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.
 
---
gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c | 4 ++--
gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 8 
gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 8 
gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 8 
4 files changed, 14 insertions(+), 14 deletions(-)
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..3832a660023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,5 @@ neg_xi (double *x)
   return res_3;
}
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { 
vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { 
vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \

Re: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-07 Thread juzhe.zh...@rivai.ai
Hi, Jeff.

I addressed your comments in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632239.html 

I think it now looks reasonably good for long-term maintenance.

OK for trunk?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-10-07 23:09
To: Juzhe-Zhong; gcc-patches
CC: rguenther; rdapp.gcc
Subject: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV
 
 
On 10/7/23 05:45, Juzhe-Zhong wrote:
> This patch fixes the following dumple FAILs:
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
> vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = 
> \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = 
> \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = 
> \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects  
> scan-tree-dump-times optimized " = \\.COND_SUB" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = 
> \\.COND_SUB" 1
> 
> For RVV, the expected dumple IR is COND_LEN_* pattern.
> 
> Also, we are still failing at this check:
> 
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = 
> \\.COND_LEN_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump 
> optimized " = \\.COND_LEN_SUB"
> 
> Since we have a known bug in GIMPLE_FOLD that Robin is working on it.
> 
> @Robin: Plz make sure vect-cond-arith-2.c passes with this patch and your bug 
> fix patch.
> 
> Ok for trunk ?
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
> * gcc.dg/vect/vect-cond-arith-4.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-5.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-6.c: Ditto.
Would it make more sense to adjust the regexp so that it matched the 
standard form as well as the LEN form?  So for example we could have a 
regexp that matched COND_ADD and COND_LEN_ADD.
 
Just wondering if that'll be better from a long term maintenance standpoint.
 
Jeff
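
As a rough illustration of the suggestion above, the sketch below checks one
pattern that accepts both the standard and the LEN form.  It uses the POSIX
regex API from C rather than the DejaGnu/Tcl matcher the testsuite uses, and
the function name matches_cond_add and the sample dump lines are made up for
the example.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE contains " = .COND_ADD" or " = .COND_LEN_ADD",
   0 otherwise.  Hypothetical helper, not part of the testsuite.  */
int
matches_cond_add (const char *line)
{
  regex_t re;
  int found;

  /* POSIX extended regex: "(LEN_)?" makes the LEN_ infix optional.  */
  if (regcomp (&re, " = \\.COND_(LEN_)?ADD", REG_EXTENDED | REG_NOSUB) != 0)
    return 0;

  found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

The same optional group could presumably be written directly inside the
scan-tree-dump pattern, so that one check covers both kinds of target.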
 


Re: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-07 Thread juzhe.zh...@rivai.ai
Also I have reverted your commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=066a43ce72ab6559ba14af9628df19daa0b85cdf

Plz test the patch and verify it doesn't cause any FAILs if the toolchain 
doesn't have "zvfh_zfh".




juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-10-07 17:49
To: pan2.li; gcc-patches
CC: pan2.li; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
These testcases cause multiple FAILs:

I think you should use:
/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */



juzhe.zh...@rivai.ai
 

Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-07 Thread juzhe.zh...@rivai.ai
These testcases cause multiple FAILs:

I think you should use:
/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */



juzhe.zh...@rivai.ai
 

Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

2023-10-06 Thread juzhe.zh...@rivai.ai
OK



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-10-07 14:25
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
From: Pan Li 
 
For _Float16 types, add run tests for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc
 
For float and double, add run tests for:
* roundeven
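
For readers less familiar with how the listed functions differ, here is a
small scalar sketch.  It is not the code from the tests or from test-math.h;
the *_ref helpers are hypothetical reference models written with integer
truncation, valid only for finite inputs of modest magnitude, and they do not
preserve the sign of zero.  nearbyint and rint are left out because they
follow the current rounding mode (round-to-nearest-even by default, which
then gives the same values as roundeven).

```c
/* Hypothetical reference models; only for finite inputs whose
   magnitude fits easily in a long long.  Sign of zero is not kept.  */

double
trunc_ref (double x)
{
  return (double) (long long) x;   /* A C cast rounds toward zero.  */
}

double
floor_ref (double x)
{
  double t = trunc_ref (x);
  return t > x ? t - 1.0 : t;
}

double
ceil_ref (double x)
{
  double t = trunc_ref (x);
  return t < x ? t + 1.0 : t;
}

/* round: halfway cases go away from zero.  */
double
round_ref (double x)
{
  double f = floor_ref (x);
  double frac = x - f;
  if (frac > 0.5)
    return f + 1.0;
  if (frac < 0.5)
    return f;
  return x > 0 ? f + 1.0 : f;
}

/* roundeven: halfway cases go to the even neighbour.  */
double
roundeven_ref (double x)
{
  double f = floor_ref (x);
  double frac = x - f;
  if (frac > 0.5)
    return f + 1.0;
  if (frac < 0.5)
    return f;
  return ((long long) f % 2 == 0) ? f : f + 1.0;
}
```

The ceil values agree with the TEST_INIT pairs in math-ceil-run-0.c, for
example 1.2 => 2.0 and -1023.5 => -1023.0.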
 
The zfa extension is required for these run test cases; for rv64 the
simulation target_board may look like the one below.
 
target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-ceil-run-0.c  | 39 +++
.../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +++
.../rvv/autovec/unop/math-nearbyint-run-0.c   | 48 +++
.../riscv/rvv/autovec/unop/math-rint-run-0.c  | 48 +++
.../riscv/rvv/autovec/unop/math-round-run-0.c | 39 +++
.../rvv/autovec/unop/math-roundeven-run-0.c   | 39 +++
.../rvv/autovec/unop/math-roundeven-run-1.c   | 39 +++
.../rvv/autovec/unop/math-roundeven-run-2.c   | 39 +++
.../riscv/rvv/autovec/unop/math-trunc-run-0.c | 39 +++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  4 +-
10 files changed, 371 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
new file mode 100644
index 000..70cba3602bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1024.0, 1024.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
new file mode 100644
index 000..c542278c1f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
+
+#include

Re: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case

2023-10-06 Thread juzhe.zh...@rivai.ai
OK.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-10-07 11:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case
From: xuli 
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler 
times.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
---
.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c   | 10 +-
.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c   | 10 +-
2 files changed, 10 insertions(+), 10 deletions(-)
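
The checks adjusted in this patch count how often a pattern occurs in the
generated assembly.  The sketch below shows that kind of count with the POSIX
regex API; it is only an approximation of what scan-assembler-times does, the
name count_matches is hypothetical, and [[:space:]] stands in for the \s used
in the test files.

```c
#include <regex.h>
#include <stddef.h>

/* Count non-overlapping matches of the POSIX extended regex PATTERN in
   TEXT.  Returns -1 if the pattern does not compile.  Simple sketch:
   anchors such as ^ are not handled specially.  */
int
count_matches (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int count = 0;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  while (regexec (&re, text, 1, &m, 0) == 0)
    {
      count++;
      if (m.rm_eo == 0)
        {
          /* Empty match at the current position: step one character,
             or stop at the end of the text.  */
          if (*text == '\0')
            break;
          text++;
        }
      else
        text += m.rm_eo;   /* Continue after this match.  */
    }

  regfree (&re);
  return count;
}
```

With such a helper, a count for the e8/mf8 form and a total count for plain
"vsetvli" can differ, which is what the new expectations of 10 and 19 say.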
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
index c566f8a4751..2ec9487a6c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
@@ -88,8 +88,8 @@ void f (void * restrict in, void * restrict out, int n, int 
cond)
   }
}
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 10 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 10 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} { target { no-opts 
"-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" 
no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 19 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
index d0e75258188..bcafce36895 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
@@ -80,8 +80,8 @@ void f (void * restrict in, void * restrict out, int n, int 
cond)
   }
}
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times

Re: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-06 Thread juzhe.zh...@rivai.ai
Thanks for reporting it.

I think we may need to change it into:
+ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target {! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_strided5 && vect_load_lanes } } } */

Could you verify whether it works for you?

Thanks.


juzhe.zh...@rivai.ai
 
From: Andrew Stubbs
Date: 2023-10-06 22:29
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com; richard.sandif...@arm.com
Subject: Re: [PATCH] test: Isolate slp-1.c check of target supports 
vect_strided5
On 15/09/2023 10:16, Juzhe-Zhong wrote:
> This test fails on RISC-V:
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 4
> 
> Because this loop:
>    /* SLP with unrolling by 8.  */
>    for (i = 0; i < N; i++)
>      {
>        out[i*5] = 8;
>        out[i*5 + 1] = 7;
>        out[i*5 + 2] = 81;
>        out[i*5 + 3] = 28;
>        out[i*5 + 4] = 18;
>      }
> 
> is vectorized using vect_load_lanes with array size = 5
> instead of SLP.
> 
> Once we adjust the cost of lanes load/store, it will use SLP.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/slp-1.c: Add vect_strided5.
> 
> ---
>   gcc/testsuite/gcc.dg/vect/slp-1.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c 
> b/gcc/testsuite/gcc.dg/vect/slp-1.c
> index 82e4f6469fb..d4a13f12df6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
> @@ -122,5 +122,5 @@ int main (void)
>   }
>   
>   /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" 
> } } */
> -
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" 
> { target {! vect_strided5 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" 
> { target vect_strided5 } } } */
 
This patch causes a test regression on amdgcn because vect_strided5 is 
true (because check_effective_target_vect_fully_masked is true), but the 
testcase still gives the message 4 times. Perhaps because amdgcn uses 
masking and not vect_load_lanes?
 
Andrew
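
For reference, the loop under discussion can be reduced to the standalone
sketch below.  The function name fill_groups, the element type and the value
of N are assumptions made for the example; only the five stores per iteration
come from slp-1.c as quoted above.

```c
#define N 8

/* Each iteration writes one group of five adjacent elements, so the
   store group has size 5.  Whether the vectorizer handles this with
   SLP or with a lanes store depends on the target.  */
void
fill_groups (unsigned short *out)
{
  for (int i = 0; i < N; i++)
    {
      out[i*5] = 8;
      out[i*5 + 1] = 7;
      out[i*5 + 2] = 81;
      out[i*5 + 3] = 28;
      out[i*5 + 4] = 18;
    }
}
```

If a target reports vect_strided5 but does not pick the lanes form for this
group, the SLP message would still appear 4 times, which may be what amdgcn
is seeing.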
 


Re: [PATCH v1] RISC-V: Support {U}INT64 to FP16 auto-vectorization

2023-09-27 Thread juzhe.zh...@rivai.ai
Plz add "!flag_trapping_math"



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-28 13:59
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support {U}INT64 to FP16 auto-vectorization
From: Pan Li 
 
This patch supports the auto-vectorization of INT64 to FP16
conversions. The conversion takes the two steps below.
 
* INT64 to FP32.
* FP32 to FP16.
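
A scalar sketch of the first step may help to see where rounding can occur.
FP32 has a 24-bit significand, so every integer of magnitude up to 2^24
converts exactly, and the largest finite FP16 value is 65504.  For inputs
within the finite FP16 range the INT64 => FP32 step is therefore exact.  The
helper names below are hypothetical and model only step 1, not the RVV
instructions.

```c
#include <stdint.h>

/* Step 1 as a scalar model: INT64 => FP32.  The volatile store makes
   sure the value is really rounded to float precision.  */
float
int64_to_fp32 (int64_t x)
{
  volatile float f = (float) x;
  return f;
}

/* Return 1 if converting X to FP32 loses no information, 0 otherwise.  */
int
step1_is_exact (int64_t x)
{
  float f = int64_to_fp32 (x);

  /* 2^63 does not fit in int64_t, so casting it back would be
     undefined; such inputs were rounded up and are not exact.  */
  if (f >= 9223372036854775808.0f)
    return 0;
  return (int64_t) f == x;
}
```

Larger inputs such as 2^24 + 1 are rounded in step 1, but they are already
far above the FP16 range.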
 
Given sample code as below:
void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    b[i] = (_Float16) (a[i]);
}
 
Before this patch:
test.c:6:26: missed: couldn't vectorize loop
test.c:6:26: missed: not vectorized: unsupported data-type
ld      a0,0(s0)
call    __floatdihf
fsh     fa0,0(s1)
addi    s0,s0,8
addi    s1,s1,2
bne     s2,s0,.L3
ld      ra,24(sp)
ld      s0,16(sp)
ld      s1,8(sp)
ld      s2,0(sp)
addi    sp,sp,32
 
After this patch:
vsetvli a5,a2,e8,mf8,ta,ma
vle64.v v1,0(a0)
vsetvli a4,zero,e32,mf2,ta,ma
vfncvt.f.x.w v1,v1
vsetvli zero,zero,e16,mf4,ta,ma
vfncvt.f.f.w v1,v1
vsetvli zero,a2,e16,mf4,ta,ma
vse16.v v1,0(a1)
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
PR target/111506
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2):
* config/riscv/vector-iterators.md:
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Adjust checker.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cvt-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 24 ++
gcc/config/riscv/vector-iterators.md  | 38 +++
.../autovec/conversions/vfncvt-itof-rv32gcv.c |  5 +-
.../autovec/conversions/vfncvt-itof-rv64gcv.c |  5 +-
.../gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 21 +
.../gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 22 +
.../gcc.target/riscv/rvv/autovec/vls/cvt-0.c  | 47 +++
7 files changed, 158 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd0cbdd2889..6dd3b96a423 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -974,6 +974,30 @@ (define_insn_and_split "2"
}
[(set_attr "type" "vfncvtitof")])
+;; This operation can be performed in the loop vectorizer but unfortunately
+;; not applicable for now. We can remove this pattern after loop vectorizer
+;; is able to take care of INT64 to FP16 conversion.
+(define_insn_and_split "2"
+  [(set (match_operand:  0 "register_operand")
+ (any_float:
+   (match_operand:VWWCONVERTI 1 "register_operand")))]
+  "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx single = gen_reg_rtx (mode); /* Get vector SF mode.  */
+
+/* Step-1, INT64 => FP32.  */
+emit_insn (gen_2 (single, operands[1]));
+/* Step-2, FP32 => FP16.  */
+emit_insn (gen_trunc2 (operands[0], single));
+
+DONE;
+  }
+  [(set_attr "type" "vfncvtitof")]
+)
+
;; =
;; == Unary arithmetic
;; =
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index b6cd872eb42..c9a7344b1bc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1247,6 +1247,24 @@ (define_mode_iterator VWCONVERTI [
   (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && 
TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 4096")
])
+(define_mode_iterator VWWCONVERTI [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM4DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM1DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+
+  (V1DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V2DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V4DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V8DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 &&am

Re: [PATCH v2] RISC-V: Bugfix for RTL check[PR111533]

2023-09-27 Thread juzhe.zh...@rivai.ai
LGTM. Thanks for fixing it.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-28 09:33
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH v2] RISC-V: Bugfix for RTL check[PR111533]
From: xuli 
 
Consider the following situation:
BB5: local_dem(RVV Insn 1, AVL(reg zero))
RVV Insn 1: vmv.s.x, AVL (const_int 1)
RVV Insn 2: vredsum.vs, AVL(reg zero)
 
vmv.s.x has a vl operand, so the following code gets the
avl (const_int) from RVV Insn 1.
rtx avl = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ())
   : dem.get_avl ();
 
If REGNO is then used on that const_int, the compiler will crash:
 
during RTL pass: vsetvl
res_debug.c: In function '__dn_count_labels':
res_debug.c:1050:1: internal compiler error: RTL check: expected code 'reg',
have 'const_int' in rhs_regno, at rtl.h:1934
1050 | }
  | ^
0x8fb169 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
../.././gcc/gcc/rtl.cc:770
0x1399818 rhs_regno(rtx_def const*)
../.././gcc/gcc/rtl.h:1934
0x1399818 anticipatable_occurrence_p
../.././gcc/gcc/config/riscv/riscv-vsetvl.cc:348
 
So in this case avl should be obtained from dem.
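
The situation can be modelled with a small tagged-value sketch.  Everything
here is a toy stand-in (toy_rtx, toy_regno, pick_avl_old and pick_avl_new are
hypothetical names, not GCC's rtl.h or riscv-vsetvl.cc), and pick_avl_new
simplifies the real fix to "always take the AVL from the demand".

```c
/* Toy stand-ins for rtx codes; hypothetical, not the real rtl.h.  */
enum toy_code { TOY_REG, TOY_CONST_INT };

struct toy_rtx
{
  enum toy_code code;
  long value;   /* Register number for TOY_REG, literal otherwise.  */
};

struct toy_insn { int has_vl_op; struct toy_rtx vl; };
struct toy_demand { struct toy_rtx avl; };

/* Checked accessor: a register number only exists for TOY_REG.
   Returns -1 where the real RTL check would abort.  */
long
toy_regno (const struct toy_rtx *x)
{
  return x->code == TOY_REG ? x->value : -1;
}

/* Old selection: prefer the insn's own vl operand when it has one.  */
const struct toy_rtx *
pick_avl_old (const struct toy_insn *insn, const struct toy_demand *dem)
{
  return insn->has_vl_op ? &insn->vl : &dem->avl;
}

/* New selection (simplified): take the AVL from the demand.  */
const struct toy_rtx *
pick_avl_new (const struct toy_insn *insn, const struct toy_demand *dem)
{
  (void) insn;
  return &dem->avl;
}
```

With an insn whose vl is const_int 1 and a demand whose AVL is a register,
only the new selection yields something a register accessor can handle.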
 
Another issue is caused by the following code:
HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;
 
during RTL pass: expand
../../.././gcc/libgfortran/generated/matmul_c4.c: In function 'matmul_c4':
../../.././gcc/libgfortran/generated/matmul_c4.c:2906:39: internal compiler 
error: RTL check:
expected code 'const_int', have 'const_poly_int' in expand_const_vector,
at config/riscv/riscv-v.cc:1149
 
The builder.elt (i) can be either const_int or const_poly_int.
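
A toy model of why a polynomial-aware type is needed here: an element may be
a plain constant or a value of the form c0 + c1 * x, where x is only known at
run time.  The struct and helper names below are hypothetical and only mimic
the idea behind poly_int64; they are not GCC's implementation.

```c
#include <stdint.h>

/* Toy value c0 + c1 * x with x unknown at compile time.  */
struct toy_poly
{
  int64_t c0;
  int64_t c1;
};

/* A plain constant is the special case c1 == 0.  */
struct toy_poly
toy_from_const (int64_t v)
{
  struct toy_poly p = { v, 0 };
  return p;
}

/* Subtracting a compile-time index only changes the constant term,
   so the same code works for both kinds of element.  */
struct toy_poly
toy_sub_index (struct toy_poly p, int64_t i)
{
  p.c0 -= i;
  return p;
}

int
toy_is_const (struct toy_poly p)
{
  return p.c1 == 0;
}
```

An accessor that insists on a plain constant would have to reject the second
kind of element, which is the RTL check failure shown above.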
 
PR target/111533
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_const_vector): Fix bug.
* config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix bug.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr111533-1.c: New test.
* gcc.target/riscv/rvv/base/pr111533-2.c: New test.
---
gcc/config/riscv/riscv-v.cc   |  5 ++--
gcc/config/riscv/riscv-vsetvl.cc  |  3 +-
.../gcc.target/riscv/rvv/base/pr111533-1.c| 15 ++
.../gcc.target/riscv/rvv/base/pr111533-2.c| 29 +++
4 files changed, 48 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 359fb2ced8b..26700cfc732 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1149,8 +1149,9 @@ expand_const_vector (rtx target, rtx src)
  for (unsigned int i = 0; i < v.npatterns (); ++i)
{
  /* Calculate the diff between the target sequence and
-  vid sequence.  */
-   HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;
+  vid sequence.  The elt (i) can be either const_int or
+  const_poly_int. */
+   poly_int64 diff = rtx_to_poly_int64 (builder.elt (i)) - i;
  v.quick_push (gen_int_mode (diff, v.inner_mode ()));
}
  /* Step 2: Generate result = VID + diff.  */
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 7af33e7ea6f..af8c31d873c 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -307,8 +307,7 @@ anticipatable_occurrence_p (const bb_info *bb, const 
vector_insn_info dem)
   if (dem.has_avl_reg ())
 {
   /* rs1 (avl) are not modified in the basic block prior to the VSETVL.  */
-  rtx avl
- = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
+  rtx avl = dem.get_avl_or_vl_reg ();
   if (dem.dirty_p ())
{
  gcc_assert (!vsetvl_insn_p (insn->rtl ()));
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c
new file mode 100644
index 000..aba26dfac89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -ffast-math -ftree-vectorize" 
} */
+
+#include 
+
+typedef _Complex float GFC_COMPLEX_4;
+
+void
+test (GFC_COMPLEX_4 *a, GFC_COMPLEX_4 *b, GFC_COMPLEX_4 c, ptrdiff_t i, 
ptrdiff_t j)
+{
+  ptrdiff_t l;
+  for (l = 0; l <= i; ++l)
+c += b[l] * a[j];
+  b[j] = c;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c
new file mode 100644
index 000..a4d2011b74b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2" } */
+
+#include 
+
+/* Return the number of DNS hierarchy levels in the name. */
+int
+test (const char *name) {
+ int i, len, count;
+
+ len = strlen(name);
+ for (i = 0, count = 0; i < len; i++) {
+ /* XXX need to check for \. or use named's nlabels(). */
+ if (nam

Re: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]

2023-09-27 Thread juzhe.zh...@rivai.ai
It is needed once the mem-to-mem pattern is removed. Consider:

program main
  integer, dimension(:,:), allocatable :: a, b
  integer, dimension(:), allocatable :: sh
  allocate (a(2,2))
  allocate (b(2,2))
  allocate (sh(3))
  a = 1
  b = cshift(a,sh)
end program main

This case will fail if we don't change the mov pattern.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-27 18:07
To: juzhe.zh...@rivai.ai
CC: kito.cheng; gcc-patches; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]
I can understand why the mem-to-mem pattern is removed, but why does the
normal mov pattern for VLS_AVL_IMM need to change too?
 
On Wed, Sep 27, 2023 at 10:39 AM juzhe.zh...@rivai.ai
 wrote:
>
> >> Why add `can_create_pseudo_p ()` here? this will split after reload,
> >> but we forbid that pattern between reload and split2?
>
> I have no idea. Some Fortran tests just need the mem-to-mem pattern to be 
> recognized before RA.
> I don't know the reason.
>
> ____
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-09-27 17:33
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]
> >  (define_insn_and_split "*mov"
> >[(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> > (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
> >"TARGET_VECTOR
> > -   && (register_operand (operands[0], mode)
> > +   && (can_create_pseudo_p ()
>
> Why add `can_create_pseudo_p ()` here? this will split after reload,
> but we forbid that pattern between reload and split2?
>
> > +   || register_operand (operands[0], mode)
> > || register_operand (operands[1], mode))"
> >"@
> > #
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> > index aedf98819bb..24bb7240db8 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> > @@ -4,54 +4,6 @@
> >
> >  #include "def.h"
> >
> > -/*
> > -** mov0:
> > -** lbu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sb\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -**  ret
> > -*/
> > -void mov0 (int8_t *in, int8_t *out)
> > -{
> > -  v1qi v = *(v1qi*)in;
> > -  *(v1qi*)out = v;
> > -}
> > -
> > -/*
> > -** mov1:
> > -** lhu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sh\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -**  ret
> > -*/
> > -void mov1 (int8_t *in, int8_t *out)
> > -{
> > -  v2qi v = *(v2qi*)in;
> > -  *(v2qi*)out = v;
> > -}
> > -
> > -/*
> > -** mov2:
> > -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -**  ret
> > -*/
> > -void mov2 (int8_t *in, int8_t *out)
> > -{
> > -  v4qi v = *(v4qi*)in;
> > -  *(v4qi*)out = v;
> > -}
> > -
> > -/*
> > -** mov3:
> > -** ld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -**  ret
> > -*/
> > -void mov3 (int8_t *in, int8_t *out)
> > -{
> > -  v8qi v = *(v8qi*)in;
> > -  *(v8qi*)out = v;
> > -}
> > -
> >  /*
> >  ** mov4:
> >  ** vsetivli\s+zero,\s*16,\s*e8,\s*mf8,\s*t[au],\s*m[au]
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> > index 5e9615412b7..cae96b3be3f 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> > @@ -4,18 +4,6 @@
> >
> >  #include "def.h"
> >
> > -/*
> > -** mov0:
> > -** fld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** fsd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -**  ret
> > -*/
> > -void mov0 (double *in, double *out)
> > -{
> > -  v1df v = *(v1df*)in;
> > -  *(v1df*)out = v;
> > -}
> > -
> >  /*
> >  ** mov1:
> >  ** vsetivli\s+zero,\s*2,\s*e64,\s*m1,\s*t[au],\s*m[au]
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> > deleted file mode 100644
> > index 10ae1972db7..000
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> > +++ /dev/null
> > @@ -1,19 +0,0 

Re: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]

2023-09-27 Thread juzhe.zh...@rivai.ai
>> Why add `can_create_pseudo_p ()` here? this will split after reload,
>> but we forbid that pattern between reload and split2?

I have no idea. Some Fortran tests just need the mem-to-mem pattern to be
recognized before RA.
I don't know the reason.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-27 17:33
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]
>  (define_insn_and_split "*mov"
>[(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
>"TARGET_VECTOR
> -   && (register_operand (operands[0], mode)
> +   && (can_create_pseudo_p ()
 
Why add `can_create_pseudo_p ()` here? this will split after reload,
but we forbid that pattern between reload and split2?
 
> +   || register_operand (operands[0], mode)
> || register_operand (operands[1], mode))"
>"@
> #
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> index aedf98819bb..24bb7240db8 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> @@ -4,54 +4,6 @@
>
>  #include "def.h"
>
> -/*
> -** mov0:
> -** lbu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sb\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -**  ret
> -*/
> -void mov0 (int8_t *in, int8_t *out)
> -{
> -  v1qi v = *(v1qi*)in;
> -  *(v1qi*)out = v;
> -}
> -
> -/*
> -** mov1:
> -** lhu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sh\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -**  ret
> -*/
> -void mov1 (int8_t *in, int8_t *out)
> -{
> -  v2qi v = *(v2qi*)in;
> -  *(v2qi*)out = v;
> -}
> -
> -/*
> -** mov2:
> -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -**  ret
> -*/
> -void mov2 (int8_t *in, int8_t *out)
> -{
> -  v4qi v = *(v4qi*)in;
> -  *(v4qi*)out = v;
> -}
> -
> -/*
> -** mov3:
> -** ld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -**  ret
> -*/
> -void mov3 (int8_t *in, int8_t *out)
> -{
> -  v8qi v = *(v8qi*)in;
> -  *(v8qi*)out = v;
> -}
> -
>  /*
>  ** mov4:
>  ** vsetivli\s+zero,\s*16,\s*e8,\s*mf8,\s*t[au],\s*m[au]
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> index 5e9615412b7..cae96b3be3f 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> @@ -4,18 +4,6 @@
>
>  #include "def.h"
>
> -/*
> -** mov0:
> -** fld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** fsd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -**  ret
> -*/
> -void mov0 (double *in, double *out)
> -{
> -  v1df v = *(v1df*)in;
> -  *(v1df*)out = v;
> -}
> -
>  /*
>  ** mov1:
>  ** vsetivli\s+zero,\s*2,\s*e64,\s*m1,\s*t[au],\s*m[au]
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> deleted file mode 100644
> index 10ae1972db7..000
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> +++ /dev/null
> @@ -1,19 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-march=rv32gcv_zvfh_zvl4096b -mabi=ilp32d -O3 
> -fno-schedule-insns -fno-schedule-insns2" } */
> -/* { dg-final { check-function-bodies "**" "" } } */
> -
> -#include "def.h"
> -
> -/*
> -** mov:
> -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** lw\s+[a-x0-9]+,4\s*\([a-x0-9]+\)
> -** sw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sw\s+[a-x0-9]+,4\s*\([a-x0-9]+\)
> -**  ret
> -*/
> -void mov (int8_t *in, int8_t *out)
> -{
> -  v8qi v = *(v8qi*)in;
> -  *(v8qi*)out = v;
> -}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c
> index f2880ae5e77..86ce22896c5 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c
> @@ -4,42 +4,6 @@
>
>  #include "def.h"
>
> -/*
> -** mov0:
> -** lhu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sh\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -**  ret
> -*/
> -void mov0 (int16_t *in, int16_t *out)
> -{
> -  v1hi v = *(v1hi*)in;
> -  *(v1hi*)out = v;
> -}
> -
> -/*
> -** mov1:
> -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
>

Re: [PATCH v1] RISC-V: Support FP roundeven auto-vectorization

2023-09-27 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-27 16:20
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP roundeven auto-vectorization
From: Pan Li 
 
This patch would like to support auto-vectorization for the
roundeven API in math.h. It depends on the -ffast-math option.
 
When we would like to call roundeven like v2 = roundeven (v1), we will
convert it into below insns (reference the implementation of llvm).
 
* vfcvt.x.f v3, v1, RNE
* vfcvt.f.x v2, v3
 
However, the floating point value may not need the cvt as above if
it has no fractional part. For example, single precision floating point below.
 
  +-----------+---------------+-----------------+
  | raw float | binary layout | after roundeven |
  +-----------+---------------+-----------------+
  | 8388607.5 | 0x4affffff    | 8388608.0       |
  | 8388608.0 | 0x4b000000    | 8388608.0       |
  | 8388609.0 | 0x4b000001    | 8388609.0       |
  +-----------+---------------+-----------------+
 
All single-precision values with magnitude >= 8388608.0 (2^23) have no
fractional bits in the mantissa. We leverage vmflt and a mask to filter them
out in the vector and only do the cvt on the masked lanes.
 
Before this patch:
math-roundeven-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw   fa0,0(s0)
  addi  s0,s0,4
  addi  s1,s1,4
  call  roundeven
  fsw   fa0,-4(s1)
  bne   s0,s2,.L3
 
After this patch:
  ...
  fsrmi   0   // Rounding to nearest, ties to even
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm        a6
  ret
 
Please note VLS mode is also involved in this patch and covered by the
test cases.  We will add more run tests with zfa support later.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (roundeven2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_roundeven): New func decl.
* config/riscv/riscv-v.cc (expand_vec_roundeven): New func impl.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 10 
gcc/config/riscv/riscv-protos.h   |  5 ++
gcc/config/riscv/riscv-v.cc   | 24 
.../riscv/rvv/autovec/unop/math-roundeven-0.c | 23 
.../riscv/rvv/autovec/unop/math-roundeven-1.c | 23 
.../riscv/rvv/autovec/unop/math-roundeven-2.c | 23 
.../riscv/rvv/autovec/unop/math-roundeven-3.c | 25 +
.../riscv/rvv/autovec/vls/math-roundeven-1.c  | 56 +++
8 files changed, 189 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 680a3374972..cd0cbdd2889 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2271,3 +2271,13 @@ (define_expand "btrunc2"
 DONE;
   }
)
+
+(define_expand "roundeven2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_roundeven (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 536e70bdcd3..368982a447b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -259,6 +259,9 @@ enum insn_flags : unsigned int
   /* Means INSN has FRM operand and the value is FRM_RMM.  */
   FRM_RMM_P = 1 << 18,
+
+  /* Means INSN has FRM operand and the value is FRM_RNE.  */
+  FRM_RNE_P = 1 << 19,
};
enum insn_type : unsigned int
@@ -303,6 +306,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
   UNARY_OP_TAMU_FRM_RMM = UNARY_OP_TAMU | FRM_RMM_P,
+  UNARY_OP_TAMU_FRM_RNE = UNARY_OP_TAMU | FRM_RNE_P,
   /* Binary operator.  */
   BINARY_OP = __NORMAL_OP | BINARY_OP_P,
@@ -469,6 +473,7 @@ void expand_vec_nearbyint (rtx, rtx, machine_mode, 
machine_mode);
void expand_vec_rint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_round (rtx, rtx, machine_mode,

Re: [PATCH v1] RISC-V: Support FP trunc auto-vectorization

2023-09-26 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-27 11:28
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP trunc auto-vectorization
From: Pan Li 
 
This patch would like to support auto-vectorization for the
trunc API in math.h. It depends on the -ffast-math option.
 
When we would like to call trunc/truncf like v2 = trunc (v1),
we will convert it into below insns (reference the implementation of
llvm).
 
* vfcvt.rtz.x.f v3, v1
* vfcvt.f.x v2, v3
 
However, the floating point value may not need the cvt as above if
it has no fractional part. Take single precision floating point as example:
 
  +------------+---------------+-------------+
  | raw float  | binary layout | after trunc |
  +------------+---------------+-------------+
  | -8388607.5 | 0xcaffffff    | -8388607.0  |
  | 8388607.5  | 0x4affffff    | 8388607.0   |
  | 8388608.0  | 0x4b000000    | 8388608.0   |
  | 8388609.0  | 0x4b000001    | 8388609.0   |
  +------------+---------------+-------------+
 
All single-precision values with magnitude >= 8388608.0 (2^23) have no
fractional bits in the mantissa. We leverage vmflt and a mask to filter them
out in the vector and only do the cvt on the masked lanes.
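 
To make the 2^23 threshold concrete, here is a small C sketch that decodes the binary layout and counts how many mantissa bit positions lie below the binary point. It assumes `float` is IEEE 754 binary32; `float_bits` and `fraction_bit_count` are hypothetical helper names, not anything from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw binary32 encoding of X (assumes IEEE 754 float).  */
uint32_t
float_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* Number of the 23 explicit mantissa bit positions that sit below the
   binary point, i.e. that can encode a fractional part.  */
int
fraction_bit_count (float x)
{
  int exponent = (int) ((float_bits (x) >> 23) & 0xff) - 127;

  if (exponent >= 23)   /* |x| >= 2^23: every mantissa bit is integral */
    return 0;
  if (exponent < 0)     /* |x| < 1: all mantissa bits are fractional */
    return 23;
  return 23 - exponent;
}
```

With this, 8388607.5 (0x4affffff) still has one fractional bit position, whereas 8388608.0 (0x4b000000) and 8388609.0 (0x4b000001) have none, which is why those lanes can skip the conversion.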
 
Before this patch:
math-trunc-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw   fa0,0(s0)
  addi  s0,s0,4
  addi  s1,s1,4
  call  trunc
  fsw   fa0,-4(s1)
  bne   s0,s2,.L3
 
After this patch:
  vfabs.v         v2,v1
  vmflt.vf        v0,v2,fa5
  vfcvt.rtz.x.f.v v4,v1,v0.t
  vfcvt.f.x.v     v2,v4,v0.t
  vfsgnj.vv       v2,v2,v1
  bne             .L4
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (btrunc2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_trunc): New func decl.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): New func impl.
(expand_vec_trunc): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 10 
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 32 +++
.../riscv/rvv/autovec/unop/math-trunc-0.c | 18 ++
.../riscv/rvv/autovec/unop/math-trunc-1.c | 18 ++
.../riscv/rvv/autovec/unop/math-trunc-2.c | 18 ++
.../riscv/rvv/autovec/unop/math-trunc-3.c | 20 +++
.../riscv/rvv/autovec/unop/math-trunc-run-1.c | 39 +
.../riscv/rvv/autovec/unop/math-trunc-run-2.c | 39 +
.../riscv/rvv/autovec/vls/math-trunc-1.c  | 56 +++
10 files changed, 251 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 798cf1272c5..680a3374972 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2261,3 +2261,13 @@ (define_expand "round2"
 DONE;
   }
)
+
+(define_expand "btrunc2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_trunc (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 70ca244c591..536e70bdcd3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -468,6 +468,7 @@ void expand_vec_floor (rtx, rtx, machine_mode, 
machine_mode);
void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_rint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_round (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5f738634219..8992977a51d 100644
--- a/gcc/config/riscv/riscv-v.cc

Re: [PATCH] RISC-V: Bugfix for RTL check[PR111533]

2023-09-26 Thread juzhe.zh...@rivai.ai
+  vid sequence.  The elt (i) can be either const_int or
+  const_poly_int. */
+   HOST_WIDE_INT diff = rtx_to_poly_int64 (builder.elt (i)).to_constant () - i;

How about:

poly_int64 diff = rtx_to_poly_int64 (builder.elt (i)) - i;

   rtx avl
- = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
+ = (has_vl_op (insn->rtl ()) && REG_P (get_vl (insn->rtl ())))
+   ? get_vl (insn->rtl ())
+   : dem.get_avl ();

How about:

rtx avl = dem.get_avl_or_vl_reg ();

I wonder whether it is possible to add a test case for this issue?


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-27 11:07
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Bugfix for RTL check[PR111533]
From: xuli 
 
Consider the following situation:
BB5: local_dem(RVV Insn 1, AVL(reg zero))
RVV Insn 1: vmv.s.x, AVL (const_int 1)
RVV Insn 2: vredsum.vs, AVL(reg zero)
 
vmv.s.x has a vl operand, so the following code will get
the avl (const_int) from RVV Insn 1.
rtx avl = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ())
   : dem.get_avl ();
 
If REGNO is used on a const_int, the compiler will crash:
 
during RTL pass: vsetvl
res_debug.c: In function '__dn_count_labels':
res_debug.c:1050:1: internal compiler error: RTL check: expected code 'reg',
have 'const_int' in rhs_regno, at rtl.h:1934
1050 | }
  | ^
0x8fb169 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
../.././gcc/gcc/rtl.cc:770
0x1399818 rhs_regno(rtx_def const*)
../.././gcc/gcc/rtl.h:1934
0x1399818 anticipatable_occurrence_p
../.././gcc/gcc/config/riscv/riscv-vsetvl.cc:348
 
So in this case avl should be obtained from dem.
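 
The shape of the fix can be sketched with a toy tagged value in C. Everything here is hypothetical (`toy_rtx`, `toy_reg_p`, `toy_pick_avl` are illustrative stand-ins, not GCC's rtx API); the point is only that the register-only accessor must be guarded by a check on the code, with the demand's AVL as the fallback.

```c
#include <stdbool.h>

/* Toy stand-in for an RTL expression: a code tag plus a payload.  */
enum toy_code { TOY_REG, TOY_CONST_INT };

struct toy_rtx
{
  enum toy_code code;
  long value;   /* register number for TOY_REG, immediate otherwise */
};

struct toy_rtx
toy_reg (long regno)
{
  struct toy_rtx x = { TOY_REG, regno };
  return x;
}

struct toy_rtx
toy_const_int (long imm)
{
  struct toy_rtx x = { TOY_CONST_INT, imm };
  return x;
}

/* Toy analogue of a REG_P style predicate.  */
bool
toy_reg_p (struct toy_rtx x)
{
  return x.code == TOY_REG;
}

/* Use the insn's own vl operand only when it really is a register;
   otherwise fall back to the AVL recorded in the demand.  */
struct toy_rtx
toy_pick_avl (bool has_vl_op, struct toy_rtx insn_vl, struct toy_rtx dem_avl)
{
  if (has_vl_op && toy_reg_p (insn_vl))
    return insn_vl;
  return dem_avl;
}
```

In the situation from the report, the insn's operand is an immediate 1 and the demand holds a register, so `toy_pick_avl (true, toy_const_int (1), toy_reg (0))` returns the register and no register-only accessor is ever applied to the constant.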
 
Another issue is caused by the following code:
HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;
 
during RTL pass: expand
../../.././gcc/libgfortran/generated/matmul_c4.c: In function 'matmul_c4':
../../.././gcc/libgfortran/generated/matmul_c4.c:2906:39: internal compiler 
error: RTL check:
expected code 'const_int', have 'const_poly_int' in expand_const_vector,
at config/riscv/riscv-v.cc:1149
 
The builder.elt (i) can be either const_int or const_poly_int.
PR target/111533
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_const_vector): Fix bug.
* config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix bug.
---
gcc/config/riscv/riscv-v.cc  | 6 --
gcc/config/riscv/riscv-vsetvl.cc | 5 -
2 files changed, 8 insertions(+), 3 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5f738634219..fb3c55b4705 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1147,8 +1147,10 @@ expand_const_vector (rtx target, rtx src)
  for (unsigned int i = 0; i < v.npatterns (); ++i)
{
  /* Calculate the diff between the target sequence and
-  vid sequence.  */
-   HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;
+  vid sequence.  The elt (i) can be either const_int or
+  const_poly_int. */
+   HOST_WIDE_INT diff = rtx_to_poly_int64 (builder.elt (i)).to_constant () - i;
+
  v.quick_push (gen_int_mode (diff, v.inner_mode ()));
}
  /* Step 2: Generate result = VID + diff.  */
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 7af33e7ea6f..27000434341 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -308,7 +308,10 @@ anticipatable_occurrence_p (const bb_info *bb, const 
vector_insn_info dem)
 {
   /* rs1 (avl) are not modified in the basic block prior to the VSETVL.  */
   rtx avl
- = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
+ = (has_vl_op (insn->rtl ()) && REG_P (get_vl (insn->rtl ())))
+   ? get_vl (insn->rtl ())
+   : dem.get_avl ();
+
   if (dem.dirty_p ())
{
  gcc_assert (!vsetvl_insn_p (insn->rtl ()));
-- 
2.17.1
 
 


Re: [PATCH v1] RISC-V: Support FP round auto-vectorization

2023-09-26 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-26 19:00
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP round auto-vectorization
From: Pan Li 
 
This patch would like to support auto-vectorization for the
round API in math.h. It depends on the -ffast-math option.
 
When we would like to call round/roundf like v2 = round (v1),
we will convert it into below insns (reference the implementation of llvm).
 
* vfcvt.x.f v3, v1, RMM
* vfcvt.f.x v2, v3
 
However, the floating point value may not need the cvt as above if
it has no fractional part. Take single precision floating point as example:
 
  +------------+---------------+-------------+
  | raw float  | binary layout | after round |
  +------------+---------------+-------------+
  | -8388607.5 | 0xcaffffff    | -8388608.0  |
  | 8388607.5  | 0x4affffff    | 8388608.0   |
  | 8388608.0  | 0x4b000000    | 8388608.0   |
  | 8388609.0  | 0x4b000001    | 8388609.0   |
  +------------+---------------+-------------+
 
All single-precision values with magnitude >= 8388608.0 (2^23) have no
fractional bits in the mantissa. We leverage vmflt and a mask to filter them
out in the vector and only do the cvt on the masked lanes.
 
Before this patch:
math-round-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw   fa0,0(s0)
  addi  s0,s0,4
  addi  s1,s1,4
  call  round
  fsw   fa0,-4(s1)
  bne   s0,s2,.L3
 
After this patch:
  ...
  fsrmi   4   // RMM, rounding to nearest, ties to max magnitude
.L4:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv   v2,v2,v1
  bne         .L4
.L14:
  fsrm        a6
  ret
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (round2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_round): New function decl.
* config/riscv/riscv-v.cc (expand_vec_round): New function impl.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-round-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-round-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 10 
gcc/config/riscv/riscv-protos.h   |  5 ++
gcc/config/riscv/riscv-v.cc   | 24 
.../riscv/rvv/autovec/unop/math-round-0.c | 23 
.../riscv/rvv/autovec/unop/math-round-1.c | 23 
.../riscv/rvv/autovec/unop/math-round-2.c | 23 
.../riscv/rvv/autovec/unop/math-round-3.c | 25 +
.../riscv/rvv/autovec/unop/math-round-run-1.c | 39 +
.../riscv/rvv/autovec/unop/math-round-run-2.c | 39 +
.../riscv/rvv/autovec/vls/math-round-1.c  | 56 +++
10 files changed, 267 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-round-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1d2fca60e98..798cf1272c5 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2251,3 +2251,13 @@ (define_expand "rint2"
 DONE;
   }
)
+
+(define_expand "round2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_round (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 629adeea94c..70ca244c591 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -256,6 +256,9 @@ enum insn_flags : unsigned int
   /* Means INSN has FRM operand and the value is FRM_RDN.  */
   FRM_RDN_P = 1 << 17,
+
+  /* Means INSN has FRM operand and the value is FRM_RMM.  */
+  FRM_RMM_P = 1 << 18,
};
enum insn_type : unsigned int
@@ -299,6 +302,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM

Re: Re: [PATCH] MATCH: Optimize COND_ADD reduction pattern

2023-09-26 Thread juzhe.zh...@rivai.ai
Addressed comments:

V3 COND_LEN_ADD: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631350.html

V2 COND_ADD: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631352.html

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-09-26 17:41
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH] MATCH: Optimize COND_ADD reduction pattern
On Tue, 26 Sep 2023, Juzhe-Zhong wrote:
 
> Current COND_ADD reduction pattern can't optimize floating-point vector.
> As Richard suggested: 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631336.html
> Allow COND_ADD reduction pattern to optimize floating-point vector.
> 
> Bootstrap and Regression is running.
> 
> Ok for trunk if tests pass ?
 
I just wondered about fixed point - zerop seems to also allow
fixed_zerop.  Maybe do
 
if (ANY_INTEGRAL_TYPE_P (type)
 || (FLOAT_TYPE_P (type)
 && fold_real_zero_addition_p (type, NULL_TREE, @4, 0)))
 
(also for the other patch) to avoid touching the fixed point case.
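 
Why floating point needs the extra guard can be seen from a lane where the inner select yields zero: the original form computes d + 0 while the simplified form keeps d, and those differ when d is -0.0. A minimal C sketch of that corner case follows; `plus_zero_preserves` is a hypothetical helper, and the `volatile` is only there to stop the compiler from folding the addition itself.

```c
#include <math.h>
#include <stdbool.h>

/* Return true when X + 0.0 gives back X with the same sign bit, i.e.
   when replacing "X + 0.0" by "X" would be unobservable.  */
bool
plus_zero_preserves (double x)
{
  volatile double zero = 0.0;   /* keep the addition at run time */
  double sum = x + zero;

  return sum == x && (signbit (sum) != 0) == (signbit (x) != 0);
}
```

Under the default rounding mode this holds for ordinary values and for +0.0, but not for -0.0, since -0.0 + 0.0 evaluates to +0.0.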
 
Richard.
 
> gcc/ChangeLog:
> 
> * match.pd: Optimize COND_ADD reduction pattern.
> 
> ---
>  gcc/match.pd | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5061c19e086..398beaebd27 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8863,8 +8863,10 @@ and,
>  
> c = mask1 && mask2 ? d + b : d.  */
>  (simplify
> -  (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
> -   (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
> +  (IFN_COND_ADD @0 @1 (vec_cond @2 @3 zerop@4) @1)
> +   (if (ANY_INTEGRAL_TYPE_P (type)
> + || fold_real_zero_addition_p (type, NULL_TREE, @4, 0))
> +   (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1)))
>  
>  /* Detect simplication for a conditional length reduction where
>  
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 


Re: Re: [PATCH] MATCH: Optimize COND_ADD_LEN reduction pattern

2023-09-26 Thread juzhe.zh...@rivai.ai
Hi, Richi.

Addressed comments.

The first is the V2 patch for the COND_LEN_ADD reduction:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631340.html 

The second one optimizes the COND_ADD reduction:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631341.html 




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-09-26 15:46
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford; rguenther; pinskia
Subject: Re: [PATCH] MATCH: Optimize COND_ADD_LEN reduction pattern
On Tue, Sep 26, 2023 at 9:13 AM Juzhe-Zhong  wrote:
>
>
> This patch leverages this commit: 
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=62b505a4d5fc89
> to optimize COND_LEN_ADD reduction pattern.
>
> We are doing optimization of VEC_COND_EXPR + COND_LEN_ADD -> COND_LEN_ADD.
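 
For integer elements the rewrite can be checked lane by lane. The following C sketch models both forms for a single lane; `lane_before` and `lane_after` are hypothetical names, and `in_len` stands for the "i < len + bias" condition of the length-controlled operation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original form: select first, then a length-controlled add.  */
uint64_t
lane_before (bool mask, bool in_len, uint64_t d, uint64_t b)
{
  uint64_t a = mask ? b : 0;    /* the VEC_COND_EXPR */
  return in_len ? d + a : d;    /* COND_LEN_ADD with an all-true mask */
}

/* Simplified form: the select's mask becomes the add's mask.  */
uint64_t
lane_after (bool mask, bool in_len, uint64_t d, uint64_t b)
{
  return (mask && in_len) ? d + b : d;
}
```

The two agree for every combination of `mask` and `in_len` because adding an integer zero leaves `d` unchanged, so the separate select (the vmerge in the generated code) is no longer needed.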
>
> Consider the following case:
>
> #include 
>
> void
> pr11594 (uint64_t *restrict a, uint64_t *restrict b, int loop_size)
> {
>   uint64_t result = 0;
>
>   for (int i = 0; i < loop_size; i++)
> {
>   if (b[i] <= a[i])
> {
>   result += a[i];
> }
> }
>
>   a[0] = result;
> }
>
> Before this patch:
> vsetvli a7,zero,e64,m1,ta,ma
> vmv.v.i v2,0
> vmv1r.v v3,v2    <--- redundant
> .L3:
> vsetvli a5,a2,e64,m1,ta,ma
> vle64.v v1,0(a3)
> vle64.v v0,0(a1)
> slli    a6,a5,3
> vsetvli a7,zero,e64,m1,ta,ma
> sub a2,a2,a5
> vmsleu.vv   v0,v0,v1
> add a1,a1,a6
> vmerge.vvm  v1,v3,v1,v0    <--- redundant
> add a3,a3,a6
> vsetvli zero,a5,e64,m1,tu,ma
> vadd.vv v2,v2,v1
> bne a2,zero,.L3
> li  a5,0
> vsetvli a4,zero,e64,m1,ta,ma
> vmv.s.x v1,a5
> vredsum.vs  v2,v2,v1
> vmv.x.s a5,v2
> sd  a5,0(a0)
> ret
>
> After this patch:
>
> vsetvli a6,zero,e64,m1,ta,ma
> vmv.v.i v1,0
> .L3:
> vsetvli a5,a2,e64,m1,ta,ma
> vle64.v v2,0(a4)
> vle64.v v0,0(a1)
> slli    a3,a5,3
> vsetvli a6,zero,e64,m1,ta,ma
> sub a2,a2,a5
> vmsleu.vv   v0,v0,v2
> add a1,a1,a3
> vsetvli zero,a5,e64,m1,tu,mu
> add a4,a4,a3
> vadd.vv v1,v1,v2,v0.t
> bne a2,zero,.L3
> li  a5,0
> vsetivli    zero,1,e64,m1,ta,ma
> vmv.s.x v2,a5
> vsetvli a5,zero,e64,m1,ta,ma
> vredsum.vs  v1,v1,v2
> vmv.x.s a5,v1
> sd  a5,0(a0)
> ret
>
> Bootstrap && Regression is running.
>
> Ok for trunk when testing passes ?
>
> PR tree-optimization/111594
> PR tree-optimization/110660
>
> gcc/ChangeLog:
>
> * match.pd: Optimize COND_LEN_ADD reduction.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/pr111594.c: New test.
>
> ---
>  gcc/match.pd  | 13 +
>  .../riscv/rvv/autovec/cond/cond_reduc-1.c | 29 +++
>  .../riscv/rvv/autovec/cond/pr111594.c | 22 ++
>  3 files changed, 64 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111594.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a17778fbaa6..af8d12c138e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8866,6 +8866,19 @@ and,
>(IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
> (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
>
> +/* Detect simplication for a conditional length reduction where
> +
> +   a = mask ? b : 0
> +   c = i < len + bias ? d + a : d
> +
> +   is turned into
> +
> +   c = mask && i < len ? d + b : d.  */
> +(simplify
> +  (IFN_COND_LEN_ADD integer_minus_onep @0 (vec_cond @1 @2 zerop) @0 @3 @4)
 
I think you want integer_truep instead of integer_minus_onep for
readability.  Since you use zerop here, can you also adjust the
preceding pattern?
 
> +   (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
 
it might be better to check ANY_INTEGRAL_TYPE_P (type) ||
fold_real_zero_addition_p (type, NULL_TREE, @5, 0)
your change misses HONOR_SIGN_DEPENDENT_ROUNDING I think.
 
> +(IFN_COND_LEN_ADD @1 @0 @2 @0 @3 @4)))
> +
 
 
 
>  /* For pointers @0 and @2 and nonnegative constant offset @1, look for
> expressions like:
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c 
> 

Re: [PATCH v1] RISC-V: Support FP rint auto-vectorization

2023-09-26 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-26 15:24
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP rint auto-vectorization
From: Pan Li 
 
This patch would like to support auto-vectorization for the
rint API in math.h. It depends on the -ffast-math option.
 
When we would like to call rint/rintf like v2 = rint (v1),
we will convert it into below insns (reference the implementation of llvm).
 
* vfcvt.x.f v3, v1
* vfcvt.f.x v2, v3
 
However, the floating point value may not need the cvt as above if
its mantissa is zero. Take single precision floating point as example:
 
Assume we have RTZ rounding mode
 
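  +------------+---------------+-------------+
  | raw float  | binary layout | after int   |
  +------------+---------------+-------------+
  | -8388607.5 | 0xcaffffff    | -8388607.0  |
  | 8388607.5  | 0x4affffff    | 8388607.0   |
  | 8388608.0  | 0x4b000000    | 8388608.0   |
  | 8388609.0  | 0x4b000001    | 8388609.0   |
  +------------+---------------+-------------+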
 
Every single-precision value whose magnitude is >= 8388608.0 (2^23) has
no fractional mantissa bits. We use vmflt to build a mask that filters
these lanes out, and only do the conversions on the masked lanes.
 
Before this patch:
math-rint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    rint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3
 
After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv   v2,v2,v1
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (rint2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_rint): New function decl.
* config/riscv/riscv-v.cc (expand_vec_rint): New function impl.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-rint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-rint-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 10 
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 22 +++
.../riscv/rvv/autovec/unop/math-rint-0.c  | 18 ++
.../riscv/rvv/autovec/unop/math-rint-1.c  | 18 ++
.../riscv/rvv/autovec/unop/math-rint-2.c  | 18 ++
.../riscv/rvv/autovec/unop/math-rint-3.c  | 20 +++
.../riscv/rvv/autovec/unop/math-rint-run-1.c  | 48 +++
.../riscv/rvv/autovec/unop/math-rint-run-2.c  | 48 +++
.../riscv/rvv/autovec/vls/math-rint-1.c   | 58 +++
10 files changed, 261 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-rint-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b47f086f5e6..1d2fca60e98 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2241,3 +2241,13 @@ (define_expand "nearbyint2"
 DONE;
   }
)
+
+(define_expand "rint2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_rint (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f87bdef0f71..629adeea94c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -462,6 +462,7 @@ void expand_reduction (unsigned, unsigned, rtx *, rtx);
void expand_vec_ceil (rtx, rtx, machine_mode, machine_mode);
void expand_vec_floor (rtx, rtx, machine_mode, machine_mode);
void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_rint (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5d3d458fa6c..445ed000f88 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3698,4 +3698

Re: [PATCH v2] RISC-V: Support FP nearbyint auto-vectorization

2023-09-26 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-26 15:19
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Support FP nearbyint auto-vectorization
From: Pan Li 
 
This patch supports auto-vectorization of the nearbyint API from math.h.
It depends on the -ffast-math option.
 
A call to nearbyint/nearbyintf such as v2 = nearbyint (v1) is converted
into the insns below (following the LLVM implementation).
 
* frflags a5
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
* fsflags a5
 
However, a floating-point value does not need the conversions above if
its mantissa has no fractional bits. Take single precision as an example:
 
Assume we have RTZ rounding mode
 
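  +------------+---------------+-----------------+
  | raw float  | binary layout | after nearbyint |
  +------------+---------------+-----------------+
  | 8388607.5  | 0x4affffff    | 8388607.0       |
  | 8388608.0  | 0x4b000000    | 8388608.0       |
  | 8388609.0  | 0x4b000001    | 8388609.0       |
  +------------+---------------+-----------------+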
 
Every single-precision value whose magnitude is >= 8388608.0 (2^23) has
no fractional mantissa bits. We use vmflt to build a mask that filters
these lanes out, and only do the conversions on the masked lanes.
 
Before this patch:
math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    nearbyint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3
 
After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  frflags     a7
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  fsflags     a7
  vfsgnj.vv   v2,v2,v1
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (nearbyint2): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_nearbyint): New function decl.
* config/riscv/riscv-v.cc (expand_vec_nearbyint): New func impl.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 
gcc/config/riscv/riscv-protos.h   |  2 +
gcc/config/riscv/riscv-v.cc   | 29 ++
.../riscv/rvv/autovec/unop/math-nearbyint-0.c | 20 +++
.../riscv/rvv/autovec/unop/math-nearbyint-1.c | 20 +++
.../riscv/rvv/autovec/unop/math-nearbyint-2.c | 20 +++
.../riscv/rvv/autovec/unop/math-nearbyint-3.c | 22 +++
.../rvv/autovec/unop/math-nearbyint-run-1.c   | 48 +++
.../rvv/autovec/unop/math-nearbyint-run-2.c   | 48 +++
.../riscv/rvv/autovec/unop/test-math.h| 33 +++
.../riscv/rvv/autovec/vls/math-nearbyint-1.c  | 58 +++
11 files changed, 311 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a005e17457e..b47f086f5e6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2210,6 +2210,7 @@ (define_expand "avg3_ceil"
;; Includes:
;; - ceil/ceilf
;; - floor/floorf
+;; - nearbyint/nearbyintf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2230,3 +2231,13 @@ (define_expand "floor2"
 DONE;
   }
)
+
+(define_expand "nearbyint2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_nearbyint (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 63eb2475705..f87bdef0f71 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-p

Re: [PATCH v1] RISC-V: Rename rounding const fp function for refactor

2023-09-25 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-26 11:12
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Rename rounding const fp function for refactor
From: Pan Li 
 
The rounding-related APIs all share one floating-point constant, so rename
the helper that builds it and drop the redundant per-API wrappers.
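
For reference, the shared constant is the smallest magnitude at which every
value of the format is already an integer, i.e. 2^(p-1) for a p-bit
significand. A small sketch, assuming IEEE single and double precision; the
function names are hypothetical, and riscv-v.cc builds the constant as a
REAL_VALUE_TYPE rather than with ldexp.

```c
#include <float.h>
#include <math.h>

/* 2^(FLT_MANT_DIG - 1): from here on every float is an integer.  */
static float
rounding_limit_f32 (void)
{
  return ldexpf (1.0f, FLT_MANT_DIG - 1);
}

/* 2^(DBL_MANT_DIG - 1): the same limit for double.  */
static double
rounding_limit_f64 (void)
{
  return ldexp (1.0, DBL_MANT_DIG - 1);
}
```

With 24 and 53 significand bits these evaluate to 8388608 and
4503599627370496, the two values quoted in the patches and in the comment
above the helper.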
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (gen_ceil_const_fp): Remove.
(get_fp_rounding_coefficient): Rename.
(gen_floor_const_fp): Remove.
(expand_vec_ceil): Take renamed func.
(expand_vec_floor): Ditto.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc | 13 +++--
1 file changed, 3 insertions(+), 10 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a1ffefb23f3..9a1df950d58 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3548,7 +3548,7 @@ cmp_lmul_gt_one (machine_mode mode)
   greater than and equal to 4503599627370496.
  */
static rtx
-gen_ceil_const_fp (machine_mode inner_mode)
+get_fp_rounding_coefficient (machine_mode inner_mode)
{
   REAL_VALUE_TYPE real;
@@ -3564,13 +3564,6 @@ gen_ceil_const_fp (machine_mode inner_mode)
   return const_double_from_real_value (real, inner_mode);
}
-static rtx
-gen_floor_const_fp (machine_mode inner_mode)
-{
-  /* The floor needs the same floating point const as ceil.  */
-  return gen_ceil_const_fp (inner_mode);
-}
-
static rtx
emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
machine_mode vec_fp_mode)
@@ -3637,7 +3630,7 @@ expand_vec_ceil (rtx op_0, rtx op_1, machine_mode 
vec_fp_mode,
   emit_vec_abs (op_0, op_1, vec_fp_mode);
   /* Step-2: Generate the mask on const fp.  */
-  rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode));
+  rtx const_fp = get_fp_rounding_coefficient (GET_MODE_INNER (vec_fp_mode));
   rtx mask = emit_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
   /* Step-3: Convert to integer on mask, with rounding up (aka ceil).  */
@@ -3662,7 +3655,7 @@ expand_vec_floor (rtx op_0, rtx op_1, machine_mode 
vec_fp_mode,
   emit_vec_abs (op_0, op_1, vec_fp_mode);
   /* Step-2: Generate the mask on const fp.  */
-  rtx const_fp = gen_floor_const_fp (GET_MODE_INNER (vec_fp_mode));
+  rtx const_fp = get_fp_rounding_coefficient (GET_MODE_INNER (vec_fp_mode));
   rtx mask = emit_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
   /* Step-3: Convert to integer on mask, with rounding down (aka floor).  */
-- 
2.34.1
 
 


Re: [PATCH v1] RISC-V: Support FP nearbyint auto-vectorization

2023-09-25 Thread juzhe.zh...@rivai.ai
+static rtx
+gen_nearbyint_const_fp (machine_mode inner_mode)
+{
+  /* The nearbyint needs the same floating point const as ceil.  */
+  return gen_ceil_const_fp (inner_mode);
+}
This is redundant.

This one is redundant as well:
static rtx
gen_floor_const_fp (machine_mode inner_mode)
{
  /* The floor needs the same floating point const as ceil.  */
  return gen_ceil_const_fp (inner_mode);
}

So please rename:
gen_ceil_const_fp (machine_mode inner_mode)

into:
get_fp_rounding_coefficient



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-26 10:39
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP nearbyint auto-vectorization
From: Pan Li 
 
This patch supports auto-vectorization of the nearbyint API from math.h.
It depends on the -ffast-math option.
 
A call to nearbyint/nearbyintf such as v2 = nearbyint (v1) is converted
into the insns below (following the LLVM implementation).
 
* frflags a5
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
* fsflags a5
 
However, a floating-point value does not need the conversions above if
its mantissa has no fractional bits. Take single precision as an example:
 
Assume we have RTZ rounding mode
 
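  +------------+---------------+-----------------+
  | raw float  | binary layout | after nearbyint |
  +------------+---------------+-----------------+
  | 8388607.5  | 0x4affffff    | 8388607.0       |
  | 8388608.0  | 0x4b000000    | 8388608.0       |
  | 8388609.0  | 0x4b000001    | 8388609.0       |
  +------------+---------------+-----------------+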
 
Every single-precision value whose magnitude is >= 8388608.0 (2^23) has
no fractional mantissa bits. We use vmflt to build a mask that filters
these lanes out, and only do the conversions on the masked lanes.
 
Before this patch:
math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    nearbyint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3
 
After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  frflags     a7
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  fsflags     a7
  vfsgnj.vv   v2,v2,v1
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (nearbyint2): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_nearbyint): New function decl.
* config/riscv/riscv-v.cc (gen_nearbyint_const_fp): New function impl.
(expand_vec_nearbyint): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 
gcc/config/riscv/riscv-protos.h   |  2 +
gcc/config/riscv/riscv-v.cc   | 36 
.../riscv/rvv/autovec/unop/math-nearbyint-0.c | 20 +++
.../riscv/rvv/autovec/unop/math-nearbyint-1.c | 20 +++
.../riscv/rvv/autovec/unop/math-nearbyint-2.c | 20 +++
.../riscv/rvv/autovec/unop/math-nearbyint-3.c | 22 +++
.../rvv/autovec/unop/math-nearbyint-run-1.c   | 48 +++
.../rvv/autovec/unop/math-nearbyint-run-2.c   | 48 +++
.../riscv/rvv/autovec/unop/test-math.h| 33 +++
.../riscv/rvv/autovec/vls/math-nearbyint-1.c  | 58 +++
11 files changed, 318 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a005e17457e..b47f086f5e6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2210,6 +2210,7 @@ (define_expand "avg3_ceil"
;; Includes:
;; - ceil/ceilf
;; - floor/floorf
+;; - nearbyint/nearbyintf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2230,3 +2231,13 @@ (define_expand "floor2"
 DONE;
   }
)
+
+(define_expand "near

Re: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 20:16
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.
From: Pan Li 
 
We already vectorize the ceil code below.
 
void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}
 
Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1
 
After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2
 
The generated code is improved in the following ways:
 
* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(emit_vec_float_cmp_mask): Rename.
(expand_vec_copysign): Ditto.
(emit_vec_copysign): Ditto.
(emit_vec_abs): New function impl.
(emit_vec_cvt_x_f): Ditto.
(emit_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 81 ---
.../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
5 files changed, 54 insertions(+), 47 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..251d827d973 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3557,36 +3557,27 @@ gen_ceil_const_fp (machine_mode inner_mode)
}
static rtx
-expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
-machine_mode vec_fp_mode)
+emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
+ machine_mode vec_fp_mode)
{
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
   return mask;
}
static void
-expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
-  machine_mode vec_mode)
+emit_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
+machine_mode vec_mode)
{
   rtx sgnj_ops[] = {op_dest, op_src_0, op_src_1};
   insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, vec_mode);
@@ -3594,30 +3585,58 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx 
op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
}
+static void
+emit_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+emit_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+   insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+emit_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+   insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
void
expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
machine_mode vec_int_mode)
{
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask generation.  */
+  emit_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Generate 

Re: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread juzhe.zh...@rivai.ai
I would prefer renaming expand_vec_copysign to emit_vec_copysign.

Likewise emit_fabs, etc.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 19:19
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.
From: Pan Li 
 
We already vectorize the ceil code below.
 
void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}
 
Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1
 
After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2
 
The generated code is improved in the following ways:
 
* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(expand_vec_abs): New function impl.
(expand_vec_cvt_x_f): Ditto.
(expand_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Refine.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 71 ---
.../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
5 files changed, 49 insertions(+), 42 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..ea2b01f6a6e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3560,26 +3560,17 @@ static rtx
expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
   machine_mode vec_fp_mode)
{
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
   return mask;
}
@@ -3594,29 +3585,57 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx 
op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
}
+static void
+expand_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+expand_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+expand_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
void
expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
machine_mode vec_int_mode)
{
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask generation.  */
+  expand_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Generate the mask on const fp.  */
   rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode));
-  rtx mask = expand_vec_float_cmp_mask (op_1, LT, const_fp, vec_fp_mode);
+  rtx mask = expand_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
-  /* Step-2: Convert to integer on mask, with rounding up (aka ceil).  */
+  /* Step-3: Convert to integer on mask, with rounding up (aka ceil).  */
   rtx tmp = gen_reg_rtx (vec_int_mode);
-  rtx cvt_x_

Re: [PATCH v1] RISC-V: Move ceil test cases to unop folder

2023-09-22 Thread juzhe.zh...@rivai.ai
ok




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 17:11
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Move ceil test cases to unop folder
From: Pan Li 
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/math-ceil-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c: ...here.
* gcc.target/riscv/rvv/autovec/test-math.h: Moved to...
* gcc.target/riscv/rvv/autovec/unop/test-math.h: ...here.
 
Signed-off-by: Pan Li 
---
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c | 0
gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h | 0
8 files changed, 0 insertions(+), 0 deletions(-)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h (100%)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
-- 
2.34.1
 
 


Re: [PATCH v1] RISC-V: Remove arch and abi option for run test case.

2023-09-21 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 11:39
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Remove arch and abi option for run test case.
From: Pan Li 
 
Remove the -march and -mabi options from the run test cases.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Remove arch and abi.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Ditto.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
index f1946e197cc..67462154018 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
@@ -1,5 +1,5 @@
/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-march=rv64gcv_zvfh -std=c2x -mabi=lp64d -O3 
-ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+/* { dg-additional-options "-std=c2x -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
#include "test-math.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
index 202944ddd92..38adff16df9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
@@ -1,5 +1,5 @@
/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 
-ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
#include "test-math.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
index f0ff9bca0af..6f22842ebdb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
@@ -1,5 +1,5 @@
/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 
-ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
#include "test-math.h"
-- 
2.34.1
 
 


Re: [PATCH V2] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

2023-09-21 Thread juzhe.zh...@rivai.ai
LGTM. You can commit it after you pass the regression.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-22 10:37
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH V2] RISC-V: Optimization of vrgather.vv into 
vrgatherei16.vv[PR111451]
From: xuli 
 
Consider the following case:
 
typedef int32_t vnx32si __attribute__ ((vector_size (128)));
 
  __attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2, \
   TYPE *out)  \
  {\
TYPE v \
  = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
*(TYPE *) out = v; \
  }
 
  T (vnx32si, 32)  \
 
TEST_ALL (PERMUTE)
 
Before this patch:
  li a4,31
  vsetvli a5,zero,e32,m8,ta,ma
  vl8re32.v v24,0(a0)
  vid.v v8
  vrsub.vx v8,v8,a4
  vrgather.vv v16,v24,v8
  vs8r.v v16,0(a2)
  ret
 
The index vector register "v8" occupies 8 registers.
We should optimize it into vrgatherei16.vv which is
using int16 as the index elements.
 
After this patch:
  vsetvli a5,zero,e16,m4,ta,ma
  li a4,31
  vid.v v4
  vl8re32.v v16,0(a0)
  vrsub.vx v4,v4,a4
  vsetvli zero,zero,e32,m8,ta,ma
  vrgatherei16.vv v8,v16,v4
  vs8r.v v8,0(a2)
  ret
With vrgatherei16.vv, the index vector occupies 4 registers instead
of 8, which lowers register consumption and register pressure.
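 
The register saving described above can be checked with a little arithmetic. The following is only an illustrative sketch, not code from riscv-v.cc: the names index_regs_ei16 and prefers_ei16 are hypothetical, and the predicate simply mirrors the element-width and LMUL conditions discussed in this thread (the CONST_VECTOR_P check on the selector is left out).

```c
#include <assert.h>

/* Number of vector registers occupied by a 16-bit index vector that has
   as many elements as a data vector with element width DATA_SEW (bits)
   and register group size DATA_LMUL (1, 2, 4 or 8).  A fractional LMUL
   still occupies one whole register, hence the rounding up.  */
unsigned
index_regs_ei16 (unsigned data_sew, unsigned data_lmul)
{
  assert (data_sew == 8 || data_sew == 16 || data_sew == 32
	  || data_sew == 64);
  return (data_lmul * 16 + data_sew - 1) / data_sew;
}

/* Hypothetical mirror of the condition in the patch: 16-bit indices only
   help when the data elements are wider than 16 bits and the data
   occupies more than one register.  */
int
prefers_ei16 (unsigned data_sew, unsigned data_lmul)
{
  return data_sew > 16 && data_lmul > 1;
}
```

For the example in this message (32-bit elements at LMUL 8), index_regs_ei16 (32, 8) gives 4, while an index vector with the same element width as the data would need all 8 registers of the group.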
 
PR target/111451
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Optimization of vrgather.vv 
into vrgatherei16.vv.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust case.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.
---
gcc/config/riscv/riscv-v.cc| 18 ++
.../riscv/rvv/autovec/vls-vlmax/perm-4.c   |  3 ++-
.../gcc.target/riscv/rvv/autovec/vls/perm-4.c  |  3 ++-
3 files changed, 22 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 64a71a128d4..455efa7ea8a 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -790,6 +790,24 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   icode = code_for_pred_gather_scalar (data_mode);
   sel = elt;
 }
+  else if (CONST_VECTOR_P (sel)
+   && GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)) > 16
+   && riscv_get_v_regno_alignment (data_mode) > 1)
+{
+  /* If the inner mode of data is not QI or HI and data_lmul > 1,
+ emitting vrgatherei16.vv instruction will lower register
+ pressure.
+ data_mode  sel_mode  ei16
+ RVVM1QI    RVVM1QI   RVVM2HI  not needed
+ RVVM2QI    RVVM2QI   RVVM4HI  not needed
+ RVVM2HI    RVVM2HI   RVVM2HI  not needed
+ RVVM2SI    RVVM2SI   RVVM1HI  need
+ RVVM4SI    RVVM4SI   RVVM2HI  need
+ RVVM8DI    RVVM8DI   RVVM2HI  need */
+  PUT_MODE (sel, get_vector_mode (HImode,
+GET_MODE_NUNITS (data_mode)).require ());
+  icode = code_for_pred_gatherei16 (data_mode);
+}
   else
 icode = code_for_pred_gather (data_mode);
   rtx ops[] = {target, op, sel};
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
index 9df69a0cc2c..7ab31043547 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
@@ -55,6 +55,7 @@
TEST_ALL (PERMUTE)
-/* { dg-final { scan-assembler-times 
{vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */
+/* { dg-final { scan-assembler-times 
{vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */
+/* { dg-final { scan-assembler-times 
{vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */
/* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */
/* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
index 46cad8ea2f4..4d6862cf1c0 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
@@ -3,6 +3,7 @@
#include "../vls-vlmax/perm-4.c"
-/* { dg-final { scan-assembler-times 
{vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */
+/* { dg-final { scan-assembler-times 
{vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */
+/* { dg-final { scan-assembler-times 
{vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */
/* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */
/* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */
-- 
2.17.1
 
 


Re: Re: [PATCH] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

2023-09-21 Thread juzhe.zh...@rivai.ai
Sorry. It should be:

else if (CONST_VECTOR_P (sel) 
&& GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)).to_constant () > 16
&& riscv_get_v_regno_alignment (data_mode) > 1)



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-09-22 09:39
To: Li Xu; gcc-patches
CC: kito.cheng; palmer; Li Xu
Subject: Re: [PATCH] RISC-V: Optimization of vrgather.vv into 
vrgatherei16.vv[PR111451]

+  unsigned int data_sew = get_sew (data_mode);
+  enum vlmul_type data_lmul = get_vlmul (data_mode);

Remove this.

+  else if (CONST_VECTOR_P (sel) && data_sew != 16
+   && data_sew != 8 && (data_lmul == LMUL_2
+   || data_lmul == LMUL_4 || data_lmul == LMUL_8))

change it into:

else if (CONST_VECTOR_P (sel) 
&& GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)).to_constant () > 16
&& riscv_get_v_regno_alignment (data_mode) > LMUL_1)




juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-22 09:33
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Optimization of vrgather.vv into 
vrgatherei16.vv[PR111451]
From: xuli 
 
Consider the following case:
 
typedef int32_t vnx32si __attribute__ ((vector_size (128)));
 
  __attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2, \
   TYPE *out)  \
  {\
TYPE v \
  = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
*(TYPE *) out = v; \
  }
 
  T (vnx32si, 32)  \
 
TEST_ALL (PERMUTE)
 
Before this patch:
  li a4,31
  vsetvli a5,zero,e32,m8,ta,ma
  vl8re32.v v24,0(a0)
  vid.v v8
  vrsub.vx v8,v8,a4
  vrgather.vv v16,v24,v8
  vs8r.v v16,0(a2)
  ret
 
The index vector register "v8" occupies 8 registers.
We should optimize it into vrgatherei16.vv which is
using int16 as the index elements.
 
After this patch:
  vsetvli a5,zero,e16,m4,ta,ma
  li a4,31
  vid.v v4
  vl8re32.v v16,0(a0)
  vrsub.vx v4,v4,a4
  vsetvli zero,zero,e32,m8,ta,ma
  vrgatherei16.vv v8,v16,v4
  vs8r.v v8,0(a2)
  ret
With vrgatherei16.vv, the index vector occupies 4 registers instead
of 8, which lowers register consumption and register pressure.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Optimization of vrgather.vv 
into vrgatherei16.vv.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust case.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.
---
gcc/config/riscv/riscv-v.cc   | 20 +++
.../riscv/rvv/autovec/vls-vlmax/perm-4.c  |  3 ++-
.../gcc.target/riscv/rvv/autovec/vls/perm-4.c |  3 ++-
3 files changed, 24 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 64a71a128d4..271e0ff6dfc 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -783,6 +783,8 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
   machine_mode sel_mode = GET_MODE (sel);
+  unsigned int data_sew = get_sew (data_mode);
+  enum vlmul_type data_lmul = get_vlmul (data_mode);
   if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
 icode = code_for_pred_gatherei16 (data_mode);
   else if (const_vec_duplicate_p (sel, &elt))
@@ -790,6 +792,24 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   icode = code_for_pred_gather_scalar (data_mode);
   sel = elt;
 }
+  else if (CONST_VECTOR_P (sel) && data_sew != 16
+   && data_sew != 8 && (data_lmul == LMUL_2
+   || data_lmul == LMUL_4 || data_lmul == LMUL_8))
+{
+  /* If the inner mode of data is not QI or HI and data_lmul > 1,
+ emitting vrgatherei16.vv instruction will lower register
+ pressure.
+ data_mode  sel_mode  ei16
+ RVVM1QI    RVVM1QI   RVVM2HI  not needed
+ RVVM2QI    RVVM2QI   RVVM4HI  not needed
+ RVVM2HI    RVVM2HI   RVVM2HI  not needed
+ RVVM2SI    RVVM2SI   RVVM1HI  need
+ RVVM4SI    RVVM4SI   RVVM2HI  need
+ RVVM8DI    RVVM8DI   RVVM2HI  need */
+  PUT_MODE (sel, get_vector_mode (HImode,
+GET_MODE_NUNITS (data_mode)).require ());
+  icode = code_for_pred_gatherei16 (data_mode);
+}
   else
 icode = code_for_pred_gather (data_mode);
   rtx ops[] = {target, op, sel};
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
index 9df69a0cc2c..7ab31043547 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
@@ -55,6 +55,7 @@
TEST_ALL (PERMUTE)

Re: [PATCH] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

2023-09-21 Thread juzhe.zh...@rivai.ai

+  unsigned int data_sew = get_sew (data_mode);
+  enum vlmul_type data_lmul = get_vlmul (data_mode);

Remove this.

+  else if (CONST_VECTOR_P (sel) && data_sew != 16
+   && data_sew != 8 && (data_lmul == LMUL_2
+   || data_lmul == LMUL_4 || data_lmul == LMUL_8))

change it into:

else if (CONST_VECTOR_P (sel) 
&& GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)).to_constant () > 16
&& riscv_get_v_regno_alignment (data_mode) > LMUL_1)




juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-22 09:33
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Optimization of vrgather.vv into 
vrgatherei16.vv[PR111451]
From: xuli 
 
Consider the following case:
 
typedef int32_t vnx32si __attribute__ ((vector_size (128)));
 
  __attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2, \
   TYPE *out)  \
  {\
TYPE v \
  = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
*(TYPE *) out = v; \
  }
 
  T (vnx32si, 32)  \
 
TEST_ALL (PERMUTE)
 
Before this patch:
  li a4,31
  vsetvli a5,zero,e32,m8,ta,ma
  vl8re32.v v24,0(a0)
  vid.v v8
  vrsub.vx v8,v8,a4
  vrgather.vv v16,v24,v8
  vs8r.v v16,0(a2)
  ret
 
The index vector register "v8" occupies 8 registers.
We should optimize it into vrgatherei16.vv which is
using int16 as the index elements.
 
After this patch:
  vsetvli a5,zero,e16,m4,ta,ma
  li a4,31
  vid.v v4
  vl8re32.v v16,0(a0)
  vrsub.vx v4,v4,a4
  vsetvli zero,zero,e32,m8,ta,ma
  vrgatherei16.vv v8,v16,v4
  vs8r.v v8,0(a2)
  ret
With vrgatherei16.vv, the index vector occupies 4 registers instead
of 8, which lowers register consumption and register pressure.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Optimization of vrgather.vv 
into vrgatherei16.vv.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust case.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.
---
gcc/config/riscv/riscv-v.cc   | 20 +++
.../riscv/rvv/autovec/vls-vlmax/perm-4.c  |  3 ++-
.../gcc.target/riscv/rvv/autovec/vls/perm-4.c |  3 ++-
3 files changed, 24 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 64a71a128d4..271e0ff6dfc 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -783,6 +783,8 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
   machine_mode sel_mode = GET_MODE (sel);
+  unsigned int data_sew = get_sew (data_mode);
+  enum vlmul_type data_lmul = get_vlmul (data_mode);
   if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
 icode = code_for_pred_gatherei16 (data_mode);
   else if (const_vec_duplicate_p (sel, &elt))
@@ -790,6 +792,24 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   icode = code_for_pred_gather_scalar (data_mode);
   sel = elt;
 }
+  else if (CONST_VECTOR_P (sel) && data_sew != 16
+   && data_sew != 8 && (data_lmul == LMUL_2
+   || data_lmul == LMUL_4 || data_lmul == LMUL_8))
+{
+  /* If the inner mode of data is not QI or HI and data_lmul > 1,
+ emitting vrgatherei16.vv instruction will lower register
+ pressure.
+ data_mode  sel_mode  ei16
+ RVVM1QI    RVVM1QI   RVVM2HI  not needed
+ RVVM2QI    RVVM2QI   RVVM4HI  not needed
+ RVVM2HI    RVVM2HI   RVVM2HI  not needed
+ RVVM2SI    RVVM2SI   RVVM1HI  need
+ RVVM4SI    RVVM4SI   RVVM2HI  need
+ RVVM8DI    RVVM8DI   RVVM2HI  need */
+  PUT_MODE (sel, get_vector_mode (HImode,
+GET_MODE_NUNITS (data_mode)).require ());
+  icode = code_for_pred_gatherei16 (data_mode);
+}
   else
 icode = code_for_pred_gather (data_mode);
   rtx ops[] = {target, op, sel};
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
index 9df69a0cc2c..7ab31043547 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
@@ -55,6 +55,7 @@
TEST_ALL (PERMUTE)
-/* { dg-final { scan-assembler-times 
{vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */
+/* { dg-final { scan-assembler-times 
{vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */
+/* { dg-final { scan-assembler-times 
{vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */
/* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */
/* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } *

Re: [PATCH v1] RISC-V: Leverage __builtin_xx instead of math.h for test

2023-09-21 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 09:12
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Leverage __builtin_xx instead of math.h for test
From: Pan Li 
 
math.h may be problematic in some environments; use __builtin_xx
instead for testing.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c:
Remove reference to math.h.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnjx-2.c: Ditto.
 
Signed-off-by: Pan Li 
---
.../rvv/autovec/vls/floating-point-max-5.c| 43 +--
.../rvv/autovec/vls/floating-point-min-5.c| 43 +--
.../rvv/autovec/vls/floating-point-sgnjx-2.c  | 43 +--
3 files changed, 63 insertions(+), 66 deletions(-)
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c
index 775ddb1d25e..dd163682396 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c
@@ -2,30 +2,29 @@
/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 
-fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 
-ffast-math" } */
#include "def.h"
-#include "math.h"
-DEF_CALL_VV (max, 1, float, fmaxf)
-DEF_CALL_VV (max, 2, float, fmaxf)
-DEF_CALL_VV (max, 4, float, fmaxf)
-DEF_CALL_VV (max, 8, float, fmaxf)
-DEF_CALL_VV (max, 16, float, fmaxf)
-DEF_CALL_VV (max, 32, float, fmaxf)
-DEF_CALL_VV (max, 64, float, fmaxf)
-DEF_CALL_VV (max, 128, float, fmaxf)
-DEF_CALL_VV (max, 256, float, fmaxf)
-DEF_CALL_VV (max, 512, float, fmaxf)
-DEF_CALL_VV (max, 1024, float, fmaxf)
+DEF_CALL_VV (max, 1, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 2, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 4, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 8, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 16, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 32, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 64, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 128, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 256, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 512, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 1024, float, __builtin_fmaxf)
-DEF_CALL_VV (max, 1, double, fmax)
-DEF_CALL_VV (max, 2, double, fmax)
-DEF_CALL_VV (max, 4, double, fmax)
-DEF_CALL_VV (max, 8, double, fmax)
-DEF_CALL_VV (max, 16, double, fmax)
-DEF_CALL_VV (max, 32, double, fmax)
-DEF_CALL_VV (max, 64, double, fmax)
-DEF_CALL_VV (max, 128, double, fmax)
-DEF_CALL_VV (max, 256, double, fmax)
-DEF_CALL_VV (max, 512, double, fmax)
+DEF_CALL_VV (max, 1, double, __builtin_fmax)
+DEF_CALL_VV (max, 2, double, __builtin_fmax)
+DEF_CALL_VV (max, 4, double, __builtin_fmax)
+DEF_CALL_VV (max, 8, double, __builtin_fmax)
+DEF_CALL_VV (max, 16, double, __builtin_fmax)
+DEF_CALL_VV (max, 32, double, __builtin_fmax)
+DEF_CALL_VV (max, 64, double, __builtin_fmax)
+DEF_CALL_VV (max, 128, double, __builtin_fmax)
+DEF_CALL_VV (max, 256, double, __builtin_fmax)
+DEF_CALL_VV (max, 512, double, __builtin_fmax)
/* { dg-final { scan-assembler-times 
{vfmax\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */
/* { dg-final { scan-assembler-not {csrr} } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c
index 1e9ff7d5054..0e3cbf2acec 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c
@@ -2,30 +2,29 @@
/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 
-fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 
-ffast-math" } */
#include "def.h"
-#include "math.h"
-DEF_CALL_VV (min, 1, float, fminf)
-DEF_CALL_VV (min, 2, float, fminf)
-DEF_CALL_VV (min, 4, float, fminf)
-DEF_CALL_VV (min, 8, float, fminf)
-DEF_CALL_VV (min, 16, float, fminf)
-DEF_CALL_VV (min, 32, float, fminf)
-DEF_CALL_VV (min, 64, float, fminf)
-DEF_CALL_VV (min, 128, float, fminf)
-DEF_CALL_VV (min, 256, float, fminf)
-DEF_CALL_VV (min, 512, float, fminf)
-DEF_CALL_VV (min, 1024, float, fminf)
+DEF_CALL_VV (min, 1, float, __builtin_fminf)
+DEF_CALL_VV (min, 2, float, __builtin_fminf)
+DEF_CALL_VV (min, 4, float, __builtin_fminf)
+DEF_CALL_VV (min, 8, float, __builtin_fminf)
+DEF_CALL_VV (min, 16, float, __builtin_fminf)
+DEF_CALL_VV (min, 32, float, __builtin_fminf)
+DEF_CALL_VV (min, 64, float, __builtin_fminf)
+DEF_CALL_VV (min, 128, float, __builtin_fminf)
+DEF_CALL_VV (min, 256, float, __builtin_fminf)
+DEF_CALL_VV (min, 512, float, __builtin_fminf)
+DEF_CALL_VV (min, 1024, float, __builtin_fminf)
-DEF_CALL_VV (min, 1, double, fmin)
-DEF_CALL_VV (min, 2, double, fmin)
-DEF_CALL_VV (min, 4, double, fmin)
-DEF_CALL_VV (min, 8, dou

Re: [PATCH v4] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-21 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 08:12
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v4] RISC-V: Support ceil and ceilf auto-vectorization
From: Pan Li 
 
Update in v4:
 
* Add test for _Float16.
* Remove unnecessary macro in def.h for test.
 
Original log:
 
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
 
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into below insn (reference the implementation of llvm).
 
* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3
 
However, a floating-point value does not need the conversions above if
it has no fractional part. Consider the single-precision values below.
 
  +-----------+---------------+
  | float     | binary layout |
  +-----------+---------------+
  | 8388607.5 | 0x4affffff    |
  | 8388608.0 | 0x4b000000    |
  | 8388609.0 | 0x4b000001    |
  +-----------+---------------+
 
All single-precision values greater than or equal to 8388608.0 (2^23)
have no fractional part. We use vmflt and a mask to filter them out of
the vector and only perform the conversions on the masked elements.
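 
The masked sequence above can be modeled in scalar C. This is only an illustrative sketch, not the code in riscv-v.cc: the helper names float_bits, bits_float and ceil_model are made up here, and the round-up conversion is emulated with a truncating cast plus a fix-up rather than a real rounding-mode change.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a float.  */
uint32_t
float_bits (float x)
{
  uint32_t u;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Build a float from a raw bit pattern.  */
float
bits_float (uint32_t u)
{
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Scalar model of one vector lane.  */
float
ceil_model (float x)
{
  /* vfabs.v: clear the sign bit.  */
  float ax = bits_float (float_bits (x) & 0x7fffffffu);

  /* vmflt.vv: only lanes with |x| < 2^23 are converted.  Larger values,
     infinities and NaN fail the comparison and pass through unchanged.  */
  if (!(ax < 8388608.0f))
    return x;

  /* vfcvt.x.f.v with round-up, emulated: the cast truncates toward
     zero, so bump the result when it ended up below x.  */
  int32_t i = (int32_t) x;
  float r = (float) i;		/* vfcvt.f.x.v */
  if (r < x)
    r += 1.0f;

  /* vfsgnj.vv: take the sign of the input, so -0.5 yields -0.0.  */
  return bits_float ((float_bits (r) & 0x7fffffffu)
		     | (float_bits (x) & 0x80000000u));
}
```

With this model, float_bits reproduces the three bit patterns in the table, and ceil_model (8388607.5f) returns 8388608.0f while any input of magnitude 2^23 or more is returned as is.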
 
Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call ceilf
  fsw fa0,-4(s1)
  bne s0,s2,.L3
 
After this patch:
  ...
  fsrmi   3
.L4:
  vfabs.v v0,v1
  vmv1r.v v2,v1
  vmflt.vv v0,v0,v4
  sub a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv   v2,v2,v1
  bne .L4
.L14:
  fsrm a6
  ret
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ceil2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector.md: Add VLS mode support.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/math-ceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   |  16 +++
gcc/config/riscv/riscv-protos.h   |   5 +
gcc/config/riscv/riscv-v.cc   | 133 ++
gcc/config/riscv/vector.md|   2 +-
.../riscv/rvv/autovec/math-ceil-0.c   |  26 
.../riscv/rvv/autovec/math-ceil-1.c   |  26 
.../riscv/rvv/autovec/math-ceil-2.c   |  26 
.../riscv/rvv/autovec/math-ceil-3.c   |  28 
.../riscv/rvv/autovec/math-ceil-run-0.c   |  39 +
.../riscv/rvv/autovec/math-ceil-run-1.c   |  39 +
.../riscv/rvv/autovec/math-ceil-run-2.c   |  39 +
.../gcc.target/riscv/rvv/autovec/test-math.h  |  38 +
.../riscv/rvv/autovec/vls/math-ceil-1.c   |  56 
13 files changed, 472 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f0f1abc4e82..1b4bd82f9ec 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2239,3 +2239,19 @@ (define_expand "avg3_ceil"
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
   DONE;
})
+
+;; -
+;;  [FP] Math.h.
+;; -
+;; Includes:
+;; - ceil/ceilf
+;; -
+(define_expand "ceil2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_tr

Re: [Committed] RISC-V: Remove math.h import to resolve missing stubs failures

2023-09-21 Thread juzhe.zh...@rivai.ai
Hi, Patrick.

The GNU RVV intrinsic API test generator has been merged:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/commits/main 

Could you include the full RVV intrinsic API tests in your test CI?
Currently, we don't include all the API tests in the GCC testsuite since
they are too big.


juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-09-21 01:51
To: Kito Cheng
CC: GCC Patches; Robin Dapp; 钟居哲
Subject: [Committed] RISC-V: Remove math.h import to resolve missing stubs 
failures
Committed. Thanks!
On 9/20/23 10:19, Kito Cheng wrote:
LGTM 

Patrick O'Neill wrote on Wed, Sep 20, 2023 at 18:07:
Resolves some of the missing stubs failures:
fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.

2023-09-20 Juzhe Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Remove unneeded math.h
import.

Tested-by: Patrick O'Neill 
---
Tested using 590a8bec3ed92118e084b0a1897d3314a666170e
glibc rv64gcv
glibc rv32gcv

glibc rv64gcv
Resolved failures:
FAIL: gcc.target/riscv/rvv/autovec/vls/mov-2.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/mov-4.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/mov-6.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)

glibc rv32gcv
Resolved failures:
FAIL: gcc.target/riscv/rvv/autovec/vls/and-1.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/and-2.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/and-3.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-1.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-2.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-3.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-4.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-5.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-6.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/const-1.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/const-2.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/const-3.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/const-4.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/const-5.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/div-1.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/dup-1.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/dup-2.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/dup-3.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/dup-4.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/dup-5.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/dup-6.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/dup-7.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/extract-1.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/extract-2.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c -O3 
-ftree-vectorize --param riscv-autovec-preference=scalable (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c -O3 
-ftree-vector

Re: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-21 Thread juzhe.zh...@rivai.ai
Also, remove the math.h include.
Instead, please use __builtin_ceil.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-21 18:32
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization
From: Pan Li 
 
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
 
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into below insn (reference the implementation of llvm).
 
* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3
 
However, a floating-point value does not need the conversions above if
it has no fractional part. Consider the single-precision values below.
 
  +-----------+---------------+
  | float     | binary layout |
  +-----------+---------------+
  | 8388607.5 | 0x4affffff    |
  | 8388608.0 | 0x4b000000    |
  | 8388609.0 | 0x4b000001    |
  +-----------+---------------+
 
All single-precision values greater than or equal to 8388608.0 (2^23)
have no fractional part. We use vmflt and a mask to filter them out of
the vector and only perform the conversions on the masked elements.
 
Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call ceilf
  fsw fa0,-4(s1)
  bne s0,s2,.L3
 
After this patch:
  ...
  fsrmi   3
.L4:
  vfabs.v v0,v1
  vmv1r.v v2,v1
  vmflt.vv v0,v0,v4
  sub a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv   v2,v2,v1
  bne .L4
.L14:
  fsrm a6
  ret
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ceil2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector-iterators.md: Add VLS mode to VCONVERT.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-double.h: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-single.h: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   |  16 +++
gcc/config/riscv/riscv-protos.h   |   5 +
gcc/config/riscv/riscv-v.cc   | 116 ++
gcc/config/riscv/vector-iterators.md  |  12 ++
.../riscv/rvv/autovec/math-ceil-1.c   |  26 
.../riscv/rvv/autovec/math-ceil-2.c   |  26 
.../riscv/rvv/autovec/math-ceil-3.c   |  28 +
.../riscv/rvv/autovec/math-ceil-4.c   |  28 +
.../riscv/rvv/autovec/math-ceil-run-1.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-2.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-3.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-4.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-double.h  |  36 ++
.../riscv/rvv/autovec/math-ceil-run-single.h  |  36 ++
.../gcc.target/riscv/rvv/autovec/test-math.h  |  40 ++
15 files changed, 385 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-double.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-single.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 493d5745485..36ed839aa5b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2374,3 +2374,19 @@ (define_expand "avg3_ceil"
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
   DONE;
})
+
+;; -
+;; -

Re: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-21 Thread juzhe.zh...@rivai.ai
+(define_expand "ceil2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_ceil (operands[0], operands[1], mode, 
mode);
+DONE;
+  }

I think you should add !flag_trapping_math && !flag_rounding_math to the
pattern condition.

You can try -ftrapping-math or -frounding-math; LLVM fails to vectorize with
either of them.

Like x86:

(define_expand "round2"
  [(match_operand:X87MODEF 0 "register_operand")
   (match_operand:X87MODEF 1 "nonimmediate_operand")]
  "(TARGET_USE_FANCY_MATH_387
&& (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
  || TARGET_MIX_SSE_I387)
&& flag_unsafe_math_optimizations
&& (flag_fp_int_builtin_inexact || !flag_trapping_math))
   || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
   && !flag_trapping_math && !flag_rounding_math)"

Otherwise LGTM.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-21 18:32
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization
From: Pan Li 
 
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
 
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into the insns below (following the LLVM implementation).
 
* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3
 
However, the floating point value may not need the cvt as above if
it has no fractional part. For example, see the single precision values below.
 
  +-----------+---------------+
  | float     | binary layout |
  +-----------+---------------+
  | 8388607.5 | 0x4affffff    |
  | 8388608.0 | 0x4b000000    |
  | 8388609.0 | 0x4b000001    |
  +-----------+---------------+
 
All single precision values of magnitude 8388608.0 (2^23) or greater have no
fractional bits left in the mantissa, so they are already integers. We
leverage vmflt and a mask to filter them out of the vector and only do the
cvt on the masked lanes.
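
As a rough scalar model of what one vector lane does, the masked sequence can be sketched in C as below. This is only an illustration under stated assumptions: float is IEEE 754 binary32, the name ceil_lane_sketch is hypothetical, and the round-up conversion is emulated with a truncating cast plus a correction instead of the frm register.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of one lane; hypothetical helper, not GCC code.  */
float
ceil_lane_sketch (float x)
{
  const float limit = 8388608.0f;       /* 2^23 */
  float ax = x < 0.0f ? -x : x;         /* absolute value (vfabs.v) */

  /* Mask test (vmflt.vv): only lanes with |x| < 2^23 are converted.
     NaN compares false, so NaN lanes are passed through as well.  */
  if (!(ax < limit))
    return x;

  int32_t i = (int32_t) x;              /* truncating float-to-int */
  if ((float) i < x)                    /* emulate rounding up (RUP) */
    i += 1;
  float r = (float) i;                  /* int-to-float (vfcvt.f.x.v) */

  /* Sign injection (vfsgnj.vv): keep the sign of x, so that an input
     such as -0.5 yields -0.0 rather than +0.0.  */
  uint32_t rb, xb;
  memcpy (&rb, &r, sizeof rb);
  memcpy (&xb, &x, sizeof xb);
  rb = (rb & 0x7fffffffu) | (xb & 0x80000000u);
  memcpy (&r, &rb, sizeof r);
  return r;
}
```

Values at or above the limit never reach the integer conversion, which also keeps the cast within the range of int32_t.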
 
Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw         fa0,0(s0)
  addi        s0,s0,4
  addi        s1,s1,4
  call        ceilf
  fsw         fa0,-4(s1)
  bne         s0,s2,.L3
 
After this patch:
  ...
  fsrmi       3
.L4:
  vfabs.v     v0,v1
  vmv1r.v     v2,v1
  vmflt.vv    v0,v0,v4
  sub         a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv   v2,v2,v1
  bne         .L4
.L14:
  fsrm        a6
  ret
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ceil2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector-iterators.md: Add VLS mode to VCONVERT.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-double.h: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-single.h: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   |  16 +++
gcc/config/riscv/riscv-protos.h   |   5 +
gcc/config/riscv/riscv-v.cc   | 116 ++
gcc/config/riscv/vector-iterators.md  |  12 ++
.../riscv/rvv/autovec/math-ceil-1.c   |  26 
.../riscv/rvv/autovec/math-ceil-2.c   |  26 
.../riscv/rvv/autovec/math-ceil-3.c   |  28 +
.../riscv/rvv/autovec/math-ceil-4.c   |  28 +
.../riscv/rvv/autovec/math-ceil-run-1.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-2.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-3.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-4.c   |   4 +
.../riscv/rvv/autovec/math-ceil-run-double.h  |  36 ++
.../riscv/rvv/autovec/math-ceil-run-single.h  |  36 ++
.../gcc.target/riscv/rvv/autovec/test-math.h  |  40 ++
15 files changed, 385 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
create mode 10064

Re: [PATCH] RISC-V: Rename predicate vector_gs_scale_operand_16/32 to more generic names

2023-09-20 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-09-21 11:44
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Rename predicate vector_gs_scale_operand_16/32 to more 
generic names
This small patch renames vector_gs_scale_operand_16/32 to the more generic
names const_1_or_2/4_operand, so that they are easier to understand when
reused elsewhere.
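
For readers less familiar with .md predicates, the two renamed tests are equivalent to the C checks sketched below. The function names const_1_or_2_p and const_1_or_4_p are hypothetical stand-ins; the real predicates operate on const_int rtx operands via INTVAL.

```c
#include <stdbool.h>

/* C analogue of the match_test in const_1_or_2_operand.  */
bool
const_1_or_2_p (long value)
{
  return value == 1 || value == 2;
}

/* C analogue of the match_test in const_1_or_4_operand.  */
bool
const_1_or_4_p (long value)
{
  return value == 1 || value == 4;
}
```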
 
gcc/ChangeLog:
 
* config/riscv/predicates.md (const_1_or_2_operand): Rename.
(const_1_or_4_operand): Ditto.
(vector_gs_scale_operand_16): Ditto.
(vector_gs_scale_operand_32): Ditto.
* config/riscv/vector-iterators.md: Adjust.
 
---
gcc/config/riscv/predicates.md   | 16 
gcc/config/riscv/vector-iterators.md | 16 
2 files changed, 16 insertions(+), 16 deletions(-)
 
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 4bc7ff2c9d8..a4f03242f2c 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -70,6 +70,14 @@
   (and (match_code "const_int,const_wide_int,const_vector")
(match_test "op == CONST1_RTX (GET_MODE (op))")))
 
+(define_predicate "const_1_or_2_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
+
+(define_predicate "const_1_or_4_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
+
(define_predicate "reg_or_0_operand"
   (ior (match_operand 0 "const_0_operand")
(match_operand 0 "register_operand")))
@@ -463,14 +471,6 @@
   (ior (match_operand 0 "register_operand")
(match_code "const_vector")))
 
-(define_predicate "vector_gs_scale_operand_16"
-  (and (match_code "const_int")
-   (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
-
-(define_predicate "vector_gs_scale_operand_32"
-  (and (match_code "const_int")
-   (match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
-
(define_predicate "vector_gs_scale_operand_64"
   (and (match_code "const_int")
(match_test "INTVAL (op) == 1 || (INTVAL (op) == 8 && Pmode == 
DImode)")))
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 053d84c0c7d..a32d7e8d4e9 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -2723,18 +2723,18 @@
   (RVVMF4QI "const_1_operand") (RVVMF8QI "const_1_operand")
 
   (RVVM8HI "const_1_operand") (RVVM4HI "vector_gs_scale_operand_16_rv32")
-  (RVVM2HI "vector_gs_scale_operand_16") (RVVM1HI "vector_gs_scale_operand_16")
-  (RVVMF2HI "vector_gs_scale_operand_16") (RVVMF4HI 
"vector_gs_scale_operand_16")
+  (RVVM2HI "const_1_or_2_operand") (RVVM1HI "const_1_or_2_operand")
+  (RVVMF2HI "const_1_or_2_operand") (RVVMF4HI "const_1_or_2_operand")
 
   (RVVM8HF "const_1_operand") (RVVM4HF "vector_gs_scale_operand_16_rv32")
-  (RVVM2HF "vector_gs_scale_operand_16") (RVVM1HF "vector_gs_scale_operand_16")
-  (RVVMF2HF "vector_gs_scale_operand_16") (RVVMF4HF 
"vector_gs_scale_operand_16")
+  (RVVM2HF "const_1_or_2_operand") (RVVM1HF "const_1_or_2_operand")
+  (RVVMF2HF "const_1_or_2_operand") (RVVMF4HF "const_1_or_2_operand")
 
-  (RVVM8SI "vector_gs_scale_operand_32_rv32") (RVVM4SI 
"vector_gs_scale_operand_32") (RVVM2SI "vector_gs_scale_operand_32")
-  (RVVM1SI "vector_gs_scale_operand_32") (RVVMF2SI 
"vector_gs_scale_operand_32")
+  (RVVM8SI "vector_gs_scale_operand_32_rv32") (RVVM4SI "const_1_or_4_operand") 
(RVVM2SI "const_1_or_4_operand")
+  (RVVM1SI "const_1_or_4_operand") (RVVMF2SI "const_1_or_4_operand")
 
-  (RVVM8SF "vector_gs_scale_operand_32_rv32") (RVVM4SF 
"vector_gs_scale_operand_32") (RVVM2SF "vector_gs_scale_operand_32")
-  (RVVM1SF "vector_gs_scale_operand_32") (RVVMF2SF 
"vector_gs_scale_operand_32")
+  (RVVM8SF "vector_gs_scale_operand_32_rv32") (RVVM4SF "const_1_or_4_operand") 
(RVVM2SF "const_1_or_4_operand")
+  (RVVM1SF "const_1_or_4_operand") (RVVMF2SF "const_1_or_4_operand")
 
   (RVVM8DI "vector_gs_scale_operand_64") (RVVM4DI "vector_gs_scale_operand_64")
   (RVVM2DI "vector_gs_scale_operand_64") (RVVM1DI "vector_gs_scale_operand_64")
--
2.36.3
 


Re: [PATCH] RISC-V: Optimized for strided load/store with stride == element width[PR111450]

2023-09-20 Thread juzhe.zh...@rivai.ai
Thanks a lot. LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-21 11:12
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Optimized for strided load/store with stride == 
element width[PR111450]
From: xuli 
 
When stride == element width, vlse should be optimized into vle.v and
vsse should be optimized into vse.v.
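
The equivalence behind this optimization can be sketched in plain C. The two functions below are only scalar models of a 32-bit strided load and a unit-stride load; their names and signatures are assumptions made for illustration, not RVV intrinsics.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a strided load: element i is read from
   base + i * stride_bytes.  */
void
strided_load_i32 (int32_t *dst, const void *base,
                  ptrdiff_t stride_bytes, size_t vl)
{
  const unsigned char *p = base;
  for (size_t i = 0; i < vl; i++)
    memcpy (&dst[i], p + (ptrdiff_t) i * stride_bytes, sizeof (int32_t));
}

/* Model of a unit-stride load: VL consecutive elements.  */
void
unit_load_i32 (int32_t *dst, const void *base, size_t vl)
{
  memcpy (dst, base, vl * sizeof (int32_t));
}
```

With stride_bytes equal to sizeof (int32_t), both models read exactly the same bytes, which is why the cheaper unit-stride form can be used.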
 
PR target/111450
 
gcc/ChangeLog:
 
* config/riscv/constraints.md (c01): const_int 1.
(c02): const_int 2.
(c04): const_int 4.
(c08): const_int 8.
* config/riscv/predicates.md (vector_eew8_stride_operand): New predicate for 
stride operand.
(vector_eew16_stride_operand): Ditto.
(vector_eew32_stride_operand): Ditto.
(vector_eew64_stride_operand): Ditto.
* config/riscv/vector-iterators.md: New iterator for stride operand.
* config/riscv/vector.md: Add stride = element width constraint.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr111450.c: New test.
---
gcc/config/riscv/constraints.md   |  20 
gcc/config/riscv/predicates.md|  18 
gcc/config/riscv/vector-iterators.md  |  87 +++
gcc/config/riscv/vector.md|  42 +---
.../gcc.target/riscv/rvv/base/pr111450.c  | 100 ++
5 files changed, 250 insertions(+), 17 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111450.c
 
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 3f52bc76f67..964fdd450c9 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -45,6 +45,26 @@
   (and (match_code "const_int")
(match_test "ival == 0")))
+(define_constraint "c01"
+  "Constant value 1."
+  (and (match_code "const_int")
+   (match_test "ival == 1")))
+
+(define_constraint "c02"
+  "Constant value 2"
+  (and (match_code "const_int")
+   (match_test "ival == 2")))
+
+(define_constraint "c04"
+  "Constant value 4"
+  (and (match_code "const_int")
+   (match_test "ival == 4")))
+
+(define_constraint "c08"
+  "Constant value 8"
+  (and (match_code "const_int")
+   (match_test "ival == 8")))
+
(define_constraint "K"
   "A 5-bit unsigned immediate for CSR access instructions."
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 4bc7ff2c9d8..7845998e430 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -514,6 +514,24 @@
   (ior (match_operand 0 "const_0_operand")
(match_operand 0 "pmode_register_operand")))
+;; [1, 2, 4, 8] means strided load/store with stride == element width
+(define_special_predicate "vector_eew8_stride_operand"
+  (ior (match_operand 0 "pmode_register_operand")
+   (and (match_code "const_int")
+(match_test "INTVAL (op) == 1 || INTVAL (op) == 0"))))
+(define_special_predicate "vector_eew16_stride_operand"
+  (ior (match_operand 0 "pmode_register_operand")
+   (and (match_code "const_int")
+(match_test "INTVAL (op) == 2 || INTVAL (op) == 0"))))
+(define_special_predicate "vector_eew32_stride_operand"
+  (ior (match_operand 0 "pmode_register_operand")
+   (and (match_code "const_int")
+(match_test "INTVAL (op) == 4 || INTVAL (op) == 0"))))
+(define_special_predicate "vector_eew64_stride_operand"
+  (ior (match_operand 0 "pmode_register_operand")
+   (and (match_code "const_int")
+(match_test "INTVAL (op) == 8 || INTVAL (op) == 0"))))
+
;; A special predicate that doesn't match a particular mode.
(define_special_predicate "vector_any_register_operand"
   (match_code "reg"))
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 73df55a69c8..f85d1cc80d1 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -2596,6 +2596,93 @@
   (V512DI "V512BI")
])
+(define_mode_attr stride_predicate [
+  (RVVM8QI "vector_eew8_stride_operand") (RVVM4QI "vector_eew8_stride_operand")
+  (RVVM2QI "vector_eew8_stride_operand") (RVVM1QI "vector_eew8_stride_operand")
+  (RVVMF2QI "vector_eew8_stride_operand") (RVVMF4QI 
"vector_eew8_stride_operand")
+  (RVVMF8QI "vector_eew8_stride_operand")
+
+  (RVVM8HI "vector_eew16_stride_operand") (RVVM4HI 
"vector_eew16_stride_operand")
+  (RVVM2HI "vector_eew16_stride_operand") (RVVM1HI 
"vector_eew16_stride_operand")
+  (RVVMF2HI "vector_eew16_stride_operand") (RVVMF4HI 
"vec

Re: Re: [Committed] RISC-V: Fix Demand comparison bug[VSETVL PASS]

2023-09-20 Thread juzhe.zh...@rivai.ai
Yes. We could wait a few more days before backporting.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-21 00:41
To: Juzhe-Zhong
CC: GCC Patches; Kito Cheng; Jeff Law; Robin Dapp
Subject: Re: [Committed] RISC-V: Fix Demand comparison bug[VSETVL PASS]
Does it also happen on the GCC 13 branch? If so, please backport :)

On Wed, Sep 20, 2023 at 11:09, Juzhe-Zhong wrote:
This bug is exposed when we support VLS integer conversion patterns.

FAIL: c-c++-common/torture/pr53505.c execution.

This is caused by an incorrect vsetvl elimination in Phase 4:

   10318:   0d207057    vsetvli     zero,zero,e32,m4,ta,ma
   1031c:   5e003e57    vmv.v.i     v28,0
   .:   missed e8,m1 vsetvl
   10320:   7b07b057    vmsgtu.vi   v0,v16,15
   10324:   03083157    vadd.vi     v2,v16,-16

Regression testing on the release version of GCC shows no surprising differences.

Committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_insn_info::operator==): Fix bug.

---
 gcc/config/riscv/riscv-vsetvl.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index df980b6770e..e0f61148ef3 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1799,10 +1799,11 @@ vector_insn_info::operator== (const vector_insn_info 
&other) const
 if (m_demands[i] != other.demand_p ((enum demand_type) i))
   return false;

-  if (vector_config_insn_p (m_insn->rtl ())
-  || vector_config_insn_p (other.get_insn ()->rtl ()))
-if (m_insn != other.get_insn ())
-  return false;
+  /* We should consider different INSN demands as different
+ expression.  Otherwise, we will be doing incorrect vsetvl
+ elimination.  */
+  if (m_insn != other.get_insn ())
+return false;

   if (!same_avl_p (other))
 return false;
-- 
2.36.3



Re: Re: [PATCH V2] RISC-V: Support combine cond extend and reduce sum to widen reduce sum

2023-09-20 Thread juzhe.zh...@rivai.ai
Both approaches look weird to me.

Lehua's approach adds a const 0 move pattern that is only used by the widening
reduction, which is not ideal.
Also, I don't like changing the abs/vcond_mask predicate.

So, IMHO, a more complicated pattern that combines the initial 0 value +
extension + reduction + vmerge may be more reasonable.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-20 17:14
To: Lehua Ding; gcc-patches
CC: rdapp.gcc; juzhe.zhong; kito.cheng; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Support combine cond extend and reduce sum to 
widen reduce sum
Hi Lehua,
 
I think this is better but still a bit weird :D  Allowing constants
and forcing them into registers unconditionally is slightly dubious as
well, though.  One thing that always sticks out is - how is 0 special?
Wouldn't we want other constants as well?
 
For reductions I think the vectorizer always starts accumulating from
the initial neutral value 0 and adds any other scalar
initial value later.  But that could change?
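 
To make the shape of the loop concrete, here is a minimal C sketch of a conditional widening sum reduction, assuming 16-bit inputs widened to 32 bits. The function name cond_widen_sum and its parameters are hypothetical; the point is only that inactive lanes contribute the neutral value 0 and that the scalar initial value is added after the accumulation.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the elements of A selected by COND, widening each from 16 to
   32 bits before adding.  INIT is the scalar start value.  */
int32_t
cond_widen_sum (const int16_t *a, const uint8_t *cond, size_t n,
                int32_t init)
{
  int32_t acc = 0;                      /* neutral value of the sum */
  for (size_t i = 0; i < n; i++)
    acc += cond[i] ? (int32_t) a[i] : 0;
  return acc + init;                    /* scalar start value added last */
}
```

The constant 0 in the else arm is the operand that the patterns discussed in this thread have to match.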
 
For reference, attached is what I tried.  This gives me no regressions
and your tests work.  Your approach is more generic in case we want to
match future zero constants in other patterns (that we still needed
to adjust with force reg otherwise) but the force-reg thing appears
more "natural".
 
All in all, I would prefer the force-reg approach slightly but could also
live with this v2 despite some minor "usability" concerns.  Going to leave
the decision to you, either one is OK.
 
Regards
Robin
 
From 3be4cf4403a584d560c3923207a9c4da8dafee49 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 20 Sep 2023 10:15:36 +0200
Subject: [PATCH] lehua
 
---
gcc/config/riscv/autovec-opt.md | 52 -
gcc/config/riscv/autovec.md |  4 ++-
gcc/config/riscv/riscv-protos.h |  1 +
3 files changed, 55 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index a97a095691c..8d4ee2ae37f 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -103,12 +103,14 @@ (define_insn_and_split "*cond_abs"
 (if_then_else:VF
   (match_operand: 3 "register_operand")
   (abs:VF (match_operand:VF 1 "nonmemory_operand"))
-  (match_operand:VF 2 "register_operand")))]
+  (match_operand:VF 2 "nonmemory_operand")))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
   [(const_int 0)]
{
+  if (!REG_P (operands[2]))
+operands[2] = force_reg (mode, operands[2]);
   emit_insn (gen_cond_len_abs (operands[0], operands[3], operands[1],
 operands[2],
 gen_int_mode (GET_MODE_NUNITS (mode), Pmode),
@@ -1176,3 +1178,51 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; Combine mask extend + vredsum to mask vwredsum[u]
+(define_insn_and_split "*cond_widen_reduc_plus_scal_"
+  [(set (match_operand: 0 "register_operand")
+(unspec: [
+  (if_then_else:
+(match_operand: 1 "register_operand")
+(any_extend:
+  (match_operand:VI_QHS_NO_M8 2 "register_operand"))
+(match_operand: 3 "vector_const_0_operand"))
+] UNSPEC_REDUC_SUM))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx ops[] = {operands[0], operands[2], operands[1],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_reduction (,
+  riscv_vector::REDUCE_OP_M,
+  ops, CONST0_RTX (mode));
+  DONE;
+}
+[(set_attr "type" "vector")])
+
+;; Combine mask extend + vfredsum to mask vfwredusum
+(define_insn_and_split "*cond_widen_reduc_plus_scal_"
+  [(set (match_operand: 0 "register_operand")
+(unspec: [
+  (if_then_else:
+(match_operand: 1 "register_operand")
+(float_extend:
+  (match_operand:VF_HS_NO_M8 2 "register_operand"))
+(match_operand: 3 "vector_const_0_operand"))
+] UNSPEC_REDUC_SUM_UNORDERED))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx ops[] = {operands[0], operands[2], operands[1],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_reduction (UNSPEC_WREDUC_SUM_UNORDERED,
+  riscv_vector::REDUCE_OP_M_FRM_DYN,
+  ops, CONST0_RTX (mode));
+  DONE;
+}
+[(set_attr "type" "vector")])
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 75ed7ae4f2e..1c10e841692 100644
--- 

Re: [PATCH] RISC-V: Reorganize and rename combine patterns in autovec-opt.md

2023-09-20 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-09-20 15:03
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Reorganize and rename combine patterns in 
autovec-opt.md
This patch reorganizes and renames the combine patterns in autovec-opt.md
by category. There shouldn't be any functional changes.
The current classification includes the following categories:
 
- Combine op + vmerge to cond_op
- Combine binop + trunc to narrow_binop
- Combine extend + binop to widen_binop
- Combine extend + ternop to widen_ternop
- Misc combine patterns
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md (*not): Move and rename.
(*n): Ditto.
(*vtrunc): Ditto.
(*trunc): Ditto.
(*narrow_): Ditto.
(*narrow__scalar): Ditto.
(*single_widen_mult): Ditto.
(*single_widen_mul): Ditto.
(*single_widen_mult): Ditto.
(*single_widen_mul): Ditto.
(*dual_widen_fma): Ditto.
(*dual_widen_fma): Ditto.
(*single_widen_fma): Ditto.
(*single_widen_fma): Ditto.
(*dual_fma): Ditto.
(*single_fma): Ditto.
(*dual_fnma): Ditto.
(*dual_widen_fnma): Ditto.
(*single_fnma): Ditto.
(*single_widen_fnma): Ditto.
(*dual_fms): Ditto.
(*dual_widen_fms): Ditto.
(*single_fms): Ditto.
(*single_widen_fms): Ditto.
(*dual_fnms): Ditto.
(*dual_widen_fnms): Ditto.
(*single_fnms): Ditto.
(*single_widen_fnms): Ditto.
 
---
gcc/config/riscv/autovec-opt.md | 203 ++--
1 file changed, 91 insertions(+), 112 deletions(-)
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 66c77ad6ebb..46a344407c7 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -58,104 +58,6 @@
   }
)
 
-;; -
-;;  [BOOL] Binary logical operations (inverted second input)
-;; -
-;; Includes:
-;; - vmandnot.mm
-;; - vmornot.mm
-;; -
-
-(define_insn_and_split "*not"
-  [(set (match_operand:VB_VLS 0 "register_operand"   "=vr")
- (bitmanip_bitwise:VB_VLS
-   (not:VB_VLS (match_operand:VB_VLS 2 "register_operand" " vr"))
-   (match_operand:VB_VLS 1 "register_operand" " vr")))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-  {
-insn_code icode = code_for_pred_not (, mode);
-riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_MASK_OP, 
operands);
-DONE;
-  }
-  [(set_attr "type" "vmalu")
-   (set_attr "mode" "")])
-
-;; -
-;;  [BOOL] Binary logical operations (inverted result)
-;; -
-;; Includes:
-;; - vmnand.mm
-;; - vmnor.mm
-;; - vmxnor.mm
-;; -
-
-(define_insn_and_split "*n"
-  [(set (match_operand:VB_VLS 0 "register_operand" "=vr")
- (not:VB_VLS
-   (any_bitwise:VB_VLS
- (match_operand:VB_VLS 1 "register_operand" " vr")
- (match_operand:VB_VLS 2 "register_operand" " vr"))))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-  {
-insn_code icode = code_for_pred_n (, mode);
-riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_MASK_OP, 
operands);
-DONE;
-  }
-  [(set_attr "type" "vmalu")
-   (set_attr "mode" "")])
-
-;; -
-;;  [INT] Binary narrow shifts.
-;; -
-;; Includes:
-;; - vnsrl.wv/vnsrl.wx/vnsrl.wi
-;; - vnsra.wv/vnsra.wx/vnsra.wi
-;; -
-
-(define_insn_and_split "*vtrunc"
-  [(set (match_operand: 0 "register_operand"   "=vr,vr")
-(truncate:
-  (any_shiftrt:VWEXTI
-(match_operand:VWEXTI 1 "register_operand" " vr,vr")
- (any_extend:VWEXTI
-  (match_operand: 2 "vector_shift_operand" " vr,vk")))))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  insn_code icode = code_for_pred_narrow (, mode);
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
-  DONE;
-}
- [(set_attr "type" "vnshift")
-  (set_attr "mode" "")])
-
-(define_insn_and_split "*trunc"
-  [(set (match_operand: 0 "regist

Re: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization

2023-09-19 Thread juzhe.zh...@rivai.ai
+;; -
+;;  [FP] Math.h.
+;; -
+;; Includes:
+;; - ceil/ceilf
+;; -
+(define_expand "ceil2"
+  [(match_operand:VF 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+rtx tmp = gen_reg_rtx (mode);
+rtx ops_1[] = {tmp, operands[1]};
+insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, mode);
+
+/* vfcvt.x.f with rounding up (aka ceil).  */
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, 
ops_1);
+
+rtx ops_2[] = {operands[0], tmp};
+icode = code_for_pred (FLOAT, mode);
+
+/* vfcvt.f.x for the final result.  To avoid unnecessary frm register
+   access, we use RUP here and it will never do the rounding up because
+   the tmp rtx comes from the float to int conversion.  */
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, 
ops_2);
+
+DONE;
+  }
+)

It should be "V_VLSF" instead of "VF" so that you could also support VLS CEIL.

Besides, I would like to see the following case:

a[i] = cond[i] ? CEIL (b[i]): c[i];

Ideally, we should be able to combine vfcvt + vmerge into vfcvt with mask.
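
A minimal C sketch of the two forms, for illustration only. The helper ceil_small is a hypothetical stand-in for CEIL that is valid only for magnitudes below 2^23 and ignores the sign of zero; cond_ceil_merge and cond_ceil_masked are likewise assumed names.

```c
#include <stddef.h>

/* Stand-in for CEIL, valid only for |x| < 2^23.  */
float
ceil_small (float x)
{
  int i = (int) x;                      /* truncate toward zero */
  if ((float) i < x)                    /* round up when truncation lost bits */
    i += 1;
  return (float) i;
}

/* Unfused form: convert every lane, then select (vfcvt + vmerge).  */
void
cond_ceil_merge (float *a, const float *b, const float *c,
                 const unsigned char *cond, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float t = ceil_small (b[i]);
      a[i] = cond[i] ? t : c[i];
    }
}

/* Fused form: start from the merge value and convert only the active
   lanes (vfcvt with mask).  */
void
cond_ceil_masked (float *a, const float *b, const float *c,
                  const unsigned char *cond, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[i] = c[i];
      if (cond[i])
        a[i] = ceil_small (b[i]);
    }
}
```

Both functions produce the same array; the fused form simply skips the conversion work for inactive lanes.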




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-20 10:30
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
From: Pan Li 
 
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
 
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into the insns below (following the LLVM implementation).
 
* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3
 
The conditional auto-vectorization for ceil/ceilf is also supported
and covered by test cases.
 
Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw         fa0,0(s0)
  addi        s0,s0,4
  addi        s1,s1,4
  call        ceilf
  fsw         fa0,-4(s1)
  bne         s0,s2,.L3
 
After this patch:
  ...
  fsrmi       3
.L4:
  vsetvli     a5,a2,e32,m1,ta,ma
  vle32.v     v1,0(a1)
  vsetvli     a3,zero,e32,m1,ta,ma
  slli        a4,a5,2
  vfcvt.x.f.v v1,v1
  sub         a2,a2,a5
  vfcvt.f.x.v v1,v1
  vsetvli     zero,a5,e32,m1,ta,ma
  vse32.v     v1,0(a0)
  add         a1,a1,a4
  add         a0,a0,a4
  bne         a2,zero,.L4
.L14:
  fsrm        a6
  ret
 
Please note that VLS mode is not involved in this patch and will be taken
care of in follow-up patches soon.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ceil2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
* config/riscv/riscv-v.cc: Handle rounding up.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 30 +
gcc/config/riscv/riscv-protos.h   |  4 ++
gcc/config/riscv/riscv-v.cc   |  2 +
.../riscv/rvv/autovec/math-ceil-1.c   | 21 +
.../riscv/rvv/autovec/math-ceil-2.c   | 21 +
.../riscv/rvv/autovec/math-ceil-3.c   | 24 ++
.../riscv/rvv/autovec/math-ceil-4.c   | 24 ++
.../riscv/rvv/autovec/math-ceil-run-1.c   | 24 ++
.../riscv/rvv/autovec/math-ceil-run-2.c   | 24 ++
.../gcc.target/riscv/rvv/autovec/test-math.h  | 45 +++
10 files changed, 219 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 493d5745485..ea508d81047 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2374,3 +2374,33 @@ (define_expand "avg3_ceil"
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
   DONE;
})
+
+;; -
+;;  [FP] Math.h.
+;; -

Re: Re: [Committed] RISC-V: Support VLS unary floating-point patterns

2023-09-19 Thread juzhe.zh...@rivai.ai
I think we could remove math.h.

Hi, @Patrick. Could you verify it?

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
index 2292372d7a3..674098e9ba6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
@@ -1,5 +1,4 @@
 #include 
-#include <math.h>

and commit it.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-20 08:52
To: 钟居哲
CC: Patrick O'Neill; Robin Dapp; gcc-patches; Kito.cheng; jeffreyalaw; palmer; 
Edwin Lu; joern.rennecke; jeremy.bennett; gnu-toolchain
Subject: Re: Re: [Committed] RISC-V: Support VLS unary floating-point patterns
It seems to be because of math.h, a similar issue to stdint.h. Is math.h
necessary for the test case?

On Wed, Sep 20, 2023 at 08:44, juzhe.zh...@rivai.ai wrote:
I didn't see this issue.
They should be bogus FAILs.
We should either fix the testcases or ignore them.



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-09-20 08:34
To: Juzhe-Zhong; Robin Dapp; gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; Palmer Dabbelt; Edwin Lu; 
joern.rennecke; jeremy.bennett; gnu-toolchain
Subject: Re: [Committed] RISC-V: Support VLS unary floating-point patterns
Hi,
 
This patch highlights an issue Edwin and I have been having with the
testsuite where rv64 testcases are run when testing rv32gcv.
 
There's a large number of new failures in the rv32gcv testsuite from
this seemingly innocuous patch.
 
https://github.com/ewlu/riscv-gnu-toolchain/issues/166
(The repo is still a WIP - eventually will be non-gating patchworks
pre-commit CI)
 
From Edwin and my investigation the failures for rv32gcv look like [1].
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11:
 
fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.
 
Top of the failing testcase:
/* { dg-do compile } */
/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 
-fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8" } */
 
#include "def.h"
 
The dg-options explicitly set rv64gcv, so I don't think this testcase
should even be executed.
 
For the 3 new failures on rv64gcv, they all explicitly set rv32gcv.
/* { dg-options "-march=rv32gcv -mabi=ilp32d -O3" } */
 
These are seen on non-multilib builds. Multilib rv32/64gc does not
appear to have the same issue when compiling (we're currently testing
multilib rv32/64gcv to see if they encounter issues when executing).
 
Are other people seeing similar errors/is this a known issue?
 
Patrick
 
[1]:
Executing on host: 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c
 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output  
-O3 -ftree-vectorize --param riscv-autovec-preference=scalable 
-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects 
-fno-ident -S   -o floating-point-mul-3.s(timeout = 600)
spawn -ignore SIGHUP 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c
 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-O3 -ftree-vectorize --param riscv-autovec-preference=scalable 
-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects 
-fno-ident -S -o floating-point-mul-3.s
In file included from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/features.h:515,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/bits/libc-header-start.h:33,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/math.h:27,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h:2,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c:4:
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11:
 
fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.
compiler exited with status 1
FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -O3 
-ftree-vectorize

Re: Re: [Committed] RISC-V: Support VLS unary floating-point patterns

2023-09-19 Thread juzhe.zh...@rivai.ai
I didn't see this issue.
These look like bogus FAILs.
We should either fix the testcases or ignore them.



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-09-20 08:34
To: Juzhe-Zhong; Robin Dapp; gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; Palmer Dabbelt; Edwin Lu; 
joern.rennecke; jeremy.bennett; gnu-toolchain
Subject: Re: [Committed] RISC-V: Support VLS unary floating-point patterns
Hi,
 
This patch highlights an issue Edwin and I have been having with the
testsuite where rv64 testcases are run when testing rv32gcv.
 
There's a large number of new failures in the rv32gcv testsuite from
this seemingly innocuous patch.
 
https://github.com/ewlu/riscv-gnu-toolchain/issues/166
(The repo is still a WIP - eventually will be non-gating patchworks
pre-commit CI)
 
From Edwin and my investigation the failures for rv32gcv look like [1].
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11:
 
fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.
 
Top of the failing testcase:
/* { dg-do compile } */
/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 
-fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8" } */
 
#include "def.h"
 
The dg-options explicitly set rv64gcv, so I don't think this testcase
should even be executed.
 
For the 3 new failures on rv64gcv, they all explicitly set rv32gcv.
/* { dg-options "-march=rv32gcv -mabi=ilp32d -O3" } */
 
These are seen on non-multilib builds. Multilib rv32/64gc does not
appear to have the same issue when compiling (we're currently testing
multilib rv32/64gcv to see if they encounter issues when executing).
 
Are other people seeing similar errors/is this a known issue?
 
Patrick
 
[1]:
Executing on host: 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c
 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output  
-O3 -ftree-vectorize --param riscv-autovec-preference=scalable 
-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects 
-fno-ident -S   -o floating-point-mul-3.s(timeout = 600)
spawn -ignore SIGHUP 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c
 
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-O3 -ftree-vectorize --param riscv-autovec-preference=scalable 
-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns 
-fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects 
-fno-ident -S -o floating-point-mul-3.s
In file included from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/features.h:515,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/bits/libc-header-start.h:33,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/math.h:27,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h:2,
 from 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c:4:
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11:
 
fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.
compiler exited with status 1
FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -O3 
-ftree-vectorize --param riscv-autovec-preference=scalable (test for 
excess errors)
 
On 9/19/23 04:26, Juzhe-Zhong wrote:
> Extend current VLA patterns with VLS modes.
>
> Regression all passed.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Extend VLS modes.
> * config/riscv/vector.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vls/def.h: Add unary test.
> * gcc.target/riscv/rvv/autovec/vls/neg-2.c: New test.
>
> ---
>   gcc/config/riscv/autovec.md   | 12 ++---
>   gcc/config/riscv/vector.md| 20 +++
>   .../gcc.target/riscv/rvv/autovec/vls/def.h|  3 +-
>   .../gcc.target/riscv/rvv/autovec/vls/neg-2.c  | 52 +++
>   4 files changed, 70 insertions(+), 17 deletions(-)
>   create mode 100644 gcc/testsui

Re: Re: [PATCH v1] RISC-V: Fix one ICE for vect test vect-multitypes-5

2023-09-19 Thread juzhe.zh...@rivai.ai
Thanks for reporting it.

Could you try this and verify for me?

-  rtx src_op_0 = XEXP (src, 0);
-
-  if (GET_CODE (src) == CONST && GET_CODE (src_op_0) == PLUS
-&& CONST_POLY_INT_P (XEXP (src_op_0, 1)))
+  if (GET_CODE (src) == CONST && GET_CODE (XEXP (src, 0)) == PLUS
+&& CONST_POLY_INT_P (XEXP (XEXP (src, 0), 1)))
 {
   rtx dest_tmp = gen_reg_rtx (mode);
   rtx tmp = gen_reg_rtx (mode);

-  riscv_emit_move (dest, XEXP (src_op_0, 0));
-  riscv_legitimize_poly_move (mode, dest_tmp, tmp, XEXP (src_op_0, 1));
+  riscv_emit_move (dest, XEXP (XEXP (src, 0), 0));
+  riscv_legitimize_poly_move (mode, dest_tmp, tmp, XEXP (XEXP (src, 0), 1));

If it fixes your issue, please send a patch and commit it.
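
The reason the reordering should help can be sketched with a toy model. This is not GCC code: Node and xexp are hypothetical stand-ins for rtx and for XEXP under --enable-checking=rtl, and the sketch assumes the ICE comes from reading operand 0 of an rtx before checking that it is a CONST.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Toy stand-in for an rtx: a code plus its rtx operands. In this model a
// CONST_INT carries no rtx operands at all.
enum Code { CONST, PLUS, SYMBOL_REF, CONST_POLY_INT, CONST_INT };

struct Node {
  Code code;
  std::vector<const Node *> ops;
};

// Checked accessor, loosely modelled on XEXP with RTL checking enabled:
// reading an operand that does not exist is a hard error.
const Node *xexp(const Node *n, unsigned i) {
  if (i >= n->ops.size())
    throw std::logic_error("RTL check: bad operand access");
  return n->ops[i];
}

// Shape of the committed code: operand 0 is read before the code of SRC
// has been looked at.
bool matches_eager(const Node *src) {
  const Node *op0 = xexp(src, 0);
  return src->code == CONST && op0->code == PLUS
         && xexp(op0, 1)->code == CONST_POLY_INT;
}

// Shape of the proposed code: && short-circuits, so operands are only
// read once SRC is known to be a CONST.
bool matches_guarded(const Node *src) {
  return src->code == CONST && xexp(src, 0)->code == PLUS
         && xexp(xexp(src, 0), 1)->code == CONST_POLY_INT;
}
```

In this model, a CONST_INT source makes matches_eager throw, while matches_guarded simply returns false. Both agree on the (const (plus (symbol_ref) (const_poly_int))) shape.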

Thanks.



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-09-19 01:38
To: Li, Pan2; Kito Cheng
CC: gcc-patches@gcc.gnu.org; Wang, Yanzhang; juzhe.zh...@rivai.ai; Palmer 
Dabbelt
Subject: Re: [PATCH v1] RISC-V: Fix one ICE for vect test vect-multitypes-5
Hi,
 
After this patch, there is now an ICE when bootstrapping with
--enable-checking=rtl on rv32gc.
 
More details:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111461
 
Thanks,
Patrick
 
On 8/29/23 07:40, Li, Pan2 via Gcc-patches wrote:
> Committed, thanks Kito.
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Tuesday, August 29, 2023 9:46 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; Wang, Yanzhang ; 
> juzhe.zh...@rivai.ai
> Subject: Re: [PATCH v1] RISC-V: Fix one ICE for vect test vect-multitypes-5
>
> LGTM, thanks :)
>
> On Tue, Aug 29, 2023 at 6:50 PM Pan Li via Gcc-patches
>  wrote:
>> From: Pan Li 
>>
>> An ICE occurs when building vect-multitypes-5.c with a command like the one below:
>>
>> riscv64-unknown-elf-gcc -O3 \
>>-march=rv64imafdcv -mabi=lp64d -mcmodel=medlow \
>>-fdiagnostics-plain-output -flto -ffat-lto-objects \
>>--param riscv-autovec-preference=scalable -Wno-psabi \
>>-ftree-vectorize -fno-tree-loop-distribute-patterns \
>>-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details \
>>gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c -o test.elf -lm
>>
>> The RTL below is not handled in riscv_legitimize_const_move, so it
>> falls through to the default path. The default force_const_mem then
>> returns NULL_RTX, and operating on that NULL_RTX triggers the ICE.
>>
>> (const:DI
>>(plus:DI
>>  (symbol_ref:DI ("ic") [flags 0x2] )
>>  (const_poly_int:DI [16, 16])))
>>
>> This patch handles this RTL in riscv_legitimize_const_move.
>>
>> Signed-off-by: Pan Li 
>> Co-Authored-By: Ju-Zhe Zhong 
>>
>> gcc/ChangeLog:
>>
>>  * config/riscv/riscv.cc (riscv_legitimize_poly_move): New 
>> declaration.
>>  (riscv_legitimize_const_move): Handle ref plus const poly.
>> ---
>>   gcc/config/riscv/riscv.cc | 23 +++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 1d6e278ea90..bab6ed70b2d 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -366,6 +366,7 @@ static const struct riscv_tune_param 
>> optimize_size_tune_info = {
>>
>>   static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool 
>> *);
>>   static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
>> +static void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
>>
>>   /* Defining target-specific uses of __attribute__.  */
>>   static const struct attribute_spec riscv_attribute_table[] =
>> @@ -2118,6 +2119,28 @@ riscv_legitimize_const_move (machine_mode mode, rtx 
>> dest, rtx src)
>> return;
>>   }
>>
>> +  /* Handle below format.
>> + (const:DI
>> +   (plus:DI
>> +(symbol_ref:DI ("ic") [flags 0x2] ) <- 
>> op_0
>> +(const_poly_int:DI [16, 16]) // <- op_1
>> + ))
>> +   */
>> +  rtx src_op_0 = XEXP (src, 0);
>> +
>> +  if (GET_CODE (src) == CONST && GET_CODE (src_op_0) == PLUS
>> +&& CONST_POLY_INT_P (XEXP (src_op_0, 1)))
>> +{
>> +  rtx dest_tmp = gen_reg_rtx (mode);
>> +  rtx tmp = gen_reg_rtx (mode);
>> +
>> +  riscv_emit_move (dest, XEXP (src_op_0, 0));
>> +  riscv_legitimize_poly_move (mode, dest_tmp, tmp, XEXP (src_op_0, 1));
>> +
>> +  emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, dest, dest_tmp)));
>> +  return;
>> +}
>> +
>> src = force_const_mem (mode, src);
>>
>> /* When using explicit relocs, constant pool references are sometimes
>> --
>> 2.34.1
>>
 


Re: [PATCH] RISC-V: Refactor and cleanup fma patterns

2023-09-18 Thread juzhe.zh...@rivai.ai
Thanks for the refactoring.

This patch is needed in VLS fma support and undefined value enabling support.

LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-09-18 19:37
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Refactor and cleanup fma patterns
At present, the FMA autovec patterns do not fully reuse the corresponding patterns
in vector.md. The reason was that the merge operand of the patterns in
vector.md could not be VUNDEF. Now that VUNDEF is allowed, this patch moves the insns
used by the reload pass back into vector.md, and the corresponding vlmax patterns in
autovec.md are used for combine. This patch also refactors the corresponding combine
patterns in autovec-opt.md and removes the unused ones.
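
For reference, the operations behind the four standard names touched here can be written out per element. This is a plain, unfused sketch: the *_ref helper names are hypothetical, the real floating-point patterns round only once, and the 16-bit to 32-bit widths in the widening helper are an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Per-element reference semantics of the fma-family standard names.
template <typename T> constexpr T fma_ref(T a, T b, T c) { return a * b + c; }
template <typename T> constexpr T fnma_ref(T a, T b, T c) { return -(a * b) + c; }
template <typename T> constexpr T fms_ref(T a, T b, T c) { return a * b - c; }
template <typename T> constexpr T fnms_ref(T a, T b, T c) { return -(a * b) - c; }

// "ext + ext + fma ==> widen fma": both multiplicands are extended to the
// wider type before the multiply-add, so the product cannot wrap in the
// narrow type.
constexpr int32_t dual_widen_fma_ref(int16_t a, int16_t b, int32_t c) {
  return static_cast<int32_t>(a) * static_cast<int32_t>(b) + c;
}
```

The widening form is what the combine patterns look for: two extensions feeding a multiply-add, folded into one widening instruction.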
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md (*_fma):
Removed old combine patterns.
(*single_mult_plus): Ditto.
(*double_mult_plus): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
(*double_widen_fma): Ditto.
(*single_widen_fma): Ditto.
(*double_widen_fnma): Ditto.
(*single_widen_fnma): Ditto.
(*double_widen_fms): Ditto.
(*single_widen_fms): Ditto.
(*double_widen_fnms): Ditto.
(*single_widen_fnms): Ditto.
(*reduc_plus_scal_): Adjust name.
(*widen_reduc_plus_scal_): Adjust name.
(*dual_widen_fma): New combine pattern.
(*dual_widen_fmasu): Ditto.
(*dual_widen_fmaus): Ditto.
(*dual_fma): Ditto.
(*single_fma): Ditto.
(*dual_fnma): Ditto.
(*single_fnma): Ditto.
(*dual_fms): Ditto.
(*single_fms): Ditto.
(*dual_fnms): Ditto.
(*single_fnms): Ditto.
* config/riscv/autovec.md (fma4):
Refactor fma pattern.
(*fma): Removed.
(fnma4): Refactor.
(*fnma): Removed.
(*fma): Removed.
(*fnma): Removed.
(fms4): Refactor.
(*fms): Removed.
(fnms4): Refactor.
(*fnms): Removed.
* config/riscv/riscv-protos.h (prepare_ternary_operands):
Adjust prototype.
* config/riscv/riscv-v.cc (prepare_ternary_operands): Refactor.
* config/riscv/vector.md (*pred_mul_plus_undef): New pattern.
(*pred_mul_plus): Removed.
(*pred_mul_plus_scalar): Removed.
(*pred_mul_plus_extended_scalar): Removed.
(*pred_minus_mul_undef):  New pattern.
(*pred_minus_mul): Removed.
(*pred_minus_mul_scalar): Removed.
(*pred_minus_mul_extended_scalar): Removed.
(*pred_mul__undef):  New pattern.
(*pred_mul_): Removed.
(*pred_mul__scalar): Removed.
(*pred_mul_neg__undef):  New pattern.
(*pred_mul_neg_): Removed.
(*pred_mul_neg__scalar): Removed.
 
---
gcc/config/riscv/autovec-opt.md | 736 ++--
gcc/config/riscv/autovec.md | 301 -
gcc/config/riscv/riscv-protos.h |   2 +-
gcc/config/riscv/riscv-v.cc |  14 +-
gcc/config/riscv/vector.md  | 439 ++-
5 files changed, 528 insertions(+), 964 deletions(-)
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index df516849527..c94cd0ae087 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -110,166 +110,6 @@
   [(set_attr "type" "vmalu")
(set_attr "mode" "")])
 
-;; =
-;; == Widening Ternary arithmetic
-;; =
-
-;; -
-;;  [INT] VWMACC
-;; -
-;; Includes:
-;; - vwmacc.vv
-;; - vwmaccu.vv
-;; -
-
-;; Combine ext + ext + fma ===> widen fma.
-;; Most of circumstantces, LoopVectorizer will generate the following IR:
-;;   vect__8.64_40 = (vector([4,4]) int) vect__7.63_41;
-;;   vect__11.68_35 = (vector([4,4]) int) vect__10.67_36;
-;;   vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45);
-(define_insn_and_split "*_fma"
-  [(set (match_operand:VWEXTI 0 "register_operand")
- (plus:VWEXTI
-   (mult:VWEXTI
- (any_extend:VWEXTI
-   (match_operand: 2 "register_operand"))
- (any_extend:VWEXTI
-   (match_operand: 3 "register_operand")))
-   (match_operand:VWEXTI 1 "register_operand")))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-  {
-riscv_vector::emit_vlmax_insn (code_for_pred_widen_mul_plus (, 
mode),
- riscv_vector::WIDEN_TERNARY_OP, operands);
-DONE;
-  }
-  [(set_attr "type" "viwmuladd")
-   (set_attr "mode" "")])
-
-;; This helps to match ext + fma.
-(define_insn_and_split "*single_mult_plus"
-  [(set (match_operand:VWEXTI 0 "register_operand")
- (plus:VWEXTI
-   (mult:VWEXTI
- (any_extend:VWEXTI
-   (match_operand: 2 "register_operand"))
- (match_operand:VWEXTI 3 "register_operand"))
-   (match_operand:VWEXTI 1 "register_operand")

Re: [PATCH] RISC-V: Fix RVV can change mode class bug

2023-09-18 Thread juzhe.zh...@rivai.ai
Sorry, I made a mistake here.

I changed 'maybe_lt' to '!ordered_p' in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630835.html 





juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-09-19 10:25
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix RVV can change mode class bug
After adding support for VLS mode conversion, the current case triggers a latent
bug that we were lucky not to have hit before.
 
This is a real bug in 'cprop_hardreg':
 
orig:RVVMF8BI,16,16
new:V32BI,32,0
during RTL pass: cprop_hardreg
auto.c: In function 'main':
auto.c:79:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
   79 | }
  | ^
0x10979a7 partial_subreg_p(machine_mode, machine_mode)
../../../../gcc/gcc/rtl.h:3186
0x1723eda mode_change_ok
../../../../gcc/gcc/regcprop.cc:402
0x1724007 maybe_mode_change
../../../../gcc/gcc/regcprop.cc:436
0x172445d find_oldest_value_reg
../../../../gcc/gcc/regcprop.cc:489
0x172534d copyprop_hardreg_forward_1
../../../../gcc/gcc/regcprop.cc:808
0x1727017 cprop_hardreg_bb
../../../../gcc/gcc/regcprop.cc:1358
0x17272f7 execute
../../../../gcc/gcc/regcprop.cc:1425
 
It happens when trying to do reg copy propagation between RVVMF8BI (precision = 16,16)
and V32BI (precision = 32,0).
 
The assertion failed in partial_subreg_p:
gcc_checking_assert (ordered_p (outer_prec, inner_prec));
 
In regcprop.cc:
 
  if (partial_subreg_p (orig_mode, new_mode))
return false;
 
If orig_mode (RVVMF8BI) is smaller than new_mode (V32BI), we don't do the hard reg
propagation.
However, 'partial_subreg_p' itself causes the ICE, because of
gcc_checking_assert (ordered_p (outer_prec, inner_prec)).
 
Looking at aarch64.cc, it carefully blocks such cases in
'TARGET_CAN_CHANGE_MODE_CLASS'.
So it is reasonable to block regcprop when the old mode size is maybe_lt the new
mode size, since we won't do the copy propagation in that case anyway.
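 
The comparison can be worked through with a simplified two-coefficient model of a poly_int. This is a sketch, not GCC's poly-int.h: Poly and the constants are illustrative, with the precisions taken from the figures quoted in this patch.

```cpp
#include <cassert>

// Simplified model of a two-coefficient poly_int: c0 + c1 * x, where x is
// an unknown runtime value with x >= 0.
struct Poly {
  long c0;
  long c1;
};

// True if A may be less than B for some x >= 0.
constexpr bool maybe_lt(Poly a, Poly b) {
  return a.c1 < b.c1 || a.c0 < b.c0;
}

// True if A <= B for every x >= 0.
constexpr bool known_le(Poly a, Poly b) { return !maybe_lt(b, a); }

// True if A and B compare the same way for every x >= 0.
constexpr bool ordered_p(Poly a, Poly b) {
  return known_le(a, b) || known_le(b, a);
}

// Precisions as quoted in the patch description and comment.
constexpr Poly RVVMF8BI{16, 16};
constexpr Poly RVVMF4{32, 32};
constexpr Poly V32BI{32, 0};
```

With x = 0, RVVMF8BI is 16 bits and smaller than V32BI; with x = 2 it is 48 bits and larger. The pair is therefore not ordered, which is exactly what the assertion in partial_subreg_p rejects. RVVMF4 is at least 32 bits for every x, so that pair is ordered and maybe_lt is false.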
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_can_change_mode_class): Fix RVV mode change bug.
 
---
gcc/config/riscv/riscv.cc | 16 +++-
1 file changed, 15 insertions(+), 1 deletion(-)
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8c766e2e2be..28b45a87351 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8536,8 +8536,22 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
/* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
static bool
-riscv_can_change_mode_class (machine_mode, machine_mode, reg_class_t rclass)
+riscv_can_change_mode_class (machine_mode from, machine_mode to, reg_class_t 
rclass)
{
+  /* We have RVV VLS modes and VLA modes sharing same REG_CLASS.
+ In 'cprop_hardreg' stage, we will try to do hard reg copy propagation
+ between wider mode (FROM) and narrow mode (TO).
+
+ E.g. We should not allow copy propagation
+ - RVVMF8BI (precision = [16, 16]) -> V32BI (precision = [32, 0])
+ since such propagation cause ICE and execution FAIL.
+
+ However, we could allow copy propagation
+ - RVVMF4 (precision = [32, 32]) -> V32BI (precision = [32, 0])
+ since RVVMF4 always >= RV32BI.  */
+  if (reg_classes_intersect_p (V_REGS, rclass)
+  && maybe_lt (GET_MODE_PRECISION (from), GET_MODE_PRECISION (to)))
+return false;
   return !reg_classes_intersect_p (FP_REGS, rclass);
}
-- 
2.36.3
 


Re: [PATCH] RISC-V: Removed misleading comments in testcases

2023-09-18 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-09-18 20:29
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Removed misleading comments in testcases
This patch removes the misleading comments in the testcases, since folding
min (int, poly) to a constant is now supported by this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629651.html).
As a result, csrr no longer appears in the assembly code, even when
some VLS vector patterns are not supported.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/div-1.c: Removed comments.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.
 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c   | 1 -
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c | 1 -
2 files changed, 2 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c
index 40224c69458..e36fa9decfd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c
@@ -54,5 +54,4 @@ DEF_OP_VV (div, 256, int64_t, /)
DEF_OP_VV (div, 512, int64_t, /)
 
/* { dg-final { scan-assembler-times 
{vdivu?\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
-/* TODO: Ideally, we should make sure there is no "csrr vlenb". However, we 
still have 'csrr vlenb' for some cases since we don't support VLS mode 
conversion which are needed by division.  */
/* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
index b34a349949b..db2295b2dd6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
@@ -54,5 +54,4 @@ DEF_OP_VV (shift, 256, int64_t, <<)
DEF_OP_VV (shift, 512, int64_t, <<)
 
/* { dg-final { scan-assembler-times {vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 
41 } } */
-/* TODO: Ideally, we should make sure there is no "csrr vlenb". However, we 
still have 'csrr vlenb' for some cases since we don't support VLS mode 
conversion which are needed by division.  */
/* { dg-final { scan-assembler-not {csrr} } } */
--
2.36.3
 


Re: [PATCH] RISC-V: Add fixed PR111255 testcase by other patch

2023-09-18 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-09-18 20:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Add fixed PR111255 testcase by other patch
This patch adds the missing PR111255 testcase, which was fixed by this
committed patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628922.html).
 
PR target/111255
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/pr111255.c: New test.
 
---
.../gcc.target/riscv/rvv/vsetvl/pr111255.c| 24 +++
1 file changed, 24 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c
new file mode 100644
index 000..736f6838a50
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 --param riscv-autovec-lmul=m2 -fno-vect-cost-model" } */
+
+#include 
+
+#define DEF_LOOP(OLD_TYPE, NEW_TYPE) \
+  void __attribute__ ((noipa)) \
+  test_##OLD_TYPE##_2_##NEW_TYPE (NEW_TYPE *__restrict r, \
+                                  OLD_TYPE *__restrict a, NEW_TYPE b, \
+                                  OLD_TYPE *__restrict pred, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      { \
+        r[i] = pred[i] ? (NEW_TYPE) a[i] : b; \
+      } \
+  }
+
+/* INT -> narrower-INT */
+#define TEST_ALL_X2X_NARROWER(T) \
+  T (int16_t, int8_t)
+
+TEST_ALL_X2X_NARROWER (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tvsetvli\t[a-x0-9]+,[a-x0-9]+,e[0-9]+,m[f0-9]+,t[au],m[au]\n\tvsetvli\t} } } */
--
2.36.3
 


Re: Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME

2023-09-17 Thread juzhe.zh...@rivai.ai
Thanks, Richard.
I addressed the comments in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630699.html 




juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-09-17 23:29
To: Juzhe-Zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized 
SSA_NAME
Juzhe-Zhong  writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize uninitialized SSA_NAME and 
> convert it
> into SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions of targets like RISC-V.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_fn_using_insn): Support undefined rtx.
> * optabs.cc (maybe_legitimize_operand): Ditto.
> (can_reuse_operands_p): Ditto.
> * optabs.h (enum expand_operand_type): Ditto.
> (create_undefined_input_operand): Ditto.
>
> ---
>  gcc/internal-fn.cc |  4 
>  gcc/optabs.cc  | 16 
>  gcc/optabs.h   | 14 +-
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..61d5a9e4772 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -247,6 +247,10 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> unsigned int noutputs,
>  create_convert_operand_from (&ops[opno], rhs_rtx,
>   TYPE_MODE (rhs_type),
>   TYPE_UNSIGNED (rhs_type));
> +  else if (TREE_CODE (rhs) == SSA_NAME
> +&& SSA_NAME_IS_DEFAULT_DEF (rhs)
> +&& VAR_P (SSA_NAME_VAR (rhs)))
> + create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
>else
>  create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>opno += 1;
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 32ff379ffc3..d8c771547a3 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -8102,6 +8102,21 @@ maybe_legitimize_operand (enum insn_code icode, 
> unsigned int opno,
>goto input;
>  }
>break;
> +
> +case EXPAND_UNDEFINED:
> +  {
> + mode = insn_data[(int) icode].operand[opno].mode;
> + rtx scratch = gen_rtx_SCRATCH (mode);
 
A scratch of the right mode should already be available in op->value,
since it was created by create_undefined_input_operand.
 
If that doesn't work for some reason, then it would be better for
create_undefined_input_operand to pass NULL_RTX as the "value"
argument to create_expand_operand.
 
> + /* For SCRATCH rtx which is converted from uninitialized
> +SSA, we convert it as fresh pseudo when target doesn't
> +allow scratch rtx in predicate. Otherwise, return true.  */
> + if (!insn_operand_matches (icode, opno, scratch))
> +   {
> + op->value = gen_reg_rtx (mode);
 
The mode should come from op->mode.
 
> + goto input;
> +   }
> + return true;
> +  }
>  }
>return insn_operand_matches (icode, opno, op->value);
>  }
> @@ -8147,6 +8162,7 @@ can_reuse_operands_p (enum insn_code icode,
>  case EXPAND_INPUT:
>  case EXPAND_ADDRESS:
>  case EXPAND_INTEGER:
> +case EXPAND_UNDEFINED:
>return true;
 
I think this should be in the "return false" block instead.
 
>  
>  case EXPAND_CONVERT_TO:
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index c80b7f4dc1b..4eb1f9ee09a 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -37,7 +37,8 @@ enum expand_operand_type {
>EXPAND_CONVERT_TO,
>EXPAND_CONVERT_FROM,
>EXPAND_ADDRESS,
> -  EXPAND_INTEGER
> +  EXPAND_INTEGER,
> +  EXPAND_UNDEFINED
 
Sorry, this was my bad suggestion.  I should have suggested
EXPAND_UNDEFINED_INPUT, to match the name of the function.
 
Thanks,
Richard
 
>  };
>  
>  /* Information about an operand for instruction expansion.  */
> @@ -117,6 +118,17 @@ create_input_operand (class expand_operand *op, rtx 
> value,
>create_expand_operand (op, EXPAND_INPUT, value, mode, false);
>  }
>  
> +/* Make OP describe an undefined input operand for uninitialized
> +   SSA.  It's the scratch operand with mode MODE; MODE cannot be
> +   VOIDmode.  */
> +
> +inline void
> +create_undefined_input_operand (class expand_operand *op, machine_mode mode)
> +{
> +  create_expand_operand (op, EXPAND_UNDEFINED, gen_rtx_SCRATCH (mode), mode,
> + false);
> +}
> +
>  /* Like create_input_operand, except that VALUE must first be converted
> to mode MODE.  UNSIGNED_P says whether VALUE is unsigned.  */
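 
The review comments above can be summarised in a toy model. None of this is GCC code: Rtx, ExpandOperand and Predicate are hypothetical stand-ins. The sketch only shows the intended flow: the scratch created up front is reused, and the fallback pseudo takes its mode from the operand.

```cpp
#include <cassert>

enum class Mode { SI, DI };
enum class RtxKind { SCRATCH, REG };

// Toy rtx: just a kind and a mode.
struct Rtx {
  RtxKind kind;
  Mode mode;
};

// Toy expand_operand for an undefined input.
struct ExpandOperand {
  Mode mode;  // mode recorded when the operand was created
  Rtx value;  // starts out as a SCRATCH of that mode
};

// Stand-in for the target's operand predicate.
using Predicate = bool (*)(const Rtx &);

// The scratch is built once here, so later code can reuse op.value.
ExpandOperand create_undefined_input_operand(Mode mode) {
  return ExpandOperand{mode, Rtx{RtxKind::SCRATCH, mode}};
}

// Keep the scratch if the predicate accepts it; otherwise fall back to a
// fresh pseudo register whose mode comes from the operand itself.
void legitimize_undefined_input(ExpandOperand &op, Predicate pred) {
  if (!pred(op.value))
    op.value = Rtx{RtxKind::REG, op.mode};
}

// Two example predicates: one allows SCRATCH, one insists on a register.
bool accepts_anything(const Rtx &) { return true; }
bool register_only(const Rtx &x) { return x.kind == RtxKind::REG; }
```

A target whose predicate allows SCRATCH keeps the undefined operand and needs no extra move. Any other target sees an ordinary register of the right mode.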
 


Re: [PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]

2023-09-17 Thread juzhe.zh...@rivai.ai
Thanks for fixing it.
I am OK with removing the phase 6 optimization, which has many latent bugs (Kito
has refactored it in GCC 14).
But I think we need more comments from Kito on this.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-18 12:19
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]
From: xuli 
 
The vsetvl pass has been refactored in GCC 14, and its optimization
is more reasonable than the one in releases/gcc-13. This problem does not
exist in GCC 14.
 
Phase 6 in GCC 13 is purely an optimization. It was not thought through
carefully enough and has some hidden bugs, so we decided to remove it.
The generated code will contain some redundant instructions, but the
program is correct.
 
PR target/111412
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (vector_infos_manager::release): Remove.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::propagate_avl): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/avl_single-79.c: Adjust case.
* gcc.target/riscv/rvv/vsetvl/avl_single-80.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-87.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto.
* gcc.target/riscv/rvv/base/pr111412.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  | 153 +-
gcc/config/riscv/riscv-vsetvl.h   |   2 -
.../gcc.target/riscv/rvv/base/pr111412.c  |  41 +
.../riscv/rvv/vsetvl/avl_single-79.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-80.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-86.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-87.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-88.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-89.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-90.c  |   4 +-
.../riscv/rvv/vsetvl/vlmax_back_prop-25.c |  10 +-
.../riscv/rvv/vsetvl/vlmax_back_prop-26.c |  10 +-
.../riscv/rvv/vsetvl/vlmax_switch_vtype-14.c  |   6 +-
.../riscv/rvv/vsetvl/vlmax_switch_vtype-15.c  |   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-1.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-5.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-6.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-7.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-8.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c |   4 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c |   4 +-
21 files changed, 80 insertions(+), 190 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111412.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 0cf4bc818e2..9dca2ce709d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2494,8 +2494,6 @@ vector_infos_manager::release (void)
   if (!vector_exprs.is_empty ())
 vector_exprs.release ();
-  gcc_assert (to_refine_vsetvls.is_empty ());
-  gcc_assert (to_delete_vsetvls.is_empty ());
   if (optimize > 0)
 free_bitmap_vectors ();
}
@@ -2702,9 +2700,6 @@ private:
   /* Phase 5.  */
   void cleanup_insns (void) const;
-  /* Phase 6.  */
-  void propagate_avl (void) const;
-
   void init (void);
   void done (void);
   void compute_probabilities (void);
@@ -3823,10 +3818,8 @@ pass_vsetvl::refine_vsetvls (void) const
   /* We can't refine user vsetvl into vsetvl zero,zero since the dest
will be used by the following instructions.  */
   if (vector_config_insn_p (rinsn))
- {
-   m_vector_manager->to_refine_vsetvls.add (rinsn);
  continue;
- }
+
   rinsn = PREV_INSN (rinsn);
   rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
   change_insn (rinsn, new_pat);
@@ -3862,10 +3855,7 @@ pass_vsetvl::cleanup_vsetvls ()
  /* We can't eliminate user vsetvl since the dest will be used
   * by the following instructions.  */
  if (vector_config_insn_p (insn->rtl ()))
- {

Re: Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-14 Thread juzhe.zh...@rivai.ai
More information:

PRED_TYPE_tumu is easy to analyze: we just need to count the arguments 
in the arglist.
If the arglist has 5 arguments (mask, merge, op1, op2, len), then it must be TUMU.

What I mean is that we should be able to quickly compute the arguments needed 
to construct the function_instance.
Then we can get the non-overloaded function.
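
To illustrate the counting idea, here is a minimal sketch. It takes the claim
above at face value for a binary operation such as vadd. The enum, the function
name, and the 3-argument case are assumptions made for illustration only; none
of them are taken from the patch or from GCC.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the predication kinds; this is not the GCC enum.
enum class pred_guess { none, tumu, unknown };

// Guess the predication of a binary operation from its argument count alone.
pred_guess
guess_binary_pred (std::size_t nargs)
{
  switch (nargs)
    {
    case 5:  // (mask, merge, op1, op2, len), as stated in the message above
      return pred_guess::tumu;
    case 3:  // (op1, op2, len): assumed here to mean "no mask, no merge"
      return pred_guess::none;
    default: // any other count needs more than counting to decide
      return pred_guess::unknown;
    }
}
```

A count that maps to "unknown" would still need the argument types (or the
function name) to pick the predication, so this only covers the easy case.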



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-09-15 10:02
To: pan2.li; gcc-patches
CC: pan2.li; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
Sorry for commenting again.

I am not happy with the current get_non_overloaded_instance function.

I think the searching approach is very inefficient:

+function_instance *
+function_base::get_non_overloaded_instance (unsigned int code,
+ vec &arglist) const
+{
+  unsigned int code_limit = vec_safe_length (registered_functions);
+
+  for (unsigned fun_code = code; fun_code < code_limit; fun_code++)
+{
+  registered_function *rfun = (*registered_functions)[fun_code];
+  function_instance instance = rfun->instance;
+
+  if (rfun->overloaded_p)
+ continue;
+
+  unsigned k;
+  const rvv_arg_type_info *args = instance.op_info->args;
+
+  for (k = 0; args[k].base_type != NUM_BASE_TYPES; k++)
+ {
+   if (k >= arglist.length ())
+ break;
+
+   if (TYPE_MODE (instance.get_arg_type (k))
+ != TYPE_MODE (TREE_TYPE (arglist[k])))
+ break;
+ }
+
+ if (args[k].base_type == NUM_BASE_TYPES)
+   return &rfun->instance;
+}
+
+  return NULL;
+}


Instead, I think we should build a table that maps the arguments to the 
non-overloaded function, so that we can get the "instance" efficiently.

E.g. for the vint8mf8_t tumu vadd intrinsic, the instance looks like this:
function_instance ("vadd", bases::vadd, shapes::alu,
  iu_ops[VECTOR_TYPE_vuint8mf8_t], PRED_TYPE_tumu, &iu_vvv_ops);

Since get_non_overloaded_instance is already a member function of the base class, 
the first 3 arguments ("vadd", bases::vadd, shapes::alu) 
should already be known, because the function_base is known.

The last 3 arguments may need some elegant analysis or a map table to look up 
quickly.

So, I think we should consider this framework seriously.
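
A rough sketch of the table idea, only to make the suggestion concrete. All of
the types and names below (mode_id, arg_key, instance_entry, build_table,
lookup) are hypothetical stand-ins and not GCC code. Note that it matches the
whole argument list exactly, which is simpler than the loop quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are GCC types.
using mode_id = int;                   // plays the role of TYPE_MODE (...)
using arg_key = std::vector<mode_id>;  // modes of all arguments, in order

struct instance_entry
{
  std::string name;   // name of a non-overloaded function
  arg_key arg_modes;  // modes of its formal arguments
};

// Build the table once, at registration time.
std::map<arg_key, std::size_t>
build_table (const std::vector<instance_entry> &registered)
{
  std::map<arg_key, std::size_t> table;
  for (std::size_t i = 0; i < registered.size (); i++)
    // emplace keeps the first entry for a key, so the earliest registered
    // function wins, like the first match of a linear scan.
    table.emplace (registered[i].arg_modes, i);
  return table;
}

// Return the index of the matching entry, or -1 if there is none.
long
lookup (const std::map<arg_key, std::size_t> &table, const arg_key &call_modes)
{
  auto it = table.find (call_modes);
  return it == table.end () ? -1 : static_cast<long> (it->second);
}
```

With such a table each resolution is a single map lookup instead of a walk
over every registered function.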



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-12 16:46
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
From: Pan Li 
 
Update in v3:
 
* Rewrite comment for overloaded function add.
* Move get_non_overloaded_instance to function_base.
 
Update in v2:
 
* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.
 
Original log:
 
This patch would like to add the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did.
 
It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN
with the steps below.
 
* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.
 
We validated this framework with the vmv_v intrinsic API(s), and we will
add more intrinsic API support in the following patches.
 
gcc/ChangeLog:
 
* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_base::get_non_overloaded_instance): New API impl.
(function_builder::add_function): Add overloaded arg.
(function_resolver::function_resolver): New constructor.
(function_builder::add_overloaded_function): New API impl.
(function_resolver::resolve): Ditto.
(function_resolver::lookup): Ditto.
(function_resolver::get_sub_code): Ditto.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h:
(class function_resolver): New class.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-c.cc   |  36 
gcc/config/riscv/riscv-protos.h   |   1 +
.../riscv/riscv-vector-builtins-shapes.cc |  20 ++-
gcc/config/riscv/riscv-vector-builtins.cc | 155 +-
gcc/config/riscv/riscv-vector-builtins.h  |  36 +++-
.../riscv/rvv/base/overloaded_r

Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-14 Thread juzhe.zh...@rivai.ai
Sorry for commenting again.

I am not happy with the current get_non_overloaded_instance function.

I think the searching approach is very inefficient:

+function_instance *
+function_base::get_non_overloaded_instance (unsigned int code,
+ vec &arglist) const
+{
+  unsigned int code_limit = vec_safe_length (registered_functions);
+
+  for (unsigned fun_code = code; fun_code < code_limit; fun_code++)
+{
+  registered_function *rfun = (*registered_functions)[fun_code];
+  function_instance instance = rfun->instance;
+
+  if (rfun->overloaded_p)
+ continue;
+
+  unsigned k;
+  const rvv_arg_type_info *args = instance.op_info->args;
+
+  for (k = 0; args[k].base_type != NUM_BASE_TYPES; k++)
+ {
+   if (k >= arglist.length ())
+ break;
+
+   if (TYPE_MODE (instance.get_arg_type (k))
+ != TYPE_MODE (TREE_TYPE (arglist[k])))
+ break;
+ }
+
+ if (args[k].base_type == NUM_BASE_TYPES)
+   return &rfun->instance;
+}
+
+  return NULL;
+}


Instead, I think we should build a table that maps the arguments to the 
non-overloaded function, so that we can get the "instance" efficiently.

E.g. for the vint8mf8_t tumu vadd intrinsic, the instance looks like this:
function_instance ("vadd", bases::vadd, shapes::alu,
  iu_ops[VECTOR_TYPE_vuint8mf8_t], PRED_TYPE_tumu, &iu_vvv_ops);

Since get_non_overloaded_instance is already a member function of the base class, 
the first 3 arguments ("vadd", bases::vadd, shapes::alu) 
should already be known, because the function_base is known.

The last 3 arguments may need some elegant analysis or a map table to look up 
quickly.

So, I think we should consider this framework seriously.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-12 16:46
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
From: Pan Li 
 
Update in v3:
 
* Rewrite comment for overloaded function add.
* Move get_non_overloaded_instance to function_base.
 
Update in v2:
 
* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.
 
Original log:
 
This patch would like to add the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did.
 
It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN
with the steps below.
 
* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.
 
We validated this framework with the vmv_v intrinsic API(s), and we will
add more intrinsic API support in the following patches.
 
gcc/ChangeLog:
 
* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_base::get_non_overloaded_instance): New API impl.
(function_builder::add_function): Add overloaded arg.
(function_resolver::function_resolver): New constructor.
(function_builder::add_overloaded_function): New API impl.
(function_resolver::resolve): Ditto.
(function_resolver::lookup): Ditto.
(function_resolver::get_sub_code): Ditto.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h:
(class function_resolver): New class.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-c.cc   |  36 
gcc/config/riscv/riscv-protos.h   |   1 +
.../riscv/riscv-vector-builtins-shapes.cc |  20 ++-
gcc/config/riscv/riscv-vector-builtins.cc | 155 +-
gcc/config/riscv/riscv-vector-builtins.h  |  36 +++-
.../riscv/rvv/base/overloaded_rv32_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_rv64_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_vmv_v.h |  27 +++
8 files changed, 288 insertions(+), 3 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 283052ae313..060edd3129d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/ris

Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread juzhe.zh...@rivai.ai
Hi, Kito.

Could you review this code? Regression testing is running.
  /* Expand
       (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
     Expand this data movement instead of simply forbid it since
     we can improve the code generation for this following scenario
     by RVV auto-vectorization:
       (set (reg:V8QI 149) (vec_duplicate:V8QI (reg:QI))
       (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
     Since RVV mode and scalar mode are in different REG_CLASS,
     we need to explicitly move data from V_REGS to GR_REGS by scalar move.  */
  if (SUBREG_P (src) && riscv_v_ext_mode_p (GET_MODE (SUBREG_REG (src))))
    {
      machine_mode vmode = GET_MODE (SUBREG_REG (src));
      unsigned int mode_size = GET_MODE_SIZE (mode).to_constant ();
      unsigned int vmode_size = GET_MODE_SIZE (vmode).to_constant ();
      unsigned int nunits = vmode_size / mode_size;
      scalar_mode smode = as_a <scalar_mode> (mode);
      unsigned int index = SUBREG_BYTE (src).to_constant () / mode_size;
      unsigned int num = smode == DImode && !TARGET_VECTOR_ELEN_64 ? 2 : 1;

      if (num == 2)
        {
          /* If we want to extract 64bit value but ELEN < 64,
             we use RVV vector mode with EEW = 32 to extract
             the highpart and lowpart.  */
          smode = SImode;
          nunits = nunits * 2;
        }
      vmode = riscv_vector::get_vector_mode (smode, nunits).require ();
      enum insn_code icode
        = convert_optab_handler (vec_extract_optab, vmode, smode);
      gcc_assert (icode != CODE_FOR_nothing);
      rtx v = gen_lowpart (vmode, SUBREG_REG (src));

      for (unsigned int i = 0; i < num; i++)
        {
          class expand_operand ops[3];
          rtx result;
          if (num == 1)
            result = dest;
          else if (i == 0)
            result = gen_lowpart (smode, dest);
          else
            result = gen_reg_rtx (smode);
          create_output_operand (&ops[0], result, smode);
          ops[0].target = 1;
          create_input_operand (&ops[1], v, vmode);
          create_integer_operand (&ops[2], index + i);
          expand_insn (icode, 3, ops);
          if (ops[0].value != result)
            emit_move_insn (result, ops[0].value);

          if (i == 1)
            {
              rtx tmp
                = expand_binop (Pmode, ashl_optab, gen_lowpart (Pmode, result),
                                gen_int_mode (32, Pmode), NULL_RTX, 0,
                                OPTAB_DIRECT);
              rtx tmp2 = expand_binop (Pmode, ior_optab, tmp, dest, NULL_RTX, 0,
                                       OPTAB_DIRECT);
              emit_move_insn (dest, tmp2);
            }
        }
      return true;
    }


ASM:
vsetivli zero,2,e32,mf2,ta,ma
vslidedown.vi v2,v1,1
vmv.x.s a5,v2
slli a5,a5,32
vmv.x.s a0,v1
or a0,a5,a0
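
For readers who want the intent of this sequence in plain scalar terms, here is
a small model of "extract a 64-bit value as two 32-bit elements and combine
them". It is only a sketch: the function name is made up, it handles the element
pair at index 0 only, and it zero-extends the low half before the OR, which is
a choice of this sketch rather than a statement about the RTL above.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the ELEN < 64 path: the vector is viewed with EEW = 32,
// element 0 is the low part and element 1 is the high part.
std::uint64_t
extract_di_from_si_pair (const std::uint32_t v[2])
{
  std::uint64_t lo = v[0];  // element 0, zero-extended in this sketch
  std::uint64_t hi = v[1];  // element 1, reached by sliding down by one
  return (hi << 32) | lo;   // same shape as the slli + or in the ASM above
}
```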



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-14 17:26
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode 
move[PR111391]
Yeah, try pr111391.c with rv64gc_zve32x (NO v, my mistake in the last mail
:P), maybe add a testcase pr111391-zve32x.c that just includes
pr111391.c and sets the dg option to rv64gc_zve32x
 
 
On Thu, Sep 14, 2023 at 5:24 PM juzhe.zh...@rivai.ai
 wrote:
>
> You mean try pr111391.c
> that I added with rv64gcv_zve32x ?
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-14 17:20
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
> Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode 
> move[PR111391]
> Could you check if it work correctly for rv64gcv_zve32x? add testcase
> no matter if it works or not :)
>
> On Thu, Sep 14, 2023 at 5:19 PM juzhe.zh...@rivai.ai
>  wrote:
> >
> > Is it Ok for trunk ? Or you want me send a separate patch to remove "@" in 
> > vec_extract optab ?
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Kito Cheng
> > Date: 2023-09-14 16:11
> > To: Juzhe-Zhong
> > CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> > Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode 
> > move[PR111391]
> > On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong  wrote:
> > >
> > > This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
> > >
> > > I notice that previous patch (V2 patch) cause additional execution fail 
> > > of pr69719.c
> > > This FAIL is because of the latent BUG of VSETVL PASS.
> > >
> > > So this patch includes VSETVL PASS fix even though it's not related to 
> > > the PR111391.
> > >
> > > I have confirm the whole regression no additional FAILs are introduced.
> > >
> > > PR target/111391
> > >
> &

Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread juzhe.zh...@rivai.ai
Oh, I see.
It ICEs:

during RTL pass: expand
bug.c:26:9: internal compiler error: in require, at machmode.h:313
   26 | i (a);
  | ^
0x1032253 opt_mode::require() const
../../../../gcc/gcc/machmode.h:313
0x1c47877 riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)
../../../../gcc/gcc/config/riscv/riscv.cc:2532
0x274bbe0 gen_movdi(rtx_def*, rtx_def*)
../../../../gcc/gcc/config/riscv/riscv.md:2024
0x102cb1c rtx_insn* insn_gen_fn::operator()(rtx_def*, 
rtx_def*) const
../../../../gcc/gcc/recog.h:411
0x11fbc8e emit_move_insn_1(rtx_def*, rtx_def*)
../../../../gcc/gcc/expr.cc:4164
0x11fc809 emit_move_insn(rtx_def*, rtx_def*)
../../../../gcc/gcc/expr.cc:4334
0x1039a0b load_register_parameters
../../../../gcc/gcc/calls.cc:2155
0x103d865 expand_call(tree_node*, rtx_def*, int)
../../../../gcc/gcc/calls.cc:3626
0x121e78c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../../../gcc/gcc/expr.cc:11921
0x120ffb8 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, 
rtx_def**, bool)
../../../../gcc/gcc/expr.cc:9010
0x102c694 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
../../../../gcc/gcc/expr.h:310
0x105ccc9 expand_call_stmt
../../../../gcc/gcc/cfgexpand.cc:2831
0x10608af expand_gimple_stmt_1
../../../../gcc/gcc/cfgexpand.cc:3880
0x1060f4d expand_gimple_stmt
../../../../gcc/gcc/cfgexpand.cc:4044
0x10699f3 expand_gimple_basic_block


Thanks for catching this.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-14 17:20
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode 
move[PR111391]
Could you check if it work correctly for rv64gcv_zve32x? add testcase
no matter if it works or not :)
 
On Thu, Sep 14, 2023 at 5:19 PM juzhe.zh...@rivai.ai
 wrote:
>
> Is it Ok for trunk ? Or you want me send a separate patch to remove "@" in 
> vec_extract optab ?
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-14 16:11
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong  wrote:
> >
> > This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
> >
> > I notice that previous patch (V2 patch) cause additional execution fail of 
> > pr69719.c
> > This FAIL is because of the latent BUG of VSETVL PASS.
> >
> > So this patch includes VSETVL PASS fix even though it's not related to the 
> > PR111391.
> >
> > I have confirm the whole regression no additional FAILs are introduced.
> >
> > PR target/111391
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/autovec.md (@vec_extract): Remove @.
> > (vec_extract): Ditto.
> > * config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
> > (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
> > * config/riscv/riscv.cc (riscv_legitimize_move): Expand move.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
> > * gcc.target/riscv/rvv/autovec/pr111391.c: New test.
> >
> > ---
> >  gcc/config/riscv/autovec.md   |  2 +-
> >  gcc/config/riscv/riscv-vsetvl.cc  |  4 ++-
> >  gcc/config/riscv/riscv.cc | 32 +++
> >  .../riscv/rvv/autovec/partial/slp-9.c |  1 -
> >  .../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 
> >  5 files changed, 64 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index e74a1695709..7121bab1716 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -1442,7 +1442,7 @@
> >  ;; 
> > -
> >  ;;  [INT,FP] Extract a vector element.
> >  ;; 
> > -
> > -(define_expand "@vec_extract"
> > +(define_expand "vec_extract"
>
> Why remove this? I saw this change was introduced in v3?
>
>
> >[(set (match_operand: 0 "register_operand")
> >   (vec_select:
> > (match_operand:V_VLS  1 "register_operand")
>
 


Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread juzhe.zh...@rivai.ai
You mean try the pr111391.c 
that I added, with rv64gcv_zve32x?



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-14 17:20
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode 
move[PR111391]
Could you check if it work correctly for rv64gcv_zve32x? add testcase
no matter if it works or not :)
 
On Thu, Sep 14, 2023 at 5:19 PM juzhe.zh...@rivai.ai
 wrote:
>
> Is it Ok for trunk ? Or you want me send a separate patch to remove "@" in 
> vec_extract optab ?
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-14 16:11
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong  wrote:
> >
> > This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
> >
> > I notice that previous patch (V2 patch) cause additional execution fail of 
> > pr69719.c
> > This FAIL is because of the latent BUG of VSETVL PASS.
> >
> > So this patch includes VSETVL PASS fix even though it's not related to the 
> > PR111391.
> >
> > I have confirm the whole regression no additional FAILs are introduced.
> >
> > PR target/111391
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/autovec.md (@vec_extract): Remove @.
> > (vec_extract): Ditto.
> > * config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
> > (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
> > * config/riscv/riscv.cc (riscv_legitimize_move): Expand move.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
> > * gcc.target/riscv/rvv/autovec/pr111391.c: New test.
> >
> > ---
> >  gcc/config/riscv/autovec.md   |  2 +-
> >  gcc/config/riscv/riscv-vsetvl.cc  |  4 ++-
> >  gcc/config/riscv/riscv.cc | 32 +++
> >  .../riscv/rvv/autovec/partial/slp-9.c |  1 -
> >  .../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 
> >  5 files changed, 64 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index e74a1695709..7121bab1716 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -1442,7 +1442,7 @@
> >  ;; 
> > -
> >  ;;  [INT,FP] Extract a vector element.
> >  ;; 
> > -
> > -(define_expand "@vec_extract"
> > +(define_expand "vec_extract"
>
> Why remove this? I saw this change was introduced in v3?
>
>
> >[(set (match_operand: 0 "register_operand")
> >   (vec_select:
> > (match_operand:V_VLS  1 "register_operand")
>
 


Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread juzhe.zh...@rivai.ai
Is it OK for trunk? Or do you want me to send a separate patch to remove "@" in the 
vec_extract optab?



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-14 16:11
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong  wrote:
>
> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
>
> I notice that previous patch (V2 patch) cause additional execution fail of 
> pr69719.c
> This FAIL is because of the latent BUG of VSETVL PASS.
>
> So this patch includes VSETVL PASS fix even though it's not related to the 
> PR111391.
>
> I have confirm the whole regression no additional FAILs are introduced.
>
> PR target/111391
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (@vec_extract): Remove @.
> (vec_extract): Ditto.
> * config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
> (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
> * config/riscv/riscv.cc (riscv_legitimize_move): Expand move.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
> * gcc.target/riscv/rvv/autovec/pr111391.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   |  2 +-
>  gcc/config/riscv/riscv-vsetvl.cc  |  4 ++-
>  gcc/config/riscv/riscv.cc | 32 +++
>  .../riscv/rvv/autovec/partial/slp-9.c |  1 -
>  .../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 
>  5 files changed, 64 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index e74a1695709..7121bab1716 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1442,7 +1442,7 @@
>  ;; -
>  ;;  [INT,FP] Extract a vector element.
>  ;; -
> -(define_expand "@vec_extract"
> +(define_expand "vec_extract"
 
Why remove this? I saw this change was introduced in v3?
 
 
>[(set (match_operand: 0 "register_operand")
>   (vec_select:
> (match_operand:V_VLS  1 "register_operand")
 


Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread juzhe.zh...@rivai.ai

>> Why remove this? I saw this change was introduced in v3?

The "@" was introduced by this patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630184.html 
At first, I thought I needed to explicitly call emit_insn 
(gen_vec_extract (mode, mode, )
That's why I added it in the last patch.

However, I found that I don't need to call gen_vec_extract, so I removed the "@" in this 
patch:
+  enum insn_code icode
+   = convert_optab_handler (vec_extract_optab, vmode, mode);
+  gcc_assert (icode != CODE_FOR_nothing);
+  class expand_operand ops[3];
+  create_output_operand (&ops[0], dest, mode);
+  ops[0].target = 1;
+  create_input_operand (&ops[1], gen_lowpart (vmode, SUBREG_REG (src)),
+   vmode);
+  unsigned int index = SUBREG_BYTE (src).to_constant () / mode_size;
+  create_integer_operand (&ops[2], index);
+  expand_insn (icode, 3, ops);
This code is copied from optabs-query.cc



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-14 16:11
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong  wrote:
>
> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
>
> I notice that previous patch (V2 patch) cause additional execution fail of 
> pr69719.c
> This FAIL is because of the latent BUG of VSETVL PASS.
>
> So this patch includes VSETVL PASS fix even though it's not related to the 
> PR111391.
>
> I have confirm the whole regression no additional FAILs are introduced.
>
> PR target/111391
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (@vec_extract): Remove @.
> (vec_extract): Ditto.
> * config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
> (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
> * config/riscv/riscv.cc (riscv_legitimize_move): Expand move.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
> * gcc.target/riscv/rvv/autovec/pr111391.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   |  2 +-
>  gcc/config/riscv/riscv-vsetvl.cc  |  4 ++-
>  gcc/config/riscv/riscv.cc | 32 +++
>  .../riscv/rvv/autovec/partial/slp-9.c |  1 -
>  .../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 
>  5 files changed, 64 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index e74a1695709..7121bab1716 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1442,7 +1442,7 @@
>  ;; -
>  ;;  [INT,FP] Extract a vector element.
>  ;; -
> -(define_expand "@vec_extract"
> +(define_expand "vec_extract"
 
Why remove this? I saw this change was introduced in v3?
 
 
>[(set (match_operand: 0 "register_operand")
>   (vec_select:
> (match_operand:V_VLS  1 "register_operand")
 


Re: [PATCH] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-13 Thread juzhe.zh...@rivai.ai
I just realized that this patch causes some unexpected ICE FAILs in the GCC regression tests.

Now, V2: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630194.html 
has fully passed the regression tests.




juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-09-13 21:01
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Expand VLS mode to scalar mode move[PR111391]
This patch fixes PR111391: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
 
PR target/111391
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_legitimize_move): Expand VLS to scalar move.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr111391.c: New test.
 
---
gcc/config/riscv/riscv.cc | 29 +++
.../riscv/rvv/autovec/partial/slp-9.c |  1 -
.../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 ++
3 files changed, 57 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9d04ddd69e0..b7daad7cbb5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2513,6 +2513,35 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
}
   return true;
 }
+  /* Expand
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Expand this data movement instead of simply forbid it since
+ we can improve the code generation for this following scenario
+ by RVV auto-vectorization:
+   (set (reg:V8QI 149) (vec_duplicate:V8QI (reg:QI))
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Since RVV mode and scalar mode are in different REG_CLASS,
+ we need to explicitly move data from V_REGS to GR_REGS by scalar move.  */
+  if (SUBREG_P (src) && riscv_v_ext_mode_p (GET_MODE (SUBREG_REG (src))))
+{
+  rtx subreg = force_reg (GET_MODE (SUBREG_REG (src)), SUBREG_REG (src));
+  machine_mode imode = GET_MODE_INNER (GET_MODE (subreg));
+  unsigned int ratio = GET_MODE_SIZE (mode).to_constant ()
+/ GET_MODE_SIZE (imode).to_constant ();
+  poly_int64 nunits = GET_MODE_NUNITS (GET_MODE (subreg));
+  nunits = exact_div (nunits, ratio);
+  scalar_mode smode = as_a <scalar_mode> (mode);
+  machine_mode vmode
+ = riscv_vector::get_vector_mode (smode, nunits).require ();
+  rtx tmp = gen_reg_rtx (mode);
+  rtx index
+ = gen_int_mode (exact_div (SUBREG_BYTE (src), GET_MODE_SIZE (smode)),
+ Pmode);
+  emit_insn (gen_vec_extract (vmode, vmode, tmp,
+   gen_lowpart (vmode, subreg), index));
+  emit_move_insn (dest, tmp);
+  return true;
+}
   /* Expand
(set (reg:QI target) (mem:QI (address)))
  to
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
index 5fba27c7a35..7c42438c9d9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
@@ -29,4 +29,3 @@
TEST_ALL (VEC_PERM)
/* { dg-final { scan-assembler-times {viota.m} 2 } } */
-/* { dg-final { scan-assembler-not {vmv\.v\.i} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
new file mode 100644
index 000..a7f64c937c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -Wno-int-conversion 
-Wno-implicit-function -Wno-incompatible-pointer-types 
-Wno-implicit-function-declaration -Ofast -ftree-vectorize" } */
+
+int d ();
+typedef struct
+{
+  int b;
+} c;
+int
+e (char *f, long g)
+{
+  f += g;
+  while (g--)
+*--f = d;
+}
+
+int
+d (c * f)
+{
+  while (h ())
+switch (f->b)
+  case 'Q':
+  {
+ long a;
+ e (&a, sizeof (a));
+ i (a);
+  }
+}
-- 
2.36.3
 


Re: Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

2023-09-13 Thread juzhe.zh...@rivai.ai
>> Do we need the additional helper function? 
Yes. We need the additional helper function since I will call emit_insn 
(gen_vec_extract (mode, mode)
in the following patch, which fixes the PR111391 ICE.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-13 20:31
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization
> -(define_expand "vec_extract"
> +(define_expand "@vec_extract"
 
Do we need the additional helper function?  If not let's rather not
add them for build-time reasons.  The rest is OK, no need for v2.
 
Regards
Robin
 


gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-09-13 Thread juzhe.zh...@rivai.ai
Thanks Robin for fixing it.

-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+bias (NULL_TREE)
It seems that you shouldn't include this fix in the patch?
+
+  if (len)
+{
+  /* If we had a COND_LEN before we need to ensure that it stays that
+way.  */
+  gimple_match_op old_op = *res_op;
+  *res_op = cond_op;
+  maybe_resimplify_conditional_op (seq, res_op, valueize);
+
+  auto cfn = combined_fn (res_op->code);
+  if (internal_fn_p (cfn)
+ && internal_fn_len_index (as_internal_fn (cfn)) != -1)
+   return true;
+
+  *res_op = old_op;
+  return false;
+}
+  else
+{
+  *res_op = cond_op;
+  maybe_resimplify_conditional_op (seq, res_op, valueize);
+  return true;
+}
This looks odd to me. 

Currently, we never have cond_len_xxx with a dummy length (length = VF), and we 
always use cond_xxx if we don't have a loop mask.
So the length of cond_len_xxx is always generated by MIN or SELECT_VL. 
I don't think we need a gimple simplification like cond_len -> argument 
value.

But we need this following optimization:

negate + cond_len_fma -> cond_len_fnma/cond_len_fms/cond_len_fnms.
That's what I want to support in gimple fold.
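
To make the requested fold concrete, here is a minimal scalar model of it. It assumes the usual reading of cond_len operations: a lane is active only when its index is below len + bias and its mask bit is set, otherwise it takes the else value. The names FmaKind and cond_len_fma_model are made up for this sketch and are not GCC identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tags for the four FMA flavours discussed above.
enum class FmaKind { FMA, FNMA, FMS, FNMS };

// Scalar meaning of each flavour.
inline double
scalar_fma (FmaKind kind, double a, double b, double c)
{
  switch (kind)
    {
    case FmaKind::FMA:  return a * b + c;
    case FmaKind::FNMA: return -(a * b) + c;
    case FmaKind::FMS:  return a * b - c;
    case FmaKind::FNMS: return -(a * b) - c;
    }
  return 0.0;
}

// Model of a mask- and length-predicated FMA: lane i is computed only when
// i < len + bias and cond[i] is true; inactive lanes copy else_val[i].
// This is an illustration of the semantics, not GCC code.
std::vector<double>
cond_len_fma_model (FmaKind kind, const std::vector<bool> &cond,
                    const std::vector<double> &a, const std::vector<double> &b,
                    const std::vector<double> &c,
                    const std::vector<double> &else_val, long len, long bias)
{
  std::vector<double> out (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      bool active = static_cast<long> (i) < len + bias && cond[i];
      out[i] = active ? scalar_fma (kind, a[i], b[i], c[i]) : else_val[i];
    }
  return out;
}
```

In this model, negating the first multiplicand and applying the FMA flavour gives the same lanes as applying the FNMA flavour to the original operands, which is the rewrite the fold is meant to perform while keeping len and bias intact.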

Let's see more comments from Richard and Richi.



juzhe.zh...@rivai.ai


Re: Re: [PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

2023-09-12 Thread juzhe.zh...@rivai.ai
OK, added it in V2:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630048.html 



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 21:29
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]
Maybe you want to add PR target/111337 to the changelog?
 
The rest LGTM.
 
Regards
Robin
 


Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
Then you don't need to spend time reducing the case from SPEC.



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-09-12 17:36
To: Robin Dapp; gcc-patches
CC: Robin Dapp; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
This is the first version of dynamic LMUL.
I didn't test it with the full GCC testsuite.

My plan is to first pass the whole GCC testsuite (including vect.exp) with the 
default LMUL = M1.
Then enable dynamic LMUL and test it.

Maybe we could tolerate this ICE issue for now. Then we can test it with the 
full GCC testsuite (I believe we will be able to reproduce it with some case in 
the GCC testsuite in the future).

Is that reasonable? If yes, I will address all your comments and send V5.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 17:31
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
> Is calculix big ?
 
It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty.
I tested with -Ofast -funroll-loops.  I think wrf is even larger, maybe I
can run a full comparison test tonight to have good coverage.
 
> Could you give me the testcase to reproduce it?
 
OK, I will try to reduce it, will be Fortran, though.
 
Regards
Robin
 


Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
This is the first version of dynamic LMUL.
I didn't test it with the full GCC testsuite.

My plan is to first pass the whole GCC testsuite (including vect.exp) with the 
default LMUL = M1.
Then enable dynamic LMUL and test it.

Maybe we could tolerate this ICE issue for now. Then we can test it with the 
full GCC testsuite (I believe we will be able to reproduce it with some case in 
the GCC testsuite in the future).

Is that reasonable? If yes, I will address all your comments and send V5.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 17:31
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
> Is calculix big ?
 
It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty.
I tested with -Ofast -funroll-loops.  I think wrf is even larger, maybe I
can run a full comparison test tonight to have good coverage.
 
> Could you give me the testcase to reproduce it?
 
OK, I will try to reduce it, will be Fortran, though.
 
Regards
Robin
 


Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
Is calculix big?

Could you give me the testcase to reproduce it?

For +  gcc_assert (biggest_size >= mode_size);
I currently have no idea how to fix it.

But for +  mode = TYPE_MODE (TREE_TYPE (lhs));
I think I can fix it. 

if (!gimple_store_p (stmt))
{
  tree lhs = gimple_get_lhs (stmt);
  mode = TYPE_MODE (TREE_TYPE (lhs));

I assumed that a statement which is not a STORE always has an LHS. It turns 
out that this assumption is incorrect.
I think I know the fix.





juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 17:17
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
I did some benchmarks and, at least for calculix, the differences are
minuscule.  I'd say we can stick with the current approach and improve
as needed.
 
However, I noticed ICEs here:
 
+  gcc_assert (biggest_size >= mode_size);
 
and here:
 
+  mode = TYPE_MODE (TREE_TYPE (lhs));
 
when compiling calculix.
 
Regards
Robin
 


Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

2023-09-12 Thread juzhe.zh...@rivai.ai
Thanks Robin.

I have tried your code. It works fine and the tests pass.
Is your code O(n log n)?




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 16:19
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,
 
> +max_number_of_live_regs (const basic_block bb,
> + const hash_map &live_ranges,
> + unsigned int max_point, machine_mode biggest_mode,
> + int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i;
> +  unsigned int live_point = 0;
> +  auto_vec live_vars_vec;
> +  live_vars_vec.safe_grow (max_point + 1, true);
> +  for (i = 0; i < live_vars_vec.length (); ++i)
> +live_vars_vec[i] = 0;
> +  for (hash_map::iterator iter = live_ranges.begin ();
> +   iter != live_ranges.end (); ++iter)
> +{
> +  tree var = (*iter).first;
> +  pair live_range = (*iter).second;
> +  for (i = live_range.first; i <= live_range.second; i++)
> + {
> +   machine_mode mode = TYPE_MODE (TREE_TYPE (var));
> +   unsigned int nregs
> + = compute_nregs_for_mode (mode, biggest_mode, lmul);
> +   live_vars_vec[i] += nregs;
> +   if (live_vars_vec[i] > max_nregs)
> + max_nregs = live_vars_vec[i];
> + }
> +}
 
My concern is that we have O(nm) here, where n = number of live_ranges
and m = size of live range.  In large basic blocks (think calculix of
SPECfp 2006 which can reach up to 2000 instructions IIRC) this might
become prohibitive.
 
I'm going to do a quick benchmark with calculix and report back.  If
there is no noticeable difference we can ditch my idea.
 
For short live ranges (like < 10) the O(nm) could be better.  As of now,
we still calculate the nregs n*m times, though.  I have something like
the following in mind (it is definitely not shorter, though):
 
  struct range {
  unsigned int pt;
  bool start;
  unsigned int nregs;
  };
 
  auto_vec ranges (2 * live_ranges.elements ());
  for (hash_map::iterator iter = live_ranges.begin ();
   iter != live_ranges.end (); ++iter)
{
  tree var = (*iter).first;
  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
  = compute_nregs_for_mode (mode, biggest_mode, lmul);
  ranges.quick_push ({(*iter).second.first, true, nregs});
  ranges.quick_push ({(*iter).second.second, false, nregs});
}
 
  ranges.qsort ([] (const void *a, const void *b) -> int {
unsigned int aa = ((const range *)a)->pt;
unsigned int bb = ((const range *)b)->pt;
if (aa < bb)
  return -1;
if (aa == bb)
  return 0;
return 1;
});
 
  unsigned int cur = 0;
  max_nregs = ranges[0].nregs;
 
  for (auto r : ranges)
{
  if (r.start)
cur += r.nregs;
  else
cur -= r.nregs;
  max_nregs = MAX (max_nregs, cur);
}
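
For readers who want to try the idea outside of GCC, the following is a self-contained sketch of both approaches on plain integers. LiveRange, max_live_regs_naive and max_live_regs_sweep are illustrative names, not part of the patch, and nregs is assumed to be precomputed per range. One detail the sketch makes explicit is an assumption on my side: because the ranges are inclusive on both ends, start events are ordered before end events at the same program point so that both ranges count as live there.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inclusive live range [first, second] needing nregs registers.
struct LiveRange
{
  unsigned first;
  unsigned second;
  unsigned nregs;
};

// O(n * m) reference: accumulate pressure at every point of every range.
unsigned
max_live_regs_naive (const std::vector<LiveRange> &ranges, unsigned max_point)
{
  std::vector<unsigned> live (max_point + 1, 0);
  unsigned max_nregs = 0;
  for (const LiveRange &r : ranges)
    for (unsigned i = r.first; i <= r.second; ++i)
      {
        live[i] += r.nregs;
        max_nregs = std::max (max_nregs, live[i]);
      }
  return max_nregs;
}

// O(n log n) sweep: sort start/end events and track the running pressure.
unsigned
max_live_regs_sweep (const std::vector<LiveRange> &ranges)
{
  struct Event
  {
    unsigned pt;
    bool start;
    unsigned nregs;
  };

  std::vector<Event> events;
  events.reserve (2 * ranges.size ());
  for (const LiveRange &r : ranges)
    {
      assert (r.first <= r.second);
      events.push_back ({r.first, true, r.nregs});
      events.push_back ({r.second, false, r.nregs});
    }

  // Order by point; at equal points, starts come before ends because the
  // ranges are inclusive on both sides.
  std::sort (events.begin (), events.end (),
             [] (const Event &x, const Event &y) {
               if (x.pt != y.pt)
                 return x.pt < y.pt;
               return x.start && !y.start;
             });

  unsigned cur = 0;
  unsigned max_nregs = 0;
  for (const Event &e : events)
    {
      if (e.start)
        cur += e.nregs;
      else
        cur -= e.nregs;
      max_nregs = std::max (max_nregs, cur);
    }
  return max_nregs;
}
```

The naive version is kept only as a reference to compare results against; the sweep computes nregs-weighted pressure once per range instead of once per program point.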
 
> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +{
> +  tree t = ssa_name (i);
> +  if (!t)
> +   continue;
 
Could likely be replaced by
 
  tree t;
  FOR_EACH_SSA_NAME (i, t, cfun)
 
> +static void
> +update_local_live_ranges (
> +  vec_info *vinfo,
> +  hash_map> &program_points_per_bb,
> +  hash_map> &live_ranges_per_bb)
> +{
 
I just realized (sorry) that this is "nested" a bit far.  Can we still
have e.g. 
 
> +  if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
> +{
 
this,
 
> +   if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
> +   != undef_vec_info_type)
 
this,
 
> +   if (live_range)
> + {
 
and this just "continue"?
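
As a generic illustration of the restructuring asked for here (not the patch's actual code), the sketch below shows the same loop written with nested conditions and with early "continue" guards. Stmt and its fields are hypothetical stand-ins for the checks quoted above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a statement and the properties being tested.
struct Stmt
{
  bool relevant;   // e.g. the vec_info type check succeeded
  bool has_range;  // e.g. a live range was found
  int weight;
};

// Nested form: every condition adds one indentation level.
int
sum_nested (const std::vector<Stmt> &stmts)
{
  int total = 0;
  for (const Stmt &s : stmts)
    {
      if (s.relevant)
        {
          if (s.has_range)
            {
              total += s.weight;
            }
        }
    }
  return total;
}

// Flat form: each failed check skips the statement immediately.
int
sum_flat (const std::vector<Stmt> &stmts)
{
  int total = 0;
  for (const Stmt &s : stmts)
    {
      if (!s.relevant)
        continue;
      if (!s.has_range)
        continue;
      total += s.weight;
    }
  return total;
}
```

Both functions compute the same result; the flat form only changes how deep the interesting code is nested.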
 
Apart from that, LGTM.
 
Regards
Robin
 
 


Re: [PATCH] RISC-V: Add missed cond autovec testcases

2023-09-12 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-09-12 16:57
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Add missed cond autovec testcases
This patch adds all the missing cond autovec testcases. For cond patterns
that are not supported yet, follow-up patches will be sent to fix them.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: Add vrem op.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-1.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-2.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-3.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-4.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-4.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-5.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-5.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-1.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-2.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-3.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-4.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-5.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-9.c: New test.
 
---
.../riscv/rvv/autovec/cond/cond_arith-1.c | 13 +
.../riscv/rvv/autovec/cond/cond_arith-2.c |  3 ++
.../riscv/rvv/autovec/cond/cond_arith-3.c | 15 ++
.../riscv/rvv/autovec/cond/cond_arith-4.c |  3 ++
.../riscv/rvv/autovec/cond/cond_arith-5.c | 13 +
.../riscv/rvv/autovec/cond/cond_arith-6.c |  3 ++
.../riscv/rvv/autovec/cond/cond_arith-7.c |  9 
.../riscv/rvv/autovec/cond/cond_arith-8.c | 17 ++-
.../riscv/rvv/autovec/cond/cond_arith-9.c | 11 -
.../riscv/rvv/autovec/cond/cond_logical-1.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-2.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-3.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-4.c   | 43 
.../riscv/rvv/autovec/cond/cond_logical-5.c   | 43 
.../rvv/autovec/cond/cond_logical_min_max-1.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-2.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-3.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-4.c | 49 +++
.../rvv/autovec/cond/cond_logical_min_max-5.c | 49 +++
...l_run-1.c => cond_logical_min_max_run-1.c} |  2 +-
...l_run-2.c => cond_logical_min_max_run-2.c} |  2 +-
...l_run-3.c => cond_logical_min_max_run-3.c} |  2 +-
...l_run-4.c => cond_logical_min_max_run-4.c} |  2 +-
...l_run-5.c => cond_logical_min_max_run-5.c} |  2 +-
.../autovec/cond/cond_widen_complicate-1.c| 35 +
.../autovec/cond/cond_widen_complicate-2.c| 35 +
.../autovec/cond/cond_widen_complicate-3.c| 36 ++
.../autovec/cond/cond_widen_complicate-4.c| 35 +
.../autovec/cond/cond_widen_complicate-5.c| 37 ++
.../autovec/cond/cond_widen_complicate-6.c| 32 
.../autovec/cond/cond_widen_complicate-7.c| 29 +++
.../autovec/cond/cond_widen_complicate-8.c| 28 +++
.

Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-12 Thread juzhe.zh...@rivai.ai
It looks reasonable to me now.
But let's wait for more comments from kito.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-12 16:46
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
From: Pan Li 
 
Update in v3:
 
* Rewrite comment for overloaded function add.
* Move get_non_overloaded_instance to function_base.
 
Update in v2:
 
* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.
 
Original log:
 
This patch adds the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ does.
 
It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN
with the steps below.
 
* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.
 
We validated this framework with the vmv_v intrinsic API(s), and we will
add support for more intrinsic APIs in follow-up patches.
 
gcc/ChangeLog:
 
* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_base::get_non_overloaded_instance): New API impl.
(function_builder::add_function): Add overloaded arg.
(function_resolver::function_resolver): New constructor.
(function_builder::add_overloaded_function): New API impl.
(function_resolver::resolve): Ditto.
(function_resolver::lookup): Ditto.
(function_resolver::get_sub_code): Ditto.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h:
(class function_resolver): New class.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-c.cc   |  36 
gcc/config/riscv/riscv-protos.h   |   1 +
.../riscv/riscv-vector-builtins-shapes.cc |  20 ++-
gcc/config/riscv/riscv-vector-builtins.cc | 155 +-
gcc/config/riscv/riscv-vector-builtins.h  |  36 +++-
.../riscv/rvv/base/overloaded_rv32_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_rv64_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_vmv_v.h |  27 +++
8 files changed, 288 insertions(+), 3 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 283052ae313..060edd3129d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec 
arg_loc, tree fndecl,
   gcc_unreachable ();
}
+/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
+static tree
+riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl,
+   void *uncast_arglist)
+{
+  vec empty = {};
+  location_t loc = (location_t) uncast_location;
+  vec *arglist = (vec *) uncast_arglist;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
+  tree new_fndecl = NULL_TREE;
+
+  if (!arglist)
+arglist = &empty;
+
+  switch (code & RISCV_BUILTIN_CLASS)
+{
+case RISCV_BUILTIN_GENERAL:
+  break;
+case RISCV_BUILTIN_VECTOR:
+  new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode,
+  arglist);
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (new_fndecl == NULL_TREE)
+return new_fndecl;
+
+  return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL,
+   fndecl);
+}
+
/* Implement REGISTER_TARGET_PRAGMAS.  */
void
riscv_register_pragmas (void)
{
+  targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin;
   targetm.check_builtin_call = riscv_check_builtin_call;
+
   c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic);
}
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6dbf6b9f943..5d2492dd031 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -381,6 +381,7 @@ gimple *gimple_fold_builtin (unsigned int, 
gimple_stmt_iterator *, gcall *);
rtx expand

Re: [PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-12 Thread juzhe.zh...@rivai.ai
I think it's better to move 'get_non_overloaded_instance' into function_base.

+  /* To avoid API conflicting, we use void return type and void argument
+ for the overloaded function register, like aarch64-sve.  */

Please rewrite the comment; don't mention aarch64 sve.

Could you run your RVV intrinsic API CI with this patch?
I am worried that the resolve logic will break the existing API support.




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-12 15:20
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
From: Pan Li 
 
Update in v2:
 
* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.
 
Original log:
 
This patch adds the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ does.
 
It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN
with the steps below.
 
* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.
 
We validated this framework with the vmv_v intrinsic API(s), and we will
add support for more intrinsic APIs in follow-up patches.
 
gcc/ChangeLog:
 
* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_instance::get_non_overloaded_instance): New API impl.
(function_builder::add_function): Add overloaded arg.
(function_resolver::function_resolver): New constructor.
(function_builder::add_overloaded_function): New API impl.
(function_resolver::resolve): Ditto.
(function_resolver::lookup): Ditto.
(function_resolver::get_sub_code): Ditto.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h:
(class function_resolver): New class.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-c.cc   |  36 
gcc/config/riscv/riscv-protos.h   |   1 +
.../riscv/riscv-vector-builtins-shapes.cc |  20 ++-
gcc/config/riscv/riscv-vector-builtins.cc | 155 +-
gcc/config/riscv/riscv-vector-builtins.h  |  35 +++-
.../riscv/rvv/base/overloaded_rv32_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_rv64_vmv_v.c|   8 +
.../riscv/rvv/base/overloaded_vmv_v.h |  27 +++
8 files changed, 287 insertions(+), 3 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 283052ae313..060edd3129d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec 
arg_loc, tree fndecl,
   gcc_unreachable ();
}
+/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
+static tree
+riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl,
+   void *uncast_arglist)
+{
+  vec empty = {};
+  location_t loc = (location_t) uncast_location;
+  vec *arglist = (vec *) uncast_arglist;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
+  tree new_fndecl = NULL_TREE;
+
+  if (!arglist)
+arglist = &empty;
+
+  switch (code & RISCV_BUILTIN_CLASS)
+{
+case RISCV_BUILTIN_GENERAL:
+  break;
+case RISCV_BUILTIN_VECTOR:
+  new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode,
+  arglist);
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (new_fndecl == NULL_TREE)
+return new_fndecl;
+
+  return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL,
+   fndecl);
+}
+
/* Implement REGISTER_TARGET_PRAGMAS.  */
void
riscv_register_pragmas (void)
{
+  targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin;
   targetm.check_builtin_call = riscv_check_builtin_call;
+
   c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic);
}
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
inde

Re: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model

2023-09-11 Thread juzhe.zh...@rivai.ai
>> As long as we're just looking for the maximum number of live registers,
>> we can use a sliding-window approach:  create a structure with all
>> start and end points, sort it, and increase the current pressure
>> if we start a new range or decrease.  That's O(n log n).

I fail to see how it can help. The current approach is straightforward:
  for (hash_map::iterator iter = live_ranges.begin ();
   iter != live_ranges.end (); ++iter)
{
  tree var = (*iter).first;
  pair live_range = (*iter).second;
  for (i = live_range.first; i <= live_range.second; i++)
{
  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
= compute_nregs_for_mode (mode, biggest_mode, lmul);
  live_vars_vec[i] += nregs;
  if (live_vars_vec[i] > max_nregs)
max_nregs = live_vars_vec[i];
}
}

Could you revise this piece of code?

The other comments have been addressed in V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629959.html 




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 04:31
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,
 
glad that we can use the dominator info directly.  Could we move the
calculation of the info to the beginning (if it's not available)?  That
makes it clearer that it's a prerequisite.  Function comments look
good now.
 
Some general remarks kind of similar to v1:
 
- I would prefer a hash_map or similar to hold the end point for a range
   instead of looking through potentially all ranges in contrived cases.
 
- As long as we're just looking for the maximum number of live registers,
   we can use a sliding-window approach:  create a structure with all
   start and end points, sort it, and increase the current pressure
   if we start a new range or decrease.  That's O(n log n).
 
> +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +  const ssa_use_operand_t *ptr;
> +
> +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> + {
 
Why does FOR_EACH_IMM_USE not work here?
 
> +   unsigned int max_point
> + = (*program_points_per_bb.get (e->src)).length () - 1;
> +   for (k = 0; k < (*live_ranges).length (); k++)
> + {
> +   if ((*live_ranges)[i].var == def)
 
Would also be nice not having to search through all ranges but just index/hash
it via var (or similar).
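
A rough sketch of what indexing by variable could look like, using standard containers rather than GCC's hash_map and an int id in place of a tree. LiveRangeTable, VarRange and note_point are hypothetical names chosen for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// One live range per variable, identified here by a plain integer id.
struct VarRange
{
  int var;
  unsigned first;
  unsigned second;
};

// Keeps the ranges in a vector and a var -> slot index next to it, so that
// extending or finding a range does not require scanning all ranges.
class LiveRangeTable
{
public:
  // Record that VAR is live at POINT, creating or widening its range.
  void note_point (int var, unsigned point)
  {
    auto it = index_.find (var);
    if (it == index_.end ())
      {
        index_.emplace (var, ranges_.size ());
        ranges_.push_back ({var, point, point});
        return;
      }
    VarRange &r = ranges_[it->second];
    r.first = std::min (r.first, point);
    r.second = std::max (r.second, point);
  }

  // Return the range for VAR, or nullptr if none was recorded.
  const VarRange *find (int var) const
  {
    auto it = index_.find (var);
    return it == index_.end () ? nullptr : &ranges_[it->second];
  }

private:
  std::vector<VarRange> ranges_;
  std::unordered_map<int, std::size_t> index_;
};
```

Lookups become expected constant time per variable, at the cost of maintaining the side index whenever a range is added.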
 
What about one test with global live ranges?  Not a necessity IMHO we can still
add it later.
 
Regards
Robin
 
 


Re: [PATCH] RISC-V: Add vcreate intrinsics for RVV tuple types

2023-09-11 Thread juzhe.zh...@rivai.ai
Thanks for supporting it.
LGTM from my side.
Let's wait for more comments from kito.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-12 10:08
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; pan2.li; gaofei; wangfeng; xuli
Subject: [PATCH] RISC-V: Add vcreate intrinsics for RVV tuple types
From: xuli 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc (class vcreate):
(BASE): New class.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vcreate): Add 
vcreate support.
* config/riscv/riscv-vector-builtins-shapes.cc (struct vcreate_def): 
Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc: Add args type.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/tuple_create.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  40 ++
.../riscv/riscv-vector-builtins-bases.h   |   1 +
.../riscv/riscv-vector-builtins-functions.def |   1 +
.../riscv/riscv-vector-builtins-shapes.cc |  50 +++
.../riscv/riscv-vector-builtins-shapes.h  |   1 +
gcc/config/riscv/riscv-vector-builtins.cc |  12 ++
.../gcc.target/riscv/rvv/base/tuple_create.c  | 123 ++
7 files changed, 228 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple_create.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 8e679f72392..be3df2c1ea2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1824,6 +1824,44 @@ public:
   }
};
+class vcreate : public function_base
+{
+public:
+  gimple *fold (gimple_folder &f) const override
+  {
+unsigned int nargs = gimple_call_num_args (f.call);
+tree lhs_type = TREE_TYPE (f.lhs);
+
+/* Replace the call with a clobber of the result (to prevent it from
+   becoming upwards exposed) followed by stores into each individual
+   vector of tuple.
+
+   The fold routines expect the replacement statement to have the
+   same lhs as the original call, so return the clobber statement
+   rather than the final vector store.  */
+gassign *clobber = gimple_build_assign (f.lhs, build_clobber (lhs_type));
+
+for (unsigned int i = nargs; i-- > 0; )
+  {
+ tree rhs_vector = gimple_call_arg (f.call, i);
+ tree field = tuple_type_field (TREE_TYPE (f.lhs));
+ tree lhs_array = build3 (COMPONENT_REF, TREE_TYPE (field),
+ unshare_expr (f.lhs), field, NULL_TREE);
+ tree lhs_vector = build4 (ARRAY_REF, TREE_TYPE (rhs_vector),
+   lhs_array, size_int (i),
+   NULL_TREE, NULL_TREE);
+ gassign *assign = gimple_build_assign (lhs_vector, rhs_vector);
+ gsi_insert_after (f.gsi, assign, GSI_SAME_STMT);
+  }
+return clobber;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+return NULL_RTX;
+  }
+};
+
class read_vl : public function_base
{
public:
@@ -2285,6 +2323,7 @@ static CONSTEXPR const vlmul_ext vlmul_ext_obj;
static CONSTEXPR const vlmul_trunc vlmul_trunc_obj;
static CONSTEXPR const vset vset_obj;
static CONSTEXPR const vget vget_obj;
+static CONSTEXPR const vcreate vcreate_obj;
static CONSTEXPR const read_vl read_vl_obj;
static CONSTEXPR const vleff vleff_obj;
static CONSTEXPR const vlenb vlenb_obj;
@@ -2546,6 +2585,7 @@ BASE (vlmul_ext)
BASE (vlmul_trunc)
BASE (vset)
BASE (vget)
+BASE (vcreate)
BASE (read_vl)
BASE (vleff)
BASE (vlenb)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 69d4562091f..131041ea66f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -267,6 +267,7 @@ extern const function_base *const vlmul_ext;
extern const function_base *const vlmul_trunc;
extern const function_base *const vset;
extern const function_base *const vget;
+extern const function_base *const vcreate;
extern const function_base *const read_vl;
extern const function_base *const vleff;
extern const function_base *const vlenb;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 3ce06dc60b7..18ed2c2b8f6 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -621,6 +621,7 @@ DEF_RVV_FUNCTION (vget, vget, none_preds, 
all_v_vget_lmul4_x2_ops)
// Tuple types
DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops)
DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_tuple_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_tuple_ops)
DEF_RVV_FUNCTION (vlseg, seg_loadstore, full_preds, 
tuple_v_scalar_const_ptr_ops)
DEF_RVV_FUNCTION (vsseg, seg_loadstore, none_m_preds, tuple_v_scalar_ptr_ops)
DEF_RVV_FUNCTION (vlsseg, seg_loadstore, full_preds, 
tuple_v_scalar_const_ptr_p

Re: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-11 Thread juzhe.zh...@rivai.ai
Add a function get_non_overloaded_instance to the instance.
The instance already knows it is void vmv (void).
In this function, search the arglist and return the real non-overloaded decl.



juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-09-12 09:20
To: 钟居哲
CC: kito.cheng; gcc-patches; Wang, Yanzhang
Subject: RE: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for 
RVV intrinsic
We cannot leverage this instance for correctness.
The rfun in the code below is the one for the overloaded function, which is 
registered as void xxx(void), as aarch64 does, to avoid the conflict.
 
Let's take vmv_v_i32m1 as an example in the rfun table.
 
Index 0: void vmv_v(void) overloaded
Index 1: i32m1 vmv_v_v_i32m1_i32m1 (i32m1, size_t) non-overloaded
Index 2: placeholder.
 
When we enter the hook (i.e. the code listed below), the rfun we have is the 
index 0 rfun instead of index 1.
Then we need the arglist to look up the rfun of index 1 for the underlying 
call, as well as to build the instance for the index 1 rfun.
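
The lookup described above can be modelled with a toy table. This is only a sketch under simplifying assumptions: types are compared as strings, RegisteredFunction and resolve_overloaded are hypothetical names, and the real patch builds a function instance and hashes it instead of scanning the table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of one entry in the registered-function table.
struct RegisteredFunction
{
  std::string name;                    // e.g. "vmv_v" or "vmv_v_v_i32m1"
  std::string overload_name;           // overloaded name this entry answers to
  bool overloaded_p;                   // true for the void f (void) entry
  std::vector<std::string> arg_types;  // empty for the overloaded entry
};

// Given the code of an overloaded entry and the call's argument types,
// return the index of the matching non-overloaded entry, or -1 if CODE is
// out of range, is not an overloaded entry, or no candidate matches.
int
resolve_overloaded (const std::vector<RegisteredFunction> &table,
                    std::size_t code,
                    const std::vector<std::string> &arg_types)
{
  if (code >= table.size () || !table[code].overloaded_p)
    return -1;

  for (std::size_t i = 0; i < table.size (); ++i)
    {
      const RegisteredFunction &cand = table[i];
      if (!cand.overloaded_p
          && cand.overload_name == table[code].name
          && cand.arg_types == arg_types)
        return static_cast<int> (i);
    }
  return -1;
}
```

The point of the model is that the index 0 entry alone carries no type information, so the argument list has to drive the choice of the concrete entry.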
 
Aarch64 has the same rfun table as above; it uses a loop to parse the 
arglist with machine mode matching against a predefined type suffix (which is 
not available in RISC-V).
 
I think both try to solve almost the same problem, but with different 
implementation details.
 
Pan
 
From: 钟居哲  
Sent: Tuesday, September 12, 2023 7:20 AM
To: Li, Pan2 
Cc: kito.cheng ; gcc-patches ; 
Wang, Yanzhang 
Subject: Re: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for 
RVV intrinsic
 
I don't understand.
 
+tree
+resolve_overloaded_builtin (location_t loc, unsigned int code,
+                            vec *arglist)
+{
+  if (code >= vec_safe_length (registered_functions))
+    return NULL_TREE;
+
+  const registered_function *rfun = (*registered_functions)[code];
+
+  if (!rfun || !rfun->overloaded_p)
+    return NULL_TREE;
+
+  return function_resolver (loc, rfun->instance, rfun->decl, *arglist)
+    .resolve ();
+}
You already have rfun->instance. Just using this instance should be good enough.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-09-11 23:24
To: 钟居哲
CC: kito.cheng; gcc-patches; Wang, Yanzhang
Subject: RE: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for 
RVV intrinsic
For a function instance with a void return or void arguments, it is easy, as you
mentioned below.
 
For the general API (to get the right hash), you need to build the
rvv_type_info, predications_type_index and rvv_op_info
from the arglist (aka vec) passed by the hook.
 
So we need to construct the above parameters from a tree argument. Sorry, I am
not sure I understand correctly, but I failed
to find anywhere with a similar usage.
 
Could you please point me to some best practice for the transformation
from tree to the above types?
 
Pan
 
From: 钟居哲  
Sent: Monday, September 11, 2023 9:07 PM
To: Li, Pan2 
Cc: kito.cheng ; gcc-patches ; 
Wang, Yanzhang 
Subject: Re: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for 
RVV intrinsic
 
function_instance
get_read_vl_instance (void)
{
  return function_instance ("read_vl", bases::read_vl, shapes::read_vl,
  none_ops[0], PRED_TYPE_none, &p_none_void_ops);
}
 
tree
get_read_vl_decl (void)
{
  function_instance instance = get_read_vl_instance ();
  hashval_t hash = instance.hash ();
  registered_function *rfn = function_table->find_with_hash (instance, hash);
  gcc_assert (rfn);
  return rfn->decl;
}
 
You should reference it. I don't see why it's hard for us to construct the
instance first and then use that instance's hash to get the decl.
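
The "build the key, then look it up by hash" idea can be sketched outside of GCC as follows. This is a minimal illustration, not the RVV builtin framework: ToyInstance, ToyInstanceHash, ToyTable and find_decl are hypothetical names, and std::unordered_map stands in for the function table.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Toy key describing one concrete function instance (hypothetical fields).
struct ToyInstance {
  std::string base_name;
  std::string type_suffix;
  std::string pred;
  bool operator==(const ToyInstance &o) const {
    return base_name == o.base_name && type_suffix == o.type_suffix
           && pred == o.pred;
  }
};

// Combine the field hashes; any reasonable mixing function would do here.
struct ToyInstanceHash {
  std::size_t operator()(const ToyInstance &i) const {
    std::size_t h = std::hash<std::string>()(i.base_name);
    h = h * 31 + std::hash<std::string>()(i.type_suffix);
    h = h * 31 + std::hash<std::string>()(i.pred);
    return h;
  }
};

using ToyTable = std::unordered_map<ToyInstance, std::string, ToyInstanceHash>;

// Return the decl name registered for INSTANCE, or "" when there is none.
std::string find_decl(const ToyTable &table, const ToyInstance &instance) {
  auto it = table.find(instance);
  return it == table.end() ? std::string() : it->second;
}
```

The open question in this thread is how to fill in the key fields from the call's arguments; once the key exists, the lookup itself is a single hash probe.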


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-09-11 20:26
To: juzhe.zhong
CC: kito.cheng; gcc-patches; Wang, Yanzhang
Subject: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
> No. You must construct instance. 'strcmp' is very ugly.
 
The strcmp here is defensive code for an early exit if nothing is found (it can
be removed without affecting correctness); it is not required to find the right
declaration.
 
Pan
 
From: juzhe.zhong  
Sent: Monday, September 11, 2023 8:20 PM
To: Li, Pan2 
Cc: kito.cheng ; gcc-patches ; 
Wang, Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
 
No. You must construct instance. 'strcmp' is very ugly.
Replied Message
From: Li, Pan2
Date: 09/11/2023 20:09
To: juzhe.zh...@rivai.ai; kito.cheng
CC: gcc-patches; Wang, Yanzhang
Subject: RE: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
> -if (overloaded_p && instance.pred == PRED_TYPE_m)
> +if (overloaded_p)
 
Thanks for pointing this out. I misunderstood the policy functions, and this
change is a mistake; I will send a V2 for this.
 
> Plz change it into : Actually, it is not easy to convert to this approach as 
> aarch64 has different implementation of types information.Like 
> type_suffix_info (aarch64 loop type suffix to get the arglist type in 
> infer_vecto

Re: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model

2023-09-11 Thread juzhe.zh...@rivai.ai
>> What about one test with global live ranges?  Not a necessity IMHO we can 
>> still
>> add it later.
We already have.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 04:31
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,
 
glad that we can use the dominator info directly.  Could we move the
calculation of the info to the beginning (if it's not available)?  That
makes it clearer that it's a prerequisite.  Function comments look
good now.
 
Some general remarks kind of similar to v1:
 
- I would prefer a hash_map or similar to hold the end point for a range
   instead of looking through potentially all ranges in contrived cases.
 
- As long as we're just looking for the maximum number of live registers,
   we can use a sliding-window approach:  create a structure with all
   start and end points, sort it, and increase the current pressure
   when a range starts or decrease it when one ends.  That's O(n log n).
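
A minimal sketch of that sweep, assuming each live range is a closed interval of program points. The name max_live and the pair-based representation are illustrative assumptions, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Each range is a closed interval [first, second] of program points.
// Returns the largest number of ranges live at the same point.
unsigned max_live(const std::vector<std::pair<unsigned, unsigned>> &ranges) {
  // One event per start (+1) and one per point just past the end (-1).
  std::vector<std::pair<unsigned, int>> events;
  events.reserve(ranges.size() * 2);
  for (const auto &r : ranges) {
    assert(r.first <= r.second);
    events.emplace_back(r.first, +1);
    events.emplace_back(r.second + 1, -1);
  }
  // Sorting pairs orders by point, then by delta, so at equal points the
  // -1 events are handled before the +1 events.
  std::sort(events.begin(), events.end());
  int current = 0, best = 0;
  for (const auto &e : events) {
    current += e.second;
    best = std::max(best, current);
  }
  return static_cast<unsigned>(best);
}
```

The sort dominates the cost, which gives the O(n log n) bound; the sketch ignores the overflow case where an end point is the largest unsigned value.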
 
> +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +  const ssa_use_operand_t *ptr;
> +
> +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> + {
 
Why does FOR_EACH_IMM_USE not work here?
 
> +   unsigned int max_point
> + = (*program_points_per_bb.get (e->src)).length () - 1;
> +   for (k = 0; k < (*live_ranges).length (); k++)
> + {
> +   if ((*live_ranges)[i].var == def)
 
Would also be nice not having to search through all ranges but just index/hash
it via var (or similar).
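
One way to avoid the linear search, sketched with standard containers rather than GCC's hash_map. ToyRange, index_by_var and extend_range are hypothetical names, and variables are modelled as integer ids.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy live range keyed by an integer variable id (hypothetical layout).
struct ToyRange {
  int var;
  unsigned start;
  unsigned end;
};

// Build a var -> index map once so later lookups avoid a linear scan.
std::unordered_map<int, std::size_t>
index_by_var(const std::vector<ToyRange> &ranges) {
  std::unordered_map<int, std::size_t> index;
  for (std::size_t i = 0; i < ranges.size(); ++i)
    index.emplace(ranges[i].var, i);  // keeps the first range seen per var
  return index;
}

// Extend the range of VAR to END if VAR is known; return whether it was found.
bool extend_range(std::vector<ToyRange> &ranges,
                  const std::unordered_map<int, std::size_t> &index,
                  int var, unsigned end) {
  auto it = index.find(var);
  if (it == index.end())
    return false;
  ToyRange &r = ranges[it->second];
  if (end > r.end)
    r.end = end;
  return true;
}
```

Each update then costs an expected constant-time probe instead of a walk over all ranges.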
 
What about one test with global live ranges?  Not a necessity IMHO we can still
add it later.
 
Regards
Robin
 
 


Re: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-11 Thread juzhe.zh...@rivai.ai
>> Just make sure it's the right change?
It seems incorrect to me.

More comments (I just reviewed again):

+tree
+function_resolver::lookup ()
+{
+  unsigned int code_limit = vec_safe_length (registered_functions);
+
+  for (unsigned code = get_sub_code () + 1; code < code_limit; code++)
+    {
+      registered_function *rfun = (*registered_functions)[code];
+      function_instance instance = rfun->instance;
+
+      if (strcmp (base_name, instance.base_name) != 0)
+        break;
+
+      if (rfun->overloaded_p)
+        continue;
+
+      unsigned k;
+      const rvv_arg_type_info *args = instance.op_info->args;
+
+      for (k = 0; args[k].base_type != NUM_BASE_TYPES; k++)
+        {
+          if (k >= m_arglist.length ())
+            break;
+
+          if (TYPE_MODE (instance.get_arg_type (k))
+              != TYPE_MODE (TREE_TYPE (m_arglist[k])))
+            break;
+        }
+
+      if (args[k].base_type == NUM_BASE_TYPES)
+        return rfun->decl;
+    }
+
+  return NULL_TREE;
+}
Please change it into:
/* Silently check whether there is an instance of the function with the
   mode suffix given by MODE and the type suffixes given by TYPE0 and TYPE1.
   Return its function decl if so, otherwise return null.  */
tree
function_resolver::lookup_form (mode_suffix_index mode,
                                type_suffix_index type0,
                                type_suffix_index type1)
{
  type_suffix_pair types = { type0, type1 };
  function_instance instance (base_name, base, shape, mode, types, pred);
  registered_function *rfn
    = function_table->find_with_hash (instance, instance.hash ());
  return rfn ? rfn->decl : NULL_TREE;
}


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-11 17:04
To: juzhe.zh...@rivai.ai
CC: pan2.li; gcc-patches; yanzhang.wang
Subject: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
> @@ -545,7 +563,7 @@ struct move_def : public build_base
>  /* According to rvv-intrinsic-doc, it does not add "_m" suffix
> for vop_m C++ overloaded API.  */
> -if (overloaded_p && instance.pred == PRED_TYPE_m)
> +if (overloaded_p)
 
Just make sure it's the right change?
 
>return b.finish_name ();
>  b.append_name (predication_suffixes[instance.pred]);
>  return b.finish_name ();
 


Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-09-11 Thread juzhe.zh...@rivai.ai
Thanks for supporting it even though I don't like this feature :).
The framework is LGTM.

Let's wait for more comments from kito.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-11 15:57
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic
From: Pan Li 
 
This patch adds the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ does.
 
It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN,
with the steps below.
 
* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.
 
We validated this framework with the vmv_v intrinsic API(s), and we will
add support for more intrinsic APIs in follow-up patches.
 
gcc/ChangeLog:
 
* config/riscv/riscv-c.cc
(riscv_resolve_overloaded_builtin): New function for the hook.
(riscv_register_pragmas): Register the hook
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one):
Register overloaded function.
(struct overloaded_base): New struct for overloaded shape.
(struct non_overloaded_base): New struct for non overloaded shape.
(struct move_def): Inherit overloaded shape.
* config/riscv/riscv-vector-builtins.cc
(function_builder::add_function): Add overloaded arg.
(function_builder::add_overloaded_function): New function impl.
(function_resolver::function_resolver): New constructor.
(function_resolver::get_sub_code): New API impl.
(function_resolver::resolve): New API impl.
(function_resolver::lookup): New API impl.
(resolve_overloaded_builtin): New func impl.
* config/riscv/riscv-vector-builtins.h
(class function_resolver): New class.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-c.cc   |  36 +
gcc/config/riscv/riscv-protos.h   |   1 +
.../riscv/riscv-vector-builtins-shapes.cc |  22 ++-
gcc/config/riscv/riscv-vector-builtins.cc | 138 +-
gcc/config/riscv/riscv-vector-builtins.h  |  30 +++-
.../riscv/rvv/base/overloaded_rv32_vmv_v.c|   4 +
.../riscv/rvv/base/overloaded_rv64_vmv_v.c|   4 +
.../riscv/rvv/base/overloaded_vmv_v.h |  17 +++
8 files changed, 248 insertions(+), 4 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 283052ae313..060edd3129d 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec 
arg_loc, tree fndecl,
   gcc_unreachable ();
}
+/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
+static tree
+riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl,
+   void *uncast_arglist)
+{
+  vec<tree, va_gc> empty = {};
+  location_t loc = (location_t) uncast_location;
+  vec<tree, va_gc> *arglist = (vec<tree, va_gc> *) uncast_arglist;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> RISCV_BUILTIN_SHIFT;
+  tree new_fndecl = NULL_TREE;
+
+  if (!arglist)
+    arglist = &empty;
+
+  switch (code & RISCV_BUILTIN_CLASS)
+{
+case RISCV_BUILTIN_GENERAL:
+  break;
+case RISCV_BUILTIN_VECTOR:
+  new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode,
+  arglist);
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (new_fndecl == NULL_TREE)
+return new_fndecl;
+
+  return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL,
+   fndecl);
+}
+
/* Implement REGISTER_TARGET_PRAGMAS.  */
void
riscv_register_pragmas (void)
{
+  targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin;
   targetm.check_builtin_call = riscv_check_builtin_call;
+
   c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic);
}
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6dbf6b9f943..5d2492dd031 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -381,6 +381,7 @@ gimple *gimple_fold_builtin (unsigned int, 
gimple_stmt_iterator *, gcall *);
rtx expand_builtin (unsigned int, tree, rtx);
bool check_builtin_call (location_t, vec, unsigned int,
   tree, unsigned int, tree *);
+tree resolve_overloaded_builtin (location_t, unsigned int, vec *);
bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
bool legit

Re: Re: [PATCH] RISC-V: Use dominance analysis in global vsetvl elimination

2023-09-11 Thread juzhe.zh...@rivai.ai
Committed. Thanks kito.


>> I guess you will remove get_all_predecessors once LMUL cost
>> model can use dominator info as well?
Yes. I am trying but there is a failed case for dynamic LMUL.
Not sure whether it can work now.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-11 15:03
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng
Subject: Re: [PATCH] RISC-V: Use dominance analysis in global vsetvl elimination
LGTM, and I guess you will remove get_all_predecessors once LMUL cost
model can use dominator info as well?
 
 
On Mon, Sep 11, 2023 at 11:34 AM Juzhe-Zhong  wrote:
>
> I found that it's more reasonable to use existing dominance analysis.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc 
> (pass_vsetvl::global_eliminate_vsetvl_insn): Use dominance analysis.
> (pass_vsetvl::init): Ditto.
> (pass_vsetvl::done): Ditto.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 134b97737ae..f81361c4ccd 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -4054,7 +4054,7 @@ pass_vsetvl::global_eliminate_vsetvl_insn (const 
> bb_info *bb) const
>  }
>
>/* Step1: Reshape the VL/VTYPE status to make sure everything compatible.  
> */
> -  hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb);
> +  auto_vec pred_cfg_bbs = get_dominated_by 
> (CDI_POST_DOMINATORS, cfg_bb);
>FOR_EACH_EDGE (e, ei, cfg_bb->preds)
>  {
>sbitmap avout = m_vector_manager->vector_avout[e->src->index];
> @@ -4243,6 +4243,7 @@ pass_vsetvl::init (void)
>  {
>/* Initialization of RTL_SSA.  */
>calculate_dominance_info (CDI_DOMINATORS);
> +  calculate_dominance_info (CDI_POST_DOMINATORS);
>df_analyze ();
>crtl->ssa = new function_info (cfun);
>  }
> @@ -4264,6 +4265,7 @@ pass_vsetvl::done (void)
>  {
>/* Finalization of RTL_SSA.  */
>free_dominance_info (CDI_DOMINATORS);
> +  free_dominance_info (CDI_POST_DOMINATORS);
>if (crtl->ssa->perform_pending_updates ())
> cleanup_cfg (0);
>delete crtl->ssa;
> --
> 2.36.3
>
 


Re: [PATCH] RISC-V: Enable RVV scalable vectorization by default[PR111311]

2023-09-10 Thread juzhe.zh...@rivai.ai
Ping this patch.

I think it's time to enable scalable vectorization by default and run the whole
regression suite every time (except vect.exp, which we haven't enabled yet).

Updated status of the current FAILs:

Real FAILS (ICE and execution FAIL):

FAIL: gcc.dg/pr70252.c (internal compiler error: in 
gimple_expand_vec_cond_expr, at gimple-isel.cc:284)
FAIL: gcc.dg/pr70252.c (test for excess errors)
FAIL: gcc.dg/pr92301.c execution test

Robin is working on these 3 issues and they will be solved soon.

FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in as_a, at machmode.h:381)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in as_a, at machmode.h:381)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
This is a long-known issue that I have mentioned many times; we need help with
LTO, since it is caused by the mode bits extension.

The remaining bogus FAILs:
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "Not unrolling loop, doesn't 
roll"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "likely upper bound: 6"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "realistic bound: -1"
FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump cunroll "optimized: loop with 
[0-9]+ iterations completely unrolled"
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump-not optimized "foo"
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized 
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized 
"BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized 
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized 
"BIT_INSERT_EXPR" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0 
loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11c.c scan-tree-dump-times vect "vectorized 0 
loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment of 
access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment of 
access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/loop-bound-1.c scan-tree-dump ivopts "bounded by 254"
FAIL: gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump ivopts "bounded by 254"
FAIL: gcc.dg/tree-ssa/predcom-2.c scan-tree-dump-times pcom "Unrolling 2 
times." 2
FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom "Combination" 1
FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom "Unrolling 3 
times." 1
FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom "Combination" 2
FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom "Unrolling 3 
times." 1
FAIL: gcc.dg/tree-ssa/predcom-9.c scan-tree-dump pcom "Executing predictive 
commoning without unrolling"
FAIL: gcc.dg/tree-ssa/reassoc-46.c scan-tree-dump-times optimized 
"(?:vect_)?sum_[\\d._]+ = (?:(?:vect_)?_[\\d._]+ \\+ 
(?:vect_)?sum_[\\d._]+|(?:vect_)?sum_[\\d._]+ \\+ (?:vect_)?_[\\d._]+)" 1
FAIL: gcc.dg/tree-ssa/scev-10.c scan-tree-dump-times ivopts "  
Type:\\tREFERENCE ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/scev-11.c scan-tree-dump-times ivopts "  
Type:\\tREFERENCE ADDRESS\n" 2
FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness wrto loop 
niter:\tNo-overflow"
FAIL: gcc.dg/tree-ssa/scev-9.c scan-tree-dump-times ivopts "  Type:\\tREFERENCE 
ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/split-path-11.c scan-tree-dump-times split-paths "join 
point for if-convertable half-diamond" 1

These are bogus dump FAILs and I have confirmed each of them 100%; we have the
same behavior as SVE.

So is this patch OK for trunk?



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-09-07 15:28
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Enable RVV scalable vectorization by default[PR111311]
This patch is not ready yet, but the remaining issues will all be fixed very soon.
 
gcc/ChangeLog:
 
* config/riscv/riscv.opt: Set default as scalable vectorization.
 
---
gcc/config/riscv/riscv.opt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 98f342348b7..bf2eca08221 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -292,7 +292,7 @@ EnumValue
Enum(riscv_autovec_preference) String(fixed-vlmax) Value(RVV_FIXED_VLMAX)
-param=riscv-autovec-preference=
-Target RejectNegative Joined Enum(riscv_autovec_preference) 
Var(riscv_autovec_preference) Init(NO_AUTOVEC)
+Target RejectNegative Joined Enum(riscv_autovec_preference) 
Var(riscv_autovec_preference) Init(RVV_SCALABLE)
-param=riscv-autovec-preference= Set the preference of 
auto-vectorization in the RISC-V port.
Enum
-- 
2.36.3
 


Re: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]

2023-09-10 Thread juzhe.zh...@rivai.ai
Sure. Thanks kito.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-11 10:57
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]
OK, but could you split this patch into two patches? pre-approved for both.
 
On Mon, Sep 11, 2023 at 10:36 AM juzhe.zh...@rivai.ai
 wrote:
>
> >> Should we also add loads and stores as well?
> >> and just make sure this is also necessary for the fix and not sneaky, 
> >> right?
>
> No, we don't need loads/stores, because of the following handling code:
> (define_insn_and_split "*mov<mode>_lra"
>   [(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand" "=vr, m,vr")
>         (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand" "  m,vr,vr"))
>    (clobber (match_scratch:P 2 "=&r,&r,X"))]
>   "TARGET_VECTOR && (lra_in_progress || reload_completed)
>    && (register_operand (operands[0], <MODE>mode)
>        || register_operand (operands[1], <MODE>mode))"
>   "#"
>   "&& reload_completed"
>   [(const_int 0)]
> {
>   if (REG_P (operands[0]) && REG_P (operands[1]))
>     emit_insn (gen_rtx_SET (operands[0], operands[1]));
>   else
>     {
>       emit_move_insn (operands[2], gen_int_mode (GET_MODE_NUNITS (<MODE>mode),
>                                                  Pmode));
>       unsigned insn_flags
>         = GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_BOOL
>           ? riscv_vector::UNARY_MASK_OP
>           : riscv_vector::UNARY_OP;
>       riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
>                                         insn_flags, operands, operands[2]);
>     }
>   DONE;
> }
>   [(set_attr "type" "vmov")]
> )
>
> We split the special case using emit_insn (gen_rtx_SET (operands[0], operands[1]));
>
> Missing this pattern will cause an ICE, but the current testcases didn't
> trigger it.
> I noticed the issue after adding support for this pattern.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-11 10:18
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng
> Subject: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]
> > diff --git a/gcc/config/riscv/autovec-vls.md 
> > b/gcc/config/riscv/autovec-vls.md
> > index d208b418e5f..6f48f7d6232 100644
> > --- a/gcc/config/riscv/autovec-vls.md
> > +++ b/gcc/config/riscv/autovec-vls.md
> > @@ -148,6 +148,14 @@
> >[(set_attr "type" "vmov")
> > (set_attr "mode" "")])
> >
> > +(define_insn "*mov_vls"
> > +  [(set (match_operand:VLSB 0 "register_operand" "=vr")
> > +   (match_operand:VLSB 1 "register_operand" " vr"))]
> > +  "TARGET_VECTOR"
> > +  "vmv1r.v\t%0,%1"
> > +  [(set_attr "type" "vmov")
> > +   (set_attr "mode" "")])
>
> Should we also add loads and stores as well?
> and just make sure this is also necessary for the fix and not sneaky, right?
>
> > +
> >  (define_expand "movmisalign"
> >[(set (match_operand:VLS 0 "nonimmediate_operand")
> > (match_operand:VLS 1 "general_operand"))]
>
 


Re: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]

2023-09-10 Thread juzhe.zh...@rivai.ai
>> Should we also add loads and stores as well?
>> and just make sure this is also necessary for the fix and not sneaky, right?

No, we don't need loads/stores, because of the following handling code:
(define_insn_and_split "*mov<mode>_lra"
  [(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand" "=vr, m,vr")
        (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand" "  m,vr,vr"))
   (clobber (match_scratch:P 2 "=&r,&r,X"))]
  "TARGET_VECTOR && (lra_in_progress || reload_completed)
   && (register_operand (operands[0], <MODE>mode)
       || register_operand (operands[1], <MODE>mode))"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  if (REG_P (operands[0]) && REG_P (operands[1]))
    emit_insn (gen_rtx_SET (operands[0], operands[1]));
  else
    {
      emit_move_insn (operands[2], gen_int_mode (GET_MODE_NUNITS (<MODE>mode),
                                                 Pmode));
      unsigned insn_flags
        = GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_BOOL
          ? riscv_vector::UNARY_MASK_OP
          : riscv_vector::UNARY_OP;
      riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
                                        insn_flags, operands, operands[2]);
    }
  DONE;
}
  [(set_attr "type" "vmov")]
)

We split the special case using emit_insn (gen_rtx_SET (operands[0], operands[1]));

Missing this pattern will cause an ICE, but the current testcases didn't
trigger it.
I noticed the issue after adding support for this pattern.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-11 10:18
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng
Subject: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]
> diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
> index d208b418e5f..6f48f7d6232 100644
> --- a/gcc/config/riscv/autovec-vls.md
> +++ b/gcc/config/riscv/autovec-vls.md
> @@ -148,6 +148,14 @@
>[(set_attr "type" "vmov")
> (set_attr "mode" "")])
>
> +(define_insn "*mov_vls"
> +  [(set (match_operand:VLSB 0 "register_operand" "=vr")
> +   (match_operand:VLSB 1 "register_operand" " vr"))]
> +  "TARGET_VECTOR"
> +  "vmv1r.v\t%0,%1"
> +  [(set_attr "type" "vmov")
> +   (set_attr "mode" "")])
 
Should we also add loads and stores as well?
and just make sure this is also necessary for the fix and not sneaky, right?
 
> +
>  (define_expand "movmisalign"
>[(set (match_operand:VLS 0 "nonimmediate_operand")
> (match_operand:VLS 1 "general_operand"))]
 


Re: [PATCH v1] RISC-V: Support FP SGNJ autovec for VLS mode

2023-09-05 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-05 18:32
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP SGNJ autovec for VLS mode
From: Pan Li 
 
This patch allows VLS mode autovec for the
floating-point binary operation SGNJ (copysign).
 
Given the code example below:
 
void test(float * restrict out, float * restrict in1, float * restrict in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = __builtin_copysignf (in1[i], in2[i]);
}
 
Before this patch:
test:
  csrr a4,vlenb
  slli a4,a4,1
  li  a5,128
  bleu a5,a4,.L2
  mv  a5,a4
.L2:
  vsetvli zero,a5,e32,m8,ta,ma
  vle32.v v8,0(a1)
  vle32.v v16,0(a2)
  vsetvli a4,zero,e32,m8,ta,ma
  vfsgnj.vv   v8,v8,v16
  vsetvli zero,a5,e32,m8,ta,ma
  vse32.v v8,0(a0)
  ret
 
After this patch:
test:
  li  a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfsgnj.vv   v1,v1,v2
  vse32.v v1,0(a0)
  ret
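
For reference, the per-element operation that vfsgnj.vv performs in this loop can be modelled in scalar code. The sketch below is illustrative only: copysign_bits and copysign_loop are hypothetical helper names, and the bit-level version assumes float is IEEE 754 binary32.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Magnitude bits of A combined with the sign bit of B (IEEE 754 binary32).
float copysign_bits(float a, float b) {
  std::uint32_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  std::uint32_t ur = (ua & 0x7fffffffu) | (ub & 0x80000000u);
  float r;
  std::memcpy(&r, &ur, sizeof r);
  return r;
}

// Scalar reference for the vectorized loop: one copysign per element.
void copysign_loop(float *out, const float *in1, const float *in2, int n) {
  for (int i = 0; i < n; i++)
    out[i] = std::copysign(in1[i], in2[i]);
}
```

Each result keeps the magnitude of the first input and takes only the sign of the second, which is why a single vector instruction can cover the whole loop body.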
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/autovec-vls.md (copysign3): New pattern.
* config/riscv/vector.md: Extend iterator for VLS.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c: New test.
---
gcc/config/riscv/autovec-vls.md   | 22 ++
gcc/config/riscv/vector.md| 24 +--
.../gcc.target/riscv/rvv/autovec/vls/def.h|  8 
.../rvv/autovec/vls/floating-point-sgnj-1.c   | 43 +++
.../rvv/autovec/vls/floating-point-sgnj-2.c   | 43 +++
5 files changed, 128 insertions(+), 12 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c
 
diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
index 7ef29637e33..31b6c4ae714 100644
--- a/gcc/config/riscv/autovec-vls.md
+++ b/gcc/config/riscv/autovec-vls.md
@@ -255,6 +255,28 @@ (define_insn_and_split "3"
[(set_attr "type" "vector")]
)
+;; -
+;; Includes:
+;; - vfsgnj.vv
+;; - vfsgnj.vf
+;; -
+(define_insn_and_split "copysign3"
+  [(set (match_operand:VLSF 0 "register_operand")
+(unspec:VLSF
+  [(match_operand:VLSF  1 "register_operand")
+   (match_operand:VLSF  2 "register_operand")] UNSPEC_VCOPYSIGN))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred (UNSPEC_VCOPYSIGN, 
mode),
+riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
;; 
---
;;  [INT] Unary operations
;; 
---
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 9d7b4bbe1d4..fc985ff6a01 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -6166,8 +6166,8 @@ (define_insn "@pred__reverse_scalar"
(symbol_ref "riscv_vector::get_frm_mode (operands[9])"))])
(define_insn "@pred_"
-  [(set (match_operand:VF 0 "register_operand"   "=vd, vd, vr, vr")
- (if_then_else:VF
+  [(set (match_operand:V_VLSF 0 "register_operand"   "=vd, vd, vr, vr")
+ (if_then_else:V_VLSF
  (unspec:
[(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1,Wc1")
 (match_operand 5 "vector_length_operand"" rK, rK, rK, rK")
@@ -6176,10 +6176,10 @@ (define_insn "@pred_"
 (match_operand 8 "const_int_operand""  i,  i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (unspec:VF
- [(match_operand:VF 3 "register_operand"   " vr, vr, vr, vr")
-  (match_operand:VF 4 "register_operand"   " vr, vr, vr, vr")] 
VCOPYSIGNS)
-   (match_operand:VF 2 "vector_merge_operand" " vu,  0, vu,  0")))]
+   (unspec:V_VLSF
+ [(match_operand:V_VLSF 3 "register_operand"  " vr, vr, vr, vr")
+  (match_operand:V_VLSF 4 "register_operand"  " vr, vr, vr, vr")] 
VCOPYSIGNS)
+   (match_operand:V_VLSF 2 "vector_merge_operand" " vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "vfsgnj.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vfsgnj")
@@ -6207,8 +6207,8 @@ (define_insn "@pred_ncopysign&quo

Re: [PATCH] RISC-V: Fix Dynamic LMUL compile option

2023-09-04 Thread juzhe.zh...@rivai.ai
simple patch for dynamic cost model:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629212.html 
committed.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-09-04 17:08
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix Dynamic LMUL compile option
gcc/ChangeLog:
 
* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Fix Dynamic status.
* config/riscv/riscv-v.cc (preferred_simd_mode): Ditto.
(autovectorize_vector_modes): Ditto.
(vectorize_related_mode): Ditto.
 
---
gcc/config/riscv/riscv-opts.h |  2 +-
gcc/config/riscv/riscv-v.cc   | 15 ---
2 files changed, 9 insertions(+), 8 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 79e0f12e388..b6b5907e111 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -81,7 +81,7 @@ enum riscv_autovec_lmul_enum {
   RVV_M4 = 4,
   RVV_M8 = 8,
   /* For dynamic LMUL, we compare COST start with LMUL8.  */
-  RVV_DYNAMIC = RVV_M8
+  RVV_DYNAMIC = 9
};
enum riscv_multilib_select_kind {
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c8ad96f44d5..fbbc16a3c26 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1971,16 +1971,16 @@ preferred_simd_mode (scalar_mode mode)
  vectorizer when we enable them in this target hook. Currently, we can
  support auto-vectorization in -march=rv32_zve32x_zvl128b. Wheras,
  -march=rv32_zve32x_zvl32b or -march=rv32_zve32x_zvl64b are disabled.  */
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
   if (autovec_use_vlmax_p ())
 {
-  if (TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2)
+  if (TARGET_MIN_VLEN < 128 && lmul < RVV_M2)
return word_mode;
   /* We use LMUL = 1 as base bytesize which is BYTES_PER_RISCV_VECTOR and
riscv_autovec_lmul as multiply factor to calculate the the NUNITS to
get the auto-vectorization mode.  */
   poly_uint64 nunits;
-  poly_uint64 vector_size
- = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+  poly_uint64 vector_size = BYTES_PER_RISCV_VECTOR * lmul;
   poly_uint64 scalar_size = GET_MODE_SIZE (mode);
   gcc_assert (multiple_p (vector_size, scalar_size, &nunits));
   machine_mode rvv_mode;
@@ -2154,10 +2154,10 @@ get_cmp_insn_code (rtx_code code, machine_mode mode)
unsigned int
autovectorize_vector_modes (vector_modes *modes, bool)
{
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
   if (autovec_use_vlmax_p ())
 {
-  poly_uint64 full_size
- = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+  poly_uint64 full_size = BYTES_PER_RISCV_VECTOR * lmul;
   /* Start with a RVVQImode where LMUL is the number of units that
fit a whole vector.
@@ -2187,7 +2187,7 @@ autovectorize_vector_modes (vector_modes *modes, bool)
 {
   /* Push all VLSmodes according to TARGET_MIN_VLEN.  */
   unsigned int i = 0;
-  unsigned int base_size = TARGET_MIN_VLEN * riscv_autovec_lmul / 8;
+  unsigned int base_size = TARGET_MIN_VLEN * lmul / 8;
   unsigned int size = base_size;
   machine_mode mode;
   while (size > 0 && get_vector_mode (QImode, size).exists (&mode))
@@ -2212,8 +2212,9 @@ vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode,
{
   /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
   poly_uint64 min_units;
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
   if (autovec_use_vlmax_p () && riscv_v_ext_vector_mode_p (vector_mode)
-  && multiple_p (BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul),
+  && multiple_p (BYTES_PER_RISCV_VECTOR * lmul,
 GET_MODE_SIZE (element_mode), &min_units))
 {
   machine_mode rvv_mode;
-- 
2.36.1
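
The hunks above give RVV_DYNAMIC its own enumerator value (9) and then map it back to RVV_M8 wherever a concrete multiplier is needed. A minimal standalone sketch of that idea follows. It is a model, not GCC code: effective_lmul, vector_size_bytes and kBytesPerVector are hypothetical names, the RVV_M1 entry is assumed since it is not shown in the hunk, and the fixed 16-byte vector stands in for BYTES_PER_RISCV_VECTOR, which is a poly value in GCC.

```cpp
// Simplified model of the LMUL handling in the patch above; not GCC code.
enum riscv_autovec_lmul_enum {
  RVV_M1 = 1, // assumed: not shown in the quoted hunk
  RVV_M2 = 2,
  RVV_M4 = 4,
  RVV_M8 = 8,
  RVV_DYNAMIC = 9 // distinct sentinel, no longer an alias of RVV_M8
};

// Hypothetical helper: the patch open-codes this ternary at each use site.
constexpr int effective_lmul (riscv_autovec_lmul_enum lmul)
{
  return lmul == RVV_DYNAMIC ? RVV_M8 : lmul;
}

// Hypothetical stand-in for BYTES_PER_RISCV_VECTOR, assuming VLEN = 128 bits.
constexpr int kBytesPerVector = 16;

// Size in bytes of the vector mode the vectorizer would start from.
constexpr int vector_size_bytes (riscv_autovec_lmul_enum lmul)
{
  return kBytesPerVector * effective_lmul (lmul);
}

// Dynamic LMUL starts from LMUL8, so it sizes modes exactly like RVV_M8.
static_assert (vector_size_bytes (RVV_DYNAMIC) == vector_size_bytes (RVV_M8),
               "dynamic LMUL sizes modes like LMUL8");
```

With a separate value, code can presumably tell "user asked for dynamic LMUL" apart from "user asked for m8", while the raw value 9 must never be used as a multiplier, which is why each use site in the patch goes through the ternary.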
 


Re: [PATCH] RISC-V: Fix vsetvl pass ICE

2023-08-30 Thread juzhe.zh...@rivai.ai
OK for trunk, but I'm not sure whether it's OK for GCC-13.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-08-30 17:51
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw
Subject: [PATCH] RISC-V: Fix vsetvl pass ICE
This patch fixes PR111234 (a vsetvl pass ICE) that occurs when fusing a
mask-any vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.
 
PR target/111234
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.
 
---
gcc/config/riscv/riscv-vsetvl.cc  |  2 +-
.../gcc.target/riscv/rvv/vsetvl/pr111234.c| 19 +++
2 files changed, 20 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1386d9250ca..a81bb53a521 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -655,7 +655,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
 new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
   else
 {
-  if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
+  if (vsetvl_insn_p (rinsn))
new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
new file mode 100644
index 000..ee5eec4a257
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include <riscv_vector.h>
+
+void
+f (vint32m1_t *in, vint64m2_t *out, vbool32_t *m, int b)
+{
+  vint32m1_t va = *in;
+  vbool32_t mask = *m;
+  vint64m2_t vb
+= __riscv_vwadd_vx_i64m2_m (mask, va, 1, __riscv_vsetvlmax_e64m2 ());
+  vint64m2_t vc = __riscv_vadd_vx_i64m2 (vb, 1, __riscv_vsetvlmax_e64m2 ());
+
+  if (b != 0)
+vc = __riscv_vadd_vx_i64m2_mu (mask, vc, vc, 1, __riscv_vsetvlmax_e64m2 ());
+
+  *out = vc;
+}
-- 
2.36.3
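
The one-line change in gen_vsetvl_pat can be read as a change in which branch a vtype-change-only insn with a VLMAX AVL takes. The sketch below models only the branch visible in the hunk. It is not GCC code: InsnKind, Pattern and choose_pattern are hypothetical names, the old_condition flag exists only to compare before and after, and the final case is left unspecified because the hunk does not show it.

```cpp
// Simplified model of the branch touched by the patch above; not GCC code.
enum class InsnKind { Vsetvl, VtypeChangeOnly, Other };
enum class Pattern { Normal, VtypeChangeOnly, NotShownInHunk };

// old_condition = true models the code before the patch, where a VLMAX AVL
// alone was enough to pick the VSETVL_NORMAL pattern (which reads the VL
// operand of the insn via get_vl).
constexpr Pattern choose_pattern (InsnKind kind, bool avl_is_vlmax,
                                  bool old_condition)
{
  return (kind == InsnKind::Vsetvl || (old_condition && avl_is_vlmax))
           ? Pattern::Normal
           : kind == InsnKind::VtypeChangeOnly
               ? Pattern::VtypeChangeOnly
               : Pattern::NotShownInHunk;
}

// Before: a vtype-change-only insn with VLMAX AVL took the "normal" path.
static_assert (choose_pattern (InsnKind::VtypeChangeOnly, true, true)
                 == Pattern::Normal, "old behaviour");
// After: the same insn keeps its vtype-change-only pattern.
static_assert (choose_pattern (InsnKind::VtypeChangeOnly, true, false)
                 == Pattern::VtypeChangeOnly, "new behaviour");
```

A plausible reading, stated here as an assumption, is that the old path asked a vtype-change-only insn for a VL operand it does not carry, which would explain the ICE; the patch description itself only says the condition is removed.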
 


[PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-08-29 Thread juzhe.zh...@rivai.ai
Ping. This patch also fixes an issue that occurred in the RISC-V backend:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71 

Thanks.


juzhe.zh...@rivai.ai


Re: Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread juzhe.zh...@rivai.ai
>> Juzhe mentioned he doesn't want to commit this before
>> all/most bugs are addressed anyway, right?
Yes.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-28 22:27
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng
Subject: Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization
> LGTM from my side, but I would like to wait until Robin is OK too
 
In principle I'm OK with it as well, realizing we will still need to fine-tune
a lot here anyway.  For now, IMHO it's good to have some additional test
coverage in the vector space, but we should not expect every test to be
correct/a good match for everything we do yet.  Juzhe mentioned he doesn't want
to commit this before all/most bugs are addressed anyway, right?
 
Regards
Robin
 

