Re: [PATCH v1] RISC-V: Support FP irintf auto vectorization
LGTM. Thanks.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-12 09:52
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP irintf auto vectorization

From: Pan Li

This patch adds support for auto-vectorization of FP irintf.

* int irintf (float)

Because the vectorizer only allows data types of the same size, the standard name lrintmn2 only acts on SF => SI.

Given we have code like:

void
test_irintf (int *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_irintf (in[i]);
}

Before this patch:
.L3:
  ...
  flw      fa5,0(a1)
  fcvt.w.s a5,fa5,dyn
  sw       a5,-4(a0)
  ...
  bne      a1,a4,.L3

After this patch:
.L3:
  ...
  vle32.v     v1,0(a1)
  vfcvt.x.f.v v1,v1
  vse32.v     v1,0(a0)
  ...
  bne         a2,zero,.L3

The remaining cases, such as DF => SI and HF => SI, will be covered by the hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

	* config/riscv/autovec.md (lrint2): Rename from.
	(lrint2): Rename to.
	* config/riscv/vector-iterators.md: Rename and remove TARGET_64BIT.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-irint-0.c: New test.

Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 9 ++- gcc/config/riscv/vector-iterators.md | 74 +-- .../riscv/rvv/autovec/unop/math-irint-0.c | 14 .../riscv/rvv/autovec/unop/math-irint-run-0.c | 63 .../riscv/rvv/autovec/vls/math-irint-0.c | 30 5 files changed, 149 insertions(+), 41 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-irint-0.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index dc76a01d82c..c3a51e22ceb 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2240,6 +2240,7 @@ (define_expand "avg3_ceil" ;; - trunc/truncf ;; - roundeven/roundevenf ;; - lrint/lrintf +;; - irintf ;; - (define_expand "ceil2" [(match_operand:V_VLSF 0 "register_operand") @@ -2311,12 +2312,12 @@ (define_expand "roundeven2" } ) -(define_expand "lrint2" - [(match_operand: 0 "register_operand") - (match_operand:V_VLS_FCONVERTL 1 "register_operand")] +(define_expand "lrint2" + [(match_operand:0 "register_operand") + (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")] "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" { -riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, mode); +riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, mode); DONE; } ) diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index bb0c46ea30a..96ddd34c958 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -3281,8 +3281,8 @@ (define_mode_attr vnnconvert [ (V512DI "v512hf") ]) -;; L indicates convert to long -(define_mode_attr VLCONVERT [ +;; Convert to int, long and long long +(define_mode_attr V_I_L_LL_CONVERT [ (RVVM8SF "RVVM8SI") (RVVM4SF "RVVM4SI") (RVVM2SF "RVVM2SI") (RVVM1SF "RVVM1SI") (RVVMF2SF "RVVMF2SI") @@ -3298,7 +3298,7 
@@ (define_mode_attr VLCONVERT [ (V512DF "V512DI") ]) -(define_mode_attr vlconvert [ +(define_mode_attr v_i_l_ll_convert [ (RVVM8SF "rvvm8si") (RVVM4SF "rvvm4si") (RVVM2SF "rvvm2si") (RVVM1SF "rvvm1si") (RVVMF2SF "rvvmf2si") @@ -3314,40 +3314,40 @@ (define_mode_attr vlconvert [ (V512DF "v512di") ]) -(define_mode_iterator V_VLS_FCONVERTL [ - (RVVM8SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVM4SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVM2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVM1SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT") - (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && !TARGET_64BIT && TARGET_MIN_VLEN > 32") - - (RVVM8DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - (RVVM4DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - (RVVM2DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - (RVVM1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_64BIT") - - (V1SF "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_FP_32 &
Re: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions
Please revert it. It blocks development on all targets.

juzhe.zh...@rivai.ai

From: Andrew Pinski
Date: 2023-10-12 09:03
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; Kito.cheng; kito.cheng; Robin Dapp
Subject: Re: RISC-V: Support CORE-V XCVMAC and XCVALU extensions

On Wed, Oct 11, 2023 at 6:01 PM juzhe.zh...@rivai.ai wrote:
>
> ../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX Built-in Functions' differ
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in Functions' is next for `CORE-V Built-in Functions' in menu but not in sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in sectioning
> ../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector Intrinsics' differ
> ../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' lacks menu item for `CORE-V Built-in Functions' despite being its Up target
> ../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V Vector Intrinsics' differ
> In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
> ../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to hold all values of ‘enum rtx_code’
> #define RTX_CODE_BITSIZE 8
> ^
> ../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro ‘RTX_CODE_BITSIZE’
> ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
> ^~~~
>
> make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
> make[2]: *** Waiting for unfinished jobs
> rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod cpp.pod gcov.pod lto-dump.pod
> make[2]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
> make[1]: *** [Makefile:4648: all-gcc] Error 2
> make[1]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
> make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2

This is also recorded as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 .
It breaks more than just RISCV; it depends on the version of texinfo that is installed too.

Thanks,
Andrew

>
>
> juzhe.zh...@rivai.ai
RISC-V: Support CORE-V XCVMAC and XCVALU extensions
../../../../gcc/gcc/doc/extend.texi:21708: warning: node next `RISC-V Vector Intrinsics' in menu `CORE-V Built-in Functions' and in sectioning `RX Built-in Functions' differ
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RX Built-in Functions' is next for `CORE-V Built-in Functions' in menu but not in sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node `RISC-V Vector Intrinsics' is prev for `CORE-V Built-in Functions' in menu but not in sectioning
../../../../gcc/gcc/doc/extend.texi:21716: warning: node up `CORE-V Built-in Functions' in menu `Target Builtins' and in sectioning `RISC-V Vector Intrinsics' differ
../../../../gcc/gcc/doc/extend.texi:21708: node `RISC-V Vector Intrinsics' lacks menu item for `CORE-V Built-in Functions' despite being its Up target
../../../../gcc/gcc/doc/extend.texi:21889: warning: node prev `RX Built-in Functions' in menu `CORE-V Built-in Functions' and in sectioning `RISC-V Vector Intrinsics' differ
In file included from ../../../../gcc/gcc/gensupport.cc:26:0:
../../../../gcc/gcc/rtl.h:66:26: warning: ‘rtx_def::code’ is too small to hold all values of ‘enum rtx_code’
#define RTX_CODE_BITSIZE 8
^
../../../../gcc/gcc/rtl.h:318:33: note: in expansion of macro ‘RTX_CODE_BITSIZE’
ENUM_BITFIELD(rtx_code) code: RTX_CODE_BITSIZE;
^~~~
make[2]: *** [Makefile:3534: doc/gcc.info] Error 1
make[2]: *** Waiting for unfinished jobs
rm gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod fsf-funding.pod gpl.pod cpp.pod gcov.pod lto-dump.pod
make[2]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1/gcc'
make[1]: *** [Makefile:4648: all-gcc] Error 2
make[1]: Leaving directory '/work/home/jzzhong/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage1'
make: *** [Makefile:590: stamps/build-gcc-newlib-stage1] Error 2

juzhe.zh...@rivai.ai
Re: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter
Oh, yes. The comment is addressed in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632623.html

It now uses:
if (inner_offsize < BITS_PER_WORD)

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-11 17:50
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix incorrect index(offset) of gather/scatter

Hi Juzhe,

good that you noticed it now, I should have caught that in the review back then...

One thing, though:

> + if (inner_offsize < GET_MODE_BITSIZE (GET_MODE (ptr)).to_constant ())

Shouldn't ptr always be Pmode i.e. the bitsize == XLEN?

Rest LGTM.

Regards
 Robin
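
To make the reason for the width check concrete, here is a small scalar sketch of the overflow this thread is about. It is not taken from the patch: the helper names `scale_narrow` and `scale_widened` are made up for illustration, and the code models a single index lane in plain C rather than real RVV instructions. It assumes a signed 32-bit index and a 64-bit address space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale the index while it is still 32 bits wide,
   as a shift followed by a 32-bit indexed load would.  The shift is
   done on uint32_t so the wrap-around is well defined in C.  */
uint64_t
scale_narrow (int32_t index, unsigned shift)
{
  assert (shift < 32);
  uint32_t byte_offset = (uint32_t) index << shift;
  return byte_offset;
}

/* Hypothetical helper: sign-extend the index to 64 bits first and only
   then scale it, as an extend + shift + 64-bit indexed load would.  */
uint64_t
scale_widened (int32_t index, unsigned shift)
{
  assert (shift < 32);
  int64_t wide = index;
  return (uint64_t) wide << shift;
}
```

With an index of 1 << 30 and a shift of 2 (4-byte elements), `scale_narrow` wraps to 0 while `scale_widened` yields 4294967296, so the narrow form would access the wrong address.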
Re: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter
The code is refined in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632619.html

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-11 17:03
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix incorrect index(offset) of gather/scatter

I suddenly realized that I made a mistake which, luckily, had not been exposed so far.

https://godbolt.org/z/c3jzrh7or

GCC is using a 32-bit index offset:

  vsll.vi    v1,v1,2
  vsetvli    zero,a5,e32,m1,ta,ma
  vluxei32.v v1,(a1),v1

This is wrong since v1 may overflow 32 bits after vsll.vi.

After this patch:

  vsext.vf2  v8,v4
  vsll.vi    v8,v8,2
  vluxei64.v v8,(a1),v8

Same as Clang.

Regression passed. Ok for trunk?

gcc/ChangeLog:

	* config/riscv/autovec.md: Fix offset bug.
	* config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New function.
	* config/riscv/riscv-v.cc (expand_gather_scatter): Fix offset bug.
	(gather_scatter_valid_offset_p): New function.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.

--- gcc/config/riscv/autovec.md | 28 +-- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 16 +-- .../autovec/gather-scatter/offset_extend-1.c | 14 ++ 4 files changed, 42 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 41bff3a318f..07607bff71e 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -59,7 +59,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -74,7 +74,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -89,7 +89,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -104,7 +104,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -119,7 +119,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -134,7 +134,7 @@ (match_operand: 5 
"vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -153,7 +153,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); DONE; @@ -172,7 +172,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, false); DONE; @@ -187,7 +187,7 @@ (match_operand: 5 "vector_mask_operand") (match_operand 6 "autovec_length_operand") (match_operand 7 "const_0_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands,
Re: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-11 16:49
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP lrint/lrintf auto vectorization

From: Pan Li

This patch adds support for auto-vectorization of FP lrint/lrintf.

* long lrint (double) for rv64
* long lrintf (float) for rv32

Because the vectorizer only allows data types of the same size, the standard name lrintmn2 only acts on DF => DI for rv64, and SF => SI for rv32.

Given we have code like:

void
test_lrint (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrint (in[i]);
}

Before this patch:
.L3:
  ...
  fld      fa5,0(a1)
  fcvt.l.d a5,fa5,dyn
  sd       a5,-8(a0)
  ...
  bne      a1,a4,.L3

After this patch:
.L3:
  ...
  vsetvli     a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli     zero,a2,e64,m1,ta,ma
  vse32.v     v1,0(a0)
  ...
  bne         a2,zero,.L3

The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI, will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

	* config/riscv/autovec.md (lrint2): New pattern for lrint/lrintf.
	* config/riscv/riscv-protos.h (expand_vec_lrint): New func decl for expanding lrint.
	* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl for vfcvt.x.f.v.
	(expand_vec_lrint): New function impl for expanding lrint.
	* config/riscv/vector-iterators.md: New mode attr and iterator.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for CVT like test case.
	* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
	* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test.

Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 11 +++ gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 20 ++ gcc/config/riscv/vector-iterators.md | 69 +++ .../riscv/rvv/autovec/unop/math-lrint-0.c | 14 .../riscv/rvv/autovec/unop/math-lrint-1.c | 14 .../riscv/rvv/autovec/unop/math-lrint-run-0.c | 63 + .../riscv/rvv/autovec/unop/math-lrint-run-1.c | 63 + .../riscv/rvv/autovec/unop/test-math.h| 24 +++ .../gcc.target/riscv/rvv/autovec/vls/def.h| 9 +++ .../riscv/rvv/autovec/vls/math-lrint-0.c | 30 .../riscv/rvv/autovec/vls/math-lrint-1.c | 30 12 files changed, 348 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 53e9d34eea1..dc76a01d82c 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2239,6 +2239,7 @@ (define_expand "avg3_ceil" ;; - round/roundf ;; - trunc/truncf ;; - roundeven/roundevenf +;; - lrint/lrintf ;; - (define_expand "ceil2" [(match_operand:V_VLSF 0 "register_operand") @@ -2309,3 +2310,13 @@ (define_expand "roundeven2" DONE; } ) + +(define_expand "lrint2" + [(match_operand: 0 "register_operand") + (match_operand:V_VLS_FCONVERTL 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 43426a5326b..f6bd15b47b0 100644 --- a/gcc/config/riscv/riscv-protos.h +++ 
b/gcc/config/riscv/riscv-protos.h @@ -474,6 +474,7 @@ void expand_vec_rint (rtx, rtx, machine_mode, machine_mode); void expand_vec_round (rtx, rtx, machine_mode, machine_mode); void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode); void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode); +void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode); #endif bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index c72e411f125..64f99d85d91 100644 ---
Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
Hi, Maciej.

I have enabled all vectorization tests on RVV; the patch is committed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632598.html

However, I added the following condition to every test:

+|| ([istarget riscv*-*-*]
+&& [check_effective_target_riscv_v])

As you said, you think we don't need to add check_effective_target_riscv_v every time. So feel free to adjust it (remove check_effective_target_riscv_v) and send a patch. I hope you can adjust each set of tests carefully so that everything stays consistent.

Thanks.

juzhe.zh...@rivai.ai

From: Maciej W. Rozycki
Date: 2023-10-11 05:35
To: juzhe.zhong
CC: gcc-patches; jeffreyalaw; Robin Dapp; Kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

On Tue, 10 Oct 2023, juzhe.zh...@rivai.ai wrote:

> It's weird. Could you give me the FAILs report?

I keep forgetting that I have a piece of code in my board description files that makes the testsuite leave output files in place, which helps much when debugging failures (although it's not a perfect solution for test cases like those verified at different optimisation levels where the output filename is reused and consequently subsequent outputs overwrite earlier ones; something to improve perhaps).

Unfortunately the presence of output files confuses some test cases and makes them fail; arguably a test case bug. None of the offending test cases are directly related to RISC-V development, so I just ignore the presence of these failures and only focus on regressions and progressions between testsuite runs.

Here are fresh results with the testsuite output tree made tidy:

		=== gcc Summary ===

# of expected passes		194602
# of unexpected failures	145
# of unexpected successes	11
# of expected failures		1631
# of unresolved testcases	120
# of unsupported tests		3828

It probably makes no sense to clutter the mailing list with my FAIL and UNRESOLVED results; I can send them off-list if you find them useful.

  Maciej
Re: Re: [PATCH] RISC-V: Enable full coverage vect tests
Thanks. Committed.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-11 14:54
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enable full coverage vect tests

Hi Juzhe,

seems OK to me. We don't support most of the patterns directly but as we can and want to vectorize them it makes sense to enable the tests.

Regards
 Robin
Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
It's weird. Could you give me the FAILs report?

juzhe.zh...@rivai.ai

From: Maciej W. Rozycki
Date: 2023-10-10 18:18
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

On Mon, 9 Oct 2023, Maciej W. Rozycki wrote:

> > Btw, could you rebase to the trunk and run regression again?
>
> Full regression-testing takes roughly 40 hours here and I do not normally
> update the tree midway through my work so as not to add variables and end
> up chasing a moving target, especially with such an unstable state that we
> have ended up with recently with the RISC-V port. Since I'm done with
> this part I can refresh and schedule another run if you are curious as to
> how it looks like from my side. For the C subset alone it'll take less.

After 10 hours I have now got:

		=== gcc Summary ===

# of expected passes		194576
# of unexpected failures	600
# of unexpected successes	11
# of expected failures		1631
# of unresolved testcases	120
# of unsupported tests		3828

as at commit cc5033721553 ("Fixes for profile count/probability maintenance"), which is slightly better, but still far from your 92 FAILs.

NB I ran this testing with `--param=riscv-autovec-preference=scalable'; I guess I could have mentioned it.

  Maciej
Re: [PATCH v2 0/4] RISC-V target attribute
LGTM on my side.

IMHO, we need to support the attribute (rvv_vector_bits), which depends on this patch, am I right? If yes, will you support this feature in the GCC 14 release?

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-10-10 12:13
To: gcc-patches; kito.cheng; palmer; jeffreyalaw; rdapp; juzhe.zhong
Subject: [PATCH v2 0/4] RISC-V target attribute

This patch set implements the target attribute for the RISC-V target. As on other targets such as x86 or ARM, it lets the user apply some settings locally, per function, without changing the global settings.

We support arch, tune and cpu first, and we will support other target attributes later. This version DOES NOT include multi-version function support yet; that is future work, probably for GCC 15.

The full proposal is in the RISC-V C-API document[1], which has been discussed with the RISC-V LLVM community, so we have consistent syntax and semantics.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35

v2 changelog:
- Resolve awk multi-dimensional issue.
- Tweak code format
- Tweak testcases
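
As a rough illustration of what a per-function setting looks like from the user's side, here is a minimal sketch. It is an assumption-laden example, not code from the patch set: the `"arch=+v"` string follows the syntax described in the RISC-V C API proposal as I understand it, the `WITH_VECTOR` macro and the `add_arrays` function are made-up names, and the attribute is only applied when building for RISC-V with GCC 14 or later, so the file still compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: "arch=+v" is accepted by the target attribute on RISC-V
   with GCC 14 or later.  On any other compiler or target the macro
   expands to nothing and the function is built with global settings.  */
#if defined (__riscv) && defined (__GNUC__) && !defined (__clang__) \
    && __GNUC__ >= 14
#define WITH_VECTOR __attribute__ ((target ("arch=+v")))
#else
#define WITH_VECTOR
#endif

/* Hypothetical example: only this function may use the vector
   extension; the rest of the translation unit keeps the global -march.  */
WITH_VECTOR void
add_arrays (int *restrict out, const int *restrict a,
	    const int *restrict b, unsigned count)
{
  assert (out != NULL && a != NULL && b != NULL);
  for (unsigned i = 0; i < count; i++)
    out[i] = a[i] + b[i];
}
```

The point of the guard macro is that the result of the function is the same with or without the attribute; only the instructions the compiler is allowed to emit for it change.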
Re: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV
Great! I am going to wait for Richi's approval.

juzhe.zh...@rivai.ai

From: Andrew Stubbs
Date: 2023-10-10 17:40
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

On 10/10/2023 02:39, Juzhe-Zhong wrote:
> Here is the reference comparing dump IR between ARM SVE and RVV.
>
> https://godbolt.org/z/zqess8Gss
>
> We can see RVV has one more dump IR:
> optimized: basic block part vectorized using 128 byte vectors
> since RVV has 1024 bit vectors.
>
> The codegen is reasonable good.
>
> However, I saw GCN also has 1024 bit vector.
> This patch may cause this case FAIL in GCN port ?
>
> Hi, GCN folk, could you check this patch in GCN port for me ?

This patch *fixes* an existing test fail on GCN. :)

It's probably one of the many I've never had time to analyze (and optimizing more than expected makes it low priority).

LGTM

Andrew
Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'
Oh. I realize this patch increases the FAILs that I recently fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632247.html

This fails because RVV doesn't have vec_pack_trunc_optab (the loop vectorizer fails the first time but succeeds the second time), so RVV dumps FOLD_EXTRACT_LAST 4 times instead of 2 (ARM SVE dumps it 2 times because it has vec_pack_trunc_optab).

I think the root cause of RVV failing multiple "vect" tests is that we don't enable the vec_pack/vec_unpack/... effective targets, yet we still succeed at vectorization and want to enable those tests. Mostly we just use a different approach to vectorize them, which causes the dump FAILs, because of some changes I made previously in the middle-end.

So enabling "vec_pack" for RVV will fix some FAILs but add some other FAILs.

CC to Richi to see more reasonable suggestions.

juzhe.zh...@rivai.ai

From: Maciej W. Rozycki
Date: 2023-10-10 06:38
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp.gcc; kito.cheng
Subject: Re: Re: [PATCH] RISC-V/testsuite: Enable `vect_pack_trunc'

On Tue, 10 Oct 2023, 钟居哲 wrote:

> Btw, could you rebase to the trunk and run regression again?

Full regression-testing takes roughly 40 hours here and I do not normally update the tree midway through my work so as not to add variables and end up chasing a moving target, especially with such an unstable state that we have ended up with recently with the RISC-V port. Since I'm done with this part I can refresh and schedule another run if you are curious as to how it looks like from my side. For the C subset alone it'll take less.

  Maciej
Re: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen
LGTM now. Thanks.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-09 21:09
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen

From: Pan Li

Update in v2

* Remove emit helper functions.
* Take expand_binop instead.

Original log:

This patch refines the code gen for bswap16.

We will have VEC_PERM_EXPR after rtl expand when invoking __builtin_bswap. It generates about 9 instructions in the loop as below, no matter whether it is bswap16, bswap32 or bswap64.

.L2:
1 vle16.v         v4,0(a0)
2 vmv.v.x         v2,a7
3 vand.vv         v2,v6,v2
4 slli            a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub             a4,a4,a5
7 vse16.v         v1,0(a3)
8 add             a0,a0,a2
9 add             a3,a3,a2
  bne             a4,zero,.L2

But for bswap16 we can have an even simpler code gen, which has only 7 instructions in the loop as below.

.L5:
1 vle8.v  v2,0(a5)
2 addi    a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi    a4,a4,32
  bne     a5,a6,.L5

Unfortunately, with this approach the number of insns in the loop grows to 13 and 24 for bswap32 and bswap64. Thus, we refine the code gen for bswap16 only, and leave both bswap32 and bswap64 as is.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl for shuffle bswap.
	(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li --- gcc/config/riscv/riscv-v.cc | 91 +++ .../riscv/rvv/autovec/unop/bswap16-0.c| 17 .../riscv/rvv/autovec/unop/bswap16-run-0.c| 44 + .../riscv/rvv/autovec/vls/bswap16-0.c | 34 +++ .../gcc.target/riscv/rvv/autovec/vls/perm-4.c | 4 +- 5 files changed, 188 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 23633a2a74d..c72e411f125 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3030,6 +3030,95 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d) return true; } +static bool +shuffle_bswap_pattern (struct expand_vec_perm_d *d) +{ + HOST_WIDE_INT diff; + unsigned i, size, step; + + if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff) +return false; + + step = diff + 1; + size = step * GET_MODE_UNIT_BITSIZE (d->vmode); + + switch (size) +{ +case 16: + break; +case 32: +case 64: + /* We will have VEC_PERM_EXPR after rtl expand when invoking + __builtin_bswap. It will generate about 9 instructions in + loop as below, no matter it is bswap16, bswap32 or bswap64. +.L2: + 1 vle16.v v4,0(a0) + 2 vmv.v.x v2,a7 + 3 vand.vv v2,v6,v2 + 4 sllia2,a5,1 + 5 vrgatherei16.vv v1,v4,v2 + 6 sub a4,a4,a5 + 7 vse16.v v1,0(a3) + 8 add a0,a0,a2 + 9 add a3,a3,a2 +bne a4,zero,.L2 + + But for bswap16 we may have a even simple code gen, which + has only 7 instructions in loop as below. +.L5 + 1 vle8.v v2,0(a5) + 2 addia5,a5,32 + 3 vsrl.vi v4,v2,8 + 4 vsll.vi v2,v2,8 + 5 vor.vv v4,v4,v2 + 6 vse8.v v4,0(a4) + 7 addia4,a4,32 +bne a5,a6,.L5 + + Unfortunately, the instructions in loop will grow to 13 and 24 + for bswap32 and bswap64. 
Thus, we will leverage vrgather (9 insn) + for both the bswap64 and bswap32, but take shift and or (7 insn) + for bswap16. + */ +default: + return false; +} + + for (i = 0; i < step; i++) +if (!d->perm.series_p (i, step, diff - i, step)) + return false; + + if (d->testing_p) +return true; + + machine_mode vhi_mode; + poly_uint64 vhi_nunits = exact_div (GET_MODE_NUNITS (d->vmode), 2); + + if (!get_vector_mode (HImode, vhi_nunits).exists (&vhi_mode)) +return false; + + /* Step-1: Move op0 to src with VHI mode. */ + rtx src = gen_reg_rtx (vhi_mode); + emit_move_insn (src, gen_lowpart (vhi_mode, d->op0)); + + /* Step-2: Shift right 8 bits to dest. */ + rtx dest = expand_binop (vhi_mode, lshr_optab, src, gen_int_mode (8, Pmode), +NULL_RTX, 0, OPTAB_DIRECT); + + /* Step-3: Shift left 8 bits to src. */ + src = expand_binop (vhi_mode, ashl_optab, src, gen_int_mode (8, Pmode), + NULL_RTX, 0, OPTAB_DIRECT); + + /* Step-4: Logic Or dest and src to dest. */ + dest = expand_binop (vhi_mode, ior_optab, dest, src, +NULL_RTX, 0, OPTAB_DIRECT); + + /* Step-5: Move src to target with VQI mode. */ + emit_move_insn (d->target, gen_lowpart
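
For readers who want to see the shift-and-or idea from this patch without the RTL details, here is a scalar sketch in plain C. It is only an illustration under my own naming (`bswap16_array` is a made-up function, not part of the patch); each element goes through the same three operations as the refined loop: a right shift by 8, a left shift by 8, and an OR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of every 16-bit element with shift, shift, or.
   The comments name the vector instruction each step corresponds to
   in the 7-instruction loop quoted in the patch description.  */
void
bswap16_array (uint16_t *out, const uint16_t *in, size_t count)
{
  assert (out != NULL && in != NULL);
  for (size_t i = 0; i < count; i++)
    {
      uint16_t hi = (uint16_t) (in[i] >> 8);	/* vsrl.vi */
      uint16_t lo = (uint16_t) (in[i] << 8);	/* vsll.vi */
      out[i] = (uint16_t) (hi | lo);		/* vor.vv */
    }
}
```

For example, an input element of 0x1234 becomes 0x3412. The same trick needs more shifts and masks for 32-bit and 64-bit elements, which is consistent with the patch keeping the gather-based sequence for bswap32 and bswap64.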
Re: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV
>> OK.

Thanks. Committed.

>> Note load/store-lanes is specifically pre-empting SLP if all
>> loads/stores of a SLP instance can support that.  Not sure if this
>> heuristic is good for load/store lanes with high stride?

Yeah, I understand your concern. Hmm, I am not sure either.
But the RVV ISA defines lane loads/stores with 2 to 8 fields, and LLVM already supports them.
I think we can fully support them, then let the RISC-V cost model decide whether each case is profitable or not.

Also, I found RVV can vectorize a TSVC case with stride = 5 lane_load/lane_store:

tsvc-s353.c:
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! riscv_v } } } } */

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632213.html

So, I think overall it is beneficial to support high-stride lane load/store, since it helps us vectorize more cases.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 20:41
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

On Mon, 9 Oct 2023, Juzhe-Zhong wrote:

> Reference: https://godbolt.org/z/G9jzf5Grh
>
> RVV is able to vectorize this case using SLP. However, with
> -fno-vect-cost-model, RVV vectorize it by vec_load_lanes with stride 6.

OK.  Note load/store-lanes is specifically pre-empting SLP if all
loads/stores of a SLP instance can support that.  Not sure if this
heuristic is good for load/store lanes with high stride?

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.
>
> ---
>  gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> index 7c7acd5bab6..96751faae7f 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> @@ -18,4 +18,4 @@ foo (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_strided6 } } } } */
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
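For readers unfamiliar with the term, a load-lanes operation with stride N reads interleaved groups of N fields and splits them into N separate vectors. The scalar model below only illustrates that data movement; it is not what the vectorizer emits. The function name `load_lanes_model` and the row-per-field output layout are assumptions made for this sketch, and the 2 to 8 range simply mirrors the field counts mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a stride-NF "load lanes" access (hypothetical helper).
   IN holds GROUPS interleaved groups of NF fields each.  Field F of every
   group is gathered into its own contiguous row of OUT, so OUT is laid out
   as NF rows of GROUPS elements.  */
void
load_lanes_model (const float *in, float *out, size_t groups, size_t nf)
{
  /* The thread talks about lane loads/stores with 2 to 8 fields.  */
  assert (nf >= 2 && nf <= 8);

  for (size_t g = 0; g < groups; g++)
    for (size_t f = 0; f < nf; f++)
      out[f * groups + g] = in[g * nf + f];
}
```

With stride 6, as in fast-math-slp-38.c, each of the six output rows would correspond to one vector register produced by a single segment load, which is why the cost of high strides is a fair question for the cost model.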
Re: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
Thanks Robin.

Could you send V3 to Richi, and commit it if Richi is OK with it?

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-10-09 18:26
To: Andreas Schwab; juzhe.zhong
CC: rdapp.gcc; gcc-patches; rguenther; jeffreyalaw
Subject: Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

On 10/9/23 09:32, Andreas Schwab wrote:
> On Okt 09 2023, juzhe.zh...@rivai.ai wrote:
>
>> Turns out COND(_LEN)?_ADD can't work.
>
> It should work though.  Tcl regexps are a superset of POSIX EREs.
>

The problem is that COND(_LEN)?_ADD matches two times against
COND_LEN_ADD and a scan-tree-dump-times 1 will fail.  So for those
checks in vect-cond-arith-6.c we either need to switch to
scan-tree-dump or change the pattern to "\.(?:COND|COND_LEN)_ADD".

Juzhe, something like the attached works for me.

Regards
 Robin

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 1af0fe642a0..7d26dbedc5e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -52,8 +52,8 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target vect_double_cond_arith } } } */
 /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target vect_double_cond_arith } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
index ec3d9db4202..f7daa13685c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
@@ -54,8 +54,8 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
 /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target { vect_double_cond_arith && vect_masked_store } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
index 2aeebd44f83..a80c30a50b2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
@@ -56,8 +56,8 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target vect_double_cond_arith } } } */
-/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target vect_double_cond_arith } } } */
 /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target vect_double_cond_arith } } } */
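The pattern used in the patch can be checked outside the testsuite. The sketch below uses POSIX `regcomp`/`regexec` rather than Tcl, so it only demonstrates that ` = \.COND_(LEN_)?ADD` is a valid ERE that accepts both spellings, and that it carries one capturing group. The helper names `ere_matches` and `ere_group_count` are made up for this example. My understanding, stated as an assumption, is that the double count Robin describes comes from that capturing group: Tcl's `regexp -all -inline` returns the sub-match alongside each full match, which a non-capturing `(?:...)` group avoids.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX ERE PATTERN matches somewhere in TEXT, 0 if it
   does not, and -1 if PATTERN fails to compile.  (Hypothetical helper.)  */
int
ere_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* Return the number of capturing groups in PATTERN, or -1 if it fails
   to compile.  (Hypothetical helper.)  */
int
ere_group_count (const char *pattern)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  int groups = (int) re.re_nsub;
  regfree (&re);
  return groups;
}
```

Note that the `(?:...)` form Robin mentions is Tcl syntax and is not part of POSIX ERE, so it cannot be tried with this sketch.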
Re: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen
Remove these functions:

+static void
+emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx sll_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sll_ops);
+}
+
+static void
+emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx srl_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, srl_ops);
+}
+
+static void
+emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx or_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred (IOR, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, or_ops);
+}
+

Instead, for sll, you should use:

rtx tmp = expand_binop (Pmode, ashl_optab, op_1, gen_int_mode (8, Pmode), NULL_RTX, 0, OPTAB_DIRECT);

For srl, you should use:

rtx tmp = expand_binop (Pmode, lshr_optab, op_1, gen_int_mode (8, Pmode), NULL_RTX, 0, OPTAB_DIRECT);

For or, you should use:

expand_binop (Pmode, ior_optab, tmp, dest, NULL_RTX, 0, OPTAB_DIRECT);

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-09 16:51
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen

From: Pan Li

This patch would like to refine the code gen for the bswap16.

We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It will generate about 9 instructions in the
loop as below, regardless of whether it is bswap16, bswap32 or bswap64.

.L2:
  1 vle16.v v4,0(a0)
  2 vmv.v.x v2,a7
  3 vand.vv v2,v6,v2
  4 slli    a2,a5,1
  5 vrgatherei16.vv v1,v4,v2
  6 sub     a4,a4,a5
  7 vse16.v v1,0(a3)
  8 add     a0,a0,a2
  9 add     a3,a3,a2
    bne     a4,zero,.L2

But for bswap16 we may have an even simpler code gen, which
has only 7 instructions in the loop as below.

.L5:
  1 vle8.v  v2,0(a5)
  2 addi    a5,a5,32
  3 vsrl.vi v4,v2,8
  4 vsll.vi v2,v2,8
  5 vor.vv  v4,v4,v2
  6 vse8.v  v4,0(a4)
  7 addi    a4,a4,32
    bne     a5,a6,.L5

Unfortunately, this approach would grow the instruction count in the
loop to 13 and 24 for bswap32 and bswap64 respectively. Thus, we refine
the code gen for bswap16 only, and leave both bswap32 and bswap64 as is.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vec_sll_scalar): New help func
impl for emit vsll.vi/vsll.vx
(emit_vec_srl_scalar): Likewise for vsrl.vi/vsrl.vx.
(emit_vec_or): Likewise for vor.vv.
(shuffle_bswap_pattern): New func impl for shuffle bswap.
(expand_vec_perm_const_1): Add shuffle bswap pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li
---
 gcc/config/riscv/riscv-v.cc                   | 117 ++
 .../riscv/rvv/autovec/unop/bswap16-0.c        |  17 +++
 .../riscv/rvv/autovec/unop/bswap16-run-0.c    |  44 +++
 .../riscv/rvv/autovec/vls/bswap16-0.c         |  34 +
 .../gcc.target/riscv/rvv/autovec/vls/perm-4.c |   4 +-
 5 files changed, 214 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 23633a2a74d..3e3b5f2e797 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -878,6 +878,33 @@ emit_vlmax_decompress_insn (rtx target, rtx op0, rtx op1, rtx mask)
   emit_vlmax_masked_gather_mu_insn (target, op1, sel, mask);
 }
 
+static void
+emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx sll_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sll_ops);
+}
+
+static void
+emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx srl_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, srl_ops);
+}
+
+static void
+emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx or_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred (IOR, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, or_ops);
+}
+
 /* Emit merge instruction.  */
 
 static machine_mode
@@ -3030,6 +3057,94 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   return true;
 }
 
+static bool
+shuffle_bswap_pattern (struct expand_vec_perm_d *d)
+{
+  HOST_WIDE_INT diff;
+  unsigned i, size, step;
+
+  if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff)
+    return false;
+
+  step = diff + 1;
+  size = step * GET_MODE_UNIT_BITSIZE
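The 7-instruction loop above relies on a simple identity: swapping the two bytes of a 16-bit element is a right shift by 8, a left shift by 8, and an OR of the two results. A minimal scalar sketch of that identity follows; the function names are invented for illustration, and the comments map each step to the instruction it stands in for in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap one 16-bit element with the shift/shift/or sequence.
   (Hypothetical helper, not part of the patch.)  */
static inline uint16_t
bswap16_shift_or (uint16_t x)
{
  uint16_t high_to_low = (uint16_t) (x >> 8);   /* vsrl.vi ...,8 */
  uint16_t low_to_high = (uint16_t) (x << 8);   /* vsll.vi ...,8 */
  return (uint16_t) (high_to_low | low_to_high); /* vor.vv */
}

/* The loop shape a vectorizer would see: one bswap16 per element.  */
void
bswap16_array (uint16_t *out, const uint16_t *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = bswap16_shift_or (in[i]);
}
```

The same trick needs more shifts, masks and ORs per element for 32-bit and 64-bit swaps, which is consistent with the patch keeping the vrgather sequence for bswap32 and bswap64.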
Re: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV
Thanks Richi.

I will try to figure out a better way to adapt the tests without adding riscv*-specific target variants.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 16:17
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV

On Sun, 8 Oct 2023, Juzhe-Zhong wrote:

> Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds on outer loop
> vectorizations.

How so?  I think this maybe goes with the other similar change.

That is, when we already have specific target checks adding riscv-*-*
looks sensible but when we don't we should figure if there's a capability
we can (add and) test instead.

> Fix these following XPASS FAILs:
>
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/no-scevccp-outer-16.c: Fix XPASS for RVV.
> * gcc.dg/vect/no-scevccp-outer-17.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-19.c: Ditto.
> * gcc.dg/vect/no-scevccp-outer-21.c: Ditto.
>
> ---
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> index c7c2fa8a504..12179949e00 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> @@ -59,4 +59,4 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> index ba904a6c03e..86554a98169 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> @@ -65,4 +65,4 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> index 5cd4049d08c..624b54accf4 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> @@ -49,4 +49,4 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> index 72e53c2bfb0..b30a5d78819 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> @@ -59,4 +59,4 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! { vect_pack_trunc } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { { ! {vect_pack_trunc } } && { ! {riscv_v } } } } } } */
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes
>> But you gobble the "or .." into an existing -mstrict-align flag - are
>> you sure all implementations are
>> self-consistent with handling non-vector memory instructions and
>> vector memory instructions here?
>> At least the above wording doesn't seem to impose such requirement.

RVV ISA:
"Support for misaligned vector memory accesses is independent of an implementation’s support for misaligned scalar memory accesses."

So support for misaligned vector memory accesses is independent of support for misaligned scalar memory accesses.
I think this patch (which reuses -mno-strict-align) is therefore not appropriate, which means I need an additional compile option.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 16:01
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes

On Sun, Oct 8, 2023 at 9:22 AM Juzhe-Zhong wrote:
>
> Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
> https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
>
> I was thinking that RVV doesn't allow misaligned at the beginning so I removed that pattern.
> However, after deep investigation && reading RVV ISA again and experiment on SPIKE,
> I realized I was wrong.
>
> RVV ISA reference:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints
>
> "If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
> either the element is transferred successfully or an address misaligned exception is raised on that element."

But you gobble the "or .." into an existing -mstrict-align flag - are
you sure all implementations are
self-consistent with handling non-vector memory instructions and
vector memory instructions here?
At least the above wording doesn't seem to impose such requirement.

> It's obvious that RVV ISA does allow misaligned vector load/store.
>
> I experimented and confirmed this on SPIKE:
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
> bbl loader
> z ra 00010158 sp 003ffb40 gp 00012c48
> tp t0 000110da t1 000f t2
> s0 00013460 s1 a0 00012ef5 a1 00012018
> a2 00012a71 a3 000d a4 0004 a5 00012a71
> a6 00012a71 a7 00012018 s2 s3
> s4 s5 s6 s7
> s8 s9 sA sB
> t3 t4 t5 t6
> pc 00010258 va/inst 020660a7 sr 80026620
> Store/AMO access fault!
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
> bbl loader
>
> We can see that SPIKE passes the previously *FAILED* execution tests when --misaligned is specified.
>
> So, to honor the RVV ISA spec, we should add the movmisalign pattern back, based on the investigations I have done, since
> it can improve multiple vectorization tests and fix dump FAILs.
> > This patch fixes these following dump FAILs: > > FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects > scan-tree-dump-not optimized "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized > "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects > scan-tree-dump-not optimized "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized > "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects > scan-tree-dump-not optimized "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized > "Invalid sum" > FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects > scan-tree-dump-not optimi
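To make the alignment question concrete, the sketch below shows the kind of access under discussion: 32-bit elements read from an address that need not be a multiple of 4. It is an illustration only. The function name is hypothetical, `memcpy` is used so the scalar C stays well-defined on any target, and no claim is made here about which instructions GCC emits for it or whether a given RVV implementation handles the access in hardware or traps.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum N 32-bit elements starting at an arbitrary byte address.
   (Hypothetical helper.)  Copying through memcpy avoids dereferencing a
   misaligned uint32_t pointer, which would be undefined behavior in C.  */
uint32_t
sum_unaligned_u32 (const unsigned char *bytes, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t value;
      memcpy (&value, bytes + i * sizeof value, sizeof value);
      sum += value;
    }
  return sum;
}
```

Whether a vectorized version of such a loop may use plain vector loads is exactly what a movmisalign pattern, and the option that guards it, would decide.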
Re: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV
Yes. We do have, and enable, char -> long conversion (vsext.vf8/vzext.vf8).

Thanks for the comment; I will adapt the test as you suggested.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-10-09 15:31
To: Jeff Law
CC: Juzhe-Zhong; gcc-patches; richard.sandiford
Subject: Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

On Sun, 8 Oct 2023, Jeff Law wrote:

>
>
> On 10/8/23 05:35, Juzhe-Zhong wrote:
> > RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this
> > case well.
> > So, adjust dump check for RVV.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.
> I'd hoped to avoid a bunch of risc-v special casing in the generic part of the
> testsuite.  Basically the more we have target specific conditionals rather
> than conditionals using properties, the more likely we are to keep revisiting
> this stuff over time and possibly for other architectures as well.
>
> What is it about risc-v's vector support that allows it to optimize this case?
> Is it the same property that allows us to handle the outer loop vectorization
> tests that you changed in another patch?

I suspect for VLA vectorization we can use direct conversion from
char to long long here?  I also notice the testcase uses 'char',
not specifying its sign.  So either of [sz]extVxyzDIVxyzQI is possibly
provided by RISCV?  (or possibly via some intermediate types in
a multi-step conversion)

For non-VLA and with the single vector size restriction we'd need
unpacking.

So it might be better

  { target { vect_unpack || { vect_vla && vect_sext_char_longlong } } }

where I think neither vect_vla nor vect_sext_char_longlong exists.

Richard - didn't you run into similar things with SVE?

Richard.

> Neither an ACK nor NAK right now.
>
> Jeff
>
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
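As a small illustration of the conversion being discussed, the loops below widen 8-bit elements directly to 64 bits. They are only a sketch with made-up function names, not the contents of vect-multitypes-16.c. The signed variant corresponds to the sign-extending case and the unsigned one to the zero-extending case; plain `char` would behave like one or the other depending on the target, which is the ambiguity Richard points out.

```c
#include <stddef.h>

/* Sign-extend each 8-bit element to 64 bits: the scalar analogue of a
   single-step vsext.vf8.  (Hypothetical helper.)  */
void
widen_s8_to_s64 (long long *out, const signed char *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i];
}

/* Zero-extend each 8-bit element to 64 bits: the scalar analogue of a
   single-step vzext.vf8.  (Hypothetical helper.)  */
void
widen_u8_to_u64 (unsigned long long *out, const unsigned char *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i];
}
```

A target without a direct 8x widening would have to go through intermediate element sizes with unpack-style steps, which is the case the vect_unpack selector describes.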
Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV
Hi, Richi and Robin.

Turns out COND(_LEN)?_ADD can't work.
Is this patch OK? Or do you have another solution to change the dump check for RVV?

Thanks.

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-10-08 09:33
To: gcc-patches
CC: rguenther; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

This patch fixes the following dump FAILs:
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_SUB" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1

For RVV, the expected dump IR uses the COND_LEN_* pattern.

Also, we are still failing at this check:

FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_LEN_SUB"

This is because of a known bug in GIMPLE_FOLD that Robin is working on.

@Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug fix patch.

OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c | 8 ++++----
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c | 8 ++++----
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c | 8 ++++----
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 38994ea82a5..3832a660023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -41,5 +41,5 @@ neg_xi (double *x)
   return res_3;
 }
 
-/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { vect_double_cond_arith && vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { vect_double_cond_arith && vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump { = \
Re: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV
Hi, Jeff.

I have addressed your comments in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632239.html

I think it now looks reasonably good for long-term maintenance.

OK for trunk?

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-10-07 23:09
To: Juzhe-Zhong; gcc-patches
CC: rguenther; rdapp.gcc
Subject: Re: [PATCH] TEST: Fix vect_cond_arith_* dump checks for RVV

On 10/7/23 05:45, Juzhe-Zhong wrote:
> This patch fixes the following dump FAILs:
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump vect " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-4.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_ADD"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_MUL"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_RDIV"
> FAIL: gcc.dg/vect/vect-cond-arith-5.c scan-tree-dump optimized " = \\.COND_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = \\.COND_SUB" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_ADD" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_MUL" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_RDIV" 1
> FAIL: gcc.dg/vect/vect-cond-arith-6.c scan-tree-dump-times optimized " = \\.COND_SUB" 1
>
> For RVV, the expected dump IR uses the COND_LEN_* pattern.
>
> Also, we are still failing at this check:
>
> FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_LEN_SUB"
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_LEN_SUB"
>
> This is because of a known bug in GIMPLE_FOLD that Robin is working on.
>
> @Robin: Please make sure vect-cond-arith-2.c passes with this patch and your bug fix patch.
>
> OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-cond-arith-2.c: Fix dump check for RVV.
> * gcc.dg/vect/vect-cond-arith-4.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-5.c: Ditto.
> * gcc.dg/vect/vect-cond-arith-6.c: Ditto.
Would it make more sense to adjust the regexp so that it matched the standard form as well as the LEN form? So for example we could have a regexp that matched COND_ADD and COND_LEN_ADD. Just wondering if that'll be better from a long term maintenance standpoint. Jeff
Re: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
Also, I have reverted your commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=066a43ce72ab6559ba14af9628df19daa0b85cdf

Please test the patch and verify it doesn't cause any FAILs if the toolchain doesn't have "zvfh_zfh".

juzhe.zh...@rivai.ai

From: juzhe.zh...@rivai.ai
Date: 2023-10-07 17:49
To: pan2.li; gcc-patches
CC: pan2.li; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

These testcases cause multiple FAILs. I think you should use:

/* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-07 14:25
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

From: Pan Li

For _Float16 types, add run test for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc

For float and double, add run test for:
* roundeven

The zfa extension is required for these run test cases. The simulation
target_board may look like below for rv64.

target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.
Signed-off-by: Pan Li
---
 .../riscv/rvv/autovec/unop/math-ceil-run-0.c | 39 +++
 .../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +++
 .../rvv/autovec/unop/math-nearbyint-run-0.c | 48 +++
 .../riscv/rvv/autovec/unop/math-rint-run-0.c | 48 +++
 .../riscv/rvv/autovec/unop/math-round-run-0.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-0.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-1.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-2.c | 39 +++
 .../riscv/rvv/autovec/unop/math-trunc-run-0.c | 39 +++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 4 +-
 10 files changed, 371 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
new file mode 100644
index 000..70cba3602bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1024.0, 1024.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __bu
Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
These test cases cause multiple FAILs. I think you should use:

  /* { dg-do run { target { riscv_v && riscv_zvfh_hw && riscv_zfh_ok } } } */

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-07 14:25
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

From: Pan Li

For _Float16 types, add run tests for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc

For float and double, add run tests for:
* roundeven

The zfa extension is required for these run tests; for rv64 the simulation
target_board may look like this:

target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
  * gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.
Re: [PATCH v1] RISC-V: Add more run test for FP rounding autovec
OK.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-10-07 14:25
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add more run test for FP rounding autovec

From: Pan Li

For _Float16 types, add run tests for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc

For float and double, add run tests for:
* roundeven

The zfa extension is required for these run tests; for rv64 the simulation
target_board may look like this:

target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
  * gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.
Signed-off-by: Pan Li
---
 .../riscv/rvv/autovec/unop/math-ceil-run-0.c | 39 +++
 .../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +++
 .../rvv/autovec/unop/math-nearbyint-run-0.c | 48 +++
 .../riscv/rvv/autovec/unop/math-rint-run-0.c | 48 +++
 .../riscv/rvv/autovec/unop/math-round-run-0.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-0.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-1.c | 39 +++
 .../rvv/autovec/unop/math-roundeven-run-2.c | 39 +++
 .../riscv/rvv/autovec/unop/math-trunc-run-0.c | 39 +++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 4 +-
 10 files changed, 371 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
new file mode 100644
index 000..70cba3602bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1024.0, 1024.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
new file mode 100644
index 000..c542278c1f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include
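
test-math.h is not quoted in this thread, so the real macro definitions are not visible here. As a rough sketch of how TEST_UNARY_CALL, TEST_INIT, TEST_ASSERT and RUN_TEST could fit together: every macro body below is an assumption, float stands in for _Float16, and the made-up scalar_ceil stands in for __builtin_ceilf16 so the sketch builds on any host without libm.

```c
#define ARRAY_SIZE 128

/* Hypothetical stand-ins for the macros in test-math.h; the real
   definitions are not shown in this thread.  */
#define TEST_UNARY_CALL(TYPE, CALL)                                  \
  static void test_##TYPE##_##CALL (TYPE *out, const TYPE *in,       \
                                    unsigned count)                  \
  {                                                                  \
    for (unsigned i = 0; i < count; i++)                             \
      out[i] = CALL (in[i]);                                         \
  }

#define TEST_INIT(TYPE, VAL_IN, VAL_REF, NUM)                        \
  static void test_##TYPE##_init_##NUM (TYPE *in, TYPE *ref,         \
                                        unsigned size)               \
  {                                                                  \
    for (unsigned i = 0; i < size; i++)                              \
      {                                                              \
        in[i] = VAL_IN;                                              \
        ref[i] = VAL_REF;                                            \
      }                                                              \
  }

/* Unlike a real run test, this variant counts mismatches instead of
   aborting, so the result can be inspected by the caller.  */
#define TEST_ASSERT(TYPE)                                            \
  static unsigned test_##TYPE##_mismatches (const TYPE *out,         \
                                            const TYPE *ref,         \
                                            unsigned size)           \
  {                                                                  \
    unsigned bad = 0;                                                \
    for (unsigned i = 0; i < size; i++)                              \
      if (out[i] != ref[i])                                          \
        bad++;                                                       \
    return bad;                                                      \
  }

/* Initialise inputs, run the loop under test, compare with ref.  */
#define RUN_TEST(TYPE, NUM, CALL, IN, OUT, REF, SIZE)                \
  (test_##TYPE##_init_##NUM (IN, REF, SIZE),                         \
   test_##TYPE##_##CALL (OUT, IN, SIZE),                             \
   test_##TYPE##_mismatches (OUT, REF, SIZE))

/* Made-up scalar ceiling, valid only for values that fit in a long.  */
static float scalar_ceil (float x)
{
  long t = (long) x;            /* truncates toward zero */
  float f = (float) t;
  return (f < x) ? f + 1.0f : f;
}

static float in[ARRAY_SIZE], out[ARRAY_SIZE], ref[ARRAY_SIZE];

TEST_UNARY_CALL (float, scalar_ceil)
TEST_ASSERT (float)
TEST_INIT (float, 1.2f, 2.0f, 1)
TEST_INIT (float, -1.2f, -1.0f, 2)
TEST_INIT (float, -1023.5f, -1023.0f, 3)

/* Run all cases and return the total number of mismatching elements.  */
static unsigned run_all (void)
{
  unsigned bad = 0;
  bad += RUN_TEST (float, 1, scalar_ceil, in, out, ref, ARRAY_SIZE);
  bad += RUN_TEST (float, 2, scalar_ceil, in, out, ref, ARRAY_SIZE);
  bad += RUN_TEST (float, 3, scalar_ceil, in, out, ref, ARRAY_SIZE);
  return bad;
}
```

Each TEST_INIT case pairs one input value with its expected result, and each RUN_TEST applies the function to a whole array, which is the loop the vectorizer is expected to pick up in the real tests.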
Re: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case
OK.

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-10-07 11:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case

From: xuli

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler times.
  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.

---
 .../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 +-
 .../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
index c566f8a4751..2ec9487a6c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
@@ -88,8 +88,8 @@ void f (void * restrict in, void * restrict out, int n, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 10 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 10 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 19 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
index d0e75258188..bcafce36895 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
@@ -80,8 +80,8 @@ void f (void * restrict in, void * restrict out, int n, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times
Re: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5
Thanks for reporting it.

I think we may need to change it into:

+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { ! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { vect_strided5 && vect_load_lanes } } } } */

Could you verify whether it works for you?

Thanks.

juzhe.zh...@rivai.ai

From: Andrew Stubbs
Date: 2023-10-06 22:29
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com; richard.sandif...@arm.com
Subject: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

On 15/09/2023 10:16, Juzhe-Zhong wrote:
> This test failed in RISC-V:
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4
>
> Because this loop:
>
>   /* SLP with unrolling by 8. */
>   for (i = 0; i < N; i++)
>     {
>       out[i*5] = 8;
>       out[i*5 + 1] = 7;
>       out[i*5 + 2] = 81;
>       out[i*5 + 3] = 28;
>       out[i*5 + 4] = 18;
>     }
>
> is using vect_load_lanes with array size = 5 instead of SLP.
>
> When we adjust the COST of LANES load store, then it will use SLP.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/slp-1.c: Add vect_strided5.
>
> ---
>  gcc/testsuite/gcc.dg/vect/slp-1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c b/gcc/testsuite/gcc.dg/vect/slp-1.c
> index 82e4f6469fb..d4a13f12df6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
> @@ -122,5 +122,5 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
> -
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target {! vect_strided5 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_strided5 } } } */

This patch causes a test regression on amdgcn because vect_strided5 is true
(because check_effective_target_vect_fully_masked is true), but the testcase
still gives the message 4 times.

Perhaps because amdgcn uses masking and not vect_load_lanes?

Andrew
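
For reference, the store pattern of the quoted loop can be exercised on its own. This is a minimal sketch, assuming unsigned short elements (the element type is not stated in this thread); fill_groups and check_groups are illustrative names, not functions from slp-1.c.

```c
/* Write the five-element group from the quoted loop N times.  The
   stride of 5 between groups is what makes a target choose between
   SLP and a lane-wise (strided) store.  */
static void fill_groups (unsigned short *out, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      out[i * 5] = 8;
      out[i * 5 + 1] = 7;
      out[i * 5 + 2] = 81;
      out[i * 5 + 3] = 28;
      out[i * 5 + 4] = 18;
    }
}

/* Return 1 if every group holds the expected five values, else 0.  */
static int check_groups (const unsigned short *out, unsigned n)
{
  static const unsigned short expected[5] = { 8, 7, 81, 28, 18 };
  for (unsigned i = 0; i < n; i++)
    for (unsigned k = 0; k < 5; k++)
      if (out[i * 5 + k] != expected[k])
        return 0;
  return 1;
}
```

Whichever strategy the vectorizer picks, the stored values are the same; the dg-final lines under discussion only count how often the "vectorizing stmts using SLP" message appears in the dump.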
Re: [PATCH v1] RISC-V: Support {U}INT64 to FP16 auto-vectorization
Please add "!flag_trapping_math".

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-28 13:59
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support {U}INT64 to FP16 auto-vectorization

From: Pan Li

This patch adds auto-vectorization support for the conversion from INT64 to
FP16. The conversion takes two steps:

* INT64 to FP32.
* FP32 to FP16.

Given sample code as below:

void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    b[i] = (_Float16) (a[i]);
}

Before this patch:

test.c:6:26: missed: couldn't vectorize loop
test.c:6:26: missed: not vectorized: unsupported data-type

  ld      a0,0(s0)
  call    __floatdihf
  fsh     fa0,0(s1)
  addi    s0,s0,8
  addi    s1,s1,2
  bne     s2,s0,.L3
  ld      ra,24(sp)
  ld      s0,16(sp)
  ld      s1,8(sp)
  ld      s2,0(sp)
  addi    sp,sp,32

After this patch:

  vsetvli a5,a2,e8,mf8,ta,ma
  vle64.v v1,0(a0)
  vsetvli a4,zero,e32,mf2,ta,ma
  vfncvt.f.x.w v1,v1
  vsetvli zero,zero,e16,mf4,ta,ma
  vfncvt.f.f.w v1,v1
  vsetvli zero,a2,e16,mf4,ta,ma
  vse16.v v1,0(a1)

Please note that VLS mode is also involved in this patch and covered by the
test cases.

  PR target/111506

gcc/ChangeLog:

  * config/riscv/autovec.md (2):
  * config/riscv/vector-iterators.md:

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c: Adjust checker.
  * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c: Ditto.
  * gcc.target/riscv/rvv/autovec/unop/cvt-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/cvt-1.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/cvt-0.c: New test.

Signed-off-by: Pan Li
---
 gcc/config/riscv/autovec.md | 24 ++
 gcc/config/riscv/vector-iterators.md | 38 +++
 .../autovec/conversions/vfncvt-itof-rv32gcv.c | 5 +-
 .../autovec/conversions/vfncvt-itof-rv64gcv.c | 5 +-
 .../gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 21 +
 .../gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 22 +
 .../gcc.target/riscv/rvv/autovec/vls/cvt-0.c | 47 +++
 7 files changed, 158 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd0cbdd2889..6dd3b96a423 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -974,6 +974,30 @@ (define_insn_and_split "2"
   }
   [(set_attr "type" "vfncvtitof")])
 
+;; This operation can be performed in the loop vectorizer but unfortunately
+;; not applicable for now. We can remove this pattern after loop vectorizer
+;; is able to take care of INT64 to FP16 conversion.
+(define_insn_and_split "2"
+  [(set (match_operand: 0 "register_operand")
+        (any_float:
+          (match_operand:VWWCONVERTI 1 "register_operand")))]
+  "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    rtx single = gen_reg_rtx (mode); /* Get vector SF mode. */
+
+    /* Step-1, INT64 => FP32. */
+    emit_insn (gen_2 (single, operands[1]));
+    /* Step-2, FP32 => FP16. */
+    emit_insn (gen_trunc2 (operands[0], single));
+
+    DONE;
+  }
+  [(set_attr "type" "vfncvtitof")]
+)
+
 ;; =
 ;; == Unary arithmetic
 ;; =
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index b6cd872eb42..c9a7344b1bc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1247,6 +1247,24 @@ (define_mode_iterator VWCONVERTI [
   (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 4096")
 ])
 
+(define_mode_iterator VWWCONVERTI [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM4DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM1DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+
+  (V1DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V2DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V4DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V8DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 &&
Re: [PATCH v2] RISC-V: Bugfix for RTL check[PR111533]
LGTM. Thanks for fixing it.

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-09-28 09:33
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH v2] RISC-V: Bugfix for RTL check[PR111533]

From: xuli

Consider the following situation:

BB5: local_dem(RVV Insn 1, AVL(reg zero))
RVV Insn 1: vmv.s.x, AVL (const_int 1)
RVV Insn 2: vredsum.vs, AVL(reg zero)

vmv.s.x has a vl operand, so the following code gets the AVL (const_int)
from RVV Insn 1:

  rtx avl = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();

If REGNO is then used on the const_int, the compiler crashes:

during RTL pass: vsetvl
res_debug.c: In function '__dn_count_labels':
res_debug.c:1050:1: internal compiler error: RTL check: expected code 'reg', have 'const_int' in rhs_regno, at rtl.h:1934
 1050 | }
      | ^
0x8fb169 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
  ../.././gcc/gcc/rtl.cc:770
0x1399818 rhs_regno(rtx_def const*)
  ../.././gcc/gcc/rtl.h:1934
0x1399818 anticipatable_occurrence_p
  ../.././gcc/gcc/config/riscv/riscv-vsetvl.cc:348

So in this case the AVL should be obtained from dem.

Another issue is caused by the following code:

  HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;

during RTL pass: expand
../../.././gcc/libgfortran/generated/matmul_c4.c: In function 'matmul_c4':
../../.././gcc/libgfortran/generated/matmul_c4.c:2906:39: internal compiler error: RTL check: expected code 'const_int', have 'const_poly_int' in expand_const_vector, at config/riscv/riscv-v.cc:1149

builder.elt (i) can be either const_int or const_poly_int.

  PR target/111533

gcc/ChangeLog:

  * config/riscv/riscv-v.cc (expand_const_vector): Fix bug.
  * config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix bug.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/base/pr111533-1.c: New test.
  * gcc.target/riscv/rvv/base/pr111533-2.c: New test.

---
 gcc/config/riscv/riscv-v.cc | 5 ++--
 gcc/config/riscv/riscv-vsetvl.cc | 3 +-
 .../gcc.target/riscv/rvv/base/pr111533-1.c| 15 ++
 .../gcc.target/riscv/rvv/base/pr111533-2.c| 29 +++
 4 files changed, 48 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 359fb2ced8b..26700cfc732 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1149,8 +1149,9 @@ expand_const_vector (rtx target, rtx src)
   for (unsigned int i = 0; i < v.npatterns (); ++i)
     {
       /* Calculate the diff between the target sequence and
-         vid sequence. */
-      HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;
+         vid sequence. The elt (i) can be either const_int or
+         const_poly_int. */
+      poly_int64 diff = rtx_to_poly_int64 (builder.elt (i)) - i;
       v.quick_push (gen_int_mode (diff, v.inner_mode ()));
     }
   /* Step 2: Generate result = VID + diff. */
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 7af33e7ea6f..af8c31d873c 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -307,8 +307,7 @@ anticipatable_occurrence_p (const bb_info *bb, const vector_insn_info dem)
   if (dem.has_avl_reg ())
     {
       /* rs1 (avl) are not modified in the basic block prior to the VSETVL. */
-      rtx avl
-        = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
+      rtx avl = dem.get_avl_or_vl_reg ();
       if (dem.dirty_p ())
         {
           gcc_assert (!vsetvl_insn_p (insn->rtl ()));
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c
new file mode 100644
index 000..aba26dfac89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -ffast-math -ftree-vectorize" } */
+
+#include
+
+typedef _Complex float GFC_COMPLEX_4;
+
+void
+test (GFC_COMPLEX_4 *a, GFC_COMPLEX_4 *b, GFC_COMPLEX_4 c, ptrdiff_t i, ptrdiff_t j)
+{
+  ptrdiff_t l;
+  for (l = 0; l <= i; ++l)
+    c += b[l] * a[j];
+  b[j] = c;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c
new file mode 100644
index 000..a4d2011b74b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111533-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2" } */
+
+#include
+
+/* Return the number of DNS hierarchy levels in the name. */
+int
+test (const char *name) {
+  int i, len, count;
+
+  len = strlen(name);
+  for (i = 0, count = 0; i < len; i++) {
+    /* XXX need to check for \. or use named's nlabels(). */
+    if (nam
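
The comments in the expand_const_vector hunk describe building a constant vector as VID plus a per-pattern diff. A scalar model of that idea is sketched below. It assumes the diff repeats every npatterns elements, since the surrounding GCC code is not shown in this thread, and build_from_vid is a made-up name rather than a GCC function.

```c
/* Scalar model of "result = VID + diff".

   FIRST holds the first NPATTERNS elements of the target sequence.
   diff[k] = FIRST[k] - k is computed once per pattern and then reused
   for every later element, on the assumption that the difference to
   the index sequence 0, 1, 2, ... repeats with period NPATTERNS.  */
static void build_from_vid (const long *first, unsigned npatterns,
                            long *result, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      unsigned k = i % npatterns;
      long diff = first[k] - (long) k;  /* Step 1: diff = TARGET - VID.  */
      result[i] = (long) i + diff;      /* Step 2: result = VID + diff.  */
    }
}
```

With first = {3, 2, 1, 0} and npatterns = 4, the diffs are {3, 1, -1, -3} and the first eight results are 3, 2, 1, 0, 7, 6, 5, 4. In the real code each element may be a const_poly_int rather than a plain integer, which is what the change to poly_int64 in the hunk accounts for.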
Re: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]
Because after removing the mem-to-mem pattern, the following case fails if we
don't change the mov pattern:

program main
  integer, dimension(:,:), allocatable :: a, b
  integer, dimension(:), allocatable :: sh
  allocate (a(2,2))
  allocate (b(2,2))
  allocate (sh(3))
  a = 1
  b = cshift(a,sh)
end program main

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-27 18:07
To: juzhe.zh...@rivai.ai
CC: kito.cheng; gcc-patches; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]

I can understand why the mem-to-mem pattern is removed, but why does the
normal mov pattern for VLS_AVL_IMM need to change too?

On Wed, Sep 27, 2023 at 10:39 AM juzhe.zh...@rivai.ai wrote:
>
> >> Why add `can_create_pseudo_p ()` here? this will split after reload,
> >> but we forbid that pattern between reload and split2?
>
> I have no idea. Some Fortran tests just need the mem-to-mem pattern to be
> recognized before RA. I don't know the reason.
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-27 17:33
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]
> > (define_insn_and_split "*mov"
> >   [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> >         (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" " m,vr, vr"))]
> >   "TARGET_VECTOR
> > -   && (register_operand (operands[0], mode)
> > +   && (can_create_pseudo_p ()
>
> Why add `can_create_pseudo_p ()` here? this will split after reload,
> but we forbid that pattern between reload and split2?
>
> > +       || register_operand (operands[0], mode)
> >         || register_operand (operands[1], mode))"
> >   "@
> >    #
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> > index aedf98819bb..24bb7240db8 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> > @@ -4,54 +4,6 @@
> >
> >  #include "def.h"
> >
> > -/*
> > -** mov0:
> > -** lbu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sb\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** ret
> > -*/
> > -void mov0 (int8_t *in, int8_t *out)
> > -{
> > -  v1qi v = *(v1qi*)in;
> > -  *(v1qi*)out = v;
> > -}
> > -
> > -/*
> > -** mov1:
> > -** lhu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sh\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** ret
> > -*/
> > -void mov1 (int8_t *in, int8_t *out)
> > -{
> > -  v2qi v = *(v2qi*)in;
> > -  *(v2qi*)out = v;
> > -}
> > -
> > -/*
> > -** mov2:
> > -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** ret
> > -*/
> > -void mov2 (int8_t *in, int8_t *out)
> > -{
> > -  v4qi v = *(v4qi*)in;
> > -  *(v4qi*)out = v;
> > -}
> > -
> > -/*
> > -** mov3:
> > -** ld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** sd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** ret
> > -*/
> > -void mov3 (int8_t *in, int8_t *out)
> > -{
> > -  v8qi v = *(v8qi*)in;
> > -  *(v8qi*)out = v;
> > -}
> > -
> >  /*
> >  ** mov4:
> >  ** vsetivli\s+zero,\s*16,\s*e8,\s*mf8,\s*t[au],\s*m[au]
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> > index 5e9615412b7..cae96b3be3f 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> > @@ -4,18 +4,6 @@
> >
> >  #include "def.h"
> >
> > -/*
> > -** mov0:
> > -** fld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** fsd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> > -** ret
> > -*/
> > -void mov0 (double *in, double *out)
> > -{
> > -  v1df v = *(v1df*)in;
> > -  *(v1df*)out = v;
> > -}
> > -
> >  /*
> >  ** mov1:
> >  ** vsetivli\s+zero,\s*2,\s*e64,\s*m1,\s*t[au],\s*m[au]
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> > deleted file mode 100644
> > index 10ae1972db7..000
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> > +++ /dev/null
> > @@ -1,19 +0,0
Re: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]
>> Why add `can_create_pseudo_p ()` here? this will split after reload,
>> but we forbid that pattern between reload and split2?

I have no idea. Some Fortran tests just need the mem-to-mem pattern to be
recognized before RA. I don't know the reason.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-27 17:33
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566]

> (define_insn_and_split "*mov"
>   [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
>         (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" " m,vr, vr"))]
>   "TARGET_VECTOR
> -   && (register_operand (operands[0], mode)
> +   && (can_create_pseudo_p ()

Why add `can_create_pseudo_p ()` here? this will split after reload,
but we forbid that pattern between reload and split2?

> +       || register_operand (operands[0], mode)
>         || register_operand (operands[1], mode))"
>   "@
>    #
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> index aedf98819bb..24bb7240db8 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-1.c
> @@ -4,54 +4,6 @@
>
>  #include "def.h"
>
> -/*
> -** mov0:
> -** lbu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sb\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** ret
> -*/
> -void mov0 (int8_t *in, int8_t *out)
> -{
> -  v1qi v = *(v1qi*)in;
> -  *(v1qi*)out = v;
> -}
> -
> -/*
> -** mov1:
> -** lhu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sh\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** ret
> -*/
> -void mov1 (int8_t *in, int8_t *out)
> -{
> -  v2qi v = *(v2qi*)in;
> -  *(v2qi*)out = v;
> -}
> -
> -/*
> -** mov2:
> -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** ret
> -*/
> -void mov2 (int8_t *in, int8_t *out)
> -{
> -  v4qi v = *(v4qi*)in;
> -  *(v4qi*)out = v;
> -}
> -
> -/*
> -** mov3:
> -** ld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** ret
> -*/
> -void mov3 (int8_t *in, int8_t *out)
> -{
> -  v8qi v = *(v8qi*)in;
> -  *(v8qi*)out = v;
> -}
> -
>  /*
>  ** mov4:
>  ** vsetivli\s+zero,\s*16,\s*e8,\s*mf8,\s*t[au],\s*m[au]
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> index 5e9615412b7..cae96b3be3f 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-10.c
> @@ -4,18 +4,6 @@
>
>  #include "def.h"
>
> -/*
> -** mov0:
> -** fld\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** fsd\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** ret
> -*/
> -void mov0 (double *in, double *out)
> -{
> -  v1df v = *(v1df*)in;
> -  *(v1df*)out = v;
> -}
> -
>  /*
>  ** mov1:
>  ** vsetivli\s+zero,\s*2,\s*e64,\s*m1,\s*t[au],\s*m[au]
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> deleted file mode 100644
> index 10ae1972db7..000
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-2.c
> +++ /dev/null
> @@ -1,19 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-march=rv32gcv_zvfh_zvl4096b -mabi=ilp32d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> -/* { dg-final { check-function-bodies "**" "" } } */
> -
> -#include "def.h"
> -
> -/*
> -** mov:
> -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** lw\s+[a-x0-9]+,4\s*\([a-x0-9]+\)
> -** sw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sw\s+[a-x0-9]+,4\s*\([a-x0-9]+\)
> -** ret
> -*/
> -void mov (int8_t *in, int8_t *out)
> -{
> -  v8qi v = *(v8qi*)in;
> -  *(v8qi*)out = v;
> -}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c
> index f2880ae5e77..86ce22896c5 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/mov-3.c
> @@ -4,42 +4,6 @@
>
>  #include "def.h"
>
> -/*
> -** mov0:
> -** lhu\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** sh\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
> -** ret
> -*/
> -void mov0 (int16_t *in, int16_t *out)
> -{
> -  v1hi v = *(v1hi*)in;
> -  *(v1hi*)out = v;
> -}
> -
> -/*
> -** mov1:
> -** lw\s+[a-x0-9]+,0\s*\([a-x0-9]+\)
>
Re: [PATCH v1] RISC-V: Support FP roundeven auto-vectorization
LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-27 16:20
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP roundeven auto-vectorization

From: Pan Li

This patch would like to support auto-vectorization for the roundeven API in math.h. It depends on the -ffast-math option.

When we would like to call roundeven like v2 = roundeven (v1), we will convert it into the insns below (following the LLVM implementation).

* vfcvt.x.f v3, v1, RNE
* vfcvt.f.x v2, v3

However, the floating point value does not need the cvt above if it has no fractional part. For example, in single precision:

+-----------+---------------+-----------------+
| raw float | binary layout | after roundeven |
+-----------+---------------+-----------------+
| 8388607.5 | 0x4affffff    | 8388608.0       |
| 8388608.0 | 0x4b000000    | 8388608.0       |
| 8388609.0 | 0x4b000001    | 8388609.0       |
+-----------+---------------+-----------------+

Every single-precision value with magnitude >= 8388608.0 (2^23) has no fractional bits, so it is already an integer. We use vmflt to build a mask that filters those elements out and only do the cvt on the masked elements.

Before this patch:
math-roundeven-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    roundeven
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi   0          // Rounding to nearest, ties to even
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm    a6
  ret

Please note VLS mode is also involved in this patch and covered by the test cases. We will add more run tests with zfa support later.

gcc/ChangeLog:

  * config/riscv/autovec.md (roundeven2): New pattern.
  * config/riscv/riscv-protos.h (enum insn_flags): New enum type.
  (enum insn_type): Ditto.
  (expand_vec_roundeven): New func decl.
  * config/riscv/riscv-v.cc (expand_vec_roundeven): New func impl.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 10 gcc/config/riscv/riscv-protos.h | 5 ++ gcc/config/riscv/riscv-v.cc | 24 .../riscv/rvv/autovec/unop/math-roundeven-0.c | 23 .../riscv/rvv/autovec/unop/math-roundeven-1.c | 23 .../riscv/rvv/autovec/unop/math-roundeven-2.c | 23 .../riscv/rvv/autovec/unop/math-roundeven-3.c | 25 + .../riscv/rvv/autovec/vls/math-roundeven-1.c | 56 +++ 8 files changed, 189 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 680a3374972..cd0cbdd2889 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2271,3 +2271,13 @@ (define_expand "btrunc2" DONE; } ) + +(define_expand "roundeven2" + [(match_operand:V_VLSF 0 "register_operand") + (match_operand:V_VLSF 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_roundeven (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 536e70bdcd3..368982a447b 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -259,6 +259,9 @@ enum insn_flags : unsigned int /* Means INSN has FRM operand and the value is FRM_RMM. */ FRM_RMM_P = 1 << 18, + + /* Means INSN has FRM operand and the value is FRM_RNE. 
*/ + FRM_RNE_P = 1 << 19, }; enum insn_type : unsigned int @@ -303,6 +306,7 @@ enum insn_type : unsigned int UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P, UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P, UNARY_OP_TAMU_FRM_RMM = UNARY_OP_TAMU | FRM_RMM_P, + UNARY_OP_TAMU_FRM_RNE = UNARY_OP_TAMU | FRM_RNE_P, /* Binary operator. */ BINARY_OP = __NORMAL_OP | BINARY_OP_P, @@ -469,6 +473,7 @@ void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode); void expand_vec_rint (rtx, rtx, machine_mode, machine_mode); void expand_vec_round (rtx, rtx, machine_mode,
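The bit patterns in the table of the roundeven description can be checked directly. The following is a minimal sketch in C, assuming `float` is IEEE 754 binary32; the helper names `float_bits` and `may_have_fraction` are illustrative only and do not appear in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw binary32 encoding of X (assumes float is IEEE 754
   binary32, which holds on RISC-V).  */
static uint32_t
float_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* Return 1 if X can carry a fractional part, 0 otherwise.  binary32
   stores 23 significand bits, so once the biased exponent reaches
   127 + 23 = 150 the unit in the last place is at least 1.0 and the
   value is necessarily an integer (or Inf/NaN).  */
static int
may_have_fraction (float x)
{
  uint32_t biased_exp = (float_bits (x) >> 23) & 0xff;
  return biased_exp < 150;
}
```

For example, `float_bits (8388607.5f)` gives 0x4affffff with biased exponent 149, while 8388608.0f encodes as 0x4b000000 with biased exponent 150, which is where the threshold used by the mask comes from.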
Re: [PATCH v1] RISC-V: Support FP trunc auto-vectorization
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-27 11:28
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP trunc auto-vectorization

From: Pan Li

This patch would like to support auto-vectorization for the trunc API in math.h. It depends on the -ffast-math option.

When we would like to call trunc/truncf like v2 = trunc (v1), we will convert it into the insns below (following the LLVM implementation).

* vfcvt.rtz.x.f v3, v1
* vfcvt.f.x v2, v3

However, the floating point value does not need the cvt above if it has no fractional part. Take single precision floating point as an example:

+------------+---------------+-------------+
| raw float  | binary layout | after trunc |
+------------+---------------+-------------+
| -8388607.5 | 0xcaffffff    | -8388607.0  |
| 8388607.5  | 0x4affffff    | 8388607.0   |
| 8388608.0  | 0x4b000000    | 8388608.0   |
| 8388609.0  | 0x4b000001    | 8388609.0   |
+------------+---------------+-------------+

Every single-precision value with magnitude >= 8388608.0 (2^23) has no fractional bits, so it is already an integer. We use vmflt to build a mask that filters those elements out and only do the cvt on the masked elements.

Before this patch:
math-trunc-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    trunc
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  vfabs.v         v2,v1
  vmflt.vf        v0,v2,fa5
  vfcvt.rtz.x.f.v v4,v1,v0.t
  vfcvt.f.x.v     v2,v4,v0.t
  vfsgnj.vv       v2,v2,v1
  bne             .L4

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

  * config/riscv/autovec.md (btrunc2): New pattern.
  * config/riscv/riscv-protos.h (expand_vec_trunc): New func decl.
  * config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): New func impl.
  (expand_vec_trunc): Ditto.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 10 gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 32 +++ .../riscv/rvv/autovec/unop/math-trunc-0.c | 18 ++ .../riscv/rvv/autovec/unop/math-trunc-1.c | 18 ++ .../riscv/rvv/autovec/unop/math-trunc-2.c | 18 ++ .../riscv/rvv/autovec/unop/math-trunc-3.c | 20 +++ .../riscv/rvv/autovec/unop/math-trunc-run-1.c | 39 + .../riscv/rvv/autovec/unop/math-trunc-run-2.c | 39 + .../riscv/rvv/autovec/vls/math-trunc-1.c | 56 +++ 10 files changed, 251 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 798cf1272c5..680a3374972 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2261,3 +2261,13 @@ (define_expand "round2" DONE; } ) + +(define_expand "btrunc2" + [(match_operand:V_VLSF 0 "register_operand") + (match_operand:V_VLSF 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_trunc (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 70ca244c591..536e70bdcd3 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -468,6 +468,7 @@ void expand_vec_floor (rtx, rtx, 
machine_mode, machine_mode); void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode); void expand_vec_rint (rtx, rtx, machine_mode, machine_mode); void expand_vec_round (rtx, rtx, machine_mode, machine_mode); +void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode); #endif bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 5f738634219..8992977a51d 100644 --- a/gcc/config/riscv/riscv-v.cc
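The five-instruction sequence in the trunc description can be read one lane at a time. The sketch below is a scalar model in C, not the code the patch emits: it assumes binary32 `float`, the names `to_bits`, `from_bits` and `trunc_lane_model` are made up for illustration, and inactive lanes are modelled as keeping the input value.

```c
#include <stdint.h>
#include <string.h>

static uint32_t
to_bits (float x)
{
  uint32_t b;
  memcpy (&b, &x, sizeof b);
  return b;
}

static float
from_bits (uint32_t b)
{
  float x;
  memcpy (&x, &b, sizeof x);
  return x;
}

/* Scalar model of one vector lane of the masked trunc sequence.  */
static float
trunc_lane_model (float x)
{
  const float limit = 8388608.0f;	/* 2^23, the scalar held in fa5.  */

  /* vfabs.v: clear the sign bit.  */
  float mag = from_bits (to_bits (x) & 0x7fffffffu);

  /* vmflt.vf: the lane is active only when |x| < 2^23.  NaN compares
     false, so NaN lanes are inactive and pass through unchanged.  */
  if (!(mag < limit))
    return x;

  /* vfcvt.rtz.x.f.v: a C float-to-integer cast also rounds toward zero.  */
  int32_t as_int = (int32_t) x;

  /* vfcvt.f.x.v: back to floating point, exact because |as_int| < 2^23.  */
  float result = (float) as_int;

  /* vfsgnj.vv: inject the sign of the input, so -0.5 becomes -0.0.  */
  return from_bits ((to_bits (result) & 0x7fffffffu)
		    | (to_bits (x) & 0x80000000u));
}
```

The final sign injection matters for inputs in (-1, 0): the integer round trip alone would produce +0.0, whereas trunc is expected to return -0.0.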
Re: [PATCH] RISC-V: Bugfix for RTL check[PR111533]
+   vid sequence.  The elt (i) can be either const_int or
+   const_poly_int.  */
+ HOST_WIDE_INT diff = rtx_to_poly_int64 (builder.elt (i)).to_constant () - i;

How about:
poly_int64 diff = rtx_to_poly_int64 (builder.elt (i)) - i;

  rtx avl
-   = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
+   = (has_vl_op (insn->rtl ()) && REG_P (get_vl (insn->rtl ())))
+     ? get_vl (insn->rtl ())
+     : dem.get_avl ();

How about:
rtx avl = dem.get_avl_or_vl_reg ();

I wonder whether it is possible to add a testcase for this issue?

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-09-27 11:07
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Bugfix for RTL check[PR111533]

From: xuli

Consider the following situation:

BB5: local_dem(RVV Insn 1, AVL(reg zero))
RVV Insn 1: vmv.s.x, AVL (const_int 1)
RVV Insn 2: vredsum.vs, AVL(reg zero)

vmv.s.x has a vl operand, so the following code will get the avl (const_int) from RVV Insn 1.

rtx avl = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();

If REGNO is used on a const_int, the compiler will crash:

during RTL pass: vsetvl
res_debug.c: In function '__dn_count_labels':
res_debug.c:1050:1: internal compiler error: RTL check: expected code 'reg', have 'const_int' in rhs_regno, at rtl.h:1934
 1050 | }
      | ^
0x8fb169 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
  ../.././gcc/gcc/rtl.cc:770
0x1399818 rhs_regno(rtx_def const*)
  ../.././gcc/gcc/rtl.h:1934
0x1399818 anticipatable_occurrence_p
  ../.././gcc/gcc/config/riscv/riscv-vsetvl.cc:348

So in this case avl should be obtained from dem.
Another issue is caused by the following code:

HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;

during RTL pass: expand
../../.././gcc/libgfortran/generated/matmul_c4.c: In function 'matmul_c4':
../../.././gcc/libgfortran/generated/matmul_c4.c:2906:39: internal compiler error: RTL check: expected code 'const_int', have 'const_poly_int' in expand_const_vector, at config/riscv/riscv-v.cc:1149

The builder.elt (i) can be either const_int or const_poly_int.

  PR target/111533

gcc/ChangeLog:

  * config/riscv/riscv-v.cc (expand_const_vector): Fix bug.
  * config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix bug.
---
 gcc/config/riscv/riscv-v.cc      | 6 --
 gcc/config/riscv/riscv-vsetvl.cc | 5 -
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5f738634219..fb3c55b4705 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1147,8 +1147,10 @@ expand_const_vector (rtx target, rtx src)
   for (unsigned int i = 0; i < v.npatterns (); ++i)
     {
       /* Calculate the diff between the target sequence and
-         vid sequence.  */
-      HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;
+         vid sequence.  The elt (i) can be either const_int or
+         const_poly_int.  */
+      HOST_WIDE_INT diff = rtx_to_poly_int64 (builder.elt (i)).to_constant () - i;
+
       v.quick_push (gen_int_mode (diff, v.inner_mode ()));
     }
   /* Step 2: Generate result = VID + diff.  */
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 7af33e7ea6f..27000434341 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -308,7 +308,10 @@ anticipatable_occurrence_p (const bb_info *bb, const vector_insn_info dem)
     {
       /* rs1 (avl) are not modified in the basic block prior to the VSETVL.  */
       rtx avl
-        = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ()) : dem.get_avl ();
+        = (has_vl_op (insn->rtl ()) && REG_P (get_vl (insn->rtl ())))
+          ? get_vl (insn->rtl ())
+          : dem.get_avl ();
+
       if (dem.dirty_p ())
         {
           gcc_assert (!vsetvl_insn_p (insn->rtl ()));
--
2.17.1
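The first crash above is an instance of reading a tagged value through the accessor for the wrong tag. The sketch below is a toy model in C, assuming nothing about GCC's real `rtx` layout: `toy_rtx`, `toy_regno` and `pick_avl` are hypothetical names that only mirror the shape of the fix (test the code first, fall back to the demand's AVL otherwise).

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for an RTL object: a code tag plus one payload.  */
enum toy_code { TOY_REG, TOY_CONST_INT };

struct toy_rtx
{
  enum toy_code code;
  long value;		/* Register number or integer constant.  */
};

/* Checked accessor in the spirit of REGNO with RTL checking enabled:
   abort when the object is not a register.  */
static unsigned
toy_regno (const struct toy_rtx *x)
{
  if (x->code != TOY_REG)
    {
      fprintf (stderr, "toy RTL check: expected code 'reg'\n");
      abort ();
    }
  return (unsigned) x->value;
}

/* Pick the AVL operand: use the insn's own vl operand only when it
   exists and is a register, otherwise use the AVL from the demand.  */
static const struct toy_rtx *
pick_avl (const struct toy_rtx *insn_vl, const struct toy_rtx *dem_avl)
{
  if (insn_vl != NULL && insn_vl->code == TOY_REG)
    return insn_vl;
  return dem_avl;
}
```

With an immediate vl operand such as `(const_int 1)`, `pick_avl` returns the demand's register and a later `toy_regno` call is safe; without the code test the accessor would abort, which corresponds to the reported internal compiler error.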
Re: [PATCH v1] RISC-V: Support FP round auto-vectorization
LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-26 19:00
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP round auto-vectorization

From: Pan Li

This patch would like to support auto-vectorization for the round API in math.h. It depends on the -ffast-math option.

When we would like to call round/roundf like v2 = round (v1), we will convert it into the insns below (following the LLVM implementation).

* vfcvt.x.f v3, v1, RMM
* vfcvt.f.x v2, v3

However, the floating point value does not need the cvt above if it has no fractional part. Take single precision floating point as an example:

+------------+---------------+-------------+
| raw float  | binary layout | after round |
+------------+---------------+-------------+
| -8388607.5 | 0xcaffffff    | -8388608.0  |
| 8388607.5  | 0x4affffff    | 8388608.0   |
| 8388608.0  | 0x4b000000    | 8388608.0   |
| 8388609.0  | 0x4b000001    | 8388609.0   |
+------------+---------------+-------------+

Every single-precision value with magnitude >= 8388608.0 (2^23) has no fractional bits, so it is already an integer. We use vmflt to build a mask that filters those elements out and only do the cvt on the masked elements.

Before this patch:
math-round-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    round
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi   4          // RMM, rounding to nearest, ties to max magnitude
.L4:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv   v2,v2,v1
  bne         .L4
.L14:
  fsrm    a6
  ret

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

  * config/riscv/autovec.md (round2): New pattern.
  * config/riscv/riscv-protos.h (enum insn_flags): New enum type.
  (enum insn_type): Ditto.
  (expand_vec_round): New function decl.
  * config/riscv/riscv-v.cc (expand_vec_round): New function impl.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-round-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-round-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-round-2.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-round-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-round-1.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 10 gcc/config/riscv/riscv-protos.h | 5 ++ gcc/config/riscv/riscv-v.cc | 24 .../riscv/rvv/autovec/unop/math-round-0.c | 23 .../riscv/rvv/autovec/unop/math-round-1.c | 23 .../riscv/rvv/autovec/unop/math-round-2.c | 23 .../riscv/rvv/autovec/unop/math-round-3.c | 25 + .../riscv/rvv/autovec/unop/math-round-run-1.c | 39 + .../riscv/rvv/autovec/unop/math-round-run-2.c | 39 + .../riscv/rvv/autovec/vls/math-round-1.c | 56 +++ 10 files changed, 267 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-round-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 1d2fca60e98..798cf1272c5 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2251,3 +2251,13 @@ (define_expand "rint2" DONE; } ) + +(define_expand "round2" + [(match_operand:V_VLSF 0 "register_operand") + (match_operand:V_VLSF 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_round (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 629adeea94c..70ca244c591 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -256,6 
+256,9 @@ enum insn_flags : unsigned int /* Means INSN has FRM operand and the value is FRM_RDN. */ FRM_RDN_P = 1 << 17, + + /* Means INSN has FRM operand and the value is FRM_RMM. */ + FRM_RMM_P = 1 << 18, }; enum insn_type : unsigned int @@ -299,6 +302,7 @@ enum insn_type : unsigned int UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P, UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM
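The round patch selects RMM for the float-to-integer conversion; the other rounding modes differ from it only in how the dropped fraction is handled. The sketch below is a scalar model in C of that conversion under each mode, assuming |x| < 2^23 (the only range where the masked cvt runs). The enum and the function name `cvt_x_f_model` are illustrative and are not GCC or RISC-V identifiers.

```c
#include <stdint.h>

/* The five static rounding modes, in the order of the RISC-V frm
   encoding (RNE = 0 ... RMM = 4).  */
enum frm_model
{
  FRM_MODEL_RNE,	/* To nearest, ties to even.  */
  FRM_MODEL_RTZ,	/* Toward zero.  */
  FRM_MODEL_RDN,	/* Toward negative infinity.  */
  FRM_MODEL_RUP,	/* Toward positive infinity.  */
  FRM_MODEL_RMM		/* To nearest, ties to max magnitude.  */
};

/* Convert X to an integer under MODE.  Valid only for |x| < 2^23,
   where both the truncated value and the remainder are exact.  */
static int32_t
cvt_x_f_model (float x, enum frm_model mode)
{
  int32_t t = (int32_t) x;			/* Truncated toward zero.  */
  float rem = x - (float) t;			/* Same sign as x, |rem| < 1.  */
  float mag = rem < 0.0f ? -rem : rem;
  int32_t away = x < 0.0f ? t - 1 : t + 1;	/* Neighbour away from zero.  */

  if (mag == 0.0f)
    return t;

  switch (mode)
    {
    case FRM_MODEL_RTZ:
      return t;
    case FRM_MODEL_RDN:
      return x < 0.0f ? away : t;
    case FRM_MODEL_RUP:
      return x < 0.0f ? t : away;
    case FRM_MODEL_RMM:
      return mag >= 0.5f ? away : t;
    case FRM_MODEL_RNE:
    default:
      if (mag != 0.5f)
	return mag > 0.5f ? away : t;
      return (t % 2 != 0) ? away : t;	/* Tie: keep the even neighbour.  */
    }
}
```

On the tie 2.5 the model returns 3 for RMM and 2 for RNE, which is the difference between round and roundeven; on -8388607.5 it returns -8388608 for RMM and -8388607 for RTZ, matching the round and trunc tables.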
Re: Re: [PATCH] MATCH: Optimize COND_ADD reduction pattern
Address comments:

V3 COND_LEN_ADD: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631350.html
V2 COND_ADD: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631352.html

Thanks.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-09-26 17:41
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH] MATCH: Optimize COND_ADD reduction pattern

On Tue, 26 Sep 2023, Juzhe-Zhong wrote:

> Current COND_ADD reduction pattern can't optimize floating-point vector.
> As Richard suggested:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631336.html
> Allow COND_ADD reduction pattern to optimize floating-point vector.
>
> Bootstrap and Regression is running.
>
> Ok for trunk if tests pass ?

I just wondered about fixed point - zerop seems to also allow fixed_zerop.

Maybe do

 if (ANY_INTEGRAL_TYPE_P (type)
     || (FLOAT_TYPE_P (type)
         && fold_real_zero_addition_p (type, NULL_TREE, @4, 0)))

(also for the other patch) to avoid touching the fixed point case.

Richard.

> gcc/ChangeLog:
>
> * match.pd: Optimize COND_ADD reduction pattern.
>
> ---
> gcc/match.pd | 6 --
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5061c19e086..398beaebd27 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8863,8 +8863,10 @@ and,
>
> c = mask1 && mask2 ? d + b : d. */
> (simplify
> - (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
> - (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
> + (IFN_COND_ADD @0 @1 (vec_cond @2 @3 zerop@4) @1)
> + (if (ANY_INTEGRAL_TYPE_P (type)
> + || fold_real_zero_addition_p (type, NULL_TREE, @4, 0))
> + (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1)))
>
> /* Detect simplication for a conditional length reduction where
>

--
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
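The reason the floating-point case needs the `fold_real_zero_addition_p` guard can be seen on a single element. The sketch below is a scalar illustration in C, assuming IEEE 754 arithmetic in the default round-to-nearest mode and no -ffast-math; `cond_add_unfused`, `cond_add_fused` and `bits_of` are names invented here, not GCC functions.

```c
#include <stdint.h>
#include <string.h>

/* Raw encoding of X, used to tell +0.0 from -0.0.  */
static uint32_t
bits_of (float x)
{
  uint32_t b;
  memcpy (&b, &x, sizeof b);
  return b;
}

/* Unfused form:  a = mask2 ? b : 0;  c = mask1 ? d + a : d.  */
static float
cond_add_unfused (int mask1, int mask2, float d, float b)
{
  float a = mask2 ? b : 0.0f;
  return mask1 ? d + a : d;
}

/* Fused form:  c = mask1 && mask2 ? d + b : d.  */
static float
cond_add_fused (int mask1, int mask2, float d, float b)
{
  return (mask1 && mask2) ? d + b : d;
}
```

With `mask1` set, `mask2` clear and `d` equal to -0.0, the unfused form computes -0.0 + 0.0, which is +0.0, while the fused form returns -0.0 untouched. The two forms therefore agree for integers but differ for floating point whenever signed zeros are honoured, which is the case the guard is meant to exclude.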
Re: Re: [PATCH] MATCH: Optimize COND_ADD_LEN reduction pattern
Hi, Richi.

Addressed comments.

One is the V2 patch for the COND_LEN_ADD reduction:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631340.html

The second one optimizes the COND_ADD reduction:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631341.html

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-09-26 15:46
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford; rguenther; pinskia
Subject: Re: [PATCH] MATCH: Optimize COND_ADD_LEN reduction pattern

On Tue, Sep 26, 2023 at 9:13 AM Juzhe-Zhong wrote:
>
> This patch leverages this commit:
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=62b505a4d5fc89
> to optimize the COND_LEN_ADD reduction pattern.
>
> We are doing optimization of VEC_COND_EXPR + COND_LEN_ADD -> COND_LEN_ADD.
>
> Consider the following case:
>
> #include <stdint.h>
>
> void
> pr11594 (uint64_t *restrict a, uint64_t *restrict b, int loop_size)
> {
>   uint64_t result = 0;
>
>   for (int i = 0; i < loop_size; i++)
>     {
>       if (b[i] <= a[i])
>         {
>           result += a[i];
>         }
>     }
>
>   a[0] = result;
> }
>
> Before this patch:
>         vsetvli a7,zero,e64,m1,ta,ma
>         vmv.v.i v2,0
>         vmv1r.v v3,v2                --- redundant
> .L3:
>         vsetvli a5,a2,e64,m1,ta,ma
>         vle64.v v1,0(a3)
>         vle64.v v0,0(a1)
>         slli    a6,a5,3
>         vsetvli a7,zero,e64,m1,ta,ma
>         sub     a2,a2,a5
>         vmsleu.vv       v0,v0,v1
>         add     a1,a1,a6
>         vmerge.vvm      v1,v3,v1,v0  --- redundant.
>         add     a3,a3,a6
>         vsetvli zero,a5,e64,m1,tu,ma
>         vadd.vv v2,v2,v1
>         bne     a2,zero,.L3
>         li      a5,0
>         vsetvli a4,zero,e64,m1,ta,ma
>         vmv.s.x v1,a5
>         vredsum.vs      v2,v2,v1
>         vmv.x.s a5,v2
>         sd      a5,0(a0)
>         ret
>
> After this patch:
>
>         vsetvli a6,zero,e64,m1,ta,ma
>         vmv.v.i v1,0
> .L3:
>         vsetvli a5,a2,e64,m1,ta,ma
>         vle64.v v2,0(a4)
>         vle64.v v0,0(a1)
>         slli    a3,a5,3
>         vsetvli a6,zero,e64,m1,ta,ma
>         sub     a2,a2,a5
>         vmsleu.vv       v0,v0,v2
>         add     a1,a1,a3
>         vsetvli zero,a5,e64,m1,tu,mu
>         add     a4,a4,a3
>         vadd.vv v1,v1,v2,v0.t
>         bne     a2,zero,.L3
>         li      a5,0
>         vsetivli        zero,1,e64,m1,ta,ma
>         vmv.s.x v2,a5
>         vsetvli a5,zero,e64,m1,ta,ma
>         vredsum.vs      v1,v1,v2
>         vmv.x.s a5,v1
>         sd      a5,0(a0)
>         ret
>
> Bootstrap && Regression is running.
>
> Ok for trunk when testing passes ?
>
> PR tree-optimization/111594
> PR tree-optimization/110660
>
> gcc/ChangeLog:
>
> * match.pd: Optimize COND_LEN_ADD reduction.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/pr111594.c: New test.
>
> ---
> gcc/match.pd | 13 +
> .../riscv/rvv/autovec/cond/cond_reduc-1.c | 29 +++
> .../riscv/rvv/autovec/cond/pr111594.c | 22 ++
> 3 files changed, 64 insertions(+)
> create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111594.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a17778fbaa6..af8d12c138e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8866,6 +8866,19 @@ and,
>    (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
>    (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
>
> +/* Detect simplication for a conditional length reduction where
> +
> + a = mask ? b : 0
> + c = i < len + bias ? d + a : d
> +
> + is turned into
> +
> + c = mask && i < len ? d + b : d. */
> +(simplify
> + (IFN_COND_LEN_ADD integer_minus_onep @0 (vec_cond @1 @2 zerop) @0 @3 @4)

I think you want integer_truep instead of integer_minus_onep for readability. Since you use zerop here, can you also adjust the preceding pattern?

> + (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))

It might be better to check

 ANY_INTEGRAL_TYPE_P (type)
 || fold_real_zero_addition_p (type, NULL_TREE, @5, 0)

Your change misses HONOR_SIGN_DEPENDENT_ROUNDING, I think.

> +(IFN_COND_LEN_ADD @1 @0 @2 @0 @3 @4)))
> +
> /* For pointers @0 and @2 and nonnegative constant offset @1, look for
> expressions like:
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c
>
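The transformation described in the match.pd comment can be checked elementwise for the integer case. The sketch below is a plain C model, not GCC code: the function names are invented, the bias is assumed to be 0, and the element type is `uint64_t` as in the pr11594 example.

```c
#include <stddef.h>
#include <stdint.h>

/* Unfused form, per element i:
     a = mask ? b : 0
     c = i < len ? d + a : d  */
static void
select_then_len_add (uint64_t *out, const int *mask, const uint64_t *d,
		     const uint64_t *b, size_t nunits, size_t len)
{
  for (size_t i = 0; i < nunits; i++)
    {
      uint64_t a = mask[i] ? b[i] : 0;
      out[i] = i < len ? d[i] + a : d[i];
    }
}

/* Fused form, per element i:
     c = mask && i < len ? d + b : d  */
static void
fused_len_add (uint64_t *out, const int *mask, const uint64_t *d,
	       const uint64_t *b, size_t nunits, size_t len)
{
  for (size_t i = 0; i < nunits; i++)
    out[i] = (mask[i] && i < len) ? d[i] + b[i] : d[i];
}
```

For integers the two forms always agree because adding zero is the identity; that is why the review suggests allowing the fold unconditionally for integral types and guarding only the floating-point case.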
Re: [PATCH v1] RISC-V: Support FP rint auto-vectorization
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-26 15:24
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP rint auto-vectorization

From: Pan Li

This patch would like to support auto-vectorization for the rint API in math.h. It depends on the -ffast-math option.

When we would like to call rint/rintf like v2 = rint (v1), we will convert it into the insns below (following the LLVM implementation).

* vfcvt.x.f v3, v1
* vfcvt.f.x v2, v3

However, the floating point value does not need the cvt above if it has no fractional part. Take single precision floating point as an example:

Assume we have RTZ rounding mode

+------------+---------------+------------+
| raw float  | binary layout | after int  |
+------------+---------------+------------+
| -8388607.5 | 0xcaffffff    | -8388607.0 |
| 8388607.5  | 0x4affffff    | 8388607.0  |
| 8388608.0  | 0x4b000000    | 8388608.0  |
| 8388609.0  | 0x4b000001    | 8388609.0  |
+------------+---------------+------------+

Every single-precision value with magnitude >= 8388608.0 (2^23) has no fractional bits, so it is already an integer. We use vmflt to build a mask that filters those elements out and only do the cvt on the masked elements.

Before this patch:
math-rint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    rint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv   v2,v2,v1

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

  * config/riscv/autovec.md (rint2): New pattern.
  * config/riscv/riscv-protos.h (expand_vec_rint): New function decl.
  * config/riscv/riscv-v.cc (expand_vec_rint): New function impl.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-rint-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-rint-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-rint-2.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-rint-3.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-rint-1.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 10 gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 22 +++ .../riscv/rvv/autovec/unop/math-rint-0.c | 18 ++ .../riscv/rvv/autovec/unop/math-rint-1.c | 18 ++ .../riscv/rvv/autovec/unop/math-rint-2.c | 18 ++ .../riscv/rvv/autovec/unop/math-rint-3.c | 20 +++ .../riscv/rvv/autovec/unop/math-rint-run-1.c | 48 +++ .../riscv/rvv/autovec/unop/math-rint-run-2.c | 48 +++ .../riscv/rvv/autovec/vls/math-rint-1.c | 58 +++ 10 files changed, 261 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-rint-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index b47f086f5e6..1d2fca60e98 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2241,3 +2241,13 @@ (define_expand "nearbyint2" DONE; } ) + +(define_expand "rint2" + [(match_operand:V_VLSF 0 "register_operand") + (match_operand:V_VLSF 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_rint (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index f87bdef0f71..629adeea94c 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -462,6 +462,7 @@ void expand_reduction (unsigned, unsigned, rtx *, rtx); void expand_vec_ceil (rtx, rtx, machine_mode, machine_mode); 
void expand_vec_floor (rtx, rtx, machine_mode, machine_mode); void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode); +void expand_vec_rint (rtx, rtx, machine_mode, machine_mode); #endif bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 5d3d458fa6c..445ed000f88 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3698,4 +3698
Re: [PATCH v2] RISC-V: Support FP nearbyint auto-vectorization
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-26 15:19
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Support FP nearbyint auto-vectorization

From: Pan Li

This patch would like to support auto-vectorization for the nearbyint API in math.h. It depends on the -ffast-math option.

When we would like to call nearbyint/nearbyintf like v2 = nearbyint (v1), we will convert it into the insns below (following the LLVM implementation).

* frflags a5
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
* fsflags a5

However, the floating point value does not need the cvt above if it has no fractional part. Take single precision floating point as an example:

Assume we have RTZ rounding mode

+-----------+---------------+-----------------+
| raw float | binary layout | after nearbyint |
+-----------+---------------+-----------------+
| 8388607.5 | 0x4affffff    | 8388607.0       |
| 8388608.0 | 0x4b000000    | 8388608.0       |
| 8388609.0 | 0x4b000001    | 8388609.0       |
+-----------+---------------+-----------------+

Every single-precision value with magnitude >= 8388608.0 (2^23) has no fractional bits, so it is already an integer. We use vmflt to build a mask that filters those elements out and only do the cvt on the masked elements.

Before this patch:
math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    nearbyint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  frflags     a7
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  fsflags     a7
  vfsgnj.vv   v2,v2,v1

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

  * config/riscv/autovec.md (nearbyint2): New pattern.
  * config/riscv/riscv-protos.h (enum insn_type): New enum.
  (expand_vec_nearbyint): New function decl.
  * config/riscv/riscv-v.cc (expand_vec_nearbyint): New func impl.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function.
  * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 11 gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 29 ++ .../riscv/rvv/autovec/unop/math-nearbyint-0.c | 20 +++ .../riscv/rvv/autovec/unop/math-nearbyint-1.c | 20 +++ .../riscv/rvv/autovec/unop/math-nearbyint-2.c | 20 +++ .../riscv/rvv/autovec/unop/math-nearbyint-3.c | 22 +++ .../rvv/autovec/unop/math-nearbyint-run-1.c | 48 +++ .../rvv/autovec/unop/math-nearbyint-run-2.c | 48 +++ .../riscv/rvv/autovec/unop/test-math.h| 33 +++ .../riscv/rvv/autovec/vls/math-nearbyint-1.c | 58 +++ 11 files changed, 311 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index a005e17457e..b47f086f5e6 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2210,6 +2210,7 @@ (define_expand "avg3_ceil" ;; Includes: ;; - ceil/ceilf ;; - floor/floorf +;; - nearbyint/nearbyintf ;; - (define_expand "ceil2" [(match_operand:V_VLSF 0 "register_operand") @@ -2230,3 +2231,13 @@ (define_expand "floor2" DONE; } ) + +(define_expand "nearbyint2" + [(match_operand:V_VLSF 0 "register_operand") + 
(match_operand:V_VLSF 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { +riscv_vector::expand_vec_nearbyint (operands[0], operands[1], mode, mode); +DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 63eb2475705..f87bdef0f71 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-p
Re: [PATCH v1] RISC-V: Rename rounding const fp function for refactor
LGTM. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-26 11:12 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Rename rounding const fp function for refactor From: Pan Li The rounding related API shared one const, rename it to avoid unnecessary redundant code. gcc/ChangeLog: * config/riscv/riscv-v.cc (gen_ceil_const_fp): Remove. (get_fp_rounding_coefficient): Rename. (gen_floor_const_fp): Remove. (expand_vec_ceil): Take renamed func. (expand_vec_floor): Ditto. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-v.cc | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index a1ffefb23f3..9a1df950d58 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3548,7 +3548,7 @@ cmp_lmul_gt_one (machine_mode mode) greater than and equal to 4503599627370496. */ static rtx -gen_ceil_const_fp (machine_mode inner_mode) +get_fp_rounding_coefficient (machine_mode inner_mode) { REAL_VALUE_TYPE real; @@ -3564,13 +3564,6 @@ gen_ceil_const_fp (machine_mode inner_mode) return const_double_from_real_value (real, inner_mode); } -static rtx -gen_floor_const_fp (machine_mode inner_mode) -{ - /* The floor needs the same floating point const as ceil. */ - return gen_ceil_const_fp (inner_mode); -} - static rtx emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar, machine_mode vec_fp_mode) @@ -3637,7 +3630,7 @@ expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode, emit_vec_abs (op_0, op_1, vec_fp_mode); /* Step-2: Generate the mask on const fp. */ - rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode)); + rtx const_fp = get_fp_rounding_coefficient (GET_MODE_INNER (vec_fp_mode)); rtx mask = emit_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode); /* Step-3: Convert to integer on mask, with rounding up (aka ceil). 
*/ @@ -3662,7 +3655,7 @@ expand_vec_floor (rtx op_0, rtx op_1, machine_mode vec_fp_mode, emit_vec_abs (op_0, op_1, vec_fp_mode); /* Step-2: Generate the mask on const fp. */ - rtx const_fp = gen_floor_const_fp (GET_MODE_INNER (vec_fp_mode)); + rtx const_fp = get_fp_rounding_coefficient (GET_MODE_INNER (vec_fp_mode)); rtx mask = emit_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode); /* Step-3: Convert to integer on mask, with rounding down (aka floor). */ -- 2.34.1
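The shared constant and the ceil steps named in this patch can be modelled in scalar C. This is only a sketch under stated assumptions: rounding_coefficient, copy_sign_f32 and ceil_model_f32 are hypothetical names and not GCC functions, the formats are taken to be IEEE binary32/binary64, and the round-up conversion is emulated with a truncating cast plus a correction rather than a real rounding-mode switch.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 2^(mant_dig - 1), the magnitude from which every
   value of the format is already an integer (2^23 for float, 2^52 for
   double).  */
double
rounding_coefficient (int mant_dig)
{
  double c = 1.0;
  for (int i = 1; i < mant_dig; i++)
    c *= 2.0;
  return c;
}

/* Scalar stand-in for vfsgnj.vv: magnitude of MAG, sign bit of SIGN.  */
float
copy_sign_f32 (float mag, float sign)
{
  uint32_t m, s;
  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sign, sizeof s);
  m = (m & 0x7fffffffu) | (s & 0x80000000u);
  memcpy (&mag, &m, sizeof m);
  return mag;
}

/* Scalar model of the masked ceil sequence for one float element.  */
float
ceil_model_f32 (float x)
{
  const float coeff = (float) rounding_coefficient (FLT_MANT_DIG);

  /* Step-1: absolute value (vfabs.v).  */
  float ax = copy_sign_f32 (x, 0.0f);

  /* Step-2: mask (vmflt.vf).  NaN and large values compare false and
     are left untouched, like masked-off elements.  */
  if (!(ax < coeff))
    return x;

  /* Step-3: convert to integer, rounding up.  The cast truncates toward
     zero, so bump the result when it landed below X.  */
  int32_t i = (int32_t) x;
  if ((float) i < x)
    i += 1;

  /* Step-4: convert back to float, then restore the sign so that
     inputs in (-1, 0) give -0.0.  */
  return copy_sign_f32 ((float) i, x);
}
```

Calling rounding_coefficient with FLT_MANT_DIG or DBL_MANT_DIG reproduces the 8388608.0 and 4503599627370496 thresholds quoted in the patches, which is why ceil and floor can share one helper.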
Re: [PATCH v1] RISC-V: Support FP nearbyint auto-vectorization
+static rtx
+gen_nearbyint_const_fp (machine_mode inner_mode)
+{
+  /* The nearbyint needs the same floating point const as ceil. */
+  return gen_ceil_const_fp (inner_mode);
+}

This is redundant. The following is redundant as well:

static rtx
gen_floor_const_fp (machine_mode inner_mode)
{
  /* The floor needs the same floating point const as ceil. */
  return gen_ceil_const_fp (inner_mode);
}

So rename

gen_ceil_const_fp (machine_mode inner_mode)

into:

get_fp_rounding_coefficient

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-26 10:39
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP nearbyint auto-vectorization

From: Pan Li

This patch supports auto-vectorization for the nearbyint API in math.h. It depends on the -ffast-math option.

A call to nearbyint/nearbyintf such as v2 = nearbyint (v1) is converted into the insns below (following the LLVM implementation).

* frflags a5
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
* fsflags a5

However, a floating-point value does not need the conversion above if it has no fractional part. Take single precision as an example, assuming the RTZ rounding mode:

+-----------+---------------+-----------------+
| raw float | binary layout | after nearbyint |
+-----------+---------------+-----------------+
| 8388607.5 | 0x4affffff    | 8388607.0       |
| 8388608.0 | 0x4b000000    | 8388608.0       |
| 8388609.0 | 0x4b000001    | 8388609.0       |
+-----------+---------------+-----------------+

All single-precision values >= 8388608.0 have no fractional part. We use vmflt to build a mask that filters them out and perform the conversion only on the masked elements.

Before this patch:

math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    nearbyint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:

  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  frflags     a7
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  fsflags     a7
  vfsgnj.vv   v2,v2,v1

Note that VLS modes are also handled by this patch and covered by the test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (nearbyint2): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum. (expand_vec_nearbyint): New function decl. * config/riscv/riscv-v.cc (gen_nearbyint_const_fp): New function impl. (expand_vec_nearbyint): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 11 gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv-v.cc | 36 .../riscv/rvv/autovec/unop/math-nearbyint-0.c | 20 +++ .../riscv/rvv/autovec/unop/math-nearbyint-1.c | 20 +++ .../riscv/rvv/autovec/unop/math-nearbyint-2.c | 20 +++ .../riscv/rvv/autovec/unop/math-nearbyint-3.c | 22 +++ .../rvv/autovec/unop/math-nearbyint-run-1.c | 48 +++ .../rvv/autovec/unop/math-nearbyint-run-2.c | 48 +++ .../riscv/rvv/autovec/unop/test-math.h| 33 +++ .../riscv/rvv/autovec/vls/math-nearbyint-1.c | 58 +++ 11 files changed, 318 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c diff --git 
a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index a005e17457e..b47f086f5e6 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2210,6 +2210,7 @@ (define_expand "avg3_ceil" ;; Includes: ;; - ceil/ceilf ;; - floor/floorf +;; - nearbyint/nearbyintf ;; - (define_expand "ceil2" [(match_operand:V_VLSF 0 "register_operand") @@ -2230,3 +2231,13 @@ (define_expand "floor2" DONE; } ) + +(define_expand "near
Re: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-22 20:16
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.

From: Pan Li

The ceil code below is already vectorized.

void test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:

  vfmv.v.f    v4,fa0     // can be removed
  vfabs.v     v0,v1
  vmv1r.v     v2,v1      // can be removed
  vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv   v2,v2,v1

After this patch:

  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2

The generated code improves in the following ways.

* Remove vfmv.v.f.
* Use vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(emit_vec_float_cmp_mask): Rename.
(expand_vec_copysign): Ditto.
(emit_vec_copysign): Ditto.
(emit_vec_abs): New function impl.
(emit_vec_cvt_x_f): Ditto.
(emit_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.
Signed-off-by: Pan Li --- gcc/config/riscv/riscv-v.cc | 81 --- .../riscv/rvv/autovec/unop/math-ceil-0.c | 5 +- .../riscv/rvv/autovec/unop/math-ceil-1.c | 5 +- .../riscv/rvv/autovec/unop/math-ceil-2.c | 5 +- .../riscv/rvv/autovec/unop/math-ceil-3.c | 5 +- 5 files changed, 54 insertions(+), 47 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 4d0e1d8d1a9..251d827d973 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3557,36 +3557,27 @@ gen_ceil_const_fp (machine_mode inner_mode) } static rtx -expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar, -machine_mode vec_fp_mode) +emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar, + machine_mode vec_fp_mode) { - /* Step-1: Get the abs float value for mask generation. */ - rtx tmp = gen_reg_rtx (vec_fp_mode); - rtx abs_ops[] = {tmp, fp_vector}; - insn_code icode = code_for_pred (ABS, vec_fp_mode); - emit_vlmax_insn (icode, UNARY_OP, abs_ops); - - /* Step-2: Prepare the scalar float compare register. */ + /* Step-1: Prepare the scalar float compare register. */ rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode)); emit_insn (gen_move_insn (fp_reg, fp_scalar)); - /* Step-3: Prepare the vector float compare register. */ - rtx vec_dup = gen_reg_rtx (vec_fp_mode); - icode = code_for_pred_broadcast (vec_fp_mode); - rtx vfmv_ops[] = {vec_dup, fp_reg}; - emit_vlmax_insn (icode, UNARY_OP, vfmv_ops); - - /* Step-4: Generate the mask. */ + /* Step-2: Generate the mask. 
*/ machine_mode mask_mode = get_mask_mode (vec_fp_mode); rtx mask = gen_reg_rtx (mask_mode); - expand_vec_cmp (mask, code, tmp, vec_dup); + rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg); + rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg}; + insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode); + emit_vlmax_insn (icode, COMPARE_OP, cmp_ops); return mask; } static void -expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1, - machine_mode vec_mode) +emit_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1, +machine_mode vec_mode) { rtx sgnj_ops[] = {op_dest, op_src_0, op_src_1}; insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, vec_mode); @@ -3594,30 +3585,58 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1, emit_vlmax_insn (icode, BINARY_OP, sgnj_ops); } +static void +emit_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode) +{ + rtx abs_ops[] = {op_dest, op_src}; + insn_code icode = code_for_pred (ABS, vec_mode); + + emit_vlmax_insn (icode, UNARY_OP, abs_ops); +} + +static void +emit_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask, + insn_type type, machine_mode vec_mode) +{ + rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src}; + insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode); + + emit_vlmax_insn (icode, type, cvt_x_ops); +} + +static void +emit_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask, + insn_type type, machine_mode vec_mode) +{ + rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src}; + insn_code icode = code_for_pred (FLOAT, vec_mode); + + emit_vlmax_insn (icode, type, cvt_fp_ops); +} + void expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode, machine_mode vec_int_mode) { - /* Step-1: Generate the mask on const fp. */ + /* Step-1: Get the abs float value for mask generation. */ + emit_vec_abs (op_0, op_1, vec_fp_mode); + + /* Step-2: Generate
Re: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.
I would prefer to change expand_vec_copysign into emit_vec_copysign. Likewise emit_fabs, etc.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-22 19:19
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.

From: Pan Li

The ceil code below is already vectorized.

void test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:

  vfmv.v.f    v4,fa0     // can be removed
  vfabs.v     v0,v1
  vmv1r.v     v2,v1      // can be removed
  vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv   v2,v2,v1

After this patch:

  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2

The generated code improves in the following ways.

* Remove vfmv.v.f.
* Use vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(expand_vec_abs): New function impl.
(expand_vec_cvt_x_f): Ditto.
(expand_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Refine.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.
Signed-off-by: Pan Li --- gcc/config/riscv/riscv-v.cc | 71 --- .../riscv/rvv/autovec/unop/math-ceil-0.c | 5 +- .../riscv/rvv/autovec/unop/math-ceil-1.c | 5 +- .../riscv/rvv/autovec/unop/math-ceil-2.c | 5 +- .../riscv/rvv/autovec/unop/math-ceil-3.c | 5 +- 5 files changed, 49 insertions(+), 42 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 4d0e1d8d1a9..ea2b01f6a6e 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3560,26 +3560,17 @@ static rtx expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar, machine_mode vec_fp_mode) { - /* Step-1: Get the abs float value for mask generation. */ - rtx tmp = gen_reg_rtx (vec_fp_mode); - rtx abs_ops[] = {tmp, fp_vector}; - insn_code icode = code_for_pred (ABS, vec_fp_mode); - emit_vlmax_insn (icode, UNARY_OP, abs_ops); - - /* Step-2: Prepare the scalar float compare register. */ + /* Step-1: Prepare the scalar float compare register. */ rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode)); emit_insn (gen_move_insn (fp_reg, fp_scalar)); - /* Step-3: Prepare the vector float compare register. */ - rtx vec_dup = gen_reg_rtx (vec_fp_mode); - icode = code_for_pred_broadcast (vec_fp_mode); - rtx vfmv_ops[] = {vec_dup, fp_reg}; - emit_vlmax_insn (icode, UNARY_OP, vfmv_ops); - - /* Step-4: Generate the mask. */ + /* Step-2: Generate the mask. 
*/ machine_mode mask_mode = get_mask_mode (vec_fp_mode); rtx mask = gen_reg_rtx (mask_mode); - expand_vec_cmp (mask, code, tmp, vec_dup); + rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg); + rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg}; + insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode); + emit_vlmax_insn (icode, COMPARE_OP, cmp_ops); return mask; } @@ -3594,29 +3585,57 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1, emit_vlmax_insn (icode, BINARY_OP, sgnj_ops); } +static void +expand_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode) +{ + rtx abs_ops[] = {op_dest, op_src}; + insn_code icode = code_for_pred (ABS, vec_mode); + + emit_vlmax_insn (icode, UNARY_OP, abs_ops); +} + +static void +expand_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask, + insn_type type, machine_mode vec_mode) +{ + rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src}; + insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode); + + emit_vlmax_insn (icode, type, cvt_x_ops); +} + +static void +expand_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask, + insn_type type, machine_mode vec_mode) +{ + rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src}; + insn_code icode = code_for_pred (FLOAT, vec_mode); + + emit_vlmax_insn (icode, type, cvt_fp_ops); +} + void expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode, machine_mode vec_int_mode) { - /* Step-1: Generate the mask on const fp. */ + /* Step-1: Get the abs float value for mask generation. */ + expand_vec_abs (op_0, op_1, vec_fp_mode); + + /* Step-2: Generate the mask on const fp. */ rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode)); - rtx mask = expand_vec_float_cmp_mask (op_1, LT, const_fp, vec_fp_mode); + rtx mask = expand_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode); - /* Step-2: Convert to integer on mask, with rounding up (aka ceil). */ + /* Step-3: Convert to integer on mask, with rounding up (aka ceil). */ rtx tmp = gen_reg_rtx (vec_int_mode); - rtx cvt_x_
Re: [PATCH v1] RISC-V: Move ceil test cases to unop folder
ok juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-22 17:11 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Move ceil test cases to unop folder From: Pan Li gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/math-ceil-0.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: ...here. * gcc.target/riscv/rvv/autovec/math-ceil-1.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: ...here. * gcc.target/riscv/rvv/autovec/math-ceil-2.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: ...here. * gcc.target/riscv/rvv/autovec/math-ceil-3.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: ...here. * gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: ...here. * gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c: ...here. * gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Moved to... * gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c: ...here. * gcc.target/riscv/rvv/autovec/test-math.h: Moved to... * gcc.target/riscv/rvv/autovec/unop/test-math.h: ...here. 
Signed-off-by: Pan Li --- .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c | 0 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c | 0 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c | 0 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c | 0 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c | 0 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c | 0 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c | 0 gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h | 0 8 files changed, 0 insertions(+), 0 deletions(-) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c (100%) rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h (100%) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c similarity index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c similarity index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c similarity 
index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c similarity index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c similarity index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c similarity index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c similarity index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h similarity index 100% rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h -- 2.34.1
Re: [PATCH v1] RISC-V: Remove arch and abi option for run test case.
LGTM juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-22 11:39 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Remove arch and abi option for run test case. From: Pan Li Remove the -march and -mabi. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Remove arch and abi. * gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Ditto. * gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Ditto. Signed-off-by: Pan Li --- gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c | 2 +- gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c | 2 +- gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c index f1946e197cc..67462154018 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c @@ -1,5 +1,5 @@ /* { dg-do run { target { riscv_vector } } } */ -/* { dg-additional-options "-march=rv64gcv_zvfh -std=c2x -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ +/* { dg-additional-options "-std=c2x -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ #include "test-math.h" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c index 202944ddd92..38adff16df9 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c @@ -1,5 +1,5 @@ /* { dg-do run { target { riscv_vector } } } */ -/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ #include "test-math.h" diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c index f0ff9bca0af..6f22842ebdb 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c @@ -1,5 +1,5 @@ /* { dg-do run { target { riscv_vector } } } */ -/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ +/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */ #include "test-math.h" -- 2.34.1
Re: [PATCH V2] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]
LGTM. You can commit it once it passes regression testing.

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-09-22 10:37
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH V2] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

From: xuli

Consider the following case:

typedef int32_t vnx32si __attribute__ ((vector_size (128)));

__attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2, \
                                             TYPE *out) \
{ \
  TYPE v \
    = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
  *(TYPE *) out = v; \
}

T (vnx32si, 32) \
TEST_ALL (PERMUTE)

Before this patch:

  li          a4,31
  vsetvli     a5,zero,e32,m8,ta,ma
  vl8re32.v   v24,0(a0)
  vid.v       v8
  vrsub.vx    v8,v8,a4
  vrgather.vv v16,v24,v8
  vs8r.v      v16,0(a2)
  ret

The index vector register "v8" occupies 8 registers. We should optimize this into vrgatherei16.vv, which uses int16 index elements.

After this patch:

  vsetvli        a5,zero,e16,m4,ta,ma
  li             a4,31
  vid.v          v4
  vl8re32.v      v16,0(a0)
  vrsub.vx       v4,v4,a4
  vsetvli        zero,zero,e32,m8,ta,ma
  vrgatherei16.vv v8,v16,v4
  vs8r.v         v8,0(a2)
  ret

With vrgatherei16.vv, the index vector (now v4) occupies 4 registers instead of 8, which lowers register consumption and register pressure.

PR target/111451

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Optimization of vrgather.vv into vrgatherei16.vv.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust case.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.
--- gcc/config/riscv/riscv-v.cc| 18 ++ .../riscv/rvv/autovec/vls-vlmax/perm-4.c | 3 ++- .../gcc.target/riscv/rvv/autovec/vls/perm-4.c | 3 ++- 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 64a71a128d4..455efa7ea8a 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -790,6 +790,24 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel) icode = code_for_pred_gather_scalar (data_mode); sel = elt; } + else if (CONST_VECTOR_P (sel) + && GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)) > 16 + && riscv_get_v_regno_alignment (data_mode) > 1) +{ + /* If the inner mode of data is not QI or HI and data_lmul > 1, + emitting vrgatherei16.vv instruction will lower register + pressure. + data_mode sel_mode ei16 + RVVM1QIRVVM1QI RVVM2HI not needed + RVVM2QIRVVM2QI RVVM4HI not needed + RVVM2HIRVVM2HI RVVM2HI not needed + RVVM2SIRVVM2SI RVVM1HI need + RVVM4SIRVVM4SI RVVM2HI need + RVVM8DIRVVM8DI RVVM2HI need */ + PUT_MODE (sel, get_vector_mode (HImode, +GET_MODE_NUNITS (data_mode)).require ()); + icode = code_for_pred_gatherei16 (data_mode); +} else icode = code_for_pred_gather (data_mode); rtx ops[] = {target, op, sel}; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c index 9df69a0cc2c..7ab31043547 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c @@ -55,6 +55,7 @@ TEST_ALL (PERMUTE) -/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */ +/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */ +/* { dg-final { scan-assembler-times {vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */ /* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */ /* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */ diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c index 46cad8ea2f4..4d6862cf1c0 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c @@ -3,6 +3,7 @@ #include "../vls-vlmax/perm-4.c" -/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */ +/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */ +/* { dg-final { scan-assembler-times {vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */ /* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */ /* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */ -- 2.17.1
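The register saving claimed in this patch follows from the effective LMUL of the index operand, which scales with the ratio of index element width to data element width. A small sketch of that arithmetic; index_group_x8 and group_registers are hypothetical helpers, not GCC functions, and LMUL is counted in eighths of a register so that fractional groups stay integral:

```c
/* LMUL in eighths of a register: 1 = mf8, 2 = mf4, 4 = mf2, 8 = m1,
   16 = m2, 32 = m4, 64 = m8.  */

/* Hypothetical helper: register group (in eighths) needed by an index
   vector with INDEX_SEW-bit elements and as many elements as a data
   vector with DATA_SEW-bit elements and group DATA_LMUL_X8.  */
unsigned
index_group_x8 (unsigned data_sew, unsigned data_lmul_x8, unsigned index_sew)
{
  return data_lmul_x8 * index_sew / data_sew;
}

/* Whole registers occupied by a group; fractional groups still take
   one register.  */
unsigned
group_registers (unsigned lmul_x8)
{
  return (lmul_x8 + 7) / 8;
}
```

For the e32/m8 data vector in the example, group_registers (index_group_x8 (32, 64, 32)) is 8 for vrgather.vv and group_registers (index_group_x8 (32, 64, 16)) is 4 for vrgatherei16.vv, matching the 8-to-4 reduction described in the patch. The same formula reproduces the RVVM2SI to RVVM1HI and RVVM8DI to RVVM2HI rows of the comment table.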
Re: Re: [PATCH] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]
Sorry. It should be:

else if (CONST_VECTOR_P (sel)
         && GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)).to_constant () > 16
         && riscv_get_v_regno_alignment (data_mode) > 1)

juzhe.zh...@rivai.ai

From: juzhe.zh...@rivai.ai
Date: 2023-09-22 09:39
To: Li Xu; gcc-patches
CC: kito.cheng; palmer; Li Xu
Subject: Re: [PATCH] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

+  unsigned int data_sew = get_sew (data_mode);
+  enum vlmul_type data_lmul = get_vlmul (data_mode);

Remove this.

+  else if (CONST_VECTOR_P (sel) && data_sew != 16
+           && data_sew != 8 && (data_lmul == LMUL_2
+           || data_lmul == LMUL_4 || data_lmul == LMUL_8))

Change it into:

else if (CONST_VECTOR_P (sel)
         && GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)).to_constant () > 16
         && riscv_get_v_regno_alignment (data_mode) > LMUL_1)

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-09-22 09:33
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

From: xuli

Consider the following case:

typedef int32_t vnx32si __attribute__ ((vector_size (128)));

__attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2, \
                                             TYPE *out) \
{ \
  TYPE v \
    = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
  *(TYPE *) out = v; \
}

T (vnx32si, 32) \
TEST_ALL (PERMUTE)

Before this patch:

  li          a4,31
  vsetvli     a5,zero,e32,m8,ta,ma
  vl8re32.v   v24,0(a0)
  vid.v       v8
  vrsub.vx    v8,v8,a4
  vrgather.vv v16,v24,v8
  vs8r.v      v16,0(a2)
  ret

The index vector register "v8" occupies 8 registers. We should optimize this into vrgatherei16.vv, which uses int16 index elements.

After this patch:

  vsetvli        a5,zero,e16,m4,ta,ma
  li             a4,31
  vid.v          v4
  vl8re32.v      v16,0(a0)
  vrsub.vx       v4,v4,a4
  vsetvli        zero,zero,e32,m8,ta,ma
  vrgatherei16.vv v8,v16,v4
  vs8r.v         v8,0(a2)
  ret

With vrgatherei16.vv, the index vector (now v4) occupies 4 registers instead of 8, which lowers register consumption and register pressure.
gcc/ChangeLog: * config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Optimization of vrgather.vv into vrgatherei16.vv. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust case. * gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto. --- gcc/config/riscv/riscv-v.cc | 20 +++ .../riscv/rvv/autovec/vls-vlmax/perm-4.c | 3 ++- .../gcc.target/riscv/rvv/autovec/vls/perm-4.c | 3 ++- 3 files changed, 24 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 64a71a128d4..271e0ff6dfc 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -783,6 +783,8 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel) insn_code icode; machine_mode data_mode = GET_MODE (target); machine_mode sel_mode = GET_MODE (sel); + unsigned int data_sew = get_sew (data_mode); + enum vlmul_type data_lmul = get_vlmul (data_mode); if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode))) icode = code_for_pred_gatherei16 (data_mode); else if (const_vec_duplicate_p (sel, &elt)) @@ -790,6 +792,24 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel) icode = code_for_pred_gather_scalar (data_mode); sel = elt; } + else if (CONST_VECTOR_P (sel) && data_sew != 16 + && data_sew != 8 && (data_lmul == LMUL_2 + || data_lmul == LMUL_4 || data_lmul == LMUL_8)) +{ + /* If the inner mode of data is not QI or HI and data_lmul > 1, + emitting vrgatherei16.vv instruction will lower register + pressure. 
+ data_mode sel_mode ei16 + RVVM1QIRVVM1QI RVVM2HI not needed + RVVM2QIRVVM2QI RVVM4HI not needed + RVVM2HIRVVM2HI RVVM2HI not needed + RVVM2SIRVVM2SI RVVM1HI need + RVVM4SIRVVM4SI RVVM2HI need + RVVM8DIRVVM8DI RVVM2HI need */ + PUT_MODE (sel, get_vector_mode (HImode, +GET_MODE_NUNITS (data_mode)).require ()); + icode = code_for_pred_gatherei16 (data_mode); +} else icode = code_for_pred_gather (data_mode); rtx ops[] = {target, op, sel}; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c index 9df69a0cc2c..7ab31043547 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c @@ -55,6 +55,7 @@ TEST_ALL (PERMUTE)
Re: [PATCH] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]
+ unsigned int data_sew = get_sew (data_mode);
+ enum vlmul_type data_lmul = get_vlmul (data_mode);

Remove this.

+ else if (CONST_VECTOR_P (sel) && data_sew != 16
+ && data_sew != 8 && (data_lmul == LMUL_2
+ || data_lmul == LMUL_4 || data_lmul == LMUL_8))

Change it into:

else if (CONST_VECTOR_P (sel)
         && GET_MODE_BITSIZE (GET_MODE_INNER (sel_mode)).to_constant () > 16
         && riscv_get_v_regno_alignment (data_mode) > LMUL_1)

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-09-22 09:33
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

From: xuli

Consider the following case:

typedef int32_t vnx32si __attribute__ ((vector_size (128)));

__attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2, \
                                             TYPE *out) \
{ \
  TYPE v \
    = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
  *(TYPE *) out = v; \
}

T (vnx32si, 32) \

TEST_ALL (PERMUTE)

Before this patch:

li a4,31
vsetvli a5,zero,e32,m8,ta,ma
vl8re32.v v24,0(a0)
vid.v v8
vrsub.vx v8,v8,a4
vrgather.vv v16,v24,v8
vs8r.v v16,0(a2)
ret

The index vector "v8" occupies 8 registers. We should optimize this into vrgatherei16.vv, which uses int16 index elements.

After this patch:

vsetvli a5,zero,e16,m4,ta,ma
li a4,31
vid.v v4
vl8re32.v v16,0(a0)
vrsub.vx v4,v4,a4
vsetvli zero,zero,e32,m8,ta,ma
vrgatherei16.vv v8,v16,v4
vs8r.v v8,0(a2)
ret

With vrgatherei16.vv, the index vector occupies 4 registers instead of 8, which lowers register consumption and register pressure.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Optimization of vrgather.vv into vrgatherei16.vv.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust case.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.
--- gcc/config/riscv/riscv-v.cc | 20 +++ .../riscv/rvv/autovec/vls-vlmax/perm-4.c | 3 ++- .../gcc.target/riscv/rvv/autovec/vls/perm-4.c | 3 ++- 3 files changed, 24 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 64a71a128d4..271e0ff6dfc 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -783,6 +783,8 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel) insn_code icode; machine_mode data_mode = GET_MODE (target); machine_mode sel_mode = GET_MODE (sel); + unsigned int data_sew = get_sew (data_mode); + enum vlmul_type data_lmul = get_vlmul (data_mode); if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode))) icode = code_for_pred_gatherei16 (data_mode); else if (const_vec_duplicate_p (sel, &elt)) @@ -790,6 +792,24 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel) icode = code_for_pred_gather_scalar (data_mode); sel = elt; } + else if (CONST_VECTOR_P (sel) && data_sew != 16 + && data_sew != 8 && (data_lmul == LMUL_2 + || data_lmul == LMUL_4 || data_lmul == LMUL_8)) +{ + /* If the inner mode of data is not QI or HI and data_lmul > 1, + emitting vrgatherei16.vv instruction will lower register + pressure. 
+ data_mode sel_mode ei16 + RVVM1QIRVVM1QI RVVM2HI not needed + RVVM2QIRVVM2QI RVVM4HI not needed + RVVM2HIRVVM2HI RVVM2HI not needed + RVVM2SIRVVM2SI RVVM1HI need + RVVM4SIRVVM4SI RVVM2HI need + RVVM8DIRVVM8DI RVVM2HI need */ + PUT_MODE (sel, get_vector_mode (HImode, +GET_MODE_NUNITS (data_mode)).require ()); + icode = code_for_pred_gatherei16 (data_mode); +} else icode = code_for_pred_gather (data_mode); rtx ops[] = {target, op, sel}; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c index 9df69a0cc2c..7ab31043547 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c @@ -55,6 +55,7 @@ TEST_ALL (PERMUTE) -/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 31 } } */ +/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */ +/* { dg-final { scan-assembler-times {vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */ /* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */ /* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } *
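The register saving claimed in the patch above can be checked with a little arithmetic. What follows is only a minimal sketch in C, not the code in riscv-v.cc: `index_regs` and `gatherei16_saves_regs` are hypothetical helper names, and the model assumes whole-register LMUL values (1, 2, 4, 8) for the data vector.

```c
#include <stdbool.h>

/* Number of vector registers needed by an index vector that has as many
   elements as a data vector with DATA_SEW-bit elements in a group of
   DATA_LMUL registers, when the indices are INDEX_SEW bits wide.
   Hypothetical helper for illustration; a fractional result is clamped
   to one register.  */
static unsigned
index_regs (unsigned data_sew, unsigned data_lmul, unsigned index_sew)
{
  unsigned regs = data_lmul * index_sew / data_sew;
  return regs == 0 ? 1 : regs;
}

/* True when 16-bit indices (vrgatherei16.vv) occupy fewer registers than
   indices as wide as the data elements (vrgather.vv).  */
static bool
gatherei16_saves_regs (unsigned data_sew, unsigned data_lmul)
{
  return index_regs (data_sew, data_lmul, 16)
         < index_regs (data_sew, data_lmul, data_sew);
}
```

For the e32/m8 case in the patch description this gives 4 index registers instead of 8, and it agrees with the "need" / "not needed" rows of the table in the patch comment (for example, 8-bit and 16-bit data never benefit).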
Re: [PATCH v1] RISC-V: Leverage __builtin_xx instead of math.h for test
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-22 09:12
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Leverage __builtin_xx instead of math.h for test

From: Pan Li

math.h may cause problems in some environments, so use __builtin_xx instead for testing.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c: Remove reference to math.h.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnjx-2.c: Ditto.

Signed-off-by: Pan Li
---
 .../rvv/autovec/vls/floating-point-max-5.c | 43 +--
 .../rvv/autovec/vls/floating-point-min-5.c | 43 +--
 .../rvv/autovec/vls/floating-point-sgnjx-2.c | 43 +--
 3 files changed, 63 insertions(+), 66 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c
index 775ddb1d25e..dd163682396 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c
@@ -2,30 +2,29 @@
 /* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffast-math" } */

 #include "def.h"
-#include "math.h"

-DEF_CALL_VV (max, 1, float, fmaxf)
-DEF_CALL_VV (max, 2, float, fmaxf)
-DEF_CALL_VV (max, 4, float, fmaxf)
-DEF_CALL_VV (max, 8, float, fmaxf)
-DEF_CALL_VV (max, 16, float, fmaxf)
-DEF_CALL_VV (max, 32, float, fmaxf)
-DEF_CALL_VV (max, 64, float, fmaxf)
-DEF_CALL_VV (max, 128, float, fmaxf)
-DEF_CALL_VV (max, 256, float, fmaxf)
-DEF_CALL_VV (max, 512, float, fmaxf)
-DEF_CALL_VV (max, 1024, float, fmaxf)
+DEF_CALL_VV (max, 1, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 2, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 4, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 8, float, __builtin_fmaxf)
+DEF_CALL_VV (max, 16, float, __builtin_fmaxf)
+DEF_CALL_VV (max,
32, float, __builtin_fmaxf) +DEF_CALL_VV (max, 64, float, __builtin_fmaxf) +DEF_CALL_VV (max, 128, float, __builtin_fmaxf) +DEF_CALL_VV (max, 256, float, __builtin_fmaxf) +DEF_CALL_VV (max, 512, float, __builtin_fmaxf) +DEF_CALL_VV (max, 1024, float, __builtin_fmaxf) -DEF_CALL_VV (max, 1, double, fmax) -DEF_CALL_VV (max, 2, double, fmax) -DEF_CALL_VV (max, 4, double, fmax) -DEF_CALL_VV (max, 8, double, fmax) -DEF_CALL_VV (max, 16, double, fmax) -DEF_CALL_VV (max, 32, double, fmax) -DEF_CALL_VV (max, 64, double, fmax) -DEF_CALL_VV (max, 128, double, fmax) -DEF_CALL_VV (max, 256, double, fmax) -DEF_CALL_VV (max, 512, double, fmax) +DEF_CALL_VV (max, 1, double, __builtin_fmax) +DEF_CALL_VV (max, 2, double, __builtin_fmax) +DEF_CALL_VV (max, 4, double, __builtin_fmax) +DEF_CALL_VV (max, 8, double, __builtin_fmax) +DEF_CALL_VV (max, 16, double, __builtin_fmax) +DEF_CALL_VV (max, 32, double, __builtin_fmax) +DEF_CALL_VV (max, 64, double, __builtin_fmax) +DEF_CALL_VV (max, 128, double, __builtin_fmax) +DEF_CALL_VV (max, 256, double, __builtin_fmax) +DEF_CALL_VV (max, 512, double, __builtin_fmax) /* { dg-final { scan-assembler-times {vfmax\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */ /* { dg-final { scan-assembler-not {csrr} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c index 1e9ff7d5054..0e3cbf2acec 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c @@ -2,30 +2,29 @@ /* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffast-math" } */ #include "def.h" -#include "math.h" -DEF_CALL_VV (min, 1, float, fminf) -DEF_CALL_VV (min, 2, float, fminf) -DEF_CALL_VV (min, 4, float, fminf) -DEF_CALL_VV (min, 8, float, fminf) -DEF_CALL_VV (min, 16, float, fminf) -DEF_CALL_VV (min, 32, 
float, fminf) -DEF_CALL_VV (min, 64, float, fminf) -DEF_CALL_VV (min, 128, float, fminf) -DEF_CALL_VV (min, 256, float, fminf) -DEF_CALL_VV (min, 512, float, fminf) -DEF_CALL_VV (min, 1024, float, fminf) +DEF_CALL_VV (min, 1, float, __builtin_fminf) +DEF_CALL_VV (min, 2, float, __builtin_fminf) +DEF_CALL_VV (min, 4, float, __builtin_fminf) +DEF_CALL_VV (min, 8, float, __builtin_fminf) +DEF_CALL_VV (min, 16, float, __builtin_fminf) +DEF_CALL_VV (min, 32, float, __builtin_fminf) +DEF_CALL_VV (min, 64, float, __builtin_fminf) +DEF_CALL_VV (min, 128, float, __builtin_fminf) +DEF_CALL_VV (min, 256, float, __builtin_fminf) +DEF_CALL_VV (min, 512, float, __builtin_fminf) +DEF_CALL_VV (min, 1024, float, __builtin_fminf) -DEF_CALL_VV (min, 1, double, fmin) -DEF_CALL_VV (min, 2, double, fmin) -DEF_CALL_VV (min, 4, double, fmin) -DEF_CALL_VV (min, 8, dou
Re: [PATCH v4] RISC-V: Support ceil and ceilf auto-vectorization
LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-22 08:12
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v4] RISC-V: Support ceil and ceilf auto-vectorization

From: Pan Li

Update in v4:

* Add test for _Float16.
* Remove unnecessary macro in def.h for test.

Original log:

This patch adds auto-vectorization support for both ceil and ceilf from math.h. It depends on the -ffast-math option.

When we call ceil/ceilf, as in v2 = ceil (v1), we convert it into the insns below (following the llvm implementation).

* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3

However, a floating point value does not need the cvt above if it has no fractional part. For example, consider the single precision floating point values below.

+-----------+---------------+
| float     | binary layout |
+-----------+---------------+
| 8388607.5 | 0x4affffff    |
| 8388608.0 | 0x4b000000    |
| 8388609.0 | 0x4b000001    |
+-----------+---------------+

All single precision floating point values of 8388608.0 (2^23) or greater have no fractional bits in the mantissa. We leverage vmflt and a mask to filter them out of the vector and only do the cvt under the mask.

Before this patch:

math-ceil-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call ceilf
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

...
  fsrmi 3
.L4:
  vfabs.v v0,v1
  vmv1r.v v2,v1
  vmflt.vv v0,v0,v4
  sub a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv v2,v2,v1
  bne .L4
.L14:
  fsrm a6
  ret

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (ceil2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector.md: Add VLS mode support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test. * gcc.target/riscv/rvv/autovec/test-math.h: New test. * gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 16 +++ gcc/config/riscv/riscv-protos.h | 5 + gcc/config/riscv/riscv-v.cc | 133 ++ gcc/config/riscv/vector.md| 2 +- .../riscv/rvv/autovec/math-ceil-0.c | 26 .../riscv/rvv/autovec/math-ceil-1.c | 26 .../riscv/rvv/autovec/math-ceil-2.c | 26 .../riscv/rvv/autovec/math-ceil-3.c | 28 .../riscv/rvv/autovec/math-ceil-run-0.c | 39 + .../riscv/rvv/autovec/math-ceil-run-1.c | 39 + .../riscv/rvv/autovec/math-ceil-run-2.c | 39 + .../gcc.target/riscv/rvv/autovec/test-math.h | 38 + .../riscv/rvv/autovec/vls/math-ceil-1.c | 56 13 files changed, 472 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index f0f1abc4e82..1b4bd82f9ec 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2239,3 +2239,19 @@ (define_expand "avg3_ceil" riscv_vector::emit_vlmax_insn (icode, 
riscv_vector::BINARY_OP, ops3); DONE; }) + +;; - +;; [FP] Math.h. +;; - +;; Includes: +;; - ceil/ceilf +;; - +(define_expand "ceil2" + [(match_operand:V_VLSF 0 "register_operand") + (match_operand:V_VLSF 1 "register_operand")] + "TARGET_VECTOR && !flag_tr
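The sequence described in the patch log above (mask lanes with |x| < 2^23, convert to integer rounding up, convert back, restore the sign) can be modelled per element in scalar C. This is only a sketch under stated assumptions, not the expander in riscv-v.cc: `ceil_model` is a hypothetical name, and the round-up conversion is emulated with a truncating cast plus an adjustment because plain C has no RUP conversion.

```c
/* Scalar model of one vector lane of the ceil expansion.
   Hypothetical helper for illustration only.  */
static float
ceil_model (float x)
{
  const float limit = 8388608.0f;   /* 2^23: no fractional bits from here on */
  float ax = x < 0.0f ? -x : x;     /* models vfabs.v */

  /* Models the vmflt mask: lanes that are not below the limit (large
     values, infinities, NaN) are left untouched.  */
  if (!(ax < limit))
    return x;

  int i = (int) x;                  /* truncates toward zero */
  float r = (float) i;
  if (r == x)
    return x;                       /* already integral; keeps -0.0 as is */
  if (r < x)
    r += 1.0f;                      /* emulate rounding up (RUP) */

  /* Models vfsgnj.vv: a negative input that rounds to zero gives -0.0.  */
  if (x < 0.0f && r == 0.0f)
    r = -0.0f;
  return r;
}
```

With this model, 8388607.5 rounds up to 8388608.0, while 8388609.0 is returned unchanged because it is filtered out by the limit check, which is the point of the table in the patch log.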
Re: [Committed] RISC-V: Remove math.h import to resolve missing stubs failures
Hi, Patrick.

The GNU RVV intrinsic API test generator has been merged:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/commits/main

Could you include the full RVV intrinsic API test in your test CI? Currently, we don't include all of the API tests in the GCC testsuite since they are too big.

juzhe.zh...@rivai.ai

From: Patrick O'Neill
Date: 2023-09-21 01:51
To: Kito Cheng
CC: GCC Patches; Robin Dapp; 钟居哲
Subject: [Committed] RISC-V: Remove math.h import to resolve missing stubs failures

Committed. Thanks!

On 9/20/23 10:19, Kito Cheng wrote:

LGTM

On Wed, Sep 20, 2023 at 18:07, Patrick O'Neill wrote:

Resolves some of the missing stubs failures:

fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.

2023-09-20 Juzhe Zhong

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Remove unneeded math.h import.

Tested-by: Patrick O'Neill
---
Tested using 590a8bec3ed92118e084b0a1897d3314a666170e
glibc rv64gcv
glibc rv32gcv

glibc rv64gcv
Resolved failures:
FAIL: gcc.target/riscv/rvv/autovec/vls/mov-2.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/mov-4.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/mov-6.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)

glibc rv32gcv
Resolved failures:
FAIL: gcc.target/riscv/rvv/autovec/vls/and-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/and-2.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/and-3.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-2.c
-O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-3.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-4.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-5.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/cmp-6.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/const-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/const-2.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/const-3.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/const-4.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/const-5.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/div-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/dup-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/dup-2.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/dup-3.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/dup-4.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: 
gcc.target/riscv/rvv/autovec/vls/dup-5.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/dup-6.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/dup-7.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/extract-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/extract-2.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c -O3 -ftree-vector
Re: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization
Also, remove the math.h include. Instead, please use __builtin_ceil.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-21 18:32
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization

From: Pan Li

This patch adds auto-vectorization support for both ceil and ceilf from math.h. It depends on the -ffast-math option.

When we call ceil/ceilf, as in v2 = ceil (v1), we convert it into the insns below (following the llvm implementation).

* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3

However, a floating point value does not need the cvt above if it has no fractional part. For example, consider the single precision floating point values below.

+-----------+---------------+
| float     | binary layout |
+-----------+---------------+
| 8388607.5 | 0x4affffff    |
| 8388608.0 | 0x4b000000    |
| 8388609.0 | 0x4b000001    |
+-----------+---------------+

All single precision floating point values of 8388608.0 (2^23) or greater have no fractional bits in the mantissa. We leverage vmflt and a mask to filter them out of the vector and only do the cvt under the mask.

Before this patch:

math-ceil-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call ceilf
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

...
  fsrmi 3
.L4:
  vfabs.v v0,v1
  vmv1r.v v2,v1
  vmflt.vv v0,v0,v4
  sub a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv v2,v2,v1
  bne .L4
.L14:
  fsrm a6
  ret

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (ceil2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector-iterators.md: Add VLS mode to VCONVERT.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-3.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-4.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-double.h: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-single.h: New test. * gcc.target/riscv/rvv/autovec/test-math.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 16 +++ gcc/config/riscv/riscv-protos.h | 5 + gcc/config/riscv/riscv-v.cc | 116 ++ gcc/config/riscv/vector-iterators.md | 12 ++ .../riscv/rvv/autovec/math-ceil-1.c | 26 .../riscv/rvv/autovec/math-ceil-2.c | 26 .../riscv/rvv/autovec/math-ceil-3.c | 28 + .../riscv/rvv/autovec/math-ceil-4.c | 28 + .../riscv/rvv/autovec/math-ceil-run-1.c | 4 + .../riscv/rvv/autovec/math-ceil-run-2.c | 4 + .../riscv/rvv/autovec/math-ceil-run-3.c | 4 + .../riscv/rvv/autovec/math-ceil-run-4.c | 4 + .../riscv/rvv/autovec/math-ceil-run-double.h | 36 ++ .../riscv/rvv/autovec/math-ceil-run-single.h | 36 ++ .../gcc.target/riscv/rvv/autovec/test-math.h | 40 ++ 15 files changed, 385 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-double.h create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-single.h create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 493d5745485..36ed839aa5b 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2374,3 +2374,19 @@ (define_expand "avg3_ceil" riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3); DONE; }) + +;; - +;; -
Re: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization
+(define_expand "ceil2"
+ [(match_operand:V_VLSF 0 "register_operand")
+ (match_operand:V_VLSF 1 "register_operand")]
+ "TARGET_VECTOR"
+ {
+ riscv_vector::expand_vec_ceil (operands[0], operands[1], mode, mode);
+ DONE;
+ }

I think you should add !flag_trapping_math && !flag_rounding_math to the condition.
If you try -ftrapping-math or -frounding-math, LLVM fails to vectorize.

Like X86:

(define_expand "round2"
  [(match_operand:X87MODEF 0 "register_operand")
   (match_operand:X87MODEF 1 "nonimmediate_operand")]
  "(TARGET_USE_FANCY_MATH_387
    && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
        || TARGET_MIX_SSE_I387)
    && flag_unsafe_math_optimizations
    && (flag_fp_int_builtin_inexact || !flag_trapping_math))
   || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
       && !flag_trapping_math && !flag_rounding_math)"

Otherwise LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-21 18:32
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Support ceil and ceilf auto-vectorization

From: Pan Li

This patch adds auto-vectorization support for both ceil and ceilf from math.h. It depends on the -ffast-math option.

When we call ceil/ceilf, as in v2 = ceil (v1), we convert it into the insns below (following the llvm implementation).

* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3

However, a floating point value does not need the cvt above if it has no fractional part. For example, consider the single precision floating point values below.

+-----------+---------------+
| float     | binary layout |
+-----------+---------------+
| 8388607.5 | 0x4affffff    |
| 8388608.0 | 0x4b000000    |
| 8388609.0 | 0x4b000001    |
+-----------+---------------+

All single precision floating point values of 8388608.0 (2^23) or greater have no fractional bits in the mantissa. We leverage vmflt and a mask to filter them out of the vector and only do the cvt under the mask.

Before this patch:

math-ceil-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call ceilf
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

...
  fsrmi 3
.L4:
  vfabs.v v0,v1
  vmv1r.v v2,v1
  vmflt.vv v0,v0,v4
  sub a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv v2,v2,v1
  bne .L4
.L14:
  fsrm a6
  ret

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (ceil2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector-iterators.md: Add VLS mode to VCONVERT.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-double.h: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-single.h: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 16 +++ gcc/config/riscv/riscv-protos.h | 5 + gcc/config/riscv/riscv-v.cc | 116 ++ gcc/config/riscv/vector-iterators.md | 12 ++ .../riscv/rvv/autovec/math-ceil-1.c | 26 .../riscv/rvv/autovec/math-ceil-2.c | 26 .../riscv/rvv/autovec/math-ceil-3.c | 28 + .../riscv/rvv/autovec/math-ceil-4.c | 28 + .../riscv/rvv/autovec/math-ceil-run-1.c | 4 + .../riscv/rvv/autovec/math-ceil-run-2.c | 4 + .../riscv/rvv/autovec/math-ceil-run-3.c | 4 + .../riscv/rvv/autovec/math-ceil-run-4.c | 4 + .../riscv/rvv/autovec/math-ceil-run-double.h | 36 ++ .../riscv/rvv/autovec/math-ceil-run-single.h | 36 ++ .../gcc.target/riscv/rvv/autovec/test-math.h | 40 ++ 15 files changed, 385 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c create mode 10064
Re: [PATCH] RISC-V: Rename predicate vector_gs_scale_operand_16/32 to more generic names
LGTM

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-09-21 11:44
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Rename predicate vector_gs_scale_operand_16/32 to more generic names

This small patch renames vector_gs_scale_operand_16/32 to the more generic names const_1_or_2/4_operand, so that they are easier to understand when used elsewhere.

gcc/ChangeLog:

* config/riscv/predicates.md (const_1_or_2_operand): Rename.
(const_1_or_4_operand): Ditto.
(vector_gs_scale_operand_16): Ditto.
(vector_gs_scale_operand_32): Ditto.
* config/riscv/vector-iterators.md: Adjust.

---
 gcc/config/riscv/predicates.md | 16
 gcc/config/riscv/vector-iterators.md | 16
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 4bc7ff2c9d8..a4f03242f2c 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -70,6 +70,14 @@
   (and (match_code "const_int,const_wide_int,const_vector")
        (match_test "op == CONST1_RTX (GET_MODE (op))")))

+(define_predicate "const_1_or_2_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
+
+(define_predicate "const_1_or_4_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
+
 (define_predicate "reg_or_0_operand"
   (ior (match_operand 0 "const_0_operand")
        (match_operand 0 "register_operand")))
@@ -463,14 +471,6 @@
   (ior (match_operand 0 "register_operand")
        (match_code "const_vector")))

-(define_predicate "vector_gs_scale_operand_16"
-  (and (match_code "const_int")
-       (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
-
-(define_predicate "vector_gs_scale_operand_32"
-  (and (match_code "const_int")
-       (match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
-
 (define_predicate "vector_gs_scale_operand_64"
   (and (match_code "const_int")
        (match_test "INTVAL (op) == 1 || (INTVAL (op) == 8 && Pmode == DImode)")))
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 053d84c0c7d..a32d7e8d4e9 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -2723,18 +2723,18 @@
   (RVVMF4QI "const_1_operand") (RVVMF8QI "const_1_operand")

   (RVVM8HI "const_1_operand") (RVVM4HI "vector_gs_scale_operand_16_rv32")
-  (RVVM2HI "vector_gs_scale_operand_16") (RVVM1HI "vector_gs_scale_operand_16")
-  (RVVMF2HI "vector_gs_scale_operand_16") (RVVMF4HI "vector_gs_scale_operand_16")
+  (RVVM2HI "const_1_or_2_operand") (RVVM1HI "const_1_or_2_operand")
+  (RVVMF2HI "const_1_or_2_operand") (RVVMF4HI "const_1_or_2_operand")

   (RVVM8HF "const_1_operand") (RVVM4HF "vector_gs_scale_operand_16_rv32")
-  (RVVM2HF "vector_gs_scale_operand_16") (RVVM1HF "vector_gs_scale_operand_16")
-  (RVVMF2HF "vector_gs_scale_operand_16") (RVVMF4HF "vector_gs_scale_operand_16")
+  (RVVM2HF "const_1_or_2_operand") (RVVM1HF "const_1_or_2_operand")
+  (RVVMF2HF "const_1_or_2_operand") (RVVMF4HF "const_1_or_2_operand")

-  (RVVM8SI "vector_gs_scale_operand_32_rv32") (RVVM4SI "vector_gs_scale_operand_32") (RVVM2SI "vector_gs_scale_operand_32")
-  (RVVM1SI "vector_gs_scale_operand_32") (RVVMF2SI "vector_gs_scale_operand_32")
+  (RVVM8SI "vector_gs_scale_operand_32_rv32") (RVVM4SI "const_1_or_4_operand") (RVVM2SI "const_1_or_4_operand")
+  (RVVM1SI "const_1_or_4_operand") (RVVMF2SI "const_1_or_4_operand")

-  (RVVM8SF "vector_gs_scale_operand_32_rv32") (RVVM4SF "vector_gs_scale_operand_32") (RVVM2SF "vector_gs_scale_operand_32")
-  (RVVM1SF "vector_gs_scale_operand_32") (RVVMF2SF "vector_gs_scale_operand_32")
+  (RVVM8SF "vector_gs_scale_operand_32_rv32") (RVVM4SF "const_1_or_4_operand") (RVVM2SF "const_1_or_4_operand")
+  (RVVM1SF "const_1_or_4_operand") (RVVMF2SF "const_1_or_4_operand")

   (RVVM8DI "vector_gs_scale_operand_64") (RVVM4DI "vector_gs_scale_operand_64")
   (RVVM2DI "vector_gs_scale_operand_64") (RVVM1DI "vector_gs_scale_operand_64")
--
2.36.3
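The two renamed predicates only test an integer constant against a pair of allowed values, which is why the generic names fit. As a rough illustration, the match_test conditions from the diff can be written as plain C functions; `const_1_or_2_p` and `const_1_or_4_p` are hypothetical names for this sketch, and the real predicates additionally require the operand to be a const_int RTX.

```c
#include <stdbool.h>

/* Mirrors the match_test of const_1_or_2_operand:
   INTVAL (op) == 1 || INTVAL (op) == 2.  */
static bool
const_1_or_2_p (long value)
{
  return value == 1 || value == 2;
}

/* Mirrors the match_test of const_1_or_4_operand:
   INTVAL (op) == 1 || INTVAL (op) == 4.  */
static bool
const_1_or_4_p (long value)
{
  return value == 1 || value == 4;
}
```

Nothing in these checks mentions gather/scatter scaling, so other patterns that need "1 or 2" or "1 or 4" can reuse them without a misleading name.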
Re: [PATCH] RISC-V: Optimized for strided load/store with stride == element width[PR111450]
Thanks a lot. LGTM. juzhe.zh...@rivai.ai From: Li Xu Date: 2023-09-21 11:12 To: gcc-patches CC: kito.cheng; palmer; juzhe.zhong; xuli Subject: [PATCH] RISC-V: Optimized for strided load/store with stride == element width[PR111450] From: xuli When stride == element width, vlsse should be optimized into vle.v. vsse should be optimized into vse.v. PR target/111450 gcc/ChangeLog: *config/riscv/constraints.md (c01): const_int 1. (c02): const_int 2. (c04): const_int 4. (c08): const_int 8. * config/riscv/predicates.md (vector_eew8_stride_operand): New predicate for stride operand. (vector_eew16_stride_operand): Ditto. (vector_eew32_stride_operand): Ditto. (vector_eew64_stride_operand): Ditto. * config/riscv/vector-iterators.md: New iterator for stride operand. * config/riscv/vector.md: Add stride = element width constraint. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr111450.c: New test. --- gcc/config/riscv/constraints.md | 20 gcc/config/riscv/predicates.md| 18 gcc/config/riscv/vector-iterators.md | 87 +++ gcc/config/riscv/vector.md| 42 +--- .../gcc.target/riscv/rvv/base/pr111450.c | 100 ++ 5 files changed, 250 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111450.c diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md index 3f52bc76f67..964fdd450c9 100644 --- a/gcc/config/riscv/constraints.md +++ b/gcc/config/riscv/constraints.md @@ -45,6 +45,26 @@ (and (match_code "const_int") (match_test "ival == 0"))) +(define_constraint "c01" + "Constant value 1." 
+ (and (match_code "const_int") + (match_test "ival == 1"))) + +(define_constraint "c02" + "Constant value 2" + (and (match_code "const_int") + (match_test "ival == 2"))) + +(define_constraint "c04" + "Constant value 4" + (and (match_code "const_int") + (match_test "ival == 4"))) + +(define_constraint "c08" + "Constant value 8" + (and (match_code "const_int") + (match_test "ival == 8"))) + (define_constraint "K" "A 5-bit unsigned immediate for CSR access instructions." (and (match_code "const_int") diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 4bc7ff2c9d8..7845998e430 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -514,6 +514,24 @@ (ior (match_operand 0 "const_0_operand") (match_operand 0 "pmode_register_operand"))) +;; [1, 2, 4, 8] means strided load/store with stride == element width +(define_special_predicate "vector_eew8_stride_operand" + (ior (match_operand 0 "pmode_register_operand") + (and (match_code "const_int") +(match_test "INTVAL (op) == 1 || INTVAL (op) == 0" +(define_special_predicate "vector_eew16_stride_operand" + (ior (match_operand 0 "pmode_register_operand") + (and (match_code "const_int") +(match_test "INTVAL (op) == 2 || INTVAL (op) == 0" +(define_special_predicate "vector_eew32_stride_operand" + (ior (match_operand 0 "pmode_register_operand") + (and (match_code "const_int") +(match_test "INTVAL (op) == 4 || INTVAL (op) == 0" +(define_special_predicate "vector_eew64_stride_operand" + (ior (match_operand 0 "pmode_register_operand") + (and (match_code "const_int") +(match_test "INTVAL (op) == 8 || INTVAL (op) == 0" + ;; A special predicate that doesn't match a particular mode. 
(define_special_predicate "vector_any_register_operand" (match_code "reg")) diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 73df55a69c8..f85d1cc80d1 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -2596,6 +2596,93 @@ (V512DI "V512BI") ]) +(define_mode_attr stride_predicate [ + (RVVM8QI "vector_eew8_stride_operand") (RVVM4QI "vector_eew8_stride_operand") + (RVVM2QI "vector_eew8_stride_operand") (RVVM1QI "vector_eew8_stride_operand") + (RVVMF2QI "vector_eew8_stride_operand") (RVVMF4QI "vector_eew8_stride_operand") + (RVVMF8QI "vector_eew8_stride_operand") + + (RVVM8HI "vector_eew16_stride_operand") (RVVM4HI "vector_eew16_stride_operand") + (RVVM2HI "vector_eew16_stride_operand") (RVVM1HI "vector_eew16_stride_operand") + (RVVMF2HI "vector_eew16_stride_operand") (RVVMF4HI "vec
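Editorial note: the optimization in this patch rests on the observation that a strided access whose byte stride equals the element width touches exactly the same memory as a unit-stride access. The scalar C sketch below models that equivalence for 32-bit elements. The helper names are hypothetical and only stand in for vlsse32.v and vle32.v; this is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a strided load (stand-in for vlsse32.v):
   element i is read from base + i * stride_bytes.  */
static void strided_load32(int32_t *dst, const unsigned char *base,
                           ptrdiff_t stride_bytes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    memcpy(&dst[i], base + (ptrdiff_t)i * stride_bytes, sizeof(int32_t));
}

/* Scalar model of a unit-stride load (stand-in for vle32.v).  */
static void unit_stride_load32(int32_t *dst, const int32_t *src, size_t n)
{
  memcpy(dst, src, n * sizeof(int32_t));
}

/* Returns 1 when both models read the same elements for a stride
   of sizeof(int32_t) bytes, i.e. stride == element width.  */
static int stride_equals_width_matches(void)
{
  const int32_t src[4] = {10, 20, 30, 40};
  int32_t a[4], b[4];

  strided_load32(a, (const unsigned char *)src, sizeof(int32_t), 4);
  unit_stride_load32(b, src, 4);
  return memcmp(a, b, sizeof a) == 0;
}
```

With any other stride the two models diverge, which is why the new predicates and constraints only accept the constant matching the element width (1, 2, 4 or 8 bytes).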
Re: Re: [Committed] RISC-V: Fix Demand comparison bug[VSETVL PASS]
Yes. We could wait a few more days before backporting. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-09-21 00:41 To: Juzhe-Zhong CC: GCC Patches; Kito Cheng; Jeff Law; Robin Dapp Subject: Re: [Committed] RISC-V: Fix Demand comparison bug[VSETVL PASS] Does it also happen on the gcc 13 branch? If so, please backport :) Juzhe-Zhong wrote on Wed, Sep 20, 2023 at 11:09: This bug is exposed when we support VLS integer conversion patterns. FAIL: c-c++-common/torture/pr53505.c execution. This is caused by an incorrect vsetvl elimination in Phase 4: 10318: 0d207057 vsetvli zero,zero,e32,m4,ta,ma 1031c: 5e003e57 vmv.v.i v28,0 .: missed e8,m1 vsetvl 10320: 7b07b057 vmsgtu.vi v0,v16,15 10324: 03083157 vadd.vi v2,v16,-16 Regression testing on the release version of GCC shows no unexpected differences. Committed. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (vector_insn_info::operator==): Fix bug. --- gcc/config/riscv/riscv-vsetvl.cc | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index df980b6770e..e0f61148ef3 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -1799,10 +1799,11 @@ vector_insn_info::operator== (const vector_insn_info &other) const if (m_demands[i] != other.demand_p ((enum demand_type) i)) return false; - if (vector_config_insn_p (m_insn->rtl ()) - || vector_config_insn_p (other.get_insn ()->rtl ())) -if (m_insn != other.get_insn ()) - return false; + /* We should consider different INSN demands as different + expression. Otherwise, we will be doing incorrect vsetvl + elimination. */ + if (m_insn != other.get_insn ()) +return false; if (!same_avl_p (other)) return false; -- 2.36.3
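Editorial note: the effect of the fix can be sketched with a small C model of the comparison. Before the change, instruction identity was only compared when one side was a vsetvl configuration instruction; after it, identity is always compared, so two infos with identical demands but different originating instructions are no longer treated as the same expression. The struct, the field names and NUM_DEMANDS below are hypothetical simplifications, and the real operator== also checks the AVL and other state.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_DEMANDS = 4 };  /* hypothetical count, not GCC's */

/* Hypothetical, heavily simplified stand-in for vector_insn_info.  */
struct insn_info_model {
  const void *insn;           /* identity of the originating instruction */
  bool demands[NUM_DEMANDS];  /* which vsetvl properties are demanded */
};

/* Model of the comparison after the fix: demands must match AND the
   two infos must come from the same instruction.  */
static bool info_equal(const struct insn_info_model *a,
                       const struct insn_info_model *b)
{
  for (int i = 0; i < NUM_DEMANDS; i++)
    if (a->demands[i] != b->demands[i])
      return false;

  /* Unconditional identity check, as in the committed change.  */
  return a->insn == b->insn;
}
```

In this model, two infos that demand the same properties but belong to different instructions compare unequal, so an elimination step keyed on equality cannot merge them by accident.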
Re: Re: [PATCH V2] RISC-V: Support combine cond extend and reduce sum to widen reduce sum
Both approaches look weird to me. Adding a const 0 move pattern that is only used by the widening reduction, as Lehua does, is not ideal. Also, I don't like changing the abs/vcond_mask predicate. So, IMHO, a more complicated pattern which combines initial 0 value + extension + reduction + vmerge may be more reasonable. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-09-20 17:14 To: Lehua Ding; gcc-patches CC: rdapp.gcc; juzhe.zhong; kito.cheng; palmer; jeffreyalaw Subject: Re: [PATCH V2] RISC-V: Support combine cond extend and reduce sum to widen reduce sum Hi Lehua, I think this is better but still a bit weird :D Allowing constants and forcing them into registers unconditionally is slightly dubious as well, though. One thing that always sticks out is - how is 0 special? Wouldn't we want other constants as well? For reductions I think the vectorizer always starts accumulating with the initial neutral value 0 and adds any other scalar initial value later. But that could change? For reference, attached is what I tried. This gives me no regressions and your tests work. Your approach is more generic in case we want to match future zero constants in other patterns (that we would otherwise still need to adjust with force reg) but the force-reg thing appears more "natural". All in all, I would prefer the force-reg approach slightly but could also live with this v2 despite some minor "usability" concerns. Going to leave the decision to you, either one is OK. 
Regards Robin From 3be4cf4403a584d560c3923207a9c4da8dafee49 Mon Sep 17 00:00:00 2001 From: Robin Dapp Date: Wed, 20 Sep 2023 10:15:36 +0200 Subject: [PATCH] lehua --- gcc/config/riscv/autovec-opt.md | 52 - gcc/config/riscv/autovec.md | 4 ++- gcc/config/riscv/riscv-protos.h | 1 + 3 files changed, 55 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index a97a095691c..8d4ee2ae37f 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -103,12 +103,14 @@ (define_insn_and_split "*cond_abs" (if_then_else:VF (match_operand: 3 "register_operand") (abs:VF (match_operand:VF 1 "nonmemory_operand")) - (match_operand:VF 2 "register_operand")))] + (match_operand:VF 2 "nonmemory_operand")))] "TARGET_VECTOR && can_create_pseudo_p ()" "#" "&& 1" [(const_int 0)] { + if (!REG_P (operands[2])) +operands[2] = force_reg (mode, operands[2]); emit_insn (gen_cond_len_abs (operands[0], operands[3], operands[1], operands[2], gen_int_mode (GET_MODE_NUNITS (mode), Pmode), @@ -1176,3 +1178,51 @@ (define_insn_and_split "*n" DONE; } [(set_attr "type" "vmalu")]) + +;; Combine mask extend + vredsum to mask vwredsum[u] +(define_insn_and_split "*cond_widen_reduc_plus_scal_" + [(set (match_operand: 0 "register_operand") +(unspec: [ + (if_then_else: +(match_operand: 1 "register_operand") +(any_extend: + (match_operand:VI_QHS_NO_M8 2 "register_operand")) +(match_operand: 3 "vector_const_0_operand")) +] UNSPEC_REDUC_SUM))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] +{ + rtx ops[] = {operands[0], operands[2], operands[1], + gen_int_mode (GET_MODE_NUNITS (mode), Pmode)}; + riscv_vector::expand_reduction (, + riscv_vector::REDUCE_OP_M, + ops, CONST0_RTX (mode)); + DONE; +} +[(set_attr "type" "vector")]) + +;; Combine mask extend + vfredsum to mask vfwredusum +(define_insn_and_split "*cond_widen_reduc_plus_scal_" + [(set (match_operand: 0 "register_operand") +(unspec: [ + (if_then_else: 
+(match_operand: 1 "register_operand") +(float_extend: + (match_operand:VF_HS_NO_M8 2 "register_operand")) +(match_operand: 3 "vector_const_0_operand")) +] UNSPEC_REDUC_SUM_UNORDERED))] + "TARGET_VECTOR && can_create_pseudo_p ()" + "#" + "&& 1" + [(const_int 0)] +{ + rtx ops[] = {operands[0], operands[2], operands[1], + gen_int_mode (GET_MODE_NUNITS (mode), Pmode)}; + riscv_vector::expand_reduction (UNSPEC_WREDUC_SUM_UNORDERED, + riscv_vector::REDUCE_OP_M_FRM_DYN, + ops, CONST0_RTX (mode)); + DONE; +} +[(set_attr "type" "vector")]) diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 75ed7ae4f2e..1c10e841692 100644 ---
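Editorial note: the source-level shape that these combine patterns aim at (extend each element, select it against a zero vector under a mask, then reduce) corresponds to a scalar loop like the C sketch below. This is an assumed illustration, not one of the patch's test cases, and cond_widen_sum is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conditional widening sum: each selected 16-bit element is
   sign-extended to 32 bits before being accumulated; unselected
   elements contribute the neutral value 0.  */
static int32_t cond_widen_sum(const int16_t *a, const uint8_t *cond, size_t n)
{
  int32_t sum = 0;

  for (size_t i = 0; i < n; i++)
    sum += cond[i] ? (int32_t)a[i] : 0;
  return sum;
}
```

The "how is 0 special?" question in the review refers to the else-value in this loop: 0 is the neutral element of the sum, which is what allows the select to be folded into a masked widening reduction.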
Re: [PATCH] RISC-V: Reorganize and rename combine patterns in autovec-opt.md
LGTM. juzhe.zh...@rivai.ai From: Lehua Ding Date: 2023-09-20 15:03 To: gcc-patches CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding Subject: [PATCH] RISC-V: Reorganize and rename combine patterns in autovec-opt.md This patch reorganize and rename the combine patterns in autovec-opt.md by category. There shouldn't be any functional changes. The current classification includes the following categories: - Combine op + vmerge to cond_op - Combine binop + trunc to narrow_binop - Combine extend + binop to widen_binop - Combine extend + ternop to widen_ternop - Misc combine patterns gcc/ChangeLog: * config/riscv/autovec-opt.md (*not): Move and rename. (*n): Ditto. (*vtrunc): Ditto. (*trunc): Ditto. (*narrow_): Ditto. (*narrow__scalar): Ditto. (*single_widen_mult): Ditto. (*single_widen_mul): Ditto. (*single_widen_mult): Ditto. (*single_widen_mul): Ditto. (*dual_widen_fma): Ditto. (*dual_widen_fma): Ditto. (*single_widen_fma): Ditto. (*single_widen_fma): Ditto. (*dual_fma): Ditto. (*single_fma): Ditto. (*dual_fnma): Ditto. (*dual_widen_fnma): Ditto. (*single_fnma): Ditto. (*single_widen_fnma): Ditto. (*dual_fms): Ditto. (*dual_widen_fms): Ditto. (*single_fms): Ditto. (*single_widen_fms): Ditto. (*dual_fnms): Ditto. (*dual_widen_fnms): Ditto. (*single_fnms): Ditto. (*single_widen_fnms): Ditto. 
--- gcc/config/riscv/autovec-opt.md | 203 ++-- 1 file changed, 91 insertions(+), 112 deletions(-) diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index 66c77ad6ebb..46a344407c7 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -58,104 +58,6 @@ } ) -;; - -;; [BOOL] Binary logical operations (inverted second input) -;; - -;; Includes: -;; - vmandnot.mm -;; - vmornot.mm -;; - - -(define_insn_and_split "*not" - [(set (match_operand:VB_VLS 0 "register_operand" "=vr") - (bitmanip_bitwise:VB_VLS - (not:VB_VLS (match_operand:VB_VLS 2 "register_operand" " vr")) - (match_operand:VB_VLS 1 "register_operand" " vr")))] - "TARGET_VECTOR && can_create_pseudo_p ()" - "#" - "&& 1" - [(const_int 0)] - { -insn_code icode = code_for_pred_not (, mode); -riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_MASK_OP, operands); -DONE; - } - [(set_attr "type" "vmalu") - (set_attr "mode" "")]) - -;; - -;; [BOOL] Binary logical operations (inverted result) -;; - -;; Includes: -;; - vmnand.mm -;; - vmnor.mm -;; - vmxnor.mm -;; - - -(define_insn_and_split "*n" - [(set (match_operand:VB_VLS 0 "register_operand" "=vr") - (not:VB_VLS - (any_bitwise:VB_VLS - (match_operand:VB_VLS 1 "register_operand" " vr") - (match_operand:VB_VLS 2 "register_operand" " vr"] - "TARGET_VECTOR && can_create_pseudo_p ()" - "#" - "&& 1" - [(const_int 0)] - { -insn_code icode = code_for_pred_n (, mode); -riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_MASK_OP, operands); -DONE; - } - [(set_attr "type" "vmalu") - (set_attr "mode" "")]) - -;; - -;; [INT] Binary narrow shifts. 
-;; - -;; Includes: -;; - vnsrl.wv/vnsrl.wx/vnsrl.wi -;; - vnsra.wv/vnsra.wx/vnsra.wi -;; - - -(define_insn_and_split "*vtrunc" - [(set (match_operand: 0 "register_operand" "=vr,vr") -(truncate: - (any_shiftrt:VWEXTI -(match_operand:VWEXTI 1 "register_operand" " vr,vr") - (any_extend:VWEXTI - (match_operand: 2 "vector_shift_operand" " vr,vk")] - "TARGET_VECTOR && can_create_pseudo_p ()" - "#" - "&& 1" - [(const_int 0)] -{ - insn_code icode = code_for_pred_narrow (, mode); - riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands); - DONE; -} - [(set_attr "type" "vnshift") - (set_attr "mode" "")]) - -(define_insn_and_split "*trunc" - [(set (match_operand: 0 "regist
Re: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
+;; - +;; [FP] Math.h. +;; - +;; Includes: +;; - ceil/ceilf +;; - +(define_expand "ceil2" + [(match_operand:VF 0 "register_operand") + (match_operand:VF 1 "register_operand")] + "TARGET_VECTOR" + { +rtx tmp = gen_reg_rtx (mode); +rtx ops_1[] = {tmp, operands[1]}; +insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, mode); + +/* vfcvt.x.f with rounding up (aka ceil). */ +riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_1); + +rtx ops_2[] = {operands[0], tmp}; +icode = code_for_pred (FLOAT, mode); + +/* vfcvt.f.x for the final result. To avoid unnecessary frm register + access, we use RUP here and it will never do the rounding up because + the tmp rtx comes from the float to int conversion. */ +riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_2); + +DONE; + } +) It should be "V_VLSF" instead of "VF" so that you could also support VLS CEIL. Besides, I would like to see the following case: a[i] = cond[i] ? CEIL (b[i]): c[i]; Ideally, we should be able to combine vfcvt + vmerge into vfcvt with mask. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-20 10:30 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization From: Pan Li This patch adds auto-vectorization support for both ceil and ceilf from math.h. It depends on the -ffast-math option. When we would like to call ceil/ceilf like v2 = ceil (v1), we will convert it into the insns below (following the implementation in LLVM). * vfcvt.x.f v3, v1, RUP * vfcvt.f.x v2, v3 The conditional auto-vectorization for ceil/ceilf is also supported and covered by test cases. Before this patch: math-ceil-1.c:21:1: missed: couldn't vectorize loop ... .L3: flw fa0,0(s0) addi s0,s0,4 addi s1,s1,4 call ceilf fsw fa0,-4(s1) bne s0,s2,.L3 After this patch: ... 
fsrmi 3 .L4: vsetvli a5,a2,e32,m1,ta,ma vle32.v v1,0(a1) vsetvli a3,zero,e32,m1,ta,ma slli a4,a5,2 vfcvt.x.f.v v1,v1 sub a2,a2,a5 vfcvt.f.x.v v1,v1 vsetvli zero,a5,e32,m1,ta,ma vse32.v v1,0(a0) add a1,a1,a4 add a0,a0,a4 bne a2,zero,.L4 .L14: fsrm a6 ret Please note that VLS mode is not involved in this patch and will be taken care of in upcoming patches. gcc/ChangeLog: * config/riscv/autovec.md (ceil2): New pattern. * config/riscv/riscv-protos.h (enum insn_flags): New enum type. (enum insn_type): Ditto. * config/riscv/riscv-v.cc: Handle rounding up. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test. * gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test. * gcc.target/riscv/rvv/autovec/test-math.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 30 + gcc/config/riscv/riscv-protos.h | 4 ++ gcc/config/riscv/riscv-v.cc | 2 + .../riscv/rvv/autovec/math-ceil-1.c | 21 + .../riscv/rvv/autovec/math-ceil-2.c | 21 + .../riscv/rvv/autovec/math-ceil-3.c | 24 ++ .../riscv/rvv/autovec/math-ceil-4.c | 24 ++ .../riscv/rvv/autovec/math-ceil-run-1.c | 24 ++ .../riscv/rvv/autovec/math-ceil-run-2.c | 24 ++ .../gcc.target/riscv/rvv/autovec/test-math.h | 45 +++ 10 files changed, 219 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 493d5745485..ea508d81047 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2374,3 +2374,33 @@ (define_expand "avg3_ceil" riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3); DONE; }) + +;; - +;; [FP] Math.h. +;; -
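Editorial note: the two-instruction sequence described in this patch (vfcvt.x.f with round-up, then vfcvt.f.x) can be modeled per element in scalar C as below. This is a sketch under stated assumptions, not the patch's implementation: the helper names are hypothetical, the round-up conversion is emulated by hand instead of through the frm register, and the model is only valid for inputs whose magnitude fits in int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a float -> int32 conversion with round-up (RUP).
   Assumption: x is finite and lies strictly inside the int32 range.  */
static int32_t cvt_x_f_rup(float x)
{
  assert(x > -2147483648.0f && x < 2147483648.0f);

  int32_t i = (int32_t)x;  /* the C cast truncates toward zero */
  if ((float)i < x)        /* a positive fraction was dropped */
    i++;                   /* so round up to the next integer */
  return i;
}

/* Model of ceil as "convert to integer rounding up, convert back".
   The back conversion is exact here because the integer came from a
   float, which matches the comment in the quoted define_expand.  */
static float ceil_via_convert(float x)
{
  return (float)cvt_x_f_rup(x);
}
```

Note that this round trip yields +0.0 for inputs in (-1, 0), whereas ceilf returns -0.0 there, and that it cannot represent NaN, infinities or out-of-range values. Those limits are consistent with the patch stating that the transformation depends on -ffast-math.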
Re: Re: [Committed] RISC-V: Support VLS unary floating-point patterns
I think we could remove math.h. Hi, @Patrick. Could you verify it? diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h index 2292372d7a3..674098e9ba6 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h @@ -1,5 +1,4 @@ #include -#include and commit it. Thanks. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-09-20 08:52 To: 钟居哲 CC: Patrick O'Neill; Robin Dapp; gcc-patches; Kito.cheng; jeffreyalaw; palmer; Edwin Lu; joern.rennecke; jeremy.bennett; gnu-toolchain Subject: Re: Re: [Committed] RISC-V: Support VLS unary floating-point patterns It seems to be because of math.h, a similar issue to the one with stdint.h. Is math.h necessary for the test case? juzhe.zh...@rivai.ai wrote on Wed, Sep 20, 2023 at 08:44: I didn't see this issue. They should be bogus FAILs. We should either fix the testcases or ignore them. juzhe.zh...@rivai.ai From: Patrick O'Neill Date: 2023-09-20 08:34 To: Juzhe-Zhong; Robin Dapp; gcc-patches CC: kito.cheng; kito.cheng; jeffreyalaw; Palmer Dabbelt; Edwin Lu; joern.rennecke; jeremy.bennett; gnu-toolchain Subject: Re: [Committed] RISC-V: Support VLS unary floating-point patterns Hi, This patch highlights an issue Edwin and I have been having with the testsuite where rv64 testcases are run when testing rv32gcv. There's a large number of new failures in the rv32gcv testsuite from this seemingly innocuous patch. https://github.com/ewlu/riscv-gnu-toolchain/issues/166 (The repo is still a WIP - eventually will be non-gating patchworks pre-commit CI) From Edwin and my investigation the failures for rv32gcv look like [1]. /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory compilation terminated. 
Top of the failing testcase: /* { dg-do compile } */ /* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8" } */ #include "def.h" The dg-options explicitly set rv64gcv, so I don't think this testcase should even be executed. For the 3 new failures on rv64gcv, they all explicitly set rv32gcv. /* { dg-options "-march=rv32gcv -mabi=ilp32d -O3" } */ These are seen on non-multilib builds. Multilib rv32/64gc does not appear to have the same issue when compiling (we're currently testing multilib rv32/64gcv to see if they encounter issues when executing). Are other people seeing similar errors/is this a known issue? Patrick [1]: Executing on host: /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc -B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/ /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O3 -ftree-vectorize --param riscv-autovec-preference=scalable -march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects -fno-ident -S -o floating-point-mul-3.s(timeout = 600) spawn -ignore SIGHUP /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc -B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/ /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O3 -ftree-vectorize --param riscv-autovec-preference=scalable -march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects -fno-ident -S -o 
floating-point-mul-3.s In file included from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/features.h:515, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/bits/libc-header-start.h:33, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/math.h:27, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h:2, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c:4: /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory compilation terminated. compiler exited with status 1 FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -O3 -ftree-vectorize
Re: Re: [Committed] RISC-V: Support VLS unary floating-point patterns
I didn't see this issue. They should be the bogus FAILs. We should either fix testcases or ignore them. juzhe.zh...@rivai.ai From: Patrick O'Neill Date: 2023-09-20 08:34 To: Juzhe-Zhong; Robin Dapp; gcc-patches CC: kito.cheng; kito.cheng; jeffreyalaw; Palmer Dabbelt; Edwin Lu; joern.rennecke; jeremy.bennett; gnu-toolchain Subject: Re: [Committed] RISC-V: Support VLS unary floating-point patterns Hi, This patch highlights an issue Edwin and I have been having with the testsuite where rv64 testcases are run when testing rv32gcv. There's a large number of new failures in the rv32gcv testsuite from this seemingly innocuous patch. https://github.com/ewlu/riscv-gnu-toolchain/issues/166 (The repo is still a WIP - eventually will be non-gating patchworks pre-commit CI) From Edwin and my investigation the failures for rv32gcv look like [1]. /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory compilation terminated. Top of the failing testcase: /* { dg-do compile } */ /* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8" } */ #include "def.h" The dg-options explicitly set rv64gcv, so I don't think this testcase should even be executed. For the 3 new failures on rv64gcv, they all explicitly set rv32gcv. /* { dg-options "-march=rv32gcv -mabi=ilp32d -O3" } */ These are seen on non-multilib builds. Multilib rv32/64gc does not appear to have the same issue when compiling (we're currently testing multilib rv32/64gcv to see if they encounter issues when executing). Are other people seeing similar errors/is this a known issue? 
Patrick [1]: Executing on host: /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc -B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/ /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O3 -ftree-vectorize --param riscv-autovec-preference=scalable -march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects -fno-ident -S -o floating-point-mul-3.s(timeout = 600) spawn -ignore SIGHUP /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc -B/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/ /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O3 -ftree-vectorize --param riscv-autovec-preference=scalable -march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2 --param=riscv-autovec-lmul=m8 -ffat-lto-objects -fno-ident -S -o floating-point-mul-3.s In file included from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/features.h:515, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/bits/libc-header-start.h:33, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/math.h:27, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h:2, from /home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c:4: 
/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory compilation terminated. compiler exited with status 1 FAIL: gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors) On 9/19/23 04:26, Juzhe-Zhong wrote: > Extend current VLA patterns with VLS modes. > > Regression all passed. > > gcc/ChangeLog: > > * config/riscv/autovec.md: Extend VLS modes. > * config/riscv/vector.md: Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/vls/def.h: Add unary test. > * gcc.target/riscv/rvv/autovec/vls/neg-2.c: New test. > > --- > gcc/config/riscv/autovec.md | 12 ++--- > gcc/config/riscv/vector.md| 20 +++ > .../gcc.target/riscv/rvv/autovec/vls/def.h| 3 +- > .../gcc.target/riscv/rvv/autovec/vls/neg-2.c | 52 +++ > 4 files changed, 70 insertions(+), 17 deletions(-) > create mode 100644 gcc/testsui
Re: Re: [PATCH v1] RISC-V: Fix one ICE for vect test vect-multitypes-5
Thanks for reporting it. Could you try this and verify for me? - rtx src_op_0 = XEXP (src, 0); - - if (GET_CODE (src) == CONST && GET_CODE (src_op_0) == PLUS -&& CONST_POLY_INT_P (XEXP (src_op_0, 1))) + if (GET_CODE (src) == CONST && GET_CODE (XEXP (src, 0)) == PLUS +&& CONST_POLY_INT_P (XEXP (XEXP (src, 0), 1))) { rtx dest_tmp = gen_reg_rtx (mode); rtx tmp = gen_reg_rtx (mode); - riscv_emit_move (dest, XEXP (src_op_0, 0)); - riscv_legitimize_poly_move (mode, dest_tmp, tmp, XEXP (src_op_0, 1)); + riscv_emit_move (dest, XEXP (XEXP (src, 0), 0)); + riscv_legitimize_poly_move (mode, dest_tmp, tmp, XEXP (XEXP (src, 0), 1)); If it can fix your issue, plz send a patch and commit it. Thanks. juzhe.zh...@rivai.ai From: Patrick O'Neill Date: 2023-09-19 01:38 To: Li, Pan2; Kito Cheng CC: gcc-patches@gcc.gnu.org; Wang, Yanzhang; juzhe.zh...@rivai.ai; Palmer Dabbelt Subject: Re: [PATCH v1] RISC-V: Fix one ICE for vect test vect-multitypes-5 Hi, After this patch, there is now an ICE when bootstrapping with --enable-checking=rtl on rv32gc. More details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111461 Thanks, Patrick On 8/29/23 07:40, Li, Pan2 via Gcc-patches wrote: > Committed, thanks Kito. 
> > Pan > > -Original Message- > From: Kito Cheng > Sent: Tuesday, August 29, 2023 9:46 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; Wang, Yanzhang ; > juzhe.zh...@rivai.ai > Subject: Re: [PATCH v1] RISC-V: Fix one ICE for vect test vect-multitypes-5 > > LGTM, thanks :) > > On Tue, Aug 29, 2023 at 6:50 PM Pan Li via Gcc-patches > wrote: >> From: Pan Li >> >> There will be one ICE when build vect-multitypes-5.c similar as below: >> >> riscv64-unknown-elf-gcc -O3 \ >>-march=rv64imafdcv -mabi=lp64d -mcmodel=medlow \ >>-fdiagnostics-plain-output -flto -ffat-lto-objects \ >>--param riscv-autovec-preference=scalable -Wno-psabi \ >>-ftree-vectorize -fno-tree-loop-distribute-patterns \ >>-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details \ >>gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c -o test.elf -lm >> >> The below RTL is not well handled in riscv_legitimize_const_move, and >> then fall through to the default pass. Then the >> default force_const_mem will NULL_RTX, and will have ICE when operating >> one the NULL_RTX. >> >> (const:DI >>(plus:DI >> (symbol_ref:DI ("ic") [flags 0x2] ) >> (const_poly_int:DI [16, 16]))) >> >> This patch would like to take care of this rtl in >> riscv_legitimize_const_move. >> >> Signed-off-by: Pan Li >> Co-Authored-By: Ju-Zhe Zhong >> >> gcc/ChangeLog: >> >> * config/riscv/riscv.cc (riscv_legitimize_poly_move): New >> declaration. >> (riscv_legitimize_const_move): Handle ref plus const poly. 
>> --- >> gcc/config/riscv/riscv.cc | 23 +++ >> 1 file changed, 23 insertions(+) >> >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc >> index 1d6e278ea90..bab6ed70b2d 100644 >> --- a/gcc/config/riscv/riscv.cc >> +++ b/gcc/config/riscv/riscv.cc >> @@ -366,6 +366,7 @@ static const struct riscv_tune_param >> optimize_size_tune_info = { >> >> static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool >> *); >> static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *); >> +static void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx); >> >> /* Defining target-specific uses of __attribute__. */ >> static const struct attribute_spec riscv_attribute_table[] = >> @@ -2118,6 +2119,28 @@ riscv_legitimize_const_move (machine_mode mode, rtx >> dest, rtx src) >> return; >> } >> >> + /* Handle below format. >> + (const:DI >> + (plus:DI >> +(symbol_ref:DI ("ic") [flags 0x2] ) <- >> op_0 >> +(const_poly_int:DI [16, 16]) // <- op_1 >> + )) >> + */ >> + rtx src_op_0 = XEXP (src, 0); >> + >> + if (GET_CODE (src) == CONST && GET_CODE (src_op_0) == PLUS >> +&& CONST_POLY_INT_P (XEXP (src_op_0, 1))) >> +{ >> + rtx dest_tmp = gen_reg_rtx (mode); >> + rtx tmp = gen_reg_rtx (mode); >> + >> + riscv_emit_move (dest, XEXP (src_op_0, 0)); >> + riscv_legitimize_poly_move (mode, dest_tmp, tmp, XEXP (src_op_0, 1)); >> + >> + emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, dest, dest_tmp))); >> + return; >> +} >> + >> src = force_const_mem (mode, src); >> >> /* When using explicit relocs, constant pool references are sometimes >> -- >> 2.34.1 >>
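Editorial note: as far as can be told from the quoted diff, the checking failure arises because `XEXP (src, 0)` is evaluated before `GET_CODE (src) == CONST` has been established, so the operand is accessed on rtxes for which it is not a valid expression operand. The proposed change moves every operand access behind the code check and relies on the short-circuit evaluation of `&&`. The C sketch below models that ordering with a hypothetical node type; none of these names exist in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for RTL codes.  */
enum node_code { NODE_REG, NODE_CONST, NODE_PLUS, NODE_POLY_INT };

/* Hypothetical stand-in for an rtx: op[] is only meaningful for
   NODE_CONST (op[0]) and NODE_PLUS (op[0] and op[1]).  */
struct node {
  enum node_code code;
  const struct node *op[2];
};

/* Returns true for the shape (const (plus X (poly_int))).  Each
   operand is read only after the code check that makes the access
   valid, thanks to the left-to-right short-circuit of &&.  */
static bool const_plus_poly_p(const struct node *src)
{
  return src->code == NODE_CONST
         && src->op[0]->code == NODE_PLUS
         && src->op[0]->op[1]->code == NODE_POLY_INT;
}
```

In this model a plain register node, whose operand slots are empty, is rejected by the first comparison and its operands are never dereferenced, which mirrors what the reordered condition achieves under --enable-checking=rtl.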
Re: [PATCH] RISC-V: Refactor and cleanup fma patterns
Thanks for the refactoring. This patch is needed in VLS fma support and undefined value enabling support. LGTM. juzhe.zh...@rivai.ai From: Lehua Ding Date: 2023-09-18 19:37 To: gcc-patches CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding Subject: [PATCH] RISC-V: Refactor and cleanup fma patterns At present, FMA autovec's patterns do not fully use the corresponding pattern in vector.md. The previous reason is that the merge operand of pattern in vector.md cannot be VUNDEF. Now allowing it to be VUNDEF, reunify insn used for reload pass into vector.md, and the corresponding vlmax pattern in autovec.md is used for combine. This patch also refactors the corresponding combine pattern inside autovec-opt.md and removes the unused ones. gcc/ChangeLog: * config/riscv/autovec-opt.md (*_fma): Removed old combine patterns. (*single_mult_plus): Ditto. (*double_mult_plus): Ditto. (*sign_zero_extend_fma): Ditto. (*zero_sign_extend_fma): Ditto. (*double_widen_fma): Ditto. (*single_widen_fma): Ditto. (*double_widen_fnma): Ditto. (*single_widen_fnma): Ditto. (*double_widen_fms): Ditto. (*single_widen_fms): Ditto. (*double_widen_fnms): Ditto. (*single_widen_fnms): Ditto. (*reduc_plus_scal_): Adjust name. (*widen_reduc_plus_scal_): Adjust name. (*dual_widen_fma): New combine pattern. (*dual_widen_fmasu): Ditto. (*dual_widen_fmaus): Ditto. (*dual_fma): Ditto. (*single_fma): Ditto. (*dual_fnma): Ditto. (*single_fnma): Ditto. (*dual_fms): Ditto. (*single_fms): Ditto. (*dual_fnms): Ditto. (*single_fnms): Ditto. * config/riscv/autovec.md (fma4): Reafctor fma pattern. (*fma): Removed. (fnma4): Reafctor. (*fnma): Removed. (*fma): Removed. (*fnma): Removed. (fms4): Reafctor. (*fms): Removed. (fnms4): Reafctor. (*fnms): Removed. * config/riscv/riscv-protos.h (prepare_ternary_operands): Adjust prototype. * config/riscv/riscv-v.cc (prepare_ternary_operands): Refactor. * config/riscv/vector.md (*pred_mul_plus_undef): New pattern. (*pred_mul_plus): Removed. 
(*pred_mul_plus_scalar): Removed. (*pred_mul_plus_extended_scalar): Removed. (*pred_minus_mul_undef): New pattern. (*pred_minus_mul): Removed. (*pred_minus_mul_scalar): Removed. (*pred_minus_mul_extended_scalar): Removed. (*pred_mul__undef): New pattern. (*pred_mul_): Removed. (*pred_mul__scalar): Removed. (*pred_mul_neg__undef): New pattern. (*pred_mul_neg_): Removed. (*pred_mul_neg__scalar): Removed. --- gcc/config/riscv/autovec-opt.md | 736 ++-- gcc/config/riscv/autovec.md | 301 - gcc/config/riscv/riscv-protos.h | 2 +- gcc/config/riscv/riscv-v.cc | 14 +- gcc/config/riscv/vector.md | 439 ++- 5 files changed, 528 insertions(+), 964 deletions(-) diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index df516849527..c94cd0ae087 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -110,166 +110,6 @@ [(set_attr "type" "vmalu") (set_attr "mode" "")]) -;; = -;; == Widening Ternary arithmetic -;; = - -;; - -;; [INT] VWMACC -;; - -;; Includes: -;; - vwmacc.vv -;; - vwmaccu.vv -;; - - -;; Combine ext + ext + fma ===> widen fma. -;; Most of circumstantces, LoopVectorizer will generate the following IR: -;; vect__8.64_40 = (vector([4,4]) int) vect__7.63_41; -;; vect__11.68_35 = (vector([4,4]) int) vect__10.67_36; -;; vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45); -(define_insn_and_split "*_fma" - [(set (match_operand:VWEXTI 0 "register_operand") - (plus:VWEXTI - (mult:VWEXTI - (any_extend:VWEXTI - (match_operand: 2 "register_operand")) - (any_extend:VWEXTI - (match_operand: 3 "register_operand"))) - (match_operand:VWEXTI 1 "register_operand")))] - "TARGET_VECTOR && can_create_pseudo_p ()" - "#" - "&& 1" - [(const_int 0)] - { -riscv_vector::emit_vlmax_insn (code_for_pred_widen_mul_plus (, mode), - riscv_vector::WIDEN_TERNARY_OP, operands); -DONE; - } - [(set_attr "type" "viwmuladd") - (set_attr "mode" "")]) - -;; This helps to match ext + fma. 
-(define_insn_and_split "*single_mult_plus" - [(set (match_operand:VWEXTI 0 "register_operand") - (plus:VWEXTI - (mult:VWEXTI - (any_extend:VWEXTI - (match_operand: 2 "register_operand")) - (match_operand:VWEXTI 3 "register_operand")) - (match_operand:VWEXTI 1 "register_operand")
Re: [PATCH] RISC-V: Fix RVV can change mode class bug
Sorry, I made a mistake here. I changed 'maybe_lt' to '!ordered_p' in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630835.html

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-09-19 10:25
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix RVV can change mode class bug

After VLS mode conversion was supported, the current case triggers a latent
bug that we were lucky not to have encountered before.

This is a real bug in 'cprop_hardreg':

orig:RVVMF8BI,16,16
new:V32BI,32,0
during RTL pass: cprop_hardreg
auto.c: In function 'main':
auto.c:79:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
   79 | }
      | ^
0x10979a7 partial_subreg_p(machine_mode, machine_mode)
        ../../../../gcc/gcc/rtl.h:3186
0x1723eda mode_change_ok
        ../../../../gcc/gcc/regcprop.cc:402
0x1724007 maybe_mode_change
        ../../../../gcc/gcc/regcprop.cc:436
0x172445d find_oldest_value_reg
        ../../../../gcc/gcc/regcprop.cc:489
0x172534d copyprop_hardreg_forward_1
        ../../../../gcc/gcc/regcprop.cc:808
0x1727017 cprop_hardreg_bb
        ../../../../gcc/gcc/regcprop.cc:1358
0x17272f7 execute
        ../../../../gcc/gcc/regcprop.cc:1425

When trying to do reg copy propagation between RVVMF8BI (precision = 16,16)
and V32BI (precision = 32,0), the assertion in partial_subreg_p fails:

gcc_checking_assert (ordered_p (outer_prec, inner_prec));

In regcprop.cc:

  if (partial_subreg_p (orig_mode, new_mode))
    return false;

If orig_mode (RVVMF8BI) is smaller than new_mode (V32BI), we don't do the
hard reg propagation. However, 'partial_subreg_p' causes an ICE because of
gcc_checking_assert (ordered_p (outer_prec, inner_prec)).

Looking at aarch64.cc, that port carefully blocks such cases in
'TARGET_CAN_CHANGE_MODE_CLASS'.

So it is reasonable to block regcprop when the old mode size is maybe_lt
the new mode size, since we won't do the copy propagation in that case.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_can_change_mode_class): Fix RVV mode change bug.
--- gcc/config/riscv/riscv.cc | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 8c766e2e2be..28b45a87351 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -8536,8 +8536,22 @@ riscv_slow_unaligned_access (machine_mode, unsigned int) /* Implement TARGET_CAN_CHANGE_MODE_CLASS. */ static bool -riscv_can_change_mode_class (machine_mode, machine_mode, reg_class_t rclass) +riscv_can_change_mode_class (machine_mode from, machine_mode to, reg_class_t rclass) { + /* We have RVV VLS modes and VLA modes sharing same REG_CLASS. + In 'cprop_hardreg' stage, we will try to do hard reg copy propagation + between wider mode (FROM) and narrow mode (TO). + + E.g. We should not allow copy propagation + - RVVMF8BI (precision = [16, 16]) -> V32BI (precision = [32, 0]) + since such propagation cause ICE and execution FAIL. + + However, we could allow copy propagation + - RVVMF4 (precision = [32, 32]) -> V32BI (precision = [32, 0]) + since RVVMF4 always >= RV32BI. */ + if (reg_classes_intersect_p (V_REGS, rclass) + && maybe_lt (GET_MODE_PRECISION (from), GET_MODE_PRECISION (to))) +return false; return !reg_classes_intersect_p (FP_REGS, rclass); } -- 2.36.3
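To make the comparison discussed above concrete, here is a minimal standalone sketch of how a two-coefficient precision c0 + c1 * x (with x >= 0 only known at run time) behaves under maybe_lt and ordered_p. The struct Poly and the helpers below are a simplified model written for illustration, not GCC's poly_int implementation; only the precisions [16, 16], [32, 0] and [32, 32] are taken from the patch.

```cpp
#include <cstdint>

// Simplified model of a two-coefficient poly_int: value = c0 + c1 * x,
// where x >= 0 is only known at run time. Illustrative only.
struct Poly {
  int64_t c0;
  int64_t c1;
};

// True if a < b for at least one x >= 0.
constexpr bool maybe_lt(Poly a, Poly b) {
  return a.c0 < b.c0 || a.c1 < b.c1;
}

// True if a <= b for every x >= 0.
constexpr bool known_le(Poly a, Poly b) {
  return !maybe_lt(b, a);
}

// True if the two values can be ordered independently of x.
constexpr bool ordered_p(Poly a, Poly b) {
  return known_le(a, b) || known_le(b, a);
}

// RVVMF8BI [16, 16] vs V32BI [32, 0]: smaller at x = 0, larger for x > 1,
// so the pair is unordered and the checking assert would fire.
static_assert(maybe_lt(Poly{16, 16}, Poly{32, 0}), "may be narrower");
static_assert(!ordered_p(Poly{16, 16}, Poly{32, 0}), "not ordered");

// [32, 32] vs V32BI [32, 0]: never narrower, so the pair is ordered.
static_assert(!maybe_lt(Poly{32, 32}, Poly{32, 0}), "never narrower");
static_assert(ordered_p(Poly{32, 32}, Poly{32, 0}), "ordered");
```

In this model both the maybe_lt check from the patch and the !ordered_p check mentioned for V2 reject the RVVMF8BI to V32BI pair, while the [32, 32] to [32, 0] pair passes both.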
Re: [PATCH] RISC-V: Removed misleading comments in testcases
LGTM

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-09-18 20:29
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Removed misleading comments in testcases

This patch removed the misleading comments in testcases since we support
fold min(int, poly) to constant by this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629651.html).
Thereby the csrr will not appear inside the assembly code, even if there is
no support for some VLS vector patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/div-1.c: Removed comments.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c   | 1 -
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c
index 40224c69458..e36fa9decfd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/div-1.c
@@ -54,5 +54,4 @@ DEF_OP_VV (div, 256, int64_t, /)
 DEF_OP_VV (div, 512, int64_t, /)

 /* { dg-final { scan-assembler-times {vdivu?\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 42 } } */
-/* TODO: Ideally, we should make sure there is no "csrr vlenb". However, we still have 'csrr vlenb' for some cases since we don't support VLS mode conversion which are needed by division. */
 /* { dg-final { scan-assembler-not {csrr} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
index b34a349949b..db2295b2dd6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/shift-3.c
@@ -54,5 +54,4 @@ DEF_OP_VV (shift, 256, int64_t, <<)
 DEF_OP_VV (shift, 512, int64_t, <<)

 /* { dg-final { scan-assembler-times {vsll\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 41 } } */
-/* TODO: Ideally, we should make sure there is no "csrr vlenb". However, we still have 'csrr vlenb' for some cases since we don't support VLS mode conversion which are needed by division. */
 /* { dg-final { scan-assembler-not {csrr} } } */
--
2.36.3
Re: [PATCH] RISC-V: Add fixed PR111255 testcase by other patch
LGTM.

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-09-18 20:13
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH] RISC-V: Add fixed PR111255 testcase by other patch

This patch adds the missing PR111255 testcase. The PR itself was fixed by
this committed patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628922.html).

PR target/111255

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111255.c: New test.

---
 .../gcc.target/riscv/rvv/vsetvl/pr111255.c    | 24 +++
 1 file changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c
new file mode 100644
index 000..736f6838a50
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111255.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 --param riscv-autovec-lmul=m2 -fno-vect-cost-model" } */
+
+#include
+
+#define DEF_LOOP(OLD_TYPE, NEW_TYPE)                                  \
+  void __attribute__ ((noipa))                                        \
+  test_##OLD_TYPE##_2_##NEW_TYPE (NEW_TYPE *__restrict r,             \
+                                  OLD_TYPE *__restrict a, NEW_TYPE b, \
+                                  OLD_TYPE *__restrict pred, int n)   \
+  {                                                                   \
+    for (int i = 0; i < n; ++i)                                       \
+      {                                                               \
+        r[i] = pred[i] ? (NEW_TYPE) a[i] : b;                         \
+      }                                                               \
+  }
+
+/* INT -> narrower-INT */
+#define TEST_ALL_X2X_NARROWER(T) \
+  T (int16_t, int8_t)
+
+TEST_ALL_X2X_NARROWER (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tvsetvli\t[a-x0-9]+,[a-x0-9]+,e[0-9]+,m[f0-9]+,t[au],m[au]\n\tvsetvli\t} } } */
--
2.36.3
Re: Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME
Thanks Richard. I addressed the comments in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630699.html

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-09-17 23:29
To: Juzhe-Zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME

Juzhe-Zhong writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize uninitialized SSA_NAME and
> convert it into SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions of targets like RISC-V.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_fn_using_insn): Support undefined rtx.
> * optabs.cc (maybe_legitimize_operand): Ditto.
> (can_reuse_operands_p): Ditto.
> * optabs.h (enum expand_operand_type): Ditto.
> (create_undefined_input_operand): Ditto.
>
> ---
>  gcc/internal-fn.cc |  4
>  gcc/optabs.cc      | 16
>  gcc/optabs.h       | 14 +-
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..61d5a9e4772 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -247,6 +247,10 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>              create_convert_operand_from (&ops[opno], rhs_rtx,
>                                           TYPE_MODE (rhs_type),
>                                           TYPE_UNSIGNED (rhs_type));
> +          else if (TREE_CODE (rhs) == SSA_NAME
> +                   && SSA_NAME_IS_DEFAULT_DEF (rhs)
> +                   && VAR_P (SSA_NAME_VAR (rhs)))
> +            create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
>            else
>              create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>            opno += 1;
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 32ff379ffc3..d8c771547a3 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -8102,6 +8102,21 @@ maybe_legitimize_operand (enum insn_code icode, unsigned int opno,
>            goto input;
>          }
>        break;
> +
> +    case EXPAND_UNDEFINED:
> +      {
> +        mode = insn_data[(int) icode].operand[opno].mode;
> +        rtx scratch = gen_rtx_SCRATCH (mode);

A scratch of the right mode should already be available in op->value,
since it was created by create_undefined_input_operand.

If that doesn't work for some reason, then it would be better for
create_undefined_input_operand to pass NULL_RTX as the "value"
argument to create_expand_operand.

> +        /* For SCRATCH rtx which is converted from uninitialized
> +           SSA, we convert it as fresh pseudo when target doesn't
> +           allow scratch rtx in predicate. Otherwise, return true.  */
> +        if (!insn_operand_matches (icode, opno, scratch))
> +          {
> +            op->value = gen_reg_rtx (mode);

The mode should come from op->mode.

> +            goto input;
> +          }
> +        return true;
> +      }
>      }
>    return insn_operand_matches (icode, opno, op->value);
>  }
> @@ -8147,6 +8162,7 @@ can_reuse_operands_p (enum insn_code icode,
>      case EXPAND_INPUT:
>      case EXPAND_ADDRESS:
>      case EXPAND_INTEGER:
> +    case EXPAND_UNDEFINED:
>        return true;

I think this should be in the "return false" block instead.

>
>      case EXPAND_CONVERT_TO:
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index c80b7f4dc1b..4eb1f9ee09a 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -37,7 +37,8 @@ enum expand_operand_type {
>    EXPAND_CONVERT_TO,
>    EXPAND_CONVERT_FROM,
>    EXPAND_ADDRESS,
> -  EXPAND_INTEGER
> +  EXPAND_INTEGER,
> +  EXPAND_UNDEFINED

Sorry, this was my bad suggestion. I should have suggested
EXPAND_UNDEFINED_INPUT, to match the name of the function.

Thanks,
Richard

>  };
>
>  /* Information about an operand for instruction expansion.  */
> @@ -117,6 +118,17 @@ create_input_operand (class expand_operand *op, rtx value,
>    create_expand_operand (op, EXPAND_INPUT, value, mode, false);
>  }
>
> +/* Make OP describe an undefined input operand for uninitialized
> +   SSA.  It's the scratch operand with mode MODE; MODE cannot be
> +   VOIDmode.  */
> +
> +inline void
> +create_undefined_input_operand (class expand_operand *op, machine_mode mode)
> +{
> +  create_expand_operand (op, EXPAND_UNDEFINED, gen_rtx_SCRATCH (mode), mode,
> +                         false);
> +}
> +
>  /* Like create_input_operand, except that VALUE must first be converted
>     to mode MODE.  UNSIGNED_P says whether VALUE is unsigned.  */
Re: [PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]
Thanks for fixing it. I am OK with removing the phase 6 optimization, which
has many latent bugs there (Kito has refactored it in GCC 14). But I think
we need more comments from Kito on this.

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-09-18 12:19
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]

From: xuli

The vsetvl pass has been refactored in GCC 14, and its optimization is more
reasonable than in releases/gcc-13. This problem does not exist in GCC 14.
Phase 6 in GCC 13 is an optimization phase. It was not thought through
carefully enough and has some hidden bugs, so we decided to remove it.
The generated code will contain some redundancy, but the program is correct.

PR target/111412

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_infos_manager::release): Remove.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::propagate_avl): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-79.c: Adjust case.
* gcc.target/riscv/rvv/vsetvl/avl_single-80.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-87.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto. * gcc.target/riscv/rvv/base/pr111412.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 153 +- gcc/config/riscv/riscv-vsetvl.h | 2 - .../gcc.target/riscv/rvv/base/pr111412.c | 41 + .../riscv/rvv/vsetvl/avl_single-79.c | 4 +- .../riscv/rvv/vsetvl/avl_single-80.c | 4 +- .../riscv/rvv/vsetvl/avl_single-86.c | 4 +- .../riscv/rvv/vsetvl/avl_single-87.c | 4 +- .../riscv/rvv/vsetvl/avl_single-88.c | 4 +- .../riscv/rvv/vsetvl/avl_single-89.c | 4 +- .../riscv/rvv/vsetvl/avl_single-90.c | 4 +- .../riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 +- .../riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 +- .../riscv/rvv/vsetvl/vlmax_switch_vtype-14.c | 6 +- .../riscv/rvv/vsetvl/vlmax_switch_vtype-15.c | 2 +- .../gcc.target/riscv/rvv/vsetvl/vsetvl-1.c| 2 +- .../gcc.target/riscv/rvv/vsetvl/vsetvl-5.c| 2 +- .../gcc.target/riscv/rvv/vsetvl/vsetvl-6.c| 2 +- .../gcc.target/riscv/rvv/vsetvl/vsetvl-7.c| 2 +- .../gcc.target/riscv/rvv/vsetvl/vsetvl-8.c| 2 +- .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c | 4 +- .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c | 4 +- 21 files changed, 80 insertions(+), 190 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111412.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 0cf4bc818e2..9dca2ce709d 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -2494,8 +2494,6 @@ vector_infos_manager::release (void) if (!vector_exprs.is_empty ()) vector_exprs.release (); - gcc_assert (to_refine_vsetvls.is_empty ()); - gcc_assert (to_delete_vsetvls.is_empty ()); if (optimize > 0) free_bitmap_vectors (); } @@ -2702,9 +2700,6 @@ private: /* Phase 5. */ void cleanup_insns (void) const; - /* Phase 6. 
*/ - void propagate_avl (void) const; - void init (void); void done (void); void compute_probabilities (void); @@ -3823,10 +3818,8 @@ pass_vsetvl::refine_vsetvls (void) const /* We can't refine user vsetvl into vsetvl zero,zero since the dest will be used by the following instructions. */ if (vector_config_insn_p (rinsn)) - { - m_vector_manager->to_refine_vsetvls.add (rinsn); continue; - } + rinsn = PREV_INSN (rinsn); rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX); change_insn (rinsn, new_pat); @@ -3862,10 +3855,7 @@ pass_vsetvl::cleanup_vsetvls () /* We can't eliminate user vsetvl since the dest will be used * by the following instructions. */ if (vector_config_insn_p (insn->rtl ())) - {
Re: Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
More information:

For PRED_TYPE_tumu, it's easy to analyze: we just need to count how many
arguments are in the arglist. If the arglist has 5 arguments
(mask, merge, op1, op2, len), then it must be TUMU.

What I mean is that we should be able to quickly compute the arguments used
to construct the function_instance. Then we can get the non-overloaded
function.

juzhe.zh...@rivai.ai

From: juzhe.zh...@rivai.ai
Date: 2023-09-15 10:02
To: pan2.li; gcc-patches
CC: pan2.li; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

Sorry for commenting again.

I am not happy with the current get_non_overloaded_instance function.
I think the searching approach is very inefficient:

+function_instance *
+function_base::get_non_overloaded_instance (unsigned int code,
+                                            vec &arglist) const
+{
+  unsigned int code_limit = vec_safe_length (registered_functions);
+
+  for (unsigned fun_code = code; fun_code < code_limit; fun_code++)
+    {
+      registered_function *rfun = (*registered_functions)[fun_code];
+      function_instance instance = rfun->instance;
+
+      if (rfun->overloaded_p)
+        continue;
+
+      unsigned k;
+      const rvv_arg_type_info *args = instance.op_info->args;
+
+      for (k = 0; args[k].base_type != NUM_BASE_TYPES; k++)
+        {
+          if (k >= arglist.length ())
+            break;
+
+          if (TYPE_MODE (instance.get_arg_type (k))
+              != TYPE_MODE (TREE_TYPE (arglist[k])))
+            break;
+        }
+
+      if (args[k].base_type == NUM_BASE_TYPES)
+        return &rfun->instance;
+    }
+
+  return NULL;
+}

Instead, I think we should build up a table which maps to the non-overloaded
function according to the arguments, so that we can get the "instance"
efficiently.

E.g. for the vint8mf8_t tumu vadd intrinsic, the instance looks like this:

function_instance ("vadd", bases::vadd, shapes::alu,
                   iu_ops[VECTOR_TYPE_vuint8mf8_t], PRED_TYPE_tumu,
                   &iu_vvv_ops);

get_non_overloaded_instance is already a member function of the class BASE.
So, The first 3 arguments "vadd", bases::vadd, shapes::alu should already known since it is a known function_base. The last 3 arguments may need some elegant analysis or map table to quickly grep. So, I think we should consider this framework seriously. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-12 16:46 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic From: Pan Li Update in v3: * Rewrite comment for overloaded function add. * Move get_non_overloaded_instance to function_base. Update in v2: * Add get_non_overloaded_instance for function instance. * Fix overload check for policy function. * Enrich the test cases check. Original log: This patch would like add the framework to support the RVV overloaded intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did. However, it almost leverage the hook TARGET_RESOLVE_OVERLOADED_BUILTIN with below steps. * Register overloaded functions. * Add function_resolver for overloaded function resolving. * Add resolve API for function shape with default implementation. * Implement HOOK for navigating the overloaded API to non-overloaded API. We validated this framework by the vmv_v intrinsic API(s), and we will add more intrins API support in the underlying patches. gcc/ChangeLog: * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New function for the hook. (riscv_register_pragmas): Register the hook * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl. * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register overloaded function. (struct overloaded_base): New struct for overloaded shape. (struct non_overloaded_base): New struct for non overloaded shape. (struct move_def): Inherit overloaded shape. * config/riscv/riscv-vector-builtins.cc (function_base::get_non_overloaded_instance): New API impl. (function_builder::add_function): Add overloaded arg. 
(function_resolver::function_resolver): New constructor. (function_builder::add_overloaded_function): New API impl. (function_resolver::resolve): Ditto. (function_resolver::lookup): Ditto. (function_resolver::get_sub_code): Ditto. (resolve_overloaded_builtin): New function impl. * config/riscv/riscv-vector-builtins.h: (class function_resolver): New class. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-c.cc | 36 gcc/config/riscv/riscv-protos.h | 1 + .../riscv/riscv-vector-builtins-shapes.cc | 20 ++- gcc/config/riscv/riscv-vector-builtins.cc | 155 +- gcc/config/riscv/riscv-vector-builtins.h | 36 +++- .../riscv/rvv/base/overloaded_r
Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
Sorry for commenting again.

I am not happy with the current get_non_overloaded_instance function.
I think the searching approach is very inefficient:

+function_instance *
+function_base::get_non_overloaded_instance (unsigned int code,
+                                            vec &arglist) const
+{
+  unsigned int code_limit = vec_safe_length (registered_functions);
+
+  for (unsigned fun_code = code; fun_code < code_limit; fun_code++)
+    {
+      registered_function *rfun = (*registered_functions)[fun_code];
+      function_instance instance = rfun->instance;
+
+      if (rfun->overloaded_p)
+        continue;
+
+      unsigned k;
+      const rvv_arg_type_info *args = instance.op_info->args;
+
+      for (k = 0; args[k].base_type != NUM_BASE_TYPES; k++)
+        {
+          if (k >= arglist.length ())
+            break;
+
+          if (TYPE_MODE (instance.get_arg_type (k))
+              != TYPE_MODE (TREE_TYPE (arglist[k])))
+            break;
+        }
+
+      if (args[k].base_type == NUM_BASE_TYPES)
+        return &rfun->instance;
+    }
+
+  return NULL;
+}

Instead, I think we should build up a table which maps to the non-overloaded
function according to the arguments, so that we can get the "instance"
efficiently.

E.g. for the vint8mf8_t tumu vadd intrinsic, the instance looks like this:

function_instance ("vadd", bases::vadd, shapes::alu,
                   iu_ops[VECTOR_TYPE_vuint8mf8_t], PRED_TYPE_tumu,
                   &iu_vvv_ops);

Since get_non_overloaded_instance is already a member function of the class
BASE, the first 3 arguments ("vadd", bases::vadd, shapes::alu) should already
be known, because it is a known function_base.

The last 3 arguments may need some elegant analysis or a map table to look
them up quickly.

So, I think we should consider this framework seriously.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-12 16:46
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

From: Pan Li

Update in v3:

* Rewrite comment for overloaded function add.
* Move get_non_overloaded_instance to function_base.

Update in v2:

* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function. * Enrich the test cases check. Original log: This patch would like add the framework to support the RVV overloaded intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did. However, it almost leverage the hook TARGET_RESOLVE_OVERLOADED_BUILTIN with below steps. * Register overloaded functions. * Add function_resolver for overloaded function resolving. * Add resolve API for function shape with default implementation. * Implement HOOK for navigating the overloaded API to non-overloaded API. We validated this framework by the vmv_v intrinsic API(s), and we will add more intrins API support in the underlying patches. gcc/ChangeLog: * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New function for the hook. (riscv_register_pragmas): Register the hook * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl. * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register overloaded function. (struct overloaded_base): New struct for overloaded shape. (struct non_overloaded_base): New struct for non overloaded shape. (struct move_def): Inherit overloaded shape. * config/riscv/riscv-vector-builtins.cc (function_base::get_non_overloaded_instance): New API impl. (function_builder::add_function): Add overloaded arg. (function_resolver::function_resolver): New constructor. (function_builder::add_overloaded_function): New API impl. (function_resolver::resolve): Ditto. (function_resolver::lookup): Ditto. (function_resolver::get_sub_code): Ditto. (resolve_overloaded_builtin): New function impl. * config/riscv/riscv-vector-builtins.h: (class function_resolver): New class. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test. 
Signed-off-by: Pan Li --- gcc/config/riscv/riscv-c.cc | 36 gcc/config/riscv/riscv-protos.h | 1 + .../riscv/riscv-vector-builtins-shapes.cc | 20 ++- gcc/config/riscv/riscv-vector-builtins.cc | 155 +- gcc/config/riscv/riscv-vector-builtins.h | 36 +++- .../riscv/rvv/base/overloaded_rv32_vmv_v.c| 8 + .../riscv/rvv/base/overloaded_rv64_vmv_v.c| 8 + .../riscv/rvv/base/overloaded_vmv_v.h | 27 +++ 8 files changed, 288 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc index 283052ae313..060edd3129d 100644 --- a/gcc/config/riscv/riscv-c.cc +++ b/gcc/config/ris
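The table-based lookup suggested in the review above could be sketched roughly as follows. This is only a standalone illustration of the idea, not code from the patch or from GCC: Mode, Instance and InstanceTable are hypothetical stand-ins for machine_mode, function_instance and whatever container the port would actually use, and the key (base name plus the modes of the call arguments) is an assumption about what would identify an instance.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for GCC's machine_mode and function_instance.
enum Mode { MODE_RVVMF8QI, MODE_RVVMF8BI, MODE_DI };

struct Instance {
  std::string name;  // e.g. "vadd"
  std::string pred;  // e.g. "tumu"
};

// Key: overloaded base name plus the modes of the call arguments.
using Key = std::pair<std::string, std::vector<Mode>>;

class InstanceTable {
 public:
  // Register one non-overloaded instance under its argument modes.
  void add(const std::string &base, std::vector<Mode> modes, Instance inst) {
    table_.emplace(Key{base, std::move(modes)}, std::move(inst));
  }

  // Ordered-map lookup instead of a linear scan over all registered
  // functions. Returns nullptr when nothing matches.
  const Instance *lookup(const std::string &base,
                         const std::vector<Mode> &modes) const {
    auto it = table_.find(Key{base, modes});
    return it == table_.end() ? nullptr : &it->second;
  }

 private:
  std::map<Key, Instance> table_;
};
```

With such a table, a five-argument call (mask, merge, op1, op2, len) would resolve through a single keyed lookup on its argument modes, rather than by walking registered_functions from the overloaded function's code onwards.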
Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
Hi Kito,

Could you review this code? The regression run is in progress.

  /* Expand
       (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
     Expand this data movement instead of simply forbid it since
     we can improve the code generation for this following scenario
     by RVV auto-vectorization:
       (set (reg:V8QI 149) (vec_duplicate:V8QI (reg:QI))
       (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
     Since RVV mode and scalar mode are in different REG_CLASS,
     we need to explicitly move data from V_REGS to GR_REGS by scalar move.  */
  if (SUBREG_P (src) && riscv_v_ext_mode_p (GET_MODE (SUBREG_REG (src))))
    {
      machine_mode vmode = GET_MODE (SUBREG_REG (src));
      unsigned int mode_size = GET_MODE_SIZE (mode).to_constant ();
      unsigned int vmode_size = GET_MODE_SIZE (vmode).to_constant ();
      unsigned int nunits = vmode_size / mode_size;
      scalar_mode smode = as_a <scalar_mode> (mode);
      unsigned int index = SUBREG_BYTE (src).to_constant () / mode_size;
      unsigned int num = smode == DImode && !TARGET_VECTOR_ELEN_64 ? 2 : 1;

      if (num == 2)
        {
          /* If we want to extract 64bit value but ELEN < 64,
             we use RVV vector mode with EEW = 32 to extract
             the highpart and lowpart.  */
          smode = SImode;
          nunits = nunits * 2;
        }
      vmode = riscv_vector::get_vector_mode (smode, nunits).require ();
      enum insn_code icode
        = convert_optab_handler (vec_extract_optab, vmode, smode);
      gcc_assert (icode != CODE_FOR_nothing);
      rtx v = gen_lowpart (vmode, SUBREG_REG (src));

      for (unsigned int i = 0; i < num; i++)
        {
          class expand_operand ops[3];
          rtx result;
          if (num == 1)
            result = dest;
          else if (i == 0)
            result = gen_lowpart (smode, dest);
          else
            result = gen_reg_rtx (smode);
          create_output_operand (&ops[0], result, smode);
          ops[0].target = 1;
          create_input_operand (&ops[1], v, vmode);
          create_integer_operand (&ops[2], index + i);
          expand_insn (icode, 3, ops);
          if (ops[0].value != result)
            emit_move_insn (result, ops[0].value);

          if (i == 1)
            {
              rtx tmp
                = expand_binop (Pmode, ashl_optab, gen_lowpart (Pmode, result),
                                gen_int_mode (32, Pmode), NULL_RTX, 0,
                                OPTAB_DIRECT);
              rtx tmp2 = expand_binop (Pmode, ior_optab, tmp, dest, NULL_RTX, 0,
                                       OPTAB_DIRECT);
              emit_move_insn (dest, tmp2);
            }
        }
      return true;
    }

ASM:

  vsetivli zero,2,e32,mf2,ta,ma
  vslidedown.vi v2,v1,1
  vmv.x.s a5,v2
  slli a5,a5,32
  vmv.x.s a0,v1
  or a0,a5,a0

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-14 17:26
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

Yeah, try pr111391.c with rv64gc_zve32x (NO v, my mistake in last mail :P), maybe add a testcase pr111391-zve32x.c that just include pr111391.c and set dg option to rv64gc_zve32x

On Thu, Sep 14, 2023 at 5:24 PM juzhe.zh...@rivai.ai wrote:
>
> You mean try pr111391.c that I added with rv64gcv_zve32x ?
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-14 17:20
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
> Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> Could you check if it work correctly for rv64gcv_zve32x? add testcase
> no matter if it works or not :)
>
> On Thu, Sep 14, 2023 at 5:19 PM juzhe.zh...@rivai.ai wrote:
> >
> > Is it Ok for trunk ? Or you want me send a separate patch to remove "@" in
> > vec_extract optab ?
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Kito Cheng
> > Date: 2023-09-14 16:11
> > To: Juzhe-Zhong
> > CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> > Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> > On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong wrote:
> > >
> > > This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
> > >
> > > I notice that previous patch (V2 patch) cause additional execution fail
> > > of pr69719.c
> > > This FAIL is because of the latent BUG of VSETVL PASS.
> > >
> > > So this patch includes VSETVL PASS fix even though it's not related to
> > > the PR111391.
> > >
> > > I have confirm the whole regression no additional FAILs are introduced.
> > >
> > > PR target/111391
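As a side note on the ELEN < 64 path in the code under review: the 64-bit scalar is read as two 32-bit elements and rebuilt with a shift and an OR, which is what the slli/or pair in the assembly does. A minimal standalone sketch of that arithmetic follows. It is plain C++, not GCC internals; the names `combine_halves` and `extract_first_u64` are made up for illustration, only the subreg-at-byte-0 case is modelled, and both halves are treated as unsigned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration only: recombine two 32-bit halves into one 64-bit scalar,
// mirroring the slli/or pair in the quoted assembly. Both halves are
// treated as unsigned here, so the low half is zero-extended.
static std::uint64_t combine_halves(std::uint32_t lo, std::uint32_t hi)
{
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Hypothetical model of the subreg-at-byte-0 case: the source vector is
// viewed as 32-bit elements, element 0 holds the low part and element 1
// the high part (this element order is an assumption of the sketch).
static std::uint64_t extract_first_u64(const std::vector<std::uint32_t> &e32)
{
    assert(e32.size() >= 2);
    const std::uint32_t lo = e32[0]; // loop iteration i == 0
    const std::uint32_t hi = e32[1]; // loop iteration i == 1
    return combine_halves(lo, hi);
}
```

For instance, a vector whose first two 32-bit elements are 0x89abcdef and 0x01234567 would yield 0x0123456789abcdef under these assumptions.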
Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
Oh, I see. It ICEs:

during RTL pass: expand
bug.c:26:9: internal compiler error: in require, at machmode.h:313
   26 |         i (a);
      |         ^
0x1032253 opt_mode::require() const
        ../../../../gcc/gcc/machmode.h:313
0x1c47877 riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)
        ../../../../gcc/gcc/config/riscv/riscv.cc:2532
0x274bbe0 gen_movdi(rtx_def*, rtx_def*)
        ../../../../gcc/gcc/config/riscv/riscv.md:2024
0x102cb1c rtx_insn* insn_gen_fn::operator()(rtx_def*, rtx_def*) const
        ../../../../gcc/gcc/recog.h:411
0x11fbc8e emit_move_insn_1(rtx_def*, rtx_def*)
        ../../../../gcc/gcc/expr.cc:4164
0x11fc809 emit_move_insn(rtx_def*, rtx_def*)
        ../../../../gcc/gcc/expr.cc:4334
0x1039a0b load_register_parameters
        ../../../../gcc/gcc/calls.cc:2155
0x103d865 expand_call(tree_node*, rtx_def*, int)
        ../../../../gcc/gcc/calls.cc:3626
0x121e78c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        ../../../../gcc/gcc/expr.cc:11921
0x120ffb8 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        ../../../../gcc/gcc/expr.cc:9010
0x102c694 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
        ../../../../gcc/gcc/expr.h:310
0x105ccc9 expand_call_stmt
        ../../../../gcc/gcc/cfgexpand.cc:2831
0x10608af expand_gimple_stmt_1
        ../../../../gcc/gcc/cfgexpand.cc:3880
0x1060f4d expand_gimple_stmt
        ../../../../gcc/gcc/cfgexpand.cc:4044
0x10699f3 expand_gimple_basic_block

Thanks for catching this.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-14 17:20
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

Could you check if it work correctly for rv64gcv_zve32x? add testcase no matter if it works or not :)

On Thu, Sep 14, 2023 at 5:19 PM juzhe.zh...@rivai.ai wrote:
>
> Is it Ok for trunk ? Or you want me send a separate patch to remove "@" in
> vec_extract optab ?
Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
Do you mean trying the pr111391.c that I added with rv64gcv_zve32x?

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-14 17:20
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

Could you check if it work correctly for rv64gcv_zve32x? add testcase no matter if it works or not :)

On Thu, Sep 14, 2023 at 5:19 PM juzhe.zh...@rivai.ai wrote:
>
> Is it Ok for trunk ? Or you want me send a separate patch to remove "@" in
> vec_extract optab ?
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-14 16:11
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong wrote:
> >
> > This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
> >
> > I notice that previous patch (V2 patch) cause additional execution fail of
> > pr69719.c
> > This FAIL is because of the latent BUG of VSETVL PASS.
> >
> > So this patch includes VSETVL PASS fix even though it's not related to the
> > PR111391.
> >
> > I have confirm the whole regression no additional FAILs are introduced.
> >
> > PR target/111391
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/autovec.md (@vec_extract): Remove @.
> > (vec_extract): Ditto.
> > * config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
> > (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
> > * config/riscv/riscv.cc (riscv_legitimize_move): Expand move.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
> > * gcc.target/riscv/rvv/autovec/pr111391.c: New test.
Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
Is it OK for trunk? Or do you want me to send a separate patch to remove the "@" in the vec_extract optab?

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-14 16:11
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong wrote:
>
> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
>
> I notice that previous patch (V2 patch) cause additional execution fail of
> pr69719.c
> This FAIL is because of the latent BUG of VSETVL PASS.
>
> So this patch includes VSETVL PASS fix even though it's not related to the
> PR111391.
>
> I have confirm the whole regression no additional FAILs are introduced.
>
> PR target/111391
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (@vec_extract): Remove @.
> (vec_extract): Ditto.
> * config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
> (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
> * config/riscv/riscv.cc (riscv_legitimize_move): Expand move.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
> * gcc.target/riscv/rvv/autovec/pr111391.c: New test.
>
> ---
> gcc/config/riscv/autovec.md | 2 +-
> gcc/config/riscv/riscv-vsetvl.cc | 4 ++-
> gcc/config/riscv/riscv.cc | 32 +++
> .../riscv/rvv/autovec/partial/slp-9.c | 1 -
> .../gcc.target/riscv/rvv/autovec/pr111391.c | 28
> 5 files changed, 64 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index e74a1695709..7121bab1716 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1442,7 +1442,7 @@
> ;; -
> ;; [INT,FP] Extract a vector element.
> ;; -
> -(define_expand "@vec_extract"
> +(define_expand "vec_extract"

Why remove this? I saw this change was introduced in v3?

> [(set (match_operand: 0 "register_operand")
> (vec_select:
> (match_operand:V_VLS 1 "register_operand")
Re: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]
>> Why remove this? I saw this change was introduced in v3?

The "@" was introduced by this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630184.html

At first, I thought I needed to explicitly call

  emit_insn (gen_vec_extract (mode, mode, )

That's why I added it in the last patch. However, I found that I don't need to call gen_vec_extract, so I removed the "@" in this patch:

+      enum insn_code icode
+        = convert_optab_handler (vec_extract_optab, vmode, mode);
+      gcc_assert (icode != CODE_FOR_nothing);
+      class expand_operand ops[3];
+      create_output_operand (&ops[0], dest, mode);
+      ops[0].target = 1;
+      create_input_operand (&ops[1], gen_lowpart (vmode, SUBREG_REG (src)),
+                            vmode);
+      unsigned int index = SUBREG_BYTE (src).to_constant () / mode_size;
+      create_integer_operand (&ops[2], index);
+      expand_insn (icode, 3, ops);

This code is copied from optabs-query.cc

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-14 16:11
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Expand VLS mode to scalar mode move[PR111391]

On Thu, Sep 14, 2023 at 4:04 PM Juzhe-Zhong wrote:
>
> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
>
> I notice that previous patch (V2 patch) cause additional execution fail of
> pr69719.c
> This FAIL is because of the latent BUG of VSETVL PASS.
>
> So this patch includes VSETVL PASS fix even though it's not related to the
> PR111391.
>
> I have confirm the whole regression no additional FAILs are introduced.
>
> PR target/111391
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (@vec_extract): Remove @.
> (vec_extract): Ditto.
> * config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
> (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
> * config/riscv/riscv.cc (riscv_legitimize_move): Expand move.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
> * gcc.target/riscv/rvv/autovec/pr111391.c: New test.
Re: [PATCH] RISC-V: Expand VLS mode to scalar mode move[PR111391]
I just realized that this patch causes some unexpected ICE FAILs in the GCC regression tests.

V2: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630194.html has now fully passed the regression tests.

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-09-13 21:01
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Expand VLS mode to scalar mode move[PR111391]

This patch fixes PR111391: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391

PR target/111391

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand VLS to scalar move.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr111391.c: New test.

---
gcc/config/riscv/riscv.cc | 29 +++
.../riscv/rvv/autovec/partial/slp-9.c | 1 -
.../gcc.target/riscv/rvv/autovec/pr111391.c | 28 ++
3 files changed, 57 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9d04ddd69e0..b7daad7cbb5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2513,6 +2513,35 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
         }
       return true;
     }
+  /* Expand
+       (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+     Expand this data movement instead of simply forbid it since
+     we can improve the code generation for this following scenario
+     by RVV auto-vectorization:
+       (set (reg:V8QI 149) (vec_duplicate:V8QI (reg:QI))
+       (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+     Since RVV mode and scalar mode are in different REG_CLASS,
+     we need to explicitly move data from V_REGS to GR_REGS by scalar move.  */
+  if (SUBREG_P (src) && riscv_v_ext_mode_p (GET_MODE (SUBREG_REG (src))))
+    {
+      rtx subreg = force_reg (GET_MODE (SUBREG_REG (src)), SUBREG_REG (src));
+      machine_mode imode = GET_MODE_INNER (GET_MODE (subreg));
+      unsigned int ratio = GET_MODE_SIZE (mode).to_constant ()
+                           / GET_MODE_SIZE (imode).to_constant ();
+      poly_int64 nunits = GET_MODE_NUNITS (GET_MODE (subreg));
+      nunits = exact_div (nunits, ratio);
+      scalar_mode smode = as_a <scalar_mode> (mode);
+      machine_mode vmode
+        = riscv_vector::get_vector_mode (smode, nunits).require ();
+      rtx tmp = gen_reg_rtx (mode);
+      rtx index
+        = gen_int_mode (exact_div (SUBREG_BYTE (src), GET_MODE_SIZE (smode)),
+                        Pmode);
+      emit_insn (gen_vec_extract (vmode, vmode, tmp,
+                                  gen_lowpart (vmode, subreg), index));
+      emit_move_insn (dest, tmp);
+      return true;
+    }
   /* Expand
        (set (reg:QI target) (mem:QI (address)))
      to
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
index 5fba27c7a35..7c42438c9d9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
@@ -29,4 +29,3 @@ TEST_ALL (VEC_PERM)
 /* { dg-final { scan-assembler-times {viota.m} 2 } } */
-/* { dg-final { scan-assembler-not {vmv\.v\.i} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
new file mode 100644
index 000..a7f64c937c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -Wno-int-conversion -Wno-implicit-function -Wno-incompatible-pointer-types -Wno-implicit-function-declaration -Ofast -ftree-vectorize" } */
+
+int d ();
+typedef struct
+{
+  int b;
+} c;
+int
+e (char *f, long g)
+{
+  f += g;
+  while (g--)
+    *--f = d;
+}
+
+int
+d (c * f)
+{
+  while (h ())
+    switch (f->b)
+      case 'Q':
+      {
+        long a;
+        e (&a, sizeof (a));
+        i (a);
+      }
+}
--
2.36.3
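To make the mode arithmetic in the hunk above easier to follow, here is a small worked sketch of how the element count and the extract index fall out of the sizes. It is plain C++ with made-up names (`ExtractPlan`, `plan_extract`), works on byte counts only, and uses assertions to stand in for the exact divisions; it does not call any GCC API.

```cpp
#include <cassert>

// Hypothetical result type for this sketch.
struct ExtractPlan {
    unsigned nunits; // element count of the vector re-viewed in the scalar mode
    unsigned index;  // element index handed to the extract
};

// scalar_bytes: size of the destination scalar mode (8 for DI)
// elem_bytes:   size of the source vector's inner mode (1 for QI)
// vec_nunits:   element count of the source vector mode (8 for V8QI)
// subreg_byte:  byte offset of the subreg within the vector
static ExtractPlan plan_extract(unsigned scalar_bytes, unsigned elem_bytes,
                                unsigned vec_nunits, unsigned subreg_byte)
{
    assert(elem_bytes != 0 && scalar_bytes % elem_bytes == 0);
    const unsigned ratio = scalar_bytes / elem_bytes;
    // The divisions below must be exact, as in the patch.
    assert(vec_nunits % ratio == 0);
    assert(subreg_byte % scalar_bytes == 0);
    return ExtractPlan{vec_nunits / ratio, subreg_byte / scalar_bytes};
}
```

For the case in the comment, (subreg:DI (reg:V8QI) 0), this gives ratio 8, one DI element and index 0; an SI subreg at byte 4 of the same vector would give two SI elements and index 1.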
Re: Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization
>> Do we need the additional helper function?

Yes. We need the additional helper function since I will call emit_insn (gen_vec_extract (mode, mode) in the following patch, which fixes the PR111391 ICE.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-13 20:31
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

> -(define_expand "vec_extract"
> +(define_expand "@vec_extract"

Do we need the additional helper function? If not let's rather not add them for build-time reasons. The rest is OK, no need for v2.

Regards
Robin
gimple-match: Do not try UNCOND optimization with COND_LEN.
Thanks Robin for fixing it.

-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)

It seems that you shouldn't include this fix in the patch?

+
+  if (len)
+    {
+      /* If we had a COND_LEN before we need to ensure that it stays that
+         way.  */
+      gimple_match_op old_op = *res_op;
+      *res_op = cond_op;
+      maybe_resimplify_conditional_op (seq, res_op, valueize);
+
+      auto cfn = combined_fn (res_op->code);
+      if (internal_fn_p (cfn)
+          && internal_fn_len_index (as_internal_fn (cfn)) != -1)
+        return true;
+
+      *res_op = old_op;
+      return false;
+    }
+  else
+    {
+      *res_op = cond_op;
+      maybe_resimplify_conditional_op (seq, res_op, valueize);
+      return true;
+    }

This looks odd to me. Currently, we never have a cond_len_xxx with a dummy length (length = VF), and we always use cond_xxx if we don't have a loop mask. So the length of a cond_len_xxx is always generated by MIN or SELECT_VL.

I think we don't need gimple simplifications that fold a cond_len_xxx into one of its argument values. But we do need the following optimization:

  negate + cond_len_fma -> cond_len_fnma/cond_len_fms/cond_len_fnms.

That's what I want to support in gimple fold.

Let's see more comments from Richard and Richi.

juzhe.zh...@rivai.ai
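As a rough illustration of the negate + cond_len_fma -> cond_len_fnma fold mentioned above, the sketch below models the operations lane by lane in plain C++. The model is an assumption made for this example, not GCC code: a lane is active only when its index is below len + bias and its mask bit is set, inactive lanes take the else value, and FNMA is taken to mean -(a * b) + c. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mask = std::vector<bool>;

// Simplified scalar model (assumed for this sketch): apply `op` on active
// lanes, copy the else value on inactive ones.
template <typename Op>
static Vec cond_len_op(const Mask &mask, const Vec &a, const Vec &b,
                       const Vec &c, const Vec &else_vals,
                       long len, long bias, Op op)
{
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const bool active = static_cast<long>(i) < len + bias && mask[i];
        out[i] = active ? op(a[i], b[i], c[i]) : else_vals[i];
    }
    return out;
}

// a * b + c on active lanes.
static Vec cond_len_fma(const Mask &m, const Vec &a, const Vec &b,
                        const Vec &c, const Vec &e, long len, long bias)
{
    return cond_len_op(m, a, b, c, e, len, bias,
                       [](double x, double y, double z) { return std::fma(x, y, z); });
}

// -(a * b) + c on active lanes.
static Vec cond_len_fnma(const Mask &m, const Vec &a, const Vec &b,
                         const Vec &c, const Vec &e, long len, long bias)
{
    return cond_len_op(m, a, b, c, e, len, bias,
                       [](double x, double y, double z) { return std::fma(-x, y, z); });
}

// Unconditional element-wise negation, standing in for the separate negate.
static Vec negate(const Vec &v)
{
    Vec r(v);
    for (double &x : r)
        x = -x;
    return r;
}
```

Under this model, cond_len_fma applied to negate(a) and cond_len_fnma applied to a agree on every lane, because the mask, length and else value are untouched by the fold and only the active-lane arithmetic changes.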
Re: Re: [PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]
OK, I added it in V2: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630048.html

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-12 21:29
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

Maybe you want to add PR target/111337 to the changelog? The rest LGTM.

Regards
Robin
Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
Then you don't need to spend time reducing the case from SPEC.

juzhe.zh...@rivai.ai

From: juzhe.zh...@rivai.ai
Date: 2023-09-12 17:36
To: Robin Dapp; gcc-patches
CC: Robin Dapp; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

This is first version of dynamic LMUL. I didn't test it with full GCC testsuite. My plan is to first pass all GCC testsuite (including vect.exp) with default LMUL = M1. Then enable dynamic LMUL to test it.

Maybe we could tolerate this ICE issue for now. Then we can test it with full GCC testsuite (I believe we can reproduce with some case in GCC testsuite in the future).

Is that reasonable ? If yes, I will fix all your comments and send V5.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-12 17:31
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

> Is calculix big ?

It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty. I tested with -Ofast -funroll-loops. I think wrf is even larger, maybe I can run a full comparison test tonight to have good coverage.

> Could you give me the testcase to reproduce it?

OK, I will try to reduce it, will be Fortran, though.

Regards
Robin
Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
This is the first version of dynamic LMUL. I didn't test it with the full GCC testsuite. My plan is to first pass the whole GCC testsuite (including vect.exp) with the default LMUL = M1, and then enable dynamic LMUL and test it.

Maybe we could tolerate this ICE issue for now. Then we can test it with the full GCC testsuite (I believe we will be able to reproduce it with some case in the GCC testsuite in the future).

Is that reasonable? If yes, I will address all your comments and send V5.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-12 17:31
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

> Is calculix big ?

It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty. I tested with -Ofast -funroll-loops. I think wrf is even larger, maybe I can run a full comparison test tonight to have good coverage.

> Could you give me the testcase to reproduce it?

OK, I will try to reduce it, will be Fortran, though.

Regards
Robin
Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
Is calculix big? Could you give me the testcase to reproduce it?

For

+  gcc_assert (biggest_size >= mode_size);

I currently don't have an idea how to fix it.

But for

+  mode = TYPE_MODE (TREE_TYPE (lhs));

I think I can fix it:

  if (!gimple_store_p (stmt))
    {
      tree lhs = gimple_get_lhs (stmt);
      mode = TYPE_MODE (TREE_TYPE (lhs));

I assumed that a statement which is not a STORE always has an LHS. It turns out that this assumption is incorrect. I think I know the fix.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-12 17:17
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

I did some benchmarks and, at least for calculix, the differences are minuscule. I'd say we can stick with the current approach and improve as needed.

However, I noticed ICEs here:

+  gcc_assert (biggest_size >= mode_size);

and here:

+  mode = TYPE_MODE (TREE_TYPE (lhs));

when compiling calculix.

Regards
Robin
Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
Thanks Robin. I have tried your code. It works fine and the tests pass. Does your code have O(n log n) complexity?

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-12 16:19
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model

Hi Juzhe,

> +max_number_of_live_regs (const basic_block bb,
> +                         const hash_map &live_ranges,
> +                         unsigned int max_point, machine_mode biggest_mode,
> +                         int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i;
> +  unsigned int live_point = 0;
> +  auto_vec live_vars_vec;
> +  live_vars_vec.safe_grow (max_point + 1, true);
> +  for (i = 0; i < live_vars_vec.length (); ++i)
> +    live_vars_vec[i] = 0;
> +  for (hash_map::iterator iter = live_ranges.begin ();
> +       iter != live_ranges.end (); ++iter)
> +    {
> +      tree var = (*iter).first;
> +      pair live_range = (*iter).second;
> +      for (i = live_range.first; i <= live_range.second; i++)
> +        {
> +          machine_mode mode = TYPE_MODE (TREE_TYPE (var));
> +          unsigned int nregs
> +            = compute_nregs_for_mode (mode, biggest_mode, lmul);
> +          live_vars_vec[i] += nregs;
> +          if (live_vars_vec[i] > max_nregs)
> +            max_nregs = live_vars_vec[i];
> +        }
> +    }

My concern is that we have O(nm) here, where n = number of live_ranges and m = size of live range. In large basic blocks (think calculix of SPECfp 2006 which can reach up to 2000 instructions IIRC) this might become prohibitive.

I'm going to do a quick benchmark with calculix and report back. If there is no noticeable difference we can ditch my idea.

For short live ranges (like < 10) the O(nm) could be better. As of now, we still calculate the nregs n*m times, though. I have something like the following in mind (it is definitely not shorter, though):

  struct range {
      unsigned int pt;
      bool start;
      unsigned int nregs;
  };

  auto_vec ranges (2 * live_ranges.elements ());
  for (hash_map::iterator iter = live_ranges.begin ();
       iter != live_ranges.end (); ++iter)
    {
      tree var = (*iter).first;
      machine_mode mode = TYPE_MODE (TREE_TYPE (var));
      unsigned int nregs
        = compute_nregs_for_mode (mode, biggest_mode, lmul);
      ranges.quick_push ({(*iter).second.first, true, nregs});
      ranges.quick_push ({(*iter).second.second, false, nregs});
    }

  ranges.qsort ([] (const void *a, const void *b) -> int {
    unsigned int aa = ((const range *)a)->pt;
    unsigned int bb = ((const range *)b)->pt;
    if (aa < bb)
      return -1;
    if (aa == bb)
      return 0;
    return 1;
  });

  unsigned int cur = 0;
  max_nregs = ranges[0].nregs;
  for (auto r : ranges)
    {
      if (r.start)
        cur += r.nregs;
      else
        cur -= r.nregs;
      max_nregs = MAX (max_nregs, cur);
    }

> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +    {
> +      tree t = ssa_name (i);
> +      if (!t)
> +        continue;

Could likely be replaced by

  tree t;
  FOR_EACH_SSA_NAME (i, t, cfun)

> +static void
> +update_local_live_ranges (
> +  vec_info *vinfo,
> +  hash_map> &program_points_per_bb,
> +  hash_map> &live_ranges_per_bb)
> +{

I just realized (sorry) that this is "nested" a bit far. Can we still have e.g.

> +  if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
> +    {

this,

> +      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
> +          != undef_vec_info_type)

this,

> +      if (live_range)
> +        {

and this just "continue"?

Apart from that, LGTM.

Regards
Robin
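On the complexity question: the sweep idea can be tried outside GCC with a small standalone program. The sketch below uses plain C++ and std::sort instead of GCC's auto_vec and qsort; `LiveRange`, `max_live_bruteforce` and `max_live_sweep` are names invented for this example. It assumes live ranges are inclusive at both ends (as the `i <= live_range.second` loop in the quoted patch suggests), so it orders a start before an end at the same program point; that tie-break is a choice made for this sketch, not something stated in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One live range [first, last], inclusive, occupying `nregs` registers.
struct LiveRange {
    unsigned first;
    unsigned last;
    unsigned nregs;
};

// Reference version: O(n * m), touches every program point of every range.
static unsigned max_live_bruteforce(const std::vector<LiveRange> &ranges,
                                    unsigned max_point)
{
    std::vector<unsigned> live(max_point + 1, 0);
    unsigned best = 0;
    for (const LiveRange &r : ranges)
        for (unsigned p = r.first; p <= r.last; ++p) {
            live[p] += r.nregs;
            best = std::max(best, live[p]);
        }
    return best;
}

// Sweep version: two events per range, one sort, one linear pass.
static unsigned max_live_sweep(const std::vector<LiveRange> &ranges)
{
    struct Event {
        unsigned pt;
        bool start;
        unsigned nregs;
    };
    std::vector<Event> events;
    events.reserve(2 * ranges.size());
    for (const LiveRange &r : ranges) {
        events.push_back({r.first, true, r.nregs});
        events.push_back({r.last, false, r.nregs});
    }
    // Sort by program point; at the same point, starts come before ends
    // because both endpoints of a range count as live.
    std::sort(events.begin(), events.end(),
              [](const Event &a, const Event &b) {
                  if (a.pt != b.pt)
                      return a.pt < b.pt;
                  return a.start && !b.start;
              });
    unsigned cur = 0;
    unsigned best = 0;
    for (const Event &e : events) {
        if (e.start) {
            cur += e.nregs;
            best = std::max(best, cur);
        } else {
            cur -= e.nregs;
        }
    }
    return best;
}
```

With n ranges the sweep does one sort of 2n events plus a linear pass, so the sort dominates and the total is O(n log n) regardless of how long the individual ranges are, while the reference version grows with the range lengths.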
Re: [PATCH] RISC-V: Add missed cond autovec testcases
LGTM. juzhe.zh...@rivai.ai From: Lehua Ding Date: 2023-09-12 16:57 To: gcc-patches CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding Subject: [PATCH] RISC-V: Add missed cond autovec testcases This patch adds all missed cond autovec testcases. For not support cond patterns, the following patches will be sent to fix it. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: Add vrem op. * gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_arith-9.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-1.c: Moved to... * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-1.c: ...here. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-2.c: Moved to... * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-2.c: ...here. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-3.c: Moved to... * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-3.c: ...here. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-4.c: Moved to... * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-4.c: ...here. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-5.c: Moved to... * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-5.c: ...here. * gcc.target/riscv/rvv/autovec/cond/cond_logical-1.c: Removed. * gcc.target/riscv/rvv/autovec/cond/cond_logical-2.c: Removed. * gcc.target/riscv/rvv/autovec/cond/cond_logical-3.c: Removed. * gcc.target/riscv/rvv/autovec/cond/cond_logical-4.c: Removed. * gcc.target/riscv/rvv/autovec/cond/cond_logical-5.c: Removed. 
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-9.c: New test. 
--- .../riscv/rvv/autovec/cond/cond_arith-1.c | 13 + .../riscv/rvv/autovec/cond/cond_arith-2.c | 3 ++ .../riscv/rvv/autovec/cond/cond_arith-3.c | 15 ++ .../riscv/rvv/autovec/cond/cond_arith-4.c | 3 ++ .../riscv/rvv/autovec/cond/cond_arith-5.c | 13 + .../riscv/rvv/autovec/cond/cond_arith-6.c | 3 ++ .../riscv/rvv/autovec/cond/cond_arith-7.c | 9 .../riscv/rvv/autovec/cond/cond_arith-8.c | 17 ++- .../riscv/rvv/autovec/cond/cond_arith-9.c | 11 - .../riscv/rvv/autovec/cond/cond_logical-1.c | 43 .../riscv/rvv/autovec/cond/cond_logical-2.c | 43 .../riscv/rvv/autovec/cond/cond_logical-3.c | 43 .../riscv/rvv/autovec/cond/cond_logical-4.c | 43 .../riscv/rvv/autovec/cond/cond_logical-5.c | 43 .../rvv/autovec/cond/cond_logical_min_max-1.c | 49 +++ .../rvv/autovec/cond/cond_logical_min_max-2.c | 49 +++ .../rvv/autovec/cond/cond_logical_min_max-3.c | 49 +++ .../rvv/autovec/cond/cond_logical_min_max-4.c | 49 +++ .../rvv/autovec/cond/cond_logical_min_max-5.c | 49 +++ ...l_run-1.c => cond_logical_min_max_run-1.c} | 2 +- ...l_run-2.c => cond_logical_min_max_run-2.c} | 2 +- ...l_run-3.c => cond_logical_min_max_run-3.c} | 2 +- ...l_run-4.c => cond_logical_min_max_run-4.c} | 2 +- ...l_run-5.c => cond_logical_min_max_run-5.c} | 2 +- .../autovec/cond/cond_widen_complicate-1.c| 35 + .../autovec/cond/cond_widen_complicate-2.c| 35 + .../autovec/cond/cond_widen_complicate-3.c| 36 ++ .../autovec/cond/cond_widen_complicate-4.c| 35 + .../autovec/cond/cond_widen_complicate-5.c| 37 ++ .../autovec/cond/cond_widen_complicate-6.c| 32 .../autovec/cond/cond_widen_complicate-7.c| 29 +++ .../autovec/cond/cond_widen_complicate-8.c| 28 +++ .
Re: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
It looks reasonable to me now. But let's wait for kito's more comments. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-12 16:46 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v3] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic From: Pan Li Update in v3: * Rewrite comment for overloaded function add. * Move get_non_overloaded_instance to function_base. Update in v2: * Add get_non_overloaded_instance for function instance. * Fix overload check for policy function. * Enrich the test cases check. Original log: This patch would like add the framework to support the RVV overloaded intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did. However, it almost leverage the hook TARGET_RESOLVE_OVERLOADED_BUILTIN with below steps. * Register overloaded functions. * Add function_resolver for overloaded function resolving. * Add resolve API for function shape with default implementation. * Implement HOOK for navigating the overloaded API to non-overloaded API. We validated this framework by the vmv_v intrinsic API(s), and we will add more intrins API support in the underlying patches. gcc/ChangeLog: * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New function for the hook. (riscv_register_pragmas): Register the hook * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl. * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register overloaded function. (struct overloaded_base): New struct for overloaded shape. (struct non_overloaded_base): New struct for non overloaded shape. (struct move_def): Inherit overloaded shape. * config/riscv/riscv-vector-builtins.cc (function_base::get_non_overloaded_instance): New API impl. (function_builder::add_function): Add overloaded arg. (function_resolver::function_resolver): New constructor. (function_builder::add_overloaded_function): New API impl. (function_resolver::resolve): Ditto. (function_resolver::lookup): Ditto. 
(function_resolver::get_sub_code): Ditto. (resolve_overloaded_builtin): New function impl. * config/riscv/riscv-vector-builtins.h: (class function_resolver): New class. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-c.cc | 36 gcc/config/riscv/riscv-protos.h | 1 + .../riscv/riscv-vector-builtins-shapes.cc | 20 ++- gcc/config/riscv/riscv-vector-builtins.cc | 155 +- gcc/config/riscv/riscv-vector-builtins.h | 36 +++- .../riscv/rvv/base/overloaded_rv32_vmv_v.c| 8 + .../riscv/rvv/base/overloaded_rv64_vmv_v.c| 8 + .../riscv/rvv/base/overloaded_vmv_v.h | 27 +++ 8 files changed, 288 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc index 283052ae313..060edd3129d 100644 --- a/gcc/config/riscv/riscv-c.cc +++ b/gcc/config/riscv/riscv-c.cc @@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec arg_loc, tree fndecl, gcc_unreachable (); } +/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN. 
*/ +static tree +riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl, + void *uncast_arglist) +{ + vec<tree, va_gc> empty = {}; + location_t loc = (location_t) uncast_location; + vec<tree, va_gc> *arglist = (vec<tree, va_gc> *) uncast_arglist; + unsigned int code = DECL_MD_FUNCTION_CODE (fndecl); + unsigned int subcode = code >> RISCV_BUILTIN_SHIFT; + tree new_fndecl = NULL_TREE; + + if (!arglist) +arglist = &empty; + + switch (code & RISCV_BUILTIN_CLASS) +{ +case RISCV_BUILTIN_GENERAL: + break; +case RISCV_BUILTIN_VECTOR: + new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode, + arglist); + break; +default: + gcc_unreachable (); +} + + if (new_fndecl == NULL_TREE) +return new_fndecl; + + return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL, + fndecl); +} + /* Implement REGISTER_TARGET_PRAGMAS. */ void riscv_register_pragmas (void) { + targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin; targetm.check_builtin_call = riscv_check_builtin_call; + c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic); } diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 6dbf6b9f943..5d2492dd031 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -381,6 +381,7 @@ gimple *gimple_fold_builtin (unsigned int, gimple_stmt_iterator *, gcall *); rtx expand
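The hook above masks the builtin code with RISCV_BUILTIN_CLASS to pick the builtin class and shifts it by RISCV_BUILTIN_SHIFT to get the per-class subcode. A minimal standalone sketch of that packing is below. The SKETCH_* names, the one-bit shift, and the encode helper are assumptions made for illustration only; they are not taken from the patch.

```cpp
// Assumed layout for illustration: the low bit(s) hold the builtin class,
// the remaining bits hold the per-class subcode.
constexpr unsigned SKETCH_BUILTIN_SHIFT = 1;
constexpr unsigned SKETCH_BUILTIN_CLASS = (1u << SKETCH_BUILTIN_SHIFT) - 1;

// Hypothetical stand-in for the builtin class enum.
enum sketch_builtin_class
{
  SKETCH_BUILTIN_GENERAL = 0,
  SKETCH_BUILTIN_VECTOR = 1
};

struct decoded_builtin
{
  sketch_builtin_class klass;
  unsigned subcode;
};

// Pack a class and a subcode into one function code.
constexpr unsigned
encode_builtin (sketch_builtin_class klass, unsigned subcode)
{
  return (subcode << SKETCH_BUILTIN_SHIFT) | static_cast<unsigned> (klass);
}

// Split a function code the same way the hook does:
// class = code & CLASS mask, subcode = code >> SHIFT.
constexpr decoded_builtin
decode_builtin (unsigned code)
{
  return { static_cast<sketch_builtin_class> (code & SKETCH_BUILTIN_CLASS),
	   code >> SKETCH_BUILTIN_SHIFT };
}
```

With this layout, only codes whose class bits say "vector" would be forwarded to the vector resolver, and the subcode is what indexes the table of registered functions.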
Re: [PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
I think it's better to move 'get_non_overloaded_instance' into function_base. + /* To avoid API conflicting, we use void return type and void argument + for the overloaded function register, like aarch64-sve. */ Plz rewrite the comments, don't mention aarch64 sve. Could you run your rvv intrinsic api ci with this patch? I am worrying that the resolve stuff will destroy the existing APi support. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-12 15:20 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v2] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic From: Pan Li Update in v2: * Add get_non_overloaded_instance for function instance. * Fix overload check for policy function. * Enrich the test cases check. Original log: This patch would like add the framework to support the RVV overloaded intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did. However, it almost leverage the hook TARGET_RESOLVE_OVERLOADED_BUILTIN with below steps. * Register overloaded functions. * Add function_resolver for overloaded function resolving. * Add resolve API for function shape with default implementation. * Implement HOOK for navigating the overloaded API to non-overloaded API. We validated this framework by the vmv_v intrinsic API(s), and we will add more intrins API support in the underlying patches. gcc/ChangeLog: * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New function for the hook. (riscv_register_pragmas): Register the hook * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl. * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register overloaded function. (struct overloaded_base): New struct for overloaded shape. (struct non_overloaded_base): New struct for non overloaded shape. (struct move_def): Inherit overloaded shape. * config/riscv/riscv-vector-builtins.cc (function_instance::get_non_overloaded_instance): New API impl. (function_builder::add_function): Add overloaded arg. 
(function_resolver::function_resolver): New constructor. (function_builder::add_overloaded_function): New API impl. (function_resolver::resolve): Ditto. (function_resolver::lookup): Ditto. (function_resolver::get_sub_code): Ditto. (resolve_overloaded_builtin): New function impl. * config/riscv/riscv-vector-builtins.h: (class function_resolver): New class. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-c.cc | 36 gcc/config/riscv/riscv-protos.h | 1 + .../riscv/riscv-vector-builtins-shapes.cc | 20 ++- gcc/config/riscv/riscv-vector-builtins.cc | 155 +- gcc/config/riscv/riscv-vector-builtins.h | 35 +++- .../riscv/rvv/base/overloaded_rv32_vmv_v.c| 8 + .../riscv/rvv/base/overloaded_rv64_vmv_v.c| 8 + .../riscv/rvv/base/overloaded_vmv_v.h | 27 +++ 8 files changed, 287 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc index 283052ae313..060edd3129d 100644 --- a/gcc/config/riscv/riscv-c.cc +++ b/gcc/config/riscv/riscv-c.cc @@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec arg_loc, tree fndecl, gcc_unreachable (); } +/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN. 
*/ +static tree +riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl, + void *uncast_arglist) +{ + vec<tree, va_gc> empty = {}; + location_t loc = (location_t) uncast_location; + vec<tree, va_gc> *arglist = (vec<tree, va_gc> *) uncast_arglist; + unsigned int code = DECL_MD_FUNCTION_CODE (fndecl); + unsigned int subcode = code >> RISCV_BUILTIN_SHIFT; + tree new_fndecl = NULL_TREE; + + if (!arglist) +arglist = &empty; + + switch (code & RISCV_BUILTIN_CLASS) +{ +case RISCV_BUILTIN_GENERAL: + break; +case RISCV_BUILTIN_VECTOR: + new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode, + arglist); + break; +default: + gcc_unreachable (); +} + + if (new_fndecl == NULL_TREE) +return new_fndecl; + + return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL, + fndecl); +} + /* Implement REGISTER_TARGET_PRAGMAS. */ void riscv_register_pragmas (void) { + targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin; targetm.check_builtin_call = riscv_check_builtin_call; + c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic); } diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h inde
Re: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
>> As long as we're just looking for the maximum number of live registers, we can use a sliding-window approach: create a structure with all start and end points, sort it, and increase the current pressure if we start a new range or decrease. That's O(n log n). I fail to see how it would help. The current approach is straightforward: for (hash_map::iterator iter = live_ranges.begin (); iter != live_ranges.end (); ++iter) { tree var = (*iter).first; pair live_range = (*iter).second; for (i = live_range.first; i <= live_range.second; i++) { machine_mode mode = TYPE_MODE (TREE_TYPE (var)); unsigned int nregs = compute_nregs_for_mode (mode, biggest_mode, lmul); live_vars_vec[i] += nregs; if (live_vars_vec[i] > max_nregs) max_nregs = live_vars_vec[i]; } } Could you revise this piece of code? The other comments have been addressed in V4: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629959.html juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-09-12 04:31 To: Juzhe-Zhong; gcc-patches CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw Subject: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model Hi Juzhe, glad that we can use the dominator info directly. Could we move the calculation of the info to the beginning (if it's not available)? That makes it clearer that it's a prerequisite. Function comments look good now. Some general remarks kind of similar to v1: - I would prefer a hash_map or similar to hold the end point for a range instead of looking through potentially all ranges in contrived cases. - As long as we're just looking for the maximum number of live registers, we can use a sliding-window approach: create a structure with all start and end points, sort it, and increase the current pressure if we start a new range or decrease. That's O(n log n). 
> + const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t)); > + const ssa_use_operand_t *ptr; > + > + for (ptr = head->next; ptr != head; ptr = ptr->next) > + { Why does FOR_EACH_IMM_USE not work here? > + unsigned int max_point > + = (*program_points_per_bb.get (e->src)).length () - 1; > + for (k = 0; k < (*live_ranges).length (); k++) > + { > + if ((*live_ranges)[i].var == def) Would also be nice not having to search through all ranges but just index/hash it via var (or similar). What about one test with global live ranges? Not a necessity IMHO we can still add it later. Regards Robin
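The sliding-window idea discussed in this message can be sketched outside of GCC as follows. This is a minimal illustration using standard containers, not code from the patch: live_range_sketch and max_live_nregs are hypothetical names, each range is assumed to be an inclusive pair of program points, and the register count per variable is assumed to be precomputed (the role compute_nregs_for_mode plays in the loop quoted above).

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for one live range: inclusive program points
// [first, last] plus the number of registers the variable occupies.
struct live_range_sketch
{
  unsigned first;
  unsigned last;
  unsigned nregs;
};

// Return the maximum number of simultaneously live registers.
// Cost is O(n log n) in the number of ranges, independent of their length.
unsigned
max_live_nregs (const std::vector<live_range_sketch> &ranges)
{
  // Two events per range: +nregs at FIRST and -nregs just after LAST
  // (ranges are inclusive, so the variable is still live at LAST).
  std::vector<std::pair<unsigned, long>> events;
  events.reserve (ranges.size () * 2);
  for (const live_range_sketch &r : ranges)
    {
      events.emplace_back (r.first, static_cast<long> (r.nregs));
      events.emplace_back (r.last + 1, -static_cast<long> (r.nregs));
    }

  // Pairs sort by point first, then by delta, so at the same point the
  // negative deltas (range ends) are applied before the positive ones.
  std::sort (events.begin (), events.end ());

  long pressure = 0;
  long max_pressure = 0;
  for (const auto &e : events)
    {
      pressure += e.second;
      max_pressure = std::max (max_pressure, pressure);
    }
  return static_cast<unsigned> (max_pressure);
}
```

The quoted per-point loop costs time proportional to the total length of all ranges, while the sweep only depends on the number of ranges. The trade-off is that the sweep yields only the maximum, not the per-point live_vars_vec array, so it fits only if nothing else needs that array.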
Re: [PATCH] RISC-V: Add vcreate intrinsics for RVV tuple types
Thanks for support it. LGTM from my side. Wait for kito's more comments. juzhe.zh...@rivai.ai From: Li Xu Date: 2023-09-12 10:08 To: gcc-patches CC: kito.cheng; palmer; juzhe.zhong; pan2.li; gaofei; wangfeng; xuli Subject: [PATCH] RISC-V: Add vcreate intrinsics for RVV tuple types From: xuli gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (class vcreate): (BASE): New class. * config/riscv/riscv-vector-builtins-bases.h: Ditto. * config/riscv/riscv-vector-builtins-functions.def (vcreate): Add vcreate support. * config/riscv/riscv-vector-builtins-shapes.cc (struct vcreate_def): Ditto. (SHAPE): Ditto. * config/riscv/riscv-vector-builtins-shapes.h: Ditto. * config/riscv/riscv-vector-builtins.cc: Add args type. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/tuple_create.c: New test. --- .../riscv/riscv-vector-builtins-bases.cc | 40 ++ .../riscv/riscv-vector-builtins-bases.h | 1 + .../riscv/riscv-vector-builtins-functions.def | 1 + .../riscv/riscv-vector-builtins-shapes.cc | 50 +++ .../riscv/riscv-vector-builtins-shapes.h | 1 + gcc/config/riscv/riscv-vector-builtins.cc | 12 ++ .../gcc.target/riscv/rvv/base/tuple_create.c | 123 ++ 7 files changed, 228 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple_create.c diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index 8e679f72392..be3df2c1ea2 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -1824,6 +1824,44 @@ public: } }; +class vcreate : public function_base +{ +public: + gimple *fold (gimple_folder &f) const override + { +unsigned int nargs = gimple_call_num_args (f.call); +tree lhs_type = TREE_TYPE (f.lhs); + +/* Replace the call with a clobber of the result (to prevent it from + becoming upwards exposed) followed by stores into each individual + vector of tuple. 
+ + The fold routines expect the replacement statement to have the + same lhs as the original call, so return the clobber statement + rather than the final vector store. */ +gassign *clobber = gimple_build_assign (f.lhs, build_clobber (lhs_type)); + +for (unsigned int i = nargs; i-- > 0; ) + { + tree rhs_vector = gimple_call_arg (f.call, i); + tree field = tuple_type_field (TREE_TYPE (f.lhs)); + tree lhs_array = build3 (COMPONENT_REF, TREE_TYPE (field), + unshare_expr (f.lhs), field, NULL_TREE); + tree lhs_vector = build4 (ARRAY_REF, TREE_TYPE (rhs_vector), + lhs_array, size_int (i), + NULL_TREE, NULL_TREE); + gassign *assign = gimple_build_assign (lhs_vector, rhs_vector); + gsi_insert_after (f.gsi, assign, GSI_SAME_STMT); + } +return clobber; + } + + rtx expand (function_expander &e) const override + { +return NULL_RTX; + } +}; + class read_vl : public function_base { public: @@ -2285,6 +2323,7 @@ static CONSTEXPR const vlmul_ext vlmul_ext_obj; static CONSTEXPR const vlmul_trunc vlmul_trunc_obj; static CONSTEXPR const vset vset_obj; static CONSTEXPR const vget vget_obj; +static CONSTEXPR const vcreate vcreate_obj; static CONSTEXPR const read_vl read_vl_obj; static CONSTEXPR const vleff vleff_obj; static CONSTEXPR const vlenb vlenb_obj; @@ -2546,6 +2585,7 @@ BASE (vlmul_ext) BASE (vlmul_trunc) BASE (vset) BASE (vget) +BASE (vcreate) BASE (read_vl) BASE (vleff) BASE (vlenb) diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h index 69d4562091f..131041ea66f 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.h +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h @@ -267,6 +267,7 @@ extern const function_base *const vlmul_ext; extern const function_base *const vlmul_trunc; extern const function_base *const vset; extern const function_base *const vget; +extern const function_base *const vcreate; extern const function_base *const read_vl; extern const function_base *const vleff; extern const function_base 
*const vlenb; diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def index 3ce06dc60b7..18ed2c2b8f6 100644 --- a/gcc/config/riscv/riscv-vector-builtins-functions.def +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def @@ -621,6 +621,7 @@ DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul4_x2_ops) // Tuple types DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops) DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_tuple_ops) +DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_tuple_ops) DEF_RVV_FUNCTION (vlseg, seg_loadstore, full_preds, tuple_v_scalar_const_ptr_ops) DEF_RVV_FUNCTION (vsseg, seg_loadstore, none_m_preds, tuple_v_scalar_ptr_ops) DEF_RVV_FUNCTION (vlsseg, seg_loadstore, full_preds, tuple_v_scalar_const_ptr_p
Re: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
Add a function call get_non_overloaded_instance into instance. The instance already know it is void vmv (void). In this function search the arglist. and return the real non-overloaded decl. juzhe.zh...@rivai.ai From: Li, Pan2 Date: 2023-09-12 09:20 To: 钟居哲 CC: kito.cheng; gcc-patches; Wang, Yanzhang Subject: RE: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic We cannot leverage this instance for correctness. The rfun of below code is the overloaded builtin is for the overloaded function, which is registered as void xxx(void) as aarch64 did to avoid the conflict. Let’s take vmv_v_i32m1 as example in rfun table. Index 0: void vmv_v(void) overloaded Index 1: i32m1 vmv_v_v_i32m1_i32m1 (i32m1, size_t) non-overloaded Index 2: placeholder. When we enter the hook(aka the code list below), the rfun we have is the index 0 rfun instead of index 1. Then we need the arglist to lookup the rfun of index 1 for the underlying call, as well as build the instance for the index 1 rfun. Aarch64 has the same rfun table as above, they leverage a loop to parse the arglist with machine mode matching in a predefined type suffix(which is not available in RISC-V). I think they almost try to resolve the same problem but different implement details. Pan From: 钟居哲 Sent: Tuesday, September 12, 2023 7:20 AM To: Li, Pan2 Cc: kito.cheng ; gcc-patches ; Wang, Yanzhang Subject: Re: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic I don't understand. +tree+resolve_overloaded_builtin (location_t loc, unsigned int code,+ vec *arglist)+{+ if (code >= vec_safe_length (registered_functions))+return NULL_TREE;++ const registered_function *rfun = (*registered_functions)[code];++ if (!rfun || !rfun->overloaded_p)+ return NULL_TREE;++ return function_resolver (loc, rfun->instance, rfun->decl, *arglist)+.resolve ();+} You already have rfun->instance. Just use this instance should be good enough. 
juzhe.zh...@rivai.ai From: Li, Pan2 Date: 2023-09-11 23:24 To: 钟居哲 CC: kito.cheng; gcc-patches; Wang, Yanzhang Subject: RE: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic For function instance with void or void arguments, it is easy as you mentioned as below. For generate API (to get the right hash), you need to build the rvv_type_info, predications_type_index and rvv_op_info from the arglist (aka vec) from hook. Then we need to construct above parameters from one tree argument. Sorry I not sure if I understand correctly but I failed to locate somewhere has similar usage. Could you please help to insight me some best practice about the transformation from tree to above types? Pan From: 钟居哲 Sent: Monday, September 11, 2023 9:07 PM To: Li, Pan2 Cc: kito.cheng ; gcc-patches ; Wang, Yanzhang Subject: Re: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic function_instance get_read_vl_instance (void) { return function_instance ("read_vl", bases::read_vl, shapes::read_vl, none_ops[0], PRED_TYPE_none, &p_none_void_ops); } tree get_read_vl_decl (void) { function_instance instance = get_read_vl_instance (); hashval_t hash = instance.hash (); registered_function *rfn = function_table->find_with_hash (instance, hash); gcc_assert (rfn); return rfn->decl; } You should reference it. I don't see why it's hard for use to construct instance first, then use that instance hash to get the decl. juzhe.zh...@rivai.ai From: Li, Pan2 Date: 2023-09-11 20:26 To: juzhe.zhong CC: kito.cheng; gcc-patches; Wang, Yanzhang Subject: RE: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic > No. You must construct instance. 'strcmp' is very ugly. Strcmp here is defensive code here for early exit if not found (can be removed for correctness), which is not required to find the right declaration. 
Pan From: juzhe.zhong Sent: Monday, September 11, 2023 8:20 PM To: Li, Pan2 Cc: kito.cheng ; gcc-patches ; Wang, Yanzhang Subject: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic No. You must construct instance. 'strcmp' is very ugly. Replied Message From Li, Pan2 Date 09/11/2023 20:09 To juzhe.zh...@rivai.ai, kito.cheng Cc gcc-patches, Wang, Yanzhang Subject RE: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic > -if (overloaded_p && instance.pred == PRED_TYPE_m) > +if (overloaded_p) Thanks for pointing this out, my misunderstanding for policy function result in this change as mistake, will send V2 for this. > Plz change it into : Actually, it is not easy to convert to this approach as > aarch64 has different implementation of types information.Like > type_suffix_info (aarch64 loop type suffix to get the arglist type in > infer_vecto
Re: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
>> What about one test with global live ranges? Not a necessity IMHO we can >> still >> add it later. We already have. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-09-12 04:31 To: Juzhe-Zhong; gcc-patches CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw Subject: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model Hi Juzhe, glad that we can use the dominator info directly. Could we move the calculation of the info to the beginning (if it's not available)? That makes it clearer that it's a prerequisite. Function comments look good now. Some general remarks kind of similar to v1: - I would prefer a hash_map or similar to hold the end point for a range instead of looking through potentially all ranges in contrived cases. - As long as we're just looking for the maximum number of live registers, we can use a sliding-window approach: create a structure with all start and end points, sort it, and increase the current pressure if we start a new range or decrease. That's O(n log n). > + const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t)); > + const ssa_use_operand_t *ptr; > + > + for (ptr = head->next; ptr != head; ptr = ptr->next) > + { Why does FOR_EACH_IMM_USE not work here? > + unsigned int max_point > + = (*program_points_per_bb.get (e->src)).length () - 1; > + for (k = 0; k < (*live_ranges).length (); k++) > + { > + if ((*live_ranges)[i].var == def) Would also be nice not having to search through all ranges but just index/hash it via var (or similar). What about one test with global live ranges? Not a necessity IMHO we can still add it later. Regards Robin
Re: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
>> Just make sure it's the right change? It seem incorrect to me. More comments (I just reviewed again): +tree +function_resolver::lookup () +{ + unsigned int code_limit = vec_safe_length (registered_functions); + + for (unsigned code = get_sub_code () + 1; code < code_limit; code++) +{ + registered_function *rfun = (*registered_functions)[code]; + function_instance instance = rfun->instance; + + if (strcmp (base_name, instance.base_name) != 0) + break; + + if (rfun->overloaded_p) + continue; + + unsigned k; + const rvv_arg_type_info *args = instance.op_info->args; + + for (k = 0; args[k].base_type != NUM_BASE_TYPES; k++) + { + if (k >= m_arglist.length ()) + break; + + if (TYPE_MODE (instance.get_arg_type (k)) + != TYPE_MODE (TREE_TYPE (m_arglist[k]))) + break; + } + + if (args[k].base_type == NUM_BASE_TYPES) + return rfun->decl; +} + + return NULL_TREE; +} Plz change it into : /* Silently check whether there is an instance of the function with the mode suffix given by MODE and the type suffixes given by TYPE0 and TYPE1. Return its function decl if so, otherwise return null. */ tree function_resolver::lookup_form (mode_suffix_index mode, type_suffix_index type0, type_suffix_index type1) { type_suffix_pair types = { type0, type1 }; function_instance instance (base_name, base, shape, mode, types, pred); registered_function *rfn = function_table->find_with_hash (instance, instance.hash ()); return rfn ? rfn->decl : NULL_TREE; } juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-09-11 17:04 To: juzhe.zh...@rivai.ai CC: pan2.li; gcc-patches; yanzhang.wang Subject: Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic > @@ -545,7 +563,7 @@ struct move_def : public build_base > /* According to rvv-intrinsic-doc, it does not add "_m" suffix > for vop_m C++ overloaded API. */ > -if (overloaded_p && instance.pred == PRED_TYPE_m) > +if (overloaded_p) Just make sure it's the right change? 
>return b.finish_name (); > b.append_name (predication_suffixes[instance.pred]); > return b.finish_name ();
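The difference between the two lookup styles in this review (scanning the registered functions with strcmp versus constructing an instance and finding it by hash) can be illustrated with a simplified, standalone sketch. Everything below is hypothetical: instance_key stands in for function_instance, an int stands in for the function decl, and std::unordered_map stands in for the hash table; none of these names come from the GCC sources.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a function instance: the base
// name plus the machine modes of the arguments (plain ints here).
struct instance_key
{
  std::string base_name;
  std::vector<int> arg_modes;

  bool operator== (const instance_key &other) const
  {
    return base_name == other.base_name && arg_modes == other.arg_modes;
  }
};

// Hash every field that takes part in operator==.
struct instance_key_hash
{
  std::size_t operator() (const instance_key &k) const
  {
    std::size_t h = std::hash<std::string> () (k.base_name);
    for (int m : k.arg_modes)
      h = h * 31 + std::hash<int> () (m);
    return h;
  }
};

// Maps an instance to a decl id (an int stands in for the tree decl).
using decl_table = std::unordered_map<instance_key, int, instance_key_hash>;

// Build the key from the call's base name and argument modes, then do a
// single hashed lookup.  Returns -1 when no such instance is registered.
int
lookup_decl (const decl_table &table, const std::string &base_name,
	     const std::vector<int> &arg_modes)
{
  auto it = table.find (instance_key{base_name, arg_modes});
  return it == table.end () ? -1 : it->second;
}
```

The point of the sketch is only that the caller builds the full key from the argument list and performs one lookup, so no string comparison or ordering assumption on the registration table is needed.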
Re: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
Thanks for supporting it even though I don't like this feature :). The framework is LGTM. Let's wait for kito's more comments. juzhe.zh...@rivai.ai From: pan2.li Date: 2023-09-11 15:57 To: gcc-patches CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng Subject: [PATCH v1] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic From: Pan Li This patch would like add the framework to support the RVV overloaded intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did. However, it almost leverage the hook TARGET_RESOLVE_OVERLOADED_BUILTIN with below steps. * Register overloaded functions. * Add function_resolver for overloaded function resolving. * Add resolve API for function shape with default implementation. * Implement HOOK for navigating the overloaded API to non-overloaded API. We validated this framework by the vmv_v intrinsic API(s), and we will add more intrins API support in the underlying patches. gcc/ChangeLog: * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New function for the hook. (riscv_register_pragmas): Register the hook * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl. * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register overloaded function. (struct overloaded_base): New struct for overloaded shape. (struct non_overloaded_base): New struct for non overloaded shape. (struct move_def): Inherit overloaded shape. * config/riscv/riscv-vector-builtins.cc (function_builder::add_function): Add overloaded arg. (function_builder::add_overloaded_function): New function impl. (function_resolver::function_resolver): New constructor. (function_resolver::get_sub_code): New API impl. (function_resolver::resolve): New API impl. (function_resolver::lookup): New API impl. (resolve_overloaded_builtin): New func impl. * config/riscv/riscv-vector-builtins.h (class function_resolver): New class. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c: New test. 
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c: New test. * gcc.target/riscv/rvv/base/overloaded_vmv_v.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv-c.cc | 36 + gcc/config/riscv/riscv-protos.h | 1 + .../riscv/riscv-vector-builtins-shapes.cc | 22 ++- gcc/config/riscv/riscv-vector-builtins.cc | 138 +- gcc/config/riscv/riscv-vector-builtins.h | 30 +++- .../riscv/rvv/base/overloaded_rv32_vmv_v.c| 4 + .../riscv/rvv/base/overloaded_rv64_vmv_v.c| 4 + .../riscv/rvv/base/overloaded_vmv_v.h | 17 +++ 8 files changed, 248 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv32_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_rv64_vmv_v.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/overloaded_vmv_v.h diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc index 283052ae313..060edd3129d 100644 --- a/gcc/config/riscv/riscv-c.cc +++ b/gcc/config/riscv/riscv-c.cc @@ -220,11 +220,47 @@ riscv_check_builtin_call (location_t loc, vec arg_loc, tree fndecl, gcc_unreachable (); } +/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN. */ +static tree +riscv_resolve_overloaded_builtin (unsigned int uncast_location, tree fndecl, + void *uncast_arglist) +{ + vec<tree, va_gc> empty = {}; + location_t loc = (location_t) uncast_location; + vec<tree, va_gc> *arglist = (vec<tree, va_gc> *) uncast_arglist; + unsigned int code = DECL_MD_FUNCTION_CODE (fndecl); + unsigned int subcode = code >> RISCV_BUILTIN_SHIFT; + tree new_fndecl = NULL_TREE; + + if (!arglist) +arglist = &empty; + + switch (code & RISCV_BUILTIN_CLASS) +{ +case RISCV_BUILTIN_GENERAL: + break; +case RISCV_BUILTIN_VECTOR: + new_fndecl = riscv_vector::resolve_overloaded_builtin (loc, subcode, + arglist); + break; +default: + gcc_unreachable (); +} + + if (new_fndecl == NULL_TREE) +return new_fndecl; + + return build_function_call_vec (loc, vNULL, new_fndecl, arglist, NULL, + fndecl); +} + /* Implement REGISTER_TARGET_PRAGMAS. 
*/ void riscv_register_pragmas (void) { + targetm.resolve_overloaded_builtin = riscv_resolve_overloaded_builtin; targetm.check_builtin_call = riscv_check_builtin_call; + c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic); } diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 6dbf6b9f943..5d2492dd031 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -381,6 +381,7 @@ gimple *gimple_fold_builtin (unsigned int, gimple_stmt_iterator *, gcall *); rtx expand_builtin (unsigned int, tree, rtx); bool check_builtin_call (location_t, vec, unsigned int, tree, unsigned int, tree *); +tree resolve_overloaded_builtin (location_t, unsigned int, vec *); bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT); bool legit
Re: Re: [PATCH] RISC-V: Use dominance analysis in global vsetvl elimination
Committed. Thanks kito. >> I guess you will remove get_all_predecessors once LMUL cost >> model can use dominator info as well? Yes. I am trying but there is a failed case for dynamic LMUL. Not sure whether it can work now. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-09-11 15:03 To: Juzhe-Zhong CC: gcc-patches; kito.cheng Subject: Re: [PATCH] RISC-V: Use dominance analysis in global vsetvl elimination LGTM, and I guess you will remove get_all_predecessors once LMUL cost model can use dominator info as well? On Mon, Sep 11, 2023 at 11:34 AM Juzhe-Zhong wrote: > > I found that it's more reasonable to use existing dominance analysis. > > gcc/ChangeLog: > > * config/riscv/riscv-vsetvl.cc > (pass_vsetvl::global_eliminate_vsetvl_insn): Use dominance analysis. > (pass_vsetvl::init): Ditto. > (pass_vsetvl::done): Ditto. > > --- > gcc/config/riscv/riscv-vsetvl.cc | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/riscv/riscv-vsetvl.cc > b/gcc/config/riscv/riscv-vsetvl.cc > index 134b97737ae..f81361c4ccd 100644 > --- a/gcc/config/riscv/riscv-vsetvl.cc > +++ b/gcc/config/riscv/riscv-vsetvl.cc > @@ -4054,7 +4054,7 @@ pass_vsetvl::global_eliminate_vsetvl_insn (const > bb_info *bb) const > } > >/* Step1: Reshape the VL/VTYPE status to make sure everything compatible. > */ > - hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb); > + auto_vec pred_cfg_bbs = get_dominated_by > (CDI_POST_DOMINATORS, cfg_bb); >FOR_EACH_EDGE (e, ei, cfg_bb->preds) > { >sbitmap avout = m_vector_manager->vector_avout[e->src->index]; > @@ -4243,6 +4243,7 @@ pass_vsetvl::init (void) > { >/* Initialization of RTL_SSA. */ >calculate_dominance_info (CDI_DOMINATORS); > + calculate_dominance_info (CDI_POST_DOMINATORS); >df_analyze (); >crtl->ssa = new function_info (cfun); > } > @@ -4264,6 +4265,7 @@ pass_vsetvl::done (void) > { >/* Finalization of RTL_SSA. 
*/ >free_dominance_info (CDI_DOMINATORS); > + free_dominance_info (CDI_POST_DOMINATORS); >if (crtl->ssa->perform_pending_updates ()) > cleanup_cfg (0); >delete crtl->ssa; > -- > 2.36.3 >
Re: [PATCH] RISC-V: Enable RVV scalable vectorization by default[PR111311]
Ping for this patch. I think it's time to enable scalable vectorization by default and run the whole regression every time (except vect.exp, which we haven't enabled yet). Updated status of the current FAILs: Real FAILs (ICE and execution FAIL): FAIL: gcc.dg/pr70252.c (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:284) FAIL: gcc.dg/pr70252.c (test for excess errors) FAIL: gcc.dg/pr92301.c execution test Robin is working on these 3 issues and they will be solved soon. FAIL: g++.dg/torture/vshuf-v4df.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in as_a, at machmode.h:381) FAIL: g++.dg/torture/vshuf-v4df.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: g++.dg/torture/vshuf-v4df.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error: in as_a, at machmode.h:381) FAIL: g++.dg/torture/vshuf-v4df.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) This is a long-known issue that I have mentioned many times; we need help with LTO since it's caused by the mode bits extension. 
The remaining bogus FAILs:

FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "Not unrolling loop, doesn't roll"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "likely upper bound: 6"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "realistic bound: -1"
FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump cunroll "optimized: loop with [0-9]+ iterations completely unrolled"
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump-not optimized "foo"
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11c.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/loop-bound-1.c scan-tree-dump ivopts "bounded by 254"
FAIL: gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump ivopts "bounded by 254"
FAIL: gcc.dg/tree-ssa/predcom-2.c scan-tree-dump-times pcom "Unrolling 2 times." 2
FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom "Combination" 1
FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom "Unrolling 3 times." 1
FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom "Combination" 2
FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom "Unrolling 3 times." 1
FAIL: gcc.dg/tree-ssa/predcom-9.c scan-tree-dump pcom "Executing predictive commoning without unrolling"
FAIL: gcc.dg/tree-ssa/reassoc-46.c scan-tree-dump-times optimized "(?:vect_)?sum_[\\d._]+ = (?:(?:vect_)?_[\\d._]+ \\+ (?:vect_)?sum_[\\d._]+|(?:vect_)?sum_[\\d._]+ \\+ (?:vect_)?_[\\d._]+)" 1
FAIL: gcc.dg/tree-ssa/scev-10.c scan-tree-dump-times ivopts " Type:\\tREFERENCE ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/scev-11.c scan-tree-dump-times ivopts " Type:\\tREFERENCE ADDRESS\n" 2
FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness wrto loop niter:\tNo-overflow"
FAIL: gcc.dg/tree-ssa/scev-9.c scan-tree-dump-times ivopts " Type:\\tREFERENCE ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/split-path-11.c scan-tree-dump-times split-paths "join point for if-convertable half-diamond" 1

These are bogus dump FAILs and I have confirmed each of them; we have the same behavior as SVE.

So is this patch OK for trunk?

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-09-07 15:28
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Enable RVV scalable vectorization by default[PR111311]

This patch is not ready yet, but the remaining issues will all be fixed very soon.

gcc/ChangeLog:

        * config/riscv/riscv.opt: Set default as scalable vectorization.

---
 gcc/config/riscv/riscv.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 98f342348b7..bf2eca08221 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -292,7 +292,7 @@ EnumValue
 Enum(riscv_autovec_preference) String(fixed-vlmax) Value(RVV_FIXED_VLMAX)

 -param=riscv-autovec-preference=
-Target RejectNegative Joined Enum(riscv_autovec_preference) Var(riscv_autovec_preference) Init(NO_AUTOVEC)
+Target RejectNegative Joined Enum(riscv_autovec_preference) Var(riscv_autovec_preference) Init(RVV_SCALABLE)
 -param=riscv-autovec-preference= Set the preference of auto-vectorization in the RISC-V port.

 Enum
--
2.36.3
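For readers who want to see what the default change affects, here is a minimal sketch of a loop with a runtime trip count, which is the kind of code a length-agnostic (scalable) vectorizer targets. The function names `add_arrays` and `add_arrays_selfcheck` are hypothetical and not part of the patch. The compile line in the comment is an assumption: it reuses the `-march=rv64gcv -mabi=lp64d -O3` options seen in the RVV testsuite and assumes `scalable` is the string accepted by the `riscv-autovec-preference` param for `RVV_SCALABLE`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed compile line for a RISC-V cross toolchain (not verified here):
     riscv64-unknown-linux-gnu-gcc -march=rv64gcv -mabi=lp64d -O3 \
       --param=riscv-autovec-preference=scalable -S add.c
   With the patch applied, the param would no longer need to be passed.  */

/* Hypothetical example: element-wise sum with a trip count only known
   at run time.  */
void
add_arrays (int *restrict out, const int *restrict a,
            const int *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] + b[i];
}

/* Compare the loop against hand-computed results; returns 1 on success.  */
int
add_arrays_selfcheck (void)
{
  int a[5] = { 1, 2, 3, 4, 5 };
  int b[5] = { 10, 20, 30, 40, 50 };
  int out[5] = { 0, 0, 0, 0, 0 };

  add_arrays (out, a, b, 5);
  for (size_t i = 0; i < 5; i++)
    if (out[i] != a[i] + b[i])
      return 0;
  return 1;
}
```

The code itself is plain C and behaves the same with or without vectorization; only the generated assembly is expected to differ.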
Re: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]
Sure. Thanks kito.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-11 10:57
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]

OK, but could you split this patch into two patches? Pre-approved for both.

On Mon, Sep 11, 2023 at 10:36 AM juzhe.zh...@rivai.ai wrote:
>
> >> Should we also add loads and stores as well?
> >> and just make sure this is also necessary for the fix and not sneaky, right?
>
> No, we don't need loads/stores, because of the following handling code:
>
> (define_insn_and_split "*mov_lra"
>   [(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand" "=vr, m,vr")
>         (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand" " m,vr,vr"))
>    (clobber (match_scratch:P 2 "=&r,&r,X"))]
>   "TARGET_VECTOR && (lra_in_progress || reload_completed)
>    && (register_operand (operands[0], mode)
>        || register_operand (operands[1], mode))"
>   "#"
>   "&& reload_completed"
>   [(const_int 0)]
> {
>   if (REG_P (operands[0]) && REG_P (operands[1]))
>     emit_insn (gen_rtx_SET (operands[0], operands[1]));
>   else
>     {
>       emit_move_insn (operands[2], gen_int_mode (GET_MODE_NUNITS (mode),
>                                                  Pmode));
>       unsigned insn_flags
>         = GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>           ? riscv_vector::UNARY_MASK_OP
>           : riscv_vector::UNARY_OP;
>       riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
>                                         insn_flags, operands, operands[2]);
>     }
>   DONE;
> }
>   [(set_attr "type" "vmov")]
> )
>
> We split the special case using emit_insn (gen_rtx_SET (operands[0], operands[1]));
>
> Missing this pattern will cause an ICE, but the current testcases didn't expose the issue.
> It was only recognized after I supported this pattern.
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-09-11 10:18
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng
> Subject: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]
> > diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
> > index d208b418e5f..6f48f7d6232 100644
> > --- a/gcc/config/riscv/autovec-vls.md
> > +++ b/gcc/config/riscv/autovec-vls.md
> > @@ -148,6 +148,14 @@
> >    [(set_attr "type" "vmov")
> >     (set_attr "mode" "")])
> >
> > +(define_insn "*mov_vls"
> > +  [(set (match_operand:VLSB 0 "register_operand" "=vr")
> > +        (match_operand:VLSB 1 "register_operand" " vr"))]
> > +  "TARGET_VECTOR"
> > +  "vmv1r.v\t%0,%1"
> > +  [(set_attr "type" "vmov")
> > +   (set_attr "mode" "")])
>
> Should we also add loads and stores as well?
> and just make sure this is also necessary for the fix and not sneaky, right?
>
> > +
> > (define_expand "movmisalign"
> >    [(set (match_operand:VLS 0 "nonimmediate_operand")
> >          (match_operand:VLS 1 "general_operand"))]
>
Re: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]
>> Should we also add loads and stores as well?
>> and just make sure this is also necessary for the fix and not sneaky, right?

No, we don't need loads/stores, because of the following handling code:

(define_insn_and_split "*mov_lra"
  [(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand" "=vr, m,vr")
        (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand" " m,vr,vr"))
   (clobber (match_scratch:P 2 "=&r,&r,X"))]
  "TARGET_VECTOR && (lra_in_progress || reload_completed)
   && (register_operand (operands[0], mode)
       || register_operand (operands[1], mode))"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  if (REG_P (operands[0]) && REG_P (operands[1]))
    emit_insn (gen_rtx_SET (operands[0], operands[1]));
  else
    {
      emit_move_insn (operands[2], gen_int_mode (GET_MODE_NUNITS (mode),
                                                 Pmode));
      unsigned insn_flags
        = GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
          ? riscv_vector::UNARY_MASK_OP
          : riscv_vector::UNARY_OP;
      riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
                                        insn_flags, operands, operands[2]);
    }
  DONE;
}
  [(set_attr "type" "vmov")]
)

We split the special case using emit_insn (gen_rtx_SET (operands[0], operands[1]));

Missing this pattern will cause an ICE, but the current testcases didn't expose the issue.
It was only recognized after I supported this pattern.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-11 10:18
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng
Subject: Re: [PATCH] RISC-V: Add VLS modes VEC_PERM support[PR111311]

> diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
> index d208b418e5f..6f48f7d6232 100644
> --- a/gcc/config/riscv/autovec-vls.md
> +++ b/gcc/config/riscv/autovec-vls.md
> @@ -148,6 +148,14 @@
>    [(set_attr "type" "vmov")
>     (set_attr "mode" "")])
>
> +(define_insn "*mov_vls"
> +  [(set (match_operand:VLSB 0 "register_operand" "=vr")
> +        (match_operand:VLSB 1 "register_operand" " vr"))]
> +  "TARGET_VECTOR"
> +  "vmv1r.v\t%0,%1"
> +  [(set_attr "type" "vmov")
> +   (set_attr "mode" "")])

Should we also add loads and stores as well?
and just make sure this is also necessary for the fix and not sneaky, right?

> +
> (define_expand "movmisalign"
>    [(set (match_operand:VLS 0 "nonimmediate_operand")
>          (match_operand:VLS 1 "general_operand"))]
Re: [PATCH v1] RISC-V: Support FP SGNJ autovec for VLS mode
LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-05 18:32
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP SGNJ autovec for VLS mode

From: Pan Li

This patch allows VLS mode autovec for the floating-point binary operation copysign (SGNJ).

Given the code example below:

void test(float * restrict out, float * restrict in1, float * restrict in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = __builtin_copysignf (in1[i], in2[i]);
}

Before this patch:
test:
  csrr      a4,vlenb
  slli      a4,a4,1
  li        a5,128
  bleu      a5,a4,.L2
  mv        a5,a4
.L2:
  vsetvli   zero,a5,e32,m8,ta,ma
  vle32.v   v8,0(a1)
  vle32.v   v16,0(a2)
  vsetvli   a4,zero,e32,m8,ta,ma
  vfsgnj.vv v8,v8,v16
  vsetvli   zero,a5,e32,m8,ta,ma
  vse32.v   v8,0(a0)
  ret

After this patch:
test:
  li        a5,128
  vsetvli   zero,a5,e32,m1,ta,ma
  vle32.v   v1,0(a1)
  vle32.v   v2,0(a2)
  vfsgnj.vv v1,v1,v2
  vse32.v   v1,0(a0)
  ret

Signed-off-by: Pan Li

gcc/ChangeLog:

        * config/riscv/autovec-vls.md (copysign3): New pattern.
        * config/riscv/vector.md: Extend iterator for VLS.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
        * gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c: New test.
---
 gcc/config/riscv/autovec-vls.md               | 22 ++
 gcc/config/riscv/vector.md                    | 24 +--
 .../gcc.target/riscv/rvv/autovec/vls/def.h    |  8
 .../rvv/autovec/vls/floating-point-sgnj-1.c   | 43 +++
 .../rvv/autovec/vls/floating-point-sgnj-2.c   | 43 +++
 5 files changed, 128 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c

diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
index 7ef29637e33..31b6c4ae714 100644
--- a/gcc/config/riscv/autovec-vls.md
+++ b/gcc/config/riscv/autovec-vls.md
@@ -255,6 +255,28 @@ (define_insn_and_split "3"
   [(set_attr "type" "vector")]
 )

+;; -
+;; Includes:
+;; - vfsgnj.vv
+;; - vfsgnj.vf
+;; -
+(define_insn_and_split "copysign3"
+  [(set (match_operand:VLSF 0 "register_operand")
+    (unspec:VLSF
+      [(match_operand:VLSF 1 "register_operand")
+       (match_operand:VLSF 2 "register_operand")] UNSPEC_VCOPYSIGN))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    riscv_vector::emit_vlmax_insn (code_for_pred (UNSPEC_VCOPYSIGN, mode),
+                                   riscv_vector::BINARY_OP, operands);
+    DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; ---
 ;; [INT] Unary operations
 ;; ---
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 9d7b4bbe1d4..fc985ff6a01 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -6166,8 +6166,8 @@ (define_insn "@pred__reverse_scalar"
    (symbol_ref "riscv_vector::get_frm_mode (operands[9])"))])

 (define_insn "@pred_"
-  [(set (match_operand:VF 0 "register_operand" "=vd, vd, vr, vr")
-        (if_then_else:VF
+  [(set (match_operand:V_VLSF 0 "register_operand" "=vd, vd, vr, vr")
+        (if_then_else:V_VLSF
           (unspec:
             [(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1,Wc1")
              (match_operand 5 "vector_length_operand" " rK, rK, rK, rK")
@@ -6176,10 +6176,10 @@ (define_insn "@pred_"
              (match_operand 8 "const_int_operand" " i, i, i, i")
              (reg:SI VL_REGNUM)
              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-          (unspec:VF
-            [(match_operand:VF 3 "register_operand" " vr, vr, vr, vr")
-             (match_operand:VF 4 "register_operand" " vr, vr, vr, vr")] VCOPYSIGNS)
-          (match_operand:VF 2 "vector_merge_operand" " vu, 0, vu, 0")))]
+          (unspec:V_VLSF
+            [(match_operand:V_VLSF 3 "register_operand" " vr, vr, vr, vr")
+             (match_operand:V_VLSF 4 "register_operand" " vr, vr, vr, vr")] VCOPYSIGNS)
+          (match_operand:V_VLSF 2 "vector_merge_operand" " vu, 0, vu, 0")))]
   "TARGET_VECTOR"
   "vfsgnj.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vfsgnj")
@@ -6207,8 +6207,8 @@ (define_insn "@pred_ncopysign&quo
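As a reference for what the `copysign` pattern computes per element, here is a minimal scalar sketch of sign injection: the result takes its magnitude from the first operand and its sign bit from the second. The helper names `sgnj_f32` and `copysign_ref` are hypothetical, and the bit manipulation assumes `float` is IEEE 754 binary32. It is an illustration of the semantics, not the code GCC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: magnitude bits of MAG combined with the sign bit
   of SGN.  Assumes float is IEEE 754 binary32.  */
float
sgnj_f32 (float mag, float sgn)
{
  uint32_t m, s, r;
  float out;

  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  r = (m & UINT32_C (0x7fffffff)) | (s & UINT32_C (0x80000000));
  memcpy (&out, &r, sizeof out);
  return out;
}

/* Scalar reference with the same shape as the loop in the commit
   message, using the helper above instead of __builtin_copysignf.  */
void
copysign_ref (float *restrict out, const float *restrict in1,
              const float *restrict in2, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = sgnj_f32 (in1[i], in2[i]);
}
```

A reference like this can be handy when writing a run test that compares the vectorized output element by element.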
Re: [PATCH] RISC-V: Fix Dynamic LMUL compile option
Simple patch for the dynamic cost model: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629212.html

Committed.

juzhe.zh...@rivai.ai

From: Juzhe-Zhong
Date: 2023-09-04 17:08
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix Dynamic LMUL compile option

gcc/ChangeLog:

        * config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Fix Dynamic status.
        * config/riscv/riscv-v.cc (preferred_simd_mode): Ditto.
        (autovectorize_vector_modes): Ditto.
        (vectorize_related_mode): Ditto.

---
 gcc/config/riscv/riscv-opts.h |  2 +-
 gcc/config/riscv/riscv-v.cc   | 15 ---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 79e0f12e388..b6b5907e111 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -81,7 +81,7 @@ enum riscv_autovec_lmul_enum {
   RVV_M4 = 4,
   RVV_M8 = 8,
   /* For dynamic LMUL, we compare COST start with LMUL8. */
-  RVV_DYNAMIC = RVV_M8
+  RVV_DYNAMIC = 9
 };

 enum riscv_multilib_select_kind {
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c8ad96f44d5..fbbc16a3c26 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1971,16 +1971,16 @@ preferred_simd_mode (scalar_mode mode)
      vectorizer when we enable them in this target hook. Currently, we can
      support auto-vectorization in -march=rv32_zve32x_zvl128b. Wheras,
      -march=rv32_zve32x_zvl32b or -march=rv32_zve32x_zvl64b are disabled. */
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
   if (autovec_use_vlmax_p ())
     {
-      if (TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2)
+      if (TARGET_MIN_VLEN < 128 && lmul < RVV_M2)
         return word_mode;
       /* We use LMUL = 1 as base bytesize which is BYTES_PER_RISCV_VECTOR and
          riscv_autovec_lmul as multiply factor to calculate the the NUNITS to
          get the auto-vectorization mode. */
       poly_uint64 nunits;
-      poly_uint64 vector_size
-        = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+      poly_uint64 vector_size = BYTES_PER_RISCV_VECTOR * lmul;
       poly_uint64 scalar_size = GET_MODE_SIZE (mode);
       gcc_assert (multiple_p (vector_size, scalar_size, &nunits));
       machine_mode rvv_mode;
@@ -2154,10 +2154,10 @@ get_cmp_insn_code (rtx_code code, machine_mode mode)
 unsigned int
 autovectorize_vector_modes (vector_modes *modes, bool)
 {
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
   if (autovec_use_vlmax_p ())
     {
-      poly_uint64 full_size
-        = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+      poly_uint64 full_size = BYTES_PER_RISCV_VECTOR * lmul;

       /* Start with a RVVQImode where LMUL is the number of units that
          fit a whole vector.
@@ -2187,7 +2187,7 @@ autovectorize_vector_modes (vector_modes *modes, bool)
     {
       /* Push all VLSmodes according to TARGET_MIN_VLEN. */
       unsigned int i = 0;
-      unsigned int base_size = TARGET_MIN_VLEN * riscv_autovec_lmul / 8;
+      unsigned int base_size = TARGET_MIN_VLEN * lmul / 8;
       unsigned int size = base_size;
       machine_mode mode;
       while (size > 0 && get_vector_mode (QImode, size).exists (&mode))
@@ -2212,8 +2212,9 @@ vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode,
 {
   /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
   poly_uint64 min_units;
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
   if (autovec_use_vlmax_p () && riscv_v_ext_vector_mode_p (vector_mode)
-      && multiple_p (BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul),
+      && multiple_p (BYTES_PER_RISCV_VECTOR * lmul,
                      GET_MODE_SIZE (element_mode), &min_units))
     {
       machine_mode rvv_mode;
--
2.36.1
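The idea behind the fix can be sketched in isolation. When the dynamic setting shares its value with M8, a request for plain m8 cannot be told apart from a request for dynamic LMUL; giving it a distinct value and mapping it back to 8 wherever a numeric factor is needed resolves that. The sketch below uses hypothetical names (`autovec_lmul`, `is_dynamic`, `effective_lmul`) rather than the GCC identifiers, and the values for the M1 and M2 enumerators are assumptions since the hunk only shows M4 and M8.

```c
#include <assert.h>

/* Hypothetical stand-in for the enum touched by the patch.  The values
   of LMUL_M1 and LMUL_M2 are assumed; M4, M8 and DYNAMIC follow the
   hunk above.  */
enum autovec_lmul
{
  LMUL_M1 = 1,
  LMUL_M2 = 2,
  LMUL_M4 = 4,
  LMUL_M8 = 8,
  /* Distinct value, so DYNAMIC is no longer an alias of M8.  */
  LMUL_DYNAMIC = 9
};

/* With a distinct value this test is no longer true for a plain M8
   request.  */
int
is_dynamic (enum autovec_lmul pref)
{
  return pref == LMUL_DYNAMIC;
}

/* Numeric factor to use in size computations: dynamic starts from
   LMUL 8, everything else is used as-is.  */
int
effective_lmul (enum autovec_lmul pref)
{
  return pref == LMUL_DYNAMIC ? (int) LMUL_M8 : (int) pref;
}
```

This mirrors the `lmul` local introduced in each function by the patch: size computations never see the value 9, while the cost model can still detect the dynamic setting.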
Re: [PATCH] RISC-V: Fix vsetvl pass ICE
OK for trunk, but I'm not sure whether it's OK for GCC-13.

juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-08-30 17:51
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw
Subject: [PATCH] RISC-V: Fix vsetvl pass ICE

This patch fixes PR111234 (a vsetvl pass ICE) that occurs when fusing a mask-any vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.

        PR target/111234

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc              |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/pr111234.c    | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1386d9250ca..a81bb53a521 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -655,7 +655,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
     new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
   else
     {
-      if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
+      if (vsetvl_insn_p (rinsn))
         new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
       else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
         new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
new file mode 100644
index 000..ee5eec4a257
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include
+
+void
+f (vint32m1_t *in, vint64m2_t *out, vbool32_t *m, int b)
+{
+  vint32m1_t va = *in;
+  vbool32_t mask = *m;
+  vint64m2_t vb
+    = __riscv_vwadd_vx_i64m2_m (mask, va, 1, __riscv_vsetvlmax_e64m2 ());
+  vint64m2_t vc = __riscv_vadd_vx_i64m2 (vb, 1, __riscv_vsetvlmax_e64m2 ());
+
+  if (b != 0)
+    vc = __riscv_vadd_vx_i64m2_mu (mask, vc, vc, 1, __riscv_vsetvlmax_e64m2 ());
+
+  *out = vc;
+}
--
2.36.3
[PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant
Ping.

This patch also fixes an issue that occurred in the RISC-V backend: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71

Thanks.

juzhe.zh...@rivai.ai
Re: Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization
>> Juzhe mentioned he doesn't want to commit this before
>> all/most bugs are addressed anyway, right?

Yes.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-08-28 22:27
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng
Subject: Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

> LGTM from my side, but I would like to wait until Robin is OK with it too

In principle I'm OK with it as well, realizing we will still need to fine-tune a lot here anyway. For now, IMHO it's good to have some additional test coverage in the vector space, but we should not expect every test to be correct or a good match for everything we do yet.

Juzhe mentioned he doesn't want to commit this before all/most bugs are addressed anyway, right?

Regards
 Robin