RE: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD
Thanks Robin. This depends on [PATCH 1/2] of the match.pd change; I will commit it after that.

Pan

-----Original Message-----
From: Robin Dapp
Sent: Tuesday, September 24, 2024 8:40 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp
Subject: Re: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD

LGTM (in case you haven't committed it yet).

--
Regards
Robin
RE: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking
Thanks Richard for the comments.

> Since you're creating the call with op_0/op_1 shouldn't you _only_ check
> support for op_type operation and not lhs_type?

Yes, you are right. Checking the operand makes much more sense to me. Let me
update this in v3.

Pan

-----Original Message-----
From: Richard Biener
Sent: Tuesday, September 24, 2024 3:42 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

On Tue, Sep 24, 2024 at 9:13 AM wrote:
>
> From: Pan Li
>
> This patch would like to fix the following ICE for -O2 -m32 of x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |      ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
>         ???:0
> 0x27c4a3f internal_error(char const*, ...)
>         ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
>         ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
>         ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
>         ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
>         ???:0
>
> We allowed the operand convert when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern. Aka,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
>     a = *--p;
>     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB itself may also act on below scalar sample
> code too.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> Then there will be a problem when ia32 or -m32 is given when compiling.
> Because we only check the lhs (aka uint32_t) type is supported by ifn
> and missed the operand (aka uint64_t). Mostly DImode is disabled for
> 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> PR middle-end/116814
>
> gcc/ChangeLog:
>
>         * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
>         ifn is_supported check for operand TREE type.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc | 23 +++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..dd6f29daa7c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEvent(frame_time < current_time ?
> 0 : frame_time - current_time);
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index d61668aacfc..361761cedef 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call
> (gimple_stmt_iterator *gsi, gphi *phi,
>                                     internal_fn fn, tree lhs, tree op_0,
>                                     tree op_1)
>  {
> -  if (direct_internal_fn_su
RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion
Got it, thanks a lot.

Pan

-----Original Message-----
From: Uros Bizjak
Sent: Tuesday, September 24, 2024 3:29 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

On Tue, Sep 24, 2024 at 8:53 AM Li, Pan2 wrote:
>
> Got it and thanks, let me rerun to make sure it works well as expected.

For reference, this is documented in:

https://gcc.gnu.org/wiki/Testing_GCC
https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
https://gcc.gnu.org/install/test.html

Uros.
RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion
Got it and thanks, let me rerun to make sure it works well as expected.

Pan

-----Original Message-----
From: Uros Bizjak
Sent: Tuesday, September 24, 2024 2:33 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

On Tue, Sep 24, 2024 at 8:24 AM Li, Pan2 wrote:
>
> Thanks Uros for comments.
>
> > This is not "target", but "middle-end" component. Even though the bug
> > is exposed on x86_64 target, the fix is in the middle-end code, not in
> > the target code.
>
> Sure, will rename to middle-end.
>
> > Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.
>
> Is there any suggestion to run the "ia32" test when configure gcc build?
> I first leverage ia32 but complain UNSUPPORTED for this case.

You can add the following to your testsuite run:

RUNTESTFLAGS="--target-board=unix\{,-m32\}"

e.g:

make -j N -k check RUNTESTFLAGS=...

(where N is the number of make threads)

You can also add "dg.exp" or "dg.exp=pr12345.c" (or any other exp file
or testcase name) to RUNTESTFLAGS to run only one exp file or a single
test.

Uros.

> Pan
>
> -----Original Message-----
> From: Uros Bizjak
> Sent: Tuesday, September 24, 2024 2:17 PM
> To: Li, Pan2
> Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com;
> tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching
> operand promotion
>
> On Mon, Sep 23, 2024 at 4:58 PM wrote:
> >
> > From: Pan Li
> >
> > This patch would like to fix the following ICE for -O2 -m32 of x86_64.
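Putting Uros's suggestions together, running only the new test under both the default and -m32 multilibs might look like the following (test name, jobs count, and `check-gcc` target are illustrative, and this of course assumes a configured GCC build tree):

```shell
# From the GCC build directory: run only dg.exp for the one new test,
# on both the default target board and the -m32 variant.
make -j 16 -k check-gcc \
  RUNTESTFLAGS="--target-board=unix\{,-m32\} dg.exp=pr116814-1.C"
```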
> >
> > during RTL pass: expand
> > JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned int)':
> > JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> > expand_fn_using_insn, at internal-fn.cc:263
> > 3 | void DequeueEvent(unsigned frame) {
> >   |      ^~~~
> > 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> > diagnostic_metadata const*, diagnostic_option_id, char const*,
> > __va_list_tag (*) [1], diagnostic_t)
> >         ???:0
> > 0x27c4a3f internal_error(char const*, ...)
> >         ???:0
> > 0x27b3994 fancy_abort(char const*, int, char const*)
> >         ???:0
> > 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> >         ???:0
> > 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
> >         ???:0
> > 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> >         ???:0
> >
> > We allowed the operand convert when matching SAT_SUB in match.pd, to support
> > the zip benchmark SAT_SUB pattern. Aka,
> >
> > (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> >     a = *--p;
> >     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }
> >
> > The pattern match for SAT_SUB itself may also act on below scalar sample
> > code too.
> >
> > unsigned long long GetTimeFromFrames(int);
> > unsigned long long GetMicroSeconds();
> >
> > void DequeueEvent(unsigned frame) {
> >   long long frame_time = GetTimeFromFrames(frame);
> >   unsigned long long current_time = GetMicroSeconds();
> >   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> > }
> >
> > Aka:
> >
> > uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
> >
> > Then there will be a problem when ia32 or -m32 is given when compiling.
> > Because we only check the lhs (aka uint32_t) type is supported by ifn
> > and missed the operand (aka uint64_t).
> > Mostly DImode is disabled for
> > 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
> >
> > The below test suites are passed for this patch.
> > * The rv64gcv fully regression test.
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> >
> > PR target/116814

This is not "target", but "middle-end" component. Even though the bug
is exposed on x86_64 target, the fix is in the middle-end code, not in
the targ
RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion
Thanks Uros for the comments.

> This is not "target", but "middle-end" component. Even though the bug
> is exposed on x86_64 target, the fix is in the middle-end code, not in
> the target code.

Sure, will rename to middle-end.

> Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.

Is there any suggestion on how to run the "ia32" tests when configuring the
gcc build? I first leveraged ia32 but it complained UNSUPPORTED for this case.

Pan

-----Original Message-----
From: Uros Bizjak
Sent: Tuesday, September 24, 2024 2:17 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

On Mon, Sep 23, 2024 at 4:58 PM wrote:
>
> From: Pan Li
>
> This patch would like to fix the following ICE for -O2 -m32 of x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |      ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
>         ???:0
> 0x27c4a3f internal_error(char const*, ...)
>         ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
>         ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
>         ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
>         ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
>         ???:0
>
> We allowed the operand convert when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern. Aka,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
>     a = *--p;
>     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB itself may also act on below scalar sample
> code too.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> Then there will be a problem when ia32 or -m32 is given when compiling.
> Because we only check the lhs (aka uint32_t) type is supported by ifn
> and missed the operand (aka uint64_t). Mostly DImode is disabled for
> 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> PR target/116814

This is not "target", but "middle-end" component. Even though the bug
is exposed on x86_64 target, the fix is in the middle-end code, not in
the target code.

> gcc/ChangeLog:
>
>         * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
>         ifn is_supported check for operand TREE type.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc | 23 +++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..8db5b020cfd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-options "-O2 -m32" } */

Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.

Uros.

> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEven
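For reference, applying Uros's suggestion the test header becomes the following (this matches the v2 patch posted elsewhere in this thread, which drops -m32 and selects the 32-bit x86 multilib via the ia32 target selector):

```c
/* { dg-do compile { target { ia32 } } } */
/* { dg-options "-O2" } */
```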
RE: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.
Thanks Robin.

> I think those tests don't really need to check for vsetvl anyway.

It looks like scanning the asm for the RVV fixed-point insns alone is good
enough for the vector part, which is somewhat different from the scalar case.
I will make that change after this patch is pushed.

Pan

-----Original Message-----
From: Robin Dapp
Sent: Thursday, September 19, 2024 9:25 PM
To: gcc-patches
Cc: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Li, Pan2 ; rdapp@gmail.com
Subject: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

Hi,

this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow
SLP with SELECT_VL. Assisted by sed and regtested on rv64gcv_zvfh_zvbb.
Rather lengthy but obvious, so going to commit after a while if the CI
is happy.

I think those tests don't really need to check for vsetvl anyway, not
all of them at least but I didn't change that for now.

Regards
Robin

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Expect
        length-controlled loop.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
        * gcc.target/riscv/rvv/autovec/binop/vec_sat
RE: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change
> So for the future I'd suggest you post those with a remark that you think
> they're obvious and going to commit in a day (or some other reasonable
> timeframe) if there are no complaints.

Oh, I see. Thanks Robin for the reminder; that would be perfect. Do you have
any best practice for the "obvious" remark? For example, similar to how [NFC]
in a subject hints at a non-functional change, maybe [TBO] standing for
to-be-obvious, or something like that.

Pan

-----Original Message-----
From: Robin Dapp
Sent: Thursday, September 19, 2024 4:26 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp
Subject: Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change

> This patch would like fix the dump check times of vector SAT_ADD. The
> middle-end change makes the match times from 2 to 4 times.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.

That's OK. And I think testsuite fixup patches like this you can consider
"obvious" as long as you're sure the underlying reason is understood. In
particular as you have been working in the saturating space for a while now.

So for the future I'd suggest you post those with a remark that you think
they're obvious and going to commit in a day (or some other reasonable
timeframe) if there are no complaints.

--
Regards
Robin
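As background, the kind of dump check such a testsuite-fixup patch adjusts is a dejagnu directive along these lines (the pattern, count, and dump name here are illustrative, not the patch's actual lines):

```c
/* After the middle-end change, expect the saturating-add internal
   function to be matched 4 times in the dump instead of 2.  */
/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" } } */
```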
RE: [PATCH v5 2/4] Genmatch: Refine the gen_phi_on_cond by match_cond_with_binary_phi
Thanks Richard for the comments. Will commit it with that change if there are
no surprises from the test suites.

Pan

-----Original Message-----
From: Richard Biener
Sent: Thursday, September 19, 2024 2:23 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v5 2/4] Genmatch: Refine the gen_phi_on_cond by match_cond_with_binary_phi

On Thu, Sep 19, 2024 at 6:11 AM wrote:
>
> From: Pan Li
>
> This patch would like to leverage the match_cond_with_binary_phi to
> match the phi on cond, and get the true/false arg if matched. This
> helps a lot to simplify the implementation of gen_phi_on_cond.
>
> Before this patch:
>
>   basic_block _b1 = gimple_bb (_a1);
>   if (gimple_phi_num_args (_a1) == 2)
>     {
>       basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>       basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>       basic_block _db_1 = safe_dyn_cast (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
>       basic_block _other_db_1 = safe_dyn_cast (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
>       gcond *_ct_1 = safe_dyn_cast (*gsi_last_bb (_db_1));
>       if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>           && EDGE_COUNT (_other_db_1->succs) == 1
>           && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>         {
>           tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>           tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>           tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>           bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
>           tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
>           tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
>           ...
>
> After this patch:
>
>   basic_block _b1 = gimple_bb (_a1);
>   tree _p1, _p2;
>   gcond *_cond_1 = match_cond_with_binary_phi (_a1, &_p1, &_p2);
>   if (_cond_1 && _p1 && _p2)

It should be enough to test _cond_1 for nullptr, at least I think the API
should guarantee that _p1 and _p2 are then set correctly.

OK with that change.

Richard.
>     {
>       tree _cond_lhs_1 = gimple_cond_lhs (_cond_1);
>       tree _cond_rhs_1 = gimple_cond_rhs (_cond_1);
>       tree _p0 = build2 (gimple_cond_code (_cond_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>       ...
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
>         * genmatch.cc (dt_operand::gen_phi_on_cond): Leverage the
>         match_cond_with_binary_phi API to get cond gimple, true and
>         false TREE arg.
>
> Signed-off-by: Pan Li
> ---
>  gcc/genmatch.cc | 67 +++--
>  1 file changed, 15 insertions(+), 52 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f1ff1d18265..149458fffe1 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -3516,79 +3516,42 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int depth)
>  void
>  dt_operand::gen_phi_on_cond (FILE *f, int indent, int depth)
>  {
> -  fprintf_indent (f, indent,
> -    "basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
> -
> -  fprintf_indent (f, indent, "if (gimple_phi_num_args (_a%d) == 2)\n", depth);
> +  char opname_0[20];
> +  char opname_1[20];
> +  char opname_2[20];
>
> -  indent += 2;
> -  fprintf_indent (f, indent, "{\n");
> -  indent += 2;
> +  gen_opname (opname_0, 0);
> +  gen_opname (opname_1, 1);
> +  gen_opname (opname_2, 2);
>
>    fprintf_indent (f, indent,
> -    "basic_block _pb_0_%d = EDGE_PRED (_b%d, 0)->src;\n", depth, depth);
> -  fprintf_indent (f, indent,
> -    "basic_block _pb_1_%d = EDGE_PRED (_b%d, 1)->src;\n", depth, depth);
> -  fprintf_indent (f, indent,
> -    "basic_block _db_%d = safe_dyn_cast (*gsi_last_bb (_pb_0_%d)) ? "
> -    "_pb_0_%d : _pb_1_%d;\n", depth, depth, depth, depth);
> +    "basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
> +  fprintf_indent (f, indent, "tree %s, %s;\n", opname_1, opname_2);
>    fprintf_indent (f, indent,
> -    "basic_block _other_db_%d = safe_dyn_cast "
> -    "(*gsi_last_bb (_pb_0_%d)) ?
> -    _pb_1_%d : _pb_0_%d;\n",
> -    depth, depth, depth, depth);
> +    "gcond *_cond_%d = match_cond_with_binary_phi (_a%d, &%s, &%s);\n",
> +    depth, depth, opname_1, opname_2);
>
> -  fprintf_indent (f, indent,
> -    "gcond *_ct_%d =
RE: [PATCH v1] RISC-V: Add testcases for form 2 of signed scalar SAT_ADD
Thanks Jeff for the comments.

> Not particularly happy with the wall of expected assembly output, though
> it at least tries to be generic in terms of registers and such.

Sort of; the asm check for ssadd is quite long at this point.

> So I'll ACK. But I'd like us to start thinking about what is the most
> important part of what's being tested rather than just matching a blob of
> assembly text. I believe (and please correct me if I'm wrong), what you're
> really testing here is whether or not we're recognizing the saturation
> idiom in gimple and then proceeding to generate code via the RISC-V
> backend's define_expand patterns.

Yes, you are right. The tests cover 3 parts: the SAT IR in the expand dump,
the RISC-V backend code-gen, and the run test.

> So a better test would check for the IFN, probably in the .optimized or
> .expand dump. What I don't offhand see is a good way to test that we're
> in one of the saturation related expanders.
> I wonder if we could emit debugging output as part of the expander.
> It's reasonably likely that the dump_file and dump_flags are exposed as
> global variables. That in turn would allow us to emit messages into the
> .expand dump file. It doesn't have to be terribly complex. Just a note
> about which expander we're in and perhaps some info about the arguments.
> The point being to get away from using a scan-asm test for something
> we can look at more directly if we're willing to add a bit more
> information into the dump file.

I see, that would be an alternative approach for checking the backend
code-gen. It may make similar cases easier; I think we can give it a try in
the near future.
Pan

-----Original Message-----
From: Jeff Law
Sent: Wednesday, September 18, 2024 11:10 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Add testcases for form 2 of signed scalar SAT_ADD

On 9/12/24 8:14 PM, pan2...@intel.com wrote:
> From: Pan Li
>
> This patch would like to add testcases of the signed scalar SAT_ADD
> for form 2. Aka:
>
> Form 2:
>   #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
>   T __attribute__((noinline)) \
>   sat_s_add_##T##_fmt_2 (T x, T y) \
>   { \
>     T sum = (UT)x + (UT)y; \
>     if ((x ^ y) < 0 || (sum ^ x) >= 0) \
>       return sum; \
>     return x < 0 ? MIN : MAX; \
>   }
>
> DEF_SAT_S_ADD_FMT_2 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
>
> The below test are passed for this patch.
> * The rv64gcv fully regression test.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/sat_arith.h: Add test helper macros.
>         * gcc.target/riscv/sat_s_add-5.c: New test.
>         * gcc.target/riscv/sat_s_add-6.c: New test.
>         * gcc.target/riscv/sat_s_add-7.c: New test.
>         * gcc.target/riscv/sat_s_add-8.c: New test.
>         * gcc.target/riscv/sat_s_add-run-5.c: New test.
>         * gcc.target/riscv/sat_s_add-run-6.c: New test.
>         * gcc.target/riscv/sat_s_add-run-7.c: New test.
>         * gcc.target/riscv/sat_s_add-run-8.c: New test.

Not particularly happy with the wall of expected assembly output, though
it at least tries to be generic in terms of registers and such. So I'll
ACK. But I'd like us to start thinking about what is the most important
part of what's being tested rather than just matching a blob of assembly
text.

I believe (and please correct me if I'm wrong), what you're really
testing here is whether or not we're recognizing the saturation idiom in
gimple and then proceeding to generate code via the RISC-V backend's
define_expand patterns. So a better test would check for the IFN,
probably in the .optimized or .expand dump.
What I don't offhand see is a good way to test that we're in one of the
saturation related expanders.

I wonder if we could emit debugging output as part of the expander. It's
reasonably likely that the dump_file and dump_flags are exposed as global
variables. That in turn would allow us to emit messages into the .expand
dump file. It doesn't have to be terribly complex. Just a note about which
expander we're in and perhaps some info about the arguments. The point
being to get away from using a scan-asm test for something we can look at
more directly if we're willing to add a bit more information into the dump
file.

jeff

>
> Signed-off-by: Pan Li
> ---
>  gcc/testsuite/gcc.target/riscv/sat_arith.h | 13 
>  gcc/testsuite/gcc.target/riscv/sat_s_add-5.c | 30 ++
RE: [PATCH v4 1/4] Match: Add interface match_cond_with_binary_phi for true/false arg
Got it, thanks Richard; will have a try in v5.

Pan

-----Original Message-----
From: Richard Biener
Sent: Wednesday, September 18, 2024 8:06 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v4 1/4] Match: Add interface match_cond_with_binary_phi for true/false arg

On Wed, Sep 18, 2024 at 2:02 PM Richard Biener wrote:
>
> On Fri, Sep 13, 2024 at 12:42 AM wrote:
> >
> > From: Pan Li
> >
> > When matching the cond with 2 args phi node, we need to figure out
> > which arg of phi node comes from the true edge of cond block, as
> > well as the false edge. This patch would like to add interface
> > to perform the action and return the true and false arg in TREE type.
> >
> > There will be some additional handling if one of the arg is INTEGER_CST.
> > Because the INTEGER_CST args may have no source block, thus its' edge
> > source points to the condition block. See below example in line 31,
> > the 255 INTEGER_CST has block 2 as source. Thus, we need to find
> > the non-INTEGER_CST (aka _1) to tell which one is the true/false edge.
> > For example, the _1(3) takes block 3 as source, which is the dest
> > of false edge of the condition block.
> >
> >    4 │ __attribute__((noinline))
> >    5 │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
> >    6 │ {
> >    7 │   unsigned char _1;
> >    8 │   unsigned char _2;
> >    9 │   uint8_t _3;
> >   10 │   __complex__ unsigned char _5;
> >   11 │
> >   12 │   ;; basic block 2, loop depth 0
> >   13 │   ;;    pred:       ENTRY
> >   14 │   _5 = .ADD_OVERFLOW (x_4(D), 9);
> >   15 │   _2 = IMAGPART_EXPR <_5>;
> >   16 │   if (_2 != 0)
> >   17 │     goto ; [35.00%]
> >   18 │   else
> >   19 │     goto ; [65.00%]
> >   20 │   ;;    succ:       3
> >   21 │   ;;                4
> >   22 │
> >   23 │   ;; basic block 3, loop depth 0
> >   24 │   ;;    pred:       2
> >   25 │   _1 = REALPART_EXPR <_5>;
> >   26 │   ;;    succ:       4
> >   27 │
> >   28 │   ;; basic block 4, loop depth 0
> >   29 │   ;;    pred:       2
> >   30 │   ;;                3
> >   31 │   # _3 = PHI <255(2), _1(3)>
> >   32 │   return _3;
> >   33 │   ;;    succ:       EXIT
> >   34 │
> >   35 │ }
> >
> > The below test suites are passed for this patch.
> > * The rv64gcv fully regression test.
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> >
> > gcc/ChangeLog:
> >
> >         * gimple-match-head.cc (match_cond_with_binary_phi): Add new func
> >         impl to match binary phi for true and false arg.
> >
> > Signed-off-by: Pan Li
> > ---
> >  gcc/gimple-match-head.cc | 118 +++
> >  1 file changed, 118 insertions(+)
> >
> > diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> > index 924d3f1e710..6e7a3a0d62e 100644
> > --- a/gcc/gimple-match-head.cc
> > +++ b/gcc/gimple-match-head.cc
> > @@ -375,3 +375,121 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
> >      return true;
> >    return false;
> >  }
> > +
> > +/*
> > + * Return the relevant gcond * of the given phi, as well as the true
> > + * and false TREE args of the phi.  Or return NULL.
> > + *
> > + * If matched the gcond *, the output argument TREE true_arg and false_arg
> > + * will be updated to the relevant args of phi.
> > + *
> > + * If failed to match, NULL gcond * will be returned, as well as the output
> > + * arguments will be set to NULL_TREE.
> > + */ > > + > > +static inline gcond * > > +match_cond_with_binary_phi (gphi *phi, tree *true_arg, tree *false_arg) > > +{ > > + *true_arg = *false_arg = NULL_TREE; > > + > > + if (gimple_phi_num_args (phi) != 2 > > + || EDGE_COUNT (gimple_bb (phi)->preds) != 2) > > +return NULL; > > + > > + basic_block pred_0 = EDGE_PRED (gimple_bb (phi), 0)->src; > > + basic_block pred_1 = EDGE_PRED (gimple_bb (phi), 1)->src; > > + basic_block cond_block = NULL; > > + > > + if ((EDGE_COUNT (pred_0->succs) == 2 && EDGE_COUNT (pred_1->succs) == 1) > > + || (EDGE_COUNT (pred_0->succs) == 1 && EDGE_COUNT (pred_1->succs) == > > 2)) > > +{ > > + /* For below control flow graph: > > +
RE: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg
Thanks Richard for comments. > Yes, inline both CFG matches and unify them - there should be exactly > three cases at the moment. And "duplicate" computing the true/false arg into the > respective cases since it's trivial which edge(s) to look at. Got it, will resend the v4 series for this change. Pan -Original Message- From: Richard Biener Sent: Thursday, September 12, 2024 2:51 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg On Thu, Sep 12, 2024 at 3:41 AM Li, Pan2 wrote: > > Thanks Richard for comments. > > > why would arg_edge depend on whether t0 is INTEGER_CST or not? > Because the edge->src of the INTEGER_CST points to the cond block, which cannot > match the edge->dest of the cond_block. For example as below, the first arg of the PHI is > 255(2), which can match neither goto <bb 4>; nor goto <bb 3>;. > > Thus, I need to take the second arg, aka _1(3), to match the edge->dest of > the cond_block. Aka the phi arg edge->src == cond_block edge->dest. In the example below, > the goto <bb 3>; matches _1(3) with the false condition, and then I can locate the > edge from b2 -> b3. > > Or is there any better approach for this scenario? 
>
>    4 │ __attribute__((noinline))
>    5 │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
>    6 │ {
>    7 │   unsigned char _1;
>    8 │   unsigned char _2;
>    9 │   uint8_t _3;
>   10 │   __complex__ unsigned char _5;
>   11 │
>   12 │   ;; basic block 2, loop depth 0
>   13 │   ;;    pred:       ENTRY
>   14 │   _5 = .ADD_OVERFLOW (x_4(D), 9);
>   15 │   _2 = IMAGPART_EXPR <_5>;
>   16 │   if (_2 != 0)
>   17 │     goto <bb 4>; [35.00%]
>   18 │   else
>   19 │     goto <bb 3>; [65.00%]
>   20 │   ;;    succ:       3
>   21 │   ;;                4
>   22 │
>   23 │   ;; basic block 3, loop depth 0
>   24 │   ;;    pred:       2
>   25 │   _1 = REALPART_EXPR <_5>;
>   26 │   ;;    succ:       4
>   27 │
>   28 │   ;; basic block 4, loop depth 0
>   29 │   ;;    pred:       2
>   30 │   ;;                3
>   31 │   # _3 = PHI <255(2), _1(3)>
>   32 │   return _3;
>   33 │   ;;    succ:       EXIT
>   34 │
>   35 │ }
>
> > Can you instead inline match_control_flow_graph_case_0 and _1 and do the
> > argument assignment within the three cases of CFGs we accept? That
> > would be much easier to follow.
>
> To double confirm, are you suggesting inlining the cfg match for both case_0
> and case_1? That may make the func body grow, and we may have more cases
> like case_2, case_3... etc. If so, I will inline this into
> match_cond_with_binary_phi in v4.

Yes, inline both CFG matches and unify them - there should be exactly three cases at the moment. And "duplicate" computing the true/false arg into the respective cases since it's trivial which edge(s) to look at. This should make the code more maintainable and easier to understand. I'm not sure what additional cases you are thinking of, more complex CFGs should always mean more than a single controlling condition - I'm not sure we want to go the way to present those as cond1 | cond2. Richard. 
> Pan > > -Original Message- > From: Richard Biener > Sent: Wednesday, September 11, 2024 9:39 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; > kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi > for true/false arg > > On Wed, Sep 11, 2024 at 8:31 AM wrote: > > > > From: Pan Li > > > > When matching the cond with 2 args phi node, we need to figure out > > which arg of phi node comes from the true edge of cond block, as > > well as the false edge. This patch would like to add interface > > to perform the action and return the true and false arg in TREE type. > > > > There will be some additional handling if one of the arg is INTEGER_CST. > > Because the INTEGER_CST args may have no source block, thus its' edge > > source points to the condition block. See below example in line 31, > > the 255 INTEGER_CST has block 2 as source. Thus, we need to find > > the non-INTEGER_CST (aka _1) to tell which one is the true/false edge. > > For example, the _1(3) takes block 3 as source, which is the dest > > of false edge of the condition block. > > > >4 │ __attribute__((noinline)) > >5 │ uint8_t sat_u_add_imm
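The sat_u_add_imm dump discussed in this thread boils down to a tiny scalar model: an .ADD_OVERFLOW of x and the immediate 9, feeding a PHI that selects 255 on overflow. A standalone sketch of that behavior via the GCC builtin (the function name below is made up for illustration, not what GCC generates):

```c
#include <stdint.h>

/* Scalar model of the GIMPLE above: .ADD_OVERFLOW (x, 9) feeding a
   PHI that yields 255 on overflow and the real sum otherwise.  */
static uint8_t
sat_u_add_imm_9 (uint8_t x)
{
  uint8_t sum;
  /* __builtin_add_overflow computes x + 9 in infinite precision and
     reports whether the result still fits in uint8_t.  */
  return __builtin_add_overflow (x, (uint8_t) 9, &sum) ? 255 : sum;
}
```

This mirrors the PHI <255(2), _1(3)> in the dump, where _1 is the REALPART of the overflow result taken on the false edge.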
RE: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused
Committed. Pan From: 钟居哲 Sent: Thursday, September 12, 2024 12:40 PM To: Bohan Lei ; gcc-patches Cc: Li, Pan2 Subject: Re: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused LGTM juzhe.zh...@rivai.ai From: Bohan Lei Date: 2024-09-12 12:38 To: gcc-patches CC: juzhe.zhong Subject: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused Resent to cc Juzhe. -- Hi all, A simple assembly check has been added in this version. Previous version: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html Thanks, Bohan -- The current vsetvl pass eliminates a vsetvl instruction when the previous info is "available," but does not when "compatible." This can lead to not only redundancy, but also incorrect behaviors when the previous info happens to be compatible with a later vector instruction, which ends up using the vsetvl info that should have been eliminated, as is shown in the testcase. This patch eliminates the vsetvl when the previous info is "compatible." gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): Delete vsetvl insn when `prev_info` is compatible. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test. 
---
 gcc/config/riscv/riscv-vsetvl.cc              |  3 +++
 .../riscv/rvv/vsetvl/vsetvl_bug-4.c           | 19 +++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
	      curr_info.dump (dump_file, "");
	    }
	  m_dem.merge (prev_info, curr_info);
+	  if (!curr_info.vl_used_by_non_rvv_insn_p ()
+	      && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+	    m_delete_list.safe_push (curr_info);
	  if (curr_info.get_read_vl_insn ())
	    prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
	  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..04a8ff2945a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns -fdump-rtl-vsetvl-details" } */
+
+#include <riscv_vector.h>
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } } */
+/* { dg-final { scan-assembler-times {vsetvli} 2 } } */
--
2.17.1
RE: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg
Thanks Richard for comments. > why would arg_edge depend on whether t0 is INTEGER_CST or not? Because the edge->src of the INTEGER_CST points to the cond block, which cannot match the edge->dest of the cond_block. For example as below, the first arg of the PHI is 255(2), which can match neither goto <bb 4>; nor goto <bb 3>;. Thus, I need to take the second arg, aka _1(3), to match the edge->dest of the cond_block. Aka the phi arg edge->src == cond_block edge->dest. In the example below, the goto <bb 3>; matches _1(3) with the false condition, and then I can locate the edge from b2 -> b3. Or is there any better approach for this scenario?

   4 │ __attribute__((noinline))
   5 │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
   6 │ {
   7 │   unsigned char _1;
   8 │   unsigned char _2;
   9 │   uint8_t _3;
  10 │   __complex__ unsigned char _5;
  11 │
  12 │   ;; basic block 2, loop depth 0
  13 │   ;;    pred:       ENTRY
  14 │   _5 = .ADD_OVERFLOW (x_4(D), 9);
  15 │   _2 = IMAGPART_EXPR <_5>;
  16 │   if (_2 != 0)
  17 │     goto <bb 4>; [35.00%]
  18 │   else
  19 │     goto <bb 3>; [65.00%]
  20 │   ;;    succ:       3
  21 │   ;;                4
  22 │
  23 │   ;; basic block 3, loop depth 0
  24 │   ;;    pred:       2
  25 │   _1 = REALPART_EXPR <_5>;
  26 │   ;;    succ:       4
  27 │
  28 │   ;; basic block 4, loop depth 0
  29 │   ;;    pred:       2
  30 │   ;;                3
  31 │   # _3 = PHI <255(2), _1(3)>
  32 │   return _3;
  33 │   ;;    succ:       EXIT
  34 │
  35 │ }

> Can you instead inline match_control_flow_graph_case_0 and _1 and do the
> argument assignment within the three cases of CFGs we accept? That
> would be much easier to follow.

To double confirm, are you suggesting inlining the cfg match for both case_0 and case_1? That may make the func body grow, and we may have more cases like case_2, case_3... etc. If so, I will inline this into match_cond_with_binary_phi in v4. 
Pan -Original Message- From: Richard Biener Sent: Wednesday, September 11, 2024 9:39 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg On Wed, Sep 11, 2024 at 8:31 AM wrote: > > From: Pan Li > > When matching the cond with 2 args phi node, we need to figure out > which arg of phi node comes from the true edge of cond block, as > well as the false edge. This patch would like to add interface > to perform the action and return the true and false arg in TREE type. > > There will be some additional handling if one of the arg is INTEGER_CST. > Because the INTEGER_CST args may have no source block, thus its' edge > source points to the condition block. See below example in line 31, > the 255 INTEGER_CST has block 2 as source. Thus, we need to find > the non-INTEGER_CST (aka _1) to tell which one is the true/false edge. > For example, the _1(3) takes block 3 as source, which is the dest > of false edge of the condition block. > >4 │ __attribute__((noinline)) >5 │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x) >6 │ { >7 │ unsigned char _1; >8 │ unsigned char _2; >9 │ uint8_t _3; > 10 │ __complex__ unsigned char _5; > 11 │ > 12 │ ;; basic block 2, loop depth 0 > 13 │ ;;pred: ENTRY > 14 │ _5 = .ADD_OVERFLOW (x_4(D), 9); > 15 │ _2 = IMAGPART_EXPR <_5>; > 16 │ if (_2 != 0) > 17 │ goto ; [35.00%] > 18 │ else > 19 │ goto ; [65.00%] > 20 │ ;;succ: 3 > 21 │ ;;4 > 22 │ > 23 │ ;; basic block 3, loop depth 0 > 24 │ ;;pred: 2 > 25 │ _1 = REALPART_EXPR <_5>; > 26 │ ;;succ: 4 > 27 │ > 28 │ ;; basic block 4, loop depth 0 > 29 │ ;;pred: 2 > 30 │ ;;3 > 31 │ # _3 = PHI <255(2), _1(3)> > 32 │ return _3; > 33 │ ;;succ: EXIT > 34 │ > 35 │ } > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > * The x86 bootstrap test. 
> * The x86 fully regression test. > > gcc/ChangeLog: > > * gimple-match-head.cc (match_cond_with_binary_phi): Add new func > impl to match binary phi for true and false arg. > > Signed-off-by: Pan Li > --- > gcc/gimple-match-head.cc | 60 > 1 file changed, 60 insertions(+) > > diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc > index c51728ae742..64f4f28cc72 100644 > --- a/gcc/gimple-match-head.cc > +++ b/gcc/gimple-match
RE: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass
Committed, thanks Juzhe and garthlei.

Pan

From: 钟居哲
Sent: Wednesday, September 11, 2024 7:36 PM
To: gcc-patches
Cc: Li, Pan2 ; Robin Dapp ; jeffreyalaw ; kito.cheng
Subject: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

Hi, garthlei. Thanks for fixing it. I see, you are trying to fix this bug:

lui     a5,%hi(.LANCHOR0)
addi    a5,a5,%lo(.LANCHOR0)
vsetivli        zero,2,e8,mf8,ta,ma   ---> It should be a4, 2 instead of zero, 2
vle64.v v1,0(a5)                      ---  missing vsetvli a4, a4 here
slli    a4,a4,1
vsetvli zero,a4,e32,m1,ta,ma
li      a2,-1
addi    a5,a5,16
vslide1down.vx  v1,v1,a2
vslide1down.vx  v1,v1,zero
vsetivli        zero,2,e64,m1,ta,ma
vse64.v v1,0(a5)
ret

When I revisit the codes here:

m_vl = ::get_vl
...
update_avl -> "m_vl" variable is modified
...
using wrong m_vl in the following.

A dedicated temporary variable dest_vl looks reasonable here. LGTM. The RISC-V folks will commit this patch for you. Thanks.

juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2024-09-11 19:29
To: juzhe.zh...@rivai.ai
Subject: FW: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

FYI.

-Original Message-
From: garthlei
Sent: Wednesday, September 11, 2024 5:10 PM
To: gcc-patches
Subject: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

This patch fixes a bug in the current vsetvl pass. The current pass uses `m_vl` to determine whether the dest operand has been used by non-RVV instructions. However, `m_vl` may have been modified as a result of an `update_avl` call, and thus would no longer be the dest operand of the original instruction. This can lead to incorrect vsetvl eliminations, as is shown in the testcase. In this patch, we create a `dest_vl` variable for this scenario. 
gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc| 16 +++- .../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c | 17 + 2 files changed, 28 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 017efa8bc17..ce831685439 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -1002,6 +1002,9 @@ public: void parse_insn (insn_info *insn) { +/* The VL dest of the insn */ +rtx dest_vl = NULL_RTX; + m_insn = insn; m_bb = insn->bb (); /* Return if it is debug insn for the consistency with optimize == 0. */ @@ -1035,7 +1038,10 @@ public: if (m_avl) { if (vsetvl_insn_p (insn->rtl ()) || has_vlmax_avl ()) - m_vl = ::get_vl (insn->rtl ()); + { + m_vl = ::get_vl (insn->rtl ()); + dest_vl = m_vl; + } if (has_nonvlmax_reg_avl ()) m_avl_def = find_access (insn->uses (), REGNO (m_avl))->def (); @@ -1132,22 +1138,22 @@ public: } /* Determine if dest operand(vl) has been used by non-RVV instructions. 
*/ -if (has_vl ()) +if (dest_vl) { const hash_set vl_uses - = get_all_real_uses (get_insn (), REGNO (get_vl ())); + = get_all_real_uses (get_insn (), REGNO (dest_vl)); for (use_info *use : vl_uses) { gcc_assert (use->insn ()->is_real ()); rtx_insn *rinsn = use->insn ()->rtl (); if (!has_vl_op (rinsn) - || count_regno_occurrences (rinsn, REGNO (get_vl ())) != 1) + || count_regno_occurrences (rinsn, REGNO (dest_vl)) != 1) { m_vl_used_by_non_rvv_insn = true; break; } rtx avl = ::get_avl (rinsn); - if (!avl || !REG_P (avl) || REGNO (get_vl ()) != REGNO (avl)) + if (!avl || !REG_P (avl) || REGNO (dest_vl) != REGNO (avl)) { m_vl_used_by_non_rvv_insn = true; break; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c new file mode 100644 index 000..c155f5613d2 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32d -O2 -fdump-rtl-vsetvl-details" } */ + +#include + +uint64_t a[2], b[2]; + +void +foo () +{ + size_t vl = __riscv_vsetvl_e64m1 (2); + vuint64m1_t vx = __riscv_vle64_v_u64m1 (a, vl); + vx = __riscv_vslide1down_vx_u64m1 (vx, 0xull, vl); + __riscv_vse64_v_u64m1 (b, vx, vl); +} + +/* { dg-final { scan-rtl-dump-not "Eliminate insn" "vsetvl" } } */ -- 2.17.1
RE: [PATCH v2 2/2] RISC-V: Fix ICE due to inconsistency of RVV intrinsic list in lto and cc1.
> * gcc.target/riscv/rvv/base/bug-11.c: New test.

Seems you missed this file in patch v2?

> +/* Helper for init_builtins in LTO.  */
> +static void
> +handle_pragma_vector_for_lto ()
> +{
> +  struct pragma_intrinsic_flags backup_flags;
> +
> +  riscv_pragma_intrinsic_flags_pollute (&backup_flags);
> +
> +  riscv_option_override ();
> +  init_adjust_machine_modes ();
> +
> +  register_builtin_types ();
> +
> +  handle_pragma_vector ();
> +  riscv_pragma_intrinsic_flags_restore (&backup_flags);
> +
> +  /* Re-initialize after the flags are restored.  */
> +  riscv_option_override ();
> +  init_adjust_machine_modes ();
> +}

This part looks almost the same as most of riscv_pragma_intrinsic except register_builtin_types (). I wonder if we can wrap a helper to avoid code duplication, and IMO the _lto suffix should be removed as the body of the function has nothing to do with lto. Otherwise no comments from my side, and I'd leave it to kito or juzhe.

Pan

-Original Message-
From: Jin Ma
Sent: Tuesday, September 10, 2024 1:57 PM
To: gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; Li, Pan2 ; kito.ch...@gmail.com; richard.guent...@gmail.com; jinma.cont...@gmail.com; Jin Ma
Subject: [PATCH v2 2/2] RISC-V: Fix ICE due to inconsistency of RVV intrinsic list in lto and cc1.

When we use -flto, the function list of rvv will be generated twice, once in the cc1 phase and once in the lto phase. However, due to the different generation methods, the two lists are different. For example, when there is no zvfh or zvfhmin in arch, it is generated by calling the function "riscv_pragma_intrinsic". Since the TARGET_VECTOR_ELEN_FP_16 is enabled before rvv function generation, a list of rvv functions related to float16 will be generated. In the lto phase, the rvv function list is generated only by calling the function "riscv_init_builtins", but the TARGET_VECTOR_ELEN_FP_16 is disabled, so the float16-related rvv function list cannot be generated like in cc1. 
This will cause confusion, resulting in matching the wrong function due to an inconsistent fcode in the lto phase, eventually leading to an ICE. So I think we should make the two generated lists consistent, which is exactly what this patch does.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (struct pragma_intrinsic_flags): Move to riscv-protos.h.
(riscv_pragma_intrinsic_flags_pollute): Move to riscv-vector-builtins.c.
(riscv_pragma_intrinsic_flags_restore): Likewise.
(riscv_pragma_intrinsic): Likewise.
* config/riscv/riscv-protos.h (struct pragma_intrinsic_flags): New.
(riscv_pragma_intrinsic_flags_restore): New.
(riscv_pragma_intrinsic_flags_pollute): New.
* config/riscv/riscv-vector-builtins.cc (riscv_pragma_intrinsic_flags_pollute): New.
(riscv_pragma_intrinsic_flags_restore): New.
(handle_pragma_vector_for_lto): New.
(init_builtins): Correct the processing logic for lto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-11.c: New test.
---
 gcc/config/riscv/riscv-c.cc | 70 +--
 gcc/config/riscv/riscv-protos.h | 13
 gcc/config/riscv/riscv-vector-builtins.cc | 83 ++-
 3 files changed, 96 insertions(+), 70 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 71112d9c66d7..7037ecc1268a 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -34,72 +34,6 @@ along with GCC; see the file COPYING3. 
If not see #define builtin_define(TXT) cpp_define (pfile, TXT) -struct pragma_intrinsic_flags -{ - int intrinsic_target_flags; - - int intrinsic_riscv_vector_elen_flags; - int intrinsic_riscv_zvl_flags; - int intrinsic_riscv_zvb_subext; - int intrinsic_riscv_zvk_subext; -}; - -static void -riscv_pragma_intrinsic_flags_pollute (struct pragma_intrinsic_flags *flags) -{ - flags->intrinsic_target_flags = target_flags; - flags->intrinsic_riscv_vector_elen_flags = riscv_vector_elen_flags; - flags->intrinsic_riscv_zvl_flags = riscv_zvl_flags; - flags->intrinsic_riscv_zvb_subext = riscv_zvb_subext; - flags->intrinsic_riscv_zvk_subext = riscv_zvk_subext; - - target_flags = target_flags -| MASK_VECTOR; - - riscv_zvl_flags = riscv_zvl_flags -| MASK_ZVL32B -| MASK_ZVL64B -| MASK_ZVL128B; - - riscv_vector_elen_flags = riscv_vector_elen_flags -| MASK_VECTOR_ELEN_32 -| MASK_VECTOR_ELEN_64 -| MASK_VECTOR_ELEN_FP_16 -| MASK_VECTOR_ELEN_FP_32 -| MASK_VECTOR_ELEN_FP_64; - - riscv_zvb_subext = riscv_zvb_subext -| MASK_ZVBB -| MASK_ZVBC -| MASK_ZVKB; - - riscv_zvk_subext = riscv_zvk_subext -| MASK_ZVKG -| MASK_ZVKNED -| MASK_ZVKNHA -| MASK_ZVKNHB -| MASK_ZVKSED -| MASK_ZVKSH -| MASK_ZVKN -| MASK_Z
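The pollute/restore pair moved by the patch follows a plain save-modify-restore pattern on global flag words. A minimal standalone sketch of the same idea, with made-up flag variables and mask values standing in for the real GCC globals (target_flags, riscv_vector_elen_flags, MASK_VECTOR, etc.):

```c
/* Hypothetical stand-ins for the real GCC target flag globals.  */
static int target_flags;
static int vector_elen_flags;

struct pragma_flags_backup
{
  int target_flags;
  int vector_elen_flags;
};

/* Force-enable the bits the intrinsics need, remembering the old state.  */
static void
flags_pollute (struct pragma_flags_backup *b)
{
  b->target_flags = target_flags;
  b->vector_elen_flags = vector_elen_flags;
  target_flags |= 0x1;      /* stand-in for MASK_VECTOR */
  vector_elen_flags |= 0x6; /* stand-in for ELEN_32 | ELEN_64 */
}

static void
flags_restore (const struct pragma_flags_backup *b)
{
  target_flags = b->target_flags;
  vector_elen_flags = b->vector_elen_flags;
}

/* Returns 1 when pollute enabled the bits and restore undid it.  */
static int
flags_round_trip (void)
{
  struct pragma_flags_backup backup;
  target_flags = 0x10;
  vector_elen_flags = 0x2;
  flags_pollute (&backup);
  int polluted = (target_flags & 0x1) && (vector_elen_flags & 0x4);
  flags_restore (&backup);
  return polluted && target_flags == 0x10 && vector_elen_flags == 0x2;
}
```

The point of the backup struct, as in the patch, is that the builtin registration runs with everything enabled while the user-visible option state is left untouched afterwards.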
RE: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD
Thanks a lot. > It's just the number of patterns generated > is 2^number-of-:c, so it's good to prune known unnecessary combinations. I see, will make the changes as you suggested and commit it if there is no surprise from the test suites. > Yes, all commutative binary operators require matching types on their > operands. Got it, will revisit the matching I added before for possible redundant checking. Pan -Original Message- From: Richard Biener Sent: Tuesday, September 10, 2024 3:02 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD On Tue, Sep 10, 2024 at 1:05 AM Li, Pan2 wrote: > > Thanks Richard for comments. > > >> +   The T and UT are type pair like T=int8_t, UT=uint8_t.  */ > >> +(match (signed_integer_sat_add @0 @1) > >> + (cond^ (ge (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0) > >> + (nop_convert @1)))) > >> + (bit_not (bit_xor:c @0 @1))) > > >You only need one :c on either bit_xor. > > Sorry, I don't get the point here. I can understand that swapping @0 and @1 can also > act on the plus op. But the first xor with :c would like to allow (@0 @2) and (@2 @0). > > Or, due to the commutative (xor), swapping @0 and @1 is also valid for (@1 @2) in the > first xor. But I failed to get how to make @2 the first arg here. Hmm, my logic was that there's a canonicalization rule for SSA operands which is to put SSA names with higher SSA_NAME_VERSION last. That means we get the 2nd bit_xor in a defined order, we don't know the @0 order wrt @2 so we need to put :c on that. That should get us all interesting cases plus making sure the @0s match up? But maybe I'm missing something. It's just the number of patterns generated is 2^number-of-:c, so it's good to prune known unnecessary combinations. 
> >> + integer_zerop) > >> + @2 > >> + (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)) > > >> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type) > >> + && types_match (type, @0, @1 > > >I think the types_match is redundant as you have the bit_xor combining both. > > Got it, does that indicates the bit_xor somehow has the similar type check > already? As well as other > op like and/or ... etc. Yes, all commutative binary operators require matching types on their operands. > > Pan > > -Original Message- > From: Richard Biener > Sent: Monday, September 9, 2024 8:19 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; > kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer > .SAT_ADD > > On Tue, Sep 3, 2024 at 2:34 PM wrote: > > > > From: Pan Li > > > > This patch would like to support the form 2 of the scalar signed > > integer .SAT_ADD. Aka below example: > > > > Form 2: > > #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \ > > T __attribute__((noinline)) \ > > sat_s_add_##T##_fmt_2 (T x, T y) \ > > {\ > > T sum = (UT)x + (UT)y; \ > >\ > > if ((x ^ y) < 0 || (sum ^ x) >= 0) \ > > return sum; \ > >\ > > return x < 0 ? MIN : MAX; \ > > } > > > > DEF_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX) > > > > We can tell the difference before and after this patch if backend > > implemented the ssadd3 pattern similar as below. 
> > > > Before this patch: > >4 │ __attribute__((noinline)) > >5 │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y) > >6 │ { > >7 │ int8_t sum; > >8 │ unsigned char x.0_1; > >9 │ unsigned char y.1_2; > > 10 │ unsigned char _3; > > 11 │ signed char _4; > > 12 │ signed char _5; > > 13 │ int8_t _6; > > 14 │ _Bool _11; > > 15 │ signed char _12; > > 16 │ signed char _13; > > 17 │ signed char _14; > > 18 │ signed char _22; > > 19 │ signed char _23; > > 20 │ > > 21 │ ;; basic block 2, loop depth 0 > > 22 │ ;;pred: ENTRY > > 2
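The "Before this patch" dump above folds the two comparisons of the macro into a single sign test: `_22 = (sum ^ x) & ~(x ^ y)` is negative exactly when x and y share a sign and the wrapped sum's sign differs from x's. A standalone sketch checking that equivalence (function names are illustrative, not from the patch):

```c
#include <stdint.h>

/* The two-comparison form from the source: no saturation needed when
   the operands differ in sign, or when the sum keeps x's sign.  */
static int
no_overflow_two_tests (int8_t x, int8_t y, int8_t sum)
{
  return (x ^ y) < 0 || (sum ^ x) >= 0;
}

/* The folded form from the GIMPLE dump: one sign test on
   (sum ^ x) & ~(x ^ y).  */
static int
no_overflow_folded (int8_t x, int8_t y, int8_t sum)
{
  int8_t t = (sum ^ x) & ~(x ^ y);
  return t >= 0;
}
```

Both predicates agree for every (x, y, sum) triple, which is why the middle end can branch on a single `>= 0` comparison.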
RE: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD
Thanks Richard for comments.

>> +   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
>> +(match (signed_integer_sat_add @0 @1)
>> + (cond^ (ge (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
>> +                                                          (nop_convert @1))))
>> +                       (bit_not (bit_xor:c @0 @1)))

>You only need one :c on either bit_xor.

Sorry, I don't get the point here. I can understand that swapping @0 and @1 can also act on the plus op. But the first xor with :c would like to allow (@0 @2) and (@2 @0).

Or, due to the commutative (xor), swapping @0 and @1 is also valid for (@1 @2) in the first xor. But I failed to get how to make @2 the first arg here.

>> +  integer_zerop)
>> +  @2
>> +  (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value))
>> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
>> +      && types_match (type, @0, @1

>I think the types_match is redundant as you have the bit_xor combining both.

Got it, does that indicate the bit_xor somehow has a similar type check already? As well as other ops like and/or, etc.

Pan

-Original Message-
From: Richard Biener
Sent: Monday, September 9, 2024 8:19 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

On Tue, Sep 3, 2024 at 2:34 PM wrote:
>
> From: Pan Li
>
> This patch would like to support the form 2 of the scalar signed
> integer .SAT_ADD. Aka below example:
>
> Form 2:
> #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
> T __attribute__((noinline))                  \
> sat_s_add_##T##_fmt_2 (T x, T y)             \
> {                                            \
>   T sum = (UT)x + (UT)y;                     \
>                                              \
>   if ((x ^ y) < 0 || (sum ^ x) >= 0)         \
>     return sum;                              \
>                                              \
>   return x < 0 ? MIN : MAX;                  \
> }
>
> DEF_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)
>
> We can tell the difference before and after this patch if the backend
> implemented the ssadd<mode>3 pattern similar as below. 
> > Before this patch: >4 │ __attribute__((noinline)) >5 │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y) >6 │ { >7 │ int8_t sum; >8 │ unsigned char x.0_1; >9 │ unsigned char y.1_2; > 10 │ unsigned char _3; > 11 │ signed char _4; > 12 │ signed char _5; > 13 │ int8_t _6; > 14 │ _Bool _11; > 15 │ signed char _12; > 16 │ signed char _13; > 17 │ signed char _14; > 18 │ signed char _22; > 19 │ signed char _23; > 20 │ > 21 │ ;; basic block 2, loop depth 0 > 22 │ ;;pred: ENTRY > 23 │ x.0_1 = (unsigned char) x_7(D); > 24 │ y.1_2 = (unsigned char) y_8(D); > 25 │ _3 = x.0_1 + y.1_2; > 26 │ sum_9 = (int8_t) _3; > 27 │ _4 = x_7(D) ^ y_8(D); > 28 │ _5 = x_7(D) ^ sum_9; > 29 │ _23 = ~_4; > 30 │ _22 = _5 & _23; > 31 │ if (_22 >= 0) > 32 │ goto ; [42.57%] > 33 │ else > 34 │ goto ; [57.43%] > 35 │ ;;succ: 4 > 36 │ ;;3 > 37 │ > 38 │ ;; basic block 3, loop depth 0 > 39 │ ;;pred: 2 > 40 │ _11 = x_7(D) < 0; > 41 │ _12 = (signed char) _11; > 42 │ _13 = -_12; > 43 │ _14 = _13 ^ 127; > 44 │ ;;succ: 4 > 45 │ > 46 │ ;; basic block 4, loop depth 0 > 47 │ ;;pred: 2 > 48 │ ;;3 > 49 │ # _6 = PHI > 50 │ return _6; > 51 │ ;;succ: EXIT > 52 │ > 53 │ } > > After this patch: >4 │ __attribute__((noinline)) >5 │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y) >6 │ { >7 │ int8_t _6; >8 │ >9 │ ;; basic block 2, loop depth 0 > 10 │ ;;pred: ENTRY > 11 │ _6 = .SAT_ADD (x_7(D), y_8(D)); [tail call] > 12 │ return _6; > 13 │ ;;succ: EXIT > 14 │ > 15 │ } > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > * The x86 bootstrap test. > * The x86 fully regression test. > > gcc/ChangeLog: > > * match.pd: Add the form 2 of signed .SAT_ADD matching. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 15 +++ > 1 file changed, 15 insertions(+) > > diff
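Spelled out by hand for int8_t, the form-2 macro discussed in this thread behaves as a plain signed saturating add. A standalone sketch mirroring the macro body:

```c
#include <stdint.h>

/* Form 2 of the signed saturating add, instantiated by hand for
   T=int8_t, UT=uint8_t, MIN=INT8_MIN, MAX=INT8_MAX.  */
static int8_t
sat_s_add_int8_fmt_2 (int8_t x, int8_t y)
{
  int8_t sum = (uint8_t) x + (uint8_t) y;  /* wrapping add */

  /* No overflow when the signs differ, or when the sum keeps x's sign.  */
  if ((x ^ y) < 0 || (sum ^ x) >= 0)
    return sum;

  return x < 0 ? INT8_MIN : INT8_MAX;
}
```

This is exactly the shape match.pd has to recognize as .SAT_ADD: the wrapping unsigned add, the xor-based overflow test, and the sign-selected bound.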
RE: [PATCH v2 1/2] Genmatch: Support control flow graph case 1 for phi on condition
Thanks Richard for comments.

> Sorry to spoil this again, but can you instead create an interface like

Never mind, let me update it.

> gcond *
> match_cond_with_phi (gphi *phi, tree *true_arg, tree *false_arg);
> That would from a PHI node match up the controlling condition and
> initialize {true,false}_arg with the PHI args that match the conditions
> true/false case?
> I also think for the diamond case you fail to identify the appropriate
> true/false PHI argument since both incoming edges are not from the
> condition block they won't have EDGE_{TRUE,FALSE}_VALUE set.

Sure thing, I also noticed that in form 4 both edges of the PHI are false, thus I am working on another patch like extract_true_false_args_from_binary_phi to take care of this. Let me append that patch to the series v3.

Pan

-Original Message-
From: Richard Biener
Sent: Monday, September 9, 2024 8:27 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2 1/2] Genmatch: Support control flow graph case 1 for phi on condition

On Thu, Sep 5, 2024 at 2:01 PM wrote:
>
> From: Pan Li
>
> The gen_phi_on_cond can only support the below control flow for cond
> from day 1. Aka:
>
> +------+
> | def  |
> | ...  |   +-----+
> | cond |-->| def |
> +------+   | ... |
>    |       +-----+
>    |          |
>    v          |
> +-----+       |
> | PHI |<------+
> +-----+
>
> Unfortunately, there will be more scenarios of control flow on PHI.
> For example as below:
>
> T __attribute__((noinline))                            \
> sat_s_add_##T##_fmt_3 (T x, T y)                       \
> {                                                      \
>   T sum;                                               \
>   bool overflow = __builtin_add_overflow (x, y, &sum); \
>   return overflow ? x < 0 ? MIN : MAX : sum;           \
> }
>
> DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)
>
> With expanded RTL like below. 
>    3 │
>    4 │ __attribute__((noinline))
>    5 │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
>    6 │ {
>    7 │   signed char _1;
>    8 │   signed char _2;
>    9 │   int8_t _3;
>   10 │   __complex__ signed char _6;
>   11 │   _Bool _8;
>   12 │   signed char _9;
>   13 │   signed char _10;
>   14 │   signed char _11;
>   15 │
>   16 │   ;; basic block 2, loop depth 0
>   17 │   ;;    pred:       ENTRY
>   18 │   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   19 │   _2 = IMAGPART_EXPR <_6>;
>   20 │   if (_2 != 0)
>   21 │     goto <bb 4>; [50.00%]
>   22 │   else
>   23 │     goto <bb 3>; [50.00%]
>   24 │   ;;    succ:       4
>   25 │   ;;                3
>   26 │
>   27 │   ;; basic block 3, loop depth 0
>   28 │   ;;    pred:       2
>   29 │   _1 = REALPART_EXPR <_6>;
>   30 │   goto <bb 5>; [100.00%]
>   31 │   ;;    succ:       5
>   32 │
>   33 │   ;; basic block 4, loop depth 0
>   34 │   ;;    pred:       2
>   35 │   _8 = x_4(D) < 0;
>   36 │   _9 = (signed char) _8;
>   37 │   _10 = -_9;
>   38 │   _11 = _10 ^ 127;
>   39 │   ;;    succ:       5
>   40 │
>   41 │   ;; basic block 5, loop depth 0
>   42 │   ;;    pred:       3
>   43 │   ;;                4
>   44 │   # _3 = PHI <_1(3), _11(4)>
>   45 │   return _3;
>   46 │   ;;    succ:       EXIT
>   47 │
>   48 │ }
>
> The above code will have the below control flow, which is not supported by
> the gen_phi_on_cond.
>
> +------+
> | def  |
> | ...  |   +-----+
> | cond |-->| def |
> +------+   | ... |
>    |       +-----+
>    |          |
>    v          |
> +-----+       |
> | def |       |
> | ... |       |
> +-----+       |
>    |          |
>    v          |
> +-----+       |
> | PHI |<------+
> +-----+
>
> This patch would like to add support for the above control flow in the
> gen_phi_on_cond. The generated match code looks like below.
>
> Before this patch:
>   basic_block _b1 = gimple_bb (_a1);
>   if (gimple_phi_num_args (_a1) == 2)
>     {
>       basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>       basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>       basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
>       basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
>       gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
>       if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>           &&
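The basic blocks in the dump above encode a branchless bound selection: -(x < 0) is either 0 or -1 (all ones), and all-ones ^ 127 is -128, so `_11 = _10 ^ 127` yields INT8_MAX or INT8_MIN without a branch. A standalone sketch of form 3 for int8_t using the same trick:

```c
#include <stdint.h>
#include <stdbool.h>

/* Form 3 of the signed saturating add for T=int8_t, as in the mail:
   .ADD_OVERFLOW plus the branchless bound -(x < 0) ^ 127.  */
static int8_t
sat_s_add_int8_fmt_3 (int8_t x, int8_t y)
{
  int8_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);

  /* x < 0 ? INT8_MIN : INT8_MAX without a branch:
     -(x < 0) is 0 or -1 (all ones), and -1 ^ 127 == -128.  */
  int8_t bound = (int8_t) (-(int8_t) (x < 0)) ^ 127;

  return overflow ? bound : sum;
}
```

The `overflow ? bound : sum` select is the two-argument PHI the new CFG case has to match.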
RE: [PATCH] RISC-V: Fix ICE for rvv in lto
> Any comments on this patch? I may need some time to go through all details (PS: Sorry I cannot approve patches, leave it to juzhe or kito). Thanks a lot for fixing this. Pan -Original Message- From: Jin Ma Sent: Monday, September 9, 2024 6:30 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jinma.cont...@gmail.com Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto > I see, I can reproduce this when build "-march=rv64gcv -mabi=lp64d -flto -O0 > test.c -o test.elf". > > #include > > int > main () > { > size_t vl = 8; > vint32m1_t vs1 = {}; > vint32m1_t vs2 = {}; > vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl); > > return (int)&vd; > } > > Pan Hi, Pan Any comments on this patch? I think this patch is quite important, because RVV is completely unavailable with LTO at present. In fact, I discovered this ICE while trying to compile some computational libraries using LTO. Unfortunately, none of the libraries currently compile properly. BR Jin
RE: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD
Kindly ping. Pan -Original Message- From: Li, Pan2 Sent: Friday, August 30, 2024 6:16 PM To: gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD From: Pan Li This patch would like to support the vector signed ssadd pattern for the RISC-V backend. Aka Form 1: #define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_add_##T##_fmt_1 (T *out, T *x, T *y, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ { \ T sum = (UT)x[i] + (UT)y[i]; \ out[i] = (x[i] ^ y[i]) < 0 \ ? sum \ : (sum ^ x[i]) >= 0\ ? sum\ : x[i] < 0 ? MIN : MAX; \ } \ } DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) If the backend implemented the vector mode of ssadd, we will see IR diff similar as below: Before this patch: 108 │ _114 = .SELECT_VL (ivtmp_112, POLY_INT_CST [2, 2]); 109 │ ivtmp_77 = _114 * 8; 110 │ vect__4.9_80 = .MASK_LEN_LOAD (vectp_x.7_78, 64B, { -1, ... }, _114, 0); 111 │ vect__5.10_81 = VIEW_CONVERT_EXPR(vect__4.9_80); 112 │ vect__7.13_85 = .MASK_LEN_LOAD (vectp_y.11_83, 64B, { -1, ... }, _114, 0); 113 │ vect__8.14_86 = VIEW_CONVERT_EXPR(vect__7.13_85); 114 │ vect__9.15_87 = vect__5.10_81 + vect__8.14_86; 115 │ vect_sum_20.16_88 = VIEW_CONVERT_EXPR(vect__9.15_87); 116 │ vect__10.17_89 = vect__4.9_80 ^ vect__7.13_85; 117 │ vect__11.18_90 = vect__4.9_80 ^ vect_sum_20.16_88; 118 │ mask__46.19_92 = vect__10.17_89 >= { 0, ... }; 119 │ _36 = vect__4.9_80 >> 63; 120 │ mask__44.26_104 = vect__11.18_90 < { 0, ... }; 121 │ mask__43.27_105 = mask__46.19_92 & mask__44.26_104; 122 │ _115 = .COND_XOR (mask__43.27_105, _36, { 9223372036854775807, ... }, vect_sum_20.16_88); 123 │ .MASK_LEN_STORE (vectp_out.29_108, 64B, { -1, ... 
}, _114, 0, _115); 124 │ vectp_x.7_79 = vectp_x.7_78 + ivtmp_77; 125 │ vectp_y.11_84 = vectp_y.11_83 + ivtmp_77; 126 │ vectp_out.29_109 = vectp_out.29_108 + ivtmp_77; 127 │ ivtmp_113 = ivtmp_112 - _114; After this patch: 94 │ # vectp_x.7_82 = PHI 95 │ # vectp_y.10_86 = PHI 96 │ # vectp_out.14_91 = PHI 97 │ # ivtmp_95 = PHI 98 │ _97 = .SELECT_VL (ivtmp_95, POLY_INT_CST [2, 2]); 99 │ ivtmp_81 = _97 * 8; 100 │ vect__4.9_84 = .MASK_LEN_LOAD (vectp_x.7_82, 64B, { -1, ... }, _97, 0); 101 │ vect__7.12_88 = .MASK_LEN_LOAD (vectp_y.10_86, 64B, { -1, ... }, _97, 0); 102 │ vect_patt_40.13_89 = .SAT_ADD (vect__4.9_84, vect__7.12_88); 103 │ .MASK_LEN_STORE (vectp_out.14_91, 64B, { -1, ... }, _97, 0, vect_patt_40.13_89); 104 │ vectp_x.7_83 = vectp_x.7_82 + ivtmp_81; 105 │ vectp_y.10_87 = vectp_y.10_86 + ivtmp_81; 106 │ vectp_out.14_92 = vectp_out.14_91 + ivtmp_81; 107 │ ivtmp_96 = ivtmp_95 - _97; The below test suites are passed for this patch: 1. The rv64gcv fully regression tests. 2. The x86 bootstrap tests. 3. The x86 fully regression tests. gcc/ChangeLog: * match.pd: Add case 2 for the signed .SAT_ADD consumed by vect pattern. * tree-vect-patterns.cc (gimple_signed_integer_sat_add): Add new matching func decl for signed .SAT_ADD. (vect_recog_sat_add_pattern): Add signed .SAT_ADD pattern match. Signed-off-by: Pan Li --- gcc/match.pd | 17 + gcc/tree-vect-patterns.cc | 5 - 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/gcc/match.pd b/gcc/match.pd index be211535a49..578c9dd5b77 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3207,6 +3207,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type) && types_match (type, @0, @1 +/* Signed saturation add, case 2: + T sum = (T)((UT)X + (UT)Y) + SAT_S_ADD = (X ^ Y) < 0 && (X ^ sum) >= 0 ? (-(T)(X < 0) ^ MAX) : sum; + + The T and UT are type pair like T=int8_t, UT=uint8_t. 
*/ +(match (signed_integer_sat_add @0 @1) + (cond^ (bit_and:c (lt (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0) + (nop_convert @1 + integer_zerop) + (ge (bit_xor:c @0 @1) integer_zerop))
RE: [PATCH] RISC-V: Fix ICE for rvv in lto
I see, I can reproduce this when building with "-march=rv64gcv -mabi=lp64d -flto -O0 test.c -o test.elf". #include int main () { size_t vl = 8; vint32m1_t vs1 = {}; vint32m1_t vs2 = {}; vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl); return (int)&vd; } Pan -Original Message- From: Jin Ma Sent: Sunday, September 8, 2024 1:15 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jinma.cont...@gmail.com Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto > > #include > > > > vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl) > > { > > return __riscv_vadd_vv_i32m1(vs1, vs2, vl); > > } > > To double confirm, you mean "riscv64-linux-gnu-gcc-14 -march=rv64gcv > -mabi=lp64d -flto -O0 tmp.c -c -S -o -" with above is able to reproduce this > ICE? > > Pan Not quite accurate: please don't add "-S" or "-c"; let the compilation go through to the linker and try to generate the binary. The normal result of compilation should be to throw an error that the main function cannot be found, but unfortunately an ICE appears. By the way, the gcc-14 in my environment is built from releases/gcc-14; I didn't download any prebuilt gcc. Of course, it is also possible that my local environment is broken, and I will check it again. BR Jin
RE: [PATCH] RISC-V: Fix ICE for rvv in lto
> #include > > vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl) > { > return __riscv_vadd_vv_i32m1(vs1, vs2, vl); > } To double confirm, you mean "riscv64-linux-gnu-gcc-14 -march=rv64gcv -mabi=lp64d -flto -O0 tmp.c -c -S -o -" with the above is able to reproduce this ICE? Pan -Original Message- From: Jin Ma Sent: Saturday, September 7, 2024 5:43 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jinma.cont...@gmail.com Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto > > +/* Test that we do not have ice when compile */ > > + > > +/* { dg-do run } */ > > +/* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=zvl -flto > > -O2 -fno-checking" } */ > > + > > +#include > > + > > +int > > +main () > > +{ > > + size_t vl = 8; > > + vint32m1_t vs1 = {}; > > + vint32m1_t vs2 = {}; > > + > > + __volatile__ vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl); > > + > > + return 0; > > +} > > Interesting, do we still have an ICE when there is no __volatile__ for vd? As > well as the gcc-14 branch. > Because it is quite a common case that should be covered by tests already. > > Pan Yes, I am also surprised that this kind of ICE will appear. It really should be covered by test cases. But in fact, if we do not use zvfh or zvfhmin in the arch string, rvv cannot be used with LTO. This has nothing to do with "__volatile__". The "__volatile__" in the test case is just so that it is compiled to the end and not optimized away.
In fact, a simple case can reproduce ICE, including gcc-14 and master, for example: #include vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl) { return __riscv_vadd_vv_i32m1(vs1, vs2, vl); } If we compile this case with the option " -march=rv64gcv -mabi=lp64d -flto -O0", we will get the following error: during RTL pass: expand ../test.c: In function 'foo': ../test.c:5:10: internal compiler error: tree check: expected tree that contains 'typed' structure, have 'ggc_freed' in function_returns_void_p, at config/riscv/riscv-vector-builtins.h:456 5 | return __riscv_vadd_vv_i32m1(vs1, vs2, vl); | ^ 0x4081948 internal_error(char const*, ...) /iothome/jin.ma/code/master/gcc/gcc/diagnostic-global-context.cc:492 0x1dc584d tree_contains_struct_check_failed(tree_node const*, tree_node_structure_enum, char const*, int, char const*) /iothome/jin.ma/code/master/gcc/gcc/tree.cc:9177 0x10d8230 contains_struct_check(tree_node*, tree_node_structure_enum, char const*, int, char const*) /iothome/jin.ma/code/master/gcc/gcc/tree.h:3779 0x2078f0c riscv_vector::function_call_info::function_returns_void_p() /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.h:456 0x2074f54 riscv_vector::function_expander::function_expander(riscv_vector::function_instance const&, tree_node*, tree_node*, rtx_def*) /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.cc:3920 0x20787b8 riscv_vector::expand_builtin(unsigned int, tree_node*, rtx_def*) /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.cc:4775 0x2029b60 riscv_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int) /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-builtins.cc:433 0x1167cb7 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int) /iothome/jin.ma/code/master/gcc/gcc/builtins.cc:7763 0x137e5d2 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) /iothome/jin.ma/code/master/gcc/gcc/expr.cc:12390 0x1370068 
expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) /iothome/jin.ma/code/master/gcc/gcc/expr.cc:9473 0x136434a store_expr(tree_node*, rtx_def*, int, bool, bool) /iothome/jin.ma/code/master/gcc/gcc/expr.cc:6766 0x13629e3 expand_assignment(tree_node*, tree_node*, bool) /iothome/jin.ma/code/master/gcc/gcc/expr.cc:6487 0x11a8419 expand_call_stmt /iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:2893 0x11ac48e expand_gimple_stmt_1 /iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:3962 0x11acaad expand_gimple_stmt /iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:4104 0x11b55a1 expand_gimple_basic_block /iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:6160 0x11b7b96 execute /iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:6899 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. lto-wrapper: fatal error: riscv64-unknown-linux-gnu-gcc returned 1 exit status compilation terminated. /mnt
RE: [PATCH] RISC-V: Fix ICE for rvv in lto
> +/* Test that we do not have ice when compile */ > + > +/* { dg-do run } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=zvl -flto -O2 > -fno-checking" } */ > + > +#include > + > +int > +main () > +{ > + size_t vl = 8; > + vint32m1_t vs1 = {}; > + vint32m1_t vs2 = {}; > + > + __volatile__ vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl); > + > + return 0; > +} Interesting, do we still have an ICE when there is no __volatile__ for vd? The same for the gcc-14 branch. Because it is quite a common case that should be covered by tests already. Pan -Original Message- From: Jin Ma Sent: Saturday, September 7, 2024 1:31 AM To: gcc-patches@gcc.gnu.org Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; Li, Pan2 ; kito.ch...@gmail.com; jinma.cont...@gmail.com; Jin Ma Subject: [PATCH] RISC-V: Fix ICE for rvv in lto When we use flto, the function list of rvv will be generated twice, once in the cc1 phase and once in the lto phase. However, due to the different generation methods, the two lists are different. For example, when there is no zvfh or zvfhmin in the arch string, the list is generated by calling the function "riscv_pragma_intrinsic". Since TARGET_VECTOR_ELEN_FP_16 is enabled before rvv function generation, a list of rvv functions related to float16 will be generated. In the lto phase, the rvv function list is generated only by calling the function "riscv_init_builtins", but TARGET_VECTOR_ELEN_FP_16 is disabled, so the float16-related rvv function list cannot be generated as in cc1. This will cause confusion, resulting in matching the wrong function due to inconsistent fcode in the lto phase, eventually leading to an ICE. So I think we should keep the two generated lists consistent, which is exactly what this patch does. But there is still a problem here. If we use "-fchecking", we still have an ICE.
This is because in the lto phase, after the rvv function list is generated and before the expand_builtin, the ggc_grow will be called to clean up the memory, resulting in "(* registered_functions)[code]->decl" being cleaned up to ", and finally ICE". I think this is wrong and needs to be fixed, maybe we shouldn't use "ggc_alloc ()", or is there another better way to implement it? I'm trying to fix it here. Any comments here? gcc/ChangeLog: * config/riscv/riscv-c.cc (struct pragma_intrinsic_flags): Mov to riscv-protos.h. (riscv_pragma_intrinsic_flags_pollute): Mov to riscv-vector-builtins.c. (riscv_pragma_intrinsic_flags_restore): Likewise. (riscv_pragma_intrinsic): Likewise. * config/riscv/riscv-protos.h (struct pragma_intrinsic_flags): New. (riscv_pragma_intrinsic_flags_restore): New. (riscv_pragma_intrinsic_flags_pollute): New. * config/riscv/riscv-vector-builtins.cc (riscv_pragma_intrinsic_flags_pollute): New. (riscv_pragma_intrinsic_flags_restore): New. (handle_pragma_vector_for_lto): New. (init_builtins): Correct the processing logic for lto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/bug-10.c: New test. --- gcc/config/riscv/riscv-c.cc | 70 +--- gcc/config/riscv/riscv-protos.h | 13 +++ gcc/config/riscv/riscv-vector-builtins.cc | 83 ++- .../gcc.target/riscv/rvv/base/bug-10.c| 18 4 files changed, 114 insertions(+), 70 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc index 71112d9c66d7..7037ecc1268a 100644 --- a/gcc/config/riscv/riscv-c.cc +++ b/gcc/config/riscv/riscv-c.cc @@ -34,72 +34,6 @@ along with GCC; see the file COPYING3. 
If not see #define builtin_define(TXT) cpp_define (pfile, TXT) -struct pragma_intrinsic_flags -{ - int intrinsic_target_flags; - - int intrinsic_riscv_vector_elen_flags; - int intrinsic_riscv_zvl_flags; - int intrinsic_riscv_zvb_subext; - int intrinsic_riscv_zvk_subext; -}; - -static void -riscv_pragma_intrinsic_flags_pollute (struct pragma_intrinsic_flags *flags) -{ - flags->intrinsic_target_flags = target_flags; - flags->intrinsic_riscv_vector_elen_flags = riscv_vector_elen_flags; - flags->intrinsic_riscv_zvl_flags = riscv_zvl_flags; - flags->intrinsic_riscv_zvb_subext = riscv_zvb_subext; - flags->intrinsic_riscv_zvk_subext = riscv_zvk_subext; - - target_flags = target_flags -| MASK_VECTOR; - - riscv_zvl_flags = riscv_zvl_flags -| MASK_ZVL32B -| MASK_ZVL64B -| MASK_ZVL128B; - - riscv_vector_elen_flags = riscv_vector_elen_flags -| MASK_VECTOR_ELEN_32 -| MASK_VECTOR_ELEN_64 -| MASK_VECTOR_ELEN_FP_16 -| MASK_VECTOR_ELEN_FP_32 -| MASK_VECTOR_ELEN_FP_64; - - riscv_zvb_subext = riscv_zvb_subext -| MASK_ZVBB -| MASK_ZVBC -| MASK_ZV
RE: [PATCH v1] RISC-V: Fix SAT_* dump check failure due to middle-end change.
> This won't apply as I've already updated those tests. I think verifying > the number of SAT_ADDs is useful to ensure we don't regress as some of > these tests detect > 1 SAT_ADD idiom. I see, thanks Jeff. I will drop this patch then. Pan -Original Message- From: Jeff Law Sent: Thursday, September 5, 2024 10:10 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Fix SAT_* dump check failure due to middle-end change. On 9/4/24 8:01 PM, pan2...@intel.com wrote: > From: Pan Li > > Some middle-end change may affect the times of .SAT_*. Thus, > refine the dump check for SAT_*, from scan-times to scan, as > we only care about whether the .SAT_* exists or not. There will be > another patch to perform similar refinement; this patch only > fixes the failed test cases. This won't apply as I've already updated those tests. I think verifying the number of SAT_ADDs is useful to ensure we don't regress as some of these tests detect > 1 SAT_ADD idiom. jeff
RE: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition
Thanks Richard for comments. > I also think we may want to split out this CFG matching code out into > a helper function > in gimple-match-head.cc instead of repeating it fully for each pattern? That makes sense to me, let me have a try in v2. Pan -Original Message- From: Richard Biener Sent: Wednesday, September 4, 2024 6:56 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition On Wed, Sep 4, 2024 at 9:48 AM Li, Pan2 wrote: > > > I'm lazy - can you please quote genmatch generated code for the condition > > for > > one case? > > Sure thing, list the before and after covers all the changes to generated > code as blow. > > Before this patch: > basic_block _b1 = gimple_bb (_a1); > if (gimple_phi_num_args (_a1) == 2) > { > basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src; > basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src; > basic_block _db_1 = safe_dyn_cast (*gsi_last_bb > (_pb_0_1)) ? _pb_0_1 : _pb_1_1; > basic_block _other_db_1 = safe_dyn_cast > (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1; > gcond *_ct_1 = safe_dyn_cast (*gsi_last_bb > (_db_1)); > if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1 > && EDGE_COUNT (_other_db_1->succs) == 1 > && EDGE_PRED (_other_db_1, 0)->src == _db_1) > { > tree _cond_lhs_1 = gimple_cond_lhs (_ct_1); > tree _cond_rhs_1 = gimple_cond_rhs (_ct_1); > tree _p0 = build2 (gimple_cond_code (_ct_1), > boolean_type_node, _cond_lhs_1, _cond_rhs_1); > bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, > 0)->flags & EDGE_TRUE_VALUE; > tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? > 0 : 1); > tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 1 : 0); > switch (TREE_CODE (_p0)) > { > > After this patch: > basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src; > basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src; > gcond *_ct_0_1 = safe_dyn_cast (*gsi_last_bb > (_pb_0_1)); > gcond *_ct_1_1 = safe_dyn_cast (*gsi_last_bb > (_pb_1_1)); > gcond *_ct_a_1 = _ct_0_1 ? _ct_0_1 : _ct_1_1; > basic_block _db_1 = _ct_0_1 ? _pb_0_1 : _pb_1_1; > basic_block _other_db_1 = _ct_0_1 ? _pb_1_1 : _pb_0_1; > edge _e_00_1 = _pb_0_1->preds ? EDGE_PRED (_pb_0_1, 0) : > NULL; > basic_block _pb_00_1 = _e_00_1 ? _e_00_1->src : NULL; > gcond *_ct_b_1 = _pb_00_1 ? safe_dyn_cast > (*gsi_last_bb (_pb_00_1)) : NULL; > if ((_ct_a_1 && EDGE_COUNT (_other_db_1->preds) == 1 > && EDGE_COUNT (_other_db_1->succs) == 1 > && EDGE_PRED (_other_db_1, 0)->src == _db_1) > || > (_ct_b_1 && _pb_00_1 && EDGE_COUNT (_pb_0_1->succs) == 1 > && EDGE_COUNT (_pb_0_1->preds) == 1 > && EDGE_COUNT (_other_db_1->preds) == 1 > && EDGE_COUNT (_other_db_1->succs) == 1 > && EDGE_PRED (_other_db_1, 0)->src == _pb_00_1)) > { > gcond *_ct_1 = _ct_a_1 ? _ct_a_1 : _ct_b_1; > tree _cond_lhs_1 = gimple_cond_lhs (_ct_1); > tree _cond_rhs_1 = gimple_cond_rhs (_ct_1); > tree _p0 = build2 (gimple_cond_code (_ct_1), > boolean_type_node, _cond_lhs_1, _cond_rhs_1); > bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, > 0)->flags & EDGE_TRUE_VALUE; > tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? > 0 : 1); > tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? > 1 : 0); I think it might be better to refactor this to detect the three CFGs like if (EDGE_COUNT (_pb_0_1->preds) == 1 && EDGE_PRED (_pb_0_1, 0)->src == pb_1_1) { .. check rest of constraints .. } else if (... same for _pb_1_1 being the forwarder ...) ... else if (EDGE_COUNT (_pb_0_1->preds) == 1
RE: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition
> I'm lazy - can you please quote genmatch generated code for the condition for > one case? Sure thing, list the before and after covers all the changes to generated code as blow. Before this patch: basic_block _b1 = gimple_bb (_a1); if (gimple_phi_num_args (_a1) == 2) { basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src; basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src; basic_block _db_1 = safe_dyn_cast (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1; basic_block _other_db_1 = safe_dyn_cast (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1; gcond *_ct_1 = safe_dyn_cast (*gsi_last_bb (_db_1)); if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1 && EDGE_COUNT (_other_db_1->succs) == 1 && EDGE_PRED (_other_db_1, 0)->src == _db_1) { tree _cond_lhs_1 = gimple_cond_lhs (_ct_1); tree _cond_rhs_1 = gimple_cond_rhs (_ct_1); tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1); bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE; tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1); tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0); switch (TREE_CODE (_p0)) { After this patch: basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src; basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src; gcond *_ct_0_1 = safe_dyn_cast (*gsi_last_bb (_pb_0_1)); gcond *_ct_1_1 = safe_dyn_cast (*gsi_last_bb (_pb_1_1)); gcond *_ct_a_1 = _ct_0_1 ? _ct_0_1 : _ct_1_1; basic_block _db_1 = _ct_0_1 ? _pb_0_1 : _pb_1_1; basic_block _other_db_1 = _ct_0_1 ? _pb_1_1 : _pb_0_1; edge _e_00_1 = _pb_0_1->preds ? EDGE_PRED (_pb_0_1, 0) : NULL; basic_block _pb_00_1 = _e_00_1 ? _e_00_1->src : NULL; gcond *_ct_b_1 = _pb_00_1 ? 
safe_dyn_cast (*gsi_last_bb (_pb_00_1)) : NULL; if ((_ct_a_1 && EDGE_COUNT (_other_db_1->preds) == 1 && EDGE_COUNT (_other_db_1->succs) == 1 && EDGE_PRED (_other_db_1, 0)->src == _db_1) || (_ct_b_1 && _pb_00_1 && EDGE_COUNT (_pb_0_1->succs) == 1 && EDGE_COUNT (_pb_0_1->preds) == 1 && EDGE_COUNT (_other_db_1->preds) == 1 && EDGE_COUNT (_other_db_1->succs) == 1 && EDGE_PRED (_other_db_1, 0)->src == _pb_00_1)) { gcond *_ct_1 = _ct_a_1 ? _ct_a_1 : _ct_b_1; tree _cond_lhs_1 = gimple_cond_lhs (_ct_1); tree _cond_rhs_1 = gimple_cond_rhs (_ct_1); tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1); bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE; tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1); tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0); Pan -Original Message- From: Richard Biener Sent: Wednesday, September 4, 2024 3:42 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition On Wed, Sep 4, 2024 at 9:25 AM wrote: > > From: Pan Li > > The gen_phi_on_cond can only support below control flow for cond > from day 1. Aka: > > +--+ > | def | > | ... | +-+ > | cond |-->| def | > +--+ | ... | >| +-+ >| | >v | > +-+ | > | PHI |<--+ > +-+ > > Unfortunately, there will be more scenarios of control flow on PHI. > For example as below: > > T __attribute__((noinline))\ > sat_s_add_##T##_fmt_3 (T x, T y) \ > { \ > T sum; \ > bool overflow = __builtin_add_overflow (x, y, &sum); \ > return overflow ? x < 0 ? MIN : MAX : sum; \ > } > > DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX) > > With expanded RTL like b
RE: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD
Thanks Jeff. > But I would expect that may be beneficial on other targets as well. I think x86 has similar insns for saturation, for example paddsw in the link below. https://www.felixcloutier.com/x86/paddsb:paddsw And I bet the x86 backend has implemented some of them already, like usadd and ussub. > The other question that I think Robin initially raised to me privately > is whether or not the sequences we're generating are well suited for > zicond or not. Got it, cmov-like insns are well suited for such cases. We can consider how best to leverage the zicond extension in further improvements. Pan -Original Message- From: Jeff Law Sent: Monday, September 2, 2024 11:32 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD On 9/1/24 8:50 PM, Li, Pan2 wrote: > Thanks Jeff for comments. > >> OK. Presumably the code you're getting here is more efficient than >> whatever standard expansion would provide? If so, should we be looking >> at moving some of this stuff into generic expanders? I don't really see >> anything all that target specific here. > > Mostly for that we can eliminate the branch for .SAT_ADD in scalar. Given we > don't have one SAT_ADD like insn like RVV vsadd.vv/vx/vi. But I would expect that may be beneficial on other targets as well. It's not conceptually a lot different than what we do for basic arithmetic with overflow, which has a generic expansion that can be overridden by target-specific expanders. See expand_addsub_overflow. Again, I think this is OK, but I'm thinking we probably want something more generic in the longer term. The other question that I think Robin initially raised to me privately is whether or not the sequences we're generating are well suited for zicond or not.
If not, we might want to consider adjustments to either generate zicond if-then-else constructs during initial code generation or bias initial code generator towards sequences that ifcvt & combine can turn into zicond. But again not strictly necessary for this patch to go forward, more a potential avenue for further improvements. > > Pan > > -Original Message- > From: Jeff Law > Sent: Sunday, September 1, 2024 11:35 PM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD > > > > On 8/29/24 12:25 AM, pan2...@intel.com wrote: >> From: Pan Li >> >> This patch would like to support the scalar signed ssadd pattern >> for the RISC-V backend. Aka >> >> Form 1: >> #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ >> T __attribute__((noinline)) \ >> sat_s_add_##T##_fmt_1 (T x, T y) \ >> {\ >> T sum = (UT)x + (UT)y; \ >> return (x ^ y) < 0 \ >> ? sum\ >> : (sum ^ x) >= 0 \ >> ? sum \ >> : x < 0 ? MIN : MAX; \ >> } >> >> DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) >> >> Before this patch: >> 10 │ sat_s_add_int64_t_fmt_1: >> 11 │ mv a5,a0 >> 12 │ add a0,a0,a1 >> 13 │ xor a1,a5,a1 >> 14 │ not a1,a1 >> 15 │ xor a4,a5,a0 >> 16 │ and a1,a1,a4 >> 17 │ blt a1,zero,.L5 >> 18 │ ret >> 19 │ .L5: >> 20 │ srai a5,a5,63 >> 21 │ li a0,-1 >> 22 │ srli a0,a0,1 >> 23 │ xor a0,a5,a0 >> 24 │ ret >> >> After this patch: >> 10 │ sat_s_add_int64_t_fmt_1: >> 11 │ add a2,a0,a1 >> 12 │ xor a1,a0,a1 >> 13 │ xor a5,a0,a2 >> 14 │ srli a5,a5,63 >> 15 │ srli a1,a1,63 >> 16 │ xori a1,a1,1 >> 17 │ and a5,a5,a1 >> 18 │ srai a4,a0,63 >> 19 │ li a3,-1 >> 20 │ srli a3,a3,1 >> 21 │ xor a3,a3,a4 >> 22 │ neg a4,a5 >> 23 │ and a3,a3,a4 >> 24 │ addi a5,a5,-1 >> 25 │ and a0,a2,a5 >> 26 │ or a0,a0,a3 >> 27 │ ret >> >> The below test suites are passed for this patch: >> 1. The rv64gcv fully regression test. >>
RE: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD
Thanks Jeff for comments. > OK. Presumably the code you're getting here is more efficient than > whatever standard expansion would provide? If so, should we be looking > at moving some of this stuff into generic expanders? I don't really see > anything all that target specific here. Mostly for that we can eliminate the branch for .SAT_ADD in scalar. Given we don't have one SAT_ADD like insn like RVV vsadd.vv/vx/vi. Pan -Original Message- From: Jeff Law Sent: Sunday, September 1, 2024 11:35 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD On 8/29/24 12:25 AM, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to support the scalar signed ssadd pattern > for the RISC-V backend. Aka > > Form 1: >#define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ >T __attribute__((noinline)) \ >sat_s_add_##T##_fmt_1 (T x, T y) \ >{\ > T sum = (UT)x + (UT)y; \ > return (x ^ y) < 0 \ >? sum\ >: (sum ^ x) >= 0 \ > ? sum \ > : x < 0 ? MIN : MAX; \ >} > > DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) > > Before this patch: >10 │ sat_s_add_int64_t_fmt_1: >11 │ mv a5,a0 >12 │ add a0,a0,a1 >13 │ xor a1,a5,a1 >14 │ not a1,a1 >15 │ xor a4,a5,a0 >16 │ and a1,a1,a4 >17 │ blt a1,zero,.L5 >18 │ ret >19 │ .L5: >20 │ srai a5,a5,63 >21 │ li a0,-1 >22 │ srli a0,a0,1 >23 │ xor a0,a5,a0 >24 │ ret > > After this patch: >10 │ sat_s_add_int64_t_fmt_1: >11 │ add a2,a0,a1 >12 │ xor a1,a0,a1 >13 │ xor a5,a0,a2 >14 │ srli a5,a5,63 >15 │ srli a1,a1,63 >16 │ xori a1,a1,1 >17 │ and a5,a5,a1 >18 │ srai a4,a0,63 >19 │ li a3,-1 >20 │ srli a3,a3,1 >21 │ xor a3,a3,a4 >22 │ neg a4,a5 >23 │ and a3,a3,a4 >24 │ addi a5,a5,-1 >25 │ and a0,a2,a5 >26 │ or a0,a0,a3 >27 │ ret > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression test. 
> > gcc/ChangeLog: > > * config/riscv/riscv-protos.h (riscv_expand_ssadd): Add new func > decl for expanding ssadd. > * config/riscv/riscv.cc (riscv_gen_sign_max_cst): Add new func > impl to gen the max int rtx. > (riscv_expand_ssadd): Add new func impl to expand the ssadd. > * config/riscv/riscv.md (ssadd3): Add new pattern for > signed integer .SAT_ADD. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_arith.h: Add test helper macros. > * gcc.target/riscv/sat_arith_data.h: Add test data. > * gcc.target/riscv/sat_s_add-1.c: New test. > * gcc.target/riscv/sat_s_add-2.c: New test. > * gcc.target/riscv/sat_s_add-3.c: New test. > * gcc.target/riscv/sat_s_add-4.c: New test. > * gcc.target/riscv/sat_s_add-run-1.c: New test. > * gcc.target/riscv/sat_s_add-run-2.c: New test. > * gcc.target/riscv/sat_s_add-run-3.c: New test. > * gcc.target/riscv/sat_s_add-run-4.c: New test. > * gcc.target/riscv/scalar_sat_binary_run_xxx.h: New test. OK. Presumably the code you're getting here is more efficient than whatever standard expansion would provide? If so, should we be looking at moving some of this stuff into generic expanders? I don't really see anything all that target specific here. jeff
RE: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]
Noted with thanks; I will commit with that change if there are no surprises from testing. Pan -Original Message- From: Richard Biener Sent: Wednesday, August 28, 2024 3:24 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC] On Wed, Aug 28, 2024 at 3:18 AM Li, Pan2 wrote: > > Kindly ping. Please do not include stdint-gcc.h but stdint.h; otherwise OK. Richard. > Pan > > -Original Message----- > From: Li, Pan2 > Sent: Monday, August 19, 2024 10:05 AM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; > rdapp@gmail.com; Li, Pan2 > Subject: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC] > > From: Pan Li > > Move the run test of pr116278 to dg/torture and leave the risc-v the > asm check under risc-v part. > > PR target/116278 > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/pr116278-run-1.c: Take compile instead of run. > * gcc.target/riscv/pr116278-run-2.c: Ditto. > * gcc.dg/torture/pr116278-run-1.c: New test. > * gcc.dg/torture/pr116278-run-2.c: New test.
> > Signed-off-by: Pan Li > --- > gcc/testsuite/gcc.dg/torture/pr116278-run-1.c | 19 +++ > gcc/testsuite/gcc.dg/torture/pr116278-run-2.c | 19 +++ > .../gcc.target/riscv/pr116278-run-1.c | 2 +- > .../gcc.target/riscv/pr116278-run-2.c | 2 +- > 4 files changed, 40 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c > create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c > > diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c > b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c > new file mode 100644 > index 000..8e07fb6af29 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c > @@ -0,0 +1,19 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target int32 } */ > +/* { dg-options "-O2" } */ > + > +#include > + > +int8_t b[1]; > +int8_t *d = b; > +int32_t c; > + > +int main() { > + b[0] = -40; > + uint16_t t = (uint16_t)d[0]; > + > + c = (t < 0xFFF6 ? t : 0xFFF6) + 9; > + > + if (c != 65505) > +__builtin_abort (); > +} > diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c > b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c > new file mode 100644 > index 000..d85e21531e1 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c > @@ -0,0 +1,19 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target int32 } */ > +/* { dg-options "-O2" } */ > + > +#include > + > +int16_t b[1]; > +int16_t *d = b; > +int64_t c; > + > +int main() { > + b[0] = -40; > + uint32_t t = (uint32_t)d[0]; > + > + c = (t < 0xFFF6u ? 
t : 0xFFF6u) + 9; > + > + if (c != 4294967265) > +__builtin_abort (); > +} > diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c > b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c > index d3812bdcdfb..c758fca7975 100644 > --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c > +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c > @@ -1,4 +1,4 @@ > -/* { dg-do run { target { riscv_v } } } */ > +/* { dg-do compile } */ > /* { dg-options "-O2 -fdump-rtl-expand-details" } */ > > #include > diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c > b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c > index 669cd4f003f..a4da8a323f0 100644 > --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c > +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c > @@ -1,4 +1,4 @@ > -/* { dg-do run { target { riscv_v } } } */ > +/* { dg-do compile } */ > /* { dg-options "-O2 -fdump-rtl-expand-details" } */ > > #include > -- > 2.43.0 >
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Hi Patrick, Could you please help to re-trigger the pre-commit? Thanks in advance! Pan -Original Message- From: Patrick O'Neill Sent: Tuesday, August 20, 2024 12:14 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; Jeff Law Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2 Hi Pan, Once the postcommit baseline moves forward (trunk is currently failing to build linux targets [1] [2]) I'll re-trigger precommit for you. Thanks, Patrick [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116409 [2]: https://github.com/patrick-rivos/gcc-postcommit-ci/issues/1564 On 8/18/24 19:49, Li, Pan2 wrote: > Turns out that the pre-commit doesn't pick up the newest upstream when testing > this patch. > > Pan > > -----Original Message- > From: Li, Pan2 > Sent: Monday, August 19, 2024 9:25 AM > To: Jeff Law ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com > Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad > and oct .SAT_TRUNC form 2 > > Oops, let me double check what happened to my local tester. > > Pan > > -Original Message----- > From: Jeff Law > Sent: Sunday, August 18, 2024 11:21 PM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad > and oct .SAT_TRUNC form 2 > > > > On 8/18/24 12:10 AM, pan2...@intel.com wrote: >> From: Pan Li >> >> This patch would like to add test cases for the unsigned scalar quad and >> oct .SAT_TRUNC form 2. Aka: >> >> Form 2: >> #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ >> NT __attribute__((noinline)) \ >> sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ >> {\ >> WT max = (WT)(NT)-1; \ >> return x > max ?
(NT) max : (NT)x; \ >> } >> >> QUAD: >> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) >> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) >> >> OCT: >> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) >> >> The below test is passed for this patch. >> * The rv64gcv regression test. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/sat_u_trunc-10.c: New test. >> * gcc.target/riscv/sat_u_trunc-11.c: New test. >> * gcc.target/riscv/sat_u_trunc-12.c: New test. >> * gcc.target/riscv/sat_u_trunc-run-10.c: New test. >> * gcc.target/riscv/sat_u_trunc-run-11.c: New test. >> * gcc.target/riscv/sat_u_trunc-run-12.c: New test. > Looks like they're failing in the upstream pre-commit tester: > >> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578 > > jeff
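To make the quoted (and truncated) macro concrete, the quad instantiation uint64_t to uint16_t of form 2 expands to roughly the following; the function name is spelled out by hand here:

```c
#include <assert.h>
#include <stdint.h>

/* Form 2 of unsigned saturating truncation, written out for the quad
   case uint64_t -> uint16_t.  */
uint16_t
sat_u_truc_u64_to_u16 (uint64_t x)
{
  uint64_t max = (uint64_t) (uint16_t) -1;   /* 0xffff */
  return x > max ? (uint16_t) max : (uint16_t) x;
}
```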
RE: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]
Kindly ping. Pan -Original Message- From: Li, Pan2 Sent: Monday, August 19, 2024 10:05 AM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC] From: Pan Li Move the run test of pr116278 to dg/torture and leave the risc-v the asm check under risc-v part. PR target/116278 gcc/testsuite/ChangeLog: * gcc.target/riscv/pr116278-run-1.c: Take compile instead of run. * gcc.target/riscv/pr116278-run-2.c: Ditto. * gcc.dg/torture/pr116278-run-1.c: New test. * gcc.dg/torture/pr116278-run-2.c: New test. Signed-off-by: Pan Li --- gcc/testsuite/gcc.dg/torture/pr116278-run-1.c | 19 +++ gcc/testsuite/gcc.dg/torture/pr116278-run-2.c | 19 +++ .../gcc.target/riscv/pr116278-run-1.c | 2 +- .../gcc.target/riscv/pr116278-run-2.c | 2 +- 4 files changed, 40 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c new file mode 100644 index 000..8e07fb6af29 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c @@ -0,0 +1,19 @@ +/* { dg-do run } */ +/* { dg-require-effective-target int32 } */ +/* { dg-options "-O2" } */ + +#include + +int8_t b[1]; +int8_t *d = b; +int32_t c; + +int main() { + b[0] = -40; + uint16_t t = (uint16_t)d[0]; + + c = (t < 0xFFF6 ? 
t : 0xFFF6) + 9; + + if (c != 65505) +__builtin_abort (); +} diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c new file mode 100644 index 000..d85e21531e1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c @@ -0,0 +1,19 @@ +/* { dg-do run } */ +/* { dg-require-effective-target int32 } */ +/* { dg-options "-O2" } */ + +#include + +int16_t b[1]; +int16_t *d = b; +int64_t c; + +int main() { + b[0] = -40; + uint32_t t = (uint32_t)d[0]; + + c = (t < 0xFFF6u ? t : 0xFFF6u) + 9; + + if (c != 4294967265) +__builtin_abort (); +} diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c index d3812bdcdfb..c758fca7975 100644 --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c @@ -1,4 +1,4 @@ -/* { dg-do run { target { riscv_v } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -fdump-rtl-expand-details" } */ #include diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c index 669cd4f003f..a4da8a323f0 100644 --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c @@ -1,4 +1,4 @@ -/* { dg-do run { target { riscv_v } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -fdump-rtl-expand-details" } */ #include -- 2.43.0
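For reference, the arithmetic the pr116278 run tests check can be traced in a few lines of C: the int8_t load sign-extends to -40, the conversion to the unsigned type wraps modulo 2^16 (or 2^32 in the second test), and the MIN must operate on that wrapped value:

```c
#include <assert.h>
#include <stdint.h>

/* Traces the 16-bit variant of the pr116278 computation.  */
int32_t
pr116278_value (int8_t v)
{
  uint16_t t = (uint16_t) v;               /* -40 wraps to 65496 */
  return (t < 0xFFF6 ? t : 0xFFF6) + 9;    /* min (t, 65526) + 9 */
}
```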
RE: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD
> :c is required when you want to match up @0s and they appear in a commutative > operation and there's no canonicalization rule putting it into one or the > other > position. In your case you have two commutative operations you want to match > up, so it should be only necessary to try swapping one of it to get the match, > it's not required to swap both. This reduces the number of generated > patterns. Thanks Richard for the explanation. Got the point that the swap on captures for a op will also effect on other op(s), will update in v4. Pan -Original Message- From: Richard Biener Sent: Tuesday, August 27, 2024 4:41 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD On Tue, Aug 27, 2024 at 3:06 AM Li, Pan2 wrote: > > Thanks Richard for comments. > > > I think you want to use nop_convert here, for sure a truncation or > > extension wouldn't be valid? > > Oh, yes, should be nop_convert. > > > I think you don't need :c on both the inner plus and the bit_xor here? > > Sure, could you please help to explain more about when should I need to add > :c? > Liker inner plus/and/or ... etc, sometimes got confused for similar scenarios. :c is required when you want to match up @0s and they appear in a commutative operation and there's no canonicalization rule putting it into one or the other position. In your case you have two commutative operations you want to match up, so it should be only necessary to try swapping one of it to get the match, it's not required to swap both. This reduces the number of generated patterns. > > + integer_zerop) > > + (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value) > > > The comment above quotes 'MIN' but that's not present here - that is, > > the comment quotes a source form while we match what we see on > > GIMPLE? 
I do expect the matching will be quite fragile when not > > being isolated. > > Got it, will update the comments to gimple. > > Pan > > -Original Message- > From: Richard Biener > Sent: Monday, August 26, 2024 9:40 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; > kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer > .SAT_ADD > > On Mon, Aug 26, 2024 at 4:20 AM wrote: > > > > From: Pan Li > > > > This patch would like to support the form 1 of the scalar signed > > integer .SAT_ADD. Aka below example: > > > > Form 1: > > #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ > > T __attribute__((noinline)) \ > > sat_s_add_##T##_fmt_1 (T x, T y) \ > > {\ > > T sum = (UT)x + (UT)y; \ > > return (x ^ y) < 0 \ > > ? sum\ > > : (sum ^ x) >= 0 \ > > ? sum \ > > : x < 0 ? MIN : MAX; \ > > } > > > > DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) > > > > We can tell the difference before and after this patch if backend > > implemented the ssadd3 pattern similar as below. > > > > Before this patch: > >4 │ __attribute__((noinline)) > >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) > >6 │ { > >7 │ int64_t sum; > >8 │ long unsigned int x.0_1; > >9 │ long unsigned int y.1_2; > > 10 │ long unsigned int _3; > > 11 │ long int _4; > > 12 │ long int _5; > > 13 │ int64_t _6; > > 14 │ _Bool _11; > > 15 │ long int _12; > > 16 │ long int _13; > > 17 │ long int _14; > > 18 │ long int _16; > > 19 │ long int _17; > > 20 │ > > 21 │ ;; basic block 2, loop depth 0 > > 22 │ ;;pred: ENTRY > > 23 │ x.0_1 = (long unsigned int) x_7(D); > > 24 │ y.1_2 = (long unsigned int) y_8(D); > > 25 │ _3 = x.0_1 + y.1_2; > > 26 │ sum_9 = (int64_t) _3; > > 27 │ _4 = x_7(D) ^ y_8(D); > > 28 │ _5 = x_7(D) ^ sum_9; > > 29 │ _17 = ~_4; > > 30 │ _16 = _5 & _17; > > 31 │ if (_16 < 0) > > 32 │ goto ; [41.00%] > > 33 │ else &
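Spelling out the quoted form-1 source pattern as a standalone function (with the implicit unsigned-to-signed conversion written as an explicit cast) makes its saturation behavior easy to exercise:

```c
#include <assert.h>
#include <stdint.h>

/* Hand instantiation of DEF_SAT_S_ADD_FMT_1 for int64_t/uint64_t.  */
int64_t
sat_s_add_int64_fmt_1 (int64_t x, int64_t y)
{
  int64_t sum = (int64_t) ((uint64_t) x + (uint64_t) y);
  return (x ^ y) < 0
    ? sum                        /* different signs: cannot overflow  */
    : (sum ^ x) >= 0
      ? sum                      /* sign preserved: no overflow       */
      : x < 0 ? INT64_MIN : INT64_MAX;
}
```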
RE: [PATCH v2] Vect: Reconcile the const_int operand type of unsigned .SAT_ADD
Thanks Richard for comments. > Err, can you please simply do >if (TREE_CODE (ops[1]) == INTEGER_CST) > ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]) > ? you are always matching the constant to @1 IIRC. That would be much simpler; I will try it in v3. Pan -Original Message- From: Richard Biener Sent: Tuesday, August 27, 2024 5:09 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] Vect: Reconcile the const_int operand type of unsigned .SAT_ADD On Tue, Aug 27, 2024 at 9:09 AM wrote: > > From: Pan Li > > The .SAT_ADD has 2 operands, and one of the operands may be INTEGER_CST. > For example _1 = .SAT_ADD (_2, 9) comes from the below sample code. > > Form 3: > #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ > T __attribute__((noinline)) \ > vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ > {\ > unsigned i;\ > T ret; \ > for (i = 0; i < limit; i++)\ > {\ > out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ > }\ > } > > DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9) > > It will fail to vectorize as the vectorizable_call will check that the > operands are type-compatible, but the imm will be (const_int 9) with > SImode, which is different from _2 (DImode). Aka: > > uint64_t _1; > uint64_t _2; > _1 = .SAT_ADD (_2, 9); > > This patch would like to reconcile the imm operand to the operand type > mode of _2 if and only if there is no precision/data loss. Aka convert > the imm 9 to DImode for the above example. > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. > > gcc/ChangeLog: > > * tree-vect-patterns.cc (vect_recog_reconcile_cst_to_unsigned): > Add new func impl to reconcile the cst int type to given TREE type.
> (vect_recog_sat_add_pattern): Reconcile the ops of .SAT_ADD > before building the gimple call. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper > macros. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: > New test. > * > gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: New test. > * > gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c: New test. > * > gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c: New test. > * > gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: New test. > * > gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: New test. > * > gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: > New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-3.c: > New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-4.c: > New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: > New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: > New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-7.c: > New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-8.c: > New test. > * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: > New test. > > Signed-off-by: Pan Li > --- > .../binop/vec_sat_u_add_imm_reconcile-1.c | 9 + > .../binop/vec_sat_u_add_imm_reconcile-10.c| 9 + > .../binop/vec_sat_u_add_imm_reconcile-11.c| 9 + > .../binop/vec_sat_u_add_imm_reconcile-12.c| 9 + > .../binop/vec_sat_u_add_imm_reconcile-13.c| 9 + > .../binop/vec_sat_u_add_imm_reconcile-14.c| 9 + > .../binop/vec_sat_u_add_imm_reconcile-15.c| 9 + > .../binop/vec_sat_u_add_imm_reconcile-2.c | 9 + > .../binop/vec_sat_u_add_imm_reconcile-3.c | 9 + > .../binop/v
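For context, the form-3 shape this patch makes vectorizable looks like the following when instantiated by hand for uint64_t with immediate 9; before the fix, the (const_int 9) kept its int type and vectorizable_call rejected the .SAT_ADD call:

```c
#include <assert.h>
#include <stdint.h>

/* Hand instantiation of DEF_VEC_SAT_U_ADD_IMM_FMT_3 (uint64_t, 9):
   saturate to the type maximum when the add overflows.  */
void
vec_sat_u_add_imm9_u64 (uint64_t *out, uint64_t *in, unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    {
      uint64_t ret;
      out[i] = __builtin_add_overflow (in[i], 9, &ret) ? (uint64_t) -1 : ret;
    }
}
```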
RE: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD
Thanks Richard for comments. > I think you want to use nop_convert here, for sure a truncation or > extension wouldn't be valid? Oh, yes, it should be nop_convert. > I think you don't need :c on both the inner plus and the bit_xor here? Sure, could you please explain a bit more about when I need to add :c? Like inner plus/and/or, etc.; I sometimes get confused in similar scenarios. > + integer_zerop) > + (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value) > The comment above quotes 'MIN' but that's not present here - that is, > the comment quotes a source form while we match what we see on > GIMPLE? I do expect the matching will be quite fragile when not > being isolated. Got it, will update the comments to the GIMPLE form. Pan -Original Message- From: Richard Biener Sent: Monday, August 26, 2024 9:40 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD On Mon, Aug 26, 2024 at 4:20 AM wrote: > > From: Pan Li > > This patch would like to support form 1 of the scalar signed > integer .SAT_ADD. Aka the below example: > > Form 1: > #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ > T __attribute__((noinline)) \ > sat_s_add_##T##_fmt_1 (T x, T y) \ > {\ > T sum = (UT)x + (UT)y; \ > return (x ^ y) < 0 \ > ? sum\ > : (sum ^ x) >= 0 \ > ? sum \ > : x < 0 ? MIN : MAX; \ > } > > DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) > > We can tell the difference before and after this patch if the backend > implements the ssadd3 pattern similar to the one below.
> > Before this patch: >4 │ __attribute__((noinline)) >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) >6 │ { >7 │ int64_t sum; >8 │ long unsigned int x.0_1; >9 │ long unsigned int y.1_2; > 10 │ long unsigned int _3; > 11 │ long int _4; > 12 │ long int _5; > 13 │ int64_t _6; > 14 │ _Bool _11; > 15 │ long int _12; > 16 │ long int _13; > 17 │ long int _14; > 18 │ long int _16; > 19 │ long int _17; > 20 │ > 21 │ ;; basic block 2, loop depth 0 > 22 │ ;;pred: ENTRY > 23 │ x.0_1 = (long unsigned int) x_7(D); > 24 │ y.1_2 = (long unsigned int) y_8(D); > 25 │ _3 = x.0_1 + y.1_2; > 26 │ sum_9 = (int64_t) _3; > 27 │ _4 = x_7(D) ^ y_8(D); > 28 │ _5 = x_7(D) ^ sum_9; > 29 │ _17 = ~_4; > 30 │ _16 = _5 & _17; > 31 │ if (_16 < 0) > 32 │ goto ; [41.00%] > 33 │ else > 34 │ goto ; [59.00%] > 35 │ ;;succ: 3 > 36 │ ;;4 > 37 │ > 38 │ ;; basic block 3, loop depth 0 > 39 │ ;;pred: 2 > 40 │ _11 = x_7(D) < 0; > 41 │ _12 = (long int) _11; > 42 │ _13 = -_12; > 43 │ _14 = _13 ^ 9223372036854775807; > 44 │ ;;succ: 4 > 45 │ > 46 │ ;; basic block 4, loop depth 0 > 47 │ ;;pred: 2 > 48 │ ;;3 > 49 │ # _6 = PHI > 50 │ return _6; > 51 │ ;;succ: EXIT > 52 │ > 53 │ } > > After this patch: >4 │ __attribute__((noinline)) >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) >6 │ { >7 │ int64_t _4; >8 │ >9 │ ;; basic block 2, loop depth 0 > 10 │ ;;pred: ENTRY > 11 │ _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call] > 12 │ return _4; > 13 │ ;;succ: EXIT > 14 │ > 15 │ } > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > * The x86 bootstrap test. > * The x86 fully regression test. > > gcc/ChangeLog: > > * match.pd: Add the matching for signed .SAT_ADD. > * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new > matching func decl. > (match_unsigned_saturation_add): Try signed .SAT_ADD and rename > to ... > (match_saturation_add): ... here. > (math_opts_dom_walker::after_dom_children): Update the above renamed > func from caller. 
> > Signed-off-by: Pan Li > --- > gcc/match.pd | 18 ++ >
RE: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern
Got it, thanks Jeff. Pan -Original Message- From: Jeff Law Sent: Monday, August 26, 2024 10:21 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern On 8/25/24 7:35 PM, Li, Pan2 wrote: > Thanks Jeff. > >> OK. I'm assuming we don't have to worry about the case where X is wider >> than Xmode? ie, a DImode on rv32? > > Yes, the DImode is disabled by ANYI iterator for ussub pattern. Thanks. Just wanted to make sure. And for the avoidance of doubt, this patch is fine for the trunk. jeff
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
Thanks Richard for comments and confirmation. > Instead pattern recognition of .SAT_ADD should promote/demote the invariants - Got it, I will try to reconcile the types in .SAT_ADD for const_int. > What I read is that > .ADD_OVERFLOW > produces a value that is equal to the twos-complement add of its arguments > promoted/demoted to the result type, correct? Yes, that makes sense to me. Pan -Original Message- From: Richard Biener Sent: Sunday, August 25, 2024 3:42 PM To: Li, Pan2 Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call On Sat, Aug 24, 2024 at 1:31 PM Li, Pan2 wrote: > > Thanks Jakub and Richard for the explanation and help, I will double check > the saturate matching for the const_int strict check. > > Back to the below case, do we still need some ad-hoc step to unblock the > type check in vectorizable_call? > For example, the const_int 9u may have int type for .SAT_ADD(uint8_t, 9u). > Or do we have somewhere else to make the vectorizable_call happy? I don't see how vectorizable_call itself can handle this since it doesn't have any idea about the type requirements. Instead pattern recognition of .SAT_ADD should promote/demote the invariants - of course there might be correctness issues involved with matching .ADD_OVERFLOW in the first place. What I read is that .ADD_OVERFLOW produces a value that is equal to the twos-complement add of its arguments promoted/demoted to the result type, correct? Richard. > #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ > T __attribute__((noinline)) \ > vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ > {\ > unsigned i;\ > T ret; \ > for (i = 0; i < limit; i++)\ > {\ > out[i] = __builtin_add_overflow (in[i], IMM, &ret) ?
-1 : ret; \ > }\ > } > > DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, 9u) > > Pan > > -Original Message- > From: Richard Biener > Sent: Friday, August 23, 2024 6:53 PM > To: Jakub Jelinek > Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for > vectorizable_call > > On Thu, Aug 22, 2024 at 8:36 PM Jakub Jelinek wrote: > > > > On Tue, Aug 20, 2024 at 01:52:35PM +0200, Richard Biener wrote: > > > On Sat, Aug 17, 2024 at 11:18 PM Jakub Jelinek wrote: > > > > > > > > On Sat, Aug 17, 2024 at 05:03:14AM +, Li, Pan2 wrote: > > > > > Please feel free to let me know if there is anything I can do to fix > > > > > this issue. Thanks a lot. > > > > > > > > There is no bug. The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to > > > > have the same type, as described in the > > > > __builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can > > > > have different type and result yet another one, the behavior is then > > > > (as if) to perform the operation in infinite precision and if that > > > > result fits into the result type, there is no overflow, otherwise there > > > > is. > > > > So, there is no need to promote anything. > > > > > > Hmm, it's a bit awkward to have this state in the IL. > > > > Why? These aren't the only internal functions which have different types > > of arguments, from the various widening ifns, conditional ifns, > > scatter/gather, ... Even the WIDEN_*EXPR trees do have type differences > > among arguments. > > And it matches what the user builtin does. > > > > Furthermore, at least without _BitInt (but even with _BitInt at the maximum > > precision too) this might not be even possible. > > E.g. if there is __builtin_add_overflow with unsigned __int128 and __int128 > > arguments and there are no wider types there is simply no type to use for > > both > > arguments, it would need to be a signed type with at least 129 bits... 
> > > > > I see that > > > expand_arith_overflow eventually applies > > > promotion, namely to the type of the LHS. > > > > The LHS doesn't have to be wider than the operand types, so it can't promote > > always. Yes, in some cases it applies promotion if it is desirable for > >
RE: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern
Thanks Jeff. > OK. I'm assuming we don't have to worry about the case where X is wider > than Xmode? ie, a DImode on rv32? Yes, the DImode is disabled by ANYI iterator for ussub pattern. Pan -Original Message- From: Jeff Law Sent: Sunday, August 25, 2024 11:22 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern On 8/18/24 11:23 PM, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to allow IMM for the operand 0 of ussub pattern. > Aka .SAT_SUB(1023, y) as the below example. > > Form 1: >#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \ >T __attribute__((noinline)) \ >sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \ >{ \ > return (T)IMM >= y ? (T)IMM - y : 0; \ >} > > DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023) > > Before this patch: >10 │ sat_u_sub_imm82_uint64_t_fmt_1: >11 │ li a5,82 >12 │ bgtua0,a5,.L3 >13 │ sub a0,a5,a0 >14 │ ret >15 │ .L3: >16 │ li a0,0 >17 │ ret > > After this patch: >10 │ sat_u_sub_imm82_uint64_t_fmt_1: >11 │ li a5,82 >12 │ sltua4,a5,a0 >13 │ addia4,a4,-1 >14 │ sub a0,a5,a0 >15 │ and a0,a4,a0 >16 │ ret > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression test. > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new > func impl to gen xmode rtx reg from operand rtx. > (riscv_expand_ussub): Gen xmode reg for operand 1. > * config/riscv/riscv.md: Allow const_int for operand 1. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_arith.h: Add test helper macro. > * gcc.target/riscv/sat_u_sub_imm-1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-1_1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-1_2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-2_1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-2_2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-3.c: New test. 
> * gcc.target/riscv/sat_u_sub_imm-3_1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-3_2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-4.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-3.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-4.c: New test. OK. I'm assuming we don't have to worry about the case where X is wider than Xmode? ie, a DImode on rv32? Jeff
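The generated sequence in the commit message maps to C one instruction at a time; here is a sketch of the imm-82 case (the function name is assumed, not from the testsuite):

```c
#include <assert.h>
#include <stdint.h>

/* Branchless .SAT_SUB (82, y), mirroring the emitted RISC-V sequence.  */
uint64_t
sat_u_sub_imm82_sketch (uint64_t y)
{
  uint64_t imm = 82;
  uint64_t lt = imm < y;     /* sltu a4,a5,a0: 1 on underflow          */
  uint64_t mask = lt - 1;    /* addi a4,a4,-1: all-ones when imm >= y  */
  return (imm - y) & mask;   /* sub + and: difference or zero          */
}
```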
RE: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD
> Wow. I wonder why this isn't simplified to never saturate since > signed x + y has undefined behavior on overflow? So I'd > expect instead > T sum = (unsigned T)x + (unsigned T)y; > to be used. Thanks, let me update in v3. Pan -Original Message- From: Richard Biener Sent: Thursday, August 22, 2024 5:47 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD On Wed, Aug 7, 2024 at 11:31 AM wrote: > > From: Pan Li > > This patch would like to support the form 1 of the scalar signed > integer .SAT_ADD. Aka below example: > > Form 1: > #define DEF_SAT_S_ADD_FMT_1(T, MIN, MAX) \ > T __attribute__((noinline)) \ > sat_s_add_##T##_fmt_1 (T x, T y) \ > {\ > T sum = x + y; \ > return (x ^ y) < 0 \ > ? sum\ > : (sum ^ x) >= 0 \ > ? sum \ > : x < 0 ? MIN : MAX; \ > } Wow. I wonder why this isn't simplified to never saturate since signed x + y has undefined behavior on overflow? So I'd expect instead T sum = (unsigned T)x + (unsigned T)y; to be used. > DEF_SAT_S_ADD_FMT_1(int64_t, INT64_MIN, INT64_MAX) > > We can tell the difference before and after this patch if backend > implemented the ssadd3 pattern similar as below. 
> > Before this patch: >4 │ __attribute__((noinline)) >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) >6 │ { >7 │ int64_t sum; >8 │ long int _1; >9 │ long int _2; > 10 │ int64_t _3; > 11 │ _Bool _8; > 12 │ long int _9; > 13 │ long int _10; > 14 │ long int _11; > 15 │ long int _12; > 16 │ long int _13; > 17 │ > 18 │[local count: 1073741824]: > 19 │ sum_6 = x_4(D) + y_5(D); > 20 │ _1 = x_4(D) ^ y_5(D); > 21 │ _2 = x_4(D) ^ sum_6; > 22 │ _12 = ~_1; > 23 │ _13 = _2 & _12; > 24 │ if (_13 < 0) > 25 │ goto ; [41.00%] > 26 │ else > 27 │ goto ; [59.00%] > 28 │ > 29 │[local count: 259738147]: > 30 │ _8 = x_4(D) < 0; > 31 │ _9 = (long int) _8; > 32 │ _10 = -_9; > 33 │ _11 = _10 ^ 9223372036854775807; > 34 │ > 35 │[local count: 1073741824]: > 36 │ # _3 = PHI > 37 │ return _3; > 38 │ > 39 │ } > > After this patch: >4 │ __attribute__((noinline)) >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) >6 │ { >7 │ int64_t _4; >8 │ >9 │ ;; basic block 2, loop depth 0 > 10 │ ;;pred: ENTRY > 11 │ _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call] > 12 │ return _4; > 13 │ ;;succ: EXIT > 14 │ > 15 │ } > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > * The x86 bootstrap test. > * The x86 fully regression test. > > gcc/ChangeLog: > > * match.pd: Add the matching for signed .SAT_ADD. > * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new > matching func decl. > (match_unsigned_saturation_add): Try signed .SAT_ADD and rename > to ... > (match_saturation_add): ... here. > (math_opts_dom_walker::after_dom_children): Update the above renamed > func from caller. 
> > Signed-off-by: Pan Li > --- > gcc/match.pd | 17 > gcc/tree-ssa-math-opts.cc | 42 ++- > 2 files changed, 54 insertions(+), 5 deletions(-) > > diff --git a/gcc/match.pd b/gcc/match.pd > index c9c8478d286..8b8a5dbcfe3 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3311,6 +3311,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >} >(if (otype_precision < itype_precision && wi::eq_p (trunc_max, > int_cst)) > > +/* Signed saturation add, case 1: > + T sum = X + Y; > + SAT_S_ADD = (X ^ Y) < 0 > + ? sum > + : (sum ^ x) >= 0 > + ? sum > + : x < 0 ? MIN : MAX; */ > +(match (signed_integer_sat_add @0 @1) > + (cond^ (lt (bit_and:c (bit_xor:c @0 (convert?@2 (plus:c (convert? @0) > +(convert? @1 > + (bit_not (bit_xor:c @0 @1))) > + integer_zerop) > + (bit_xor:c (negate (convert (lt @0 in
RE: [PATCH v1] Match: Add type check for .SAT_ADD imm operand
Thanks Richard and Jakub for comments. Ideally I would like to make sure the imm operand has exactly the same type as operand 1. But for uint8_t/uint16_t types, the INTEGER_CST will become the (const_int 3) with int type before matching. Thus, add the type check like that, as well as some negative test cases like failing to match .SAT_ADD (uint32_t, 3ull), etc. .SAT_ADD (uint8_t, (uint8_t)3u) .SAT_ADD (uint16_t, (uint16_t)3u) .SAT_ADD (uint32_t, 3u) .SAT_ADD (uint64_t, 3ull) Thanks again, good to know about int_fits_type_p; let me try it in v2. Pan -Original Message- From: Jakub Jelinek Sent: Sunday, August 25, 2024 1:16 AM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Add type check for .SAT_ADD imm operand On Sat, Aug 24, 2024 at 07:33:06PM +0800, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to add a strict check for the imm operand of .SAT_ADD > matching. We previously had no type checking for the imm operand, which > may result in unexpected IL being caught by the .SAT_ADD pattern. > > However, things may become more complicated due to int promotion. > This means any const_int without any suffix will be promoted to int > before matching. For example as below. > > uint8_t a; > uint8_t sum = .SAT_ADD (a, 12); > > The second operand will be (const_int 12) with int type when trying to > match .SAT_ADD. Thus, to support int8/int16 .SAT_ADD, only the > int32 and int64 will be strictly checked. > > The below test suites are passed for this patch: > * The rv64gcv fully regression test. > * The x86 bootstrap test. > * The x86 fully regression test. > > gcc/ChangeLog: > > * match.pd: ??? > * match.pd: Add strict type check for .SAT_ADD imm operand. Usually you should say * match.pd (pattern you change): What you change.
> --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3190,7 +3190,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop) >integer_minus_onep (realpart @2)) >(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > - && types_match (type, @0 > + && types_match (type, @0)) > + (with > +{ > + unsigned precision = TYPE_PRECISION (type); > + unsigned int_precision = HOST_BITS_PER_INT; This has nothing to do with HOST_BITS_PER_INT. The INTEGER_CST can have any type, not just int. > +} > +/* The const_int will perform int promotion, the const_int will have at const_int (well, CONST_INT) is an RTL name, it is INTEGER_CST in GIMPLE. Just one space after , > + least the int_precision. Thus, type less than int_precision will be > + skipped the type match checking. */ But the whole comment doesn't make much sense to me, the INTEGER_CST won't perform any int promotion. > +(if (precision < int_precision || types_match (type, @1)) Why do you compare precision of type against anything? You want to check that the INTEGER_CST@1 is representable in the type (compatible with TREE_TYPE (@0)), because only then the caller can fold_convert @1 to type without the value being altered. So, IMHO best would be (if (int_fits_type_p (@1, type)) Jakub
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
Thanks Jakub and Richard for explanation and help, I will double check saturate matching for the const_int strict check. Back to this below case, do we still need some ad-hoc step to unblock the type check when vectorizable_call? For example, the const_int 9u may have int type for .SAT_ADD(uint8_t, 9u). Or we have somewhere else to make the vectorizable_call happy. #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ {\ unsigned i;\ T ret; \ for (i = 0; i < limit; i++)\ {\ out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ }\ } DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, 9u) Pan -Original Message- From: Richard Biener Sent: Friday, August 23, 2024 6:53 PM To: Jakub Jelinek Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call On Thu, Aug 22, 2024 at 8:36 PM Jakub Jelinek wrote: > > On Tue, Aug 20, 2024 at 01:52:35PM +0200, Richard Biener wrote: > > On Sat, Aug 17, 2024 at 11:18 PM Jakub Jelinek wrote: > > > > > > On Sat, Aug 17, 2024 at 05:03:14AM +, Li, Pan2 wrote: > > > > Please feel free to let me know if there is anything I can do to fix > > > > this issue. Thanks a lot. > > > > > > There is no bug. The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to > > > have the same type, as described in the > > > __builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can > > > have different type and result yet another one, the behavior is then (as > > > if) to perform the operation in infinite precision and if that result > > > fits into the result type, there is no overflow, otherwise there is. > > > So, there is no need to promote anything. > > > > Hmm, it's a bit awkward to have this state in the IL. > > Why? These aren't the only internal functions which have different types > of arguments, from the various widening ifns, conditional ifns, > scatter/gather, ... 
Even the WIDEN_*EXPR trees do have type differences > among arguments. > And it matches what the user builtin does. > > Furthermore, at least without _BitInt (but even with _BitInt at the maximum > precision too) this might not be even possible. > E.g. if there is __builtin_add_overflow with unsigned __int128 and __int128 > arguments and there are no wider types there is simply no type to use for both > arguments, it would need to be a signed type with at least 129 bits... > > > I see that > > expand_arith_overflow eventually applies > > promotion, namely to the type of the LHS. > > The LHS doesn't have to be wider than the operand types, so it can't promote > always. Yes, in some cases it applies promotion if it is desirable for > codegen purposes. But without the promotions explicitly in the IL it > doesn't need to rely on VRP to figure out how to expand it exactly. > > > Exposing this earlier could > > enable optimization even > > Which optimizations? I was thinking of merging conversions with that implied promotion. > We already try to fold the .{ADD,SUB,MUL}_OVERFLOW > builtins to constants or non-overflowing arithmetics etc. as soon as we > can e.g. using ranges prove the operation will never overflow or will always > overflow. Doing unnecessary promotion (see above that it might not be > always possible at all) would just make the IL larger and risk we during > expansion actually perform the promotions even when we don't have to. > We on the other side already have match.pd rules to undo such promotions > in the operands. See > /* Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW. */ > And the result (well, TREE_TYPE of the lhs type) can be yet another type, > not related to either of those in any way. OK, fair enough. 
I think this also shows again the lack of documentation of internal function signatures (hits me all the time with the more complex ones like MASK_LEN_GATHER_LOAD where I always wonder which argument is what) as well as IL type checking (which can also serve as documentation about argument constraints). IMO comments in internal-fn.def would suffice for the former (like effectively tree.h/def provide authority for tree codes); for IL verification a function in interna
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
Thanks Tamar for comments and explanations. > But because you've now matched here, another pattern can't match > anymore, and more importantly, it prevents is from trying any alternative way > to vectorize this (if there was one). > That's why the pattern matcher shouldn't knowingly accept something we know > can't get vectorized. You shouldn't > build the pattern at all. > And the reason I suggested doing this check in the match.pd is because of an > inconsistency between the variable and immediate > variant if it's not done there. Got the point here, I will double check all SAT_* related matching pattern for INT_CST type check. Pan -Original Message- From: Tamar Christina Sent: Tuesday, August 20, 2024 3:56 PM To: Li, Pan2 ; Jakub Jelinek Cc: Richard Biener ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call Hi Pan, > -Original Message- > From: Li, Pan2 > Sent: Tuesday, August 20, 2024 1:58 AM > To: Tamar Christina ; Jakub Jelinek > > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; > juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; > rdapp@gmail.com; Liu, Hongtao > Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for > vectorizable_call > > Thanks Jakub and Tamar for comments and suggestions. > > The match.pd list as below doesn't check the INT_CST type for .SAT_ADD. > > (match (unsigned_integer_sat_add @0 @1) > (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) > integer_zerop) > integer_minus_onep (realpart @2)) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > && types_match (type, @0 > > Thus the different types of .ADD_OVERFLOW could hit the pattern. The > vectorizable_call strictly > check the operands are totally the same, while the scalar doesn't have similar > check. That > is why I only found this issue from vector part. 
Yeah and my question was more why are we getting away with it for the scalar. So I implemented the optabs so I can take a look. It looks like the scalar version doesn't match because split-path rewrites the IL when the argument is a constant. Passing -fno-split-paths gets it to generate the instruction where we see that the IFN will then also contain mixed types. #include #include #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ {\ unsigned i;\ T ret; \ for (i = 0; i < limit; i++)\ {\ out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ }\ } #define CST -9LL DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, CST) int main () { uint8_t a = 1; uint8_t r = 0; vec_sat_u_add_immCST_uint8_t_fmt_3 (&r, &a, 1); printf ("r=%u\n", r); } Generates: moviv31.8b, 0xfff7 uxtwx2, w2 mov x3, 0 .p2align 5,,15 .L3: ldr b30, [x1, x3] uqadd b30, b30, b31 str b30, [x0, x3] add x3, x3, 1 cmp x2, x3 bne .L3 which is incorrect, it's expected to saturate but instead is doing x + 0xF7. This is because of what Richi said before, there's nothing else in GIMPLE that tries to validate the operands, and expand will simply force the operand to the register of the size it requested and doesn't care about the outcome. For constants that are out of range, we're getting lucky in that existing math rules will remove the operation and replace It with -1. Because the operations know it would overflow in this case so at compile time the check goes away. That's why The problem doesn't show up with an out of range constant since there's no saturation check anymore. But this is pure luck. 
Secondly the reason I said that +static void +vect_recog_promote_cst_to_unsigned (tree *op, tree type) +{ + if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type)) +return; + + unsigned precision = TYPE_PRECISION (type); + wide_int type_max = wi::mask (precision, false, precision); + wide_int op_cst_val = wi::to_wide (*op, precision); + + if (wi::leu_p (op_cst_val, type_max)) +*op = wide_int_to_tree (t
RE: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD
Kindly ping. Pan -Original Message- From: Li, Pan2 Sent: Wednesday, August 7, 2024 5:31 PM To: gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD From: Pan Li This patch would like to support the form 1 of the scalar signed integer .SAT_ADD. Aka below example: Form 1: #define DEF_SAT_S_ADD_FMT_1(T, MIN, MAX) \ T __attribute__((noinline)) \ sat_s_add_##T##_fmt_1 (T x, T y) \ {\ T sum = x + y; \ return (x ^ y) < 0 \ ? sum\ : (sum ^ x) >= 0 \ ? sum \ : x < 0 ? MIN : MAX; \ } DEF_SAT_S_ADD_FMT_1(int64_t, INT64_MIN, INT64_MAX) We can tell the difference before and after this patch if backend implemented the ssadd3 pattern similar as below. Before this patch: 4 │ __attribute__((noinline)) 5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) 6 │ { 7 │ int64_t sum; 8 │ long int _1; 9 │ long int _2; 10 │ int64_t _3; 11 │ _Bool _8; 12 │ long int _9; 13 │ long int _10; 14 │ long int _11; 15 │ long int _12; 16 │ long int _13; 17 │ 18 │[local count: 1073741824]: 19 │ sum_6 = x_4(D) + y_5(D); 20 │ _1 = x_4(D) ^ y_5(D); 21 │ _2 = x_4(D) ^ sum_6; 22 │ _12 = ~_1; 23 │ _13 = _2 & _12; 24 │ if (_13 < 0) 25 │ goto ; [41.00%] 26 │ else 27 │ goto ; [59.00%] 28 │ 29 │[local count: 259738147]: 30 │ _8 = x_4(D) < 0; 31 │ _9 = (long int) _8; 32 │ _10 = -_9; 33 │ _11 = _10 ^ 9223372036854775807; 34 │ 35 │[local count: 1073741824]: 36 │ # _3 = PHI 37 │ return _3; 38 │ 39 │ } After this patch: 4 │ __attribute__((noinline)) 5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) 6 │ { 7 │ int64_t _4; 8 │ 9 │ ;; basic block 2, loop depth 0 10 │ ;;pred: ENTRY 11 │ _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call] 12 │ return _4; 13 │ ;;succ: EXIT 14 │ 15 │ } The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. 
gcc/ChangeLog: * match.pd: Add the matching for signed .SAT_ADD. * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new matching func decl. (match_unsigned_saturation_add): Try signed .SAT_ADD and rename to ... (match_saturation_add): ... here. (math_opts_dom_walker::after_dom_children): Update the above renamed func from caller. Signed-off-by: Pan Li --- gcc/match.pd | 17 gcc/tree-ssa-math-opts.cc | 42 ++- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index c9c8478d286..8b8a5dbcfe3 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3311,6 +3311,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) } (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst)) +/* Signed saturation add, case 1: + T sum = X + Y; + SAT_S_ADD = (X ^ Y) < 0 + ? sum + : (sum ^ x) >= 0 + ? sum + : x < 0 ? MIN : MAX; */ +(match (signed_integer_sat_add @0 @1) + (cond^ (lt (bit_and:c (bit_xor:c @0 (convert?@2 (plus:c (convert? @0) +(convert? @1 + (bit_not (bit_xor:c @0 @1))) + integer_zerop) + (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value) + @2) + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type) + && types_match (type, @0, @1 + /* x > y && x != XXX_MIN --> x > y x > y && x == XXX_MIN --> false . 
*/ (for eqne (eq ne) diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index 8d96a4c964b..f39c88741a4 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -4023,6 +4023,8 @@ extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree)); extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree)); +extern bool gimple_signed_integer_sat_add (tree, tree*, tree (*)(tree)); + static void build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, internal_fn fn, tree lhs, tree op_0, tree op_1) @@ -4072,7 +4074,8 @@ match_unsigned_saturation_add (gimple_stmt_iterator *gsi, gassign *stmt) } /* - * T
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
Thanks Jakub and Tamar for comments and suggestions. The match.pd list as below doesn't check the INT_CST type for .SAT_ADD. (match (unsigned_integer_sat_add @0 @1) (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop) integer_minus_onep (realpart @2)) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && types_match (type, @0 Thus the different types of .ADD_OVERFLOW could hit the pattern. The vectorizable_call strictly check the operands are totally the same, while the scalar doesn't have similar check. That is why I only found this issue from vector part. It looks like we need to add the type check for INT_CST in match.pd predicate and then add explicit cast to IMM from the source code to match the pattern. For example as below, not very sure it is reasonable or not. #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ {\ unsigned i;\ T ret; \ for (i = 0; i < limit; i++)\ {\ out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ // need add (T)IMM, aka out[i] = __builtin_add_overflow (in[i], (T)IMM, &ret) ? -1 : ret; to hit the pattern. 
}\ } Pan -Original Message- From: Tamar Christina Sent: Tuesday, August 20, 2024 3:41 AM To: Jakub Jelinek Cc: Li, Pan2 ; Richard Biener ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call > -Original Message- > From: Jakub Jelinek > Sent: Monday, August 19, 2024 8:25 PM > To: Tamar Christina > Cc: Li, Pan2 ; Richard Biener ; > gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao > > Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for > vectorizable_call > > On Mon, Aug 19, 2024 at 01:55:38PM +, Tamar Christina wrote: > > So would this not be the simplest fix: > > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > index 87b3dc413b8..fcbc83a49f0 100644 > > --- a/gcc/tree-vect-patterns.cc > > +++ b/gcc/tree-vect-patterns.cc > > @@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, > stmt_vec_info stmt_vinfo, > > > >if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)) > > But then you call gimple_unsigned_integer_sat_add with mismatching types, > not sure if that is ok. > gimple_unsigned_integer_sat_add is a match.pd predicate. It matches the expression rooted in lhs and returns the results in ops. So not sure what you mean here. > > { > > + if (TREE_CODE (ops[1]) == INTEGER_CST) > > + ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]); > > + > > This would be only ok if the conversion doesn't change the value > of the constant. > .ADD_OVERFLOW etc. could have e.g. int and unsigned arguments, you don't > want to change the latter to the former if the value has the most > significant bit set. > Similarly, .ADD_OVERFLOW could have e.g. unsigned and unsigned __int128 > arguments, you don't want to truncate the constant. 
Yes, if the expression truncates or changes the sign of the expression this wouldn't work. But then you can also not match at all. So the match should be rejected then since the values need to fit in the same type as the argument and be the same sign. So the original vect_recog_promote_cst_to_unsigned is also wrong since it doesn't stop the match if it doesn't fit. Imho, the pattern here cannot check this and it should be part of the match condition. If the constant cannot fir into the same type as the operand or has a different sign the matching should fail. It's just that in match.pd you can't modify the arguments returned from a predicate but since the predicate is intended to be used to rewrite to the IFN I still think the above solution is right, and the range check should be done within the predicate. Tamar. > So, you could e.g. do the fold_convert and then verify if > wi::to_widest on the old and new tree are equal, or you could check for > TREE_OVERFLOW if fold_convert honors that. > As I said, for INTEGER_CST operands of .ADD/SUB/MUL_OVERFLOW, the infinite > precision value (aka wi::to_widest) is all that matters. > > Jakub
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Great! Thanks Patrick. Pan -Original Message- From: Patrick O'Neill Sent: Tuesday, August 20, 2024 12:14 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; Jeff Law Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2 Hi Pan, Once the postcommit baseline moves forward (trunk is currently failing to build linux targets [1] [2]) I'll re-trigger precommit for you. Thanks, Patrick [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116409 [2]: https://github.com/patrick-rivos/gcc-postcommit-ci/issues/1564 On 8/18/24 19:49, Li, Pan2 wrote: > Turn out that the pre-commit doesn't pick up the newest upstream when testing > this patch. > > Pan > > -----Original Message- > From: Li, Pan2 > Sent: Monday, August 19, 2024 9:25 AM > To: Jeff Law ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com > Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad > and oct .SAT_TRUNC form 2 > > Opps, let me double check what happened to my local tester. > > Pan > > -Original Message----- > From: Jeff Law > Sent: Sunday, August 18, 2024 11:21 PM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad > and oct .SAT_TRUNC form 2 > > > > On 8/18/24 12:10 AM, pan2...@intel.com wrote: >> From: Pan Li >> >> This patch would like to add test cases for the unsigned scalar quad and >> oct .SAT_TRUNC form 2. Aka: >> >> Form 2: >> #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ >> NT __attribute__((noinline)) \ >> sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ >> {\ >> WT max = (WT)(NT)-1; \ >> return x > max ? 
(NT) max : (NT)x; \ >> } >> >> QUAD: >> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) >> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) >> >> OCT: >> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) >> >> The below test is passed for this patch. >> * The rv64gcv regression test. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/sat_u_trunc-10.c: New test. >> * gcc.target/riscv/sat_u_trunc-11.c: New test. >> * gcc.target/riscv/sat_u_trunc-12.c: New test. >> * gcc.target/riscv/sat_u_trunc-run-10.c: New test. >> * gcc.target/riscv/sat_u_trunc-run-11.c: New test. >> * gcc.target/riscv/sat_u_trunc-run-12.c: New test. > Looks like they're failing in the upstream pre-commit tester: > >> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578 > > jeff
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Turn out that the pre-commit doesn't pick up the newest upstream when testing this patch. Pan -Original Message- From: Li, Pan2 Sent: Monday, August 19, 2024 9:25 AM To: Jeff Law ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2 Opps, let me double check what happened to my local tester. Pan -Original Message- From: Jeff Law Sent: Sunday, August 18, 2024 11:21 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2 On 8/18/24 12:10 AM, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to add test cases for the unsigned scalar quad and > oct .SAT_TRUNC form 2. Aka: > > Form 2: >#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ >NT __attribute__((noinline)) \ >sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ >{\ > WT max = (WT)(NT)-1; \ > return x > max ? (NT) max : (NT)x; \ >} > > QUAD: > DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) > DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) > > OCT: > DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) > > The below test is passed for this patch. > * The rv64gcv regression test. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_u_trunc-10.c: New test. > * gcc.target/riscv/sat_u_trunc-11.c: New test. > * gcc.target/riscv/sat_u_trunc-12.c: New test. > * gcc.target/riscv/sat_u_trunc-run-10.c: New test. > * gcc.target/riscv/sat_u_trunc-run-11.c: New test. > * gcc.target/riscv/sat_u_trunc-run-12.c: New test. Looks like they're failing in the upstream pre-commit tester: > https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578 jeff
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Please ignore this patch, should be sent by mistake. Pan -Original Message- From: Li, Pan2 Sent: Monday, August 19, 2024 10:04 AM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2 From: Pan Li This patch would like to add test cases for the unsigned scalar quad and oct .SAT_TRUNC form 2. Aka: Form 2: #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ {\ WT max = (WT)(NT)-1; \ return x > max ? (NT) max : (NT)x; \ } QUAD: DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) OCT: DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_trunc-10.c: New test. * gcc.target/riscv/sat_u_trunc-11.c: New test. * gcc.target/riscv/sat_u_trunc-12.c: New test. * gcc.target/riscv/sat_u_trunc-run-10.c: New test. * gcc.target/riscv/sat_u_trunc-run-11.c: New test. * gcc.target/riscv/sat_u_trunc-run-12.c: New test. 
Signed-off-by: Pan Li --- .../gcc.target/riscv/sat_u_trunc-10.c | 17 .../gcc.target/riscv/sat_u_trunc-11.c | 17 .../gcc.target/riscv/sat_u_trunc-12.c | 20 +++ .../gcc.target/riscv/sat_u_trunc-run-10.c | 16 +++ .../gcc.target/riscv/sat_u_trunc-run-11.c | 16 +++ .../gcc.target/riscv/sat_u_trunc-run-12.c | 16 +++ 6 files changed, 102 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-12.c diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c new file mode 100644 index 000..7dfc740c54f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "sat_arith.h" + +/* +** sat_u_truc_uint32_t_to_uint8_t_fmt_2: +** sltiu\s+[atx][0-9]+,\s*a0,\s*255 +** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1 +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+ +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff +** ret +*/ +DEF_SAT_U_TRUC_FMT_2(uint8_t, uint32_t) + +/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c new file mode 100644 index 000..c50ae96f47d --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { 
check-function-bodies "**" "" } } */ + +#include "sat_arith.h" + +/* +** sat_u_truc_uint64_t_to_uint8_t_fmt_2: +** sltiu\s+[atx][0-9]+,\s*a0,\s*255 +** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1 +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+ +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff +** ret +*/ +DEF_SAT_U_TRUC_FMT_2(uint8_t, uint64_t) + +/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c new file mode 100644 index 000..61331cee6fa --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "sat_arith.h" + +/* +** sat_u_truc_uint64_t_to_uint16_t_fmt_2: +** li\s+[atx][0-9]+,\s*65536 +** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1 +** sltu\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+ +** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1 +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+ +** slli\s+a0,\s*a0,\s*48 +** srli\s+a0,\s*a0,\s*48 +** ret +*/ +DEF_SAT_U_TRUC_FMT_2(uint16_t, uint64_t) + +/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } }
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Opps, let me double check what happened to my local tester. Pan -Original Message- From: Jeff Law Sent: Sunday, August 18, 2024 11:21 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2 On 8/18/24 12:10 AM, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to add test cases for the unsigned scalar quad and > oct .SAT_TRUNC form 2. Aka: > > Form 2: >#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ >NT __attribute__((noinline)) \ >sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ >{\ > WT max = (WT)(NT)-1; \ > return x > max ? (NT) max : (NT)x; \ >} > > QUAD: > DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) > DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) > > OCT: > DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) > > The below test is passed for this patch. > * The rv64gcv regression test. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_u_trunc-10.c: New test. > * gcc.target/riscv/sat_u_trunc-11.c: New test. > * gcc.target/riscv/sat_u_trunc-12.c: New test. > * gcc.target/riscv/sat_u_trunc-run-10.c: New test. > * gcc.target/riscv/sat_u_trunc-run-11.c: New test. > * gcc.target/riscv/sat_u_trunc-run-12.c: New test. Looks like they're failing in the upstream pre-commit tester: > https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578 jeff
RE: [PATCH v1] Test: Move pr116278 run test to c-torture [NFC]
Sure, will send v2 for this. Pan -Original Message- From: Jeff Law Sent: Sunday, August 18, 2024 11:19 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; s...@gentoo.org Subject: Re: [PATCH v1] Test: Move pr116278 run test to c-torture [NFC] On 8/18/24 1:13 AM, pan2...@intel.com wrote: > From: Pan Li > > Move the run test of pr116278 to c-torture and leave the risc-v the > asm check under risc-v part. > > PR target/116278 > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/pr116278-run-1.c: Take compile instead of > run test. > * gcc.target/riscv/pr116278-run-2.c: Ditto. > * gcc.c-torture/execute/pr116278-run-1.c: New test. > * gcc.c-torture/execute/pr116278-run-2.c: New test. We should be using the dg-torture framework, so the right directory for the test is gcc.dg/torture. I suspect these tests (just based on the constants that appear) may not work on the 16 bit integer targets. So we may need /* { dg-require-effective-target int32 } */ But I don't mind faulting that in if/when we see the 16bit int targets complain. So OK in the right directory (gcc.dg/torture). Jeff
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
Thanks Jakub for explaining. Hi Richard, Does it mean we need to do some promotion similar as this patch to make the vectorizable_call happy when there is a constant operand? I am not sure if there is a better approach for this case. Pan -Original Message- From: Jakub Jelinek Sent: Sunday, August 18, 2024 5:21 AM To: Li, Pan2 Cc: Richard Biener ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call On Sat, Aug 17, 2024 at 05:03:14AM +, Li, Pan2 wrote: > Thanks Richard for confirmation. Sorry almost forget this thread. > > Please feel free to let me know if there is anything I can do to fix this > issue. Thanks a lot. There is no bug. The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to have the same type, as described in the __builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can have different type and result yet another one, the behavior is then (as if) to perform the operation in infinite precision and if that result fits into the result type, there is no overflow, otherwise there is. So, there is no need to promote anything, promoted constants would have the same value as the non-promoted ones and the value is all that matters for constants. Jakub
RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar
> OK. Sorry for the delays here. I wanted to make sure we had the issues > WRT operand extension resolved before diving into this. But in > retrospect, this probably could have moved forward independently. That make much sense to me, thanks a lot. Pan -Original Message- From: Jeff Law Sent: Sunday, August 18, 2024 2:21 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar On 7/22/24 11:06 PM, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to implement the quad and oct .SAT_TRUNC pattern > in the riscv backend. Aka: > > Form 1: >#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ >NT __attribute__((noinline)) \ >sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ >{\ > bool overflow = x > (WT)(NT)(-1); \ > return ((NT)x) | (NT)-overflow;\ >} > > DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t) > > Before this patch: > 4 │ __attribute__((noinline)) > 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) > 6 │ { > 7 │ _Bool overflow; > 8 │ short unsigned int _1; > 9 │ short unsigned int _2; >10 │ short unsigned int _3; >11 │ uint16_t _6; >12 │ >13 │ ;; basic block 2, loop depth 0 >14 │ ;;pred: ENTRY >15 │ overflow_5 = x_4(D) > 65535; >16 │ _1 = (short unsigned int) x_4(D); >17 │ _2 = (short unsigned int) overflow_5; >18 │ _3 = -_2; >19 │ _6 = _1 | _3; >20 │ return _6; >21 │ ;;succ: EXIT >22 │ >23 │ } > > After this patch: > 3 │ > 4 │ __attribute__((noinline)) > 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) > 6 │ { > 7 │ uint16_t _6; > 8 │ > 9 │ ;; basic block 2, loop depth 0 >10 │ ;;pred: ENTRY >11 │ _6 = .SAT_TRUNC (x_4(D)); [tail call] >12 │ return _6; >13 │ ;;succ: EXIT >14 │ >15 │ } > > The below tests suites are passed for this patch > 1. The rv64gcv fully regression test. > 2. The rv64gcv build with glibc > > gcc/ChangeLog: > > * config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for > quad truncation. 
> (ANYI_OCT_TRUNC): New iterator for oct truncation. > (ANYI_QUAD_TRUNCATED): New attr for truncated quad modes. > (ANYI_OCT_TRUNCATED): New attr for truncated oct modes. > (anyi_quad_truncated): Ditto but for lower case. > (anyi_oct_truncated): Ditto but for lower case. > * config/riscv/riscv.md (ustrunc2): > Add new pattern for quad truncation. > (ustrunc2): Ditto but for oct. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust > the expand dump check times. > * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto. > * gcc.target/riscv/sat_arith_data.h: Add test helper macros. > * gcc.target/riscv/sat_u_trunc-4.c: New test. > * gcc.target/riscv/sat_u_trunc-5.c: New test. > * gcc.target/riscv/sat_u_trunc-6.c: New test. > * gcc.target/riscv/sat_u_trunc-run-4.c: New test. > * gcc.target/riscv/sat_u_trunc-run-5.c: New test. > * gcc.target/riscv/sat_u_trunc-run-6.c: New test. OK. Sorry for the delays here. I wanted to make sure we had the issues WRT operand extension resolved before diving into this. But in retrospect, this probably could have moved forward independently. Jeff
RE: [PATCH v4] RISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278]
> OK. And I think this shows the basic approach we want to use if there > are other builtins that accept sub-word modes. ie, get the operands > into X mode (by extending them as appropriate), then do as much work in > X mode as possible, then truncate the result if needed. > Thanks for your patience on this. Thanks Jeff for comments and suggestions, I will have a try if we can do some combine-like optimization for the SImode asm in RV64. Pan -Original Message- From: Jeff Law Sent: Sunday, August 18, 2024 2:17 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v4] RISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278] On 8/16/24 9:43 PM, pan2...@intel.com wrote: > From: Pan Li > > For QI/HImode of .SAT_ADD, the operands may be sign-extended and the > high bits of Xmode may be all 1 which is not expected. For example as > below code. > > signed char b[1]; > unsigned short c; > signed char *d = b; > int main() { >b[0] = -40; >c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + > 9; >__builtin_printf("%d\n", c); > } > > After expanding we have: > > ;; _6 = .SAT_ADD (_3, 9); > (insn 8 7 9 (set (reg:DI 143) > (high:DI (symbol_ref:DI ("d") [flags 0x86] ))) > (nil)) > (insn 9 8 10 (set (reg/f:DI 142) > (mem/f/c:DI (lo_sum:DI (reg:DI 143) > (symbol_ref:DI ("d") [flags 0x86] )) [1 d+0 S8 > A64])) > (nil)) > (insn 10 9 11 (set (reg:HI 144 [ _3 ]) > (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) > "test.c":7:10 -1 > (nil)) > > The convert from signed char to unsigned short will have sign_extend rtl > as above. And finally becomes the lb insn as below: > > lb a1,0(a5) // a1 is -40, aka 0xffd8 > lui a0,0x1a > addi a5,a1,9 > slli a5,a5,0x30 > srli a5,a5,0x30 // a5 is 65505 > sltu a1,a5,a1 // compare 65505 and 0xffd8 => TRUE > > The sltu tries to compare 65505 and 0xffd8 here, but we > actually want to compare 65505 and 65496 (0xffd8). 
Thus we need to > clean up the high bits to ensure this. > > The below test suites are passed for this patch: > * The rv64gcv fully regression test. > > PR target/116278 > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new > func impl to zero extend rtx. > (riscv_expand_usadd): Leverage above func to cleanup operands > and sum. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/pr116278-run-1.c: New test. > * gcc.target/riscv/pr116278-run-2.c: New test. > > PR 116278 > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new > func impl to zero extend rtx. > (riscv_expand_usadd): Leverage above func to cleanup operands 0 > and remove the special handing for SImode in RV64. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_u_add-11.c: Adjust asm check body. > * gcc.target/riscv/sat_u_add-15.c: Ditto. > * gcc.target/riscv/sat_u_add-19.c: Ditto. > * gcc.target/riscv/sat_u_add-23.c: Ditto. > * gcc.target/riscv/sat_u_add-3.c: Ditto. > * gcc.target/riscv/sat_u_add-7.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-11.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-15.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-3.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-7.c: Ditto. > * gcc.target/riscv/pr116278-run-1.c: New test. > * gcc.target/riscv/pr116278-run-2.c: New test. OK. And I think this shows the basic approach we want to use if there are other builtins that accept sub-word modes. ie, get the operands into X mode (by extending them as appropriate), then do as much work in X mode as possible, then truncate the result if needed. Thanks for your patience on this. Jeff
RE: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect
> Thanks. I've pushed this to the trunk. Thanks a lot, Jeff. Pan -Original Message- From: Jeff Law Sent: Saturday, August 17, 2024 11:27 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect On 8/10/24 6:36 AM, pan2...@intel.com wrote: > This patch would like to fix one ICE when rv64gcv_zvbb for vwsll. > Consider below example. > > void vwsll_vv_test (short *restrict dst, char *restrict a, > int *restrict b, int n) > { >for (int i = 0; i < n; i++) > dst[i] = a[i] << b[i]; > } > > It will hit the vwsll pattern with following operands. > operand 0 -> (reg:RVVMF2HI 146 [ vect__7.13 ]) > operand 1 -> (reg:RVVMF4QI 165 [ vect_cst__33 ]) > operand 2 -> (reg:RVVM1SI 171 [ vect_cst__36 ]) > > According to the ISA, operand 2 should be the same as operand 1. > Aka operand 2 should have RVVMF4QI mode as above. Thus, add > quad truncation for operand 2 before emit vwsll. > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > > PR target/116280 > > gcc/ChangeLog: > > * config/riscv/autovec-opt.md: Add quad truncation to > align the mode requirement for vwsll. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr116280-1.c: New test. > * gcc.target/riscv/rvv/base/pr116280-2.c: New test. Thanks. I've pushed this to the trunk. jeff
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
Thanks Richard for confirmation. Sorry almost forget this thread. Hi Jakub, Please feel free to let me know if there is anything I can do to fix this issue. Thanks a lot. Pan -Original Message- From: Richard Biener Sent: Tuesday, July 16, 2024 11:29 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao ; Jakub Jelinek Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call On Tue, Jul 16, 2024 at 3:22 PM Li, Pan2 wrote: > > > I think that's a bug. Do you say __builtin_add_overflow fails to promote > > (constant) arguments? > > I double checked the 022t.ssa pass for the __builtin_add_overflow operands > tree type. It looks like that > the 2 operands of .ADD_OVERFLOW has different tree types when one of them is > constant. > One is unsigned DI, and the other is int. I think that's a bug (and a downside of internal-functions as they have no prototype the type verifier could work with). That you see them in 022t.ssa means that either the frontend mis-handles the builtin call parsing or fold_builtin_arith_overflow which is responsible for the rewriting to an internal function is wrong. I've CCed Jakub who added those. I think we could add verification for internal functions in the set of commutative_binary_fn_p, commutative_ternary_fn_p, associative_binary_fn_p and possibly others where we can constrain argument and result types. Richard. 
> (gdb) call debug_gimple_stmt(stmt) > _14 = .ADD_OVERFLOW (_4, 129); > (gdb) call debug_tree (gimple_call_arg(stmt, 0)) > type public unsigned DI > size > unit-size > align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type > 0x76a437e0 precision:64 min max > > pointer_to_this > > visited > def_stmt _4 = *_3; > version:4> > (gdb) call debug_tree (gimple_call_arg(stmt, 1)) > constant > 129> > (gdb) > > Then we go to the vect pass, we can also see that the ops of .ADD_OVERFLOW > has different tree types. > As my understanding, here we should have unsigned DI for constant operands > > (gdb) layout src > (gdb) list > 506 > if (gimple_call_num_args (_c4) == 2) > 507 > { > 508 > tree _q40 = gimple_call_arg (_c4, 0); > 509 > _q40 = do_valueize (valueize, _q40); > 510 > tree _q41 = gimple_call_arg (_c4, 1); > 511 > _q41 = do_valueize (valueize, _q41); > 512 > if (integer_zerop (_q21)) > 513 > { > 514 > if (integer_minus_onep (_p1)) > 515 > { > (gdb) call debug_tree (_q40) > type public unsigned DI > size > unit-size > align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type > 0x76a437e0 precision:64 min max > > pointer_to_this > > visited > def_stmt _4 = *_3; > version:4> > (gdb) call debug_tree (_q41) > constant > 129> > > Pan > > -Original Message- > From: Richard Biener > Sent: Wednesday, July 10, 2024 7:36 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, > Hongtao > Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for > vectorizable_call > > On Wed, Jul 10, 2024 at 11:28 AM wrote: > > > > From: Pan Li > > > > The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST. > > For example _1 = .SAT_ADD (_2, 9) comes from below sample code. 
> > > > Form 3: > > #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ > > T __attribute__((noinline)) \ > > vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ > > {\ > > unsigned i;
RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
Should be in upstream already. Pan -Original Message- From: Li, Pan2 Sent: Saturday, August 17, 2024 11:45 AM To: Zhijin Zeng Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng Subject: RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] Ok, I will commit it if no surprise from test as manually changing. Pan -Original Message- From: Zhijin Zeng Sent: Saturday, August 17, 2024 10:46 AM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] The patch for 3c9c93 as follow. But it's a little strange that this patch hasn't changed and I don't know why it apply fail. May you directly modify the riscv.cc if this version still conflict? The riscv.cc just changed two lines. Thank you again. Zhijjin This patch is to fix the bug (BugId:116305) introduced by the commit bd93ef for risc-v target. The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128 if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So it changes the value of BYTES_PER_RISCV_VECTOR. For example, before merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer equal. Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb register value in riscv_legitimize_poly_move, and dwarf2cfi will also get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value to calculate the number of times to multiply the vlenb register value. So need to change the factor from riscv_bytes_per_vector_chunk to BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf information. 
The incorrect example as follow: ``` csrr t0,vlenb slli t1,t0,1 sub sp,sp,t1 .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22 ``` The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means the literal 4, '0x1e' means the multiply operation. But in fact, the vlenb register value just need to multiply the literal 2. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test. Signed-off-by: Zhijin Zeng --- gcc/config/riscv/riscv.cc | 4 +-- .../riscv/rvv/base/scalable_vector_cfi.c | 32 +++ 2 files changed, 34 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 1f60d8f9711..8b7123e043e 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -11010,12 +11010,12 @@ static unsigned int riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor, int *offset) { - /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1. + /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1. 1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1. 2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1. 
*/ gcc_assert (i == 1); - *factor = riscv_bytes_per_vector_chunk; + *factor = BYTES_PER_RISCV_VECTOR.coeffs[1]; *offset = 1; return RISCV_DWARF_VLENB; } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c new file mode 100644 index 000..184da10caf3 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */ +/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */ +/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */ + +#include "riscv_vector.h" + +#define PI_2 1.570796326795 + +extern void func(float *result); + +void test(const float *ys, const float *xs, float *result, size_t length) { + size_t gvl = __riscv_vsetvlmax_e32m2(); + vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl); + + for(size_t i = 0; i < length;) { + gvl = __riscv_vsetvl_e32m2(length - i); + vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl); + vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl); + vbool16_t mask0 = __riscv_vmflt_vv_f32m2_b16(x, y, gvl); + vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, gvl); + + __riscv_vse32_v_f32m2(result, fixpi, gvl); + + func(result); + + i += gvl; + ys += gvl; + xs += gvl; + result += gvl; + } +} -- 2.34.1 > Fr
RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
Ok, I will commit it if no surprise from test as manually changing. Pan -Original Message- From: Zhijin Zeng Sent: Saturday, August 17, 2024 10:46 AM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] The patch for 3c9c93 as follow. But it's a little strange that this patch hasn't changed and I don't know why it apply fail. May you directly modify the riscv.cc if this version still conflict? The riscv.cc just changed two lines. Thank you again. Zhijjin This patch is to fix the bug (BugId:116305) introduced by the commit bd93ef for risc-v target. The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128 if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So it changes the value of BYTES_PER_RISCV_VECTOR. For example, before merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer equal. Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb register value in riscv_legitimize_poly_move, and dwarf2cfi will also get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value to calculate the number of times to multiply the vlenb register value. So need to change the factor from riscv_bytes_per_vector_chunk to BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf information. The incorrect example as follow: ``` csrr t0,vlenb slli t1,t0,1 sub sp,sp,t1 .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22 ``` The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means the literal 4, '0x1e' means the multiply operation. But in fact, the vlenb register value just need to multiply the literal 2. 
gcc/ChangeLog: * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test. Signed-off-by: Zhijin Zeng --- gcc/config/riscv/riscv.cc | 4 +-- .../riscv/rvv/base/scalable_vector_cfi.c | 32 +++ 2 files changed, 34 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 1f60d8f9711..8b7123e043e 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -11010,12 +11010,12 @@ static unsigned int riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor, int *offset) { - /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1. + /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1. 1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1. 2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1. */ gcc_assert (i == 1); - *factor = riscv_bytes_per_vector_chunk; + *factor = BYTES_PER_RISCV_VECTOR.coeffs[1]; *offset = 1; return RISCV_DWARF_VLENB; } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c new file mode 100644 index 000..184da10caf3 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */ +/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */ +/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */ + +#include "riscv_vector.h" + +#define PI_2 1.570796326795 + +extern void func(float *result); + +void test(const float *ys, const float *xs, float *result, size_t length) { + size_t gvl = __riscv_vsetvlmax_e32m2(); + vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl); + + for(size_t i = 0; i < length;) { + gvl = 
__riscv_vsetvl_e32m2(length - i); + vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl); + vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl); + vbool16_t mask0 = __riscv_vmflt_vv_f32m2_b16(x, y, gvl); + vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, gvl); + + __riscv_vse32_v_f32m2(result, fixpi, gvl); + + func(result); + + i += gvl; + ys += gvl; + xs += gvl; + result += gvl; + } +} -- 2.34.1 > From: "Li, Pan2" > Date: Sat, Aug 17, 2024, 09:20 > Subject: RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value > [PR116305] > To: "Zhijin Zeng" > Cc: "gcc-patches@gcc.gnu.org", > "gcc-b...@gcc.gnu.org", "Kito &g
RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
Never mind, looks still conflict, could you please help to double check about it? Current upstream should be 3c9c93f3c923c4a0ccd42db4fd26a944a3c91458. └─(09:18:27 on master ✭)──> git apply tmp.patch ──(Sat,Aug17)─┘ error: patch failed: gcc/config/riscv/riscv.cc:11010 error: gcc/config/riscv/riscv.cc: patch does not apply Pan -Original Message- From: Zhijin Zeng Sent: Friday, August 16, 2024 9:30 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] Sorry, the line number changed. The newest version as follow, This patch is to fix the bug (BugId:116305) introduced by the commit bd93ef for risc-v target. The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128 if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So it changes the value of BYTES_PER_RISCV_VECTOR. For example, before merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer equal. Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb register value in riscv_legitimize_poly_move, and dwarf2cfi will also get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value to calculate the number of times to multiply the vlenb register value. So need to change the factor from riscv_bytes_per_vector_chunk to BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf information. The incorrect example as follow: ``` csrr t0,vlenb slli t1,t0,1 sub sp,sp,t1 .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22 ``` The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means the literal 4, '0x1e' means the multiply operation. But in fact, the vlenb register value just need to multiply the literal 2. 
gcc/ChangeLog: * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test. Signed-off-by: Zhijin Zeng --- gcc/config/riscv/riscv.cc | 4 +-- .../riscv/rvv/base/scalable_vector_cfi.c | 32 +++ 2 files changed, 34 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 1f60d8f9711..8b7123e043e 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -11010,12 +11010,12 @@ static unsigned int riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor, int *offset) { - /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1. + /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1. 1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1. 2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1. */ gcc_assert (i == 1); - *factor = riscv_bytes_per_vector_chunk; + *factor = BYTES_PER_RISCV_VECTOR.coeffs[1]; *offset = 1; return RISCV_DWARF_VLENB; } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c new file mode 100644 index 000..184da10caf3 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */ +/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */ +/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */ + +#include "riscv_vector.h" + +#define PI_2 1.570796326795 + +extern void func(float *result); + +void test(const float *ys, const float *xs, float *result, size_t length) { + size_t gvl = __riscv_vsetvlmax_e32m2(); + vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl); + + for(size_t i = 0; i < length;) { + gvl = 
__riscv_vsetvl_e32m2(length - i); + vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl); + vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl); + vbool16_t mask0 = __riscv_vmflt_vv_f32m2_b16(x, y, gvl); + vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, gvl); + + __riscv_vse32_v_f32m2(result, fixpi, gvl); + + func(result); + + i += gvl; + ys += gvl; + xs += gvl; + result += gvl; + } +} -- 2.34.1 > From: "Li, Pan2" > Date: Fri, Aug 16, 2024, 21:05 > Sub
RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
Is this you newest version? https://patchwork.sourceware.org/project/gcc/patch/8fd4328940034d8778cca67eaad54e5a2c2b1a6c.1c2f51e1.0a9a.4367.9762.9b6eccc3b...@feishu.cn/ If so, you may need to rebase upstream, I got conflict when git am. Applying: RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] error: corrupt patch at line 20 Patch failed at 0001 RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] hint: Use 'git am --show-current-patch=diff' to see the failed patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". Pan -Original Message- From: Zhijin Zeng Sent: Friday, August 16, 2024 8:47 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] Hi Pan, I am a new guy for GCC and don't have authority to commit. Please help to commit this patch. Thank you very much. Zhijin > From: "Li, Pan2" > Date: Fri, Aug 16, 2024, 20:15 > Subject: RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value > [PR116305] > To: "曾治金" > Cc: "gcc-patches@gcc.gnu.org", > "gcc-b...@gcc.gnu.org", "Kito > Cheng" > Hi there, > > Please feel free to let me know if you don't have authority to commit it. I > can help to commit this patch. > > Pan > > > -Original Message- > From: Kito Cheng > Sent: Friday, August 16, 2024 3:48 PM > To: 曾治金 > Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org > Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value > [PR116305] > > LGTM, thanks for fixing that :) > > On Wed, Aug 14, 2024 at 2:06 PM 曾治金 wrote: > > > > This patch is to fix the bug (BugId:116305) introduced by the commit > > bd93ef for risc-v target. > > > > The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128 > > if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. 
So > > it changes the value of BYTES_PER_RISCV_VECTOR. For example, before > > merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value > > of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value > > of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer > > equal. > > > > Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb > > register value in riscv_legitimize_poly_move, and dwarf2cfi will also > > get the estimated vlenb register value in > > riscv_dwarf_poly_indeterminate_value > > to calculate the number of times to multiply the vlenb register value. > > > > So need to change the factor from riscv_bytes_per_vector_chunk to > > BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf > > information. The incorrect example as follow: > > > > ``` > > csrr t0,vlenb > > slli t1,t0,1 > > sub sp,sp,t1 > > > > .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22 > > ``` > > > > The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means > > the literal 4, '0x1e' means the multiply operation. But in fact, the > > vlenb register value just need to multiply the literal 2. > > > > gcc/ChangeLog: > > > > * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test. 
> > > > Signed-off-by: Zhijin Zeng > > --- > > gcc/config/riscv/riscv.cc | 4 +-- > > .../riscv/rvv/base/scalable_vector_cfi.c | 32 +++ > > 2 files changed, 34 insertions(+), 2 deletions(-) > > create mode 100644 > >gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c > > > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > > index 5fe4273beb7..e740fc159dd 100644 > > --- a/gcc/config/riscv/riscv.cc > > +++ b/gcc/config/riscv/riscv.cc > > @@ -10773,12 +10773,12 @@ static unsigned int > > riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor, > > int *offset) > > { > > - /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1. > > + /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1. > > 1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1. > > 2. TARGET_MI
RE: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]
Thanks Jeff and Andrew Waterman for the comments. > What's more important is that we get the RTL semantics right, the fact > that it seems to work due to addiw seems to be more of an accident than > by design. The SImode has different handling from day 1 which follows the algorithm up to a point. 11842 if (mode == SImode && mode != Xmode) 11843 { /* Take addw to avoid the sum truncate. */ 11844 rtx simode_sum = gen_reg_rtx (SImode); 11845 riscv_emit_binary (PLUS, simode_sum, x, y); 11846 emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum)); 11847 } > I think your overall point still holds, though. Got the point here but I would like to double confirm the below 2 more insns are acceptable for this change. (or we can eliminate it later) sat_u_add_uint32_t_fmt_1: slli a5,a0,32 // additional insn for taking care of SI in rv64 srli a5,a5,32 // Ditto. addw a0,a0,a1 sltu a5,a0,a5 neg a5,a5 or a0,a5,a0 sext.w a0,a0 ret If so, I will prepare the v3 for the SImode in RV64. Pan -Original Message- From: Andrew Waterman Sent: Friday, August 16, 2024 12:28 PM To: Jeff Law Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278] On Thu, Aug 15, 2024 at 9:23 PM Jeff Law wrote: > > > > On 8/13/24 10:16 PM, Li, Pan2 wrote: > >> How specifically is it avoided for SI? ISTM it should have the exact > >> same problem with a constant like 0x8000 in SImode on rv64 which is > >> going to be extended to 0x8000. > > > > HI and QI need some special handling for sum. For example, for HImode. > > > > 65535 + 2 = 65537, when compare sum and 2, we need to cleanup the high bits > > (aka make 65537 become 1) to tell the HImode overflow. > > Thus, for HI and QI, we need to clean up highest bits of mode. > > > > But for SI, we don't need that as we have addw insn, the sign extend will > > take care of this as well as the sltu. For example, SImode. 
> > > > lw a1,0(a5) // a1 is -40, aka 0xffd8 > > lui a0,0x1a // > > addiw a5,a1,9 // a5 is -31, aka 0xffe1 > > // For QI and HI, we need to mask the highbits, > > but not applicable for SI. > > sltu a1,a5,a1 // compare a1 and a5, a5 > a1, then no-overflow as > > expected. > What's more important is that we get the RTL semantics right, the fact > that it seems to work due to addiw seems to be more of an accident than > by design. Also note that addiw isn't available unless ZBA is enabled, > so we don't want to depend on that to save us. addiw is always available in RV64; you're probably thinking of add.uw, which is an RV64_Zba instruction. I think your overall point still holds, though. > > I still think we should be handling SI on rv64 in a manner similar to > QI/HI are handled on rv32/rv64. > > jeff >
RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
Hi there, Please feel free to let me know if you don't have authority to commit it. I can help to commit this patch. Pan -Original Message- From: Kito Cheng Sent: Friday, August 16, 2024 3:48 PM To: 曾治金 Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305] LGTM, thanks for fixing that :) On Wed, Aug 14, 2024 at 2:06 PM 曾治金 wrote: > > This patch is to fix the bug (BugId:116305) introduced by the commit > bd93ef for risc-v target. > > The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128 > if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So > it changes the value of BYTES_PER_RISCV_VECTOR. For example, before > merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value > of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value > of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer > equal. > > Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb > register value in riscv_legitimize_poly_move, and dwarf2cfi will also > get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value > to calculate the number of times to multiply the vlenb register value. > > So need to change the factor from riscv_bytes_per_vector_chunk to > BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf > information. The incorrect example as follow: > > ``` > csrrt0,vlenb > sllit1,t0,1 > sub sp,sp,t1 > > .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22 > ``` > > The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means > the literal 4, '0x1e' means the multiply operation. But in fact, the > vlenb register value just need to multiply the literal 2. > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test. 
> > Signed-off-by: Zhijin Zeng > --- > gcc/config/riscv/riscv.cc | 4 +-- > .../riscv/rvv/base/scalable_vector_cfi.c | 32 +++ > 2 files changed, 34 insertions(+), 2 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 5fe4273beb7..e740fc159dd 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -10773,12 +10773,12 @@ static unsigned int > riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor, > int *offset) > { > - /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1. > + /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1. > 1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1. > 2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1. >*/ >gcc_assert (i == 1); > - *factor = riscv_bytes_per_vector_chunk; > + *factor = BYTES_PER_RISCV_VECTOR.coeffs[1]; >*offset = 1; >return RISCV_DWARF_VLENB; > } > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c > new file mode 100644 > index 000..184da10caf3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c > @@ -0,0 +1,32 @@ > +/* { dg-do compile } */ > +/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */ > +/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */ > +/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } > */ > + > +#include "riscv_vector.h" > + > +#define PI_2 1.570796326795 > + > +extern void func(float *result); > + > +void test(const float *ys, const float *xs, float *result, size_t length) { > +size_t gvl = __riscv_vsetvlmax_e32m2(); > +vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl); > + > +for(size_t i = 0; i < length;) { > +gvl = __riscv_vsetvl_e32m2(length - i); > +vfloat32m2_t y = 
__riscv_vle32_v_f32m2(ys, gvl); > +vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl); > +vbool16_t mask0 = __riscv_vmflt_vv_f32m2_b16(x, y, gvl); > +vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, > 0, gvl); > + > +__riscv_vse32_v_f32m2(result, fixpi, gvl); > + > +func(result); > + > +i += gvl; > +ys += gvl; > +xs += gvl; > +result += gvl; > +} > +} > -- > 2.34.1
RE: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern
> But you're shifting a REG, not a CONST_INT. I see; we can introduce a QImode REG, move the constant into it, and then zero_extend. Thanks Jeff for enlightening me; I will send a v3 for this. Pan -Original Message- From: Jeff Law Sent: Wednesday, August 14, 2024 11:52 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern On 8/13/24 9:47 PM, Li, Pan2 wrote: >>> +static rtx >>> +riscv_gen_unsigned_xmode_reg (rtx x, machine_mode mode) >>> +{ >>> + if (!CONST_INT_P (x)) >>> +return gen_lowpart (Xmode, x); >>> + >>> + rtx xmode_x = gen_reg_rtx (Xmode); >>> + HOST_WIDE_INT cst = INTVAL (x); >>> + >>> + emit_move_insn (xmode_x, x); >>> + >>> + int xmode_bits = GET_MODE_BITSIZE (Xmode); >>> + int mode_bits = GET_MODE_BITSIZE (mode).to_constant (); >>> + >>> + if (cst < 0 && mode_bits < xmode_bits) >>> +{ >>> + int shift_bits = xmode_bits - mode_bits; >>> + >>> + riscv_emit_binary (ASHIFT, xmode_x, xmode_x, GEN_INT (shift_bits)); >>> + riscv_emit_binary (LSHIFTRT, xmode_x, xmode_x, GEN_INT >>> (shift_bits)); >>> +} >> Isn't this a zero_extension? > > I am not sure it is valid for zero_extend, given the incoming rtx x is > const_int which is DImode(integer promoted) > for ussub. > I will rebase this patch after PR116278 commit, and give a try for this. But you're shifting a REG, not a CONST_INT. Jeff
RE: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]
> How specifically is it avoided for SI? ISTM it should have the exact > same problem with a constant like 0x80000000 in SImode on rv64 which is > going to be extended to 0xffffffff80000000. HI and QI need some special handling for the sum. For example, for HImode, 65535 + 2 = 65537; when comparing the sum and 2, we need to clean up the high bits (aka make 65537 become 1) to tell that the HImode add overflowed. Thus, for HI and QI, we need to clean up the bits above the mode. But for SI we don't need that, as we have the addw insn; the sign extension takes care of this, together with the sltu. For example, for SImode: lw a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8 lui a0,0x1a // addiw a5,a1,9 // a5 is -31, aka 0xffffffffffffffe1 // For QI and HI, we need to mask the high bits, but this is not applicable for SI. sltu a1,a5,a1 // compare a1 and a5; a5 > a1, so no overflow, as expected. Pan -Original Message- From: Jeff Law Sent: Wednesday, August 14, 2024 12:03 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278] On 8/12/24 8:09 PM, Li, Pan2 wrote: >> Isn't this wrong for SImode on rv64? It seems to me the right test is >> mode != word_mode? >> Assuming that works, it's OK for the trunk. > > Thanks Jeff, Simode version of test file doesn't have this issue. Thus, only > HI and QI here. > I will add a new test for SImode in v3 to ensure this. How specifically is it avoided for SI? ISTM it should have the exact same problem with a constant like 0x80000000 in SImode on rv64 which is going to be extended to 0xffffffff80000000. Jeff
RE: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern
>> +static rtx >> +riscv_gen_unsigned_xmode_reg (rtx x, machine_mode mode) >> +{ >> + if (!CONST_INT_P (x)) >> +return gen_lowpart (Xmode, x); >> + >> + rtx xmode_x = gen_reg_rtx (Xmode); >> + HOST_WIDE_INT cst = INTVAL (x); >> + >> + emit_move_insn (xmode_x, x); >> + >> + int xmode_bits = GET_MODE_BITSIZE (Xmode); >> + int mode_bits = GET_MODE_BITSIZE (mode).to_constant (); >> + >> + if (cst < 0 && mode_bits < xmode_bits) >> +{ >> + int shift_bits = xmode_bits - mode_bits; >> + >> + riscv_emit_binary (ASHIFT, xmode_x, xmode_x, GEN_INT (shift_bits)); >> + riscv_emit_binary (LSHIFTRT, xmode_x, xmode_x, GEN_INT (shift_bits)); >> +} > Isn't this a zero_extension? I am not sure it is valid for zero_extend, given the incoming rtx x is const_int which is DImode(integer promoted) for ussub. I will rebase this patch after PR116278 commit, and give a try for this. Pan -Original Message- From: Jeff Law Sent: Wednesday, August 14, 2024 11:33 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern On 8/13/24 8:23 PM, Li, Pan2 wrote: > This Patch may requires rebase, will send v3 for conflict resolving. > > Pan > > -Original Message- > From: Li, Pan2 > Sent: Sunday, August 4, 2024 7:48 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; > rdapp@gmail.com; Li, Pan2 > Subject: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern > > From: Pan Li > > This patch would like to allow IMM for the operand 0 of ussub pattern. > Aka .SAT_SUB(1023, y) as the below example. > > Form 1: >#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \ >T __attribute__((noinline)) \ >sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \ >{ \ > return (T)IMM >= y ? 
(T)IMM - y : 0; \ >} > > DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023) > > Before this patch: >10 │ sat_u_sub_imm82_uint64_t_fmt_1: >11 │ li a5,82 >12 │ bgtua0,a5,.L3 >13 │ sub a0,a5,a0 >14 │ ret >15 │ .L3: >16 │ li a0,0 >17 │ ret > > After this patch: >10 │ sat_u_sub_imm82_uint64_t_fmt_1: >11 │ li a5,82 >12 │ sltua4,a5,a0 >13 │ addia4,a4,-1 >14 │ sub a0,a5,a0 >15 │ and a0,a4,a0 >16 │ ret > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression test. > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new > func impl to gen xmode rtx reg from operand rtx. > (riscv_expand_ussub): Gen xmode reg for operand 1. > * config/riscv/riscv.md: Allow const_int for operand 1. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_arith.h: Add test helper macro. > * gcc.target/riscv/sat_u_sub_imm-1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-1_1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-1_2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-2_1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-2_2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-3.c: New test. > * gcc.target/riscv/sat_u_sub_imm-3_1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-3_2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-4.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-1.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-2.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-3.c: New test. > * gcc.target/riscv/sat_u_sub_imm-run-4.c: New test. 
> > Signed-off-by: Pan Li > --- > gcc/config/riscv/riscv.cc | 51 - > gcc/config/riscv/riscv.md | 2 +- > gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 > .../gcc.target/riscv/sat_u_sub_imm-1.c| 20 +++ > .../gcc.target/riscv/sat_u_sub_imm-1_1.c | 20 +++ > .../gcc.target/riscv/sat_u_sub_imm-1_2.c | 20 +++ > .../gcc.target/riscv/sat_u_sub_imm-2.c| 21 +++ > .../gcc.target/riscv/sat_u_sub_imm-2_1.c | 21 +++ > .../gcc.target/riscv/sat_u_sub_imm-2_2.c | 22 > .../gcc.target/riscv/sat_u_sub_imm-3.c| 20 +++ > .../gcc.target/riscv/sat_u_sub_imm-3_1.c | 21 +++ > .../gcc.target/riscv/sat_u_sub_imm-3_2.c | 22 +
RE: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern
This patch may require a rebase; I will send v3 to resolve the conflicts. Pan -Original Message- From: Li, Pan2 Sent: Sunday, August 4, 2024 7:48 PM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern From: Pan Li This patch would like to allow IMM for the operand 0 of ussub pattern. Aka .SAT_SUB(1023, y) as the below example. Form 1: #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \ T __attribute__((noinline)) \ sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \ { \ return (T)IMM >= y ? (T)IMM - y : 0; \ } DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023) Before this patch: 10 │ sat_u_sub_imm82_uint64_t_fmt_1: 11 │ li a5,82 12 │ bgtu a0,a5,.L3 13 │ sub a0,a5,a0 14 │ ret 15 │ .L3: 16 │ li a0,0 17 │ ret After this patch: 10 │ sat_u_sub_imm82_uint64_t_fmt_1: 11 │ li a5,82 12 │ sltu a4,a5,a0 13 │ addi a4,a4,-1 14 │ sub a0,a5,a0 15 │ and a0,a4,a0 16 │ ret The below test suites are passed for this patch: 1. The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new func impl to gen xmode rtx reg from operand rtx. (riscv_expand_ussub): Gen xmode reg for operand 1. * config/riscv/riscv.md: Allow const_int for operand 1. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macro. * gcc.target/riscv/sat_u_sub_imm-1.c: New test. * gcc.target/riscv/sat_u_sub_imm-1_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-1_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-2.c: New test. * gcc.target/riscv/sat_u_sub_imm-2_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-2_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-3.c: New test. * gcc.target/riscv/sat_u_sub_imm-3_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-3_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-4.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-1.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-2.c: New test. 
* gcc.target/riscv/sat_u_sub_imm-run-3.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-4.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 51 - gcc/config/riscv/riscv.md | 2 +- gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 .../gcc.target/riscv/sat_u_sub_imm-1.c| 20 +++ .../gcc.target/riscv/sat_u_sub_imm-1_1.c | 20 +++ .../gcc.target/riscv/sat_u_sub_imm-1_2.c | 20 +++ .../gcc.target/riscv/sat_u_sub_imm-2.c| 21 +++ .../gcc.target/riscv/sat_u_sub_imm-2_1.c | 21 +++ .../gcc.target/riscv/sat_u_sub_imm-2_2.c | 22 .../gcc.target/riscv/sat_u_sub_imm-3.c| 20 +++ .../gcc.target/riscv/sat_u_sub_imm-3_1.c | 21 +++ .../gcc.target/riscv/sat_u_sub_imm-3_2.c | 22 .../gcc.target/riscv/sat_u_sub_imm-4.c| 19 +++ .../gcc.target/riscv/sat_u_sub_imm-run-1.c| 56 +++ .../gcc.target/riscv/sat_u_sub_imm-run-2.c| 56 +++ .../gcc.target/riscv/sat_u_sub_imm-run-3.c| 55 ++ .../gcc.target/riscv/sat_u_sub_imm-run-4.c| 48 17 files changed, 482 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1_1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1_2.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2_1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2_2.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3_1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3_2.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-4.c diff --git a/gcc/config/riscv/riscv.cc 
b/gcc/config/riscv/riscv.cc index b19d56149e7..5e4e9722729 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -11612,6 +11612,55 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y) emit_move_insn (dest, gen_lowpart (mode, xmode_dest)); } +/* Generate a REG rtx of Xmode from the given rtx and mode. + The rtx x can be REG (QI/HI/SI/DI)
RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103]
> Looks good to me too. Sorry, didn't realise you were waiting for a second > ack. Never mind, thanks Richard S for confirmation and suggestions. Pan -Original Message- From: Richard Sandiford Sent: Tuesday, August 13, 2024 5:25 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Richard Biener Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103] "Li, Pan2" writes: > Hi Richard S, > > Please feel free to let me know if there is any further comments in v2. > Thanks a lot. Looks good to me too. Sorry, didn't realise you were waiting for a second ack. Thanks, Richard > > Pan > > > -Original Message- > From: Li, Pan2 > Sent: Thursday, August 1, 2024 8:11 PM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com > Subject: RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict > match mode [PR116103] > >> Still OK. > > Thanks Richard, let me wait the final confirmation from Richard S. > > Pan > > -Original Message- > From: Richard Biener > Sent: Tuesday, July 30, 2024 5:03 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict > match mode [PR116103] > > On Tue, Jul 30, 2024 at 5:08 AM wrote: >> >> From: Pan Li >> >> For some target like target=amdgcn-amdhsa, we need to take care of >> vector bool types prior to general vector mode types. Or we may have >> the asm check failure as below. 
>> >> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, >> s[0-9]+, v[0-9]+ 80 >> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, >> s[0-9]+, v[0-9]+ 80 >> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, >> s[0-9]+, v[0-9]+ 56 >> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, >> s[0-9]+, v[0-9]+ 56 >> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if " >> >> The below test suites are passed for this patch. >> 1. The rv64gcv fully regression tests. >> 2. The x86 bootstrap tests. >> 3. The x86 fully regression tests. >> 4. The amdgcn test case as above. > > Still OK. > > Richard. > >> gcc/ChangeLog: >> >> * internal-fn.cc (type_strictly_matches_mode_p): Add handling >> for vector bool type. >> >> Signed-off-by: Pan Li >> --- >> gcc/internal-fn.cc | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >> index 8a2e07f2f96..966594a52ed 100644 >> --- a/gcc/internal-fn.cc >> +++ b/gcc/internal-fn.cc >> @@ -4171,6 +4171,16 @@ direct_internal_fn_optab (internal_fn fn) >> static bool >> type_strictly_matches_mode_p (const_tree type) >> { >> + /* The masked vector operations have both vector data operands and vector >> + boolean operands. The vector data operands are expected to have a >> vector >> + mode, but the vector boolean operands can be an integer mode rather >> than >> + a vector mode, depending on how TARGET_VECTORIZE_GET_MASK_MODE is >> + defined. PR116103. */ >> + if (VECTOR_BOOLEAN_TYPE_P (type) >> + && SCALAR_INT_MODE_P (TYPE_MODE (type)) >> + && TYPE_PRECISION (TREE_TYPE (type)) == 1) >> +return true; >> + >>if (VECTOR_TYPE_P (type)) >> return VECTOR_MODE_P (TYPE_MODE (type)); >> >> -- >> 2.34.1 >>
RE: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]
> Isn't this wrong for SImode on rv64? It seems to me the right test is > mode != word_mode? > Assuming that works, it's OK for the trunk. Thanks Jeff. The SImode version of the test file doesn't have this issue; thus, only HI and QI are handled here. I will add a new test for SImode in v3 to ensure this. Pan -Original Message- From: Jeff Law Sent: Tuesday, August 13, 2024 12:58 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278] On 8/11/24 4:43 AM, pan2...@intel.com wrote: > +static rtx > +riscv_gen_zero_extend_rtx (rtx x, machine_mode mode) > +{ > + if (mode != HImode && mode != QImode) > +return gen_lowpart (Xmode, x); Isn't this wrong for SImode on rv64? It seems to me the right test is mode != word_mode? Assuming that works, it's OK for the trunk. jeff
RE: [PATCH v1] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]
> Isn't this just zero extension from a narrower mode to a wider mode? > Why not just use zero_extend? That will take advantage of existing > expansion code to select an efficient extension approach at initial RTL > generation rather than waiting for combine to clean things up. Thanks Jeff, let me have a try in v2. Pan -Original Message- From: Jeff Law Sent: Saturday, August 10, 2024 11:34 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278] On 8/8/24 9:12 PM, pan2...@intel.com wrote: > From: Pan Li > > For QI/HImode of .SAT_ADD, the operands may be sign-extended and the > high bits of Xmode may be all 1s, which is not expected. For example, > consider the code below. > > signed char b[1]; > unsigned short c; > signed char *d = b; > int main() { >b[0] = -40; >c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + > 9; >__builtin_printf("%d\n", c); > } > > After expanding we have: > > ;; _6 = .SAT_ADD (_3, 9); > (insn 8 7 9 (set (reg:DI 143) > (high:DI (symbol_ref:DI ("d") [flags 0x86] ))) > (nil)) > (insn 9 8 10 (set (reg/f:DI 142) > (mem/f/c:DI (lo_sum:DI (reg:DI 143) > (symbol_ref:DI ("d") [flags 0x86] )) [1 d+0 S8 > A64])) > (nil)) > (insn 10 9 11 (set (reg:HI 144 [ _3 ]) > (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) > "test.c":7:10 -1 > (nil)) > > The conversion from signed char to unsigned short produces the sign_extend rtl > above, which finally becomes the lb insn below: > > lb a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8 > lui a0,0x1a > addi a5,a1,9 > slli a5,a5,0x30 > srli a5,a5,0x30 // a5 is 65505 > sltu a1,a5,a1 // compare 65505 and 0xffffffffffffffd8 => TRUE > > The sltu tries to compare 65505 and 0xffffffffffffffd8 here, but we > actually want to compare 65505 and 65496 (0xffd8). Thus we need to > clean up the high bits to ensure this. 
> > The below test suites are passed for this patch: > * The rv64gcv fully regression test. > > PR target/116278 > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_cleanup_rtx_high): Add new func > impl to cleanup high bits of rtx. > (riscv_expand_usadd): Leverage above func to cleanup operands > and sum. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_u_add-1.c: Adjust asm check. > * gcc.target/riscv/sat_u_add-10.c: Ditto. > * gcc.target/riscv/sat_u_add-13.c: Ditto. > * gcc.target/riscv/sat_u_add-14.c: Ditto. > * gcc.target/riscv/sat_u_add-17.c: Ditto. > * gcc.target/riscv/sat_u_add-18.c: Ditto. > * gcc.target/riscv/sat_u_add-2.c: Ditto. > * gcc.target/riscv/sat_u_add-21.c: Ditto. > * gcc.target/riscv/sat_u_add-22.c: Ditto. > * gcc.target/riscv/sat_u_add-5.c: Ditto. > * gcc.target/riscv/sat_u_add-6.c: Ditto. > * gcc.target/riscv/sat_u_add-9.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-1.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-10.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-13.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-14.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-2.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-5.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-6.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-9.c: Ditto. > * gcc.target/riscv/pr116278-run-1.c: New test. 
> > Signed-off-by: Pan Li > --- > gcc/config/riscv/riscv.cc | 30 ++- > .../gcc.target/riscv/pr116278-run-1.c | 16 ++ > gcc/testsuite/gcc.target/riscv/sat_u_add-1.c | 1 + > gcc/testsuite/gcc.target/riscv/sat_u_add-10.c | 2 ++ > gcc/testsuite/gcc.target/riscv/sat_u_add-13.c | 1 + > gcc/testsuite/gcc.target/riscv/sat_u_add-14.c | 2 ++ > gcc/testsuite/gcc.target/riscv/sat_u_add-17.c | 1 + > gcc/testsuite/gcc.target/riscv/sat_u_add-18.c | 2 ++ > gcc/testsuite/gcc.target/riscv/sat_u_add-2.c | 2 ++ > gcc/testsuite/gcc.target/riscv/sat_u_add-21.c | 1 + > gcc/testsuite/gcc.target/riscv/sat_u_add-22.c | 2 ++ > gcc/testsuite/gcc.target/riscv/sat_u_add-5.c | 1 + > gcc/testsuite/gcc.target/riscv/sat_u_add-6.c | 2 ++ > gcc/testsuite/gcc.target/riscv/sat_u_add-9.c | 1 + > .../gcc.target/riscv/sat_u_add_imm-1.c| 1 + > .../gcc.target/riscv/sat_u_add_imm-10.c | 2 ++ > .../gcc.target/riscv/sat_u_add_imm-13.c | 1 + > .../gcc.target/riscv/sa
RE: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect
> I think my original (failed) idea was this pattern to be an > intermediate/bridge > pattern that never splits. Yes, by design this pattern should not be hit, and any changes to the layout of the pattern may result in some vwsll autovec failures. > Once we need to "split" maybe the regular shift is > better or at least similar? Actually it is something similar to short = char << int. Maybe we can (1) extend char to short and (2) truncate int to short; then a regular short shift is suitable here. Honestly, I am not sure it is better than vwsll. Pan -Original Message- From: Robin Dapp Sent: Saturday, August 10, 2024 10:32 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp Subject: Re: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect A bit of bikeshedding: While it's obviously a bug, I'm not really sure it's useful to truncate before emitting the widening shift. Do we save an instruction vs. the regular non-widening shift by doing so? I think my original (failed) idea was this pattern to be an intermediate/bridge pattern that never splits. Once we need to "split" maybe the regular shift is better or at least similar? -- Regards Robin
RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103]
Hi Richard S, Please feel free to let me know if there is any further comments in v2. Thanks a lot. Pan -Original Message- From: Li, Pan2 Sent: Thursday, August 1, 2024 8:11 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103] > Still OK. Thanks Richard, let me wait the final confirmation from Richard S. Pan -Original Message- From: Richard Biener Sent: Tuesday, July 30, 2024 5:03 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103] On Tue, Jul 30, 2024 at 5:08 AM wrote: > > From: Pan Li > > For some target like target=amdgcn-amdhsa, we need to take care of > vector bool types prior to general vector mode types. Or we may have > the asm check failure as below. > > gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if " > > The below test suites are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > 4. The amdgcn test case as above. Still OK. Richard. > gcc/ChangeLog: > > * internal-fn.cc (type_strictly_matches_mode_p): Add handling > for vector bool type. 
> > Signed-off-by: Pan Li > --- > gcc/internal-fn.cc | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 8a2e07f2f96..966594a52ed 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4171,6 +4171,16 @@ direct_internal_fn_optab (internal_fn fn) > static bool > type_strictly_matches_mode_p (const_tree type) > { > + /* The masked vector operations have both vector data operands and vector > + boolean operands. The vector data operands are expected to have a > vector > + mode, but the vector boolean operands can be an integer mode rather > than > + a vector mode, depending on how TARGET_VECTORIZE_GET_MASK_MODE is > + defined. PR116103. */ > + if (VECTOR_BOOLEAN_TYPE_P (type) > + && SCALAR_INT_MODE_P (TYPE_MODE (type)) > + && TYPE_PRECISION (TREE_TYPE (type)) == 1) > +return true; > + >if (VECTOR_TYPE_P (type)) > return VECTOR_MODE_P (TYPE_MODE (type)); > > -- > 2.34.1 >
RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar
Kindly ping++. Pan -Original Message- From: Li, Pan2 Sent: Wednesday, July 31, 2024 9:12 AM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar Kindly ping. Pan -Original Message- From: Li, Pan2 Sent: Tuesday, July 23, 2024 1:06 PM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar From: Pan Li This patch would like to implement the quad and oct .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ {\ bool overflow = x > (WT)(NT)(-1); \ return ((NT)x) | (NT)-overflow;\ } DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t) Before this patch: 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ _Bool overflow; 8 │ short unsigned int _1; 9 │ short unsigned int _2; 10 │ short unsigned int _3; 11 │ uint16_t _6; 12 │ 13 │ ;; basic block 2, loop depth 0 14 │ ;;pred: ENTRY 15 │ overflow_5 = x_4(D) > 65535; 16 │ _1 = (short unsigned int) x_4(D); 17 │ _2 = (short unsigned int) overflow_5; 18 │ _3 = -_2; 19 │ _6 = _1 | _3; 20 │ return _6; 21 │ ;;succ: EXIT 22 │ 23 │ } After this patch: 3 │ 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ uint16_t _6; 8 │ 9 │ ;; basic block 2, loop depth 0 10 │ ;;pred: ENTRY 11 │ _6 = .SAT_TRUNC (x_4(D)); [tail call] 12 │ return _6; 13 │ ;;succ: EXIT 14 │ 15 │ } The below tests suites are passed for this patch 1. The rv64gcv fully regression test. 2. The rv64gcv build with glibc gcc/ChangeLog: * config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for quad truncation. (ANYI_OCT_TRUNC): New iterator for oct truncation. 
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes. (ANYI_OCT_TRUNCATED): New attr for truncated oct modes. (anyi_quad_truncated): Ditto but for lower case. (anyi_oct_truncated): Ditto but for lower case. * config/riscv/riscv.md (ustrunc2): Add new pattern for quad truncation. (ustrunc2): Ditto but for oct. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust the expand dump check times. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto. * gcc.target/riscv/sat_arith_data.h: Add test helper macros. * gcc.target/riscv/sat_u_trunc-4.c: New test. * gcc.target/riscv/sat_u_trunc-5.c: New test. * gcc.target/riscv/sat_u_trunc-6.c: New test. * gcc.target/riscv/sat_u_trunc-run-4.c: New test. * gcc.target/riscv/sat_u_trunc-run-5.c: New test. * gcc.target/riscv/sat_u_trunc-run-6.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/iterators.md | 20 gcc/config/riscv/riscv.md | 20 .../rvv/autovec/unop/vec_sat_u_trunc-2.c | 2 +- .../rvv/autovec/unop/vec_sat_u_trunc-3.c | 2 +- .../gcc.target/riscv/sat_arith_data.h | 51 +++ .../gcc.target/riscv/sat_u_trunc-4.c | 17 +++ .../gcc.target/riscv/sat_u_trunc-5.c | 17 +++ .../gcc.target/riscv/sat_u_trunc-6.c | 20 .../gcc.target/riscv/sat_u_trunc-run-4.c | 16 ++ .../gcc.target/riscv/sat_u_trunc-run-5.c | 16 ++ .../gcc.target/riscv/sat_u_trunc-run-6.c | 16 ++ 11 files changed, 195 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-6.c diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index 734da041f0c..bdcdb8babc8 100644 --- a/gcc/config/riscv/iterators.md +++ 
b/gcc/config/riscv/iterators.md @@ -67,14 +67,34 @@ (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")]) (define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")]) +(define_mode_iterator ANYI_QUAD_TRUNC [SI (DI "TARGET_64BIT")]) + +(define_mode_iterator ANYI_OCT_TRUNC [(DI "TARGET_64BIT"
RE: [PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode precision [PR116202]
> OK. Thanks Richard. Just noticed that we can put type_has_mode_precision_p as the first condition to avoid unnecessary pattern matching (which is heavy); I will commit with this change if there are no surprises from the test suite. From: > + if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL) > + && type_has_mode_precision_p (otype)) To: > + if (type_has_mode_precision_p (otype) > + && gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)) Pan -Original Message- From: Richard Biener Sent: Tuesday, August 6, 2024 9:26 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode precision [PR116202] On Tue, Aug 6, 2024 at 2:59 PM wrote: > > From: Pan Li > > The .SAT_TRUNC vect pattern recog is valid when the lhs type has > its mode precision. For example as below, QImode with 1 bit precision > like _Bool is invalid here. > > g_12 = (long unsigned int) _2; > _13 = MIN_EXPR <g_12, 1>; > _3 = (_Bool) _13; > > The above pattern cannot be recog as .SAT_TRUNC (g_12) because the dest > only has 1 bit precision with QImode mode. Aka the type doesn't have > the mode precision. > > The below tests are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. OK > PR target/116202 > > gcc/ChangeLog: > > * tree-vect-patterns.cc (vect_recog_sat_trunc_pattern): Add the > type_has_mode_precision_p check for the lhs type. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test. 
> > Signed-off-by: Pan Li > --- > .../riscv/rvv/base/pr116202-run-1.c | 24 +++ > gcc/tree-vect-patterns.cc | 5 ++-- > 2 files changed, 27 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > new file mode 100644 > index 000..d150f20b5d9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > @@ -0,0 +1,24 @@ > +/* { dg-do run } */ > +/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */ > + > +int b[24]; > +_Bool c[24]; > + > +int main() { > + for (int f = 0; f < 4; ++f) > +b[f] = 6; > + > + for (int f = 0; f < 24; f += 4) > +c[f] = ({ > + int g = ({ > +unsigned long g = -b[f]; > +1 < g ? 1 : g; > + }); > + g; > +}); > + > + if (c[0] != 1) > +__builtin_abort (); > +} > + > +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */ > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index 4674a16d15f..74f80587b0e 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -4695,11 +4695,12 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, > stmt_vec_info stmt_vinfo, > >tree ops[1]; >tree lhs = gimple_assign_lhs (last_stmt); > + tree otype = TREE_TYPE (lhs); > > - if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)) > + if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL) > + && type_has_mode_precision_p (otype)) > { >tree itype = TREE_TYPE (ops[0]); > - tree otype = TREE_TYPE (lhs); >tree v_itype = get_vectype_for_scalar_type (vinfo, itype); >tree v_otype = get_vectype_for_scalar_type (vinfo, otype); >internal_fn fn = IFN_SAT_TRUNC; > -- > 2.43.0 >
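For readers following the thread, the MIN_EXPR shape that .SAT_TRUNC recognizes can be written directly in C; a minimal sketch (the function name is illustrative, not from the patch):

```c
#include <stdint.h>

/* MIN_EXPR form of unsigned saturating truncation: clamp to the
   narrow type's maximum, then truncate.  This is the shape the
   vectorizer can recognize as .SAT_TRUNC, but only when the
   destination type has its full mode precision -- a 1-bit _Bool
   destination carried in QImode, as in the PR, must be rejected.  */
static uint8_t sat_u_trunc_u64_to_u8 (uint64_t x)
{
  uint64_t clamped = x < 255 ? x : 255;  /* MIN_EXPR <x, 255> */
  return (uint8_t) clamped;
}
```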
RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD
> Ah, yeah - that's the usual (premature) frontend optimization to > shorten operations after the standard > mandated standard conversion (to 'int' in this case). Thanks Richard for confirmation, let me refine the matching in v2. Pan -Original Message- From: Richard Biener Sent: Tuesday, August 6, 2024 7:50 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD On Tue, Aug 6, 2024 at 3:21 AM Li, Pan2 wrote: > > Hi Richard, > > It looks like the plus will have additional convert to unsigned in int8 and > int16, see below example in test.c.006t.gimple. > And we need these convert ops in one matching pattern to cover all int scalar > types. Ah, yeah - that's the usual (premature) frontend optimization to shorten operations after the standard mandated standard conversion (to 'int' in this case). > I am not sure if there is a better way here, given convert in matching > pattern is not very elegant up to a point. > > int16_t > add_i16 (int16_t a, int16_t b) > { > int16_t sum = a + b; > return sum; > } > > int32_t > add_i32 (int32_t a, int32_t b) > { > int32_t sum = a + b; > return sum; > } > > --- 006t.gimple --- > int16_t add_i16 (int16_t a, int16_t b) > { > int16_t D.2815; > int16_t sum; > > a.0_1 = (unsigned short) a; > b.1_2 = (unsigned short) b; > _3 = a.0_1 + b.1_2; > sum = (int16_t) _3; > D.2815 = sum; > return D.2815; > } > > int32_t add_i32 (int32_t a, int32_t b) > { > int32_t D.2817; > int32_t sum; > > sum = a + b; > D.2817 = sum; > return D.2817; > } > > Pan > > -Original Message- > From: Li, Pan2 > Sent: Monday, August 5, 2024 9:52 PM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > jeffreya...@gmail.com; rdapp@gmail.com > Subject: RE: [PATCH v1] Match: Support form 1 for scalar signed integer > .SAT_ADD > > Thanks Richard for comments. 
> > > The convert looks odd to me given @0 is involved in both & operands. > > The convert is introduced as the GIMPLE IL is somehow different for int8_t > when compares to int32_t or int64_t. > There are some additional ops convert to unsigned for plus, see below line > 8-9 and line 22-23. > But we cannot see similar GIMPLE IL for int32_t and int64_t. To reconcile the > types from int8_t to int64_t, add the > convert here. > > Or may be I have some mistake in the example, let me revisit it and send v2 > if no surprise. > >4 │ __attribute__((noinline)) >5 │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y) >6 │ { >7 │ int8_t sum; >8 │ unsigned char x.1_1; >9 │ unsigned char y.2_2; > 10 │ unsigned char _3; > 11 │ signed char _4; > 12 │ signed char _5; > 13 │ int8_t _6; > 14 │ _Bool _11; > 15 │ signed char _12; > 16 │ signed char _13; > 17 │ signed char _14; > 18 │ signed char _22; > 19 │ signed char _23; > 20 │ > 21 │[local count: 1073741822]: > 22 │ x.1_1 = (unsigned char) x_7(D); > 23 │ y.2_2 = (unsigned char) y_8(D); > 24 │ _3 = x.1_1 + y.2_2; > 25 │ sum_9 = (int8_t) _3; > 26 │ _4 = x_7(D) ^ y_8(D); > 27 │ _5 = x_7(D) ^ sum_9; > 28 │ _23 = ~_4; > 29 │ _22 = _5 & _23; > 30 │ if (_22 < 0) > 31 │ goto ; [41.00%] > 32 │ else > 33 │ goto ; [59.00%] > 34 │ > 35 │[local count: 259738146]: > 36 │ _11 = x_7(D) < 0; > 37 │ _12 = (signed char) _11; > 38 │ _13 = -_12; > 39 │ _14 = _13 ^ 127; > 40 │ > 41 │[local count: 1073741824]: > 42 │ # _6 = PHI <_14(3), sum_9(2)> > 43 │ return _6; > 44 │ > 45 │ } > > Pan > > -Original Message- > From: Richard Biener > Sent: Monday, August 5, 2024 7:16 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > jeffreya...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer > .SAT_ADD > > On Mon, Aug 5, 2024 at 9:14 AM wrote: > > > > From: Pan Li > > > > This patch would like to support the form 1 of the scalar signed > > integer .SAT_ADD. 
Aka below example: > > > > Form 1: > > #define DEF_SAT_S_ADD_FMT_1(T) \ > > T __attribute__((noinline))\ > > sat_s_add_##T
RE: [PATCH v1] Match: Add type_has_mode_precision_p check for SAT_TRUNC [PR116202]
> Well that means the caller (vectorizer pattern recog?) wrongly used a > vector of QImode in > the first place, so it needs to check the scalar mode as well? Current vect pattern recog only check the vector mode of define_expand pattern implemented or not. Similar as below without scalar part. tree v_itype = get_vectype_for_scalar_type (vinfo, itype); tree v_otype = get_vectype_for_scalar_type (vinfo, otype); direct_internal_fn_supported_p (fn, tree_pair (v_otype, v_itype), ... > So possibly vectorizable_internal_function would need to be amended or better, > vector pattern matching be constrainted. Sure, will have a try in vectorizable_internal_function. Pan -Original Message- From: Richard Biener Sent: Monday, August 5, 2024 9:43 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Add type_has_mode_precision_p check for SAT_TRUNC [PR116202] On Mon, Aug 5, 2024 at 3:04 PM Li, Pan2 wrote: > > > Isn't that now handled by the direct_internal_fn_supported_p check? That > > is, > > by the caller which needs to verify the matched operation is supported by > > the target? > > type_strictly_matches_mode_p doesn't help here (include the un-committed one). > It will hit below case and return true directly as TYPE_MODE (type) is > E_RVVM1QImode. > >if (VECTOR_TYPE_P (type)) > return VECTOR_MODE_P (TYPE_MODE (type)); > > And looks we cannot TREE_PRECISION on vector type here similar as > type_has_mode_precision_p > do for scalar types. Thus, add the check to the matching. > > Looks like we need to take care of vector in type_strictly_matches_mode_p, > right ? Well that means the caller (vectorizer pattern recog?) wrongly used a vector of QImode in the first place, so it needs to check the scalar mode as well? Vector type assignment does /* For vector types of elements whose mode precision doesn't match their types precision we use a element type of mode precision. 
The vectorization routines will have to make sure they support the proper result truncation/extension. We also make sure to build vector types with INTEGER_TYPE component type only. */ if (INTEGRAL_TYPE_P (scalar_type) && (GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type) || TREE_CODE (scalar_type) != INTEGER_TYPE)) scalar_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (inner_mode), TYPE_UNSIGNED (scalar_type)); So possibly vectorizable_internal_function would need to be amended or better, vector pattern matching be constrainted. Richard. > Pan > > -Original Message- > From: Richard Biener > Sent: Monday, August 5, 2024 7:02 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > jeffreya...@gmail.com; rdapp@gmail.com > Subject: Re: [PATCH v1] Match: Add type_has_mode_precision_p check for > SAT_TRUNC [PR116202] > > On Sun, Aug 4, 2024 at 1:47 PM wrote: > > > > From: Pan Li > > > > The .SAT_TRUNC matching can only perform the type has its mode > > precision. > > > > g_12 = (long unsigned int) _2; > > _13 = MIN_EXPR ; > > _3 = (_Bool) _13; > > > > The above pattern cannot be recog as .SAT_TRUNC (g_12) because the dest > > only has 1 bit precision but QImode. Aka the type doesn't have the mode > > precision. Thus, add the type_has_mode_precision_p for the dest to > > avoid such case. > > > > The below tests are passed for this patch. > > 1. The rv64gcv fully regression tests. > > 2. The x86 bootstrap tests. > > 3. The x86 fully regression tests. > > Isn't that now handled by the direct_internal_fn_supported_p check? That is, > by the caller which needs to verify the matched operation is supported by > the target? > > > PR target/116202 > > > > gcc/ChangeLog: > > > > * match.pd: Add type_has_mode_precision_p for the dest type > > of the .SAT_TRUNC matching. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test. 
> > > > Signed-off-by: Pan Li > > --- > > gcc/match.pd | 6 +++-- > > .../riscv/rvv/base/pr116202-run-1.c | 24 +++ > > 2 files changed, 28 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index c9c8478d286..dfa0bba3908 100644 > > --- a/gcc/match.pd > > +++ b/gcc/mat
RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD
Hi Richard, It looks like the plus will have additional convert to unsigned in int8 and int16, see below example in test.c.006t.gimple. And we need these convert ops in one matching pattern to cover all int scalar types. I am not sure if there is a better way here, given convert in matching pattern is not very elegant up to a point. int16_t add_i16 (int16_t a, int16_t b) { int16_t sum = a + b; return sum; } int32_t add_i32 (int32_t a, int32_t b) { int32_t sum = a + b; return sum; } --- 006t.gimple --- int16_t add_i16 (int16_t a, int16_t b) { int16_t D.2815; int16_t sum; a.0_1 = (unsigned short) a; b.1_2 = (unsigned short) b; _3 = a.0_1 + b.1_2; sum = (int16_t) _3; D.2815 = sum; return D.2815; } int32_t add_i32 (int32_t a, int32_t b) { int32_t D.2817; int32_t sum; sum = a + b; D.2817 = sum; return D.2817; } Pan -Original Message- From: Li, Pan2 Sent: Monday, August 5, 2024 9:52 PM To: Richard Biener Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD Thanks Richard for comments. > The convert looks odd to me given @0 is involved in both & operands. The convert is introduced as the GIMPLE IL is somehow different for int8_t when compares to int32_t or int64_t. There are some additional ops convert to unsigned for plus, see below line 8-9 and line 22-23. But we cannot see similar GIMPLE IL for int32_t and int64_t. To reconcile the types from int8_t to int64_t, add the convert here. Or may be I have some mistake in the example, let me revisit it and send v2 if no surprise. 
4 │ __attribute__((noinline)) 5 │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y) 6 │ { 7 │ int8_t sum; 8 │ unsigned char x.1_1; 9 │ unsigned char y.2_2; 10 │ unsigned char _3; 11 │ signed char _4; 12 │ signed char _5; 13 │ int8_t _6; 14 │ _Bool _11; 15 │ signed char _12; 16 │ signed char _13; 17 │ signed char _14; 18 │ signed char _22; 19 │ signed char _23; 20 │ 21 │[local count: 1073741822]: 22 │ x.1_1 = (unsigned char) x_7(D); 23 │ y.2_2 = (unsigned char) y_8(D); 24 │ _3 = x.1_1 + y.2_2; 25 │ sum_9 = (int8_t) _3; 26 │ _4 = x_7(D) ^ y_8(D); 27 │ _5 = x_7(D) ^ sum_9; 28 │ _23 = ~_4; 29 │ _22 = _5 & _23; 30 │ if (_22 < 0) 31 │ goto ; [41.00%] 32 │ else 33 │ goto ; [59.00%] 34 │ 35 │[local count: 259738146]: 36 │ _11 = x_7(D) < 0; 37 │ _12 = (signed char) _11; 38 │ _13 = -_12; 39 │ _14 = _13 ^ 127; 40 │ 41 │[local count: 1073741824]: 42 │ # _6 = PHI <_14(3), sum_9(2)> 43 │ return _6; 44 │ 45 │ } Pan -Original Message- From: Richard Biener Sent: Monday, August 5, 2024 7:16 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD On Mon, Aug 5, 2024 at 9:14 AM wrote: > > From: Pan Li > > This patch would like to support the form 1 of the scalar signed > integer .SAT_ADD. Aka below example: > > Form 1: > #define DEF_SAT_S_ADD_FMT_1(T) \ > T __attribute__((noinline))\ > sat_s_add_##T##_fmt_1 (T x, T y) \ > { \ > T min = (T)1u << (sizeof (T) * 8 - 1); \ > T max = min - 1; \ > return (x ^ y) < 0 \ > ? (T)(x + y) \ > : ((T)(x + y) ^ x) >= 0\ > ? (T)(x + y) \ > : x < 0 ? min : max; \ > } > > DEF_SAT_S_ADD_FMT_1 (int64_t) > > We can tell the difference before and after this patch if backend > implemented the ssadd3 pattern similar as below. 
> > Before this patch: >4 │ __attribute__((noinline)) >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) >6 │ { >7 │ long int _1; >8 │ long int _2; >9 │ long int _3; > 10 │ int64_t _4; > 11 │ long int _7; > 12 │ _Bool _9; > 13 │ long int _10; > 14 │ long int _11; > 15 │ long int _12; > 16 │ long int _13; > 17 │ > 18 │ ;; basic block 2, loop depth 0 > 19 │ ;;pred: ENTRY > 20 │ _1 = x_5(D) ^ y_6(D); > 21 │ _13 = x_5(D) + y_6(D); > 22 │ _3 = x_5(D) ^ _13; > 23 │ _2 = ~_1; > 24 │ _7 = _2 & _3; > 25 │ if (_7 >= 0) > 26 │ goto ; [59.00%] > 27 │ else > 28 │ go
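The frontend shortening Richard refers to can be reproduced in source terms; a small sketch (relying on GCC's documented modular semantics for the narrowing conversion):

```c
#include <stdint.h>

/* Per the C promotion rules, a + b for int16_t operands is computed
   in int; GCC's "shorten" optimization then rewrites the arithmetic
   in unsigned short, producing the extra conversions visible in the
   006t.gimple dump above.  int32_t and wider operands need no
   promotion, hence no converts -- which is why the match.pd pattern
   carries optional (convert? ...) wrappers.  */
int16_t add_i16 (int16_t a, int16_t b)
{
  return (int16_t) (a + b);   /* truncation wraps modulo 2^16 in GCC */
}
```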
RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD
Thanks Richard for comments. > The convert looks odd to me given @0 is involved in both & operands. The convert is introduced as the GIMPLE IL is somehow different for int8_t when compares to int32_t or int64_t. There are some additional ops convert to unsigned for plus, see below line 8-9 and line 22-23. But we cannot see similar GIMPLE IL for int32_t and int64_t. To reconcile the types from int8_t to int64_t, add the convert here. Or may be I have some mistake in the example, let me revisit it and send v2 if no surprise. 4 │ __attribute__((noinline)) 5 │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y) 6 │ { 7 │ int8_t sum; 8 │ unsigned char x.1_1; 9 │ unsigned char y.2_2; 10 │ unsigned char _3; 11 │ signed char _4; 12 │ signed char _5; 13 │ int8_t _6; 14 │ _Bool _11; 15 │ signed char _12; 16 │ signed char _13; 17 │ signed char _14; 18 │ signed char _22; 19 │ signed char _23; 20 │ 21 │[local count: 1073741822]: 22 │ x.1_1 = (unsigned char) x_7(D); 23 │ y.2_2 = (unsigned char) y_8(D); 24 │ _3 = x.1_1 + y.2_2; 25 │ sum_9 = (int8_t) _3; 26 │ _4 = x_7(D) ^ y_8(D); 27 │ _5 = x_7(D) ^ sum_9; 28 │ _23 = ~_4; 29 │ _22 = _5 & _23; 30 │ if (_22 < 0) 31 │ goto ; [41.00%] 32 │ else 33 │ goto ; [59.00%] 34 │ 35 │[local count: 259738146]: 36 │ _11 = x_7(D) < 0; 37 │ _12 = (signed char) _11; 38 │ _13 = -_12; 39 │ _14 = _13 ^ 127; 40 │ 41 │[local count: 1073741824]: 42 │ # _6 = PHI <_14(3), sum_9(2)> 43 │ return _6; 44 │ 45 │ } Pan -Original Message- From: Richard Biener Sent: Monday, August 5, 2024 7:16 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD On Mon, Aug 5, 2024 at 9:14 AM wrote: > > From: Pan Li > > This patch would like to support the form 1 of the scalar signed > integer .SAT_ADD. 
Aka below example: > > Form 1: > #define DEF_SAT_S_ADD_FMT_1(T) \ > T __attribute__((noinline))\ > sat_s_add_##T##_fmt_1 (T x, T y) \ > { \ > T min = (T)1u << (sizeof (T) * 8 - 1); \ > T max = min - 1; \ > return (x ^ y) < 0 \ > ? (T)(x + y) \ > : ((T)(x + y) ^ x) >= 0\ > ? (T)(x + y) \ > : x < 0 ? min : max; \ > } > > DEF_SAT_S_ADD_FMT_1 (int64_t) > > We can tell the difference before and after this patch if backend > implemented the ssadd3 pattern similar as below. > > Before this patch: >4 │ __attribute__((noinline)) >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) >6 │ { >7 │ long int _1; >8 │ long int _2; >9 │ long int _3; > 10 │ int64_t _4; > 11 │ long int _7; > 12 │ _Bool _9; > 13 │ long int _10; > 14 │ long int _11; > 15 │ long int _12; > 16 │ long int _13; > 17 │ > 18 │ ;; basic block 2, loop depth 0 > 19 │ ;;pred: ENTRY > 20 │ _1 = x_5(D) ^ y_6(D); > 21 │ _13 = x_5(D) + y_6(D); > 22 │ _3 = x_5(D) ^ _13; > 23 │ _2 = ~_1; > 24 │ _7 = _2 & _3; > 25 │ if (_7 >= 0) > 26 │ goto ; [59.00%] > 27 │ else > 28 │ goto ; [41.00%] > 29 │ ;;succ: 4 > 30 │ ;;3 > 31 │ > 32 │ ;; basic block 3, loop depth 0 > 33 │ ;;pred: 2 > 34 │ _9 = x_5(D) < 0; > 35 │ _10 = (long int) _9; > 36 │ _11 = -_10; > 37 │ _12 = _11 ^ 9223372036854775807; > 38 │ ;;succ: 4 > 39 │ > 40 │ ;; basic block 4, loop depth 0 > 41 │ ;;pred: 2 > 42 │ ;;3 > 43 │ # _4 = PHI <_13(2), _12(3)> > 44 │ return _4; > 45 │ ;;succ: EXIT > 46 │ > 47 │ } > > After this patch: >4 │ __attribute__((noinline)) >5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y) >6 │ { >7 │ int64_t _4; >8 │ >9 │ ;; basic block 2, loop depth 0 > 10 │ ;;pred: ENTRY > 11 │ _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call] > 12 │ return _4; > 13 │ ;;succ: EXIT > 14 │ > 15 │ } > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > *
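The form-1 macro above performs x + y directly in the signed type, where overflow is undefined in plain C; an equivalent, well-defined sketch using a wrapping unsigned add (the helper name is illustrative):

```c
#include <stdint.h>

/* Same saturation logic as DEF_SAT_S_ADD_FMT_1, with the add done
   in uint32_t so the wraparound is well defined.  Overflow occurred
   iff x and y share a sign that the sum does not, i.e.
   (x ^ sum) & (y ^ sum) is negative -- the ~(x ^ y) & (x ^ sum)
   test in the GIMPLE dumps is the same condition.  */
static int32_t sat_s_add_i32 (int32_t x, int32_t y)
{
  uint32_t sum = (uint32_t) x + (uint32_t) y;
  if ((int32_t) (((uint32_t) x ^ sum) & ((uint32_t) y ^ sum)) < 0)
    return x < 0 ? INT32_MIN : INT32_MAX;
  return (int32_t) sum;
}
```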
RE: [PATCH v1] Match: Add type_has_mode_precision_p check for SAT_TRUNC [PR116202]
> Isn't that now handled by the direct_internal_fn_supported_p check? That is, > by the caller which needs to verify the matched operation is supported by > the target? type_strictly_matches_mode_p doesn't help here (include the un-committed one). It will hit below case and return true directly as TYPE_MODE (type) is E_RVVM1QImode. if (VECTOR_TYPE_P (type)) return VECTOR_MODE_P (TYPE_MODE (type)); And looks we cannot TREE_PRECISION on vector type here similar as type_has_mode_precision_p do for scalar types. Thus, add the check to the matching. Looks like we need to take care of vector in type_strictly_matches_mode_p, right ? Pan -Original Message- From: Richard Biener Sent: Monday, August 5, 2024 7:02 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Add type_has_mode_precision_p check for SAT_TRUNC [PR116202] On Sun, Aug 4, 2024 at 1:47 PM wrote: > > From: Pan Li > > The .SAT_TRUNC matching can only perform the type has its mode > precision. > > g_12 = (long unsigned int) _2; > _13 = MIN_EXPR ; > _3 = (_Bool) _13; > > The above pattern cannot be recog as .SAT_TRUNC (g_12) because the dest > only has 1 bit precision but QImode. Aka the type doesn't have the mode > precision. Thus, add the type_has_mode_precision_p for the dest to > avoid such case. > > The below tests are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. Isn't that now handled by the direct_internal_fn_supported_p check? That is, by the caller which needs to verify the matched operation is supported by the target? > PR target/116202 > > gcc/ChangeLog: > > * match.pd: Add type_has_mode_precision_p for the dest type > of the .SAT_TRUNC matching. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test. 
> > Signed-off-by: Pan Li > --- > gcc/match.pd | 6 +++-- > .../riscv/rvv/base/pr116202-run-1.c | 24 +++ > 2 files changed, 28 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index c9c8478d286..dfa0bba3908 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3283,7 +3283,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > wide_int trunc_max = wi::mask (otype_precision, false, itype_precision); > wide_int int_cst = wi::to_wide (@1, itype_precision); >} > - (if (otype_precision < itype_precision && wi::eq_p (trunc_max, > int_cst)) > + (if (type_has_mode_precision_p (type) && otype_precision < itype_precision > + && wi::eq_p (trunc_max, int_cst)) > > /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT). > SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)). */ > @@ -3309,7 +3310,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > wide_int trunc_max = wi::mask (otype_precision, false, itype_precision); > wide_int int_cst = wi::to_wide (@1, itype_precision); >} > - (if (otype_precision < itype_precision && wi::eq_p (trunc_max, > int_cst)) > + (if (type_has_mode_precision_p (type) && otype_precision < itype_precision > + && wi::eq_p (trunc_max, int_cst)) > > /* x > y && x != XXX_MIN --> x > y > x > y && x == XXX_MIN --> false . */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > new file mode 100644 > index 000..d150f20b5d9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c > @@ -0,0 +1,24 @@ > +/* { dg-do run } */ > +/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */ > + > +int b[24]; > +_Bool c[24]; > + > +int main() { > + for (int f = 0; f < 4; ++f) > +b[f] = 6; > + > + for (int f = 0; f < 24; f += 4) > +c[f] = ({ > + int g = ({ > +unsigned long g = -b[f]; > +1 < g ? 
1 : g; > + }); > + g; > +}); > + > + if (c[0] != 1) > +__builtin_abort (); > +} > + > +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */ > -- > 2.43.0 >
RE: [PATCH v1] RISC-V: Support IMM for operand 0 of ussub pattern
Thanks Jeff for comments, let me refine the comments in v2. Pan -Original Message- From: Jeff Law Sent: Sunday, August 4, 2024 6:25 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] RISC-V: Support IMM for operand 0 of ussub pattern On 8/3/24 3:33 AM, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to allow IMM for the operand 0 of ussub pattern. > Aka .SAT_SUB(1023, y) as the below example. > > Form 1: >#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \ >T __attribute__((noinline)) \ >sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \ >{ \ > return (T)IMM >= y ? (T)IMM - y : 0; \ >} > > DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023) > > Before this patch: >10 │ sat_u_sub_imm82_uint64_t_fmt_1: >11 │ li a5,82 >12 │ bgtua0,a5,.L3 >13 │ sub a0,a5,a0 >14 │ ret >15 │ .L3: >16 │ li a0,0 >17 │ ret > > After this patch: >10 │ sat_u_sub_imm82_uint64_t_fmt_1: >11 │ li a5,82 >12 │ sltua4,a5,a0 >13 │ addia4,a4,-1 >14 │ sub a0,a5,a0 >15 │ and a0,a4,a0 >16 │ ret > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression test. > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new > func impl to gen xmode rtx reg. > (riscv_expand_ussub): Gen xmode reg for operand 1. > * config/riscv/riscv.md: Allow const_int for operand 1. > +> + 1. Case 1: .SAT_SUB (127, y) for QImode. > + The imm will be (const_int 127) after expand_expr_real_1, thus we > + can just move the (const_int 127) to Xmode reg without any other insn. > + > + 2. Case 2: .SAT_SUB (254, y) for QImode. > + The imm will be (const_int -2) after expand_expr_real_1, thus we > + will have li a0, -2 (aka a0 = 0xfffe if RV64). This is > + not what we want for the underlying insn like sltu. So we need to > + clean the up highest 56 bits for a0 to get the real value (254, 0xfe). > +> + This function would like to take care of above scenario and return the > + rtx reg which holds the x in Xmode. 
*/ What does this function do. ie, what are the inputs, what are the outputs? Without that core information it's hard to know if your implementation is correct. If really looks like you're returning a reg in X mode. In which case you can just gen_int_mode (constant, word_mode) If the constant is 254, then that's going to load 0x00fe on rv64. If the problem is that you have a target of SImode on RV64, then you do have a real problem. The rv64 ABI mandates that a 32bit value be sign extended out to 64 bits. And if this is the problem you're trying to solve, then it's a good indicator you've made a mistake elsewhere. Anyway, it seems like you need to describe better where things are going wrong before we can ACK/NACK this patch. jeff
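The branchless sequence the after-patch assembly emits (sltu/addi/sub/and) corresponds to the following C sketch of .SAT_SUB with an immediate first operand:

```c
#include <stdint.h>

/* Branchless form of (T)IMM >= y ? (T)IMM - y : 0: build an
   all-ones mask when there is no underflow, zero otherwise, and
   AND it with the wrapped difference.  IMM = 82 here to mirror the
   example in the patch description.  */
static uint64_t sat_u_sub_imm82 (uint64_t y)
{
  uint64_t mask = -(uint64_t) (82 >= y);  /* all ones or zero */
  return (82 - y) & mask;
}
```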
RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103]
> Still OK. Thanks Richard, let me wait the final confirmation from Richard S. Pan -Original Message- From: Richard Biener Sent: Tuesday, July 30, 2024 5:03 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103] On Tue, Jul 30, 2024 at 5:08 AM wrote: > > From: Pan Li > > For some target like target=amdgcn-amdhsa, we need to take care of > vector bool types prior to general vector mode types. Or we may have > the asm check failure as below. > > gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if " > > The below test suites are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > 4. The amdgcn test case as above. Still OK. Richard. > gcc/ChangeLog: > > * internal-fn.cc (type_strictly_matches_mode_p): Add handling > for vector bool type. > > Signed-off-by: Pan Li > --- > gcc/internal-fn.cc | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 8a2e07f2f96..966594a52ed 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4171,6 +4171,16 @@ direct_internal_fn_optab (internal_fn fn) > static bool > type_strictly_matches_mode_p (const_tree type) > { > + /* The masked vector operations have both vector data operands and vector > + boolean operands. 
The vector data operands are expected to have a > vector > + mode, but the vector boolean operands can be an integer mode rather > than > + a vector mode, depending on how TARGET_VECTORIZE_GET_MASK_MODE is > + defined. PR116103. */ > + if (VECTOR_BOOLEAN_TYPE_P (type) > + && SCALAR_INT_MODE_P (TYPE_MODE (type)) > + && TYPE_PRECISION (TREE_TYPE (type)) == 1) > +return true; > + >if (VECTOR_TYPE_P (type)) > return VECTOR_MODE_P (TYPE_MODE (type)); > > -- > 2.34.1 >
RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar
Kindly ping. Pan -Original Message- From: Li, Pan2 Sent: Tuesday, July 23, 2024 1:06 PM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar From: Pan Li This patch would like to implement the quad and oct .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ {\ bool overflow = x > (WT)(NT)(-1); \ return ((NT)x) | (NT)-overflow;\ } DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t) Before this patch: 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ _Bool overflow; 8 │ short unsigned int _1; 9 │ short unsigned int _2; 10 │ short unsigned int _3; 11 │ uint16_t _6; 12 │ 13 │ ;; basic block 2, loop depth 0 14 │ ;;pred: ENTRY 15 │ overflow_5 = x_4(D) > 65535; 16 │ _1 = (short unsigned int) x_4(D); 17 │ _2 = (short unsigned int) overflow_5; 18 │ _3 = -_2; 19 │ _6 = _1 | _3; 20 │ return _6; 21 │ ;;succ: EXIT 22 │ 23 │ } After this patch: 3 │ 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ uint16_t _6; 8 │ 9 │ ;; basic block 2, loop depth 0 10 │ ;;pred: ENTRY 11 │ _6 = .SAT_TRUNC (x_4(D)); [tail call] 12 │ return _6; 13 │ ;;succ: EXIT 14 │ 15 │ } The below tests suites are passed for this patch 1. The rv64gcv fully regression test. 2. The rv64gcv build with glibc gcc/ChangeLog: * config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for quad truncation. (ANYI_OCT_TRUNC): New iterator for oct truncation. (ANYI_QUAD_TRUNCATED): New attr for truncated quad modes. (ANYI_OCT_TRUNCATED): New attr for truncated oct modes. (anyi_quad_truncated): Ditto but for lower case. (anyi_oct_truncated): Ditto but for lower case. * config/riscv/riscv.md (ustrunc2): Add new pattern for quad truncation. (ustrunc2): Ditto but for oct. 
gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust the expand dump check times. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto. * gcc.target/riscv/sat_arith_data.h: Add test helper macros. * gcc.target/riscv/sat_u_trunc-4.c: New test. * gcc.target/riscv/sat_u_trunc-5.c: New test. * gcc.target/riscv/sat_u_trunc-6.c: New test. * gcc.target/riscv/sat_u_trunc-run-4.c: New test. * gcc.target/riscv/sat_u_trunc-run-5.c: New test. * gcc.target/riscv/sat_u_trunc-run-6.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/iterators.md | 20 gcc/config/riscv/riscv.md | 20 .../rvv/autovec/unop/vec_sat_u_trunc-2.c | 2 +- .../rvv/autovec/unop/vec_sat_u_trunc-3.c | 2 +- .../gcc.target/riscv/sat_arith_data.h | 51 +++ .../gcc.target/riscv/sat_u_trunc-4.c | 17 +++ .../gcc.target/riscv/sat_u_trunc-5.c | 17 +++ .../gcc.target/riscv/sat_u_trunc-6.c | 20 .../gcc.target/riscv/sat_u_trunc-run-4.c | 16 ++ .../gcc.target/riscv/sat_u_trunc-run-5.c | 16 ++ .../gcc.target/riscv/sat_u_trunc-run-6.c | 16 ++ 11 files changed, 195 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-6.c diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index 734da041f0c..bdcdb8babc8 100644 --- a/gcc/config/riscv/iterators.md +++ b/gcc/config/riscv/iterators.md @@ -67,14 +67,34 @@ (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")]) (define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")]) +(define_mode_iterator ANYI_QUAD_TRUNC [SI (DI "TARGET_64BIT")]) + +(define_mode_iterator ANYI_OCT_TRUNC [(DI "TARGET_64BIT")]) + (define_mode_attr 
ANYI_DOUBLE_TRUNCATED [ (HI "QI") (SI "HI") (DI "SI") ]) +(define_mode_attr ANYI_QUAD_TRUNCATED [ + (SI "QI") (DI "HI") +]) + +(define_mode_attr ANYI_OCT_TRUNCATED [ + (DI "QI") +]) + (define_mode_attr anyi_dou
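The scalar form the patch matches can be exercised standalone; a minimal instance of DEF_SAT_U_TRUC_FMT_1 for the quad case (DImode to HImode):

```c
#include <stdint.h>
#include <stdbool.h>

/* Form-1 unsigned saturating truncation, as in the patch
   description: OR the truncated value with an all-ones mask when x
   exceeds the narrow type's maximum.  (uint16_t)-overflow is 0xffff
   when overflow is true and 0 otherwise.  */
static uint16_t sat_u_trunc_u64_to_u16 (uint64_t x)
{
  bool overflow = x > (uint64_t) (uint16_t) -1;  /* x > 0xffff */
  return (uint16_t) x | (uint16_t) -overflow;
}
```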
RE: [PATCH v1] RISC-V: Take Xmode instead of Pmode for ussub expanding
Committed, thanks Robin. Pan -Original Message- From: Robin Dapp Sent: Tuesday, July 30, 2024 2:28 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp Subject: Re: [PATCH v1] RISC-V: Take Xmode instead of Pmode for ussub expanding OK. -- Regards Robin
RE: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]
Thanks Richard S for comments, updated in v2. https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658637.html Pan -Original Message- From: Richard Sandiford Sent: Tuesday, July 30, 2024 12:09 AM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103] pan2...@intel.com writes: > From: Pan Li > > For some target like target=amdgcn-amdhsa, we need to take care of > vector bool types prior to general vector mode types. Or we may have > the asm check failure as below. > > gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if " > > The below test suites are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > 4. The amdgcn test case as above. > > gcc/ChangeLog: > > * internal-fn.cc (type_strictly_matches_mode_p): Add handling > for vector bool type. > > Signed-off-by: Pan Li > --- > gcc/internal-fn.cc | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 8a2e07f2f96..086c8be398a 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn) > static bool > type_strictly_matches_mode_p (const_tree type) > { > + /* For target=amdgcn-amdhsa, we need to take care of vector bool types. > + More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103. 
> */ > + if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type)) > +&& TYPE_PRECISION (TREE_TYPE (type)) == 1) Sorry for the formatting nits, but I think this should be: if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type)) && TYPE_PRECISION (TREE_TYPE (type)) == 1) (one condition per line, indented below "VECTOR"). But I think the comment should give the underlying reason, rather than treat it as a target oddity. Maybe something like: /* Masked vector operations have both vector data operands and vector boolean operands. The vector data operands are expected to have a vector mode, but the vector boolean operands can be an integer mode rather than a vector mode, depending on how TARGET_VECTORIZE_GET_MASK_MODE is defined. */ Thanks, Richard > +return true; > + >if (VECTOR_TYPE_P (type)) > return VECTOR_MODE_P (TYPE_MODE (type));
RE: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]
> OK. Thanks Richard, will wait the confirmation from Thomas in case I missed some more failed cases. Pan -Original Message- From: Richard Biener Sent: Monday, July 29, 2024 4:44 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103] On Mon, Jul 29, 2024 at 9:57 AM wrote: > > From: Pan Li > > For some target like target=amdgcn-amdhsa, we need to take care of > vector bool types prior to general vector mode types. Or we may have > the asm check failure as below. > > gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 80 > gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, > s[0-9]+, v[0-9]+ 56 > gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if " > > The below test suites are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > 4. The amdgcn test case as above. OK. Richard. > gcc/ChangeLog: > > * internal-fn.cc (type_strictly_matches_mode_p): Add handling > for vector bool type. > > Signed-off-by: Pan Li > --- > gcc/internal-fn.cc | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 8a2e07f2f96..086c8be398a 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn) > static bool > type_strictly_matches_mode_p (const_tree type) > { > + /* For target=amdgcn-amdhsa, we need to take care of vector bool types. > + More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103. 
> */ > + if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type)) > +&& TYPE_PRECISION (TREE_TYPE (type)) == 1) > +return true; > + >if (VECTOR_TYPE_P (type)) > return VECTOR_MODE_P (TYPE_MODE (type)); > > -- > 2.34.1 >
RE: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is IMM
> OK Committed, thanks Richard. Pan -Original Message- From: Richard Biener Sent: Monday, July 29, 2024 5:03 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is IMM On Sun, Jul 28, 2024 at 5:25 AM wrote: > > From: Pan Li > > After add the matching for .SAT_SUB when one op is IMM, there > will be a new root PLUS_EXPR for the .SAT_SUB pattern. For example, > > Form 3: > #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \ > T __attribute__((noinline)) \ > sat_u_sub_imm##IMM##_##T##_fmt_3 (T x) \ > { \ > return x >= IMM ? x - IMM : 0;\ > } > > DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 11) > > And then we will have gimple before widening-mul as below. Thus, try > the .SAT_SUB for the PLUS_EXPR. > >4 │ __attribute__((noinline)) >5 │ uint64_t sat_u_sub_imm11_uint64_t_fmt_3 (uint64_t x) >6 │ { >7 │ long unsigned int _1; >8 │ uint64_t _3; >9 │ > 10 │[local count: 1073741824]: > 11 │ _1 = MAX_EXPR ; > 12 │ _3 = _1 + 18446744073709551605; > 13 │ return _3; > 14 │ > 15 │ } > > The below test suites are passed for this patch. > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. OK > gcc/ChangeLog: > > * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): > Try .SAT_SUB for PLUS_EXPR case. 
> > Signed-off-by: Pan Li > --- > gcc/tree-ssa-math-opts.cc | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc > index ac86be8eb94..8d96a4c964b 100644 > --- a/gcc/tree-ssa-math-opts.cc > +++ b/gcc/tree-ssa-math-opts.cc > @@ -6129,6 +6129,7 @@ math_opts_dom_walker::after_dom_children (basic_block > bb) > > case PLUS_EXPR: > match_unsigned_saturation_add (&gsi, as_a (stmt)); > + match_unsigned_saturation_sub (&gsi, as_a (stmt)); > /* fall-through */ > case MINUS_EXPR: > if (!convert_plusminus_to_widen (&gsi, stmt, code)) > -- > 2.34.1 >
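The gimple dump in the patch above rewrites `x >= 11 ? x - 11 : 0` as `MAX_EXPR <x, 11>` followed by an addition of the wrapped constant `(uint64_t)-11` (18446744073709551605). That identity can be checked with a standalone C sketch; this is illustrative only, not part of the patch, and the function names are made up:

```c
#include <stdint.h>

/* Branch form from the patch: x >= 11 ? x - 11 : 0.  */
static uint64_t
sat_u_sub_imm11_branch (uint64_t x)
{
  return x >= 11 ? x - 11 : 0;
}

/* MAX_EXPR form from the gimple dump: MAX_EXPR <x, 11> plus the
   wrapped constant (uint64_t)-11 == 18446744073709551605.  When
   x <= 11 the MAX clamps to 11 and the addition wraps to 0.  */
static uint64_t
sat_u_sub_imm11_maxplus (uint64_t x)
{
  uint64_t m = x > 11 ? x : 11;        /* MAX_EXPR <x_2(D), 11> */
  return m + 18446744073709551605ull;  /* _1 + (uint64_t)-11 */
}
```

Both functions agree for all inputs, which is why trying .SAT_SUB for the PLUS_EXPR root is sound.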
RE: [PATCH v1] Match: Support .SAT_SUB with IMM op for form 1-4
> OK. Committed, thanks Richard. Pan -Original Message- From: Richard Biener Sent: Friday, July 26, 2024 9:32 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support .SAT_SUB with IMM op for form 1-4 On Fri, Jul 26, 2024 at 11:20 AM wrote: > > From: Pan Li > > This patch would like to support .SAT_SUB when one of the op > is IMM. Aka below 1-4 forms. > > Form 1: > #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \ > T __attribute__((noinline)) \ > sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \ > { \ >return IMM >= y ? IMM - y : 0;\ > } > > Form 2: > #define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \ > T __attribute__((noinline)) \ > sat_u_sub_imm##IMM##_##T##_fmt_2 (T y) \ > { \ > return IMM > y ? IMM - y : 0; \ > } > > Form 3: > #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \ > T __attribute__((noinline)) \ > sat_u_sub_imm##IMM##_##T##_fmt_3 (T x) \ > { \ > return x >= IMM ? x - IMM : 0;\ > } > > Form 4: > #define DEF_SAT_U_SUB_IMM_FMT_4(T, IMM) \ > T __attribute__((noinline)) \ > sat_u_sub_imm##IMM##_##T##_fmt_4 (T x) \ > { \ > return x > IMM ? 
x - IMM : 0; \ > } > > Take below form 1 as example: > > DEF_SAT_U_SUB_OP0_IMM_FMT_1(uint32_t, 11) > > Before this patch: >4 │ __attribute__((noinline)) >5 │ uint64_t sat_u_sub_imm11_uint64_t_fmt_1 (uint64_t y) >6 │ { >7 │ uint64_t _1; >8 │ uint64_t _3; >9 │ > 10 │ ;; basic block 2, loop depth 0 > 11 │ ;;pred: ENTRY > 12 │ if (y_2(D) <= 11) > 13 │ goto ; [50.00%] > 14 │ else > 15 │ goto ; [50.00%] > 16 │ ;;succ: 3 > 17 │ ;;4 > 18 │ > 19 │ ;; basic block 3, loop depth 0 > 20 │ ;;pred: 2 > 21 │ _3 = 11 - y_2(D); > 22 │ ;;succ: 4 > 23 │ > 24 │ ;; basic block 4, loop depth 0 > 25 │ ;;pred: 2 > 26 │ ;;3 > 27 │ # _1 = PHI <0(2), _3(3)> > 28 │ return _1; > 29 │ ;;succ: EXIT > 30 │ > 31 │ } > > After this patch: >4 │ __attribute__((noinline)) >5 │ uint64_t sat_u_sub_imm11_uint64_t_fmt_1 (uint64_t y) >6 │ { >7 │ uint64_t _1; >8 │ >9 │ ;; basic block 2, loop depth 0 > 10 │ ;;pred: ENTRY > 11 │ _1 = .SAT_SUB (11, y_2(D)); [tail call] > 12 │ return _1; > 13 │ ;;succ: EXIT > 14 │ > 15 │ } > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. OK. Thanks, Richard. > gcc/ChangeLog: > > * match.pd: Add case 9 and case 10 for .SAT_SUB when one > of the op is IMM. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 35 +++ > 1 file changed, 35 insertions(+) > > diff --git a/gcc/match.pd b/gcc/match.pd > index cf359b0ec0f..b2e7d61790d 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3234,6 +3234,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) >&& types_match (type, @0, @1 > > +/* Unsigned saturation sub with op_0 imm, case 9 (branch with gt): > + SAT_U_SUB = IMM > Y ? (IMM - Y) : 0. > + = IMM >= Y ? (IMM - Y) : 0. 
*/ > +(match (unsigned_integer_sat_sub @0 @1) > + (cond^ (le @1 INTEGER_CST@2) (minus INTEGER_CST@0 @1) integer_zerop) > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > + && types_match (type, @1)) > + (with > + { > + unsigned precision = TYPE_PRECISION (type); > + wide_int max = wi::mask (precision, false, precision); > + wide_int c0 = wi::to_wide (@0); > + wide_int c2 = wi::to_wide (@2); > + wide_int c2_add_1 = wi::add (c2, wi::uhwi (1, precision)); > + bool equal_p = wi::eq_p (c0, c2); > + bool less_than_1_p = !wi::eq_p (c2, max) && wi::eq_p (c2_add_1, c0); > + } > + (if (equal_p || less_than_1_p) > + > +/* Unsigned saturation sub with op_1 imm, case 10: > + SAT_U_SUB = X > IMM ? (X - IMM) : 0. > + = X >= IMM ? (X - IMM) : 0. */ > +(match (unsigned_integer_sat_sub @0 @1) > + (plus (max @0
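As a standalone reference for the semantics being matched in the forms above (illustrative only, not GCC's implementation), unsigned .SAT_SUB is the subtraction clamped at zero; the branchless form masks the wrapped difference with the "no borrow" condition:

```c
#include <stdint.h>

/* Scalar reference for unsigned .SAT_SUB semantics: x - y clamped
   at 0.  -(uint64_t)(x >= y) is all-ones when no borrow occurs and
   zero otherwise, so the wrapped difference is kept or discarded.  */
static uint64_t
sat_u_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & -(uint64_t) (x >= y);
}
```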
RE: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]
> Just a slight comment improvement: > /* Returns true if both types of TYPE_PAIR strictly match their modes, > else returns false. */ > This testcase could go in g++.dg/torture/ without the -O3 option. > Since we are scanning for the negative it should pass on all targets > even ones without SAT_TRUNC support. And then you should not need the > other testcase either. Thanks all, will address above comments and commit it if no surprise from test. Pan -Original Message- From: Richard Sandiford Sent: Tuesday, July 23, 2024 10:03 PM To: Richard Biener Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961] Richard Biener writes: > On Fri, Jul 19, 2024 at 1:10 PM wrote: >> >> From: Pan Li >> >> The direct_internal_fn_supported_p has no restrictions for the type >> modes. For example the bitfield like below will be recog as .SAT_TRUNC. >> >> struct e >> { >> unsigned pre : 12; >> unsigned a : 4; >> }; >> >> __attribute__((noipa)) >> void bug (e * v, unsigned def, unsigned use) { >> e & defE = *v; >> defE.a = min_u (use + 1, 0xf); >> } >> >> This patch would like to check strictly for the >> direct_internal_fn_supported_p, >> and only allows the type matches mode for ifn type tree pair. >> >> The below test suites are passed for this patch: >> 1. The rv64gcv fully regression tests. >> 2. The x86 bootstrap tests. >> 3. The x86 fully regression tests. > > LGTM unless Richard S. has any more comments. LGTM too with Andrew's comments addressed. Thanks, Richard > > Richard. > >> PR target/115961 >> >> gcc/ChangeLog: >> >> * internal-fn.cc (type_strictly_matches_mode_p): Add new func >> impl to check type strictly matches mode or not. >> (type_pair_strictly_matches_mode_p): Ditto but for tree type >> pair. >> (direct_internal_fn_supported_p): Add above check for the tree >> type pair. 
>> >> gcc/testsuite/ChangeLog: >> >> * g++.target/i386/pr115961-run-1.C: New test. >> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test. >> >> Signed-off-by: Pan Li >> --- >> gcc/internal-fn.cc| 32 + >> .../g++.target/i386/pr115961-run-1.C | 34 +++ >> .../riscv/rvv/base/pr115961-run-1.C | 34 +++ >> 3 files changed, 100 insertions(+) >> create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C >> create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C >> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >> index 95946bfd683..5c21249318e 100644 >> --- a/gcc/internal-fn.cc >> +++ b/gcc/internal-fn.cc >> @@ -4164,6 +4164,35 @@ direct_internal_fn_optab (internal_fn fn) >>gcc_unreachable (); >> } >> >> +/* Return true if TYPE's mode has the same format as TYPE, and if there is >> + a 1:1 correspondence between the values that the mode can store and the >> + values that the type can store. */ >> + >> +static bool >> +type_strictly_matches_mode_p (const_tree type) >> +{ >> + if (VECTOR_TYPE_P (type)) >> +return VECTOR_MODE_P (TYPE_MODE (type)); >> + >> + if (INTEGRAL_TYPE_P (type)) >> +return type_has_mode_precision_p (type); >> + >> + if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type)) >> +return true; >> + >> + return false; >> +} >> + >> +/* Return true if both the first and the second type of tree pair are >> + strictly matches their modes, or return false. */ >> + >> +static bool >> +type_pair_strictly_matches_mode_p (tree_pair type_pair) >> +{ >> + return type_strictly_matches_mode_p (type_pair.first) >> +&& type_strictly_matches_mode_p (type_pair.second); >> +} >> + >> /* Return true if FN is supported for the types in TYPES when the >> optimization type is OPT_TYPE. 
The types are those associated with >> the "type0" and "type1" fields of FN's direct_internal_fn_info >> @@ -4173,6 +4202,9 @@ bool >> direct_internal_fn_supported_p (internal_fn fn, tree_pair types, >> optimization_type opt_type) >> { >> + if (!type_pair_strictly_matches_mode_p (types)) >> +ret
RE: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar
Committed, thanks Robin.

Pan

-----Original Message-----
From: Robin Dapp
Sent: Monday, July 22, 2024 11:27 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

LGTM.

--
Regards
Robin
RE: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar
Kindly ping. Pan -Original Message- From: Li, Pan2 Sent: Monday, July 15, 2024 6:35 PM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar From: Pan Li Update in v3: * Rebase the upstream. * Adjust asm check. Original log: This patch would like to implement the simple .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ {\ bool overflow = x > (WT)(NT)(-1); \ return ((NT)x) | (NT)-overflow;\ } DEF_SAT_U_TRUC_FMT_1(uint32_t, uint64_t) Before this patch: __attribute__((noinline)) uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x) { _Bool overflow; unsigned char _1; unsigned char _2; unsigned char _3; uint8_t _6; ;; basic block 2, loop depth 0 ;;pred: ENTRY overflow_5 = x_4(D) > 255; _1 = (unsigned char) x_4(D); _2 = (unsigned char) overflow_5; _3 = -_2; _6 = _1 | _3; return _6; ;;succ: EXIT } After this patch: __attribute__((noinline)) uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x) { uint8_t _6; ;; basic block 2, loop depth 0 ;;pred: ENTRY _6 = .SAT_TRUNC (x_4(D)); [tail call] return _6; ;;succ: EXIT } The below tests suites are passed for this patch 1. The rv64gcv fully regression test. 2. The rv64gcv build with glibc gcc/ChangeLog: * config/riscv/iterators.md (ANYI_DOUBLE_TRUNC): Add new iterator for int double truncation. (ANYI_DOUBLE_TRUNCATED): Add new attr for int double truncation. (anyi_double_truncated): Ditto but for lowercase. * config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new func decl for expanding ustrunc * config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func impl to expand ustrunc. * config/riscv/riscv.md (ustrunc2): Impl the new pattern ustrunc2 for int. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Adjust asm check times from 2 to 4. 
* gcc.target/riscv/sat_arith.h: Add test helper macro. * gcc.target/riscv/sat_arith_data.h: New test. * gcc.target/riscv/sat_u_trunc-1.c: New test. * gcc.target/riscv/sat_u_trunc-2.c: New test. * gcc.target/riscv/sat_u_trunc-3.c: New test. * gcc.target/riscv/sat_u_trunc-run-1.c: New test. * gcc.target/riscv/sat_u_trunc-run-2.c: New test. * gcc.target/riscv/sat_u_trunc-run-3.c: New test. * gcc.target/riscv/scalar_sat_unary.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/iterators.md | 10 gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv.cc | 40 + gcc/config/riscv/riscv.md | 10 .../rvv/autovec/unop/vec_sat_u_trunc-1.c | 2 +- gcc/testsuite/gcc.target/riscv/sat_arith.h| 16 ++ .../gcc.target/riscv/sat_arith_data.h | 56 +++ .../gcc.target/riscv/sat_u_trunc-1.c | 17 ++ .../gcc.target/riscv/sat_u_trunc-2.c | 20 +++ .../gcc.target/riscv/sat_u_trunc-3.c | 19 +++ .../gcc.target/riscv/sat_u_trunc-run-1.c | 16 ++ .../gcc.target/riscv/sat_u_trunc-run-2.c | 16 ++ .../gcc.target/riscv/sat_u_trunc-run-3.c | 16 ++ .../gcc.target/riscv/scalar_sat_unary.h | 22 14 files changed, 260 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith_data.h create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_unary.h diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md index d61ed53a8b1..734da041f0c 100644 --- a/gcc/config/riscv/iterators.md +++ b/gcc/config/riscv/iterators.md @@ -65,6 +65,16 @@ (define_mode_iterator SUBX [QI HI (SI "TARGET_64BIT")]) ;; Iterator for hardware-supported integer modes. 
(define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")]) +(define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")]) + +(define_mode_attr ANYI_DOUBLE_TRUNCATED [ + (HI "QI") (SI "HI") (DI "SI") +]) + +(define_mode_attr anyi_double_truncated [ + (HI "qi") (SI "hi") (DI "si") +]) + ;; Iterator
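For reference, form 1 of the unsigned saturating truncation matched by the patch can be written as a standalone C function. This is a sketch specialized to uint16_t -> uint8_t following the DEF_SAT_U_TRUC_FMT_1 macro above; the helper name is hypothetical:

```c
#include <stdint.h>

/* Form 1 of unsigned .SAT_TRUNC: truncate a wider unsigned value to
   a narrower type, clamping to the narrow type's maximum (255 here)
   on overflow.  (uint8_t)-overflow is 0xff when clamping, 0 otherwise,
   so the OR either saturates the result or leaves it unchanged.  */
static uint8_t
sat_u_trunc_u16_to_u8 (uint16_t x)
{
  _Bool overflow = x > (uint16_t)(uint8_t)-1;  /* x > 255 */
  return (uint8_t) x | (uint8_t) -overflow;
}
```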
RE: [PATCH v1] Internal-fn: Only allow modes describe types for internal fn[PR115961]
Thanks Richard S for comments and suggestions, updated in v2. Pan -Original Message- From: Richard Sandiford Sent: Friday, July 19, 2024 3:46 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Internal-fn: Only allow modes describe types for internal fn[PR115961] pan2...@intel.com writes: > From: Pan Li > > The direct_internal_fn_supported_p has no restrictions for the type > modes. For example the bitfield like below will be recog as .SAT_TRUNC. > > struct e > { > unsigned pre : 12; > unsigned a : 4; > }; > > __attribute__((noipa)) > void bug (e * v, unsigned def, unsigned use) { > e & defE = *v; > defE.a = min_u (use + 1, 0xf); > } > > This patch would like to add checks for the direct_internal_fn_supported_p, > and only allows the tree types describled by modes. > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > > PR target/115961 > > gcc/ChangeLog: > > * internal-fn.cc (mode_describle_type_precision_p): Add new func > impl to check if mode describle the tree type. > (direct_internal_fn_supported_p): Add above check for the first > and second tree type of tree pair. > > gcc/testsuite/ChangeLog: > > * g++.target/i386/pr115961-run-1.C: New test. > * g++.target/riscv/rvv/base/pr115961-run-1.C: New test. 
> > Signed-off-by: Pan Li > --- > gcc/internal-fn.cc| 21 > .../g++.target/i386/pr115961-run-1.C | 34 +++ > .../riscv/rvv/base/pr115961-run-1.C | 34 +++ > 3 files changed, 89 insertions(+) > create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C > create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 95946bfd683..4dc69264a24 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4164,6 +4164,23 @@ direct_internal_fn_optab (internal_fn fn) >gcc_unreachable (); > } > > +/* Return true if the mode describes the precision of tree type, or false. > */ > + > +static bool > +mode_describle_type_precision_p (const_tree type) Bit pedantic, but it's not really just about precision. For floats and vectors it's also about format. Maybe: /* Return true if TYPE's mode has the same format as TYPE, and if there is a 1:1 correspondence between the values that the mode can store and the values that the type can store. */ And maybe my mode_describes_type_p suggestion wasn't the best, but given that it's not just about precision, I'm not sure about mode_describle_type_precision_p either. How about: type_strictly_matches_mode_p ? I'm open to other suggestions. > +{ > + if (VECTOR_TYPE_P (type)) > +return VECTOR_MODE_P (TYPE_MODE (type)); > + > + if (INTEGRAL_TYPE_P (type)) > +return type_has_mode_precision_p (type); > + > + if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type)) > +return true; > + > + return false; > +} > + > /* Return true if FN is supported for the types in TYPES when the > optimization type is OPT_TYPE. 
The types are those associated with > the "type0" and "type1" fields of FN's direct_internal_fn_info > @@ -4173,6 +4190,10 @@ bool > direct_internal_fn_supported_p (internal_fn fn, tree_pair types, > optimization_type opt_type) > { > + if (!mode_describle_type_precision_p (types.first) > +|| !mode_describle_type_precision_p (types.second)) Formatting nit: the "||" should line up with the "!". LGTM otherwise. Thanks, Richard > +return false; > + >switch (fn) > { > #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ > diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C > b/gcc/testsuite/g++.target/i386/pr115961-run-1.C > new file mode 100644 > index 000..b8c8aef3b17 > --- /dev/null > +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C > @@ -0,0 +1,34 @@ > +/* PR target/115961 */ > +/* { dg-do run } */ > +/* { dg-options "-O3 -fdump-rtl-expand-details" } */ > + > +struct e > +{ > + unsigned pre : 12; > + unsigned a : 4; > +}; > + > +static unsigned min_u (unsigned a, unsigned b) > +{ > + return (b < a) ? b : a; > +} > + > +__attribute__((noipa)) > +void bug (e * v, unsigned def, unsigned use) { > + e & defE
RE: [PATCH v2] RISC-V: More support of vx and vf for autovec comparison
> + TEST_COND_IMM_FLOAT (T, >, 0.0, _gt) > \ > + TEST_COND_IMM_FLOAT (T, <, 0.0, _lt) > \ > + TEST_COND_IMM_FLOAT (T, >=, 0.0, _ge) > \ > + TEST_COND_IMM_FLOAT (T, <=, 0.0, _le) > \ > + TEST_COND_IMM_FLOAT (T, ==, 0.0, _eq) > \ > + TEST_COND_IMM_FLOAT (T, !=, 0.0, _ne) > \ Just curious, does this patch covered float imm is -0.0 (notice only +0.0 is mentioned)? If so we can have similar tests as +0.0 here. It is totally Ok if -0.0f is not applicable here. Pan -Original Message- From: demin.han Sent: Friday, July 19, 2024 4:55 PM To: gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Li, Pan2 ; jeffreya...@gmail.com; rdapp@gmail.com Subject: [PATCH v2] RISC-V: More support of vx and vf for autovec comparison There are still some cases which can't utilize vx or vf after last_combine pass. 1. integer comparison when imm isn't in range of [-16, 15] 2. float imm is 0.0 3. DI or DF mode under RV32 This patch fix above mentioned issues. Tested on RV32 and RV64. Signed-off-by: demin.han gcc/ChangeLog: * config/riscv/autovec.md: register_operand to nonmemory_operand * config/riscv/riscv-v.cc (get_cmp_insn_code): Select code according * to scalar_p (expand_vec_cmp): Generate scalar_p and transform op1 * config/riscv/riscv.cc (riscv_const_insns): Add !FLOAT_MODE_P * constrain gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Fix and add test Signed-off-by: demin.han --- V2 changes: 1. remove unnecessary add_integer_operand and related code 2. fix one format issue 3. 
split patch and make it only related to vec cmp gcc/config/riscv/autovec.md | 2 +- gcc/config/riscv/riscv-v.cc | 57 +++ gcc/config/riscv/riscv.cc | 2 +- .../riscv/rvv/autovec/cmp/vcond-1.c | 48 +++- 4 files changed, 82 insertions(+), 27 deletions(-) diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index d5793acc999..a772153 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -690,7 +690,7 @@ (define_expand "vec_cmp" [(set (match_operand: 0 "register_operand") (match_operator: 1 "comparison_operator" [(match_operand:V_VLSF 2 "register_operand") - (match_operand:V_VLSF 3 "register_operand")]))] + (match_operand:V_VLSF 3 "nonmemory_operand")]))] "TARGET_VECTOR" { riscv_vector::expand_vec_cmp_float (operands[0], GET_CODE (operands[1]), diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index e290675bbf0..56328075aeb 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -2624,32 +2624,27 @@ expand_vec_init (rtx target, rtx vals) /* Get insn code for corresponding comparison. */ static insn_code -get_cmp_insn_code (rtx_code code, machine_mode mode) +get_cmp_insn_code (rtx_code code, machine_mode mode, bool scalar_p) { insn_code icode; - switch (code) + if (FLOAT_MODE_P (mode)) { -case EQ: -case NE: -case LE: -case LEU: -case GT: -case GTU: -case LTGT: - icode = code_for_pred_cmp (mode); - break; -case LT: -case LTU: -case GE: -case GEU: - if (FLOAT_MODE_P (mode)) - icode = code_for_pred_cmp (mode); + icode = !scalar_p ? 
code_for_pred_cmp (mode) + : code_for_pred_cmp_scalar (mode); + return icode; +} + if (scalar_p) +{ + if (code == GE || code == GEU) + icode = code_for_pred_ge_scalar (mode); else - icode = code_for_pred_ltge (mode); - break; -default: - gcc_unreachable (); + icode = code_for_pred_cmp_scalar (mode); + return icode; } + if (code == LT || code == LTU || code == GE || code == GEU) +icode = code_for_pred_ltge (mode); + else +icode = code_for_pred_cmp (mode); return icode; } @@ -2771,7 +2766,6 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask, { machine_mode mask_mode = GET_MODE (target); machine_mode data_mode = GET_MODE (op0); - insn_code icode = get_cmp_insn_code (code, data_mode); if (code == LTGT) { @@ -2779,12 +2773,29 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask, rtx gt = gen_reg_rtx (mask_mode); expand_vec_cmp (lt, LT, op0, op1, mask, maskoff); expand_vec_cmp (gt, GT, op0, op1, mask, maskoff); - icode = code_for_pred (IOR, mask_mode); + insn_code icode = code_for_pred (IOR, mask_mode); rtx ops[] = {target, l
RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]
> Otherwise the patch looks good to me. Thanks Richard, will commit with the log updated. Pan -Original Message- From: Richard Biener Sent: Thursday, July 18, 2024 9:27 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: Re: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863] On Thu, Jul 18, 2024 at 2:27 PM wrote: > > From: Pan Li > > The SAT_TRUNC form 2 has below pattern matching. > From: > _18 = MIN_EXPR ; > iftmp.0_11 = (unsigned int) _18; > > To: > _18 = MIN_EXPR ; > iftmp.0_11 = .SAT_TRUNC (_18); .SAT_TRUNC (left_8); > But if there is another use of _18 like below, the transform to the > .SAT_TRUNC may have no earnings. For example: > > From: > _18 = MIN_EXPR ; // op_0 def > iftmp.0_11 = (unsigned int) _18; // op_0 > stream.avail_out = iftmp.0_11; > left_37 = left_8 - _18; // op_0 use > > To: > _18 = MIN_EXPR ; // op_0 def > iftmp.0_11 = .SAT_TRUNC (_18); .SAT_TRUNC (left_8);? Otherwise the patch looks good to me. Thanks, Richard. > stream.avail_out = iftmp.0_11; > left_37 = left_8 - _18; // op_0 use > > Pattern recog to .SAT_TRUNC cannot eliminate MIN_EXPR as above. Then the > backend (for example x86/riscv) will have additional 2-3 more insns > after pattern recog besides the MIN_EXPR. Thus, keep the normal truncation > as is should be the better choose. > > The below testsuites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > > PR target/115863 > > gcc/ChangeLog: > > * match.pd: Add single_use of MIN_EXPR for .SAT_TRUNC form 2. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr115863-1.c: New test. 
> > Signed-off-by: Pan Li > --- > gcc/match.pd | 15 +++-- > gcc/testsuite/gcc.target/i386/pr115863-1.c | 37 ++ > 2 files changed, 50 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr115863-1.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 5cb399b8718..d4f040b5c7b 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3252,10 +3252,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT). > SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)). */ > +/* If Op_0 def is MIN_EXPR and not single_use. Aka below pattern: > + > + _18 = MIN_EXPR ; // op_0 def > + iftmp.0_11 = (unsigned int) _18; // op_0 > + stream.avail_out = iftmp.0_11; > + left_37 = left_8 - _18; // op_0 use > + > + Transfer to .SAT_TRUNC will have MIN_EXPR still live. Then the backend > + (for example x86/riscv) will have 2-3 more insns generation for .SAT_TRUNC > + besides the MIN_EXPR. Thus, keep the normal truncation as is should be > + the better choose. 
*/ > (match (unsigned_integer_sat_trunc @0) > - (convert (min @0 INTEGER_CST@1)) > + (convert (min@2 @0 INTEGER_CST@1)) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > - && TYPE_UNSIGNED (TREE_TYPE (@0))) > + && TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2)) > (with >{ > unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0)); > diff --git a/gcc/testsuite/gcc.target/i386/pr115863-1.c > b/gcc/testsuite/gcc.target/i386/pr115863-1.c > new file mode 100644 > index 000..a672f62cec5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr115863-1.c > @@ -0,0 +1,37 @@ > +/* PR target/115863 */ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-rtl-expand-details" } */ > + > +#include > + > +typedef struct z_stream_s { > +uint32_t avail_out; > +} z_stream; > + > +typedef z_stream *z_streamp; > + > +extern int deflate (z_streamp strmp); > + > +int compress2 (uint64_t *destLen) > +{ > + z_stream stream; > + int err; > + const uint32_t max = (uint32_t)(-1); > + uint64_t left; > + > + left = *destLen; > + > + stream.avail_out = 0; > + > + do { > +if (stream.avail_out == 0) { > +stream.avail_out = left > (uint64_t)max ? max : (uint32_t)left; > +left -= stream.avail_out; > +} > +err = deflate(&stream); > +} while (err == 0); > + > + return err; > +} > + > +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */ > -- > 2.34.1 >
RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]
Thanks Tamar for the comments. The :s flag is somehow ignored in matching according to the gccint doc. "The second supported flag is s which tells the code generator to fail the pattern if the expression marked with s does have more than one use and the simplification results in an expression with more than one operator." I also diffed the generated code in gimple_unsigned_integer_sat_trunc; it doesn't include a single_use check when only the :s flag is used. && TYPE_UNSIGNED (TREE_TYPE (captures[0])) // the :s flag && TYPE_UNSIGNED (TREE_TYPE (captures[0])) && single_use (captures[1]) // explicit single_use check. Pan -Original Message- From: Tamar Christina Sent: Thursday, July 18, 2024 8:36 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863] > -Original Message- > From: pan2...@intel.com > Sent: Thursday, July 18, 2024 1:27 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > Tamar Christina ; jeffreya...@gmail.com; > rdapp@gmail.com; hongtao@intel.com; Pan Li > Subject: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC > form 2 [PR115863] > > From: Pan Li > > The SAT_TRUNC form 2 has below pattern matching. > From: > _18 = MIN_EXPR ; > iftmp.0_11 = (unsigned int) _18; > > To: > _18 = MIN_EXPR ; > iftmp.0_11 = .SAT_TRUNC (_18); > > But if there is another use of _18 like below, the transform to the > .SAT_TRUNC may have no earnings. For example: > > From: > _18 = MIN_EXPR ; // op_0 def > iftmp.0_11 = (unsigned int) _18; // op_0 > stream.avail_out = iftmp.0_11; > left_37 = left_8 - _18; // op_0 use > > To: > _18 = MIN_EXPR ; // op_0 def > iftmp.0_11 = .SAT_TRUNC (_18); > stream.avail_out = iftmp.0_11; > left_37 = left_8 - _18; // op_0 use > > Pattern recog to .SAT_TRUNC cannot eliminate MIN_EXPR as above.
Then the > backend (for example x86/riscv) will have additional 2-3 more insns > after pattern recog besides the MIN_EXPR. Thus, keep the normal truncation > as is should be the better choose. > > The below testsuites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > > PR target/115863 > > gcc/ChangeLog: > > * match.pd: Add single_use of MIN_EXPR for .SAT_TRUNC form 2. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr115863-1.c: New test. > > Signed-off-by: Pan Li > --- > gcc/match.pd | 15 +++-- > gcc/testsuite/gcc.target/i386/pr115863-1.c | 37 ++ > 2 files changed, 50 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr115863-1.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 5cb399b8718..d4f040b5c7b 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3252,10 +3252,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT). > SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)). */ > +/* If Op_0 def is MIN_EXPR and not single_use. Aka below pattern: > + > + _18 = MIN_EXPR ; // op_0 def > + iftmp.0_11 = (unsigned int) _18; // op_0 > + stream.avail_out = iftmp.0_11; > + left_37 = left_8 - _18; // op_0 use > + > + Transfer to .SAT_TRUNC will have MIN_EXPR still live. Then the backend > + (for example x86/riscv) will have 2-3 more insns generation for .SAT_TRUNC > + besides the MIN_EXPR. Thus, keep the normal truncation as is should be > + the better choose. */ > (match (unsigned_integer_sat_trunc @0) > - (convert (min @0 INTEGER_CST@1)) > + (convert (min@2 @0 INTEGER_CST@1)) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > - && TYPE_UNSIGNED (TREE_TYPE (@0))) > + && TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2)) You can probably use the single use flag here? so > - (convert (min @0 INTEGER_CST@1)) > + (convert (min:s @0 @0 INTEGER_CST@1)) ? 
Cheers, Tamar > (with >{ > unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0)); > diff --git a/gcc/testsuite/gcc.target/i386/pr115863-1.c > b/gcc/testsuite/gcc.target/i386/pr115863-1.c > new file mode 100644 > index 000..a672f62cec5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr115863-1.c > @@ -0,0 +1,37 @@ > +/* PR tar
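The two-use situation that motivates the explicit single_use check can be sketched in scalar C (names are illustrative, loosely modeled on the compress2 testcase in the patch):

```c
#include <stdint.h>

/* The MIN_EXPR result (m, aka _18 in the GIMPLE dump) feeds both the
   truncating convert and a subtraction, so rewriting the convert to
   .SAT_TRUNC cannot eliminate the MIN_EXPR -- it stays live for the
   second use, and the backend pays extra insns for .SAT_TRUNC on top.  */
uint32_t
avail_out (uint64_t left, uint64_t *remaining)
{
  uint64_t m = left < 0xffffffffull ? left : 0xffffffffull; /* op_0 def  */
  *remaining = left - m;                                    /* op_0 use  */
  return (uint32_t) m;                                      /* op_0 conv */
}
```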
RE: [PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer modes
Thanks Richard and Andrew, will commit v2 with those changes. https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657617.html Pan -Original Message- From: Richard Biener Sent: Thursday, July 18, 2024 3:00 PM To: Andrew Pinski Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer modes On Thu, Jul 18, 2024 at 7:35 AM Andrew Pinski wrote: > > On Wed, Jul 17, 2024 at 9:20 PM wrote: > > > > From: Pan Li > > > > This patch would like to add the doc for the Standard-Names > > ustrunc and sstrunc, include both the scalar and vector integer > > modes. > > Thanks for doing this and this looks mostly good to me (can't approve it). Too bad. OK with the changes Andrew requested. Thanks, Richard. > > > > > gcc/ChangeLog: > > > > * doc/md.texi: Add Standard-Names ustrunc and sstrunc. > > > > Signed-off-by: Pan Li > > --- > > gcc/doc/md.texi | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > index 7f4335e0aac..f116dede906 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -5543,6 +5543,18 @@ means of constraints requiring operands 1 and 0 to > > be the same location. > > @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3} > > Similar, for other arithmetic operations. > > > > +@cindex @code{ustrunc@var{m}@var{n}2} instruction pattern > > +@item @samp{ustrunc@var{m}@var{n}2} > > +Truncate the operand 1, and storing the result in operand 0. There will > > +be saturation during the trunction. The result will be saturated to the > > +maximal value of operand 0 type if there is overflow when truncation. The > s/type/mode/ . > > +operand 1 must have mode @var{n}, and the operand 0 must have mode > > @var{m}. > > +Both the scalar and vector integer modes are allowed. > I don't think you need the article `the` here.
It reads wrong with it > at least to me. > > > + > > +@cindex @code{sstrunc@var{m}@var{n}2} instruction pattern > > +@item @samp{sstrunc@var{m}@var{n}2} > > +Similar but for signed. > > + > > @cindex @code{andc@var{m}3} instruction pattern > > @item @samp{andc@var{m}3} > > Like @code{and@var{m}3}, but it uses bitwise-complement of operand 2 > > -- > > 2.34.1 > >
RE: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]
Thanks all, will have a try in v2. Pan -Original Message- From: Richard Sandiford Sent: Thursday, July 18, 2024 5:14 AM To: Andrew Pinski Cc: Tamar Christina ; Richard Biener ; Li, Pan2 ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961] Andrew Pinski writes: > On Wed, Jul 17, 2024 at 1:03 PM Tamar Christina > wrote: >> >> > -Original Message- >> > From: Richard Sandiford >> > Sent: Wednesday, July 17, 2024 8:55 PM >> > To: Richard Biener >> > Cc: pan2...@intel.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; >> > kito.ch...@gmail.com; Tamar Christina ; >> > jeffreya...@gmail.com; rdapp@gmail.com; hongtao@intel.com >> > Subject: Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode >> > precision [PR115961] >> > >> > Richard Biener writes: >> > > On Wed, Jul 17, 2024 at 11:48 AM wrote: >> > >> >> > >> From: Pan Li >> > >> >> > >> The .SAT_TRUNC matching doesn't check the type has mode precision. Thus >> > >> when bitfield like below will be recog as .SAT_TRUNC. >> > >> >> > >> struct e >> > >> { >> > >> unsigned pre : 12; >> > >> unsigned a : 4; >> > >> }; >> > >> >> > >> __attribute__((noipa)) >> > >> void bug (e * v, unsigned def, unsigned use) { >> > >> e & defE = *v; >> > >> defE.a = min_u (use + 1, 0xf); >> > >> } >> > >> >> > >> This patch would like to add type_has_mode_precision_p for the >> > >> .SAT_TRUNC matching to get rid of this. >> > >> >> > >> The below test suites are passed for this patch: >> > >> 1. The rv64gcv fully regression tests. >> > >> 2. The x86 bootstrap tests. >> > >> 3. The x86 fully regression tests. >> > > >> > > Hmm, rather than restricting the matching the issue is the optab query or >> > > in this case how *_optab_supported_p blindly uses TYPE_MODE without >> > > either asserting the type has mode precision or failing the query in >> > > this case. 
>> > > >> > > I think it would be simplest to adjust direct_optab_supported_p >> > > (and convert_optab_supported_p) to reject such operations? Richard, do >> > > you agree or should callers check this instead? >> > >> > Sounds good to me, although I suppose it should go: >> > >> > bool >> > direct_internal_fn_supported_p (internal_fn fn, tree_pair types, >> > optimization_type opt_type) >> > { >> > // <--- Here >> > switch (fn) >> > { >> > >> > } >> > } >> > >> > until we know of a specific case where that's wrong. >> > >> > Is type_has_mode_precision_p meaningful for all types? >> > >> >> I was wondering about that, wouldn't VECTOR_BOOLEAN_TYPE_P types fail? >> e.g. on AVX where the type precision is 1 but the mode precision QImode? >> >> Unless I misunderstood the predicate. > > So type_has_mode_precision_p only works with scalar integral types > (maybe scalar real types too) since it uses TYPE_PRECISION directly > and not element_precision (the precision field is overloaded for > vectors for the number of elements and TYPE_PRECISION on a vector type > will cause an ICE since r14-2150-gfe48f2651334bc). > So I suspect you need to check !VECTOR_TYPE_P (type) before calling > type_has_mode_precision_p . I think for VECTOR_TYPE_P it would be worth checking VECTOR_MODE_P instead, if we're not requiring callers to check this kind of thing. So something like: bool mode_describes_type_p (const_tree type) { if (VECTOR_TYPE_P (type)) return VECTOR_MODE_P (TREE_TYPE (type)); if (INTEGRAL_TYPE_P (type)) return type_has_mode_precision_p (type); if (SCALAR_FLOAT_TYPE_P (type)) return true; return false; } ? Possibly also with complex handling if we need that. Richard
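The trigger described at the top of this thread, rewritten here as plain C for reference (the original testcase uses a C++ reference and an extra unused def parameter, which this sketch drops):

```c
/* PR115961: the 4-bit bitfield 'a' has TYPE_PRECISION 4 but its
   TYPE_MODE is QImode (8 bits), so the MIN_EXPR + store below must
   not be recognized as .SAT_TRUNC -- the mode does not describe the
   type, which is what the discussed mode_describes_type_p-style
   check guards against.  */
struct e
{
  unsigned pre : 12;
  unsigned a : 4;
};

static unsigned
min_u (unsigned x, unsigned y)
{
  return x < y ? x : y;
}

void
bug (struct e *v, unsigned use)
{
  v->a = min_u (use + 1, 0xf);  /* looks like SAT_TRUNC form 2 */
}
```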
RE: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int
> I just noticed you added ustrunc/sstrunc optabs but didn't add > documentation for them in md.texi like the other optabs that are > defined. > See https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html for the > generated file of md.texi there. > Can you please update md.texi to add them? Thanks Andrew, almost forgot this, will add it soon. Pan -Original Message- From: Andrew Pinski Sent: Thursday, July 18, 2024 6:59 AM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int On Tue, Jun 25, 2024 at 6:46 PM wrote: > > From: Pan Li > > This patch would like to add the middle-end presentation for the > saturation truncation. Aka set the result of truncated value to > the max value when overflow. It will take the pattern similar > as below. > > Form 1: > #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \ > NT __attribute__((noinline)) \ > sat_u_truc_##T##_fmt_1 (WT x)\ > {\ > bool overflow = x > (WT)(NT)(-1); \ > return ((NT)x) | (NT)-overflow;\ > } > > For example, truncated uint16_t to uint8_t, we have > > * SAT_TRUNC (254) => 254 > * SAT_TRUNC (255) => 255 > * SAT_TRUNC (256) => 255 > * SAT_TRUNC (65536) => 255 > > Given below SAT_TRUNC from uint64_t to uint32_t. 
> > DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t) > > Before this patch: > __attribute__((noinline)) > uint32_t sat_u_truc_T_fmt_1 (uint64_t x) > { > _Bool overflow; > unsigned int _1; > unsigned int _2; > unsigned int _3; > uint32_t _6; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > overflow_5 = x_4(D) > 4294967295; > _1 = (unsigned int) x_4(D); > _2 = (unsigned int) overflow_5; > _3 = -_2; > _6 = _1 | _3; > return _6; > ;;succ: EXIT > > } > > After this patch: > __attribute__((noinline)) > uint32_t sat_u_truc_T_fmt_1 (uint64_t x) > { > uint32_t _6; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _6 = .SAT_TRUNC (x_4(D)); [tail call] > return _6; > ;;succ: EXIT > > } > > The below tests are passed for this patch: > *. The rv64gcv fully regression tests. > *. The rv64gcv build with glibc. > *. The x86 bootstrap tests. > *. The x86 fully regression tests. > > gcc/ChangeLog: > > * internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as > unary_convert. > * match.pd: Add new matching pattern for unsigned int sat_trunc. > * optabs.def (OPTAB_CL): Add unsigned and signed optab. I just noticed you added ustrunc/sstrunc optabs but didn't add documentation for them in md.texi like the other optabs that are defined. See https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html for the generated file of md.texi there. Can you please update md.texi to add them? Thanks, Andrew Pinski > * tree-ssa-math-opts.cc (gimple_unsigend_integer_sat_trunc): Add > new decl for the matching pattern generated func. > (match_unsigned_saturation_trunc): Add new func impl to match > the .SAT_TRUNC. > (math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match > function under BIT_IOR_EXPR case. > * tree.cc (integer_half_truncated_all_ones_p): Add new func impl > to filter the truncated threshold. > * tree.h (integer_half_truncated_all_ones_p): Add new func decl. 
> > Signed-off-by: Pan Li > --- > gcc/internal-fn.def | 2 ++ > gcc/match.pd | 12 +++- > gcc/optabs.def| 3 +++ > gcc/tree-ssa-math-opts.cc | 32 > gcc/tree.cc | 22 ++ > gcc/tree.h| 6 ++ > 6 files changed, 76 insertions(+), 1 deletion(-) > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index a8c83437ada..915d329c05a 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -278,6 +278,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | > ECF_NOTHROW, first, > DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, > binary) > DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, > binary) > > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_TRUNC, ECF_CONST, first, sstrunc, ustrunc, > unary_convert) > + > DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary) > DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary) > DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary) >
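The form-1 pattern from the patch, instantiated as compilable C for one concrete type pair (uint16_t to uint8_t; the generic macro in the patch covers other pairs the same way):

```c
#include <stdint.h>
#include <stdbool.h>

/* Branchless saturating truncate: OR the truncated value with an
   all-ones mask whenever the wide value exceeds the narrow maximum.
   (uint8_t)-overflow is 0xff when overflow is true, 0 otherwise.  */
uint8_t
sat_u_trunc_u16_u8 (uint16_t x)
{
  bool overflow = x > (uint16_t)(uint8_t)-1;  /* x > 255 */
  return (uint8_t)x | (uint8_t)-overflow;
}
```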
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
> I think that's a bug. Do you say __builtin_add_overflow fails to promote > (constant) arguments? I double checked the 022t.ssa pass for the __builtin_add_overflow operands tree type. It looks like the two operands of .ADD_OVERFLOW have different tree types when one of them is constant. One is unsigned DI, and the other is int. (gdb) call debug_gimple_stmt(stmt) _14 = .ADD_OVERFLOW (_4, 129); (gdb) call debug_tree (gimple_call_arg(stmt, 0)) unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76a437e0 precision:64 min max pointer_to_this > visited def_stmt _4 = *_3; version:4> (gdb) call debug_tree (gimple_call_arg(stmt, 1)) constant 129> (gdb) Then we go to the vect pass, we can also see that the ops of .ADD_OVERFLOW have different tree types. In my understanding, the constant operand here should also be unsigned DI. (gdb) layout src (gdb) list 506 if (gimple_call_num_args (_c4) == 2) 507 { 508 tree _q40 = gimple_call_arg (_c4, 0); 509 _q40 = do_valueize (valueize, _q40); 510 tree _q41 = gimple_call_arg (_c4, 1); 511 _q41 = do_valueize (valueize, _q41); 512 if (integer_zerop (_q21)) 513 { 514 if (integer_minus_onep (_p1)) 515 { (gdb) call debug_tree (_q40) unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76a437e0 precision:64 min max pointer_to_this > visited def_stmt _4 = *_3; version:4> (gdb) call debug_tree (_q41) constant 129> Pan -Original Message- From: Richard Biener Sent: Wednesday, July 10, 2024 7:36 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call On Wed, Jul 10, 2024 at 11:28 AM wrote: > > From: Pan Li > > The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST. > For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
> > Form 3: > #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ > T __attribute__((noinline)) \ > vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ > {\ > unsigned i;\ > T ret; \ > for (i = 0; i < limit; i++)\ > {\ > out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ > }\ > } > > DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9) > > It will failure to vectorize as the vectorizable_call will check the > operands is type_compatiable but the imm will be treated as unsigned > SImode from the perspective of tree. I think that's a bug. Do you say __builtin_add_overflow fails to promote (constant) arguments? > Aka > > uint64_t _1; > uint64_t _2; > > _1 = .SAT_ADD (_2, 9); > > The _1 and _2 are unsigned DImode, which is different to imm 9 unsigned > SImode, and then result in vectorizable_call fails. This patch would > like to promote the imm operand to the operand type mode of _2 if and > only if there is no precision/data loss. Aka convert the imm 9 to the > DImode for above example. > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. > > gcc/ChangeLog: > > * tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add > new func impl to promote the imm tree to target type. > (vect_recog_sat_add_pattern): Peform the type promotion before > generate .SAT_ADD call. >
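The scalar shape of the form-3 pattern above, with the immediate 9 from the example (the vectorized loop applies this per element):

```c
#include <stdint.h>

/* Saturating unsigned add with a constant.  At the tree level the
   constant argument of __builtin_add_overflow is a plain 'int'
   (unsigned SImode), which is why the pattern needs it promoted to
   the uint64_t operand's type before building the .SAT_ADD call.  */
uint64_t
sat_u_add_imm9 (uint64_t x)
{
  uint64_t ret;
  return __builtin_add_overflow (x, 9, &ret) ? (uint64_t)-1 : ret;
}
```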
RE: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark
Thanks Jeff and Edwin for catching my silly mistake. Pan -Original Message- From: Jeff Law Sent: Saturday, July 13, 2024 5:40 AM To: Edwin Lu ; gcc-patches@gcc.gnu.org Cc: Li, Pan2 ; gnu-toolch...@rivosinc.com Subject: Re: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark On 7/12/24 12:37 PM, Edwin Lu wrote: > The following testcase was not properly testing anything due to an > uninitialized variable. As a result, the loop was not iterating through > the testing data, but instead on undefined values which could cause an > unexpected abort. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: > initialize variable OK. Thanks for chasing this down. jeff
RE: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark
Committed, thanks Juzhe. Pan From: juzhe.zh...@rivai.ai Sent: Thursday, July 11, 2024 6:32 PM To: Li, Pan2 ; gcc-patches Cc: kito.cheng ; jeffreyalaw ; Robin Dapp ; Li, Pan2 Subject: Re: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark LGTM juzhe.zh...@rivai.ai From: pan2.li Date: 2024-07-11 16:29 To: gcc-patches CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li Subject: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark From: Pan Li This patch would like to add the test cases for the vector .SAT_SUB in the zip benchmark. Aka: Form in zip benchmark: #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \ void __attribute__((noinline))\ vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \ { \ T2 a; \ T1 *p = x; \ do {\ a = *--p; \ *p = (T1)(a >= b ? a - b : 0);\ } while (--limit); \ } DEF_VEC_SAT_U_SUB_ZIP(uint8_t, uint16_t) vec_sat_u_sub_uint16_t_uint32_t_fmt_zip: ... vsetvli a4,zero,e32,m1,ta,ma vmv.v.x v6,a1 vsetvli zero,zero,e16,mf2,ta,ma vid.v v2 li a4,-1 vnclipu.wi v6,v6,0 // .SAT_TRUNC .L3: vle16.v v3,0(a3) vrsub.vx v5,v2,a6 mv a7,a4 addw a4,a4,t3 vrgather.vv v1,v3,v5 vssubu.vv v1,v1,v6 // .SAT_SUB vrgather.vv v3,v1,v5 vse16.v v3,0(a3) sub a3,a3,t1 bgtu t4,a4,.L3 Passed the rv64gcv tests. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test data for .SAT_SUB in zip benchmark. * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: New test.
Signed-off-by: Pan Li mailto:pan2...@intel.com>> --- .../riscv/rvv/autovec/binop/vec_sat_arith.h | 18 + .../rvv/autovec/binop/vec_sat_binary_vx.h | 22 + .../riscv/rvv/autovec/binop/vec_sat_data.h| 81 +++ .../rvv/autovec/binop/vec_sat_u_sub_zip-run.c | 16 .../rvv/autovec/binop/vec_sat_u_sub_zip.c | 18 + 5 files changed, 155 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h index 10459807b2c..416a1e49a47 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h @@ -322,6 +322,19 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, unsigned limit) \ } \ } +#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \ +void __attribute__((noinline))\ +vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \ +{ \ + T2 a; \ + T1 *p = x; \ + do {\ +a = *--p; \ +*p = (T1)(a >= b ? a - b : 0);\ + } while (--limit); \ +} +#define DEF_VEC_SAT_U_SUB_ZIP_WRAP(T1, T2) DEF_VEC_SAT_U_SUB_ZIP(T1, T2) + #define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \ vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N) @@ -352,6 +365,11 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, unsigned limit) \ #define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \ vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N) +#define RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \ + vec_sat_u_sub_##T1##_##T2##_fmt_zi
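A self-contained scalar instantiation of the zip form for the uint16_t/uint32_t pair. One assumption in this sketch: p starts at x + limit, since the benchmark hands the macro a pointer one past the elements to process, so the backward walk stays within the array:

```c
#include <stdint.h>

/* The zip-benchmark saturating subtract: walk the array backwards,
   subtract b, and clamp at zero.  The a >= b ? a - b : 0 body is
   what gets recognized as .SAT_SUB (after truncating b).  */
void
vec_sat_u_sub_zip (uint16_t *x, uint32_t b, unsigned limit)
{
  uint32_t a;
  uint16_t *p = x + limit;  /* assumption: process elements x[0..limit-1] */
  do {
    a = *--p;
    *p = (uint16_t)(a >= b ? a - b : 0);
  } while (--limit);
}
```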
RE: [PATCH v3] Vect: Optimize truncation for .SAT_SUB operands
> OK. Committed, thanks Richard. Pan -Original Message- From: Richard Biener Sent: Wednesday, July 10, 2024 7:26 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: Re: [PATCH v3] Vect: Optimize truncation for .SAT_SUB operands On Tue, Jul 9, 2024 at 6:03 AM wrote: > > From: Pan Li > > To get better vectorized code of .SAT_SUB, we would like to avoid the > truncated operation for the assignment. For example, as below. > > unsigned int _1; > unsigned int _2; > unsigned short int _4; > _9 = (unsigned short int).SAT_SUB (_1, _2); > > If we make sure that the _1 is in the range of unsigned short int. Such > as a def similar to: > > _1 = (unsigned short int)_4; > > Then we can do the distribute the truncation operation to: > > _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2); > _9 = .SAT_SUB (_4, _3); > > Then, we can better vectorized code and avoid the unnecessary narrowing > stmt during vectorization with below stmt(s). > > _3 = .SAT_TRUNC(_2); // SI => HI > _9 = .SAT_SUB (_4, _3); > > Let's take RISC-V vector as example to tell the changes. For below > sample code: > > __attribute__((noinline)) > void test (uint16_t *x, unsigned b, unsigned n) > { > unsigned a = 0; > uint16_t *p = x; > > do { > a = *--p; > *p = (uint16_t)(a >= b ? a - b : 0); > } while (--n); > } > > Before this patch: > ... > .L3: > vle16.v v1,0(a3) > vrsub.vx v5,v2,t1 > mvt3,a4 > addw a4,a4,t5 > vrgather.vv v3,v1,v5 > vsetvli zero,zero,e32,m1,ta,ma > vzext.vf2 v1,v3 > vssubu.vx v1,v1,a1 > vsetvli zero,zero,e16,mf2,ta,ma > vncvt.x.x.w v1,v1 > vrgather.vv v3,v1,v5 > vse16.v v3,0(a3) > sub a3,a3,t4 > bgtu t6,a4,.L3 > ... > > After this patch: > test: > ... > .L3: > vle16.v v3,0(a3) > vrsub.vxv5,v2,a6 > mv a7,a4 > addwa4,a4,t3 > vrgather.vv v1,v3,v5 > vssubu.vv v1,v1,v6 > vrgather.vv v3,v1,v5 > vse16.v v3,0(a3) > sub a3,a3,t1 > bgtut4,a4,.L3 > ... 
> > The below test suites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. OK. Thanks, Richard. > gcc/ChangeLog: > > * tree-vect-patterns.cc (vect_recog_sat_sub_pattern_transform): > Add new func impl to perform the truncation distribution. > (vect_recog_sat_sub_pattern): Perform above optimize before > generate .SAT_SUB call. > > Signed-off-by: Pan Li > --- > gcc/tree-vect-patterns.cc | 65 +++ > 1 file changed, 65 insertions(+) > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index 86e893a1c43..4570c25b664 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -4566,6 +4566,70 @@ vect_recog_sat_add_pattern (vec_info *vinfo, > stmt_vec_info stmt_vinfo, >return NULL; > } > > +/* > + * Try to transform the truncation for .SAT_SUB pattern, mostly occurs in > + * the benchmark zip. Aka: > + * > + * unsigned int _1; > + * unsigned int _2; > + * unsigned short int _4; > + * _9 = (unsigned short int).SAT_SUB (_1, _2); > + * > + * if _1 is known to be in the range of unsigned short int. For example > + * there is a def _1 = (unsigned short int)_4. Then we can transform the > + * truncation to: > + * > + * _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2); > + * _9 = .SAT_SUB (_4, _3); > + * > + * Then, we can better vectorized code and avoid the unnecessary narrowing > + * stmt during vectorization with below stmt(s). 
> + * > + * _3 = .SAT_TRUNC(_2); // SI => HI > + * _9 = .SAT_SUB (_4, _3); > + */ > +static void > +vect_recog_sat_sub_pattern_transform (vec_info *vinfo, > + stmt_vec_info stmt_vinfo, > + tree lhs, tree *ops) > +{ > + tree otype = TREE_TYPE (lhs); > + tree itype = TREE_TYPE (ops[0]); > + unsigned itype_prec = TYPE_PRECISION (itype); > + unsigned otype_prec = TYPE_PRECISION (otype); > + > + if (types_compatible_p (otype, itype) || otype_prec >= itype_prec) > +return; > + > + tree v_otype = get_vectype_for_scalar_type (vinfo, otype); > + tree v_itype = get_vectype_for_scalar_type (vinfo, itype); > + tree_pair v_pair = tree_pair (v_oty
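A scalar sketch of the equivalence this transform relies on (function names are illustrative): when the minuend is known to fit the narrow type, truncating the wide saturating subtract gives the same result as a narrow saturating subtract against the saturate-truncated subtrahend.

```c
#include <stdint.h>

/* Before: subtract in the wide type, then truncate the result,
   i.e. (HI).SAT_SUB (SI, SI).  */
uint16_t
sub_then_trunc (uint16_t a, uint32_t b)
{
  uint32_t wide = a;  /* a is known to be in uint16_t range */
  return (uint16_t)(wide >= b ? wide - b : 0);
}

/* After: saturate-truncate b first, then subtract in the narrow type,
   i.e. .SAT_SUB (HI, .SAT_TRUNC (SI)).  */
uint16_t
trunc_then_sub (uint16_t a, uint32_t b)
{
  uint16_t nb = b < 65535u ? (uint16_t)b : 65535u;  /* .SAT_TRUNC (b) */
  return a >= nb ? a - nb : 0;
}
```

If b exceeds 65535 it saturates to 65535, which still dominates any uint16_t a, so both paths yield 0; otherwise nb == b and the subtractions coincide.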
RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call
> I think that's a bug. Do you say __builtin_add_overflow fails to promote > (constant) arguments? Thanks Richard. Not very sure which part results in the type mismatch; will take a look and keep you posted. Pan -Original Message- From: Richard Biener Sent: Wednesday, July 10, 2024 7:36 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call On Wed, Jul 10, 2024 at 11:28 AM wrote: > > From: Pan Li > > The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST. > For example _1 = .SAT_ADD (_2, 9) comes from below sample code. > > Form 3: > #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ > T __attribute__((noinline)) \ > vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ > {\ > unsigned i;\ > T ret; \ > for (i = 0; i < limit; i++)\ > {\ > out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ > }\ > } > > DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9) > > It will failure to vectorize as the vectorizable_call will check the > operands is type_compatiable but the imm will be treated as unsigned > SImode from the perspective of tree. I think that's a bug. Do you say __builtin_add_overflow fails to promote (constant) arguments? > Aka > > uint64_t _1; > uint64_t _2; > > _1 = .SAT_ADD (_2, 9); > > The _1 and _2 are unsigned DImode, which is different to imm 9 unsigned > SImode, and then result in vectorizable_call fails. This patch would > like to promote the imm operand to the operand type mode of _2 if and > only if there is no precision/data loss. Aka convert the imm 9 to the > DImode for above example. > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The rv64gcv build with glibc. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests.
> > gcc/ChangeLog: > > * tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add > new func impl to promote the imm tree to target type. > (vect_recog_sat_add_pattern): Peform the type promotion before > generate .SAT_ADD call. > > Signed-off-by: Pan Li > --- > gcc/tree-vect-patterns.cc | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index 86e893a1c43..e1013222b12 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -4527,6 +4527,20 @@ vect_recog_build_binary_gimple_stmt (vec_info *vinfo, > stmt_vec_info stmt_info, >return NULL; > } > > +static void > +vect_recog_promote_cst_to_unsigned (tree *op, tree type) > +{ > + if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type)) > +return; > + > + unsigned precision = TYPE_PRECISION (type); > + wide_int type_max = wi::mask (precision, false, precision); > + wide_int op_cst_val = wi::to_wide (*op, precision); > + > + if (wi::leu_p (op_cst_val, type_max)) > +*op = wide_int_to_tree (type, op_cst_val); > +} > + > /* > * Try to detect saturation add pattern (SAT_ADD), aka below gimple: > * _7 = _4 + _6; > @@ -4553,6 +4567,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, > stmt_vec_info stmt_vinfo, > >if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)) > { > + vect_recog_promote_cst_to_unsigned (&ops[0], TREE_TYPE (ops[1])); > + vect_recog_promote_cst_to_unsigned (&ops[1], TREE_TYPE (ops[0])); > + >gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo, > IFN_SAT_ADD, > type_out, > lhs, ops[0], > ops[1]); > -- > 2.34.1 >
RE: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC
> OK. Committed, thanks Richard. Pan -Original Message- From: Richard Biener Sent: Wednesday, July 10, 2024 5:24 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao Subject: Re: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC On Fri, Jul 5, 2024 at 2:48 PM wrote: > > From: Pan Li > > This patch would like to add form 2 support for the .SAT_TRUNC. Aka: > > Form 2: > #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ > NT __attribute__((noinline)) \ > sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ > {\ > bool overflow = x > (WT)(NT)(-1); \ > return overflow ? (NT)-1 : (NT)x; \ > } > > DEF_SAT_U_TRUC_FMT_2(uint32, uint64) > > Before this patch: >3 │ >4 │ __attribute__((noinline)) >5 │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x) >6 │ { >7 │ uint32_t _1; >8 │ long unsigned int _3; >9 │ > 10 │ ;; basic block 2, loop depth 0 > 11 │ ;;pred: ENTRY > 12 │ _3 = MIN_EXPR ; > 13 │ _1 = (uint32_t) _3; > 14 │ return _1; > 15 │ ;;succ: EXIT > 16 │ > 17 │ } > > After this patch: >3 │ >4 │ __attribute__((noinline)) >5 │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x) >6 │ { >7 │ uint32_t _1; >8 │ >9 │ ;; basic block 2, loop depth 0 > 10 │ ;;pred: ENTRY > 11 │ _1 = .SAT_TRUNC (x_2(D)); [tail call] > 12 │ return _1; > 13 │ ;;succ: EXIT > 14 │ > 15 │ } > > The below test suites are passed for this patch: > 1. The x86 bootstrap test. > 2. The x86 fully regression test. > 3. The rv64gcv fully regresssion test. OK. Thanks, Richard. > gcc/ChangeLog: > > * match.pd: Add form 2 for .SAT_TRUNC. > * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): > Add new case NOP_EXPR, and try to match SAT_TRUNC. 
>
> Signed-off-by: Pan Li
> ---
>  gcc/match.pd              | 17 -
>  gcc/tree-ssa-math-opts.cc |  4
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4edfa2ae2c9..3759c64d461 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3234,7 +3234,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>       && types_match (type, @0, @1
>
> -/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
> +/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
>     SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> @@ -3250,6 +3250,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    }
>    (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
>
> +/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> +   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +(match (unsigned_integer_sat_trunc @0)
> + (convert (min @0 INTEGER_CST@1))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  (with
> +   {
> +    unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> +    unsigned otype_precision = TYPE_PRECISION (type);
> +    wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> +    wide_int int_cst = wi::to_wide (@1, itype_precision);
> +   }
> +   (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false .  */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index a35caf5f058..ac86be8eb94 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6170,6 +6170,10 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
> 	  match_unsigned_saturation_sub (&gsi, as_a <gassign *> (stmt));
> 	  break;
>
> +	case NOP_EXPR:
> +	  match_unsigned_saturation_trunc (&gsi, as_a <gassign *> (stmt));
> +	  break;
> +
> 	default:;
> 	}
>     }
> --
> 2.34.1
>
RE: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]
Backported to gcc 14 already.

Pan

From: Li, Pan2
Sent: Wednesday, July 3, 2024 10:41 PM
To: Kito Cheng; juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

Committed, thanks Juzhe and Kito. Let's wait for a while before backporting to 14. I suspect there may be similar cases for other insns; I will double check and fix those first.

Pan

From: Kito Cheng <kito.ch...@gmail.com>
Sent: Wednesday, July 3, 2024 10:32 PM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

LGTM, and OK for GCC 14 as well. By the way, one idea is that the value could actually be passed via a GPR, i.e. fpr->gpr and then vmv.v.x, but that is not a blocking comment for this patch.

钟居哲 <juzhe.zh...@rivai.ai> wrote on Wednesday, July 3, 2024 at 22:18:
LGTM.

juzhe.zh...@rivai.ai

From: pan2.li <pan2...@intel.com>
Date: 2024-07-03 22:17
To: gcc-patches@gcc.gnu.org
CC: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp.gcc; Pan Li <pan2...@intel.com>
Subject: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

From: Pan Li <pan2...@intel.com>

According to the ISA, the zvfhmin sub extension should only contain
conversion insns.  Thus, the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.  This patch fixes that
by splitting the pred_broadcast define_insn into zvfhmin and zvfh parts.
Given the below example:

void test (_Float16 *dest, _Float16 bias)
{
  dest[0] = bias;
  dest[1] = bias;
}

when compiled with -march=rv64gcv_zfh_zvfhmin.

Before this patch:
test:
	vsetivli	zero,2,e16,mf4,ta,ma
	vfmv.v.f	v1,fa0    // should not leverage vfmv for zvfhmin
	vse16.v	v1,0(a0)
	ret

After this patch:
test:
	addi	sp,sp,-16
	fsh	fa0,14(sp)
	addi	a5,sp,14
	vsetivli	zero,2,e16,mf4,ta,ma
	vlse16.v	v1,0(a5),zero
	vse16.v	v1,0(a0)
	addi	sp,sp,16
	jr	ra

	PR target/115763

gcc/ChangeLog:

	* config/riscv/vector.md (*pred_broadcast<mode>): Split into
	zvfh and zvfhmin parts.
	(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
	(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
	* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
	* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
	* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
	* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
	* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
 gcc/config/riscv/vector.md                    | 49 +--
 .../gcc.target/riscv/rvv/base/pr115763-1.c    |  9
 .../gcc.target/riscv/rvv/base/pr115763-2.c    | 10
 .../gcc.target/riscv/rvv/base/scalar_move-5.c |  4 +-
 .../gcc.target/riscv/rvv/base/scalar_move-6.c |  6 +--
 .../gcc.target/riscv/rvv/base/scalar_move-7.c |  6 +--
 .../gcc.target/riscv/rvv/base/scalar_move-8.c |  6 +--
 7 files changed, 64 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-2.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fe18ee5b5f7..d9474262d54 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2080,31 +2080,50 @@ (define_insn_and_split "*pred_broadcast<mode>"
   [(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
    (set_attr "mode" "<MODE>")])

-(define_insn "*pred_broadcast<mode>"
-  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr, vr, vr, vr, vr, vr, vr, vr")
-	(if_then_else:V_VLSF_ZVFHMIN
+(define_insn "*pred_broadcast<mode>_zvfh"
+  [(set (match_operand:V_VLSF 0 "register_operand" "=vr, vr, vr, vr")
+	(if_then_else:V_VLSF
	  (unspec:<VM>
-	    [(match_operand:<VM> 1 "vector_broadcast_mask_operand" "Wc1,Wc1, vm, vm,Wc1,Wc1,Wb1,Wb1")
-	     (match_operand 4 "vector_length_operand" " rK, rK, rK, rK, rK, rK, rK, rK")
-	     (match_operand 5 "const_int_operand" " i, i, i, i, i, i, i, i")
-	     (match_operand 6 "const_int_operand" " i, i, i, i, i, i, i,