RE: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD

2024-09-24 Thread Li, Pan2
Thanks Robin. This depends on the [PATCH 1/2] match.pd change; I will commit it
after that one.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, September 24, 2024 8:40 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp 
Subject: Re: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector 
SAT_ADD

LGTM (in case you haven't committed it yet).

-- 
Regards
 Robin



RE: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

2024-09-24 Thread Li, Pan2
Thanks Richard for comments.

> Since you're creating the call with op_0/op_1 shouldn't you _only_ check 
> support
> for op_type operation and not lhs_type?

Yes, you are right. Checking the operand type makes much more sense to me. Let me
update this in v3.
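
A minimal sketch of what I have in mind for v3 (hypothetical code, assuming the
direct_internal_fn_supported_p overload that takes a single type; the actual v3
patch may look different):

  /* Check the ifn support against the operand type, not the lhs type,
     since the call is built from op_0/op_1.  */
  tree op_type = TREE_TYPE (op_0);
  if (!direct_internal_fn_supported_p (fn, op_type, OPTIMIZE_FOR_BOTH))
    return;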

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, September 24, 2024 3:42 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand 
checking

On Tue, Sep 24, 2024 at 9:13 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix the following ICE for -O2 -m32 of x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |  ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
> ???:0
> 0x27c4a3f internal_error(char const*, ...)
> ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
> ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
> ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> ???:0
>
> We allowed the operand convert when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern.  Aka,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB itself may also act on below scalar sample
> code too.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> Then there will be a problem when ia32 or -m32 is given when compiling.
> Because we only check the lhs (aka uint32_t) type is supported by ifn
> and missed the operand (aka uint64_t).  Mostly DImode is disabled for
> 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> PR middle-end/116814
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> ifn is_supported check for operand TREE type.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc | 23 +++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..dd6f29daa7c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index d61668aacfc..361761cedef 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call 
> (gimple_stmt_iterator *gsi, gphi *phi,
> internal_fn fn, tree lhs, tree op_0,
> tree op_1)
>  {
> -  if (direct_internal_fn_su

RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-24 Thread Li, Pan2
Got it, thanks a lot.

Pan

-Original Message-
From: Uros Bizjak  
Sent: Tuesday, September 24, 2024 3:29 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand 
promotion

On Tue, Sep 24, 2024 at 8:53 AM Li, Pan2  wrote:
>
> Got it and thanks, let me rerun to make sure it works well as expected.

For reference, this is documented in:

https://gcc.gnu.org/wiki/Testing_GCC
https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
https://gcc.gnu.org/install/test.html

Uros.


RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-23 Thread Li, Pan2
Got it and thanks, let me rerun to make sure it works well as expected.

Pan

-Original Message-
From: Uros Bizjak  
Sent: Tuesday, September 24, 2024 2:33 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand 
promotion

On Tue, Sep 24, 2024 at 8:24 AM Li, Pan2  wrote:
>
> Thanks Uros for comments.
>
> > This is not "target", but "middle-end" component. Even though the bug
> > is exposed on x86_64 target, the fix is in the middle-end code, not in
> > the target code.
>
> Sure, will rename to middle-end.
>
> > Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.
>
> Is there any suggestion to run the "ia32" test when configure gcc build?
> I first leverage ia32 but complain UNSUPPORTED for this case.

You can add the following to your testsuite run:

RUNTESTFLAGS="--target-board=unix\{,-m32\}"

e.g:

make -j N -k check RUNTESTFLAGS=...

(where N is the number of make threads)

You can also add "dg.exp" or "dg.exp=pr12345.c" (or any other exp file
or testcase name) to RUNTESTFLAGS to run only one exp file or a single
test.

Uros.

> Pan
>
> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, September 24, 2024 2:17 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
> tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching 
> operand promotion
>
> On Mon, Sep 23, 2024 at 4:58 PM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to fix the following ICE for -O2 -m32 of x86_64.
> >
> > during RTL pass: expand
> > JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> > int)':
> > JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> > expand_fn_using_insn, at internal-fn.cc:263
> > 3 | void DequeueEvent(unsigned frame) {
> >   |  ^~~~
> > 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> > diagnostic_metadata const*, diagnostic_option_id, char const*,
> > __va_list_tag (*) [1], diagnostic_t)
> > ???:0
> > 0x27c4a3f internal_error(char const*, ...)
> > ???:0
> > 0x27b3994 fancy_abort(char const*, int, char const*)
> > ???:0
> > 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> > ???:0
> > 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned 
> > int)
> > ???:0
> > 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> > ???:0
> >
> > We allowed the operand convert when matching SAT_SUB in match.pd, to support
> > the zip benchmark SAT_SUB pattern.  Aka,
> >
> > (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> > a = *--p;
> > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }
> >
> > The pattern match for SAT_SUB itself may also act on below scalar sample
> > code too.
> >
> > unsigned long long GetTimeFromFrames(int);
> > unsigned long long GetMicroSeconds();
> >
> > void DequeueEvent(unsigned frame) {
> >   long long frame_time = GetTimeFromFrames(frame);
> >   unsigned long long current_time = GetMicroSeconds();
> >   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> > }
> >
> > Aka:
> >
> > uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
> >
> > Then there will be a problem when ia32 or -m32 is given when compiling.
> > Because we only check the lhs (aka uint32_t) type is supported by ifn
> > and missed the operand (aka uint64_t).  Mostly DImode is disabled for
> > 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
> >
> > The below test suites are passed for this patch.
> > * The rv64gcv fully regression test.
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> >
> > PR target/116814
>
> This is not "target", but "middle-end" component. Even though the bug
> is exposed on x86_64 target, the fix is in the middle-end code, not in
> the targ

RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-23 Thread Li, Pan2
Thanks Uros for comments.

> This is not "target", but "middle-end" component. Even though the bug
> is exposed on x86_64 target, the fix is in the middle-end code, not in
> the target code.

Sure, will rename to middle-end.

> Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.

Is there any suggestion on how to run the "ia32" tests when configuring the GCC build?
I first leveraged ia32 but the case was reported as UNSUPPORTED.

Pan

-Original Message-
From: Uros Bizjak  
Sent: Tuesday, September 24, 2024 2:17 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand 
promotion

On Mon, Sep 23, 2024 at 4:58 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix the following ICE for -O2 -m32 of x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |  ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
> ???:0
> 0x27c4a3f internal_error(char const*, ...)
> ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
> ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
> ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> ???:0
>
> We allowed the operand convert when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern.  Aka,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB itself may also act on below scalar sample
> code too.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> Then there will be a problem when ia32 or -m32 is given when compiling.
> Because we only check the lhs (aka uint32_t) type is supported by ifn
> and missed the operand (aka uint64_t).  Mostly DImode is disabled for
> 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> PR target/116814

This is not "target", but "middle-end" component. Even though the bug
is exposed on x86_64 target, the fix is in the middle-end code, not in
the target code.

> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> ifn is_supported check for operand TREE type.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc | 23 +++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..8db5b020cfd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-options "-O2 -m32" } */

Please remove -m32 and use "{ dg-do compile { target ia32 } }" instead.

Uros,

> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEven

RE: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

2024-09-19 Thread Li, Pan2
Thanks Robin.

> I think those tests don't really need to check for vsetvl anyway.
It looks like scanning the asm for the RVV fixed-point insn is good enough for the
vector part, which is somewhat different from the scalar case. I will make the
change after this patch is pushed.
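
Something like the below is what I have in mind (just a sketch; the exact insn,
vsaddu.vv here, and the match count are assumptions that depend on the test):

  /* { dg-final { scan-assembler-times {vsaddu\.vv} 1 } } */

instead of additionally requiring a particular vsetvl sequence.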

Pan

-Original Message-
From: Robin Dapp  
Sent: Thursday, September 19, 2024 9:25 PM
To: gcc-patches 
Cc: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; Li, Pan2 ; rdapp@gmail.com
Subject: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

Hi,

this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow
SLP with SELECT_VL.

Assisted by sed and regtested on rv64gcv_zvfh_zvbb.

Rather lengthy but obvious, so going to commit after a while if the CI is
happy.  I think those tests don't really need to check for vsetvl anyway,
not all of them at least but I didn't change that for now.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Expect
length-controlled loop.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat

RE: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change

2024-09-19 Thread Li, Pan2
> So for the future I'd suggest you post those with a remark that you think
> they're obvious and going to commit in a day (or some other reasonable
> timeframe) if there are no complaints.

Oh, I see. Thanks Robin for the reminder.

That would be perfect. Do you have any best practice for the "obvious" remark?
Something like [NFC] in the subject to hint that there is no functional change, or
maybe [TBO] standing for to-be-obvious, or something like that.

Pan

-Original Message-
From: Robin Dapp  
Sent: Thursday, September 19, 2024 4:26 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp 
Subject: Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to 
middle-end change

> This patch would like fix the dump check times of vector SAT_ADD.  The
> middle-end change makes the match times from 2 to 4 times.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.

That's OK.  And I think testsuite fixup patches like this you can consider
"obvious" as long as you're sure the underlying reason is understood.
In particular as you have been working in the saturating space for a while now.

So for the future I'd suggest you post those with a remark that you think
they're obvious and going to commit in a day (or some other reasonable
timeframe) if there are no complaints.

-- 
Regards
 Robin



RE: [PATCH v5 2/4] Genmatch: Refine the gen_phi_on_cond by match_cond_with_binary_phi

2024-09-18 Thread Li, Pan2
Thanks Richard for comments.

Will commit it with that change if there are no surprises from the test suite.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, September 19, 2024 2:23 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v5 2/4] Genmatch: Refine the gen_phi_on_cond by 
match_cond_with_binary_phi

On Thu, Sep 19, 2024 at 6:11 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to leverage the match_cond_with_binary_phi to
> match the phi on cond, and get the true/false arg if matched.  This
> helps a lot to simplify the implementation of gen_phi_on_cond.
>
> Before this patch:
> basic_block _b1 = gimple_bb (_a1);
> if (gimple_phi_num_args (_a1) == 2)
>   {
> basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
> basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
> basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? 
> _pb_0_1 : _pb_1_1;
> basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
> gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
> if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
> && EDGE_COUNT (_other_db_1->succs) == 1
> && EDGE_PRED (_other_db_1, 0)->src == _db_1)
> {
>   tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>   tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>   tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
> _cond_lhs_1, _cond_rhs_1);
>   bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & 
> EDGE_TRUE_VALUE;
>   tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
>   tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> ...
>
> After this patch:
> basic_block _b1 = gimple_bb (_a1);
> tree _p1, _p2;
> gcond *_cond_1 = match_cond_with_binary_phi (_a1, &_p1, &_p2);
> if (_cond_1 && _p1 && _p2)

It should be enough to test _cond_1 for nullptr, at least I think the API should
guarantee that _p1 and _p2 are then set correctly.

OK with that change.

Richard.

>   {
> tree _cond_lhs_1 = gimple_cond_lhs (_cond_1);
> tree _cond_rhs_1 = gimple_cond_rhs (_cond_1);
> tree _p0 = build2 (gimple_cond_code (_cond_1), boolean_type_node, 
> _cond_lhs_1, _cond_rhs_1);
> ...
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * genmatch.cc (dt_operand::gen_phi_on_cond): Leverage the
> match_cond_with_binary_phi API to get cond gimple, true and
> false TREE arg.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/genmatch.cc | 67 +++--
>  1 file changed, 15 insertions(+), 52 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f1ff1d18265..149458fffe1 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -3516,79 +3516,42 @@ dt_operand::gen (FILE *f, int indent, bool gimple, 
> int depth)
>  void
>  dt_operand::gen_phi_on_cond (FILE *f, int indent, int depth)
>  {
> -  fprintf_indent (f, indent,
> -"basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
> -
> -  fprintf_indent (f, indent, "if (gimple_phi_num_args (_a%d) == 2)\n", 
> depth);
> +  char opname_0[20];
> +  char opname_1[20];
> +  char opname_2[20];
>
> -  indent += 2;
> -  fprintf_indent (f, indent, "{\n");
> -  indent += 2;
> +  gen_opname (opname_0, 0);
> +  gen_opname (opname_1, 1);
> +  gen_opname (opname_2, 2);
>
>fprintf_indent (f, indent,
> -"basic_block _pb_0_%d = EDGE_PRED (_b%d, 0)->src;\n", depth, depth);
> -  fprintf_indent (f, indent,
> -"basic_block _pb_1_%d = EDGE_PRED (_b%d, 1)->src;\n", depth, depth);
> -  fprintf_indent (f, indent,
> -"basic_block _db_%d = safe_dyn_cast  (*gsi_last_bb (_pb_0_%d)) 
> ? "
> -"_pb_0_%d : _pb_1_%d;\n", depth, depth, depth, depth);
> +"basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
> +  fprintf_indent (f, indent, "tree %s, %s;\n", opname_1, opname_2);
>fprintf_indent (f, indent,
> -"basic_block _other_db_%d = safe_dyn_cast  "
> -"(*gsi_last_bb (_pb_0_%d)) ? _pb_1_%d : _pb_0_%d;\n",
> -depth, depth, depth, depth);
> +"gcond *_cond_%d = match_cond_with_binary_phi (_a%d, &%s, &%s);\n",
> +depth, depth, opname_1, opname_2);
>
> -  fprintf_indent (f, indent,
> -"gcond *_ct_%d =

RE: [PATCH v1] RISC-V: Add testcases for form 2 of signed scalar SAT_ADD

2024-09-18 Thread Li, Pan2
Thanks Jeff for comments.

> Not particularly happy with the wall of expected assembly output, though 
> it at least tries to be generic in terms of registers and such.

Sort of, the asm check for ssadd is quite long, up to a point.

> So I'll ACK.  But

> I'd like us to start thinking about what is the most important part of 
> what's being tested rather than just matching a blob of assembly text.

> I believe (and please correct me if I'm wrong), what you're really 
> testing here is whether or not we're recognizing the saturation idiom in 
> gimple and then proceeding to generate code via the RISC-V backend's 
> define_expand patterns.

Yes, you are right. The tests cover 3 parts: the SAT IR in the expand dump, the
RISC-V backend code-gen, and the run test.
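
For the IFN part, a dump-based check could be a small sketch like the below (the
dump name and match count are assumptions, not taken from the actual tests):

  /* { dg-additional-options "-fdump-tree-optimized" } */
  /* { dg-final { scan-tree-dump-times { = \.SAT_ADD \(} 1 "optimized" } } */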

> So a better test would check for the IFN, probably in the .optimized or 
> .expand dump.  What I don't offhand see is a good way to test that we're 
> in one of the saturation related expanders.

> I wonder if we could emit debugging output as part of the expander. 
> It's reasonably likely that the dump_file and dump_flags are exposed as 
> global variables.  That in turn would allow us to emit messages into the 
> .expand dump file.  It doesn't have to be terribly complex.  Just a note 
> about which expander we're in and perhaps some info about the arguments. 
>   The point being to get away from using a scan-asm test for something 
> we can look at more directly if we're willing to add a bit more 
> information into the dump file.

I see, that would be an alternative approach for checking the backend code-gen.
It may make similar cases easier; I think we can give it a try in the near
future.
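
As a rough sketch of that idea (assuming dump_file and dump_flags are indeed
visible from the expander body; the message text is only illustrative):

  if (dump_file && (dump_flags & TDF_DETAILS))
    fprintf (dump_file, ";; Using the saturating add expander for .SAT_ADD\n");

and then the test could scan the .expand dump for that note instead of matching
the whole asm sequence.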

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, September 18, 2024 11:10 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Add testcases for form 2 of signed scalar 
SAT_ADD



On 9/12/24 8:14 PM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to add testcases of the signed scalar SAT_ADD
> for form 2.  Aka:
> 
> Form 2:
>#define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
>T __attribute__((noinline))  \
>sat_s_add_##T##_fmt_2 (T x, T y) \
>{\
>  T sum = (UT)x + (UT)y; \
>  if ((x ^ y) < 0 || (sum ^ x) >= 0) \
>return sum;  \
>  return x < 0 ? MIN : MAX;  \
>}
> 
> DEF_SAT_S_ADD_FMT_2 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
> 
> The below test are passed for this patch.
> * The rv64gcv fully regression test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_arith.h: Add test helper macros.
>   * gcc.target/riscv/sat_s_add-5.c: New test.
>   * gcc.target/riscv/sat_s_add-6.c: New test.
>   * gcc.target/riscv/sat_s_add-7.c: New test.
>   * gcc.target/riscv/sat_s_add-8.c: New test.
>   * gcc.target/riscv/sat_s_add-run-5.c: New test.
>   * gcc.target/riscv/sat_s_add-run-6.c: New test.
>   * gcc.target/riscv/sat_s_add-run-7.c: New test.
>   * gcc.target/riscv/sat_s_add-run-8.c: New test.
Not particularly happy with the wall of expected assembly output, though 
it at least tries to be generic in terms of registers and such.

So I'll ACK.  But

I'd like us to start thinking about what is the most important part of 
what's being tested rather than just matching a blob of assembly text.

I believe (and please correct me if I'm wrong), what you're really 
testing here is whether or not we're recognizing the saturation idiom in 
gimple and then proceeding to generate code via the RISC-V backend's 
define_expand patterns.

So a better test would check for the IFN, probably in the .optimized or 
.expand dump.  What I don't offhand see is a good way to test that we're 
in one of the saturation related expanders.

I wonder if we could emit debugging output as part of the expander. 
It's reasonably likely that the dump_file and dump_flags are exposed as 
global variables.  That in turn would allow us to emit messages into the 
.expand dump file.  It doesn't have to be terribly complex.  Just a note 
about which expander we're in and perhaps some info about the arguments. 
   The point being to get away from using a scan-asm test for something 
we can look at more directly if we're willing to add a bit more 
information into the dump file.

jeff
> 
> Signed-off-by: Pan Li 
> ---
>   gcc/testsuite/gcc.target/riscv/sat_arith.h| 13 
>   gcc/testsuite/gcc.target/riscv/sat_s_add-5.c  | 30 ++

RE: [PATCH v4 1/4] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-18 Thread Li, Pan2
Got it, thanks Richard, and will have a try in v5.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, September 18, 2024 8:06 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v4 1/4] Match: Add interface match_cond_with_binary_phi for 
true/false arg

On Wed, Sep 18, 2024 at 2:02 PM Richard Biener
 wrote:
>
> On Fri, Sep 13, 2024 at 12:42 AM  wrote:
> >
> > From: Pan Li 
> >
> > When matching the cond with 2 args phi node, we need to figure out
> > which arg of phi node comes from the true edge of cond block, as
> > well as the false edge.  This patch would like to add interface
> > to perform the action and return the true and false arg in TREE type.
> >
> > There will be some additional handling if one of the arg is INTEGER_CST.
> > Because the INTEGER_CST args may have no source block, thus its' edge
> > source points to the condition block.  See below example in line 31,
> > the 255 INTEGER_CST has block 2 as source.  Thus, we need to find
> > the non-INTEGER_CST (aka _1) to tell which one is the true/false edge.
> > For example, the _1(3) takes block 3 as source, which is the dest
> > of false edge of the condition block.
> >
> >4   │ __attribute__((noinline))
> >5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
> >6   │ {
> >7   │   unsigned char _1;
> >8   │   unsigned char _2;
> >9   │   uint8_t _3;
> >   10   │   __complex__ unsigned char _5;
> >   11   │
> >   12   │ ;;   basic block 2, loop depth 0
> >   13   │ ;;pred:   ENTRY
> >   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
> >   15   │   _2 = IMAGPART_EXPR <_5>;
> >   16   │   if (_2 != 0)
> >   17   │ goto ; [35.00%]
> >   18   │   else
> >   19   │ goto ; [65.00%]
> >   20   │ ;;succ:   3
> >   21   │ ;;4
> >   22   │
> >   23   │ ;;   basic block 3, loop depth 0
> >   24   │ ;;pred:   2
> >   25   │   _1 = REALPART_EXPR <_5>;
> >   26   │ ;;succ:   4
> >   27   │
> >   28   │ ;;   basic block 4, loop depth 0
> >   29   │ ;;pred:   2
> >   30   │ ;;3
> >   31   │   # _3 = PHI <255(2), _1(3)>
> >   32   │   return _3;
> >   33   │ ;;succ:   EXIT
> >   34   │
> >   35   │ }
> >
> > The below test suites are passed for this patch.
> > * The rv64gcv fully regression test.
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> >
> > gcc/ChangeLog:
> >
> > * gimple-match-head.cc (match_cond_with_binary_phi): Add new func
> > impl to match binary phi for true and false arg.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/gimple-match-head.cc | 118 +++
> >  1 file changed, 118 insertions(+)
> >
> > diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> > index 924d3f1e710..6e7a3a0d62e 100644
> > --- a/gcc/gimple-match-head.cc
> > +++ b/gcc/gimple-match-head.cc
> > @@ -375,3 +375,121 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree 
> > expr2, bool &wascmp, tree (*va
> >  return true;
> >return false;
> >  }
> > +
> > +/*
> > + * Return the relevant gcond * of the given phi, as well as the true
> > + * and false TREE args of the phi.  Or return NULL.
> > + *
> > + * If matched the gcond *, the output argument TREE true_arg and false_arg
> > + * will be updated to the relevant args of phi.
> > + *
> > + * If failed to match, NULL gcond * will be returned, as well as the output
> > + * arguments will be set to NULL_TREE.
> > + */
> > +
> > +static inline gcond *
> > +match_cond_with_binary_phi (gphi *phi, tree *true_arg, tree *false_arg)
> > +{
> > +  *true_arg = *false_arg = NULL_TREE;
> > +
> > +  if (gimple_phi_num_args (phi) != 2
> > +  || EDGE_COUNT (gimple_bb (phi)->preds) != 2)
> > +return NULL;
> > +
> > +  basic_block pred_0 = EDGE_PRED (gimple_bb (phi), 0)->src;
> > +  basic_block pred_1 = EDGE_PRED (gimple_bb (phi), 1)->src;
> > +  basic_block cond_block = NULL;
> > +
> > +  if ((EDGE_COUNT (pred_0->succs) == 2 && EDGE_COUNT (pred_1->succs) == 1)
> > + || (EDGE_COUNT (pred_0->succs) == 1 && EDGE_COUNT (pred_1->succs) == 
> > 2))
> > +{
> > +  /* For below control flow graph:
> > +   

RE: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-11 Thread Li, Pan2
Thanks Richard for comments.

> Yes, inline both CFG matches and unify them - there should be exactly
> three cases at
> the moment.  And "duplicate" computing the true/false arg into the
> respective cases
> since it's trivial which edge(s) to look at.

Got it, will resend the v4 series for this change.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, September 12, 2024 2:51 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for 
true/false arg

On Thu, Sep 12, 2024 at 3:41 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > why would arg_edge depend on whether t0 is INTEGER_CST or not?
> Because the edge->src of INTEGER_CST points to the cond block which cannot 
> match the
> edge->dest of the cond_block. For example as below, the first arg of PHI is 
> 255(2), which
> cannot match neither goto  nor goto .
>
> Thus, I need to take the second arg, aka _1(3) to match the edge->dest of 
> cond_block.
> Aka the phi arg edge->src == cond_block edge->dest. In below example,
> the goto matches _1(3) with false condition, and then I can locate the 
> edge from b2 -> b3.
>
> Or is there any better approach for this scenario?
>
>4   │ __attribute__((noinline))
>5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
>6   │ {
>7   │   unsigned char _1;
>8   │   unsigned char _2;
>9   │   uint8_t _3;
>   10   │   __complex__ unsigned char _5;
>   11   │
>   12   │ ;;   basic block 2, loop depth 0
>   13   │ ;;pred:   ENTRY
>   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
>   15   │   _2 = IMAGPART_EXPR <_5>;
>   16   │   if (_2 != 0)
>   17   │ goto ; [35.00%]
>   18   │   else
>   19   │ goto ; [65.00%]
>   20   │ ;;succ:   3
>   21   │ ;;4
>   22   │
>   23   │ ;;   basic block 3, loop depth 0
>   24   │ ;;pred:   2
>   25   │   _1 = REALPART_EXPR <_5>;
>   26   │ ;;succ:   4
>   27   │
>   28   │ ;;   basic block 4, loop depth 0
>   29   │ ;;pred:   2
>   30   │ ;;3
>   31   │   # _3 = PHI <255(2), _1(3)>
>   32   │   return _3;
>   33   │ ;;succ:   EXIT
>   34   │
>   35   │ }
>
> > Can you instead inline match_control_flow_graph_case_0 and _1 and do the
> > argument assignment within the three cases of CFGs we accept?  That
> > would be much easier to follow.
>
> To double confirm, are you suggest inline the cfg match for both the case_0 
> and case_1?
> That may make func body grows, and we may have more cases like case_2, 
> case_3... etc.
> If so, I will inline this to match_cond_with_binary_phi in v4.

Yes, inline both CFG matches and unify them - there should be exactly
three cases at
the moment.  And "duplicate" computing the true/false arg into the
respective cases
since it's trivial which edge(s) to look at.

This should make the code more maintainable and easier to understand.

I'm not sure what additional cases you are thinking of, more complex CFGs should
always mean more than a single controlling condition - I'm not sure we
want to go
the way to present those as cond1 | cond2.

Richard.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, September 11, 2024 9:39 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi 
> for true/false arg
>
> On Wed, Sep 11, 2024 at 8:31 AM  wrote:
> >
> > From: Pan Li 
> >
> > When matching the cond with 2 args phi node, we need to figure out
> > which arg of phi node comes from the true edge of cond block, as
> > well as the false edge.  This patch would like to add interface
> > to perform the action and return the true and false arg in TREE type.
> >
> > There will be some additional handling if one of the arg is INTEGER_CST.
> > Because the INTEGER_CST args may have no source block, thus its' edge
> > source points to the condition block.  See below example in line 31,
> > the 255 INTEGER_CST has block 2 as source.  Thus, we need to find
> > the non-INTEGER_CST (aka _1) to tell which one is the true/false edge.
> > For example, the _1(3) takes block 3 as source, which is the dest
> > of false edge of the condition block.
> >
> >4   │ __attribute__((noinline))
> >5   │ uint8_t sat_u_add_imm

RE: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused

2024-09-11 Thread Li, Pan2
Committed.

Pan

From: 钟居哲 
Sent: Thursday, September 12, 2024 12:40 PM
To: Bohan Lei ; gcc-patches 

Cc: Li, Pan2 
Subject: Re: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused

LGTM


juzhe.zh...@rivai.ai

From: Bohan Lei <garth...@linux.alibaba.com>
Date: 2024-09-12 12:38
To: gcc-patches <gcc-patches@gcc.gnu.org>
CC: juzhe.zhong <juzhe.zh...@rivai.ai>
Subject: [PATCH v2] RISC-V: Eliminate latter vsetvl when fused
Resent to cc Juzhe.

--

Hi all,

A simple assembly check has been added in this version. Previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html

Thanks,
Bohan

--

The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible."  This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  |  3 +++
.../riscv/rvv/vsetvl/vsetvl_bug-4.c   | 19 +++
2 files changed, 22 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ce831685439..030ffbe2ebb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2796,6 +2796,9 @@ pre_vsetvl::fuse_local_vsetvl_info ()
  curr_info.dump (dump_file, "");
}
  m_dem.merge (prev_info, curr_info);
+   if (!curr_info.vl_used_by_non_rvv_insn_p ()
+   && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+ m_delete_list.safe_push (curr_info);
  if (curr_info.get_read_vl_insn ())
prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
  if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
new file mode 100644
index 000..04a8ff2945a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns 
-fdump-rtl-vsetvl-details" } */
+
+#include 
+
+vuint16m1_t
+foo (vuint16m1_t a, vuint16m1_t b, size_t avl)
+{
+  size_t vl;
+  vuint16m1_t ret;
+  uint16_t c = __riscv_vmv_x_s_u16m1_u16(a);
+  vl = __riscv_vsetvl_e8mf2 (avl);
+  ret = __riscv_vadd_vx_u16m1 (a, c, avl);
+  ret = __riscv_vadd_vv_u16m1 (ret, a, vl);
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump "Eliminate insn" "vsetvl" } }  */
+/* { dg-final { scan-assembler-times {vsetvli} 2 } } */
--
2.17.1



RE: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-11 Thread Li, Pan2
Thanks Richard for comments.

> why would arg_edge depend on whether t0 is INTEGER_CST or not?
Because the edge->src of the INTEGER_CST arg points to the cond block itself, it
cannot match the edge->dest of the cond block. For example, as below, the first
arg of the PHI is 255(2), which matches neither destination of the two gotos.

Thus, I need to take the second arg, aka _1(3), to match the edge->dest of the
cond block.  Aka the phi arg edge->src == cond_block edge->dest.  In the example
below, the false edge of the condition matches _1(3), and then I can locate the
edge from bb2 -> bb3.

Or is there any better approach for this scenario?

   4   │ __attribute__((noinline))
   5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
   6   │ {
   7   │   unsigned char _1;
   8   │   unsigned char _2;
   9   │   uint8_t _3;
  10   │   __complex__ unsigned char _5;
  11   │
  12   │ ;;   basic block 2, loop depth 0
  13   │ ;;pred:   ENTRY
  14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
  15   │   _2 = IMAGPART_EXPR <_5>;
  16   │   if (_2 != 0)
  17   │ goto ; [35.00%]
  18   │   else
  19   │ goto ; [65.00%]
  20   │ ;;succ:   3
  21   │ ;;4
  22   │
  23   │ ;;   basic block 3, loop depth 0
  24   │ ;;pred:   2
  25   │   _1 = REALPART_EXPR <_5>;
  26   │ ;;succ:   4
  27   │
  28   │ ;;   basic block 4, loop depth 0
  29   │ ;;pred:   2
  30   │ ;;3
  31   │   # _3 = PHI <255(2), _1(3)>
  32   │   return _3;
  33   │ ;;succ:   EXIT
  34   │
  35   │ }

> Can you instead inline match_control_flow_graph_case_0 and _1 and do the
> argument assignment within the three cases of CFGs we accept?  That
> would be much easier to follow.

To double confirm, are you suggest inline the cfg match for both the case_0 and 
case_1?
That may make func body grows, and we may have more cases like case_2, 
case_3... etc.
If so, I will inline this to match_cond_with_binary_phi in v4.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, September 11, 2024 9:39 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for 
true/false arg

On Wed, Sep 11, 2024 at 8:31 AM  wrote:
>
> From: Pan Li 
>
> When matching the cond with 2 args phi node, we need to figure out
> which arg of phi node comes from the true edge of cond block, as
> well as the false edge.  This patch would like to add interface
> to perform the action and return the true and false arg in TREE type.
>
> There will be some additional handling if one of the arg is INTEGER_CST.
> Because the INTEGER_CST args may have no source block, thus its' edge
> source points to the condition block.  See below example in line 31,
> the 255 INTEGER_CST has block 2 as source.  Thus, we need to find
> the non-INTEGER_CST (aka _1) to tell which one is the true/false edge.
> For example, the _1(3) takes block 3 as source, which is the dest
> of false edge of the condition block.
>
>4   │ __attribute__((noinline))
>5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
>6   │ {
>7   │   unsigned char _1;
>8   │   unsigned char _2;
>9   │   uint8_t _3;
>   10   │   __complex__ unsigned char _5;
>   11   │
>   12   │ ;;   basic block 2, loop depth 0
>   13   │ ;;pred:   ENTRY
>   14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
>   15   │   _2 = IMAGPART_EXPR <_5>;
>   16   │   if (_2 != 0)
>   17   │ goto ; [35.00%]
>   18   │   else
>   19   │ goto ; [65.00%]
>   20   │ ;;succ:   3
>   21   │ ;;4
>   22   │
>   23   │ ;;   basic block 3, loop depth 0
>   24   │ ;;pred:   2
>   25   │   _1 = REALPART_EXPR <_5>;
>   26   │ ;;succ:   4
>   27   │
>   28   │ ;;   basic block 4, loop depth 0
>   29   │ ;;pred:   2
>   30   │ ;;3
>   31   │   # _3 = PHI <255(2), _1(3)>
>   32   │   return _3;
>   33   │ ;;succ:   EXIT
>   34   │
>   35   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * gimple-match-head.cc (match_cond_with_binary_phi): Add new func
> impl to match binary phi for true and false arg.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/gimple-match-head.cc | 60 
>  1 file changed, 60 insertions(+)
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index c51728ae742..64f4f28cc72 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match

RE: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

2024-09-11 Thread Li, Pan2
Committed, thanks Juzhe and garthlei.

Pan

From: 钟居哲 
Sent: Wednesday, September 11, 2024 7:36 PM
To: gcc-patches 
Cc: Li, Pan2 ; Robin Dapp ; jeffreyalaw 
; kito.cheng 
Subject: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

Hi, garthlei.
Thanks for fixing it.

I see, you are trying to fix this bug:

lui a5,%hi(.LANCHOR0)
addia5,a5,%lo(.LANCHOR0)
vsetivlizero,2,e8,mf8,ta,ma   ---> It should be a4, 2 instead 
of zero, 2
vle64.v v1,0(a5)
--- missing vsetvli a4, a4 here
sllia4,a4,1
vsetvli zero,a4,e32,m1,ta,ma
li  a2,-1
addia5,a5,16
vslide1down.vx  v1,v1,a2
vslide1down.vx  v1,v1,zero
vsetivlizero,2,e64,m1,ta,ma
vse64.v v1,0(a5)
ret

When I revisit the codes here:

m_vl = ::get_vl
...
update_avl -> "m_vl" variable is modified
...
using wrong m_vl in the following.

A dedicated temporary variable dest_vl looks reasonable here.

LGTM.

The RISC-V folks will commit this patch for you.
Thanks.

juzhe.zh...@rivai.ai

From: Li, Pan2 <pan2...@intel.com>
Date: 2024-09-11 19:29
To: juzhe.zh...@rivai.ai
Subject: FW: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl 
pass
FYI.

-Original Message-
From: garthlei <garth...@linux.alibaba.com>
Sent: Wednesday, September 11, 2024 5:10 PM
To: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH 1/2] RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

This patch fixes a bug in the current vsetvl pass.  The current pass uses
`m_vl` to determine whether the dest operand has been used by non-RVV
instructions.  However, `m_vl` may have been modified as a result of an
`update_avl` call, and thus would be no longer the dest operand of the
original instruction.  This can lead to incorrect vsetvl eliminations, as is
shown in the testcase.  In this patch, we create a `dest_vl` variable for
this scenerio.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc| 16 +++-
.../gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c  | 17 +
2 files changed, 28 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 017efa8bc17..ce831685439 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1002,6 +1002,9 @@ public:
   void parse_insn (insn_info *insn)
   {
+/* The VL dest of the insn */
+rtx dest_vl = NULL_RTX;
+
 m_insn = insn;
 m_bb = insn->bb ();
 /* Return if it is debug insn for the consistency with optimize == 0.  */
@@ -1035,7 +1038,10 @@ public:
 if (m_avl)
   {
if (vsetvl_insn_p (insn->rtl ()) || has_vlmax_avl ())
-   m_vl = ::get_vl (insn->rtl ());
+   {
+ m_vl = ::get_vl (insn->rtl ());
+ dest_vl = m_vl;
+   }
if (has_nonvlmax_reg_avl ())
  m_avl_def = find_access (insn->uses (), REGNO (m_avl))->def ();
@@ -1132,22 +1138,22 @@ public:
   }
 /* Determine if dest operand(vl) has been used by non-RVV instructions.  */
-if (has_vl ())
+if (dest_vl)
   {
const hash_set vl_uses
-   = get_all_real_uses (get_insn (), REGNO (get_vl ()));
+   = get_all_real_uses (get_insn (), REGNO (dest_vl));
for (use_info *use : vl_uses)
  {
gcc_assert (use->insn ()->is_real ());
rtx_insn *rinsn = use->insn ()->rtl ();
if (!has_vl_op (rinsn)
- || count_regno_occurrences (rinsn, REGNO (get_vl ())) != 1)
+ || count_regno_occurrences (rinsn, REGNO (dest_vl)) != 1)
  {
m_vl_used_by_non_rvv_insn = true;
break;
  }
rtx avl = ::get_avl (rinsn);
- if (!avl || !REG_P (avl) || REGNO (get_vl ()) != REGNO (avl))
+ if (!avl || !REG_P (avl) || REGNO (dest_vl) != REGNO (avl))
  {
m_vl_used_by_non_rvv_insn = true;
break;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
new file mode 100644
index 000..c155f5613d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32d -O2 -fdump-rtl-vsetvl-details" } 
*/
+
+#include 
+
+uint64_t a[2], b[2];
+
+void
+foo ()
+{
+  size_t vl = __riscv_vsetvl_e64m1 (2);
+  vuint64m1_t vx = __riscv_vle64_v_u64m1 (a, vl);
+  vx = __riscv_vslide1down_vx_u64m1 (vx, 0xull, vl);
+  __riscv_vse64_v_u64m1 (b, vx, vl);
+}
+
+/* { dg-final { scan-rtl-dump-not "Eliminate insn" "vsetvl" } }  */
--
2.17.1



RE: [PATCH v2 2/2] RISC-V: Fix ICE due to inconsistency of RVV intrinsic list in lto and cc1.

2024-09-10 Thread Li, Pan2
> * gcc.target/riscv/rvv/base/bug-11.c: New test.

It seems you missed this file in patch v2?

> +/* Helper for init_builtins in LTO.  */
> +static void
> +handle_pragma_vector_for_lto ()
> +{
> +  struct pragma_intrinsic_flags backup_flags;
> +
> +  riscv_pragma_intrinsic_flags_pollute (&backup_flags);
> +
> +  riscv_option_override ();
> +  init_adjust_machine_modes ();
> +
> +  register_builtin_types ();
> +
> +  handle_pragma_vector ();
> +  riscv_pragma_intrinsic_flags_restore (&backup_flags);
> +
> +  /* Re-initialize after the flags are restored.  */
> +  riscv_option_override ();
> +  init_adjust_machine_modes ();
> +}

This part looks almost the same as most of riscv_pragma_intrinsic except for
register_builtin_types ().
I wonder if we can wrap a helper to avoid the code duplication, and IMO the _lto
suffix should be
removed as the body of the function has nothing to do with LTO.

Otherwise no comments from my side, and I'd leave it to Kito or Juzhe.

Pan

-Original Message-
From: Jin Ma  
Sent: Tuesday, September 10, 2024 1:57 PM
To: gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; Li, Pan2 ; 
kito.ch...@gmail.com; richard.guent...@gmail.com; jinma.cont...@gmail.com; Jin 
Ma 
Subject: [PATCH v2 2/2] RISC-V: Fix ICE due to inconsistency of RVV intrinsic 
list in lto and cc1.

When we use -flto, the function list of rvv will be generated twice,
once in the cc1 phase and once in the lto phase. However, due to
the different generation methods, the two lists are different.

For example, when there is no zvfh or zvfhmin in the arch, the list is
generated by calling the function "riscv_pragma_intrinsic". Since the
TARGET_VECTOR_ELEN_FP_16 is enabled before rvv function generation,
a list of rvv functions related to float16 will be generated. In
the lto phase, the rvv function list is generated only by calling
the function "riscv_init_builtins", but the TARGET_VECTOR_ELEN_FP_16
is disabled, so that the float16-related rvv function list cannot
be generated like in cc1. This will cause confusion, resulting in
matching the wrong function due to inconsistent fcode in the lto
phase, eventually leading to ICE.

So I think we should be consistent with their generated lists, which
is exactly what this patch does.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (struct pragma_intrinsic_flags): Mov
to riscv-protos.h.
(riscv_pragma_intrinsic_flags_pollute): Mov to riscv-vector-builtins.c.
(riscv_pragma_intrinsic_flags_restore): Likewise.
(riscv_pragma_intrinsic): Likewise.
* config/riscv/riscv-protos.h (struct pragma_intrinsic_flags):
New.
(riscv_pragma_intrinsic_flags_restore): New.
(riscv_pragma_intrinsic_flags_pollute): New.
* config/riscv/riscv-vector-builtins.cc 
(riscv_pragma_intrinsic_flags_pollute): New.
(riscv_pragma_intrinsic_flags_restore): New.
(handle_pragma_vector_for_lto): New.
(init_builtins): Correct the processing logic for lto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-11.c: New test.
---
 gcc/config/riscv/riscv-c.cc   | 70 +--
 gcc/config/riscv/riscv-protos.h   | 13 
 gcc/config/riscv/riscv-vector-builtins.cc | 83 ++-
 3 files changed, 96 insertions(+), 70 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 71112d9c66d7..7037ecc1268a 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -34,72 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define builtin_define(TXT) cpp_define (pfile, TXT)
 
-struct pragma_intrinsic_flags
-{
-  int intrinsic_target_flags;
-
-  int intrinsic_riscv_vector_elen_flags;
-  int intrinsic_riscv_zvl_flags;
-  int intrinsic_riscv_zvb_subext;
-  int intrinsic_riscv_zvk_subext;
-};
-
-static void
-riscv_pragma_intrinsic_flags_pollute (struct pragma_intrinsic_flags *flags)
-{
-  flags->intrinsic_target_flags = target_flags;
-  flags->intrinsic_riscv_vector_elen_flags = riscv_vector_elen_flags;
-  flags->intrinsic_riscv_zvl_flags = riscv_zvl_flags;
-  flags->intrinsic_riscv_zvb_subext = riscv_zvb_subext;
-  flags->intrinsic_riscv_zvk_subext = riscv_zvk_subext;
-
-  target_flags = target_flags
-| MASK_VECTOR;
-
-  riscv_zvl_flags = riscv_zvl_flags
-| MASK_ZVL32B
-| MASK_ZVL64B
-| MASK_ZVL128B;
-
-  riscv_vector_elen_flags = riscv_vector_elen_flags
-| MASK_VECTOR_ELEN_32
-| MASK_VECTOR_ELEN_64
-| MASK_VECTOR_ELEN_FP_16
-| MASK_VECTOR_ELEN_FP_32
-| MASK_VECTOR_ELEN_FP_64;
-
-  riscv_zvb_subext = riscv_zvb_subext
-| MASK_ZVBB
-| MASK_ZVBC
-| MASK_ZVKB;
-
-  riscv_zvk_subext = riscv_zvk_subext
-| MASK_ZVKG
-| MASK_ZVKNED
-| MASK_ZVKNHA
-| MASK_ZVKNHB
-| MASK_ZVKSED
-| MASK_ZVKSH
-| MASK_ZVKN
-| MASK_Z

RE: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

2024-09-10 Thread Li, Pan2
Thanks a lot.

> It's just the number of patterns generated
> is 2^number-of-:c, so it's good to prune known unnecessary combinations.

I see, will make the changes as you suggest and commit it if there are no
surprises from the test suites.

> Yes, all commutative binary operators require matching types on their 
> operands.

Got it, will revisit the matching I added before for possible redundant 
checking.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, September 10, 2024 3:02 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

On Tue, Sep 10, 2024 at 1:05 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> >> +   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
> >> +(match (signed_integer_sat_add @0 @1)
> >> + (cond^ (ge (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
> >> + (nop_convert 
> >> @1
> >> +  (bit_not (bit_xor:c @0 @1)))
>
> >You only need one :c on either bit_xor.
>
> Sorry don't get the pointer here. I can understand swap @0 and @1 can also 
> acts on plus op.
> But the first xor with :c would like to allow (@0 @2) and (@2 @0).
>
> Or due to the commutative(xor), swap @0 and @1 also valid for (@1 @2) in the 
> first xor. But
> I failed to get the point how to make the @2 as first arg here.

Hmm, my logic was that there's a canonicalization rule for SSA
operands which is to put
SSA names with higher SSA_NAME_VERSION last.  That means we get the 2nd
bit_xor in a defined order, we don't know the @0 order wrt @2 so we
need to put :c on that.
That should get us all interesting cases plus making sure the @0s match up?

But maybe I'm missing something.  It's just the number of patterns generated
is 2^number-of-:c, so it's good to prune known unnecessary combinations.

> >> +   integer_zerop)
> >> +   @2
> >> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value))
>
> >> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
> >> +  && types_match (type, @0, @1
>
> >I think the types_match is redundant as you have the bit_xor combining both.
>
> Got it, does that indicates the bit_xor somehow has the similar type check 
> already? As well as other
> op like and/or ... etc.

Yes, all commutative binary operators require matching types on their operands.

>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, September 9, 2024 8:19 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer 
> .SAT_ADD
>
> On Tue, Sep 3, 2024 at 2:34 PM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support the form 2 of the scalar signed
> > integer .SAT_ADD.  Aka below example:
> >
> > Form 2:
> >   #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
> >   T __attribute__((noinline))  \
> >   sat_s_add_##T##_fmt_2 (T x, T y) \
> >   {\
> > T sum = (UT)x + (UT)y; \
> >\
> > if ((x ^ y) < 0 || (sum ^ x) >= 0) \
> >   return sum;  \
> >\
> > return x < 0 ? MIN : MAX;  \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)
> >
> > We can tell the difference before and after this patch if backend
> > implemented the ssadd3 pattern similar as below.
> >
> > Before this patch:
> >4   │ __attribute__((noinline))
> >5   │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y)
> >6   │ {
> >7   │   int8_t sum;
> >8   │   unsigned char x.0_1;
> >9   │   unsigned char y.1_2;
> >   10   │   unsigned char _3;
> >   11   │   signed char _4;
> >   12   │   signed char _5;
> >   13   │   int8_t _6;
> >   14   │   _Bool _11;
> >   15   │   signed char _12;
> >   16   │   signed char _13;
> >   17   │   signed char _14;
> >   18   │   signed char _22;
> >   19   │   signed char _23;
> >   20   │
> >   21   │ ;;   basic block 2, loop depth 0
> >   22   │ ;;pred:   ENTRY
> >   2

RE: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

2024-09-09 Thread Li, Pan2
Thanks Richard for comments.

>> +   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
>> +(match (signed_integer_sat_add @0 @1)
>> + (cond^ (ge (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
>> + (nop_convert @1
>> +  (bit_not (bit_xor:c @0 @1)))

>You only need one :c on either bit_xor.

Sorry, I don't quite get the point here. I can understand that swapping @0 and
@1 also acts on the plus op.
But the :c on the first xor is meant to allow both (@0 @2) and (@2 @0).

Or, due to the commutativity of xor, swapping @0 and @1 is also valid for
(@1 @2) in the first xor. But
I fail to see how @2 could be made the first argument here.

>> +   integer_zerop)
>> +   @2
>> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value))

>> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
>> +  && types_match (type, @0, @1

>I think the types_match is redundant as you have the bit_xor combining both.

Got it. Does that indicate the bit_xor already has a similar type check, as do
other ops like and/or ... etc.?

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, September 9, 2024 8:19 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

On Tue, Sep 3, 2024 at 2:34 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 2 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 2:
>   #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
>   T __attribute__((noinline))  \
>   sat_s_add_##T##_fmt_2 (T x, T y) \
>   {\
> T sum = (UT)x + (UT)y; \
>\
> if ((x ^ y) < 0 || (sum ^ x) >= 0) \
>   return sum;  \
>\
> return x < 0 ? MIN : MAX;  \
>   }
>
> DEF_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)
>
> We can tell the difference before and after this patch if backend
> implemented the ssadd3 pattern similar as below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y)
>6   │ {
>7   │   int8_t sum;
>8   │   unsigned char x.0_1;
>9   │   unsigned char y.1_2;
>   10   │   unsigned char _3;
>   11   │   signed char _4;
>   12   │   signed char _5;
>   13   │   int8_t _6;
>   14   │   _Bool _11;
>   15   │   signed char _12;
>   16   │   signed char _13;
>   17   │   signed char _14;
>   18   │   signed char _22;
>   19   │   signed char _23;
>   20   │
>   21   │ ;;   basic block 2, loop depth 0
>   22   │ ;;pred:   ENTRY
>   23   │   x.0_1 = (unsigned char) x_7(D);
>   24   │   y.1_2 = (unsigned char) y_8(D);
>   25   │   _3 = x.0_1 + y.1_2;
>   26   │   sum_9 = (int8_t) _3;
>   27   │   _4 = x_7(D) ^ y_8(D);
>   28   │   _5 = x_7(D) ^ sum_9;
>   29   │   _23 = ~_4;
>   30   │   _22 = _5 & _23;
>   31   │   if (_22 >= 0)
>   32   │ goto ; [42.57%]
>   33   │   else
>   34   │ goto ; [57.43%]
>   35   │ ;;succ:   4
>   36   │ ;;3
>   37   │
>   38   │ ;;   basic block 3, loop depth 0
>   39   │ ;;pred:   2
>   40   │   _11 = x_7(D) < 0;
>   41   │   _12 = (signed char) _11;
>   42   │   _13 = -_12;
>   43   │   _14 = _13 ^ 127;
>   44   │ ;;succ:   4
>   45   │
>   46   │ ;;   basic block 4, loop depth 0
>   47   │ ;;pred:   2
>   48   │ ;;3
>   49   │   # _6 = PHI 
>   50   │   return _6;
>   51   │ ;;succ:   EXIT
>   52   │
>   53   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y)
>6   │ {
>7   │   int8_t _6;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _6 = .SAT_ADD (x_7(D), y_8(D)); [tail call]
>   12   │   return _6;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add the form 2 of signed .SAT_ADD matching.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff 

RE: [PATCH v2 1/2] Genmatch: Support control flow graph case 1 for phi on condition

2024-09-09 Thread Li, Pan2
Thanks Richard for comments.

> Sorry to spoil this again, but can you instead create an interface like

Never mind, let me update it.

> gcond *
> match_cond_with_phi (gphi *phi, tree *true_arg, tree *false_arg);

> That would from a PHI node match up the controlling condition and
> initialize {true,false}_arg with the PHI args that match the conditions
> true/false case?

> I also think for the diamond case you fail to identify the appropriate
> true/false PHI argument since both incoming edges are not from the
> condition block they won't have EDGE_{TRUE,FALSE}_VALUE set.

Sure thing. I also noticed that in form 4 both edges of the PHI are false, thus
I am working on another patch, similar to extract_true_false_args_from_binary_phi,
to take care of this. Let me append that patch to the v3 series.
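
For reference, a rough sketch of what such an interface could look like (an
illustrative sketch only, not the final patch: it handles just the plain
two-predecessor case, skips the extra CFG constraints the generated matchers
check, and does not yet cover the forwarder/diamond shapes discussed above):

static gcond *
match_cond_with_phi (gphi *phi, tree *true_arg, tree *false_arg)
{
  if (gimple_phi_num_args (phi) != 2)
    return NULL;

  basic_block bb = gimple_bb (phi);
  basic_block pred_0 = EDGE_PRED (bb, 0)->src;
  basic_block pred_1 = EDGE_PRED (bb, 1)->src;
  gcond *cond_0 = safe_dyn_cast <gcond *> (*gsi_last_bb (pred_0));
  gcond *cond_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (pred_1));
  gcond *cond = cond_0 ? cond_0 : cond_1;

  if (!cond)
    return NULL;

  /* The PHI argument coming in on the EDGE_TRUE_VALUE edge is the "true"
     value.  This is only reliable when one incoming edge really is the
     true/false edge of the condition, i.e. not in the diamond case
     mentioned above.  */
  bool arg_0_is_true = gimple_phi_arg_edge (phi, 0)->flags & EDGE_TRUE_VALUE;
  *true_arg = gimple_phi_arg_def (phi, arg_0_is_true ? 0 : 1);
  *false_arg = gimple_phi_arg_def (phi, arg_0_is_true ? 1 : 0);

  return cond;
}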

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, September 9, 2024 8:27 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2 1/2] Genmatch: Support control flow graph case 1 for phi 
on condition

On Thu, Sep 5, 2024 at 2:01 PM  wrote:
>
> From: Pan Li 
>
> The gen_phi_on_cond can only support below control flow for cond
> from day 1.  Aka:
>
> +--+
> | def  |
> | ...  |   +-+
> | cond |-->| def |
> +--+   | ... |
>|   +-+
>|  |
>v  |
> +-+   |
> | PHI |<--+
> +-+
>
> Unfortunately, there will be more scenarios of control flow on PHI.
> For example as below:
>
> T __attribute__((noinline))\
> sat_s_add_##T##_fmt_3 (T x, T y)   \
> {  \
>   T sum;   \
>   bool overflow = __builtin_add_overflow (x, y, &sum); \
>   return overflow ? x < 0 ? MIN : MAX : sum;   \
> }
>
> DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)
>
> With expanded RTL like below.
>3   │
>4   │ __attribute__((noinline))
>5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
>6   │ {
>7   │   signed char _1;
>8   │   signed char _2;
>9   │   int8_t _3;
>   10   │   __complex__ signed char _6;
>   11   │   _Bool _8;
>   12   │   signed char _9;
>   13   │   signed char _10;
>   14   │   signed char _11;
>   15   │
>   16   │ ;;   basic block 2, loop depth 0
>   17   │ ;;pred:   ENTRY
>   18   │   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   19   │   _2 = IMAGPART_EXPR <_6>;
>   20   │   if (_2 != 0)
>   21   │ goto ; [50.00%]
>   22   │   else
>   23   │ goto ; [50.00%]
>   24   │ ;;succ:   4
>   25   │ ;;3
>   26   │
>   27   │ ;;   basic block 3, loop depth 0
>   28   │ ;;pred:   2
>   29   │   _1 = REALPART_EXPR <_6>;
>   30   │   goto ; [100.00%]
>   31   │ ;;succ:   5
>   32   │
>   33   │ ;;   basic block 4, loop depth 0
>   34   │ ;;pred:   2
>   35   │   _8 = x_4(D) < 0;
>   36   │   _9 = (signed char) _8;
>   37   │   _10 = -_9;
>   38   │   _11 = _10 ^ 127;
>   39   │ ;;succ:   5
>   40   │
>   41   │ ;;   basic block 5, loop depth 0
>   42   │ ;;pred:   3
>   43   │ ;;4
>   44   │   # _3 = PHI <_1(3), _11(4)>
>   45   │   return _3;
>   46   │ ;;succ:   EXIT
>   47   │
>   48   │ }
>
> The above code will have below control flow which is not supported by
> the gen_phi_on_cond.
>
> +--+
> | def  |
> | ...  |   +-+
> | cond |-->| def |
> +--+   | ... |
>|   +-+
>|  |
>v  |
> +-+   |
> | def |   |
> | ... |   |
> +-+   |
>|  |
>|  |
>v  |
> +-+   |
> | PHI |<--+
> +-+
>
> This patch would like to add support above control flow for the
> gen_phi_on_cond.  The generated match code looks like below.
>
> Before this patch:
> basic_block _b1 = gimple_bb (_a1);
> if (gimple_phi_num_args (_a1) == 2)
>   {
> basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
> basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
> basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? 
> _pb_0_1 : _pb_1_1;
> basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
> gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
> if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>   &a

RE: [PATCH] RISC-V: Fix ICE for rvv in lto

2024-09-09 Thread Li, Pan2
> Any comments on this patch?

I may need some time to go through all the details (PS: sorry, I cannot approve
patches, so I'll leave that to juzhe or kito).
Thanks a lot for fixing this.

Pan

-Original Message-
From: Jin Ma  
Sent: Monday, September 9, 2024 6:30 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jinma.cont...@gmail.com
Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto

> I see, I can reproduce this when build "-march=rv64gcv -mabi=lp64d -flto -O0 
> test.c -o test.elf".
> 
> #include 
> 
> int
> main ()
> {
>   size_t vl = 8;
>   vint32m1_t vs1 = {};
>   vint32m1_t vs2 = {};
>   vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> 
>   return (int)&vd;
> }
> 
> Pan

Hi, Pan

Any comments on this patch?

I think this patch is quite important, because RVV is completely unusable with
LTO at present. In fact, I discovered this ICE while trying to compile some
computational libraries using LTO. Unfortunately, none of those libraries
currently compile properly.

BR
Jin


RE: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD

2024-09-08 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Friday, August 30, 2024 6:16 PM
To: gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 

Subject: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD

From: Pan Li 

This patch would like to support the vector signed ssadd pattern
for the RISC-V backend.  Aka

Form 1:
  #define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX)   \
  void __attribute__((noinline)) \
  vec_sat_s_add_##T##_fmt_1 (T *out, T *x, T *y, unsigned n) \
  {  \
for (unsigned i = 0; i < n; i++) \
  {  \
T sum = (UT)x[i] + (UT)y[i]; \
out[i] = (x[i] ^ y[i]) < 0   \
  ? sum  \
  : (sum ^ x[i]) >= 0\
? sum\
: x[i] < 0 ? MIN : MAX;  \
  }  \
  }

DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)

If the backend implemented the vector mode of ssadd, we will see IR diff
similar as below:

Before this patch:
 108   │   _114 = .SELECT_VL (ivtmp_112, POLY_INT_CST [2, 2]);
 109   │   ivtmp_77 = _114 * 8;
 110   │   vect__4.9_80 = .MASK_LEN_LOAD (vectp_x.7_78, 64B, { -1, ...  }, 
_114, 0);
 111   │   vect__5.10_81 = VIEW_CONVERT_EXPR(vect__4.9_80);
 112   │   vect__7.13_85 = .MASK_LEN_LOAD (vectp_y.11_83, 64B, { -1, ...  }, 
_114, 0);
 113   │   vect__8.14_86 = VIEW_CONVERT_EXPR(vect__7.13_85);
 114   │   vect__9.15_87 = vect__5.10_81 + vect__8.14_86;
 115   │   vect_sum_20.16_88 = VIEW_CONVERT_EXPR(vect__9.15_87);
 116   │   vect__10.17_89 = vect__4.9_80 ^ vect__7.13_85;
 117   │   vect__11.18_90 = vect__4.9_80 ^ vect_sum_20.16_88;
 118   │   mask__46.19_92 = vect__10.17_89 >= { 0, ... };
 119   │   _36 = vect__4.9_80 >> 63;
 120   │   mask__44.26_104 = vect__11.18_90 < { 0, ... };
 121   │   mask__43.27_105 = mask__46.19_92 & mask__44.26_104;
 122   │   _115 = .COND_XOR (mask__43.27_105, _36, { 9223372036854775807, ... 
}, vect_sum_20.16_88);
 123   │   .MASK_LEN_STORE (vectp_out.29_108, 64B, { -1, ... }, _114, 0, _115);
 124   │   vectp_x.7_79 = vectp_x.7_78 + ivtmp_77;
 125   │   vectp_y.11_84 = vectp_y.11_83 + ivtmp_77;
 126   │   vectp_out.29_109 = vectp_out.29_108 + ivtmp_77;
 127   │   ivtmp_113 = ivtmp_112 - _114;

After this patch:
  94   │   # vectp_x.7_82 = PHI 
  95   │   # vectp_y.10_86 = PHI 
  96   │   # vectp_out.14_91 = PHI 
  97   │   # ivtmp_95 = PHI 
  98   │   _97 = .SELECT_VL (ivtmp_95, POLY_INT_CST [2, 2]);
  99   │   ivtmp_81 = _97 * 8;
 100   │   vect__4.9_84 = .MASK_LEN_LOAD (vectp_x.7_82, 64B, { -1, ...  }, _97, 
0);
 101   │   vect__7.12_88 = .MASK_LEN_LOAD (vectp_y.10_86, 64B, { -1, ...  }, 
_97, 0);
 102   │   vect_patt_40.13_89 = .SAT_ADD (vect__4.9_84, vect__7.12_88);
 103   │   .MASK_LEN_STORE (vectp_out.14_91, 64B, { -1, ... }, _97, 0, 
vect_patt_40.13_89);
 104   │   vectp_x.7_83 = vectp_x.7_82 + ivtmp_81;
 105   │   vectp_y.10_87 = vectp_y.10_86 + ivtmp_81;
 106   │   vectp_out.14_92 = vectp_out.14_91 + ivtmp_81;
 107   │   ivtmp_96 = ivtmp_95 - _97;

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

gcc/ChangeLog:

* match.pd: Add case 2 for the signed .SAT_ADD consumed by
vect pattern.
* tree-vect-patterns.cc (gimple_signed_integer_sat_add): Add new
matching func decl for signed .SAT_ADD.
(vect_recog_sat_add_pattern): Add signed .SAT_ADD pattern match.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 17 +
 gcc/tree-vect-patterns.cc |  5 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index be211535a49..578c9dd5b77 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3207,6 +3207,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
+/* Signed saturation add, case 2:
+   T sum = (T)((UT)X + (UT)Y)
+   SAT_S_ADD = (X ^ Y) < 0 && (X ^ sum) >= 0 ? (-(T)(X < 0) ^ MAX) : sum;
+
+   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (bit_and:c (lt (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
+ (nop_convert @1
+  integer_zerop)
+  (ge (bit_xor:c @0 @1) integer_zerop))

RE: [PATCH] RISC-V: Fix ICE for rvv in lto

2024-09-08 Thread Li, Pan2
I see, I can reproduce this when building with "-march=rv64gcv -mabi=lp64d
-flto -O0 test.c -o test.elf".

#include 

int
main ()
{
  size_t vl = 8;
  vint32m1_t vs1 = {};
  vint32m1_t vs2 = {};
  vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);

  return (int)&vd;
}

Pan

-Original Message-
From: Jin Ma  
Sent: Sunday, September 8, 2024 1:15 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jinma.cont...@gmail.com
Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto

> > #include 
> > 
> > vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl) 
> > {
> >   return __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> > }
> 
> To double confirm, you mean "riscv64-linux-gnu-gcc-14 -march=rv64gcv 
> -mabi=lp64d -flto -O0 tmp.c -c -S -o -" with above is able to reproduce this 
> ICE?
> 
> Pan

Not exactly: please don't add "-S" or "-c"; let the compilation go all the way
to the linker and try to generate the binary.
The expected result of the compilation would be an error that the main function
cannot be found, but unfortunately an ICE appears instead.

By the way, the gcc-14 in my environment is built from releases/gcc-14; I
didn't download any precompiled gcc.

Of course, it is also possible that my local environment is broken, and I will 
check it again.

BR
Jin


RE: [PATCH] RISC-V: Fix ICE for rvv in lto

2024-09-07 Thread Li, Pan2
> #include 
> 
> vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl) 
> {
>   return __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> }

To double-check: you mean that "riscv64-linux-gnu-gcc-14 -march=rv64gcv
-mabi=lp64d -flto -O0 tmp.c -c -S -o -" with the above code is able to
reproduce this ICE?

Pan

-Original Message-
From: Jin Ma  
Sent: Saturday, September 7, 2024 5:43 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jinma.cont...@gmail.com
Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto

> > +/* Test that we do not have ice when compile */
> > +
> > +/* { dg-do run } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=zvl -flto 
> > -O2 -fno-checking" } */
> > +
> > +#include 
> > +
> > +int
> > +main ()
> > +{
> > +  size_t vl = 8;
> > +  vint32m1_t vs1 = {};
> > +  vint32m1_t vs2 = {};
> > +
> > +  __volatile__ vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> > +
> > +  return 0;
> > +}
> 
> Interesting, do we still have ice when there is no __voltaile__ for vd? As 
> well as gcc-14 branch.
> Because it is quite a common case that should be covered by test already.
> 
> Pan

Yes, I am also surprised that this kind of ICE appears. It really should be
covered by test cases. But in fact, if we do not have zvfh or zvfhmin in the
arch string, RVV cannot be used with LTO.

This has nothing to do with "__volatile__". "__volatile__" is only there so
that the test case is compiled all the way through and not optimized away.

In fact, a simple case can reproduce the ICE, on both gcc-14 and master, for
example:

#include 

vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl) 
{
  return __riscv_vadd_vv_i32m1(vs1, vs2, vl);
}

If we compile this case with the option " -march=rv64gcv -mabi=lp64d  -flto 
-O0", we will
get the following error:

during RTL pass: expand
../test.c: In function 'foo':
../test.c:5:10: internal compiler error: tree check: expected tree that 
contains 'typed' structure, have 'ggc_freed' in function_returns_void_p, at 
config/riscv/riscv-vector-builtins.h:456
5 |   return __riscv_vadd_vv_i32m1(vs1, vs2, vl);
  |  ^
0x4081948 internal_error(char const*, ...)
/iothome/jin.ma/code/master/gcc/gcc/diagnostic-global-context.cc:492
0x1dc584d tree_contains_struct_check_failed(tree_node const*, 
tree_node_structure_enum, char const*, int, char const*)
/iothome/jin.ma/code/master/gcc/gcc/tree.cc:9177
0x10d8230 contains_struct_check(tree_node*, tree_node_structure_enum, char 
const*, int, char const*)
/iothome/jin.ma/code/master/gcc/gcc/tree.h:3779
0x2078f0c riscv_vector::function_call_info::function_returns_void_p()

/iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.h:456
0x2074f54 
riscv_vector::function_expander::function_expander(riscv_vector::function_instance
 const&, tree_node*, tree_node*, rtx_def*)

/iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.cc:3920
0x20787b8 riscv_vector::expand_builtin(unsigned int, tree_node*, rtx_def*)

/iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.cc:4775
0x2029b60 riscv_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, 
int)
/iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-builtins.cc:433
0x1167cb7 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
/iothome/jin.ma/code/master/gcc/gcc/builtins.cc:7763
0x137e5d2 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/iothome/jin.ma/code/master/gcc/gcc/expr.cc:12390
0x1370068 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, 
rtx_def**, bool)
/iothome/jin.ma/code/master/gcc/gcc/expr.cc:9473
0x136434a store_expr(tree_node*, rtx_def*, int, bool, bool)
/iothome/jin.ma/code/master/gcc/gcc/expr.cc:6766
0x13629e3 expand_assignment(tree_node*, tree_node*, bool)
/iothome/jin.ma/code/master/gcc/gcc/expr.cc:6487
0x11a8419 expand_call_stmt
/iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:2893
0x11ac48e expand_gimple_stmt_1
/iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:3962
0x11acaad expand_gimple_stmt
/iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:4104
0x11b55a1 expand_gimple_basic_block
/iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:6160
0x11b7b96 execute
/iothome/jin.ma/code/master/gcc/gcc/cfgexpand.cc:6899
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: riscv64-unknown-linux-gnu-gcc returned 1 exit status
compilation terminated.
/mnt

RE: [PATCH] RISC-V: Fix ICE for rvv in lto

2024-09-06 Thread Li, Pan2
> +/* Test that we do not have ice when compile */
> +
> +/* { dg-do run } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=zvl -flto -O2 
> -fno-checking" } */
> +
> +#include 
> +
> +int
> +main ()
> +{
> +  size_t vl = 8;
> +  vint32m1_t vs1 = {};
> +  vint32m1_t vs2 = {};
> +
> +  __volatile__ vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> +
> +  return 0;
> +}

Interesting, do we still have the ICE when there is no __volatile__ on vd? And
how about the gcc-14 branch?
Because it is quite a common case that should already be covered by the tests.

Pan

-Original Message-
From: Jin Ma  
Sent: Saturday, September 7, 2024 1:31 AM
To: gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; Li, Pan2 ; 
kito.ch...@gmail.com; jinma.cont...@gmail.com; Jin Ma 
Subject: [PATCH] RISC-V: Fix ICE for rvv in lto

When we use -flto, the list of RVV built-in functions is generated twice,
once in the cc1 phase and once in the lto phase. However, due to
the different generation methods, the two lists differ.

For example, when there is no zvfh or zvfhmin in the arch string, the list
is generated in cc1 by calling the function "riscv_pragma_intrinsic".  Since
TARGET_VECTOR_ELEN_FP_16 is enabled before the RVV functions are generated,
a list of RVV functions related to float16 is produced.  In
the lto phase, the RVV function list is generated only by calling
the function "riscv_init_builtins", where TARGET_VECTOR_ELEN_FP_16
is disabled, so the float16-related RVV functions cannot
be generated as in cc1.  This causes confusion, resulting in
matching to the wrong function due to inconsistent fcodes in the lto
phase, eventually leading to an ICE.

So I think the two generated lists should be kept consistent, which
is exactly what this patch does.

But there is still a problem here. If we use "-fchecking", we still
get an ICE. This is because in the lto phase, after the RVV function
list is generated and before expand_builtin runs, ggc_grow is
called to clean up memory, resulting in
"(* registered_functions)[code]->decl" being cleaned up to
", and finally ICE".

I think this is wrong and needs to be fixed; maybe we shouldn't
use "ggc_alloc ()", or is there another, better
way to implement it?

I'm trying to fix it here. Any comments?

gcc/ChangeLog:

* config/riscv/riscv-c.cc (struct pragma_intrinsic_flags): Move
to riscv-protos.h.
(riscv_pragma_intrinsic_flags_pollute): Move to riscv-vector-builtins.cc.
(riscv_pragma_intrinsic_flags_restore): Likewise.
(riscv_pragma_intrinsic): Likewise.
* config/riscv/riscv-protos.h (struct pragma_intrinsic_flags):
New.
(riscv_pragma_intrinsic_flags_restore): New.
(riscv_pragma_intrinsic_flags_pollute): New.
* config/riscv/riscv-vector-builtins.cc 
(riscv_pragma_intrinsic_flags_pollute): New.
(riscv_pragma_intrinsic_flags_restore): New.
(handle_pragma_vector_for_lto): New.
(init_builtins): Correct the processing logic for lto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-10.c: New test.
---
 gcc/config/riscv/riscv-c.cc   | 70 +---
 gcc/config/riscv/riscv-protos.h   | 13 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 83 ++-
 .../gcc.target/riscv/rvv/base/bug-10.c| 18 
 4 files changed, 114 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 71112d9c66d7..7037ecc1268a 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -34,72 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define builtin_define(TXT) cpp_define (pfile, TXT)
 
-struct pragma_intrinsic_flags
-{
-  int intrinsic_target_flags;
-
-  int intrinsic_riscv_vector_elen_flags;
-  int intrinsic_riscv_zvl_flags;
-  int intrinsic_riscv_zvb_subext;
-  int intrinsic_riscv_zvk_subext;
-};
-
-static void
-riscv_pragma_intrinsic_flags_pollute (struct pragma_intrinsic_flags *flags)
-{
-  flags->intrinsic_target_flags = target_flags;
-  flags->intrinsic_riscv_vector_elen_flags = riscv_vector_elen_flags;
-  flags->intrinsic_riscv_zvl_flags = riscv_zvl_flags;
-  flags->intrinsic_riscv_zvb_subext = riscv_zvb_subext;
-  flags->intrinsic_riscv_zvk_subext = riscv_zvk_subext;
-
-  target_flags = target_flags
-| MASK_VECTOR;
-
-  riscv_zvl_flags = riscv_zvl_flags
-| MASK_ZVL32B
-| MASK_ZVL64B
-| MASK_ZVL128B;
-
-  riscv_vector_elen_flags = riscv_vector_elen_flags
-| MASK_VECTOR_ELEN_32
-| MASK_VECTOR_ELEN_64
-| MASK_VECTOR_ELEN_FP_16
-| MASK_VECTOR_ELEN_FP_32
-| MASK_VECTOR_ELEN_FP_64;
-
-  riscv_zvb_subext = riscv_zvb_subext
-| MASK_ZVBB
-| MASK_ZVBC
-| MASK_ZV

RE: [PATCH v1] RISC-V: Fix SAT_* dump check failure due to middle-end change.

2024-09-04 Thread Li, Pan2
> This won't apply as I've already updated those tests.  I think verifying 
> the number of SAT_ADDs is useful to ensure we don't regress as some of 
> these tests detect > 1 SAT_ADD idiom.

I see, thanks Jeff. Then I'll drop this patch.
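
For the record, the difference is roughly the following pair of directives
(illustrative only; the actual tests scan the expand RTL dump rather than the
"optimized" tree dump used here):

/* Only checks that at least one .SAT_ADD was matched.  */
/* { dg-final { scan-tree-dump ".SAT_ADD " "optimized" } } */

/* Also checks how many were matched, so a drop from 2 to 1 is caught.  */
/* { dg-final { scan-tree-dump-times ".SAT_ADD " 2 "optimized" } } */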

Pan

-Original Message-
From: Jeff Law  
Sent: Thursday, September 5, 2024 10:10 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Fix SAT_* dump check failure due to middle-end 
change.



On 9/4/24 8:01 PM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> Some middl-end change may effect on the times of .SAT_*.  Thus,
> refine the dump check for SAT_*, from the scan-times to scan as
> we only care about the .SAT_* exist or not.  And there will an
> other PATCH to perform similar refinement and this PATCH only
> fix the failed test cases.
This won't apply as I've already updated those tests.  I think verifying 
the number of SAT_ADDs is useful to ensure we don't regress as some of 
these tests detect > 1 SAT_ADD idiom.

jeff



RE: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

2024-09-04 Thread Li, Pan2
Thanks Richard for comments.

> I also think we may want to split out this CFG matching code out into
> a helper function
> in gimple-match-head.cc instead of repeating it fully for each pattern?

That makes sense to me, let me have a try in v2.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, September 4, 2024 6:56 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

On Wed, Sep 4, 2024 at 9:48 AM Li, Pan2  wrote:
>
> > I'm lazy - can you please quote genmatch generated code for the condition 
> > for
> > one case?
>
> Sure thing, list the before and after covers all the changes to generated 
> code as blow.
>
> Before this patch:
>   basic_block _b1 = gimple_bb (_a1);
>   if (gimple_phi_num_args (_a1) == 2)
> {
>   basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>   basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>   basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
>   basic_block _other_db_1 = safe_dyn_cast  
> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
>   gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb 
> (_db_1));
>   if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
> && EDGE_COUNT (_other_db_1->succs) == 1
> && EDGE_PRED (_other_db_1, 0)->src == _db_1)
> {
>   tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>   tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>   tree _p0 = build2 (gimple_cond_code (_ct_1), 
> boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>   bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 
> 0)->flags & EDGE_TRUE_VALUE;
>   tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 0 : 1);
>   tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 1 : 0);
>   switch (TREE_CODE (_p0))
> {
>
> After this patch:
>   basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>   basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>   gcond *_ct_0_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1));
>   gcond *_ct_1_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_1_1));
>   gcond *_ct_a_1 = _ct_0_1 ? _ct_0_1 : _ct_1_1;
>   basic_block _db_1 = _ct_0_1 ? _pb_0_1 : _pb_1_1;
>   basic_block _other_db_1 = _ct_0_1 ? _pb_1_1 : _pb_0_1;
>   edge _e_00_1 = _pb_0_1->preds ? EDGE_PRED (_pb_0_1, 0) : 
> NULL;
>   basic_block _pb_00_1 = _e_00_1 ? _e_00_1->src : NULL;
>   gcond *_ct_b_1 = _pb_00_1 ? safe_dyn_cast  
> (*gsi_last_bb (_pb_00_1)) : NULL;
>   if ((_ct_a_1 && EDGE_COUNT (_other_db_1->preds) == 1
>    && EDGE_COUNT (_other_db_1->succs) == 1
>    && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>   ||
>   (_ct_b_1 && _pb_00_1 && EDGE_COUNT (_pb_0_1->succs) == 1
>    && EDGE_COUNT (_pb_0_1->preds) == 1
>    && EDGE_COUNT (_other_db_1->preds) == 1
>    && EDGE_COUNT (_other_db_1->succs) == 1
>    && EDGE_PRED (_other_db_1, 0)->src == _pb_00_1))
> {
>   gcond *_ct_1 = _ct_a_1 ? _ct_a_1 : _ct_b_1;
>   tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>   tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>   tree _p0 = build2 (gimple_cond_code (_ct_1), 
> boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>   bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 
> 0)->flags & EDGE_TRUE_VALUE;
>   tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 0 : 1);
>   tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 1 : 0);

I think it might be better to refactor this to detect the three CFGs like

 if (EDGE_COUNT (_pb_0_1->preds) == 1
 && EDGE_PRED (_pb_0_1, 0)->src == pb_1_1)
   {
.. check rest of constraints ..
   }
else if (... same for _pb_1_1 being the forwarder ...)
 ...
else if (EDGE_COUNT (_pb_0_1->preds) == 1
     

RE: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

2024-09-04 Thread Li, Pan2
> I'm lazy - can you please quote genmatch generated code for the condition for
> one case?

Sure thing, the before and after listed below cover all the changes to the
generated code.

Before this patch:
  basic_block _b1 = gimple_bb (_a1);
  if (gimple_phi_num_args (_a1) == 2)
{
  basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
  basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
  basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb 
(_pb_0_1)) ? _pb_0_1 : _pb_1_1;
  basic_block _other_db_1 = safe_dyn_cast  
(*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
  gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
  if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
&& EDGE_COUNT (_other_db_1->succs) == 1
&& EDGE_PRED (_other_db_1, 0)->src == _db_1)
{
  tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
  tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
  tree _p0 = build2 (gimple_cond_code (_ct_1), 
boolean_type_node, _cond_lhs_1, _cond_rhs_1);
  bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 
0)->flags & EDGE_TRUE_VALUE;
  tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 
: 1);
  tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 
: 0);
  switch (TREE_CODE (_p0))
{

After this patch:
  basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
  basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
  gcond *_ct_0_1 = safe_dyn_cast  (*gsi_last_bb 
(_pb_0_1));
  gcond *_ct_1_1 = safe_dyn_cast  (*gsi_last_bb 
(_pb_1_1));
  gcond *_ct_a_1 = _ct_0_1 ? _ct_0_1 : _ct_1_1;
  basic_block _db_1 = _ct_0_1 ? _pb_0_1 : _pb_1_1;
  basic_block _other_db_1 = _ct_0_1 ? _pb_1_1 : _pb_0_1;
  edge _e_00_1 = _pb_0_1->preds ? EDGE_PRED (_pb_0_1, 0) : NULL;
  basic_block _pb_00_1 = _e_00_1 ? _e_00_1->src : NULL;
  gcond *_ct_b_1 = _pb_00_1 ? safe_dyn_cast  
(*gsi_last_bb (_pb_00_1)) : NULL;
  if ((_ct_a_1 && EDGE_COUNT (_other_db_1->preds) == 1
   && EDGE_COUNT (_other_db_1->succs) == 1
   && EDGE_PRED (_other_db_1, 0)->src == _db_1)
  ||
  (_ct_b_1 && _pb_00_1 && EDGE_COUNT (_pb_0_1->succs) == 1
   && EDGE_COUNT (_pb_0_1->preds) == 1
   && EDGE_COUNT (_other_db_1->preds) == 1
   && EDGE_COUNT (_other_db_1->succs) == 1
   && EDGE_PRED (_other_db_1, 0)->src == _pb_00_1))
{
  gcond *_ct_1 = _ct_a_1 ? _ct_a_1 : _ct_b_1;
  tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
  tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
  tree _p0 = build2 (gimple_cond_code (_ct_1), 
boolean_type_node, _cond_lhs_1, _cond_rhs_1);
  bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 
0)->flags & EDGE_TRUE_VALUE;
  tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 
: 1);
      tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 
: 0);

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, September 4, 2024 3:42 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

On Wed, Sep 4, 2024 at 9:25 AM  wrote:
>
> From: Pan Li 
>
> The gen_phi_on_cond can only support below control flow for cond
> from day 1.  Aka:
>
> +--+
> | def  |
> | ...  |   +-+
> | cond |-->| def |
> +--+   | ... |
>|   +-+
>|  |
>v  |
> +-+   |
> | PHI |<--+
> +-+
>
> Unfortunately, there will be more scenarios of control flow on PHI.
> For example as below:
>
> T __attribute__((noinline))\
> sat_s_add_##T##_fmt_3 (T x, T y)   \
> {  \
>   T sum;   \
>   bool overflow = __builtin_add_overflow (x, y, &sum); \
>   return overflow ? x < 0 ? MIN : MAX : sum;   \
> }
>
> DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)
>
> With expanded RTL like b

RE: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD

2024-09-01 Thread Li, Pan2
Thanks Jeff.

> But I would expect that may be beneficial on other targets as well.
I think x86 has similar insns for saturation, for example paddsw in the link
below.
https://www.felixcloutier.com/x86/paddsb:paddsw

And I bet the x86 backend has already implemented some of them, like usadd and
ussub.

> The other question that I think Robin initially raised to me privately 
> is whether or not the sequences we're generating are well suited for 
> zicond or not.  

Got it, a cmov-like insn is well suited for such cases. We can consider the
best way to leverage the zicond extension in further improvements.

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, September 2, 2024 11:32 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD



On 9/1/24 8:50 PM, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> OK.  Presumably the code you're getting here is more efficient than
>> whatever standard expansion would provide?  If so, should we be looking
>> at moving some of this stuff into generic expanders?  I don't really see
>> anything all that target specific here.
> 
> Mostly for that we can eliminate the branch for .SAT_ADD in scalar. Given we
> don't have one SAT_ADD like insn like RVV vsadd.vv/vx/vi.
But I would expect that may be beneficial on other targets as well. 
It's not conceptually a lot different than what we do basic arithmetic 
with overflow, which has generic expansion which can be overridden by 
target specific expanders.  See expand_addsub_overflow.

Again, I think this is OK, but I'm thinking we probably want something 
more generic in the longer term.

The other question that I think Robin initially raised to me privately 
is whether or not the sequences we're generating are well suited for 
zicond or not.  If not, we might want to consider adjustments to either 
generate zicond if-then-else constructs during initial code generation 
or bias initial code generator towards sequences that ifcvt & combine 
can turn into zicond.  But again not strictly necessary for this patch 
to go forward, more a potential avenue for further improvements.


> 
> Pan
> 
> -Original Message-
> From: Jeff Law 
> Sent: Sunday, September 1, 2024 11:35 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD
> 
> 
> 
> On 8/29/24 12:25 AM, pan2...@intel.com wrote:
>> From: Pan Li 
>>
>> This patch would like to support the scalar signed ssadd pattern
>> for the RISC-V backend.  Aka
>>
>> Form 1:
>> #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
>> T __attribute__((noinline))  \
>> sat_s_add_##T##_fmt_1 (T x, T y) \
>> {\
>>   T sum = (UT)x + (UT)y; \
>>   return (x ^ y) < 0 \
>> ? sum\
>> : (sum ^ x) >= 0 \
>>   ? sum  \
>>   : x < 0 ? MIN : MAX;   \
>> }
>>
>> DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
>>
>> Before this patch:
>> 10   │ sat_s_add_int64_t_fmt_1:
>> 11   │ mv   a5,a0
>> 12   │ add  a0,a0,a1
>> 13   │ xor  a1,a5,a1
>> 14   │ not  a1,a1
>> 15   │ xor  a4,a5,a0
>> 16   │ and  a1,a1,a4
>> 17   │ blt  a1,zero,.L5
>> 18   │ ret
>> 19   │ .L5:
>> 20   │ srai a5,a5,63
>> 21   │ li   a0,-1
>> 22   │ srli a0,a0,1
>> 23   │ xor  a0,a5,a0
>> 24   │ ret
>>
>> After this patch:
>> 10   │ sat_s_add_int64_t_fmt_1:
>> 11   │ add  a2,a0,a1
>> 12   │ xor  a1,a0,a1
>> 13   │ xor  a5,a0,a2
>> 14   │ srli a5,a5,63
>> 15   │ srli a1,a1,63
>> 16   │ xori a1,a1,1
>> 17   │ and  a5,a5,a1
>> 18   │ srai a4,a0,63
>> 19   │ li   a3,-1
>> 20   │ srli a3,a3,1
>> 21   │ xor  a3,a3,a4
>> 22   │ neg  a4,a5
>> 23   │ and  a3,a3,a4
>> 24   │ addi a5,a5,-1
>> 25   │ and  a0,a2,a5
>> 26   │ or   a0,a0,a3
>> 27   │ ret
>>
>> The below test suites are passed for this patch:
>> 1. The rv64gcv fully regression test.
>>

RE: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD

2024-09-01 Thread Li, Pan2
Thanks Jeff for comments.

> OK.  Presumably the code you're getting here is more efficient than 
> whatever standard expansion would provide?  If so, should we be looking 
> at moving some of this stuff into generic expanders?  I don't really see 
> anything all that target specific here.

Mostly because we can eliminate the branch for scalar .SAT_ADD, given that we
don't have a single SAT_ADD-like insn such as RVV's vsadd.vv/vx/vi.
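
For reference, the branchless sequence the expander produces corresponds
roughly to the following C (an illustrative sketch for the int64_t case only,
relying on GCC's implementation-defined arithmetic right shift of negative
values and modulo signed conversion; the expander builds the equivalent RTL
directly):

#include <stdint.h>

int64_t
sat_s_add_int64_branchless (int64_t x, int64_t y)
{
  uint64_t sum = (uint64_t) x + (uint64_t) y;
  /* Overflow iff x and y have the same sign but sum's sign differs from x.  */
  uint64_t ovf = (((uint64_t) x ^ sum) >> 63)
                 & ((((uint64_t) x ^ (uint64_t) y) >> 63) ^ 1);
  /* Saturated value: INT64_MAX when x >= 0, INT64_MIN when x < 0.  */
  uint64_t sat = (uint64_t) (x >> 63) ^ (uint64_t) INT64_MAX;
  uint64_t mask = -ovf;  /* All ones on overflow, zero otherwise.  */
  return (int64_t) ((sum & ~mask) | (sat & mask));
}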

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, September 1, 2024 11:35 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Support form 1 of integer scalar .SAT_ADD



On 8/29/24 12:25 AM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to support the scalar signed ssadd pattern
> for the RISC-V backend.  Aka
> 
> Form 1:
>#define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
>T __attribute__((noinline))  \
>sat_s_add_##T##_fmt_1 (T x, T y) \
>{\
>  T sum = (UT)x + (UT)y; \
>  return (x ^ y) < 0 \
>? sum\
>: (sum ^ x) >= 0 \
>  ? sum  \
>  : x < 0 ? MIN : MAX;   \
>}
> 
> DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
> 
> Before this patch:
>10   │ sat_s_add_int64_t_fmt_1:
>11   │ mv   a5,a0
>12   │ add  a0,a0,a1
>13   │ xor  a1,a5,a1
>14   │ not  a1,a1
>15   │ xor  a4,a5,a0
>16   │ and  a1,a1,a4
>17   │ blt  a1,zero,.L5
>18   │ ret
>19   │ .L5:
>20   │ srai a5,a5,63
>21   │ li   a0,-1
>22   │ srli a0,a0,1
>23   │ xor  a0,a5,a0
>24   │ ret
> 
> After this patch:
>10   │ sat_s_add_int64_t_fmt_1:
>11   │ add  a2,a0,a1
>12   │ xor  a1,a0,a1
>13   │ xor  a5,a0,a2
>14   │ srli a5,a5,63
>15   │ srli a1,a1,63
>16   │ xori a1,a1,1
>17   │ and  a5,a5,a1
>18   │ srai a4,a0,63
>19   │ li   a3,-1
>20   │ srli a3,a3,1
>21   │ xor  a3,a3,a4
>22   │ neg  a4,a5
>23   │ and  a3,a3,a4
>24   │ addi a5,a5,-1
>25   │ and  a0,a2,a5
>26   │ or   a0,a0,a3
>27   │ ret
> 
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv-protos.h (riscv_expand_ssadd): Add new func
>   decl for expanding ssadd.
>   * config/riscv/riscv.cc (riscv_gen_sign_max_cst): Add new func
>   impl to gen the max int rtx.
>   (riscv_expand_ssadd): Add new func impl to expand the ssadd.
>   * config/riscv/riscv.md (ssadd3): Add new pattern for
>   signed integer .SAT_ADD.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_arith.h: Add test helper macros.
>   * gcc.target/riscv/sat_arith_data.h: Add test data.
>   * gcc.target/riscv/sat_s_add-1.c: New test.
>   * gcc.target/riscv/sat_s_add-2.c: New test.
>   * gcc.target/riscv/sat_s_add-3.c: New test.
>   * gcc.target/riscv/sat_s_add-4.c: New test.
>   * gcc.target/riscv/sat_s_add-run-1.c: New test.
>   * gcc.target/riscv/sat_s_add-run-2.c: New test.
>   * gcc.target/riscv/sat_s_add-run-3.c: New test.
>   * gcc.target/riscv/sat_s_add-run-4.c: New test.
>   * gcc.target/riscv/scalar_sat_binary_run_xxx.h: New test.
OK.  Presumably the code you're getting here is more efficient than 
whatever standard expansion would provide?  If so, should we be looking 
at moving some of this stuff into generic expanders?  I don't really see 
anything all that target specific here.

jeff



RE: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]

2024-08-28 Thread Li, Pan2
Noted with thanks, will commit with that change if there are no surprises from
the tests.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, August 28, 2024 3:24 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]

On Wed, Aug 28, 2024 at 3:18 AM Li, Pan2  wrote:
>
> Kindly ping.

Please do not include stdint-gcc.h but stdint.h.

otherwise OK.

Richard.

> Pan
>
> -Original Message-----
> From: Li, Pan2 
> Sent: Monday, August 19, 2024 10:05 AM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
> rdapp@gmail.com; Li, Pan2 
> Subject: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]
>
> From: Pan Li 
>
> Move the run test of pr116278 to dg/torture and leave the risc-v the
> asm check under risc-v part.
>
> PR target/116278
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr116278-run-1.c: Take compile instead of run.
> * gcc.target/riscv/pr116278-run-2.c: Ditto.
> * gcc.dg/torture/pr116278-run-1.c: New test.
> * gcc.dg/torture/pr116278-run-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/gcc.dg/torture/pr116278-run-1.c | 19 +++
>  gcc/testsuite/gcc.dg/torture/pr116278-run-2.c | 19 +++
>  .../gcc.target/riscv/pr116278-run-1.c |  2 +-
>  .../gcc.target/riscv/pr116278-run-2.c |  2 +-
>  4 files changed, 40 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
> new file mode 100644
> index 000..8e07fb6af29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target int32 } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +
> +int8_t b[1];
> +int8_t *d = b;
> +int32_t c;
> +
> +int main() {
> +  b[0] = -40;
> +  uint16_t t = (uint16_t)d[0];
> +
> +  c = (t < 0xFFF6 ? t : 0xFFF6) + 9;
> +
> +  if (c != 65505)
> +__builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c 
> b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
> new file mode 100644
> index 000..d85e21531e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target int32 } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +
> +int16_t b[1];
> +int16_t *d = b;
> +int64_t c;
> +
> +int main() {
> +  b[0] = -40;
> +  uint32_t t = (uint32_t)d[0];
> +
> +  c = (t < 0xFFF6u ? t : 0xFFF6u) + 9;
> +
> +  if (c != 4294967265)
> +__builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
> index d3812bdcdfb..c758fca7975 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run { target { riscv_v } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-rtl-expand-details" } */
>
>  #include 
> diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
> index 669cd4f003f..a4da8a323f0 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
> +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run { target { riscv_v } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-rtl-expand-details" } */
>
>  #include 
> --
> 2.43.0
>


RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

2024-08-27 Thread Li, Pan2
Hi Patrick,

Could you please help to re-trigger the pre-commit?
Thanks in advance!

Pan

-Original Message-
From: Patrick O'Neill  
Sent: Tuesday, August 20, 2024 12:14 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; Jeff Law 

Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and 
oct .SAT_TRUNC form 2

Hi Pan,

Once the postcommit baseline moves forward (trunk is currently failing 
to build linux targets [1] [2]) I'll re-trigger precommit for you.

Thanks,
Patrick

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116409
[2]: https://github.com/patrick-rivos/gcc-postcommit-ci/issues/1564

On 8/18/24 19:49, Li, Pan2 wrote:
> Turn out that the pre-commit doesn't pick up the newest upstream when testing 
> this patch.
>
> Pan
>
> -----Original Message-
> From: Li, Pan2 
> Sent: Monday, August 19, 2024 9:25 AM
> To: Jeff Law ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad 
> and oct .SAT_TRUNC form 2
>
> Opps, let me double check what happened to my local tester.
>
> Pan
>
> -Original Message-----
> From: Jeff Law 
> Sent: Sunday, August 18, 2024 11:21 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad 
> and oct .SAT_TRUNC form 2
>
>
>
> On 8/18/24 12:10 AM, pan2...@intel.com wrote:
>> From: Pan Li 
>>
>> This patch would like to add test cases for the unsigned scalar quad and
>> oct .SAT_TRUNC form 2.  Aka:
>>
>> Form 2:
>> #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>> NT __attribute__((noinline)) \
>> sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>> {\
>>   WT max = (WT)(NT)-1;   \
>>   return x > max ? (NT) max : (NT)x; \
>> }
>>
>> QUAD:
>> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
>> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)
>>
>> OCT:
>> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)
>>
>> The below test is passed for this patch.
>> * The rv64gcv regression test.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/sat_u_trunc-10.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-11.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-12.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-10.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-11.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-12.c: New test.
> Looks like they're failing in the upstream pre-commit tester:
>
>> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578
>
> jeff


RE: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]

2024-08-27 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Monday, August 19, 2024 10:05 AM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]

From: Pan Li 

Move the run test of pr116278 to dg/torture and leave the risc-v the
asm check under risc-v part.

PR target/116278

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116278-run-1.c: Take compile instead of run.
* gcc.target/riscv/pr116278-run-2.c: Ditto.
* gcc.dg/torture/pr116278-run-1.c: New test.
* gcc.dg/torture/pr116278-run-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c | 19 +++
 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c | 19 +++
 .../gcc.target/riscv/pr116278-run-1.c |  2 +-
 .../gcc.target/riscv/pr116278-run-2.c |  2 +-
 4 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c 
b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
new file mode 100644
index 000..8e07fb6af29
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+/* { dg-options "-O2" } */
+
+#include 
+
+int8_t b[1];
+int8_t *d = b;
+int32_t c;
+
+int main() {
+  b[0] = -40;
+  uint16_t t = (uint16_t)d[0];
+
+  c = (t < 0xFFF6 ? t : 0xFFF6) + 9;
+
+  if (c != 65505)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c 
b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
new file mode 100644
index 000..d85e21531e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+/* { dg-options "-O2" } */
+
+#include 
+
+int16_t b[1];
+int16_t *d = b;
+int64_t c;
+
+int main() {
+  b[0] = -40;
+  uint32_t t = (uint32_t)d[0];
+
+  c = (t < 0xFFF6u ? t : 0xFFF6u) + 9;
+
+  if (c != 4294967265)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c 
b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
index d3812bdcdfb..c758fca7975 100644
--- a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { riscv_v } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-expand-details" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c 
b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
index 669cd4f003f..a4da8a323f0 100644
--- a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
+++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { riscv_v } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-expand-details" } */
 
 #include 
-- 
2.43.0



RE: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-27 Thread Li, Pan2
> :c is required when you want to match up @0s and they appear in a commutative
> operation and there's no canonicalization rule putting it into one or the 
> other
> position.  In your case you have two commutative operations you want to match
> up, so it should be only necessary to try swapping one of it to get the match,
> it's not required to swap both.  This reduces the number of generated 
> patterns.

Thanks Richard for the explanation. I got the point that swapping the captures
for one op will also affect the other op(s); will update in v4.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, August 27, 2024 4:41 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

On Tue, Aug 27, 2024 at 3:06 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > I think you want to use nop_convert here, for sure a truncation or
> > extension wouldn't be valid?
>
> Oh, yes, should be nop_convert.
>
> > I think you don't need :c on both the inner plus and the bit_xor here?
>
> Sure, could you please help to explain more about when should I need to add 
> :c?
> Liker inner plus/and/or ... etc, sometimes got confused for similar scenarios.

:c is required when you want to match up @0s and they appear in a commutative
operation and there's no canonicalization rule putting it into one or the other
position.  In your case you have two commutative operations you want to match
up, so it should be only necessary to try swapping one of it to get the match,
it's not required to swap both.  This reduces the number of generated patterns.

> > +   integer_zerop)
> > +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
>
> > The comment above quotes 'MIN' but that's not present here - that is,
> > the comment quotes a source form while we match what we see on
> > GIMPLE?  I do expect the matching will be quite fragile when not
> > being isolated.
>
> Got it, will update the comments to gimple.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, August 26, 2024 9:40 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> On Mon, Aug 26, 2024 at 4:20 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support the form 1 of the scalar signed
> > integer .SAT_ADD.  Aka below example:
> >
> > Form 1:
> >   #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
> >   T __attribute__((noinline))  \
> >   sat_s_add_##T##_fmt_1 (T x, T y) \
> >   {\
> > T sum = (UT)x + (UT)y; \
> > return (x ^ y) < 0 \
> >   ? sum\
> >   : (sum ^ x) >= 0 \
> > ? sum  \
> > : x < 0 ? MIN : MAX;   \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
> >
> > We can tell the difference before and after this patch if backend
> > implemented the ssadd3 pattern similar as below.
> >
> > Before this patch:
> >4   │ __attribute__((noinline))
> >5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
> >6   │ {
> >7   │   int64_t sum;
> >8   │   long unsigned int x.0_1;
> >9   │   long unsigned int y.1_2;
> >   10   │   long unsigned int _3;
> >   11   │   long int _4;
> >   12   │   long int _5;
> >   13   │   int64_t _6;
> >   14   │   _Bool _11;
> >   15   │   long int _12;
> >   16   │   long int _13;
> >   17   │   long int _14;
> >   18   │   long int _16;
> >   19   │   long int _17;
> >   20   │
> >   21   │ ;;   basic block 2, loop depth 0
> >   22   │ ;;pred:   ENTRY
> >   23   │   x.0_1 = (long unsigned int) x_7(D);
> >   24   │   y.1_2 = (long unsigned int) y_8(D);
> >   25   │   _3 = x.0_1 + y.1_2;
> >   26   │   sum_9 = (int64_t) _3;
> >   27   │   _4 = x_7(D) ^ y_8(D);
> >   28   │   _5 = x_7(D) ^ sum_9;
> >   29   │   _17 = ~_4;
> >   30   │   _16 = _5 & _17;
> >   31   │   if (_16 < 0)
> >   32   │ goto ; [41.00%]
> >   33   │   else
&

RE: [PATCH v2] Vect: Reconcile the const_int operand type of unsigned .SAT_ADD

2024-08-27 Thread Li, Pan2
Thanks Richard for comments.


> Err, can you please simply do
>if (TREE_CODE (ops[1]) == INTEGER_CST)
>  ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1])
> ?  you are always matching the constant to @1 IIRC.

That would be much simpler, I will have a try in v3.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, August 27, 2024 5:09 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Vect: Reconcile the const_int operand type of unsigned 
.SAT_ADD

On Tue, Aug 27, 2024 at 9:09 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operand, when one of the operand may be INTEGER_CST.
> For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will fail to vectorize as the vectorizable_call will check the
> operands are type-compatible but the imm will be (const_int 9) with
> the SImode, which is different from _2 (DImode).  Aka:
>
> uint64_t _1;
> uint64_t _2;
> _1 = .SAT_ADD (_2, 9);
>
> This patch would like to reconcile the imm operand to the operand type
> mode of _2 if and only if there is no precision/data loss.  Aka convert
> the imm 9 to the DImode for above example.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_reconcile_cst_to_unsigned):
> Add new func impl to reconcile the cst int type to given TREE type.
> (vect_recog_sat_add_pattern): Reconcile the ops of .SAT_ADD
> before building the gimple call.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper 
> macros.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: 
> New test.
> * 
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: New test.
> * 
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c: New test.
> * 
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c: New test.
> * 
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: New test.
> * 
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: New test.
> * 
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: 
> New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-3.c: 
> New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-4.c: 
> New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: 
> New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: 
> New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-7.c: 
> New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-8.c: 
> New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: 
> New test.
>
> Signed-off-by: Pan Li 
> ---
>  .../binop/vec_sat_u_add_imm_reconcile-1.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-10.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-11.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-12.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-13.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-14.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-15.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-2.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-3.c |  9 +
>  .../binop/v

RE: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-26 Thread Li, Pan2
Thanks Richard for comments.

> I think you want to use nop_convert here, for sure a truncation or
> extension wouldn't be valid?

Oh, yes, should be nop_convert.

> I think you don't need :c on both the inner plus and the bit_xor here?

Sure, could you please help to explain more about when I need to add :c?
Like inner plus/and/or, etc.; I sometimes get confused in similar scenarios.

> +   integer_zerop)
> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)

> The comment above quotes 'MIN' but that's not present here - that is,
> the comment quotes a source form while we match what we see on
> GIMPLE?  I do expect the matching will be quite fragile when not
> being isolated.

Got it, will update the comments to gimple.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, August 26, 2024 9:40 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

On Mon, Aug 26, 2024 at 4:20 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 1 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 1:
>   #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
>   T __attribute__((noinline))  \
>   sat_s_add_##T##_fmt_1 (T x, T y) \
>   {\
> T sum = (UT)x + (UT)y; \
> return (x ^ y) < 0 \
>   ? sum\
>   : (sum ^ x) >= 0 \
> ? sum  \
> : x < 0 ? MIN : MAX;   \
>   }
>
> DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
>
> We can tell the difference before and after this patch if backend
> implemented the ssadd3 pattern similar as below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t sum;
>8   │   long unsigned int x.0_1;
>9   │   long unsigned int y.1_2;
>   10   │   long unsigned int _3;
>   11   │   long int _4;
>   12   │   long int _5;
>   13   │   int64_t _6;
>   14   │   _Bool _11;
>   15   │   long int _12;
>   16   │   long int _13;
>   17   │   long int _14;
>   18   │   long int _16;
>   19   │   long int _17;
>   20   │
>   21   │ ;;   basic block 2, loop depth 0
>   22   │ ;;pred:   ENTRY
>   23   │   x.0_1 = (long unsigned int) x_7(D);
>   24   │   y.1_2 = (long unsigned int) y_8(D);
>   25   │   _3 = x.0_1 + y.1_2;
>   26   │   sum_9 = (int64_t) _3;
>   27   │   _4 = x_7(D) ^ y_8(D);
>   28   │   _5 = x_7(D) ^ sum_9;
>   29   │   _17 = ~_4;
>   30   │   _16 = _5 & _17;
>   31   │   if (_16 < 0)
>   32   │ goto <bb 3>; [41.00%]
>   33   │   else
>   34   │ goto <bb 4>; [59.00%]
>   35   │ ;;succ:   3
>   36   │ ;;4
>   37   │
>   38   │ ;;   basic block 3, loop depth 0
>   39   │ ;;pred:   2
>   40   │   _11 = x_7(D) < 0;
>   41   │   _12 = (long int) _11;
>   42   │   _13 = -_12;
>   43   │   _14 = _13 ^ 9223372036854775807;
>   44   │ ;;succ:   4
>   45   │
>   46   │ ;;   basic block 4, loop depth 0
>   47   │ ;;pred:   2
>   48   │ ;;3
>   49   │   # _6 = PHI <sum_9(2), _14(3)>
>   50   │   return _6;
>   51   │ ;;succ:   EXIT
>   52   │
>   53   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t _4;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   12   │   return _4;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add the matching for signed .SAT_ADD.
> * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
> matching func decl.
> (match_unsigned_saturation_add): Try signed .SAT_ADD and rename
> to ...
> (match_saturation_add): ... here.
> (math_opts_dom_walker::after_dom_children): Update the above renamed
> func from caller.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 18 ++
>

RE: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern

2024-08-25 Thread Li, Pan2
Got it, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, August 26, 2024 10:21 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern



On 8/25/24 7:35 PM, Li, Pan2 wrote:
> Thanks Jeff.
> 
>> OK.  I'm assuming we don't have to worry about the case where X is wider
>> than Xmode?  ie, a DImode on rv32?
> 
> Yes, the DImode is disabled by ANYI iterator for ussub pattern.
Thanks.  Just wanted to make sure.  And for the avoidance of doubt, this 
patch is fine for the trunk.

jeff



RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-25 Thread Li, Pan2
Thanks Richard for comments and confirmation.

> Instead pattern recognition of .SAT_ADD should promote/demote the invariants -

Got it, I will have a try at reconciling the types in .SAT_ADD for const_int.

> What I read is that
> .ADD_OVERFLOW
> produces a value that is equal to the twos-complement add of its arguments
> promoted/demoted to the result type, correct?

Yes, that makes sense to me.

Pan

-Original Message-
From: Richard Biener  
Sent: Sunday, August 25, 2024 3:42 PM
To: Li, Pan2 
Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Sat, Aug 24, 2024 at 1:31 PM Li, Pan2  wrote:
>
> Thanks Jakub and Richard for explanation and help, I will double check 
> saturate matching for the const_int strict check.
>
> Back to this below case, do we still need some ad-hoc step to unblock the 
> type check when vectorizable_call?
> For example, the const_int 9u may have int type for .SAT_ADD(uint8_t, 9u).
> Or we have somewhere else to make the vectorizable_call happy.

I don't see how vectorizable_call itself can handle this since it
doesn't have any idea
about the type requirements.  Instead pattern recognition of .SAT_ADD should
promote/demote the invariants - of course there might be correctness
issues involved
with matching .ADD_OVERFLOW in the first place.  What I read is that
.ADD_OVERFLOW
produces a value that is equal to the twos-complement add of its arguments
promoted/demoted to the result type, correct?

Richard.

> #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
> T __attribute__((noinline))  \
> vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
> {\
>   unsigned i;\
>   T ret; \
>   for (i = 0; i < limit; i++)\
> {\
>   out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
> }\
> }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, 9u)
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, August 23, 2024 6:53 PM
> To: Jakub Jelinek 
> Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
> vectorizable_call
>
> On Thu, Aug 22, 2024 at 8:36 PM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 20, 2024 at 01:52:35PM +0200, Richard Biener wrote:
> > > On Sat, Aug 17, 2024 at 11:18 PM Jakub Jelinek  wrote:
> > > >
> > > > On Sat, Aug 17, 2024 at 05:03:14AM +, Li, Pan2 wrote:
> > > > > Please feel free to let me know if there is anything I can do to fix 
> > > > > this issue. Thanks a lot.
> > > >
> > > > There is no bug.  The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to 
> > > > have the same type, as described in the 
> > > > __builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can 
> > > > have different type and result yet another one, the behavior is then 
> > > > (as if) to perform the operation in infinite precision and if that 
> > > > result fits into the result type, there is no overflow, otherwise there 
> > > > is.
> > > > So, there is no need to promote anything.
> > >
> > > Hmm, it's a bit awkward to have this state in the IL.
> >
> > Why?  These aren't the only internal functions which have different types
> > of arguments, from the various widening ifns, conditional ifns,
> > scatter/gather, ...  Even the WIDEN_*EXPR trees do have type differences
> > among arguments.
> > And it matches what the user builtin does.
> >
> > Furthermore, at least without _BitInt (but even with _BitInt at the maximum
> > precision too) this might not be even possible.
> > E.g. if there is __builtin_add_overflow with unsigned __int128 and __int128
> > arguments and there are no wider types there is simply no type to use for 
> > both
> > arguments, it would need to be a signed type with at least 129 bits...
> >
> > > I see that
> > > expand_arith_overflow eventually applies
> > > promotion, namely to the type of the LHS.
> >
> > The LHS doesn't have to be wider than the operand types, so it can't promote
> > always.  Yes, in some cases it applies promotion if it is desirable for
> >

RE: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern

2024-08-25 Thread Li, Pan2
Thanks Jeff.

> OK.  I'm assuming we don't have to worry about the case where X is wider 
> than Xmode?  ie, a DImode on rv32?

Yes, the DImode is disabled by ANYI iterator for ussub pattern.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, August 25, 2024 11:22 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern



On 8/18/24 11:23 PM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to allow IMM for the operand 0 of ussub pattern.
> Aka .SAT_SUB(1023, y) as the below example.
> 
> Form 1:
>#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
>T __attribute__((noinline)) \
>sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
>{   \
>  return (T)IMM >= y ? (T)IMM - y : 0;  \
>}
> 
> DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)
> 
> Before this patch:
>10   │ sat_u_sub_imm82_uint64_t_fmt_1:
>11   │ li  a5,82
>12   │ bgtu    a0,a5,.L3
>13   │ sub a0,a5,a0
>14   │ ret
>15   │ .L3:
>16   │ li  a0,0
>17   │ ret
> 
> After this patch:
>10   │ sat_u_sub_imm82_uint64_t_fmt_1:
>11   │ li  a5,82
>12   │ sltu    a4,a5,a0
>13   │ addi    a4,a4,-1
>14   │ sub a0,a5,a0
>15   │ and a0,a4,a0
>16   │ ret
> 
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
>   func impl to gen xmode rtx reg from operand rtx.
>   (riscv_expand_ussub): Gen xmode reg for operand 1.
>   * config/riscv/riscv.md: Allow const_int for operand 1.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_arith.h: Add test helper macro.
>   * gcc.target/riscv/sat_u_sub_imm-1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-1_1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-1_2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-2_1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-2_2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-3.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-3_1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-3_2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-4.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-3.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.
OK.  I'm assuming we don't have to worry about the case where X is wider 
than Xmode?  ie, a DImode on rv32?


Jeff



RE: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-24 Thread Li, Pan2
> Wow.  I wonder why this isn't simplified to never saturate since
> signed x + y has undefined behavior on overflow?  So I'd
> expect instead
>  T sum = (unsigned T)x + (unsigned T)y;
> to be used.

Thanks, let me update in v3.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, August 22, 2024 5:47 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD

On Wed, Aug 7, 2024 at 11:31 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 1 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 1:
>   #define DEF_SAT_S_ADD_FMT_1(T, MIN, MAX) \
>   T __attribute__((noinline))  \
>   sat_s_add_##T##_fmt_1 (T x, T y) \
>   {\
> T sum = x + y; \
> return (x ^ y) < 0 \
>   ? sum\
>   : (sum ^ x) >= 0 \
> ? sum  \
> : x < 0 ? MIN : MAX;   \
>   }

Wow.  I wonder why this isn't simplified to never saturate since
signed x + y has undefined behavior on overflow?  So I'd
expect instead

  T sum = (unsigned T)x + (unsigned T)y;

to be used.
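
For reference, a standalone sketch of that unsigned-arithmetic form for int64_t;
it mirrors the Form 1 checks quoted above and is only illustrative, not taken
from the patch:

#include <stdint.h>

/* Signed saturating add with the wrap done in unsigned arithmetic, so the
   addition itself has no undefined behavior.  The conversion back to
   int64_t is implementation-defined but wraps on GCC, matching the idiom
   used in the testsuite.  */
int64_t
sat_s_add_int64 (int64_t x, int64_t y)
{
  int64_t sum = (int64_t) ((uint64_t) x + (uint64_t) y);

  if ((x ^ y) < 0)        /* different signs can never overflow */
    return sum;
  if ((sum ^ x) >= 0)     /* result keeps the sign of x, no overflow */
    return sum;

  return x < 0 ? INT64_MIN : INT64_MAX;
}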

> DEF_SAT_S_ADD_FMT_1(int64_t, INT64_MIN, INT64_MAX)
>
> We can tell the difference before and after this patch if backend
> implemented the ssadd3 pattern similar as below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t sum;
>8   │   long int _1;
>9   │   long int _2;
>   10   │   int64_t _3;
>   11   │   _Bool _8;
>   12   │   long int _9;
>   13   │   long int _10;
>   14   │   long int _11;
>   15   │   long int _12;
>   16   │   long int _13;
>   17   │
>   18   │   <bb 2> [local count: 1073741824]:
>   19   │   sum_6 = x_4(D) + y_5(D);
>   20   │   _1 = x_4(D) ^ y_5(D);
>   21   │   _2 = x_4(D) ^ sum_6;
>   22   │   _12 = ~_1;
>   23   │   _13 = _2 & _12;
>   24   │   if (_13 < 0)
>   25   │ goto <bb 3>; [41.00%]
>   26   │   else
>   27   │ goto <bb 4>; [59.00%]
>   28   │
>   29   │   <bb 3> [local count: 259738147]:
>   30   │   _8 = x_4(D) < 0;
>   31   │   _9 = (long int) _8;
>   32   │   _10 = -_9;
>   33   │   _11 = _10 ^ 9223372036854775807;
>   34   │
>   35   │   <bb 4> [local count: 1073741824]:
>   36   │   # _3 = PHI <sum_6(2), _11(3)>
>   37   │   return _3;
>   38   │
>   39   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t _4;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   12   │   return _4;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add the matching for signed .SAT_ADD.
> * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
> matching func decl.
> (match_unsigned_saturation_add): Try signed .SAT_ADD and rename
> to ...
> (match_saturation_add): ... here.
> (math_opts_dom_walker::after_dom_children): Update the above renamed
> func from caller.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 17 
>  gcc/tree-ssa-math-opts.cc | 42 ++-
>  2 files changed, 54 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index c9c8478d286..8b8a5dbcfe3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3311,6 +3311,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>}
>(if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
> +/* Signed saturation add, case 1:
> +   T sum = X + Y;
> +   SAT_S_ADD = (X ^ Y) < 0
> + ? sum
> + : (sum ^ x) >= 0
> +   ? sum
> +   : x < 0 ? MIN : MAX;  */
> +(match (signed_integer_sat_add @0 @1)
> + (cond^ (lt (bit_and:c (bit_xor:c @0 (convert?@2 (plus:c (convert? @0)
> +(convert? @1
> +  (bit_not (bit_xor:c @0 @1)))
> +   integer_zerop)
> +   (bit_xor:c (negate (convert (lt @0 in

RE: [PATCH v1] Match: Add type check for .SAT_ADD imm operand

2024-08-24 Thread Li, Pan2
Thanks Richard and Jakub for comments. 

Ideally I would like to make sure the imm operand has exactly the same type
as operand 1.
But for uint8_t/uint16_t types, the INTEGER_CST will become (const_int 3)
with int type before matching.
Thus I added the type check like that, as well as some negative test cases that
fail to match, like .SAT_ADD (uint32_t, 3ull), etc.

.SAT_ADD (uint8_t, (uint8_t)3u)
.SAT_ADD (uint16_t, (uint16_t)3u)
.SAT_ADD (uint32_t, 3u)
.SAT_ADD (uint64_t, 3ull)

Thanks again, it is good to know about int_fits_type_p; let me have a try in v2.
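
As a plain-C reminder of where that int type comes from (the helper below is
only illustrative, not part of the patch): the unsuffixed literal is an int, so
the immediate reaches the matcher with int type even though the data is uint8_t.

#include <stdint.h>

/* The unsuffixed literal 3 has type int and x is promoted to int in the
   addition; only the store back into uint8_t narrows the value.  */
uint8_t
sat_add_uint8_imm3 (uint8_t x)
{
  uint8_t sum = x + 3;                             /* computed as int, then narrowed */
  return (uint8_t) (sum | -(uint8_t) (sum < x));   /* saturate to 0xff on wrap */
}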

Pan

-Original Message-
From: Jakub Jelinek  
Sent: Sunday, August 25, 2024 1:16 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Add type check for .SAT_ADD imm operand

On Sat, Aug 24, 2024 at 07:33:06PM +0800, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to add strict check for imm operand of .SAT_ADD
> matching.  We have no type checking for imm operand in previous,  which
> may result in unexpected IL to be catched by .SAT_ADD pattern.
> 
> However,  things may become more complicated due to the int promotion.
> This means any const_int without any suffix will be promoted to int
> before matching.  For example as below.
> 
> uint8_t a;
> uint8_t sum = .SAT_ADD (a, 12);
> 
> The second operand will be (const_int 12) with int type when try to
> match .SAT_ADD.  Thus,  to support int8/int16 .SAT_ADD,  only the
> int32 and int64 will be strictly checked.
> 
> The below test suite are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
> 
> gcc/ChangeLog:
> 
>   * match.pd:

???
>   * match.pd: Add strict type check for .SAT_ADD imm operand.

Usually you should say
* match.pd (pattern you change): What you change.

> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3190,7 +3190,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop)
>integer_minus_onep (realpart @2))
>(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && types_match (type, @0
> +   && types_match (type, @0))
> +   (with
> +{
> + unsigned precision = TYPE_PRECISION (type);
> + unsigned int_precision = HOST_BITS_PER_INT;

This has nothing to do with HOST_BITS_PER_INT.
The INTEGER_CST can have any type, not just int.

> +}
> +/* The const_int will perform int promotion,  the const_int will have at

const_int (well, CONST_INT) is an RTL name, it is INTEGER_CST in GIMPLE.
Just one space after ,

> +   least the int_precision.  Thus, type less than int_precision will be
> +   skipped the type match checking.  */

But the whole comment doesn't make much sense to me, the INTEGER_CST won't
perform any int promotion.

> +(if (precision < int_precision || types_match (type, @1))

Why do you compare precision of type against anything?

You want to check that the INTEGER_CST@1 is representable in the type
(compatible with TREE_TYPE (@0)), because only then the caller can
fold_convert @1 to type without the value being altered.
So, IMHO best would be
(if (int_fits_type_p (@1, type))
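
As a plain-C illustration of the "value being altered" concern (the candidate
immediates below are only examples, not taken from the thread): only a constant
that round-trips through the operand type unchanged is safe to fold-convert.

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  long long cands[] = { 9, 300, -1 };   /* hypothetical candidate immediates */

  for (int i = 0; i < 3; i++)
    {
      uint8_t narrowed = (uint8_t) cands[i];
      int fits = (long long) narrowed == cands[i];
      printf ("%lld -> %u (%s)\n", cands[i], (unsigned) narrowed,
              fits ? "representable in uint8_t" : "value altered");
    }

  return 0;
}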

Jakub



RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-24 Thread Li, Pan2
Thanks Jakub and Richard for the explanation and help, I will double check the
saturation matching for the strict const_int type check.

Back to the below case, do we still need some ad-hoc step to unblock the type
check in vectorizable_call?
For example, the const_int 9u may have int type for .SAT_ADD(uint8_t, 9u).
Or is there somewhere else we can make the vectorizable_call happy?

#define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
T __attribute__((noinline))  \
vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
{\
  unsigned i;\
  T ret; \
  for (i = 0; i < limit; i++)\
{\
  out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
}\
}

DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, 9u)

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, August 23, 2024 6:53 PM
To: Jakub Jelinek 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Thu, Aug 22, 2024 at 8:36 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 20, 2024 at 01:52:35PM +0200, Richard Biener wrote:
> > On Sat, Aug 17, 2024 at 11:18 PM Jakub Jelinek  wrote:
> > >
> > > On Sat, Aug 17, 2024 at 05:03:14AM +, Li, Pan2 wrote:
> > > > Please feel free to let me know if there is anything I can do to fix 
> > > > this issue. Thanks a lot.
> > >
> > > There is no bug.  The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to 
> > > have the same type, as described in the 
> > > __builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can 
> > > have different type and result yet another one, the behavior is then (as 
> > > if) to perform the operation in infinite precision and if that result 
> > > fits into the result type, there is no overflow, otherwise there is.
> > > So, there is no need to promote anything.
> >
> > Hmm, it's a bit awkward to have this state in the IL.
>
> Why?  These aren't the only internal functions which have different types
> of arguments, from the various widening ifns, conditional ifns,
> scatter/gather, ...  Even the WIDEN_*EXPR trees do have type differences
> among arguments.
> And it matches what the user builtin does.
>
> Furthermore, at least without _BitInt (but even with _BitInt at the maximum
> precision too) this might not be even possible.
> E.g. if there is __builtin_add_overflow with unsigned __int128 and __int128
> arguments and there are no wider types there is simply no type to use for both
> arguments, it would need to be a signed type with at least 129 bits...
>
> > I see that
> > expand_arith_overflow eventually applies
> > promotion, namely to the type of the LHS.
>
> The LHS doesn't have to be wider than the operand types, so it can't promote
> always.  Yes, in some cases it applies promotion if it is desirable for
> codegen purposes.  But without the promotions explicitly in the IL it
> doesn't need to rely on VRP to figure out how to expand it exactly.
>
> > Exposing this earlier could
> > enable optimization even
>
> Which optimizations?

I was thinking of merging conversions with that implied promotion.

>  We already try to fold the .{ADD,SUB,MUL}_OVERFLOW
> builtins to constants or non-overflowing arithmetics etc. as soon as we
> can e.g. using ranges prove the operation will never overflow or will always
> overflow.  Doing unnecessary promotion (see above that it might not be
> always possible at all) would just make the IL larger and risk we during
> expansion actually perform the promotions even when we don't have to.
> We on the other side already have match.pd rules to undo such promotions
> in the operands.  See
> /* Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW.  */
> And the result (well, TREE_TYPE of the lhs type) can be yet another type,
> not related to either of those in any way.

OK, fair enough.  I think this also shows again the lack of documentation
of internal function signatures (hits me all the time with the more complex
ones like MASK_LEN_GATHER_LOAD where I always wonder which
argument is what) as well as IL type checking (which can also serve as
documentation about argument constraints).

IMO comments in internal-fn.def would suffice for the former (like effectively
tree.h/def provide authority for tree codes);  for IL verification a function
in interna

RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-20 Thread Li, Pan2
Thanks Tamar for comments and explanations.

> But because you've now matched here, another pattern can't match
> anymore, and more importantly, it prevents is from trying any alternative way 
> to vectorize this (if there was one).

> That's why the pattern matcher shouldn't knowingly accept something we know 
> can't get vectorized.  You shouldn't
> build the pattern at all.

> And the reason I suggested doing this check in the match.pd is because of an 
> inconsistency between the variable and immediate
> variant if it's not done there.

Got the point here, I will double check all SAT_*-related matching patterns for
the INT_CST type check.

Pan

-Original Message-
From: Tamar Christina  
Sent: Tuesday, August 20, 2024 3:56 PM
To: Li, Pan2 ; Jakub Jelinek 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Liu, Hongtao 
Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

Hi Pan,

> -Original Message-
> From: Li, Pan2 
> Sent: Tuesday, August 20, 2024 1:58 AM
> To: Tamar Christina ; Jakub Jelinek
> 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org;
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com;
> rdapp@gmail.com; Liu, Hongtao 
> Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for
> vectorizable_call
> 
> Thanks Jakub and Tamar for comments and suggestions.
> 
> The match.pd list as below doesn't check the INT_CST type for .SAT_ADD.
> 
> (match (unsigned_integer_sat_add @0 @1)
> (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1))
> integer_zerop)
>   integer_minus_onep (realpart @2))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>   && types_match (type, @0
> 
> Thus the different types of .ADD_OVERFLOW could hit the pattern. The
> vectorizable_call strictly
> check the operands are totally the same, while the scalar doesn't have similar
> check. That
> is why I only found this issue from vector part.

Yeah and my question was more why are we getting away with it for the scalar.
So I implemented the optabs so I can take a look.

It looks like the scalar version doesn't match because split-path rewrites the 
IL
when the argument is a constant.  Passing -fno-split-paths gets it to generate
the instruction where we see that the IFN will then also contain mixed types.

#include 
#include 

  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
  T __attribute__((noinline))  \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {\
unsigned i;\
T ret; \
for (i = 0; i < limit; i++)\
  {\
out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
  }\
  }

#define CST -9LL
DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, CST)

int
main ()
{
  uint8_t a = 1;
  uint8_t r = 0;
  vec_sat_u_add_immCST_uint8_t_fmt_3 (&r, &a, 1);
  printf ("r=%u\n", r);
}

Generates:

movi    v31.8b, 0xfff7
uxtw    x2, w2
mov x3, 0
.p2align 5,,15
.L3:
ldr b30, [x1, x3]
uqadd   b30, b30, b31
str b30, [x0, x3]
add x3, x3, 1
cmp x2, x3
bne .L3

which is incorrect, it's expected to saturate but instead is doing x + 0xF7.

This is because of what Richi said before, there's nothing else in GIMPLE that 
tries to validate the operands,
and expand will simply force the operand to the register of the size it 
requested and doesn't care about the outcome.

For constants that are out of range, we're getting lucky in that existing math
rules will remove the operation and replace it with -1.  Because the operations
know it would overflow in this case, the check goes away at compile time.  That's
why the problem doesn't show up with an out-of-range constant: there's no
saturation check anymore.  But this is pure luck.

Secondly the reason I said that

+static void
+vect_recog_promote_cst_to_unsigned (tree *op, tree type)
+{
+  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
+return;
+
+  unsigned precision = TYPE_PRECISION (type);
+  wide_int type_max = wi::mask (precision, false, precision);
+  wide_int op_cst_val = wi::to_wide (*op, precision);
+
+  if (wi::leu_p (op_cst_val, type_max))
+*op = wide_int_to_tree (t

RE: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-19 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Wednesday, August 7, 2024 5:31 PM
To: gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD

From: Pan Li 

This patch would like to support the form 1 of the scalar signed
integer .SAT_ADD.  Aka below example:

Form 1:
  #define DEF_SAT_S_ADD_FMT_1(T, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_add_##T##_fmt_1 (T x, T y) \
  {\
T sum = x + y; \
return (x ^ y) < 0 \
  ? sum\
  : (sum ^ x) >= 0 \
? sum  \
: x < 0 ? MIN : MAX;   \
  }

DEF_SAT_S_ADD_FMT_1(int64_t, INT64_MIN, INT64_MAX)

We can tell the difference before and after this patch if backend
implemented the ssadd3 pattern similar as below.

Before this patch:
   4   │ __attribute__((noinline))
   5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
   6   │ {
   7   │   int64_t sum;
   8   │   long int _1;
   9   │   long int _2;
  10   │   int64_t _3;
  11   │   _Bool _8;
  12   │   long int _9;
  13   │   long int _10;
  14   │   long int _11;
  15   │   long int _12;
  16   │   long int _13;
  17   │
  18   │   <bb 2> [local count: 1073741824]:
  19   │   sum_6 = x_4(D) + y_5(D);
  20   │   _1 = x_4(D) ^ y_5(D);
  21   │   _2 = x_4(D) ^ sum_6;
  22   │   _12 = ~_1;
  23   │   _13 = _2 & _12;
  24   │   if (_13 < 0)
  25   │ goto <bb 3>; [41.00%]
  26   │   else
  27   │ goto <bb 4>; [59.00%]
  28   │
  29   │   <bb 3> [local count: 259738147]:
  30   │   _8 = x_4(D) < 0;
  31   │   _9 = (long int) _8;
  32   │   _10 = -_9;
  33   │   _11 = _10 ^ 9223372036854775807;
  34   │
  35   │   <bb 4> [local count: 1073741824]:
  36   │   # _3 = PHI <sum_6(2), _11(3)>
  37   │   return _3;
  38   │
  39   │ }

After this patch:
   4   │ __attribute__((noinline))
   5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
   6   │ {
   7   │   int64_t _4;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  12   │   return _4;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add the matching for signed .SAT_ADD.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
matching func decl.
(match_unsigned_saturation_add): Try signed .SAT_ADD and rename
to ...
(match_saturation_add): ... here.
(math_opts_dom_walker::after_dom_children): Update the above renamed
func from caller.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 17 
 gcc/tree-ssa-math-opts.cc | 42 ++-
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index c9c8478d286..8b8a5dbcfe3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3311,6 +3311,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
 
+/* Signed saturation add, case 1:
+   T sum = X + Y;
+   SAT_S_ADD = (X ^ Y) < 0
+ ? sum
+ : (sum ^ x) >= 0
+   ? sum
+   : x < 0 ? MIN : MAX;  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (lt (bit_and:c (bit_xor:c @0 (convert?@2 (plus:c (convert? @0)
+(convert? @1
+  (bit_not (bit_xor:c @0 @1)))
+   integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   @2)
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 8d96a4c964b..f39c88741a4 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4023,6 +4023,8 @@ extern bool gimple_unsigned_integer_sat_add (tree, tree*, 
tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
 
+extern bool gimple_signed_integer_sat_add (tree, tree*, tree (*)(tree));
+
 static void
 build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, internal_fn fn,
tree lhs, tree op_0, tree op_1)
@@ -4072,7 +4074,8 @@ match_unsigned_saturation_add (gimple_stmt_iterator *gsi, 
gassign *stmt)
 }
 
 /*
- * T

RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-19 Thread Li, Pan2
Thanks Jakub and Tamar for comments and suggestions.

The match.pd pattern listed below doesn't check the INT_CST type for .SAT_ADD.

(match (unsigned_integer_sat_add @0 @1)
(cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop)
  integer_minus_onep (realpart @2))
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0

Thus .ADD_OVERFLOW with different operand types could hit the pattern.  The
vectorizable_call strictly checks that the operands have the same type, while the
scalar path doesn't have a similar check.  That is why I only found this issue on
the vector side.

It looks like we need to add the type check for INT_CST in the match.pd predicate
and then add an explicit cast to IMM in the source code to match the pattern.
For example as below; I am not very sure whether it is reasonable or not.

#define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
T __attribute__((noinline))  \
vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
{\
  unsigned i;\
  T ret; \
  for (i = 0; i < limit; i++)\
{\
  out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \   // 
need add (T)IMM, aka out[i] = __builtin_add_overflow (in[i], (T)IMM, &ret) ? -1 
: ret; to hit the pattern.
}\
}
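
For clarity, the same loop written out for one concrete instantiation with the
cast applied; the function name is only illustrative and this is not part of
the patch:

#include <stdint.h>

/* One instantiation with the explicit cast, so the immediate already has
   the element type before the .SAT_ADD pattern is matched.  */
void
vec_sat_u_add_imm9_uint8_t (uint8_t *out, uint8_t *in, unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    {
      uint8_t ret;
      /* -1 converts to 255, the saturated value for uint8_t.  */
      out[i] = __builtin_add_overflow (in[i], (uint8_t) 9, &ret) ? -1 : ret;
    }
}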

Pan

-Original Message-
From: Tamar Christina  
Sent: Tuesday, August 20, 2024 3:41 AM
To: Jakub Jelinek 
Cc: Li, Pan2 ; Richard Biener ; 
gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao 
Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, August 19, 2024 8:25 PM
> To: Tamar Christina 
> Cc: Li, Pan2 ; Richard Biener ;
> gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao
> 
> Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for
> vectorizable_call
> 
> On Mon, Aug 19, 2024 at 01:55:38PM +, Tamar Christina wrote:
> > So would this not be the simplest fix:
> >
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 87b3dc413b8..fcbc83a49f0 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo,
> stmt_vec_info stmt_vinfo,
> >
> >if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
> 
> But then you call gimple_unsigned_integer_sat_add with mismatching types,
> not sure if that is ok.
> 

gimple_unsigned_integer_sat_add is a match.pd predicate. It matches the 
expression
rooted in lhs and returns the results in ops.  So not sure what you mean here.

> >  {
> > +  if (TREE_CODE (ops[1]) == INTEGER_CST)
> > +   ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
> > +
> 
> This would be only ok if the conversion doesn't change the value
> of the constant.
> .ADD_OVERFLOW etc. could have e.g. int and unsigned arguments, you don't
> want to change the latter to the former if the value has the most
> significant bit set.
> Similarly, .ADD_OVERFLOW could have e.g. unsigned and unsigned __int128
> arguments, you don't want to truncate the constant.

Yes, if the expression truncates or changes the sign of the expression this 
wouldn't
work.  But then you can also not match at all.  So the match should be rejected 
then
since the values need to fit in the same type as the argument and be the same 
sign.

So the original vect_recog_promote_cst_to_unsigned is also wrong since it 
doesn't
stop the match if it doesn't fit.

Imho, the pattern here cannot check this and it should be part of the match
condition.  If the constant cannot fit into the same type as the operand or has
a different sign, the matching should fail.

It's just that in match.pd you can't modify the arguments returned from a 
predicate
but since the predicate is intended to be used to rewrite to the IFN I still 
think the
above solution is right, and the range check should be done within the 
predicate.

Tamar.

> So, you could e.g. do the fold_convert and then verify if
> wi::to_widest on the old and new tree are equal, or you could check for
> TREE_OVERFLOW if fold_convert honors that.
> As I said, for INTEGER_CST operands of .ADD/SUB/MUL_OVERFLOW, the infinite
> precision value (aka wi::to_widest) is all that matters.
> 
>   Jakub



RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

2024-08-19 Thread Li, Pan2
Great! Thanks Patrick.

Pan

-Original Message-
From: Patrick O'Neill  
Sent: Tuesday, August 20, 2024 12:14 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; Jeff Law 

Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and 
oct .SAT_TRUNC form 2

Hi Pan,

Once the postcommit baseline moves forward (trunk is currently failing 
to build linux targets [1] [2]) I'll re-trigger precommit for you.

Thanks,
Patrick

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116409
[2]: https://github.com/patrick-rivos/gcc-postcommit-ci/issues/1564

On 8/18/24 19:49, Li, Pan2 wrote:
> Turn out that the pre-commit doesn't pick up the newest upstream when testing 
> this patch.
>
> Pan
>
> -----Original Message-
> From: Li, Pan2 
> Sent: Monday, August 19, 2024 9:25 AM
> To: Jeff Law ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad 
> and oct .SAT_TRUNC form 2
>
> Opps, let me double check what happened to my local tester.
>
> Pan
>
> -Original Message-----
> From: Jeff Law 
> Sent: Sunday, August 18, 2024 11:21 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad 
> and oct .SAT_TRUNC form 2
>
>
>
> On 8/18/24 12:10 AM, pan2...@intel.com wrote:
>> From: Pan Li 
>>
>> This patch would like to add test cases for the unsigned scalar quad and
>> oct .SAT_TRUNC form 2.  Aka:
>>
>> Form 2:
>> #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>> NT __attribute__((noinline)) \
>> sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>> {\
>>   WT max = (WT)(NT)-1;   \
>>   return x > max ? (NT) max : (NT)x; \
>> }
>>
>> QUAD:
>> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
>> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)
>>
>> OCT:
>> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)
>>
>> The below test is passed for this patch.
>> * The rv64gcv regression test.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/sat_u_trunc-10.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-11.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-12.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-10.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-11.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-12.c: New test.
> Looks like they're failing in the upstream pre-commit tester:
>
>> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578
>
> jeff


RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

2024-08-18 Thread Li, Pan2
Turns out that the pre-commit CI doesn't pick up the newest upstream when testing
this patch.

Pan

-Original Message-
From: Li, Pan2  
Sent: Monday, August 19, 2024 9:25 AM
To: Jeff Law ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and 
oct .SAT_TRUNC form 2

Oops, let me double check what happened to my local tester.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, August 18, 2024 11:21 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and 
oct .SAT_TRUNC form 2



On 8/18/24 12:10 AM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to add test cases for the unsigned scalar quad and
> oct .SAT_TRUNC form 2.  Aka:
> 
> Form 2:
>#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>NT __attribute__((noinline)) \
>sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>{\
>  WT max = (WT)(NT)-1;   \
>  return x > max ? (NT) max : (NT)x; \
>}
> 
> QUAD:
> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)
> 
> OCT:
> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)
> 
> The below test is passed for this patch.
> * The rv64gcv regression test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_u_trunc-10.c: New test.
>   * gcc.target/riscv/sat_u_trunc-11.c: New test.
>   * gcc.target/riscv/sat_u_trunc-12.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-10.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-11.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-12.c: New test.
Looks like they're failing in the upstream pre-commit tester:

> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578


jeff


RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

2024-08-18 Thread Li, Pan2
Please ignore this patch, it was sent by mistake.

Pan

-Original Message-
From: Li, Pan2  
Sent: Monday, August 19, 2024 10:04 AM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct 
.SAT_TRUNC form 2

From: Pan Li 

This patch would like to add test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 2.  Aka:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {\
WT max = (WT)(NT)-1;   \
return x > max ? (NT) max : (NT)x; \
  }

QUAD:
DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)

OCT:
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_trunc-10.c: New test.
* gcc.target/riscv/sat_u_trunc-11.c: New test.
* gcc.target/riscv/sat_u_trunc-12.c: New test.
* gcc.target/riscv/sat_u_trunc-run-10.c: New test.
* gcc.target/riscv/sat_u_trunc-run-11.c: New test.
* gcc.target/riscv/sat_u_trunc-run-12.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/sat_u_trunc-10.c | 17 
 .../gcc.target/riscv/sat_u_trunc-11.c | 17 
 .../gcc.target/riscv/sat_u_trunc-12.c | 20 +++
 .../gcc.target/riscv/sat_u_trunc-run-10.c | 16 +++
 .../gcc.target/riscv/sat_u_trunc-run-11.c | 16 +++
 .../gcc.target/riscv/sat_u_trunc-run-12.c | 16 +++
 6 files changed, 102 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-12.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c
new file mode 100644
index 000..7dfc740c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_truc_uint32_t_to_uint8_t_fmt_2:
+** sltiu\s+[atx][0-9]+,\s*a0,\s*255
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** ret
+*/
+DEF_SAT_U_TRUC_FMT_2(uint8_t, uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c
new file mode 100644
index 000..c50ae96f47d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_truc_uint64_t_to_uint8_t_fmt_2:
+** sltiu\s+[atx][0-9]+,\s*a0,\s*255
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** ret
+*/
+DEF_SAT_U_TRUC_FMT_2(uint8_t, uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c
new file mode 100644
index 000..61331cee6fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_truc_uint64_t_to_uint16_t_fmt_2:
+** li\s+[atx][0-9]+,\s*65536
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** sltu\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_TRUC_FMT_2(uint16_t, uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } 

RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

2024-08-18 Thread Li, Pan2
Oops, let me double check what happened to my local tester.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, August 18, 2024 11:21 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and 
oct .SAT_TRUNC form 2



On 8/18/24 12:10 AM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to add test cases for the unsigned scalar quad and
> oct .SAT_TRUNC form 2.  Aka:
> 
> Form 2:
>#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>NT __attribute__((noinline)) \
>sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>{\
>  WT max = (WT)(NT)-1;   \
>  return x > max ? (NT) max : (NT)x; \
>}
> 
> QUAD:
> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)
> 
> OCT:
> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)
> 
> The below test is passed for this patch.
> * The rv64gcv regression test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_u_trunc-10.c: New test.
>   * gcc.target/riscv/sat_u_trunc-11.c: New test.
>   * gcc.target/riscv/sat_u_trunc-12.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-10.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-11.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-12.c: New test.
Looks like they're failing in the upstream pre-commit tester:

> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578


jeff


RE: [PATCH v1] Test: Move pr116278 run test to c-torture [NFC]

2024-08-18 Thread Li, Pan2
Sure, will send v2 for this.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, August 18, 2024 11:19 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
rdapp@gmail.com; s...@gentoo.org
Subject: Re: [PATCH v1] Test: Move pr116278 run test to c-torture [NFC]



On 8/18/24 1:13 AM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> Move the run test of pr116278 to c-torture and leave the risc-v the
> asm check under risc-v part.
> 
>   PR target/116278
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/pr116278-run-1.c: Take compile instead of
>   run test.
>   * gcc.target/riscv/pr116278-run-2.c: Ditto.
>   * gcc.c-torture/execute/pr116278-run-1.c: New test.
>   * gcc.c-torture/execute/pr116278-run-2.c: New test.
We should be using the dg-torture framework, so the right directory for 
the test is gcc.dg/torture.

I suspect these tests (just based on the constants that appear) may not 
work on the 16 bit integer targets.  So we may need

/* { dg-require-effective-target int32 } */

But I don't mind faulting that in if/when we see the 16bit int targets 
complain.

So OK in the right directory (gcc.dg/torture).

Jeff



RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-17 Thread Li, Pan2
Thanks Jakub for explaining.

Hi Richard,

Does it mean we need to do some promotion, similar to this patch, to make the
vectorizable_call happy when there is a constant operand?  I am not sure whether
there is a better approach for this case.

Pan 

-Original Message-
From: Jakub Jelinek  
Sent: Sunday, August 18, 2024 5:21 AM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao 
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Sat, Aug 17, 2024 at 05:03:14AM +, Li, Pan2 wrote:
> Thanks Richard for confirmation. Sorry almost forget this thread.
> 
> Please feel free to let me know if there is anything I can do to fix this 
> issue. Thanks a lot.

There is no bug.  The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to
have the same type, as described in the
__builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can have
different type and result yet another one, the behavior is then (as if) to
perform the operation in infinite precision and if that result fits into
the result type, there is no overflow, otherwise there is.
So, there is no need to promote anything, promoted constants would have the
same value as the non-promoted ones and the value is all that matters for
constants.
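
For illustration, a small standalone example of that behavior, with a uint32_t
operand, an int constant and a uint8_t result; the values are chosen only for
demonstration.

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint32_t a = 250;
  uint8_t r;

  /* The sum is formed as if in infinite precision and only then checked
     against the result type uint8_t; the stored value is the wrapped sum.  */
  int ovf1 = __builtin_add_overflow (a, 3, &r);   /* 253 fits    -> 0, r = 253 */
  int ovf2 = __builtin_add_overflow (a, 9, &r);   /* 259 doesn't -> 1, r = 3   */

  printf ("%d %d %u\n", ovf1, ovf2, (unsigned) r);
  return 0;
}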

Jakub



RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

2024-08-17 Thread Li, Pan2
> OK.  Sorry for the delays here.  I wanted to make sure we had the issues 
> WRT operand extension resolved before diving into this.  But in 
> retrospect, this probably could have moved forward independently.

That makes much sense to me, thanks a lot.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, August 18, 2024 2:21 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar



On 7/22/24 11:06 PM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to implement the quad and oct .SAT_TRUNC pattern
> in the riscv backend. Aka:
> 
> Form 1:
>#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
>NT __attribute__((noinline)) \
>sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
>{\
>  bool overflow = x > (WT)(NT)(-1);  \
>  return ((NT)x) | (NT)-overflow;\
>}
> 
> DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t)
> 
> Before this patch:
> 4   │ __attribute__((noinline))
> 5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
> 6   │ {
> 7   │   _Bool overflow;
> 8   │   short unsigned int _1;
> 9   │   short unsigned int _2;
>10   │   short unsigned int _3;
>11   │   uint16_t _6;
>12   │
>13   │ ;;   basic block 2, loop depth 0
>14   │ ;;pred:   ENTRY
>15   │   overflow_5 = x_4(D) > 65535;
>16   │   _1 = (short unsigned int) x_4(D);
>17   │   _2 = (short unsigned int) overflow_5;
>18   │   _3 = -_2;
>19   │   _6 = _1 | _3;
>20   │   return _6;
>21   │ ;;succ:   EXIT
>22   │
>23   │ }
> 
> After this patch:
> 3   │
> 4   │ __attribute__((noinline))
> 5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
> 6   │ {
> 7   │   uint16_t _6;
> 8   │
> 9   │ ;;   basic block 2, loop depth 0
>10   │ ;;pred:   ENTRY
>11   │   _6 = .SAT_TRUNC (x_4(D)); [tail call]
>12   │   return _6;
>13   │ ;;succ:   EXIT
>14   │
>15   │ }
> 
> The below tests suites are passed for this patch
> 1. The rv64gcv fully regression test.
> 2. The rv64gcv build with glibc
> 
> gcc/ChangeLog:
> 
>   * config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
>   quad truncation.
>   (ANYI_OCT_TRUNC): New iterator for oct truncation.
>   (ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
>   (ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
>   (anyi_quad_truncated): Ditto but for lower case.
>   (anyi_oct_truncated): Ditto but for lower case.
>   * config/riscv/riscv.md (ustrunc2):
>   Add new pattern for quad truncation.
>   (ustrunc2): Ditto but for oct.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
>   the expand dump check times.
>   * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
>   * gcc.target/riscv/sat_arith_data.h: Add test helper macros.
>   * gcc.target/riscv/sat_u_trunc-4.c: New test.
>   * gcc.target/riscv/sat_u_trunc-5.c: New test.
>   * gcc.target/riscv/sat_u_trunc-6.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-4.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-5.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-6.c: New test.
OK.  Sorry for the delays here.  I wanted to make sure we had the issues 
WRT operand extension resolved before diving into this.  But in 
retrospect, this probably could have moved forward independently.

Jeff




RE: [PATCH v4] RISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278]

2024-08-17 Thread Li, Pan2
> OK.  And I think this shows the basic approach we want to use if there 
> are other builtins that accept sub-word modes.  ie, get the operands 
> into X mode (by extending them as appropriate), then do as much work in 
> X mode as possible, then truncate the result if needed.

> Thanks for your patience on this.

Thanks Jeff for the comments and suggestions. I will give it a try and see whether we can do some
combine-like optimization for the SImode asm in RV64.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, August 18, 2024 2:17 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v4] RISC-V: Make sure high bits of usadd operands is clean 
for non-Xmode [PR116278]



On 8/16/24 9:43 PM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> For QI/HImode of .SAT_ADD,  the operands may be sign-extended and the
> high bits of Xmode may be all 1 which is not expected.  For example as
> below code.
> 
> signed char b[1];
> unsigned short c;
> signed char *d = b;
> int main() {
>b[0] = -40;
>c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 
> 9;
>__builtin_printf("%d\n", c);
> }
> 
> After expanding we have:
> 
> ;; _6 = .SAT_ADD (_3, 9);
> (insn 8 7 9 (set (reg:DI 143)
>  (high:DI (symbol_ref:DI ("d") [flags 0x86]  )))
>   (nil))
> (insn 9 8 10 (set (reg/f:DI 142)
>  (mem/f/c:DI (lo_sum:DI (reg:DI 143)
>  (symbol_ref:DI ("d") [flags 0x86]  )) [1 d+0 S8 
> A64]))
>   (nil))
> (insn 10 9 11 (set (reg:HI 144 [ _3 ])
>  (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) 
> "test.c":7:10 -1
>   (nil))
> 
> The convert from signed char to unsigned short will have sign_extend rtl
> as above.  And finally become the lb insn as below:
> 
> lb   a1,0(a5)      // a1 is -40, aka 0xffd8
> lui  a0,0x1a
> addi a5,a1,9
> slli a5,a5,0x30
> srli a5,a5,0x30    // a5 is 65505
> sltu a1,a5,a1      // compare 65505 and 0xffd8 => TRUE
> 
> The sltu try to compare 65505 and 0xffd8 here,  but we
> actually want to compare 65505 and 65496 (0xffd8).  Thus we need to
> clean up the high bits to ensure this.
> 
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> 
>   PR target/116278
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
>   func impl to zero extend rtx.
>   (riscv_expand_usadd): Leverage above func to cleanup operands
>   and sum.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/pr116278-run-1.c: New test.
>   * gcc.target/riscv/pr116278-run-2.c: New test.
> 
>   PR 116278
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
>   func impl to zero extend rtx.
>   (riscv_expand_usadd): Leverage above func to cleanup operands 0
>   and remove the special handing for SImode in RV64.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_u_add-11.c: Adjust asm check body.
>   * gcc.target/riscv/sat_u_add-15.c: Ditto.
>   * gcc.target/riscv/sat_u_add-19.c: Ditto.
>   * gcc.target/riscv/sat_u_add-23.c: Ditto.
>   * gcc.target/riscv/sat_u_add-3.c: Ditto.
>   * gcc.target/riscv/sat_u_add-7.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-3.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-7.c: Ditto.
>   * gcc.target/riscv/pr116278-run-1.c: New test.
>   * gcc.target/riscv/pr116278-run-2.c: New test.
OK.  And I think this shows the basic approach we want to use if there 
are other builtins that accept sub-word modes.  ie, get the operands 
into X mode (by extending them as appropriate), then do as much work in 
X mode as possible, then truncate the result if needed.

Thanks for your patience on this.

Jeff


RE: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect

2024-08-17 Thread Li, Pan2
> Thanks.  I've pushed this to the trunk.

Thanks a lot, Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, August 17, 2024 11:27 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect



On 8/10/24 6:36 AM, pan2...@intel.com wrote:
> This patch would like to fix one ICE when rv64gcv_zvbb for vwsll.
> Consider below example.
> 
> void vwsll_vv_test (short *restrict dst, char *restrict a,
>  int *restrict b, int n)
> {
>for (int i = 0; i < n; i++)
>  dst[i] = a[i] << b[i];
> }
> 
> It will hit the vwsll pattern with following operands.
> operand 0 -> (reg:RVVMF2HI 146 [ vect__7.13 ])
> operand 1 -> (reg:RVVMF4QI 165 [ vect_cst__33 ])
> operand 2 -> (reg:RVVM1SI 171 [ vect_cst__36 ])
> 
> According to the ISA, operand 2 should be the same as operand 1.
> Aka operand 2 should have RVVMF4QI mode as above.  Thus,  add
> quad truncation for operand 2 before emit vwsll.
> 
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> 
>   PR target/116280
> 
> gcc/ChangeLog:
> 
>   * config/riscv/autovec-opt.md: Add quad truncation to
>   align the mode requirement for vwsll.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/pr116280-1.c: New test.
>   * gcc.target/riscv/rvv/base/pr116280-2.c: New test.
Thanks.  I've pushed this to the trunk.

jeff



RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-16 Thread Li, Pan2
Thanks Richard for the confirmation. Sorry, I almost forgot this thread.

Hi Jakub,

Please feel free to let me know if there is anything I can do to fix this 
issue. Thanks a lot.

Pan


-Original Message-
From: Richard Biener  
Sent: Tuesday, July 16, 2024 11:29 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao ; Jakub Jelinek 
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Tue, Jul 16, 2024 at 3:22 PM Li, Pan2  wrote:
>
> > I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> > (constant) arguments?
>
> I double checked the 022t.ssa pass for the __builtin_add_overflow operands 
> tree type. It looks like that
> the 2 operands of .ADD_OVERFLOW has different tree types when one of them is 
> constant.
> One is unsigned DI, and the other is int.

I think that's a bug (and a downside of internal-functions as they
have no prototype the type
verifier could work with).

That you see them in 022t.ssa means that either the frontend
mis-handles the builtin call parsing
or fold_builtin_arith_overflow which is responsible for the rewriting
to an internal function is
wrong.

I've CCed Jakub who added those.

I think we could add verification for internal functions in the set of
commutative_binary_fn_p, commutative_ternary_fn_p, associative_binary_fn_p
and possibly others where we can constrain argument and result types.

Richard.

> (gdb) call debug_gimple_stmt(stmt)
> _14 = .ADD_OVERFLOW (_4, 129);
> (gdb) call debug_tree (gimple_call_arg(stmt, 0))
>   type  public unsigned DI
> size 
> unit-size 
> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x76a437e0 precision:64 min  max 
> 
> pointer_to_this >
> visited
> def_stmt _4 = *_3;
> version:4>
> (gdb) call debug_tree (gimple_call_arg(stmt, 1))
>   constant 
> 129>
> (gdb)
>
> Then we go to the vect pass, we can also see that the ops of .ADD_OVERFLOW 
> has different tree types.
> As my understanding, here we should have unsigned DI for constant operands
>
> (gdb) layout src
> (gdb) list
> 506 
> if (gimple_call_num_args (_c4) == 2)
> 507   
> {
> 508   
>   tree _q40 = gimple_call_arg (_c4, 0);
> 509   
>   _q40 = do_valueize (valueize, _q40);
> 510   
>   tree _q41 = gimple_call_arg (_c4, 1);
> 511   
>   _q41 = do_valueize (valueize, _q41);
> 512   
>   if (integer_zerop (_q21))
> 513   
> {
> 514   
>   if (integer_minus_onep (_p1))
> 515   
> {
> (gdb) call debug_tree (_q40)
>   type  public unsigned DI
> size 
> unit-size 
> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x76a437e0 precision:64 min  max 
> 
> pointer_to_this >
>     visited
> def_stmt _4 = *_3;
> version:4>
> (gdb) call debug_tree (_q41)
>   constant 
> 129>
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, July 10, 2024 7:36 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
> Hongtao 
> Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
> vectorizable_call
>
> On Wed, Jul 10, 2024 at 11:28 AM  wrote:
> >
> > From: Pan Li 
> >
> > The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST.
> > For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
> >
> > Form 3:
> >   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
> >   T __attribute__((noinline))  \
> >   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
> >   {\
> > unsigned i;  

RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Should be in upstream already.

Pan

-Original Message-
From: Li, Pan2  
Sent: Saturday, August 17, 2024 11:45 AM
To: Zhijin Zeng 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng 

Subject: RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

OK, I will commit it if there are no surprises from testing after applying the change manually.

Pan

-Original Message-
From: Zhijin Zeng  
Sent: Saturday, August 17, 2024 10:46 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng 

Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

The patch for 3c9c93 is as follows. It's a little strange that this patch 
hasn't changed and I don't know why it fails to apply. Could you directly modify 
riscv.cc if this version still conflicts? Only two lines in riscv.cc are changed. 
Thank you again.
Zhijin

This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.

The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
information. The incorrect example as follow:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value just need to multiply the literal 2.
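
For readers less familiar with the encoding, here is an illustrative decode of
the escape bytes above, using the standard DWARF 5 operation names (operation
names taken from the DWARF spec; the decode is an annotation, not part of the
patch itself):

```
/* Illustrative decode of:
   .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22  */
static const unsigned char cfa_expr[] = {
  0x0f,              /* DW_CFA_def_cfa_expression                      */
  0x0b,              /* expression block length: 11 bytes              */
  0x72, 0x00,        /* DW_OP_breg2 (x2/sp), offset 0                  */
  0x92, 0xa2, 0x38,  /* DW_OP_bregx, ULEB128 regno 0x1c22 (vlenb) ...  */
  0x00,              /* ... with offset 0                              */
  0x34,              /* DW_OP_lit4, should be DW_OP_lit2 (0x32) here   */
  0x1e,              /* DW_OP_mul                                      */
  0x23, 0x50,        /* DW_OP_plus_uconst 80                           */
  0x22,              /* DW_OP_plus: CFA = sp + vlenb * 4 + 80          */
};
```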

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng 
---
 gcc/config/riscv/riscv.cc                     |  4 +--
 .../riscv/rvv/base/scalable_vector_cfi.c      | 32 +++
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f60d8f9711..8b7123e043e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11010,12 +11010,12 @@ static unsigned int
 riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
                                      int *offset)
 {
-  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
+  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
      1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
      2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
   */
   gcc_assert (i == 1);
-  *factor = riscv_bytes_per_vector_chunk;
+  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
   *offset = 1;
   return RISCV_DWARF_VLENB;
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
new file mode 100644
index 000..184da10caf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
+/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
+
+#include "riscv_vector.h"
+
+#define PI_2 1.570796326795
+
+extern void func(float *result);
+
+void test(const float *ys, const float *xs, float *result, size_t length) {
+    size_t gvl = __riscv_vsetvlmax_e32m2();
+    vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
+
+    for(size_t i = 0; i < length;) {
+        gvl = __riscv_vsetvl_e32m2(length - i);
+        vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
+        vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
+        vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
+        vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, 
gvl);
+
+        __riscv_vse32_v_f32m2(result, fixpi, gvl);
+
+        func(result);
+
+        i += gvl;
+        ys += gvl;
+        xs += gvl;
+        result += gvl;
+    }
+}
--
2.34.1


> Fr

RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
OK, I will commit it if there are no surprises from testing after applying the change manually.

Pan

-Original Message-
From: Zhijin Zeng  
Sent: Saturday, August 17, 2024 10:46 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng 

Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

The patch for 3c9c93 is as follows. It's a little strange that this patch 
hasn't changed and I don't know why it fails to apply. Could you directly modify 
riscv.cc if this version still conflicts? Only two lines in riscv.cc are changed. 
Thank you again.
Zhijin

This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.

The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
information. The incorrect example as follow:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value just need to multiply the literal 2.

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng 
---
 gcc/config/riscv/riscv.cc                     |  4 +--
 .../riscv/rvv/base/scalable_vector_cfi.c      | 32 +++
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f60d8f9711..8b7123e043e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11010,12 +11010,12 @@ static unsigned int
 riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
                                      int *offset)
 {
-  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
+  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
      1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
      2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
   */
   gcc_assert (i == 1);
-  *factor = riscv_bytes_per_vector_chunk;
+  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
   *offset = 1;
   return RISCV_DWARF_VLENB;
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
new file mode 100644
index 000..184da10caf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
+/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
+
+#include "riscv_vector.h"
+
+#define PI_2 1.570796326795
+
+extern void func(float *result);
+
+void test(const float *ys, const float *xs, float *result, size_t length) {
+    size_t gvl = __riscv_vsetvlmax_e32m2();
+    vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
+
+    for(size_t i = 0; i < length;) {
+        gvl = __riscv_vsetvl_e32m2(length - i);
+        vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
+        vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
+        vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
+        vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, 
gvl);
+
+        __riscv_vse32_v_f32m2(result, fixpi, gvl);
+
+        func(result);
+
+        i += gvl;
+        ys += gvl;
+        xs += gvl;
+        result += gvl;
+    }
+}
--
2.34.1


> From: "Li, Pan2"
> Date:  Sat, Aug 17, 2024, 09:20
> Subject:  RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
> [PR116305]
> To: "Zhijin Zeng"
> Cc: "gcc-patches@gcc.gnu.org", 
> "gcc-b...@gcc.gnu.org", "Kito 

RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Never mind, it still looks like it conflicts; could you please help double-check it?
Current upstream should be 3c9c93f3c923c4a0ccd42db4fd26a944a3c91458.
Current upstream should be 3c9c93f3c923c4a0ccd42db4fd26a944a3c91458.

└─(09:18:27 on master ✭)──> git apply tmp.patch 

──(Sat,Aug17)─┘
error: patch failed: gcc/config/riscv/riscv.cc:11010
error: gcc/config/riscv/riscv.cc: patch does not apply

Pan

-Original Message-
From: Zhijin Zeng  
Sent: Friday, August 16, 2024 9:30 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng 

Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

Sorry, the line numbers changed. The newest version is as follows.

This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.

The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
information. The incorrect example as follow:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value just need to multiply the literal 2.

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng 
---
 gcc/config/riscv/riscv.cc                     |  4 +--
 .../riscv/rvv/base/scalable_vector_cfi.c      | 32 +++
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f60d8f9711..8b7123e043e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11010,12 +11010,12 @@ static unsigned int
 riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
                                      int *offset)
 {
-  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
+  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
      1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
      2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
   */
   gcc_assert (i == 1);
-  *factor = riscv_bytes_per_vector_chunk;
+  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
   *offset = 1;
   return RISCV_DWARF_VLENB;
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
new file mode 100644
index 000..184da10caf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
+/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
+
+#include "riscv_vector.h"
+
+#define PI_2 1.570796326795
+
+extern void func(float *result);
+
+void test(const float *ys, const float *xs, float *result, size_t length) {
+    size_t gvl = __riscv_vsetvlmax_e32m2();
+    vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
+
+    for(size_t i = 0; i < length;) {
+        gvl = __riscv_vsetvl_e32m2(length - i);
+        vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
+        vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
+        vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
+        vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, 
gvl);
+
+        __riscv_vse32_v_f32m2(result, fixpi, gvl);
+
+        func(result);
+
+        i += gvl;
+        ys += gvl;
+        xs += gvl;
+        result += gvl;
+    }
+}
--
2.34.1

> From: "Li, Pan2"
> Date:  Fri, Aug 16, 2024, 21:05
> Sub

RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Is this your newest version?
https://patchwork.sourceware.org/project/gcc/patch/8fd4328940034d8778cca67eaad54e5a2c2b1a6c.1c2f51e1.0a9a.4367.9762.9b6eccc3b...@feishu.cn/

If so, you may need to rebase against upstream; I got a conflict when running git am.

Applying: RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
error: corrupt patch at line 20
Patch failed at 0001 RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Pan

-Original Message-
From: Zhijin Zeng  
Sent: Friday, August 16, 2024 8:47 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng 

Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

Hi Pan,
I am new to GCC and don't have commit access. Please help to commit this patch. 
Thank you very much.
Zhijin

> From: "Li, Pan2"
> Date:  Fri, Aug 16, 2024, 20:15
> Subject:  RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
> [PR116305]
> To: "曾治金"
> Cc: "gcc-patches@gcc.gnu.org", 
> "gcc-b...@gcc.gnu.org", "Kito 
> Cheng"
> Hi there,
> 
> Please feel free to let me know if you don't have authority to commit it. I 
> can help to commit this patch.
> 
> Pan
> 
> 
> -Original Message-
> From: Kito Cheng  
> Sent: Friday, August 16, 2024 3:48 PM
> To: 曾治金 
> Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org
> Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
> [PR116305]
> 
> LGTM, thanks for fixing that :)
> 
> On Wed, Aug 14, 2024 at 2:06 PM 曾治金  wrote:
> >
> > This patch is to fix the bug (BugId:116305) introduced by the commit
> > bd93ef for risc-v target.
> >
> > The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
> > if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
> > it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
> > merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
> > of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
> > of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
> > equal.
> >
> > Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
> > register value in riscv_legitimize_poly_move, and dwarf2cfi will also
> > get the estimated vlenb register value in 
> > riscv_dwarf_poly_indeterminate_value
> > to calculate the number of times to multiply the vlenb register value.
> >
> > So need to change the factor from riscv_bytes_per_vector_chunk to
> > BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
> > information. The incorrect example as follow:
> >
> > ```
> > csrr    t0,vlenb
> > slli    t1,t0,1
> > sub     sp,sp,t1
> >
> > .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
> > ```
> >
> > The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
> > the literal 4, '0x1e' means the multiply operation. But in fact, the
> > vlenb register value just need to multiply the literal 2.
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.
> >
> > Signed-off-by: Zhijin Zeng 
> > ---
> >  gcc/config/riscv/riscv.cc                     |  4 +--
> >  .../riscv/rvv/base/scalable_vector_cfi.c      | 32 +++
> >  2 files changed, 34 insertions(+), 2 deletions(-)
> >  create mode 100644 
> >gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 5fe4273beb7..e740fc159dd 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -10773,12 +10773,12 @@ static unsigned int
> >  riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
> >                                       int *offset)
> >  {
> > -  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
> > +  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
> >       1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
> >       2. TARGET_MI

RE: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-16 Thread Li, Pan2
Thanks Jeff and Andrew Waterman for the comments.

> What's more important is that we get the RTL semantics right, the fact
> that it seems to work due to addiw seems to be more of an accident than
> by design.

SImode has had different handling from day 1, which follows the algorithm only up to a point.

11842   if (mode == SImode && mode != Xmode)
11843     { /* Take addw to avoid the sum truncate.  */
11844       rtx simode_sum = gen_reg_rtx (SImode);
11845       riscv_emit_binary (PLUS, simode_sum, x, y);
11846       emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
11847     }

> I think your overall point still holds, though.

Got the point here, but I would like to double-confirm that the two additional insns 
below are acceptable for this change (or we can eliminate them later).

sat_u_add_uint32_t_fmt_1:
slli   a5,a0,32   // additional insn for taking care SI in rv64
srli   a5,a5,32   // Ditto.
addw   a0,a0,a1
sltu   a5,a0,a5
neg    a5,a5
or     a0,a5,a0
sext.w a0,a0
ret

If so, I will prepare the v3 for the SImode in RV64.

Pan

-Original Message-
From: Andrew Waterman  
Sent: Friday, August 16, 2024 12:28 PM
To: Jeff Law 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean 
for HI/QI [PR116278]

On Thu, Aug 15, 2024 at 9:23 PM Jeff Law  wrote:
>
>
>
> On 8/13/24 10:16 PM, Li, Pan2 wrote:
> >> How specifically is it avoided for SI?  ISTM it should have the exact
> >> same problem with a constant like 0x8000 in SImode on rv64 which is
> >> going to be extended to 0x8000.
> >
> > HI and QI need some special handling for sum. For example, for HImode.
> >
> > 65535 + 2 = 65537, when compare sum and 2, we need to cleanup the high bits 
> > (aka make 65537 become 1) to tell the HImode overflow.
> > Thus, for HI and QI, we need to clean up highest bits of mode.
> >
> > But for SI, we don't need that as we have addw insn, the sign extend will 
> > take care of this as well as the sltu. For example, SImode.
> >
> lw    a1,0(a5)   // a1 is -40, aka 0xffd8
> lui   a0,0x1a    //
> addwi a5,a1,9    // a5 is -31, aka 0xffe1
>                  // For QI and HI, we need to mask the highbits, but not applicable for SI.
> sltu  a1,a5,a1   // compare a1 and a5, a5 > a1, then no-overflow as expected.
> What's more important is that we get the RTL semantics right, the fact
> that it seems to work due to addiw seems to be more of an accident than
> by design.  Also note that addiw isn't available unless ZBA is enabled,
> so we don't want to depend on that to save us.

addiw is always available in RV64; you're probably thinking of add.uw,
which is an RV64_Zba instruction.  I think your overall point still
holds, though.

>
> I still think we should be handling SI on rv64 in a manner similar to
> QI/HI are handled on rv32/rv64.
>
> jeff
>


RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Hi there,

Please feel free to let me know if you don't have authority to commit it. I can 
help to commit this patch.

Pan


-Original Message-
From: Kito Cheng  
Sent: Friday, August 16, 2024 3:48 PM
To: 曾治金 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org
Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

LGTM, thanks for fixing that :)

On Wed, Aug 14, 2024 at 2:06 PM 曾治金  wrote:
>
> This patch is to fix the bug (BugId:116305) introduced by the commit
> bd93ef for risc-v target.
>
> The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
> if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
> it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
> merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
> of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
> of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
> equal.
>
> Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
> register value in riscv_legitimize_poly_move, and dwarf2cfi will also
> get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
> to calculate the number of times to multiply the vlenb register value.
>
> So need to change the factor from riscv_bytes_per_vector_chunk to
> BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
> information. The incorrect example as follow:
>
> ```
> csrrt0,vlenb
> sllit1,t0,1
> sub sp,sp,t1
>
> .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
> ```
>
> The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
> the literal 4, '0x1e' means the multiply operation. But in fact, the
> vlenb register value just need to multiply the literal 2.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.
>
> Signed-off-by: Zhijin Zeng 
> ---
>  gcc/config/riscv/riscv.cc |  4 +--
>  .../riscv/rvv/base/scalable_vector_cfi.c  | 32 +++
>  2 files changed, 34 insertions(+), 2 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5fe4273beb7..e740fc159dd 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10773,12 +10773,12 @@ static unsigned int
>  riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
>   int *offset)
>  {
> -  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
> +  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
>   1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
>   2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
>*/
>gcc_assert (i == 1);
> -  *factor = riscv_bytes_per_vector_chunk;
> +  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
>*offset = 1;
>return RISCV_DWARF_VLENB;
>  }
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
> new file mode 100644
> index 000..184da10caf3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
> +/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
> +/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } 
> */
> +
> +#include "riscv_vector.h"
> +
> +#define PI_2 1.570796326795
> +
> +extern void func(float *result);
> +
> +void test(const float *ys, const float *xs, float *result, size_t length) {
> +size_t gvl = __riscv_vsetvlmax_e32m2();
> +vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
> +
> +for(size_t i = 0; i < length;) {
> +gvl = __riscv_vsetvl_e32m2(length - i);
> +vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
> +vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
> +vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
> +vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 
> 0, gvl);
> +
> +__riscv_vse32_v_f32m2(result, fixpi, gvl);
> +
> +func(result);
> +
> +i += gvl;
> +ys += gvl;
> +xs += gvl;
> +result += gvl;
> +}
> +}
> --
> 2.34.1
>
>
> This message and any attachment are confidential and may be privileged or 
> otherwise protected from disclosure. If you are not an intended recipient of 
> this message, please delete it and any attachment from your system and notify 
> the sender immediately by reply e-mail. Unintended recipients should not use, 
> copy, disclose or take any action based on this message or any 

RE: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern

2024-08-13 Thread Li, Pan2
> But you're shifting a REG, not a CONST_INT.
I see: we can move the value into a QImode REG first and then zero_extend it (rough sketch below).
Thanks Jeff for enlightening me; I will send a v3 for this.
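
Roughly the direction, just as a sketch with a made-up helper name (not the
actual v3; Xmode, gen_int_mode and convert_move are the existing helpers):

```
/* Sketch only: load the constant into a reg of the narrow mode, then
   zero-extend it into an Xmode reg so the high bits are known clean.  */
static rtx
sketch_gen_zero_extended_xmode_reg (rtx x, machine_mode mode) /* hypothetical */
{
  rtx narrow = gen_reg_rtx (mode);                   /* e.g. a QImode REG     */
  emit_move_insn (narrow, gen_int_mode (INTVAL (x), mode));

  rtx xmode_x = gen_reg_rtx (Xmode);
  convert_move (xmode_x, narrow, 1 /* unsignedp */); /* emits the zero_extend */

  return xmode_x;
}
```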

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, August 14, 2024 11:52 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern



On 8/13/24 9:47 PM, Li, Pan2 wrote:
>>> +static rtx
>>> +riscv_gen_unsigned_xmode_reg (rtx x, machine_mode mode)
>>> +{
>>> +  if (!CONST_INT_P (x))
>>> +return gen_lowpart (Xmode, x);
>>> +
>>> +  rtx xmode_x = gen_reg_rtx (Xmode);
>>> +  HOST_WIDE_INT cst = INTVAL (x);
>>> +
>>> +  emit_move_insn (xmode_x, x);
>>> +
>>> +  int xmode_bits = GET_MODE_BITSIZE (Xmode);
>>> +  int mode_bits = GET_MODE_BITSIZE (mode).to_constant ();
>>> +
>>> +  if (cst < 0 && mode_bits < xmode_bits)
>>> +{
>>> +  int shift_bits = xmode_bits - mode_bits;
>>> +
>>> +   riscv_emit_binary (ASHIFT, xmode_x, xmode_x, GEN_INT (shift_bits));
>>> +   riscv_emit_binary (LSHIFTRT, xmode_x, xmode_x, GEN_INT 
>>> (shift_bits));
>>> +}
>> Isn't this a zero_extension?
> 
> I am not sure it is valid for zero_extend, given the incoming rtx x is 
> const_int which is DImode(integer promoted)
> for ussub.
> I will rebase this patch after PR116278 commit, and give a try for this.
But you're shifting a REG, not a CONST_INT.

Jeff



RE: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-13 Thread Li, Pan2
> How specifically is it avoided for SI?  ISTM it should have the exact 
> same problem with a constant like 0x8000 in SImode on rv64 which is 
> going to be extended to 0x8000.

HI and QI need some special handling for the sum. For example, for HImode:

65535 + 2 = 65537; when comparing the sum with 2, we need to clean up the high bits 
(i.e., make 65537 become 1) to detect the HImode overflow.
Thus, for HI and QI, we need to clean up the highest bits of the mode.

But for SI we don't need that: we have the addw insn, and the sign extension together 
with the sltu will take care of this. For example, for SImode:

lw    a1,0(a5)   // a1 is -40, aka 0xffd8
lui   a0,0x1a    //
addwi a5,a1,9    // a5 is -31, aka 0xffe1
                 // For QI and HI, we need to mask the highbits, but not applicable for SI.
sltu  a1,a5,a1   // compare a1 and a5, a5 > a1, then no-overflow as expected.
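
The same point in plain C, as a minimal sketch (the helper name is made up; this
is not the target code):

```
#include <stdint.h>

/* Sketch of the HImode wrap-around check: the sum lives in a wider
   register, so it has to be masked back to 16 bits before the compare.  */
static inline uint16_t
sat_add_hi_sketch (uint16_t x, uint16_t y)
{
  uint64_t sum  = ((uint64_t) x + y) & 0xffff;  /* 65535 + 2 -> 65537 -> 1 */
  uint64_t mask = -(uint64_t) (sum < x);        /* all ones on overflow    */
  return (uint16_t) (sum | mask);               /* saturate to 65535       */
}
```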

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, August 14, 2024 12:03 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean 
for HI/QI [PR116278]



On 8/12/24 8:09 PM, Li, Pan2 wrote:
>> Isn't this wrong for SImode on rv64?  It seems to me the right test is
>> mode != word_mode?
>> Assuming that works, it's OK for the trunk.
> 
> Thanks Jeff, Simode version of test file doesn't have this issue. Thus, only 
> HI and QI here.
> I will add a new test for SImode in v3 to ensure this.
How specifically is it avoided for SI?  ISTM it should have the exact 
same problem with a constant like 0x8000 in SImode on rv64 which is 
going to be extended to 0x8000.

Jeff


RE: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern

2024-08-13 Thread Li, Pan2
>> +static rtx
>> +riscv_gen_unsigned_xmode_reg (rtx x, machine_mode mode)
>> +{
>> +  if (!CONST_INT_P (x))
>> +return gen_lowpart (Xmode, x);
>> +
>> +  rtx xmode_x = gen_reg_rtx (Xmode);
>> +  HOST_WIDE_INT cst = INTVAL (x);
>> +
>> +  emit_move_insn (xmode_x, x);
>> +
>> +  int xmode_bits = GET_MODE_BITSIZE (Xmode);
>> +  int mode_bits = GET_MODE_BITSIZE (mode).to_constant ();
>> +
>> +  if (cst < 0 && mode_bits < xmode_bits)
>> +{
>> +  int shift_bits = xmode_bits - mode_bits;
>> +
>> +   riscv_emit_binary (ASHIFT, xmode_x, xmode_x, GEN_INT (shift_bits));
>> +   riscv_emit_binary (LSHIFTRT, xmode_x, xmode_x, GEN_INT (shift_bits));
>> +}
> Isn't this a zero_extension?

I am not sure zero_extend is valid here, given that the incoming rtx x is a const_int 
that has been integer-promoted (DImode) for ussub.
I will rebase this patch after the PR116278 commit and give it a try.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, August 14, 2024 11:33 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern



On 8/13/24 8:23 PM, Li, Pan2 wrote:
> This Patch may requires rebase, will send v3 for conflict resolving.
> 
> Pan
> 
> -Original Message-
> From: Li, Pan2 
> Sent: Sunday, August 4, 2024 7:48 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
> rdapp@gmail.com; Li, Pan2 
> Subject: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern
> 
> From: Pan Li 
> 
> This patch would like to allow IMM for the operand 0 of ussub pattern.
> Aka .SAT_SUB(1023, y) as the below example.
> 
> Form 1:
>#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
>T __attribute__((noinline)) \
>sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
>{   \
>  return (T)IMM >= y ? (T)IMM - y : 0;  \
>}
> 
> DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)
> 
> Before this patch:
>10   │ sat_u_sub_imm82_uint64_t_fmt_1:
>11   │ li  a5,82
>12   │ bgtua0,a5,.L3
>13   │ sub a0,a5,a0
>14   │ ret
>15   │ .L3:
>16   │ li  a0,0
>17   │ ret
> 
> After this patch:
>10   │ sat_u_sub_imm82_uint64_t_fmt_1:
>11   │ li  a5,82
>12   │ sltua4,a5,a0
>13   │ addia4,a4,-1
>14   │ sub a0,a5,a0
>15   │ and a0,a4,a0
>16   │ ret
> 
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
>   func impl to gen xmode rtx reg from operand rtx.
>   (riscv_expand_ussub): Gen xmode reg for operand 1.
>   * config/riscv/riscv.md: Allow const_int for operand 1.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_arith.h: Add test helper macro.
>   * gcc.target/riscv/sat_u_sub_imm-1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-1_1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-1_2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-2_1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-2_2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-3.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-3_1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-3_2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-4.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-1.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-2.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-3.c: New test.
>   * gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.
> 
> Signed-off-by: Pan Li 
> ---
>   gcc/config/riscv/riscv.cc | 51 -
>   gcc/config/riscv/riscv.md |  2 +-
>   gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
>   .../gcc.target/riscv/sat_u_sub_imm-1.c| 20 +++
>   .../gcc.target/riscv/sat_u_sub_imm-1_1.c  | 20 +++
>   .../gcc.target/riscv/sat_u_sub_imm-1_2.c  | 20 +++
>   .../gcc.target/riscv/sat_u_sub_imm-2.c| 21 +++
>   .../gcc.target/riscv/sat_u_sub_imm-2_1.c  | 21 +++
>   .../gcc.target/riscv/sat_u_sub_imm-2_2.c  | 22 
>   .../gcc.target/riscv/sat_u_sub_imm-3.c| 20 +++
>   .../gcc.target/riscv/sat_u_sub_imm-3_1.c  | 21 +++
>   .../gcc.target/riscv/sat_u_sub_imm-3_2.c  | 22 +

RE: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern

2024-08-13 Thread Li, Pan2
This Patch may requires rebase, will send v3 for conflict resolving.

Pan

-Original Message-
From: Li, Pan2  
Sent: Sunday, August 4, 2024 7:48 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v2] RISC-V: Support IMM for operand 0 of ussub pattern

From: Pan Li 

This patch would like to allow IMM for the operand 0 of ussub pattern.
Aka .SAT_SUB(1023, y) as the below example.

Form 1:
  #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
  T __attribute__((noinline)) \
  sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
  {   \
return (T)IMM >= y ? (T)IMM - y : 0;  \
  }

DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)

Before this patch:
  10   │ sat_u_sub_imm82_uint64_t_fmt_1:
  11   │ li  a5,82
  12   │ bgtu a0,a5,.L3
  13   │ sub a0,a5,a0
  14   │ ret
  15   │ .L3:
  16   │ li  a0,0
  17   │ ret

After this patch:
  10   │ sat_u_sub_imm82_uint64_t_fmt_1:
  11   │ li  a5,82
  12   │ sltu a4,a5,a0
  13   │ addi a4,a4,-1
  14   │ sub a0,a5,a0
  15   │ and a0,a4,a0
  16   │ ret
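
For reference, the branchless form the asm above computes, written as a plain-C
sketch (illustrative helper name, IMM fixed to 82 to match the listing):

```
#include <stdint.h>

/* Sketch: mask is all ones when IMM >= y (the sltu + addi -1 pair in the
   asm), so the difference is kept; otherwise the result saturates to 0.  */
static inline uint64_t
sat_u_sub_imm82_sketch (uint64_t y)
{
  const uint64_t imm  = 82;
  uint64_t       mask = -(uint64_t) (imm >= y);  /* 0 or all ones        */
  return (imm - y) & mask;                       /* the sub + and in asm */
}
```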

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
func impl to gen xmode rtx reg from operand rtx.
(riscv_expand_ussub): Gen xmode reg for operand 1.
* config/riscv/riscv.md: Allow const_int for operand 1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 51 -
 gcc/config/riscv/riscv.md |  2 +-
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
 .../gcc.target/riscv/sat_u_sub_imm-1.c| 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-1_1.c  | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-1_2.c  | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-2.c| 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-2_1.c  | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-2_2.c  | 22 
 .../gcc.target/riscv/sat_u_sub_imm-3.c| 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-3_1.c  | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-3_2.c  | 22 
 .../gcc.target/riscv/sat_u_sub_imm-4.c| 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-1.c| 56 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-2.c| 56 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-3.c| 55 ++
 .../gcc.target/riscv/sat_u_sub_imm-run-4.c| 48 
 17 files changed, 482 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-4.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b19d56149e7..5e4e9722729 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11612,6 +11612,55 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
 }
 
+/* Generate a REG rtx of Xmode from the given rtx and mode.
+   The rtx x can be REG (QI/HI/SI/DI)

RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-08-13 Thread Li, Pan2
> Looks good to me too.  Sorry, didn't realise you were waiting for a second 
> ack.

Never mind, thanks Richard S for confirmation and suggestions.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Tuesday, August 13, 2024 5:25 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Richard 
Biener 
Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

"Li, Pan2"  writes:
> Hi Richard S,
>
> Please feel free to let me know if there is any further comments in v2. 
> Thanks a lot.

Looks good to me too.  Sorry, didn't realise you were waiting for a second ack.

Thanks,
Richard

>
> Pan
>
>
> -Original Message-
> From: Li, Pan2 
> Sent: Thursday, August 1, 2024 8:11 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict 
> match mode [PR116103]
>
>> Still OK.
>
> Thanks Richard, let me wait the final confirmation from Richard S.
>
> Pan
>
> -Original Message-
> From: Richard Biener  
> Sent: Tuesday, July 30, 2024 5:03 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict 
> match mode [PR116103]
>
> On Tue, Jul 30, 2024 at 5:08 AM  wrote:
>>
>> From: Pan Li 
>>
>> For some target like target=amdgcn-amdhsa,  we need to take care of
>> vector bool types prior to general vector mode types.  Or we may have
>> the asm check failure as below.
>>
>> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
>> s[0-9]+, v[0-9]+ 80
>> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
>> s[0-9]+, v[0-9]+ 80
>> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
>> s[0-9]+, v[0-9]+ 56
>> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
>> s[0-9]+, v[0-9]+ 56
>> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>>
>> The below test suites are passed for this patch.
>> 1. The rv64gcv fully regression tests.
>> 2. The x86 bootstrap tests.
>> 3. The x86 fully regression tests.
>> 4. The amdgcn test case as above.
>
> Still OK.
>
> Richard.
>
>> gcc/ChangeLog:
>>
>> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
>> for vector bool type.
>>
>> Signed-off-by: Pan Li 
>> ---
>>  gcc/internal-fn.cc | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 8a2e07f2f96..966594a52ed 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -4171,6 +4171,16 @@ direct_internal_fn_optab (internal_fn fn)
>>  static bool
>>  type_strictly_matches_mode_p (const_tree type)
>>  {
>> +  /* The masked vector operations have both vector data operands and vector
>> + boolean operands.  The vector data operands are expected to have a 
>> vector
>> + mode,  but the vector boolean operands can be an integer mode rather 
>> than
>> + a vector mode,  depending on how TARGET_VECTORIZE_GET_MASK_MODE is
>> + defined.  PR116103.  */
>> +  if (VECTOR_BOOLEAN_TYPE_P (type)
>> +  && SCALAR_INT_MODE_P (TYPE_MODE (type))
>> +  && TYPE_PRECISION (TREE_TYPE (type)) == 1)
>> +return true;
>> +
>>if (VECTOR_TYPE_P (type))
>>  return VECTOR_MODE_P (TYPE_MODE (type));
>>
>> --
>> 2.34.1
>>


RE: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-12 Thread Li, Pan2
> Isn't this wrong for SImode on rv64?  It seems to me the right test is 
> mode != word_mode?
> Assuming that works, it's OK for the trunk.

Thanks Jeff. The SImode version of the test file doesn't have this issue; thus, only 
HI and QI are handled here.
I will add a new test for SImode in v3 to ensure this.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, August 13, 2024 12:58 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean 
for HI/QI [PR116278]



On 8/11/24 4:43 AM, pan2...@intel.com wrote:

> +static rtx
> +riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
> +{
> +  if (mode != HImode && mode != QImode)
> +return gen_lowpart (Xmode, x);
Isn't this wrong for SImode on rv64?  It seems to me the right test is 
mode != word_mode?


Assuming that works, it's OK for the trunk.

jeff


RE: [PATCH v1] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-10 Thread Li, Pan2
> Isn't this just zero extension from a narrower mode to a wider mode? 
> Why not just use zero_extend?  That will take advantage of existing 
> expansion code to select an efficient extension approach at initial RTL 
> generation rather than waiting for combine to clean things up.

Thanks Jeff, let me have a try in v2.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, August 10, 2024 11:34 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Make sure high bits of usadd operands is clean 
for HI/QI [PR116278]



On 8/8/24 9:12 PM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> For QI/HImode of .SAT_ADD,  the operands may be sign-extended and the
> high bits of Xmode may be all 1 which is not expected.  For example as
> below code.
> 
> signed char b[1];
> unsigned short c;
> signed char *d = b;
> int main() {
>b[0] = -40;
>c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 
> 9;
>__builtin_printf("%d\n", c);
> }
> 
> After expanding we have:
> 
> ;; _6 = .SAT_ADD (_3, 9);
> (insn 8 7 9 (set (reg:DI 143)
>  (high:DI (symbol_ref:DI ("d") [flags 0x86]  )))
>   (nil))
> (insn 9 8 10 (set (reg/f:DI 142)
>  (mem/f/c:DI (lo_sum:DI (reg:DI 143)
>  (symbol_ref:DI ("d") [flags 0x86]  )) [1 d+0 S8 
> A64]))
>   (nil))
> (insn 10 9 11 (set (reg:HI 144 [ _3 ])
>  (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) 
> "test.c":7:10 -1
>   (nil))
> 
> The convert from signed char to unsigned short will have sign_extend rtl
> as above.  And finally become the lb insn as below:
> 
> lb   a1,0(a5)      // a1 is -40, aka 0xffd8
> lui  a0,0x1a
> addi a5,a1,9
> slli a5,a5,0x30
> srli a5,a5,0x30    // a5 is 65505
> sltu a1,a5,a1      // compare 65505 and 0xffd8 => TRUE
> 
> The sltu try to compare 65505 and 0xffd8 here,  but we
> actually want to compare 65505 and 65496 (0xffd8).  Thus we need to
> clean up the high bits to ensure this.
> 
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> 
>   PR target/116278
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_cleanup_rtx_high): Add new func
>   impl to cleanup high bits of rtx.
>   (riscv_expand_usadd): Leverage above func to cleanup operands
>   and sum.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_u_add-1.c: Adjust asm check.
>   * gcc.target/riscv/sat_u_add-10.c: Ditto.
>   * gcc.target/riscv/sat_u_add-13.c: Ditto.
>   * gcc.target/riscv/sat_u_add-14.c: Ditto.
>   * gcc.target/riscv/sat_u_add-17.c: Ditto.
>   * gcc.target/riscv/sat_u_add-18.c: Ditto.
>   * gcc.target/riscv/sat_u_add-2.c: Ditto.
>   * gcc.target/riscv/sat_u_add-21.c: Ditto.
>   * gcc.target/riscv/sat_u_add-22.c: Ditto.
>   * gcc.target/riscv/sat_u_add-5.c: Ditto.
>   * gcc.target/riscv/sat_u_add-6.c: Ditto.
>   * gcc.target/riscv/sat_u_add-9.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-1.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-10.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-13.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-14.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-2.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-5.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-6.c: Ditto.
>   * gcc.target/riscv/sat_u_add_imm-9.c: Ditto.
>   * gcc.target/riscv/pr116278-run-1.c: New test.
> 
> Signed-off-by: Pan Li 
> ---
>   gcc/config/riscv/riscv.cc | 30 ++-
>   .../gcc.target/riscv/pr116278-run-1.c | 16 ++
>   gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  |  1 +
>   gcc/testsuite/gcc.target/riscv/sat_u_add-10.c |  2 ++
>   gcc/testsuite/gcc.target/riscv/sat_u_add-13.c |  1 +
>   gcc/testsuite/gcc.target/riscv/sat_u_add-14.c |  2 ++
>   gcc/testsuite/gcc.target/riscv/sat_u_add-17.c |  1 +
>   gcc/testsuite/gcc.target/riscv/sat_u_add-18.c |  2 ++
>   gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  |  2 ++
>   gcc/testsuite/gcc.target/riscv/sat_u_add-21.c |  1 +
>   gcc/testsuite/gcc.target/riscv/sat_u_add-22.c |  2 ++
>   gcc/testsuite/gcc.target/riscv/sat_u_add-5.c  |  1 +
>   gcc/testsuite/gcc.target/riscv/sat_u_add-6.c  |  2 ++
>   gcc/testsuite/gcc.target/riscv/sat_u_add-9.c  |  1 +
>   .../gcc.target/riscv/sat_u_add_imm-1.c|  1 +
>   .../gcc.target/riscv/sat_u_add_imm-10.c   |  2 ++
>   .../gcc.target/riscv/sat_u_add_imm-13.c   |  1 +
>   .../gcc.target/riscv/sa

RE: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect

2024-08-10 Thread Li, Pan2
> I think my original (failed) idea was this pattern to be an 
> intermediate/bridge
> pattern that never splits.  
Yes, this pattern should not be hit by design, and any change to the pattern
layout may result in vwsll autovec failures.

> Once we need to "split" maybe the regular shift is
> better or at least similar?

Actually it is something similar to short = char << int.  Maybe we can:
1. extend char to short.
2. truncate int to short.

Then a regular short shift is suitable here (see the sketch below).  Honestly
I am not sure it is better than vwsll.
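A scalar analogue of that two-step alternative, just to illustrate the shape (plain C, illustrative only):

short
shift_alt (unsigned char v, int amount)
{
  /* Assumes 0 <= amount < 16 so the shift is well defined.  */
  short wide_v = (short) v;            /* 1. extend char to short.  */
  short narrow_amt = (short) amount;   /* 2. truncate int to short.  */
  return wide_v << narrow_amt;         /* Regular (non-widening) shift.  */
}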

Pan


-Original Message-
From: Robin Dapp  
Sent: Saturday, August 10, 2024 10:32 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin 
Dapp 
Subject: Re: [PATCH v1] RISC-V: Bugfix incorrect operand for vwsll auto-vect

A bit of bikeshedding:

While it's obviously a bug, I'm not really sure it's useful to truncate before
emitting the widening shift.  Do we save an instruction vs. the regular
non-widening shift by doing so?

I think my original (failed) idea was this pattern to be an intermediate/bridge
pattern that never splits.  Once we need to "split" maybe the regular shift is
better or at least similar?

-- 
Regards
 Robin



RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-08-08 Thread Li, Pan2
Hi Richard S,

Please feel free to let me know if there are any further comments on v2.
Thanks a lot.

Pan


-Original Message-
From: Li, Pan2 
Sent: Thursday, August 1, 2024 8:11 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

> Still OK.

Thanks Richard, let me wait the final confirmation from Richard S.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, July 30, 2024 5:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

On Tue, Jul 30, 2024 at 5:08 AM  wrote:
>
> From: Pan Li 
>
> For some target like target=amdgcn-amdhsa,  we need to take care of
> vector bool types prior to general vector mode types.  Or we may have
> the asm check failure as below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 4. The amdgcn test case as above.

Still OK.

Richard.

> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
> for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..966594a52ed 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,16 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* The masked vector operations have both vector data operands and vector
> + boolean operands.  The vector data operands are expected to have a 
> vector
> + mode,  but the vector boolean operands can be an integer mode rather 
> than
> + a vector mode,  depending on how TARGET_VECTORIZE_GET_MASK_MODE is
> + defined.  PR116103.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type)
> +  && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +  && TYPE_PRECISION (TREE_TYPE (type)) == 1)
> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));
>
> --
> 2.34.1
>


RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

2024-08-07 Thread Li, Pan2
Kindly ping++.

Pan

-Original Message-
From: Li, Pan2 
Sent: Wednesday, July 31, 2024 9:12 AM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, July 23, 2024 1:06 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

From: Pan Li 

This patch would like to implement the quad and oct .SAT_TRUNC pattern
in the riscv backend. Aka:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {\
bool overflow = x > (WT)(NT)(-1);  \
return ((NT)x) | (NT)-overflow;\
  }

DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t)

Before this patch:
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   _Bool overflow;
   8   │   short unsigned int _1;
   9   │   short unsigned int _2;
  10   │   short unsigned int _3;
  11   │   uint16_t _6;
  12   │
  13   │ ;;   basic block 2, loop depth 0
  14   │ ;;pred:   ENTRY
  15   │   overflow_5 = x_4(D) > 65535;
  16   │   _1 = (short unsigned int) x_4(D);
  17   │   _2 = (short unsigned int) overflow_5;
  18   │   _3 = -_2;
  19   │   _6 = _1 | _3;
  20   │   return _6;
  21   │ ;;succ:   EXIT
  22   │
  23   │ }

After this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   uint16_t _6;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _6 = .SAT_TRUNC (x_4(D)); [tail call]
  12   │   return _6;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }

The below tests suites are passed for this patch
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
quad truncation.
(ANYI_OCT_TRUNC): New iterator for oct truncation.
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
(ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
(anyi_quad_truncated): Ditto but for lower case.
(anyi_oct_truncated): Ditto but for lower case.
* config/riscv/riscv.md (ustrunc2):
Add new pattern for quad truncation.
(ustrunc2): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
the expand dump check times.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_arith_data.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-4.c: New test.
* gcc.target/riscv/sat_u_trunc-5.c: New test.
* gcc.target/riscv/sat_u_trunc-6.c: New test.
* gcc.target/riscv/sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/sat_u_trunc-run-6.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/iterators.md | 20 
 gcc/config/riscv/riscv.md | 20 
 .../rvv/autovec/unop/vec_sat_u_trunc-2.c  |  2 +-
 .../rvv/autovec/unop/vec_sat_u_trunc-3.c  |  2 +-
 .../gcc.target/riscv/sat_arith_data.h | 51 +++
 .../gcc.target/riscv/sat_u_trunc-4.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-5.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-6.c  | 20 
 .../gcc.target/riscv/sat_u_trunc-run-4.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-5.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-6.c  | 16 ++
 11 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-6.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 734da041f0c..bdcdb8babc8 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -67,14 +67,34 @@ (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
 (define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")])
 
+(define_mode_iterator ANYI_QUAD_TRUNC [SI (DI "TARGET_64BIT")])
+
+(define_mode_iterator ANYI_OCT_TRUNC [(DI "TARGET_64BIT"

RE: [PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode precision [PR116202]

2024-08-06 Thread Li, Pan2
> OK.

Thanks Richard.

Just noticed we can put type_has_mode_precision_p as the first condition to
avoid unnecessary pattern matching (which is heavy).  I will commit with this
change if there are no surprises from the test suite.

From:
> +  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)
> +  && type_has_mode_precision_p (otype))

To:
> +  if (type_has_mode_precision_p (otype)
> +  && gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, August 6, 2024 9:26 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Vect: Make sure the lhs type of .SAT_TRUNC has its mode 
precision [PR116202]

On Tue, Aug 6, 2024 at 2:59 PM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC vect pattern recog is valid when the lhs type has
> its mode precision.  For example as below, QImode with 1 bit precision
> like _Bool is invalid here.
>
> g_12 = (long unsigned int) _2;
> _13 = MIN_EXPR ;
> _3 = (_Bool) _13;
>
> The above pattern cannot be recog as .SAT_TRUNC (g_12) because the dest
> only has 1 bit precision with QImode mode.  Aka the type doesn't have
> the mode precision.
>
> The below tests are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK

> PR target/116202
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_trunc_pattern): Add the
> type_has_mode_precision_p check for the lhs type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  .../riscv/rvv/base/pr116202-run-1.c   | 24 +++
>  gcc/tree-vect-patterns.cc |  5 ++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> new file mode 100644
> index 000..d150f20b5d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */
> +
> +int b[24];
> +_Bool c[24];
> +
> +int main() {
> +  for (int f = 0; f < 4; ++f)
> +b[f] = 6;
> +
> +  for (int f = 0; f < 24; f += 4)
> +c[f] = ({
> +  int g = ({
> +unsigned long g = -b[f];
> +1 < g ? 1 : g;
> +  });
> +  g;
> +});
> +
> +  if (c[0] != 1)
> +__builtin_abort ();
> +}
> +
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 4674a16d15f..74f80587b0e 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4695,11 +4695,12 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>
>tree ops[1];
>tree lhs = gimple_assign_lhs (last_stmt);
> +  tree otype = TREE_TYPE (lhs);
>
> -  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))
> +  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL)
> +  && type_has_mode_precision_p (otype))
>  {
>tree itype = TREE_TYPE (ops[0]);
> -  tree otype = TREE_TYPE (lhs);
>tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
>tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
>internal_fn fn = IFN_SAT_TRUNC;
> --
> 2.43.0
>


RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-06 Thread Li, Pan2
> Ah, yeah - that's the usual (premature) frontend optimization to
> shorten operations after the standard
> mandated standard conversion (to 'int' in this case).

Thanks Richard for confirmation, let me refine the matching in v2.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, August 6, 2024 7:50 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

On Tue, Aug 6, 2024 at 3:21 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> It looks like the plus will have additional convert to unsigned in int8 and 
> int16, see below example in test.c.006t.gimple.
> And we need these convert ops in one matching pattern to cover all int scalar 
> types.

Ah, yeah - that's the usual (premature) frontend optimization to
shorten operations after the standard
mandated standard conversion (to 'int' in this case).

> I am not sure if there is a better way here, given convert in matching 
> pattern is not very elegant up to a point.
>
> int16_t
> add_i16 (int16_t a, int16_t b)
> {
>   int16_t sum = a + b;
>   return sum;
> }
>
> int32_t
> add_i32 (int32_t a, int32_t b)
> {
>   int32_t sum = a + b;
>   return sum;
> }
>
> --- 006t.gimple ---
> int16_t add_i16 (int16_t a, int16_t b)
> {
>   int16_t D.2815;
>   int16_t sum;
>
>   a.0_1 = (unsigned short) a;
>   b.1_2 = (unsigned short) b;
>   _3 = a.0_1 + b.1_2;
>   sum = (int16_t) _3;
>   D.2815 = sum;
>   return D.2815;
> }
>
> int32_t add_i32 (int32_t a, int32_t b)
> {
>   int32_t D.2817;
>   int32_t sum;
>
>   sum = a + b;
>   D.2817 = sum;
>   return D.2817;
> }
>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Monday, August 5, 2024 9:52 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> Thanks Richard for comments.
>
> > The convert looks odd to me given @0 is involved in both & operands.
>
> The convert is introduced as the GIMPLE IL is somehow different for int8_t 
> when compares to int32_t or int64_t.
> There are some additional ops convert to unsigned for plus, see below line 
> 8-9 and line 22-23.
> But we cannot see similar GIMPLE IL for int32_t and int64_t. To reconcile the 
> types from int8_t to int64_t, add the
> convert here.
>
> Or may be I have some mistake in the example, let me revisit it and send v2 
> if no surprise.
>
>4   │ __attribute__((noinline))
>5   │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y)
>6   │ {
>7   │   int8_t sum;
>8   │   unsigned char x.1_1;
>9   │   unsigned char y.2_2;
>   10   │   unsigned char _3;
>   11   │   signed char _4;
>   12   │   signed char _5;
>   13   │   int8_t _6;
>   14   │   _Bool _11;
>   15   │   signed char _12;
>   16   │   signed char _13;
>   17   │   signed char _14;
>   18   │   signed char _22;
>   19   │   signed char _23;
>   20   │
>   21   │[local count: 1073741822]:
>   22   │   x.1_1 = (unsigned char) x_7(D);
>   23   │   y.2_2 = (unsigned char) y_8(D);
>   24   │   _3 = x.1_1 + y.2_2;
>   25   │   sum_9 = (int8_t) _3;
>   26   │   _4 = x_7(D) ^ y_8(D);
>   27   │   _5 = x_7(D) ^ sum_9;
>   28   │   _23 = ~_4;
>   29   │   _22 = _5 & _23;
>   30   │   if (_22 < 0)
>   31   │ goto ; [41.00%]
>   32   │   else
>   33   │ goto ; [59.00%]
>   34   │
>   35   │[local count: 259738146]:
>   36   │   _11 = x_7(D) < 0;
>   37   │   _12 = (signed char) _11;
>   38   │   _13 = -_12;
>   39   │   _14 = _13 ^ 127;
>   40   │
>   41   │[local count: 1073741824]:
>   42   │   # _6 = PHI <_14(3), sum_9(2)>
>   43   │   return _6;
>   44   │
>   45   │ }
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, August 5, 2024 7:16 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> On Mon, Aug 5, 2024 at 9:14 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support the form 1 of the scalar signed
> > integer .SAT_ADD.  Aka below example:
> >
> > Form 1:
> >   #define DEF_SAT_S_ADD_FMT_1(T) \
> >   T __attribute__((noinline))\
> >   sat_s_add_##T

RE: [PATCH v1] Match: Add type_has_mode_precision_p check for SAT_TRUNC [PR116202]

2024-08-05 Thread Li, Pan2
> Well that means the caller (vectorizer pattern recog?) wrongly used a
> vector of QImode in
> the first place, so it needs to check the scalar mode as well?  

The current vect pattern recog only checks whether the define_expand pattern
is implemented for the vector modes, similar to the snippet below, without
any check on the scalar part.

tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
direct_internal_fn_supported_p (fn, tree_pair (v_otype, v_itype), ...

> So possibly vectorizable_internal_function would need to be amended or better,
> vector pattern matching be constrainted.

Sure, will have a try in vectorizable_internal_function.
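For reference, a rough sketch of such a guard on the pattern-recog side (assuming the vect_recog_sat_trunc_pattern context; the v2 patch elsewhere in this thread ends up doing essentially this):

  /* Sketch only: bail out of the .SAT_TRUNC recog when the scalar
     output type does not have mode precision, e.g. a 1-bit _Bool
     carried in QImode.  */
  tree otype = TREE_TYPE (lhs);
  if (!type_has_mode_precision_p (otype))
    return NULL;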

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, August 5, 2024 9:43 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Add type_has_mode_precision_p check for 
SAT_TRUNC [PR116202]

On Mon, Aug 5, 2024 at 3:04 PM Li, Pan2  wrote:
>
> > Isn't that now handled by the direct_internal_fn_supported_p check?  That 
> > is,
> > by the caller which needs to verify the matched operation is supported by
> > the target?
>
> type_strictly_matches_mode_p doesn't help here (include the un-committed one).
> It will hit below case and return true directly as TYPE_MODE (type) is 
> E_RVVM1QImode.
>
>if (VECTOR_TYPE_P (type))
> return VECTOR_MODE_P (TYPE_MODE (type));
>
> And looks we cannot TREE_PRECISION on vector type here similar as 
> type_has_mode_precision_p
> do for scalar types.  Thus, add the check to the matching.
>
> Looks like we need to take care of vector in type_strictly_matches_mode_p, 
> right ?

Well that means the caller (vectorizer pattern recog?) wrongly used a
vector of QImode in
the first place, so it needs to check the scalar mode as well?  Vector
type assignment does

  /* For vector types of elements whose mode precision doesn't
 match their types precision we use a element type of mode
 precision.  The vectorization routines will have to make sure
 they support the proper result truncation/extension.
 We also make sure to build vector types with INTEGER_TYPE
 component type only.  */
  if (INTEGRAL_TYPE_P (scalar_type)
  && (GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type)
  || TREE_CODE (scalar_type) != INTEGER_TYPE))
scalar_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (inner_mode),
  TYPE_UNSIGNED (scalar_type));

So possibly vectorizable_internal_function would need to be amended or better,
vector pattern matching be constrainted.

Richard.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, August 5, 2024 7:02 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Add type_has_mode_precision_p check for 
> SAT_TRUNC [PR116202]
>
> On Sun, Aug 4, 2024 at 1:47 PM  wrote:
> >
> > From: Pan Li 
> >
> > The .SAT_TRUNC matching can only perform the type has its mode
> > precision.
> >
> > g_12 = (long unsigned int) _2;
> > _13 = MIN_EXPR ;
> > _3 = (_Bool) _13;
> >
> > The above pattern cannot be recog as .SAT_TRUNC (g_12) because the dest
> > only has 1 bit precision but QImode.  Aka the type doesn't have the mode
> > precision.  Thus,  add the type_has_mode_precision_p for the dest to
> > avoid such case.
> >
> > The below tests are passed for this patch.
> > 1. The rv64gcv fully regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The x86 fully regression tests.
>
> Isn't that now handled by the direct_internal_fn_supported_p check?  That is,
> by the caller which needs to verify the matched operation is supported by
> the target?
>
> > PR target/116202
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add type_has_mode_precision_p for the dest type
> > of the .SAT_TRUNC matching.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/match.pd  |  6 +++--
> >  .../riscv/rvv/base/pr116202-run-1.c   | 24 +++
> >  2 files changed, 28 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index c9c8478d286..dfa0bba3908 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/mat

RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-05 Thread Li, Pan2
Hi Richard,

It looks like the plus will have an additional convert to unsigned for int8 and
int16, see the example below in test.c.006t.gimple.  And we need these convert
ops in one matching pattern to cover all the int scalar types.

I am not sure if there is a better way here, given that a convert in the
matching pattern is not very elegant, up to a point.

int16_t
add_i16 (int16_t a, int16_t b)
{
  int16_t sum = a + b;
  return sum;
}

int32_t
add_i32 (int32_t a, int32_t b)
{
  int32_t sum = a + b;
  return sum;
}

--- 006t.gimple ---
int16_t add_i16 (int16_t a, int16_t b)
{
  int16_t D.2815;
  int16_t sum;

  a.0_1 = (unsigned short) a;
  b.1_2 = (unsigned short) b;
  _3 = a.0_1 + b.1_2;
  sum = (int16_t) _3;
  D.2815 = sum;
  return D.2815;
}

int32_t add_i32 (int32_t a, int32_t b)
{
  int32_t D.2817;
  int32_t sum;

  sum = a + b;
  D.2817 = sum;
  return D.2817;
}

Pan

-Original Message-
From: Li, Pan2 
Sent: Monday, August 5, 2024 9:52 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

Thanks Richard for comments.

> The convert looks odd to me given @0 is involved in both & operands.

The convert is introduced because the GIMPLE IL is somehow different for int8_t
when compared to int32_t or int64_t.  There are some additional convert-to-unsigned
ops around the plus, see lines 8-9 and 22-23 below.  But we cannot see similar
GIMPLE IL for int32_t and int64_t.  To reconcile the types from int8_t to
int64_t, add the convert here.

Or maybe I made some mistake in the example; let me revisit it and send v2 if
there are no surprises.

   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y)
   6   │ {
   7   │   int8_t sum;
   8   │   unsigned char x.1_1;
   9   │   unsigned char y.2_2;
  10   │   unsigned char _3;
  11   │   signed char _4;
  12   │   signed char _5;
  13   │   int8_t _6;
  14   │   _Bool _11;
  15   │   signed char _12;
  16   │   signed char _13;
  17   │   signed char _14;
  18   │   signed char _22;
  19   │   signed char _23;
  20   │
  21   │[local count: 1073741822]:
  22   │   x.1_1 = (unsigned char) x_7(D);
  23   │   y.2_2 = (unsigned char) y_8(D);
  24   │   _3 = x.1_1 + y.2_2;
  25   │   sum_9 = (int8_t) _3;
  26   │   _4 = x_7(D) ^ y_8(D);
  27   │   _5 = x_7(D) ^ sum_9;
  28   │   _23 = ~_4;
  29   │   _22 = _5 & _23;
  30   │   if (_22 < 0)
  31   │ goto ; [41.00%]
  32   │   else
  33   │ goto ; [59.00%]
  34   │
  35   │[local count: 259738146]:
  36   │   _11 = x_7(D) < 0;
  37   │   _12 = (signed char) _11;
  38   │   _13 = -_12;
  39   │   _14 = _13 ^ 127;
  40   │
  41   │[local count: 1073741824]:
  42   │   # _6 = PHI <_14(3), sum_9(2)>
  43   │   return _6;
  44   │
  45   │ }

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, August 5, 2024 7:16 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

On Mon, Aug 5, 2024 at 9:14 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 1 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 1:
>   #define DEF_SAT_S_ADD_FMT_1(T) \
>   T __attribute__((noinline))\
>   sat_s_add_##T##_fmt_1 (T x, T y)   \
>   {  \
> T min = (T)1u << (sizeof (T) * 8 - 1);   \
> T max = min - 1; \
> return (x ^ y) < 0   \
>   ? (T)(x + y)   \
>   : ((T)(x + y) ^ x) >= 0\
> ? (T)(x + y) \
> : x < 0 ? min : max; \
>   }
>
> DEF_SAT_S_ADD_FMT_1 (int64_t)
>
> We can tell the difference before and after this patch if backend
> implemented the ssadd3 pattern similar as below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   long int _1;
>8   │   long int _2;
>9   │   long int _3;
>   10   │   int64_t _4;
>   11   │   long int _7;
>   12   │   _Bool _9;
>   13   │   long int _10;
>   14   │   long int _11;
>   15   │   long int _12;
>   16   │   long int _13;
>   17   │
>   18   │ ;;   basic block 2, loop depth 0
>   19   │ ;;pred:   ENTRY
>   20   │   _1 = x_5(D) ^ y_6(D);
>   21   │   _13 = x_5(D) + y_6(D);
>   22   │   _3 = x_5(D) ^ _13;
>   23   │   _2 = ~_1;
>   24   │   _7 = _2 & _3;
>   25   │   if (_7 >= 0)
>   26   │ goto ; [59.00%]
>   27   │   else
>   28   │ go

RE: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-05 Thread Li, Pan2
Thanks Richard for comments.

> The convert looks odd to me given @0 is involved in both & operands.

The convert is introduced because the GIMPLE IL is somehow different for int8_t
when compared to int32_t or int64_t.  There are some additional convert-to-unsigned
ops around the plus, see lines 8-9 and 22-23 below.  But we cannot see similar
GIMPLE IL for int32_t and int64_t.  To reconcile the types from int8_t to
int64_t, add the convert here.

Or maybe I made some mistake in the example; let me revisit it and send v2 if
there are no surprises.

   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_1 (int8_t x, int8_t y)
   6   │ {
   7   │   int8_t sum;
   8   │   unsigned char x.1_1;
   9   │   unsigned char y.2_2;
  10   │   unsigned char _3;
  11   │   signed char _4;
  12   │   signed char _5;
  13   │   int8_t _6;
  14   │   _Bool _11;
  15   │   signed char _12;
  16   │   signed char _13;
  17   │   signed char _14;
  18   │   signed char _22;
  19   │   signed char _23;
  20   │
  21   │[local count: 1073741822]:
  22   │   x.1_1 = (unsigned char) x_7(D);
  23   │   y.2_2 = (unsigned char) y_8(D);
  24   │   _3 = x.1_1 + y.2_2;
  25   │   sum_9 = (int8_t) _3;
  26   │   _4 = x_7(D) ^ y_8(D);
  27   │   _5 = x_7(D) ^ sum_9;
  28   │   _23 = ~_4;
  29   │   _22 = _5 & _23;
  30   │   if (_22 < 0)
  31   │ goto ; [41.00%]
  32   │   else
  33   │ goto ; [59.00%]
  34   │
  35   │[local count: 259738146]:
  36   │   _11 = x_7(D) < 0;
  37   │   _12 = (signed char) _11;
  38   │   _13 = -_12;
  39   │   _14 = _13 ^ 127;
  40   │
  41   │[local count: 1073741824]:
  42   │   # _6 = PHI <_14(3), sum_9(2)>
  43   │   return _6;
  44   │
  45   │ }

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, August 5, 2024 7:16 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 1 for scalar signed integer .SAT_ADD

On Mon, Aug 5, 2024 at 9:14 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 1 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 1:
>   #define DEF_SAT_S_ADD_FMT_1(T) \
>   T __attribute__((noinline))\
>   sat_s_add_##T##_fmt_1 (T x, T y)   \
>   {  \
> T min = (T)1u << (sizeof (T) * 8 - 1);   \
> T max = min - 1; \
> return (x ^ y) < 0   \
>   ? (T)(x + y)   \
>   : ((T)(x + y) ^ x) >= 0\
> ? (T)(x + y) \
> : x < 0 ? min : max; \
>   }
>
> DEF_SAT_S_ADD_FMT_1 (int64_t)
>
> We can tell the difference before and after this patch if backend
> implemented the ssadd3 pattern similar as below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   long int _1;
>8   │   long int _2;
>9   │   long int _3;
>   10   │   int64_t _4;
>   11   │   long int _7;
>   12   │   _Bool _9;
>   13   │   long int _10;
>   14   │   long int _11;
>   15   │   long int _12;
>   16   │   long int _13;
>   17   │
>   18   │ ;;   basic block 2, loop depth 0
>   19   │ ;;pred:   ENTRY
>   20   │   _1 = x_5(D) ^ y_6(D);
>   21   │   _13 = x_5(D) + y_6(D);
>   22   │   _3 = x_5(D) ^ _13;
>   23   │   _2 = ~_1;
>   24   │   _7 = _2 & _3;
>   25   │   if (_7 >= 0)
>   26   │ goto ; [59.00%]
>   27   │   else
>   28   │ goto ; [41.00%]
>   29   │ ;;succ:   4
>   30   │ ;;3
>   31   │
>   32   │ ;;   basic block 3, loop depth 0
>   33   │ ;;pred:   2
>   34   │   _9 = x_5(D) < 0;
>   35   │   _10 = (long int) _9;
>   36   │   _11 = -_10;
>   37   │   _12 = _11 ^ 9223372036854775807;
>   38   │ ;;succ:   4
>   39   │
>   40   │ ;;   basic block 4, loop depth 0
>   41   │ ;;pred:   2
>   42   │ ;;3
>   43   │   # _4 = PHI <_13(2), _12(3)>
>   44   │   return _4;
>   45   │ ;;succ:   EXIT
>   46   │
>   47   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t _4;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   12   │   return _4;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * 

RE: [PATCH v1] Match: Add type_has_mode_precision_p check for SAT_TRUNC [PR116202]

2024-08-05 Thread Li, Pan2
> Isn't that now handled by the direct_internal_fn_supported_p check?  That is,
> by the caller which needs to verify the matched operation is supported by
> the target?

type_strictly_matches_mode_p doesn't help here (including the un-committed one).
It will hit the case below and return true directly, as TYPE_MODE (type) is
E_RVVM1QImode.

  if (VECTOR_TYPE_P (type))
    return VECTOR_MODE_P (TYPE_MODE (type));

And it looks like we cannot use TYPE_PRECISION on a vector type here in the way
type_has_mode_precision_p does for scalar types.  Thus, add the check to the
matching.

Looks like we need to take care of vector types in type_strictly_matches_mode_p,
right?
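One possible shape of such a vector check, as a rough sketch (illustrative only; the thread ends up guarding in the vectorizer pattern recog instead):

  /* Sketch: also require the vector's element type to have mode
     precision, so a vector whose elements are 1-bit _Bool carried
     in QImode would not match.  */
  if (VECTOR_TYPE_P (type))
    return VECTOR_MODE_P (TYPE_MODE (type))
           && type_has_mode_precision_p (TREE_TYPE (type));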

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, August 5, 2024 7:02 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Add type_has_mode_precision_p check for 
SAT_TRUNC [PR116202]

On Sun, Aug 4, 2024 at 1:47 PM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC matching can only perform the type has its mode
> precision.
>
> g_12 = (long unsigned int) _2;
> _13 = MIN_EXPR ;
> _3 = (_Bool) _13;
>
> The above pattern cannot be recog as .SAT_TRUNC (g_12) because the dest
> only has 1 bit precision but QImode.  Aka the type doesn't have the mode
> precision.  Thus,  add the type_has_mode_precision_p for the dest to
> avoid such case.
>
> The below tests are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

Isn't that now handled by the direct_internal_fn_supported_p check?  That is,
by the caller which needs to verify the matched operation is supported by
the target?

> PR target/116202
>
> gcc/ChangeLog:
>
> * match.pd: Add type_has_mode_precision_p for the dest type
> of the .SAT_TRUNC matching.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr116202-run-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  |  6 +++--
>  .../riscv/rvv/base/pr116202-run-1.c   | 24 +++
>  2 files changed, 28 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index c9c8478d286..dfa0bba3908 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3283,7 +3283,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> wide_int int_cst = wi::to_wide (@1, itype_precision);
>}
> -  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
> +  (if (type_has_mode_precision_p (type) && otype_precision < itype_precision
> +   && wi::eq_p (trunc_max, int_cst))
>
>  /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> @@ -3309,7 +3310,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> wide_int int_cst = wi::to_wide (@1, itype_precision);
>}
> -  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
> +  (if (type_has_mode_precision_p (type) && otype_precision < itype_precision
> +   && wi::eq_p (trunc_max, int_cst))
>
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> new file mode 100644
> index 000..d150f20b5d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -march=rv64gcv_zvl256b -fdump-rtl-expand-details" } */
> +
> +int b[24];
> +_Bool c[24];
> +
> +int main() {
> +  for (int f = 0; f < 4; ++f)
> +b[f] = 6;
> +
> +  for (int f = 0; f < 24; f += 4)
> +c[f] = ({
> +  int g = ({
> +unsigned long g = -b[f];
> +1 < g ? 1 : g;
> +  });
> +  g;
> +});
> +
> +  if (c[0] != 1)
> +__builtin_abort ();
> +}
> +
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> --
> 2.43.0
>


RE: [PATCH v1] RISC-V: Support IMM for operand 0 of ussub pattern

2024-08-03 Thread Li, Pan2
Thanks Jeff for comments, let me refine the comments in v2.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, August 4, 2024 6:25 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Support IMM for operand 0 of ussub pattern



On 8/3/24 3:33 AM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to allow IMM for the operand 0 of ussub pattern.
> Aka .SAT_SUB(1023, y) as the below example.
> 
> Form 1:
>#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
>T __attribute__((noinline)) \
>sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
>{   \
>  return (T)IMM >= y ? (T)IMM - y : 0;  \
>}
> 
> DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)
> 
> Before this patch:
>10   │ sat_u_sub_imm82_uint64_t_fmt_1:
>11   │ li  a5,82
>12   │ bgtua0,a5,.L3
>13   │ sub a0,a5,a0
>14   │ ret
>15   │ .L3:
>16   │ li  a0,0
>17   │ ret
> 
> After this patch:
>10   │ sat_u_sub_imm82_uint64_t_fmt_1:
>11   │ li  a5,82
>12   │ sltua4,a5,a0
>13   │ addia4,a4,-1
>14   │ sub a0,a5,a0
>15   │ and a0,a4,a0
>16   │ ret
> 
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
>   func impl to gen xmode rtx reg.
>   (riscv_expand_ussub): Gen xmode reg for operand 1.
>   * config/riscv/riscv.md: Allow const_int for operand 1.
 > +> +   1. Case 1:  .SAT_SUB (127, y) for QImode.
> +  The imm will be (const_int 127) after expand_expr_real_1,  thus we
> +  can just move the (const_int 127) to Xmode reg without any other insn.
> +
> +   2. Case 2:  .SAT_SUB (254, y) for QImode.
> +  The imm will be (const_int -2) after expand_expr_real_1,  thus we
> +  will have li a0, -2 (aka a0 = 0xfffe if RV64).  This is
> +  not what we want for the underlying insn like sltu.  So we need to
> +  clean the up highest 56 bits for a0 to get the real value (254, 0xfe).
 > +> +   This function would like to take care of above scenario and 
return the
> +   rtx reg which holds the x in Xmode.  */
What does this function do.  ie, what are the inputs, what are the 
outputs?  Without that core information it's hard to know if your 
implementation is correct.


It really looks like you're returning a reg in X mode.  In which case 
you can just gen_int_mode (constant, word_mode).

If the constant is 254, then that's going to load 0x00fe on 
rv64.
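i.e. a sketch along these lines (illustrative only, not the committed change):

  /* Sketch: materialize the immediate in word_mode so its value is
     canonical for the Xmode compare/sub insns; 254 stays 0xfe here
     instead of becoming a sign-extended QImode const_int of -2.  */
  rtx imm = gen_int_mode (254, word_mode);
  rtx xop = force_reg (word_mode, imm);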

If the problem is that you have a target of SImode on RV64, then you do 
have a real problem.  The rv64 ABI mandates that a 32bit value be sign 
extended out to 64 bits.  And if this is the problem you're trying to 
solve, then it's a good indicator you've made a mistake elsewhere.


Anyway, it seems like you need to describe better where things are going 
wrong before we can ACK/NACK this patch.

jeff




RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-08-01 Thread Li, Pan2
> Still OK.

Thanks Richard, let me wait the final confirmation from Richard S.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, July 30, 2024 5:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

On Tue, Jul 30, 2024 at 5:08 AM  wrote:
>
> From: Pan Li 
>
> For some target like target=amdgcn-amdhsa,  we need to take care of
> vector bool types prior to general vector mode types.  Or we may have
> the asm check failure as below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 4. The amdgcn test case as above.

Still OK.

Richard.

> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
> for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..966594a52ed 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,16 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* The masked vector operations have both vector data operands and vector
> + boolean operands.  The vector data operands are expected to have a 
> vector
> + mode,  but the vector boolean operands can be an integer mode rather 
> than
> + a vector mode,  depending on how TARGET_VECTORIZE_GET_MASK_MODE is
> + defined.  PR116103.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type)
> +  && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +  && TYPE_PRECISION (TREE_TYPE (type)) == 1)
> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));
>
> --
> 2.34.1
>


RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

2024-07-30 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, July 23, 2024 1:06 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

From: Pan Li 

This patch would like to implement the quad and oct .SAT_TRUNC pattern
in the riscv backend. Aka:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {\
bool overflow = x > (WT)(NT)(-1);  \
return ((NT)x) | (NT)-overflow;\
  }

DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t)

Before this patch:
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   _Bool overflow;
   8   │   short unsigned int _1;
   9   │   short unsigned int _2;
  10   │   short unsigned int _3;
  11   │   uint16_t _6;
  12   │
  13   │ ;;   basic block 2, loop depth 0
  14   │ ;;pred:   ENTRY
  15   │   overflow_5 = x_4(D) > 65535;
  16   │   _1 = (short unsigned int) x_4(D);
  17   │   _2 = (short unsigned int) overflow_5;
  18   │   _3 = -_2;
  19   │   _6 = _1 | _3;
  20   │   return _6;
  21   │ ;;succ:   EXIT
  22   │
  23   │ }

After this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   uint16_t _6;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _6 = .SAT_TRUNC (x_4(D)); [tail call]
  12   │   return _6;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }

The below tests suites are passed for this patch
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
quad truncation.
(ANYI_OCT_TRUNC): New iterator for oct truncation.
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
(ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
(anyi_quad_truncated): Ditto but for lower case.
(anyi_oct_truncated): Ditto but for lower case.
* config/riscv/riscv.md (ustrunc2):
Add new pattern for quad truncation.
(ustrunc2): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
the expand dump check times.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_arith_data.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-4.c: New test.
* gcc.target/riscv/sat_u_trunc-5.c: New test.
* gcc.target/riscv/sat_u_trunc-6.c: New test.
* gcc.target/riscv/sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/sat_u_trunc-run-6.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/iterators.md | 20 
 gcc/config/riscv/riscv.md | 20 
 .../rvv/autovec/unop/vec_sat_u_trunc-2.c  |  2 +-
 .../rvv/autovec/unop/vec_sat_u_trunc-3.c  |  2 +-
 .../gcc.target/riscv/sat_arith_data.h | 51 +++
 .../gcc.target/riscv/sat_u_trunc-4.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-5.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-6.c  | 20 
 .../gcc.target/riscv/sat_u_trunc-run-4.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-5.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-6.c  | 16 ++
 11 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-6.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 734da041f0c..bdcdb8babc8 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -67,14 +67,34 @@ (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
 (define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")])
 
+(define_mode_iterator ANYI_QUAD_TRUNC [SI (DI "TARGET_64BIT")])
+
+(define_mode_iterator ANYI_OCT_TRUNC [(DI "TARGET_64BIT")])
+
 (define_mode_attr ANYI_DOUBLE_TRUNCATED [
   (HI "QI") (SI "HI") (DI "SI")
 ])
 
+(define_mode_attr ANYI_QUAD_TRUNCATED [
+  (SI "QI") (DI "HI")
+])
+
+(define_mode_attr ANYI_OCT_TRUNCATED [
+  (DI "QI")
+])
+
 (define_mode_attr anyi_dou

RE: [PATCH v1] RISC-V: Take Xmode instead of Pmode for ussub expanding

2024-07-29 Thread Li, Pan2
Committed, thanks Robin.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, July 30, 2024 2:28 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin 
Dapp 
Subject: Re: [PATCH v1] RISC-V: Take Xmode instead of Pmode for ussub expanding

OK.

-- 
Regards
 Robin



RE: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread Li, Pan2
Thanks Richard S for comments, updated in v2.

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658637.html

Pan

-Original Message-
From: Richard Sandiford  
Sent: Tuesday, July 30, 2024 12:09 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

pan2...@intel.com writes:
> From: Pan Li 
>
> For some target like target=amdgcn-amdhsa,  we need to take care of
> vector bool types prior to general vector mode types.  Or we may have
> the asm check failure as below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 4. The amdgcn test case as above.
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (type_strictly_matches_mode_p): Add handling
>   for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..086c8be398a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* For target=amdgcn-amdhsa,  we need to take care of vector bool types.
> + More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  
> */
> +  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +&& TYPE_PRECISION (TREE_TYPE (type)) == 1)

Sorry for the formatting nits, but I think this should be:

  if (VECTOR_BOOLEAN_TYPE_P (type)
  && SCALAR_INT_MODE_P (TYPE_MODE (type))
  && TYPE_PRECISION (TREE_TYPE (type)) == 1)

(one condition per line, indented below "VECTOR").

But I think the comment should give the underlying reason, rather than
treat it as a target oddity.  Maybe something like:

  /* Masked vector operations have both vector data operands and
 vector boolean operands.  The vector data operands are expected
 to have a vector mode, but the vector boolean operands can be
 an integer mode rather than a vector mode, depending on how
 TARGET_VECTORIZE_GET_MASK_MODE is defined.  */

Thanks,
Richard

> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));


RE: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread Li, Pan2
> OK.

Thanks Richard, will wait the confirmation from Thomas in case I missed some 
more failed cases.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, July 29, 2024 4:44 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

On Mon, Jul 29, 2024 at 9:57 AM  wrote:
>
> From: Pan Li 
>
> For some target like target=amdgcn-amdhsa,  we need to take care of
> vector bool types prior to general vector mode types.  Or we may have
> the asm check failure as below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 4. The amdgcn test case as above.

OK.

Richard.

> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
> for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..086c8be398a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* For target=amdgcn-amdhsa,  we need to take care of vector bool types.
> + More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  
> */
> +  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +&& TYPE_PRECISION (TREE_TYPE (type)) == 1)
> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));
>
> --
> 2.34.1
>


RE: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is IMM

2024-07-29 Thread Li, Pan2
> OK

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, July 29, 2024 5:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is 
IMM

On Sun, Jul 28, 2024 at 5:25 AM  wrote:
>
> From: Pan Li 
>
> After add the matching for .SAT_SUB when one op is IMM,  there
> will be a new root PLUS_EXPR for the .SAT_SUB pattern.  For example,
>
> Form 3:
>   #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_3 (T x)  \
>   {   \
> return x >= IMM ? x - IMM : 0;\
>   }
>
> DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 11)
>
> And then we will have gimple before widening-mul as below.  Thus,  try
> the .SAT_SUB for the PLUS_EXPR.
>
>4   │ __attribute__((noinline))
>5   │ uint64_t sat_u_sub_imm11_uint64_t_fmt_3 (uint64_t x)
>6   │ {
>7   │   long unsigned int _1;
>8   │   uint64_t _3;
>9   │
>   10   │[local count: 1073741824]:
>   11   │   _1 = MAX_EXPR ;
>   12   │   _3 = _1 + 18446744073709551605;
>   13   │   return _3;
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK

> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Try .SAT_SUB for PLUS_EXPR case.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-ssa-math-opts.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index ac86be8eb94..8d96a4c964b 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6129,6 +6129,7 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>
> case PLUS_EXPR:
>   match_unsigned_saturation_add (&gsi, as_a (stmt));
> + match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   /* fall-through  */
> case MINUS_EXPR:
>   if (!convert_plusminus_to_widen (&gsi, stmt, code))
> --
> 2.34.1
>


RE: [PATCH v1] Match: Support .SAT_SUB with IMM op for form 1-4

2024-07-26 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, July 26, 2024 9:32 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support .SAT_SUB with IMM op for form 1-4

On Fri, Jul 26, 2024 at 11:20 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support .SAT_SUB when one of the op
> is IMM.  Aka below 1-4 forms.
>
> Form 1:
>  #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
>  T __attribute__((noinline)) \
>  sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
>  {   \
>return IMM >= y ? IMM - y : 0;\
>  }
>
> Form 2:
>   #define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_2 (T y)  \
>   {   \
> return IMM > y ? IMM - y : 0; \
>   }
>
> Form 3:
>   #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_3 (T x)  \
>   {   \
> return x >= IMM ? x - IMM : 0;\
>   }
>
> Form 4:
>   #define DEF_SAT_U_SUB_IMM_FMT_4(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_4 (T x)  \
>   {   \
> return x > IMM ? x - IMM : 0; \
>   }
>
> Take below form 1 as example:
>
> DEF_SAT_U_SUB_OP0_IMM_FMT_1(uint32_t, 11)
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ uint64_t sat_u_sub_imm11_uint64_t_fmt_1 (uint64_t y)
>6   │ {
>7   │   uint64_t _1;
>8   │   uint64_t _3;
>9   │
>   10   │ ;;   basic block 2, loop depth 0
>   11   │ ;;pred:   ENTRY
>   12   │   if (y_2(D) <= 11)
>   13   │ goto ; [50.00%]
>   14   │   else
>   15   │ goto ; [50.00%]
>   16   │ ;;succ:   3
>   17   │ ;;4
>   18   │
>   19   │ ;;   basic block 3, loop depth 0
>   20   │ ;;pred:   2
>   21   │   _3 = 11 - y_2(D);
>   22   │ ;;succ:   4
>   23   │
>   24   │ ;;   basic block 4, loop depth 0
>   25   │ ;;pred:   2
>   26   │ ;;3
>   27   │   # _1 = PHI <0(2), _3(3)>
>   28   │   return _1;
>   29   │ ;;succ:   EXIT
>   30   │
>   31   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ uint64_t sat_u_sub_imm11_uint64_t_fmt_1 (uint64_t y)
>6   │ {
>7   │   uint64_t _1;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _1 = .SAT_SUB (11, y_2(D)); [tail call]
>   12   │   return _1;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * match.pd: Add case 9 and case 10 for .SAT_SUB when one
> of the op is IMM.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cf359b0ec0f..b2e7d61790d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3234,6 +3234,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub with op_0 imm, case 9 (branch with gt):
> +   SAT_U_SUB = IMM > Y  ? (IMM - Y) : 0.
> + = IMM >= Y ? (IMM - Y) : 0.  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (cond^ (le @1 INTEGER_CST@2) (minus INTEGER_CST@0 @1) integer_zerop)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> + && types_match (type, @1))
> + (with
> +  {
> +   unsigned precision = TYPE_PRECISION (type);
> +   wide_int max = wi::mask (precision, false, precision);
> +   wide_int c0 = wi::to_wide (@0);
> +   wide_int c2 = wi::to_wide (@2);
> +   wide_int c2_add_1 = wi::add (c2, wi::uhwi (1, precision));
> +   bool equal_p = wi::eq_p (c0, c2);
> +   bool less_than_1_p = !wi::eq_p (c2, max) && wi::eq_p (c2_add_1, c0);
> +  }
> +  (if (equal_p || less_than_1_p)
> +
> +/* Unsigned saturation sub with op_1 imm, case 10:
> +   SAT_U_SUB = X > IMM  ? (X - IMM) : 0.
> + = X >= IMM ? (X - IMM) : 0.  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (plus (max @0
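
As a small standalone illustration of what the equal_p / less_than_1_p checks
in case 9 encode (this sketch is an addition for clarity, not part of the
patch), take IMM = 11 on uint64_t: the strict form IMM > y reaches GIMPLE as
y <= IMM - 1, so the matched constant pair is either (c0, c0) or (c0, c0 - 1).

  #include <stdint.h>

  /* ">=" form: c0 == 11, c2 == 11 -> equal_p.  */
  static uint64_t ge_form (uint64_t y) { return 11 >= y ? 11 - y : 0; }

  /* ">" form: canonicalized as y <= 10, so c0 == 11, c2 == 10 -> less_than_1_p.  */
  static uint64_t gt_form (uint64_t y) { return 11 > y ? 11 - y : 0; }

  int main (void)
  {
    for (uint64_t y = 0; y < 64; y++)
      {
        if (ge_form (y) != (y <= 11 ? 11 - y : 0))
          __builtin_abort ();
        if (gt_form (y) != (y <= 10 ? 11 - y : 0))
          __builtin_abort ();
      }
    return 0;
  }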

RE: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-23 Thread Li, Pan2
> Just a slight comment improvement:
> /* Returns true if both types of TYPE_PAIR strictly match their modes,
> else returns false.  */

> This testcase could go in g++.dg/torture/ without the -O3 option.

> Since we are scanning for the negative it should pass on all targets
> even ones without SAT_TRUNC support. And then you should not need the
> other testcase either.

Thanks all, will address the above comments and commit it if there are no surprises from testing.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Tuesday, July 23, 2024 10:03 PM
To: Richard Biener 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal 
fn[PR115961]

Richard Biener  writes:
> On Fri, Jul 19, 2024 at 1:10 PM  wrote:
>>
>> From: Pan Li 
>>
>> The direct_internal_fn_supported_p has no restrictions for the type
>> modes.  For example the bitfield like below will be recog as .SAT_TRUNC.
>>
>> struct e
>> {
>>   unsigned pre : 12;
>>   unsigned a : 4;
>> };
>>
>> __attribute__((noipa))
>> void bug (e * v, unsigned def, unsigned use) {
>>   e & defE = *v;
>>   defE.a = min_u (use + 1, 0xf);
>> }
>>
>> This patch would like to add a strict check to direct_internal_fn_supported_p,
>> and only allow types that match their modes for the ifn type tree pair.
>>
>> The below test suites are passed for this patch:
>> 1. The rv64gcv fully regression tests.
>> 2. The x86 bootstrap tests.
>> 3. The x86 fully regression tests.
>
> LGTM unless Richard S. has any more comments.

LGTM too with Andrew's comments addressed.

Thanks,
Richard

>
> Richard.
>
>> PR target/115961
>>
>> gcc/ChangeLog:
>>
>> * internal-fn.cc (type_strictly_matches_mode_p): Add new func
>> impl to check type strictly matches mode or not.
>> (type_pair_strictly_matches_mode_p): Ditto but for tree type
>> pair.
>> (direct_internal_fn_supported_p): Add above check for the tree
>> type pair.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/i386/pr115961-run-1.C: New test.
>> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>>
>> Signed-off-by: Pan Li 
>> ---
>>  gcc/internal-fn.cc| 32 +
>>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>>  3 files changed, 100 insertions(+)
>>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>>
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 95946bfd683..5c21249318e 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -4164,6 +4164,35 @@ direct_internal_fn_optab (internal_fn fn)
>>gcc_unreachable ();
>>  }
>>
>> +/* Return true if TYPE's mode has the same format as TYPE, and if there is
>> +   a 1:1 correspondence between the values that the mode can store and the
>> +   values that the type can store.  */
>> +
>> +static bool
>> +type_strictly_matches_mode_p (const_tree type)
>> +{
>> +  if (VECTOR_TYPE_P (type))
>> +return VECTOR_MODE_P (TYPE_MODE (type));
>> +
>> +  if (INTEGRAL_TYPE_P (type))
>> +return type_has_mode_precision_p (type);
>> +
>> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return true if both the first and the second type of tree pair are
>> +   strictly matches their modes,  or return false.  */
>> +
>> +static bool
>> +type_pair_strictly_matches_mode_p (tree_pair type_pair)
>> +{
>> +  return type_strictly_matches_mode_p (type_pair.first)
>> +&& type_strictly_matches_mode_p (type_pair.second);
>> +}
>> +
>>  /* Return true if FN is supported for the types in TYPES when the
>> optimization type is OPT_TYPE.  The types are those associated with
>> the "type0" and "type1" fields of FN's direct_internal_fn_info
>> @@ -4173,6 +4202,9 @@ bool
>>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>> optimization_type opt_type)
>>  {
>> +  if (!type_pair_strictly_matches_mode_p (types))
>> +ret
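
For context, the PR115961 reproducer earlier in this thread is exactly the
case the new check rejects.  A hedged annotation (the QImode value is my
assumption from the 4-bit field width, not something stated in the thread):

  /* TYPE_PRECISION of the bit-field type is 4, while its TYPE_MODE is
     presumably QImode with 8 bits of precision, so
     type_strictly_matches_mode_p () returns false and the .SAT_TRUNC
     query is now refused for it.  */
  struct e
  {
    unsigned pre : 12;
    unsigned a : 4;
  };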

RE: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-22 Thread Li, Pan2
Committed, thanks Robin.

Pan

-Original Message-
From: Robin Dapp  
Sent: Monday, July 22, 2024 11:27 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

LGTM.

-- 
Regards
 Robin



RE: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-22 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Monday, July 15, 2024 6:35 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

From: Pan Li 

Update in v3:
* Rebase the upstream.
* Adjust asm check.

Original log:
This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {\
bool overflow = x > (WT)(NT)(-1);  \
return ((NT)x) | (NT)-overflow;\
  }

DEF_SAT_U_TRUC_FMT_1(uint32_t, uint64_t)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  _Bool overflow;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  uint8_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  overflow_5 = x_4(D) > 255;
  _1 = (unsigned char) x_4(D);
  _2 = (unsigned char) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
;;succ:   EXIT

}

After this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  uint8_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
;;succ:   EXIT

}

The below tests suites are passed for this patch
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_DOUBLE_TRUNC): Add new iterator
for int double truncation.
(ANYI_DOUBLE_TRUNCATED): Add new attr for int double truncation.
(anyi_double_truncated): Ditto but for lowercase.
* config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new
func decl for expanding ustrunc
* config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func
impl to expand ustrunc.
* config/riscv/riscv.md (ustrunc2): Impl
the new pattern ustrunc2 for int.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Adjust
asm check times from 2 to 4.
* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_arith_data.h: New test.
* gcc.target/riscv/sat_u_trunc-1.c: New test.
* gcc.target/riscv/sat_u_trunc-2.c: New test.
* gcc.target/riscv/sat_u_trunc-3.c: New test.
* gcc.target/riscv/sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/scalar_sat_unary.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/iterators.md | 10 
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv.cc | 40 +
 gcc/config/riscv/riscv.md | 10 
 .../rvv/autovec/unop/vec_sat_u_trunc-1.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 16 ++
 .../gcc.target/riscv/sat_arith_data.h | 56 +++
 .../gcc.target/riscv/sat_u_trunc-1.c  | 17 ++
 .../gcc.target/riscv/sat_u_trunc-2.c  | 20 +++
 .../gcc.target/riscv/sat_u_trunc-3.c  | 19 +++
 .../gcc.target/riscv/sat_u_trunc-run-1.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-2.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-3.c  | 16 ++
 .../gcc.target/riscv/scalar_sat_unary.h   | 22 
 14 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith_data.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_unary.h

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index d61ed53a8b1..734da041f0c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -65,6 +65,16 @@ (define_mode_iterator SUBX [QI HI (SI "TARGET_64BIT")])
 ;; Iterator for hardware-supported integer modes.
 (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
+(define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")])
+
+(define_mode_attr ANYI_DOUBLE_TRUNCATED [
+  (HI "QI") (SI "HI") (DI "SI")
+])
+
+(define_mode_attr anyi_double_truncated [
+  (HI "qi") (SI "hi") (DI "si")
+])
+
 ;; Iterator 

RE: [PATCH v1] Internal-fn: Only allow modes describe types for internal fn[PR115961]

2024-07-19 Thread Li, Pan2
Thanks Richard S for comments and suggestions, updated in v2.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Friday, July 19, 2024 3:46 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Only allow modes describe types for 
internal fn[PR115961]

pan2...@intel.com writes:
> From: Pan Li 
>
> The direct_internal_fn_supported_p has no restrictions for the type
> modes.  For example the bitfield like below will be recog as .SAT_TRUNC.
>
> struct e
> {
>   unsigned pre : 12;
>   unsigned a : 4;
> };
>
> __attribute__((noipa))
> void bug (e * v, unsigned def, unsigned use) {
>   e & defE = *v;
>   defE.a = min_u (use + 1, 0xf);
> }
>
> This patch would like to add checks for the direct_internal_fn_supported_p,
> and only allows the tree types described by modes.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
>
>   PR target/115961
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (mode_describle_type_precision_p): Add new func
>   impl to check if mode describle the tree type.
>   (direct_internal_fn_supported_p): Add above check for the first
>   and second tree type of tree pair.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/i386/pr115961-run-1.C: New test.
>   * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc| 21 
>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>  3 files changed, 89 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 95946bfd683..4dc69264a24 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4164,6 +4164,23 @@ direct_internal_fn_optab (internal_fn fn)
>gcc_unreachable ();
>  }
>  
> +/* Return true if the mode describes the precision of tree type,  or false.  
> */
> +
> +static bool
> +mode_describle_type_precision_p (const_tree type)

Bit pedantic, but it's not really just about precision.  For floats
and vectors it's also about format.  Maybe:

/* Return true if TYPE's mode has the same format as TYPE, and if there is
   a 1:1 correspondence between the values that the mode can store and the
   values that the type can store.  */

And maybe my mode_describes_type_p suggestion wasn't the best,
but given that it's not just about precision, I'm not sure about
mode_describle_type_precision_p either.  How about:

  type_strictly_matches_mode_p

?  I'm open to other suggestions.

> +{
> +  if (VECTOR_TYPE_P (type))
> +return VECTOR_MODE_P (TYPE_MODE (type));
> +
> +  if (INTEGRAL_TYPE_P (type))
> +return type_has_mode_precision_p (type);
> +
> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
> +return true;
> +
> +  return false;
> +}
> +
>  /* Return true if FN is supported for the types in TYPES when the
> optimization type is OPT_TYPE.  The types are those associated with
> the "type0" and "type1" fields of FN's direct_internal_fn_info
> @@ -4173,6 +4190,10 @@ bool
>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>   optimization_type opt_type)
>  {
> +  if (!mode_describle_type_precision_p (types.first)
> +|| !mode_describle_type_precision_p (types.second))

Formatting nit: the "||" should line up with the "!".

LGTM otherwise.

Thanks,
Richard

> +return false;
> +
>switch (fn)
>  {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
> diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> @@ -0,0 +1,34 @@
> +/* PR target/115961 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +struct e
> +{
> +  unsigned pre : 12;
> +  unsigned a : 4;
> +};
> +
> +static unsigned min_u (unsigned a, unsigned b)
> +{
> +  return (b < a) ? b : a;
> +}
> +
> +__attribute__((noipa))
> +void bug (e * v, unsigned def, unsigned use) {
> +  e & defE 

RE: [PATCH v2] RISC-V: More support of vx and vf for autovec comparison

2024-07-19 Thread Li, Pan2
> +  TEST_COND_IMM_FLOAT (T, >, 0.0, _gt)   
> \
>  +  TEST_COND_IMM_FLOAT (T, <, 0.0, _lt)  
> \
>  +  TEST_COND_IMM_FLOAT (T, >=, 0.0, _ge) 
> \
>  +  TEST_COND_IMM_FLOAT (T, <=, 0.0, _le) 
> \
>  +  TEST_COND_IMM_FLOAT (T, ==, 0.0, _eq) 
> \
>  +  TEST_COND_IMM_FLOAT (T, !=, 0.0, _ne) 
> \

Just curious, does this patch cover the case where the float imm is -0.0 (I
notice only +0.0 is mentioned)?
If so, we could add tests similar to the +0.0 ones here.

It is totally Ok if -0.0f is not applicable here.
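
For reference, a tiny standalone check of the IEEE behaviour behind the
question (illustration only): -0.0 compares equal to +0.0, so the comparison
results themselves cannot differ; only the way the constant is materialized
for the vf form could.

  #include <stdio.h>

  int main (void)
  {
    float z = 0.0f, nz = -0.0f;
    printf ("%d %d\n", z == nz, z < nz); /* prints: 1 0 */
    return 0;
  }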

Pan

-Original Message-
From: demin.han  
Sent: Friday, July 19, 2024 4:55 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Li, Pan2 ; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: [PATCH v2] RISC-V: More support of vx and vf for autovec comparison

There are still some cases which can't utilize vx or vf after
last_combine pass.

1. integer comparison when imm isn't in range of [-16, 15]
2. float imm is 0.0
3. DI or DF mode under RV32

This patch fix above mentioned issues.

Tested on RV32 and RV64.

Signed-off-by: demin.han 
gcc/ChangeLog:

* config/riscv/autovec.md: register_operand to nonmemory_operand
* config/riscv/riscv-v.cc (get_cmp_insn_code): Select code according
* to scalar_p
(expand_vec_cmp): Generate scalar_p and transform op1
* config/riscv/riscv.cc (riscv_const_insns): Add !FLOAT_MODE_P
* constrain

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Fix and add test

Signed-off-by: demin.han 
---
V2 changes:
  1. remove unnecessary add_integer_operand and related code
  2. fix one format issue
  3. split patch and make it only related to vec cmp

 gcc/config/riscv/autovec.md   |  2 +-
 gcc/config/riscv/riscv-v.cc   | 57 +++
 gcc/config/riscv/riscv.cc |  2 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 48 +++-
 4 files changed, 82 insertions(+), 27 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d5793acc999..a772153 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -690,7 +690,7 @@ (define_expand "vec_cmp"
   [(set (match_operand: 0 "register_operand")
(match_operator: 1 "comparison_operator"
  [(match_operand:V_VLSF 2 "register_operand")
-  (match_operand:V_VLSF 3 "register_operand")]))]
+  (match_operand:V_VLSF 3 "nonmemory_operand")]))]
   "TARGET_VECTOR"
   {
 riscv_vector::expand_vec_cmp_float (operands[0], GET_CODE (operands[1]),
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e290675bbf0..56328075aeb 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2624,32 +2624,27 @@ expand_vec_init (rtx target, rtx vals)
 /* Get insn code for corresponding comparison.  */
 
 static insn_code
-get_cmp_insn_code (rtx_code code, machine_mode mode)
+get_cmp_insn_code (rtx_code code, machine_mode mode, bool scalar_p)
 {
   insn_code icode;
-  switch (code)
+  if (FLOAT_MODE_P (mode))
 {
-case EQ:
-case NE:
-case LE:
-case LEU:
-case GT:
-case GTU:
-case LTGT:
-  icode = code_for_pred_cmp (mode);
-  break;
-case LT:
-case LTU:
-case GE:
-case GEU:
-  if (FLOAT_MODE_P (mode))
-   icode = code_for_pred_cmp (mode);
+  icode = !scalar_p ? code_for_pred_cmp (mode)
+   : code_for_pred_cmp_scalar (mode);
+  return icode;
+}
+  if (scalar_p)
+{
+  if (code == GE || code == GEU)
+   icode = code_for_pred_ge_scalar (mode);
   else
-   icode = code_for_pred_ltge (mode);
-  break;
-default:
-  gcc_unreachable ();
+   icode = code_for_pred_cmp_scalar (mode);
+  return icode;
 }
+  if (code == LT || code == LTU || code == GE || code == GEU)
+icode = code_for_pred_ltge (mode);
+  else
+icode = code_for_pred_cmp (mode);
   return icode;
 }
 
@@ -2771,7 +2766,6 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx 
op1, rtx mask,
 {
   machine_mode mask_mode = GET_MODE (target);
   machine_mode data_mode = GET_MODE (op0);
-  insn_code icode = get_cmp_insn_code (code, data_mode);
 
   if (code == LTGT)
 {
@@ -2779,12 +2773,29 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx 
op1, rtx mask,
   rtx gt = gen_reg_rtx (mask_mode);
   expand_vec_cmp (lt, LT, op0, op1, mask, maskoff);
   expand_vec_cmp (gt, GT, op0, op1, mask, maskoff);
-  icode = code_for_pred (IOR, mask_mode);
+  insn_code icode = code_for_pred (IOR, mask_mode);
   rtx ops[] = {target, l

RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]

2024-07-18 Thread Li, Pan2
> Otherwise the patch looks good to me.

Thanks Richard, will commit with the log updated.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, July 18, 2024 9:27 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC 
form 2 [PR115863]

On Thu, Jul 18, 2024 at 2:27 PM  wrote:
>
> From: Pan Li 
>
> The SAT_TRUNC form 2 has below pattern matching.
> From:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = (unsigned int) _18;
>
> To:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = .SAT_TRUNC (_18);

.SAT_TRUNC (left_8);

> But if there is another use of _18 like below,  the transform to the
> .SAT_TRUNC may have no earnings.  For example:
>
> From:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = (unsigned int) _18; // op_0
>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
>
> To:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = .SAT_TRUNC (_18);

.SAT_TRUNC (left_8);?

Otherwise the patch looks good to me.

Thanks,
Richard.

>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
>
> Pattern recog to .SAT_TRUNC cannot eliminate MIN_EXPR as above.  Then the
> backend (for example x86/riscv) will have additional 2-3 more insns
> after pattern recog besides the MIN_EXPR.  Thus,  keep the normal truncation
> as is should be the better choose.
>
> The below testsuites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
>
> PR target/115863
>
> gcc/ChangeLog:
>
> * match.pd: Add single_use of MIN_EXPR for .SAT_TRUNC form 2.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115863-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd   | 15 +++--
>  gcc/testsuite/gcc.target/i386/pr115863-1.c | 37 ++
>  2 files changed, 50 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115863-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cb399b8718..d4f040b5c7b 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3252,10 +3252,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +/* If Op_0 def is MIN_EXPR and not single_use.  Aka below pattern:
> +
> + _18 = MIN_EXPR ; // op_0 def
> + iftmp.0_11 = (unsigned int) _18; // op_0
> + stream.avail_out = iftmp.0_11;
> + left_37 = left_8 - _18;  // op_0 use
> +
> +   Transfer to .SAT_TRUNC will have MIN_EXPR still live.  Then the backend
> +   (for example x86/riscv) will have 2-3 more insns generation for .SAT_TRUNC
> +   besides the MIN_EXPR.  Thus,  keep the normal truncation as is should be
> +   the better choose.  */
>  (match (unsigned_integer_sat_trunc @0)
> - (convert (min @0 INTEGER_CST@1))
> + (convert (min@2 @0 INTEGER_CST@1))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2))
>   (with
>{
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> diff --git a/gcc/testsuite/gcc.target/i386/pr115863-1.c 
> b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> new file mode 100644
> index 000..a672f62cec5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> @@ -0,0 +1,37 @@
> +/* PR target/115863 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +#include 
> +
> +typedef struct z_stream_s {
> +uint32_t avail_out;
> +} z_stream;
> +
> +typedef z_stream *z_streamp;
> +
> +extern int deflate (z_streamp strmp);
> +
> +int compress2 (uint64_t *destLen)
> +{
> +  z_stream stream;
> +  int err;
> +  const uint32_t max = (uint32_t)(-1);
> +  uint64_t left;
> +
> +  left = *destLen;
> +
> +  stream.avail_out = 0;
> +
> +  do {
> +if (stream.avail_out == 0) {
> +stream.avail_out = left > (uint64_t)max ? max : (uint32_t)left;
> +left -= stream.avail_out;
> +}
> +err = deflate(&stream);
> +} while (err == 0);
> +
> +  return err;
> +}
> +
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> --
> 2.34.1
>


RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]

2024-07-18 Thread Li, Pan2
Thanks Tamar for comments.

The :s flag seems to be ignored in matching, according to the gccint doc.

"The second supported flag is s which tells the code generator to fail the 
pattern if the
expression marked with s does have more than one use and the simplification 
results in an
expression with more than one operator."

I also diffed the generated code of gimple_unsigned_integer_sat_trunc; it does
not contain a single-use check when only the :s flag is used:

&& TYPE_UNSIGNED (TREE_TYPE (captures[0]))                               // the :s flag
&& TYPE_UNSIGNED (TREE_TYPE (captures[0])) && single_use (captures[1])   // explicit single_use check
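
For completeness, the two spellings being compared, written out as match.pd
sketches (illustrative only, same shape as the patch):

  /* Flag only: no use check appears in the generated predicate.  */
  (match (unsigned_integer_sat_trunc @0)
   (convert (min:s @0 INTEGER_CST@1))
   ...)

  /* Explicit capture plus check: the use count is enforced.  */
  (match (unsigned_integer_sat_trunc @0)
   (convert (min@2 @0 INTEGER_CST@1))
   (if (... && single_use (@2))
    ...))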

Pan

-Original Message-
From: Tamar Christina  
Sent: Thursday, July 18, 2024 8:36 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao 
Subject: RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC 
form 2 [PR115863]

> -Original Message-
> From: pan2...@intel.com 
> Sent: Thursday, July 18, 2024 1:27 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Tamar Christina ; jeffreya...@gmail.com;
> rdapp@gmail.com; hongtao@intel.com; Pan Li 
> Subject: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC
> form 2 [PR115863]
> 
> From: Pan Li 
> 
> The SAT_TRUNC form 2 has below pattern matching.
> From:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = (unsigned int) _18;
> 
> To:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = .SAT_TRUNC (_18);
> 
> But if there is another use of _18 like below,  the transform to the
> .SAT_TRUNC may have no earnings.  For example:
> 
> From:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = (unsigned int) _18; // op_0
>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
> 
> To:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = .SAT_TRUNC (_18);
>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
> 
> Pattern recog to .SAT_TRUNC cannot eliminate MIN_EXPR as above.  Then the
> backend (for example x86/riscv) will have additional 2-3 more insns
> after pattern recog besides the MIN_EXPR.  Thus,  keep the normal truncation
> as is should be the better choose.
> 
> The below testsuites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 
>   PR target/115863
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add single_use of MIN_EXPR for .SAT_TRUNC form 2.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr115863-1.c: New test.
> 
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd   | 15 +++--
>  gcc/testsuite/gcc.target/i386/pr115863-1.c | 37 ++
>  2 files changed, 50 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115863-1.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cb399b8718..d4f040b5c7b 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3252,10 +3252,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> 
>  /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +/* If Op_0 def is MIN_EXPR and not single_use.  Aka below pattern:
> +
> + _18 = MIN_EXPR ; // op_0 def
> + iftmp.0_11 = (unsigned int) _18; // op_0
> + stream.avail_out = iftmp.0_11;
> + left_37 = left_8 - _18;  // op_0 use
> +
> +   Transfer to .SAT_TRUNC will have MIN_EXPR still live.  Then the backend
> +   (for example x86/riscv) will have 2-3 more insns generation for .SAT_TRUNC
> +   besides the MIN_EXPR.  Thus,  keep the normal truncation as is should be
> +   the better choose.  */
>  (match (unsigned_integer_sat_trunc @0)
> - (convert (min @0 INTEGER_CST@1))
> + (convert (min@2 @0 INTEGER_CST@1))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2))

You can probably use the single use flag here? so

> - (convert (min @0 INTEGER_CST@1))
> + (convert (min:s @0 @0 INTEGER_CST@1))

?

Cheers,
Tamar

>   (with
>{
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> diff --git a/gcc/testsuite/gcc.target/i386/pr115863-1.c
> b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> new file mode 100644
> index 000..a672f62cec5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> @@ -0,0 +1,37 @@
> +/* PR tar

RE: [PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer modes

2024-07-18 Thread Li, Pan2
Thanks Richard and Andrew, will commit v2 with those changes.

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657617.html

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, July 18, 2024 3:00 PM
To: Andrew Pinski 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer 
modes

On Thu, Jul 18, 2024 at 7:35 AM Andrew Pinski  wrote:
>
> On Wed, Jul 17, 2024 at 9:20 PM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to add the doc for the Standard-Names
> > ustrunc and sstrunc,  include both the scalar and vector integer
> > modes.
>
> Thanks for doing this and this looks mostly good to me (can't approve it).

Too bad.  OK with the changes Andrew requested.

Thanks,
Richard.

>
> >
> > gcc/ChangeLog:
> >
> > * doc/md.texi: Add Standard-Names ustrunc and sstrunc.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/doc/md.texi | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 7f4335e0aac..f116dede906 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5543,6 +5543,18 @@ means of constraints requiring operands 1 and 0 to 
> > be the same location.
> >  @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
> >  Similar, for other arithmetic operations.
> >
> > +@cindex @code{ustrunc@var{m}@var{n}2} instruction pattern
> > +@item @samp{ustrunc@var{m}@var{n}2}
> > +Truncate the operand 1, and storing the result in operand 0.  There will
> > +be saturation during the trunction.  The result will be saturated to the
> > +maximal value of operand 0 type if there is overflow when truncation.  The
> s/type/mode/ .
> > +operand 1 must have mode @var{n},  and the operand 0 must have mode 
> > @var{m}.
> > +Both the scalar and vector integer modes are allowed.
> I don't think you need the article `the` here. It reads wrong with it
> at least to me.
>
> > +
> > +@cindex @code{sstrunc@var{m}@var{n}2} instruction pattern
> > +@item @samp{sstrunc@var{m}@var{n}2}
> > +Similar but for signed.
> > +
> >  @cindex @code{andc@var{m}3} instruction pattern
> >  @item @samp{andc@var{m}3}
> >  Like @code{and@var{m}3}, but it uses bitwise-complement of operand 2
> > --
> > 2.34.1
> >
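
As a C-level sketch of the unsigned semantics being documented (illustration
only; not tied to a particular pattern name), for a 64-bit source and a
32-bit destination:

  #include <stdint.h>

  static uint32_t ustrunc_semantics (uint64_t x)
  {
    return x > UINT32_MAX ? UINT32_MAX : (uint32_t) x;
  }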


RE: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread Li, Pan2
Thanks all, will have a try in v2.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Thursday, July 18, 2024 5:14 AM
To: Andrew Pinski 
Cc: Tamar Christina ; Richard Biener 
; Li, Pan2 ; 
gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao 
Subject: Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode 
precision [PR115961]

Andrew Pinski  writes:
> On Wed, Jul 17, 2024 at 1:03 PM Tamar Christina  
> wrote:
>>
>> > -Original Message-
>> > From: Richard Sandiford 
>> > Sent: Wednesday, July 17, 2024 8:55 PM
>> > To: Richard Biener 
>> > Cc: pan2...@intel.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai;
>> > kito.ch...@gmail.com; Tamar Christina ;
>> > jeffreya...@gmail.com; rdapp@gmail.com; hongtao@intel.com
>> > Subject: Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode
>> > precision [PR115961]
>> >
>> > Richard Biener  writes:
>> > > On Wed, Jul 17, 2024 at 11:48 AM  wrote:
>> > >>
>> > >> From: Pan Li 
>> > >>
>> > >> The .SAT_TRUNC matching doesn't check the type has mode precision.  Thus
>> > >> when bitfield like below will be recog as .SAT_TRUNC.
>> > >>
>> > >> struct e
>> > >> {
>> > >>   unsigned pre : 12;
>> > >>   unsigned a : 4;
>> > >> };
>> > >>
>> > >> __attribute__((noipa))
>> > >> void bug (e * v, unsigned def, unsigned use) {
>> > >>   e & defE = *v;
>> > >>   defE.a = min_u (use + 1, 0xf);
>> > >> }
>> > >>
>> > >> This patch would like to add type_has_mode_precision_p for the
>> > >> .SAT_TRUNC matching to get rid of this.
>> > >>
>> > >> The below test suites are passed for this patch:
>> > >> 1. The rv64gcv fully regression tests.
>> > >> 2. The x86 bootstrap tests.
>> > >> 3. The x86 fully regression tests.
>> > >
>> > > Hmm, rather than restricting the matching the issue is the optab query or
>> > > in this case how *_optab_supported_p blindly uses TYPE_MODE without
>> > > either asserting the type has mode precision or failing the query in 
>> > > this case.
>> > >
>> > > I think it would be simplest to adjust direct_optab_supported_p
>> > > (and convert_optab_supported_p) to reject such operations?  Richard, do
>> > > you agree or should callers check this instead?
>> >
>> > Sounds good to me, although I suppose it should go:
>> >
>> > bool
>> > direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>> >   optimization_type opt_type)
>> > {
>> >   // <--- Here
>> >   switch (fn)
>> > {
>> >
>> > }
>> > }
>> >
>> > until we know of a specific case where that's wrong.
>> >
>> > Is type_has_mode_precision_p meaningful for all types?
>> >
>>
>> I was wondering about that, wouldn't VECTOR_BOOLEAN_TYPE_P types fail?
>> e.g. on AVX where the type precision is 1 but the mode precision QImode?
>>
>> Unless I misunderstood the predicate.
>
> So type_has_mode_precision_p only works with scalar integral types
> (maybe scalar real types too) since it uses TYPE_PRECISION directly
> and not element_precision (the precision field is overloaded for
> vectors for the number of elements and TYPE_PRECISION on a vector type
> will cause an ICE since r14-2150-gfe48f2651334bc).
> So I suspect you need to check !VECTOR_TYPE_P (type) before calling
> type_has_mode_precision_p .

I think for VECTOR_TYPE_P it would be worth checking VECTOR_MODE_P instead,
if we're not requiring callers to check this kind of thing.

So something like:

bool
mode_describes_type_p (const_tree type)
{
  if (VECTOR_TYPE_P (type))
return VECTOR_MODE_P (TREE_TYPE (type));

  if (INTEGRAL_TYPE_P (type))
return type_has_mode_precision_p (type);

  if (SCALAR_FLOAT_TYPE_P (type))
return true;

  return false;
}

?  Possibly also with complex handling if we need that.

Richard


RE: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

2024-07-17 Thread Li, Pan2
> I just noticed you added ustrunc/sstrunc optabs but didn't add
> documentation for them in md.texi like the other optabs that are
> defined.
> See https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html for the
> generated file of md.texi there.

> Can you please update md.texi to add them?

Thanks Andrew, almost forgot this, will add it soon.

Pan

-Original Message-
From: Andrew Pinski  
Sent: Thursday, July 18, 2024 6:59 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned 
scalar int

On Tue, Jun 25, 2024 at 6:46 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to add the middle-end presentation for the
> saturation truncation.  Aka set the result of truncated value to
> the max value when overflow.  It will take the pattern similar
> as below.
>
> Form 1:
>   #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##T##_fmt_1 (WT x)\
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return ((NT)x) | (NT)-overflow;\
>   }
>
> For example, truncated uint16_t to uint8_t, we have
>
> * SAT_TRUNC (254)   => 254
> * SAT_TRUNC (255)   => 255
> * SAT_TRUNC (256)   => 255
> * SAT_TRUNC (65536) => 255
>
> Given below SAT_TRUNC from uint64_t to uint32_t.
>
> DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t)
>
> Before this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   _Bool overflow;
>   unsigned int _1;
>   unsigned int _2;
>   unsigned int _3;
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   overflow_5 = x_4(D) > 4294967295;
>   _1 = (unsigned int) x_4(D);
>   _2 = (unsigned int) overflow_5;
>   _3 = -_2;
>   _6 = _1 | _3;
>   return _6;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SAT_TRUNC (x_4(D)); [tail call]
>   return _6;
> ;;succ:   EXIT
>
> }
>
> The below tests are passed for this patch:
> *. The rv64gcv fully regression tests.
> *. The rv64gcv build with glibc.
> *. The x86 bootstrap tests.
> *. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
> unary_convert.
> * match.pd: Add new matching pattern for unsigned int sat_trunc.
> * optabs.def (OPTAB_CL): Add unsigned and signed optab.

I just noticed you added ustrunc/sstrunc optabs but didn't add
documentation for them in md.texi like the other optabs that are
defined.
See https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html for the
generated file of md.texi there.

Can you please update md.texi to add them?

Thanks,
Andrew Pinski


> * tree-ssa-math-opts.cc (gimple_unsigend_integer_sat_trunc): Add
> new decl for the matching pattern generated func.
> (match_unsigned_saturation_trunc): Add new func impl to match
> the .SAT_TRUNC.
> (math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
> function under BIT_IOR_EXPR case.
> * tree.cc (integer_half_truncated_all_ones_p): Add new func impl
> to filter the truncated threshold.
> * tree.h (integer_half_truncated_all_ones_p): Add new func decl.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 12 +++-
>  gcc/optabs.def|  3 +++
>  gcc/tree-ssa-math-opts.cc | 32 
>  gcc/tree.cc   | 22 ++
>  gcc/tree.h|  6 ++
>  6 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a8c83437ada..915d329c05a 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -278,6 +278,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | 
> ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, 
> binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, 
> binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_TRUNC, ECF_CONST, first, sstrunc, ustrunc, 
> unary_convert)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> 
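
To make the form-1 bit trick concrete, a runnable sketch for the
uint16_t -> uint8_t case used in the examples above (illustration only):

  #include <stdint.h>
  #include <stdio.h>

  static uint8_t sat_u_trunc_u16_u8 (uint16_t x)
  {
    _Bool overflow = x > (uint16_t)(uint8_t)-1; /* x > 255 */
    return ((uint8_t) x) | (uint8_t) -overflow; /* 0xff when overflow, else x */
  }

  int main (void)
  {
    printf ("%d %d %d %d\n",
            sat_u_trunc_u16_u8 (254), sat_u_trunc_u16_u8 (255),
            sat_u_trunc_u16_u8 (256), sat_u_trunc_u16_u8 (65535));
    /* prints: 254 255 255 255 */
    return 0;
  }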

RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-16 Thread Li, Pan2
> I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> (constant) arguments?

I double checked the tree types of the __builtin_add_overflow operands in the
022t.ssa pass dump.  It looks like the two operands of .ADD_OVERFLOW have
different tree types when one of them is a constant: one is unsigned DI and
the other is int.

(gdb) call debug_gimple_stmt(stmt)
_14 = .ADD_OVERFLOW (_4, 129);
(gdb) call debug_tree (gimple_call_arg(stmt, 0))
 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x76a437e0 precision:64 min  max 
pointer_to_this >
visited
def_stmt _4 = *_3;
version:4>
(gdb) call debug_tree (gimple_call_arg(stmt, 1))
  constant 
129>
(gdb)

Then, in the vect pass, we can also see that the operands of .ADD_OVERFLOW
have different tree types.  In my understanding, the constant operand should
be unsigned DI here.

(gdb) layout src
(gdb) list
506   if (gimple_call_num_args (_c4) == 2)
507     {
508       tree _q40 = gimple_call_arg (_c4, 0);
509       _q40 = do_valueize (valueize, _q40);
510       tree _q41 = gimple_call_arg (_c4, 1);
511       _q41 = do_valueize (valueize, _q41);
512       if (integer_zerop (_q21))
513         {
514           if (integer_minus_onep (_p1))
515             {
(gdb) call debug_tree (_q40)
 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x76a437e0 precision:64 min  max 
pointer_to_this >
visited
def_stmt _4 = *_3;
version:4>
(gdb) call debug_tree (_q41)
  constant 
129>
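
For illustration (not from the original mail), a standalone view of why the
literal keeps type int, plus the cast that would make both .ADD_OVERFLOW
operands uint64_t:

  #include <stdint.h>
  #include <stdbool.h>

  /* 129 is a plain int literal; __builtin_add_overflow is type-generic and
     keeps each argument's own type, matching the dumps above.  */
  bool add_u64_mixed (uint64_t x, uint64_t *res)
  {
    return __builtin_add_overflow (x, 129, res);
  }

  /* With the constant cast, both operands have the element type.  */
  bool add_u64_cast (uint64_t x, uint64_t *res)
  {
    return __builtin_add_overflow (x, (uint64_t) 129, res);
  }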

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 7:36 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Wed, Jul 10, 2024 at 11:28 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST.
> For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will failure to vectorize as the vectorizable_call will check the
> operands is type_compatiable but the imm will be treated as unsigned
> SImode from the perspective of tree.

I think that's a bug.  Do you say __builtin_add_overflow fails to promote
(constant) arguments?

>  Aka
>
> uint64_t _1;
> uint64_t _2;
>
> _1 = .SAT_ADD (_2, 9);
>
> The _1 and _2 are unsigned DImode, which is different to imm 9 unsigned
> SImode,  and then result in vectorizable_call fails.  This patch would
> like to promote the imm operand to the operand type mode of _2 if and
> only if there is no precision/data loss.  Aka convert the imm 9 to the
> DImode for above example.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add
> new func impl to promote the imm tree to target type.
> (vect_recog_sat_add_pattern): Peform the type promotion before
> generate .SAT_ADD call.
>

RE: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

2024-07-12 Thread Li, Pan2
Thanks Jeff and Edwin for catching my silly mistake.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, July 13, 2024 5:40 AM
To: Edwin Lu ; gcc-patches@gcc.gnu.org
Cc: Li, Pan2 ; gnu-toolch...@rivosinc.com
Subject: Re: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark



On 7/12/24 12:37 PM, Edwin Lu wrote:
> The following testcase was not properly testing anything due to an
> uninitialized variable. As a result, the loop was not iterating through
> the testing data, but instead on undefined values which could cause an
> unexpected abort.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
>   initialize variable
OK.  Thanks for chasing this down.

jeff



RE: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-11 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, July 11, 2024 6:32 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
Robin Dapp ; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip 
benchmark


LGTM

juzhe.zh...@rivai.ai

From: pan2.li <pan2...@intel.com>
Date: 2024-07-11 16:29
To: gcc-patches <gcc-patches@gcc.gnu.org>
CC: juzhe.zhong <juzhe.zh...@rivai.ai>; kito.cheng <kito.ch...@gmail.com>;
jeffreyalaw <jeffreya...@gmail.com>; rdapp.gcc <rdapp@gmail.com>;
Pan Li <pan2...@intel.com>
Subject: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark
From: Pan Li <pan2...@intel.com>

This patch would like to add the test cases for the vector .SAT_SUB in
the zip benchmark.  Aka:

Form in zip benchmark:
  #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
  void __attribute__((noinline))\
  vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
  { \
T2 a;   \
T1 *p = x;  \
do {\
  a = *--p; \
  *p = (T1)(a >= b ? a - b : 0);\
} while (--limit);  \
  }

DEF_VEC_SAT_U_SUB_ZIP(uint8_t, uint16_t)

vec_sat_u_sub_uint16_t_uint32_t_fmt_zip:
  ...
  vsetvli   a4,zero,e32,m1,ta,ma
  vmv.v.x   v6,a1
  vsetvli   zero,zero,e16,mf2,ta,ma
  vid.v v2
  lia4,-1
  vnclipu.wiv6,v6,0   // .SAT_TRUNC
.L3:
  vle16.v   v3,0(a3)
  vrsub.vx  v5,v2,a6
  mva7,a4
  addw  a4,a4,t3
  vrgather.vv   v1,v3,v5
  vssubu.vv v1,v1,v6  // .SAT_SUB
  vrgather.vv   v3,v1,v5
  vse16.v   v3,0(a3)
  sub   a3,a3,t1
  bgtu  t4,a4,.L3

Passed the rv64gcv tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for .SAT_SUB in zip benchmark.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
.../riscv/rvv/autovec/binop/vec_sat_arith.h   | 18 +
.../rvv/autovec/binop/vec_sat_binary_vx.h | 22 +
.../riscv/rvv/autovec/binop/vec_sat_data.h| 81 +++
.../rvv/autovec/binop/vec_sat_u_sub_zip-run.c | 16 
.../rvv/autovec/binop/vec_sat_u_sub_zip.c | 18 +
5 files changed, 155 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 10459807b2c..416a1e49a47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -322,6 +322,19 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 } \
}
+#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
+void __attribute__((noinline))\
+vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
+{ \
+  T2 a;   \
+  T1 *p = x;  \
+  do {\
+a = *--p; \
+*p = (T1)(a >= b ? a - b : 0);\
+  } while (--limit);  \
+}
+#define DEF_VEC_SAT_U_SUB_ZIP_WRAP(T1, T2) DEF_VEC_SAT_U_SUB_ZIP(T1, T2)
+
#define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N)
@@ -352,6 +365,11 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
#define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+  vec_sat_u_sub_##T1##_##T2##_fmt_zi

RE: [PATCH v3] Vect: Optimize truncation for .SAT_SUB operands

2024-07-10 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 7:26 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v3] Vect: Optimize truncation for .SAT_SUB operands

On Tue, Jul 9, 2024 at 6:03 AM  wrote:
>
> From: Pan Li 
>
> To get better vectorized code of .SAT_SUB,  we would like to avoid the
> truncated operation for the assignment.  For example, as below.
>
> unsigned int _1;
> unsigned int _2;
> unsigned short int _4;
> _9 = (unsigned short int).SAT_SUB (_1, _2);
>
> If we make sure that the _1 is in the range of unsigned short int.  Such
> as a def similar to:
>
> _1 = (unsigned short int)_4;
>
> Then we can do the distribute the truncation operation to:
>
> _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2);
> _9 = .SAT_SUB (_4, _3);
>
> Then,  we can better vectorized code and avoid the unnecessary narrowing
> stmt during vectorization with below stmt(s).
>
> _3 = .SAT_TRUNC(_2); // SI => HI
> _9 = .SAT_SUB (_4, _3);
>
> Let's take RISC-V vector as example to tell the changes.  For below
> sample code:
>
> __attribute__((noinline))
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0);
>   } while (--n);
> }
>
> Before this patch:
>   ...
>   .L3:
>   vle16.v   v1,0(a3)
>   vrsub.vx  v5,v2,t1
>   mvt3,a4
>   addw  a4,a4,t5
>   vrgather.vv   v3,v1,v5
>   vsetvli   zero,zero,e32,m1,ta,ma
>   vzext.vf2 v1,v3
>   vssubu.vx v1,v1,a1
>   vsetvli   zero,zero,e16,mf2,ta,ma
>   vncvt.x.x.w   v1,v1
>   vrgather.vv   v3,v1,v5
>   vse16.v   v3,0(a3)
>   sub   a3,a3,t4
>   bgtu  t6,a4,.L3
>   ...
>
> After this patch:
> test:
>   ...
>   .L3:
>   vle16.v v3,0(a3)
>   vrsub.vxv5,v2,a6
>   mv  a7,a4
>   addwa4,a4,t3
>   vrgather.vv v1,v3,v5
>   vssubu.vv   v1,v1,v6
>   vrgather.vv v3,v1,v5
>   vse16.v v3,0(a3)
>   sub a3,a3,t1
>   bgtut4,a4,.L3
>   ...
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_sub_pattern_transform):
> Add new func impl to perform the truncation distribution.
> (vect_recog_sat_sub_pattern): Perform above optimize before
> generate .SAT_SUB call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 65 +++
>  1 file changed, 65 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 86e893a1c43..4570c25b664 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4566,6 +4566,70 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to transform the truncation for .SAT_SUB pattern,  mostly occurs in
> + * the benchmark zip.  Aka:
> + *
> + *   unsigned int _1;
> + *   unsigned int _2;
> + *   unsigned short int _4;
> + *   _9 = (unsigned short int).SAT_SUB (_1, _2);
> + *
> + *   if _1 is known to be in the range of unsigned short int.  For example
> + *   there is a def _1 = (unsigned short int)_4.  Then we can transform the
> + *   truncation to:
> + *
> + *   _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2);
> + *   _9 = .SAT_SUB (_4, _3);
> + *
> + *   Then,  we can better vectorized code and avoid the unnecessary narrowing
> + *   stmt during vectorization with below stmt(s).
> + *
> + *   _3 = .SAT_TRUNC(_2); // SI => HI
> + *   _9 = .SAT_SUB (_4, _3);
> + */
> +static void
> +vect_recog_sat_sub_pattern_transform (vec_info *vinfo,
> + stmt_vec_info stmt_vinfo,
> + tree lhs, tree *ops)
> +{
> +  tree otype = TREE_TYPE (lhs);
> +  tree itype = TREE_TYPE (ops[0]);
> +  unsigned itype_prec = TYPE_PRECISION (itype);
> +  unsigned otype_prec = TYPE_PRECISION (otype);
> +
> +  if (types_compatible_p (otype, itype) || otype_prec >= itype_prec)
> +return;
> +
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree_pair v_pair = tree_pair (v_oty
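
As a standalone check of the identity the transform relies on (sketch added
for clarity, not part of the patch): when the minuend already fits the narrow
type, truncating the wide SAT_SUB result equals doing a narrow SAT_SUB
against the saturation-truncated subtrahend.

  #include <stdint.h>

  static uint32_t sat_sub_u32 (uint32_t a, uint32_t b) { return a >= b ? a - b : 0; }
  static uint16_t sat_sub_u16 (uint16_t a, uint16_t b) { return a >= b ? a - b : 0; }
  static uint16_t sat_trunc_u16 (uint32_t x) { return x > 0xffff ? 0xffff : (uint16_t) x; }

  int main (void)
  {
    for (uint32_t a = 0; a <= 0xffff; a++)
      for (uint32_t b = 0; b < 0x20000; b += 251)
        if ((uint16_t) sat_sub_u32 (a, b)
            != sat_sub_u16 ((uint16_t) a, sat_trunc_u16 (b)))
          __builtin_abort ();
    return 0;
  }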

RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-10 Thread Li, Pan2
> I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> (constant) arguments?

Thanks Richard. Not very sure which part results in the type mismatch; will
take a look and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 7:36 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Wed, Jul 10, 2024 at 11:28 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST.
> For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will failure to vectorize as the vectorizable_call will check the
> operands is type_compatiable but the imm will be treated as unsigned
> SImode from the perspective of tree.

I think that's a bug.  Do you say __builtin_add_overflow fails to promote
(constant) arguments?

>  Aka
>
> uint64_t _1;
> uint64_t _2;
>
> _1 = .SAT_ADD (_2, 9);
>
> The _1 and _2 are unsigned DImode, which is different to imm 9 unsigned
> SImode,  and then result in vectorizable_call fails.  This patch would
> like to promote the imm operand to the operand type mode of _2 if and
> only if there is no precision/data loss.  Aka convert the imm 9 to the
> DImode for above example.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add
> new func impl to promote the imm tree to target type.
> (vect_recog_sat_add_pattern): Peform the type promotion before
> generate .SAT_ADD call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 86e893a1c43..e1013222b12 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4527,6 +4527,20 @@ vect_recog_build_binary_gimple_stmt (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>return NULL;
>  }
>
> +static void
> +vect_recog_promote_cst_to_unsigned (tree *op, tree type)
> +{
> +  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
> +return;
> +
> +  unsigned precision = TYPE_PRECISION (type);
> +  wide_int type_max = wi::mask (precision, false, precision);
> +  wide_int op_cst_val = wi::to_wide (*op, precision);
> +
> +  if (wi::leu_p (op_cst_val, type_max))
> +*op = wide_int_to_tree (type, op_cst_val);
> +}
> +
>  /*
>   * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
>   *   _7 = _4 + _6;
> @@ -4553,6 +4567,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>
>if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
>  {
> +  vect_recog_promote_cst_to_unsigned (&ops[0], TREE_TYPE (ops[1]));
> +  vect_recog_promote_cst_to_unsigned (&ops[1], TREE_TYPE (ops[0]));
> +
>gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo,
>   IFN_SAT_ADD, 
> type_out,
>   lhs, ops[0], 
> ops[1]);
> --
> 2.34.1
>


RE: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC

2024-07-10 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 5:24 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC

On Fri, Jul 5, 2024 at 2:48 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to add form 2 support for the .SAT_TRUNC.  Aka:
>
> Form 2:
>   #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return overflow ? (NT)-1 : (NT)x;  \
>   }
>
> DEF_SAT_U_TRUC_FMT_2(uint32, uint64)
>
> Before this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │   long unsigned int _3;
>9   │
>   10   │ ;;   basic block 2, loop depth 0
>   11   │ ;;pred:   ENTRY
>   12   │   _3 = MIN_EXPR ;
>   13   │   _1 = (uint32_t) _3;
>   14   │   return _1;
>   15   │ ;;succ:   EXIT
>   16   │
>   17   │ }
>
> After this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _1 = .SAT_TRUNC (x_2(D)); [tail call]
>   12   │   return _1;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch:
> 1. The x86 bootstrap test.
> 2. The x86 fully regression test.
> 3. The rv64gcv fully regression test.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * match.pd: Add form 2 for .SAT_TRUNC.
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Add new case NOP_EXPR and try to match SAT_TRUNC.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 17 -
>  gcc/tree-ssa-math-opts.cc |  4 
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4edfa2ae2c9..3759c64d461 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3234,7 +3234,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> -/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
> +/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> @@ -3250,6 +3250,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>}
>(if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
> +/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> +   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +(match (unsigned_integer_sat_trunc @0)
> + (convert (min @0 INTEGER_CST@1))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> + (with
> +  {
> +   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> +   unsigned otype_precision = TYPE_PRECISION (type);
> +   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> +   wide_int int_cst = wi::to_wide (@1, itype_precision);
> +  }
> +  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index a35caf5f058..ac86be8eb94 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6170,6 +6170,10 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>   match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   break;
>
> +   case NOP_EXPR:
> + match_unsigned_saturation_trunc (&gsi, as_a (stmt));
> + break;
> +
> default:;
> }
> }
> --
> 2.34.1
>
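
A side note on the new match.pd guard: form 2 only fires when the MIN constant
equals the all-ones value of the narrow type and the narrow precision is
strictly smaller than the wide one.  A plain-C restatement of that check (my
own sketch, not the wide_int code above; precisions assumed to be <= 64):

#include <stdbool.h>
#include <stdint.h>

/* (NT)(MIN (x, C)) is a saturating truncation only when C is exactly the
   maximum of the narrow type, e.g. 255 for uint8_t or 65535 for uint16_t.  */
static bool
min_cst_is_sat_trunc (uint64_t cst, unsigned narrow_prec, unsigned wide_prec)
{
  if (narrow_prec >= wide_prec)
    return false;                       /* nothing would be truncated */
  uint64_t trunc_max = (1ull << narrow_prec) - 1;
  return cst == trunc_max;
}

So (uint8_t) (MIN (x, 255)) is recognized, while (uint8_t) (MIN (x, 127)) is
rejected, matching the wi::eq_p (trunc_max, int_cst) condition in the hunk
above.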


RE: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-08 Thread Li, Pan2
Backported to gcc 14 already.

Pan

From: Li, Pan2
Sent: Wednesday, July 3, 2024 10:41 PM
To: Kito Cheng ; juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW 
[PR115763]

Committed, thanks Juzhe and Kito. Let’s wait for a while before backporting to 14.

I suspect there may be similar cases for other insns; I will double-check and 
fix those first.

Pan

From: Kito Cheng <kito.ch...@gmail.com>
Sent: Wednesday, July 3, 2024 10:32 PM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW 
[PR115763]


LGTM and ok for gcc 14 as well,
BTW, an idea: the value could actually be passed via a GPR, i.e. fpr->gpr and 
then vmv.v.x, but that's not a blocking comment for this patch.

钟居哲 <juzhe.zh...@rivai.ai> wrote on Wednesday, July 3, 2024 at 22:18:
LGTM.


juzhe.zh...@rivai.ai

From: pan2.li <pan2...@intel.com>
Date: 2024-07-03 22:17
To: gcc-patches <gcc-patches@gcc.gnu.org>
CC: juzhe.zhong <juzhe.zh...@rivai.ai>; kito.cheng <kito.ch...@gmail.com>; 
jeffreyalaw <jeffreya...@gmail.com>; rdapp.gcc <rdapp@gmail.com>; 
Pan Li <pan2...@intel.com>
Subject: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW 
[PR115763]
From: Pan Li <pan2...@intel.com>

According to the ISA, the zvfhmin sub-extension should only contain
conversion insns.  Thus, the vfmv insn acting on FP16 should not be
emitted when only the zvfhmin option is given.

This patch would like to fix it by splitting the pred_broadcast
define_insn into a zvfhmin part and a zvfh part.  Given the example below:

void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}

when compiled with -march=rv64gcv_zfh_zvfhmin

Before this patch:
test:
  vsetivli zero,2,e16,mf4,ta,ma
  vfmv.v.f v1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret

After this patch:
test:
  addi sp,sp,-16
  fsh  fa0,14(sp)
  addi a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi sp,sp,16
  jr   ra

PR target/115763

gcc/ChangeLog:

* config/riscv/vector.md (*pred_broadcast): Split into
zvfh and zvfhmin parts.
(*pred_broadcast_zvfh): New define_insn for zvfh part.
(*pred_broadcast_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/config/riscv/vector.md| 49 +--
.../gcc.target/riscv/rvv/base/pr115763-1.c|  9 
.../gcc.target/riscv/rvv/base/pr115763-2.c| 10 
.../gcc.target/riscv/rvv/base/scalar_move-5.c |  4 +-
.../gcc.target/riscv/rvv/base/scalar_move-6.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-7.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-8.c |  6 +--
7 files changed, 64 insertions(+), 26 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-2.c
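
The bodies of the two new pr115763 tests are not shown in the stat above; a
plausible sketch of such a test (the dg options and scan patterns below are
assumptions, not the committed contents) could be:

/* Hypothetical pr115763-style test; options and scan patterns are guesses.  */
/* { dg-do compile } */
/* { dg-options "-march=rv64gcv_zfh_zvfhmin -mabi=lp64d -O3" } */

void
test (_Float16 *dest, _Float16 bias)
{
  dest[0] = bias;
  dest[1] = bias;
}

/* vfmv.v.f on FP16 must not appear when only zvfhmin is enabled.  */
/* { dg-final { scan-assembler-not {vfmv\.v\.f} } } */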

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fe18ee5b5f7..d9474262d54 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2080,31 +2080,50 @@ (define_insn_and_split "*pred_broadcast"
   [(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
(set_attr "mode" "")])
-(define_insn "*pred_broadcast"
-  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr, vr, 
vr, vr, vr, vr, vr, vr")
- (if_then_else:V_VLSF_ZVFHMIN
+(define_insn "*pred_broadcast_zvfh"
+  [(set (match_operand:V_VLSF 0 "register_operand"  "=vr,  vr,  
vr,  vr")
+ (if_then_else:V_VLSF
  (unspec:
- [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1,Wc1, vm, 
vm,Wc1,Wc1,Wb1,Wb1")
-  (match_operand 4 "vector_length_operand"  " rK, rK, rK, rK, 
rK, rK, rK, rK")
-  (match_operand 5 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,  i")
-  (match_operand 6 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,
