RE: [PATCH v2] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-22 Thread Li, Pan2
Thanks Richard for the comments.

> I think it's more useful to add an overload to types_match with three
> arguments and then use

> (if (INTEGRAL_TYPE_P (type)
>   && types_match (type, TREE_TYPE (@0), TREE_TYPE (@1))

Sure thing, will try to add overloaded types_match here.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 22, 2024 9:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v2] Match: Extract integer_types_ternary_match helper to 
avoid code dup [NFC]

On Mon, May 20, 2024 at 1:00 PM  wrote:
>
> From: Pan Li 
>
> There are several match patterns for SAT-related cases, and there will be
> some duplicated code checking that the dest, op_0 and op_1 are the same
> tree type, i.e. a ternary tree-type match.  Thus, extract one helper
> function to do this and avoid duplicating the match code.

I think it's more useful to add an overload to types_match with three
arguments and then use

 (if (INTEGRAL_TYPE_P (type)
  && types_match (type, TREE_TYPE (@0), TREE_TYPE (@1))
...

Richard.

> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (integer_types_ternary_match): New helper
> function to check ternary tree type matches or not.
> * gimple-match-head.cc (integer_types_ternary_match): Ditto but
> for match.
> * match.pd: Leverage above helper function to avoid code dup.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 17 +
>  gcc/gimple-match-head.cc  | 17 +
>  gcc/match.pd  | 25 +
>  3 files changed, 39 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..cdd48c7a5cc 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
>
> +/* Routine to determine if the types T1, T2 and T3 are effectively
> +   the same integer type for GENERIC.  If T1, T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
> +    return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..91f2e56b8ef 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
>
> +/* Routine to determine if the types T1, T2 and T3 are effectively
> +   the same integer type for GIMPLE.  If T1, T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
> +    return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..401b52e7573 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type
>
>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  &&a

RE: [PATCH v1 1/2] Match: Support __builtin_add_overflow for branchless unsigned SAT_ADD

2024-05-22 Thread Li, Pan2
Thanks Richard for the comments; will merge the remaining forms of .SAT_ADD into
one middle-end patch for the full picture, and address the comments as well.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 22, 2024 9:16 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow for 
branchless unsigned SAT_ADD

On Sun, May 19, 2024 at 8:37 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the branchless form for unsigned
> SAT_ADD when leveraging __builtin_add_overflow.  For example as below:
>
> uint64_t sat_add_u(uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   uint64_t overflow = __builtin_add_overflow (x, y, &ret);
>
>   return (uint64_t)(-overflow) | ret;
> }
>
> Before this patch:
>
> uint64_t sat_add_u (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   long unsigned int _3;
>   __complex__ long unsigned int _6;
>   uint64_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _1 = REALPART_EXPR <_6>;
>   _2 = IMAGPART_EXPR <_6>;
>   _3 = -_2;
>   _8 = _1 | _3;
>   return _8;
> ;;succ:   EXIT
>
> }
>
> After this patch:
>
> uint64_t sat_add_u (uint64_t x, uint64_t y)
> {
>   uint64_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _8 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _8;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add SAT_ADD right part 2 for __builtin_add_overflow.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index b291e34bbe4..5328e846aff 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3064,6 +3064,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
>   (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1
>
> +(match (usadd_right_part_2 @0 @1)
> + (negate (imagpart (IFN_ADD_OVERFLOW:c @0 @1)))
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1
> +

Can you merge this with the patch that makes use of the
usadd_right_part_2 match?
It's difficult to review on its own.

>  /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> because the sub part of left_part_2 cannot work with right_part_1.
> For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> --
> 2.34.1
>


RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-22 Thread Li, Pan2
Thanks Richard for reviewing.

> I'm not convinced we should match this during early if-conversion, should we?
> The middle-end doesn't really know .SAT_ADD but some handling of
> .ADD_OVERFLOW is present.

I tried to do the branch (aka cond) match in the widen-mult pass, similar to the
previous branchless form.
Unfortunately, the branch will already have been converted to a PHI by the time
widen-mult runs, so v2 tries to bypass the PHI handling and convert the branch
form to the branchless form instead.

> But please add a comment before the new pattern, esp. since it's
> non-obvious that this is an improvement.

Sure thing.

> I suspect you rely on this form being recognized as .SAT_ADD later but
> what prevents us from breaking this?  Why not convert it to .SAT_ADD
> immediately?  If this is because the ISEL pass (or the widen-mult pass)
> cannot handle PHIs then I would suggest to split out enough parts of
> tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

Yes, this is sort of redundant; we can also convert it to .SAT_ADD immediately
in match.pd before widen-mult.

Sorry, I may be confused here: for a branch form like the below, what transform
should we perform in phiopt?
gimple_simplify_phiopt mostly leverages the simplifications in match.pd, but we
may hit the simplification in other, earlier passes.

Or we can leverage the branch version of the unsigned_integer_sat_add gimple
match in phiopt and generate the gimple call .SAT_ADD there (mostly like what
we do in widen-mult).
Not sure if my understanding is correct or not; thanks again for the help.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint8_t);

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 22, 2024 9:14 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
unsigned SAT_ADD

On Wed, May 22, 2024 at 3:17 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different from the branchless version, we leverage the simplification to
> convert the branch version of SAT_ADD into the branchless one if and only
> if the backend supports IFN_SAT_ADD.  Thus, the backend has
> the ability to choose a branch or branchless implementation of .SAT_ADD.
> For example, some targets can take care of branchy code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

I'm not convinced we should match this during early if-conversion, should we?
The middle-end doesn't really know .SAT_ADD but some handling of
.ADD_OVERFLOW is present.

But please add a comment before the new pattern, esp. since it's
non-obvious that this is an improvement.

I suspect you rely on this form being recognized as .SAT_ADD later but
what prevents us from breaking this?  Why not convert it to .SAT_ADD
immediately?  If this is because the ISEL pass (or the widen-mult pass)
cannot handle PHIs then I would suggest to split out enough parts of
tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

> gcc/ChangeLog:
>
> * match.pd: Add new simplify to convert branch SAT_ADD into
> branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 11 +++

RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-23 Thread Li, Pan2
I had a try at converting the PHI from Part-A to Part-B, aka the PHI to
_2 = phi_cond ? _1 : 255.
Then we can do the matching on COND_EXPR in the underlying widen-mul pass.

Unfortunately, I met an ICE from verify_gimple_phi in the sccopy1 pass =>
sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’,
have ‘exceptional’ (error_mark) in useless_type_conversion_p, at
gimple-expr.cc:86

Will go on to see whether this works or not.

Part-A:
uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
{
  unsigned char _1;
  uint8_t _2;

   :
  _1 = x_3(D) + y_4(D);
  if (_1 >= x_3(D))
goto ; [INV]
  else
goto ; [INV]

   :

   :
  # _2 = PHI <255(2), _1(3)>
  return _2;

}

Part-B:
uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
{
  unsigned char _1;
  _Bool phi_cond_6;

   :
  _1 = x_3(D) + y_4(D);
  phi_cond_6 = _1 >= x_3(D);
  _2 = phi_cond_6 ? _1 : 255;
  return _2;

}

-Original Message-
From: Li, Pan2 
Sent: Thursday, May 23, 2024 12:17 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
unsigned SAT_ADD

Thanks Richard for reviewing.

> I'm not convinced we should match this during early if-conversion, should we?
> The middle-end doesn't really know .SAT_ADD but some handling of
> .ADD_OVERFLOW is present.

I tried to do the branch (aka cond) match in the widen-mult pass, similar to the
previous branchless form.
Unfortunately, the branch will already have been converted to a PHI by the time
widen-mult runs, so v2 tries to bypass the PHI handling and convert the branch
form to the branchless form instead.

> But please add a comment before the new pattern, esp. since it's
> non-obvious that this is an improvement.

Sure thing.

> I suspect you rely on this form being recognized as .SAT_ADD later but
> what prevents us from breaking this?  Why not convert it to .SAT_ADD
> immediately?  If this is because the ISEL pass (or the widen-mult pass)
> cannot handle PHIs then I would suggest to split out enough parts of
> tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

Yes, this is sort of redundant; we can also convert it to .SAT_ADD immediately
in match.pd before widen-mult.

Sorry, I may be confused here: for a branch form like the below, what transform
should we perform in phiopt?
gimple_simplify_phiopt mostly leverages the simplifications in match.pd, but we
may hit the simplification in other, earlier passes.

Or we can leverage the branch version of the unsigned_integer_sat_add gimple
match in phiopt and generate the gimple call .SAT_ADD there (mostly like what
we do in widen-mult).
Not sure if my understanding is correct or not; thanks again for the help.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint8_t);

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 22, 2024 9:14 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
unsigned SAT_ADD

On Wed, May 22, 2024 at 3:17 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different from the branchless version, we leverage the simplification to
> convert the branch version of SAT_ADD into the branchless one if and only
> if the backend supports IFN_SAT_ADD.  Thus, the backend has
> the ability to choose a branch or branchless implementation of .SAT_ADD.
> For example, some targets can take care of branchy code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;

RE: [PATCH v4] Match: Add overloaded types_match to avoid code dup [NFC]

2024-05-23 Thread Li, Pan2
> the above three lines are redundant.
> OK with those removed.

Got it, will commit it once testing shows no surprises after the removal.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, May 23, 2024 7:49 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v4] Match: Add overloaded types_match to avoid code dup 
[NFC]

On Thu, May 23, 2024 at 2:24 AM  wrote:
>
> From: Pan Li 
>
> There are several match patterns for SAT-related cases, and there will be
> some duplicated code checking that the dest, op_0 and op_1 are the same
> tree type, i.e. a ternary tree-type match.  Thus, add an overloaded
> types_match function to do this and avoid duplicating the match code.
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (types_match): Add overloaded types_match
> for 3 types.
> * gimple-match-head.cc (types_match): Ditto.
> * match.pd: Leverage overloaded types_match.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 14 ++
>  gcc/gimple-match-head.cc  | 14 ++
>  gcc/match.pd  | 30 ++
>  3 files changed, 38 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..8d8ecfaeb1d 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,20 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
>
> +/* Routine to determine if the types T1, T2 and T3 are effectively
> +   the same for GENERIC.  If T1, T2 or T3 is not a type, the test
> +   applies to their TREE_TYPE.  */
> +
> +static inline bool
> +types_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);

the above three lines are redundant.

> +  return types_match (t1, t2) && types_match (t2, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..2b7f746ab13 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,20 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
>
> +/* Routine to determine if the types T1, T2 and T3 are effectively
> +   the same for GIMPLE.  If T1, T2 or T3 is not a type, the test
> +   applies to their TREE_TYPE.  */
> +
> +static inline bool
> +types_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);

likewise.

OK with those removed.

Richard.

> +  return types_match (t1, t2) && types_match (t2, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 35e3d82b131..7081d76d56a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3048,38 +3048,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
>
>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (co

RE: [PATCH v4] Match: Add overloaded types_match to avoid code dup [NFC]

2024-05-23 Thread Li, Pan2
Committed, as it passed the below test suites. Thanks Richard.

* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 regression test.

Pan

-Original Message-
From: Li, Pan2 
Sent: Thursday, May 23, 2024 8:06 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: RE: [PATCH v4] Match: Add overloaded types_match to avoid code dup 
[NFC]

> the above three lines are redundant.
> OK with those removed.

Got it, will commit it once testing shows no surprises after the removal.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, May 23, 2024 7:49 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v4] Match: Add overloaded types_match to avoid code dup 
[NFC]

On Thu, May 23, 2024 at 2:24 AM  wrote:
>
> From: Pan Li 
>
> There are several match patterns for SAT-related cases, and there will be
> some duplicated code checking that the dest, op_0 and op_1 are the same
> tree type, i.e. a ternary tree-type match.  Thus, add an overloaded
> types_match function to do this and avoid duplicating the match code.
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (types_match): Add overloaded types_match
> for 3 types.
> * gimple-match-head.cc (types_match): Ditto.
> * match.pd: Leverage overloaded types_match.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 14 ++
>  gcc/gimple-match-head.cc  | 14 ++
>  gcc/match.pd  | 30 ++
>  3 files changed, 38 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..8d8ecfaeb1d 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,20 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
>
> +/* Routine to determine if the types T1, T2 and T3 are effectively
> +   the same for GENERIC.  If T1, T2 or T3 is not a type, the test
> +   applies to their TREE_TYPE.  */
> +
> +static inline bool
> +types_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);

the above three lines are redundant.

> +  return types_match (t1, t2) && types_match (t2, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..2b7f746ab13 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,20 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
>
> +/* Routine to determine if the types T1, T2 and T3 are effectively
> +   the same for GIMPLE.  If T1, T2 or T3 is not a type, the test
> +   applies to their TREE_TYPE.  */
> +
> +static inline bool
> +types_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);

likewise.

OK with those removed.

Richard.

> +  return types_match (t1, t2) && types_match (t2, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 35e3d82b131..7081d76d56a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3048,38 +3048,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
>
>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
>
>  (match (u

RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-23 Thread Li, Pan2
 PHI arguments, match and 
simplify
  can happen on (COND) ? arg0 : arg1. */

+  if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1))
+return true;
+
   stmt = last_nondebug_stmt (cond_bb);

   /* We need to know which is the true edge and which is the false


-Original Message-
From: Jeff Law  
Sent: Thursday, May 23, 2024 10:59 PM
To: Richard Biener ; Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; pins...@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
unsigned SAT_ADD



On 5/23/24 6:14 AM, Richard Biener wrote:
> On Thu, May 23, 2024 at 1:08 PM Li, Pan2  wrote:
>>
>> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = 
>> phi_cond ? _1 : 255.
>> And then we can do the matching on COND_EXPR in the underlying widen-mul 
>> pass.
>>
>> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
>> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, 
>> have ‘exceptional’ (error_mark) in useless_type_conversion_p, at 
>> gimple-expr.cc:86
> 
> Likely you have released _2, more comments below on your previous mail.
You can be sure by calling debug_tree () on the SSA_NAME node in 
question.  If it reports "in-free-list", then that's definitive that the 
SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is 
still in the IL, then that's very bad.

jeff



RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-24 Thread Li, Pan2
Thanks Richard for the help and comments.

If my understanding is correct, I should be able to follow the below steps to
support the branch form for unsigned SAT_ADD.

1. Build a helper in one place to match a PHI def as a COND_EXPR (or find an
even better way to do it by providing native support from genmatch).
2. Leverage this helper in widen-mul and recog it as .SAT_ADD if it matches.

Will have a try and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, May 24, 2024 3:21 PM
To: Li, Pan2 
Cc: Jeff Law ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
pins...@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
unsigned SAT_ADD

On Fri, May 24, 2024 at 8:56 AM Richard Biener
 wrote:
>
> On Fri, May 24, 2024 at 8:37 AM Li, Pan2  wrote:
> >
> > Thanks Jeff and Richard for suggestion and reviewing.
> >
> > I had another try in phiopt at doing the convert from PHI to
> > stmt = cond ? a : b.
> > It can perform the convert from PHI to stmt = cond ? a : b successfully,
> > and then
> > widen-mul is able to do the recog to .SAT_ADD.
> >
> > For now, to limit the risk, the above convert from PHI to
> > stmt = cond ? a : b is only performed when matched,
> > and when the backend supports the usadd standard name.  Unfortunately, I am
> > stuck on the case where, when the lhs
> > is not matched, we need to clean up the previously created stmts, or we
> > will have an ICE for a missing definition.
> >
> > sat_add.c: In function ‘sat_add_u_3_uint8_t’:
> > sat_add.c:69:1: error: missing definition
> >69 | SAT_ADD_U_3(uint8_t);
> >   | ^~~
> > for SSA_NAME: _6 in statement:
> > # VUSE <.MEM_14(D)>
> > return _6;
> > during GIMPLE pass: phiopt
> > dump file: sat_add.c.046t.phiopt1
> > sat_add.c:69:1: internal compiler error: verify_ssa failed
> > 0x1db41ba verify_ssa(bool, bool
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> > 0x18e3075 execute_function_todo
> > 
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> > 0x18e1c52 do_per_function
> > 
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> > 0x18e3222 execute_todo
> >
> > I bet the reason is that we created new stmts like stmt_cond and stmt_val
> > but we don't insert them.
> > Thus, there will be orphan nodes somewhere and we need something like a
> > rollback to recover the
> > gimple up to a point.  I tried sorts of release_xx and the like but it
> > seems not to be working.
> >
> > So is there any suggestion to take care of such a gimple rollback, or
> > another solution for this?  Below is
> > the function that performs the convert from PHI to stmt = cond ? a : b,
> > for reference.  Thanks a lot.
> >
> > Pan
> >
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index 918cf50b589..7982b65bac4 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op 
> > &op)
> >  }
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/* Try to match the phi expr to the gimple cond.  Return true if we can
> > +   perform the convert, or return false.  There will be some restrictions
> > +   on such kind of conversion, aka:
> > +
> > +   1. Only selected pattern will try this convert.
> > +   2. The generated gassign matched the selected IFN pattern.
> > +   3. The backend has implement the standard name.
> > +
> > +   From:
> > +  :
> > + _1 = x_3(D) + y_4(D);
> > + if (_1 >= x_3(D))
> > +   goto ; [INV]
> > + else
> > +   goto ; [INV]
> > +
> > +  :
> > +
> > +  :
> > + # _2 = PHI <255(2), _1(3)>
> > +
> > +   To:
> > +  :
> > + _1 = x_3(D) + y_4(D);
> > + phi_cond_6 = _1 >= x_3(D);
> > + _2 = phi_cond_6 ? _1 : 255; */
> > +
> > +static bool
> > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree 
> > arg1)
>
> You should do this in widen-mult and/or ISEL and if necessary for 
> vectorization
> in tree-if-conv.cc, though eventually what if-convert creates might be
> good enough
> to match during pattern recognition.
>
> > +{
> > +  gcond *cond = as_a  (*gsi_last_bb (cond_bb));
> > +
> 

RE: [PATCH][v2] tree-optimization/115144 - improve sinking destination choice

2024-05-26 Thread Li, Pan2
Hi Richard,

It looks like this commit may result in an ICE similar to the below when
building newlib; feel free to ping me if you need a PR filed for this.

CC RISC-V port for awareness.

In file included from 
/home/pli/gcc/111/riscv-gnu-toolchain/newlib/newlib/libc/stdlib/setenv_r.c:26:
/home/pli/gcc/111/riscv-gnu-toolchain/newlib/newlib/libc/include/stdlib.h: In 
function '_setenv_r':
/home/pli/gcc/111/riscv-gnu-toolchain/newlib/newlib/libc/include/stdlib.h:212:9:
 error: stmt with wrong VUSE
  212 | int _setenv_r (struct _reent *, const char *__string, const char 
*__value, int __overwrite);
  | ^
# .MEM_109 = VDEF <.MEM_67>
*C_59 = 61;
expected .MEM_106
during GIMPLE pass: sink
/home/pli/gcc/111/riscv-gnu-toolchain/newlib/newlib/libc/include/stdlib.h:212:9:
 internal compiler error: verify_ssa failed

Pan


-Original Message-
From: Richard Biener  
Sent: Friday, May 24, 2024 7:01 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH][v2] tree-optimization/115144 - improve sinking destination 
choice

When sinking code closer to its uses we already try to minimize the
distance we move by inserting at the start of the basic-block.  The
following makes sure to sink closest to the control dependence
check of the region we want to sink to as well as make sure to
ignore control dependences that are only guarding exceptional code.
This restores somewhat the old profile check but without requiring
nearly even probabilities.  The patch also makes sure to not give
up completely when the best sink location is one we do not want to
sink to but possibly then choose the next best one.

This addresses fallout observed in building libgo.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115144
* tree-ssa-sink.cc (do_not_sink): New function, split out
from ...
(select_best_block): Here.  First pick valid block to
sink to.  From that search for the best valid block,
avoiding sinking across conditions to exceptional code.
(sink_code_in_bb): When updating vuses of stores in
paths we do not sink a store to make sure we didn't
pick a dominating sink location.

* gcc.dg/tree-ssa/ssa-sink-22.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c |  14 +++
 gcc/tree-ssa-sink.cc| 106 +---
 2 files changed, 86 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..e35626d4070
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink1-details" } */
+
+extern void abort (void);
+
+int foo (int x, int y, int f)
+{
+  int tem = x / y;
+  if (f)
+    abort ();
+  return tem;
+}
+
+/* { dg-final { scan-tree-dump-not "Sinking" "sink1" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2188b7523c7..b0fe871cf1e 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -172,6 +172,39 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
   return commondom;
 }
 
+/* Return whether sinking STMT from EARLY_BB to BEST_BB should be avoided.  */
+
+static bool
+do_not_sink (gimple *stmt, basic_block early_bb, basic_block best_bb)
+{
+  /* Placing a statement before a setjmp-like function would be invalid
+ (it cannot be reevaluated when execution follows an abnormal edge).
+ If we selected a block with abnormal predecessors, just punt.  */
+  if (bb_has_abnormal_pred (best_bb))
+return true;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+ something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+  && empty_block_p (best_bb))
+return true;
+
+  /* Avoid turning an unconditional read into a conditional one when we
+ still might want to perform vectorization.  */
+  if (best_bb->loop_father == early_bb->loop_father
+  && loop_outer (best_bb->loop_father)
+  && !best_bb->loop_father->inner
+  && gimple_vuse (stmt)
+  && !gimple_vdef (stmt)
+  && flag_tree_loop_vectorize
+  && !(cfun->curr_properties & PROP_loop_opts_done)
+  && dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, early_bb)
+  && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, 
best_bb))
+return true;
+
+  return false;
+}
+
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
statements.
@@ -185,54 +218,57 @@ select_best_block (basic_block early_bb,
   basic_block late_bb,
   gimple *stmt)
 {
+  /* First pick a block we do not disqualify.  */
+  while (late_bb != early_bb
+&& do_not_sink (stmt, early_bb, late_bb))
+late_bb = ge

RE: [PATCH v1] Gen-Match: Fix gen_kids_1 right hand braces mis-alignment

2024-05-26 Thread Li, Pan2
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, May 26, 2024 9:59 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: Re: [PATCH v1] Gen-Match: Fix gen_kids_1 right hand braces 
mis-alignment



On 5/25/24 6:39 PM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> Noticed some mis-alignment of the gen_kids_1 right-hand braces, as below:
> 
>if ((_q50 == _q20 && ! TREE_SIDE_EFFECTS (...
>  {
>if ((_q51 == _q21 && ! TREE_SIDE_EFFECTS (...
>  {
>{
>  tree captures[2] ATTRIBUTE_UNUSED = {...
>  {
>res_ops[0] = captures[0];
>res_ops[1] = captures[1];
>if (UNLIKELY (debug_dump)) ...
>return true;
>  }
>}
>  }
>  }
> }  // mis-aligned here.
>   }
> 
> The below tests are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> 
> gcc/ChangeLog:
> 
>   * genmatch.cc (dt_node::gen_kids_1): Fix indent mis-alignment.
OK
Thanks.
jeff



RE: [PATCH v3] Match: Support more form for scalar unsigned SAT_ADD

2024-05-29 Thread Li, Pan2
Thanks Richard for suggestion and review.

I added some tricky/ugly restrictions in v3 for the phi gen, as there are
several sorts of (cond ...) in match.pd; will have a try with your proposal in v4.
Thanks again for the help.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 29, 2024 8:36 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v3] Match: Support more form for scalar unsigned SAT_ADD

On Mon, May 27, 2024 at 8:29 AM  wrote:
>
> From: Pan Li 
>
> After we support one gassign form of the unsigned .SAT_ADD,  we
> would like to support more forms including both the branch and
> branchless.  There are 5 other forms of .SAT_ADD, listed as below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
> Take form 3 of the above as an example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
>     goto <bb 4>; [35.00%]
>   else
>     goto <bb 3>; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The below test suites are still running, will update it later.
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
>
> gcc/ChangeLog:
>
> * genmatch.cc (dt_node::gen_kids): Add new arg of predicate id.
> (allow_phi_predicate_p): New func impl to check the phi
> predicate is allowed or not.
> (dt_node::gen_kids_1): Add COND_EXPR gen for phi node if allowed.
> (dt_operand::gen_phi_on_cond):
> (write_predicate): Init the predicate id before gen_kids.
> * match.pd: Add more forms of unsigned_integer_sat_add and
> comments.
> * tree-ssa-math-opts.cc (match_saturation_arith): Rename from.
> (match_assign_saturation_arith): Rename to.
> (match_phi_saturation_arith): New func impl to match phi.
> (math_opts_dom_walker::after_dom_children): Add phi match for
> echo bb.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/genmatch.cc   | 123 --
>  gcc/match.pd  |  43 -
>  gcc/tree-ssa-math-opts.cc |  51 +++-
>  3 files changed, 210 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f1e0e7abe0c..816d2dafd23 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -1767,6 +1767,7 @@ public:
>unsigned level;
>dt_node *parent;
>   vec<dt_node *> kids;
> +  const char *id;
>
>/* Statistics.  */
>unsigned num_leafs;
> @@ -1786,7 +1787,7 @@ public:
>virtual void gen (FILE *, int, bool, int) {}
>
>void gen_kids (FILE *, int, bool, int);
> -  void gen_kids_1 (FILE *, int, bool, int,
> +  void gen_kids_1 (FILE *, const char *, int, bool, int,
>const vec<dt_operand *> &, const vec<dt_node *> &,
>const vec &, con

RE: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1

2024-06-02 Thread Li, Pan2
Thanks Juzhe, will commit it after the middle-end patch, as well as the
remaining 4 similar patches.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Monday, June 3, 2024 11:19 AM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; Li, Pan2 
Subject: Re: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD 
form 1

LGTM. Thanks.


juzhe.zh...@rivai.ai

From: pan2.li <pan2...@intel.com>
Date: 2024-06-03 11:09
To: gcc-patches <gcc-patches@gcc.gnu.org>
CC: juzhe.zhong <juzhe.zh...@rivai.ai>; kito.cheng <kito.ch...@gmail.com>; Pan Li <pan2...@intel.com>
Subject: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1
From: Pan Li <pan2...@intel.com>

After the middle-end supports form 1 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 1 of unsigned .SAT_ADD.

Form 1:

  #define SAT_ADD_U_1(T)   \
  T sat_add_u_1_##T(T x, T y)  \
  {\
return (T)(x + y) >= x ? (x + y) : -1; \
  }

Passed the riscv fully regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for form 1.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_add-5.c  | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_add-6.c  | 21 
gcc/testsuite/gcc.target/riscv/sat_u_add-7.c  | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_add-8.c  | 17 +
.../gcc.target/riscv/sat_u_add-run-5.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-6.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-7.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-8.c| 25 +++
9 files changed, 183 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 2ef9fd825f3..2abc83d7666 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -10,6 +10,13 @@ sat_u_add_##T##_fmt_1 (T x, T y)   \
   return (x + y) | (-(T)((T)(x + y) < x)); \
}
+#define DEF_SAT_U_ADD_FMT_2(T)   \
+T __attribute__((noinline))  \
+sat_u_add_##T##_fmt_2 (T x, T y) \
+{\
+  return (T)(x + y) >= x ? (x + y) : -1; \
+}
+
#define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
void __attribute__((noinline))   \
vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
@@ -24,6 +31,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
}
#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
new file mode 100644
index 000..4c73c7f8a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u

RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

2024-06-04 Thread Li, Pan2
> Sorry if we have discussed this last year already - is there anything wrong
> with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Thanks for the comments, it has been quite a while since the last discussion.
Let me recall a little about it and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 4, 2024 9:22 PM
To: Li, Pan2 ; Richard Sandiford 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

On Tue, May 28, 2024 at 5:15 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to add new internal fun for the below 2 IFN.
> * mask_len_strided_load
> * mask_len_strided_store
>
> The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
> be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias).
>
> The GIMPLE MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias) will
> be expanded into mask_len_strided_store (ptr, stride, v, mask, len, bias).
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

Sorry if we have discussed this last year already - is there anything wrong
with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Richard.

> gcc/ChangeLog:
>
> * doc/md.texi: Add description for mask_len_strided_load/store.
> * internal-fn.cc (strided_load_direct): New internal_fn define
> for strided_load_direct.
> (strided_store_direct): Ditto but for store.
> (expand_strided_load_optab_fn): New expand func for
> mask_len_strided_load.
> (expand_strided_store_optab_fn): Ditto but for store.
> (direct_strided_load_optab_supported_p): New define for load
> direct optab supported.
> (direct_strided_store_optab_supported_p): Ditto but for store.
> (internal_fn_len_index): Add len index for both load and store.
> (internal_fn_mask_index): Ditto but for mask index.
> (internal_fn_stored_value_index): Add stored index.
> * internal-fn.def (MASK_LEN_STRIDED_LOAD): New direct fn define
> for strided_load.
> (MASK_LEN_STRIDED_STORE): Ditto but for stride_store.
> * optabs.def (OPTAB_D): New optab define for load and store.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Juzhe-Zhong 
> ---
>  gcc/doc/md.texi | 27 
>  gcc/internal-fn.cc  | 75 +
>  gcc/internal-fn.def |  6 
>  gcc/optabs.def  |  2 ++
>  4 files changed, 110 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..3d242675c63 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5138,6 +5138,20 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
> 5) elements from memory.
> +Element @var{i} of the mask (operand 3) is set if element @var{i} of the 
> result should
> +be loaded from memory and clear if element @var{i} of the result should be 
> zero.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5175,6 +5189,19 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided

RE: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-04 Thread Li, Pan2
Kindly ping, almost the same but for subtract.

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, May 28, 2024 4:30 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
richard.guent...@gmail.com; Li, Pan2 
Subject: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

From: Pan Li 

This patch would like to add the middle-end representation for the
saturation sub, aka set the result of the sub to the min when it underflows.
It will take the pattern similar as below.

SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example for uint8_t, we have

* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2) => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0

Given below SAT_SUB for uint64

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x + y) & (- (uint64_t)((x >= y)));
}

Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  _Bool _1;
  long unsigned int _3;
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _1 = x_4(D) >= y_5(D);
  _3 = x_4(D) - y_5(D);
  _6 = _1 ? _3 : 0;
  return _6;
;;succ:   EXIT
}

After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _6;
;;succ:   EXIT
}

The below tests are running for this patch:
*. The riscv fully regression tests.
*. The x86 bootstrap tests.
*. The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
new decl for generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function
to build the gimple call to binary SAT alu.
(match_saturation_arith): Rename from.
(match_unsigned_saturation_add): Rename to.
(match_unsigned_saturation_sub): Add new func to match the
unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
try when COND_EXPR.

Signed-off-by: Pan Li 
---
 gcc/internal-fn.def   |  1 +
 gcc/match.pd  | 14 
 gcc/optabs.def|  4 +--
 gcc/tree-ssa-math-opts.cc | 67 +++
 4 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 25badbb86e5..24539716e5b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -276,6 +276,7 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | 
ECF_NOTHROW, first,
  smulhrs, umulhrs, binary)
 
 DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, binary)
 
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 024e3350465..3e334533ff8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3086,6 +3086,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+/* Unsigned saturation sub, case 1 (branch with gt):
+   SAT_U_SUB = X > Y ? X - Y : 0  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond (gt @0 @1) (minus @0 @1) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
+
+/* Unsigned saturation sub, case 2 (branch with ge):
+   SAT_U_SUB = X >= Y ? X - Y : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond (ge @0 @1) (minus @0 @1) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3f2cb46aff8..bc2611abdc2 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -118,8 +118,8 @@ OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
 OPTAB_VL(subv_optab, "subv$I$a3", MINUS, "sub", '3', gen_intv_fp_libfunc)
 OPTAB_VX(subv_optab, "sub$F$a3")
-OPTAB_NL(sssub_optab, "sssub$Q$a3", SS_MINUS, "sssub", '3', 
gen_signed_fixed_libfunc)
-OPTAB_NL(ussub_optab, "ussub$Q$a3", US_MINUS, "ussub", '3', 
gen_unsigned_fixed_libfunc)
+OPTAB_NL(sssub_optab, "sssub$a3", SS_MINUS, "sssub", '3', 
gen_signed_fixed_libfunc)
+OPTAB_NL(ussub_optab, "ussub$a3", US_MINUS, "ussub&

RE: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Li, Pan2
Thanks Richard, will commit after the rebased pass the regression test.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, June 5, 2024 3:19 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
scalar int

On Tue, May 28, 2024 at 10:29 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to add the middle-end representation for the
> saturation sub, aka set the result of the sub to the min when it underflows.
> It will take the pattern similar as below.
>
> SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));
>
> For example for uint8_t, we have
>
> * SAT_SUB (255, 0)   => 255
> * SAT_SUB (1, 2) => 0
> * SAT_SUB (254, 255) => 0
> * SAT_SUB (0, 255)   => 0
>
> Given below SAT_SUB for uint64
>
> uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) & (- (uint64_t)((x >= y)));
> }
>
> Before this patch:
> uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> {
>   _Bool _1;
>   long unsigned int _3;
>   uint64_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _1 = x_4(D) >= y_5(D);
>   _3 = x_4(D) - y_5(D);
>   _6 = _1 ? _3 : 0;
>   return _6;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
>   return _6;
> ;;succ:   EXIT
> }
>
> The below tests are running for this patch:
> *. The riscv fully regression tests.
> *. The x86 bootstrap tests.
> *. The x86 fully regression tests.

OK.

Thanks,
Richard.

> PR target/51492
> PR target/112600
>
> gcc/ChangeLog:
>
> * internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
> * match.pd: Add new match for SAT_SUB.
> * optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
> * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
> new decl for generated in match.pd.
> (build_saturation_binary_arith_call): Add new helper function
> to build the gimple call to binary SAT alu.
> (match_saturation_arith): Rename from.
> (match_unsigned_saturation_add): Rename to.
> (match_unsigned_saturation_sub): Add new func to match the
> unsigned sat sub.
> (math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
> try when COND_EXPR.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.def   |  1 +
>  gcc/match.pd  | 14 
>  gcc/optabs.def|  4 +--
>  gcc/tree-ssa-math-opts.cc | 67 +++
>  4 files changed, 64 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 25badbb86e5..24539716e5b 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -276,6 +276,7 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | 
> ECF_NOTHROW, first,
>   smulhrs, umulhrs, binary)
>
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, 
> binary)
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, 
> binary)
>
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 024e3350465..3e334533ff8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3086,6 +3086,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +/* Unsigned saturation sub, case 1 (branch with gt):
> +   SAT_U_SUB = X > Y ? X - Y : 0  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +      && types_match (type, @0, @1))))
> +
> +/* Unsigned saturation sub, case 2 (branch with ge):
> +   SAT_U_SUB = X >= Y ? X - Y : 0.  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (cond (ge @0 @1) (minus @0 @1) integer_zerop)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 3f2cb46aff8..bc2611abdc2 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.d

RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

2024-06-05 Thread Li, Pan2
It looks not easy to get the original context/history; I can only catch some
shadow of it from the patch below, but not the full picture.

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634683.html

It sounds reasonable to me to use gather/scatter with a VEC_SERIES, for
example as below; will have a try at this.

operand_0 = mask_gather_loadmn (ptr, offset, 1/0(sign/unsign), multiply, mask)
  offset = (vec_series:m base step) => base + i * step
  op_0[i] = memory[ptr + offset[i] * multiply] && mask[i]

operand_0 = mask_len_strided_load (ptr, stride, mask, len, bias).
  op_0[i] = memory[ptr + stride * i] && mask[i] && i < (len + bias)

Pan

-----Original Message-
From: Li, Pan2 
Sent: Wednesday, June 5, 2024 9:18 AM
To: Richard Biener ; Richard Sandiford 

Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

> Sorry if we have discussed this last year already - is there anything wrong
> with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Thanks for the comments, it has been quite a while since the last discussion.
Let me recall a little about it and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 4, 2024 9:22 PM
To: Li, Pan2 ; Richard Sandiford 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

On Tue, May 28, 2024 at 5:15 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to add new internal fun for the below 2 IFN.
> * mask_len_strided_load
> * mask_len_strided_store
>
> The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
> be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias).
>
> The GIMPLE MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias) will
> be expanded into mask_len_strided_store (ptr, stride, v, mask, len, bias).
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

Sorry if we have discussed this last year already - is there anything wrong
with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?

Richard.

> gcc/ChangeLog:
>
> * doc/md.texi: Add description for mask_len_strided_load/store.
> * internal-fn.cc (strided_load_direct): New internal_fn define
> for strided_load_direct.
> (strided_store_direct): Ditto but for store.
> (expand_strided_load_optab_fn): New expand func for
> mask_len_strided_load.
> (expand_strided_store_optab_fn): Ditto but for store.
> (direct_strided_load_optab_supported_p): New define for load
> direct optab supported.
> (direct_strided_store_optab_supported_p): Ditto but for store.
> (internal_fn_len_index): Add len index for both load and store.
> (internal_fn_mask_index): Ditto but for mask index.
> (internal_fn_stored_value_index): Add stored index.
> * internal-fn.def (MASK_LEN_STRIDED_LOAD): New direct fn define
> for strided_load.
> (MASK_LEN_STRIDED_STORE): Ditto but for stride_store.
> * optabs.def (OPTAB_D): New optab define for load and store.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Juzhe-Zhong 
> ---
>  gcc/doc/md.texi | 27 
>  gcc/internal-fn.cc  | 75 +
>  gcc/internal-fn.def |  6 
>  gcc/optabs.def  |  2 ++
>  4 files changed, 110 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..3d242675c63 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5138,6 +5138,20 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +Operand 3 is the mask operand, operand 4 is the length operand and operand 
> +5 is the bias operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index @var{i}, the load address is operand 1 + @var{i} * operand 2.
> +Simil

RE: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Li, Pan2
> Is the above testcase correct? You need "(x + y)" as the first term.

Thanks for the comments; that should be a copy issue. You can take SAT_SUB (x, y) => 
(x - y) & (-(TYPE)(x >= y)) or the below template for reference.

+#define DEF_SAT_U_SUB_FMT_1(T) \
+T __attribute__((noinline))\
+sat_u_sub_##T##_fmt_1 (T x, T y)   \
+{  \
+  return (x - y) & (-(T)(x >= y)); \
+}
+
+#define DEF_SAT_U_SUB_FMT_2(T)\
+T __attribute__((noinline))   \
+sat_u_sub_##T##_fmt_2 (T x, T y)  \
+{ \
+  return (x - y) & (-(T)(x > y)); \
+}
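For completeness, the first template above can be checked with a minimal self-contained program; the uint8_t instantiation and the expected values (taken from the SAT_SUB examples earlier in the thread) are illustrative:

```c
#include <stdint.h>
#include <assert.h>

/* Branchless unsigned saturating subtract: when x < y the mask
   -(T)(x >= y) is all zeros, forcing the result to 0.  */
#define DEF_SAT_U_SUB_FMT_1(T)       \
T sat_u_sub_##T##_fmt_1 (T x, T y)   \
{                                    \
  return (x - y) & (-(T)(x >= y));   \
}

DEF_SAT_U_SUB_FMT_1 (uint8_t)
```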

> BTW: After applying your patch, I'm not able to produce .SAT_SUB with
> x86_64 and the following testcase:

Do you mean the vectorized part? This patch is only for unsigned scalar int 
(see title); the below is the vect part.
Could you please help to double-check whether you can see .SAT_SUB after the 
widen_mul pass on x86 for unsigned scalar int?
I will have a try later as well, as I am in the middle of something.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653024.html

Pan


-Original Message-
From: Uros Bizjak  
Sent: Wednesday, June 5, 2024 4:09 PM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
scalar int

On Wed, Jun 5, 2024 at 9:38 AM Li, Pan2  wrote:
>
Thanks Richard, will commit after the rebased version passes the regression test.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, June 5, 2024 3:19 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
> scalar int
>
> On Tue, May 28, 2024 at 10:29 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to add the middle-end representation for the
> > saturation sub.  Aka set the result of the sub to the min when it underflows.
> > It will take the pattern similar as below.
> >
> > SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));
> >
> > For example for uint8_t, we have
> >
> > * SAT_SUB (255, 0)   => 255
> > * SAT_SUB (1, 2) => 0
> > * SAT_SUB (254, 255) => 0
> > * SAT_SUB (0, 255)   => 0
> >
> > Given below SAT_SUB for uint64
> >
> > uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) & (- (uint64_t)((x >= y)));
> > }

Is the above testcase correct? You need "(x + y)" as the first term.

BTW: After applying your patch, I'm not able to produce .SAT_SUB with
x86_64 and the following testcase:

--cut here--
typedef unsigned short T;

void foo (T *out, T *x, T *y, int n)
{
  int i;

  for (i = 0; i < n; i++)
out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
}
--cut here--

with gcc -O2 -ftree-vectorize -msse2

I think that all relevant optabs were added for x86 in

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=b59de4113262f2bee14147eb17eb3592f03d9556

as part of the commit for PR112600, comment 8.

Uros.

> >
> > Before this patch:
> > uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> > {
> >   _Bool _1;
> >   long unsigned int _3;
> >   uint64_t _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _1 = x_4(D) >= y_5(D);
> >   _3 = x_4(D) - y_5(D);
> >   _6 = _1 ? _3 : 0;
> >   return _6;
> > ;;succ:   EXIT
> > }
> >
> > After this patch:
> > uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
> >   return _6;
> > ;;succ:   EXIT
> > }
> >
> > The below tests are running for this patch:
> > *. The riscv fully regression tests.
> > *. The x86 bootstrap tests.
> > *. The x86 fully regression tests.
>
> OK.
>
> Thanks,
> Richard.
>
> > PR target/51492
> > PR target/112600
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
> > * match.pd: Add new match for SAT_SUB.
> > * optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
> > * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
> > new decl for generated in match.pd.
> > (build_saturation_binary_arith_call): Add new helper function
> > to build the gimple call to binary SAT alu.
> > (m

RE: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Li, Pan2
> I see. x86 doesn't have scalar saturating instructions, so the scalar
> version indeed can't be converted.

> I will amend x86 testcases after the vector part of your patch is committed.

Thanks for the confirmation. Just curious: the scalar .SAT_SUB has several 
forms, like a branch version as below.

.SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow here

Would it be reasonable to implement the scalar .SAT_SUB for x86, given that we 
can somehow eliminate the branch here?
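For reference, a branchless sketch of the `__builtin_sub_overflow` variant mentioned above (the function name is illustrative, not from the patch):

```c
#include <stdint.h>
#include <assert.h>

/* Branchless scalar saturating subtract via the overflow builtin.
   The builtin returns nonzero when x - y wraps, i.e. when x < y,
   in which case the result saturates to 0.  */
uint32_t
sat_sub_u32_overflow (uint32_t x, uint32_t y)
{
  uint32_t ret;
  return __builtin_sub_overflow (x, y, &ret) ? 0 : ret;
}
```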

Pan

-Original Message-
From: Uros Bizjak  
Sent: Wednesday, June 5, 2024 4:30 PM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
scalar int

On Wed, Jun 5, 2024 at 10:22 AM Li, Pan2  wrote:
>
> > Is the above testcase correct? You need "(x + y)" as the first term.
>
> Thanks for comments, should be copy issue here, you can take SAT_SUB (x, y) 
> => (x - y) & (-(TYPE)(x >= y)) or below template for reference.
>
> +#define DEF_SAT_U_SUB_FMT_1(T) \
> +T __attribute__((noinline))\
> +sat_u_sub_##T##_fmt_1 (T x, T y)   \
> +{  \
> +  return (x - y) & (-(T)(x >= y)); \
> +}
> +
> +#define DEF_SAT_U_SUB_FMT_2(T)\
> +T __attribute__((noinline))   \
> +sat_u_sub_##T##_fmt_2 (T x, T y)  \
> +{ \
> +  return (x - y) & (-(T)(x > y)); \
> +}
>
> > BTW: After applying your patch, I'm not able to produce .SAT_SUB with
> > x86_64 and the following testcase:
>
> You mean vectorize part? This patch is only for unsigned scalar int (see 
> title) and the below is the vect part.
> Could you please help to double confirm if you cannot see .SAT_SUB after 
> widen_mul pass in x86 for unsigned scalar int?
> Of course, I will have a try later as in the middle of sth.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653024.html

I see. x86 doesn't have scalar saturating instructions, so the scalar
version indeed can't be converted.

I will amend x86 testcases after the vector part of your patch is committed.

Thanks,
Uros.


RE: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Li, Pan2
Committed with the example in commit log updated, thanks all.

Pan

-Original Message-
From: Li, Pan2  
Sent: Wednesday, June 5, 2024 4:38 PM
To: Uros Bizjak 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: RE: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
scalar int

> I see. x86 doesn't have scalar saturating instructions, so the scalar
> version indeed can't be converted.

> I will amend x86 testcases after the vector part of your patch is committed.

Thanks for the confirmation. Just curious: the scalar .SAT_SUB has several 
forms, like a branch version as below.

.SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow here

Would it be reasonable to implement the scalar .SAT_SUB for x86, given that we 
can somehow eliminate the branch here?

Pan

-Original Message-
From: Uros Bizjak  
Sent: Wednesday, June 5, 2024 4:30 PM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
scalar int

On Wed, Jun 5, 2024 at 10:22 AM Li, Pan2  wrote:
>
> > Is the above testcase correct? You need "(x + y)" as the first term.
>
> Thanks for comments, should be copy issue here, you can take SAT_SUB (x, y) 
> => (x - y) & (-(TYPE)(x >= y)) or below template for reference.
>
> +#define DEF_SAT_U_SUB_FMT_1(T) \
> +T __attribute__((noinline))\
> +sat_u_sub_##T##_fmt_1 (T x, T y)   \
> +{  \
> +  return (x - y) & (-(T)(x >= y)); \
> +}
> +
> +#define DEF_SAT_U_SUB_FMT_2(T)\
> +T __attribute__((noinline))   \
> +sat_u_sub_##T##_fmt_2 (T x, T y)  \
> +{ \
> +  return (x - y) & (-(T)(x > y)); \
> +}
>
> > BTW: After applying your patch, I'm not able to produce .SAT_SUB with
> > x86_64 and the following testcase:
>
> You mean vectorize part? This patch is only for unsigned scalar int (see 
> title) and the below is the vect part.
> Could you please help to double confirm if you cannot see .SAT_SUB after 
> widen_mul pass in x86 for unsigned scalar int?
> Of course, I will have a try later as in the middle of sth.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653024.html

I see. x86 doesn't have scalar saturating instructions, so the scalar
version indeed can't be converted.

I will amend x86 testcases after the vector part of your patch is committed.

Thanks,
Uros.


RE: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Li, Pan2
Thanks for explaining. I see; cmove is well suited to such cases.

Pan

-Original Message-
From: Uros Bizjak  
Sent: Wednesday, June 5, 2024 4:46 PM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
scalar int

On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2  wrote:
>
> > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > version indeed can't be converted.
>
> > I will amend x86 testcases after the vector part of your patch is committed.
>
> Thanks for the confirmation. Just curious, the .SAT_SUB for scalar has sorts 
> of forms, like a branch version as below.
>
> .SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow 
> here
>
> It is reasonable to implement the scalar .SAT_SUB for x86? Given somehow we 
> can eliminate the branch here.

x86 will emit cmove in the above case:

    movl    %edi, %eax
    xorl    %edx, %edx
    subl    %esi, %eax
    cmpl    %edi, %esi
    cmovnb  %edx, %eax

Maybe we can reuse flags from the subtraction here to avoid the compare.

Uros.
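The flag-reuse idea can be sketched with GCC's flag-output asm constraints (an editorial sketch, x86-specific; it assumes the `"=@ccc"` carry-flag output constraint is available, and the function name is illustrative):

```c
#include <stdint.h>
#include <assert.h>

/* Saturating subtract that reuses the carry flag produced by the
   subtraction itself, so no separate compare is needed.  "=@ccc"
   captures CF, which subl sets exactly when the unsigned
   subtraction borrows (x < y).  */
uint32_t
sat_sub_u32_flags (uint32_t x, uint32_t y)
{
  uint32_t diff;
  int borrow;
  __asm__ ("subl %3, %0"
           : "=r" (diff), "=@ccc" (borrow)
           : "0" (x), "r" (y));
  return borrow ? 0 : diff;
}
```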


RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

2024-06-05 Thread Li, Pan2
Thanks Richard for the comments; I will address them in v7. It also looks like 
I need to resolve a conflict up to a point.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, June 5, 2024 4:50 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

On Thu, May 30, 2024 at 3:37 PM  wrote:
>
> From: Pan Li 
>
> After we support one gassign form of the unsigned .SAT_ADD,  we
> would like to support more forms including both the branch and
> branchless.  There are 5 other forms of .SAT_ADD,  list as below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
> Take the forms 3 of above as example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The flag '^' acting on cond_expr will generate matching code similar to the below:
>
> else if (gphi *_a1 = dyn_cast  (_d1))
>   {
> basic_block _b1 = gimple_bb (_a1);
> if (gimple_phi_num_args (_a1) == 2)
>   {
> basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
> basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
> basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) 
> ? _pb_0_1 : ...
> basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1)) ? _pb_1_1 : ...
> gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
> if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>   && EDGE_COUNT (_other_db_1->succs) == 1
>   && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>   {
> tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
> tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
> tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
> _cond_lhs_1, ...);
> bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags  & 
> EDGE_TRUE_VALUE;
> tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
> tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> switch (TREE_CODE (_p0))
>   ...
>
> The below test suites are still running, will update it later.
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
>
> gcc/ChangeLog:
>
> * doc/match-and-simplify.texi: Add doc for the matching flag '^'.
> * genmatch.cc (enum expr_flag): Add new enum for expr flag.
> (dt_node::gen_kids_1): Add cond_expr and flag handling.
> (dt_operand::gen_phi_on_cond): Add new func to gen phi matching
> on cond_expr.
> (parser::parse_expr): Add handling for the e

RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

2024-06-05 Thread Li, Pan2
Hi Richard,

After revisiting all the comments in the mail thread, I would like to confirm 
whether my understanding of the generated match code is correct.
For now the generated code looks like below:

else if (gphi *_a1 = dyn_cast  (_d1))
  {
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? 
_pb_0_1 : _pb_1_1;
basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
(_pb_0_1)) ? _pb_1_1 : _pb_0_1;
gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
  && EDGE_COUNT (_other_db_1->succs) == 1
  && EDGE_PRED (_other_db_1, 0)->src == _db_1)
  {
tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
_cond_lhs_1, _cond_rhs_1);
bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & 
EDGE_TRUE_VALUE;
tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);


The flow may look like the below; in other words, the matcher can only handle a flow of this shape.

+--+
| cond |---+
+--+   v
   |+---+
   || other |
   |+---+
   v   |
+-+|
| PHI | <--+ 
+-+
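In source terms, this simple diamond corresponds to, e.g., the overflow-builtin form from earlier in the thread (a sketch; the function name is illustrative):

```c
#include <stdint.h>
#include <assert.h>

/* cond block: the overflow check; "other" block: the saturating
   assignment (single predecessor, single successor); PHI block:
   the return joining both paths with a two-argument PHI.  */
uint64_t
sat_add_diamond (uint64_t x, uint64_t y)
{
  uint64_t ret;
  if (__builtin_add_overflow (x, y, &ret)) /* cond block */
    ret = UINT64_MAX;                      /* "other" block */
  return ret;                              /* PHI block */
}
```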

Thus, I think it cannot handle the below 2 PHI flows (or even more complicated 
shapes)

+--+
| cond |---+
+--+   |
   |   |
   v   |
+--+   |
| mid  |   v
+--++---+
   || other |
   |+---+
   v   |
+-+|
| PHI | <--+ 
+-+

+--+
| cond |---+
+--+   |
   |   v
   |+---+
   || mid-0 |+
   |+---+|
   |   | v
   |   |   +---+
   |   |   | mid-1 |
   |   v   +---+
   |+---+|
   || other |<---+
   |+---+
   v   |
+-+|
| PHI | <--+ 
+-+

So I am not very sure whether we need (or whether it is reasonable) to take 
care of all the PHI gimple flows (maybe impossible?), or whether we should keep 
the simplest one for now and add more case by case.
Thanks a lot.

Pan

-Original Message-
From: Li, Pan2 
Sent: Wednesday, June 5, 2024 9:44 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

Thanks Richard for the comments; I will address them in v7. It also looks like 
I need to resolve a conflict up to a point.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, June 5, 2024 4:50 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

On Thu, May 30, 2024 at 3:37 PM  wrote:
>
> From: Pan Li 
>
> After we support one gassign form of the unsigned .SAT_ADD,  we
> would like to support more forms including both the branch and
> branchless.  There are 5 other forms of .SAT_ADD,  list as below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
> Take the forms 3 of above as example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   ba

RE: [PATCH v2] Vect: Support IFN SAT_SUB for unsigned vector int

2024-06-06 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 6, 2024 6:50 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; ubiz...@gmail.com
Subject: Re: [PATCH v2] Vect: Support IFN SAT_SUB for unsigned vector int

On Thu, Jun 6, 2024 at 8:26 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the .SAT_SUB for the unsigned
> vector int.  Given we have below example code:
>
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   for (unsigned i = 0; i < n; i++)
> out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
> }
>
> Before this patch:
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
>   ivtmp_56 = _77 * 8;
>   vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
>   vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);
>
>   mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
>   _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });
>
>   .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
>   vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
>   vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
>   vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
>   ivtmp_76 = ivtmp_75 - _77;
>   ...
> }
>
> After this patch:
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
>   ivtmp_60 = _76 * 8;
>   vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
>   vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);
>
>   vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);
>
>   .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, 
> vect_patt_37.11_68);
>   vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
>   vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
>   vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
>   ivtmp_75 = ivtmp_74 - _76;
>   ...
> }
>
> The below test suites are passed for this patch
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression tests.

OK.

Richard.

> gcc/ChangeLog:
>
> * match.pd: Add new form for vector mode recog.
> * tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
> new match func decl;
> (vect_recog_build_binary_gimple_call): Extract helper func to
> build gcall with given internal_fn.
> (vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 14 +++
>  gcc/tree-vect-patterns.cc | 85 ---
>  2 files changed, 84 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7c1ad428a3c..ebc60eba8dc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3110,6 +3110,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub, case 3 (branchless with gt):
> +   SAT_U_SUB = (X - Y) * (X > Y).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (mult:c (minus @0 @1) (convert (gt @0 @1)))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
> +/* Unsigned saturation sub, case 4 (branchless with ge):
> +   SAT_U_SUB = (X - Y) * (X >= Y).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (mult:c (minus @0 @1) (convert (ge @0 @1)))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 81e8fdc9122..cef901808eb 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4488,6 +4488,32 @@ vect_recog_mult_pattern (vec_info *vinfo,
>  }
>
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
> +
> +static gcall *
> +vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
> +internal_fn fn, tree *type_out,
> +tree op_0, tree op_1)
> +{
> +  tree itype = TREE_TYPE (op_0);
> +  tree vtype = get_vectype_for_sc

RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

2024-06-06 Thread Li, Pan2
> I'd only keep the simplest one for now.  More complex cases can be
> handled easily
> with using dominators but those might not always be available or up-to-date 
> when
> doing match queries.  So let's revisit when we run into a case where
> the simple form
> isn't enough.

Got it, thanks. I will send v7 if there are no surprises from the test suites.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 6, 2024 6:47 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

On Thu, Jun 6, 2024 at 3:19 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> After revisited all the comments of the mail thread, I would like to confirm 
> if my understanding is correct according to the generated match code.
> For now the generated code looks like below:
>
> else if (gphi *_a1 = dyn_cast  (_d1))
>   {
> basic_block _b1 = gimple_bb (_a1);
> if (gimple_phi_num_args (_a1) == 2)
>   {
> basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
> basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
> basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) 
> ? _pb_0_1 : _pb_1_1;
> basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
> gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
> if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>   && EDGE_COUNT (_other_db_1->succs) == 1
>   && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>   {
> tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
> tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
> tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
> _cond_lhs_1, _cond_rhs_1);
> bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & 
> EDGE_TRUE_VALUE;
> tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
> tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> 
>
> The flow may look like below, or can only handling flow like below.
>
> +--+
> | cond |---+
> +--+   v
>|+---+
>|| other |
>|+---+
>v   |
> +-+|
> | PHI | <--+
> +-+
>
> Thus, I think it cannot handle the below 2 PHI flows (or even more 
> complicated shapes)
>
> +--+
> | cond |---+
> +--+   |
>|   |
>v   |
> +--+   |
> | mid  |   v
> +--++---+
>|| other |
>|+---+
>v   |
> +-+|
> | PHI | <--+
> +-+
>
> +--+
> | cond |---+
> +--+   |
>|   v
>|+---+
>|| mid-0 |+
>|+---+|
>|   | v
>|   |   +---+
>|   |   | mid-1 |
>|   v   +---+
>|+---+|
>|| other |<---+
>|+---+
>v   |
> +-+|
> | PHI | <--+
> +-+

Correct.

> So I am not very sure if we need (or reasonable) to take care of all the PHI 
> gimple flows (may impossible ?) Or keep the simplest one for now and add more 
> case by case.
> Thanks a lot.

I'd only keep the simplest one for now.  More complex cases can be handled
easily by using dominators, but those might not always be available or
up-to-date when doing match queries.  So let's revisit when we run into a
case where the simple form isn't enough.

Richard.

>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Wednesday, June 5, 2024 9:44 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD
>
> Thanks Richard for comments, will address the comments in v7, and looks like 
> I also need to resolve conflict up to a point.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, June 5, 2024 4:50 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD
>
> On Thu, May 30, 2024 at 3:37 PM  wrote:
> >
> > From: Pan Li 
> >
> > After we supp

RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-06 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 6, 2024 10:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

On Thu, Jun 6, 2024 at 3:37 PM  wrote:
>
> From: Pan Li 
>
> After we support one gassign form of the unsigned .SAT_ADD,  we
> would like to support more forms including both the branch and
> branchless.  There are 5 other forms of .SAT_ADD,  list as below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
> Take the forms 3 of above as example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The flag '^' acting on cond_expr will generate matching code similar to the below:
>
> else if (gphi *_a1 = dyn_cast  (_d1))
>   {
> basic_block _b1 = gimple_bb (_a1);
> if (gimple_phi_num_args (_a1) == 2)
>   {
> basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
> basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
> basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1))
> ? _pb_0_1 : _pb_1_1;
> basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1))
>   ? _pb_1_1 : _pb_0_1;
> gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
> if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>   && EDGE_COUNT (_other_db_1->succs) == 1
>   && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>   {
> tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
> tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
> tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node,
>_cond_lhs_1, _cond_rhs_1);
> bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & 
> EDGE_TRUE_VALUE;
> tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
> tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> 
>
> The below test suites are passed for this patch.
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * doc/match-and-simplify.texi: Add doc for the matching flag '^'.
> * genmatch.cc (cmp_operand): Add match_phi comparation.
> (dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
> (dt_operand::gen_phi_on_cond): Add new func to gen phi matching
> on cond_expr.
> (parser::parse_expr): Add handling for the expr flag '

RE: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1

2024-06-06 Thread Li, Pan2
Committed the series, as the middle-end patch has been committed.

Pan

From: Li, Pan2
Sent: Monday, June 3, 2024 11:24 AM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: kito.cheng 
Subject: RE: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD 
form 1

Thanks Juzhe. I will commit it after the middle-end patch, along with the other 
4 similar patches.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Monday, June 3, 2024 11:19 AM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; Li, Pan2 
Subject: Re: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD 
form 1

LGTM. Thanks.


juzhe.zh...@rivai.ai

From: pan2.li <pan2...@intel.com>
Date: 2024-06-03 11:09
To: gcc-patches <gcc-patches@gcc.gnu.org>
CC: juzhe.zhong <juzhe.zh...@rivai.ai>; kito.cheng <kito.ch...@gmail.com>; Pan Li <pan2...@intel.com>
Subject: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1
From: Pan Li <pan2...@intel.com>

After the middle end supports form 1 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 1 of unsigned .SAT_ADD.

Form 1:

  #define SAT_ADD_U_1(T)   \
  T sat_add_u_1_##T(T x, T y)  \
  {\
return (T)(x + y) >= x ? (x + y) : -1; \
  }
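For illustration, here is a minimal standalone sketch of form 1 instantiated for `uint8_t` (the function name follows the macro's `sat_add_u_1_##T` scheme; the standalone wrapper itself is only for illustration). The check `(T)(x + y) >= x` detects unsigned wrap-around and selects either the truncated sum or the all-ones saturation value:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of SAT_ADD_U_1 (uint8_t) expanded by hand.  */
static uint8_t
sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
{
  /* (uint8_t)(x + y) < x exactly when the 8-bit add wrapped around;
     in that case saturate to (uint8_t)-1, i.e. 255.  */
  return (uint8_t)(x + y) >= x ? (uint8_t)(x + y) : (uint8_t)-1;
}
```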

Passed the riscv fully regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for form 1.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_add-5.c  | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_add-6.c  | 21 
gcc/testsuite/gcc.target/riscv/sat_u_add-7.c  | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_add-8.c  | 17 +
.../gcc.target/riscv/sat_u_add-run-5.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-6.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-7.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-8.c| 25 +++
9 files changed, 183 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 2ef9fd825f3..2abc83d7666 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -10,6 +10,13 @@ sat_u_add_##T##_fmt_1 (T x, T y)   \
   return (x + y) | (-(T)((T)(x + y) < x)); \
}
+#define DEF_SAT_U_ADD_FMT_2(T)   \
+T __attribute__((noinline))  \
+sat_u_add_##T##_fmt_2 (T x, T y) \
+{\
+  return (T)(x + y) >= x ? (x + y) : -1; \
+}
+
#define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
void __attribute__((noinline))   \
vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
@@ -24,6 +31,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
}
#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
new file mode 100644
index 000..4c73c7f8a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"

RE: [PATCH v2] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-07 Thread Li, Pan2
Thanks Robin for comments.

> Can we replace step 3 and 4 with sub lt, -1 directly when
> it's supposed to be optimized like that anyway?

Sure thing, will update in v3.

> When you say other variants are still to be implemented
> does that also include variants for zbb with min/max
> or zicond?

No, I mean some other forms, like the branch form, need improvement from the
middle end (aka widen_mul).

Pan

-Original Message-
From: Robin Dapp  
Sent: Friday, June 7, 2024 6:18 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v2] RISC-V: Implement .SAT_SUB for unsigned scalar int

Hi Pan,

> +  /* Step-2: lt = x < y  */
> +  riscv_emit_binary (LTU, pmode_lt, pmode_x, pmode_y);
> +
> +  /* Step-3: lt = -lt  */
> +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> +
> +  /* Step-4: lt = ~lt  */
> +  riscv_emit_unary (NOT, pmode_lt, pmode_lt);

Can we replace step 3 and 4 with sub lt, -1 directly when
it's supposed to be optimized like that anyway?
I was a bit irritated when reading the code because I
figured we could surely save one instruction there but then
realized that the cover letter has the shorter sequence.

The rest LGTM.

When you say other variants are still to be implemented
does that also include variants for zbb with min/max
or zicond?

Regards
 Robin
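As a side note on Robin's suggestion, the reason Step-3 and Step-4 can collapse into a single subtract of 1 is the two's-complement identity `~(-m) == m - 1`. A minimal sketch of that identity (hypothetical helper names, not the actual backend expansion):

```c
#include <assert.h>
#include <stdint.h>

/* Step-3 (NEG) followed by Step-4 (NOT), as in the original patch.  */
static uint64_t
mask_neg_not (uint64_t lt)
{
  return ~(0 - lt);   /* lt = -lt; lt = ~lt  */
}

/* The suggested single instruction: lt - 1.  */
static uint64_t
mask_sub_one (uint64_t lt)
{
  return lt - 1;
}
```

For the 0/1 comparison result `lt`, both produce the same mask: all-ones when `lt` is 0 and zero when `lt` is 1.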


RE: [PATCH v3] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-08 Thread Li, Pan2
> LGTM.

Committed, thanks Robin.

> Let's keep in mind that min/max will save us two insns(?)
> and a conditional move would save us one.

Got it, cmov is well designed for such case(s).

Pan


-Original Message-
From: Robin Dapp  
Sent: Friday, June 7, 2024 9:57 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Implement .SAT_SUB for unsigned scalar int

LGTM.

Let's keep in mind that min/max will save us two insns(?)
and a conditional move would save us one.

Regards
 Robin
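For reference, the min/max variant alluded to above can be sketched in plain C (assumed helper names; with zbb, `umin` would map to a single `minu` instruction):

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for zbb's minu instruction.  */
static uint32_t
umin (uint32_t a, uint32_t b)
{
  return a < b ? a : b;
}

/* Unsigned saturating subtract as x - min (x, y): when x >= y this
   is x - y, otherwise x - x == 0 (the saturated result).  */
static uint32_t
sat_sub_u32_minu (uint32_t x, uint32_t y)
{
  return x - umin (x, y);
}
```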


RE: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Li, Pan2
Not sure if the float eq implementation below from sail-riscv is useful or not,
but it looks like there is some special handling for NaN, as well as sNaN.

https://github.com/riscv/sail-riscv/blob/master/c_emulator/SoftFloat-3e/source/f32_eq.c

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 10, 2024 9:50 PM
To: Robin Dapp ; Demin Han ; 
钟居哲 ; gcc-patches 
Cc: kito.cheng ; Li, Pan2 
Subject: Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern



On 6/10/24 1:33 AM, Robin Dapp wrote:
>> But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?
>>
>> target = (a == b) ? x : y
>> target = (a != b) ? y : x
>>
>> Are equivalent, even for IEEE IIRC.
> 
> Yes, that should be fine.  My concern was not that we do a
> canonicalization but that we might not do it for some of the
> vector cases.  In particular when one of the operands is wrapped
> in a vec_duplicate and we end up with it first rather than
> second.
> 
> My general feeling is that the patch is good but I wasn't entirely
> sure about all cases (in particular in case we transform something
> after expand).  That's why I would have liked to see at least some
> small test cases for it along with the patch (for the combinations
> we don't test yet).
Ah, OK.

Demin, can you add some additional test coverage, guided by Robin's concerns
above?

Thanks,
jeff
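To make the equivalence discussed above concrete, here is a small sketch of the two select forms (hypothetical function names). Both pick `y` for NaN inputs, since `==` is false and `!=` is true on NaN, so the canonicalization is safe even under IEEE semantics:

```c
#include <assert.h>
#include <math.h>

/* target = (a == b) ? x : y  */
static double
sel_eq (double a, double b, double x, double y)
{
  return a == b ? x : y;
}

/* target = (a != b) ? y : x  */
static double
sel_ne (double a, double b, double x, double y)
{
  return a != b ? y : x;
}
```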



RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Li, Pan2
Thanks a lot, Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, June 11, 2024 4:15 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match



On 6/10/24 8:49 AM, pan2...@intel.com wrote:
> When the PHI handling for COND_EXPR is enabled,  we need to insert the
> gcall that replaces the PHI node.  Unfortunately,  I made a mistake and
> inserted the gcall before the last stmt of the bb.  See the gimple below:
> the PHI is located at no.1 but we insert the gcall (aka no.9) at the end
> of the bb.  Then the use of _9 in no.2 will have no def and will trigger
> an ICE in verify_ssa.
> 
>1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>2. prephitmp_36 = (char *) _9;
>3. buf.write_base = string_13(D);
>4. buf.write_ptr = string_13(D);
>5. buf.write_end = prephitmp_36;
>6. buf.written = 0;
>7. buf.mode = 3;
>8. _7 = buf.write_end;
>9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb 
> by mistake
> 
> This patch would like to insert the gcall to before the start of the bb
> stmt.  To ensure the possible use of PHI_result will have a def exists.
> After this patch the above gimple will be:
> 
>0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to start of bb
>1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>2. prephitmp_36 = (char *) _9;
>3. buf.write_base = string_13(D);
>4. buf.write_ptr = string_13(D);
>5. buf.write_end = prephitmp_36;
>6. buf.written = 0;
>7. buf.mode = 3;
>8. _7 = buf.write_end;
> 
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test with newlib.
> * The rv64gcv build with glibc.
> * The x86 regression test with newlib.
> * The x86 bootstrap test with newlib.
> 
>   PR target/115387
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
>   the gsi of start_bb instead of last_bb.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/pr115387-1.c: New test.
>   * gcc.target/riscv/pr115387-2.c: New test.
I did a fresh x86_64 bootstrap and regression test and pushed this.

jeff



RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Li, Pan2
Hi Sam,

> This testcases ICEs for me on x86-64 too (without your patch) with just -O2.
> Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
> too).

Sure thing, but do you have any suggestion about where I should put these 2
cases?
There are many sub-directories under gcc/testsuite, and I am not very familiar
with which one is the most reasonable location.

Pan

-Original Message-
From: Sam James  
Sent: Monday, June 10, 2024 11:33 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match

pan2...@intel.com writes:

> From: Pan Li 
>
> When enabled the PHI handing for COND_EXPR,  we need to insert the gcall
> to replace the PHI node.  Unfortunately,  I made a mistake that insert
> the gcall to before the last stmt of the bb.  See below gimple,  the PHI
> is located at no.1 but we insert the gcall (aka no.9) to the end of
> the bb.  Then the use of _9 in no.2 will have no def and will trigger
> ICE when verify_ssa.
>
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
>   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb 
> by mistake
>
> This patch would like to insert the gcall to before the start of the bb
> stmt.  To ensure the possible use of PHI_result will have a def exists.
> After this patch the above gimple will be:
>
>   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to start bb 
> by mistake
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test with newlib.
> * The rv64gcv build with glibc.
> * The x86 regression test with newlib.
> * The x86 bootstrap test with newlib.
>
>   PR target/115387
>
> gcc/ChangeLog:
>
>   * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
>   the gsi of start_bb instead of last_bb.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/pr115387-1.c: New test.
>   * gcc.target/riscv/pr115387-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
>  gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
>  gcc/tree-ssa-math-opts.cc   |  2 +-
>  3 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-2.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> new file mode 100644
> index 000..a1c926977c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> @@ -0,0 +1,35 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#define PRINTF_CHK 0x34
> +
> +typedef unsigned long uintptr_t;
> +
> +struct __printf_buffer {
> +  char *write_ptr;
> +  int status;
> +};
> +
> +extern void __printf_buffer_init_end (struct __printf_buffer *, char *, char 
> *);
> +
> +void
> +test (char *string, unsigned long maxlen, unsigned mode_flags)
> +{
> +  struct __printf_buffer buf;
> +
> +  if ((mode_flags & PRINTF_CHK) != 0)
> +{
> +  string[0] = '\0';
> +  uintptr_t end;
> +
> +  if (__builtin_add_overflow ((uintptr_t) string, maxlen, &end))
> + end = -1;
> +
> +  __printf_buffer_init_end (&buf, string, (char *) end);
> +}
> +  else
> +__printf_buffer_init_end (&buf, string, (char *) ~(uintptr_t) 0);
> +
> +  *buf.write_ptr = '\0';
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
> new file mode 100644
> index 000..7183bf18dfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
> @@ -0,0 +1,18 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#inc

RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Li, Pan2
Got it, thanks. Let me prepare the patch after testing.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, June 11, 2024 9:42 AM
To: Li, Pan2 ; Sam James 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match



On 6/10/24 7:28 PM, Li, Pan2 wrote:
> Hi Sam,
> 
>> This testcases ICEs for me on x86-64 too (without your patch) with just -O2.
>> Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
>> too).
> 
> Sure thing, but do you have any suggestion about where should I put these 2 
> cases?
> There are sorts of sub-directories under gcc/testsuite, I am not very 
> familiar that where
> is the best reasonable location.
gcc.dg/torture would be the most natural location I think.

jeff



RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-11 Thread Li, Pan2
Thanks Richard for comments.

> This should use gsi_after_labels (bb); otherwise you'll ICE when there's a 
> label
> in the BB. 
> Please fix the label issue though.

Sure.

> You also have to look out for a first stmt that returns twice since
> you may not insert anything before that.  I would suggest to not match when
> BB has abnormal incoming edges which I guess will be ensured by the PHI
> matching code anyway, so I just mentioned this insertion restriction.

Got it, the PHI matching code ensured this.
But I may have missed the point about the scenario you mentioned, aka
"a first stmt that returns twice, since you may not insert anything before that".
Could you explain a bit more about it? Thanks a lot.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 11, 2024 3:07 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match

On Mon, Jun 10, 2024 at 4:49 PM  wrote:
>
> From: Pan Li 
>
> When enabled the PHI handing for COND_EXPR,  we need to insert the gcall
> to replace the PHI node.  Unfortunately,  I made a mistake that insert
> the gcall to before the last stmt of the bb.  See below gimple,  the PHI
> is located at no.1 but we insert the gcall (aka no.9) to the end of
> the bb.  Then the use of _9 in no.2 will have no def and will trigger
> ICE when verify_ssa.
>
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
>   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb 
> by mistake
>
> This patch would like to insert the gcall to before the start of the bb
> stmt.  To ensure the possible use of PHI_result will have a def exists.
> After this patch the above gimple will be:
>
>   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to start bb 
> by mistake
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test with newlib.
> * The rv64gcv build with glibc.
> * The x86 regression test with newlib.
> * The x86 bootstrap test with newlib.
>
> PR target/115387
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): 
> Take
> the gsi of start_bb instead of last_bb.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr115387-1.c: New test.
> * gcc.target/riscv/pr115387-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
>  gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
>  gcc/tree-ssa-math-opts.cc   |  2 +-
>  3 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-2.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> new file mode 100644
> index 000..a1c926977c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> @@ -0,0 +1,35 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#define PRINTF_CHK 0x34
> +
> +typedef unsigned long uintptr_t;
> +
> +struct __printf_buffer {
> +  char *write_ptr;
> +  int status;
> +};
> +
> +extern void __printf_buffer_init_end (struct __printf_buffer *, char *, char 
> *);
> +
> +void
> +test (char *string, unsigned long maxlen, unsigned mode_flags)
> +{
> +  struct __printf_buffer buf;
> +
> +  if ((mode_flags & PRINTF_CHK) != 0)
> +{
> +  string[0] = '\0';
> +  uintptr_t end;
> +
> +  if (__builtin_add_overflow ((uintptr_t) string, maxlen, &end))
> +   end = -1;
> +
> +  __printf_buffer_init_end (&buf, string, (char *) end);
> +}
> +  else
> +__printf_buffer_init_end (&buf, string, (char *) ~(uintptr_t) 0);
> +
> +  *buf.write_ptr = '\0';
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-

RE: [PATCH v1] RISC-V: Implement .SAT_SUB for unsigned vector int

2024-06-11 Thread Li, Pan2
Thanks Robin.

> in general LGTM.  Would you mind adding the coremark-pro
> testcase which should be working now, and, was the original
> reason for doing this?

Yes, of course.

Unfortunately, the pattern from coremark-pro is not working for now because it
is in branch form, which generates a PHI node during vectorization. We still
need some enhancement from the middle end to support PHI node vectorization.

After that I will add more test cases covering the various forms.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, June 11, 2024 4:03 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Implement .SAT_SUB for unsigned vector int

Hi Pan,

in general LGTM.  Would you mind adding the coremark-pro
testcase which should be working now, and, was the original
reason for doing this?

I believe the following should do:

extern int wsize;

typedef unsigned short Posf;
#define NIL 0

void foo (Posf *p)
{
  register unsigned n, m;
  do {
  m = *--p;
  *p = (Posf)(m >= wsize ? m-wsize : NIL);
  } while (--n);
}

Regards
 Robin


RE: [PATCH v1] RISC-V: Implement .SAT_SUB for unsigned vector int

2024-06-11 Thread Li, Pan2
Committed, thanks Robin.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, June 11, 2024 4:19 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Implement .SAT_SUB for unsigned vector int

Thanks, the patch is OK then.

Regards
 Robin


RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-11 Thread Li, Pan2
Got it. Thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 11, 2024 5:31 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match

On Tue, Jun 11, 2024 at 9:45 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > This should use gsi_after_labels (bb); otherwise you'll ICE when there's a 
> > label
> > in the BB.
> > Please fix the label issue though.
>
> Sure.
>
> > You also have to look out for a first stmt that returns twice since
> > you may not insert anything before that.  I would suggest to not match when
> > BB has abnormal incoming edges which I guess will be ensured by the PHI
> > matching code anyway, so I just mentioned this insertion restriction.
>
> Got it, the PHI matching code ensured this.
> But I may lose the point about the scenario you mentioned, aka
> "a first stmt return twice since you may not insert anything before that".
> Could you help to explain more about it? Thanks a lot.

When we have a setjmp call an incoming abnormal edge represents the
alternate return from the call from any point that can do longjmp.  Inserting
before the call would be inserting on that edge which cannot be done.
It's a bit of an awkward representation since in reality we'd have to split
the call into the actual call and the longjmp receiver.

Richard.
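A minimal C illustration of the returns-twice situation described above (sketch only, with an assumed function name):

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* After setjmp returns, control may re-enter at the call via longjmp,
   which GIMPLE models as an abnormal incoming edge.  A statement
   inserted "before" the setjmp call in its block would sit on that
   edge, which is not a valid insertion point.  */
static int
receiver (void)
{
  if (setjmp (env) != 0)  /* The alternate (abnormal) return.  */
    return 1;
  return 0;               /* The ordinary first return.  */
}
```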

>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, June 11, 2024 3:07 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
> Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
> match
>
> On Mon, Jun 10, 2024 at 4:49 PM  wrote:
> >
> > From: Pan Li 
> >
> > When enabled the PHI handing for COND_EXPR,  we need to insert the gcall
> > to replace the PHI node.  Unfortunately,  I made a mistake that insert
> > the gcall to before the last stmt of the bb.  See below gimple,  the PHI
> > is located at no.1 but we insert the gcall (aka no.9) to the end of
> > the bb.  Then the use of _9 in no.2 will have no def and will trigger
> > ICE when verify_ssa.
> >
> >   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> > deleted.
> >   2. prephitmp_36 = (char *) _9;
> >   3. buf.write_base = string_13(D);
> >   4. buf.write_ptr = string_13(D);
> >   5. buf.write_end = prephitmp_36;
> >   6. buf.written = 0;
> >   7. buf.mode = 3;
> >   8. _7 = buf.write_end;
> >   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb 
> > by mistake
> >
> > This patch would like to insert the gcall to before the start of the bb
> > stmt.  To ensure the possible use of PHI_result will have a def exists.
> > After this patch the above gimple will be:
> >
> >   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to start 
> > bb by mistake
> >   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> > deleted.
> >   2. prephitmp_36 = (char *) _9;
> >   3. buf.write_base = string_13(D);
> >   4. buf.write_ptr = string_13(D);
> >   5. buf.write_end = prephitmp_36;
> >   6. buf.written = 0;
> >   7. buf.mode = 3;
> >   8. _7 = buf.write_end;
> >
> > The below test suites are passed for this patch:
> > * The rv64gcv fully regression test with newlib.
> > * The rv64gcv build with glibc.
> > * The x86 regression test with newlib.
> > * The x86 bootstrap test with newlib.
> >
> > PR target/115387
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): 
> > Take
> > the gsi of start_bb instead of last_bb.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr115387-1.c: New test.
> > * gcc.target/riscv/pr115387-2.c: New test.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
> >  gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
> >  gcc/tree-ssa-math-opts.cc   |  2 +-
> >  3 files changed, 54 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-2.c
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
> > b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> > new file mode 100644
> > index 000..a1c926

RE: [PATCH v1] Test: Move target independent test cases to gcc.dg/torture

2024-06-11 Thread Li, Pan2
> Since you are moving it to torture, please remove -O3 as it is already 
> supplied there as one of the torture options.

Oh, I see. Thanks for comments, and will update it in v2.

Pan

From: Andrew Pinski 
Sent: Tuesday, June 11, 2024 9:45 PM
To: Li, Pan2 
Cc: GCC Patches ; 钟居哲 ; Kito 
Cheng ; Richard Guenther ; 
Jeff Law 
Subject: Re: [PATCH v1] Test: Move target independent test cases to 
gcc.dg/torture


On Mon, Jun 10, 2024, 11:20 PM <pan2...@intel.com> wrote:
From: Pan Li <pan2...@intel.com>

The test cases of pr115387 are target independent,  at least x86
and riscv are able to reproduce.  Thus,  move these cases to
the gcc.dg/torture.

The below test suites are passed.
1. The rv64gcv fully regression test.
2. The x86 fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: Move to...
* gcc.dg/torture/pr115387-1.c: ...here.
* gcc.target/riscv/pr115387-2.c: Move to...
* gcc.dg/torture/pr115387-2.c: ...here.

Signed-off-by: Pan Li <pan2...@intel.com>
---
 gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-1.c | 2 +-
 gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-1.c (92%)
 rename gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-2.c (84%)

diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
b/gcc/testsuite/gcc.dg/torture/pr115387-1.c
similarity index 92%
rename from gcc/testsuite/gcc.target/riscv/pr115387-1.c
rename to gcc/testsuite/gcc.dg/torture/pr115387-1.c
index a1c926977c4..fde79f66757 100644
--- a/gcc/testsuite/gcc.target/riscv/pr115387-1.c
+++ b/gcc/testsuite/gcc.dg/torture/pr115387-1.c
@@ -1,6 +1,6 @@
 /* Test there is no ICE when compile.  */
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+/* { dg-options "-O3" } */

Since you are moving it to torture, please remove -O3 as it is already supplied 
there as one of the torture options.



 #define PRINTF_CHK 0x34

diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c 
b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
similarity index 84%
rename from gcc/testsuite/gcc.target/riscv/pr115387-2.c
rename to gcc/testsuite/gcc.dg/torture/pr115387-2.c
index 7183bf18dfd..0cb4b48d27b 100644
--- a/gcc/testsuite/gcc.target/riscv/pr115387-2.c
+++ b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
@@ -1,6 +1,6 @@
 /* Test there is no ICE when compile.  */
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+/* { dg-options "-O3" } */

 #include 
 #include 
--
2.34.1


RE: [PATCH v1] Widening-Mul: Take gsi after_labels instead of start_bb for gcall insertion

2024-06-11 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, June 12, 2024 2:41 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Take gsi after_labels instead of start_bb 
for gcall insertion

On Tue, Jun 11, 2024 at 3:53 PM  wrote:
>
> From: Pan Li 
>
> We inserted the gcall of .SAT_ADD before the gsi_start_bb to avoid
> the ssa def-after-use ICE issue.  Unfortunately,  there is a potential
> ICE when the first stmt is a label.  We cannot insert the gcall
> before the label.  Thus,  we take gsi_after_labels to locate the
> 'real' stmt that the gcall will be inserted before.
>
> The existing test cases pr115387-1.c and pr115387-2.c cover this change.

OK

> The below test suites are passed for this patch.
> * The rv64gcv fully regression test with newlib.
> * The x86 regression test.
> * The x86 bootstrap test.
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Leverage gsi_after_labels instead of gsi_start_bb to skip the
> leading labels of bb.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-ssa-math-opts.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index fbb8e0ea306..c09e9006443 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6102,7 +6102,7 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
>  gsi_next (&psi))
>  {
> -  gimple_stmt_iterator gsi = gsi_start_bb (bb);
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
>match_unsigned_saturation_add (&gsi, psi.phi ());
>  }
>
> --
> 2.34.1
>


RE: [Committed] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-12 Thread Li, Pan2
Do we need to upgrade the binutils in the riscv-gnu-toolchain repo? Otherwise
we may get an "unknown prefixed ISA extension `zaamo'" error when building.

Pan


-Original Message-
From: Patrick O'Neill  
Sent: Wednesday, June 12, 2024 1:08 AM
To: Jeff Law ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; e...@rivosinc.com; pal...@dabbelt.com; 
gnu-toolch...@rivosinc.com; and...@rivosinc.com
Subject: [Committed] RISC-V: Add basic Zaamo and Zalrsc support


On 6/10/24 21:33, Jeff Law wrote:
>
>
> On 6/10/24 3:46 PM, Patrick O'Neill wrote:
>> The A extension has been split into two parts: Zaamo and Zalrsc.
>> This patch adds basic support by making the A extension imply Zaamo and
>> Zalrsc.
>>
>> Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
>> Ratification: https://jira.riscv.org/browse/RVS-1995
>>
>> v2:
>> Rebased and updated some testcases that rely on the ISA string.
>>
>> v3:
>> Regex-ify temp registers in added testcases.
>> Remove unintentional whitespace changes.
>> Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move 
>> core-v bi
>> extension doc into appropriate section).
>>
>> Edwin Lu (1):
>>    RISC-V: Add basic Zaamo and Zalrsc support
>>
>> Patrick O'Neill (2):
>>    RISC-V: Add Zalrsc and Zaamo testsuite support
>>    RISC-V: Add Zalrsc amo-op patterns
>>
>>   gcc/common/config/riscv/riscv-common.cc   |  11 +-
>>   gcc/config/riscv/arch-canonicalize    |   1 +
>>   gcc/config/riscv/riscv.opt    |   6 +-
>>   gcc/config/riscv/sync.md  | 152 +++---
>>   gcc/doc/sourcebuild.texi  |  16 +-
>>   .../riscv/amo-table-a-6-amo-add-1.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-2.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-3.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-4.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-5.c   |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-1.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-2.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-3.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-4.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-5.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-6.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-7.c  |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-1.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-2.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-3.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-4.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-5.c   |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-1.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-2.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-3.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-4.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-5.c  |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-1.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-2.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-3.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-4.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-5.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-6.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-7.c |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-1.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-2.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-3.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-4.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-5.c  |   2 +-
>>   .../riscv/amo-zaamo-preferred-over-zalrsc.c   |  17 ++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-1.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-2.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-3.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-4.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-5.c   |  19 +++
>>   gcc/testsuite/gcc.target/riscv/attribute-15.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/attribute-16.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/attribute-17.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/attribute-18.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/pr110696.c |   2 +-
>>   .../gcc.target/riscv/rvv/base/pr114352-1.c    |   4 +-
>>   .../gcc.target/riscv/rvv/base/pr114352-3.c    |   8 +-
>>   gcc/testsuite/lib/target-supports.exp |  48 +-
>>   53 files changed, 366 insertions(+), 70 deletions(-)
>>   create mode 100644 
>> gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
>>   create mode 100644 
>> gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
>>   create mode 100644 
>> gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-2.c
>>   create mode 100644 
>> gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-3.c
>>   create mode 100644 
>> gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add

RE: [PATCH v2] Test: Move target independent test cases to gcc.dg/torture

2024-06-12 Thread Li, Pan2
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Thursday, June 13, 2024 2:11 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
pins...@gmail.com
Subject: Re: [PATCH v2] Test: Move target independent test cases to 
gcc.dg/torture



On 6/11/24 8:53 AM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> The test cases for pr115387 are target independent; at least x86
> and riscv are able to reproduce them.  Thus, move these cases to
> gcc.dg/torture.
> 
> The below test suites are passed.
> 1. The rv64gcv fully regression test.
> 2. The x86 fully regression test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/pr115387-1.c: Move to...
>   * gcc.dg/torture/pr115387-1.c: ...here.
>   * gcc.target/riscv/pr115387-2.c: Move to...
>   * gcc.dg/torture/pr115387-2.c: ...here.
OK
jeff



RE: [PATCH v1] RISC-V: Bugfix vec_extract vls mode iterator restriction mismatch

2024-06-13 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, June 13, 2024 5:23 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
Robin Dapp ; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Bugfix vec_extract vls mode iterator 
restriction mismatch

LGTM


juzhe.zh...@rivai.ai

From: pan2...@intel.com
Date: 2024-06-13 16:26
To: gcc-patches@gcc.gnu.org
CC: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; pan2...@intel.com
Subject: [PATCH v1] RISC-V: Bugfix vec_extract vls mode iterator restriction 
mismatch
From: Pan Li <pan2...@intel.com>

We have a vec_extract pattern which takes ZVFHMIN as the mode
iterator of the VLS modes, aka V_VLS.  But it will expand to the
pred_extract_first pattern, which takes ZVFH as the mode
iterator of the VLS modes, aka V_VLSF.  The mismatch will
result in an ICE similar to the one below:

error: unrecognizable insn:
   27 | }
  | ^
(insn 19 18 20 2 (set (reg:HF 150 [ _13 ])
(unspec:HF [
(vec_select:HF (reg:V4HF 134 [ _1 ])
(parallel [
(const_int 0 [0])
]))
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)) "compress_run-2.c":24:5 -1
 (nil))
during RTL pass: vregs
compress_run-2.c:27:1: internal compiler error: in extract_insn, at
recog.cc:2812
0x1a627ef _fatal_insn(char const*, rtx_def const*, char const*, int,
char const*)
../../../gcc/gcc/rtl-error.cc:108
0x1a62834 _fatal_insn_not_found(rtx_def const*, char const*, int, char
const*)
../../../gcc/gcc/rtl-error.cc:116
0x1a0f356 extract_insn(rtx_insn*)
../../../gcc/gcc/recog.cc:2812
0x159ee61 instantiate_virtual_regs_in_insn
../../../gcc/gcc/function.cc:1612
0x15a04aa instantiate_virtual_regs
../../../gcc/gcc/function.cc:1995
0x15a058e execute
../../../gcc/gcc/function.cc:2042

This patch would like to fix this issue by aligning the mode
iterator restriction to ZVFH.

The below test suites are passed for this patch.
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc.

PR target/115456

gcc/ChangeLog:

* config/riscv/autovec.md: Take ZVFH mode iterator instead of
the ZVFHMIN for the alignment.
* config/riscv/vector-iterators.md: Add 2 new iterator
V_VLS_ZVFH and VLS_ZVFH.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr115456-1.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/config/riscv/autovec.md   |  2 +-
gcc/config/riscv/vector-iterators.md  |  4 +++
.../gcc.target/riscv/rvv/base/pr115456-1.c| 31 +++
3 files changed, 36 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0b1e50dd0e9..66d70f678a6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1383,7 +1383,7 @@ (define_expand "vec_set"
(define_expand "vec_extract"
   [(set (match_operand:   0 "register_operand")
  (vec_select:
-   (match_operand:V_VLS   1 "register_operand")
+   (match_operand:V_VLS_ZVFH  1 "register_operand")
(parallel
[(match_operand   2 "nonmemory_operand")])))]
   "TARGET_VECTOR"
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 76c27035a73..47392d0da4c 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1574,10 +1574,14 @@ (define_mode_iterator VB_VLS [VB VLSB])
(define_mode_iterator VLS [VLSI VLSF_ZVFHMIN])
+(define_mode_iterator VLS_ZVFH [VLSI VLSF])
+
(define_mode_iterator V [VI VF_ZVFHMIN])
(define_mode_iterator V_VLS [V VLS])
+(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH])
+
(define_mode_iterator V_VLSI [VI VLSI])
(define_mode_iterator V_VLSF [VF VLSF])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-1.c
new file mode 100644
index 000..2c6cc7121b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-1.c
@@ -0,0 +1,31 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfhmin -mabi=lp64d -O3 -ftree-vectorize" } */
+
+#include 
+#include 
+
+typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
+
+vnx4f __attribute__ ((noinline, noclone))
+test_5 (vnx4f x, vnx4f y)
+{
+  return __builtin_shufflevector (x, y, 1, 3, 6, 7);
+}
+
+int
+main (void)
+{
+  vnx4f test_5_x = {0, 1, 3, 4};
+  vnx4f test_5_y = {4, 5, 6, 7};
+  vnx4f test_5_except = {1, 4, 6, 7};
+  vnx4f test_5_real;
+  test_5_rea

RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-13 Thread Li, Pan2
Could you please help to update to the latest upstream for another try?

Should be fixed by this commit 
https://github.com/gcc-mirror/gcc/commit/d03ff3fd3e2da1352a404e3c53fe61314569345c.

Feel free to ping me if any questions or concerns.

Pan

-Original Message-
From: Maciej W. Rozycki  
Sent: Thursday, June 13, 2024 8:01 PM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

On Thu, 6 Jun 2024, Li, Pan2 wrote:

> Committed, thanks Richard.

 This has broken glibc building for the `riscv64-linux-gnu' target:

iovsprintf.c: In function '__vsprintf_internal':
iovsprintf.c:34:1: error: definition in block 5 follows the use
   34 | __vsprintf_internal (char *string, size_t maxlen,
  | ^~~
for SSA_NAME: _9 in statement:
prephitmp_36 = (char *) _9;
during GIMPLE pass: widening_mul
iovsprintf.c:34:1: internal compiler error: verify_ssa failed
0x7fffb1474bf7 generic_start_main
../csu/libc-start.c:308
0x7fffb1474de3 __libc_start_main
../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Use just `riscv64-linux-gnu -O2' on the attached preprocessed source to 
reproduce.  This builds just fine up to e14afbe2d1c6^ and then breaks with 
the message quoted.

 Shall I investigate it further, or will you be able to figure out what's 
wrong from the information supplied?

  Maciej


RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-13 Thread Li, Pan2
Thanks for another try.

It looks like the build failures below have nothing to do with this patch. I 
think the relevant owners will take care of these -Werror build issues soon.

Pan

-Original Message-
From: Maciej W. Rozycki  
Sent: Friday, June 14, 2024 12:15 AM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

On Thu, 13 Jun 2024, Li, Pan2 wrote:

> Could you please help to update the upstream for another try ?
> 
> Should be fixed by this commit 
> https://github.com/gcc-mirror/gcc/commit/d03ff3fd3e2da1352a404e3c53fe61314569345c.
> 
> Feel free to ping me if any questions or concerns.

 Upstream master (as at 609764a42f0c) doesn't build:

In file included from .../gcc/gcc/coretypes.h:487,
 from .../gcc/gcc/tree-vect-stmts.cc:24:
In member function 'bool poly_int::is_constant() const [with unsigned int 
N = 2; C = long unsigned int]',
inlined from 'C poly_int::to_constant() const [with unsigned int N = 
2; C = long unsigned int]' at .../gcc/gcc/poly-int.h:588:3,
inlined from 'bool get_group_load_store_type(vec_info*, stmt_vec_info, 
tree, slp_tree, bool, vec_load_store_type, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2155:39,
inlined from 'bool get_load_store_type(vec_info*, stmt_vec_info, tree, 
slp_tree, bool, vec_load_store_type, unsigned int, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2387:38:
.../gcc/gcc/poly-int.h:557:7: error: 'remain.poly_int<2, long unsigned 
int>::coeffs[1]' may be used uninitialized [-Werror=maybe-uninitialized]
  557 |   if (this->coeffs[i] != 0)
  |   ^~
.../gcc/gcc/tree-vect-stmts.cc: In function 'bool 
get_load_store_type(vec_info*, stmt_vec_info, tree, slp_tree, bool, 
vec_load_store_type, unsigned int, vect_memory_access_type*, poly_int64*, 
dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)':
.../gcc/gcc/tree-vect-stmts.cc:2115:23: note: 'remain.poly_int<2, long unsigned 
int>::coeffs[1]' was declared here
 2115 |   poly_uint64 remain;
  |   ^~
In file included from .../gcc/gcc/system.h:1250,
 from .../gcc/gcc/tree-vect-stmts.cc:23:
In function 'int ceil_log2(long unsigned int)',
inlined from 'bool get_group_load_store_type(vec_info*, stmt_vec_info, 
tree, slp_tree, bool, vec_load_store_type, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2156:43,
inlined from 'bool get_load_store_type(vec_info*, stmt_vec_info, tree, 
slp_tree, bool, vec_load_store_type, unsigned int, vect_memory_access_type*, 
poly_int64*, dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)' 
at .../gcc/gcc/tree-vect-stmts.cc:2387:38:
.../gcc/gcc/hwint.h:266:17: error: 'remain.poly_int<2, long unsigned 
int>::coeffs[0]' may be used uninitialized [-Werror=maybe-uninitialized]
  266 |   return x == 0 ? 0 : floor_log2 (x - 1) + 1;
  |  ~~~^~~~
.../gcc/gcc/tree-vect-stmts.cc: In function 'bool 
get_load_store_type(vec_info*, stmt_vec_info, tree, slp_tree, bool, 
vec_load_store_type, unsigned int, vect_memory_access_type*, poly_int64*, 
dr_alignment_support*, int*, gather_scatter_info*, internal_fn*)':
.../gcc/gcc/tree-vect-stmts.cc:2115:23: note: 'remain.poly_int<2, long unsigned 
int>::coeffs[0]' was declared here
 2115 |   poly_uint64 remain;
  |   ^~
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:1198: tree-vect-stmts.o] Error 1

and actually e14afbe2d1c6^ doesn't build either (I guess it just slipped 
through bisection as the file didn't have to be rebuilt or something):

In file included from .../gcc/gcc/rtl.h:3973,
 from .../gcc/gcc/config/riscv/riscv.cc:31:
In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)',
inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, 
rtx)' at ./genrtl.h:50:26,
inlined from 'void riscv_move_integer(rtx, rtx, long int, machine_mode)' at 
.../gcc/gcc/config/riscv/riscv.cc:2786:10:
./genrtl.h:37:16: error: 'x' may be used uninitialized 
[-Werror=maybe-uninitialized]
   37 |   XEXP (rt, 0) = arg0;
.../gcc/gcc/config/riscv/riscv.cc: In function 'void riscv_move_integer(rtx, 
rtx, long int, machine_mode)':
.../gcc/gcc/config/riscv/riscv.cc:2723:7: note: 'x' was declared here
 2723 |   rtx x;
  |  

RE: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3

2024-06-13 Thread Li, Pan2
Thanks Juzhe, will commit the series after the middle-end patch.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, June 14, 2024 10:24 AM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
Robin Dapp ; Li, Pan2 
Subject: Re: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB 
form 3

LGTM


juzhe.zh...@rivai.ai

From: pan2...@intel.com
Date: 2024-06-14 10:13
To: gcc-patches@gcc.gnu.org
CC: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; pan2...@intel.com
Subject: [PATCH v1 1/8] RISC-V: Add testcases for scalar unsigned SAT_SUB form 3
From: Pan Li <pan2...@intel.com>

After the middle-end supports form 3 of unsigned SAT_SUB and
the RISC-V backend implements the scalar .SAT_SUB, add more test
cases to cover form 3 of unsigned .SAT_SUB.

Form 3:
  #define SAT_SUB_U_3(T) \
  T sat_sub_u_3_##T (T x, T y) \
  { \
return x > y ? x - y : 0; \
  }

Passed the rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for test.
* gcc.target/riscv/sat_u_sub-10.c: New test.
* gcc.target/riscv/sat_u_sub-11.c: New test.
* gcc.target/riscv/sat_u_sub-12.c: New test.
* gcc.target/riscv/sat_u_sub-9.c: New test.
* gcc.target/riscv/sat_u_sub-run-10.c: New test.
* gcc.target/riscv/sat_u_sub-run-11.c: New test.
* gcc.target/riscv/sat_u_sub-run-12.c: New test.
* gcc.target/riscv/sat_u_sub-run-9.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c | 17 +
gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c  | 18 +
.../gcc.target/riscv/sat_u_sub-run-10.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-11.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-12.c   | 25 +++
.../gcc.target/riscv/sat_u_sub-run-9.c| 25 +++
9 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index bc9a372b6df..50c65cdea49 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -92,8 +92,16 @@ sat_u_sub_##T##_fmt_2 (T x, T y)  \
   return (x - y) & (-(T)(x > y)); \
}
+#define DEF_SAT_U_SUB_FMT_3(T)\
+T __attribute__((noinline))   \
+sat_u_sub_##T##_fmt_3 (T x, T y)  \
+{ \
+  return x > y ? x - y : 0;   \
+}
+
#define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
#define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
+#define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
#define DEF_VEC_SAT_U_SUB_FMT_1(T)   \
void __attribute__((noinline))   \
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
new file mode 100644
index 000..6e78164865f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-10.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint16_t_fmt_3:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*a0,\s*a1
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_SUB_FMT_3(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
new file mode 100644
index 000..84e34657f55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-11.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -

RE: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC

2024-07-10 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 5:24 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Match: Support form 2 for the .SAT_TRUNC

On Fri, Jul 5, 2024 at 2:48 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to add form 2 support for the .SAT_TRUNC.  Aka:
>
> Form 2:
>   #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return overflow ? (NT)-1 : (NT)x;  \
>   }
>
> DEF_SAT_U_TRUC_FMT_2(uint32, uint64)
>
> Before this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │   long unsigned int _3;
>9   │
>   10   │ ;;   basic block 2, loop depth 0
>   11   │ ;;pred:   ENTRY
>   12   │   _3 = MIN_EXPR ;
>   13   │   _1 = (uint32_t) _3;
>   14   │   return _1;
>   15   │ ;;succ:   EXIT
>   16   │
>   17   │ }
>
> After this patch:
>3   │
>4   │ __attribute__((noinline))
>5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
>6   │ {
>7   │   uint32_t _1;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _1 = .SAT_TRUNC (x_2(D)); [tail call]
>   12   │   return _1;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch:
> 1. The x86 bootstrap test.
> 2. The x86 fully regression test.
> 3. The rv64gcv fully regression test.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * match.pd: Add form 2 for .SAT_TRUNC.
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Add new case NOP_EXPR,  and try to match SAT_TRUNC.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 17 -
>  gcc/tree-ssa-math-opts.cc |  4 
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4edfa2ae2c9..3759c64d461 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3234,7 +3234,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> -/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
> +/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> @@ -3250,6 +3250,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>}
>(if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
> +/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> +   SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +(match (unsigned_integer_sat_trunc @0)
> + (convert (min @0 INTEGER_CST@1))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> + (with
> +  {
> +   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> +   unsigned otype_precision = TYPE_PRECISION (type);
> +   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> +   wide_int int_cst = wi::to_wide (@1, itype_precision);
> +  }
> +  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index a35caf5f058..ac86be8eb94 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6170,6 +6170,10 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>   match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   break;
>
> +   case NOP_EXPR:
> + match_unsigned_saturation_trunc (&gsi, as_a (stmt));
> + break;
> +
> default:;
> }
> }
> --
> 2.34.1
>


RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-10 Thread Li, Pan2
> I think that's a bug.  Are you saying __builtin_add_overflow fails to promote
> (constant) arguments?

Thanks Richard. Not very sure which part result in type mismatch, will take a 
look and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 7:36 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Wed, Jul 10, 2024 at 11:28 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST.
> For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will fail to vectorize as vectorizable_call will check that the
> operands are type-compatible, but the imm will be treated as unsigned
> SImode from the perspective of the tree.

I think that's a bug.  Are you saying __builtin_add_overflow fails to promote
(constant) arguments?

>  Aka
>
> uint64_t _1;
> uint64_t _2;
>
> _1 = .SAT_ADD (_2, 9);
>
> The _1 and _2 are unsigned DImode, which is different from the imm 9 in
> unsigned SImode, and thus vectorizable_call fails.  This patch would
> like to promote the imm operand to the type of the other operand if and
> only if there is no precision/data loss.  Aka convert the imm 9 to
> DImode for the above example.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add
> new func impl to promote the imm tree to target type.
> (vect_recog_sat_add_pattern): Perform the type promotion before
> generate .SAT_ADD call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 86e893a1c43..e1013222b12 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4527,6 +4527,20 @@ vect_recog_build_binary_gimple_stmt (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>return NULL;
>  }
>
> +static void
> +vect_recog_promote_cst_to_unsigned (tree *op, tree type)
> +{
> +  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
> +return;
> +
> +  unsigned precision = TYPE_PRECISION (type);
> +  wide_int type_max = wi::mask (precision, false, precision);
> +  wide_int op_cst_val = wi::to_wide (*op, precision);
> +
> +  if (wi::leu_p (op_cst_val, type_max))
> +*op = wide_int_to_tree (type, op_cst_val);
> +}
> +
>  /*
>   * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
>   *   _7 = _4 + _6;
> @@ -4553,6 +4567,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>
>if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
>  {
> +  vect_recog_promote_cst_to_unsigned (&ops[0], TREE_TYPE (ops[1]));
> +  vect_recog_promote_cst_to_unsigned (&ops[1], TREE_TYPE (ops[0]));
> +
>gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo,
>   IFN_SAT_ADD, 
> type_out,
>   lhs, ops[0], 
> ops[1]);
> --
> 2.34.1
>


RE: [PATCH v3] Vect: Optimize truncation for .SAT_SUB operands

2024-07-10 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 7:26 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v3] Vect: Optimize truncation for .SAT_SUB operands

On Tue, Jul 9, 2024 at 6:03 AM  wrote:
>
> From: Pan Li 
>
> To get better vectorized code for .SAT_SUB, we would like to avoid the
> truncation operation on the assignment.  For example, as below.
>
> unsigned int _1;
> unsigned int _2;
> unsigned short int _4;
> _9 = (unsigned short int).SAT_SUB (_1, _2);
>
> If we make sure that the _1 is in the range of unsigned short int.  Such
> as a def similar to:
>
> _1 = (unsigned short int)_4;
>
> Then we can distribute the truncation operation to:
>
> _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2);
> _9 = .SAT_SUB (_4, _3);
>
> Then, we can get better vectorized code and avoid the unnecessary narrowing
> stmt during vectorization with the below stmt(s).
>
> _3 = .SAT_TRUNC(_2); // SI => HI
> _9 = .SAT_SUB (_4, _3);
>
> Let's take the RISC-V vector extension as an example to show the changes.
> For the below sample code:
>
> __attribute__((noinline))
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0);
>   } while (--n);
> }
>
> Before this patch:
>   ...
>   .L3:
>   vle16.v   v1,0(a3)
>   vrsub.vx  v5,v2,t1
>   mvt3,a4
>   addw  a4,a4,t5
>   vrgather.vv   v3,v1,v5
>   vsetvli   zero,zero,e32,m1,ta,ma
>   vzext.vf2 v1,v3
>   vssubu.vx v1,v1,a1
>   vsetvli   zero,zero,e16,mf2,ta,ma
>   vncvt.x.x.w   v1,v1
>   vrgather.vv   v3,v1,v5
>   vse16.v   v3,0(a3)
>   sub   a3,a3,t4
>   bgtu  t6,a4,.L3
>   ...
>
> After this patch:
> test:
>   ...
>   .L3:
>   vle16.v v3,0(a3)
>   vrsub.vxv5,v2,a6
>   mv  a7,a4
>   addwa4,a4,t3
>   vrgather.vv v1,v3,v5
>   vssubu.vv   v1,v1,v6
>   vrgather.vv v3,v1,v5
>   vse16.v v3,0(a3)
>   sub a3,a3,t1
>   bgtut4,a4,.L3
>   ...
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_sub_pattern_transform):
> Add new func impl to perform the truncation distribution.
> (vect_recog_sat_sub_pattern): Perform above optimize before
> generate .SAT_SUB call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 65 +++
>  1 file changed, 65 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 86e893a1c43..4570c25b664 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4566,6 +4566,70 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to transform the truncation for .SAT_SUB pattern,  mostly occurs in
> + * the benchmark zip.  Aka:
> + *
> + *   unsigned int _1;
> + *   unsigned int _2;
> + *   unsigned short int _4;
> + *   _9 = (unsigned short int).SAT_SUB (_1, _2);
> + *
> + *   if _1 is known to be in the range of unsigned short int.  For example
> + *   there is a def _1 = (unsigned short int)_4.  Then we can transform the
> + *   truncation to:
> + *
> + *   _3 = (unsigned short int) MIN (65535, _2); // aka _3 = .SAT_TRUNC (_2);
> + *   _9 = .SAT_SUB (_4, _3);
> + *
> + *   Then,  we can better vectorized code and avoid the unnecessary narrowing
> + *   stmt during vectorization with below stmt(s).
> + *
> + *   _3 = .SAT_TRUNC(_2); // SI => HI
> + *   _9 = .SAT_SUB (_4, _3);
> + */
> +static void
> +vect_recog_sat_sub_pattern_transform (vec_info *vinfo,
> + stmt_vec_info stmt_vinfo,
> + tree lhs, tree *ops)
> +{
> +  tree otype = TREE_TYPE (lhs);
> +  tree itype = TREE_TYPE (ops[0]);
> +  unsigned itype_prec = TYPE_PRECISION (itype);
> +  unsigned otype_prec = TYPE_PRECISION (otype);
> +
> +  if (types_compatible_p (otype, itype) || otype_prec >= itype_prec)
> +return;
> +
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree_pair v_pair = tree_pair (v_oty

RE: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-11 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, July 11, 2024 6:32 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
Robin Dapp ; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip 
benchmark


LGTM

juzhe.zh...@rivai.ai

From: pan2.li <pan2...@intel.com>
Date: 2024-07-11 16:29
To: gcc-patches <gcc-patches@gcc.gnu.org>
CC: juzhe.zhong <juzhe.zh...@rivai.ai>; kito.cheng <kito.ch...@gmail.com>; 
jeffreyalaw <jeffreya...@gmail.com>; rdapp.gcc <rdapp@gmail.com>; 
Pan Li <pan2...@intel.com>
Subject: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark
From: Pan Li <pan2...@intel.com>

This patch would like to add the test cases for the vector .SAT_SUB in
the zip benchmark.  Aka:

Form in zip benchmark:
  #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
  void __attribute__((noinline))\
  vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
  { \
T2 a;   \
T1 *p = x;  \
do {\
  a = *--p; \
  *p = (T1)(a >= b ? a - b : 0);\
} while (--limit);  \
  }

DEF_VEC_SAT_U_SUB_ZIP(uint8_t, uint16_t)

vec_sat_u_sub_uint16_t_uint32_t_fmt_zip:
  ...
  vsetvli      a4,zero,e32,m1,ta,ma
  vmv.v.x      v6,a1
  vsetvli      zero,zero,e16,mf2,ta,ma
  vid.v        v2
  li           a4,-1
  vnclipu.wi   v6,v6,0   // .SAT_TRUNC
.L3:
  vle16.v      v3,0(a3)
  vrsub.vx     v5,v2,a6
  mv           a7,a4
  addw         a4,a4,t3
  vrgather.vv  v1,v3,v5
  vssubu.vv    v1,v1,v6  // .SAT_SUB
  vrgather.vv  v3,v1,v5
  vse16.v      v3,0(a3)
  sub          a3,a3,t1
  bgtu         t4,a4,.L3

Passed the rv64gcv tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for .SAT_SUB in zip benchmark.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
.../riscv/rvv/autovec/binop/vec_sat_arith.h   | 18 +
.../rvv/autovec/binop/vec_sat_binary_vx.h | 22 +
.../riscv/rvv/autovec/binop/vec_sat_data.h| 81 +++
.../rvv/autovec/binop/vec_sat_u_sub_zip-run.c | 16 
.../rvv/autovec/binop/vec_sat_u_sub_zip.c | 18 +
5 files changed, 155 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 10459807b2c..416a1e49a47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -322,6 +322,19 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 } \
}
+#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
+void __attribute__((noinline))\
+vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
+{ \
+  T2 a;   \
+  T1 *p = x;  \
+  do {\
+a = *--p; \
+*p = (T1)(a >= b ? a - b : 0);\
+  } while (--limit);  \
+}
+#define DEF_VEC_SAT_U_SUB_ZIP_WRAP(T1, T2) DEF_VEC_SAT_U_SUB_ZIP(T1, T2)
+
#define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N)
@@ -352,6 +365,11 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
#define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+  vec_sat_u_sub_##T1##_##T2##_fmt_zi

RE: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

2024-07-12 Thread Li, Pan2
Thanks Jeff and Edwin for catching my silly mistake.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, July 13, 2024 5:40 AM
To: Edwin Lu ; gcc-patches@gcc.gnu.org
Cc: Li, Pan2 ; gnu-toolch...@rivosinc.com
Subject: Re: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark



On 7/12/24 12:37 PM, Edwin Lu wrote:
> The following testcase was not properly testing anything due to an
> uninitialized variable. As a result, the loop was not iterating through
> the testing data, but instead on undefined values which could cause an
> unexpected abort.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
>   Initialize variable.
OK.  Thanks for chasing this down.

jeff



RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-16 Thread Li, Pan2
> I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> (constant) arguments?

I double checked the 022t.ssa pass for the __builtin_add_overflow operands tree 
type. It looks like the 2 operands of .ADD_OVERFLOW have different tree types 
when one of them is constant: one is unsigned DI, and the other is int.

(gdb) call debug_gimple_stmt(stmt)
_14 = .ADD_OVERFLOW (_4, 129);
(gdb) call debug_tree (gimple_call_arg(stmt, 0))
 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x76a437e0 precision:64 min  max 
pointer_to_this >
visited
def_stmt _4 = *_3;
version:4>
(gdb) call debug_tree (gimple_call_arg(stmt, 1))
  constant 
129>
(gdb)

Then in the vect pass we can also see that the operands of .ADD_OVERFLOW have 
different tree types.
In my understanding, we should have unsigned DI for the constant operand here:

(gdb) layout src
(gdb) list
506       if (gimple_call_num_args (_c4) == 2)
507         {
508           tree _q40 = gimple_call_arg (_c4, 0);
509           _q40 = do_valueize (valueize, _q40);
510           tree _q41 = gimple_call_arg (_c4, 1);
511           _q41 = do_valueize (valueize, _q41);
512           if (integer_zerop (_q21))
513             {
514               if (integer_minus_onep (_p1))
515                 {
(gdb) call debug_tree (_q40)
 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x76a437e0 precision:64 min  max 
pointer_to_this >
visited
def_stmt _4 = *_3;
version:4>
(gdb) call debug_tree (_q41)
  constant 
129>

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 10, 2024 7:36 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Wed, Jul 10, 2024 at 11:28 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operand and one of the operand may be INTEGER_CST.
> For example _1 = .SAT_ADD (_2, 9) comes from below sample code.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will fail to vectorize as vectorizable_call checks that the
> operands are type compatible, but the imm will be treated as unsigned
> SImode from the perspective of the tree.

I think that's a bug.  Do you say __builtin_add_overflow fails to promote
(constant) arguments?

>  Aka
>
> uint64_t _1;
> uint64_t _2;
>
> _1 = .SAT_ADD (_2, 9);
>
> The _1 and _2 are unsigned DImode, which differs from the imm 9 in unsigned
> SImode, and thus vectorizable_call fails.  This patch would
> like to promote the imm operand to the operand type mode of _2 if and
> only if there is no precision/data loss.  Aka convert the imm 9 to
> DImode for the above example.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add
> new func impl to promote the imm tree to target type.
> (vect_recog_sat_add_pattern): Perform the type promotion before
> generating the .SAT_ADD call.
>

RE: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

2024-07-17 Thread Li, Pan2
> I just noticed you added ustrunc/sstrunc optabs but didn't add
> documentation for them in md.texi like the other optabs that are
> defined.
> See https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html for the
> generated file of md.texi there.

> Can you please update md.texi to add them?

Thanks Andrew, almost forgot this, will add it soon.

Pan

-Original Message-
From: Andrew Pinski  
Sent: Thursday, July 18, 2024 6:59 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned 
scalar int

On Tue, Jun 25, 2024 at 6:46 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to add the middle-end representation for the
> saturation truncation.  Aka set the truncated result to the max
> value when it overflows.  It will take a pattern similar to the
> below.
>
> Form 1:
>   #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##T##_fmt_1 (WT x)\
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return ((NT)x) | (NT)-overflow;\
>   }
>
> For example, truncated uint16_t to uint8_t, we have
>
> * SAT_TRUNC (254)   => 254
> * SAT_TRUNC (255)   => 255
> * SAT_TRUNC (256)   => 255
> * SAT_TRUNC (65536) => 255
>
> Given below SAT_TRUNC from uint64_t to uint32_t.
>
> DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t)
>
> Before this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   _Bool overflow;
>   unsigned int _1;
>   unsigned int _2;
>   unsigned int _3;
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   overflow_5 = x_4(D) > 4294967295;
>   _1 = (unsigned int) x_4(D);
>   _2 = (unsigned int) overflow_5;
>   _3 = -_2;
>   _6 = _1 | _3;
>   return _6;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SAT_TRUNC (x_4(D)); [tail call]
>   return _6;
> ;;succ:   EXIT
>
> }
>
> The below tests are passed for this patch:
> *. The rv64gcv fully regression tests.
> *. The rv64gcv build with glibc.
> *. The x86 bootstrap tests.
> *. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
> unary_convert.
> * match.pd: Add new matching pattern for unsigned int sat_trunc.
> * optabs.def (OPTAB_CL): Add unsigned and signed optab.

I just noticed you added ustrunc/sstrunc optabs but didn't add
documentation for them in md.texi like the other optabs that are
defined.
See https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html for the
generated file of md.texi there.

Can you please update md.texi to add them?

Thanks,
Andrew Pinski


> * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_trunc): Add
> new decl for the matching pattern generated func.
> (match_unsigned_saturation_trunc): Add new func impl to match
> the .SAT_TRUNC.
> (math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
> function under BIT_IOR_EXPR case.
> * tree.cc (integer_half_truncated_all_ones_p): Add new func impl
> to filter the truncated threshold.
> * tree.h (integer_half_truncated_all_ones_p): Add new func decl.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 12 +++-
>  gcc/optabs.def|  3 +++
>  gcc/tree-ssa-math-opts.cc | 32 
>  gcc/tree.cc   | 22 ++
>  gcc/tree.h|  6 ++
>  6 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a8c83437ada..915d329c05a 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -278,6 +278,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | 
> ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, 
> binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, 
> binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_TRUNC, ECF_CONST, first, sstrunc, ustrunc, 
> unary_convert)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> 

RE: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread Li, Pan2
Thanks all, will have a try in v2.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Thursday, July 18, 2024 5:14 AM
To: Andrew Pinski 
Cc: Tamar Christina ; Richard Biener 
; Li, Pan2 ; 
gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao 
Subject: Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode 
precision [PR115961]

Andrew Pinski  writes:
> On Wed, Jul 17, 2024 at 1:03 PM Tamar Christina  
> wrote:
>>
>> > -Original Message-
>> > From: Richard Sandiford 
>> > Sent: Wednesday, July 17, 2024 8:55 PM
>> > To: Richard Biener 
>> > Cc: pan2...@intel.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai;
>> > kito.ch...@gmail.com; Tamar Christina ;
>> > jeffreya...@gmail.com; rdapp@gmail.com; hongtao@intel.com
>> > Subject: Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode
>> > precision [PR115961]
>> >
>> > Richard Biener  writes:
>> > > On Wed, Jul 17, 2024 at 11:48 AM  wrote:
>> > >>
>> > >> From: Pan Li 
>> > >>
>> > >> The .SAT_TRUNC matching doesn't check that the type has mode precision.
>> > >> Thus a bitfield like below will be recog as .SAT_TRUNC.
>> > >>
>> > >> struct e
>> > >> {
>> > >>   unsigned pre : 12;
>> > >>   unsigned a : 4;
>> > >> };
>> > >>
>> > >> __attribute__((noipa))
>> > >> void bug (e * v, unsigned def, unsigned use) {
>> > >>   e & defE = *v;
>> > >>   defE.a = min_u (use + 1, 0xf);
>> > >> }
>> > >>
>> > >> This patch would like to add type_has_mode_precision_p for the
>> > >> .SAT_TRUNC matching to get rid of this.
>> > >>
>> > >> The below test suites are passed for this patch:
>> > >> 1. The rv64gcv fully regression tests.
>> > >> 2. The x86 bootstrap tests.
>> > >> 3. The x86 fully regression tests.
>> > >
>> > > Hmm, rather than restricting the matching the issue is the optab query or
>> > > in this case how *_optab_supported_p blindly uses TYPE_MODE without
>> > > either asserting the type has mode precision or failing the query in 
>> > > this case.
>> > >
>> > > I think it would be simplest to adjust direct_optab_supported_p
>> > > (and convert_optab_supported_p) to reject such operations?  Richard, do
>> > > you agree or should callers check this instead?
>> >
>> > Sounds good to me, although I suppose it should go:
>> >
>> > bool
>> > direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>> >   optimization_type opt_type)
>> > {
>> >   // <--- Here
>> >   switch (fn)
>> > {
>> >
>> > }
>> > }
>> >
>> > until we know of a specific case where that's wrong.
>> >
>> > Is type_has_mode_precision_p meaningful for all types?
>> >
>>
>> I was wondering about that, wouldn't VECTOR_BOOLEAN_TYPE_P types fail?
>> e.g. on AVX where the type precision is 1 but the mode precision QImode?
>>
>> Unless I misunderstood the predicate.
>
> So type_has_mode_precision_p only works with scalar integral types
> (maybe scalar real types too) since it uses TYPE_PRECISION directly
> and not element_precision (the precision field is overloaded for
> vectors for the number of elements and TYPE_PRECISION on a vector type
> will cause an ICE since r14-2150-gfe48f2651334bc).
> So I suspect you need to check !VECTOR_TYPE_P (type) before calling
> type_has_mode_precision_p .

I think for VECTOR_TYPE_P it would be worth checking VECTOR_MODE_P instead,
if we're not requiring callers to check this kind of thing.

So something like:

bool
mode_describes_type_p (const_tree type)
{
  if (VECTOR_TYPE_P (type))
    return VECTOR_MODE_P (TYPE_MODE (type));

  if (INTEGRAL_TYPE_P (type))
return type_has_mode_precision_p (type);

  if (SCALAR_FLOAT_TYPE_P (type))
return true;

  return false;
}

?  Possibly also with complex handling if we need that.

Richard


RE: [PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer modes

2024-07-18 Thread Li, Pan2
Thanks Richard and Andrew, will commit v2 with that changes.

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657617.html

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, July 18, 2024 3:00 PM
To: Andrew Pinski 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer 
modes

On Thu, Jul 18, 2024 at 7:35 AM Andrew Pinski  wrote:
>
> On Wed, Jul 17, 2024 at 9:20 PM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to add the doc for the Standard-Names
> > ustrunc and sstrunc, including both the scalar and vector integer
> > modes.
>
> Thanks for doing this and this looks mostly good to me (can't approve it).

Too bad.  OK with the changes Andrew requested.

Thanks,
Richard.

>
> >
> > gcc/ChangeLog:
> >
> > * doc/md.texi: Add Standard-Names ustrunc and sstrunc.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/doc/md.texi | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 7f4335e0aac..f116dede906 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5543,6 +5543,18 @@ means of constraints requiring operands 1 and 0 to 
> > be the same location.
> >  @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
> >  Similar, for other arithmetic operations.
> >
> > +@cindex @code{ustrunc@var{m}@var{n}2} instruction pattern
> > +@item @samp{ustrunc@var{m}@var{n}2}
> > +Truncate the operand 1, and storing the result in operand 0.  There will
> > +be saturation during the trunction.  The result will be saturated to the
> > +maximal value of operand 0 type if there is overflow when truncation.  The
> s/type/mode/ .
> > +operand 1 must have mode @var{n},  and the operand 0 must have mode 
> > @var{m}.
> > +Both the scalar and vector integer modes are allowed.
> I don't think you need the article `the` here. It reads wrong with it
> at least to me.
>
> > +
> > +@cindex @code{sstrunc@var{m}@var{n}2} instruction pattern
> > +@item @samp{sstrunc@var{m}@var{n}2}
> > +Similar but for signed.
> > +
> >  @cindex @code{andc@var{m}3} instruction pattern
> >  @item @samp{andc@var{m}3}
> >  Like @code{and@var{m}3}, but it uses bitwise-complement of operand 2
> > --
> > 2.34.1
> >


RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]

2024-07-18 Thread Li, Pan2
Thanks Tamar for comments.

The :s flag is somehow ignored in matching, according to the gccint doc.

"The second supported flag is s which tells the code generator to fail the 
pattern if the
expression marked with s does have more than one use and the simplification 
results in an
expression with more than one operator."

I also diffed the generated code in gimple_unsigned_integer_sat_trunc; it 
doesn't have the single_use check when only the :s flag is used.

&& TYPE_UNSIGNED (TREE_TYPE (captures[0]))                             // the :s flag
&& TYPE_UNSIGNED (TREE_TYPE (captures[0])) && single_use (captures[1]) // explicit single_use check

Pan

-Original Message-
From: Tamar Christina  
Sent: Thursday, July 18, 2024 8:36 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Liu, Hongtao 
Subject: RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC 
form 2 [PR115863]

> -Original Message-
> From: pan2...@intel.com 
> Sent: Thursday, July 18, 2024 1:27 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Tamar Christina ; jeffreya...@gmail.com;
> rdapp@gmail.com; hongtao@intel.com; Pan Li 
> Subject: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC
> form 2 [PR115863]
> 
> From: Pan Li 
> 
> The SAT_TRUNC form 2 has below pattern matching.
> From:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = (unsigned int) _18;
> 
> To:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = .SAT_TRUNC (_18);
> 
> But if there is another use of _18 like below, the transform to
> .SAT_TRUNC may bring no gain.  For example:
> 
> From:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = (unsigned int) _18; // op_0
>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
> 
> To:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = .SAT_TRUNC (_18);
>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
> 
> Pattern recog to .SAT_TRUNC cannot eliminate MIN_EXPR as above.  Then the
> backend (for example x86/riscv) will have additional 2-3 more insns
> after pattern recog besides the MIN_EXPR.  Thus, keeping the normal truncation
> as is should be the better choice.
> 
> The below testsuites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 
>   PR target/115863
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add single_use of MIN_EXPR for .SAT_TRUNC form 2.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr115863-1.c: New test.
> 
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd   | 15 +++--
>  gcc/testsuite/gcc.target/i386/pr115863-1.c | 37 ++
>  2 files changed, 50 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115863-1.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cb399b8718..d4f040b5c7b 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3252,10 +3252,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> 
>  /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +/* If Op_0 def is MIN_EXPR and not single_use.  Aka below pattern:
> +
> + _18 = MIN_EXPR ; // op_0 def
> + iftmp.0_11 = (unsigned int) _18; // op_0
> + stream.avail_out = iftmp.0_11;
> + left_37 = left_8 - _18;  // op_0 use
> +
> +   Transforming to .SAT_TRUNC will leave the MIN_EXPR still live.  Then the
> +   backend (for example x86/riscv) will generate 2-3 more insns for .SAT_TRUNC
> +   besides the MIN_EXPR.  Thus, keeping the normal truncation as is should be
> +   the better choice.  */
>  (match (unsigned_integer_sat_trunc @0)
> - (convert (min @0 INTEGER_CST@1))
> + (convert (min@2 @0 INTEGER_CST@1))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2))

You can probably use the single use flag here? so

> - (convert (min @0 INTEGER_CST@1))
> + (convert (min:s @0 @0 INTEGER_CST@1))

?

Cheers,
Tamar

>   (with
>{
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> diff --git a/gcc/testsuite/gcc.target/i386/pr115863-1.c
> b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> new file mode 100644
> index 000..a672f62cec5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> @@ -0,0 +1,37 @@
> +/* PR tar

RE: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]

2024-07-18 Thread Li, Pan2
> Otherwise the patch looks good to me.

Thanks Richard, will commit with the log updated.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, July 18, 2024 9:27 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao 
Subject: Re: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC 
form 2 [PR115863]

On Thu, Jul 18, 2024 at 2:27 PM  wrote:
>
> From: Pan Li 
>
> The SAT_TRUNC form 2 has below pattern matching.
> From:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = (unsigned int) _18;
>
> To:
>   _18 = MIN_EXPR ;
>   iftmp.0_11 = .SAT_TRUNC (_18);

.SAT_TRUNC (left_8);

> But if there is another use of _18 like below, the transform to
> .SAT_TRUNC may bring no gain.  For example:
>
> From:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = (unsigned int) _18; // op_0
>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
>
> To:
>   _18 = MIN_EXPR ; // op_0 def
>   iftmp.0_11 = .SAT_TRUNC (_18);

.SAT_TRUNC (left_8);?

Otherwise the patch looks good to me.

Thanks,
Richard.

>   stream.avail_out = iftmp.0_11;
>   left_37 = left_8 - _18;  // op_0 use
>
> Pattern recog to .SAT_TRUNC cannot eliminate MIN_EXPR as above.  Then the
> backend (for example x86/riscv) will have additional 2-3 more insns
> after pattern recog besides the MIN_EXPR.  Thus, keeping the normal truncation
> as is should be the better choice.
>
> The below testsuites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
>
> PR target/115863
>
> gcc/ChangeLog:
>
> * match.pd: Add single_use of MIN_EXPR for .SAT_TRUNC form 2.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115863-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd   | 15 +++--
>  gcc/testsuite/gcc.target/i386/pr115863-1.c | 37 ++
>  2 files changed, 50 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115863-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cb399b8718..d4f040b5c7b 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3252,10 +3252,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT).
> SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)).  */
> +/* If Op_0 def is MIN_EXPR and not single_use.  Aka below pattern:
> +
> + _18 = MIN_EXPR ; // op_0 def
> + iftmp.0_11 = (unsigned int) _18; // op_0
> + stream.avail_out = iftmp.0_11;
> + left_37 = left_8 - _18;  // op_0 use
> +
> +   Transforming to .SAT_TRUNC will leave the MIN_EXPR still live.  Then the
> +   backend (for example x86/riscv) will generate 2-3 more insns for .SAT_TRUNC
> +   besides the MIN_EXPR.  Thus, keeping the normal truncation as is should be
> +   the better choice.  */
>  (match (unsigned_integer_sat_trunc @0)
> - (convert (min @0 INTEGER_CST@1))
> + (convert (min@2 @0 INTEGER_CST@1))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2))
>   (with
>{
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> diff --git a/gcc/testsuite/gcc.target/i386/pr115863-1.c 
> b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> new file mode 100644
> index 000..a672f62cec5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115863-1.c
> @@ -0,0 +1,37 @@
> +/* PR target/115863 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +#include 
> +
> +typedef struct z_stream_s {
> +uint32_t avail_out;
> +} z_stream;
> +
> +typedef z_stream *z_streamp;
> +
> +extern int deflate (z_streamp strmp);
> +
> +int compress2 (uint64_t *destLen)
> +{
> +  z_stream stream;
> +  int err;
> +  const uint32_t max = (uint32_t)(-1);
> +  uint64_t left;
> +
> +  left = *destLen;
> +
> +  stream.avail_out = 0;
> +
> +  do {
> +if (stream.avail_out == 0) {
> +stream.avail_out = left > (uint64_t)max ? max : (uint32_t)left;
> +left -= stream.avail_out;
> +}
> +err = deflate(&stream);
> +} while (err == 0);
> +
> +  return err;
> +}
> +
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> --
> 2.34.1
>


RE: [PATCH v2] RISC-V: More support of vx and vf for autovec comparison

2024-07-19 Thread Li, Pan2
> +  TEST_COND_IMM_FLOAT (T, >, 0.0, _gt)   
> \
>  +  TEST_COND_IMM_FLOAT (T, <, 0.0, _lt)  
> \
>  +  TEST_COND_IMM_FLOAT (T, >=, 0.0, _ge) 
> \
>  +  TEST_COND_IMM_FLOAT (T, <=, 0.0, _le) 
> \
>  +  TEST_COND_IMM_FLOAT (T, ==, 0.0, _eq) 
> \
>  +  TEST_COND_IMM_FLOAT (T, !=, 0.0, _ne) 
> \

Just curious, does this patch cover the case where the float imm is -0.0 
(notice only +0.0 is mentioned)?
If so we can add tests for -0.0 similar to the +0.0 ones here.

It is totally Ok if -0.0f is not applicable here.

Pan

-Original Message-
From: demin.han  
Sent: Friday, July 19, 2024 4:55 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Li, Pan2 ; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: [PATCH v2] RISC-V: More support of vx and vf for autovec comparison

There are still some cases which can't utilize vx or vf after the
last_combine pass.

1. integer comparison when imm isn't in range of [-16, 15]
2. float imm is 0.0
3. DI or DF mode under RV32

This patch fixes the above mentioned issues.

Tested on RV32 and RV64.

Signed-off-by: demin.han 
gcc/ChangeLog:

* config/riscv/autovec.md: register_operand to nonmemory_operand
* config/riscv/riscv-v.cc (get_cmp_insn_code): Select code according
* to scalar_p
(expand_vec_cmp): Generate scalar_p and transform op1
* config/riscv/riscv.cc (riscv_const_insns): Add !FLOAT_MODE_P
* constrain

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Fix and add test

Signed-off-by: demin.han 
---
V2 changes:
  1. remove unnecessary add_integer_operand and related code
  2. fix one format issue
  3. split patch and make it only related to vec cmp

 gcc/config/riscv/autovec.md   |  2 +-
 gcc/config/riscv/riscv-v.cc   | 57 +++
 gcc/config/riscv/riscv.cc |  2 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 48 +++-
 4 files changed, 82 insertions(+), 27 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d5793acc999..a772153 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -690,7 +690,7 @@ (define_expand "vec_cmp"
   [(set (match_operand: 0 "register_operand")
(match_operator: 1 "comparison_operator"
  [(match_operand:V_VLSF 2 "register_operand")
-  (match_operand:V_VLSF 3 "register_operand")]))]
+  (match_operand:V_VLSF 3 "nonmemory_operand")]))]
   "TARGET_VECTOR"
   {
 riscv_vector::expand_vec_cmp_float (operands[0], GET_CODE (operands[1]),
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e290675bbf0..56328075aeb 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2624,32 +2624,27 @@ expand_vec_init (rtx target, rtx vals)
 /* Get insn code for corresponding comparison.  */
 
 static insn_code
-get_cmp_insn_code (rtx_code code, machine_mode mode)
+get_cmp_insn_code (rtx_code code, machine_mode mode, bool scalar_p)
 {
   insn_code icode;
-  switch (code)
+  if (FLOAT_MODE_P (mode))
 {
-case EQ:
-case NE:
-case LE:
-case LEU:
-case GT:
-case GTU:
-case LTGT:
-  icode = code_for_pred_cmp (mode);
-  break;
-case LT:
-case LTU:
-case GE:
-case GEU:
-  if (FLOAT_MODE_P (mode))
-   icode = code_for_pred_cmp (mode);
+  icode = !scalar_p ? code_for_pred_cmp (mode)
+   : code_for_pred_cmp_scalar (mode);
+  return icode;
+}
+  if (scalar_p)
+{
+  if (code == GE || code == GEU)
+   icode = code_for_pred_ge_scalar (mode);
   else
-   icode = code_for_pred_ltge (mode);
-  break;
-default:
-  gcc_unreachable ();
+   icode = code_for_pred_cmp_scalar (mode);
+  return icode;
 }
+  if (code == LT || code == LTU || code == GE || code == GEU)
+icode = code_for_pred_ltge (mode);
+  else
+icode = code_for_pred_cmp (mode);
   return icode;
 }
 
@@ -2771,7 +2766,6 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx 
op1, rtx mask,
 {
   machine_mode mask_mode = GET_MODE (target);
   machine_mode data_mode = GET_MODE (op0);
-  insn_code icode = get_cmp_insn_code (code, data_mode);
 
   if (code == LTGT)
 {
@@ -2779,12 +2773,29 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx 
op1, rtx mask,
   rtx gt = gen_reg_rtx (mask_mode);
   expand_vec_cmp (lt, LT, op0, op1, mask, maskoff);
   expand_vec_cmp (gt, GT, op0, op1, mask, maskoff);
-  icode = code_for_pred (IOR, mask_mode);
+  insn_code icode = code_for_pred (IOR, mask_mode);
   rtx ops[] = {target, l

RE: [PATCH v1] Internal-fn: Only allow modes describe types for internal fn[PR115961]

2024-07-19 Thread Li, Pan2
Thanks Richard S for comments and suggestions, updated in v2.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Friday, July 19, 2024 3:46 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Only allow modes describe types for 
internal fn[PR115961]

pan2...@intel.com writes:
> From: Pan Li 
>
> The direct_internal_fn_supported_p has no restrictions for the type
> modes.  For example, the bitfield below will be recognized as .SAT_TRUNC.
>
> struct e
> {
>   unsigned pre : 12;
>   unsigned a : 4;
> };
>
> __attribute__((noipa))
> void bug (e * v, unsigned def, unsigned use) {
>   e & defE = *v;
>   defE.a = min_u (use + 1, 0xf);
> }
>
> This patch would like to add checks to direct_internal_fn_supported_p,
> and only allow the tree types described by modes.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
>
>   PR target/115961
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (mode_describle_type_precision_p): Add new func
>   impl to check if the mode describes the tree type.
>   (direct_internal_fn_supported_p): Add above check for the first
>   and second tree type of tree pair.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/i386/pr115961-run-1.C: New test.
>   * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc| 21 
>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>  3 files changed, 89 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 95946bfd683..4dc69264a24 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4164,6 +4164,23 @@ direct_internal_fn_optab (internal_fn fn)
>gcc_unreachable ();
>  }
>  
> +/* Return true if the mode describes the precision of tree type,  or false.  
> */
> +
> +static bool
> +mode_describle_type_precision_p (const_tree type)

Bit pedantic, but it's not really just about precision.  For floats
and vectors it's also about format.  Maybe:

/* Return true if TYPE's mode has the same format as TYPE, and if there is
   a 1:1 correspondence between the values that the mode can store and the
   values that the type can store.  */

And maybe my mode_describes_type_p suggestion wasn't the best,
but given that it's not just about precision, I'm not sure about
mode_describle_type_precision_p either.  How about:

  type_strictly_matches_mode_p

?  I'm open to other suggestions.

> +{
> +  if (VECTOR_TYPE_P (type))
> +return VECTOR_MODE_P (TYPE_MODE (type));
> +
> +  if (INTEGRAL_TYPE_P (type))
> +return type_has_mode_precision_p (type);
> +
> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
> +return true;
> +
> +  return false;
> +}
> +
>  /* Return true if FN is supported for the types in TYPES when the
> optimization type is OPT_TYPE.  The types are those associated with
> the "type0" and "type1" fields of FN's direct_internal_fn_info
> @@ -4173,6 +4190,10 @@ bool
>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>   optimization_type opt_type)
>  {
> +  if (!mode_describle_type_precision_p (types.first)
> +|| !mode_describle_type_precision_p (types.second))

Formatting nit: the "||" should line up with the "!".

LGTM otherwise.

Thanks,
Richard

> +return false;
> +
>switch (fn)
>  {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
> diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> @@ -0,0 +1,34 @@
> +/* PR target/115961 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +struct e
> +{
> +  unsigned pre : 12;
> +  unsigned a : 4;
> +};
> +
> +static unsigned min_u (unsigned a, unsigned b)
> +{
> +  return (b < a) ? b : a;
> +}
> +
> +__attribute__((noipa))
> +void bug (e * v, unsigned def, unsigned use) {
> +  e & defE 

RE: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-22 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Monday, July 15, 2024 6:35 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

From: Pan Li 

Update in v3:
* Rebase the upstream.
* Adjust asm check.

Original log:
This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {\
bool overflow = x > (WT)(NT)(-1);  \
return ((NT)x) | (NT)-overflow;\
  }

DEF_SAT_U_TRUC_FMT_1(uint32_t, uint64_t)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  _Bool overflow;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  uint8_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  overflow_5 = x_4(D) > 255;
  _1 = (unsigned char) x_4(D);
  _2 = (unsigned char) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
;;succ:   EXIT

}

After this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  uint8_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
;;succ:   EXIT

}
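As a side note, the branchless idiom being matched is equivalent to the obvious branchy saturation. A minimal host-side C sketch (not part of the patch; function names are illustrative) can be used to sanity-check the semantics:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branchless unsigned saturating truncation, the shape matched as .SAT_TRUNC.  */
uint8_t sat_u_truc_u16_to_u8_branchless (uint16_t x)
{
  bool overflow = x > (uint16_t)(uint8_t)-1;   /* x > 255 */
  return ((uint8_t)x) | (uint8_t)-overflow;    /* OR with all-ones on overflow */
}

/* Reference branchy form for comparison.  */
uint8_t sat_u_truc_u16_to_u8_ref (uint16_t x)
{
  return x > 255 ? 255 : (uint8_t)x;
}
```

Both return 255 for any input above 255 and pass the input through unchanged otherwise.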

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_DOUBLE_TRUNC): Add new iterator
for int double truncation.
(ANYI_DOUBLE_TRUNCATED): Add new attr for int double truncation.
(anyi_double_truncated): Ditto but for lowercase.
* config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new
func decl for expanding ustrunc
* config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func
impl to expand ustrunc.
* config/riscv/riscv.md (ustrunc2): Impl
the new pattern ustrunc2 for int.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Adjust
asm check times from 2 to 4.
* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_arith_data.h: New test.
* gcc.target/riscv/sat_u_trunc-1.c: New test.
* gcc.target/riscv/sat_u_trunc-2.c: New test.
* gcc.target/riscv/sat_u_trunc-3.c: New test.
* gcc.target/riscv/sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/scalar_sat_unary.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/iterators.md | 10 
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv.cc | 40 +
 gcc/config/riscv/riscv.md | 10 
 .../rvv/autovec/unop/vec_sat_u_trunc-1.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 16 ++
 .../gcc.target/riscv/sat_arith_data.h | 56 +++
 .../gcc.target/riscv/sat_u_trunc-1.c  | 17 ++
 .../gcc.target/riscv/sat_u_trunc-2.c  | 20 +++
 .../gcc.target/riscv/sat_u_trunc-3.c  | 19 +++
 .../gcc.target/riscv/sat_u_trunc-run-1.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-2.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-3.c  | 16 ++
 .../gcc.target/riscv/scalar_sat_unary.h   | 22 
 14 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith_data.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_unary.h

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index d61ed53a8b1..734da041f0c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -65,6 +65,16 @@ (define_mode_iterator SUBX [QI HI (SI "TARGET_64BIT")])
 ;; Iterator for hardware-supported integer modes.
 (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
+(define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")])
+
+(define_mode_attr ANYI_DOUBLE_TRUNCATED [
+  (HI "QI") (SI "HI") (DI "SI")
+])
+
+(define_mode_attr anyi_double_truncated [
+  (HI "qi") (SI "hi") (DI "si")
+])
+
 ;; Iterator 

RE: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-22 Thread Li, Pan2
Committed, thanks Robin.

Pan

-Original Message-
From: Robin Dapp  
Sent: Monday, July 22, 2024 11:27 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Implement the .SAT_TRUNC for scalar

LGTM.

-- 
Regards
 Robin



RE: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-23 Thread Li, Pan2
> Just a slight comment improvement:
> /* Returns true if both types of TYPE_PAIR strictly match their modes,
> else returns false.  */

> This testcase could go in g++.dg/torture/ without the -O3 option.

> Since we are scanning for the negative it should pass on all targets
> even ones without SAT_TRUNC support. And then you should not need the
> other testcase either.

Thanks all, will address above comments and commit it if no surprise from test.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Tuesday, July 23, 2024 10:03 PM
To: Richard Biener 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal 
fn[PR115961]

Richard Biener  writes:
> On Fri, Jul 19, 2024 at 1:10 PM  wrote:
>>
>> From: Pan Li 
>>
>> The direct_internal_fn_supported_p has no restrictions for the type
>> modes.  For example, the bitfield below will be recognized as .SAT_TRUNC.
>>
>> struct e
>> {
>>   unsigned pre : 12;
>>   unsigned a : 4;
>> };
>>
>> __attribute__((noipa))
>> void bug (e * v, unsigned def, unsigned use) {
>>   e & defE = *v;
>>   defE.a = min_u (use + 1, 0xf);
>> }
>>
>> This patch would like to add a strict check to
>> direct_internal_fn_supported_p,
>> and only allow types that strictly match their modes for the ifn tree
>> type pair.
>>
>> The below test suites are passed for this patch:
>> 1. The rv64gcv fully regression tests.
>> 2. The x86 bootstrap tests.
>> 3. The x86 fully regression tests.
>
> LGTM unless Richard S. has any more comments.

LGTM too with Andrew's comments addressed.

Thanks,
Richard

>
> Richard.
>
>> PR target/115961
>>
>> gcc/ChangeLog:
>>
>> * internal-fn.cc (type_strictly_matches_mode_p): Add new func
>> impl to check type strictly matches mode or not.
>> (type_pair_strictly_matches_mode_p): Ditto but for tree type
>> pair.
>> (direct_internal_fn_supported_p): Add above check for the tree
>> type pair.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/i386/pr115961-run-1.C: New test.
>> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>>
>> Signed-off-by: Pan Li 
>> ---
>>  gcc/internal-fn.cc| 32 +
>>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>>  3 files changed, 100 insertions(+)
>>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>>
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 95946bfd683..5c21249318e 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -4164,6 +4164,35 @@ direct_internal_fn_optab (internal_fn fn)
>>gcc_unreachable ();
>>  }
>>
>> +/* Return true if TYPE's mode has the same format as TYPE, and if there is
>> +   a 1:1 correspondence between the values that the mode can store and the
>> +   values that the type can store.  */
>> +
>> +static bool
>> +type_strictly_matches_mode_p (const_tree type)
>> +{
>> +  if (VECTOR_TYPE_P (type))
>> +return VECTOR_MODE_P (TYPE_MODE (type));
>> +
>> +  if (INTEGRAL_TYPE_P (type))
>> +return type_has_mode_precision_p (type);
>> +
>> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return true if both the first and the second type of tree pair
>> +   strictly match their modes,  or return false.  */
>> +
>> +static bool
>> +type_pair_strictly_matches_mode_p (tree_pair type_pair)
>> +{
>> +  return type_strictly_matches_mode_p (type_pair.first)
>> +&& type_strictly_matches_mode_p (type_pair.second);
>> +}
>> +
>>  /* Return true if FN is supported for the types in TYPES when the
>> optimization type is OPT_TYPE.  The types are those associated with
>> the "type0" and "type1" fields of FN's direct_internal_fn_info
>> @@ -4173,6 +4202,9 @@ bool
>>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>> optimization_type opt_type)
>>  {
>> +  if (!type_pair_strictly_matches_mode_p (types))
>> +ret

RE: [PATCH v1] Match: Support .SAT_SUB with IMM op for form 1-4

2024-07-26 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, July 26, 2024 9:32 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support .SAT_SUB with IMM op for form 1-4

On Fri, Jul 26, 2024 at 11:20 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support .SAT_SUB when one of the operands
> is an IMM.  Aka the below forms 1-4.
>
> Form 1:
>  #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
>  T __attribute__((noinline)) \
>  sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
>  {   \
>return IMM >= y ? IMM - y : 0;\
>  }
>
> Form 2:
>   #define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_2 (T y)  \
>   {   \
> return IMM > y ? IMM - y : 0; \
>   }
>
> Form 3:
>   #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_3 (T x)  \
>   {   \
> return x >= IMM ? x - IMM : 0;\
>   }
>
> Form 4:
>   #define DEF_SAT_U_SUB_IMM_FMT_4(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_4 (T x)  \
>   {   \
> return x > IMM ? x - IMM : 0; \
>   }
>
> Take the below form 1 as an example:
>
> DEF_SAT_U_SUB_OP0_IMM_FMT_1(uint32_t, 11)
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ uint64_t sat_u_sub_imm11_uint64_t_fmt_1 (uint64_t y)
>6   │ {
>7   │   uint64_t _1;
>8   │   uint64_t _3;
>9   │
>   10   │ ;;   basic block 2, loop depth 0
>   11   │ ;;pred:   ENTRY
>   12   │   if (y_2(D) <= 11)
>   13   │ goto ; [50.00%]
>   14   │   else
>   15   │ goto ; [50.00%]
>   16   │ ;;succ:   3
>   17   │ ;;4
>   18   │
>   19   │ ;;   basic block 3, loop depth 0
>   20   │ ;;pred:   2
>   21   │   _3 = 11 - y_2(D);
>   22   │ ;;succ:   4
>   23   │
>   24   │ ;;   basic block 4, loop depth 0
>   25   │ ;;pred:   2
>   26   │ ;;3
>   27   │   # _1 = PHI <0(2), _3(3)>
>   28   │   return _1;
>   29   │ ;;succ:   EXIT
>   30   │
>   31   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ uint64_t sat_u_sub_imm11_uint64_t_fmt_1 (uint64_t y)
>6   │ {
>7   │   uint64_t _1;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _1 = .SAT_SUB (11, y_2(D)); [tail call]
>   12   │   return _1;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
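For reference, the form 1 (immediate as minuend) and form 3 (immediate as subtrahend) semantics above can be sanity-checked with a small host-side C sketch (illustrative only; not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Form 1: immediate as the minuend, IMM >= y ? IMM - y : 0.  */
uint64_t sat_u_sub_imm11_op0 (uint64_t y)
{
  return 11 >= y ? 11 - y : 0;   /* saturates to 0 once y exceeds 11 */
}

/* Form 3: immediate as the subtrahend, x >= IMM ? x - IMM : 0.  */
uint64_t sat_u_sub_imm11_op1 (uint64_t x)
{
  return x >= 11 ? x - 11 : 0;   /* saturates to 0 for x below 11 */
}
```

Note the two forms saturate on opposite ends: form 1 clamps large inputs, form 3 clamps small ones.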
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * match.pd: Add case 9 and case 10 for .SAT_SUB when one
> of the op is IMM.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cf359b0ec0f..b2e7d61790d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3234,6 +3234,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub with op_0 imm, case 9 (branch with gt):
> +   SAT_U_SUB = IMM > Y  ? (IMM - Y) : 0.
> + = IMM >= Y ? (IMM - Y) : 0.  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (cond^ (le @1 INTEGER_CST@2) (minus INTEGER_CST@0 @1) integer_zerop)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> + && types_match (type, @1))
> + (with
> +  {
> +   unsigned precision = TYPE_PRECISION (type);
> +   wide_int max = wi::mask (precision, false, precision);
> +   wide_int c0 = wi::to_wide (@0);
> +   wide_int c2 = wi::to_wide (@2);
> +   wide_int c2_add_1 = wi::add (c2, wi::uhwi (1, precision));
> +   bool equal_p = wi::eq_p (c0, c2);
> +   bool less_than_1_p = !wi::eq_p (c2, max) && wi::eq_p (c2_add_1, c0);
> +  }
> +  (if (equal_p || less_than_1_p)
> +
> +/* Unsigned saturation sub with op_1 imm, case 10:
> +   SAT_U_SUB = X > IMM  ? (X - IMM) : 0.
> + = X >= IMM ? (X - IMM) : 0.  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (plus (max @0

RE: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is IMM

2024-07-29 Thread Li, Pan2
> OK

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, July 29, 2024 5:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Try .SAT_SUB for PLUS_EXPR when one op is 
IMM

On Sun, Jul 28, 2024 at 5:25 AM  wrote:
>
> From: Pan Li 
>
> After adding the matching for .SAT_SUB when one op is an IMM,  there
> will be a new root PLUS_EXPR for the .SAT_SUB pattern.  For example,
>
> Form 3:
>   #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
>   T __attribute__((noinline)) \
>   sat_u_sub_imm##IMM##_##T##_fmt_3 (T x)  \
>   {   \
> return x >= IMM ? x - IMM : 0;\
>   }
>
> DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 11)
>
> And then we will have the below gimple before widening-mul.  Thus,  try
> .SAT_SUB for the PLUS_EXPR.
>
>4   │ __attribute__((noinline))
>5   │ uint64_t sat_u_sub_imm11_uint64_t_fmt_3 (uint64_t x)
>6   │ {
>7   │   long unsigned int _1;
>8   │   uint64_t _3;
>9   │
>   10   │[local count: 1073741824]:
>   11   │   _1 = MAX_EXPR ;
>   12   │   _3 = _1 + 18446744073709551605;
>   13   │   return _3;
>   14   │
>   15   │ }
>
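The MAX_EXPR-plus-constant shape in the dump above is an equivalent rewrite of the branchy saturating subtraction, relying on unsigned wraparound (18446744073709551605 is (uint64_t)-11). A small C sketch of the identity (illustrative only, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form of the testcase: x >= 11 ? x - 11 : 0.  */
uint64_t sat_sub_branchy (uint64_t x)
{
  return x >= 11 ? x - 11 : 0;
}

/* What the gimple dump shows: MAX_EXPR <x, 11> followed by an unsigned
   wrapping add of -11 (18446744073709551605).  When x <= 11 the max is 11
   and the add wraps exactly to 0; otherwise it computes x - 11.  */
uint64_t sat_sub_max_plus (uint64_t x)
{
  uint64_t m = x > 11 ? x : 11;          /* MAX_EXPR <x_4(D), 11> */
  return m + 18446744073709551605ull;    /* m + (uint64_t)-11, modulo 2^64 */
}
```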
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK

> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Try .SAT_SUB for PLUS_EXPR case.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-ssa-math-opts.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index ac86be8eb94..8d96a4c964b 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6129,6 +6129,7 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>
> case PLUS_EXPR:
>   match_unsigned_saturation_add (&gsi, as_a (stmt));
> + match_unsigned_saturation_sub (&gsi, as_a (stmt));
>   /* fall-through  */
> case MINUS_EXPR:
>   if (!convert_plusminus_to_widen (&gsi, stmt, code))
> --
> 2.34.1
>


RE: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread Li, Pan2
> OK.

Thanks Richard, will wait for the confirmation from Thomas in case I missed
some more failing cases.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, July 29, 2024 4:44 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

On Mon, Jul 29, 2024 at 9:57 AM  wrote:
>
> From: Pan Li 
>
> For some targets, like target=amdgcn-amdhsa, we need to take care of
> vector bool types prior to general vector mode types.  Otherwise we may
> see asm check failures like the below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 4. The amdgcn test case as above.

OK.

Richard.

> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
> for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..086c8be398a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* For target=amdgcn-amdhsa,  we need to take care of vector bool types.
> + More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  
> */
> +  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +&& TYPE_PRECISION (TREE_TYPE (type)) == 1)
> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));
>
> --
> 2.34.1
>


RE: [PATCH v1] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-07-29 Thread Li, Pan2
Thanks Richard S for comments, updated in v2.

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658637.html

Pan

-Original Message-
From: Richard Sandiford  
Sent: Tuesday, July 30, 2024 12:09 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; tamar.christ...@arm.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

pan2...@intel.com writes:
> From: Pan Li 
>
> For some targets, like target=amdgcn-amdhsa, we need to take care of
> vector bool types prior to general vector mode types.  Otherwise we may
> see asm check failures like the below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.
> 4. The amdgcn test case as above.
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (type_strictly_matches_mode_p): Add handling
>   for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..086c8be398a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,12 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* For target=amdgcn-amdhsa,  we need to take care of vector bool types.
> + More details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103.  
> */
> +  if (VECTOR_BOOLEAN_TYPE_P (type) && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +&& TYPE_PRECISION (TREE_TYPE (type)) == 1)

Sorry for the formatting nits, but I think this should be:

  if (VECTOR_BOOLEAN_TYPE_P (type)
  && SCALAR_INT_MODE_P (TYPE_MODE (type))
  && TYPE_PRECISION (TREE_TYPE (type)) == 1)

(one condition per line, indented below "VECTOR").

But I think the comment should give the underlying reason, rather than
treat it as a target oddity.  Maybe something like:

  /* Masked vector operations have both vector data operands and
 vector boolean operands.  The vector data operands are expected
 to have a vector mode, but the vector boolean operands can be
 an integer mode rather than a vector mode, depending on how
 TARGET_VECTORIZE_GET_MASK_MODE is defined.  */

Thanks,
Richard

> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));


RE: [PATCH v1] RISC-V: Take Xmode instead of Pmode for ussub expanding

2024-07-29 Thread Li, Pan2
Committed, thanks Robin.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, July 30, 2024 2:28 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin 
Dapp 
Subject: Re: [PATCH v1] RISC-V: Take Xmode instead of Pmode for ussub expanding

OK.

-- 
Regards
 Robin



RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

2024-07-30 Thread Li, Pan2
Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, July 23, 2024 1:06 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

From: Pan Li 

This patch would like to implement the quad and oct .SAT_TRUNC pattern
in the riscv backend. Aka:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {\
bool overflow = x > (WT)(NT)(-1);  \
return ((NT)x) | (NT)-overflow;\
  }

DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t)

Before this patch:
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   _Bool overflow;
   8   │   short unsigned int _1;
   9   │   short unsigned int _2;
  10   │   short unsigned int _3;
  11   │   uint16_t _6;
  12   │
  13   │ ;;   basic block 2, loop depth 0
  14   │ ;;pred:   ENTRY
  15   │   overflow_5 = x_4(D) > 65535;
  16   │   _1 = (short unsigned int) x_4(D);
  17   │   _2 = (short unsigned int) overflow_5;
  18   │   _3 = -_2;
  19   │   _6 = _1 | _3;
  20   │   return _6;
  21   │ ;;succ:   EXIT
  22   │
  23   │ }

After this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   uint16_t _6;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _6 = .SAT_TRUNC (x_4(D)); [tail call]
  12   │   return _6;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }
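The quad truncation (DI to HI here, i.e. uint64_t to uint16_t) follows the same branchless idiom as the double case; a host-side C sketch of the semantics (illustrative only, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Quad narrowing DI -> HI: uint64_t -> uint16_t with unsigned saturation.  */
uint16_t sat_u_truc_u64_to_u16 (uint64_t x)
{
  bool overflow = x > (uint64_t)(uint16_t)-1;  /* x > 65535 */
  return ((uint16_t)x) | (uint16_t)-overflow;  /* all-ones mask on overflow */
}
```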

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
quad truncation.
(ANYI_OCT_TRUNC): New iterator for oct truncation.
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
(ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
(anyi_quad_truncated): Ditto but for lower case.
(anyi_oct_truncated): Ditto but for lower case.
* config/riscv/riscv.md (ustrunc2):
Add new pattern for quad truncation.
(ustrunc2): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
the expand dump check times.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_arith_data.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-4.c: New test.
* gcc.target/riscv/sat_u_trunc-5.c: New test.
* gcc.target/riscv/sat_u_trunc-6.c: New test.
* gcc.target/riscv/sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/sat_u_trunc-run-6.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/iterators.md | 20 
 gcc/config/riscv/riscv.md | 20 
 .../rvv/autovec/unop/vec_sat_u_trunc-2.c  |  2 +-
 .../rvv/autovec/unop/vec_sat_u_trunc-3.c  |  2 +-
 .../gcc.target/riscv/sat_arith_data.h | 51 +++
 .../gcc.target/riscv/sat_u_trunc-4.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-5.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-6.c  | 20 
 .../gcc.target/riscv/sat_u_trunc-run-4.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-5.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-6.c  | 16 ++
 11 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-6.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 734da041f0c..bdcdb8babc8 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -67,14 +67,34 @@ (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
 (define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")])
 
+(define_mode_iterator ANYI_QUAD_TRUNC [SI (DI "TARGET_64BIT")])
+
+(define_mode_iterator ANYI_OCT_TRUNC [(DI "TARGET_64BIT")])
+
 (define_mode_attr ANYI_DOUBLE_TRUNCATED [
   (HI "QI") (SI "HI") (DI "SI")
 ])
 
+(define_mode_attr ANYI_QUAD_TRUNCATED [
+  (SI "QI") (DI "HI")
+])
+
+(define_mode_attr ANYI_OCT_TRUNCATED [
+  (DI "QI")
+])
+
 (define_mode_attr anyi_dou

RE: [PATCH v2] Internal-fn: Handle vector bool type for type strict match mode [PR116103]

2024-08-01 Thread Li, Pan2
> Still OK.

Thanks Richard, let me wait the final confirmation from Richard S.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, July 30, 2024 5:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Internal-fn: Handle vector bool type for type strict 
match mode [PR116103]

On Tue, Jul 30, 2024 at 5:08 AM  wrote:
>
> From: Pan Li 
>
> For some targets, like target=amdgcn-amdhsa, we need to take care of
> vector bool types prior to general vector mode types.  Otherwise we may
> see asm check failures like the below.
>
> gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 80
> gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, 
> s[0-9]+, v[0-9]+ 56
> gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
>
> The below test suites are passed for this patch.
> 1. The rv64gcv full regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 full regression tests.
> 4. The amdgcn test case as above.

Still OK.

Richard.

> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add handling
> for vector bool type.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8a2e07f2f96..966594a52ed 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4171,6 +4171,16 @@ direct_internal_fn_optab (internal_fn fn)
>  static bool
>  type_strictly_matches_mode_p (const_tree type)
>  {
> +  /* The masked vector operations have both vector data operands and vector
> + boolean operands.  The vector data operands are expected to have a 
> vector
> + mode,  but the vector boolean operands can be an integer mode rather 
> than
> + a vector mode,  depending on how TARGET_VECTORIZE_GET_MASK_MODE is
> + defined.  PR116103.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type)
> +  && SCALAR_INT_MODE_P (TYPE_MODE (type))
> +  && TYPE_PRECISION (TREE_TYPE (type)) == 1)
> +return true;
> +
>if (VECTOR_TYPE_P (type))
>  return VECTOR_MODE_P (TYPE_MODE (type));
>
> --
> 2.34.1
>


RE: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch

2024-06-14 Thread Li, Pan2
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, June 14, 2024 3:33 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction 
mismatch

LGTM, thanks :)

On Fri, Jun 14, 2024 at 3:02 PM  wrote:
>
> From: Pan Li 
>
> We have a vec_extract pattern which takes ZVFHMIN as the mode
> iterator of the V mode, aka the VF_ZVFHMIN iterator.  But it will
> expand to the pred_extract_first pattern, which takes ZVFH as the mode
> iterator of the V mode, aka VF.  The mismatch will result in an ICE
> similar to the one below:
>
> insn 30 29 31 2 (set (reg:HF 156 [ _2 ])
> (unspec:HF [
> (vec_select:HF (reg:RVVMF2HF 134 [ _1 ])
> (parallel [
> (const_int 0 [0])
> ]))
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)) "compress_run-2.c":22:3 -1
>  (nil))
> during RTL pass: vregs
> compress_run-2.c:25:1: internal compiler error: in extract_insn, at
> recog.cc:2812
> 0xb3bc47 _fatal_insn(char const*, rtx_def const*, char const*, int, char
> const*)
> ../../../gcc/gcc/rtl-error.cc:108
> 0xb3bc69 _fatal_insn_not_found(rtx_def const*, char const*, int, char
> const*)
> ../../../gcc/gcc/rtl-error.cc:116
> 0xb3a545 extract_insn(rtx_insn*)
> ../../../gcc/gcc/recog.cc:2812
> 0x1010e9e instantiate_virtual_regs_in_insn
> ../../../gcc/gcc/function.cc:1612
> 0x1010e9e instantiate_virtual_regs
> ../../../gcc/gcc/function.cc:1995
> 0x1010e9e execute
> ../../../gcc/gcc/function.cc:2042
>
> The below test suites are passed for this patch.
> 1. The rv64gcv full regression test.
> 2. The rv64gcv build with glibc.
>
> There may be other similar issues for the mismatch; we will take care
> of them with test cases one by one.
>
> PR target/115456
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Leverage V_ZVFH instead of V
> which contains the VF_ZVFHMIN for alignment.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr115456-2.c: New test.
> * gcc.target/riscv/rvv/base/pr115456-3.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/vector-iterators.md  |  4 ++-
>  .../gcc.target/riscv/rvv/base/pr115456-2.c| 31 +++
>  .../gcc.target/riscv/rvv/base/pr115456-3.c| 31 +++
>  3 files changed, 65 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index 47392d0da4c..43137a2a379 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -1578,9 +1578,11 @@ (define_mode_iterator VLS_ZVFH [VLSI VLSF])
>
>  (define_mode_iterator V [VI VF_ZVFHMIN])
>
> +(define_mode_iterator V_ZVFH [VI VF])
> +
>  (define_mode_iterator V_VLS [V VLS])
>
> -(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH])
> +(define_mode_iterator V_VLS_ZVFH [V_ZVFH VLS_ZVFH])
>
>  (define_mode_iterator V_VLSI [VI VLSI])
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
> new file mode 100644
> index 000..453e18b1c79
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
> @@ -0,0 +1,31 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvfhmin -mrvv-vector-bits=zvl -mabi=lp64d 
> -O3 -ftree-vectorize" } */
> +
> +#include 
> +#include 
> +
> +typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
> +
> +vnx4f __attribute__ ((noinline, noclone))
> +test_5 (vnx4f x, vnx4f y)
> +{
> +  return __builtin_shufflevector (x, y, 1, 3, 6, 7);
> +}
> +
> +int
> +main (void)
> +{
> +  vnx4f test_5_x = {0, 1, 3, 4};
> +  vnx4f test_5_y = {4, 5, 6, 7};
> +  vnx4f test_5_except = {1, 4, 6, 7};
> +  vnx4f test_5_real;
> +  test_5_real = test_5 (test_5_x, test_5_y);
> +
> +  for (int i = 0; i < 4; i++)
> +assert (test_5_real[i] == test_5_except[i]);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {call\s+__extendhfsf2} 8 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
> new file mode 100644
> index 000..2c54f

RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-14 Thread Li, Pan2
Thanks Richard for comments.

> :c shouldn't be necessary on the plus
> or on the bit_xor
> OK with those changes.

Will remove the :c and commit it if there are no surprises from the test suites.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 14, 2024 4:05 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

On Wed, Jun 12, 2024 at 2:38 PM  wrote:
>
> From: Pan Li 
>
> After we support the scalar unsigned forms 1 and 2,  we would like
> to introduce more forms, including both branch and branchless.  The
> forms 3-10 are listed below:
>
> Form 3:
>   #define SAT_SUB_U_3(T) \
>   T sat_sub_u_3_##T (T x, T y) \
>   { \
> return x > y ? x - y : 0; \
>   }
>
> Form 4:
>   #define SAT_SUB_U_4(T) \
>   T sat_sub_u_4_##T (T x, T y) \
>   { \
> return x >= y ? x - y : 0; \
>   }
>
> Form 5:
>   #define SAT_SUB_U_5(T) \
>   T sat_sub_u_5_##T (T x, T y) \
>   { \
> return x < y ? 0 : x - y; \
>   }
>
> Form 6:
>   #define SAT_SUB_U_6(T) \
>   T sat_sub_u_6_##T (T x, T y) \
>   { \
> return x <= y ? 0 : x - y; \
>   }
>
> Form 7:
>   #define SAT_SUB_U_7(T) \
>   T sat_sub_u_7_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)(overflow - 1); \
>   }
>
> Form 8:
>   #define SAT_SUB_U_8(T) \
>   T sat_sub_u_8_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)-(!overflow); \
>   }
>
> Form 9:
>   #define SAT_SUB_U_9(T) \
>   T sat_sub_u_9_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Form 10:
>   #define SAT_SUB_U_10(T) \
>   T sat_sub_u_10_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return !overflow ? ret : 0; \
>   }
>
> Take form 10 as example:
>
> SAT_SUB_U_10(uint64_t);
>
> Before this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   unsigned char _2;
>   uint8_t _3;
>   __complex__ unsigned char _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 == 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   3
> ;;4
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;3
>   # _3 = PHI <0(2), _1(3)>
>   return _3;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   uint8_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
>   return _3;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv full regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 full regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add more match for unsigned sat_sub.
> * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> func impl to match phi node for .SAT_SUB.
> (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
> for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 25 +++--
>  gcc/tree-ssa-math-opts.cc | 33 +
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cfe81e80b3..66e411b3359 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */

RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-06-14 Thread Li, Pan2
Hi Richard,

Here is one PR related to this patch (by git bisect), details as below.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458

I am still trying to narrow down which change caused this failure; any hints here?

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 15, 2024 5:39 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

The following removes the profile based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location which
the profile based heuristic should have guaranteed as well.

To avoid regressing this moves the empty-latch check to cover all
sink cases.

It also stream-lines the resulting select_best_block a bit but avoids
adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
starts execute failing with this on x86_64 with -m32 because the
(float)i * 9....e-7 compute is sunk across a STOP causing it
to be no longer spilled and thus the compare failing due to excess
precision.  The patch adds -ffloat-store to avoid this, following
other similar testcases.

This change doesn't fix the testcase in the PR on itself.

Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.

PR tree-optimization/114589
* tree-ssa-sink.cc (select_best_block): Remove profile-based
heuristics.  Instead reject sink locations that sink
to post-dominators.  Move empty latch check here from
statement_sink_location.  Also consider early_bb for the
loop depth check.
(statement_sink_location): Remove superfluous check.  Remove
empty latch check.
(pass_sink_code::execute): Compute/release post-dominators.

* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
excess precision when not spilling.
---
 gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
 gcc/tree-ssa-sink.cc | 62 
 2 files changed, 20 insertions(+), 43 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 
b/gcc/testsuite/gfortran.dg/streamio_9.f90
index b6bddb973f8..f29ded6ba54 100644
--- a/gcc/testsuite/gfortran.dg/streamio_9.f90
+++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-options "-ffloat-store" }
 ! PR29053 Stream IO test 9.
 ! Contributed by Jerry DeLisle .
 ! Test case derived from that given in PR by Steve Kargl.
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2f90acb7ef4..2188b7523c7 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
 
We want the most control dependent block in the shallowest loop nest.
 
-   If the resulting block is in a shallower loop nest, then use it.  Else
-   only use the resulting block if it has significantly lower execution
-   frequency than EARLY_BB to avoid gratuitous statement movement.  We
-   consider statements with VOPS more desirable to move.
-
-   This pass would obviously benefit from PDO as it utilizes block
-   frequencies.  It would also benefit from recomputing frequencies
-   if profile data is not available since frequencies often get out
-   of sync with reality.  */
+   If the resulting block is in a shallower loop nest, then use it.  */
 
 static basic_block
 select_best_block (basic_block early_bb,
@@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
 {
+  /* Walk up the dominator tree, hopefully we'll find a shallower
+loop nest.  */
+  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+
   /* If we've moved into a lower loop nest, then that becomes
 our best block.  */
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
-
-  /* Walk up the dominator tree, hopefully we'll find a shallower
-loop nest.  */
-  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
 
   /* Placing a statement before a setjmp-like function would be invalid
@@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
 return best_bb;
 
+  /* Do not move stmts to post-dominating places on the same loop depth.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
+return early_bb;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+ something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+  && empty_block_p (best_bb))
+return early_bb;
+
   /* Avoid turning an unconditional read into a conditional one when we
  still might want to perform vectorization.  */
   if (best_bb->loop_father == early_bb->loop_father
@@ -233,28 +234,7 @@ se

RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-06-14 Thread Li, Pan2
> It definitely looks like a latent issue being triggered.  Either in LRA
> or in how the target presents itself.

Thanks Richard, will have a try and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 14, 2024 9:11 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH] tree-optimization/114589 - remove profile based sink 
heuristics

On Fri, 14 Jun 2024, Li, Pan2 wrote:

> Hi Richard,
> 
> Here is one PR related to this patch (by git bisect), details as below.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
> 
> I am still trying to narrow down which change caused this failure; any
> hints here?

It definitely looks like a latent issue being triggered.  Either in LRA
or in how the target presents itself.

Richard.

> Pan
> 
> -Original Message-
> From: Richard Biener  
> Sent: Wednesday, May 15, 2024 5:39 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/114589 - remove profile based sink 
> heuristics
> 
> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9....e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.
> 
>   PR tree-optimization/114589
>   * tree-ssa-sink.cc (select_best_block): Remove profile-based
>   heuristics.  Instead reject sink locations that sink
> to post-dominators.  Move empty latch check here from
>   statement_sink_location.  Also consider early_bb for the
>   loop depth check.
>   (statement_sink_location): Remove superfluous check.  Remove
>   empty latch check.
>   (pass_sink_code::execute): Compute/release post-dominators.
> 
>   * gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
>   excess precision when not spilling.
> ---
>  gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
>  gcc/tree-ssa-sink.cc | 62 
>  2 files changed, 20 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 
> b/gcc/testsuite/gfortran.dg/streamio_9.f90
> index b6bddb973f8..f29ded6ba54 100644
> --- a/gcc/testsuite/gfortran.dg/streamio_9.f90
> +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-options "-ffloat-store" }
>  ! PR29053 Stream IO test 9.
>  ! Contributed by Jerry DeLisle .
>  ! Test case derived from that given in PR by Steve Kargl.
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2f90acb7ef4..2188b7523c7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>  
> We want the most control dependent block in the shallowest loop nest.
>  
> -   If the resulting block is in a shallower loop nest, then use it.  Else
> -   only use the resulting block if it has significantly lower execution
> -   frequency than EARLY_BB to avoid gratuitous statement movement.  We
> -   consider statements with VOPS more desirable to move.
> -
> -   This pass would obviously benefit from PDO as it utilizes block
> -   frequencies.  It would also benefit from recomputing frequencies
> -   if profile data is not available since frequencies often get out
> -   of sync with reality.  */
> +   If the resulting block is in a shallower loop nest, then use it.  */
>  
>  static basic_block
>  select_best_block (basic_block early_bb,
> @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
>  {
>basic_block best_bb = late_bb;
>basic_block temp_bb = late_bb;
> -  int threshold;
>  
>while (temp_bb != early_bb)
>  {
> +  /* Walk up the dominator tree, hopefully we'll find a shallower
> +  loop nest.  */
> +  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> +
>/* If we've moved into a lower loop nest, then that becomes
>our best 

RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-14 Thread Li, Pan2
Committed with those changes and test suites passed.

Pan

-Original Message-
From: Li, Pan2 
Sent: Friday, June 14, 2024 4:15 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

Thanks Richard for comments.

> :c shouldn't be necessary on the plus
> or on the bit_xor
> OK with those changes.

Will remove the :c and commit it if there are no surprises from the test suites.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 14, 2024 4:05 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

On Wed, Jun 12, 2024 at 2:38 PM  wrote:
>
> From: Pan Li 
>
> After we support the scalar unsigned forms 1 and 2,  we would like
> to introduce more forms, including both branch and branchless.  The
> forms 3-10 are listed below:
>
> Form 3:
>   #define SAT_SUB_U_3(T) \
>   T sat_sub_u_3_##T (T x, T y) \
>   { \
> return x > y ? x - y : 0; \
>   }
>
> Form 4:
>   #define SAT_SUB_U_4(T) \
>   T sat_sub_u_4_##T (T x, T y) \
>   { \
> return x >= y ? x - y : 0; \
>   }
>
> Form 5:
>   #define SAT_SUB_U_5(T) \
>   T sat_sub_u_5_##T (T x, T y) \
>   { \
> return x < y ? 0 : x - y; \
>   }
>
> Form 6:
>   #define SAT_SUB_U_6(T) \
>   T sat_sub_u_6_##T (T x, T y) \
>   { \
> return x <= y ? 0 : x - y; \
>   }
>
> Form 7:
>   #define SAT_SUB_U_7(T) \
>   T sat_sub_u_7_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)(overflow - 1); \
>   }
>
> Form 8:
>   #define SAT_SUB_U_8(T) \
>   T sat_sub_u_8_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)-(!overflow); \
>   }
>
> Form 9:
>   #define SAT_SUB_U_9(T) \
>   T sat_sub_u_9_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Form 10:
>   #define SAT_SUB_U_10(T) \
>   T sat_sub_u_10_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return !overflow ? ret : 0; \
>   }
>
> Take form 10 as example:
>
> SAT_SUB_U_10(uint64_t);
>
> Before this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   unsigned char _2;
>   uint8_t _3;
>   __complex__ unsigned char _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 == 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   3
> ;;4
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;3
>   # _3 = PHI <0(2), _1(3)>
>   return _3;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   uint8_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
>   return _3;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv full regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 full regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add more match for unsigned sat_sub.
> * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> func impl to match phi node for .SAT_SUB.
> (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
> for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 25 +++--
>  gcc/tree-ssa-math-opts.cc | 33 +
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cfe81e80b3..66e411b3359 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_in

RE: [PATCH v1] Match: Support forms 7 and 8 for the unsigned .SAT_ADD

2024-06-18 Thread Li, Pan2
Thanks Richard for comments.

> we might want to consider such transform in match.pd, in this case this
> would allow to elide one of the patterns.

That makes much more sense to me; it is not a good idea to have many patterns for
SAT_ADD. Will commit this first and try that in another patch.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 18, 2024 7:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support forms 7 and 8 for the unsigned .SAT_ADD

On Mon, Jun 17, 2024 at 3:41 AM  wrote:
>
> From: Pan Li 
>
> When investigating the vectorization of .SAT_ADD,  we noticed there
> are 2 additional forms,  aka forms 7 and 8, for .SAT_ADD.
>
> Form 7:
>   #define DEF_SAT_U_ADD_FMT_7(T)  \
>   T __attribute__((noinline)) \
>   sat_u_add_##T##_fmt_7 (T x, T y)\
>   {   \
> return x > (T)(x + y) ? -1 : (x + y); \
>   }
>
> Form 8:
>   #define DEF_SAT_U_ADD_FMT_8(T)   \
>   T __attribute__((noinline))  \
>   sat_u_add_##T##_fmt_8 (T x, T y) \
>   {\
> return x <= (T)(x + y) ? (x + y) : -1; \
>   }
>
> Thus,  add the above 2 forms to the match gimple_unsigned_integer_sat_add,
> so that the vectorizer can try to recognize the patterns of forms 7
> and 8.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv full regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 full regression test.

OK.

Note that fold-const.cc has canonicalization for the minus one to be put last:

  /* If the second operand is simpler than the third, swap them
 since that produces better jump optimization results.  */
  if (truth_value_p (TREE_CODE (arg0))
  && tree_swap_operands_p (op1, op2))
{
  location_t loc0 = expr_location_or (arg0, loc);
  /* See if this can be inverted.  If it can't, possibly because
 it was a floating-point inequality comparison, don't do
 anything.  */
  tem = fold_invert_truthvalue (loc0, arg0);
  if (tem)
return fold_build3_loc (loc, code, type, tem, op2, op1);

we might want to consider such transform in match.pd, in this case this
would allow to elide one of the patterns.

Richard.

> gcc/ChangeLog:
>
> * match.pd: Add form 7 and 8 for the unsigned .SAT_ADD match.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..aae6d30a5e4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3144,6 +3144,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
>integer_minus_onep (usadd_left_part_2 @0 @1)))
>
> +/* Unsigned saturation add, case 7 (branch with le):
> +   SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
> +
> +/* Unsigned saturation add, case 8 (branch with gt):
> +   SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (gt @0 (usadd_left_part_1@2 @0 @1)) integer_minus_onep @2))
> +
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> --
> 2.34.1
>


RE: [PATCH v1] Match: Support form 11 for the unsigned scalar .SAT_SUB

2024-06-18 Thread Li, Pan2
Thanks Richard, will commit this one and then try to reduce the unnecessary
patterns following your suggestion.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 18, 2024 7:08 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 11 for the unsigned scalar .SAT_SUB

On Mon, Jun 17, 2024 at 9:07 AM  wrote:
>
> From: Pan Li 
>
> We missed one match pattern for the unsigned scalar .SAT_SUB,  aka
> form 11.
>
> Form 11:
>   #define SAT_SUB_U_11(T) \
>   T sat_sub_u_11_##T (T x, T y) \
>   { \
> T ret; \
> bool overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Thus,  add the above form 11 to the match pattern gimple_unsigned_integer_sat_sub.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv full regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 full regression test.

OK, but see my other mail.  Eventually sth like

(for cmp (tcc_comparison)
   icmp (inverted_tcc_comparison)
   ncmp (inverted_tcc_comparison_with_nans)
(simplify
 (cond (cmp @0 @1) @2 @3)
 (if (tree_swap_operands_p (@2, @3))
  (with { enum tree_code ic = invert_tree_comparison (cmp, HONOR_NANS (@0)); }
   (if (ic == icmp)
   (cond (icmp @0 @1) @3 @2)
   (if (ic == ncmp)
(cond (ncmp @0 @1) @3 @2))

helps here.  Of course with matching PHIs the above isn't going to help.

> gcc/ChangeLog:
>
> * match.pd: Add form 11 match pattern for .SAT_SUB.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..5c330a43ed0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3186,13 +3186,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> -/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW).  */
> +/* Unsigned saturation sub, case 7 (branch eq with .SUB_OVERFLOW).  */
>  (match (unsigned_integer_sat_sub @0 @1)
>   (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
>(realpart @2) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub, case 8 (branch ne with .SUB_OVERFLOW).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
> +   integer_zerop (realpart @2))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>


RE: [PATCH v1 2/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 3

2024-06-18 Thread Li, Pan2
Committed the series, thanks Juzhe.

Pan

From: 钟居哲 
Sent: Wednesday, June 19, 2024 12:01 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
rdapp.gcc ; Li, Pan2 
Subject: Re: [PATCH v1 2/7] RISC-V: Add testcases for unsigned .SAT_ADD vector 
form 3

lgtm



--Reply to Message--
On Mon, Jun 17, 2024 22:34 PM pan2.li  wrote:
From: Pan Li 

After the middle-end supports form 3 of the unsigned SAT_ADD and
the RISC-V backend implements .SAT_ADD for vector modes, add
more test cases to cover form 3.

Form 3:
  #define DEF_VEC_SAT_U_ADD_FMT_3(T)   \
  void __attribute__((noinline))   \
  vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
T x = op_1[i]; \
T y = op_2[i]; \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret);  \
out[i] = (T)(-overflow) | ret; \
  }\
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 18 +
 .../rvv/autovec/binop/vec_sat_u_add-10.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-11.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-12.c  | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-9.c | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-run-10.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-11.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-12.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-9.c   | 75 +++
 9 files changed, 397 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 57b1bce4bd2..76f393fffbd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -32,12 +32,30 @@ vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }

+#define DEF_VEC_SAT_U_ADD_FMT_3(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  T ret; \
+  T overflow = __builtin_add_overflow (x, y, &ret);  \
+  out[i] = (T)(-overflow) | ret; \
+}\
+}

RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned .SAT_SUB scalar form 11

2024-06-18 Thread Li, Pan2
Committed the series, thanks Juzhe.

Pan


From: 钟居哲 
Sent: Wednesday, June 19, 2024 11:55 AM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
rdapp.gcc ; Li, Pan2 
Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned .SAT_SUB scalar 
form 11

lgtm



--Reply to Message--
On Tue, Jun 18, 2024 16:25 PM Li, Pan2 <pan2...@intel.com> wrote:
From: Pan Li <pan2...@intel.com>

After the middle-end supports form 11 of unsigned SAT_SUB and
the RISC-V backend implements SAT_SUB for vector modes, add
more test cases to cover form 11.

Form 11:
  #define DEF_SAT_U_SUB_FMT_11(T)\
  T __attribute__((noinline))\
  sat_u_sub_##T##_fmt_11 (T x, T y)  \
  {  \
T ret;   \
bool overflow = __builtin_sub_overflow (x, y, &ret); \
return overflow ? 0 : ret;   \
  }

Passed the rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/sat_u_sub-41.c: New test.
* gcc.target/riscv/sat_u_sub-42.c: New test.
* gcc.target/riscv/sat_u_sub-43.c: New test.
* gcc.target/riscv/sat_u_sub-44.c: New test.
* gcc.target/riscv/sat_u_sub-run-41.c: New test.
* gcc.target/riscv/sat_u_sub-run-42.c: New test.
* gcc.target/riscv/sat_u_sub-run-43.c: New test.
* gcc.target/riscv/sat_u_sub-run-44.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 11 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-42.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-43.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-44.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-41.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-42.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-43.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-44.c   | 25 +++
 9 files changed, 183 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-42.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-43.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-44.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-41.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-42.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-43.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-44.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 0f94c5ff087..ab7289a6947 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -2,6 +2,7 @@
 #define HAVE_SAT_ARITH

 #include 
+#include 

 
/**/
 /* Saturation Add (unsigned and signed)   
*/
@@ -140,6 +141,15 @@ sat_u_sub_##T##_fmt_10 (T x, T y)   \
   return !overflow ? ret : 0;   \
 }

+#define DEF_SAT_U_SUB_FMT_11(T)\
+T __attribute__((noinline))\
+sat_u_sub_##T##_fmt_11 (T x, T y)  \
+{  \
+  T ret;   \
+  bool overflow = __builtin_sub_overflow (x, y, &ret); \
+  return overflow ? 0 : ret;   \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -150,5 +160,6 @@ sat_u_sub_##T##_fmt_10 (T x, T y)   \
 #define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
 #define RUN_SAT_U_SUB_FMT_9(T, x, y) sat_u_sub_##T##_fmt_9(x, y)
 #define RUN_SAT_U_SUB_FMT_10(T, x, y) sat_u_sub_##T##_fmt_10(x, y)
+#define RUN_SAT_U_SUB_FMT_11(T, x, y) sat_u_sub_##T##_fmt_11(x, y)

 #endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c
new file mode 100644
index 000..dd13f94e40f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_11:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\

RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-19 Thread Li, Pan2
Hi Richard,

Given that almost all of the unsigned SAT_ADD/SAT_SUB patches are merged, I 
revisited the original code pattern, aka the zip benchmark.
It may look like below:

void test (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
{
  unsigned m = 0, n = count;
  register uint16_t *p;

  p = x;

  do {
m = *--p;

*p = (uint16_t)(m >= wsize ? m-wsize : 0); // There will be a conversion 
here.
  } while (--n);
}

And we have the 179t tree pass dump as below:

   [local count: 1073741824]:
  # n_3 = PHI 
  # p_4 = PHI 
  p_10 = p_4 + 18446744073709551614;
  _1 = *p_10;
  m_11 = (unsigned int) _1;
  _2 = m_11 - wsize_12(D);
  iftmp.0_13 = (short unsigned int) _2;
  _18 = m_11 >= wsize_12(D);
  iftmp.0_5 = _18 ? iftmp.0_13 : 0;
  *p_10 = iftmp.0_5;

The above form doesn't hit any form we currently support in match.pd. So I 
have one idea, which is to convert

uint16 d, tmp;
uint32 a, b, m;

m = a - b;
tmp = (uint16)m;
d = a >= b ? tmp : 0;

to

d = (uint16)(.SAT_SUB (a, b));

I am not very sure it is reasonable to make it work; it may end up with a 
gimple assignment with a convert similar to the below (and may require the 
help of vectorizable_conversion?).
I would like to get some hints from you before the next step, thanks a lot.

patt_34 = .SAT_SUB (m_11, wsize_12(D));
patt_35 = (vector([8,8]) short unsigned int) patt_34;

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 14, 2024 4:05 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

On Wed, Jun 12, 2024 at 2:38 PM  wrote:
>
> From: Pan Li 
>
> After we support the scalar unsigned forms 1 and 2,  we would like
> to introduce more forms, including the branch and branchless ones.  There
> are forms 3-10, listed as below:
>
> Form 3:
>   #define SAT_SUB_U_3(T) \
>   T sat_sub_u_3_##T (T x, T y) \
>   { \
> return x > y ? x - y : 0; \
>   }
>
> Form 4:
>   #define SAT_SUB_U_4(T) \
>   T sat_sub_u_4_##T (T x, T y) \
>   { \
> return x >= y ? x - y : 0; \
>   }
>
> Form 5:
>   #define SAT_SUB_U_5(T) \
>   T sat_sub_u_5_##T (T x, T y) \
>   { \
> return x < y ? 0 : x - y; \
>   }
>
> Form 6:
>   #define SAT_SUB_U_6(T) \
>   T sat_sub_u_6_##T (T x, T y) \
>   { \
> return x <= y ? 0 : x - y; \
>   }
>
> Form 7:
>   #define SAT_SUB_U_7(T) \
>   T sat_sub_u_7_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)(overflow - 1); \
>   }
>
> Form 8:
>   #define SAT_SUB_U_8(T) \
>   T sat_sub_u_8_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)-(!overflow); \
>   }
>
> Form 9:
>   #define SAT_SUB_U_9(T) \
>   T sat_sub_u_9_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Form 10:
>   #define SAT_SUB_U_10(T) \
>   T sat_sub_u_10_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return !overflow ? ret : 0; \
>   }
>
> Take form 10 as example:
>
> SAT_SUB_U_10(uint64_t);
>
> Before this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   unsigned char _2;
>   uint8_t _3;
>   __complex__ unsigned char _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 == 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   3
> ;;4
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;3
>   # _3 = PHI <0(2), _1(3)>
>   return _3;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   uint8_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
>   return _3;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add more match for unsigned sat_sub.
> * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> func impl to match 

RE: [PATCH v1 1/8] RISC-V: Add testcases for unsigned .SAT_SUB vector form 3

2024-06-19 Thread Li, Pan2
Committed the series, thanks Juzhe.

Pan

From: 钟居哲 
Sent: Wednesday, June 19, 2024 9:20 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
rdapp.gcc ; Li, Pan2 
Subject: Re: [PATCH v1 1/8] RISC-V: Add testcases for unsigned .SAT_SUB vector 
form 3

lgtm



--Reply to Message--
On Wed, Jun 19, 2024 21:17 PM pan2.li <pan2...@intel.com> wrote:
From: Pan Li <pan2...@intel.com>

After the middle-end supports form 3 of unsigned SAT_SUB and
the RISC-V backend implements .SAT_SUB for vector modes, add
more test cases to cover that.

Form 3:
  #define DEF_VEC_SAT_U_SUB_FMT_3(T)   \
  void __attribute__((noinline))   \
  vec_sat_u_sub_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
T x = op_1[i]; \
T y = op_2[i]; \
out[i] = x > y ? x - y : 0;\
  }\
  }

Passed the rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-9.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   | 17 +
 .../rvv/autovec/binop/vec_sat_u_sub-10.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_sub-11.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_sub-12.c  | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-9.c | 19 +
 .../rvv/autovec/binop/vec_sat_u_sub-run-10.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_sub-run-11.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_sub-run-12.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_sub-run-9.c   | 75 +++
 9 files changed, 396 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 443f88261ba..182cf2cf064 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -167,9 +167,26 @@ vec_sat_u_sub_##T##_fmt_2 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 }\
 }

+#define DEF_VEC_SAT_U_SUB_FMT_3(T)   \
+void __attribute__((noinline))   \
+vec_sat_u_sub_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  T x = op_1[i]; \
+  T y = op_2[i]; \
+  out[i] = x > y ? x - y : 0;\
+}\
+}
+
 #define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N)
+
 #define RUN_VEC_SAT_U_SUB_FMT_2(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_2(out, op_1, op_2, N)

RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-19 Thread Li, Pan2
Got it. Thanks Richard for suggestion.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, June 19, 2024 4:00 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

On Wed, Jun 19, 2024 at 9:37 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> Given almost all unsigned SAT_ADD/SAT_SUB patches are merged, I revisit the 
> original code pattern aka zip benchmark.
> It may look like below:
>
> void test (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
> {
>   unsigned m = 0, n = count;
>   register uint16_t *p;
>
>   p = x;
>
>   do {
> m = *--p;
>
> *p = (uint16_t)(m >= wsize ? m-wsize : 0); // There will be a conversion 
> here.
>   } while (--n);
> }
>
> And we can have 179 tree pass as below:
>
>[local count: 1073741824]:
>   # n_3 = PHI 
>   # p_4 = PHI 
>   p_10 = p_4 + 18446744073709551614;
>   _1 = *p_10;
>   m_11 = (unsigned int) _1;
>   _2 = m_11 - wsize_12(D);
>   iftmp.0_13 = (short unsigned int) _2;
>   _18 = m_11 >= wsize_12(D);
>   iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>   *p_10 = iftmp.0_5;
>
> The above form doesn't hit any form we have supported in match.pd. Then I 
> have one idea that to convert
>
> uint16 d, tmp;
> uint32 a, b, m;
>
> m = a - b;
> tmp = (uint16)m;
> d = a >= b ? tmp : 0;
>
> to
>
> d = (uint16)(.SAT_SUB (a, b));

The key here is to turn this into

 m = a - b;
 tmp = a >= b ? m : 0;
 d = (uint16) tmp;

I guess?  We probably have the reverse transform, turn
(uint16) a ? b : c; into a ? (uint16)b : (uint16)c if any of the arms simplifies.

OTOH if you figure the correct rules for the allowed conversions adjusting the
pattern matching to allow a conversion on the subtract would work.

> I am not very sure it is reasonable to make it work, it may have gimple 
> assignment with convert similar as below (may require the help of 
> vectorize_conversion?).
> Would like to get some hint from you before the next step, thanks a lot.
>
> patt_34 = .SAT_SUB (m_11, wsize_12(D));
> patt_35 = (vector([8,8]) short unsigned int) patt_34;
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 14, 2024 4:05 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
> .SAT_SUB
>
> On Wed, Jun 12, 2024 at 2:38 PM  wrote:
> >
> > From: Pan Li 
> >
> > After we support the scalar unsigned form 1 and 2,  we would like
> > to introduce more forms include the branch and branchless.  There
> > are forms 3-10 list as below:
> >
> > Form 3:
> >   #define SAT_SUB_U_3(T) \
> >   T sat_sub_u_3_##T (T x, T y) \
> >   { \
> > return x > y ? x - y : 0; \
> >   }
> >
> > Form 4:
> >   #define SAT_SUB_U_4(T) \
> >   T sat_sub_u_4_##T (T x, T y) \
> >   { \
> > return x >= y ? x - y : 0; \
> >   }
> >
> > Form 5:
> >   #define SAT_SUB_U_5(T) \
> >   T sat_sub_u_5_##T (T x, T y) \
> >   { \
> > return x < y ? 0 : x - y; \
> >   }
> >
> > Form 6:
> >   #define SAT_SUB_U_6(T) \
> >   T sat_sub_u_6_##T (T x, T y) \
> >   { \
> > return x <= y ? 0 : x - y; \
> >   }
> >
> > Form 7:
> >   #define SAT_SUB_U_7(T) \
> >   T sat_sub_u_7_##T (T x, T y) \
> >   { \
> > T ret; \
> > T overflow = __builtin_sub_overflow (x, y, &ret); \
> > return ret & (T)(overflow - 1); \
> >   }
> >
> > Form 8:
> >   #define SAT_SUB_U_8(T) \
> >   T sat_sub_u_8_##T (T x, T y) \
> >   { \
> > T ret; \
> > T overflow = __builtin_sub_overflow (x, y, &ret); \
> > return ret & (T)-(!overflow); \
> >   }
> >
> > Form 9:
> >   #define SAT_SUB_U_9(T) \
> >   T sat_sub_u_9_##T (T x, T y) \
> >   { \
> > T ret; \
> > T overflow = __builtin_sub_overflow (x, y, &ret); \
> > return overflow ? 0 : ret; \
> >   }
> >
> > Form 10:
> >   #define SAT_SUB_U_10(T) \
> >   T sat_sub_u_10_##T (T x, T y) \
> >   { \
> > T ret; \
> > T overflow = __builtin_sub_overflow (x, y, &ret); \
> > return !overflow ? ret : 0; \
> >   }
> >
> > Take form 10 as example:
> >
> > SAT_SUB_U_10(uint64_t);
> >
> > Before this patch:

RE: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

2024-06-21 Thread Li, Pan2
Thanks Richard for comments.

> to match this by changing it to

> /* Unsigned saturation sub, case 2 (branch with ge):
>SAT_U_SUB = X >= Y ? X - Y : 0.  */
> (match (unsigned_integer_sat_sub @0 @1)
> (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
>  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>   && types_match (type, @0, @1

Do we need another name for this match? Adding (convert? here may change the 
semantics of .SAT_SUB.
When we call gimple_unsigned_integer_sat_sub (lhs, ops, NULL), the returned 
value may differ from
the (minus @0 @1) because of the conversion. Please correct me if my 
understanding is wrong.

> and when using the gimple_match_* function make sure to consider
> that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
> we matched?

This may be a problem for the vector part, I guess; it required some additional 
change in vectorizable_conversion when
I tried to do that previously. Let me double check it, and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 21, 2024 3:00 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

On Fri, Jun 21, 2024 at 5:53 AM  wrote:
>
> From: Pan Li 
>
> The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but
> truncated, as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate the result of SAT_SUB
>   } while (--n);
> }
>
> After the ifcvt pass it will have the gimple below, which cannot hit any
> pattern of SAT_SUB and then cannot be vectorized to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch would like to do some reconcile for above pattern to match
> the SAT_SUB pattern.  Then the underlying vect pass is able to vectorize
> the SAT_SUB.

Hmm.  I was thinking of allowing

/* Unsigned saturation sub, case 2 (branch with ge):
   SAT_U_SUB = X >= Y ? X - Y : 0.  */
(match (unsigned_integer_sat_sub @0 @1)
 (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1

to match this by changing it to

/* Unsigned saturation sub, case 2 (branch with ge):
   SAT_U_SUB = X >= Y ? X - Y : 0.  */
(match (unsigned_integer_sat_sub @0 @1)
 (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1

and when using the gimple_match_* function make sure to consider
that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
we matched?

Richard.

> _2 = a_11 - b_12(D);
> _18 = a_11 >= b_12(D);
> _pattmp = _18 ? _2 : 0; // .SAT_SUB pattern
> iftmp.0_13 = (short unsigned int) _pattmp;
> iftmp.0_5 = iftmp.0_13;
>
> The below tests are running for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * match.pd: Add new match for truncated unsigned sat_sub.
> * tree-if-conv.cc (gimple_truncated_unsigned_integer_sat_sub):
> New external decl from match.pd.
> (tree_if_cond_reconcile_unsigned_integer_sat_sub): New func impl
> to reconcile the truncated sat_sub pattern.
> (tree_if_cond_reconcile): New func impl to reconcile.
> (pass_if_conversion::execute): Try to reconcile after ifcvt.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd|  9 +
>  gcc/tree-if-conv.cc | 83 +
>  2 files changed, 92 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..9617a5f9d5e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3210,6 +3210,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub and then truncated, aka:
> +   Truncated = X >= Y ? (Other Type) (X - Y) : 0.
> + */
> +(match (truncated_unsigned_integer_sat_sub @0 @1)
> + (cond (ge @0 @1) (convert (minus @0 @1)) integer_zerop)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (@0, @1)
> +  && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (@0))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>

RE: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

2024-06-21 Thread Li, Pan2
Thanks Richard for the suggestion. I tried the (convert? with the gimple stmt 
below but got a missing-definition ICE.
To double confirm: *type_out should be the vector type of lhs, and we only 
need to build
one convert stmt from itype to otype here. Or should we just return the call 
directly and set *type_out to the v_otype?

static gimple *
vect_recog_build_binary_gimple_stmt (vec_info *vinfo, gimple *stmt,
 internal_fn fn, tree *type_out,
 tree lhs, tree op_0, tree op_1)
{
  tree itype = TREE_TYPE (op_0);
  tree otype = TREE_TYPE (lhs);
  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);

  if (v_itype != NULL_TREE && v_otype != NULL_TREE
&& direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH))
{
  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
  tree itype_ssa = vect_recog_temp_ssa_var (itype, NULL);

  gimple_call_set_lhs (call, itype_ssa);
  gimple_call_set_nothrow (call, /* nothrow_p */ false);
  gimple_set_location (call, gimple_location (stmt));

  *type_out = v_otype;
  gimple *new_stmt = call;

  if (itype != otype)
{
  tree otype_ssa = vect_recog_temp_ssa_var (otype, NULL);
  new_stmt = gimple_build_assign (otype_ssa, CONVERT_EXPR, itype_ssa);
}

  return new_stmt;
}

  return NULL;
}

-cut the ice---

zip.test.c: In function ‘test’:
zip.test.c:4:6: error: missing definition
4 | void test (uint16_t *x, unsigned b, unsigned n)
  |  ^~~~
for SSA_NAME: patt_40 in statement:
vect_cst__151 = [vec_duplicate_expr] patt_40;
during GIMPLE pass: vect
dump file: zip.test.c.180t.vect
zip.test.c:4:6: internal compiler error: verify_ssa failed
0x1de0860 verify_ssa(bool, bool)

/home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
0x1919f69 execute_function_todo

/home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
0x1918b46 do_per_function

/home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
0x191a116 execute_todo

Pan


-Original Message-
From: Richard Biener  
Sent: Friday, June 21, 2024 5:29 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

On Fri, Jun 21, 2024 at 10:50 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > to match this by changing it to
>
> > /* Unsigned saturation sub, case 2 (branch with ge):
> >SAT_U_SUB = X >= Y ? X - Y : 0.  */
> > (match (unsigned_integer_sat_sub @0 @1)
> > (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
> >  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >   && types_match (type, @0, @1
>
> Do we need another name for this matching ? Add (convert? here may change the 
> sematics of .SAT_SUB.
> When we call gimple_unsigned_integer_sat_sub (lhs, ops, NULL), the converted 
> value may be returned different
> to the (minus @0 @1). Please correct me if my understanding is wrong.

I think gimple_unsigned_integer_sat_sub (lhs, ...) simply matches
(typeof LHS).SAT_SUB (ops[0], ops[1]) now, I don't think it's necessary to
handle the case where typeof LHS and typeof ops[0] are equal specially?

> > and when using the gimple_match_* function make sure to consider
> > that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
> > we matched?
>
> This may have problem for vector part I guess, require some additional change 
> from vectorize_convert when
> I try to do that in previous. Let me double check about it, and keep you 
> posted.

You are using gimple_unsigned_integer_sat_sub from pattern recognition, the
thing to do is simply to add a conversion stmt to the pattern sequence in case
the types differ?

But maybe I'm missing something.

Richard.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 21, 2024 3:00 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB
>
> On Fri, Jun 21, 2024 at 5:53 AM  wrote:
> >
> > From: Pan Li 
> >
> > The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> > truncated as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> > a = *--p;
> > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate the result of SAT_SUB
> >

RE: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

2024-06-23 Thread Li, Pan2
> You need to refactor this to add to the stmts pattern def sequence
>  (look for append_pattern_def_seq uses for example)

Thanks Richard, that really saved my day; I will have a try in v2.

Pan

-Original Message-
From: Richard Biener  
Sent: Saturday, June 22, 2024 9:19 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

On Fri, Jun 21, 2024 at 4:45 PM Li, Pan2  wrote:
>
> Thanks Richard for suggestion, tried the (convert? with below gimple stmt but 
> got a miss def ice.
> To double confirm, the *type_out should be the vector type of lhs, and we 
> only need to build
> one cvt stmt from itype to otype here. Or just return the call directly and 
> set the type_out to the v_otype?
>
> static gimple *
> vect_recog_build_binary_gimple_stmt (vec_info *vinfo, gimple *stmt,
>  internal_fn fn, tree *type_out,
>  tree lhs, tree op_0, tree op_1)
> {
>   tree itype = TREE_TYPE (op_0);
>   tree otype = TREE_TYPE (lhs);
>   tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
>   tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
>
>   if (v_itype != NULL_TREE && v_otype != NULL_TREE
> && direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH))
> {
>   gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
>   tree itype_ssa = vect_recog_temp_ssa_var (itype, NULL);
>
>   gimple_call_set_lhs (call, itype_ssa);
>   gimple_call_set_nothrow (call, /* nothrow_p */ false);
>   gimple_set_location (call, gimple_location (stmt));
>
>   *type_out = v_otype;
>   gimple *new_stmt = call;
>
>   if (itype != otype)
> {
>   tree otype_ssa = vect_recog_temp_ssa_var (otype, NULL);
>   new_stmt = gimple_build_assign (otype_ssa, CONVERT_EXPR, itype_ssa);
> }
>
>   return new_stmt;

You need to refactor this to add to the stmts pattern def sequence
(look for append_pattern_def_seq uses for example)

> }
>
>   return NULL;
> }
>
> -cut the ice---
>
> zip.test.c: In function ‘test’:
> zip.test.c:4:6: error: missing definition
> 4 | void test (uint16_t *x, unsigned b, unsigned n)
>   |  ^~~~
> for SSA_NAME: patt_40 in statement:
> vect_cst__151 = [vec_duplicate_expr] patt_40;
> during GIMPLE pass: vect
> dump file: zip.test.c.180t.vect
> zip.test.c:4:6: internal compiler error: verify_ssa failed
> 0x1de0860 verify_ssa(bool, bool)
> 
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> 0x1919f69 execute_function_todo
> 
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> 0x1918b46 do_per_function
> 
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> 0x191a116 execute_todo
>
> Pan
>
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 21, 2024 5:29 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB
>
> On Fri, Jun 21, 2024 at 10:50 AM Li, Pan2  wrote:
> >
> > Thanks Richard for comments.
> >
> > > to match this by changing it to
> >
> > > /* Unsigned saturation sub, case 2 (branch with ge):
> > >SAT_U_SUB = X >= Y ? X - Y : 0.  */
> > > (match (unsigned_integer_sat_sub @0 @1)
> > > (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
> > >  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > >   && types_match (type, @0, @1
> >
> > Do we need another name for this matching ? Add (convert? here may change 
> > the sematics of .SAT_SUB.
> > When we call gimple_unsigned_integer_sat_sub (lhs, ops, NULL), the 
> > converted value may be returned different
> > to the (minus @0 @1). Please correct me if my understanding is wrong.
>
> I think gimple_unsigned_integer_sat_sub (lhs, ...) simply matches
> (typeof LHS).SAT_SUB (ops[0], ops[1]) now, I don't think it's necessary to
> handle the case where typef LHS and typeof ops[0] are equal specially?
>
> > > and when using the gimple_match_* function make sure to consider
> > > that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
> > > we matched?
> >
> > This may have problem for vector part I guess, require some additional 
> > change from vector

RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-24 Thread Li, Pan2
Thanks Tamar for the comments. It indeed benefits the vectorized code; for example
on RISC-V, we may eliminate some vsetvli insns in the loop for the widening case here.

> iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) 
> b_12(D));
> is cheaper than
> iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));

I am not sure whether this transform has any correctness problem; take uint16_t
to uint8_t as an example.

uint16_t a, b;
uint8_t result = (uint8_t)(a >= b ? a - b : 0);

Given a = 0x100; // 256
      b = 0xff;  // 255
For iftmp.0_5 = .SAT_SUB ((unsigned char) a, (unsigned char) b) = .SAT_SUB (0, 255) = 0
For iftmp.0_5 = (unsigned char).SAT_SUB (a, b) = (unsigned char).SAT_SUB (256, 255) = 1

Please correct me if I have misunderstood anything; thanks again for the
enlightening discussion.

Pan

-Original Message-
From: Tamar Christina  
Sent: Tuesday, June 25, 2024 4:00 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; pins...@gmail.com
Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

Hi,

> -Original Message-
> From: pan2...@intel.com 
> Sent: Monday, June 24, 2024 2:55 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> jeffreya...@gmail.com; pins...@gmail.com; Pan Li 
> Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> From: Pan Li 
> 
> The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> truncated as below:
> 
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
> 
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
> 
> It will have gimple before vect pass,  it cannot hit any pattern of
> SAT_SUB and then cannot vectorize to SAT_SUB.
> 
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
> 
> This patch would like to improve the pattern match to recog above
> as truncate after .SAT_SUB pattern.  Then we will have the pattern
> similar to below,  as well as eliminate the first 3 dead stmt.
> 
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
> 

I guess this is because one branch of the cond is a constant, so the
convert is folded in.  I was wondering though: can't we just push
the truncate in, in this case?

i.e. in this case we know both types are unsigned, the difference is
positive, and the max value is the max value of the truncated type.

It seems like folding as a general rule

  _1 = *p_10;
  a_11 = (unsigned int) _1;
  _2 = a_11 - b_12(D);
  iftmp.0_13 = (short unsigned int) _2;
  _18 = a_11 >= b_12(D);
  iftmp.0_5 = _18 ? iftmp.0_13 : 0;
  *p_10 = iftmp.0_5;

Into 

  _1 = *p_10;
  a_11 = (unsigned int) _1;
  _2 = ((short unsigned int) a_11) - ((short unsigned int) b_12(D));
  iftmp.0_13 = _2;
  _18 = a_11 >= b_12(D);
  iftmp.0_5 = _18 ? iftmp.0_13 : 0;
  *p_10 = iftmp.0_5;

Is valid (though I might have missed something).  This would negate the need for
this change to the vectorizer and the saturation detection, and should also
generate better vector code.  This is what we do in the general case:
https://godbolt.org/z/dfoj6fWdv
I think here we're just not seeing through the cond.

Typically lots of architectures have cheap truncation operations, so truncating 
before saturation means you do the cheap
operation first rather than doing the complex operation on the wider type.

That is,

_2 = a_11 - b_12(D);
iftmp.0_13 = (short unsigned int) _2;
_18 = a_11 >= b_12(D);
iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) b_12(D));

is cheaper than

_2 = a_11 - b_12(D);
iftmp.0_13 = (short unsigned int) _2;
_18 = a_11 >= b_12(D);
iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));

after vectorization.   Normally the vectorizer will try to do this through 
over-widening detection as well,
but we haven't taught ranger about the ranges of these new IFNs (probably 
should at some point).

Cheers,
Tamar

> The below tests are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add convert description for minus and capture.
>   * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
>   new logic to handle in_type is incompatibile with out_type,  as
>   well as rename from.
>   (vect_recog_build_binary_gimple_stmt): Rename to.
>   (vect_recog_sat_add_pattern): Leverage

RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-24 Thread Li, Pan2
> Ah, no you're right, those would end up wrong for saturation. Arg..  Sorry, I
> should have thought it through more.

Never mind; you have enlightened me about an even further optimization under
some restrictions. I revisited the pattern, for example as below.

uint16_t a, b;
uint8_t result = (uint8_t)(a >= b ? a - b : 0);

=> result = (char unsigned).SAT_SUB (a, b)

If a has a def like below:
uint8_t other = 0x1f;
a = (uint8_t)other;

then we can safely convert result = (unsigned char).SAT_SUB (a, b) to
result = .SAT_SUB ((unsigned char)a, (unsigned char)b)

Then we may have better vectorized code if a is limited to unsigned char. Of
course, we can do that on top of this patch.

Pan

-Original Message-
From: Tamar Christina  
Sent: Tuesday, June 25, 2024 12:01 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; pins...@gmail.com
Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

> -Original Message-
> From: Li, Pan2 
> Sent: Tuesday, June 25, 2024 3:25 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> Thanks Tamar for comments. It indeed benefits the vectorized code, for 
> example in
> RISC-V, we may eliminate some vsetvel insn in loop for widen here.
> 
> > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int) 
> > b_12(D));
> > is cheaper than
> > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
> 
> I am not sure if it has any correctness problem for this transform, take 
> uint16_t to
> uint8_t as example.
> 
> uint16_t a, b;
> uint8_t result = (uint8_t)(a >= b ? a - b : 0);
> 
> Given a = 0x100; // 256
>b = 0xff; // 255
> For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB (0,
> 255) = 0
> For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB (256,
> 255) = 1
> 
> Please help to correct me if any misunderstanding, thanks again for 
> enlightening.

Ah, no you're right, those would end up wrong for saturation. Arg..  Sorry, I
should have thought it through more.

Tamar.
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, June 25, 2024 4:00 AM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> Hi,
> 
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Monday, June 24, 2024 2:55 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> > jeffreya...@gmail.com; pins...@gmail.com; Pan Li 
> > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > From: Pan Li 
> >
> > The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> > truncated as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> > a = *--p;
> > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }
> >
> > It will have gimple before vect pass,  it cannot hit any pattern of
> > SAT_SUB and then cannot vectorize to SAT_SUB.
> >
> > _2 = a_11 - b_12(D);
> > iftmp.0_13 = (short unsigned int) _2;
> > _18 = a_11 >= b_12(D);
> > iftmp.0_5 = _18 ? iftmp.0_13 : 0;
> >
> > This patch would like to improve the pattern match to recog above
> > as truncate after .SAT_SUB pattern.  Then we will have the pattern
> > similar to below,  as well as eliminate the first 3 dead stmt.
> >
> > _2 = a_11 - b_12(D);
> > iftmp.0_13 = (short unsigned int) _2;
> > _18 = a_11 >= b_12(D);
> > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
> >
> 
> I guess this is because one branch of the  cond is a constant so the
> convert is folded in.  I was wondering though,  can't we just push
> in the truncate in this case?
> 
> i.e. in this case we know both types are unsigned and the difference
> positive and max value is the max value of the truncate type.
> 
> It seems like folding as a general rule
> 
>   _1 = *p_10;
>   a_11 = (unsigned int) _1;
>   _2 = a_11 - b_12(D);
>   iftmp.0_13 = (short unsigned int) _2;
>   _18 = a_11 >

RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-24 Thread Li, Pan2
Got it, thanks Tamar, will give it a try.

Pan

-Original Message-
From: Tamar Christina  
Sent: Tuesday, June 25, 2024 2:11 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; pins...@gmail.com
Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

> -Original Message-
> From: Li, Pan2 
> Sent: Tuesday, June 25, 2024 7:06 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> > Ah, no you're right, those would end up wrong for saturation. Arg..  Sorry, I
> > should have thought it through more.
> 
> Never mind, but you enlighten me for even more optimize with some 
> restrictions. I
> revisited the pattern, for example as below.
> 
> uint16_t a, b;
> uint8_t result = (uint8_t)(a >= b ? a - b : 0);
> 
> => result = (char unsigned).SAT_SUB (a, b)
> 
> If a has a def like below
> uint8_t other = 0x1f;
> a = (uint8_t)other

You can in principle do this by querying range information,
e.g.

  gimple_ranger ranger;
  int_range_max r;
  if (ranger.range_of_expr (r, oprnd0, stmt) && !r.undefined_p ())
{
...

We do this for instance in vect_recog_divmod_pattern.
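A hedged sketch of how the fragment above might continue (GCC-internal API, not
standalone-runnable; the comparison against the narrow type's max value is my
assumption of the intended condition, with `ntype` as a hypothetical name for
the truncated type):

```cpp
/* Sketch only: require the operand's proven range to fit the narrow
   (truncated) type before pushing the truncation into .SAT_SUB.  */
gimple_ranger ranger;
int_range_max r;
if (ranger.range_of_expr (r, oprnd0, stmt)
    && !r.undefined_p ()
    && wi::leu_p (r.upper_bound (),
                  wi::to_wide (TYPE_MAX_VALUE (ntype))))
  {
    /* oprnd0 always fits ntype, so (ntype).SAT_SUB (a, b) can be
       rewritten as .SAT_SUB ((ntype) a, (ntype) b).  */
  }
```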

Tamar

> 
> then we can safely convert result = (char unsigned).SAT_SUB (a, b) to
> result = .SAT_SUB ((char unsigned)a, (char unsigned).b)
> 
> Then we may have better vectorized code if a is limited to char unsigned. Of 
> course
> we can do that based on this patch.
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, June 25, 2024 12:01 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> > -Original Message-
> > From: Li, Pan2 
> > Sent: Tuesday, June 25, 2024 3:25 AM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> > jeffreya...@gmail.com; pins...@gmail.com
> > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > Thanks Tamar for comments. It indeed benefits the vectorized code, for 
> > example
> in
> > RISC-V, we may eliminate some vsetvel insn in loop for widen here.
> >
> > > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int)
> b_12(D));
> > > is cheaper than
> > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
> >
> > I am not sure if it has any correctness problem for this transform, take 
> > uint16_t
> to
> > uint8_t as example.
> >
> > uint16_t a, b;
> > uint8_t result = (uint8_t)(a >= b ? a - b : 0);
> >
> > Given a = 0x100; // 256
> >b = 0xff; // 255
> > For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB 
> > (0,
> > 255) = 0
> > For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB 
> > (256,
> > 255) = 1
> >
> > Please help to correct me if any misunderstanding, thanks again for 
> > enlightening.
> 
> Ah, no you're right, those would end up wrong for saturation. Arg..  Sorry, I
> should have thought it through more.
> 
> Tamar.
> >
> > Pan
> >
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Tuesday, June 25, 2024 4:00 AM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> > jeffreya...@gmail.com; pins...@gmail.com
> > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > Hi,
> >
> > > -Original Message-
> > > From: pan2...@intel.com 
> > > Sent: Monday, June 24, 2024 2:55 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> richard.guent...@gmail.com;
> > > jeffreya...@gmail.com; pins...@gmail.com; Pan Li 
> > > Subject: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> > >
> > > From: Pan Li 
> > >
> > > The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> > > truncated as below:
> > >
> > > void test (uint16_t *x, unsigned b, unsigned n)
&

RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-25 Thread Li, Pan2
Thanks Tamar, gimple_ranger works well for that case; I will send another patch
after this one.

Pan

-Original Message-
From: Li, Pan2 
Sent: Tuesday, June 25, 2024 2:26 PM
To: Tamar Christina ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; pins...@gmail.com
Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

Got it, thanks Tamar, will give it a try.

Pan

-Original Message-
From: Tamar Christina  
Sent: Tuesday, June 25, 2024 2:11 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; pins...@gmail.com
Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

> -Original Message-
> From: Li, Pan2 
> Sent: Tuesday, June 25, 2024 7:06 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> > Ah, no you're right, those would end up wrong for saturation. Arg..  Sorry, I
> > should have thought it through more.
> 
> Never mind, but you enlighten me for even more optimize with some 
> restrictions. I
> revisited the pattern, for example as below.
> 
> uint16_t a, b;
> uint8_t result = (uint8_t)(a >= b ? a - b : 0);
> 
> => result = (char unsigned).SAT_SUB (a, b)
> 
> If a has a def like below
> uint8_t other = 0x1f;
> a = (uint8_t)other

You can in principle do this by querying range information,
e.g.

  gimple_ranger ranger;
  int_range_max r;
  if (ranger.range_of_expr (r, oprnd0, stmt) && !r.undefined_p ())
{
...

We do this for instance in vect_recog_divmod_pattern.

Tamar

> 
> then we can safely convert result = (char unsigned).SAT_SUB (a, b) to
> result = .SAT_SUB ((char unsigned)a, (char unsigned).b)
> 
> Then we may have better vectorized code if a is limited to char unsigned. Of 
> course
> we can do that based on this patch.
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, June 25, 2024 12:01 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> jeffreya...@gmail.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> > -Original Message-
> > From: Li, Pan2 
> > Sent: Tuesday, June 25, 2024 3:25 AM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> > jeffreya...@gmail.com; pins...@gmail.com
> > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > Thanks Tamar for comments. It indeed benefits the vectorized code, for 
> > example
> in
> > RISC-V, we may eliminate some vsetvel insn in loop for widen here.
> >
> > > iftmp.0_5 = .SAT_SUB ((short unsigned int) a_11, (short unsigned int)
> b_12(D));
> > > is cheaper than
> > > iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
> >
> > I am not sure if it has any correctness problem for this transform, take 
> > uint16_t
> to
> > uint8_t as example.
> >
> > uint16_t a, b;
> > uint8_t result = (uint8_t)(a >= b ? a - b : 0);
> >
> > Given a = 0x100; // 256
> >b = 0xff; // 255
> > For iftmp.0_5 = .SAT_SUB ((char unsigned) a, (char unsigned) b) = .SAT_SUB 
> > (0,
> > 255) = 0
> > For iftmp.0_5 = (char unsigned).SAT_SUB (a, b) = (char unsigned).SAT_SUB 
> > (256,
> > 255) = 1
> >
> > Please help to correct me if any misunderstanding, thanks again for 
> > enlightening.
> 
> Ah, no you're right, those would end up wrong for saturation. Arg..  Sorry, I
> should have thought it through more.
> 
> Tamar.
> >
> > Pan
> >
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Tuesday, June 25, 2024 4:00 AM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> > jeffreya...@gmail.com; pins...@gmail.com
> > Subject: RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > Hi,
> >
> > > -Original Message-
> > > From: pan2...@intel.com 
> > > Sent: Monday, June 24, 2024 2:55 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> richard.guent..

RE: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

2024-06-26 Thread Li, Pan2
Thanks Richard, will address the comments in v2.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, June 26, 2024 9:52 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_TRUNC for unsigned 
scalar int

On Wed, Jun 26, 2024 at 3:46 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to add the middle-end presentation for the
> saturation truncation.  Aka set the result of truncated value to
> the max value when overflow.  It will take the pattern similar
> as below.
>
> Form 1:
>   #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##T##_fmt_1 (WT x)\
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return ((NT)x) | (NT)-overflow;\
>   }
>
> For example, truncated uint16_t to uint8_t, we have
>
> * SAT_TRUNC (254)   => 254
> * SAT_TRUNC (255)   => 255
> * SAT_TRUNC (256)   => 255
> * SAT_TRUNC (65536) => 255
>
> Given below SAT_TRUNC from uint64_t to uint32_t.
>
> DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t)
>
> Before this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   _Bool overflow;
>   unsigned int _1;
>   unsigned int _2;
>   unsigned int _3;
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   overflow_5 = x_4(D) > 4294967295;
>   _1 = (unsigned int) x_4(D);
>   _2 = (unsigned int) overflow_5;
>   _3 = -_2;
>   _6 = _1 | _3;
>   return _6;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SAT_TRUNC (x_4(D)); [tail call]
>   return _6;
> ;;succ:   EXIT
>
> }
>
> The below tests are passed for this patch:
> *. The rv64gcv fully regression tests.
> *. The rv64gcv build with glibc.
> *. The x86 bootstrap tests.
> *. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
> unary_convert.
> * match.pd: Add new matching pattern for unsigned int sat_trunc.
> * optabs.def (OPTAB_CL): Add unsigned and signed optab.
> * tree-ssa-math-opts.cc (gimple_unsigend_integer_sat_trunc): Add
> new decl for the matching pattern generated func.
> (match_unsigned_saturation_trunc): Add new func impl to match
> the .SAT_TRUNC.
> (math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
> function under BIT_IOR_EXPR case.
> * tree.cc (integer_half_truncated_all_ones_p): Add new func impl
> to filter the truncated threshold.
> * tree.h (integer_half_truncated_all_ones_p): Add new func decl.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 12 +++-
>  gcc/optabs.def|  3 +++
>  gcc/tree-ssa-math-opts.cc | 32 
>  gcc/tree.cc   | 22 ++
>  gcc/tree.h|  6 ++
>  6 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a8c83437ada..915d329c05a 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -278,6 +278,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | 
> ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, 
> binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, 
> binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_TRUNC, ECF_CONST, first, sstrunc, ustrunc, 
> unary_convert)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..d4062434cc7 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -39,7 +39,8 @@ along with GCC; see the file COPYING3.  If not see
> HONOR_NANS
> uniform_vector_p
> expand_vec_cmp_expr_p
> -   bitmask_inv_cst_vector_p)
> +   bitmask_inv_cst_vector_p
> +   integer_half_truncated_all_ones_p)
>
>  /* Operator lists.  */
>  (define_operator_list tcc_comparison
> @@ -3210,6 +3211,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned sat

RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-26 Thread Li, Pan2
> I suppose the other patterns can see similar enhacements for the case
> their forms
> show up truncated or extended?

Yes, I just want to highlight that this form comes from the zip benchmark.
Of course, the remaining forms are planned in follow-up patch(es).

> Please use NOP_EXPR here.

Sure, I will send the v2 if there are no surprises from testing.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, June 26, 2024 9:56 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; pins...@gmail.com
Subject: Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

On Mon, Jun 24, 2024 at 3:55 PM  wrote:
>
> From: Pan Li 
>
> The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> truncated as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> It will have gimple before vect pass,  it cannot hit any pattern of
> SAT_SUB and then cannot vectorize to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch would like to improve the pattern match to recog above
> as truncate after .SAT_SUB pattern.  Then we will have the pattern
> similar to below,  as well as eliminate the first 3 dead stmt.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
>
> The below tests are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * match.pd: Add convert description for minus and capture.
> * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
> new logic to handle in_type is incompatibile with out_type,  as
> well as rename from.
> (vect_recog_build_binary_gimple_stmt): Rename to.
> (vect_recog_sat_add_pattern): Leverage above renamed func.
> (vect_recog_sat_sub_pattern): Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  |  4 +--
>  gcc/tree-vect-patterns.cc | 51 ---
>  2 files changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..4a4b0b2e72f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3164,9 +3164,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (convert1? @1))) 
> integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && types_match (type, @0, @1
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1

I suppose the other patterns can see similar enhacements for the case
their forms
show up truncated or extended?

>  /* Unsigned saturation sub, case 3 (branchless with gt):
> SAT_U_SUB = (X - Y) * (X > Y).  */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index cef901808eb..3d887d36050 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4490,26 +4490,37 @@ vect_recog_mult_pattern (vec_info *vinfo,
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
>  extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
>
> -static gcall *
> -vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
> +static gimple *
> +vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info 
> stmt_info,
>  internal_fn fn, tree *type_out,
> -tree op_0, tree op_1)
> +tree lhs, tree op_0, tree op_1)
>  {
>tree itype = TREE_TYPE (op_0);
> -  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree otype = TREE_TYPE (lhs);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
>
> -  if (vtype != NULL_TREE
> -&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH))
> +  if (v_itype != NULL_TREE && v_otype != NULL_TREE
> +&& direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH))
>  

RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-26 Thread Li, Pan2
> OK

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 27, 2024 2:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

On Thu, Jun 27, 2024 at 3:31 AM  wrote:
>
> From: Pan Li 

OK

> The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> truncated as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> It will have gimple before vect pass,  it cannot hit any pattern of
> SAT_SUB and then cannot vectorize to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch would like to improve the pattern match to recog above
> as truncate after .SAT_SUB pattern.  Then we will have the pattern
> similar to below,  as well as eliminate the first 3 dead stmt.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
>
> The below tests are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * match.pd: Add convert description for minus and capture.
> * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
> new logic to handle in_type is incompatibile with out_type,  as
> well as rename from.
> (vect_recog_build_binary_gimple_stmt): Rename to.
> (vect_recog_sat_add_pattern): Leverage above renamed func.
> (vect_recog_sat_sub_pattern): Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  |  4 +--
>  gcc/tree-vect-patterns.cc | 51 ---
>  2 files changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cf8a399a744..820591a36b3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3164,9 +3164,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (convert1? @1))) 
> integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && types_match (type, @0, @1
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
>
>  /* Unsigned saturation sub, case 3 (branchless with gt):
> SAT_U_SUB = (X - Y) * (X > Y).  */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index cef901808eb..519d15f2a43 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4490,26 +4490,37 @@ vect_recog_mult_pattern (vec_info *vinfo,
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
>  extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
>
> -static gcall *
> -vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
> +static gimple *
> +vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info 
> stmt_info,
>  internal_fn fn, tree *type_out,
> -tree op_0, tree op_1)
> +tree lhs, tree op_0, tree op_1)
>  {
>tree itype = TREE_TYPE (op_0);
> -  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree otype = TREE_TYPE (lhs);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
>
> -  if (vtype != NULL_TREE
> -&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH))
> +  if (v_itype != NULL_TREE && v_otype != NULL_TREE
> +&& direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH))
>  {
>gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> +  tree in_ssa = vect_recog_temp_ssa_var (itype, NULL);
>
> -  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +  gimple_call_set_lhs (call, in_ssa);
>gimple_call_set_nothrow (call, /* nothrow_p */ false);
> -  gimple_set_location (call, gimple_location (stmt));
> + 

RE: [PATCH v2] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

2024-06-26 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 27, 2024 2:08 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Internal-fn: Support new IFN SAT_TRUNC for unsigned 
scalar int

On Thu, Jun 27, 2024 at 7:12 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to add the middle-end presentation for the
> saturation truncation.  Aka set the result of truncated value to
> the max value when overflow.  It will take the pattern similar
> as below.
>
> Form 1:
>   #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##T##_fmt_1 (WT x)\
>   {\
> bool overflow = x > (WT)(NT)(-1);  \
> return ((NT)x) | (NT)-overflow;\
>   }
>
> For example, when truncating uint16_t to uint8_t, we have
>
> * SAT_TRUNC (254)   => 254
> * SAT_TRUNC (255)   => 255
> * SAT_TRUNC (256)   => 255
> * SAT_TRUNC (65536) => 255
>
> Given below SAT_TRUNC from uint64_t to uint32_t.
>
> DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t)
>
> Before this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   _Bool overflow;
>   unsigned int _1;
>   unsigned int _2;
>   unsigned int _3;
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   overflow_5 = x_4(D) > 4294967295;
>   _1 = (unsigned int) x_4(D);
>   _2 = (unsigned int) overflow_5;
>   _3 = -_2;
>   _6 = _1 | _3;
>   return _6;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> __attribute__((noinline))
> uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
> {
>   uint32_t _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SAT_TRUNC (x_4(D)); [tail call]
>   return _6;
> ;;succ:   EXIT
>
> }

OK.

Thanks,
Richard.

> The below tests pass for this patch:
> * The rv64gcv full regression tests.
> * The rv64gcv build with glibc.
> * The x86 bootstrap tests.
> * The x86 full regression tests.
>
> gcc/ChangeLog:
>
> * internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
> unary_convert.
> * match.pd: Add new matching pattern for unsigned int sat_trunc.
> * optabs.def (OPTAB_CL): Add unsigned and signed optab.
> * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_trunc): Add
> new decl for the matching pattern generated func.
> (match_unsigned_saturation_trunc): Add new func impl to match
> the .SAT_TRUNC.
> (math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
> function under BIT_IOR_EXPR case.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 16 
>  gcc/optabs.def|  3 +++
>  gcc/tree-ssa-math-opts.cc | 32 
>  4 files changed, 53 insertions(+)
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a8c83437ada..915d329c05a 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -278,6 +278,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | 
> ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, 
> binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, 
> binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_TRUNC, ECF_CONST, first, sstrunc, ustrunc, 
> unary_convert)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..06120a1c62c 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3210,6 +3210,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
> +   SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
> +(match (unsigned_integer_sat_trunc @0)
> + (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> +   (convert @0))
> + (with {
> +   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> +   unsigned otype_precision = TYPE_PRECISION (type);
> +   wide_int trunc_max = wi::mask (itype_precision / 2, false, itype_precision);
> +   wide_int int_cst = wi::to_wide (@1, itype_precision);
> +  }
> +  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +   && TYPE_

RE: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-27 Thread Li, Pan2
I bet it only requires that the backend implement the standard name for the
vector mode.  How about a simpler one like below?

  #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)   \
  void __attribute__((noinline))   \
  vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
   unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
IN_T x = op_1[i];  \
out[i] = (OUT_T)(x >= y ? x - y : 0);  \
  }\
  }

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t);

The riscv backend is able to detect a pattern similar to the one below. I can
help to check the x86 side after running the test suites.

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  if (limit_11(D) != 0)
goto ; [89.00%]
  else
goto ; [11.00%]
;;succ:   3
;;5
;;   basic block 3, loop depth 0
;;pred:   2
  vect_cst__71 = [vec_duplicate_expr] y_14(D);
  _78 = (unsigned long) limit_11(D);
;;succ:   4

;;   basic block 4, loop depth 1
;;pred:   4
;;3
  # vectp_op_1.7_68 = PHI 
  # vectp_out.12_75 = PHI 
  # ivtmp_79 = PHI 
  _81 = .SELECT_VL (ivtmp_79, POLY_INT_CST [2, 2]);
  ivtmp_67 = _81 * 8;
  vect_x_13.9_70 = .MASK_LEN_LOAD (vectp_op_1.7_68, 64B, { -1, ... }, _81, 0);
  vect_patt_48.10_72 = .SAT_SUB (vect_x_13.9_70, vect_cst__71); // .SAT_SUB pattern
  vect_patt_49.11_73 = (vector([2,2]) unsigned int) vect_patt_48.10_72;
  ivtmp_74 = _81 * 4;
  .MASK_LEN_STORE (vectp_out.12_75, 32B, { -1, ... }, _81, 0, vect_patt_49.11_73);
  vectp_op_1.7_69 = vectp_op_1.7_68 + ivtmp_67;
  vectp_out.12_76 = vectp_out.12_75 + ivtmp_74;
  ivtmp_80 = ivtmp_79 - _81;

riscv64-unknown-elf-gcc (GCC) 15.0.0 20240627 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Pan

-Original Message-
From: Uros Bizjak  
Sent: Thursday, June 27, 2024 2:48 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
Subject: Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

On Mon, Jun 24, 2024 at 3:55 PM  wrote:
>
> From: Pan Li 
>
> The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> truncated as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> It will have the gimple below before the vect pass; it cannot hit any
> pattern of SAT_SUB and thus cannot vectorize to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch would like to improve the pattern match to recognize the
> above as a truncate-after-.SAT_SUB pattern.  Then we will have a pattern
> similar to the one below, as well as eliminate the first 3 dead stmts.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
>
> The below tests pass for this patch.
> 1. The rv64gcv full regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 full regression tests.

I have tried this patch with x86_64 on the testcase from PR51492, but
the compiler does not recognize the .SAT_SUB pattern here.

Is there anything else missing for successful detection?

Uros.

>
> gcc/ChangeLog:
>
> * match.pd: Add convert description for minus and capture.
> * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
> new logic to handle in_type is incompatibile with out_type,  as
> well as rename from.
> (vect_recog_build_binary_gimple_stmt): Rename to.
> (vect_recog_sat_add_pattern): Leverage above renamed func.
> (vect_recog_sat_sub_pattern): Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  |  4 +--
>  gcc/tree-vect-patterns.cc | 51 ---
>  2 files changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..4a4b0b2e72f 100644
> -

RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-27 Thread Li, Pan2
Yes, you are right. The supported forms for now don't involve any widening ops.
Thus, the below uint64_t form fails to be detected; I will add it to my backlog.

uint32_t saturation_add (uint32_t a, uint32_t b)
{
  const uint64_t tmp = (uint64_t)a + b;
  if (tmp > UINT32_MAX)
    {
      return UINT32_MAX;
    }
  return tmp;
}

Pan

-Original Message-
From: Andrew Pinski  
Sent: Thursday, June 27, 2024 3:32 PM
To: Li, Pan2 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

On Wed, Jun 19, 2024 at 12:37 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> Given that almost all unsigned SAT_ADD/SAT_SUB patches are merged, I
> revisited the original code pattern, aka the zip benchmark.
> It may look like below:
>
> void test (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
> {
>   unsigned m = 0, n = count;
>   register uint16_t *p;
>
>   p = x;
>
>   do {
> m = *--p;
>
> *p = (uint16_t)(m >= wsize ? m-wsize : 0); // There will be a conversion 
> here.
>   } while (--n);
> }
>
> And we can have 179 tree pass as below:
>
>[local count: 1073741824]:
>   # n_3 = PHI 
>   # p_4 = PHI 
>   p_10 = p_4 + 18446744073709551614;
>   _1 = *p_10;
>   m_11 = (unsigned int) _1;
>   _2 = m_11 - wsize_12(D);
>   iftmp.0_13 = (short unsigned int) _2;
>   _18 = m_11 >= wsize_12(D);
>   iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>   *p_10 = iftmp.0_5;
>
> The above form doesn't hit any form we currently support in match.pd. So
> one idea is to convert
>
> uint16 d, tmp;
> uint32 a, b, m;
>
> m = a - b;
> tmp = (uint16)m;
> d = a >= b ? tmp : 0;
>
> to
>
> d = (uint16)(.SAT_SUB (a, b));
>
> I am not very sure it is reasonable to make this work; it may produce a
> gimple assignment with a conversion similar to the one below (it may require
> help from vectorizable_conversion?).
> Would like to get some hint from you before the next step, thanks a lot.
>
> patt_34 = .SAT_SUB (m_11, wsize_12(D));
> patt_35 = (vector([8,8]) short unsigned int) patt_34;

I am not sure if this is related to the above but we also miss:
```
uint32_t saturation_add (uint32_t a, uint32_t b)
{
  const uint64_t tmp = (uint64_t)a + b;
  if (tmp > UINT32_MAX)
    {
      return UINT32_MAX;
    }
  return tmp;
}
```

This comes from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88603 . I
thought you might be interested in that form too.

Thanks,
Andrew

>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 14, 2024 4:05 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
> .SAT_SUB
>
> On Wed, Jun 12, 2024 at 2:38 PM  wrote:
> >
> > From: Pan Li 
> >
> > After we support the scalar unsigned form 1 and 2,  we would like
> > to introduce more forms include the branch and branchless.  There
> > are forms 3-10 list as below:
> >
> > Form 3:
> >   #define SAT_SUB_U_3(T) \
> >   T sat_sub_u_3_##T (T x, T y) \
> >   { \
> > return x > y ? x - y : 0; \
> >   }
> >
> > Form 4:
> >   #define SAT_SUB_U_4(T) \
> >   T sat_sub_u_4_##T (T x, T y) \
> >   { \
> > return x >= y ? x - y : 0; \
> >   }
> >
> > Form 5:
> >   #define SAT_SUB_U_5(T) \
> >   T sat_sub_u_5_##T (T x, T y) \
> >   { \
> > return x < y ? 0 : x - y; \
> >   }
> >
> > Form 6:
> >   #define SAT_SUB_U_6(T) \
> >   T sat_sub_u_6_##T (T x, T y) \
> >   { \
> > return x <= y ? 0 : x - y; \
> >   }
> >
> > Form 7:
> >   #define SAT_SUB_U_7(T) \
> >   T sat_sub_u_7_##T (T x, T y) \
> >   { \
> > T ret; \
> > T overflow = __builtin_sub_overflow (x, y, &ret); \
> > return ret & (T)(overflow - 1); \
> >   }
> >
> > Form 8:
> >   #define SAT_SUB_U_8(T) \
> >   T sat_sub_u_8_##T (T x, T y) \
> >   { \
> > T ret; \
> > T overflow = __builtin_sub_overflow (x, y, &ret); \
> > return ret & (T)-(!overflow); \
> >   }
> >
> > Form 9:
> >   #define SAT_SUB_U_9(T) \
> >   T sat_sub_u_9_##T (T x, T y) \
> >   { \
> > T ret; \
> > T overflow = __builtin_sub_overflow (x, y, &ret); \
> > return overflow ? 0 : ret; \
> >   }
> >
> > Form 10:
> >   #define SAT_SUB_U_10(T) \
> >  

RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-27 Thread Li, Pan2
Hi Richard,

As mentioned by Tamar previously, I would like to try even more optimization
based on this patch.  If we take the zip benchmark as an example, we may have
gimple similar to the below:

unsigned int _1, _2;
unsigned short int _9;

_9 = (unsigned short int).SAT_SUB (_1, _2);

If we can determine that _1 is in the range of unsigned short, we can
distribute the conversion into the .SAT_SUB, i.e.:

From:
_1 = (unsigned short int)_other;
_9 = (unsigned short int).SAT_SUB (_1, _2);

To:
_9 = .SAT_SUB ((unsigned short int)_1, (unsigned short int)MIN_EXPR (_2, 65536));

Unfortunately, it fails to vectorize when I try to perform the above changes.
vectorizable_conversion considers it not a simple use and then returns failure
to vect_analyze_loop_2.

zip.test.c:15:12: note:   ==> examining pattern def statement: patt_42 = (short 
unsigned int) MIN_EXPR ;
zip.test.c:15:12: note:   ==> examining statement: patt_42 = (short unsigned 
int) MIN_EXPR ;
zip.test.c:15:12: note:   vect_is_simple_use: operand MIN_EXPR , type of def: unknown
zip.test.c:15:12: missed:   Unsupported pattern.
zip.test.c:15:12: missed:   use not simple.
zip.test.c:15:12: note:   vect_is_simple_use: operand MIN_EXPR , type of def: unknown
zip.test.c:15:12: missed:   Unsupported pattern.
zip.test.c:15:12: missed:   use not simple.
zip.test.c:15:12: note:   vect_is_simple_use: operand MIN_EXPR , type of def: unknown
zip.test.c:15:12: missed:   Unsupported pattern.
zip.test.c:15:12: missed:   use not simple.
zip.test.c:7:6: missed:   not vectorized: relevant stmt not supported: patt_42 
= (short unsigned int) MIN_EXPR ;
zip.test.c:15:12: missed:  bad operation or unsupported loop bound. 

I tried COND_EXPR here instead of MIN_EXPR, with almost the same behavior.
I am not sure whether we can unblock this in vectorizable_conversion or
whether we need improvements in another pass.

Thanks a lot.

Pan

-Original Message-
From: Li, Pan2 
Sent: Thursday, June 27, 2024 2:14 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

> OK

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 27, 2024 2:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

On Thu, Jun 27, 2024 at 3:31 AM  wrote:
>
> From: Pan Li 

OK

> The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> truncated as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> It will have the gimple below before the vect pass; it cannot hit any
> pattern of SAT_SUB and thus cannot vectorize to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch would like to improve the pattern match to recognize the
> above as a truncate-after-.SAT_SUB pattern.  Then we will have a pattern
> similar to the one below, as well as eliminate the first 3 dead stmts.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
>
> The below tests pass for this patch.
> 1. The rv64gcv full regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 full regression tests.
>
> gcc/ChangeLog:
>
> * match.pd: Add convert description for minus and capture.
> * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
> new logic to handle in_type is incompatibile with out_type,  as
> well as rename from.
> (vect_recog_build_binary_gimple_stmt): Rename to.
> (vect_recog_sat_add_pattern): Leverage above renamed func.
> (vect_recog_sat_sub_pattern): Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  |  4 +--
>  gcc/tree-vect-patterns.cc | 51 ---
>  2 files changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cf8a399a744..820591a36b3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3164,9 +3164,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (co

RE: [PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD

2024-06-28 Thread Li, Pan2
> OK with those changes.

Thanks Richard for the comments; I will make the changes and commit if there
are no surprises from the test suites.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 28, 2024 9:12 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD

On Fri, Jun 28, 2024 at 5:44 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form of unsigned scalar .SAT_ADD
> when one of the op is IMM.  For example as below:
>
> Form IMM:
>   #define DEF_SAT_U_ADD_IMM_FMT_1(T)   \
>   T __attribute__((noinline))  \
>   sat_u_add_imm_##T##_fmt_1 (T x)  \
>   {\
> return (T)(x + 9) >= x ? (x + 9) : -1; \
>   }
>
> DEF_SAT_U_ADD_IMM_FMT_1(uint64_t)
>
> Before this patch:
> __attribute__((noinline))
> uint64_t sat_u_add_imm_uint64_t_fmt_1 (uint64_t x)
> {
>   long unsigned int _1;
>   uint64_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _1 = MIN_EXPR ;
>   _3 = _1 + 9;
>   return _3;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> __attribute__((noinline))
> uint64_t sat_u_add_imm_uint64_t_fmt_1 (uint64_t x)
> {
>   uint64_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _3 = .SAT_ADD (x_2(D), 9); [tail call]
>   return _3;
> ;;succ:   EXIT
>
> }
>
> The below test suites pass for this patch:
> 1. The rv64gcv full regression test with newlib.
> 2. The x86 bootstrap test.
> 3. The x86 full regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add imm form for .SAT_ADD matching.
> * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
> Add .SAT_ADD matching under PLUS_EXPR.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 22 ++
>  gcc/tree-ssa-math-opts.cc |  2 ++
>  2 files changed, 24 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3fa3f2e8296..d738c7ee9b4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3154,6 +3154,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (cond^ (gt @0 (usadd_left_part_1@2 @0 @1)) integer_minus_onep @2))
>
> +/* Unsigned saturation add, case 9 (one op is imm):
> +   SAT_U_ADD = (X + 3) >= x ? (X + 3) : -1.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (plus:c (min @0 INTEGER_CST@2) INTEGER_CST@1)

No :c necessary on the plus.

> + (with {
> +   unsigned precision = TYPE_PRECISION (type);
> +   wide_int cst_1 = wi::to_wide (@1, precision);
> +   wide_int cst_2 = wi::to_wide (@2, precision);

Just use wi::to_wide (@1/@2);

> +   wide_int max = wi::mask (precision, false, precision);
> +   wide_int sum = wi::add (cst_1, cst_2);
> +  }
> +  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1) && wi::eq_p (max, sum)

Can you refactor to put the non-max/sum tests before the (with {...}?

> +
> +/* Unsigned saturation add, case 10 (one op is imm):
> +   SAT_U_ADD = __builtin_add_overflow (X, 3, &ret) == 0 ? ret : -1.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 INTEGER_CST@1)) integer_zerop)

No need for :c on the IFN_ADD_OVERFLOW.

OK with those changes.

Richard.

> +  integer_minus_onep (realpart @2))
> +  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0
> +
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 3783a874699..3b5433ec000 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6195,6 +6195,8 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
>   break;
>
> case PLUS_EXPR:
> + match_unsigned_saturation_add (&gsi, as_a (stmt));
> + /* fall-through  */
> case MINUS_EXPR:
>   if (!convert_plusminus_to_widen (&gsi, stmt, code))
> {
> --
> 2.34.1
>


RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-28 Thread Li, Pan2
Thanks Tamar and Richard for enlightening.

> I think you're doing the MIN_EXPR wrong - the above says MIN_EXPR
>  which doesn't make
> sense anyway.  I suspect you fail to put the MIN_EXPR to a separate statement?

Make sense, will have another try for this.

> Aye, you need to emit the additional statements through
> append_pattern_def_seq; this is also because the scalar statement doesn't
> require them, so it makes costing easier.
> The vectorizer expects arguments to be simple uses, so compound statements
> aren't supported, as they make costing and codegen harder.

Yes, you are right. The operand is not an SSA_NAME during the simple-use
check, so vectorizable_conversion returns failure.

Pan

-Original Message-
From: Tamar Christina  
Sent: Friday, June 28, 2024 9:39 PM
To: Richard Biener ; Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 28, 2024 6:39 AM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com; Tamar Christina
> 
> Subject: Re: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip
> 
> On Thu, Jun 27, 2024 at 4:45 PM Li, Pan2  wrote:
> >
> > Hi Richard,
> >
> > As mentioned by tamar in previous, would like to try even more optimization
> based on this patch.
> > Assume we take zip benchmark as example, we may have gimple similar as below
> >
> > unsigned int _1, _2;
> > unsigned short int _9;
> >
> > _9 = (unsigned short int).SAT_SUB (_1, _2);
> >
> > If we can locate the _1 is in the range of unsigned short, we can 
> > distribute the
> convert into
> > the .SAT_SUB, aka:
> >
> > From:
> > _1 = (unsigned int short)_other;
> > _9 = (unsigned short int).SAT_SUB (_1, _2);
> >
> > To:
> > _9 = .SAT_SUB ((unsigned int short)_1, (unsigned int short)MIN_EXPR (_2,
> 65536)));
> >
> > Unfortunately, it failed to vectorize when I try to perform above changes. 
> > The
> vectorizable_conversion
> > considers it is not simple use and then return fail to vect_analyze_loop_2.
> >
> > zip.test.c:15:12: note:   ==> examining pattern def statement: patt_42 = 
> > (short
> unsigned int) MIN_EXPR ;
> > zip.test.c:15:12: note:   ==> examining statement: patt_42 = (short 
> > unsigned int)
> MIN_EXPR ;
> > zip.test.c:15:12: note:   vect_is_simple_use: operand MIN_EXPR  b_12(D)>, type of def: unknown
> > zip.test.c:15:12: missed:   Unsupported pattern.
> > zip.test.c:15:12: missed:   use not simple.
> > zip.test.c:15:12: note:   vect_is_simple_use: operand MIN_EXPR  b_12(D)>, type of def: unknown
> > zip.test.c:15:12: missed:   Unsupported pattern.
> > zip.test.c:15:12: missed:   use not simple.
> > zip.test.c:15:12: note:   vect_is_simple_use: operand MIN_EXPR  b_12(D)>, type of def: unknown
> > zip.test.c:15:12: missed:   Unsupported pattern.
> > zip.test.c:15:12: missed:   use not simple.
> > zip.test.c:7:6: missed:   not vectorized: relevant stmt not supported: 
> > patt_42 =
> (short unsigned int) MIN_EXPR ;
> > zip.test.c:15:12: missed:  bad operation or unsupported loop bound.
> >
> > I tried to take COND_EXPR here instead of MIN_EXPR but almost the same
> behavior. I am not sure if we can unblock this by the
> > vectorizable_conversion or we need some improvements from other pass.
> 
> I think you're doing the MIN_EXPR wrong - the above says MIN_EXPR
>  which doesn't make
> sense anyway.  I suspect you fail to put the MIN_EXPR to a separate statement?
> 

Aye, you need to emit the additional statements through append_pattern_def_seq;
this is also because the scalar statement doesn't require them, so it makes
costing easier.

The vectorizer expects arguments to be simple uses, so compound statements
aren't supported, as they make costing and codegen harder.

Cheers,
Tamar

> > Thanks a lot.
> >
> > Pan
> >
> > -Original Message-
> > From: Li, Pan2
> > Sent: Thursday, June 27, 2024 2:14 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com
> > Subject: RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip
> >
> > > OK
> >
> > Committed, thanks Richard.
> >
> > Pan
> >
> > -Original Message-
> > From: Richard Biener 
> >

RE: [PATCH v2] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-02 Thread Li, Pan2
Thanks Jeff for comments.

> Rather than reference TARGET_64BIT, you should reference the new
> iterator names.

Got it; the generated code needs some manual adjustment.

> You probably want gen_int_mode rather than GEN_INT.

Sure.

> Why are you using Pmode?  Pmode is for pointers.  This stuff looks like 
> basic integer ops, so I don't see why Pmode is appropriate.

The incoming operand may be HI/QI/SImode, so we need to promote the mode.
So should we take Xmode there instead? Will update in v2.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, July 3, 2024 8:30 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] RISC-V: Implement the .SAT_TRUNC for scalar



On 7/2/24 12:33 AM, pan2...@intel.com wrote:

> 
> The below test suites pass for this patch:
> 1. The rv64gcv full regression test.
> 2. The rv64gcv build with glibc.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/iterators.md (TARGET_64BIT): Add new iterator
>   and related attr(s).
Rather than reference TARGET_64BIT, you should reference the new
iterator names.

>   * config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new
>   func decl for expanding ustrunc
>   * config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func
>   impl to expand ustrunc.
>   * config/riscv/riscv.md (ustrunc2): Add
>   new pattern ustrunc2.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/sat_arith.h: Add test helper macro.
>   * gcc.target/riscv/sat_arith_data.h: New test.
>   * gcc.target/riscv/sat_u_trunc-1.c: New test.
>   * gcc.target/riscv/sat_u_trunc-2.c: New test.
>   * gcc.target/riscv/sat_u_trunc-3.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-1.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-2.c: New test.
>   * gcc.target/riscv/sat_u_trunc-run-3.c: New test.
>   * gcc.target/riscv/scalar_sat_unary.h: New test.
> 
> Signed-off-by: Pan Li 
> ---

>   
> +/* Implement the unsigned saturation truncation for int mode.
> +
> +   b = SAT_TRUNC (a);
> +   =>
> +   1. max = half truncated max
> +   2. lt = a < max
> +   3. lt = lt - 1 (lt 0, ge -1)
> +   4. d = a | lt
> +   5. b = (trunc)d  */
> +
> +void
> +riscv_expand_ustrunc (rtx dest, rtx src)
> +{
> +  machine_mode omode = GET_MODE (dest);
> +  rtx pmode_max = gen_reg_rtx (Pmode);
> +  unsigned precision = GET_MODE_PRECISION (omode).to_constant ();
> +
> +  gcc_assert (precision < 64);
> +
> +  uint64_t max = ((uint64_t)1u << precision) - 1u;
> +  rtx pmode_src = gen_lowpart (Pmode, src);
> +  rtx pmode_dest = gen_reg_rtx (Pmode);
> +  rtx pmode_lt = gen_reg_rtx (Pmode);
> +
> +  /* Step-1: max = half truncated max  */
> +  emit_move_insn (pmode_max, GEN_INT (max));
> +
> +  /* Step-2: lt = src < max  */
> +  riscv_emit_binary (LTU, pmode_lt, pmode_src, pmode_max);
> +
> +  /* Step-3: lt = lt - 1  */
> +  riscv_emit_binary (PLUS, pmode_lt, pmode_lt, CONSTM1_RTX (Pmode));
> +
> +  /* Step-4: pmode_dest = lt | src  */
> +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_src);
> +
> +  /* Step-5: dest = pmode_dest  */
> +  emit_move_insn (dest, gen_lowpart (omode, pmode_dest));
> +}
You probably want gen_int_mode rather than GEN_INT.

Why are you using Pmode?  Pmode is for pointers.  This stuff looks like 
basic integer ops, so I don't see why Pmode is appropriate.


jeff




RE: [PATCH v1] Vect: Support IFN SAT_TRUNC for unsigned vector int

2024-07-02 Thread Li, Pan2
Thanks Tamar.

Looks like I missed the comments part; will update in v2.

Pan

-Original Message-
From: Tamar Christina  
Sent: Tuesday, July 2, 2024 11:03 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] Vect: Support IFN SAT_TRUNC for unsigned vector int

Hi Pan,

Ah, so this is doing the same thing as match_unsigned_saturation_trunc, but
inside the vectorizer?

Looks good to me, but I can't approve. Could you, however, also place the
same comment about what it's matching from match_unsigned_saturation_trunc
into the vector one?

Thanks,
Tamar

> -Original Message-
> From: pan2...@intel.com 
> Sent: Tuesday, July 2, 2024 2:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Tamar Christina ; jeffreya...@gmail.com;
> rdapp@gmail.com; Pan Li 
> Subject: [PATCH v1] Vect: Support IFN SAT_TRUNC for unsigned vector int
> 
> From: Pan Li 
> 
> This patch would like to support the .SAT_TRUNC for the unsigned
> vector int.  Given we have below example code:
> 
> Form 1
>   #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \
>   void __attribute__((noinline))   \
>   vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
>   {\
> for (unsigned i = 0; i < limit; i++)   \
>   {\
> bool overflow = y[i] > (WT)(NT)(-1);   \
> x[i] = ((NT)y[i]) | (NT)-overflow; \
>   }\
>   }
> 
> VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)
> 
> Before this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y,
> unsigned int limit)
> {
>   ...
>   _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
>   ivtmp_35 = _51 * 8;
>   vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
>   mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
>   vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
>   vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... },
> vect__5.9_29);
>   ivtmp_12 = _51 * 4;
>   .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
>   vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
>   vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
>   ivtmp_50 = ivtmp_49 - _51;
>   if (ivtmp_50 != 0)
>   ...
> }
> 
> After this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y,
> unsigned int limit)
> {
>   ...
>   _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
>   ivtmp_34 = _12 * 8;
>   vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
>   vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
>   ivtmp_29 = _12 * 4;
>   .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
>   vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
>   vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
>   ivtmp_20 = ivtmp_21 - _12;
>   if (ivtmp_20 != 0)
>   ...
> }
> 
> The below test suites pass for this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The rv64gcv full regression tests.
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
>   new decl generated by match.
>   (vect_recog_sat_trunc_pattern): Add new func impl to recog the
>   .SAT_TRUNC pattern.
> 
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 42 +++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 519d15f2a43..802c5d0f7c8 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4489,6 +4489,7 @@ vect_recog_mult_pattern (vec_info *vinfo,
> 
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
>  extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
> +extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
> 
>  static gimple *
>  vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info 
> stmt_info,
> @@ -4603,6 +4604,46 @@ vect_recog_sat_sub_pattern (vec_info *vinfo,
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
> 
> +static gimple *
> +vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +

RE: [PATCH v1] RISC-V: Fix asm check failure for truncated after SAT_SUB

2024-07-03 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, July 3, 2024 3:24 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
Robin Dapp ; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Fix asm check failure for truncated after 
SAT_SUB

LGTM


juzhe.zh...@rivai.ai

From: pan2.li <pan2...@intel.com>
Date: 2024-07-03 13:22
To: gcc-patches <gcc-patches@gcc.gnu.org>
CC: juzhe.zhong <juzhe.zh...@rivai.ai>; kito.cheng <kito.ch...@gmail.com>; 
jeffreyalaw <jeffreya...@gmail.com>; rdapp.gcc <rdapp@gmail.com>; 
Pan Li <pan2...@intel.com>
Subject: [PATCH v1] RISC-V: Fix asm check failure for truncated after SAT_SUB
From: Pan Li <pan2...@intel.com>

The asm check is incorrect for truncated after SAT_SUB: the vssubu
instruction takes a scalar operand here, so the test should match the
vx form instead of the vv form.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c:
Update vssubu check from vv to vx.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c:
Ditto.

Signed-off-by: Pan Li <pan2...@intel.com>
---
.../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c  | 2 +-
.../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c  | 2 +-
.../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c  | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
index dd9e3999a29..1e380657d74 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
@@ -11,7 +11,7 @@
** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
** ...
** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
** vsetvli\s+zero,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma
** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
** ...
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
index 738d1465a01..d7b8931f0ec 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
@@ -11,7 +11,7 @@
** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
** ...
** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
** vsetvli\s+zero,\s*zero,\s*e16,\s*mf2,\s*ta,\s*ma
** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
** ...
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
index b008b21cf0c..edf42a1f776 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
@@ -11,7 +11,7 @@
** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
** ...
** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
** vsetvli\s+zero,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
** ...
--
2.34.1




RE: [PATCH v1] Match: Allow more types truncation for .SAT_TRUNC

2024-07-03 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 3, 2024 5:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Allow more types truncation for .SAT_TRUNC

On Tue, Jul 2, 2024 at 3:38 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC has an input and an output type,  aka a conversion
> from itype to otype with sizeof (otype) < sizeof (itype).  The
> previous patch only allowed sizeof (otype) == sizeof (itype) / 2,
> but we actually have 1/4 and 1/8 truncations as well.
>
> This patch would like to support more truncation types when
> sizeof (otype) < sizeof (itype).  The below truncations will be
> covered.
>
> * uint64_t => uint8_t
> * uint64_t => uint16_t
> * uint64_t => uint32_t
> * uint32_t => uint8_t
> * uint32_t => uint16_t
> * uint16_t => uint8_t
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.

OK.

> gcc/ChangeLog:
>
> * match.pd: Allow any otype is less than itype truncation.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7fff7b5f9fe..f708f4622bd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3239,16 +3239,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> (convert @0))
> - (with {
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> + (with
> +  {
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> unsigned otype_precision = TYPE_PRECISION (type);
> -   wide_int trunc_max = wi::mask (itype_precision / 2, false, 
> itype_precision);
> +   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> wide_int int_cst = wi::to_wide (@1, itype_precision);
>}
> -  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -   && TYPE_UNSIGNED (TREE_TYPE (@0))
> -   && otype_precision < itype_precision
> -   && wi::eq_p (trunc_max, int_cst)
> +  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
> --
> 2.34.1
>


RE: [PATCH v2] Vect: Support IFN SAT_TRUNC for unsigned vector int

2024-07-03 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 3, 2024 5:06 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Vect: Support IFN SAT_TRUNC for unsigned vector int

On Wed, Jul 3, 2024 at 3:33 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support .SAT_TRUNC for the unsigned
> vector int.  Given the example code below:
>
> Form 1
>   #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \
>   void __attribute__((noinline))   \
>   vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
>   {\
> for (unsigned i = 0; i < limit; i++)   \
>   {\
> bool overflow = y[i] > (WT)(NT)(-1);   \
> x[i] = ((NT)y[i]) | (NT)-overflow; \
>   }\
>   }
>
> VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)
>
> Before this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, 
> unsigned int limit)
> {
>   ...
>   _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
>   ivtmp_35 = _51 * 8;
>   vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
>   mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
>   vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
>   vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... }, 
> vect__5.9_29);
>   ivtmp_12 = _51 * 4;
>   .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
>   vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
>   vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
>   ivtmp_50 = ivtmp_49 - _51;
>   if (ivtmp_50 != 0)
>   ...
> }
>
> After this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, 
> unsigned int limit)
> {
>   ...
>   _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
>   ivtmp_34 = _12 * 8;
>   vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
>   vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
>   ivtmp_29 = _12 * 4;
>   .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
>   vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
>   vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
>   ivtmp_20 = ivtmp_21 - _12;
>   if (ivtmp_20 != 0)
>   ...
> }
>
> The below test suites are passed for this patch
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The rv64gcv fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
> new decl generated by match.
> (vect_recog_sat_trunc_pattern): Add new func impl to recog the
> .SAT_TRUNC pattern.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 54 +++
>  1 file changed, 54 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 519d15f2a43..86e893a1c43 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4489,6 +4489,7 @@ vect_recog_mult_pattern (vec_info *vinfo,
>
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
>  extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
> +extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
>
>  static gimple *
>  vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info 
> stmt_info,
> @@ -4603,6 +4604,58 @@ vect_recog_sat_sub_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to detect saturation truncation pattern (SAT_TRUNC), aka below gimple:
> + *   overflow_5 = x_4(D) > 4294967295;
> + *   _1 = (unsigned int) x_4(D);
> + *   _2 = (unsigned int) overflow_5;
> + *   _3 = -_2;
> + *   _6 = _1 | _3;
> + *
> + * And then simplified to
> + *   _6 = .SAT_TRUNC (x_4(D));
> + */
> +
> +static gimple *
> +vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> + tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +return NULL;
> +
> +  tree ops[1];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimpl
