Hi Pan,

> -----Original Message-----
> From: pan2...@intel.com <pan2...@intel.com>
> Sent: Monday, May 20, 2024 12:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> <tamar.christ...@arm.com>; richard.guent...@gmail.com; Pan Li
> <pan2...@intel.com>
> Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> 
> From: Pan Li <pan2...@intel.com>
> 
> This patch would like to support the branch form of unsigned
> SAT_ADD.  For example, as below:
> 
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> }
> 
> Different from the branchless version, we leverage the simplify to
> convert the branch version of SAT_ADD into the branchless form if and
> only if the backend supports IFN_SAT_ADD.  Thus, the backend keeps the
> ability to choose the branch or the branchless implementation of
> .SAT_ADD.  For example, some targets can handle the branch code more
> optimally.
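> 
> As a sketch (the function name below is illustrative, not taken from
> the patch), the branchless form produced by the simplify corresponds
> roughly to the following C:
> 
> uint64_t
> sat_add_branchless (uint64_t x, uint64_t y)
> {
>   uint64_t sum = x + y;
>   /* On unsigned overflow the sum wraps and becomes smaller than x, so
>      (sum < x) is 1 exactly when the addition overflowed.  Negating
>      that 1 gives an all-ones mask, and OR-ing it in saturates the
>      result to UINT64_MAX.  */
>   return sum | -(uint64_t) (sum < x);
> }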
> 
> When the target implements IFN_SAT_ADD for the unsigned type, the code
> before this patch is:
> uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   uint64_t _2;
>   __complex__ long unsigned int _6;
>   long unsigned int _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
>   _1 = REALPART_EXPR <_6>;
>   _7 = IMAGPART_EXPR <_6>;
>   if (_7 == 0)
>     goto <bb 4>; [65.00%]
>   else
>     goto <bb 3>; [35.00%]
> ;;    succ:       4
> ;;                3
> 
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
> ;;    succ:       4
> 
> ;;   basic block 4, loop depth 0
> ;;    pred:       3
> ;;                2
>   # _2 = PHI <18446744073709551615(3), _1(2)>
>   return _2;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _9;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
>   return _9;
> ;;    succ:       EXIT
> }
> 
> The following test suites passed for this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.
> 
> gcc/ChangeLog:
> 
>       * match.pd: Add new simplify to convert the branch form of SAT_ADD
>       into the branchless form, if and only if the backend implements the
>       IFN.
> 
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
>  gcc/match.pd | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..0547b57b3a3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> 
> +#if GIMPLE
> +
> +/* Simplify the branch version of SAT_ADD into branchless if and only if
> +   the backend supports IFN_SAT_ADD.  Thus, the backend has the ability
> +   to choose the branch or the branchless implementation of .SAT_ADD.  */
> +
> +(simplify
> + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +(simplify
> + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +#endif

Thanks, this looks good to me!

I'll leave it up to Richard to approve.

Richard: The reason for the direct_internal_fn_supported_p check is that some
targets said they currently handle the branch version better due to the lack
of some types.  At the time I reasoned it was just a target expansion bug, but
I didn't hear anything back.

To be honest, it feels to me like we should do this unconditionally, and just
have the targets that get a faster branch version handle it during expand,
since the patch series now provides a canonicalized form.

This would also let us better support targets that have the vector optab but
not the scalar one, as the above check would fail for those targets.
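
For instance (purely an illustrative sketch, not taken from the patch), a loop
like the one below is the kind of candidate that could end up as a vector
.SAT_ADD on such a target, but only if the scalar branch form is recognized
first:

void
vec_sat_add (uint64_t *restrict r, const uint64_t *restrict a,
             const uint64_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    /* Same branch form as the scalar example in the patch; a target that
       only provides the vector saturating-add optab could still use a
       vector .SAT_ADD here once the scalar form is canonicalized.  */
    r[i] = a[i] + b[i] >= a[i] ? a[i] + b[i] : (uint64_t) -1;
}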

What do you think?

Thanks,
Tamar

> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
