RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

Li, Pan2 Sat, 24 Feb 2024 03:19:23 -0800

Hi Tamar and Richard.

I just tried DEF_INTERNAL_INT_EXT_FN in the draft patch below. I am not sure
my understanding is correct (it mostly follows the popcount implementation).
Thanks a lot.


https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646442.html

Pan

-----Original Message-----
From: Tamar Christina <tamar.christ...@arm.com> 
Sent: Monday, February 19, 2024 9:05 PM
To: Li, Pan2 <pan2...@intel.com>; Richard Biener <richard.guent...@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
<yanzhang.w...@intel.com>; kito.ch...@gmail.com
Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

> -----Original Message-----
> From: Li, Pan2 <pan2...@intel.com>
> Sent: Monday, February 19, 2024 12:59 PM
> To: Tamar Christina <tamar.christ...@arm.com>; Richard Biener
> <richard.guent...@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> <yanzhang.w...@intel.com>; kito.ch...@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Tamar for comments and explanations.
> 
> > I think we should actually do an indirect optab here, because the IFN can be used
> > to replace the general representation of saturating arithmetic.
> 
> > e.g. the __builtin_add_overflow case in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> > is inefficient on all targets and so the IFN can always expand to something that's more
> > efficient like the branchless version add_sat2.
> 
> > I think this is why you suggested a new tree code below, but we don't really need
> > tree-codes for this. It can be done cleaner using the same way as
> > DEF_INTERNAL_INT_EXT_FN
> 
> Yes, the backend could choose branchless code-gen (of course we always dislike
> branches for performance) or, even better, a single saturation insn if there is one.
> Good to learn about DEF_INTERNAL_INT_EXT_FN; I will give it a try.
> 
> > Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the sign
> > should be determined by the types at expansion time.  i.e. there should only be
> > .SAT_ADD.
> 
> Got it. My initial idea came from the fact that we may have two insns for saturating add,
> and these insns mostly need to be either signed or unsigned.
> For example, slt/sltu in riscv scalar. But I am not very clear about a scenario like this:
> during define_expand in the backend, we hit the standard name
> sat_add_<m>3, but can we tell whether it is signed or not here? AFAIK, we only have QI, HI,
> SI and DI.

Yeah, the way DEF_INTERNAL_SIGNED_OPTAB_FN works is that you give it two optabs,
one for when it's signed and one for when it's unsigned, and the right one is picked
automatically during expansion.  But in GIMPLE you'd only have one IFN.
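
As a rough illustration of "one name, sign chosen by the type", here is a
minimal C sketch. It is not GCC internals code: sat_add_u32, sat_add_s32 and
the SAT_ADD macro are hypothetical names invented for this example, and the
C11 _Generic selection only stands in for the optab choice made at expansion
time. The unsigned variant uses the branchless form quoted in this thread;
the signed variant relies on the GCC/Clang __builtin_add_overflow builtin.

```c
#include <stdint.h>

/* Unsigned saturating add, using the branchless pattern from the patch:
   (x + y) | -((x + y) < x).  */
static inline uint32_t sat_add_u32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;               /* wraps modulo 2^32 */
    return sum | -(uint32_t)(sum < x);  /* all-ones mask if the add wrapped */
}

/* Signed saturating add: clamp to INT32_MIN or INT32_MAX on overflow.
   Overflow is only possible when x and y have the same sign, so the sign
   of x tells us which bound to return.  */
static inline int32_t sat_add_s32(int32_t x, int32_t y)
{
    int32_t sum;
    if (__builtin_add_overflow(x, y, &sum))
        return x < 0 ? INT32_MIN : INT32_MAX;
    return sum;
}

/* Hypothetical single entry point: the operand type, not the name,
   decides which implementation is used.  */
#define SAT_ADD(x, y) \
    _Generic((x), uint32_t: sat_add_u32, int32_t: sat_add_s32)((x), (y))
```

With this sketch, SAT_ADD applied to uint32_t operands clamps at UINT32_MAX
and the same spelling applied to int32_t operands clamps at INT32_MIN or
INT32_MAX, which mirrors the idea of a single .SAT_ADD in GIMPLE backed by
two optabs.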

> Maybe I will have the answer after trying DEF_INTERNAL_SIGNED_OPTAB_FN; I will
> keep you posted.

Awesome, Thanks!

Tamar
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <tamar.christ...@arm.com>
> Sent: Monday, February 19, 2024 4:55 PM
> To: Li, Pan2 <pan2...@intel.com>; Richard Biener <richard.guent...@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> <yanzhang.w...@intel.com>; kito.ch...@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks for doing this!
> 
> > -----Original Message-----
> > From: Li, Pan2 <pan2...@intel.com>
> > Sent: Monday, February 19, 2024 8:42 AM
> > To: Richard Biener <richard.guent...@gmail.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> > <yanzhang.w...@intel.com>; kito.ch...@gmail.com; Tamar Christina
> > <tamar.christ...@arm.com>
> > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > Thanks Richard for comments.
> >
> > > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > > the corresponding ssadd/usadd optabs.  There's not much documentation
> > > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > > suggests this is used for fixed-point operations.  It looks like arm uses
> > > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> >
> > I found the related description of the plus family in the GCC internals doc, but it
> > doesn't mention anything about mode m here.
> >
> > (plus:m x y)
> > (ss_plus:m x y)
> > (us_plus:m x y)
> > These three expressions all represent the sum of the values represented by x
> > and y carried out in machine mode m. They differ in their behavior on overflow
> > of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> > at the maximum signed value representable in m; us_plus saturates at the
> > maximum unsigned value.
> >
> > > The natural thing is to use direct optab internal functions (that's what you
> > > basically did, but you added a new optab, IMO without good reason).
> 
> I think we should actually do an indirect optab here, because the IFN can be used
> to replace the general representation of saturating arithmetic.
> 
> e.g. the __builtin_add_overflow case in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> is inefficient on all targets and so the IFN can always expand to something that's more
> efficient like the branchless version add_sat2.
> 
> I think this is why you suggested a new tree code below, but we don't really need
> tree-codes for this. It can be done cleaner using the same way as
> DEF_INTERNAL_INT_EXT_FN.
> 
> >
> > That makes sense to me, I will try to leverage US_PLUS instead here.
> >
> > > More GIMPLE-like would be to let the types involved decide whether
> > > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > > and if we don't map 1:1 to optabs then instead use tree codes like
> > > S_PLUS_EXPR (mimicking RTL here).
> >
> > Sorry, I don't get the point of the GIMPLE-like way here. For .SAT_ADDU, I added one
> > restriction, unsigned_p (type), in match.pd. It looks like we have a better way here.
> >
> 
> Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the sign
> should be determined by the types at expansion time.  i.e. there should only be
> .SAT_ADD.
> 
> i.e. instead of this
> 
> +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> 
> You should use DEF_INTERNAL_SIGNED_OPTAB_FN.
> 
> Regards,
> Tamar
> 
> > > Any other opinions?  Anyone knows more about fixed-point and RTL/modes?
> >
> > AFAIK, the scalar part of the riscv backend doesn't have fixed-point but the vector
> > part does. They share the same modes as vector integer, for example RVVM1SI in
> > vector-iterators.md. Kito and Juzhe can correct me if I have misunderstood anything.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Richard Biener <richard.guent...@gmail.com>
> > Sent: Monday, February 19, 2024 3:36 PM
> > To: Li, Pan2 <pan2...@intel.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> > <yanzhang.w...@intel.com>; kito.ch...@gmail.com; tamar.christ...@arm.com
> > Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > On Sat, Feb 17, 2024 at 11:30 AM <pan2...@intel.com> wrote:
> > >
> > > From: Pan Li <pan2...@intel.com>
> > >
> > > This patch adds the middle-end representation for the
> > > unsigned saturating add, i.e. the result of the add is set to the max
> > > on overflow.  It matches a pattern similar to the one below.
> > >
> > > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADDU (1, 254)   => 255.
> > > * SAT_ADDU (1, 255)   => 255.
> > > * SAT_ADDU (2, 255)   => 255.
> > > * SAT_ADDU (255, 255) => 255.
> > >
> > > The patch also implements SAT_ADDU in the riscv backend as
> > > a sample.  Given the example below:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > >
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch:
> > >
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > Then we will have the middle-end representation like .SAT_ADDU after
> > > this patch.
> >
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  There's not much documentation
> > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> >
> > So the question is whether the fixed-point case can be distinguished from
> > the integer case based on mode.
> >
> > There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> > no special tree operator codes for them.  So compared to what appears
> > to be the case on RTL we'd need a way to represent saturating integer
> > operations on GIMPLE.
> >
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicking RTL here).
> >
> > Any other opinions?  Anyone knows more about fixed-point and RTL/modes?
> >
> > Richard.
> >
> > >         PR target/51492
> > >         PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> > >         New func decl for the SAT_ADDU expand.
> > >         * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> > >         impl for the SAT_ADDU expand.
> > >         * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
> > >         the standard name SAT_ADDU.
> > >         * doc/md.texi: Add doc for SAT_ADDU.
> > >         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> > >         * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> > >         * match.pd: Add simplify pattern patch for SAT_ADDU.
> > >         * optabs.def (OPTAB_D): Add sat_addu_optab.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/riscv/sat_addu-1.c: New test.
> > >         * gcc.target/riscv/sat_addu-2.c: New test.
> > >         * gcc.target/riscv/sat_addu-3.c: New test.
> > >         * gcc.target/riscv/sat_addu-4.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-1.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-2.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-3.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-4.c: New test.
> > >         * gcc.target/riscv/sat_arith.h: New test.
> > >
> > > Signed-off-by: Pan Li <pan2...@intel.com>
> > > ---
> > >  gcc/config/riscv/riscv-protos.h               |  1 +
> > >  gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
> > >  gcc/config/riscv/riscv.md                     | 11 +++++
> > >  gcc/doc/md.texi                               | 11 +++++
> > >  gcc/internal-fn.cc                            |  1 +
> > >  gcc/internal-fn.def                           |  1 +
> > >  gcc/match.pd                                  | 22 +++++++++
> > >  gcc/optabs.def                                |  2 +
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
> > >  .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
> > >  17 files changed, 356 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
> > >
> > > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > > index ae1685850ac..f201b2384f9 100644
> > > --- a/gcc/config/riscv/riscv-protos.h
> > > +++ b/gcc/config/riscv/riscv-protos.h
> > > @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
> > >  extern bool
> > >  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> > >  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> > > +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> > >
> > >  #ifdef RTX_CODE
> > >  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > index 799d7919a4a..84e86eb5d49 100644
> > > --- a/gcc/config/riscv/riscv.cc
> > > +++ b/gcc/config/riscv/riscv.cc
> > > @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> > >    return true;
> > >  }
> > >
> > > +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> > > +void
> > > +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> > > +{
> > > +  machine_mode mode = GET_MODE (dest);
> > > +  rtx pmode_sum = gen_reg_rtx (Pmode);
> > > +  rtx pmode_lt = gen_reg_rtx (Pmode);
> > > +  rtx pmode_x = gen_lowpart (Pmode, x);
> > > +  rtx pmode_y = gen_lowpart (Pmode, y);
> > > +  rtx pmode_dest = gen_reg_rtx (Pmode);
> > > +
> > > +  /* Step-1: sum = x + y  */
> > > +  if (mode == SImode && mode != Pmode)
> > > +    { /* Take addw to avoid the sum truncate.  */
> > > +      rtx simode_sum = gen_reg_rtx (SImode);
> > > +      riscv_emit_binary (PLUS, simode_sum, x, y);
> > > +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> > > +    }
> > > +  else
> > > +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> > > +
> > > +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> > > +  if (mode == HImode || mode == QImode)
> > > +    {
> > > +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> > > +       - GET_MODE_BITSIZE (mode).to_constant ();
> > > +
> > > +      gcc_assert (shift_bits > 0);
> > > +
> > > +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > > +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > > +    }
> > > +
> > > +  /* Step-2: lt = sum < x  */
> > > +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> > > +
> > > +  /* Step-3: lt = -lt  */
> > > +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> > > +
> > > +  /* Step-4: pmode_dest = sum | lt  */
> > > +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> > > +
> > > +  /* Step-5: dest = pmode_dest */
> > > +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> > > +}
> > > +
> > >  /* Initialize the GCC target structure.  */
> > >  #undef TARGET_ASM_ALIGNED_HI_OP
> > >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index 39b29795cd6..03cbe5a2ca9 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> > >    [(set_attr "type" "load")
> > >     (set (attr "length") (const_int 8))])
> > >
> > > +(define_expand "sat_addu_<mode>3"
> > > +  [(match_operand:ANYI   0 "register_operand")
> > > +   (match_operand:ANYI   1 "register_operand")
> > > +   (match_operand:ANYI   2 "register_operand")]
> > > +  ""
> > > +  {
> > > +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> > > +    DONE;
> > > +  }
> > > +)
> > > +
> > >  (include "bitmanip.md")
> > >  (include "crypto.md")
> > >  (include "sync.md")
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index b0c61925120..5867afdb1a0 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> > >
> > >  This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{sat_addu_@var{m}3} instruction pattern
> > > +@item @samp{sat_addu_@var{m}3}
> > > +Perform the saturation unsigned add for the operand 1 and operand 2 and
> > > +store the result into the operand 0.  All operands have mode @var{m},
> > > +which is a scalar integer mode.
> > > +
> > > +@smallexample
> > > +  typedef unsigned char uint8_t;
> > > +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> > > +@end smallexample
> > > +
> > >  @cindex @code{cmla@var{m}4} instruction pattern
> > >  @item @samp{cmla@var{m}4}
> > >  Perform a vector multiply and accumulate that is semantically the same as
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index a07f25f3aee..dee73dbc614 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> > >      case IFN_VEC_WIDEN_PLUS_HI:
> > >      case IFN_VEC_WIDEN_PLUS_EVEN:
> > >      case IFN_VEC_WIDEN_PLUS_ODD:
> > > +    case IFN_SAT_ADDU:
> > >        return true;
> > >
> > >      default:
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index c14d30365c1..a04592fc779 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> > >                                 binary)
> > >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> > >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> > >
> > >  /* FP scales.  */
> > >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 711c3a10c3f..9de1106adcf 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   )
> > >  )
> > >
> > > +#if GIMPLE
> > > +
> > > +/* Saturation add unsigned, aka:
> > > +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> > > +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> > > +(simplify
> > > + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> > > +   (if (optimize
> > > +       && INTEGRAL_TYPE_P (type)
> > > +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +       && types_match (type, TREE_TYPE (@0))
> > > +       && types_match (type, TREE_TYPE (@1))
> > > +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> > > +   (IFN_SAT_ADDU @0 @1)))
> > > +
> > > +/* SAT_ADDU (X, 0) = X  */
> > > +(simplify
> > > + (IFN_SAT_ADDU:c @0 integer_zerop)
> > > + @0)
> > > +
> > > +#endif
> > > +
> > >  /* A few cases of fold-const.cc negate_expr_p predicate.  */
> > >  (match negate_expr_p
> > >   INTEGER_CST
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..a2c11b7707b 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> > >  OPTAB_D (addptr3_optab, "addptr$a3")
> > >  OPTAB_D (spaceship_optab, "spaceship$a3")
> > >
> > > +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> > > +
> > >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> > >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> > >
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > > new file mode 100644
> > > index 00000000000..229abef0faa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint8_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** andi\s+a0,\s*a0,\s*0xff
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint8_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > > new file mode 100644
> > > index 00000000000..4023b030811
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint16_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > > +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** slli\s+a0,\s*a0,\s*48
> > > +** srli\s+a0,\s*a0,\s*48
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint16_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > > new file mode 100644
> > > index 00000000000..4d0af97fb67
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint32_t:
> > > +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** sext.w\s+a0,\s*a0
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint32_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > > new file mode 100644
> > > index 00000000000..926f31266e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint64_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint64_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > > new file mode 100644
> > > index 00000000000..b19515c39d1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint8_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > > new file mode 100644
> > > index 00000000000..90073fbe4ba
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint16_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > > new file mode 100644
> > > index 00000000000..996dd3de737
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint32_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > > new file mode 100644
> > > index 00000000000..51a5421577b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > > @@ -0,0 +1,49 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint64_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> > > +    != 18446744073709551614u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > > new file mode 100644
> > > index 00000000000..4c00157685e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > > @@ -0,0 +1,15 @@
> > > +#ifndef HAVE_SAT_ARITH
> > > +#define HAVE_SAT_ARITH
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#define DEF_SAT_ADDU(TYPE)                       \
> > > +TYPE __attribute__((noinline))                   \
> > > +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> > > +{                                                \
> > > +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> > > +}
> > > +
> > > +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> > > +
> > > +#endif
> > > --
> > > 2.34.1
> > >
