[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #33 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:a7aec76a74dd38524be325343158d3049b6ab3ac

commit r14-3541-ga7aec76a74dd38524be325343158d3049b6ab3ac
Author: Jakub Jelinek 
Date:   Tue Aug 29 10:46:01 2023 +0200

tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
middle, which adds/subtracts carry-in (from lower limbs) and computes
carry-out (to higher limbs).  Before optimizations (unless user writes
it intentionally that way already), all the steps look the same, but
optimizations simplify the handling of the least significant limb
(one which adds/subtracts 0 carry-in) to just a single
.{ADD,SUB}_OVERFLOW and the handling of the most significant limb
if the computed carry-out is ignored to normal addition/subtraction
of multiple operands.
Now, match_uaddc_usubc has code to turn that least significant
.{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
a more significant limb above it is matched into .U{ADD,SUB}C; this
isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
functionally equal to .UADDC (x, y, 0) (provided the types of operands
are the same and result is complex type with that type element), and
it also has code to match the most significant limb with ignored carry-out
(in that case one pattern match turns both the penultimate limb pair of
.{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
of the 4 values (2 carries) into another .U{ADD,SUB}C).

As the following patch shows, what we weren't handling is the case when
one uses either the __builtin_{add,sub}c builtins or hand written forms
thereof (either __builtin_*_overflow or even that written by hand) for
just 2 limbs, where the least significant has 0 carry-in and the most
significant ignores carry-out.  The following patch matches that, e.g.
  _16 = .ADD_OVERFLOW (_1, _2);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _15 = _3 + _4;
  _12 = _15 + _18;
into
  _16 = .UADDC (_1, _2, 0);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _19 = .UADDC (_3, _4, _18);
  _12 = IMAGPART_EXPR <_19>;
so that we can emit better code.

As the 2 later comments show, we must do that carefully, because the
pass walks the IL from first to last stmt in a bb and we must avoid
pattern matching this way something that should be matched on a later
instruction differently.

2023-08-29  Jakub Jelinek  

PR middle-end/79173
PR middle-end/111209
* tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
carry-out on higher limb.  Don't match it though if it could be
matched later on 4 argument addition/subtraction.

* gcc.target/i386/pr79173-12.c: New test.
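
The two-limb case described in this commit can be sketched in C roughly as follows. This is only an illustration: the struct u128 type and the add_u128 name are hypothetical and not taken from the new test, and whether a given GCC version and target actually emit an add/adc pair for it is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit limbs (illustration only).  */
struct u128 { uint64_t lo, hi; };

/* Two-limb addition: the low limb has no carry-in and the carry-out of the
   high limb is deliberately ignored, so the result wraps modulo 2^128.  */
static inline struct u128
add_u128 (struct u128 a, struct u128 b)
{
  struct u128 r;
  /* Least significant limb: one overflow-checked add, carry-in is 0.  */
  uint64_t carry = __builtin_add_overflow (a.lo, b.lo, &r.lo);
  /* Most significant limb: plain wrapping additions, carry-out discarded.
     This is the "_15 = _3 + _4; _12 = _15 + _18;" shape from the message.  */
  r.hi = a.hi + b.hi + carry;
  return r;
}
```

Here the low limb corresponds to the single .ADD_OVERFLOW call and the high limb to the multi-operand addition that the commit message says is now matched.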

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #32 from Vincent Lefèvre  ---
(In reply to Jakub Jelinek from comment #31)
> (In reply to Vincent Lefèvre from comment #30)
> > (In reply to Jakub Jelinek from comment #29)
> > > I mean that if the compiler can't see it is in [0, 1], it will need
> > > to use 2 additions and or the 2 carry bits together.  But, because
> > > the ored carry bits are in [0, 1] range, all the higher limbs could
> > > be done using addc.
> > 
> > If the compiler can't see that carryin is in [0, 1], then it must not "or"
> > the carry bits; it needs to add them, as carryout may be 2.
> 
> That is not how the clang builtin works, which is why I've implemented the |
> and documented it that way, as it is a compatibility builtin.

I'm confused. In Comment 14, you said that

  *carry_out = c1 + c2;

was used. This is an addition, not an OR.
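
A small sketch of the difference at issue, with hypothetical helper names addc_or and addc_sum (neither is a real builtin). The two carry-out formulas agree whenever carry_in is 0 or 1 and can differ only when it is larger.

```c
#include <assert.h>
#include <limits.h>

/* Carry-out formed with bitwise OR of the two partial carries.  */
static inline unsigned
addc_or (unsigned x, unsigned y, unsigned carry_in, unsigned *carry_out)
{
  unsigned s;
  unsigned c1 = __builtin_add_overflow (x, y, &s);
  unsigned c2 = __builtin_add_overflow (s, carry_in, &s);
  *carry_out = c1 | c2;   /* always 0 or 1 */
  return s;
}

/* Carry-out formed as the arithmetic sum of the two partial carries.  */
static inline unsigned
addc_sum (unsigned x, unsigned y, unsigned carry_in, unsigned *carry_out)
{
  unsigned s;
  unsigned c1 = __builtin_add_overflow (x, y, &s);
  unsigned c2 = __builtin_add_overflow (s, carry_in, &s);
  *carry_out = c1 + c2;   /* can be 2 if carry_in is larger than 1 */
  return s;
}
```

With x, y and carry_in all equal to UINT_MAX, both helpers return the same low word, but addc_or reports a carry of 1 while addc_sum reports 2.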

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #31 from Jakub Jelinek  ---
(In reply to Vincent Lefèvre from comment #30)
> (In reply to Jakub Jelinek from comment #29)
> > (In reply to Vincent Lefèvre from comment #28)
> > > What do you mean by "the first additions will be less optimized"? (If you
> > > don't know anything about the initial carryin and the arguments, you can't
> > > optimize at all, AFAIK.)
> > 
> > I mean that if the compiler can't see it is in [0, 1], it will need to use 2
> > additions and or the 2 carry bits together.  But, because the ored carry
> > bits are in [0, 1] range, all the higher limbs could be done using addc.
> 
> If the compiler can't see that carryin is in [0, 1], then it must not "or"
> the carry bits; it needs to add them, as carryout may be 2.

That is not how the clang builtin works, which is why I've implemented the |
and documented it that way, as it is a compatibility builtin.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #30 from Vincent Lefèvre  ---
(In reply to Jakub Jelinek from comment #29)
> (In reply to Vincent Lefèvre from comment #28)
> > What do you mean by "the first additions will be less optimized"? (If you
> > don't know anything about the initial carryin and the arguments, you can't
> > optimize at all, AFAIK.)
> 
> I mean that if the compiler can't see it is in [0, 1], it will need to use 2
> additions and or the 2 carry bits together.  But, because the ored carry
> bits are in [0, 1] range, all the higher limbs could be done using addc.

If the compiler can't see that carryin is in [0, 1], then it must not "or" the
carry bits; it needs to add them, as carryout may be 2. So each part of the
whole chain would need 2 __builtin_add_overflow and an addition of the carry
bits. However, if the compiler can detect that at some point, the arguments
cannot be both 0xffffffff at the same time (while carryin is in [0, 2]), then
an optimization is possible for the rest of the chain.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #29 from Jakub Jelinek  ---
(In reply to Vincent Lefèvre from comment #28)
> (In reply to Jakub Jelinek from comment #27)
> > Given that the builtins exist for 10 years already, I think changing it for
> > them is too late, though they don't seem to take backwards compatibility as
> > seriously.
> > They don't document the [0, 1] restriction and the behavior implemented in
> > GCC is what I saw when trying it.
> 
> Their documentation at https://clang.llvm.org/docs/LanguageExtensions.html
> is currently just
> 
>   unsigned sum = __builtin_addc(x, y, carryin, &carryout);
> 
> But a carry for a 2-ary addition is always 0 or 1, so the [0, 1] restriction
> is implicit (by the language that is used).

That is something that would need to be said explicitly, that it is undefined
behavior if it is some other value.  Like we document that e.g. __builtin_clz
is undefined behavior on 0 input.

> What do you mean by "the first additions will be less optimized"? (If you
> don't know anything about the initial carryin and the arguments, you can't
> optimize at all, AFAIK.)

I mean that if the compiler can't see it is in [0, 1], it will need to use 2
additions and or the 2 carry bits together.  But, because the ored carry bits
are in [0, 1] range, all the higher limbs could be done using addc.

If you try clang trunk with -O2
unsigned int foo (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out) { return __builtin_addc (x, y, carry_in, carry_out); }
unsigned int bar (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out) { if (carry_in > 1) __builtin_unreachable (); return __builtin_addc
(x, y, carry_in, carry_out); }
it shows exactly those 2 additions, rather than trying to optimize it, in both
cases.
GCC trunk emits something comparable for the first case, and 
unsigned int foo (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out) { return __builtin_addc (x, y, carry_in, carry_out); }
you get
addb $-1, %dl; adcl %esi, %eax
for the main work.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #28 from Vincent Lefèvre  ---
(In reply to Jakub Jelinek from comment #27)
> Given that the builtins exist for 10 years already, I think changing it for
> them is too late, though they don't seem to take backwards compatibility as
> seriously.
> They don't document the [0, 1] restriction and the behavior implemented in
> GCC is what I saw when trying it.

Their documentation at https://clang.llvm.org/docs/LanguageExtensions.html is
currently just

  unsigned sum = __builtin_addc(x, y, carryin, &carryout);

But a carry for a 2-ary addition is always 0 or 1, so the [0, 1] restriction is
implicit (by the language that is used).

And in their example, the carries are always 0 or 1.

> Note, in many cases it isn't that big deal, because if carry_in is in [0, 1]
> range and compiler can see it from VRP, it can still optimize it.  And given
> that carry_out is always in [0, 1] range, for chained cases worst case the
> first additions will be less optimized but the chained will be already
> better.

What do you mean by "the first additions will be less optimized"? (If you don't
know anything about the initial carryin and the arguments, you can't optimize
at all, AFAIK.)

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #27 from Jakub Jelinek  ---
Given that the builtins exist for 10 years already, I think changing it for
them is too late, though they don't seem to take backwards compatibility as
seriously.
They don't document the [0, 1] restriction and the behavior implemented in GCC
is what I saw when trying it.
Note, in many cases it isn't that big deal, because if carry_in is in [0, 1]
range and compiler can see it from VRP, it can still optimize it.  And given
that carry_out is always in [0, 1] range, for chained cases worst case the
first additions will be less optimized but the chained will be already better.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #26 from Marc Glisse  ---
(In reply to CVS Commits from comment #22)
> While the design of these builtins in clang is questionable,
> rather than being say
> unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
> so that it is clear they add two [0, 0xffffffff] range numbers
> plus one [0, 1] range carry in and give [0, 0xffffffff] range
> return plus [0, 1] range carry out, they actually instead
> add 3 [0, 0xffffffff] values together but the carry out
> isn't then the expected [0, 2] value because
> 0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
> but just [0, 1] whether there was any overflow at all.

That is very strange. I always thought that the original intent was for
__builtin_addc to assume that its third argument was in [0, 1] and generate a
single addc instruction on hardware that has it, and the type only ended up
being the same as the others for convenience (also C used not to have a bool
type). The final overflow never being 2 confirms this.

It may be worth discussing with clang developers if they would be willing to
document such a [0, 1] restriction, and maybe have ubsan check it.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #25 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:f8f68c4ca622a24c2e8cf2b5f2f9fdcd47a7b369

commit r14-2001-gf8f68c4ca622a24c2e8cf2b5f2f9fdcd47a7b369
Author: Jakub Jelinek 
Date:   Tue Jun 20 20:17:41 2023 +0200

tree-ssa-math-opts: Small uaddc/usubc pattern matching improvement
[PR79173]

In the following testcase we fail to pattern recognize the least significant
.UADDC call.  The reason is that arg3 in that case is
  _3 = .ADD_OVERFLOW (...);
  _2 = __imag__ _3;
  _1 = _2 != 0;
  arg3 = (unsigned long) _1;
and while before the changes arg3 has a single use in some .ADD_OVERFLOW
later on, we add a .UADDC call next to it (and gsi_remove/gsi_replace only
what is strictly necessary and leave quite a few dead stmts around which
next DCE cleans up) and so it all of a sudden isn't used just once, but twice
(.ADD_OVERFLOW and .UADDC) and so uaddc_cast fails.  While we could tweak
uaddc_cast and not require has_single_use in these uses, there is also
no vrp that would figure out that because __imag__ _3 is in [0, 1] range,
it can just use arg3 = __imag__ _3; and drop the comparison and cast.

We already search if either arg2 or arg3 is ultimately set from __imag__
of .{{ADD,SUB}_OVERFLOW,U{ADD,SUB}C} call, so the following patch just
remembers the lhs of __imag__ from that case and uses it later.

2023-06-20  Jakub Jelinek  

PR middle-end/79173
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember lhs of
IMAGPART_EXPR of arg2/arg3 and use that as arg3 if it has the right
type.

* g++.target/i386/pr79173-1.C: New test.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #24 from Jakub Jelinek  ---
Sorry, in that case nothing needs to be done for riscv.  I'm sure aarch64, arm
has one (e.g. adcs), I think powerpc has some, but e.g. PR43892 is still open,
and I'm sure s390 has them too (alc*, slb*).

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-17 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #23 from Jeffrey A. Law  ---
risc-v doesn't have any special instructions to implement add-with-carry or
subtract-with-borrow.  Depending on who you talk to, it's either a feature or a
mis-design.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #22 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:2b4e0415ad664cdb3ce87d1f7eee5ca26911a05b

commit r14-1896-g2b4e0415ad664cdb3ce87d1f7eee5ca26911a05b
Author: Jakub Jelinek 
Date:   Fri Jun 16 19:47:28 2023 +0200

builtins: Add support for clang compatible __builtin_{add,sub}c{,l,ll} [PR79173]

While the design of these builtins in clang is questionable,
rather than being say
unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
so that it is clear they add two [0, 0xffffffff] range numbers
plus one [0, 1] range carry in and give [0, 0xffffffff] range
return plus [0, 1] range carry out, they actually instead
add 3 [0, 0xffffffff] values together but the carry out
isn't then the expected [0, 2] value because
0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
but just [0, 1] whether there was any overflow at all.

It is something used in the wild and shorter to write than the
corresponding
 #define __builtin_addc(a,b,carry_in,carry_out) \
  ({ unsigned _s; \
 unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \
 unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \
 *(carry_out) = (_c1 | _c2); \
 _s; })
and so a canned builtin for something people could often use.
It isn't that hard to maintain on the GCC side, as we just lower
it to two .ADD_OVERFLOW calls early, and the already committed
pattern recognition code can then make .UADDC/.USUBC calls out of
that if the carry in is in [0, 1] range and the corresponding
optab is supported by the target.

2023-06-16  Jakub Jelinek  

PR middle-end/79173
* builtin-types.def (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR): New
types.
* builtins.def (BUILT_IN_ADDC, BUILT_IN_ADDCL, BUILT_IN_ADDCLL,
BUILT_IN_SUBC, BUILT_IN_SUBCL, BUILT_IN_SUBCLL): New builtins.
* builtins.cc (fold_builtin_addc_subc): New function.
(fold_builtin_varargs): Handle BUILT_IN_{ADD,SUB}C{,L,LL}.
* doc/extend.texi (__builtin_addc, __builtin_subc): Document.

* gcc.target/i386/pr79173-11.c: New test.
* gcc.dg/builtin-addc-1.c: New test.
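
As a rough illustration of how such a builtin is meant to be chained, here is a sketch that uses a portable stand-in following the macro above rather than the builtin itself, so that it also compiles with compilers lacking __builtin_addc. The names addc_sketch and add3 are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in with the same shape as the macro in the commit message:
   two unsigned overflow-checked adds, carries combined with OR.  */
static inline unsigned
addc_sketch (unsigned a, unsigned b, unsigned carry_in, unsigned *carry_out)
{
  unsigned s;
  unsigned c1 = __builtin_uadd_overflow (a, b, &s);
  unsigned c2 = __builtin_uadd_overflow (s, carry_in, &s);
  *carry_out = c1 | c2;
  return s;
}

/* Add two 3-limb numbers, least significant limb first.
   Returns the carry out of the most significant limb.  */
static inline unsigned
add3 (unsigned r[3], const unsigned a[3], const unsigned b[3])
{
  unsigned c;
  r[0] = addc_sketch (a[0], b[0], 0, &c);   /* carry-in is a literal 0 */
  r[1] = addc_sketch (a[1], b[1], c, &c);   /* carry-in known to be 0 or 1 */
  r[2] = addc_sketch (a[2], b[2], c, &c);
  return c;
}
```

Because the chain starts with a carry-in of 0 and every carry-out is 0 or 1, each later step satisfies the [0, 1] condition the commit message mentions for forming .UADDC calls.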

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krebbel at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #21 from Jakub Jelinek  ---
CCing some maintainers, could you please consider adding these
uaddc5/usubc5 expanders to rs6000, aarch64, arm, s390, riscv
targets if the targets have some appropriate instructions (add with carry and
subtract with carry/borrow), copy the above gcc.target/i386/ testcases to other
gcc.target subdirectories and verify there you get optimal code for those
sequences?
The _BitInt addition/subtraction code will also use these new internal
functions if possible, so implementing it will also help get usable code for
_BitInt later on.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #20 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:43a3252c42af12ad90082e4088ea58eecd0bf582

commit r14-1837-g43a3252c42af12ad90082e4088ea58eecd0bf582
Author: Jakub Jelinek 
Date:   Thu Jun 15 09:12:40 2023 +0200

middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

The following patch introduces {add,sub}c5_optab and pattern recognizes
various forms of add with carry and subtract with carry/borrow, see
pr79173-{1,2,3,4,5,6}.c tests on what is matched.
Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
calls per limb (with just one for the least significant one), for
add with carry even when it is hand written in C (for subtraction
reassoc seems to change it too much so that the pattern recognition
doesn't work).  __builtin_{add,sub}_overflow are standardized in C23
under ckd_{add,sub} names, so it isn't any longer a GNU only extension.

Note, clang has for these (IMHO badly designed)
__builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
a single bit of carry, but basically add 3 unsigned values or
subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
because of that.  If we wanted to introduce those for clang compatibility,
we could and lower them early to just two __builtin_{add,sub}_overflow
calls and let the pattern matching in this patch recognize it later.

I've added expanders for this on ix86 and in addition to that
added various peephole2s (in preparation patches for this patch) to make
sure we get nice (and small) code for the common cases.  I think there are
other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64}
intrinsics, which the patch also improves.

Would be nice if support for these optabs was added to many other targets,
arm/aarch64 and powerpc* certainly have such instructions, I'd expect
in fact that most targets do.

The _BitInt support I'm working on will also need this to emit reasonable
code.

2023-06-15  Jakub Jelinek  

PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and
IFN_USUBC.
* fold-const-call.cc (fold_const_call): Likewise.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc5, usubc5): Document new named
patterns.
* config/i386/i386.md (uaddc5, usubc5): New
define_expand patterns.
(*setcc_qi_addqi3_cconly_overflow_1_, *setccc): Split
into NOTE_INSN_DELETED note rather than nop instruction.
(*setcc_qi_negqi_ccc_1_, *setcc_qi_negqi_ccc_2_):
Likewise.

* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.
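
For illustration, a hand-written add with carry of the general kind the commit message mentions (no builtins, carries detected with comparisons) might look like the sketch below. The function name add_limbs is hypothetical, and whether this exact shape is among those recognized is an assumption; the pr79173 tests listed above are the reference for what is actually matched.

```c
#include <assert.h>
#include <stddef.h>

/* r = a + b over n limbs, least significant limb first.
   Returns the carry out of the most significant limb.  */
static inline unsigned long
add_limbs (unsigned long *r, const unsigned long *a,
           const unsigned long *b, size_t n)
{
  unsigned long carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned long s = a[i] + b[i];   /* wraps on overflow */
      unsigned long c1 = s < a[i];     /* 1 if a[i] + b[i] wrapped */
      unsigned long t = s + carry;
      unsigned long c2 = t < s;        /* 1 if adding the carry wrapped */
      r[i] = t;
      carry = c1 | c2;                 /* carry is 0 or 1, so at most one is set */
    }
  return carry;
}
```

The first iteration adds a carry of 0, which matches the remark that the least significant limb needs only one overflow check once the compiler simplifies it.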

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:ec52d228d6db7f77188ad099a8c0ff65dead3241

commit r14-1836-gec52d228d6db7f77188ad099a8c0ff65dead3241
Author: Jakub Jelinek 
Date:   Thu Jun 15 09:08:37 2023 +0200

i386: Add peephole2 patterns to improve subtract with borrow with memory
destination [PR79173]

This patch adds subborrow alternative so that it can have memory
destination and adds various peephole2s which help to match it.

2023-06-15  Jakub Jelinek  

PR middle-end/79173
* config/i386/i386.md (subborrow): Add alternative with
memory destination and add for it define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #18 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:b6ca11407d4f5d16ccfb580ea2d3d9aa08d7cd11

commit r14-1835-gb6ca11407d4f5d16ccfb580ea2d3d9aa08d7cd11
Author: Jakub Jelinek 
Date:   Thu Jun 15 09:05:01 2023 +0200

i386: Add peephole2 patterns to improve add with carry or subtract with
borrow with memory destination [PR79173]

This patch adds various peephole2s which help to recognize add with
carry or subtract with borrow with memory destination.

2023-06-14  Jakub Jelinek  

PR middle-end/79173
* config/i386/i386.md (*sub_3, @add3_carry,
addcarry, @sub3_carry, *add3_cc_overflow_1): Add
define_peephole2 TARGET_READ_MODIFY_WRITE/-Os patterns to prefer
using memory destination in these patterns.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #55271|0                           |1
        is obsolete|                            |

--- Comment #17 from Jakub Jelinek  ---
Created attachment 55274
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55274&action=edit
gcc14-pr79173.patch

Full patch I'm going to test.
Unfortunately, I haven't been successful at getting the subc stuff working when
using neither the __builtin_sub_overflow builtins nor the _subborrow_u*
intrinsics; only addc seems to work in that case.  It seems reassoc rewrites
things so much that we are no longer able to even pattern match
__builtin_sub_overflow in that case.

So, maybe even adding the ugly clang builtins as a canned way how to express it
canonically would be useful, the pattern matching can't handle infinite number
of different ways how to write the same thing.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #16 from Jakub Jelinek  ---
Created attachment 55271
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55271&action=edit
gcc14-pr79173-wip.patch

Untested WIP (with backend implementation for x86 only so far).
Still need to add/tweak some peephole2s to make the pr79173-{1,2,3,4}.c tests
clean on both x86-64 and i?86, and then, as can be seen in pr79173-5.c, need to
do some tweaks so that it also pattern recognizes the pattern recognized
__builtin_{add,sub}_overflow instead of those being used directly.
C23 is standardizing __builtin_{add,sub,mul}_overflow under the
ckd_{add,sub,mul} names, so pattern recognizing it that way is definitely
desirable.
Oh, and maybe incrementally check what happens if one of the addends or
subtrahends are immediate.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-05 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #15 from Paweł Bylica  ---
For what it's worth, clang's __builtin_addc is implemented in the frontend only,
as a pair of __builtin_add_overflow calls. The commit from 11 years ago does not
explain why they were added.
https://github.com/llvm/llvm-project/commit/54398015bf8cbdc3af54dda74807d6f3c8436164

Producing a chain of ADC instructions out of __builtin_add_overflow patterns
has been done quite recently (~1 year ago). And this work is not fully finished
yet.

On the other hand, Go recently added "addc"-like "builtins" in
https://pkg.go.dev/math/bits. And they are a real pleasure to use in
multi-precision arithmetic.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #14 from Jakub Jelinek  ---
Unfortunately, the clang __builtin_addc* and __builtin_subc* builtins are badly
designed.
Instead of adding or subtracting a 1-bit carry, where the result is guaranteed
to have 1-bit carry as well, they are:
unsigned __builtin_addc (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out)
{
  unsigned r;
  unsigned c1 = __builtin_add_overflow (x, y, &r);
  unsigned c2 = __builtin_add_overflow (r, carry_in, &r);
  *carry_out = c1 + c2;
  return r; 
}

unsigned __builtin_subc (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out)
{
  unsigned r;
  unsigned c1 = __builtin_sub_overflow (x, y, &r);
  unsigned c2 = __builtin_sub_overflow (r, carry_in, &r);
  *carry_out = c1 + c2;
  return r; 
}
So, instead of doing [0, 0xffffffff] + [0, 0xffffffff] + [0, 1] resulting in
[0, 0xffffffff] plus [0, 1] carry they actually do
[0, 0xffffffff] + [0, 0xffffffff] + [0, 0xffffffff] resulting in [0,
0xffffffff] plus [0, 2] carry.  So much for good "design".

So, I am not really sure if it is worth implementing those builtins, one can use
__builtin_add_overflow/__builtin_sub_overflow instead, all we need is pattern
detect if they are chained and so start with 0 carry in and then the carry outs
are guaranteed to be [0, 1] and further pairs of .ADD_OVERFLOW/.SUB_OVERFLOW
again can count on [0, 1]
carry in and produce [0, 1] carry out.  And pattern detect that into some new
IFN which will try to produce efficient code for these.
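
The chained form described here can be sketched for subtraction as follows. This is a minimal illustration with a hypothetical function name sub_limbs; as other comments in this thread note, whether a given GCC version turns a particular spelling into sub/sbb is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* r = a - b over n limbs, least significant limb first.
   Returns the borrow out of the most significant limb.  */
static inline unsigned
sub_limbs (unsigned *r, const unsigned *a, const unsigned *b, size_t n)
{
  unsigned borrow = 0;                 /* the chain starts with 0 borrow-in */
  for (size_t i = 0; i < n; i++)
    {
      unsigned d;
      unsigned b1 = __builtin_sub_overflow (a[i], b[i], &d);
      unsigned b2 = __builtin_sub_overflow (d, borrow, &d);
      r[i] = d;
      /* With borrow-in in [0, 1] the two partial borrows are never both
         set, so their sum stays in [0, 1] for the next limb.  */
      borrow = b1 + b2;
    }
  return borrow;
}
```

Starting from a borrow of 0 is what keeps every later borrow-in within [0, 1], which is the property the comment relies on.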

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2021-09-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #13 from Andrew Pinski  ---
(In reply to Vincent Lefèvre from comment #12) 
> One issue is that _addcarry_u64 / x86intrin.h are not documented, so the
> conditions of its use in portable code are not clear. I suppose that it is
> designed to be used in a target-independent compiler builtin.

None of the x86 intrinsics are documented except on the Intel intrinsics guide
page:
e.g:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_addcarry_u64

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2021-09-09 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #12 from Vincent Lefèvre  ---
(In reply to Andrew Pinski from comment #11)
> x86 has _addcarry_u64 which gets it mostly (see PR 97387).
> 
> The clang builtins __builtin_*_overflow are there but not the __builtin_add*
> builtins.
> 
> GCC does do a decent job of optimizing the original code now too.

By "original code", do you mean the code with _addcarry_u64 (I haven't tested)?
Otherwise, I don't see any optimization at all on the code I posted in Comment
0.

One issue is that _addcarry_u64 / x86intrin.h are not documented, so the
conditions of its use in portable code are not clear. I suppose that it is
designed to be used in a target-independent compiler builtin.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2021-09-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387
          Component|target                      |middle-end

--- Comment #11 from Andrew Pinski  ---
x86 has _addcarry_u64 which gets it mostly (see PR 97387).

The clang builtins __builtin_*_overflow are there but not the __builtin_add*
builtins.

GCC does do a decent job of optimizing the original code now too.