[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #33 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:a7aec76a74dd38524be325343158d3049b6ab3ac commit r14-3541-ga7aec76a74dd38524be325343158d3049b6ab3ac Author: Jakub Jelinek Date: Tue Aug 29 10:46:01 2023 +0200 tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209] The usual uaddc/usubc matching is of the .{ADD,SUB}_OVERFLOW pair in the middle, which adds/subtracts the carry-in (from lower limbs) and computes the carry-out (to higher limbs). Before optimizations (unless the user intentionally writes it that way already), all the steps look the same, but optimizations simplify the handling of the least significant limb (the one which adds/subtracts a 0 carry-in) to just a single .{ADD,SUB}_OVERFLOW and, if the computed carry-out is ignored, the handling of the most significant limb to a normal addition/subtraction of multiple operands. Now, match_uaddc_usubc has code to turn that least significant .{ADD,SUB}_OVERFLOW call into a .U{ADD,SUB}C call with 0 carry-in if a more significant limb above it is matched into .U{ADD,SUB}C; this isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is functionally equal to .UADDC (x, y, 0) (provided the types of the operands are the same and the result is a complex type with that element type). It also has code to match the most significant limb with ignored carry-out (in that case one pattern match turns both the penultimate limb pair of .{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction of the 4 values (2 carries) into another .U{ADD,SUB}C). As the following patch shows, what we weren't handling is the case when one uses either the __builtin_{add,sub}c builtins or hand-written forms thereof (either __builtin_*_overflow or even that written by hand) for just 2 limbs, where the least significant has 0 carry-in and the most significant ignores carry-out. The following patch matches that, e.g.
_16 = .ADD_OVERFLOW (_1, _2); _17 = REALPART_EXPR <_16>; _18 = IMAGPART_EXPR <_16>; _15 = _3 + _4; _12 = _15 + _18; into _16 = .UADDC (_1, _2, 0); _17 = REALPART_EXPR <_16>; _18 = IMAGPART_EXPR <_16>; _19 = .UADDC (_3, _4, _18); _12 = REALPART_EXPR <_19>; so that we can emit better code. As the 2 later comments show, we must do that carefully, because the pass walks the IL from the first to the last stmt in a bb, and we must avoid pattern matching this way something that should be matched differently on a later instruction. 2023-08-29 Jakub Jelinek PR middle-end/79173 PR middle-end/111209 * tree-ssa-math-opts.cc (match_uaddc_usubc): Match also just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored carry-out on higher limb. Don't match it though if it could be matched later on 4 argument addition/subtraction. * gcc.target/i386/pr79173-12.c: New test.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #32 from Vincent Lefèvre --- (In reply to Jakub Jelinek from comment #31) > (In reply to Vincent Lefèvre from comment #30) > > (In reply to Jakub Jelinek from comment #29) > > > I mean that if the compiler can't see it is in [0, 1], it will need > > > to use 2 additions and or the 2 carry bits together. But, because > > > the ored carry bits are in [0, 1] range, all the higher limbs could > > > be done using addc. > > > > If the compiler can't see that carryin is in [0, 1], then it must not "or" > > the carry bits; it needs to add them, as carryout may be 2. > > That is not how the clang builtin works, which is why I've implemented the | > and documented it that way, as it is a compatibility builtin. I'm confused. In Comment 14, you said that *carry_out = c1 + c2; was used. This is an addition, not an OR.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #31 from Jakub Jelinek --- (In reply to Vincent Lefèvre from comment #30) > (In reply to Jakub Jelinek from comment #29) > > (In reply to Vincent Lefèvre from comment #28) > > > What do you mean by "the first additions will be less optimized"? (If you > > > don't know anything about the initial carryin and the arguments, you can't > > > optimize at all, AFAIK.) > > > > I mean that if the compiler can't see it is in [0, 1], it will need to use 2 > > additions and or the 2 carry bits together. But, because the ored carry > > bits are in [0, 1] range, all the higher limbs could be done using addc. > > If the compiler can't see that carryin is in [0, 1], then it must not "or" > the carry bits; it needs to add them, as carryout may be 2. That is not how the clang builtin works, which is why I've implemented the | and documented it that way, as it is a compatibility builtin.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #30 from Vincent Lefèvre --- (In reply to Jakub Jelinek from comment #29) > (In reply to Vincent Lefèvre from comment #28) > > What do you mean by "the first additions will be less optimized"? (If you > > don't know anything about the initial carryin and the arguments, you can't > > optimize at all, AFAIK.) > > I mean that if the compiler can't see it is in [0, 1], it will need to use 2 > additions and or the 2 carry bits together. But, because the ored carry > bits are in [0, 1] range, all the higher limbs could be done using addc. If the compiler can't see that carryin is in [0, 1], then it must not "or" the carry bits; it needs to add them, as carryout may be 2. So each part of the whole chain would need 2 __builtin_add_overflow and an addition of the carry bits. However, if the compiler can detect that at some point the arguments cannot both be all-ones at the same time (while carryin is in [0, 2]), then an optimization is possible for the rest of the chain.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #29 from Jakub Jelinek --- (In reply to Vincent Lefèvre from comment #28) > (In reply to Jakub Jelinek from comment #27) > > Given that the builtins exist for 10 years already, I think changing it for > > them is too late, though they don't seem to take backwards compatibility as > > seriously. > > They don't document the [0, 1] restriction and the behavior implemented in > > GCC is what I saw when trying it. > > Their documentation at https://clang.llvm.org/docs/LanguageExtensions.html > is currently just > > unsigned sum = __builtin_addc(x, y, carryin, ); > > But a carry for a 2-ary addition is always 0 or 1, so the [0, 1] restriction > is implicit (by the language that is used). That is something that would need to be said explicitly, that it is undefined behavior if it is some other value. Like we document that e.g. __builtin_clz is undefined behavior on 0 input. > What do you mean by "the first additions will be less optimized"? (If you > don't know anything about the initial carryin and the arguments, you can't > optimize at all, AFAIK.) I mean that if the compiler can't see it is in [0, 1], it will need to use 2 additions and or the 2 carry bits together. But, because the ored carry bits are in [0, 1] range, all the higher limbs could be done using addc. If you try clang trunk with -O2 unsigned int foo (unsigned x, unsigned y, unsigned carry_in, unsigned *carry_out) { return __builtin_addc (x, y, carry_in, carry_out); } unsigned int bar (unsigned x, unsigned y, unsigned carry_in, unsigned *carry_out) { if (carry_in > 1) __builtin_unreachable (); return __builtin_addc (x, y, carry_in, carry_out); } it shows exactly those 2 additions, rather than trying to optimize it, in both cases. 
GCC trunk emits something comparable for the first case, and unsigned int foo (unsigned x, unsigned y, unsigned carry_in, unsigned *carry_out) { return __builtin_addc (x, y, carry_in, carry_out); } you get addb $-1, %dl; adcl %esi, %eax for the main work.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #28 from Vincent Lefèvre --- (In reply to Jakub Jelinek from comment #27) > Given that the builtins exist for 10 years already, I think changing it for > them is too late, though they don't seem to take backwards compatibility as > seriously. > They don't document the [0, 1] restriction and the behavior implemented in > GCC is what I saw when trying it. Their documentation at https://clang.llvm.org/docs/LanguageExtensions.html is currently just unsigned sum = __builtin_addc(x, y, carryin, ); But a carry for a 2-ary addition is always 0 or 1, so the [0, 1] restriction is implicit (by the language that is used). And in their example, the carries are always 0 or 1. > Note, in many cases it isn't that big deal, because if carry_in is in [0, 1] > range and compiler can see it from VRP, it can still optimize it. And given > that carry_out is always in [0, 1] range, for chained cases worst case the > first additions will be less optimized but the chained will be already > better. What do you mean by "the first additions will be less optimized"? (If you don't know anything about the initial carryin and the arguments, you can't optimize at all, AFAIK.)
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #27 from Jakub Jelinek --- Given that the builtins exist for 10 years already, I think changing it for them is too late, though they don't seem to take backwards compatibility as seriously. They don't document the [0, 1] restriction and the behavior implemented in GCC is what I saw when trying it. Note, in many cases it isn't that big deal, because if carry_in is in [0, 1] range and compiler can see it from VRP, it can still optimize it. And given that carry_out is always in [0, 1] range, for chained cases worst case the first additions will be less optimized but the chained will be already better.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #26 from Marc Glisse --- (In reply to CVS Commits from comment #22) > While the design of these builtins in clang is questionable, > rather than being say > unsigned __builtin_addc (unsigned, unsigned, bool, bool *) > so that it is clear they add two [0, 0xffffffff] range numbers > plus one [0, 1] range carry in and give [0, 0xffffffff] range > return plus [0, 1] range carry out, they actually instead > add 3 [0, 0xffffffff] values together but the carry out > isn't then the expected [0, 2] value because > 0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd, > but just [0, 1] whether there was any overflow at all. That is very strange. I always thought that the original intent was for __builtin_addc to assume that its third argument was in [0, 1] and generate a single addc instruction on hardware that has it, and the type only ended up being the same as the others for convenience (also C used not to have a bool type). The final overflow never being 2 confirms this. It may be worth discussing with clang developers if they would be willing to document such a [0, 1] restriction, and maybe have ubsan check it.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #25 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:f8f68c4ca622a24c2e8cf2b5f2f9fdcd47a7b369 commit r14-2001-gf8f68c4ca622a24c2e8cf2b5f2f9fdcd47a7b369 Author: Jakub Jelinek Date: Tue Jun 20 20:17:41 2023 +0200 tree-ssa-math-opts: Small uaddc/usubc pattern matching improvement [PR79173] In the following testcase we fail to pattern recognize the least significant .UADDC call. The reason is that arg3 in that case is _3 = .ADD_OVERFLOW (...); _2 = __imag__ _3; _1 = _2 != 0; arg3 = (unsigned long) _1; and while before the changes arg3 has a single use in some .ADD_OVERFLOW later on, we add a .UADDC call next to it (and gsi_remove/gsi_replace only what is strictly necessary and leave quite a few dead stmts around which next DCE cleans up) and so it all of a sudden isn't used just once, but twice (.ADD_OVERFLOW and .UADDC) and so uaddc_cast fails. While we could tweak uaddc_cast and not require has_single_use in these uses, there is also no vrp that would figure out that because __imag__ _3 is in [0, 1] range, it can just use arg3 = __imag__ _3; and drop the comparison and cast. We already search if either arg2 or arg3 is ultimately set from __imag__ of .{{ADD,SUB}_OVERFLOW,U{ADD,SUB}C} call, so the following patch just remembers the lhs of __imag__ from that case and uses it later. 2023-06-20 Jakub Jelinek PR middle-end/79173 * tree-ssa-math-opts.cc (match_uaddc_usubc): Remember lhs of IMAGPART_EXPR of arg2/arg3 and use that as arg3 if it has the right type. * g++.target/i386/pr79173-1.C: New test.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #24 from Jakub Jelinek --- Sorry, in that case nothing needs to be done for riscv. I'm sure aarch64 and arm have them (e.g. adcs), I think powerpc has some, but e.g. PR43892 is still open, and I'm sure s390 has them too (alc*, slb*).
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #23 from Jeffrey A. Law --- risc-v doesn't have any special instructions to implement add-with-carry or subtract-with-borrow. Depending on who you talk to, it's either a feature or a mis-design.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #22 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:2b4e0415ad664cdb3ce87d1f7eee5ca26911a05b commit r14-1896-g2b4e0415ad664cdb3ce87d1f7eee5ca26911a05b Author: Jakub Jelinek Date: Fri Jun 16 19:47:28 2023 +0200 builtins: Add support for clang compatible __builtin_{add,sub}c{,l,ll} [PR79173] While the design of these builtins in clang is questionable, rather than being say unsigned __builtin_addc (unsigned, unsigned, bool, bool *) so that it is clear they add two [0, 0xffffffff] range numbers plus one [0, 1] range carry in and give [0, 0xffffffff] range return plus [0, 1] range carry out, they actually instead add 3 [0, 0xffffffff] values together but the carry out isn't then the expected [0, 2] value because 0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd, but just [0, 1] whether there was any overflow at all. It is something used in the wild and shorter to write than the corresponding #define __builtin_addc(a,b,carry_in,carry_out) \ ({ unsigned _s; \ unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \ unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \ *(carry_out) = (_c1 | _c2); \ _s; }) and so a canned builtin for something people could often use. It isn't that hard to maintain on the GCC side, as we just lower it to two .ADD_OVERFLOW calls early, and the already committed pattern recognition code can then make .UADDC/.USUBC calls out of that if the carry in is in [0, 1] range and the corresponding optab is supported by the target. 2023-06-16 Jakub Jelinek PR middle-end/79173 * builtin-types.def (BT_FN_UINT_UINT_UINT_UINT_UINTPTR, BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR, BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR): New types. * builtins.def (BUILT_IN_ADDC, BUILT_IN_ADDCL, BUILT_IN_ADDCLL, BUILT_IN_SUBC, BUILT_IN_SUBCL, BUILT_IN_SUBCLL): New builtins. * builtins.cc (fold_builtin_addc_subc): New function. (fold_builtin_varargs): Handle BUILT_IN_{ADD,SUB}C{,L,LL}.
* doc/extend.texi (__builtin_addc, __builtin_subc): Document. * gcc.target/i386/pr79173-11.c: New test. * gcc.dg/builtin-addc-1.c: New test.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Jakub Jelinek changed:
What |Removed |Added
CC   |        |krebbel at gcc dot gnu.org, law at gcc dot gnu.org, rsandifo at gcc dot gnu.org, segher at gcc dot gnu.org
--- Comment #21 from Jakub Jelinek --- CCing some maintainers, could you please consider adding these uaddc5/usubc5 expanders to rs6000, aarch64, arm, s390, riscv targets if the targets have some appropriate instructions (add with carry and subtract with carry/borrow), copy the above gcc.target/i386/ testcases to other gcc.target subdirectories and verify there you get optimal code for those sequences? The _BitInt addition/subtraction code will also use these new internal functions if possible, so implementing it will also help get usable code for _BitInt later on.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #20 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:43a3252c42af12ad90082e4088ea58eecd0bf582 commit r14-1837-g43a3252c42af12ad90082e4088ea58eecd0bf582 Author: Jakub Jelinek Date: Thu Jun 15 09:12:40 2023 +0200 middle-end, i386: Pattern recognize add/subtract with carry [PR79173] The following patch introduces {add,sub}c5_optab and pattern recognizes various forms of add with carry and subtract with carry/borrow; see the pr79173-{1,2,3,4,5,6}.c tests for what is matched. Primarily these are forms with 2 __builtin_add_overflow or __builtin_sub_overflow calls per limb (with just one for the least significant one), and for add with carry even when it is hand written in C (for subtraction, reassoc seems to change it too much so that the pattern recognition doesn't work). __builtin_{add,sub}_overflow are standardized in C23 under the ckd_{add,sub} names, so they are no longer a GNU-only extension. Note, clang has for these (IMHO badly designed) __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just a single bit of carry, but basically add 3 unsigned values or subtract 2 unsigned values from one, and result in a carry out of 0, 1, or 2 because of that. If we wanted to introduce those for clang compatibility, we could do so by lowering them early to just two __builtin_{add,sub}_overflow calls and letting the pattern matching in this patch recognize them later. I've added expanders for this on ix86 and in addition to that added various peephole2s (in preparation patches for this patch) to make sure we get nice (and small) code for the common cases. I think there are other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch also improves. It would be nice if support for these optabs was added to many other targets; arm/aarch64 and powerpc* certainly have such instructions, and I'd expect in fact that most targets do.
The _BitInt support I'm working on will also need this to emit reasonable code. 2023-06-15 Jakub Jelinek PR middle-end/79173 * internal-fn.def (UADDC, USUBC): New internal functions. * internal-fn.cc (expand_UADDC, expand_USUBC): New functions. (commutative_ternary_fn_p): Return true also for IFN_UADDC. * optabs.def (uaddc5_optab, usubc5_optab): New optabs. * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart, match_uaddc_usubc): New functions. (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless other optimizations have been successful for those. * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC. * fold-const-call.cc (fold_const_call): Likewise. * gimple-range-fold.cc (adjust_imagpart_expr): Likewise. * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise. * doc/md.texi (uaddc5, usubc5): Document new named patterns. * config/i386/i386.md (uaddc5, usubc5): New define_expand patterns. (*setcc_qi_addqi3_cconly_overflow_1_, *setccc): Split into NOTE_INSN_DELETED note rather than nop instruction. (*setcc_qi_negqi_ccc_1_, *setcc_qi_negqi_ccc_2_): Likewise. * gcc.target/i386/pr79173-1.c: New test. * gcc.target/i386/pr79173-2.c: New test. * gcc.target/i386/pr79173-3.c: New test. * gcc.target/i386/pr79173-4.c: New test. * gcc.target/i386/pr79173-5.c: New test. * gcc.target/i386/pr79173-6.c: New test. * gcc.target/i386/pr79173-7.c: New test. * gcc.target/i386/pr79173-8.c: New test. * gcc.target/i386/pr79173-9.c: New test. * gcc.target/i386/pr79173-10.c: New test.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #19 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:ec52d228d6db7f77188ad099a8c0ff65dead3241 commit r14-1836-gec52d228d6db7f77188ad099a8c0ff65dead3241 Author: Jakub Jelinek Date: Thu Jun 15 09:08:37 2023 +0200 i386: Add peephole2 patterns to improve subtract with borrow with memory destination [PR79173] This patch adds subborrow alternative so that it can have memory destination and adds various peephole2s which help to match it. 2023-06-15 Jakub Jelinek PR middle-end/79173 * config/i386/i386.md (subborrow): Add alternative with memory destination and add for it define_peephole2 TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory destination in these patterns.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #18 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:b6ca11407d4f5d16ccfb580ea2d3d9aa08d7cd11 commit r14-1835-gb6ca11407d4f5d16ccfb580ea2d3d9aa08d7cd11 Author: Jakub Jelinek Date: Thu Jun 15 09:05:01 2023 +0200 i386: Add peephole2 patterns to improve add with carry or subtract with borrow with memory destination [PR79173] This patch adds various peephole2s which help to recognize add with carry or subtract with borrow with memory destination. 2023-06-14 Jakub Jelinek PR middle-end/79173 * config/i386/i386.md (*sub_3, @add3_carry, addcarry, @sub3_carry, *add3_cc_overflow_1): Add define_peephole2 TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory destination in these patterns.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Jakub Jelinek changed:
What                          |Removed |Added
Attachment #55271 is obsolete |0       |1
--- Comment #17 from Jakub Jelinek --- Created attachment 55274 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55274&action=edit gcc14-pr79173.patch Full patch I'm going to test. Unfortunately, I haven't been successful at getting the subc stuff working when using neither __builtin_sub_overflow builtins nor _subborrow_u* intrinsics; only addc seems to work in that case. It seems reassoc rewrites things so that we are no longer able to even pattern match __builtin_sub_overflow in that case. So, maybe even adding the ugly clang builtins as a canned way to express it canonically would be useful; the pattern matching can't handle an infinite number of different ways of writing the same thing.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Jakub Jelinek changed:
What     |Removed                       |Added
Status   |NEW                           |ASSIGNED
Assignee |unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
--- Comment #16 from Jakub Jelinek --- Created attachment 55271 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55271&action=edit gcc14-pr79173-wip.patch Untested WIP (with backend implementation for x86 only so far). I still need to add/tweak some peephole2s to make the pr79173-{1,2,3,4}.c tests clean on both x86-64 and i?86, and then, as can be seen in pr79173-5.c, I need to do some tweaks so that it also handles __builtin_{add,sub}_overflow forms that were themselves pattern recognized, not just those where the builtins are used directly. C23 is standardizing __builtin_{add,sub,mul}_overflow under the ckd_{add,sub,mul} names, so pattern recognizing it that way is definitely desirable. Oh, and maybe incrementally check what happens if one of the addends or subtrahends is an immediate.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #15 from Paweł Bylica --- For what it's worth, clang's __builtin_addc is implemented only in the frontend, as a pair of __builtin_add_overflow. The commit from 11 years ago does not explain why they were added. https://github.com/llvm/llvm-project/commit/54398015bf8cbdc3af54dda74807d6f3c8436164 Producing a chain of ADC instructions out of __builtin_add_overflow patterns was done quite recently (~1 year ago), and this work is not fully finished yet. On the other hand, Go recently added "addc"-like "builtins" in https://pkg.go.dev/math/bits, and they are a real pleasure to use in multi-precision arithmetic.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #14 from Jakub Jelinek --- Unfortunately, the clang __builtin_addc* and __builtin_subc* builtins are badly designed. Instead of adding or subtracting a 1-bit carry, where the result is guaranteed to have a 1-bit carry as well, they are: unsigned __builtin_addc (unsigned x, unsigned y, unsigned carry_in, unsigned *carry_out) { unsigned r; unsigned c1 = __builtin_add_overflow (x, y, &r); unsigned c2 = __builtin_add_overflow (r, carry_in, &r); *carry_out = c1 + c2; return r; } unsigned __builtin_subc (unsigned x, unsigned y, unsigned carry_in, unsigned *carry_out) { unsigned r; unsigned c1 = __builtin_sub_overflow (x, y, &r); unsigned c2 = __builtin_sub_overflow (r, carry_in, &r); *carry_out = c1 + c2; return r; } So, instead of doing [0, 0xffffffff] + [0, 0xffffffff] + [0, 1] resulting in [0, 0xffffffff] plus a [0, 1] carry, they actually do [0, 0xffffffff] + [0, 0xffffffff] + [0, 0xffffffff] resulting in [0, 0xffffffff] plus a [0, 2] carry. So much for good "design". So, I am not really sure if it is worth implementing those builtins; one can use __builtin_add_overflow/__builtin_sub_overflow instead. All we need is to pattern detect whether they are chained, starting with a 0 carry-in; then the carry-outs are guaranteed to be in [0, 1], and further pairs of .ADD_OVERFLOW/.SUB_OVERFLOW can again count on a [0, 1] carry-in and produce a [0, 1] carry-out. And pattern detect that into some new IFN which will try to produce efficient code for these.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #13 from Andrew Pinski --- (In reply to Vincent Lefèvre from comment #12) > One issue is that _addcarry_u64 / x86intrin.h are not documented, so the > conditions of its use in portable code are not clear. I suppose that it is > designed to be used in a target-independent compiler builtin. None of the x86 intrinsics are documented except on the Intel intrinsics guide page: e.g: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_addcarry_u64
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #12 from Vincent Lefèvre --- (In reply to Andrew Pinski from comment #11) > x86 has _addcarry_u64 which gets it mostly (see PR 97387). > > The clang builtins __builtin_*_overflow are there but not the __builtin_add* > builtins. > > GCC does do a decent job of optimizing the original code now too. By "original code", do you mean the code with _addcarry_u64 (I haven't tested)? Otherwise, I don't see any optimization at all on the code I posted in Comment 0. One issue is that _addcarry_u64 / x86intrin.h are not documented, so the conditions of its use in portable code are not clear. I suppose that it is designed to be used in a target-independent compiler builtin.
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Andrew Pinski changed:
What      |Removed |Added
Keywords  |        |missed-optimization
See Also  |        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387
Component |target  |middle-end
--- Comment #11 from Andrew Pinski --- x86 has _addcarry_u64 which gets it mostly (see PR 97387). The clang builtins __builtin_*_overflow are there but not the __builtin_add* builtins. GCC does do a decent job of optimizing the original code now too.