[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Jakub Jelinek changed:

           What    |Removed     |Added
 ----------------------------------------------------------------
         Status    |ASSIGNED    |RESOLVED
     Resolution    |---         |FIXED

--- Comment #9 from Jakub Jelinek ---
Should be fixed now.
--- Comment #8 from GCC Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:b621482296f6dec0abb22ed39cc4ce6811535d47

commit r15-427-gb621482296f6dec0abb22ed39cc4ce6811535d47
Author: Jakub Jelinek
Date:   Mon May 13 11:15:27 2024 +0200

    tree-ssa-math-opts: Pattern recognize yet another .ADD_OVERFLOW pattern [PR113982]

    We already pattern recognize many different patterns; the closest to the
    requested one is

        yc = (type) y;
        zc = (type) z;
        x = yc + zc;
        w = (typeof_y) x;
        if (x > max)

    where y/z have the same unsigned type, type is a wider unsigned type,
    and max is the maximum value of the narrower unsigned type.  But
    apparently people are creative in writing this in different ways; this
    PR requests

        yc = (type) y;
        zc = (type) z;
        x = yc + zc;
        w = (typeof_y) x;
        if (x >> narrower_type_bits)

    The following patch implements that.

    2024-05-13  Jakub Jelinek

            PR middle-end/113982
            * tree-ssa-math-opts.cc (arith_overflow_check_p): Also return 1
            for RSHIFT_EXPR by precision of maxval if shift result is only
            used in a cast or comparison against zero.
            (match_arith_overflow): Handle the RSHIFT_EXPR use case.

            * gcc.dg/pr113982.c: New test.
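A minimal sketch of the idiom the commit above makes GCC recognize as .ADD_OVERFLOW: widen both operands, add, narrow the result, and test the high half with a right shift by the narrower type's precision. The function name and signature are illustrative, not from the committed testcase.

```c
#include <stdint.h>

/* The "if (x >> narrower_type_bits)" form from the commit message,
   instantiated for unsigned long long widened to unsigned __int128.  */
unsigned long long
add_check_shift (unsigned long long y, unsigned long long z, int *ovf)
{
  unsigned __int128 x = (unsigned __int128) y + z;  /* yc + zc */
  unsigned long long w = (unsigned long long) x;    /* narrowing cast */
  *ovf = (x >> 64) != 0;   /* the RSHIFT_EXPR overflow check */
  return w;
}
```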
Jakub Jelinek changed:

           What    |Removed                       |Added
 ----------------------------------------------------------------
       Assignee    |unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
         Status    |NEW                           |ASSIGNED

--- Comment #7 from Jakub Jelinek ---
Created attachment 58179
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58179&action=edit
gcc15-pr113982.patch

Untested fix.
--- Comment #6 from Jakub Jelinek ---
Note, since PR95853 we also recognize bool(r > ~0ULL) as the overflow check, not just bool(r >> 64).
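A sketch of that comparison form: the carry is the __int128 sum compared against ~0ULL (the narrower type's maximum) instead of a shift by 64. The function name is illustrative.

```c
#include <stdint.h>

/* Carry detection via comparison against the narrower type's maximum,
   the form recognized since PR95853.  */
int
carry_via_compare (unsigned long long x, unsigned long long y,
                   unsigned long long *sum)
{
  unsigned __int128 r = (unsigned __int128) x + y;
  *sum = (unsigned long long) r;
  return r > ~0ULL;   /* ~0ULL is promoted to unsigned __int128 here */
}
```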
--- Comment #5 from Jakub Jelinek ---
With signed +/- overflow I'm not sure what exactly to pattern match; people can be really creative there.  I guess

    w = (__int128) x + y;
    r = (long long) w;
    ovf = (w >> 64) != (w >> 63);

or

    w = (__int128) x + y;
    r = (long long) w;
    ovf = (w >> 63) + (unsigned __int128) 1 <= 1;

etc.
--- Comment #4 from Richard Biener ---
Confirmed, for more pattern recognition.  Possibly documenting the idioms GCC recognizes for these kinds of operations would be a nice thing to have.
Jakub Jelinek changed:

           What    |Removed     |Added
 ----------------------------------------------------------------
             CC    |            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek ---
We currently only pattern recognize something like (or with r < y):

    add_result add_wide_3(unsigned long long x, unsigned long long y)
    {
      auto r = x + y;
      return add_result{r, r < x};
    }

and not the form that uselessly performs the addition in a twice as wide mode plus a shift and a comparison against 0.
The reason for mov edx, 0 is not to clobber the flags, which are live at that point; of course, one could also move the clearing of edx one insn earlier and then it could be xor edx, edx.
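For reference, a self-contained C version of the already-recognized form quoted above. The struct layout is an assumption; the original testcase's definition of add_result is not quoted in this thread.

```c
#include <stdbool.h>

/* Assumed layout, consistent with the stores in comment #2's asm.  */
struct add_result { unsigned long long sum; bool carry; };

/* Carry out of a wrapping unsigned add is r < x (equivalently r < y);
   this is the idiom GCC already matches to .ADD_OVERFLOW.  */
struct add_result
add_wide_3 (unsigned long long x, unsigned long long y)
{
  unsigned long long r = x + y;
  return (struct add_result) { r, r < x };
}
```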
--- Comment #2 from Andrew Pinski ---
Note some of this is due to return-register issues.  If we instead do stores:
```
add_wide_1(unsigned long long, unsigned long long, add_result*):
        mov     rax, rdi
        lea     rcx, [rdi+rsi]
        xor     edi, edi
        add     rsi, rax
        mov     QWORD PTR [rdx], rcx
        adc     rdi, 0
        mov     BYTE PTR [rdx+8], dil
        and     BYTE PTR [rdx+8], 1
        ret
```
which is better.  add_wide_2 becomes:
```
add_wide_2(unsigned long long, unsigned long long, add_result*):
        add     rdi, rsi
        setc    BYTE PTR [rdx+8]
        mov     QWORD PTR [rdx], rdi
        and     BYTE PTR [rdx+8], 1
        ret
```
which is also better.  The extra and is filed already, though I can't seem to find the PR.
Andrew Pinski changed:

           What    |Removed     |Added
 ----------------------------------------------------------------
         Status    |UNCONFIRMED |NEW
      Component    |target      |middle-end
       Severity    |normal      |enhancement
 Last reconfirmed  |            |2024-02-18
 Ever confirmed    |0           |1

--- Comment #1 from Andrew Pinski ---
aarch64 looks fine (ok, there is one extra mov):
```
add_wide_1(unsigned long long, unsigned long long):
        adds    x2, x0, x1
        mov     x0, x2
        cset    x1, cs
        ret
add_wide_2(unsigned long long, unsigned long long):
        adds    x0, x0, x1
        cset    x1, cs
        ret
```
So we have:
```
  _1 = (__int128 unsigned) x_6(D);
  _2 = (__int128 unsigned) y_7(D);
  r_8 = _1 + _2;
  _3 = x_6(D) + y_7(D);
  D.2566.sum = _3;
  _4 = r_8 >> 64;
  _5 = (bool) _4;
  D.2566.carry = _5;
```
So we should convert _5 into:
```
  _t = .ADD_OVERFLOW (x_6(D), y_7(D));
  _t2 = IMAGPART_EXPR <_t>;
  _5 = (bool) _t2;
```
and then later on see that r_8 is REALPART_EXPR <_t>; it would just work.
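The original testcase from comment #0 is not quoted in this excerpt; the following is a reconstruction consistent with the GIMPLE above: a widening __int128 add whose carry is read back with a shift by 64. Struct layout and function signature are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed result type; matches the sum/carry stores in the asm above.  */
struct add_result { unsigned long long sum; bool carry; };

/* Widening add-with-carry: the sum is the low 64 bits, the carry is
   the high half obtained via r >> 64, as in the quoted GIMPLE.  */
struct add_result
add_wide_1 (unsigned long long x, unsigned long long y)
{
  unsigned __int128 r = (unsigned __int128) x + y;
  return (struct add_result) { (unsigned long long) r, (bool) (r >> 64) };
}
```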