[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-05-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Jakub Jelinek  ---
Should be fixed now.

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-05-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:b621482296f6dec0abb22ed39cc4ce6811535d47

commit r15-427-gb621482296f6dec0abb22ed39cc4ce6811535d47
Author: Jakub Jelinek 
Date:   Mon May 13 11:15:27 2024 +0200

tree-ssa-math-opts: Pattern recognize yet another .ADD_OVERFLOW pattern
[PR113982]

We already pattern recognize many different forms, and the closest to the
requested one is
   yc = (type) y;
   zc = (type) z;
   x = yc + zc;
   w = (typeof_y) x;
   if (x > max)
where y and z have the same unsigned type, type is a wider unsigned type,
and max is the maximum value of the narrower unsigned type.
But apparently people are creative in writing this in different ways;
this bug requests
   yc = (type) y;
   zc = (type) z;
   x = yc + zc;
   w = (typeof_y) x;
   if (x >> narrower_type_bits)

The following patch implements that.

2024-05-13  Jakub Jelinek  

PR middle-end/113982
* tree-ssa-math-opts.cc (arith_overflow_check_p): Also return 1
for RSHIFT_EXPR by precision of maxval if shift result is only
used in a cast or comparison against zero.
(match_arith_overflow): Handle the RSHIFT_EXPR use case.

* gcc.dg/pr113982.c: New test.
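
For illustration, a minimal C++ sketch of the two idioms described in the commit
message above (hypothetical function names; assumes 32-bit unsigned int and
64-bit unsigned long long, neither taken from the actual testcase):
```
// Hypothetical sketch; both forms should now be recognized as a single
// .ADD_OVERFLOW instead of a real double-width addition.

// Already-recognized form: compare the widened sum against the maximum
// value of the narrower unsigned type.
bool add_overflows_cmp (unsigned int y, unsigned int z, unsigned int *w)
{
  unsigned long long x = (unsigned long long) y + z;  // widened add
  *w = (unsigned int) x;                              // narrowing cast back
  return x > 0xffffffffULL;                           // x > max of unsigned int
}

// Newly recognized form from this patch: test the high bits via a right
// shift by the narrower type's precision.
bool add_overflows_shift (unsigned int y, unsigned int z, unsigned int *w)
{
  unsigned long long x = (unsigned long long) y + z;
  *w = (unsigned int) x;
  return (x >> 32) != 0;                              // x >> narrower_type_bits
}
```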

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-05-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from Jakub Jelinek  ---
Created attachment 58179
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58179&action=edit
gcc15-pr113982.patch

Untested fix.

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-05-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #6 from Jakub Jelinek  ---
Note, since PR95853 we also recognize bool(r > ~0ULL) as the check rather than
bool(r >> 64).
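
A minimal sketch of the two equivalent carry checks, assuming 64-bit unsigned
long long and __int128 support (function names are hypothetical):
```
// Both checks test whether the 64-bit addition carried out; since PR95853
// both forms should be recognized as .ADD_OVERFLOW.
bool carry_via_shift (unsigned long long x, unsigned long long y)
{
  unsigned __int128 r = (unsigned __int128) x + y;
  return (bool) (r >> 64);   // high half non-zero
}

bool carry_via_compare (unsigned long long x, unsigned long long y)
{
  unsigned __int128 r = (unsigned __int128) x + y;
  return r > ~0ULL;          // sum exceeds the maximum 64-bit value
}
```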

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-02-19 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #5 from Jakub Jelinek  ---
With signed +/- overflow I'm not sure what exactly to pattern match; people can
be really creative there.  I guess
w = (__int128) x + y;
r = (long long) w;
ovf = (w >> 64) != (w >> 63);
or
w = (__int128) x + y;
r = (long long) w;
ovf = ((w >> 63) + (unsigned __int128) 1) > 1
etc.
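
For concreteness, a hedged sketch of the first signed variant above
(hypothetical function name; assumes 64-bit long long and __int128 support):
```
// Signed 64-bit add-overflow check written via a widening __int128 addition;
// the sum fits in long long iff bits 64..127 of w are a sign-extension of
// bit 63, i.e. the two arithmetic shifts below agree.
bool signed_add_overflows (long long x, long long y, long long *r)
{
  __int128 w = (__int128) x + y;
  *r = (long long) w;
  return (w >> 64) != (w >> 63);
}
```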

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-02-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #4 from Richard Biener  ---
Confirmed as a request for more pattern recognition.

Possibly documenting the idioms GCC recognizes for these kinds of operations
might be a nice thing to have.

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-02-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
We only pattern recognize something like the following (or with r < y instead):
add_result add_wide_3(unsigned long long x, unsigned long long y) {
    auto r = x + y;
    return add_result{r, r < x};
}
and not the form that does the addition uselessly in a twice as wide mode plus
a shift plus a comparison against 0.
The reason for mov edx, 0 is to avoid clobbering the flags, which are live at
that point; of course one could also move the clearing of edx one insn earlier,
and then it could be xor edx, edx.
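
For comparison, a hedged sketch of what the recognized idiom effectively maps
to when written with __builtin_add_overflow (the add_result layout is an
assumption based on the GIMPLE in comment #1, not copied from the testcase):
```
// Hypothetical equivalent of add_wide_3 written with the builtin; this maps
// directly to .ADD_OVERFLOW and hence to add + setc on x86.
struct add_result { unsigned long long sum; bool carry; };

add_result add_wide_builtin (unsigned long long x, unsigned long long y)
{
  unsigned long long r;
  bool c = __builtin_add_overflow (x, y, &r);
  return add_result{r, c};
}
```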

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-02-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #2 from Andrew Pinski  ---
Note that some of this is due to return register issues.

If we instead do stores:
```
add_wide_1(unsigned long long, unsigned long long, add_result*):
        mov     rax, rdi
        lea     rcx, [rdi+rsi]
        xor     edi, edi
        add     rsi, rax
        mov     QWORD PTR [rdx], rcx
        adc     rdi, 0
        mov     BYTE PTR [rdx+8], dil
        and     BYTE PTR [rdx+8], 1
        ret
```

Which is better.

add_wide_2 becomes:
```
add_wide_2(unsigned long long, unsigned long long, add_result*):
        add     rdi, rsi
        setc    BYTE PTR [rdx+8]
        mov     QWORD PTR [rdx], rdi
        and     BYTE PTR [rdx+8], 1
        ret
```

Which is better; the extra and is already filed as a separate bug, though I
can't seem to find it.
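
A hedged guess at the store-based variants behind the assembly above; the exact
source isn't quoted in this comment, and the add_result layout is assumed from
the GIMPLE in comment #1:
```
struct add_result { unsigned long long sum; bool carry; };

// Guess: widening form, carry taken from the high half of an __int128 sum.
void add_wide_1 (unsigned long long x, unsigned long long y, add_result *out)
{
  unsigned __int128 r = (unsigned __int128) x + y;
  out->sum = (unsigned long long) r;
  out->carry = (bool) (r >> 64);
}

// Guess: already-recognized form, carry detected via unsigned wraparound.
void add_wide_2 (unsigned long long x, unsigned long long y, add_result *out)
{
  unsigned long long r = x + y;
  out->sum = r;
  out->carry = r < x;
}
```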

[Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions

2024-02-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Andrew Pinski  changed:

   What|Removed |Added

  Component|target  |middle-end
   Last reconfirmed||2024-02-18
   Severity|normal  |enhancement
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
aarch64 looks fine (OK, there is one extra mov):
```
add_wide_1(unsigned long long, unsigned long long):
        adds    x2, x0, x1
        mov     x0, x2
        cset    x1, cs
        ret
add_wide_2(unsigned long long, unsigned long long):
        adds    x0, x0, x1
        cset    x1, cs
        ret

```

So we have:
```
  _1 = (__int128 unsigned) x_6(D);
  _2 = (__int128 unsigned) y_7(D);
  r_8 = _1 + _2;
  _3 = x_6(D) + y_7(D);
  D.2566.sum = _3;
  _4 = r_8 >> 64;
  _5 = (bool) _4;
  D.2566.carry = _5;
```

So we should convert _5 into:
  _t = .ADD_OVERFLOW (x_6(D), y_7(D));
  _t2 = IMAGPART_EXPR <_t>;
  _5 = (bool) _t2;

and then later on see that r_8 is REALPART_EXPR <_t>; it would just work ...
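
For reference, a hedged guess at the return-by-value source that would yield
the GIMPLE quoted above (the real testcase is only available as a bug
attachment):
```
// Hypothetical reconstruction: a widening add whose carry is read from the
// high 64 bits, matching r_8 = _1 + _2 and _5 = (bool) (r_8 >> 64) above.
struct add_result { unsigned long long sum; bool carry; };

add_result add_wide_1 (unsigned long long x, unsigned long long y)
{
  unsigned __int128 r = (unsigned __int128) x + y;
  return add_result{(unsigned long long) r, (bool) (r >> 64)};
}
```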