[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #15 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:05b95238be648c9cf8af2516930af6a7b637a2b8

commit r15-1183-g05b95238be648c9cf8af2516930af6a7b637a2b8
Author: Uros Bizjak 
Date:   Tue Jun 11 16:00:31 2024 +0200

i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

For TARGET_CMOV targets, emit an insn sequence involving a conditional move.

.SAT_ADD:

addl    %esi, %edi
movl    $-1, %eax
cmovnc  %edi, %eax
ret

.SAT_SUB:

subl    %esi, %edi
movl    $0, %eax
cmovnc  %edi, %eax
ret
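
For reference (a sketch for illustration; not part of the commit): the two
sequences above compute the standard branchless saturating forms, where the
carry flag set by the add/sub selects the preloaded saturation value:

unsigned sat_add (unsigned x, unsigned y)
{
  unsigned sum = x + y;
  return sum < x ? -1u : sum;   /* carry set -> saturate to UINT_MAX */
}

unsigned sat_sub (unsigned x, unsigned y)
{
  return x < y ? 0 : x - y;     /* borrow set -> saturate to 0 */
}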

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd<mode>3): Emit insn sequence
involving conditional move for TARGET_CMOVE targets.
(ussub<mode>3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: Also scan for cmov.
* gcc.target/i386/pr112600-b.c: Ditto.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #14 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01

commit r15-1122-g8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01
Author: Uros Bizjak 
Date:   Sun Jun 9 12:09:13 2024 +0200

i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

The following testcase:

unsigned
sub_sat (unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}

currently compiles (-O2) to:

sub_sat:
movl    %edi, %edx
xorl    %eax, %eax
subl    %esi, %edx
cmpl    %esi, %edi
setnb   %al
negl    %eax
andl    %edx, %eax
ret

We can expand through the ussub{m}3 optab to use the carry flag from the
subtraction and generate code using the SBB instruction, implementing:

unsigned res = x - y;
res &= ~(-(x < y));

sub_sat:
subl    %esi, %edi
sbbl    %eax, %eax
notl    %eax
andl    %edi, %eax
ret
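
As a quick sanity check (a sketch for illustration, not part of the commit),
the masking identity used here can be verified exhaustively for uint8_t:

/* Exhaustive check: (x - y) & ~(-(x < y)) equals saturating
   subtraction for all 8-bit operands.  */
#include <assert.h>
#include <stdint.h>

int main (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        uint8_t ref = x >= y ? x - y : 0;             /* saturating sub */
        uint8_t res = (uint8_t)(x - y) & (uint8_t)~(-(x < y));
        assert (res == ref);
      }
  return 0;
}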

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (ussub<mode>3): New expander.
(sub_<mode>3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-b.c: New test.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #13 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:de05e44b2ad9638d04173393b1eae3c38b2c3864

commit r15-1113-gde05e44b2ad9638d04173393b1eae3c38b2c3864
Author: Uros Bizjak 
Date:   Sat Jun 8 12:17:11 2024 +0200

i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

The following testcase:

unsigned
add_sat(unsigned x, unsigned y)
{
unsigned z;
  return __builtin_add_overflow(x, y, &z) ? -1u : z;
}

currently compiles (-O2) to:

add_sat:
addl    %esi, %edi
jc      .L3
movl    %edi, %eax
ret
.L3:
orl     $-1, %eax
ret

We can expand through the usadd{m}3 optab to use the carry flag from the
addition and generate branchless code using the SBB instruction, implementing:

unsigned res = x + y;
res |= -(res < x);

add_sat:
addl    %esi, %edi
sbbl    %eax, %eax
orl     %edi, %eax
ret
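
For illustration (a sketch, not from the commit): SBB with identical
operands computes reg - reg - CF = -(carry), materializing an all-ones
mask exactly when the addition wrapped, which is what the C form above
relies on:

#include <stdint.h>

uint32_t sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t res = x + y;
  uint32_t mask = -(uint32_t)(res < x);  /* -(carry): 0 or 0xffffffff */
  return res | mask;                     /* ORs in UINT32_MAX on overflow */
}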

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd<mode>3): New expander.
(x86_mov<mode>cc_0_m1_neg): Use SWI mode iterator.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: New test.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:366d45c8d4911dc7874d2e64cf2583c0133b8dd5

commit r15-1077-g366d45c8d4911dc7874d2e64cf2583c0133b8dd5
Author: Uros Bizjak 
Date:   Thu Jun 6 19:18:41 2024 +0200

testsuite/i386: Add vector sat_sub testcases [PR112600]

PR middle-end/112600

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-2a.c: New test.
* gcc.target/i386/pr112600-2b.c: New test.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Uroš Bizjak  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #11 from Uroš Bizjak  ---
(In reply to Jonathan Wakely from comment #0)
> These two implementations of C++26 saturating addition
> (std::add_sat) have equivalent behaviour:
> 
> unsigned
> add_sat(unsigned x, unsigned y) noexcept
> {
> unsigned z;
> if (!__builtin_add_overflow(x, y, &z))
>   return z;
> return -1u;
> }

[...]

> For -O3 on x86_64 GCC uses a branch for the first one:
> 
> add_sat(unsigned int, unsigned int):
> add edi, esi
> jc  .L3
> mov eax, edi
> ret
> .L3:
> or  eax, -1
> ret

The reason if-conversion to cmove fails is the "weird" compare
operands, a consequence of the addsi3_cc_overflow_1 definition:

(insn 9 4 10 2 (parallel [
(set (reg:CCC 17 flags)
(compare:CCC (plus:SI (reg:SI 106)
(reg:SI 107))
(reg:SI 106)))
(set (reg:SI 104)
(plus:SI (reg:SI 106)
(reg:SI 107)))
]) "sadd.c":7:12 477 {addsi3_cc_overflow_1}
 (expr_list:REG_DEAD (reg:SI 107)
(expr_list:REG_DEAD (reg:SI 106)
(nil))))

the noce_try_cmove path fails in noce_emit_cmove:

Breakpoint 1, noce_emit_cmove (if_info=0x7fffd750, x=0x7fffe9fe4e40,
code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
1774return NULL_RTX;
(gdb) list
1766  /* Don't even try if the comparison operands are weird
1767 except that the target supports cbranchcc4.  */
1768  if (! general_operand (cmp_a, GET_MODE (cmp_a))
1769  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
1770{
1771  if (!have_cbranchcc4
1772  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
1773  || cmp_b != const0_rtx)
1774return NULL_RTX;
1775}
1776
1777      target = emit_conditional_move (x, { code, cmp_a, cmp_b, VOIDmode },
1778                                      vtrue, vfalse, GET_MODE (x),
(gdb) bt
#0  noce_emit_cmove (if_info=0x7fffd750, x=0x7fffe9fe4e40, code=LTU,
cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
#1  0x020d995b in noce_try_cmove (if_info=0x7fffd750) at
../../git/gcc/gcc/ifcvt.cc:1884
#2  0x020dec37 in noce_process_if_block (if_info=0x7fffd750) at
../../git/gcc/gcc/ifcvt.cc:4149
#3  0x020e0248 in noce_find_if_block (test_bb=0x7fffe9fb5d80,
then_edge=0x7fffe9fd7cc0, else_edge=0x7fffe9fd7c60, pass=1)
at ../../git/gcc/gcc/ifcvt.cc:4716
#4  0x020e08e9 in find_if_header (test_bb=0x7fffe9fb5d80, pass=1) at
../../git/gcc/gcc/ifcvt.cc:4921
#5  0x020e3255 in if_convert (after_combine=true) at
../../git/gcc/gcc/ifcvt.cc:6068

(gdb) p debug_rtx (cmp_a)
(plus:SI (reg:SI 106)
(reg:SI 107))
$1 = void
(gdb) p debug_rtx (cmp_b)
(reg:SI 106)
$2 = void

The above cmp_a RTX fails general_operand check.

Please note that a similar testcase:

unsigned
sub_sat(unsigned x, unsigned y)
{
unsigned z;
return __builtin_sub_overflow(x, y, &z) ? 0 : z;
}

results in the expected:

subl    %esi, %edi      # 52    [c=4 l=2]  *subsi_3/0
movl    $0, %eax        # 53    [c=4 l=5]  *movsi_internal/0
cmovnb  %edi, %eax      # 54    [c=4 l=3]  *movsicc_noc/0
ret                     # 50    [c=0 l=1]  simple_return_internal

due to:

(insn 9 4 10 2 (parallel [
(set (reg:CC 17 flags)
(compare:CC (reg:SI 106)
(reg:SI 107)))
(set (reg:SI 104)
(minus:SI (reg:SI 106)
(reg:SI 107)))
]) "sadd.c":28:12 416 {*subsi_3}
 (expr_list:REG_DEAD (reg:SI 107)
(expr_list:REG_DEAD (reg:SI 106)
(nil))))

So either the addsi3_cc_overflow_1 RTX is not correct, or noce_emit_cmove
should be improved to handle the above "weird" operand form.

Let's ask Jakub.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:abe6d39365476e6be724815d09d072e305018755

commit r15-1030-gabe6d39365476e6be724815d09d072e305018755
Author: Pan Li 
Date:   Tue May 28 15:37:44 2024 +0800

Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

This patch would like to add the middle-end representation for the
saturating subtraction, i.e. the result of the subtraction is set to
the minimum value on underflow.  It matches a pattern similar to the
one below.

SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example for uint8_t, we have

* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2) => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0
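
The four example values can be checked directly against the matched
pattern (a sketch for illustration, not part of the commit):

#include <assert.h>
#include <stdint.h>

/* The SAT_SUB pattern instantiated for uint8_t.  */
static uint8_t sat_sub_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t)(x - y) & (uint8_t)-(uint8_t)(x >= y);
}

int main (void)
{
  assert (sat_sub_u8 (255, 0) == 255);
  assert (sat_sub_u8 (1, 2) == 0);
  assert (sat_sub_u8 (254, 255) == 0);
  assert (sat_sub_u8 (0, 255) == 0);
  return 0;
}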

Given the SAT_SUB below for uint64_t:

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t)(x >= y));
}

Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  _Bool _1;
  long unsigned int _3;
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _1 = x_4(D) >= y_5(D);
  _3 = x_4(D) - y_5(D);
  _6 = _1 ? _3 : 0;
  return _6;
;;succ:   EXIT
}

After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _6;
;;succ:   EXIT
}

The below tests were run for this patch:
* The riscv full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
new decl for the matcher generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function
to build the gimple call to the binary SAT alu.
(match_unsigned_saturation_add): Rename from match_saturation_arith.
(match_unsigned_saturation_sub): Add new func to match the
unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Try to match SAT_SUB
for COND_EXPR.

Signed-off-by: Pan Li 

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-05-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #9 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:34ed2b4593fa98b613632d0dde30b6ba3e7ecad9

commit r15-642-g34ed2b4593fa98b613632d0dde30b6ba3e7ecad9
Author: Pan Li 
Date:   Fri May 17 18:49:46 2024 +0800

RISC-V: Implement IFN SAT_ADD for both the scalar and vector

This patch implements SAT_ADD in the riscv backend as a sample
for both scalar and vector modes.  Given the vector example below:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv   v0,v0,v1
  vmerge.vim  v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vsaddu.vv   v1,v1,v2  <=  Vector Single-Width Saturating Add
  vse64.v v1,0(a0)
  ...
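
For comparison, the same saturating loop can be written directly with the
RVV C intrinsics (a sketch; intrinsic names such as __riscv_vsaddu_vv_u64m1
follow the RVV intrinsics spec and are an assumption here, not taken from
the commit):

#include <riscv_vector.h>
#include <stdint.h>

void vec_sat_add_u64_intrin (uint64_t *out, const uint64_t *x,
                             const uint64_t *y, unsigned n)
{
  while (n > 0)
    {
      size_t vl = __riscv_vsetvl_e64m1 (n);          /* strip-mine */
      vuint64m1_t vx = __riscv_vle64_v_u64m1 (x, vl);
      vuint64m1_t vy = __riscv_vle64_v_u64m1 (y, vl);
      /* vsaddu.vv: Vector Single-Width Saturating Add (unsigned).  */
      __riscv_vse64_v_u64m1 (out, __riscv_vsaddu_vv_u64m1 (vx, vy, vl), vl);
      n -= vl; x += vl; y += vl; out += vl;
    }
}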

The below test suites passed for this patch:
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* config/riscv/autovec.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in vector mode.
* config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
to expand the usadd<mode>3 pattern.
(expand_vec_usadd): Ditto but for vector.
* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
the vsadd insn.
(expand_vec_usadd): New func impl to expand usadd<mode>3 for
vector.
* config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
expand usadd<mode>3 for scalar.
* config/riscv/riscv.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in scalar mode.
* config/riscv/vector.md: Allow VLS mode for vsaddu.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
* gcc.target/riscv/sat_arith.h: New test.
* gcc.target/riscv/sat_u_add-1.c: New test.
* gcc.target/riscv/sat_u_add-2.c: New test.
* gcc.target/riscv/sat_u_add-3.c: New test.
* gcc.target/riscv/sat_u_add-4.c: New test.
* gcc.target/riscv/sat_u_add-run-1.c: New test.
* gcc.target/riscv/sat_u_add-run-2.c: New test.
* gcc.target/riscv/sat_u_add-run-3.c: New test.
* gcc.target/riscv/sat_u_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li 

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-05-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:b59de4113262f2bee14147eb17eb3592f03d9556

commit r15-634-gb59de4113262f2bee14147eb17eb3592f03d9556
Author: Uros Bizjak 
Date:   Fri May 17 09:55:49 2024 +0200

i386: Rename sat_plusminus expanders to standard names [PR112600]

Rename the sat_plusminus expanders to the standard ssadd<mode>3,
usadd<mode>3, sssub<mode>3 and ussub<mode>3 names to enable the
corresponding optab expansion.

Also add named expanders for MMX modes.

PR middle-end/112600

gcc/ChangeLog:

* config/i386/mmx.md (<insn><mode>3): New expander.
* config/i386/sse.md (<sse2_avx2>_<insn><mode>3<mask_name>):
Rename expander to <insn><mode>3<mask_name>.
(<insn><mode>3): Update for rename.
* config/i386/i386-builtin.def: Update for rename.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-1a.c: New test.
* gcc.target/i386/pr112600-1b.c: New test.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-05-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #7 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:d4dee347b3fe1982bab26485ff31cd039c9df010

commit r15-577-gd4dee347b3fe1982bab26485ff31cd039c9df010
Author: Pan Li 
Date:   Wed May 15 10:14:06 2024 +0800

Vect: Support new IFN SAT_ADD for unsigned vector int

For vectorization, we leverage the existing vect pattern recognition
to find the pattern similar to the scalar one, and let the vectorizer
perform the rest for the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the "Vector Single-Width Saturating
Add and Subtract" insns, which can be leveraged when expanding
usadd<mode>3 in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0,
vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0,
vect__12.11_54);
  ...
}

The below test suites passed for this patch:
* The riscv full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
func decl generated by match.pd match.
(vect_recog_sat_add_pattern): New func impl to recog the pattern
for unsigned SAT_ADD.

Signed-off-by: Pan Li 

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-05-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:52b0536710ff3f3ace72ab00ce9ef6c630cd1183

commit r15-576-g52b0536710ff3f3ace72ab00ce9ef6c630cd1183
Author: Pan Li 
Date:   Wed May 15 10:14:05 2024 +0800

Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

This patch would like to add the middle-end representation for the
saturating addition, i.e. the result of the addition is set to the
maximum value on overflow.  It matches a pattern similar to the one
below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
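
For illustration (a sketch, not part of the commit), these examples can
even be checked at compile time with the matched pattern instantiated
for uint8_t:

#include <stdint.h>

#define SAT_ADD_U8(x, y) \
  (uint8_t)((uint8_t)((x) + (y)) | -(uint8_t)((uint8_t)((x) + (y)) < (x)))

_Static_assert (SAT_ADD_U8 (1, 254) == 255, "1 + 254 saturates to 255");
_Static_assert (SAT_ADD_U8 (1, 255) == 255, "1 + 255 saturates to 255");
_Static_assert (SAT_ADD_U8 (2, 255) == 255, "2 + 255 saturates to 255");
_Static_assert (SAT_ADD_U8 (255, 255) == 255, "255 + 255 saturates to 255");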

Given the example below for the unsigned scalar integer uint64_t:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;succ:   EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;succ:   EXIT
}

The below tests passed for this patch:
1. The riscv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
to the return true switch case(s).
* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD match(es).
* optabs.def (OPTAB_NL): Remove fixed-point limitation for
us/ssadd.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
extern func decl generated in match.pd match.
(match_saturation_arith): New func impl to match the saturation
arith.
(math_opts_dom_walker::after_dom_children): Try match saturation
arith when IOR expr.

Signed-off-by: Pan Li 

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #5 from Tamar Christina  ---
Yeah, this is hurting us a lot on vectors as well:

https://godbolt.org/z/ecnGadxcG

The first one isn't vectorizable, and for the second one we generate
too-complicated code, as the vec_cond pattern is expanded to something
quite involved.

It was too complicated for the intern we had at the time, but I think we
should basically still implement the conclusion of this thread, no?
https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html

i.e. we should just make proper saturating IFNs.

The only remaining question is whether we should make them optab-backed,
or whether we can do something reasonably better for most targets with
better fallback code.

This seems to indicate yes, since the REALPART_EXPR seems to screw things
up a bit.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2023-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #4 from Richard Biener  ---
Note we don't have a good middle-end representation for (integer) saturation.

Maybe having variants of .ADD_OVERFLOW and friends that specify an
alternate value (i.e. the value to use when overflow occurs, where the
actual value is otherwise left unspecified) as an additional argument
would work.  So have the first form fold into

  <bb 2> :
  _8 = .ADD_OVERFLOW (x_6(D), y_7(D), -1u);
  _1 = REALPART_EXPR <_8>;
  return _1;

of course that defers the code-generation problem to RTL expansion and
would require pattern matching

res = x + y;
res |= -(res < x);

into the same form for canonicalization purposes.  I would expect that some
targets implement saturating integer arithmetic (not sure about
multiplication or division though).
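
Concretely, these are the two equivalent source forms discussed in this
report that would need to canonicalize to the same internal form (a sketch
restating them side by side):

unsigned sat_add_branchy (unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_add_overflow (x, y, &z) ? -1u : z;
}

unsigned sat_add_bitwise (unsigned x, unsigned y)
{
  unsigned res = x + y;
  res |= -(res < x);   /* all-ones mask exactly when the add wrapped */
  return res;
}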

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2023-11-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-11-19
 Ever confirmed|0   |1

--- Comment #3 from Andrew Pinski  ---
Confirmed.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2023-11-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #2 from Jonathan Wakely  ---
For similar saturating subtraction functions:

unsigned
sub_sat(unsigned x, unsigned y) noexcept
{
unsigned z;
if (!__builtin_sub_overflow(x, y, &z))
return z;
return 0;
}

unsigned
sub_sat2(unsigned x, unsigned y) noexcept
{
unsigned res;
res = x - y;
res &= -(res <= x);
return res;
}

GCC x86_64 gives:

sub_sat(unsigned int, unsigned int):
sub edi, esi
jb  .L3
mov eax, edi
ret
.L3:
xor eax, eax
ret
sub_sat2(unsigned int, unsigned int):
sub edi, esi
mov eax, 0
cmovnb  eax, edi
ret

GCC aarch64 gives:

sub_sat(unsigned int, unsigned int):
subs    w2, w0, w1
mov     w3, 0
cmp     w0, w1
csel    w0, w2, w3, cs
ret
sub_sat2(unsigned int, unsigned int):
subs    w0, w0, w1
csel    w0, w0, wzr, cs
ret


Clang x86_64 gives:

sub_sat(unsigned int, unsigned int):
xor eax, eax
sub edi, esi
cmovae  eax, edi
ret
sub_sat2(unsigned int, unsigned int):
xor eax, eax
sub edi, esi
cmovae  eax, edi
ret

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2023-11-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #1 from Jonathan Wakely  ---
Similar results for aarch64 with GCC:

add_sat(unsigned int, unsigned int):
adds    w0, w0, w1
bcs .L7
ret
.L7:
mov w0, -1
ret
add_sat2(unsigned int, unsigned int):
adds    w0, w0, w1
csinv   w0, w0, wzr, cc
ret