[Bug target/92953] Undesired if-conversion with overflow builtins

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92953

--- Comment #5 from Andrew Pinski  ---
On x86_64 the flags get clobbered by almost all instructions, so either you do
the subtraction twice or you use a set instruction; GCC chooses the latter.
I suspect this is a general issue that shows up more often on x86_64 than on
other targets for that reason.

[Bug target/92953] Undesired if-conversion with overflow builtins

2019-12-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92953

--- Comment #4 from Alexander Monakov  ---
At least then GCC should try to use cmovno instead of seto-test-cmove for
if-conversion:

foo:
        movl    %edi, %eax
        subl    %esi, %eax
        notl    %eax
        orl     $1, %eax
        subl    %esi, %edi
        cmovno  %edi, %eax
        ret

[Bug target/92953] Undesired if-conversion with overflow builtins

2019-12-16 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92953

--- Comment #3 from Andrew Pinski  ---
(In reply to Alexander Monakov from comment #2)
> Well, the aarch64 backend does not implement the subv<mode>4 pattern in the
> first place, which would be required for efficient branchy code:
> 
> foo:
>         subs    w0, w0, w1
>         b.vc    .LBB0_2
>         mvn     w0, w0
>         orr     w0, w0, #0x1
> .LBB0_2:
>         ret
> 
> This is preferable when the branch is predictable, thanks to the shorter
> dependency chain.

The chain is not much shorter, and even if the branch is predictable the
branchy version is worse. More likely there is a lot else going on at the same
time, so a microbenchmark is not going to prove to me that it is better.

Also, yes, overflow might be considered the "exceptional" case, but I have my
doubts that this is always true.

[Bug target/92953] Undesired if-conversion with overflow builtins

2019-12-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92953

--- Comment #2 from Alexander Monakov  ---
Well, the aarch64 backend does not implement the subv<mode>4 pattern in the
first place, which would be required for efficient branchy code:

foo:
        subs    w0, w0, w1
        b.vc    .LBB0_2
        mvn     w0, w0
        orr     w0, w0, #0x1
.LBB0_2:
        ret

This is preferable when the branch is predictable, thanks to the shorter
dependency chain.

[Bug target/92953] Undesired if-conversion with overflow builtins

2019-12-16 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92953

Andrew Pinski  changed:

           What    |Removed          |Added
-----------------------------------------------------------
           Keywords|                 |missed-optimization
             Target|                 |x86_64-linux-gnu
          Component|rtl-optimization |target
           Severity|normal           |enhancement

--- Comment #1 from Andrew Pinski  ---
But for aarch64 we get:
        subs    w0, w0, w1      // w0 = w0 - w1, setting the overflow flag
        mvn     w1, w0          // w1 = ~w0
        orr     w1, w1, 1       // w1 = w1 | 1
        csel    w0, w1, w0, vs  // w0 = overflow ? w1 : w0
        ret

This is optimal, and better than the version with branches.

So this is fully a target issue.