[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-09-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #10 from Andrew Pinski  ---
(In reply to Martin Liška from comment #1)
> I think it's fixed since r11-2588-gc072fd236dc08f99.

Oh this changed from the shift/xor/sub to using cmov ...

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-09-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Andrew Pinski  changed:

   What|Removed |Added

 Target||x86_64
   Keywords||missed-optimization

--- Comment #9 from Andrew Pinski  ---
This is basically a bug in PHI-OPT which assumes ABS_EXPR will always generate
better code than the conditional case.

Hmm, Why is x86_64 using a cmov here for ABS_EXPR instead of:
Shift
xor
sub
?

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #8 from Jakub Jelinek  ---
This is specific to x86, where if the inputs are inpredictable and results
aren't consumed too early that the cmov latency kills performance cmov
sometimes improves performance a lot, on the other side, if the inputs are
predictable, branches are often much faster than cmov.
I'm not aware of other architectures where the conditional moves are such a
mixed bag, e.g. on arm/aarch64 I think using cmov is generally always better.

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #7 from Roger Sayle  ---
I agree in the general case, a conditional jump (that depends only on the
condition flags) potentially has a shorter dependence chain than a cmov (which
depends on the condition flags and two registers).  But in this case, the
condition codes can't be determined any faster than the register operands.

I believe the branch prediction argument is a red herring.  Yes, with clever
hardware and luck, the CPU can predict which instruction will be executed next
after a conditional jump, but it can always know which instruction is executed
next after a cmov.  A cmov (and multiple cmovs) can be scheduled and executed
out-of-order without speculation.  Hence branch prediction is only a factor
when dependency chain lengths are an issue/unequal (or the cmov is slower than
a correctly predicted branch).

An understanding the data distribution is also irrelevant if the best/fastest
(correctly predicted) branch implementation is no better/faster than the cmov.

But I can also imagine microarchitectures where predicted conditional jumps are
free (requiring zero cycles) and where the condition code "test" is eliminated
having been set/forwarded from an earier instruction, in which case a
zero-latency abs is about as good as you can get.  Are we assuming a target
with "predicted_branch_cost < conditional_move_cost"? I wouldn't be surprised
if GCC internally assumes these are both always COSTS_N_INSNS(1).

If conditional_move_cost <= predicted_branch_cost <= mispredicted_branch_cost
then the cmov should always preferred (independent of branch probabilities or
__builtin_expect hints).  If predicted_branch_cost <= mispredicted_branch_cost
<= conditional_move_cost, the branch should always be preferred, and the cmov
shouldn't be part of the ISA.  The interesting domain of trade-offs is
where/when predicted_branch_cost < conditional_move_cost <=
mispredicted_branch_cost (which I'm not yet convinced is the case here).

Do we have any numbers that show the branch is better (for this case) on real
hardware, than can't be explained by other factors?  For example, on ABS where
the inputs are always positive.

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Guess we should punt on the various phiopt detections if the branch
probabilities are substantially different.

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread david.bolvansky at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #5 from Dávid Bolvanský  ---
User knows the data better, so he/she may prefer abs with branch.

Also PGO may say that branch for abs is better based on profile data.

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #4 from Uroš Bizjak  ---
Please see PR 56309 (and PR 85559 meta bug).

Quote from Honza:

The decision on whether to use cmov or jmp was always tricky on x86
architectures. Cmov increase dependency chains, register pressure (both values
needs to be loaded in) and has long opcode. So jump sequence, if well
predicted, flows better through the out-of-order core. If badly predicted it
is, of course, a disaster. I think more modern CPUs solved the problems with
long latency of cmov, but the dependency chains are still there.

We don't know how to drive the decision, as this is a deep architectural issue.

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #3 from Richard Biener  ---
A well-predicted branch will be faster than the cmov because of the shorter
data dependence path.

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Roger Sayle  changed:

   What|Removed |Added

 CC||roger at nextmovesoftware dot 
com

--- Comment #2 from Roger Sayle  ---
My first impression is that this isn't a bug, it's a feature.  In an optimizing
compiler, the user specifies the computation to be performed and the compiler
selects the implementation.  Hence "x+0" isn't a user request to perform an
addition.

Perhaps David could provide more information on why a branch implementation is
required/preferred (for example on which target)?  On generic x86_64, I believe
the code currently generated is both smaller and faster.  Assuming "neg eax"
takes about the same time as "test edi,edi", and that "cmovs" takes about the
same time as (either branch) of "js".

As a workaround a branch version can be implemented in inline assembly using
__asm, but I'm still hazy as to why this would be desirable.

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Martin Liška  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-01-18
 CC||marxin at gcc dot gnu.org,
   ||sayle at gcc dot gnu.org,
   ||uros at gcc dot gnu.org

--- Comment #1 from Martin Liška  ---
I think it's fixed since r11-2588-gc072fd236dc08f99.

@Roger, Uros: Can you please verify that?