[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 --- Comment #10 from Andrew Pinski --- (In reply to Martin Liška from comment #1) > I think it's fixed since r11-2588-gc072fd236dc08f99. Oh this changed from the shift/xor/sub to using cmov ...
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 Andrew Pinski changed: What|Removed |Added Target||x86_64 Keywords||missed-optimization --- Comment #9 from Andrew Pinski --- This is basically a bug in PHI-OPT, which assumes ABS_EXPR will always generate better code than the conditional case. Hmm, why is x86_64 using a cmov here for ABS_EXPR instead of shift/xor/sub?
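For reference, the "shift xor sub" sequence mentioned in this comment is the classic branchless absolute value. The following is a minimal C sketch of it, assuming a 32-bit two's-complement integer and GCC's documented arithmetic right shift of negative signed values. The function name abs_shift_xor_sub is made up for illustration and does not come from the bug report.

```c
#include <stdint.h>

/* Hypothetical illustration of the shift/xor/sub idiom; not code from the PR.
   Assumes x >> 31 is an arithmetic shift, which is implementation-defined
   in ISO C but is the documented behaviour of GCC. */
int32_t abs_shift_xor_sub(int32_t x)
{
    /* shift: mask is all ones when x is negative, zero otherwise. */
    uint32_t mask = (uint32_t)(x >> 31);
    /* Work in unsigned arithmetic so the subtraction cannot overflow. */
    uint32_t ux = (uint32_t)x;
    /* xor, sub: for negative x this is (~x) - (-1) == -x; otherwise x. */
    return (int32_t)((ux ^ mask) - mask);
}
```

With GCC's wrapping conversion back to a signed type, INT32_MIN maps to itself here, which mirrors the usual overflow caveat for abs.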
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 --- Comment #8 from Jakub Jelinek --- This is specific to x86, where, if the inputs are unpredictable and the results aren't consumed so early that the cmov latency kills performance, cmov sometimes improves performance a lot; on the other hand, if the inputs are predictable, branches are often much faster than cmov. I'm not aware of other architectures where conditional moves are such a mixed bag; e.g. on arm/aarch64 I think using a conditional move is generally better.
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 --- Comment #7 from Roger Sayle --- I agree that in the general case a conditional jump (which depends only on the condition flags) potentially has a shorter dependence chain than a cmov (which depends on the condition flags and two registers). But in this case, the condition codes can't be determined any faster than the register operands. I believe the branch prediction argument is a red herring. Yes, with clever hardware and luck, the CPU can predict which instruction will be executed next after a conditional jump, but it can always know which instruction is executed next after a cmov. A cmov (and multiple cmovs) can be scheduled and executed out-of-order without speculation. Hence branch prediction is only a factor when dependency chain lengths are an issue/unequal (or the cmov is slower than a correctly predicted branch). An understanding of the data distribution is also irrelevant if the best/fastest (correctly predicted) branch implementation is no better/faster than the cmov.
But I can also imagine microarchitectures where predicted conditional jumps are free (requiring zero cycles) and where the condition code "test" is eliminated, having been set/forwarded from an earlier instruction, in which case a zero-latency abs is about as good as you can get.
Are we assuming a target with "predicted_branch_cost < conditional_move_cost"? I wouldn't be surprised if GCC internally assumes these are both always COSTS_N_INSNS(1). If conditional_move_cost <= predicted_branch_cost <= mispredicted_branch_cost then the cmov should always be preferred (independent of branch probabilities or __builtin_expect hints). If predicted_branch_cost <= mispredicted_branch_cost <= conditional_move_cost, the branch should always be preferred, and the cmov shouldn't be part of the ISA.
The interesting domain of trade-offs is where/when predicted_branch_cost < conditional_move_cost <= mispredicted_branch_cost (which I'm not yet convinced is the case here). Do we have any numbers that show the branch is better (for this case) on real hardware, that can't be explained by other factors? For example, on ABS where the inputs are always positive.
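The three cost regimes in this comment can be made concrete with a small sketch. Everything in it is hypothetical: the struct, the function names and the numeric costs are invented for illustration, and weighting the branch cost by a misprediction rate is an added assumption, not something stated in the comment or a description of how GCC actually decides.

```c
#include <stdbool.h>

/* Hypothetical cost parameters in arbitrary units (for example, cycles).
   The field names follow the comment; the struct itself is not a GCC type. */
struct costs {
    double predicted_branch;
    double mispredicted_branch;
    double conditional_move;
};

/* Assumed model: the average branch cost is the two branch costs
   weighted by how often the branch is mispredicted (0.0 to 1.0). */
double expected_branch_cost(const struct costs *c, double mispredict_rate)
{
    return (1.0 - mispredict_rate) * c->predicted_branch
         + mispredict_rate * c->mispredicted_branch;
}

/* Prefer the cmov whenever it is no more expensive than the expected branch. */
bool prefer_cmov(const struct costs *c, double mispredict_rate)
{
    return c->conditional_move <= expected_branch_cost(c, mispredict_rate);
}
```

Under this model the first two regimes give the same answer for every misprediction rate, and only the third one (predicted_branch_cost < conditional_move_cost <= mispredicted_branch_cost) makes the answer depend on the data.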
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #6 from Jakub Jelinek --- Guess we should punt on the various phiopt detections if the branch probabilities are substantially different.
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 --- Comment #5 from Dávid Bolvanský --- The user knows the data better, so he/she may prefer abs with a branch. PGO may also indicate, based on profile data, that a branch is better for abs.
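One way a user might state that preference in source is GCC's __builtin_expect, which comment #7 also mentions. The sketch below is a hypothetical function, not the test case attached to this PR, and whether GCC keeps the branch or still emits a cmov for it is exactly the question under discussion here.

```c
/* Hypothetical example; the function name is invented for illustration. */
int abs_mostly_positive(int x)
{
    /* __builtin_expect is a GCC builtin: the second argument says the
       condition (x < 0) is expected to be false most of the time. */
    if (__builtin_expect(x < 0, 0))
        return -x;
    return x;
}
```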
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 --- Comment #4 from Uroš Bizjak --- Please see PR 56309 (and PR 85559 meta bug). Quote from Honza: The decision on whether to use cmov or jmp was always tricky on x86 architectures. Cmov increases dependency chains and register pressure (both values need to be loaded in) and has a long opcode. So a jump sequence, if well predicted, flows better through the out-of-order core. If badly predicted it is, of course, a disaster. I think more modern CPUs solved the problems with long latency of cmov, but the dependency chains are still there. We don't know how to drive the decision, as this is a deep architectural issue.
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 --- Comment #3 from Richard Biener --- A well-predicted branch will be faster than the cmov because of the shorter data dependence path.
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 Roger Sayle changed: What|Removed |Added CC||roger at nextmovesoftware dot com --- Comment #2 from Roger Sayle --- My first impression is that this isn't a bug, it's a feature. In an optimizing compiler, the user specifies the computation to be performed and the compiler selects the implementation. Hence "x+0" isn't a user request to perform an addition. Perhaps David could provide more information on why a branch implementation is required/preferred (for example on which target)? On generic x86_64, I believe the code currently generated is both smaller and faster, assuming "neg eax" takes about the same time as "test edi,edi", and that "cmovs" takes about the same time as either branch of "js". As a workaround, a branch version can be implemented in inline assembly using __asm, but I'm still hazy as to why this would be desirable.
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 Martin Liška changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-01-18 CC||marxin at gcc dot gnu.org, ||sayle at gcc dot gnu.org, ||uros at gcc dot gnu.org --- Comment #1 from Martin Liška --- I think it's fixed since r11-2588-gc072fd236dc08f99. @Roger, Uros: Can you please verify that?