[Bug target/18154] Inefficient max/min code for PowerPC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154 --- Comment #13 from Segher Boessenkool --- Trunk now generates isel for power9.
--- Comment #12 from Segher Boessenkool --- (Never mind those final "addc" insns; they can just as well be plain "add". I pasted the wrong ones.)
--- Comment #11 from Segher Boessenkool ---
The unsigned version can be done in four insns:

    1:  subfc r5,r3,r4
        subfe r6,r6,r6
        and   r7,r6,r5
        addc  r8,r7,r3

(superopt finds 16 versions, all similar).

The signed version can be done in six:

    33: subfc r5,r3,r4
        srwi  r6,r4,31
        srwi  r7,r3,31
        subfe r8,r6,r7
        and   r9,r8,r5
        addc  r10,r9,r3

(superopt finds 240 versions. Many use one or two "xoris ,,0x8000", which doesn't work for 64 bit, and many use srawi as well, which can be more expensive than srwi. All the remaining ones are similar.)

For 32-bit min/max on a 64-bit CPU, we can use only "cheap", non-carry instructions:

        extsw r3,r3
        extsw r4,r4
        subf  r5,r4,r3
        srdi  r6,r5,32
        and   r7,r6,r5
        add   r8,r7,r4

(with zero-extends instead of extsw for unsigned). Those extends often disappear into surrounding insns, or because the ABI already requires the registers to be extended, etc.
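The three sequences can be checked by emulating them instruction by instruction in C. This is a sketch under stated assumptions. The helper names `subfc`, `subfe`, `min4`, `min6` and `min32_on_64` are hypothetical. The helpers model only the 32-bit carry (CA) behaviour these sequences rely on. The final `addc` is modelled as a plain add, as comment #12 allows. Tracing the carry this way shows that the four-insn sequence yields the unsigned minimum, and the six-insn sequence with the two sign-bit shifts yields the signed minimum.

```c
#include <stdint.h>

/* Hypothetical helpers: a minimal model of the 32-bit carry (XER[CA])
   behaviour that these sequences depend on. */

/* subfc rD,rA,rB: rD = rB - rA; CA = 1 when there is no borrow,
   i.e. when rB >= rA as unsigned values. */
static uint32_t subfc(uint32_t ra, uint32_t rb, unsigned *ca)
{
    *ca = rb >= ra;
    return rb - ra;
}

/* subfe rD,rA,rB: rD = ~rA + rB + CA (carry-out is not needed here). */
static uint32_t subfe(uint32_t ra, uint32_t rb, unsigned ca)
{
    return ~ra + rb + ca;
}

/* subfc r5,r3,r4; subfe r6,r6,r6; and r7,r6,r5; add r8,r7,r3
   -> unsigned min(r3, r4). */
uint32_t min4(uint32_t r3, uint32_t r4)
{
    unsigned ca;
    uint32_t r5 = subfc(r3, r4, &ca); /* r4 - r3, CA = (r4 >= r3) */
    uint32_t r6 = subfe(0, 0, ca);    /* ~x + x + CA = CA - 1 for any old r6 */
    uint32_t r7 = r6 & r5;            /* r4 - r3 if r4 < r3, else 0 */
    return r7 + r3;                   /* r4 if r4 < r3, else r3 */
}

/* subfc r5,r3,r4; srwi r6,r4,31; srwi r7,r3,31; subfe r8,r6,r7;
   and r9,r8,r5; add r10,r9,r3 -> signed min(r3, r4). */
int32_t min6(int32_t a, int32_t b)
{
    uint32_t r3 = (uint32_t)a, r4 = (uint32_t)b;
    unsigned ca;
    uint32_t r5 = subfc(r3, r4, &ca); /* b - a, CA = unsigned (b >= a) */
    uint32_t r6 = r4 >> 31;           /* sign bit of b */
    uint32_t r7 = r3 >> 31;           /* sign bit of a */
    uint32_t r8 = subfe(r6, r7, ca);  /* sa - sb - 1 + CA: all ones iff b < a (signed) */
    uint32_t r9 = r8 & r5;
    /* Converting back to int32_t is implementation-defined in C;
       GCC and Clang wrap modulo 2^32. */
    return (int32_t)(r9 + r3);
}

/* extsw r3,r3; extsw r4,r4; subf r5,r4,r3; srdi r6,r5,32;
   and r7,r6,r5; add r8,r7,r4 -> signed 32-bit min, no carry needed. */
int32_t min32_on_64(int32_t a, int32_t b)
{
    uint64_t r3 = (uint64_t)(int64_t)a; /* extsw */
    uint64_t r4 = (uint64_t)(int64_t)b; /* extsw */
    uint64_t r5 = r3 - r4;              /* subf r5,r4,r3: a - b */
    uint64_t r6 = r5 >> 32;             /* 0xFFFFFFFF if a < b, else 0 */
    uint64_t r7 = r6 & r5;              /* low word of a - b, or 0 */
    uint64_t r8 = r7 + r4;              /* low word is a if a < b, else b */
    return (int32_t)(uint32_t)r8;
}
```

For example, `min4(0xFFFFFFFFu, 1u)` returns 1, while `min6(-1, 1)` returns -1. The same bit patterns give different answers, which is how the two variants can be told apart. In the 64-bit variant, srdi turns the sign of the 64-bit difference into a 32-bit mask, so no carry bit is involved.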
--- Comment #9 from Martin Sebor ---
While looking at an unrelated bug, I noticed that Clang uses the isel instruction when targeting power7 or power8 and emits the following:

    min:                    # @min
            cmpw 3, 4
            isel 3, 3, 4, 0
            blr

GCC can also use isel, but it is disabled by default even when targeting power8 and must be enabled explicitly via -misel. With it, GCC emits the following branchless code:

    min:
            cmpw 7,3,4
            isel 3,3,4,28
            extsw 3,3
            blr

Since the instruction exists for exactly this purpose (eliminating branches), would it make sense to enable it by default? (I suppose one concern might be that it is not very extensively tested.)
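The two isel listings differ only in which condition-register field they use. Here is a small C model of them. It assumes IBM bit numbering for the condition register, where bit 0 is the most significant, so BC=0 selects the LT bit of cr0 and BC=28 selects the LT bit of cr7. The helpers `cr_bit`, `cmpw` and `isel` are hypothetical. The ISA rule that an rA operand of r0 reads as zero is not modelled.

```c
#include <stdint.h>

/* Hypothetical model. The 32-bit CR uses IBM bit numbering: bit 0 is the
   most significant. Field crN occupies bits 4N..4N+3 as LT, GT, EQ, SO. */
static unsigned cr_bit(uint32_t cr, unsigned bc)
{
    return (cr >> (31 - bc)) & 1u;
}

/* cmpw crN,rA,rB: set field N to LT, GT or EQ (SO is left clear). */
static uint32_t cmpw(uint32_t cr, unsigned n, int32_t a, int32_t b)
{
    uint32_t field = a < b ? 0x8u : a > b ? 0x4u : 0x2u;
    unsigned shift = 28 - 4 * n;
    return (cr & ~(0xFu << shift)) | (field << shift);
}

/* isel rT,rA,rB,BC: rT = CR[BC] ? rA : rB. The rule that rA = r0
   reads as zero is not modelled. */
static uint32_t isel(uint32_t cr, uint32_t ra, uint32_t rb, unsigned bc)
{
    return cr_bit(cr, bc) ? ra : rb;
}

/* Clang: cmpw 3, 4 (cr0); isel 3, 3, 4, 0 (bit 0 = cr0 LT). */
int32_t min_clang_isel(int32_t a, int32_t b)
{
    uint32_t cr = cmpw(0, 0, a, b);
    return (int32_t)isel(cr, (uint32_t)a, (uint32_t)b, 0);
}

/* GCC -misel: cmpw 7,3,4; isel 3,3,4,28 (bit 28 = cr7 LT). The trailing
   extsw only sign-extends the 64-bit register, so it is omitted here. */
int32_t min_gcc_isel(int32_t a, int32_t b)
{
    uint32_t cr = cmpw(0, 7, a, b);
    return (int32_t)isel(cr, (uint32_t)a, (uint32_t)b, 28);
}
```

Both functions compute `a < b ? a : b` with no branch. They differ only in the CR field, which is why the BC operand is 0 in one listing and 28 in the other.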
--- Comment #10 from David Edelsohn --- isel is not generally a performance win for Power using GCC. It is enabled for LLVM because LLVM has a simplistic basic block scheduler. isel allows LLVM to form larger basic blocks, which gives the scheduler more freedom of movement.
Martin Sebor changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc-*-*                 |powerpc*-*-*
             Status|NEW                         |WAITING
   Last reconfirmed|2006-10-22 23:16:26         |2016-1-27
                 CC|                            |msebor at gcc dot gnu.org
      Known to fail|                            |4.9.3, 5.3.0, 6.0

--- Comment #7 from Martin Sebor ---
Current trunk, as well as all supported GCC versions before it, still emits the same code (see below). XLC 12 on gcc111.fsffrance.org also emits a branch (see below). Ditto for Clang.

David, in light of this and of comments #4 and #5, do you still believe that GCC should change as you suggested in the Description?

    .min:                        # 0x (H.10.NO_SYMBOL)
        cmp      0,r3,r4
        bc       BO_IF,CR0_LT,__L10
        oril     r3,r4,0x
        bcr      BO_ALWAYS,CR0_LT
    __L10:                       # 0x0010 (H.10.NO_SYMBOL+0x10)
        bcr      BO_ALWAYS,CR0_LT

    $ cat ~/tmp/t.c && /build/gcc-trunk/gcc/xgcc -B /build/gcc-trunk/gcc -O2 -S -Wall -Wextra -Wpedantic -o/dev/stdout ~/tmp/t.c
    int min(int a, int b)
    {
      if (a < b)
        return a;
      else
        return b;
    }
            .file   "t.c"
            .machine power8
            .abiversion 2
            .section        ".toc","aw"
            .section        ".text"
            .align 2
            .p2align 4,,15
            .globl min
            .type   min, @function
    min:
            cmpw 7,3,4
            ble 7,.L2
            mr 3,4
    .L2:
            extsw 3,3
            blr
            .long 0
            .byte 0,0,0,0,0,0,0,0
            .size   min,.-min
            .ident  "GCC: (GNU) 6.0.0 20160125 (experimental)"
            .section        .note.GNU-stack,"",@progbits
David Edelsohn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |wschmidt at gcc dot gnu.org

--- Comment #8 from David Edelsohn ---
Branchless code generally is better.
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-04-24 14:31 ---
On the mainline, we now produce:

        cmpw cr7,r3,r4
        blelr- cr7
        mr r3,r4
        blr
--- Additional Comments From geoffk at gcc dot gnu dot org 2004-10-27 23:45 --- I'm not sure that subfc/subfe is going to be cheaper than a compare and a branch, even if the branch is mispredicted half the time. Do you have timing results?
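A simple expected-cost model can frame this question. It is an illustrative assumption, not something measured in the thread. Let $c_s$ be the cycle cost of the subfc/subfe sequence, $c_b$ the cost of the compare plus a correctly predicted branch, $P$ the misprediction penalty, and $p$ the misprediction rate (all hypothetical symbols):

```latex
\begin{aligned}
E[C_{\mathrm{branch}}] &= c_b + p\,P, \\
c_s < E[C_{\mathrm{branch}}] &\iff p > \frac{c_s - c_b}{P}, \\
p = \tfrac{1}{2}: \quad c_s &< c_b + \tfrac{P}{2}.
\end{aligned}
```

So even with the branch mispredicted half the time, the straight-line sequence wins only if its extra cost over a well-predicted compare-and-branch is less than half the misprediction penalty. Measured timings would have to supply the actual values.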
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-27 23:52 --- I should note that when I was doing SPEC work, using subfc/subfe did not help SPEC at all (I tried changing both the source and rs6000.md).
--- Additional Comments From dje at gcc dot gnu dot org 2004-10-26 20:06 --- XLC chooses between the straight-line code sequence and compare-and-branch based on a cost model. This should not be a uniform change in behavior for PowerPC.
--- Additional Comments From dje at gcc dot gnu dot org 2004-10-26 21:25 --- Also, do not enable when optimizing for size.
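Here is a hypothetical sketch of the policy in the two comments above. The straight-line sequence is emitted only when its cost beats the expected cost of compare-and-branch, and never when optimizing for size. The struct, its field names and any numbers fed to it are assumptions. This thread describes neither XLC's nor GCC's actual cost model.

```c
#include <stdbool.h>

/* Hypothetical per-target cost parameters, in cycles. The thread does not
   give the values any compiler actually uses. */
struct minmax_costs {
    double branchless;         /* cost of the straight-line sequence */
    double cmp_branch;         /* compare + correctly predicted branch */
    double mispredict_penalty; /* extra cycles on a misprediction */
};

/* Returns true when the straight-line sequence should be emitted:
   never at -Os, otherwise only when it beats the expected branch cost. */
bool prefer_branchless(const struct minmax_costs *c, double p_mispredict,
                       bool optimize_size)
{
    if (optimize_size)
        return false;
    double expected_branch = c->cmp_branch + p_mispredict * c->mispredict_penalty;
    return c->branchless < expected_branch;
}
```

With made-up figures of 4, 2 and 12 cycles and a 50% misprediction rate (the case geoffk raises above), the expected branch cost is 2 + 0.5 * 12 = 8 cycles, so the straight-line sequence would be chosen. With perfect prediction, it would not.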
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-26 04:25 ---
Confirmed.

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|-00-00 00:00:00             |2004-10-26 04:25:17
               date|                            |