[Bug target/18154] Inefficient max/min code for PowerPC

2017-11-21 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154

--- Comment #13 from Segher Boessenkool  ---
Trunk now generates isel for power9.

[Bug target/18154] Inefficient max/min code for PowerPC

2016-08-23 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154

--- Comment #12 from Segher Boessenkool  ---
(Never mind those last "addc" insns; they can just as well be plain
"add".  I pasted the wrong ones.)

[Bug target/18154] Inefficient max/min code for PowerPC

2016-08-23 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154

--- Comment #11 from Segher Boessenkool  ---
The unsigned version can be done in four insns:

1:  subfc   r5,r3,r4
    subfe   r6,r6,r6
    and     r7,r6,r5
    addc    r8,r7,r3

(superopt finds 16 versions, all similar).

The signed version can be done in six:

33: subfc   r5,r3,r4
    srwi    r6,r4,31
    srwi    r7,r3,31
    subfe   r8,r6,r7
    and     r9,r8,r5
    addc    r10,r9,r3

(superopt finds 240 versions, many with one or two xoris ,,0x8000
which doesn't work for 64 bit, and many with srawi as well, which
can be more expensive than srwi; all remaining are similar).

For 32-bit min/max on a 64-bit cpu, we can use only "cheap", non-carry
instructions:

  extsw r3,r3
  extsw r4,r4
  subf r5,r4,r3
  srdi r6,r5,32
  and r7,r6,r5
  add r8,r7,r4

(and unsigned exts for unsigned).  Those extends often disappear into
surrounding insns, or because the ABI requires the regs to be extended
already, etc.
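
To check the arithmetic, the three sequences can be modelled in C.  This is
only a sketch under stated assumptions: the function names min_carry4,
min_carry6 and min_ext64 are made up for illustration, the carry bit CA is
modelled as an ordinary variable, and the first two functions assume 32-bit
registers.  In this model the four-instruction carry sequence selects the
unsigned minimum and the six-instruction one the signed minimum.

```c
#include <stdint.h>

/* subfc / subfe / and / add, modelled on 32-bit registers. */
static uint32_t min_carry4(uint32_t r3, uint32_t r4)
{
    uint32_t r5 = r4 - r3;          /* subfc r5,r3,r4: r5 = r4 - r3        */
    uint32_t ca = (r4 >= r3);       /* CA = 1 when there is no borrow      */
    uint32_t r6 = ca - 1u;          /* subfe r6,r6,r6: ~r6 + r6 + CA       */
    uint32_t r7 = r6 & r5;          /* keep the difference only if r4 < r3 */
    return r7 + r3;                 /* add (the carry-out is not needed)   */
}

/* Same idea, but the sign bits are folded into the subfe. */
static uint32_t min_carry6(uint32_t r3, uint32_t r4)
{
    uint32_t r5 = r4 - r3;          /* subfc r5,r3,r4                      */
    uint32_t ca = (r4 >= r3);
    uint32_t r6 = r4 >> 31;         /* srwi r6,r4,31: sign bit of r4       */
    uint32_t r7 = r3 >> 31;         /* srwi r7,r3,31: sign bit of r3       */
    uint32_t r8 = ~r6 + r7 + ca;    /* subfe r8,r6,r7: all ones or zero    */
    uint32_t r9 = r8 & r5;
    return r9 + r3;
}

/* 32-bit signed min on 64-bit registers, without using the carry bit. */
static int32_t min_ext64(int32_t a, int32_t b)
{
    uint64_t r3 = (uint64_t)(int64_t)a;   /* extsw r3,r3                   */
    uint64_t r4 = (uint64_t)(int64_t)b;   /* extsw r4,r4                   */
    uint64_t r5 = r3 - r4;                /* subf r5,r4,r3: r5 = r3 - r4   */
    uint64_t r6 = r5 >> 32;               /* srdi r6,r5,32: low word is    */
                                          /* all ones when r3 < r4         */
    uint64_t r7 = r6 & r5;
    uint64_t r8 = r7 + r4;                /* only the low 32 bits matter   */
    return (int32_t)(uint32_t)r8;
}
```

For example, min_carry4(0xFFFFFFFF, 1) gives 1, while min_carry6 on the same
bit patterns gives 0xFFFFFFFF, that is, -1 when read as a signed value.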

[Bug target/18154] Inefficient max/min code for PowerPC

2016-01-30 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154

--- Comment #9 from Martin Sebor  ---
I noticed while looking at an unrelated bug that when targeting power7 or
power8 Clang makes use of the isel instruction and emits the following:

min:    # @min
cmpw 3, 4
isel 3, 3, 4, 0
blr

GCC also has the capability of using isel, but it's disabled by default even
when targeting power8 and must be explicitly enabled via -misel.  With it, GCC
emits the following branchless code:

min:
cmpw 7,3,4
isel 3,3,4,28
extsw 3,3
blr

Since the instruction exists for just this purpose (eliminating branches),
would it make sense to enable it by default?  (I suppose one concern with it
might be that it's not being very extensively tested.)
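
As a rough illustration of what the two listings compute, here is a small C
model of cmpw and isel.  It is a sketch, not a description of either
compiler: the cr_t type and the helper names are hypothetical, the SO bit is
not modelled, and the C conditional expression only mirrors the semantics of
isel (it says nothing about whether the compiled C is branchless).  The last
operand of isel is a condition-register bit number, so 0 is the LT bit of
cr0 and 28 (4*7 + 0) is the LT bit of cr7.

```c
#include <stdint.h>

/* Hypothetical model of the 32 condition-register bits. */
typedef struct { uint8_t bit[32]; } cr_t;

/* cmpw crf,ra,rb: signed compare, sets LT/GT/EQ in field crf. */
static void cmpw(cr_t *cr, unsigned crf, int32_t ra, int32_t rb)
{
    cr->bit[4 * crf + 0] = (ra < rb);    /* LT */
    cr->bit[4 * crf + 1] = (ra > rb);    /* GT */
    cr->bit[4 * crf + 2] = (ra == rb);   /* EQ */
    cr->bit[4 * crf + 3] = 0;            /* SO, not modelled here */
}

/* isel rt,ra,rb,bc: rt = CR bit bc set ? ra : rb. */
static int32_t isel(const cr_t *cr, int32_t ra, int32_t rb, unsigned bc)
{
    return cr->bit[bc] ? ra : rb;
}

/* cmpw 3,4 ; isel 3,3,4,0 (the Clang listing, using cr0). */
static int32_t min_cr0(int32_t r3, int32_t r4)
{
    cr_t cr = {{0}};
    cmpw(&cr, 0, r3, r4);
    return isel(&cr, r3, r4, 0);
}

/* cmpw 7,3,4 ; isel 3,3,4,28 (the GCC -misel listing, using cr7). */
static int32_t min_cr7(int32_t r3, int32_t r4)
{
    cr_t cr = {{0}};
    cmpw(&cr, 7, r3, r4);
    return isel(&cr, r3, r4, 28);
}
```

Both helpers return the smaller argument; they differ only in which
condition-register field carries the compare result.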

[Bug target/18154] Inefficient max/min code for PowerPC

2016-01-30 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154

--- Comment #10 from David Edelsohn  ---
isel is not generally a performance win for Power using GCC.  It is enabled for
LLVM because LLVM has a simplistic basic block scheduler and isel allows LLVM
to form larger basic blocks to provide the scheduler with more freedom of
movement.

[Bug target/18154] Inefficient max/min code for PowerPC

2016-01-27 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154

Martin Sebor  changed:

           What    |Removed             |Added
----------------------------------------------------------------------------
             Target|powerpc-*-*         |powerpc*-*-*
             Status|NEW                 |WAITING
   Last reconfirmed|2006-10-22 23:16:26 |2016-1-27
                 CC|                    |msebor at gcc dot gnu.org
      Known to fail|                    |4.9.3, 5.3.0, 6.0

--- Comment #7 from Martin Sebor  ---
Current trunk as well as all supported GCC versions before it still emits the
same code (see below).  XLC 12 on gcc111.fsffrance.org also emits a branch (see
below).  Ditto for Clang.

David, in light of this and in light of comments #4 and #5, do you still
believe that GCC should change as you suggested in the Description?

.min:   # 0x (H.10.NO_SYMBOL)
        cmp     0,r3,r4
        bc      BO_IF,CR0_LT,__L10
        oril    r3,r4,0x
        bcr     BO_ALWAYS,CR0_LT
__L10:  # 0x0010 (H.10.NO_SYMBOL+0x10)
        bcr     BO_ALWAYS,CR0_LT


$ cat ~/tmp/t.c && /build/gcc-trunk/gcc/xgcc -B /build/gcc-trunk/gcc -O2 -S
-Wall -Wextra -Wpedantic -o/dev/stdout ~/tmp/t.c
int min(int a, int b) {
  if (a < b)
    return a;
  else
    return b;
}
        .file   "t.c"
        .machine power8
        .abiversion 2
        .section        ".toc","aw"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl min
        .type   min, @function
min:
        cmpw 7,3,4
        ble 7,.L2
        mr 3,4
.L2:
        extsw 3,3
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .size   min,.-min
        .ident  "GCC: (GNU) 6.0.0 20160125 (experimental)"
        .section        .note.GNU-stack,"",@progbits

[Bug target/18154] Inefficient max/min code for PowerPC

2016-01-27 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154

David Edelsohn  changed:

           What    |Removed             |Added
----------------------------------------------------------------------------
             Status|WAITING             |NEW
                 CC|                    |wschmidt at gcc dot gnu.org

--- Comment #8 from David Edelsohn  ---
Branchless code generally is better.

[Bug target/18154] Inefficient max/min code for PowerPC

2005-04-24 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2005-04-24 14:31 ---
On the mainline, we now produce:
cmpw cr7,r3,r4
blelr- cr7
mr r3,r4
blr





[Bug target/18154] Inefficient max/min code for PowerPC

2004-10-27 Thread geoffk at gcc dot gnu dot org

--- Additional Comments From geoffk at gcc dot gnu dot org  2004-10-27 23:45 ---
I'm not sure that subfc/subfe is going to be cheaper than a compare and a
branch, even if the branch is mispredicted half the time.  Do you have timing
results?
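
One way to gather such timings is a small harness along the following lines.
This is a hedged sketch, not a measurement from this report: the names
min_branchy, min_straight and time_min are hypothetical, the inputs come from
a simple linear congruential generator so that the comparison is hard to
predict, and the compiler is free to turn either function into the other
form, so the generated assembly should be inspected before drawing
conclusions.

```c
#include <stdint.h>
#include <time.h>

typedef int32_t (*min_fn)(int32_t, int32_t);

/* Compare-and-branch form, as in the test case for this bug. */
static int32_t min_branchy(int32_t a, int32_t b)
{
    if (a < b)
        return a;
    return b;
}

/* Straight-line form: add the difference back only when it is negative. */
static int32_t min_straight(int32_t a, int32_t b)
{
    int64_t d = (int64_t)a - (int64_t)b;
    int64_t mask = -(int64_t)(d < 0);    /* all ones when a < b, else zero */
    return (int32_t)(b + (d & mask));
}

/* Call fn on n pseudo-random pairs.  Stores the elapsed CPU time in
   *seconds and returns a checksum so the calls cannot be dropped. */
static uint32_t time_min(min_fn fn, uint32_t n, double *seconds)
{
    uint32_t state = 12345u, sum = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < n; i++) {
        state = state * 1664525u + 1013904223u;   /* LCG step */
        int32_t a = (int32_t)(state >> 1);
        state = state * 1664525u + 1013904223u;
        int32_t b = (int32_t)(state >> 1);
        sum += (uint32_t)fn(a, b);
    }
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

Calling time_min once with each function and the same n gives two times to
compare; the two checksums should be equal, which doubles as a sanity check
that both forms compute the same minimum.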



[Bug target/18154] Inefficient max/min code for PowerPC

2004-10-27 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-27 23:52 ---
I should note that when I was doing SPEC work, using subfc/subfe did not help
SPEC at all (I tried changing both the source and rs6000.md).



[Bug target/18154] Inefficient max/min code for PowerPC

2004-10-26 Thread dje at gcc dot gnu dot org

--- Additional Comments From dje at gcc dot gnu dot org  2004-10-26 20:06 ---
XLC chooses the straight-line code sequence versus compare and branch based on 
a cost model.  This should not be a uniform change in behavior for PowerPC.



[Bug target/18154] Inefficient max/min code for PowerPC

2004-10-26 Thread dje at gcc dot gnu dot org

--- Additional Comments From dje at gcc dot gnu dot org  2004-10-26 21:25 ---
Also, do not enable when optimizing for size.



[Bug target/18154] Inefficient max/min code for PowerPC

2004-10-25 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-26 04:25 ---
Confirmed.

-- 
           What    |Removed             |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED         |NEW
     Ever Confirmed|                    |1
   Last reconfirmed|-00-00 00:00:00     |2004-10-26 04:25:17
               date|                    |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18154