Re: [RFC, 4.9 patch] Reload alternative weighting vs. wrong class regs

2013-02-07 Thread Steven Bosscher
On Thu, Feb 7, 2013 at 4:09 AM, Alan Modra wrote:
 After fixing PR54009 (again), I thought I'd take a look at why reload
 is generating the following correct but poor code

  stw 10,8(1)
  stw 11 12(1)
  ...
  lfd 0,8(1)
  stfd 0,x+32764@l(9)

 rather than

  addi 9,x+32764@l(9)
  ...
  stw 10,0(9)
  stw 11 4(9)

FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c):

r:  r:
stwu 1,-160(1)  stwu 1,-160(1)
lis 9,x+32764@halis 9,x+32764@ha
la 9,x+32764@l(9) | la 8,x+32764@l(9)
lwz 10,0(9)   
lwz 11,4(9)   
lis 9,y@ha  lis 9,y@ha
   lwz 10,0(8)
   lwz 11,4(8)
stfd ...
...
lfd ...
...
stw 10,y@l(9)   stw 10,y@l(9)
stw 11,y+4@l(9) stw 11,y+4@l(9)
addi 1,1,160addi 1,1,160
blr blr

w:  w:
stwu 1,-160(1)  stwu 1,-160(1)
lis 9,y@ha  lis 9,y@ha
la 10,y@l(9)la 10,y@l(9)
   lis 9,x+32764@ha
lwz 11,4(10)lwz 11,4(10)
lwz 10,0(10)lwz 10,0(10)
lis 9,x+32764@ha  | la 8,x+32764@l(9)
stfd ...
...
lfd ...
...
lfd 31,152(1)   lfd 31,152(1)
lfd 0,8(1)| stw 10,0(8)
stfd 0,x+32764@l(9)   | stw 11,4(8)
addi 1,1,160addi 1,1,160
blr blr

I don't speak POWER but perhaps you can make sense of it :-)

Ciao!
Steven


Re: [RFC, 4.9 patch] Reload alternative weighting vs. wrong class regs

2013-02-07 Thread Alan Modra
On Thu, Feb 07, 2013 at 11:01:13PM +0100, Steven Bosscher wrote:
 FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c):
[snip]
 
 I don't speak POWER but perhaps you can make sense of it :-)

No real difference in r function, and interesting that LRA does a
better job of w, about the same as trunk with my reload patch.

-- 
Alan Modra
Australia Development Lab, IBM


[RFC, 4.9 patch] Reload alternative weighting vs. wrong class regs

2013-02-06 Thread Alan Modra
After fixing PR54009 (again), I thought I'd take a look at why reload
is generating the following correct but poor code

 stw 10,8(1)
 stw 11 12(1)
 ...
 lfd 0,8(1)
 stfd 0,x+32764@l(9)

rather than

 addi 9,x+32764@l(9)
 ...
 stw 10,0(9)
 stw 11 4(9)

This code sequence is from

(set (mem/c:DF (lo_sum:SI (reg/f:SI 9)
  (const:SI (plus:SI (symbol_ref:SI (x))
 (const_int 32764)
 (reg:DF 10) ...gcc.target/powerpc/pr54009.c:42 363 {*movdf_hardfloat32})

In tracing through reload, I see a score of 8 for the m-d
alternative, and 9 for Y-r.  In both cases we have one loser
operand for a score of 6 (d in the first case, Y in the second),
plus a score of 2 from

  /* We prefer to reload pseudos over reloading other things,
 since such reloads may be able to be eliminated later.
 If we are reloading a SCRATCH, we won't be generating any
 insns, just using a register, so it is also preferred.
 So bump REJECT in other cases.  Don't do this in the
 case where we are forcing a constant into memory and
 it will then win since we don't want to have a different
 alternative match then.  */
  if (! (REG_P (operand)
  REGNO (operand) = FIRST_PSEUDO_REGISTER)
   GET_CODE (operand) != SCRATCH
   ! (const_to_mem  constmemok))
reject += 2;

The Y-r alternative gets one extra from

  /* Input reloads can be inherited more often than output
 reloads can be removed, so penalize output reloads.  */
  if (operand_type[i] != RELOAD_FOR_INPUT
   GET_CODE (operand) != SCRATCH)
reject++;

The problem of course is that the input reload is quite expensive,
involving a copy to memory.  So, how about teaching reload about this
as follows?  I picked 6 for the reject value to make it equivalent to
a '?' in the constraint, but that may be too large.  Any value of 2 or
greater works for the testcase.

Bootstrapped and regression tested powerpc64-linux, but not yet spec
tested.

2013-02-07  Alan Modra  amo...@gmail.com

* reload.c (find_reloads): Disparage reg alternatives needing
secondary memory to reload.

Index: gcc/reload.c
===
--- gcc/reload.c(revision 195707)
+++ gcc/reload.c(working copy)
@@ -3633,11 +3633,21 @@
  == NO_REGS)
reject = 600;
 
- if (operand_type[i] == RELOAD_FOR_OUTPUT
-  (targetm.preferred_output_reload_class (operand,
-   this_alternative[i])
- == NO_REGS))
+ else if (operand_type[i] == RELOAD_FOR_OUTPUT
+   (targetm.preferred_output_reload_class
+  (operand, this_alternative[i])
+  == NO_REGS))
reject = 600;
+
+#ifdef SECONDARY_MEMORY_NEEDED
+ else if (REG_P (operand)
+   REGNO (operand)  FIRST_PSEUDO_REGISTER
+   (SECONDARY_MEMORY_NEEDED
+  ((enum reg_class) this_alternative[i],
+   REGNO_REG_CLASS (REGNO (operand)),
+   operand_mode[i])))
+   reject += 6;
+#endif
}
 
  /* We prefer to reload pseudos over reloading other things,

-- 
Alan Modra
Australia Development Lab, IBM