Re: [RFC, 4.9 patch] Reload alternative weighting vs. wrong class regs
On Thu, Feb 7, 2013 at 4:09 AM, Alan Modra wrote:
> After fixing PR54009 (again), I thought I'd take a look at why reload is
> generating the following correct but poor code:
>
>         stw 10,8(1)
>         stw 11,12(1)
>         ...
>         lfd 0,8(1)
>         stfd 0,x+32764@l(9)
>
> rather than
>
>         addi 9,x+32764@l(9)
>         ...
>         stw 10,0(9)
>         stw 11,4(9)

FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c):

r:                                      r:
        stwu 1,-160(1)                          stwu 1,-160(1)
        lis 9,x+32764@ha                        lis 9,x+32764@ha
        la 9,x+32764@l(9)                     | la 8,x+32764@l(9)
        lwz 10,0(9)                           <
        lwz 11,4(9)                           <
        lis 9,y@ha                              lis 9,y@ha
                                              > lwz 10,0(8)
                                              > lwz 11,4(8)
        stfd ...                                ...
        lfd ...                                 ...
        stw 10,y@l(9)                           stw 10,y@l(9)
        stw 11,y+4@l(9)                         stw 11,y+4@l(9)
        addi 1,1,160                            addi 1,1,160
        blr                                     blr

w:                                      w:
        stwu 1,-160(1)                          stwu 1,-160(1)
        lis 9,y@ha                              lis 9,y@ha
        la 10,y@l(9)                            la 10,y@l(9)
                                              > lis 9,x+32764@ha
        lwz 11,4(10)                            lwz 11,4(10)
        lwz 10,0(10)                            lwz 10,0(10)
        lis 9,x+32764@ha                      | la 8,x+32764@l(9)
        stfd ...                                ...
        lfd ...                                 ...
        lfd 31,152(1)                           lfd 31,152(1)
        lfd 0,8(1)                            | stw 10,0(8)
        stfd 0,x+32764@l(9)                   | stw 11,4(8)
        addi 1,1,160                            addi 1,1,160
        blr                                     blr

I don't speak POWER but perhaps you can make sense of it :-)

Ciao!
Steven
Re: [RFC, 4.9 patch] Reload alternative weighting vs. wrong class regs
On Thu, Feb 07, 2013 at 11:01:13PM +0100, Steven Bosscher wrote:
> FWIW, left trunk vs. LRA right (with PR54009 patch on rs6000.c):
[snip]
> I don't speak POWER but perhaps you can make sense of it :-)

No real difference in the r function. Interestingly, LRA does a better job on w, about the same as trunk with my reload patch.

-- 
Alan Modra
Australia Development Lab, IBM
[RFC, 4.9 patch] Reload alternative weighting vs. wrong class regs
After fixing PR54009 (again), I thought I'd take a look at why reload is generating the following correct but poor code:

        stw 10,8(1)
        stw 11,12(1)
        ...
        lfd 0,8(1)
        stfd 0,x+32764@l(9)

rather than

        addi 9,x+32764@l(9)
        ...
        stw 10,0(9)
        stw 11,4(9)

This code sequence comes from the insn

        (set (mem/c:DF (lo_sum:SI (reg/f:SI 9)
                    (const:SI (plus:SI (symbol_ref:SI ("x"))
                            (const_int 32764)))))
            (reg:DF 10)) ...gcc.target/powerpc/pr54009.c:42 363 {*movdf_hardfloat32})

In tracing through reload, I see a score of 8 for the m-d alternative and 9 for Y-r (the lower score wins). In both cases there is one losing operand, which scores 6 (d in the first case, Y in the second), plus a score of 2 from

          /* We prefer to reload pseudos over reloading other things,
             since such reloads may be able to be eliminated later.
             If we are reloading a SCRATCH, we won't be generating any
             insns, just using a register, so it is also preferred.
             So bump REJECT in other cases.  Don't do this in the
             case where we are forcing a constant into memory and
             it will then win since we don't want to have a different
             alternative match then.  */
          if (! (REG_P (operand)
                 && REGNO (operand) >= FIRST_PSEUDO_REGISTER)
              && GET_CODE (operand) != SCRATCH
              && ! (const_to_mem && constmemok))
            reject += 2;

The Y-r alternative gets one more from

          /* Input reloads can be inherited more often than output
             reloads can be removed, so penalize output reloads.  */
          if (operand_type[i] != RELOAD_FOR_INPUT
              && GET_CODE (operand) != SCRATCH)
            reject++;

The problem, of course, is that the input reload needed by m-d is quite expensive: getting (reg:DF 10) out of the GPRs and into an FPR involves a copy through memory, which is exactly the stw/stw/lfd sequence above. So, how about teaching reload about this as follows? I picked 6 for the reject value to make it equivalent to a '?' in the constraint, but that may be too large. Any value of 2 or greater works for the testcase.

Bootstrapped and regression tested on powerpc64-linux, but not yet SPEC tested.

2013-02-07  Alan Modra  <amo...@gmail.com>

        * reload.c (find_reloads): Disparage reg alternatives needing
        secondary memory to reload.
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	(revision 195707)
+++ gcc/reload.c	(working copy)
@@ -3633,11 +3633,21 @@
                       == NO_REGS)
                     reject = 600;
 
-                  if (operand_type[i] == RELOAD_FOR_OUTPUT
-                      && (targetm.preferred_output_reload_class (operand,
-                                                             this_alternative[i])
-                          == NO_REGS))
+                  else if (operand_type[i] == RELOAD_FOR_OUTPUT
+                           && (targetm.preferred_output_reload_class
+                               (operand, this_alternative[i])
+                               == NO_REGS))
                     reject = 600;
+
+#ifdef SECONDARY_MEMORY_NEEDED
+                  else if (REG_P (operand)
+                           && REGNO (operand) < FIRST_PSEUDO_REGISTER
+                           && (SECONDARY_MEMORY_NEEDED
+                               ((enum reg_class) this_alternative[i],
+                                REGNO_REG_CLASS (REGNO (operand)),
+                                operand_mode[i])))
+                    reject += 6;
+#endif
                 }
 
               /* We prefer to reload pseudos over reloading other things,

-- 
Alan Modra
Australia Development Lab, IBM