On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov wrote:
> The following patch solves most of LRA scalability problems.
>
> It switches on simpler algorithms in LRA. The first it switches off
> trying to reassign hard registers to spilled pseudos (they usually for such
> huge functions have long live ranges -- so the possibility to assign
> them something very small but trying to reassign them a hard registers
> is to expensive), inheritance, live range splitting, and memory
> coalescing optimizations. It seems that rematerialization is too
> important for performance -- so I don't switch it off. As splitting is
> also necessary for generation of caller saves code, I switch off
> caller-saves in IRA and force IRA to do non-regional RA.

Hi Vlad,

I've revisited this patch now that parts of the scalability issues have
been resolved. Something funny happened for our soon-to-be-legendary
PR54146 test case...

lra-branch yesterday (i.e. without the elimination and constraints
speedup patches):

 integrated RA           : 145.26 (18%)
 LRA non-specific        :  46.94 ( 6%)
 LRA virtuals elimination:  51.56 ( 6%)
 LRA reload inheritance  :   0.03 ( 0%)
 LRA create live ranges  :  46.67 ( 6%)
 LRA hard reg assignment :   0.55 ( 0%)

lra-branch today + ira-speedup-1.diff:

 integrated RA           : 111.19 (15%) usr
 LRA non-specific        :  21.16 ( 3%) usr
 LRA virtuals elimination:   0.65 ( 0%) usr
 LRA reload inheritance  :   0.01 ( 0%) usr
 LRA create live ranges  :  56.33 ( 8%) usr
 LRA hard reg assignment :   0.58 ( 0%) usr

lra-branch today + ira-speedup-1.diff + rm-lra_simple_p.diff:

 integrated RA           :  89.43 (11%) usr
 LRA non-specific        :  21.43 ( 3%) usr
 LRA virtuals elimination:   0.61 ( 0%) usr
 LRA reload inheritance  :   6.10 ( 1%) usr
 LRA create live ranges  :  88.64 (11%) usr
 LRA hard reg assignment :  45.17 ( 6%) usr
 LRA coalesce pseudo regs:   2.24 ( 0%) usr

Note how IRA is *faster* without the lra_simple_p patch. The cost comes
back in "LRA hard reg assignment" and "LRA create live ranges" where I
assume the latter is a consequence of running lra_create_live_ranges a
few more times to work for the hard-reg assignment phase.

Do you have an idea why IRA might be faster without the lra_simple_p
thing? Maybe there's a way to get the best of both...

Ciao!
Steven