Vladimir Makarov <vmaka...@redhat.com> writes: > On 04/10/2012 09:35 AM, Richard Sandiford wrote: >> Hi Vlad, >> >> Back in Decemember, when we were still very much in stage 3, I sent >> an RFC about an alternative implementation of -fsched-pressure. >> Just wanted to send a reminder now that we're in the proper stage: >> >> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html >> >> Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got >> reasonable results. (I mentioned bad Power 7 results in that message, >> because of the way the VSX_REGS class is handled. Ulrich's results >> are without -mvsx though.) >> >> The condition I orignally set myself was that this patch should only >> go in if it becomes the default on at least one architecture, >> specifically ARM. Ulrich tells me that Linaro have now made it >> the default for ARM in their GCC 4.7 release, so hopefully Ramana >> would be OK with doing the same in upstream 4.8. >> >> I realise the whole thing is probably more complicated and ad-hoc >> than you'd like. Saying it can't go in is a perfectly acceptable >> answer IMO. >> > I have a mixed feeling with the patch. I've tried it on SPEC2000 on > x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5% > (SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in > comparison with the current algorithm.
Yeah. That's expected really, since one of the points of the patch is to allow more spilling than the current algorithm does. One of the main problems I was seeing with the current algorithm was that it was too conservative, and prevented spilling even when it would be benificial. This included spilling outside loops. E.g. if you have 14 available GPRs, as on ARM, and have 10 pseudos live across a loop but not used within it, the current algorithm uses 10 registers as the starting pressure when scheduling the loop. So the current algorithm tries hard to avoid spilling those 10 registers, even if that restricts the amount of reordering within the loop. The new algorithm was supposed to allow such registers to be spilled, but those extra spills would increase code size. > It is slower too. Although the difference is quite insignificant on > Corei7, compiler speed slowdown achieves 0.4% on SPECFP2000 on arm. Hmm, that's not good. > The algorithm also generates slower code on x86 (1.5% on SPECINT and > 5% on SPECFP200) and practically the same average code on x86-64 and > ARM (I've tried only SPECINT on ARM). Yeah, that's underwhelming too. It looks like there's a danger that this would become an A8- and A9-specific pass, so we would really need better ARM results than that. > On the other hand, I don't think that 1st insn scheduling will be ever > used for x86. And although the SPECFP2000 rate is the same on x86-64 I > saw that some SPECFP2000 tests benefit from your algorithm on x86-64 > (one amazing difference is 70% improvement on swim on x86-64 although it > might be because of different reasons like alignment or cache > behaviour). So I think the algorithm might work better on processors > with more registers. Notwithstanding that this is a goemean, I assume there were some bad results to cancel out the gain? > As for the patch itself, I think you should document the option in > doc/invoke.texi. It is missed. I forgot to say, but that was deliberate. I see this as a developer option rather than a user option. The important thing was to allow backends to default to the new algorithm if they want to. Providing command-line control gives developers an easier way of seeing which default makes sense, but I don't think the option should be advertised to users. > Another minor mistake I found is one line garbage (I guess from > -fira-algorithm) in description of -fsched-pressure-algorithm in > common.opt. Oops, thanks :-) Anyway, given those results and your mixed feelings, I think it would be best to drop the patch. It's a lot of code to carry around, and its ad-hoc nature would make it hard to change in future. There must be better ways of achieving the same thing. Richard