Re: RFC reminder: an alternative -fsched-pressure implementation

Richard Sandiford Tue, 17 Apr 2012 01:30:33 -0700

Vladimir Makarov <vmaka...@redhat.com> writes:
> On 04/10/2012 09:35 AM, Richard Sandiford wrote:
>> Hi Vlad,
>>
>> Back in Decemember, when we were still very much in stage 3, I sent
>> an RFC about an alternative implementation of -fsched-pressure.
>> Just wanted to send a reminder now that we're in the proper stage:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html
>>
>> Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got
>> reasonable results.  (I mentioned bad Power 7 results in that message,
>> because of the way the VSX_REGS class is handled.  Ulrich's results
>> are without -mvsx though.)
>>
>> The condition I orignally set myself was that this patch should only
>> go in if it becomes the default on at least one architecture,
>> specifically ARM.  Ulrich tells me that Linaro have now made it
>> the default for ARM in their GCC 4.7 release, so hopefully Ramana
>> would be OK with doing the same in upstream 4.8.
>>
>> I realise the whole thing is probably more complicated and ad-hoc
>> than you'd like.  Saying it can't go in is a perfectly acceptable
>> answer IMO.
>>
> I have a mixed feeling with the patch.  I've tried it on SPEC2000 on 
> x86/x86-64 and ARM.  Model algorithm generates bigger code up to 3.5% 
> (SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in 
> comparison with the current algorithm.


Yeah.  That's expected really, since one of the points of the patch is
to allow more spilling than the current algorithm does.

One of the main problems I was seeing with the current algorithm was
that it was too conservative, and prevented spilling even when it would
be benificial.  This included spilling outside loops.  E.g. if you have
14 available GPRs, as on ARM, and have 10 pseudos live across a loop
but not used within it, the current algorithm uses 10 registers as the
starting pressure when scheduling the loop.  So the current algorithm
tries hard to avoid spilling those 10 registers, even if that restricts
the amount of reordering within the loop.  The new algorithm was supposed
to allow such registers to be spilled, but those extra spills would
increase code size.

> It is slower too.  Although the difference is quite insignificant on
> Corei7, compiler speed slowdown achieves 0.4% on SPECFP2000 on arm.

Hmm, that's not good.

> The algorithm also generates slower code on x86 (1.5% on SPECINT and
> 5% on SPECFP200) and practically the same average code on x86-64 and
> ARM (I've tried only SPECINT on ARM).

Yeah, that's underwhelming too.  It looks like there's a danger that
this would become an A8- and A9-specific pass, so we would really need
better ARM results than that.

> On the other hand, I don't think that 1st insn scheduling will be ever 
> used for x86.  And although the SPECFP2000 rate is the same on x86-64 I 
> saw that some SPECFP2000 tests benefit from your algorithm on x86-64 
> (one amazing difference is 70% improvement on swim on x86-64 although it 
> might be because of different reasons like alignment or cache 
> behaviour).  So I think the algorithm might work better on processors 
> with more registers.

Notwithstanding that this is a goemean, I assume there were some bad
results to cancel out the gain?

> As for the patch itself, I think you should document the option in 
> doc/invoke.texi.  It is missed.

I forgot to say, but that was deliberate.  I see this as a developer
option rather than a user option.  The important thing was to allow
backends to default to the new algorithm if they want to.  Providing
command-line control gives developers an easier way of seeing which
default makes sense, but I don't think the option should be advertised
to users.

> Another minor mistake I found is one line garbage (I guess from
> -fira-algorithm) in description of -fsched-pressure-algorithm in
> common.opt.

Oops, thanks :-)

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.

Richard

Re: RFC reminder: an alternative -fsched-pressure implementation

Reply via email to