Re: patch implementing a new pass for register-pressure relief through live range shrinkage

Vladimir Makarov Wed, 06 Nov 2013 08:59:54 -0800

On 11/6/2013, 4:17 AM, Richard Biener wrote:

On Tue, Nov 5, 2013 at 4:35 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:

   I'd like to add a new experimental optimization to the trunk.  This
optimization was discussed at the RA BoF at this summer's GNU Cauldron.


   It is register-pressure relief through live-range shrinkage.  It
is implemented on top of the scheduler and uses the register-pressure
insn scheduling infrastructure.  By rearranging insns we shorten pseudo
live ranges and increase the chance that they are assigned to a hard
register.
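The idea of shortening live ranges by reordering insns can be sketched on a toy model. The C code below is a minimal sketch under stated assumptions, not the GCC pass: one basic block, each insn defining at most one value, only true (def-use) dependences, and a hypothetical greedy priority that issues the ready insn whose execution changes the number of live values the least. The names `struct insn`, `max_pressure` and `shrink_schedule` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (an assumption, not GCC's RTL): one basic block, value i is
   defined by insn i, operands name earlier-defined values, NO_VAL = unused. */
#define MAX_INSNS 32
#define NO_VAL (-1)

struct insn {
  int use[2];     /* values read by this insn, or NO_VAL */
  bool defines;   /* does this insn define value <own index>? */
  bool live_out;  /* is that value live after the block? */
};

/* uses[v] = number of not-yet-executed uses of value v. */
static void count_uses(const struct insn *b, int n, int uses[]) {
  for (int i = 0; i < n; i++) uses[i] = 0;
  for (int i = 0; i < n; i++)
    for (int k = 0; k < 2; k++)
      if (b[i].use[k] != NO_VAL) uses[b[i].use[k]]++;
}

/* An insn is ready once every value it reads has been defined. */
static bool ready(const struct insn *b, int i, const bool done[]) {
  for (int k = 0; k < 2; k++)
    if (b[i].use[k] != NO_VAL && !done[b[i].use[k]]) return false;
  return true;
}

/* Change in the number of live values if insn i runs now: +1 for a defined
   value that is used later or live-out, -1 for each non-live-out operand
   whose last remaining use is this insn. */
static int pressure_delta(const struct insn *b, int i, const int uses[]) {
  int d = (b[i].defines && (uses[i] > 0 || b[i].live_out)) ? 1 : 0;
  for (int k = 0; k < 2; k++) {
    int v = b[i].use[k];
    if (v == NO_VAL || (k == 1 && v == b[i].use[0])) continue; /* count x+x once */
    int here = (b[i].use[0] == v) + (b[i].use[1] == v);
    if (uses[v] == here && !b[v].live_out) d--;
  }
  return d;
}

/* Execute insn i: consume its operand uses and return the pressure change. */
static int apply(const struct insn *b, int i, int uses[]) {
  int d = pressure_delta(b, i, uses);
  for (int k = 0; k < 2; k++)
    if (b[i].use[k] != NO_VAL) uses[b[i].use[k]]--;
  return d;
}

/* Peak number of simultaneously live values for a given insn order. */
int max_pressure(const struct insn *b, int n, const int order[]) {
  int uses[MAX_INSNS], live = 0, max = 0;
  assert(n <= MAX_INSNS);
  count_uses(b, n, uses);
  for (int s = 0; s < n; s++) {
    live += apply(b, order[s], uses);
    if (live > max) max = live;
  }
  return max;
}

/* Hypothetical greedy list scheduler: always issue the ready insn with the
   smallest pressure change (ties keep program order). Returns the peak. */
int shrink_schedule(const struct insn *b, int n, int order[]) {
  int uses[MAX_INSNS], live = 0, max = 0;
  bool done[MAX_INSNS] = {false};
  assert(n <= MAX_INSNS);
  count_uses(b, n, uses);
  for (int s = 0; s < n; s++) {
    int best = -1, best_d = 0;
    for (int i = 0; i < n; i++) {
      if (done[i] || !ready(b, i, done)) continue;
      int d = pressure_delta(b, i, uses);
      if (best < 0 || d < best_d) { best = i; best_d = d; }
    }
    assert(best >= 0);  /* holds for an acyclic def-use graph */
    done[best] = true;
    order[s] = best;
    live += apply(b, best, uses);
    if (live > max) max = live;
  }
  return max;
}

/* Example: 0..3 load a, b, c, d; 4: e = a + b; 5: f = c + d; 6: g = e + f (live-out). */
static const struct insn example[7] = {
  {{NO_VAL, NO_VAL}, true, false}, {{NO_VAL, NO_VAL}, true, false},
  {{NO_VAL, NO_VAL}, true, false}, {{NO_VAL, NO_VAL}, true, false},
  {{0, 1}, true, false}, {{2, 3}, true, false}, {{4, 5}, true, true},
};
```

For the example block, program order keeps all four loaded values live at once (peak pressure 4). The greedy order 0, 1, 4, 2, 3, 5, 6 consumes a and b before loading c and d, which lowers the peak to 3. The real pass works on RTL inside the insn scheduler and uses its own heuristics; this only shows why reordering can shorten live ranges.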

   The code looks pretty simple, but there is a lot of work behind
this patch.  I've tried about ten different versions of this code
(different heuristics for the two currently existing register-pressure
algorithms).

   I think it is *up to target maintainers* to decide whether to use
this optimization for their targets.  I'd recommend using it at
least for x86/x86-64.  I think any OOO processor with a small or
moderate register file which does not use the 1st insn scheduling pass
might benefit from it too.

   On SPEC2000 for x86/x86-64 (I use a Haswell processor, -O3 with
general tuning), the optimization results in smaller code size
on average (for floating-point and integer benchmarks in 32- and
64-bit mode).  The improvement is more visible for SPECFP2000 (although
I have the same improvement on x86-64 SPECInt2000, it might be
attributed mostly to mcf benchmark instability).  It is about 0.5% for
32-bit and 64-bit mode.  This is understandable, as the optimization has
more opportunities to improve the code on longer BBs.  Unlike
other heuristic optimizations, I don't see any significant performance
degradation.  It gives practically the same or better performance (a
few benchmarks improved by 1% or more, up to 3%).

   The single but significant drawback is additional compilation time
(4%-6%), as the 1st insn scheduling pass is quite expensive.  So I'd
recommend that target maintainers switch it on only for -Ofast.

Generally I'd recommend not viewing -Ofast as -O4 but as -O3
plus generally "unsafe" optimizations.  So I'd not enable it for -Ofast
but for -O3, and possibly also for -Os if indeed the main motivation is
also code-size improvement (-Os is a similar beast to -O3: spend
as much time as you can on optimizing size).

Ok. Probably my recommendation is wrong. It is actually up to target maintainers to decide when to use the optimization and whether to enable it by default at all (maybe they will decide to use it only for SPEC reporting).

I guess that at some point we will need something like -O4 for compile-time-greedy algorithms (there is a lot of research in this area; e.g., I am reading an article about optimal register-pressure-sensitive insn scheduling, where the optimization can be time-constrained, for example to 1ms per insn, and still produce better results than the current heuristics). I am sure such algorithms will be coming.
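A time-constrained search of this kind can be illustrated with a small branch-and-bound over dependence-respecting insn orders that minimizes peak register pressure and stops when a budget runs out. This is a hypothetical sketch, not the algorithm from the article mentioned: it assumes a toy model of one basic block where each insn defines at most one value and only def-use dependences exist, and a node-count budget stands in for a wall-clock limit such as 1ms per insn so the result is deterministic. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): one basic block, value i is defined by insn i,
   operands name earlier-defined values, NO_VAL marks an unused operand. */
#define MAX_INSNS 16
#define NO_VAL (-1)

struct insn { int use[2]; bool defines; bool live_out; };

struct search {
  const struct insn *b;
  int n;
  int uses[MAX_INSNS];      /* remaining uses of each value */
  bool done[MAX_INSNS];     /* insn already placed in the partial order */
  int cur[MAX_INSNS], best_order[MAX_INSNS];
  int best;                 /* best peak pressure found so far */
  long budget;              /* search nodes left; stands in for a time limit */
};

static void count_uses(const struct insn *b, int n, int uses[]) {
  for (int i = 0; i < n; i++) uses[i] = 0;
  for (int i = 0; i < n; i++)
    for (int k = 0; k < 2; k++)
      if (b[i].use[k] != NO_VAL) uses[b[i].use[k]]++;
}

static bool ready(const struct insn *b, int i, const bool done[]) {
  for (int k = 0; k < 2; k++)
    if (b[i].use[k] != NO_VAL && !done[b[i].use[k]]) return false;
  return true;
}

/* Change in live values if insn i runs now (defined value born, last uses die). */
static int pressure_delta(const struct insn *b, int i, const int uses[]) {
  int d = (b[i].defines && (uses[i] > 0 || b[i].live_out)) ? 1 : 0;
  for (int k = 0; k < 2; k++) {
    int v = b[i].use[k];
    if (v == NO_VAL || (k == 1 && v == b[i].use[0])) continue;
    int here = (b[i].use[0] == v) + (b[i].use[1] == v);
    if (uses[v] == here && !b[v].live_out) d--;
  }
  return d;
}

static void add_uses(struct search *s, int i, int step) {
  for (int k = 0; k < 2; k++)
    if (s->b[i].use[k] != NO_VAL) s->uses[s->b[i].use[k]] += step;
}

/* Depth-first branch and bound over all dependence-respecting orders. */
static void dfs(struct search *s, int depth, int live, int max) {
  if (max >= s->best || s->budget <= 0) return;  /* cannot improve, or out of budget */
  s->budget--;
  if (depth == s->n) {                           /* complete and strictly better */
    s->best = max;
    for (int i = 0; i < s->n; i++) s->best_order[i] = s->cur[i];
    return;
  }
  for (int i = 0; i < s->n; i++) {
    if (s->done[i] || !ready(s->b, i, s->done)) continue;
    int nl = live + pressure_delta(s->b, i, s->uses);
    s->done[i] = true;
    s->cur[depth] = i;
    add_uses(s, i, -1);
    dfs(s, depth + 1, nl, nl > max ? nl : max);
    add_uses(s, i, +1);                          /* undo before trying the next insn */
    s->done[i] = false;
  }
}

/* Anytime search: starts from program order, returns the best peak found. */
int budgeted_schedule(const struct insn *b, int n, long budget, int order[]) {
  struct search s = {.b = b, .n = n, .budget = budget};
  assert(n <= MAX_INSNS);
  count_uses(b, n, s.uses);
  int live = 0, max = 0;
  for (int i = 0; i < n; i++) {                  /* peak of program order */
    live += pressure_delta(b, i, s.uses);
    add_uses(&s, i, -1);
    if (live > max) max = live;
    s.best_order[i] = i;
  }
  s.best = max;
  count_uses(b, n, s.uses);                      /* reset for the search */
  dfs(&s, 0, 0, 0);
  for (int i = 0; i < n; i++) order[i] = s.best_order[i];
  return s.best;
}

/* Example: 0..3 load a, b, c, d; 4: e = a + b; 5: f = c + d; 6: g = e + f (live-out). */
static const struct insn example[7] = {
  {{NO_VAL, NO_VAL}, true, false}, {{NO_VAL, NO_VAL}, true, false},
  {{NO_VAL, NO_VAL}, true, false}, {{NO_VAL, NO_VAL}, true, false},
  {{0, 1}, true, false}, {{2, 3}, true, false}, {{4, 5}, true, true},
};
```

With a budget of 0 the function returns program order (peak 4 for the example block); with a generous budget it finds an order with peak 3, which is optimal for that block. Because it always holds a valid schedule, cutting the search short degrades gracefully to the best order seen so far.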

Btw, thanks for working on this.  How does it relate to
-fsched-pressure?

It is based on the -fsched-pressure infrastructure but has different heuristics and goals. GCC with the 1st insn scheduling pass, even with -fsched-pressure, still produces worse results on mainstream x86/x86-64 processors than GCC without it. I've also tried -flive-range-shrinkage -fschedule-insns -fsched-pressure, but just -flive-range-shrinkage is better for x86/x86-64.

By the way, LLVM uses insn scheduling for x86/x86-64 before RA, but its goal is only register-pressure decrease (for x86; for x86-64 it is a bit more complicated). So with this optimization we are just catching up with LLVM (which is unusual for us in the optimization area).

   Does it treat all register classes the same?
On x86 it is mostly the few fixed registers required by some integer pipeline
instructions that hurt; x86_64 has enough general and FP registers?

It treats them the same (although the effect differs between classes, as they have different numbers of available regs). It is always some kind of approximation, as we use register pressure classes here, not the classes which will actually be used for RA. It is even more complicated, as IRA actually uses dynamic classes (only the sets of regs which are profitable; e.g., they can differ from the classes defined in the target file because some regs in those classes are caller-saved or some specific hard regs are used for argument passing). This makes graph coloring better for architectures with irregular register files. Overall, as I remember, dynamic classes gave about a 1% improvement even for ppc.
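Treating all classes the same, with only the number of available registers differing, can be pictured as computing one pressure measure per pressure class and comparing each against that class's register count. The sketch below is an assumption-laden illustration, not IRA code: the classes `PC_GENERAL` and `PC_FP`, the half-open live ranges and the register counts are hypothetical, and a real target derives its pressure classes from its own register description.

```c
#include <assert.h>

/* Hypothetical pressure classes; a real target derives its own. */
enum pclass { PC_GENERAL, PC_FP, N_PCLASSES };

/* A pseudo of class cls, live at insn points start <= p < end. */
struct live_range { enum pclass cls; int start, end; };

/* Same measure for every class: peak number of simultaneously live pseudos.
   Fills max_pressure[] and returns the total excess over avail[]. */
int class_excess(const struct live_range *lr, int n_lr, int n_points,
                 const int avail[N_PCLASSES], int max_pressure[N_PCLASSES]) {
  assert(n_lr >= 0 && n_points >= 0);
  for (int c = 0; c < N_PCLASSES; c++) max_pressure[c] = 0;
  for (int p = 0; p < n_points; p++) {
    int cur[N_PCLASSES] = {0};
    for (int i = 0; i < n_lr; i++)
      if (lr[i].start <= p && p < lr[i].end) cur[lr[i].cls]++;
    for (int c = 0; c < N_PCLASSES; c++)
      if (cur[c] > max_pressure[c]) max_pressure[c] = cur[c];
  }
  int excess = 0;
  for (int c = 0; c < N_PCLASSES; c++)
    if (max_pressure[c] > avail[c]) excess += max_pressure[c] - avail[c];
  return excess;
}

/* Three overlapping general pseudos and one FP pseudo over points 0..4. */
static const struct live_range example_ranges[4] = {
  {PC_GENERAL, 0, 4}, {PC_GENERAL, 1, 3}, {PC_GENERAL, 2, 5}, {PC_FP, 0, 5},
};
```

With the example ranges and a hypothetical 2 general and 8 FP registers available, the peak general pressure is 3 (all three general pseudos overlap at point 2), so the excess is 1, while the FP class has plenty of room. Because the available counts are only an estimate of what RA will really use, any such excess figure is an approximation, as noted above.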

I should say that the presence of hard regs in RTL (e.g., for parameter passing) is still a challenge for live-range shrinkage and register-pressure scheduling. It should be addressed somehow.
