Steven Bosscher wrote:


Every time some RTL optimizer is re-re-re-re-re-evaluated, it turns
out we lose without it. Good luck to you, but I think you're seriously
underestimating the complexity of things here.



It's clearly not as good as a new register allocator would be, but the
effort to benefit ratio ought to be a lot higher for RABLET than for a
register allocator rewrite.


There is a register allocator rewrite under way, from one of your
co-workers even. Is there any relation between Vlad's project and
yours, or are you going different ways with the same goal in mind? :-D


 Having worked on register allocation issues for the last three years
and having looked at the new-ra project, I can say that any project in
this area has a big chance of failing, however good its design looks
at first glance.  The problem is the complexity of RTL and the large
number of ports with very specific issues, which are described by a
gazillion macros.  Redesigning and simplifying RTL could solve a lot
of problems in code selection, register allocation, etc. (although it
might create others).  But this task is much bigger than introducing
tree-SSA, because it means rewriting all machine description files and
is practically equal to redoing all ports.  Do we have the resources
for this?  I don't think so.

 Having said this, my point of view is that the more projects we have
in this area, the better our chance of solving the problem.  Therefore
I really appreciate what Andrew and Bernd Schmidt are doing.  It might
look like a waste of resources, but we cannot force people not to do
what they believe in and want to do (e.g. we cannot force Bernd to
stop improving reload just because he could work on a new register
allocator instead.  He improves reload because he knows it better than
others).

 As for Andrew's proposal, my opinion is that all these
transformations are done too early, and we will need to do them again
on RTL at some point.

o coalescing.  CSE can create more moves, but more importantly,
 extended coalescing cannot be done here (or I don't know how it
 could be done here).  It is about moves generated because of
 two-address architecture constraints (regmove and global try to
 solve this problem in an ad hoc way, e.g. through hard register
 preferences in global).  It should be part of the coalescing pass,
 because removing a move can prevent the removal of a higher-priority
 move generated by reload because of the two-address constraints.

o register pressure relief through live range splitting and/or
 rematerialization.  We have no accurate information here, because
 later passes such as insn scheduling and CSE change the pressure.
 Although insn scheduling has a heuristic not to increase register
 pressure, that heuristic has very low priority (third or fourth).
 Therefore insn scheduling can increase the pressure a lot (but
 sometimes decreases it too).  The insn scheduler with register
 renaming being implemented by ISP RAS might solve this problem if
 it runs only after the register allocator.  But this insn scheduler
 can run before the register allocator too, and only actual use will
 show whether it should run only after the register allocator or in
 the traditional way (before and after the register allocator).

 Even if subsequent passes did not change the register pressure,
 there is another problem: the difficulty of calculating the
 register pressure excess.  We don't know what register class will be
 used for a pseudo-register (e.g. AREG or GENERAL_REGS on x86, which
 makes a difference of 6 in the register pressure).  Although reducing
 register pressure from 100 to 6 would be very helpful, my experience
 shows that the most frequent and interesting cases are on the
 border.

o register renaming is already done, and done effectively (because it
 uses the data flow analysis framework), by -fweb.  But I think it
 could be done more effectively by the out-of-ssa pass.
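
To make the register class point above concrete, here is a minimal
sketch of a pressure-excess calculation.  It is not gcc code.  The
live-range representation, the function names and the class sizes
(1 register for AREG, 7 for GENERAL_REGS, chosen only so that the
difference is 6 as stated above) are all assumptions made for
illustration.

```python
# Hypothetical class sizes, chosen so the difference is 6 as in the text.
# Real values depend on the target and on compiler options.
CLASS_SIZE = {"AREG": 1, "GENERAL_REGS": 7}


def max_pressure(live_ranges):
    """Return the peak number of simultaneously live pseudo-registers.

    live_ranges is a list of (start, end) program points; a range is
    live on the half-open interval [start, end).
    """
    events = []
    for start, end in live_ranges:
        events.append((start, 1))   # pseudo becomes live
        events.append((end, -1))    # pseudo dies
    # At the same program point, process deaths (-1) before births (+1),
    # so back-to-back ranges do not count as overlapping.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def pressure_excess(live_ranges, reg_class):
    """Return how many pseudos exceed the registers of the assumed class."""
    return max(0, max_pressure(live_ranges) - CLASS_SIZE[reg_class])


# Eight nested live ranges: all eight pseudos are live in the middle.
ranges = [(i, 20 - i) for i in range(8)]
excess_general = pressure_excess(ranges, "GENERAL_REGS")
excess_areg = pressure_excess(ranges, "AREG")
```

With these assumed sizes the same eight live ranges give an excess of
1 for GENERAL_REGS and 7 for AREG, so a pass that has to guess the
class before allocation cannot tell a borderline case from a bad one.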

 Actually, what Andrew proposes (and more) I did two years ago at the
RTL level, close to the register allocator (see the gcc summit article
"fighting register pressure in gcc").  The result was not satisfactory
for me and I moved on to rewriting the register allocator.  Probably I
should have committed more of what I had done to the mainline.

 What Andrew proposes can probably be done faster on tree-SSA,
although doing it on RTL would give us more accurate information.  In
any case it will improve code in some cases and can be used as a
temporary solution (until the new register allocator projects are
finished, or forever if they fail).  Andrew's proposal also makes
sense from the code reuse point of view if he wants to move on with
the RABLE project.

 As for my project YARA, I don't know when it will be ready for the
mainline (at least one more year), because it includes removing reload
(the biggest and most complicated part of the RA).  It works now only
for x86 and x86_64, and generates better code for SPECint2000 and
SPECfp2000 (at least for pentium4, nocona and the upcoming woodcrest;
I have no free AMD machine for benchmarking).  I've just started
debugging the ppc32 port; a lot needs to be done to improve code
quality for this port.  ppc64 and itanium will come after that.

 Currently YARA already has what Andrew is going to do:

 o register pressure relief in Morgan's style.
 o register coalescing (including the extended one).
 o simple register rematerialization (probably more could be done here).
 o register renaming as a by-product of building the YARA IR.

The compiler with YARA is now less than 2% slower (on SPECInt2000)
than gcc with the current register allocator (but all of Andrew's
proposed transformations will take time too).  I should have informed
the gcc community more about the YARA project.  Actually I wanted to
do that at this gcc summit, but unfortunately my proposal was
rejected.  I'll find other ways.

P.S. I found that register pressure has decreased over the last 6-8
months, so people have probably been working on it.  I think it is
wrong for each optimization to try to decrease the pressure by itself.
It should be done in one or two places.  Therefore, once again,
Andrew's proposal makes sense, especially if it gets into gcc4.3.

Vlad
