Leo,

Thanks for your suggestions and comments.

On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> Some remarks WRT gen{3,4}.pl:
> 1) While these programs exhibit some worst case register layout it's
>    probably not a very typical layout.

Agreed.  The idea was to automate and compare against gcc.  There are
real-world tests in the parrot test suite already, but because I can
generate millions of such cases, one hope is to detect errors and to
get some kind of performance metric, even if these programs are
artificial.  To get real metrics that people would pay attention to,
we might implement SPEC99, etc.

> 2) RL programs have lexicals and globals access spread over the code like
>    in the generated gen.imc
> 3) and that's interspersed with highly local access to temporaries coming
>    from expression evaluations.

I think I could modify the tests so that the variables' usage
distances (the maximum number of lines between first and last use)
follow a Poisson distribution.  This would make some of them look more
like temporary variables, and be a somewhat better simulation of real
programs.
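A rough sketch of what I mean, in Python rather than the Perl of the gen
scripts, just to illustrate; the function names and the mean span of 4
lines are made up for the example:

```python
import random

def poisson_sample(lam):
    """Draw from a Poisson distribution via Knuth's method (stdlib only)."""
    limit = pow(2.718281828459045, -lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def assign_lifetimes(num_vars, num_lines, mean_span=4):
    """For each synthetic variable, pick a first-use line and a
    Poisson-distributed usage distance (span in lines).  A small mean
    span makes most variables look like short-lived temporaries."""
    lifetimes = {}
    for v in range(num_vars):
        start = random.randrange(num_lines)
        span = 1 + poisson_sample(mean_span)  # at least one line long
        lifetimes[f"v{v}"] = (start, min(num_lines - 1, start + span))
    return lifetimes
```

The generator would then emit a use of each variable at its start and end
lines, so the allocator sees mostly short, overlapping live ranges with
an occasional long one in the Poisson tail.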

> I'd change the simulation program to use PMCs to allow for 2). Now when
> it comes to spilling, these lexicals or globals don't need a new
> storage, their live range can just get discarded and at the next usage
> of this lexical or global it just can be refetched[1]. Implementing this
> should already vastly improve the register allocation.
>
> [1] The refetching can of course be into a different Parrot register.
> Thus the usage of globals or lexicals wouldn't interfere and generate
> huge live ranges over the whole function as it currently does.
> 

I don't quite understand this.  You think I should create PMC
variables instead of integers?  It's probably a good idea to have a
mix of the four types, I suppose.

One thing I can say is that the interference graph is generated pretty
conservatively.  When a variable is reassigned, it can be treated as a
new variable, which can reduce register pressure, but I don't think
the code does that yet.  It sounds like you're talking about improving
the interference graph construction.  I may get into that too, but for
now I want to get the register coloring (mapping vars to registers)
working better.

Also, I'd like to be able to capture the effects of changes in metrics
(via scripts like those I posted) so we can see whether different
ideas actually improve the situation.  I don't have any runtime checks
yet, but a score measuring how much was spilled could be helpful.
That could be incorporated into the register allocator itself (some of
it already is: run with -d 0008 and grep for Spill).

> I don't know if we need some syntax to easily support lexicals or
> globals or if the register allocation can deduce that itself. But if a
> new syntax simplifies that, we put it in.

My preference is not to distinguish between the various types of
variables, but to have the algorithms handle nodes based on the
structure of the graph(s).

> 
> For 3) there is already a separate pass in
> imcc/reg_alloc.c:allocate_non_interfering().

Yes.  This function finds variables whose live ranges are restricted
to a single basic block and assigns them register 28, 29, or 30.  My
preference is that low-hanging fruit like temporary vars be assigned
by the main algorithm.  Such variables tend not to interfere much with
others, and are thus probably easier to color.  My experience with
coloring is that assigning low-degree nodes (degree, aka arity: the
number of neighbors) to fixed colors first actually reduces the
performance of the graph coloring algorithm.  It's usually best to
color the easy stuff last.

The algorithm I'd like to use happens to work very well for
interference graphs, and it is very simple.  It selects the node that
interferes with the fewest other variables (lowest degree), removes it
from the interference graph, and pushes it onto a stack.  It then
updates the degrees of the removed node's neighbors, and iteratively
removes all nodes from the graph in this way, pushing each onto the
stack.  Finally, it pops nodes off the stack and colors them in LIFO
order, using the first available color.  Nodes that would need a color
higher than 30, or whatever the maximum turns out to be, are spilled.
The spilled nodes are always in the densest part of the graph, where
something has to spill.
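To make the simplify-and-select steps concrete, here's a minimal Python
sketch (illustrative only, not the imcc C code; the tie-breaking when
several nodes share the lowest degree is arbitrary here):

```python
def color_graph(adj, num_colors):
    """Simplify-and-select graph coloring in the Chaitin/Briggs style.

    adj: dict mapping each node to the set of its neighbors.
    Returns (coloring, spilled): coloring maps node -> color index,
    spilled lists the nodes that could not be colored.
    """
    adj = {n: set(ns) for n, ns in adj.items()}
    degree = {n: len(ns) for n, ns in adj.items()}
    stack, removed = [], set()

    # Simplify: repeatedly remove the lowest-degree remaining node,
    # pushing it onto a stack and decrementing its neighbors' degrees.
    while len(removed) < len(adj):
        node = min((n for n in adj if n not in removed),
                   key=lambda n: degree[n])
        stack.append(node)
        removed.add(node)
        for m in adj[node]:
            if m not in removed:
                degree[m] -= 1

    # Select: pop in LIFO order, giving each node the first color not
    # used by an already-colored neighbor; spill if none is free.
    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in adj[node] if m in coloring}
        free = [c for c in range(num_colors) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)
    return coloring, spilled
```

A triangle of three mutually interfering variables colors cleanly with
three colors but must spill one node with two, which matches the
intuition that spills land in the densest part of the graph.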

I believe this is essentially the algorithm Briggs/Chaitin used, and
some variation of it has appeared in most compilers since the 1970s.
One exception is that certain variables want certain colors
(want_regno); those should be colored first, as the current code does.

A disadvantage of the above algorithm is that it selects more or less
randomly which nodes get spilled among the densest parts of the graph.
A further optimization is to incorporate a score so that the most
frequently accessed nodes don't get spilled.  (The current scoring
mechanism is pretty primitive; someone wrote "I have no clue of how
good it is".)  There are other possible optimizations too, like
splitting the live ranges of certain long-lived, rarely used
variables, to further reduce register pressure.
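One classic form of that score, which I'd suggest as a starting point,
is Chaitin's cost/degree ratio: spill the variable that is used rarely
but interferes widely.  A sketch (the dict-based interface is mine, and
the use counts would come from counting appearances in the IMC source,
perhaps weighted by loop nesting):

```python
def pick_spill_candidate(degree, use_counts):
    """Chaitin-style spill heuristic: choose the node with the lowest
    (use count / degree) ratio, i.e. rarely used but highly
    interfering, so the cheapest live range is the one discarded.

    degree, use_counts: dicts keyed by variable name.
    """
    return min(degree, key=lambda n: use_counts[n] / max(1, degree[n]))
```

Plugged into the select phase above, this would replace the arbitrary
spill choice with one biased away from hot variables.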

Currently, there are many simple optimizations to be made to the
register allocation that would have a dramatic effect on runtime, at
least.  I think implementing the above algorithm would be helpful and
would scale with GCC as far as runtime goes; next would probably come
improving the interference graph, and then the above optimizations.  I
am open to suggestions about the best path.  I think it's important to
look at experimental results as these decisions are made.

With so many registers, it may be more important to focus on swapping
registers in and out across subroutine calls than on the register
allocation optimization itself.  Another thing that might be worth
checking, after Parrot gets out of "alpha", is whether reducing or
increasing the number of registers helps performance.  Just a thought.

- Bill
