Bill Coffman <[EMAIL PROTECTED]> wrote:
> Leo,

> Thanks for your suggestions and comments.

Welcome and thanks to you for looking at that nasty piece of code ;)

> On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
>> Some remargs WRT gen{3,4}.pl:
>> 1) While these programs exhibit some worst case register layout it's
>>    probably not a very typical layout.

> Agreed.  The idea was to automate and compare to gcc.  There are
> real-world tests already in the parrot test suite, but because I can
> generate millions of such cases, one hope is to detect errors, and to
> get some kind of performance metric, even if these programs are
> artificial.

Ok, that's of course very reasonable. One of the problems Dan
encountered is memory usage, though. Currently the interference graph is
built for all four kinds of registers in one piece. By splitting the
register allocation into four passes, much memory can be saved.

But the memory usage can be estimated w/o using four register kinds too.

>> I'd change the simulation program to use PMCs to allow for 2). Now when
>> it comes to spilling, these lexicals or globals don't need a new
>> storage, their live range can just get discarded and at the next usage
>> of this lexical or global it just can be refetched[1]. Implementing this
>> should already vastly improve the register allocation.
>>
>> [1] The refetching can of course be into a different Parrot register.
>> Thus the usage of globals or lexicals wouldn't interfer and generate
>> huge live ranges over the whole function as it currently does.

> I don't quite understand this.  You think  I should create PMC
> variables, instead of integers?

Yes. As said, with PMC lexicals or globals the register allocation can
be simplified. But it depends. E.g.

   .local pmc foo
   ...
   find_global foo, "foo"
   ...
   find_global foo, "foo"

The C<foo> is an C<out> argument. If your algorithm does register
renaming, all is fine. If it doesn't, like now, all the usage of the
C<foo>s take the same register over the whole unit, which is the major
PITA of the current register allocation.

And of course, lexicals and globals already have a storage, you don't
need to spill them.

One more difference between {I,N,S} and {P} registers is the value
behavior. E.g.

  =item B<add>(out INT, in INT, in INT)
  =item B<add>(out NUM, in NUM, in NUM)
  =item B<add>(in PMC, in PMC, in PMC)
               ^^

That is, with {I,N,S} register ranges are cut early at each binary
operation. You don't have that with PMCs.

> ... It's probably a good idea to have a
> mix of the four types I suppose.

If you want to compare with gcc too, you could use three types:

   I ... int
   N ... double
   P ... V4SI   (vector for sse, altivec, ...)

> Once thing I can say is that the interference graph is pretty
> conservatively generated.  When a variable is reassigned, it can be
> treated as a new variable which can reduce register pressure, but I
> don't think the code is doing that yet.

Yep, register renaming. That would already improve things vastly.

[ metrics ]

> could be helpful.  That could be incorporated into the register
> allocator itself (which already is, if you run with -d 0008, then grep
> Spill).

With much less output "-v". The "spill" number is ok, the usage count is
broken, though.

>> I don't know if we need some syntax to easily support lexicals or
>> globals or if the register allocation can deduce that itself. But if a
>> new syntax simplifies that, we put it in.

> My preference is to not distinguish between various types of
> variables, but to have the algorithms deal best with nodes based on
> the structure of the graph(s).

That's good. But as said, you don't have to spill lexicals or globals.

>> For 3) there is already a separate pass in
>> imcc/reg_alloc.c:allocate_non_interfering().

> Yes.  This function seems to find variables whose live ranges (life
> ranges) are restricted to a single basic block, and assign them reg
> 28,29, or 30.

> ....  It's
> usually best to color the easy stuff last.

I did assign them first to reduce the size of the interference graph,
which was one of the problems - "out of memory". But that is better
solved by doing four passes.

> ...  Another
> optimization to the algorithm is to incorporate a score that causes
> the most frequently accessed nodes not to get spilled.

... and stuff inside inner loops.

> scoring mechanism is pretty primitive right now

Yep. A broad field for experiments.

> ...  I think it's important to
> look at experimental results as these decisions are made.

Yep.

> With so many registers, it may be more important to focus on swapping
> registers in and out between subroutine calls, rather than the actual
> register allocation optimization.

That's currently being addressed. Each subroutine gets a fresh set of
registers.

> ... Another thing that might be worth
> checking, after parrot gets out of "alpha", is if reducing or
> increasing the number of registers will help performance.  Just a
> thought.

The 4 x 32 is pretty good. It matches recent hardware too. But if a good
register algorithm shows that 4 x 16 is enough, we can of course
decrease the register number. Increasing shouldn't be necessary.

> - Bill

leo

Reply via email to