Re: Register allocation/spilling and register volitility

2004-11-12 Thread Dan Sugalski
At 10:51 AM -0800 11/12/04, Michel Pelletier wrote:
 > Date: Fri, 12 Nov 2004 04:10:20 -0800
 From: Bill Coffman <[EMAIL PROTECTED]>

 > > And spilling?
 >
 > Well, I'm proposing a variable-sized register frame. With very little
 > additions we could run with more then 32 registers per kind (there are a
 > few bitmasks currently that would need adaptions, but not much).
 I think this is a great idea. 
So do I.
I don't. It's staying the way it is, as are the implicit operands to 
some of the ops, and the unprototyped calling conventions.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Register allocation/spilling and register volitility

2004-11-12 Thread Michel Pelletier
> Date: Fri, 12 Nov 2004 04:10:20 -0800
> From: Bill Coffman <[EMAIL PROTECTED]>

> > > And spilling?
> > 
> > Well, I'm proposing a variable-sized register frame. With very little
> > additions we could run with more then 32 registers per kind (there are a
> > few bitmasks currently that would need adaptions, but not much).
> 
> I think this is a great idea.  

So do I.  Having a fixed number of registes is like having the C keywords 
register0 through registerN.  All registers should be $Pn registers and let 
the guts deal with how that maps physically.  I certainly don't care which 
$Pn register maps to which Pn register, should a PIR programmer ever care?   
As far as I can tell the only reason why a PIR programmer would even use Pn 
registers is for implicit arguments, leading me into my next point:

I strongly feel that Leo was right to suggest removing implicit register 
arguments from opcodes.  I responded to that email but somehow it didn't get 
to the list AFAICT.  I can't think of any good reason or use case for 
implicit arguments.  Can one be provided?  From my experiences with Parakeet, 
it's just more weird, implicit stuff that I have to keep in my head that 
confuses new PIR programmers who look at my code.

Since I'm on the box I might as well get it all out: I don't think 
unprototyped pcc calls should use the first 11 arguments in P registers and 
the rest in an array, again it just doesn't make any sense to me other than 
to violate DON (Don't Optimize Now, in tribute to Don Knuth).  Just pass an 
array and be done with it.  If the guts want to register-optimize arguments 
then fine, why does the PIR programmer have to do it by hand? (this is really 
an specific extension of the argument against implicit register usage)

Unprototyped arguments are usually built dynamically at run time anyway or 
called in from an external language.  I wager the performance benefit is 
miniscule to nothing compared to the contortious PIR code needed to pack and 
unpack arguments into registers, spill arguments into an array, avoid those 
11 P regisers all the time, and the pressure it all puts on the register 
allocator.  For prototyped calls I can (barely) see this as being useful, but 
not unprototyped.

Fixed registers, implicit arguments, and calling contortions are "dead 
chickens" (coined, AFAIK, by Jim Fulton).   They're the completely useless 
things you have to wave over the machine to get it to do what you want.  The 
less dead chickens you have to wave over Parrot, the better.

-Michel


Re: Register allocation/spilling and register volitility

2004-11-12 Thread Leopold Toetsch
Bill Coffman wrote:
In that case, I'll focus on the register renaming, live-range
analysis, etc.  
Great.
In light of this, I think I'll send in my patch, in case it is
helpful.  The problem is that some routine outside of imc_reg_alloc()
is sending in conflicting register preallocation. 
Yep. I've already found two flaws within pcc.c. One is fixed (thanks to 
your allocation conflict check).

The second is tricky: we have
   _global_dumper()
   ...
   object."dumper"(pmc, name)# args are P5, S5
and _global_dumper() does:
   .return(object)
The function is returning the object in P5, which means that return 
conventions are indicating a PMC result in P5, and that get's copied 
into the caller's frame. But the caller doesn't use it. The usage of P5 
in the caller (pmc) is clobbered...

I don't know, if that is an abuse of prototyped calling conventions (we 
need a definition WRT call/return argument mismatches).

If not, we can never assign possible return values around a function call.
Bill
leo


Re: Register allocation/spilling and register volitility

2004-11-12 Thread Bill Coffman
On Mon, 8 Nov 2004 12:20:19 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> Jeff Clites <[EMAIL PROTECTED]> wrote:
> > On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:
> 
> >> If frames aren't adjacent, normal argument copying can be done anyway.
> 
> > This would seem to require the same types of runtime checks that you
> > are objecting to below,
> 
> Not runtime. The register allocator knows the amount of needed
> non-volatiles and volatile registers. Depending on that a call can place
> arguments directly into the incoming regs of the caller or not. Then if
> at runtime copying is needed (because e.g. frames aren't adjacent), the
> function arguments are copyied. This is fully transparent.
> 
> > I discussed that in item (1) under "Further notes".
> 
> Yep, saw that then ;)
> 
> >> It's not needed. I've a better scheme in mind, which addressess
> >> efficieny as well as argument passing.
> 
> > And spilling?
> 
> Well, I'm proposing a variable-sized register frame. With very little
> additions we could run with more then 32 registers per kind (there are a
> few bitmasks currently that would need adaptions, but not much).

I think this is a great idea.  Mostly, I was thinking of the case
where there are less than 32 registers needed.  The allocator can try
to reduce the number of registers that will need to be used.  I think
that addressing could become an issue is there are too many registers,
but I still haven't figured out how the bytecode looks yet.

> But basically, spilling should not be needed at all, if the register
> allocator isn't as broken as it now is. Dan got some really evil
> subroutines, so we have RL test cases. We'll see.

In that case, I'll focus on the register renaming, live-range
analysis, etc.  The memory leak, which might not be actually lost
memory, but just wasteful useage of memory, is pretty much related to
the spill code.  Most of the grotesqueness of the allocator is in the
spiller.  If this has a high probability of changing, I'll focus on
other stuff, like register renaming, which is a lot harder, but with
higher longterm payoff.

In light of this, I think I'll send in my patch, in case it is
helpful.  The problem is that some routine outside of imc_reg_alloc()
is sending in conflicting register preallocation.  My code does lot's
of assertions, and finds this, one way or another.  I'll put in a
variable to turn off color consistency checking, which will probably
then pass the tests.

Bill

> 
> > JEff
> 
> leo
>


Re: Register allocation/spilling and register volitility

2004-11-08 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote:
> On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:

>> If frames aren't adjacent, normal argument copying can be done anyway.

> This would seem to require the same types of runtime checks that you
> are objecting to below,

Not runtime. The register allocator knows the amount of needed
non-volatiles and volatile registers. Depending on that a call can place
arguments directly into the incoming regs of the caller or not. Then if
at runtime copying is needed (because e.g. frames aren't adjacent), the
function arguments are copyied. This is fully transparent.

> I discussed that in item (1) under "Further notes".

Yep, saw that then ;)

>> It's not needed. I've a better scheme in mind, which addressess
>> efficieny as well as argument passing.

> And spilling?

Well, I'm proposing a variable-sized register frame. With very little
additions we could run with more then 32 registers per kind (there are a
few bitmasks currently that would need adaptions, but not much).

But basically, spilling should not be needed at all, if the register
allocator isn't as broken as it now is. Dan got some really evil
subroutines, so we have RL test cases. We'll see.

> JEff

leo


Re: Register allocation/spilling and register volitility

2004-11-08 Thread Jeff Clites
On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:
Jeff Clites <[EMAIL PROTECTED]> wrote:
OTOH it doesn't really matter, if the context structure is in the
frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
valid as I0 or I4, as long as it's assured, that it's exactly
addressing the incoming argument area of the called function.

A problem with this is that you can't be sure that you can actually
have the "next" frame of registers adjacent to the current frame--they
might already be taken. Imagine A calls B, then B creates a
continuation and stores it in a global, then returns.
Please read the proposal summary by Miroslav Silovic, keyword 
"watermark".
If frames aren't adjacent, normal argument copying can be done anyway.
This would seem to require the same types of runtime checks that you 
are objecting to below, unless user code is expected to explicitly 
check whether it's supposed to be assigning to I3 v I(3 + 32x?). And 
that still seems to require copying, in my case of a function a() which 
calls a function b() with the exact same arguments.

Keep the old-scheme registers inside the interpreter structure, *as
well as* the new indirect registers. Call the registers in the
interpreter I0 to I31, and the indirect registers I32 to whatever.
That would need two different addressing modes depending on the 
register
number. That'll lead to considerable code bloat: we'd have all possible
permutations for direct/indirect registers. Doing it at runtime only
would be a serious slowdown.
I discussed that in item (1) under "Further notes".
It's not needed. I've a better scheme in mind, which addressess
efficieny as well as argument passing.
And spilling?
JEff


Re: Register allocation/spilling and register volitility

2004-11-08 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote:

>> OTOH it doesn't really matter, if the context structure is in the
>> frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
>> valid as I0 or I4, as long as it's assured, that it's exactly
>> addressing the incoming argument area of the called function.

> A problem with this is that you can't be sure that you can actually
> have the "next" frame of registers adjacent to the current frame--they
> might already be taken. Imagine A calls B, then B creates a
> continuation and stores it in a global, then returns.

Please read the proposal summary by Miroslav Silovic, keyword "watermark".
If frames aren't adjacent, normal argument copying can be done anyway.

> Keep the old-scheme registers inside the interpreter structure, *as
> well as* the new indirect registers. Call the registers in the
> interpreter I0 to I31, and the indirect registers I32 to whatever.

That would need two different addressing modes depending on the register
number. That'll lead to considerable code bloat: we'd have all possible
permutations for direct/indirect registers. Doing it at runtime only
would be a serious slowdown.

It's not needed. I've a better scheme in mind, which addressess
efficieny as well as argument passing.

leo


Register allocation/spilling and register volitility

2004-11-08 Thread Jeff Clites
From other threads:
Now we are placing arguments or return values in registers according 
to PDD03 and the other end has immediately access to the placed 
values, because the register file is in the interpreter.

With the indirect addressing of the register frame, this argument 
passing is probably the most costly part of function calls and 
returns, because we have to copy the values.
and
Anyway, if you can pop both register frames -and- context structures, 
you won't run GC too often, and everything will nicely fit into the 
cache.
I thought about that too, but I'd rather have registers adjacent, so 
that the caller can place function arguments directly into the callers 
in arguments.

OTOH it doesn't really matter, if the context structure is in the 
frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as 
valid as I0 or I4, as long as it's assured, that it's exactly 
addressing the incoming argument area of the called function.
A problem with this is that you can't be sure that you can actually 
have the "next" frame of registers adjacent to the current frame--they 
might already be taken. Imagine A calls B, then B creates a 
continuation and stores it in a global, then returns. If A makes 
another function call, it can't just use the adjacent register 
frame--it's still "taken". (I'm referring here to the "sliding register 
window" idea of course.) But this is reminding me of another idea I've 
had.

I've been thinking for a while about another idea to decrease some of 
the copying which we are now doing for function calls, as well as the 
allocation of register sets during subroutine calls, and which would as 
a side-effect allow us to reduce the need for register spilling.

First, a code snippet. Consider the following C code:
int a(int x) { return b(x); }
int b(int x) { return c(x + 1); }
int c(int x) { return x + 7; }
As compiled on the PPC, this code is very compact--each function is 
implemented by just one or two instructions, and only one register is 
used. There are two key factors here: (1) no register preservation is 
needed, and (2) the call from a() to b() is just a branch, since the 
call to a() has already loaded the registers with the correct values 
for the call to b().

By my thinking, register usage falls into three basic categories:
1) Registers used for parameter passing and returns values for function 
calls.

2) Registers used to hold values which need to be preserved across 
function calls.

3) Registers used to hold values which do not need to be preserved 
across function calls.

Really, (1) and (3) are similar--in either case, you have registers 
whose values are allowed to change across function calls. (For register 
which hold return values that's obvious, and for those used to pass 
parameters, it seems fair that these be expected to change across a 
function call.) So we really have two cases--volatile and non-volatile 
register usage.

In the PPC ABI, half the registers are treated as volatile, and the 
other half as non-volatile. For the PPC, this corresponds to 
caller-preserved v. callee-preserved. Because of continuations[1], 
parrot can't have callee-preserved registers, but it could still have 
volatile v. non-volatile registers. Here's what I'm thinking:

Keep the old-scheme registers inside the interpreter structure, *as 
well as* the new indirect registers. Call the registers in the 
interpreter I0 to I31, and the indirect registers I32 to whatever.[2] 
I'll call these the "direct" and "indirect" registers. By their very 
nature, the direct registers are volatile, and the indirect registers 
are non-volatile. What does this buy us? The following:

1) Parameter passing and return occur via the established calling 
conventions, in the volatile registers. No extra copying is needed upon 
function call or return--the volatile registers are physically the same 
for the caller and the callee.

2) "Temporary" calculations can use the volatile registers. Values 
which need preserving can use the non-volatile registers directly.

3) In functions which need no non-volatile registers, there's no need 
to allocate the indirect registers at all.

4) Cases like my example code above can compile just as compactly as 
the PPC case. With the volatile Parrot registers mapped to volatile CPU 
registers (in the PPC case at least), this would be highly efficient: 
you'd end up with asm similar to what you have in the C case above, and 
you'd not have a need to save the CPU registers back to Parrot 
registers (other than those used in the calling conventions) for Parrot 
function calls out of JIT code.

5) Because this opens things up to more than 32 registers, code could 
use as many registers as needed. You'd never have register spilling 
per-se in order to accommodate local variables, though you'd still want 
a smart register allocator which minimized the number of registers 
needed (for memory locality as well as minimizing allocation). But a 
very simple one-non-vo