Re: Register allocation/spilling and register volitility
At 10:51 AM -0800 11/12/04, Michel Pelletier wrote: > Date: Fri, 12 Nov 2004 04:10:20 -0800 From: Bill Coffman <[EMAIL PROTECTED]> > > And spilling? > > Well, I'm proposing a variable-sized register frame. With very little > additions we could run with more then 32 registers per kind (there are a > few bitmasks currently that would need adaptions, but not much). I think this is a great idea. So do I. I don't. It's staying the way it is, as are the implicit operands to some of the ops, and the unprototyped calling conventions. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Register allocation/spilling and register volitility
> Date: Fri, 12 Nov 2004 04:10:20 -0800 > From: Bill Coffman <[EMAIL PROTECTED]> > > > And spilling? > > > > Well, I'm proposing a variable-sized register frame. With very little > > additions we could run with more then 32 registers per kind (there are a > > few bitmasks currently that would need adaptions, but not much). > > I think this is a great idea. So do I. Having a fixed number of registes is like having the C keywords register0 through registerN. All registers should be $Pn registers and let the guts deal with how that maps physically. I certainly don't care which $Pn register maps to which Pn register, should a PIR programmer ever care? As far as I can tell the only reason why a PIR programmer would even use Pn registers is for implicit arguments, leading me into my next point: I strongly feel that Leo was right to suggest removing implicit register arguments from opcodes. I responded to that email but somehow it didn't get to the list AFAICT. I can't think of any good reason or use case for implicit arguments. Can one be provided? From my experiences with Parakeet, it's just more weird, implicit stuff that I have to keep in my head that confuses new PIR programmers who look at my code. Since I'm on the box I might as well get it all out: I don't think unprototyped pcc calls should use the first 11 arguments in P registers and the rest in an array, again it just doesn't make any sense to me other than to violate DON (Don't Optimize Now, in tribute to Don Knuth). Just pass an array and be done with it. If the guts want to register-optimize arguments then fine, why does the PIR programmer have to do it by hand? (this is really an specific extension of the argument against implicit register usage) Unprototyped arguments are usually built dynamically at run time anyway or called in from an external language. I wager the performance benefit is miniscule to nothing compared to the contortious PIR code needed to pack and unpack arguments into registers, spill arguments into an array, avoid those 11 P regisers all the time, and the pressure it all puts on the register allocator. For prototyped calls I can (barely) see this as being useful, but not unprototyped. Fixed registers, implicit arguments, and calling contortions are "dead chickens" (coined, AFAIK, by Jim Fulton). They're the completely useless things you have to wave over the machine to get it to do what you want. The less dead chickens you have to wave over Parrot, the better. -Michel
Re: Register allocation/spilling and register volitility
Bill Coffman wrote: In that case, I'll focus on the register renaming, live-range analysis, etc. Great. In light of this, I think I'll send in my patch, in case it is helpful. The problem is that some routine outside of imc_reg_alloc() is sending in conflicting register preallocation. Yep. I've already found two flaws within pcc.c. One is fixed (thanks to your allocation conflict check). The second is tricky: we have _global_dumper() ... object."dumper"(pmc, name)# args are P5, S5 and _global_dumper() does: .return(object) The function is returning the object in P5, which means that return conventions are indicating a PMC result in P5, and that get's copied into the caller's frame. But the caller doesn't use it. The usage of P5 in the caller (pmc) is clobbered... I don't know, if that is an abuse of prototyped calling conventions (we need a definition WRT call/return argument mismatches). If not, we can never assign possible return values around a function call. Bill leo
Re: Register allocation/spilling and register volitility
On Mon, 8 Nov 2004 12:20:19 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote: > Jeff Clites <[EMAIL PROTECTED]> wrote: > > On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote: > > >> If frames aren't adjacent, normal argument copying can be done anyway. > > > This would seem to require the same types of runtime checks that you > > are objecting to below, > > Not runtime. The register allocator knows the amount of needed > non-volatiles and volatile registers. Depending on that a call can place > arguments directly into the incoming regs of the caller or not. Then if > at runtime copying is needed (because e.g. frames aren't adjacent), the > function arguments are copyied. This is fully transparent. > > > I discussed that in item (1) under "Further notes". > > Yep, saw that then ;) > > >> It's not needed. I've a better scheme in mind, which addressess > >> efficieny as well as argument passing. > > > And spilling? > > Well, I'm proposing a variable-sized register frame. With very little > additions we could run with more then 32 registers per kind (there are a > few bitmasks currently that would need adaptions, but not much). I think this is a great idea. Mostly, I was thinking of the case where there are less than 32 registers needed. The allocator can try to reduce the number of registers that will need to be used. I think that addressing could become an issue is there are too many registers, but I still haven't figured out how the bytecode looks yet. > But basically, spilling should not be needed at all, if the register > allocator isn't as broken as it now is. Dan got some really evil > subroutines, so we have RL test cases. We'll see. In that case, I'll focus on the register renaming, live-range analysis, etc. The memory leak, which might not be actually lost memory, but just wasteful useage of memory, is pretty much related to the spill code. Most of the grotesqueness of the allocator is in the spiller. If this has a high probability of changing, I'll focus on other stuff, like register renaming, which is a lot harder, but with higher longterm payoff. In light of this, I think I'll send in my patch, in case it is helpful. The problem is that some routine outside of imc_reg_alloc() is sending in conflicting register preallocation. My code does lot's of assertions, and finds this, one way or another. I'll put in a variable to turn off color consistency checking, which will probably then pass the tests. Bill > > > JEff > > leo >
Re: Register allocation/spilling and register volitility
Jeff Clites <[EMAIL PROTECTED]> wrote: > On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote: >> If frames aren't adjacent, normal argument copying can be done anyway. > This would seem to require the same types of runtime checks that you > are objecting to below, Not runtime. The register allocator knows the amount of needed non-volatiles and volatile registers. Depending on that a call can place arguments directly into the incoming regs of the caller or not. Then if at runtime copying is needed (because e.g. frames aren't adjacent), the function arguments are copyied. This is fully transparent. > I discussed that in item (1) under "Further notes". Yep, saw that then ;) >> It's not needed. I've a better scheme in mind, which addressess >> efficieny as well as argument passing. > And spilling? Well, I'm proposing a variable-sized register frame. With very little additions we could run with more then 32 registers per kind (there are a few bitmasks currently that would need adaptions, but not much). But basically, spilling should not be needed at all, if the register allocator isn't as broken as it now is. Dan got some really evil subroutines, so we have RL test cases. We'll see. > JEff leo
Re: Register allocation/spilling and register volitility
On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: OTOH it doesn't really matter, if the context structure is in the frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as valid as I0 or I4, as long as it's assured, that it's exactly addressing the incoming argument area of the called function. A problem with this is that you can't be sure that you can actually have the "next" frame of registers adjacent to the current frame--they might already be taken. Imagine A calls B, then B creates a continuation and stores it in a global, then returns. Please read the proposal summary by Miroslav Silovic, keyword "watermark". If frames aren't adjacent, normal argument copying can be done anyway. This would seem to require the same types of runtime checks that you are objecting to below, unless user code is expected to explicitly check whether it's supposed to be assigning to I3 v I(3 + 32x?). And that still seems to require copying, in my case of a function a() which calls a function b() with the exact same arguments. Keep the old-scheme registers inside the interpreter structure, *as well as* the new indirect registers. Call the registers in the interpreter I0 to I31, and the indirect registers I32 to whatever. That would need two different addressing modes depending on the register number. That'll lead to considerable code bloat: we'd have all possible permutations for direct/indirect registers. Doing it at runtime only would be a serious slowdown. I discussed that in item (1) under "Further notes". It's not needed. I've a better scheme in mind, which addressess efficieny as well as argument passing. And spilling? JEff
Re: Register allocation/spilling and register volitility
Jeff Clites <[EMAIL PROTECTED]> wrote: >> OTOH it doesn't really matter, if the context structure is in the >> frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as >> valid as I0 or I4, as long as it's assured, that it's exactly >> addressing the incoming argument area of the called function. > A problem with this is that you can't be sure that you can actually > have the "next" frame of registers adjacent to the current frame--they > might already be taken. Imagine A calls B, then B creates a > continuation and stores it in a global, then returns. Please read the proposal summary by Miroslav Silovic, keyword "watermark". If frames aren't adjacent, normal argument copying can be done anyway. > Keep the old-scheme registers inside the interpreter structure, *as > well as* the new indirect registers. Call the registers in the > interpreter I0 to I31, and the indirect registers I32 to whatever. That would need two different addressing modes depending on the register number. That'll lead to considerable code bloat: we'd have all possible permutations for direct/indirect registers. Doing it at runtime only would be a serious slowdown. It's not needed. I've a better scheme in mind, which addressess efficieny as well as argument passing. leo
Register allocation/spilling and register volitility
From other threads: Now we are placing arguments or return values in registers according to PDD03 and the other end has immediately access to the placed values, because the register file is in the interpreter. With the indirect addressing of the register frame, this argument passing is probably the most costly part of function calls and returns, because we have to copy the values. and Anyway, if you can pop both register frames -and- context structures, you won't run GC too often, and everything will nicely fit into the cache. I thought about that too, but I'd rather have registers adjacent, so that the caller can place function arguments directly into the callers in arguments. OTOH it doesn't really matter, if the context structure is in the frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as valid as I0 or I4, as long as it's assured, that it's exactly addressing the incoming argument area of the called function. A problem with this is that you can't be sure that you can actually have the "next" frame of registers adjacent to the current frame--they might already be taken. Imagine A calls B, then B creates a continuation and stores it in a global, then returns. If A makes another function call, it can't just use the adjacent register frame--it's still "taken". (I'm referring here to the "sliding register window" idea of course.) But this is reminding me of another idea I've had. I've been thinking for a while about another idea to decrease some of the copying which we are now doing for function calls, as well as the allocation of register sets during subroutine calls, and which would as a side-effect allow us to reduce the need for register spilling. First, a code snippet. Consider the following C code: int a(int x) { return b(x); } int b(int x) { return c(x + 1); } int c(int x) { return x + 7; } As compiled on the PPC, this code is very compact--each function is implemented by just one or two instructions, and only one register is used. There are two key factors here: (1) no register preservation is needed, and (2) the call from a() to b() is just a branch, since the call to a() has already loaded the registers with the correct values for the call to b(). By my thinking, register usage falls into three basic categories: 1) Registers used for parameter passing and returns values for function calls. 2) Registers used to hold values which need to be preserved across function calls. 3) Registers used to hold values which do not need to be preserved across function calls. Really, (1) and (3) are similar--in either case, you have registers whose values are allowed to change across function calls. (For register which hold return values that's obvious, and for those used to pass parameters, it seems fair that these be expected to change across a function call.) So we really have two cases--volatile and non-volatile register usage. In the PPC ABI, half the registers are treated as volatile, and the other half as non-volatile. For the PPC, this corresponds to caller-preserved v. callee-preserved. Because of continuations[1], parrot can't have callee-preserved registers, but it could still have volatile v. non-volatile registers. Here's what I'm thinking: Keep the old-scheme registers inside the interpreter structure, *as well as* the new indirect registers. Call the registers in the interpreter I0 to I31, and the indirect registers I32 to whatever.[2] I'll call these the "direct" and "indirect" registers. By their very nature, the direct registers are volatile, and the indirect registers are non-volatile. What does this buy us? The following: 1) Parameter passing and return occur via the established calling conventions, in the volatile registers. No extra copying is needed upon function call or return--the volatile registers are physically the same for the caller and the callee. 2) "Temporary" calculations can use the volatile registers. Values which need preserving can use the non-volatile registers directly. 3) In functions which need no non-volatile registers, there's no need to allocate the indirect registers at all. 4) Cases like my example code above can compile just as compactly as the PPC case. With the volatile Parrot registers mapped to volatile CPU registers (in the PPC case at least), this would be highly efficient: you'd end up with asm similar to what you have in the C case above, and you'd not have a need to save the CPU registers back to Parrot registers (other than those used in the calling conventions) for Parrot function calls out of JIT code. 5) Because this opens things up to more than 32 registers, code could use as many registers as needed. You'd never have register spilling per-se in order to accommodate local variables, though you'd still want a smart register allocator which minimized the number of registers needed (for memory locality as well as minimizing allocation). But a very simple one-non-vo