Re: Register stacks, return continuations, and speeding up calling
Dan Sugalski [EMAIL PROTECTED] wrote: The first is the CPS style chews through continuation objects at a massive rate, and using them means that everything needs to be COW. We don't have COWed stacks, they are all single chunk already. A return continuation is a *potential* continuation. Ok, the return of the return continuation recycling. But it's likely a bit more complicated. Creating a Continuation somewhere down a call chain has to mark all return continuations up as non-recyclable. And as the Continuation PMC can get passed anywhere, it looks to me that just the presence of one Continuation PMC invalidates all return continuations. The proposed scheme with big register chunks and a watermark at the frame pointer of the highest continuation looks more sane. ..., and it will completely invalidate all the existing JIT code. Only i386, which is the only one that does not already a CPU register indirect addressing of Parrot registers. i386 is currently being fixed. When that's done, all JIT platforms basically need *one* additional instruction in one place, e.g. for PPC: r16 = BP_OFFS(r13)# get base bointer from interpreter ..., but I'm kinda sick at the moment, Get well soon. leo
Re: [perl #31919] [PATCH] Win32 perlnum test failure - test 36 (+- zero)
Ron Blaschke [EMAIL PROTECTED] wrote: Visual C++ compiles -0.0 to 0.0, which leads to the error. Attached patch will fix this. Thanks, applied. leo
Re: Problems with 0.1.1 release on x86-64
Brian Wheeler [EMAIL PROTECTED] wrote: Sigh. I'll get this right sometime! Thanks, applied. leo
Re: Pathological Register Allocation Test Generator
Bill Coffman [EMAIL PROTECTED] wrote: I am currently working on a fix to the large subroutine register allocation bug, aka, massive spilling not yet implemented. The problem, is that the register allocation code is complex, and I'm not all that familiar with it, or even with working with compilers at the coding level. If you encounter any problems, please just ask. I am comming at the problem as a former graph theorist, That's good. ... The first attached program (gen3.pl), Some remargs WRT gen{3,4}.pl: 1) While these programs exhibit some worst case register layout it's probably not a very typical layout. 2) RL programs have lexicals and globals access spread over the code like in the generated gen.imc 3) and that's intersparsed with highly local access to temporaries coming from expression evaluations. I'd change the simulation program to use PMCs to allow for 2). Now when it comes to spilling, these lexicals or globals don't need a new storage, their live range can just get discarded and at the next usage of this lexical or global it just can be refetched[1]. Implementing this should already vastly improve the register allocation. [1] The refetching can of course be into a different Parrot register. Thus the usage of globals or lexicals wouldn't interfer and generate huge live ranges over the whole function as it currently does. I don't know if we need some syntax to easily support lexicals or globals or if the register allocation can deduce that itself. But if a new syntax simplifies that, we put it in. For 3) there is already a separate pass in imcc/reg_alloc.c:allocate_non_interfering(). leo
Re: Perl 6 Summary for 2004-10-01 through 2004-10-17
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Matthew Walton) writes: Austin Hastings wrote: Does this mean that we're done? :) No, it means Larry's about to stun us with something seemingly bizarre and inexplicable which turns out to be a stroke of genius. This conjured up an image of Larry whacking someone with a coelacanth... -- Peter Scott http://www.perldebugged.com/ *** NEW *** http://www.perlmedic.com/
Re: Register stacks, return continuations, and speeding up calling
At 9:16 AM +0200 10/20/04, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote: The first is the CPS style chews through continuation objects at a massive rate, and using them means that everything needs to be COW. We don't have COWed stacks, they are all single chunk already. Yeah, I know, but I've been considering allowing stack back-peeking and modifying. Sorry, wasn't clear. A return continuation is a *potential* continuation. Ok, the return of the return continuation recycling. But it's likely a bit more complicated. Creating a Continuation somewhere down a call chain has to mark all return continuations up as non-recyclable. Right. Any time an actual continuation is created we need to walk back up the call chain and mark all the pending return continuations as non-recyclable. And as the Continuation PMC can get passed anywhere, it looks to me that just the presence of one Continuation PMC invalidates all return continuations. No, it doesn't, luckily. The worst case here is that invoking a continuation will leave some return continuations to be cleaned up by a DOD sweep. The proposed scheme with big register chunks and a watermark at the frame pointer of the highest continuation looks more sane. Yeah, that's what I thought back when we started parrot--that's the original design of the register stacks. At this point I *don't* think it's the right thing, for a few reasons. *) It was annoyingly error prone *) If we switch to the scheme we've been arguing over the number of register frame pushes will approach 0, so there's little point in the complexity. If we're not saving much on the register stacks (and with the switch in calls we won't be, which means we can drop the pushtop/poptop stuff on calls) it's easier to go with a one-frame-per-chunk setup. ..., and it will completely invalidate all the existing JIT code. Only i386, which is the only one that does not already a CPU register indirect addressing of Parrot registers. i386 is currently being fixed. When that's done, all JIT platforms basically need *one* additional instruction in one place, e.g. for PPC: r16 = BP_OFFS(r13)# get base bointer from interpreter What, we have two registers dedicated? One for the interpreter pointer and one for the start of registers? I didn't realize that. If so, then nevermind. ..., but I'm kinda sick at the moment, Get well soon. Let's hope that's after we get this hashed out. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Register stacks, return continuations, and speeding up calling
Dan Sugalski wrote: At 9:16 AM +0200 10/20/04, Leopold Toetsch wrote: Right. Any time an actual continuation is created we need to walk back up the call chain and mark all the pending return continuations as non-recyclable. Ok. If we're not saving much on the register stacks (and with the switch in calls we won't be, which means we can drop the pushtop/poptop stuff on calls) it's easier to go with a one-frame-per-chunk setup. Yep, it's easier. Let's start with that. r16 = BP_OFFS(r13)# get base bointer from interpreter What, we have two registers dedicated? One for the interpreter pointer and one for the start of registers? I didn't realize that. If so, then nevermind. Well, not quite ;) But it's absolutely no problem for e.g. PPC. It got plenty of callee-saved registers. For i386 the frame-pointer is currently being created in %ebx, interpreter access, which is basically rare, needs: mov -16(%ebp), %eax # get interpreter And that's needed too for getting the new frame pointer mov BP_OFFS(%eax), %ebx # only for reload after invoke Accessing e.g. a non-mapped I2 is already: mov 8(%ebx), %eax which isn't worse then the old absolute address thingy. leo
Re: cvs commit: parrot/jit/i386 jit_emit.h
Leopold Toetsch [EMAIL PROTECTED] wrote: reserve %ebp for register frame pointer s/%ebp/%ebx/ Sorry, leo
Re: Register stacks, return continuations, and speeding up calling
At 6:13 PM +0200 10/20/04, Leopold Toetsch wrote: Dan Sugalski wrote: At 9:16 AM +0200 10/20/04, Leopold Toetsch wrote: Right. Any time an actual continuation is created we need to walk back up the call chain and mark all the pending return continuations as non-recyclable. Ok. If we're not saving much on the register stacks (and with the switch in calls we won't be, which means we can drop the pushtop/poptop stuff on calls) it's easier to go with a one-frame-per-chunk setup. Yep, it's easier. Let's start with that. Good. Today I can manage eaiser. I'm not sure past that. :) r16 = BP_OFFS(r13)# get base bointer from interpreter What, we have two registers dedicated? One for the interpreter pointer and one for the start of registers? I didn't realize that. If so, then nevermind. Well, not quite ;) But it's absolutely no problem for e.g. PPC. It got plenty of callee-saved registers. For i386 the frame-pointer is currently being created in %ebx, interpreter access, which is basically rare, needs: mov -16(%ebp), %eax # get interpreter And that's needed too for getting the new frame pointer mov BP_OFFS(%eax), %ebx # only for reload after invoke Accessing e.g. a non-mapped I2 is already: mov 8(%ebx), %eax which isn't worse then the old absolute address thingy. 'Kay, now I'm confused. I thought we were talking about removing the registers from out of the interpreter structure, which'd leave us needing two pointers, one for the interpreter struct and one for the registers. (With no link back from the register stack top to the interpreter) That not the case? -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
parrot authentication handlers
mod_parrot now supports authentication handlers. i'm planning a release in the next few days, including a whitepaper on its architecture, but here's an example of what you can now do. the following handler accepts any basic authentication with a password of 'squawk' (correpsonding httpd.conf follows the code). .namespace [ 'MyAuthHandler' ] # handler.imc .sub _handler .local pmc r .local pmc ap_const .local string pw .local int status find_global ap_const, 'Apache::Constants', 'ap_constants' # get the request_rec object find_type $I0, 'Apache::RequestRec' r = new $I0 # decline if not the initial request $I1 = r.'is_initial_req'( ) if $I1 != 1 goto auth_declined (status, pw) = r.'get_basic_auth_pw'( ) if pw != 'squawk' goto auth_failure $I0 = ap_const['OK'] goto auth_return_status auth_failure: $I0 = ap_const['HTTP_UNAUTHORIZED'] goto auth_return_status auth_declined: $I0 = ap_const['DECLINED'] goto auth_return_status auth_return_status: .pcc_begin_return .return $I0 .pcc_end_return .end - # httpd.conf ParrotInit /tmp/init.pbc ParrotLoad /tmp/Constants.pbc ParrotLoad /tmp/RequestRec.pbc ParrotLoad /tmp/handler.pbc Directory /usr/local/apache2/htdocs/parrot/private ParrotAuthenHandler MyAuthHandler AuthType Basic AuthName Parrot Test Require valid-user /Directory -jeff
Re: Register stacks, return continuations, and speeding up calling
Dan Sugalski wrote: 'Kay, now I'm confused. I thought we were talking about removing the registers from out of the interpreter structure, which'd leave us needing two pointers, one for the interpreter struct and one for the registers. Ok, short summary of future layout of JIT regs: itemPPC i386 interpreter r13 -16(%ebp) frame pointer r16%ebx Register addressing is done relative to the frame pointer, which will be in a register. The interpreter isn't used that often inside integer JIT code, so it isn't in an register in i386 but is easily reloaded into one. Currently the frame pointer and the interpreter are the same. Code that branches (invoke) will reload the frame pointer from the interpreter. leo
Re: Register stacks, return continuations, and speeding up calling
At 9:09 PM +0200 10/20/04, Leopold Toetsch wrote: Dan Sugalski wrote: 'Kay, now I'm confused. I thought we were talking about removing the registers from out of the interpreter structure, which'd leave us needing two pointers, one for the interpreter struct and one for the registers. Ok, short summary of future layout of JIT regs: itemPPC i386 interpreter r13 -16(%ebp) frame pointer r16%ebx Register addressing is done relative to the frame pointer, which will be in a register. The interpreter isn't used that often inside integer JIT code, so it isn't in an register in i386 but is easily reloaded into one. Currently the frame pointer and the interpreter are the same. Code that branches (invoke) will reload the frame pointer from the interpreter. Ah, OK. That make sense. (Which is probably a clue that it's time to go lie down, but that's a separate issue) In that case I won't worry about it, and I think I know what I'd like to do with the interpreter, the register frame, and the register backing stack. I'll muddle it about some and see where it goes. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: parrot authentication handlers
At 2:27 PM -0400 10/20/04, Jeff Horwitz wrote: mod_parrot now supports authentication handlers. i'm planning a release in the next few days, including a whitepaper on its architecture, but here's an example of what you can now do. Wow. That's... cool. And a bit scary. But definitely cool. I could have an excessive amount of fun with this. You will, I trust, keep us up to date on all the places where we're making life difficult? :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: parrot authentication handlers
Jeff Horwitz [EMAIL PROTECTED] wrote: mod_parrot now supports authentication handlers. awesome -jeff leo
Re: parrot authentication handlers
You will, I trust, keep us up to date on all the places where we're making life difficult? :) for sure. but it's actually been quite a smooth ride so far. i have a short list of problems i've had to deal with, and i'll forward them to the list when i get the chance. what i'd really like to see is a language that utilizes parrot objects (AFAIK there isn't one right now). once we have this, we can start writing handlers in a high level language and REALLY have something to show off. ;-) -jeff On Wed, 20 Oct 2004, Dan Sugalski wrote: At 2:27 PM -0400 10/20/04, Jeff Horwitz wrote: mod_parrot now supports authentication handlers. i'm planning a release in the next few days, including a whitepaper on its architecture, but here's an example of what you can now do. Wow. That's... cool. And a bit scary. But definitely cool. I could have an excessive amount of fun with this. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: parrot authentication handlers
At 3:38 PM -0400 10/20/04, Jeff Horwitz wrote: You will, I trust, keep us up to date on all the places where we're making life difficult? :) for sure. but it's actually been quite a smooth ride so far. i have a short list of problems i've had to deal with, and i'll forward them to the list when i get the chance. Great. what i'd really like to see is a language that utilizes parrot objects (AFAIK there isn't one right now). once we have this, we can start writing handlers in a high level language and REALLY have something to show off. ;-) Well, actually scary though it may be, my Work Project uses parrot objects for everything. Whether this is a useful thing or not's an open question (the language lacks subroutines, for example) but... On Wed, 20 Oct 2004, Dan Sugalski wrote: At 2:27 PM -0400 10/20/04, Jeff Horwitz wrote: mod_parrot now supports authentication handlers. i'm planning a release in the next few days, including a whitepaper on its architecture, but here's an example of what you can now do. Wow. That's... cool. And a bit scary. But definitely cool. I could have an excessive amount of fun with this. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Externals in Repository (again) (was Re: Perl 6 Summary for 2004-10-01 through 2004-10-17)
I understand Dan's view that parrot should be 100% self contained, but I really think its silly to inline CPAN modules into our CVS repository. I have a compromise solution, which might satisfy Dan. 1. I create a new parrot-external-dependencies CVS repository. All external dependencies that Dan will not allow to be external go here. 2. I setup some CVS magic, so 'co parrot' does the right thing and puts all thee things from the parrot-external-dependencies directory into a 'externals' directory or something. 2a. There will also be a parrot-base CVS module which will be the contents of the existing one. Actually, this can be expanded to have several different checkoutable things in CVS: maybe.. parrot-base parrot-external-perl parrot-external-icu parrot-languages and one parrot (or Parrot, if we want to be like the mozilla folks) that gets them all. There is another compromise, that says: CVS shouldn't have CPAN modules. But the release process for packages will include the tar inside the tar. Thoughts? -R Joshua Gatcomb accidentally introduced a dependency on Config::IniFiles. Since it is implemented in pure perl he offered to add it to the repository. Warnock applies. In the note offering to fix it, I also listed numerous other scripts with non-core dependencies. Dan, in IRC, indicated that they all should have tickets on them. Before fixing parrotbench.pl with one of the following solutions: 1. inline Config::IniFiles with the author's permission 2. Use some other core module if possible 3. Roll my own 4. Revert back to previous non-module version
Re: parrot authentication handlers
On Wed, 20 Oct 2004, Dan Sugalski wrote: Well, actually scary though it may be, my Work Project uses parrot objects for everything. Whether this is a useful thing or not's an open question (the language lacks subroutines, for example) but... in general, anything that can generate bytecode/PIR/PASM containing a _handler subroutine in a parrot namespace would fit the bill. -jeff
Re: parrot authentication handlers
At 3:55 PM -0400 10/20/04, Jeff Horwitz wrote: On Wed, 20 Oct 2004, Dan Sugalski wrote: Well, actually scary though it may be, my Work Project uses parrot objects for everything. Whether this is a useful thing or not's an open question (the language lacks subroutines, for example) but... in general, anything that can generate bytecode/PIR/PASM containing a _handler subroutine in a parrot namespace would fit the bill. Hrm. I may have to introduce subs into the grammar for the language just so I can make this work. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Pathological Register Allocation Test Generator
Leo, Thanks for your suggestions and comments. On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch [EMAIL PROTECTED] wrote: Some remargs WRT gen{3,4}.pl: 1) While these programs exhibit some worst case register layout it's probably not a very typical layout. Agreed. The idea was to automate and compare to gcc. There are real-world tests already in the parrot test suite, but because I can generate millions of such cases, one hope is to detect errors, and to get some kind of performance metric, even if these programs are artificial. To get real metrics that people would pay attention to, we might implement SPEC99, etc. 2) RL programs have lexicals and globals access spread over the code like in the generated gen.imc 3) and that's intersparsed with highly local access to temporaries coming from expression evaluations. I think I could modify the tests so that the variables have a Poisson distribution, as far as their usage distance goes (max number of lines away from first use). This would cause some of them to look more like temporary variables, and be a somewhat better simulation of real programs. I'd change the simulation program to use PMCs to allow for 2). Now when it comes to spilling, these lexicals or globals don't need a new storage, their live range can just get discarded and at the next usage of this lexical or global it just can be refetched[1]. Implementing this should already vastly improve the register allocation. [1] The refetching can of course be into a different Parrot register. Thus the usage of globals or lexicals wouldn't interfer and generate huge live ranges over the whole function as it currently does. I don't quite understand this. You think I should create PMC variables, instead of integers? It's probably a good idea to have a mix of the four types I suppose. Once thing I can say is that the interference graph is pretty conservatively generated. When a variable is reassigned, it can be treated as a new variable which can reduce register pressure, but I don't think the code is doing that yet. It sounds like you're talking about implementing the interference graph better. I may get into that too, but for now, I want to get the rester coloring (mapping vars to registers) working better. Also, I'd like to be able to capture the effects of changes in metrics (via scripts like those I posted) so we can see if different ideas will actually improve the situation or not. I don't even have any runtime checks yet, but creating a score to see how much was spilled, could be helpful. That could be incorporated into the register allocator itself (which already is, if you run with -d 0008, then grep Spill). I don't know if we need some syntax to easily support lexicals or globals or if the register allocation can deduce that itself. But if a new syntax simplifies that, we put it in. My preference is to not distinguish between various types of variables, but to have the algorithms deal best with nodes based on the structure of the graph(s). For 3) there is already a separate pass in imcc/reg_alloc.c:allocate_non_interfering(). Yes. This function seems to find variables whose live ranges (life ranges) are restricted to a single basic block, and assign them reg 28,29, or 30. My preference is that low hanging fruit like temporary vars should be assigned by the algorithm. Such variables will tend not to have much interference with others, and are thus probably easier to color. My experience with coloring, is that assigning low degree (aka arity, #neighbors) nodes to fixed colors first, will actually reduce the performance of the graph coloring algorithm. It's usually best to color the easy stuff last. The algorithm I'd like to use is one that happens to work very well for interference graphs, and it is very simple. This algorithm selects a node that interferes with the fewest other variables (lowest degree), and removes it from the interference graph, and puts it into a stack. It then recalculates the lower degrees of each neighbor, and iteratively removes all nodes from the graph in this way, pushing them onto the stack. It then, pops off the nodes from the stack and colors them in lifo order, using the first available color. Nodes that need a color higher that 30, or whatever the max. turns out to be, are spilled. The spilled nodes are always in the densest part of the graph, where something has to spill. I believe this is essentially the algorithm Briggs/Chaitin used, and some variation of it has been used in most compilers since the 70's. One exception is that certain variables want certain colors (want_regno) -- those should be colored first, as the current code does. A disadvantage of the above algorithm is that it tends to randomly select which nodes get spilled, among the densest parts. Another optimization to the algorithm is to incorporate a score that causes the most frequently accessed nodes not to get spilled. (The