Re: light-weight calling conventions
Patrick R. Michaud <[EMAIL PROTECTED]> wrote:

> BTW, it may be very possible for me to write the p6ge generator so
> that it can be switched between the PIR and bsr/ret calling conventions,
> so we don't need to resolve this entirely now. And we could then benchmark
> the two against each other.

That would be really great. There are a lot of things to consider, which might or might not have an influence:

- tailcalls are faster than bsr/ret
- error traceback: not really easy with bsr/ret
- GC issues: the stack pushes consume GC-able objects
- calling back into PIR (might work seamlessly or not - it's untested)

> Pm

Thanks,
leo
Re: silent effects of opcodes
Dan Sugalski <[EMAIL PROTECTED]> wrote:

> Exceptions and continuations should be the same problem -- the target
> is the start of a basic block. (Well, more than that, as they're
> places where calling conventions potentially kick in) This means the
> instruction immediately after a sub call starts a new block, as does
> the start of an exception handler.

Dan, I've already said that there is of course a new basic block. The problem arises from the silent generation of loops in the CFG. Within a loop the same register can't be reallocated to a different variable. There are two possible solutions (AFAIK):

1) Statically mark the branch target of the loop. Proposed syntax constructs:

1a) For exceptions:

    set_eh handler, catch_label

This is just a small adaptation of the sequence for installing an exception handler. It depends a bit on whether exception handlers are inline or nested closures or both.

1b) Generally:

    RESUMABLE: func_that_might_loop_through_cc()

possibly accompanied by another markup of the function call that loops back.

2) Fetch everything from lexicals/globals after a function call.

leo
Re: silent effects of opcodes
Bill Coffman <[EMAIL PROTECTED]> wrote:

> Since I understand the item about allocating registers between sub
> calls, I can probably implement that change, as I work through the
> control flow/data flow analysis.

This is already implemented; parts of it are in CVS.

> Sounds like everything else is okay. We're just missing a few CFG
> arcs from the continuations stuff, which I'll let you all worry about.
> :)

Yep

> Bill
> ps: I'm making progress on grokking the cfg and register renaming
> stuff. Will let you know.

This needs an SSA graph of the data flow? BTW, looking at analyse_life_block() it seems that this unconditionally allocates a Life_range structure. As these are O(n_symbols * n_basic_blocks), we could save huge amounts of memory by defining that a missing life block propagates the previous one. Dunno if it's possible, though.

leo
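The saving Leo suggests can be sketched in C. This is an illustration of the idea only, not imcc's actual data structures; `LifeRange`, `SymbolLife` and `life_in_block` are hypothetical names. Instead of one Life_range per (symbol, basic block) pair, only blocks where liveness changes store a range, and a missing entry falls back to the previous block:

```c
#include <assert.h>
#include <stddef.h>

/* One symbol's liveness per basic block; a NULL entry means
 * "same as the previous block", so sparse storage suffices. */
typedef struct { int first_ins; int last_ins; } LifeRange;

typedef struct {
    const LifeRange **per_block;   /* NULL = propagate previous block */
    int n_blocks;
} SymbolLife;

static const LifeRange *life_in_block(const SymbolLife *s, int block)
{
    /* walk back to the nearest explicitly stored range */
    while (block >= 0 && s->per_block[block] == NULL)
        block--;
    return block >= 0 ? s->per_block[block] : NULL;
}
```

A symbol live in the same way across ten blocks then stores one range instead of ten, at the cost of a backward scan on lookup.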
Re: silent effects of opcodes
> > * [NEW] If register 15 or below is used, it should be cleared out,
> > ZEROED, after its last use and before the next sub call. This is for
> > security reasons. Obviously, these registers will not be the first
> > choice to use.
>
> Nope -- this isn't the job of the register allocator. We aren't
> leaving security issues up to bytecode except in a very few, limited
> cases. (All involving subroutines with elevated security credentials
> which the sub needs to drop after using things they allow)

Okay, looks like I misread an earlier message of Dan's. The reason that we cannot use R0-R15 through a sub is that they are shredded: the values are not preserved through the sub call.

Since I understand the item about allocating registers between sub calls, I can probably implement that change, as I work through the control flow/data flow analysis.

Sounds like everything else is okay. We're just missing a few CFG arcs from the continuations stuff, which I'll let you all worry about. :)

Bill

ps: I'm making progress on grokking the cfg and register renaming stuff. Will let you know.
Re: silent effects of opcodes
At 2:02 PM -0800 11/17/04, Bill Coffman wrote:

> So to generalize. The following registers are available, under the
> following conditions:
>
> * [NEW] If register 15 or below is used, it should be cleared out,
> ZEROED, after its last use and before the next sub call. This is for
> security reasons. Obviously, these registers will not be the first
> choice to use.

Nope -- this isn't the job of the register allocator. We aren't leaving security issues up to bytecode except in a very few, limited cases. (All involving subroutines with elevated security credentials which the sub needs to drop after using things they allow)

> Other observations:
>
> * From new allocator bugs, and analysis, we've discovered that
> exceptions cause new control flow edges, not previously considered.
> This case is being reworked by Leo? to provide missing CFG edges,
> through a minor change in the try block declaration. (thread
> "Continuations, basic blocks, loops and register allocation")
>
> * The case of continuations has not been solved with respect to
> register allocation. Leo's RESUMABLE: label might provide help here.
> In any case, we can expect to see some additional edges being
> inserted though. (also thread "Continuations, basic blocks, loops and
> register allocation")

Exceptions and continuations should be the same problem -- the target is the start of a basic block. (Well, more than that, as they're places where calling conventions potentially kick in) This means the instruction immediately after a sub call starts a new block, as does the start of an exception handler.

(And I've got some docs on exceptions that should be out later tonight)

-- Dan

--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: silent effects of opcodes
So to generalize. The following registers are available, under the following conditions:

* Registers R16-R31 are always available for the allocator.

* Registers R0-R15 are available between sub calls. That is, for any symbol whose life range does not cross a subroutine. (This implies that all registers are available if no subs are called.) Since we have no way to determine if a sub is using those or not, any sub call will be assumed to possibly use R0-R15. Furthermore, even though we know there are certain registers in that range which are unused by the calling convention, we will still not use them through a sub call, for security reasons.

* [NEW] If register 15 or below is used, it should be cleared out, ZEROED, after its last use and before the next sub call. This is for security reasons. Obviously, these registers will not be the first choice to use.

* Availability of these registers is subject to the rules for using the Parrot opcode setp_ind, which were (are being?) worked through by Leo.

Other observations:

* Leo introduced a flag on the symbol, to indicate if it's volatile or not. These will be eligible for R0-R15 (volatile registers?).

* From new allocator bugs, and analysis, we've discovered that exceptions cause new control flow edges, not previously considered. This case is being reworked by Leo? to provide missing CFG edges, through a minor change in the try block declaration. (thread "Continuations, basic blocks, loops and register allocation")

* The case of continuations has not been solved with respect to register allocation. Leo's RESUMABLE: label might provide help here. In any case, we can expect to see some additional edges being inserted though. (also thread "Continuations, basic blocks, loops and register allocation")

Did I miss anything?
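The availability rules above can be captured in a small predicate. This is my own illustrative encoding of Bill's summary, not imcc source; `usable_across_call` is a hypothetical name, and register numbers and type letters follow the pdd03 discussion in this thread:

```c
#include <assert.h>
#include <stdbool.h>

/* May the allocator keep a value in this register across a sub call?
 * 'type' is one of 'I', 'S', 'P', 'N'; 'reg' is 0-31. */
static bool usable_across_call(char type, int reg)
{
    if (reg >= 16 && reg <= 31)             /* R16-R31: always available */
        return true;
    switch (type) {                         /* low regs the convention   */
    case 'P': return reg == 4;              /* spares: P4                */
    case 'S': return reg >= 1 && reg <= 4;  /* S1-S4                     */
    case 'N': return reg >= 0 && reg <= 4;  /* N0-N4                     */
    default:  return false;                 /* everything else: shredded */
    }
}
```

Per the thread, a leaf sub (one that makes no calls) would bypass this check entirely, since all 32 registers of each type are then fair game.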
Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)
On Wed, Nov 17, 2004 at 02:47:09PM -0700, Patrick R. Michaud wrote: > BTW, it may be very possible for me to write the p6ge generator so > that it can be switched between the PIR and bsr/ret calling conventions, > so we don't need to resolve this entirely now. And we could then benchmark > the two against each other. Keeping the code that flexible would be very interesting. If you can achieve this without much extra pain, I think that it would be worth it. Nicholas Clark
Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)
On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:
> Dan Sugalski wrote:
>
> As already stated, I don't consider these as either light-weight nor
> faster. Here is a benchmark.
>
> Below are 2 versions of a recursive factorial program. fact(100) is
> calculated 1000 times:
>
>     PIR          1.1 s
>     bsr/ret      2.4 s
>     PIR/tailcall 0.2 s
>
> Unoptimized Parrot, default i.e. slow run core.

BTW, it may be very possible for me to write the p6ge generator so that it can be switched between the PIR and bsr/ret calling conventions, so we don't need to resolve this entirely now. And we could then benchmark the two against each other.

Pm
Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)
On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:
> As already stated, I don't consider these as either light-weight nor
> faster. Here is a benchmark.
>
> Below are 2 versions of a recursive factorial program. fact(100) is
> calculated 1000 times:
>
>     PIR          1.1 s
>     bsr/ret      2.4 s
>     PIR/tailcall 0.2 s
>
> Unoptimized Parrot, default i.e. slow run core.

Sure, but the bsr/ret in your version is making lots of saveall calls that I'd be avoiding. Also, this code is saving pmcs (big ones at that) whereas I'll generally be pushing a few ints and maybe a string onto the stack.

So, rewriting the above for ints instead of PerlInts, changing the multiply op to add to stay within the range of ints, and removing the unneeded saves/restores for things that are being passed as parameters anyway (and doubling the count save/restore to make it somewhat closer to what I'd expect...):

    [EMAIL PROTECTED] pmichaud]$ parrot pmfact.imc     # PIR
    500500
    5.819842
    [EMAIL PROTECTED] pmichaud]$ parrot pmfactbsr.imc  # bsr/ret
    500500
    2.010935

Please keep in mind that I'm a newcomer to Parrot, so it's entirely possible that I've made some invalid assumptions in my code that skew these results (and I'll freely admit them if pointed out). And I will admit that the PIR code is still impressive speed-wise relative to what it is doing, but it's hard to ignore a 60% improvement.
Pm

.sub optc @IMMEDIATE
    # TODO turn on -Oc
    # print "optc\n"
.end

.sub _main @MAIN
    .param pmc argv
    .local int count, product
    .local float start, end
    count = 1000
    .local int argc
    argc = elements argv
    if argc < 2 goto def
    $S0 = argv[1]
    count = $S0
def:
    .local int i
    i = 0
    start = time
    .local int n
loop:
    n = count
    product = 1
    product = _fact(product, n)
    inc i
    if i < 1000 goto loop
    end = time
    end -= start
    print product
    print "\n"
    print end
    print "\n"
.end

.sub _fact
    .param int product
    .param int count
    if count > 1 goto recurs
    .return (product)
recurs:
    product += count
    dec count
    product = _fact(product, count)
    .return (product)
.end

.sub _main @MAIN
    .param pmc argv
    .local int count, product
    .local float start, end
    count = 1000
    .local int argc
    argc = elements argv
    if argc < 2 goto def
    $S0 = argv[1]
    count = $S0
def:
    .local int i
    i = 0
    start = time
    .local int n
loop:
    n = count
    product = 1
    save count
    bsr fact
    restore count
    inc i
    if i < 1000 goto loop
    end = time
    end -= start
    print product
    print "\n"
    print end
    print "\n"
    goto ex
fact:
    if count > 1 goto recurse
    ret
recurse:
    product += count
    dec count
    save count
    save count
    bsr fact
    restore count
    restore count
    ret
ex:
.end
Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)
At 5:08 PM -0500 11/17/04, Dan Sugalski wrote:
> Chopping out the multiplication (since that's a not-insignificant
> amount of the runtime for the bsr/ret version) gives:
>
> PIR:
>     real 0m3.016s
>     user 0m2.990s
>     sys  0m0.030s
>
> bsr/ret:
>     real 0m0.344s
>     user 0m0.340s
>     sys  0m0.010s

and with -Oc, for completeness:

    real 0m0.416s
    user 0m0.380s
    sys  0m0.030s

-- Dan

--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: light-weight calling conventions
At 11:07 PM +0100 11/17/04, Leopold Toetsch wrote: Please no premature optimizations. It's important to note that premature optimization == things Leo disapproves of The bsr/ret version of things is fine. In the absolute best case it'll be the same speed as tail calls, and in normal cases it'll be significantly faster since it, by definition, has a lot less work to do. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)
At 10:03 PM +0100 11/17/04, Leopold Toetsch wrote:
> Dan Sugalski wrote:
> [ this came up WRT calling conventions ]
>
> > I assume he's doing bsr/ret to get into and out of the sub, which is
> > going to be significantly faster.
>
> Who says that? As already stated, I don't consider these as either
> light-weight nor faster. Here is a benchmark.
>
> Below are 2 versions of a recursive factorial program. fact(100) is
> calculated 1000 times:
>
>     PIR          1.1 s
>     bsr/ret      2.4 s
>     PIR/tailcall 0.2 s
>
> Unoptimized Parrot, default i.e. slow run core.

Way to go with the overkill. I'm impressed. However, written more sanely the results are:

PIR:
    real 0m4.149s
    user 0m4.120s
    sys  0m0.030s

bsr/ret:
    real 0m1.266s
    user 0m1.260s
    sys  0m0.000s

Chopping out the multiplication (since that's a not-insignificant amount of the runtime for the bsr/ret version) gives:

PIR:
    real 0m3.016s
    user 0m2.990s
    sys  0m0.030s

bsr/ret:
    real 0m0.344s
    user 0m0.340s
    sys  0m0.010s

The bsr/ret version is:

start:
    new P16, .PerlInt
    set P16, 1000
    elements I16, P5
    lt I16, 2, def
    set S0, P5[1]
    set P16, S0
def:
    set I16, 0
    time N16
    save N16
loop:
    clone P1, P16
    new P0, .PerlInt
    set P0, 1
    save P16
    save I16
    bsr fact
    restore I16
    restore P16
    inc I16
    lt I16, 1000, loop
    restore N16
    time N17
    sub N17, N17, N16
    print P0
    print "\n"
    print N17
    print "\n"
    end

# in:  P0 is product, P1 is count
# out: P0 is new product
fact:
    gt P1, 1, doit
    ret
doit:
    mul P0, P0, P1
    dec P1
    bsr fact
    ret

-- Dan

--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: light-weight calling conventions
Patrick R. Michaud wrote:
> On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:
>
> [EMAIL PROTECTED] pmichaud]$ parrot pmfact.imc     # PIR
> 500500
> 5.819842
> [EMAIL PROTECTED] pmichaud]$ parrot pmfactbsr.imc  # bsr/ret
> 500500
> 2.010935

Ok:

    $ parrot pmfactbsr.imc
    500500
    3.459947
    $ parrot -Oc pmfact.imc
    500500
    1.237185

Now what ;) Are you sure that you can't do a tailcall sometimes? What about calling back into PIR code? Please no premature optimizations.

> Please keep in mind that I'm a newcomer to Parrot, so it's entirely
> possible that I've made some invalid assumptions in my code that skew
> these results (and I'll freely admit them if pointed out).

I've first to understand the generated rules engine a bit. But generally speaking: let's first do it right and then fast.

> And I will admit that the PIR code is still impressive speed-wise
> relative to what it is doing, but it's hard to ignore a 60% improvement.

Or more ...

> Pm

leo
Re: silent effects of opcodes
At 10:12 PM +0100 11/17/04, Leopold Toetsch wrote:
> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > At 7:34 PM +0100 11/17/04, Leopold Toetsch wrote:
> > > All registers are preserved, but some of these registers are used,
> > > either by implicit opcodes or as return values.
> >
> > Erm, no. Unused registers in the 0-15 range are explicitly garbage:
>
> It was about usability of registers for the allocator. So before I
> make a function call, these are allocatable as temps. Return values
> are garbage, if not set.

As long as the allocator is set to assume that after a function call all the registers in the range 0-15 that don't have return values are garbage. So if there are no string return values, string registers 0-15 are toast.

> > Note that registers 16-31 of each of the four types are, for
> > security reasons, not passed into the invoked subroutine, method,
> > or continuation. They are guaranteed to be garbage.
>
> Not quite. S and P regs have to be NULLed. Or you gonna tell the DOD
> system how to mark garbage ;)

No, the invoke op.

-- Dan

--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: silent effects of opcodes
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 7:34 PM +0100 11/17/04, Leopold Toetsch wrote:
> > All registers are preserved, but some of these registers are used,
> > either by implicit opcodes or as return values.
>
> Erm, no. Unused registers in the 0-15 range are explicitly garbage:

It was about usability of registers for the allocator. So before I make a function call, these are allocatable as temps. Return values are garbage, if not set.

> Note that registers 16-31 of each of the four types are, for
> security reasons, not passed into the invoked subroutine, method,
> or continuation. They are guaranteed to be garbage.

Not quite. S and P regs have to be NULLed. Or you gonna tell the DOD system how to mark garbage ;)

> > > * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
> >
> > I've included P3 (see below). If it's used it interferes.
>
> Nope. It'll either be set if a call returns overflow parameters, or
> unused and thus garbage.

Ah, yep. Thanks. It is returned too, forgot that.

leo
light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)
Dan Sugalski wrote:
[ this came up WRT calling conventions ]

> I assume he's doing bsr/ret to get into and out of the sub, which is
> going to be significantly faster.

Who says that? As already stated, I don't consider these as either light-weight nor faster. Here is a benchmark.

Below are 2 versions of a recursive factorial program. fact(100) is calculated 1000 times:

    PIR          1.1 s
    bsr/ret      2.4 s
    PIR/tailcall 0.2 s

Unoptimized Parrot, default i.e. slow run core.

leo

.sub optc @IMMEDIATE
    # TODO turn on -Oc
    # print "optc\n"
.end

.sub _main @MAIN
    .param pmc argv
    .local pmc count, product
    .local float start, end
    count = new PerlInt
    count = 1000
    .local int argc
    argc = elements argv
    if argc < 2 goto def
    $S0 = argv[1]
    count = $S0
def:
    .local int i
    i = 0
    start = time
    .local pmc n
loop:
    n = clone count
    product = new PerlInt
    product = 1
    product = _fact(product, n)
    inc i
    if i < 1000 goto loop
    end = time
    end -= start
    print product
    print "\n"
    print end
    print "\n"
.end

.sub _fact
    .param pmc product
    .param pmc count
    if count > 1 goto recurs
    .return (product)
recurs:
    product *= count
    dec count
    product = _fact(product, count)
    .return (product)
.end

.sub optc @IMMEDIATE
    # TODO turn on -Oc
    # print "optc\n"
.end

.sub _main @MAIN
    .param pmc argv
    .local pmc count, product
    .local float start, end
    count = new PerlInt
    count = 1000
    .local int argc
    argc = elements argv
    if argc < 2 goto def
    $S0 = argv[1]
    count = $S0
def:
    .local int i
    i = 0
    start = time
    .local pmc n
loop:
    n = clone count
    product = new PerlInt
    product = 1
    save n
    save product
    bsr fact
    restore product
    inc i
    if i < 1000 goto loop
    end = time
    end -= start
    print product
    print "\n"
    print end
    print "\n"
    goto ex
fact:
    saveall
    .local pmc product, count
    restore product
    restore count
    if count > 1 goto recurs
    restoreall
    save product
    ret
recurs:
    product *= count
    dec count
    save count
    save product
    bsr fact
    restore product
    restoreall
    save product
    ret
ex:
.end
Re: silent effects of opcodes
At 2:14 PM +0100 11/17/04, Leopold Toetsch wrote:
> Works fine *except* for the .flatten_arg directive. This directive
> takes an argument array and expands the array contents to function
> arguments in consecutive parrot registers. E.g.
>
>     .arg a             => P5
>     .flatten_arg array => P6, P7, ...
>
> The code emitted to achieve that runs in a loop and is using the
> Parrot opcode setp_ind, which sets the xth Parrot register from Py.

Yep. The indirect access ops will cause problems for the PIR register allocation, since there's no way to know at compile time what's happening. Their use probably ought to invalidate all the registers, or the op restricted to pasm code.

-- Dan

--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: silent effects of opcodes
At 7:34 PM +0100 11/17/04, Leopold Toetsch wrote:
> Bill Coffman <[EMAIL PROTECTED]> wrote:
> > On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch
> > <[EMAIL PROTECTED]> wrote:
> > > I've now (locally here) extended Bill Coffman's register allocator by
> > > one subroutine that actually decides to use non-volatiles or volatiles
> > > according to pdd03. All variables that are live around a subroutine call
> > > are e.g. allocated from R16..R31.
> >
> > Regarding pdd03, I am still not clear how it should be interpreted.
>
> All registers are preserved, but some of these registers are used,
> either by implicit opcodes or as return values.

Erm, no. Unused registers in the 0-15 range are explicitly garbage:

    Note that registers 16-31 of each of the four types are, for
    security reasons, not passed into the invoked subroutine, method,
    or continuation. They are guaranteed to be garbage.

> > * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
>
> I've included P3 (see below). If it's used it interferes.

Nope. It'll either be set if a call returns overflow parameters, or unused and thus garbage.

-- Dan

--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: silent effects of opcodes
Bill Coffman <[EMAIL PROTECTED]> wrote:
> On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> > I've now (locally here) extended Bill Coffman's register allocator by
> > one subroutine that actually decides to use non-volatiles or volatiles
> > according to pdd03. All variables that are live around a subroutine call
> > are e.g. allocated from R16..R31.
>
> Interesting. I'd like to see it.

See below, you know where it's called from ;)

> Regarding pdd03, I am still not clear how it should be interpreted.

All registers are preserved, but some of these registers are used, either by implicit opcodes or as return values.

> * If the subroutine being allocated is a leaf (with no sub calls),
> then all registers should be available.

Yep.

> * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.

I've included P3 (see below). If it's used it interferes.

> * It seems like it would be simple enough to provide a "compiler
> hint", to let the allocator know if the subs it calls are using the
> parrot convention or not, or how many of the R5-R15 it will need.

The register allocator is only supporting pdd03, nothing else. The amount of needed R5 - R15 is unknown, as these are return results.

> ... This can then be used as part of a static analysis,
> and can be incorporated into the unit data structure, or passed as a
> separate parameter to imc_reg_alloc().

Yep and it's working.
> ~Bill

leo

/*
 * find available color for register #x in available colors
 */
static int
ig_find_color(Interp* interpreter, IMC_Unit *unit, int x, char *avail)
{
    int c, t;
    SymReg *r;
    static const char types[] = "ISPN";
    static const char assignable[4][5] = {
        /* 0  1  2  3  4     */
        {  0, 0, 0, 0, 0, },   /* I */
        {  0, 1, 1, 1, 1, },   /* S */
        {  0, 0, 0, 1, 1, },   /* P */
        {  1, 1, 1, 1, 1, },   /* N */
    };
    UNUSED(interpreter);
    r = unit->reglist[x];
    t = strchr(types, r->set) - types;
    /* please note: c is starting at 1 for R0 */
    if (!(r->usage & U_NON_VOLATILE)) {
        /* 0) 5-15 volatile range */
        for (c = 6; c <= 16; c++)
            if (avail[c])
                return c;
    }
    /* 1) try upper non-volatiles, 16...31 */
    for (c = 17; c <= 32; c++)
        if (avail[c])
            return c;
    /* some lower regs are preserved too 0...4 */
    for (c = 1; c <= 5; c++)
        if (avail[c] && assignable[t][c - 1])
            return c;
    /* no chance, force high range with possible spilling */
    for (c = 33; ; c++)
        if (avail[c])
            return c;
    assert(0);
    return 0;
}
Re: silent effects of opcodes
On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> I've now (locally here) extended Bill Coffman's register allocator by
> one subroutine that actually decides to use non-volatiles or volatiles
> according to pdd03. All variables that are live around a subroutine call
> are e.g. allocated from R16..R31.

Interesting. I'd like to see it.

Regarding pdd03, I am still not clear how it should be interpreted. The current pdd03, as well as the previous one, both seem to indicate that registers 0-15 are likely to be overwritten, and anyone making a call should save those registers if they still want them. The issue with PIR code is that the author won't know which of their symbols are mapping to registers about to be killed. So, as previously discussed, those registers will have to be hands off for the register allocator. That is essentially how the old and new allocator have been working. But this doesn't have to always be the case.

* If the subroutine being allocated is a leaf (with no sub calls), then all registers should be available.

* Registers P4, S1-S4, N0-N4 are free for allocation, regardless.

* It seems like it would be simple enough to provide a "compiler hint", to let the allocator know if the subs it calls are using the parrot convention or not, or how many of the R5-R15 it will need. From this hint, a bit mask saying which registers are available could be constructed. This can then be used as part of a static analysis, and can be incorporated into the unit data structure, or passed as a separate parameter to imc_reg_alloc().

I wouldn't think this last idea would be considered a change to the calling convention, but rather as an optional optimization prototype. Not part of pasm. Dan, would something like this be allowed?

~Bill
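The bit-mask idea sketched above might look like this in C. This is a hypothetical helper, not imcc code; `avail_mask_across_call` and its hint encoding are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Given a hint that the callee clobbers the first 'callee_uses' of
 * R5-R15 (0..11), build a mask of registers that stay allocatable
 * across the call.  Bit n set means Rn is available. */
static uint32_t avail_mask_across_call(int callee_uses)
{
    uint32_t mask = 0xFFFF0000u;       /* R16-R31 always usable    */
    int first_free = 5 + callee_uses;  /* R5..R(4+uses) are killed */
    for (int r = first_free; r <= 15; r++)
        mask |= 1u << r;
    return mask;
}
```

With no hint (assume the worst, all of R5-R15 used), the mask degenerates to R16-R31 only, which matches what the allocator does today.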
COND macros (was: Threads, events, Win32, etc.)
Gabe Schaffer <[EMAIL PROTECTED]> wrote:
> > > > Not quite. COND_WAIT takes an opaque type defined by the platform,
> > > > that happens to be a mutex for the pthreads based implementation.
> > >
> > > It should, but it doesn't. Here's the definition:
> > > # define COND_WAIT(c,m) pthread_cond_wait(&c, &m)
> >
> > You are already in the POSIX specific part.
>
> It came from thr_pthread.h, so it should be POSIX. The issue here is
> that it's #define COND_WAIT(c,m) instead of #define COND_WAIT(c).

Well, in the mentioned (TODO) platform/win32/threads.h you have to define your own COND_WAIT(c, m) - this is the interface of that macro, as POSIX needs the mutex, but you would ignore the 2nd parameter. Please have a look at the empty defines in include/parrot/threads.h.

The problem is a different one: the COND_INIT macro just passes a condition location; the mutex is created in a second step, which isn't needed for windows. OTOH a mutex aka critical section is needed separately. So we should probably define these macros to be:

    COND_INIT(c, m)
    COND_DESTROY(c, m)

see src/tsq.c for usage. Does win32 require more info to create conditions/mutexes or would these macros suffice?

[ I'll try to answer more in a separate thread ]

leo
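On the POSIX side, the two-argument interface Leo proposes could be sketched as below. This is an illustration of the proposed interface, not Parrot's actual thr_pthread.h; a win32 port would define the same macro names over a CRITICAL_SECTION and its native condition primitives, ignoring whichever half it doesn't need:

```c
#include <assert.h>
#include <pthread.h>

/* Proposed paired macros: condition and its mutex created and
 * destroyed together, so no platform needs a second init step. */
#define COND_INIT(c, m)    do { pthread_cond_init(&(c), NULL); \
                                pthread_mutex_init(&(m), NULL); } while (0)
#define COND_SIGNAL(c)     pthread_cond_signal(&(c))
#define COND_DESTROY(c, m) do { pthread_cond_destroy(&(c)); \
                                pthread_mutex_destroy(&(m)); } while (0)

/* smoke test: one init/signal/destroy round trip */
static int cond_pair_roundtrip(void)
{
    pthread_cond_t  c;
    pthread_mutex_t m;
    COND_INIT(c, m);
    COND_SIGNAL(c);     /* signalling with no waiter is a legal no-op */
    COND_DESTROY(c, m);
    return 0;
}
```

The do/while(0) wrapper keeps each macro usable as a single statement, which matters for call sites like those in src/tsq.c.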
Re: Threads, events, Win32, etc.
Gabe Schaffer <[EMAIL PROTECTED]> wrote:
> Yes, there has to be a separate thread to get signals, and each thread
> needs its own event queue, but why does the process have a global
> event_queue? I suppose there are generic events that could be handled
> just by the next thread to call check_events, but that isn't what this
> sounds like.

It's mainly intended for broadcasts and timers. POSIX signals are weird and more or less broken from platform to platform. The only reliable way to get at them is to block the desired signal in all but one thread. This signal gets converted to a global event and from there it can be put into specific threads if they have installed signal handlers for that signal. But as said, the existing code is experimental and is likely to change a lot.

> I don't see why there needs to be a separate thread to listen for IOs
> to finish. Can't that be the same thread that listens for signals?

That's the plan, yes. AIO completion can be delivered as a signal.

> OK, I think I understand why...the event thread is in a loop waiting
> for somebody to tell it that there's an event in the global event
> queue...which is really the part I don't get yet.

Well, the event thread is handling timer events on behalf of an interpreter.

[ long win32 proposal ]

I've to read through that some more times. Do you already have ideas for a common API, or where to split the existing threads.c into platform and common code?

> GNS

leo
Re: silent effects of opcodes
Leopold Toetsch <[EMAIL PROTECTED]> wrote: [ setp_ind troubles ] I've found a way to force allocation to R16..R31 in the presence of this opcode. leo
Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes
Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote: > Leopold Toetsch wrote: >> >> How that? Are there no constants? > Yes, there are no constants. The only thing the generated sub does, is > to return an integer value, that was computed in the C-Code. > Thus the m4 macro "eval( 1 ^ 3 )" compiles into a sub that looks in PIR > like: > .sub generated_sub >.return( 3 ) > .end I see. And what about the equivalent of eval("ab" "cd") or eval(1.3 + 2.5) ? > CU, Bernhard leo
Re: Perl 6 Summary for 2004-11-08 through 2004-11-15
On Mon, 15 Nov 2004, Matt Fowles wrote: > Languages with Object Support? >Jeff Horwitz wondered if there were any languages with object support >that he could bend to the evil ends of mod_parrot. While no one >answered, I think Parakeet might be such a language... parakeet's a newcomer to the languages directory, so i hadn't seen it before. it has objects and functions, so it should fit in nicely with mod_parrot. it's currently broken with all the changes that have been going on, but michel is working on the fixes. good suggestion, matt. :) -jeff
Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes
Leopold Toetsch wrote:
> Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote:
> > The 'eval' compiler returns a bytecode segment without a constant
> > table. The 'destroy' of the Eval PMC needs to handle that.
>
> How that? Are there no constants? Anyway, switching to a new bytecode
> segment does switch the constant table too, so all compiled code ought
> to have a constant table.

Yes, there are no constants. The only thing the generated sub does is to return an integer value that was computed in the C code. Thus the m4 macro "eval( 1 ^ 3 )" compiles into a sub that looks in PIR like:

    .sub generated_sub
        .return( 3 )
    .end

Of course it would be much simpler to use a plain NCI call for this purpose. But I wanted to play with 'compreg' when I implemented that.

CU, Bernhard

--
**
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: [EMAIL PROTECTED]
Website: www.biomax.com
**
Re: [perl #32418] Re: [PATCH] Register allocation patch - scales better to more symbols
At 11:35 AM +0100 11/17/04, Leopold Toetsch wrote: Dan Sugalski wrote: Okay. I'll apply it and take a shot. May take a few hours to get a real number. How does it look like? Any results already? Nope, haven't had time, unfortunately. Work's been busy. Today, if I get lucky. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: accessing self in methods
At 11:00 AM +0100 11/17/04, Leopold Toetsch wrote: We should create some syntax to access the object in methods. Well, there are two issues here. First is in pasm/bytecode. For that, fetching things explicitly with interpinfo is just fine, so the code sequence: interpinfo P16, .INTERPINFO_CURRENT_OBJECT works. At the PIR level, self is just a special-case .local, so I don't see much reason to do anything special there either -- the method tag on the .sub declaration should be enough to tell the pir compiler that it ought to go fetch the object into a register for use later on. If you wanted to use this as a time to tie named .local declarations to lexical pad slots and global names so the spilling code can refetch spilled things from the pad/namespace rather than from a private backing array, that'd be fine too. self would just spill in from the interpreter info rather than a pad or namespace. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Continuations, basic blocks, loops and register allocation
Leo~ Thanks for the clarification. Matt

On Wed, 17 Nov 2004 08:48:58 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> Matt Fowles <[EMAIL PROTECTED]> wrote:
>
> > ... Thus you can consider all of the following questions (even though they will be phrased as statements).
>
> > 1) After a full continuation is taken all of the registers must be considered invalid.
>
> Calling a subroutine allocates a new register frame; that sub's register frame pointer in the context points to these fresh registers.
>
> A continuation restores the context it captured, i.e. the state at the place where it was created. This is true for all continuations. Inside the context there is a *pointer* to a register frame, which is therefore restored too.
>
> The effect of taking a continuation is therefore to restore the registers to the state they had where the continuation was created. Due to the calling conventions, part of the registers are volatile (used during a call or as return results), while the other part is non-volatile.
>
> Up to here there is no difference between a return and a full continuation.
>
> The effect of a full continuation can be to create a loop where the known control flow doesn't show a loop. Without further syntax to denote such loops, 1) is true. This register invalidation happens if a preserved register was e.g. only used once after the call and then got reassigned, which is allowed for linear control flow but not inside a loop.
>
> This has per se nothing to do with a continuation. If you have an opcode that *silently* does a "goto again_label", the CFG doesn't cope with the loop, because it isn't there, and things start breaking. The effect of a full continuation *is* to create such loops.
>
> > 2) After a return continuation is taken, the registers can be trusted.
>
> Yes, according to usage in pdd03.
>
> > 3) If someone takes a full continuation, all return continuations down the callstack must be promoted.
>
> If one *creates* a full continuation ...
>
> > 4) After a function call, some magic needs to happen so that the code knows whether it came back to itself via a return continuation and can trust its registers, or it came back via a full continuation and cannot trust them.
>
> No. It's too late for magic. Either the CFG is known at compile time or refetching in the presence of full continuations is mandatory. For both the code must reflect the facts.
>
> > Corrections welcome,
> > Matt
>
> leo

--
"Computer Science is merely the post-Turing Decline of Formal Systems Theory." -???
Re: Threads, events, Win32, etc.
> > > Not quite. COND_WAIT takes an opaque type defined by the platform, that happens to be a mutex for the pthreads based implementation.
> > It should, but it doesn't. Here's the definition:
> >
> >     # define COND_WAIT(c,m) pthread_cond_wait(&c, &m)
> >
> You are already in the POSIX specific part.

It came from thr_pthread.h, so it should be POSIX. The issue here is that it's

    #define COND_WAIT(c,m)

instead of

    #define COND_WAIT(c)

Every place in the code, whether it's Win32 or POSIX, is going to have to pass in a condition variable and a mutex. Just because Win32 will ignore the second parameter, that isn't going to prevent the code from creating the mutex, initializing it, and passing it in.

> > > I'm not sure, if we even should support Win9{8,5}.
> > I'd be happy with simply implementing Win9x as a non-threaded platform. Of course, hopefully nobody will even ask...
> We'll see. But as Parrot's IO system is gonna be asynchronous in core, I doubt that we'll support it.

Obviously Parrot has to run on non-threaded platforms where the kernel threading and AIO stuff just won't work. You can still do user threads, but file IO will still block everything.

> > rationale. I can understand why there would need to be a global event thread (timers, GC, DoD), but why would passing a message from one thread to another need to be serialized through a global event queue?
> The main reason for the global event queue isn't message passing. The reason is POSIX signals. Basically you aren't allowed to do anything serious in a signal handler, especially you aren't allowed to broadcast a condition or something. So I came up with that experimental code of one thread doing signals.

Yes, there has to be a separate thread to get signals, and each thread needs its own event queue, but why does the process have a global event_queue?
I suppose there are generic events that could be handled just by the next thread to call check_events, but that isn't what this sounds like.

> > And as for IO, I see the obvious advantages of performing synchronous IO functions in a separate thread to make them asynchronous, but that sounds like the job of a worker thread pool. There are many ways to implement this, but serializing them all through one queue sounds like a bottleneck to me.
> Yes. The AIO library is doing that anyway i.e. utilizing a thread pool for IO operations.

I don't see why there needs to be a separate thread to listen for IOs to finish. Can't that be the same thread that listens for signals? That is, the IO thread just spends its whole life doing select(). If it got a signal, select() should return EINTR, so the thread could then check a flag to see which signal was raised, queue the event in the proper queue(s), and call select() again. OK, I think I understand why... the event thread is in a loop waiting for somebody to tell it that there's an event in the global event queue... which is really the part I don't get yet.

> Dan did post a series of documents to the list some time ago. Sorry, I've no exact subject, but with relevant keywords like "events" you should find it.

Yeah, I remember reading some of his discussions with Damien Neil because I think I went to school with him. Anyway, here's my first draft for a Win32 event model. As for a Win32 event model, I think I should clarify what I'm talking about when I say Win32.

Win32 IS NOT: The MS Services for Unix package provides a POSIX subsystem for Windows called Interix which is completely separate from Win32 (i.e. no GUI is possible, no Win SDK calls are available). It has fork(), symlinks, pthreads, SysV IPC, POSIX signals, pttys, and maybe even AIO. This config would be compiled like any other Unix variant with its own idiosyncrasies.
Win32 IS PROBABLY NOT: There are various POSIX emulation layers for Win32, such as cygwin and MinGW. These provide many function calls that Unix programs expect, but only to the degree that the Win32 subsystem allows (e.g. chmod likely will not do anything sensible). Since these programs still run under the Win32 subsystem, Windows GUIs are still possible. I don't know how these will interact with my event model. Win32 IS: This is the standard Win32 API as defined by NT4.0sp6a and higher. If you want to drop support for NT4, then we go to Win2k, but don't gain much. GUI message queues in Win32 are per thread. Each thread has a message queue that is autovivified. Any window that a thread creates has its messages sent to that thread's queue. However, there is no reason that a message actually has to have an associated window. You can send any thread in any process a message, so long as the thread has had its queue autovivified and is not crossing security boundaries. All files or things that look like files can be opened for async access. For example, sockets, files, and pipes can all be async. Any read, write, lock, unlock, or ioctl call can either signal a condition var (
silent effects of opcodes
I've now (locally here) extended Bill Coffman's register allocator by one subroutine that actually decides to use non-volatiles or volatiles according to pdd03. All variables that are live around a subroutine call are e.g. allocated from R16..R31. Variables not alive around a call (temps) are preferentially allocated in the lower range. Seems to work fine and is not really specific to this register allocator, nor to a specific ABI. It's just exploiting the fact that a bunch of registers are preserved around a call.

Works fine *except* for the .flatten_arg directive. This directive takes an argument array and expands the array contents to function arguments in consecutive Parrot registers. E.g.

    .arg a => P5
    .flatten_arg array => P6, P7, ...

The code emitted to achieve that runs in a loop and uses the indirect register-set opcode, which sets the x'th Parrot register from Py. Now this array is typically a temporary and not used around the call, so it gets allocated in the volatile register range, which then collides with the generated code for function argument passing. The register allocator doesn't know that e.g. P6, P7 is affected by this opcode.

See imcc/t/syn/pcc_20 - _25 for examples and ops/set.ops for usage information of this opcode. leo
main is just a sub
Parrot starts execution at the first sub (or the one denoted with @MAIN). This subroutine is called with pdd03 calling conventions like any other sub. So we have:

    P5 ... argv array
    I0 = 0, I3 = 1 ... one PMC argument passed

A tailcall at the end of main is a valid operation to represent this code snippet:

    .main
        ...
        foo()
    .end

as well as a .return() directive (or the omission of one, as missing return sequences are inserted by imcc). The only difference is that a return result of main is *not* promoted to the parent process; this can be achieved with the exit opcode.

Please note: PASM code still needs an "end" or the upcoming "returncc" opcode. leo
accessing self in methods
We should create some syntax to access the object in methods. It used to be:

1) self."bar"()

where "self" automagically expanded to P2. The current official way is this sequence:

2)

    .include "interpinfo.pasm"
    $P0 = interpinfo .INTERPINFO_CURRENT_OBJECT
    $P0."bar"()

This two-liner looks a bit bulky compared to the old syntax. I can imagine several ways to achieve the simplicity of 1) again, but this needs some effort in code generation inside imcc. As an intermediate step, I'm thinking of something like:

3) .GET_SELF($P0)

This macro expands to the above two-liner and is defined internally.

A final and optimal solution would expand "self" either to a (re)fetch into volatiles or non-volatiles or, depending on register allocation pressure and usage, to a single fetch whose register is then reused. Better solutions welcome, leo
Re: [perl #32418] Re: [PATCH] Register allocation patch - scales better to more symbols
Dan Sugalski wrote:
> Okay. I'll apply it and take a shot. May take a few hours to get a real number.

What does it look like? Any results already? Thanks, leo
Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes
Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote: > this patch brings Parrot m4 to terms with recent "eval" changes. The compile > function of the 'eval' compiler now returns an Eval PMC. The m4 macro "eval" > is a simple interpreter of integer arithmetic expressions. Thanks, applied. leo
deprecation warning P0, P1
Due to adaptations to pdd03, direct access to the return continuation is deprecated. Instead these constructs should be used:

1) PIR code

* return from a sub:

    .return()
    .return(foo)
    .return (foo, bar, baz)
    ...

* get the current continuation (for call/cc):

    .include "interpinfo.pasm"
    .local pmc cont
    cont = interpinfo .INTERPINFO_CURRENT_CONT

The returned continuation is already a real continuation, thus it doesn't need cloning any more.

* get the current sub:

    .local pmc sub
    sub = interpinfo .INTERPINFO_CURRENT_SUB

2) PASM code

* return from a sub:

    returncc    [ proposed opcode, TBD ]

* get the current continuation / sub:

    .include "interpinfo.pasm"
    interpinfo Px, .INTERPINFO_CURRENT_CONT    # or _SUB

leo
Re: parakeet broken?
Jeff Horwitz <[EMAIL PROTECTED]> wrote:
> i was starting to play with parakeet, but unfortunately it keeps dying on me. this is from a cvs checkout from today:

It surely needs some adaptation WRT the changes in the compreg/compile/invoke sequence aka "eval". leo
Re: Continuations, basic blocks, loops and register allocation
Matt Fowles <[EMAIL PROTECTED]> wrote:
> ... Thus you can consider all of the following questions (even though they will be phrased as statements).

> 1) After a full continuation is taken all of the registers must be considered invalid.

Calling a subroutine allocates a new register frame; that sub's register frame pointer in the context points to these fresh registers.

A continuation restores the context it captured, i.e. the state at the place where it was created. This is true for all continuations. Inside the context there is a *pointer* to a register frame, which is therefore restored too.

The effect of taking a continuation is therefore to restore the registers to the state they had where the continuation was created. Due to the calling conventions, part of the registers are volatile (used during a call or as return results), while the other part is non-volatile.

Up to here there is no difference between a return and a full continuation.

The effect of a full continuation can be to create a loop where the known control flow doesn't show a loop. Without further syntax to denote such loops, 1) is true. This register invalidation happens if a preserved register was e.g. only used once after the call and then got reassigned, which is allowed for linear control flow but not inside a loop.

This has per se nothing to do with a continuation. If you have an opcode that *silently* does a "goto again_label", the CFG doesn't cope with the loop, because it isn't there, and things start breaking. The effect of a full continuation *is* to create such loops.

> 2) After a return continuation is taken, the registers can be trusted.

Yes, according to usage in pdd03.

> 3) If someone takes a full continuation, all return continuations down the callstack must be promoted.

If one *creates* a full continuation ...

> 4) After a function call, some magic needs to happen so that the code knows whether it came back to itself via a return continuation and can trust its registers, or it came back via a full continuation and cannot trust them.

No. It's too late for magic. Either the CFG is known at compile time or refetching in the presence of full continuations is mandatory. For both the code must reflect the facts.

> Corrections welcome,
> Matt

leo
Re: cvs commit: parrot/docs/pdds pdd03_calling_conventions.pod
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 9:16 PM +0100 11/16/04, Leopold Toetsch wrote:
> > This would imply a distinct return opcode instead of C.
> That went in, or was supposed to go in, as part of moving the return continuation into the interpreter struct. I presume this hasn't happened?

It was supposed to, yes. But: please read the start of the thread "calling conventions, tracebacks, and register allocator", from Nov 6th. I asked about the return sequence. Your answer was: "no changes to the calling conventions". So it didn't happen, yet. leo
Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes
Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote: > The 'eval' compiler returns a bytecode segment without a constant table. The > 'destroy' of the Eval PMC needs to handle that. How that? Are there no constants? Anyway, switching to a new bytecode segment does switch the constant table too, so all compiled code ought to have a constant table. leo