Re: light-weight calling conventions

2004-11-17 Thread Leopold Toetsch
Patrick R. Michaud <[EMAIL PROTECTED]> wrote:

> BTW, it may be very possible for me to write the p6ge generator so
> that it can be switched between the PIR and bsr/ret calling conventions,
> so we don't need to resolve this entirely now.  And we could then benchmark
> the two against each other.

That would be really great. There are a lot of things to consider, which
might or might not have an influence.
- tailcalls are faster than bsr/ret
- error traceback: not really easy with bsr/ret
- GC issues: the stack pushes consume GC-able objects
- calling back into PIR (might work seamlessly or not - it's untested)

> Pm

Thanks,
leo


Re: silent effects of opcodes

2004-11-17 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:

> Exceptions and continuations should be the same problem -- the target
> is the start of a basic block. (Well, more than that, as they're
> places where calling conventions potentially kick in) This means the
> instruction immediately after a sub call starts a new block, as does
> the start of an exception handler.

Dan, I've already said that there is of course a new basic block. The
problem arises from the silent generation of loops in the CFG. Within a
loop the same register can't be reallocated to a different variable.
There are two possible solutions (AFAIK):

1) statically mark the branch target of the loop. Proposed syntax
constructs:

1a) for exceptions:

 set_eh handler, catch_label

  This is just a small adaptation of the sequence for installing an
  exception handler.
  It depends a bit on whether exception handlers are inline or nested
  closures or both.

1b) generally

 RESUMABLE: func_that_might_loop_through_cc()

  possibly accompanied with another markup of the function call that
  loops back.

2) Fetch all from lexicals/globals after a function call.

leo


Re: silent effects of opcodes

2004-11-17 Thread Leopold Toetsch
Bill Coffman <[EMAIL PROTECTED]> wrote:

> Since I understand the item about allocating registers between sub
> calls, I can probably implement that change, as I work through the
> control flow/data flow analysis.

This is already implemented, parts of it are in CVS.

> Sounds like everything else is okay.  We're just missing a few CFG
> arcs from the continuations stuff, which I'll let you all worry about.
> :)

Yep

> Bill

> ps: I'm making progress on grokking the cfg and register renaming
> stuff.  Will let you know.

This needs an SSA graph of the data flow?

BTW, looking at analyse_life_block(), it seems that this unconditionally
allocates a Life_range structure. As these are O(n_symbols *
n_basic_blocks), we could save huge amounts of memory by defining that a
missing life block propagates the previous one. Dunno if it's possible,
though.

leo
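
A minimal sketch of the "missing life block propagates the previous one"
idea, written as standalone C. The type and function names
(SketchLifeRange, sketch_life_get, and so on) and the two fields are
assumptions for illustration, not IMCC's real Life_range layout, and
whether such propagation is actually valid for the allocator is the open
question raised above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for IMCC's Life_range; the fields are assumptions. */
typedef struct {
    int first_ins;
    int last_ins;
} SketchLifeRange;

/* Per-symbol table: one slot per basic block, NULL meaning "not stored". */
typedef struct {
    int n_blocks;
    SketchLifeRange **by_block;
} SketchSymbolLife;

SketchSymbolLife *sketch_life_new(int n_blocks)
{
    SketchSymbolLife *s;
    if (n_blocks <= 0)
        return NULL;
    s = malloc(sizeof *s);
    if (!s)
        return NULL;
    /* Only the pointer slots are allocated up front, not the ranges. */
    s->by_block = calloc((size_t)n_blocks, sizeof *s->by_block);
    if (!s->by_block) {
        free(s);
        return NULL;
    }
    s->n_blocks = n_blocks;
    return s;
}

/* Store an explicit range for one block; returns 0 on success, -1 on error. */
int sketch_life_set(SketchSymbolLife *s, int block, int first_ins, int last_ins)
{
    SketchLifeRange *r;
    if (!s || block < 0 || block >= s->n_blocks)
        return -1;
    r = s->by_block[block];
    if (!r) {
        r = malloc(sizeof *r);
        if (!r)
            return -1;
        s->by_block[block] = r;
    }
    r->first_ins = first_ins;
    r->last_ins = last_ins;
    return 0;
}

/* Look up a block; a missing entry propagates the nearest earlier one. */
const SketchLifeRange *sketch_life_get(const SketchSymbolLife *s, int block)
{
    int b;
    if (!s || block < 0 || block >= s->n_blocks)
        return NULL;
    for (b = block; b >= 0; b--)
        if (s->by_block[b])
            return s->by_block[b];
    return NULL;
}

void sketch_life_free(SketchSymbolLife *s)
{
    int b;
    if (!s)
        return;
    for (b = 0; b < s->n_blocks; b++)
        free(s->by_block[b]);
    free(s->by_block);
    free(s);
}
```

The table still holds one pointer per (symbol, block) pair, so the saving
in this sketch is only the structures that are never allocated; a denser
representation would be a separate design decision.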


Re: silent effects of opcodes

2004-11-17 Thread Bill Coffman
> >* [NEW] If register 15 or below is used, it should be cleared out,
> >ZEROED, after its last use and before the next sub call.  This is for
> >security reasons.  Obviously, these registers will not be the first
> >choice to use.
> 
> Nope -- this isn't the job of the register allocator. We aren't
> leaving security issues up to bytecode except in a very few, limited
> cases. (All involving subroutines with elevated security credentials
> which the sub needs to drop after using things they allow)

Okay, looks like I misread an earlier message of Dan's.  The reason
that we cannot use R0-R15 across a sub call is that they are shredded:
the values are not preserved through the sub call.

Since I understand the item about allocating registers between sub
calls, I can probably implement that change, as I work through the
control flow/data flow analysis.

Sounds like everything else is okay.  We're just missing a few CFG
arcs from the continuations stuff, which I'll let you all worry about.
:)

Bill

ps: I'm making progress on grokking the cfg and register renaming
stuff.  Will let you know.


Re: silent effects of opcodes

2004-11-17 Thread Dan Sugalski
At 2:02 PM -0800 11/17/04, Bill Coffman wrote:
> So to generalize.  The following registers are available, under the
> following conditions:
>
> * [NEW] If register 15 or below is used, it should be cleared out,
> ZEROED, after its last use and before the next sub call.  This is for
> security reasons.  Obviously, these registers will not be the first
> choice to use.

Nope -- this isn't the job of the register allocator. We aren't
leaving security issues up to bytecode except in a very few, limited
cases. (All involving subroutines with elevated security credentials
which the sub needs to drop after using things they allow)

> Other observations:
> * From new allocator bugs, and analysis, we've discovered that
> exceptions cause new control flow edges, not previously considered.
> This case is being reworked by Leo?  to provide missing CFG edges,
> through a minor change in the try block declaration.  (thread
> "Continuations, basic blocks, loops and register allocation")
> * The case of continuations has not been solved with respect to
> register allocation.  Leo's RESUMABLE: label might provide help here.
> In any case, we can expect to see some additional edges being inserted
> though.  (also thread "Continuations, basic blocks, loops and register
> allocation")

Exceptions and continuations should be the same problem -- the target
is the start of a basic block. (Well, more than that, as they're
places where calling conventions potentially kick in) This means the
instruction immediately after a sub call starts a new block, as does
the start of an exception handler. (And I've got some docs on
exceptions that should be out later tonight)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: silent effects of opcodes

2004-11-17 Thread Bill Coffman
So to generalize.  The following registers are available, under the
following conditions:

* Registers R16-R31 are always available for the allocator.
* Registers R0-R15 are available between sub calls.  That is, for any
symbol, whose life range does not cross a subroutine.  (This implies
that all registers are available if no subs are called.)  Since we
have no way to determine if a sub is using those or not, any sub call
will be assumed to possibly use R0-R15.  Furthermore, even though we
know there are certain registers in that range, which are unused by
the calling convention, we will still not use them through a sub call
for security reasons.
* [NEW] If register 15 or below is used, it should be cleared out,
ZEROED, after its last use and before the next sub call.  This is for
security reasons.  Obviously, these registers will not be the first
choice to use.
* Availability of these registers is subject to the rules for using
the Parrot opcode setp_ind, which were (are being?) worked
through by Leo.

Other observations:

* Leo introduced a flag on the symbol, to indicate if it's volatile or
not.  These will be eligible for R0-R15 (volatile registers?).
* From new allocator bugs, and analysis, we've discovered that
exceptions cause new control flow edges, not previously considered.
This case is being reworked by Leo?  to provide missing CFG edges,
through a minor change in the try block declaration.  (thread
"Continuations, basic blocks, loops and register allocation")
* The case of continuations has not been solved with respect to
register allocation.  Leo's RESUMABLE: label might provide help here.
In any case, we can expect to see some additional edges being inserted
though.  (also thread "Continuations, basic blocks, loops and register
allocation")

Did I miss anything?
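
The first two availability rules above can be sketched as a bit mask,
one bit per register of a given type. This is only an illustration: the
function names are hypothetical, and the [NEW] zeroing rule and the
indirect-opcode caveat are deliberately left out.

```c
#include <stdint.h>

enum { SKETCH_NUM_REGS = 32 };

/*
 * Bit i set means register Ri may be assigned to the symbol.
 * crosses_sub_call is nonzero when the symbol's life range spans a
 * subroutine call, in which case only R16-R31 are considered safe.
 */
uint32_t sketch_available_mask(int crosses_sub_call)
{
    const uint32_t upper = 0xFFFF0000u; /* R16-R31: always available */
    const uint32_t lower = 0x0000FFFFu; /* R0-R15: only between sub calls */

    return crosses_sub_call ? upper : (upper | lower);
}

/* Returns 1 if register number reg is allowed by mask, 0 otherwise. */
int sketch_reg_allowed(uint32_t mask, int reg)
{
    if (reg < 0 || reg >= SKETCH_NUM_REGS)
        return 0;
    return (int)((mask >> reg) & 1u);
}
```

A leaf sub (no calls at all) corresponds to calling
sketch_available_mask(0) for every symbol, which yields all 32 registers.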


Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)

2004-11-17 Thread Nicholas Clark
On Wed, Nov 17, 2004 at 02:47:09PM -0700, Patrick R. Michaud wrote:

> BTW, it may be very possible for me to write the p6ge generator so 
> that it can be switched between the PIR and bsr/ret calling conventions,
> so we don't need to resolve this entirely now.  And we could then benchmark
> the two against each other.

Keeping the code that flexible would be very interesting. If you can
achieve this without much extra pain, I think that it would be worth it.

Nicholas Clark


Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)

2004-11-17 Thread Patrick R. Michaud
On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:
> Dan Sugalski wrote:
> 
> As already stated, I don't consider these as either light-weight or 
> faster. Here is a benchmark.
> 
> Below are 2 versions of a recursive factorial program. fact(100) is 
> calculated 1000 times:
> 
> PIR   1.1 s
> bsr/ret   2.4 s
> PIR/tailcall  0.2s
> 
> Unoptimized Parrot, default i.e. slow run core.

BTW, it may be very possible for me to write the p6ge generator so 
that it can be switched between the PIR and bsr/ret calling conventions,
so we don't need to resolve this entirely now.  And we could then benchmark
the two against each other.

Pm



Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)

2004-11-17 Thread Patrick R. Michaud
On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:
> As already stated, I don't consider these as either light-weight or 
> faster. Here is a benchmark.
> 
> Below are 2 versions of a recursive factorial program. fact(100) is 
> calculated 1000 times:
> 
> PIR   1.1 s
> bsr/ret   2.4 s
> PIR/tailcall  0.2s
> 
> Unoptimized Parrot, default i.e. slow run core.

Sure, but the bsr/ret in your version is making lots of saveall calls 
that I'd be avoiding.  Also, this code is saving pmc's (big ones at 
that) whereas I'll generally be pushing a few ints and maybe a string 
onto the stack.  So, rewriting the above for ints instead of PerlInts, 
changing the multiply op to add to stay within the range of ints, and 
removing the unneeded saves/restores for things that are being passed 
as parameters anyway (and doubling the count save/restore to make it 
somewhat closer to what I'd expect...):

[EMAIL PROTECTED] pmichaud]$ parrot pmfact.imc #PIR
500500
5.819842
[EMAIL PROTECTED] pmichaud]$ parrot pmfactbsr.imc  #bsr/ret
500500
2.010935

Please keep in mind that I'm a newcomer to Parrot, so it's entirely
possible that I've made some invalid assumptions in my code that skew
these results (and I'll freely admit them if pointed out).
And I will admit that the PIR code is still impressive speed-wise
relative to what it is doing, but it's hard to ignore a 60% improvement.

Pm
.sub optc @IMMEDIATE
# TODO turn on -Oc
# print "optc\n"
.end
.sub _main @MAIN
.param pmc argv
.local int count, product
.local float start, end
count = 1000
.local int argc
argc = elements argv
if argc < 2 goto def
$S0 = argv[1]
count = $S0
def:
.local int i
i = 0
start = time
.local int n
loop:
n = count
product = 1
product = _fact(product, n)
inc i
if i < 1000 goto loop
end = time
end -= start
 print product
 print "\n"
print end
print "\n"
.end
.sub _fact
   .param int product
   .param int count
   if count > 1 goto recurs
   .return (product)
recurs:
   product += count
   dec count
   product = _fact(product, count)
   .return (product)
.end

.sub _main @MAIN
.param pmc argv
.local int count, product
.local float start, end
count = 1000
.local int argc
argc = elements argv
if argc < 2 goto def
$S0 = argv[1]
count = $S0
def:
.local int i
i = 0
start = time
.local int n
loop:
n = count
product = 1
save count
bsr fact
restore count
inc i
if i < 1000 goto loop
end = time
end -= start
print product
print "\n"
print end
print "\n"
goto ex

fact:
if count > 1 goto recurse
ret
recurse:
product += count
dec count
save count
save count
bsr fact
restore count
restore count
ret

ex:
.end



Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)

2004-11-17 Thread Dan Sugalski
At 5:08 PM -0500 11/17/04, Dan Sugalski wrote:
> Chopping out the multiplication (since that's a not-insignificant
> amount of the runtime for the bsr/ret version) gives:
>
> PIR:
> real    0m3.016s
> user    0m2.990s
> sys     0m0.030s
>
> bsr/ret:
> real    0m0.344s
> user    0m0.340s
> sys     0m0.010s

and with -Oc, for completeness:

real    0m0.416s
user    0m0.380s
sys     0m0.030s
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: light-weight calling conventions

2004-11-17 Thread Dan Sugalski
At 11:07 PM +0100 11/17/04, Leopold Toetsch wrote:
> Please no premature optimizations.

It's important to note that
   premature optimization == things Leo disapproves of
The bsr/ret version of things is fine. In the absolute best case 
it'll be the same speed as tail calls, and in normal cases it'll be 
significantly faster since it, by definition, has a lot less work to 
do.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)

2004-11-17 Thread Dan Sugalski
At 10:03 PM +0100 11/17/04, Leopold Toetsch wrote:
> Dan Sugalski wrote:
>
> [ this came up WRT calling conventions ]
>
> > I assume he's doing bsr/ret to get into and out of the sub, which
> > is going to be significantly faster.
>
> Who says that?
>
> As already stated, I don't consider these as either light-weight or
> faster. Here is a benchmark.
>
> Below are 2 versions of a recursive factorial program. fact(100) is
> calculated 1000 times:
>
> PIR           1.1 s
> bsr/ret       2.4 s
> PIR/tailcall  0.2 s
>
> Unoptimized Parrot, default i.e. slow run core.

Way to go with the overkill. I'm impressed. However, written more
sanely the results are:

PIR:
real    0m4.149s
user    0m4.120s
sys     0m0.030s

bsr/ret:
real    0m1.266s
user    0m1.260s
sys     0m0.000s

Chopping out the multiplication (since that's a not-insignificant
amount of the runtime for the bsr/ret version) gives:

PIR:
real    0m3.016s
user    0m2.990s
sys     0m0.030s

bsr/ret:
real    0m0.344s
user    0m0.340s
sys     0m0.010s

The bsr/ret version is:

start:
new P16, .PerlInt
set P16, 1000
elements I16, P5
lt I16, 2, def
set S0, P5[1]
set P16, S0
def:   
set I16, 0
time N16
save N16

loop:
clone P1, P16
new P0, .PerlInt
set P0, 1
save P16
save I16
bsr fact
restore I16
restore P16
inc I16
lt I16, 1000, loop
restore N16
time N17
sub N17, N17, N16
print P0
print "\n"
print N17
print "\n"
end

# in: P0 is product, p1 is count
# out: P0 is new product
fact:
  gt P1, 1, doit
  ret
doit:
  mul P0, P0, P1
  dec P1
  bsr fact
  ret
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: light-weight calling conventions

2004-11-17 Thread Leopold Toetsch
Patrick R. Michaud wrote:
> On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:
>
> [EMAIL PROTECTED] pmichaud]$ parrot pmfact.imc #PIR
> 500500
> 5.819842
> [EMAIL PROTECTED] pmichaud]$ parrot pmfactbsr.imc  #bsr/ret
> 500500
> 2.010935

Ok:

$ parrot pmfactbsr.imc
500500
3.459947
$ parrot -Oc pmfact.imc
500500
1.237185

Now what ;)

Are you sure that you can't do a tailcall sometimes? What about calling
back into PIR code?

Please no premature optimizations.

> Please keep in mind that I'm a newcomer to Parrot, so it's entirely
> possible that I've made some invalid assumptions in my code that skew
> these results (and I'll freely admit them if pointed out).

I've first to understand the generated rules engine a bit. But generally
speaking: let's first do it right and then fast.

> And I will admit that the PIR code is still impressive speed-wise
> relative to what it is doing, but it's hard to ignore a 60% improvement.

Or more ...

> Pm

leo


Re: silent effects of opcodes

2004-11-17 Thread Dan Sugalski
At 10:12 PM +0100 11/17/04, Leopold Toetsch wrote:
> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > At 7:34 PM +0100 11/17/04, Leopold Toetsch wrote:
>
> >>All registers are preserved, but some of these registers are used,
> >>either by implicit opcodes or as return values.
>
> > Erm, no. Unused registers in the 0-15 range are explicitly garbage:
>
> It was about usability of registers for the allocator. So before I make
> a function call, these are allocatable as temps. Return values are
> garbage, if not set.

As long as the allocator is set to assume that after a function call
all the registers in the range 0-15 that don't have return
values are garbage. So if there are no string return values, string
registers 0-15 are toast.

> >  Note that registers 16-31 of each of the four types are, for
> >  security reasons, not passed into the invoked subroutine,
> >  method, or continuation. They are guaranteed to be garbage.
>
> Not quite. S and P regs have to be NULLed. Or you gonna tell the DOD
> system how to mark garbage ;)

No, the invoke op.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: silent effects of opcodes

2004-11-17 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 7:34 PM +0100 11/17/04, Leopold Toetsch wrote:

>>All registers are preserved, but some of these registers are used,
>>either by implicit opcodes or as return values.

> Erm, no. Unused registers in the 0-15 range are explicitly garbage:

It was about usability of registers for the allocator. So before I make
a function call, these are allocatable as temps. Return values are
garbage, if not set.

>  Note that registers 16-31 of each of the four types are, for
>  security reasons, not passed into the invoked subroutine,
>  method, or continuation. They are guaranteed to be garbage.

Not quite. S and P regs have to be NULLed. Or you gonna tell the DOD
system how to mark garbage ;)

>>  > * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
>>
>>I've included P3 (see below). If it's used it interferes.

> Nope. It'll either be set if a call returns overflow parameters, or
> unused and thus garbage.

Ah, yep. Thanks. It is returned too, forgot that.

leo


light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)

2004-11-17 Thread Leopold Toetsch
Dan Sugalski wrote:

[ this came up WRT calling conventions ]

> I assume he's doing bsr/ret to get into and
> out of the sub, which is going to be significantly faster.

Who says that?

As already stated, I don't consider these as either light-weight or
faster. Here is a benchmark.

Below are 2 versions of a recursive factorial program. fact(100) is
calculated 1000 times:

PIR           1.1 s
bsr/ret       2.4 s
PIR/tailcall  0.2 s

Unoptimized Parrot, default i.e. slow run core.

leo

.sub optc @IMMEDIATE
# TODO turn on -Oc
# print "optc\n"
.end
.sub _main @MAIN
.param pmc argv
.local pmc count, product
.local float start, end
count = new PerlInt
count = 1000
.local int argc
argc = elements argv
if argc < 2 goto def
$S0 = argv[1]
count = $S0
def:
.local int i
i = 0
start = time
.local pmc n
loop:
n = clone count
product = new PerlInt
product = 1
product = _fact(product, n)
inc i
if i < 1000 goto loop
end = time
end -= start
 print product
 print "\n"
print end
print "\n"
.end
.sub _fact
   .param pmc product
   .param pmc count
   if count > 1 goto recurs
   .return (product)
recurs:
   product *= count
   dec count
   product = _fact(product, count)
   .return (product)
.end

.sub optc @IMMEDIATE
# TODO turn on -Oc
# print "optc\n"
.end
.sub _main @MAIN
.param pmc argv
.local pmc count, product
.local float start, end
count = new PerlInt
count = 1000
.local int argc
argc = elements argv
if argc < 2 goto def
$S0 = argv[1]
count = $S0
def:
.local int i
i = 0
start = time
.local pmc n
loop:
n = clone count
product = new PerlInt
product = 1
save n
save product
bsr fact
restore product
inc i
if i < 1000 goto loop
end = time
end -= start
print product
print "\n"
print end
print "\n"
goto ex

fact:
saveall
.local pmc product, count
restore product
restore count
if count > 1 goto recurs
restoreall
save product
ret
recurs:
product *= count
dec count
save count
save product
bsr fact
restore product
restoreall
save product
ret
ex:
.end



Re: silent effects of opcodes

2004-11-17 Thread Dan Sugalski
At 2:14 PM +0100 11/17/04, Leopold Toetsch wrote:
> Works fine *except* for the .flatten_arg directive. This directive
> takes an argument array and expands the array contents to function
> arguments in consecutive parrot registers. E.g.
>
>   .arg a=> P5
>   .flatten_arg array=> P6, P7, ...
>
> The code emitted to achieve that runs in a loop and is using the
> Parrot opcode setp_ind, which sets the xth Parrot register
> from Py.

Yep. The indirect access ops will cause problems for the PIR register
allocation, since there's no way to know at compile time what's
happening. Their use probably ought to invalidate all the registers,
or the op restricted to pasm code.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: silent effects of opcodes

2004-11-17 Thread Dan Sugalski
At 7:34 PM +0100 11/17/04, Leopold Toetsch wrote:
> Bill Coffman <[EMAIL PROTECTED]> wrote:
> > On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> >> I've now (locally here) extended Bill Coffman's register allocator by
> >> one subroutine that actually decides to use non-volatiles or volatiles
> >> according to pdd03. All variables that are live around a subroutine call
> >> are e.g. allocated from R16..R31.
>
> > Regarding pdd03, I am still not clear how it should be interpreted.
>
> All registers are preserved, but some of these registers are used,
> either by implicit opcodes or as return values.

Erm, no. Unused registers in the 0-15 range are explicitly garbage:

  Note that registers 16-31 of each of the four types are, for
  security reasons, not passed into the invoked subroutine,
  method, or continuation. They are guaranteed to be garbage.

> > * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
>
> I've included P3 (see below). If it's used it interferes.

Nope. It'll either be set if a call returns overflow parameters, or
unused and thus garbage.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: silent effects of opcodes

2004-11-17 Thread Leopold Toetsch
Bill Coffman <[EMAIL PROTECTED]> wrote:
> On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
>> I've now (locally here) extended Bill Coffman's register allocator by
>> one subroutine that actually decides to use non-volatiles or volatiles
>> according to pdd03. All variables that are live around a subroutine call
>> are e.g. allocated from R16..R31.

> Interesting.  I'd like to see it.

See below, you know where it's called from ;)

> Regarding pdd03, I am still not clear how it should be interpreted.

All registers are preserved, but some of these registers are used,
either by implicit opcodes or as return values.

> * If the subroutine being allocated is a leaf (with no sub calls),
> then all registers should be available.

Yep.

> * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.

I've included P3 (see below). If it's used it interferes.

> * It seems like it would be simple enough to provide a "compiler
> hint", to let the allocator know if the subs it calls are using the
> parrot convention or not, or how many of the R5-R15 it will need.

The register allocator only supports pdd03, nothing else. The
number of R5-R15 registers needed is unknown, as these are return results.

> ...  This can then be used as part of a static analysis,
> and can be incorporated into the unit data structure, or passed as a
> separate parameter to imc_reg_alloc().

Yep and it's working.

> ~Bill

leo

/*
 * find available color for register #x in available colors
 */
static int
ig_find_color(Interp* interpreter, IMC_Unit *unit, int x, char *avail)
{
    int c, t;
    SymReg *r;
    static const char types[] = "ISPN";

    static const char assignable[4][5] = {
       /* 0  1  2  3  4  */
        { 0, 0, 0, 0, 0, }, /* I */
        { 0, 1, 1, 1, 1, }, /* S */
        { 0, 0, 0, 1, 1, }, /* P */
        { 1, 1, 1, 1, 1, }, /* N */
    };


    UNUSED(interpreter);
    r = unit->reglist[x];
    t = strchr(types, r->set) - types;

    /* please note: c is starting at 1 for R0 */
    if (!(r->usage & U_NON_VOLATILE)) {
        /* 0) 5-15 volatile range */
        for (c = 6; c <= 16; c++)
            if (avail[c])
                return c;
    }
    /* 1) try upper non-volatiles, 16...31 */
    for (c = 17; c <= 32; c++)
        if (avail[c])
            return c;
    /* some lower regs are preserved too 0...4 */
    for (c = 1; c <= 5; c++)
        if (avail[c] && assignable[t][c - 1])
            return c;
    /* no chance, force high range with possible spilling */
    for (c = 33; ; c++)
        if (avail[c])
            return c;
    assert(0);
    return 0;
}
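
To make the search order of the function above easier to try in
isolation, here is a standalone model of it. It is a sketch under stated
assumptions: sketch_find_color, n_avail, and low_ok are hypothetical
names, the IMCC types are replaced by plain arguments, and unlike the
original the final loop is bounded by the array length and returns 0
when nothing is free.

```c
#include <stddef.h>

/*
 * avail is indexed from 1 for R0, as in the original (avail[0] is unused).
 * n_avail is the length of avail. low_ok has 5 entries saying whether
 * R0..R4 may be used for this register type (one row of "assignable").
 * Returns the chosen index c, or 0 if no entry is available.
 */
int sketch_find_color(int non_volatile, const char *avail, size_t n_avail,
                      const char *low_ok)
{
    size_t c;

    if (!non_volatile) {
        /* 0) volatile range R5..R15 */
        for (c = 6; c <= 16 && c < n_avail; c++)
            if (avail[c])
                return (int)c;
    }
    /* 1) upper non-volatiles R16..R31 */
    for (c = 17; c <= 32 && c < n_avail; c++)
        if (avail[c])
            return (int)c;
    /* 2) the lower registers R0..R4 that are preserved for this type */
    for (c = 1; c <= 5 && c < n_avail; c++)
        if (avail[c] && low_ok[c - 1])
            return (int)c;
    /* 3) anything above R31, which the caller may have to spill */
    for (c = 33; c < n_avail; c++)
        if (avail[c])
            return (int)c;
    return 0;
}
```

With every entry available, a volatile symbol gets index 6 (R5) and a
non-volatile one gets index 17 (R16), matching the preference order in
the comments of the original.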


Re: silent effects of opcodes

2004-11-17 Thread Bill Coffman
On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> I've now (locally here) extended Bill Coffman's register allocator by
> one subroutine that actually decides to use non-volatiles or volatiles
> according to pdd03. All variables that are live around a subroutine call
> are e.g. allocated from R16..R31.

Interesting.  I'd like to see it. 

Regarding pdd03, I am still not clear how it should be interpreted. 
The current pdd03, as well as the previous one, both seem to indicate
that registers 0-15 are likely to be overwritten, and anyone making a
call, should save those registers if they still want them.  The issue
with PIR Code, is that the author won't know which of their symbols
are mapping to registers about to be killed.  So, as previously
discussed, those registers will have to be hands off for the register
allocator.  That is essentially how the old and new allocator have been
working.  But this doesn't have to always be the case.

* If the subroutine being allocated is a leaf (with no sub calls),
then all registers should be available.
* Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
* It seems like it would be simple enough to provide a "compiler
hint", to let the allocator know if the subs it calls are using the
parrot convention or not, or how many of the R5-R15 it will need. 
From this hint, a bit mask saying which registers are available could
be constructed.  This can then be used as part of a static analysis,
and can be incorporated into the unit data structure, or passed as a
separate parameter to imc_reg_alloc().

I wouldn't think this last idea would be considered a change to the
calling convention, but rather as an optional optimization prototype. 
Not part of pasm.  Dan, would something like this be allowed?

~Bill


COND macros (was: Threads, events, Win32, etc.)

2004-11-17 Thread Leopold Toetsch
Gabe Schaffer <[EMAIL PROTECTED]> wrote:
>> >> Not quite. COND_WAIT takes an opaque type defined by the platform, that
>> >> happens to be a mutex for the pthreads based implementation.
>>
>> > It should, but it doesn't. Here's the definition:
>> > #  define COND_WAIT(c,m) pthread_cond_wait(&c, &m)
>>
>> You are already in the POSIX specific part.

> It came from thr_pthread.h, so it should be POSIX. The issue here is
> that it's #define COND_WAIT(c,m) instead of #define COND_WAIT(c).

Well in the mentioned (TODO) platform/win32/threads.h you have to define
your own COND_WAIT(c, m) - this is the interface of that macro, as POSIX
needs the mutex, but you would ignore the 2nd parameter.

Please have a look at the empty defines in include/parrot/threads.h.

The problem is a different one: the COND_INIT macro just passes a
condition location, the mutex is created in a second step, which isn't
needed for windows. OTOH a mutex aka critical section is needed
separately.

So we should probably define these macros to be:

  COND_INIT(c, m)
  COND_DESTROY(c, m)

see src/tsq.c for usage.

Does win32 require more info to create conditions/mutexes or would these
macros suffice?

[ I'll try to answer more in a separate thread ]

leo
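
A rough sketch of what the proposed two-argument macros could look like
on the POSIX side. The macro bodies, COND_SIGNAL, and the SketchQueue
struct are assumptions made for illustration; they are not taken from
thr_pthread.h or src/tsq.c. A platform that does not need the mutex for
waiting would keep the same COND_WAIT(c, m) signature and ignore m.

```c
#include <pthread.h>

/* Assumed POSIX definitions: condition and mutex are set up together. */
#define COND_INIT(c, m) \
    do { \
        pthread_mutex_init(&(m), NULL); \
        pthread_cond_init(&(c), NULL); \
    } while (0)

#define COND_DESTROY(c, m) \
    do { \
        pthread_cond_destroy(&(c)); \
        pthread_mutex_destroy(&(m)); \
    } while (0)

#define COND_WAIT(c, m)  pthread_cond_wait(&(c), &(m))
#define COND_SIGNAL(c)   pthread_cond_signal(&(c))

/* Hypothetical queue header owning both objects. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             ready;
} SketchQueue;

/* Single-threaded round trip through the macros; returns the ready flag. */
int sketch_queue_roundtrip(void)
{
    SketchQueue q;

    q.ready = 0;
    COND_INIT(q.cond, q.lock);

    pthread_mutex_lock(&q.lock);
    q.ready = 1;
    COND_SIGNAL(q.cond);            /* no waiter yet, so this is a no-op */
    while (!q.ready)                /* already true: the wait is skipped */
        COND_WAIT(q.cond, q.lock);
    pthread_mutex_unlock(&q.lock);

    COND_DESTROY(q.cond, q.lock);
    return q.ready;
}
```

Passing both objects to COND_INIT and COND_DESTROY lets each platform
decide whether the second one is created, reused, or ignored.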


Re: Threads, events, Win32, etc.

2004-11-17 Thread Leopold Toetsch
Gabe Schaffer <[EMAIL PROTECTED]> wrote:

> Yes, there has to be a separate thread to get signals, and each thread
> needs its own event queue, but why does the process have a global
> event_queue? I suppose there are generic events that could be handled
> just by the next thread to call check_events, but that isn't what this
> sounds like.

It's mainly intended for broadcasts and timers. POSIX signals are weird
and more or less broken from platform to platform. The only reliable way
to get at them is to block the desired signal in all but one thread.
This signal gets converted to a global event and from there it can be
put into specific threads if they have installed signal handlers for that
signal.

But as said the existing code is experimental and is likely to change a
lot.

> I don't see why there needs to be a separate thread to listen for IOs
> to finish. Can't that be the same thread that listens for signals?

That's the plan yes. AIO completion can be delivered as a signal.

> OK, I think I understand why...the event thread is in a loop waiting
> for somebody to tell it that there's an event in the global event
> queue...which is really the part I don't get yet.

Well, the event thread is handling timer events on behalf of an
interpreter.

[ long win32 proposal ]

I've to read through that some more times.

Do you already have ideas for a common API, or where to split the
existing threads.c into platform and common code?

> GNS

leo


Re: silent effects of opcodes

2004-11-17 Thread Leopold Toetsch
Leopold Toetsch <[EMAIL PROTECTED]> wrote:

[ setp_ind troubles ]

I've found a way to force allocation to R16..R31 in the presence of this
opcode.

leo


Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes

2004-11-17 Thread Leopold Toetsch
Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote:
> Leopold Toetsch wrote:
>>
>> How that? Are there no constants?

> Yes, there are no constants. The only thing the generated sub does, is
> to return an integer value, that was computed in the C-Code.
> Thus the m4 macro "eval( 1 ^ 3 )" compiles into a sub that looks in PIR
> like:

> .sub generated_sub
>.return( 3 )
> .end

I see. And what about the equivalent of eval("ab"  "cd") or
eval(1.3 + 2.5) ?

> CU, Bernhard

leo


Re: Perl 6 Summary for 2004-11-08 through 2004-11-15

2004-11-17 Thread Jeff Horwitz
On Mon, 15 Nov 2004, Matt Fowles wrote:
>   Languages with Object Support?
>Jeff Horwitz wondered if there were any languages with object support
>that he could bend to the evil ends of mod_parrot. While no one
>answered, I think Parakeet might be such a language...

parakeet's a newcomer to the languages directory, so i hadn't seen it
before.  it has objects and functions, so it should fit in nicely with
mod_parrot.  it's currently broken with all the changes that have been
going on, but michel is working on the fixes.

good suggestion, matt.  :)

-jeff



Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes

2004-11-17 Thread Bernhard Schmalhofer
Leopold Toetsch wrote:
> Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote:
>
> > The 'eval' compiler returns a bytecode segment without a constant table. The
> > 'destroy' of the Eval PMC needs to handle that.
>
> How that? Are there no constants? Anyway, switching to a new bytecode
> segment does switch the constant table too, so all compiled code ought
> to have a constant table.

Yes, there are no constants. The only thing the generated sub does is
to return an integer value that was computed in the C code.
Thus the m4 macro "eval( 1 ^ 3 )" compiles into a sub that looks in PIR
like:

.sub generated_sub
  .return( 3 )
.end

Of course it would be much simpler to use a plain NCI call for this
purpose.
But I wanted to play with 'compreg' when I implemented that.

CU, Bernhard
--
**
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: [EMAIL PROTECTED]
Website: www.biomax.com
**


Re: [perl #32418] Re: [PATCH] Register allocation patch - scales better to more symbols

2004-11-17 Thread Dan Sugalski
At 11:35 AM +0100 11/17/04, Leopold Toetsch wrote:
> Dan Sugalski wrote:
> > Okay. I'll apply it and take a shot. May take a few hours to get a
> > real number.
>
> How does it look? Any results already?

Nope, haven't had time, unfortunately. Work's been busy. Today, if I get
lucky.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: accessing self in methods

2004-11-17 Thread Dan Sugalski
At 11:00 AM +0100 11/17/04, Leopold Toetsch wrote:
> We should create some syntax to access the object in methods.

Well, there are two issues here.

First is in pasm/bytecode. For that, fetching things explicitly with
interpinfo is just fine, so the code sequence:

 interpinfo P16, .INTERPINFO_CURRENT_OBJECT

works.

At the PIR level, self is just a special-case .local, so I don't see
much reason to do anything special there either -- the method tag on
the .sub declaration should be enough to tell the pir compiler that
it ought to go fetch the object into a register for use later on.

If you wanted to use this as a time to tie named .local declarations 
to lexical pad slots and global names so the spilling code can 
refetch spilled things from the pad/namespace rather than from a 
private backing array, that'd be fine too. self would just spill in 
from the interpreter info rather than a pad or namespace.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Continuations, basic blocks, loops and register allocation

2004-11-17 Thread Matt Fowles
Leo~

Thanks for the clarification.

Matt


On Wed, 17 Nov 2004 08:48:58 +0100, Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> Matt Fowles <[EMAIL PROTECTED]> wrote:
> 
> > ...  Thus you can consider all of the
> > following questions (even though they will be phrased as statements).
> 
> > 1)  After a full continuation is taken all of the registers must be
> > considered invalid.
> 
> Calling a subroutine allocates a new register frame; that sub's register
> frame pointer in the context points to these fresh registers.
> 
> A continuation restores the context it captured, i.e. at the place,
> where it was created. This is true for all continuations. Inside the
> context there is a *pointer* to a register frame, which is therefore
> restored too.
> 
> The effect of taking a continuation is therefore to restore registers to
> that state where the continuation was created. Due to calling conventions
> a part of the registers is volatile (used during a call or as return
> results), while the other part is non-volatile.
> 
> Up to here there is no difference between a return and a full continuation.
> 
> The effect of a full continuation can be to create a loop, where the
> known control flow doesn't show a loop. Without further syntax to denote
> such loops 1) is true. This register invalidation happens, if a
> preserved register was e.g. only used once after the call and then that
> register got reassigned, which is allowed for a linear control flow but
> not inside a loop.
> 
> This has per se nothing to do with a continuation. If you got an opcode
> that does *silently* a "goto again_label" the CFG doesn't cope with the
> loop, because it isn't there and things start breaking. The effect of a
> full continuation *is* to create such loops.
> 
> > 2)  After a return continuation is taken, the registers can be trusted.
> 
> Yes, according to usage in pdd03.
> 
> > 3)  If someone takes a full continuation, all return continuations
> > down the callstack must be promoted.
> 
> If one *creates* a full continuation ...
> 
> > 4)  After a function call, some magic needs to happen so that the code
> > knows whether it came back to itself via a return continuation and can
> > trust its registers, or it came back via a full continuation and
> > cannot trust them.
> 
> No. It's too late for magic. Either the CFG is known at compile time or
> refetching in the presence of full continuations is mandatory. For both
> the code must reflect the facts.
> 
> > Corrections welcome,
> > Matt
> 
> leo
> 


-- 
"Computer Science is merely the post-Turing Decline of Formal Systems Theory."
-???


Re: Threads, events, Win32, etc.

2004-11-17 Thread Gabe Schaffer
> >> Not quite. COND_WAIT takes an opaque type defined by the platform, that
> >> happens to be a mutex for the pthreads based implementation.
> 
> > It should, but it doesn't. Here's the definition:
> > #  define COND_WAIT(c,m) pthread_cond_wait(&c, &m)
> 
> You are already in the POSIX specific part.

It came from thr_pthread.h, so it should be POSIX. The issue here is
that it's #define COND_WAIT(c,m) instead of #define COND_WAIT(c).
Every place in the code, whether it's Win32 or POSIX, is going to have
to pass in a condition variable and a mutex. Just because Win32 will
ignore the second parameter, that isn't going to prevent the code from
creating the mutex, initializing it, and passing it in.

> >> I'm not sure, if we even should support Win9{8,5}.
> 
> > I'd be happy with simply implementing Win9x as a non-threaded
> > platform. Of course, hopefully nobody will even ask...
> 
> We'll see. But as Parrot's IO system is gonna be asynchronous in core, I
> doubt that we'll support it.

Obviously Parrot has to run on non-threaded platforms where the kernel
threading and AIO stuff just won't work. You can still do user
threads, but file IO will still block everything.

> > rationale. I can understand why there would need to be a global event
> > thread (timers, GC, DoD), but why would passing a message from one
> > thread to another need to be serialized through a global event queue?
> 
> The main reason for the global event queue isn't message passing. The
> reason is POSIX signals. Basically you aren't allowed to do anything
> serious in a signal handler, especially you aren't allowed to broadcast
> a condition or something.
> So I came up with that experimental code of one thread doing signals.

Yes, there has to be a separate thread to get signals, and each thread
needs its own event queue, but why does the process have a global
event_queue? I suppose there are generic events that could be handled
just by the next thread to call check_events, but that isn't what this
sounds like.

> > And as for IO, I see the obvious advantages of performing synchronous
> > IO functions in a separate thread to make them asynchronous, but that
> > sounds like the job of a worker thread pool. There are many ways to
> > implement this, but serializing them all through one queue sounds like
> > a bottleneck to me.
> 
> Yes. The AIO library is doing that anyway i.e. utilizing a thread pool
> for IO operations.

I don't see why there needs to be a separate thread to listen for IOs
to finish. Can't that be the same thread that listens for signals?
That is, the IO thread just spends its whole life doing select(). If
it got a signal, select() should return EINTR, so the thread could
then check a flag to see which signal was raised, queue the event in
the proper queue(s), and call select() again.
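
The routing step described above ("queue the event in the proper
queue(s)") could be modelled roughly as follows. This is only a sketch in
Python, not Parrot code: EventRouter, register, post and drain are
hypothetical names, and the select()/EINTR handling itself is left out.
It merely shows per-thread queues being fed by one dispatching thread
without a shared global queue.

```python
import queue


class EventRouter:
    """Hypothetical helper: one private event queue per registered thread."""

    def __init__(self):
        self.queues = {}                 # thread name -> queue.Queue

    def register(self, thread_name):
        # Each thread gets its own queue instead of sharing a global one.
        self.queues[thread_name] = queue.Queue()
        return self.queues[thread_name]

    def post(self, event, targets=None):
        """Queue event for the named threads, or for all if targets is None."""
        names = list(self.queues) if targets is None else list(targets)
        for name in names:
            self.queues[name].put(event)
        return len(names)


def drain(q):
    """What a check_events-style call might do: take everything pending."""
    events = []
    while not q.empty():
        events.append(q.get_nowait())
    return events
```

In this model the thread that wakes up from select() would call post()
once per signal or finished IO, and every interpreter thread would call
drain() on its own queue only.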

OK, I think I understand why...the event thread is in a loop waiting
for somebody to tell it that there's an event in the global event
queue...which is really the part I don't get yet.

> Dan did post a series of documents to the list some time ago. Sorry, I've
> no exact subject, but with relevant keywords like "events" you should
> find it.

Yeah, I remember reading some of his discussions with Damien Neil
because I think I went to school with him.

Anyway, here's my first draft for a Win32 event model. First I should
clarify what I'm talking about when I say Win32.

Win32 IS NOT: The MS Services for Unix package provides a POSIX
subsystem for Windows called Interix which is completely separate from
Win32 (i.e. no GUI is possible, no Win SDK calls are available). It
has fork(), symlinks, pthreads, SysV IPC, POSIX signals, ptys, and
maybe even AIO. This config would be compiled like any other Unix
variant with its own idiosyncrasies.

Win32 IS PROBABLY NOT: There are various POSIX emulation layers for
Win32, such as cygwin and MinGW. These provide many function calls
that Unix programs expect, but only to the degree that the Win32
subsystem allows (e.g. chmod likely will not do anything sensible).
Since these programs still run under the Win32 subsystem, Windows GUIs
are still possible. I don't know how these will interact with my event
model.

Win32 IS: This is the standard Win32 API as defined by NT4.0sp6a and
higher. If you want to drop support for NT4, then we go to Win2k, but
don't gain much.

GUI message queues in Win32 are per thread. Each thread has a message
queue that is autovivified. Any window that a thread creates has its
messages sent to that thread's queue. However, there is no reason that
a message actually has to have an associated window. You can send any
thread in any process a message, so long as the thread has had its
queue autovivified and is not crossing security boundaries.

All files or things that look like files can be opened for async
access. For example, sockets, files, and pipes can all be async. Any
read, write, lock, unlock, or ioctl call can either signal a condition
var (

silent effects of opcodes

2004-11-17 Thread Leopold Toetsch
I've now (locally here) extended Bill Coffman's register allocator by 
one subroutine that actually decides to use non-volatiles or volatiles 
according to pdd03. All variables that are live around a subroutine call 
are e.g. allocated from R16..R31.

Variables not alive around a call (temps) are preferably allocated from
the lower range first.

Seems to work fine and is not really specific to this register
allocator, nor to a specific ABI. It's just exploiting the fact that a
bunch of registers are preserved around a call.
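
A rough model of that decision, written in Python rather than in the
allocator's C: only the R16..R31 range comes from the text above; the
function name, its parameters and the fallback order for temps are
assumptions made for illustration.

```python
# Sketch only, not the imcc allocator. R16..R31 are treated as preserved
# around a call (as stated above); the lower range is treated as volatile.
NON_VOLATILE = range(16, 32)
VOLATILE = range(0, 16)


def pick_register(live_across_call, in_use):
    """Return the lowest free register from the preferred range.

    live_across_call -- True if the variable is alive around a sub call
    in_use           -- set of register numbers taken by interfering variables
    """
    if live_across_call:
        candidates = list(NON_VOLATILE)            # must survive the call
    else:
        # Temps prefer the lower range but may use a preserved register.
        candidates = list(VOLATILE) + list(NON_VOLATILE)
    for reg in candidates:
        if reg not in in_use:
            return reg
    return None  # nothing free: the variable would have to be spilled
```

For example, pick_register(True, set()) picks 16, while a temp gets 0 as
long as the lower range has room.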

Works fine *except* for the .flatten_arg directive. This directive takes
an argument array and expands the array contents to function arguments
in consecutive Parrot registers. E.g.

  .arg a                => P5
  .flatten_arg array    => P6, P7, ...

The code emitted to achieve that runs in a loop and is using the Parrot
opcode C which sets the xth Parrot register from Py.

Now this array is typically a temporary and not used around the
call, so it gets allocated in the volatile register range, which then
collides with the generated code for function argument passing.

The register allocator doesn't know that e.g. P6, P7 are affected by
this opcode.

see imcc/t/syn/pcc_20 - _25 for examples and ops/set.ops for usage 
information of this opcode.
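
The collision can be reproduced with a toy model. The sketch below is
Python, not the emitted PASM; flatten_args and the register dictionary are
made up for the illustration, and the isinstance check only stands in for
whatever a real interpreter would do once the array register has been
overwritten.

```python
def flatten_args(regs, array_reg, first_target):
    """Copy the array held in regs[array_reg] into consecutive registers
    starting at first_target, writing one register per loop step."""
    copied = 0
    while True:
        array = regs[array_reg]          # the loop re-reads the register
        if not isinstance(array, list) or copied >= len(array):
            break                        # array gone or fully copied
        regs[first_target + copied] = array[copied]
        copied += 1
    return copied


# Array kept in a preserved register: both elements arrive in P6 and P7.
safe = {16: ["x", "y"]}
flatten_args(safe, 16, 6)

# Array allocated in P6 itself: the first store destroys the array.
clobbered = {6: ["x", "y"]}
flatten_args(clobbered, 6, 6)
```

In the second call only one element is copied, because the register that
held the array is also the first argument register.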

leo


main is just a sub

2004-11-17 Thread Leopold Toetsch
Parrot starts execution at the first sub (or that one denoted with 
@MAIN). This subroutine is called with pdd03 calling conventions like 
any other sub. So we have:

   P5               ... argv array
   I0 = 0, I3 = 1   ... one PMC argument passed

A tailcall at the end of main is a valid operation to represent this
code snippet:

   .main
     ...
     foo()
   .end

as well as a .return() directive (or the omission of one, as missing
return sequences are inserted by imcc).

The only difference is that a return result of main is *not* promoted to
the parent process; this can be achieved by the C opcode.

Please note: PASM code still needs an "end" or the upcoming "returncc" 
opcode.

leo


accessing self in methods

2004-11-17 Thread Leopold Toetsch
We should create some syntax to access the object in methods.

It used to be:

1)
   self."bar"()

where "self" automagically expanded to P2.

The current official way is this sequence:

2)
   .include "interpinfo.pasm"
   $P0 = interpinfo .INTERPINFO_CURRENT_OBJECT
   $P0."bar"()

This two-liner looks a bit bulky compared to the old syntax.
I can imagine several ways to achieve the simplicity of 1) again, but
this needs some effort in code generation inside imcc.

As an intermediate step, I'm thinking of something like:

3)
   .GET_SELF($P0)

This macro expands to the above two-liner and is defined internally.

A final and optimal solution would expand "self" to either a (re)fetch
into volatiles or non-volatiles or, depending on register allocation
pressure and usage, to a single fetch and reuse of that register.

Better solutions welcome,
leo


Re: [perl #32418] Re: [PATCH] Register allocation patch - scales better to more symbols

2004-11-17 Thread Leopold Toetsch
Dan Sugalski wrote:
> Okay. I'll apply it and take a shot. May take a few hours to get a real
> number.

How does it look? Any results already?

Thanks,
leo


Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes

2004-11-17 Thread Leopold Toetsch
Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote:

> this patch brings Parrot m4 to terms with recent "eval" changes. The compile
> function of the 'eval' compiler now returns an Eval PMC. The m4 macro "eval"
> is a simple interpreter of integer arithmetic expressions.

Thanks, applied.
leo


deprecation warning P0, P1

2004-11-17 Thread Leopold Toetsch
Due to adaptions to pdd03 the direct access to the return continuation 
is deprecated.

Instead these constructs should be used:

1) PIR code

* return from a sub

   .return()
   .return(foo)
   .return (foo, bar, baz)
   ...

* get the current continuation (for call/cc)

   .include "interpinfo.pasm"
   .local pmc cont
   cont = interpinfo .INTERPINFO_CURRENT_CONT

  The returned continuation is already a real continuation, thus it
  doesn't need cloning any more.

* get the current sub

   .local pmc sub
   sub = interpinfo .INTERPINFO_CURRENT_SUB

2) PASM code

* return from a sub

   returncc  [ proposed opcode, TBD ]

* get the current continuation / sub

   .include "interpinfo.pasm"
   interpinfo Px, .INTERPINFO_CURRENT_CONT    # or _SUB

leo


Re: parakeet broken?

2004-11-17 Thread Leopold Toetsch
Jeff Horwitz <[EMAIL PROTECTED]> wrote:
> i was starting to play with parakeet, but unfortunately it keeps dying on
> me.  this is from a cvs checkout from today:

It surely needs some adaptation WRT the changes in the
compreg/compile/invoke sequence aka "eval".

leo


Re: Continuations, basic blocks, loops and register allocation

2004-11-17 Thread Leopold Toetsch
Matt Fowles <[EMAIL PROTECTED]> wrote:

> ...  Thus you can consider all of the
> following questions (even though they will be phrased as statements).

> 1)  After a full continuation is taken all of the registers must be
> considered invalid.

Calling a subroutine allocates a new register frame; that sub's register
frame pointer in the context points to these fresh registers.

A continuation restores the context it captured, i.e. at the place,
where it was created. This is true for all continuations. Inside the
context there is a *pointer* to a register frame, which is therefore
restored too.

The effect of taking a continuation is therefore to restore registers to
that state where the continuation was created. Due to calling conventions
a part of the registers is volatile (used during a call or as return
results), while the other part is non-volatile.

Up to here there is no difference between a return and a full continuation.

The effect of a full continuation can be to create a loop, where the
known control flow doesn't show a loop. Without further syntax to denote
such loops 1) is true. This register invalidation happens, if a
preserved register was e.g. only used once after the call and then that
register got reassigned, which is allowed for a linear control flow but
not inside a loop.

This has per se nothing to do with a continuation. If you got an opcode
that does *silently* a "goto again_label" the CFG doesn't cope with the
loop, because it isn't there and things start breaking. The effect of a
full continuation *is* to create such loops.
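
A toy model of that breakage, in Python and with made-up names (run,
resume_via_full_continuation): the while/continue pair stands in for the
back edge that the CFG does not know about, and register 16 is reused for
a second variable after what looks like the last use of the first one.

```python
def run(resume_via_full_continuation):
    """Return the values read from register 16 at the point after the call."""
    regs = {16: "x"}         # variable x sits in preserved register 16
    seen = []
    passes = 0
    while True:              # hidden loop: invisible in the linear CFG
        passes += 1
        # Point right after the sub call; a full continuation resumes here.
        seen.append(regs[16])    # "last" use of x in a linear control flow
        regs[16] = "y"           # allocator reuses register 16 for y
        if resume_via_full_continuation and passes == 1:
            continue             # continuation invoked: jump back silently
        return seen
```

With a plain return the code reads x once. When the continuation is
taken, the second pass reads y where x was expected, which is the
invalidation described in 1).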

> 2)  After a return continuation is taken, the registers can be trusted.

Yes, according to usage in pdd03.

> 3)  If someone takes a full continuation, all return continuations
> down the callstack must be promoted.

If one *creates* a full continuation ...

> 4)  After a function call, some magic needs to happen so that the code
> knows whether it came back to itself via a return continuation and can
> trust its registers, or it came back via a full continuation and
> cannot trust them.

No. It's too late for magic. Either the CFG is known at compile time or
refetching in the presence of full continuations is mandatory. For both
the code must reflect the facts.

> Corrections welcome,
> Matt

leo


Re: cvs commit: parrot/docs/pdds pdd03_calling_conventions.pod

2004-11-17 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 9:16 PM +0100 11/16/04, Leopold Toetsch wrote:

>>This would imply a distinct return opcode instead of C.

> That went in, or was supposed to go in, as part of moving the return
> continuation into the interpreter struct. I presume this hasn't
> happened?

It was supposed to, yes. But:

Please read the start of the thread "calling conventions, tracebacks,
and register allocator", from Nov 6th.

I asked about the return sequence.  Your answer was: "no changes to the
calling conventions".

So it didn't happen, yet.

leo


Re: [perl #32466] [PATCH] Parrot m4 0.0.10 and "eval" changes

2004-11-17 Thread Leopold Toetsch
Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote:

> The 'eval' compiler returns a bytecode segment without a constant table. The
> 'destroy' of the Eval PMC needs to handle that.

How so? Are there no constants? Anyway, switching to a new bytecode
segment does switch the constant table too, so all compiled code ought
to have a constant table.

leo