Re: Register stacks, return continuations, and speeding up calling

2004-10-20 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:

 The first is the CPS style chews through continuation objects at a
 massive rate, and using them means that everything needs to be COW.

We don't have COWed stacks, they are all single chunk already.

 A return continuation is a *potential* continuation.

Ok, the return of the return continuation recycling. But it's likely a
bit more complicated. Creating a Continuation somewhere down a call
chain has to mark all return continuations up as non-recyclable. And as
the Continuation PMC can get passed anywhere, it looks to me that just
the presence of one Continuation PMC invalidates all return
continuations.

The proposed scheme with big register chunks and a watermark at the
frame pointer of the highest continuation looks more sane.

 ..., and it will completely invalidate all the
 existing JIT code.

Only i386, which is the only one that does not already a CPU register
indirect addressing of Parrot registers. i386 is currently being fixed.
When that's done, all JIT platforms basically need *one* additional
instruction in one place, e.g. for PPC:

  r16 = BP_OFFS(r13)# get base bointer from interpreter

 ..., but I'm kinda sick at the moment,

Get well soon.

leo


Re: [perl #31919] [PATCH] Win32 perlnum test failure - test 36 (+- zero)

2004-10-20 Thread Leopold Toetsch
Ron Blaschke [EMAIL PROTECTED] wrote:

 Visual C++ compiles -0.0 to 0.0, which leads to the error.  Attached
 patch will fix this.

Thanks, applied.
leo


Re: Problems with 0.1.1 release on x86-64

2004-10-20 Thread Leopold Toetsch
Brian Wheeler [EMAIL PROTECTED] wrote:
 Sigh.  I'll get this right sometime!

Thanks, applied.
leo


Re: Pathological Register Allocation Test Generator

2004-10-20 Thread Leopold Toetsch
Bill Coffman [EMAIL PROTECTED] wrote:

 I am currently working on a fix to the large subroutine register
 allocation bug, aka, massive spilling not yet implemented.  The
 problem, is that the register allocation code is complex, and I'm not
 all that familiar with it, or even with working with compilers at the
 coding level.

If you encounter any problems, please just ask.

 I am comming at the problem as a former graph theorist,

That's good.

 ... The first attached program (gen3.pl),

Some remargs WRT gen{3,4}.pl:
1) While these programs exhibit some worst case register layout it's
   probably not a very typical layout.
2) RL programs have lexicals and globals access spread over the code like
   in the generated gen.imc
3) and that's intersparsed with highly local access to temporaries coming
   from expression evaluations.

I'd change the simulation program to use PMCs to allow for 2). Now when
it comes to spilling, these lexicals or globals don't need a new
storage, their live range can just get discarded and at the next usage
of this lexical or global it just can be refetched[1]. Implementing this
should already vastly improve the register allocation.

[1] The refetching can of course be into a different Parrot register.
Thus the usage of globals or lexicals wouldn't interfer and generate
huge live ranges over the whole function as it currently does.

I don't know if we need some syntax to easily support lexicals or
globals or if the register allocation can deduce that itself. But if a
new syntax simplifies that, we put it in.

For 3) there is already a separate pass in
imcc/reg_alloc.c:allocate_non_interfering().

leo


Re: Perl 6 Summary for 2004-10-01 through 2004-10-17

2004-10-20 Thread Peter Scott
In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Matthew Walton) writes:
Austin Hastings wrote:
 Does this mean that we're done?   :)

No, it means Larry's about to stun us with something seemingly bizarre 
and inexplicable which turns out to be a stroke of genius.

This conjured up an image of Larry whacking someone with
a coelacanth...

-- 
Peter Scott
http://www.perldebugged.com/
*** NEW *** http://www.perlmedic.com/


Re: Register stacks, return continuations, and speeding up calling

2004-10-20 Thread Dan Sugalski
At 9:16 AM +0200 10/20/04, Leopold Toetsch wrote:
Dan Sugalski [EMAIL PROTECTED] wrote:
 The first is the CPS style chews through continuation objects at a
 massive rate, and using them means that everything needs to be COW.
We don't have COWed stacks, they are all single chunk already.
Yeah, I know, but I've been considering allowing stack back-peeking 
and modifying. Sorry, wasn't clear.

  A return continuation is a *potential* continuation.
Ok, the return of the return continuation recycling. But it's likely a
bit more complicated. Creating a Continuation somewhere down a call
chain has to mark all return continuations up as non-recyclable.
Right. Any time an actual continuation is created we need to walk 
back up the call chain and mark all the pending return continuations 
as non-recyclable.

 And as
the Continuation PMC can get passed anywhere, it looks to me that just
the presence of one Continuation PMC invalidates all return
continuations.
No, it doesn't, luckily. The worst case here is that invoking a 
continuation will leave some return continuations to be cleaned up by 
a DOD sweep.

The proposed scheme with big register chunks and a watermark at the
frame pointer of the highest continuation looks more sane.
Yeah, that's what I thought back when we started parrot--that's the 
original design of the register stacks. At this point I *don't* think 
it's the right thing, for a few reasons.

*) It was annoyingly error prone
*) If we switch to the scheme we've been arguing over the number of 
register frame pushes will approach 0, so there's little point in the 
complexity.

If we're not saving much on the register stacks (and with the switch 
in calls we won't be, which means we can drop the pushtop/poptop 
stuff on calls) it's easier to go with a one-frame-per-chunk setup.

  ..., and it will completely invalidate all the
 existing JIT code.
Only i386, which is the only one that does not already a CPU register
indirect addressing of Parrot registers. i386 is currently being fixed.
When that's done, all JIT platforms basically need *one* additional
instruction in one place, e.g. for PPC:
  r16 = BP_OFFS(r13)# get base bointer from interpreter
What, we have two registers dedicated? One for the interpreter 
pointer and one for the start of registers? I didn't realize that. If 
so, then nevermind.

  ..., but I'm kinda sick at the moment,
Get well soon.
Let's hope that's after we get this hashed out. :)
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Register stacks, return continuations, and speeding up calling

2004-10-20 Thread Leopold Toetsch
Dan Sugalski wrote:
At 9:16 AM +0200 10/20/04, Leopold Toetsch wrote:

Right. Any time an actual continuation is created we need to walk back 
up the call chain and mark all the pending return continuations as 
non-recyclable.
Ok.
If we're not saving much on the register stacks (and with the switch in 
calls we won't be, which means we can drop the pushtop/poptop stuff on 
calls) it's easier to go with a one-frame-per-chunk setup.
Yep, it's easier. Let's start with that.
  r16 = BP_OFFS(r13)# get base bointer from interpreter
What, we have two registers dedicated? One for the interpreter pointer 
and one for the start of registers? I didn't realize that. If so, then 
nevermind.
Well, not quite ;) But it's absolutely no problem for e.g. PPC. It got 
plenty of callee-saved registers. For i386 the frame-pointer is 
currently being created in %ebx, interpreter access, which is basically 
rare, needs:

  mov -16(%ebp), %eax   # get interpreter
And that's needed too for getting the new frame pointer
  mov BP_OFFS(%eax), %ebx   # only for reload after invoke
Accessing e.g. a non-mapped I2 is already:
  mov 8(%ebx), %eax
which isn't worse then the old absolute address thingy.
leo


Re: cvs commit: parrot/jit/i386 jit_emit.h

2004-10-20 Thread Leopold Toetsch
Leopold Toetsch [EMAIL PROTECTED] wrote:

   reserve %ebp for register frame pointer

s/%ebp/%ebx/

Sorry,
leo


Re: Register stacks, return continuations, and speeding up calling

2004-10-20 Thread Dan Sugalski
At 6:13 PM +0200 10/20/04, Leopold Toetsch wrote:
Dan Sugalski wrote:
At 9:16 AM +0200 10/20/04, Leopold Toetsch wrote:

Right. Any time an actual continuation is created we need to walk 
back up the call chain and mark all the pending return 
continuations as non-recyclable.
Ok.
If we're not saving much on the register stacks (and with the 
switch in calls we won't be, which means we can drop the 
pushtop/poptop stuff on calls) it's easier to go with a 
one-frame-per-chunk setup.
Yep, it's easier. Let's start with that.
Good. Today I can manage eaiser. I'm not sure past that. :)
  r16 = BP_OFFS(r13)# get base bointer from interpreter
What, we have two registers dedicated? One for the interpreter 
pointer and one for the start of registers? I didn't realize that. 
If so, then nevermind.
Well, not quite ;) But it's absolutely no problem for e.g. PPC. It 
got plenty of callee-saved registers. For i386 the frame-pointer is 
currently being created in %ebx, interpreter access, which is 
basically rare, needs:

  mov -16(%ebp), %eax   # get interpreter
And that's needed too for getting the new frame pointer
  mov BP_OFFS(%eax), %ebx   # only for reload after invoke
Accessing e.g. a non-mapped I2 is already:
  mov 8(%ebx), %eax
which isn't worse then the old absolute address thingy.
'Kay, now I'm confused. I thought we were talking about removing the 
registers from out of the interpreter structure, which'd leave us 
needing two pointers, one for the interpreter struct and one for the 
registers. (With no link back from the register stack top to the 
interpreter) That not the case?
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


parrot authentication handlers

2004-10-20 Thread Jeff Horwitz
mod_parrot now supports authentication handlers.  i'm planning a release
in the next few days, including a whitepaper on its architecture, but
here's an example of what you can now do.  the following handler accepts
any basic authentication with a password of 'squawk' (correpsonding
httpd.conf follows the code).

.namespace [ 'MyAuthHandler' ]

# handler.imc
.sub _handler
.local pmc r
.local pmc ap_const
.local string pw
.local int status

find_global ap_const, 'Apache::Constants', 'ap_constants'

# get the request_rec object
find_type $I0, 'Apache::RequestRec'
r = new $I0

# decline if not the initial request
$I1 = r.'is_initial_req'( )
if $I1 != 1 goto auth_declined

(status, pw) = r.'get_basic_auth_pw'( )

if pw != 'squawk' goto auth_failure
$I0 = ap_const['OK']
goto auth_return_status

auth_failure:
$I0 = ap_const['HTTP_UNAUTHORIZED']
goto auth_return_status

auth_declined:
$I0 = ap_const['DECLINED']
goto auth_return_status

auth_return_status:
.pcc_begin_return
.return $I0
.pcc_end_return
.end

-

# httpd.conf
ParrotInit /tmp/init.pbc
ParrotLoad /tmp/Constants.pbc
ParrotLoad /tmp/RequestRec.pbc
ParrotLoad /tmp/handler.pbc
Directory /usr/local/apache2/htdocs/parrot/private
ParrotAuthenHandler MyAuthHandler
AuthType Basic
AuthName Parrot Test
Require valid-user
/Directory


-jeff



Re: Register stacks, return continuations, and speeding up calling

2004-10-20 Thread Leopold Toetsch
Dan Sugalski wrote:
'Kay, now I'm confused. I thought we were talking about removing the 
registers from out of the interpreter structure, which'd leave us 
needing two pointers, one for the interpreter struct and one for the 
registers. 
Ok, short summary of future layout of JIT regs:
itemPPC   i386

interpreter r13   -16(%ebp)
frame pointer   r16%ebx
Register addressing is done relative to the frame pointer, which will be 
in a register. The interpreter isn't used that often inside integer JIT 
 code, so it isn't in an register in i386 but is easily reloaded into one.

Currently the frame pointer and the interpreter are the same.
Code that branches (invoke) will reload the frame pointer from the 
interpreter.

leo


Re: Register stacks, return continuations, and speeding up calling

2004-10-20 Thread Dan Sugalski
At 9:09 PM +0200 10/20/04, Leopold Toetsch wrote:
Dan Sugalski wrote:
'Kay, now I'm confused. I thought we were talking about removing 
the registers from out of the interpreter structure, which'd leave 
us needing two pointers, one for the interpreter struct and one for 
the registers.
Ok, short summary of future layout of JIT regs:
itemPPC   i386

interpreter r13   -16(%ebp)
frame pointer   r16%ebx
Register addressing is done relative to the frame pointer, which 
will be in a register. The interpreter isn't used that often inside 
integer JIT  code, so it isn't in an register in i386 but is easily 
reloaded into one.

Currently the frame pointer and the interpreter are the same.
Code that branches (invoke) will reload the frame pointer from the 
interpreter.
Ah, OK. That make sense. (Which is probably a clue that it's time to 
go lie down, but that's a separate issue)

In that case I won't worry about it, and I think I know what I'd like 
to do with the interpreter, the register frame, and the register 
backing stack. I'll muddle it about some and see where it goes.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: parrot authentication handlers

2004-10-20 Thread Dan Sugalski
At 2:27 PM -0400 10/20/04, Jeff Horwitz wrote:
mod_parrot now supports authentication handlers.  i'm planning a release
in the next few days, including a whitepaper on its architecture, but
here's an example of what you can now do.
Wow. That's... cool. And a bit scary. But definitely cool. I could 
have an excessive amount of fun with this.

You will, I trust, keep us up to date on all the places where we're 
making life difficult? :)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: parrot authentication handlers

2004-10-20 Thread Leopold Toetsch
Jeff Horwitz [EMAIL PROTECTED] wrote:
 mod_parrot now supports authentication handlers.

awesome

 -jeff

leo


Re: parrot authentication handlers

2004-10-20 Thread Jeff Horwitz
 You will, I trust, keep us up to date on all the places where we're
 making life difficult? :)

for sure.  but it's actually been quite a smooth ride so far.  i have a
short list of problems i've had to deal with, and i'll forward them to the
list when i get the chance.

what i'd really like to see is a language that utilizes parrot objects
(AFAIK there isn't one right now).  once we have this, we can start
writing handlers in a high level language and REALLY have something to
show off.  ;-)

-jeff

On Wed, 20 Oct 2004, Dan Sugalski wrote:

 At 2:27 PM -0400 10/20/04, Jeff Horwitz wrote:
 mod_parrot now supports authentication handlers.  i'm planning a release
 in the next few days, including a whitepaper on its architecture, but
 here's an example of what you can now do.

 Wow. That's... cool. And a bit scary. But definitely cool. I could
 have an excessive amount of fun with this.

 --
   Dan

 --it's like this---
 Dan Sugalski  even samurai
 [EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk




Re: parrot authentication handlers

2004-10-20 Thread Dan Sugalski
At 3:38 PM -0400 10/20/04, Jeff Horwitz wrote:
  You will, I trust, keep us up to date on all the places where we're
 making life difficult? :)
for sure.  but it's actually been quite a smooth ride so far.  i have a
short list of problems i've had to deal with, and i'll forward them to the
list when i get the chance.
Great.
what i'd really like to see is a language that utilizes parrot objects
(AFAIK there isn't one right now).  once we have this, we can start
writing handlers in a high level language and REALLY have something to
show off.  ;-)
Well, actually scary though it may be, my Work Project uses 
parrot objects for everything. Whether this is a useful thing or 
not's an open question (the language lacks subroutines, for example) 
but...

On Wed, 20 Oct 2004, Dan Sugalski wrote:
 At 2:27 PM -0400 10/20/04, Jeff Horwitz wrote:
 mod_parrot now supports authentication handlers.  i'm planning a release
 in the next few days, including a whitepaper on its architecture, but
 here's an example of what you can now do.
 Wow. That's... cool. And a bit scary. But definitely cool. I could
  have an excessive amount of fun with this.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Externals in Repository (again) (was Re: Perl 6 Summary for 2004-10-01 through 2004-10-17)

2004-10-20 Thread Robert
I understand Dan's view that parrot should be 100% self contained, but I
really think its silly to inline CPAN modules into our CVS repository.
I have a compromise solution, which might satisfy Dan.
1. I create a new parrot-external-dependencies CVS repository.  All
external dependencies that Dan will not allow to be external go here.
2. I setup some CVS magic, so 'co parrot' does the right thing and puts
all thee things from the parrot-external-dependencies directory into a
'externals' directory or something.
2a. There will also be a parrot-base CVS module which will be the
contents of the existing one.
Actually, this can be expanded to have several different checkoutable
things in CVS:
maybe..
parrot-base
parrot-external-perl
parrot-external-icu
parrot-languages
and one
parrot
(or Parrot, if we want to be like the mozilla folks)
that gets them all.
There is another compromise, that says: CVS shouldn't have CPAN
modules.  But the release process for packages will include the tar
inside the tar.
Thoughts?
-R


Joshua Gatcomb  accidentally introduced a dependency
on
Config::IniFiles.  Since it is implemented in pure
perl he offered to
add it to the repository.  Warnock applies.
In the note offering to fix it, I also listed numerous
other scripts with non-core dependencies.  Dan, in
IRC, indicated that they all should have tickets on
them.  Before fixing parrotbench.pl with one of the
following solutions:
1.  inline Config::IniFiles with the author's
permission
2.  Use some other core module if possible
3.  Roll my own
4.  Revert back to previous non-module version



Re: parrot authentication handlers

2004-10-20 Thread Jeff Horwitz
On Wed, 20 Oct 2004, Dan Sugalski wrote:
 Well, actually scary though it may be, my Work Project uses
 parrot objects for everything. Whether this is a useful thing or
 not's an open question (the language lacks subroutines, for example)
 but...

in general, anything that can generate bytecode/PIR/PASM containing a
_handler subroutine in a parrot namespace would fit the bill.

-jeff



Re: parrot authentication handlers

2004-10-20 Thread Dan Sugalski
At 3:55 PM -0400 10/20/04, Jeff Horwitz wrote:
On Wed, 20 Oct 2004, Dan Sugalski wrote:
 Well, actually scary though it may be, my Work Project uses
 parrot objects for everything. Whether this is a useful thing or
 not's an open question (the language lacks subroutines, for example)
 but...
in general, anything that can generate bytecode/PIR/PASM containing a
_handler subroutine in a parrot namespace would fit the bill.
Hrm. I may have to introduce subs into the grammar for the language 
just so I can make this work.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Pathological Register Allocation Test Generator

2004-10-20 Thread Bill Coffman
Leo,

Thanks for your suggestions and comments.

On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Some remargs WRT gen{3,4}.pl:
 1) While these programs exhibit some worst case register layout it's
probably not a very typical layout.

Agreed.  The idea was to automate and compare to gcc.  There are
real-world tests already in the parrot test suite, but because I can
generate millions of such cases, one hope is to detect errors, and to
get some kind of performance metric, even if these programs are
artificial.  To get real metrics that people would pay attention to,
we might implement SPEC99, etc.

 2) RL programs have lexicals and globals access spread over the code like
in the generated gen.imc
 3) and that's intersparsed with highly local access to temporaries coming
from expression evaluations.

I think I could modify the tests so that the variables have a Poisson
distribution, as far as their usage distance goes (max number of lines
away from first use).  This would cause some of them to look more like
temporary variables, and be a somewhat better simulation of real
programs.

 I'd change the simulation program to use PMCs to allow for 2). Now when
 it comes to spilling, these lexicals or globals don't need a new
 storage, their live range can just get discarded and at the next usage
 of this lexical or global it just can be refetched[1]. Implementing this
 should already vastly improve the register allocation.

 [1] The refetching can of course be into a different Parrot register.
 Thus the usage of globals or lexicals wouldn't interfer and generate
 huge live ranges over the whole function as it currently does.
 

I don't quite understand this.  You think  I should create PMC
variables, instead of integers?  It's probably a good idea to have a
mix of the four types I suppose.

Once thing I can say is that the interference graph is pretty
conservatively generated.  When a variable is reassigned, it can be
treated as a new variable which can reduce register pressure, but I
don't think the code is doing that yet.  It sounds like you're talking
about implementing the interference graph better.  I may get into that
too, but for now, I want to get the rester coloring (mapping vars to
registers) working better.

Also, I'd like to be able to capture the effects of changes in metrics
(via scripts like those I posted) so we can see if different ideas
will actually improve the situation or not.  I don't even have any
runtime checks yet, but creating a score to see how much was spilled,
could be helpful.  That could be incorporated into the register
allocator itself (which already is, if you run with -d 0008, then grep
Spill).

 I don't know if we need some syntax to easily support lexicals or
 globals or if the register allocation can deduce that itself. But if a
 new syntax simplifies that, we put it in.

My preference is to not distinguish between various types of
variables, but to have the algorithms deal best with nodes based on
the structure of the graph(s).

 
 For 3) there is already a separate pass in
 imcc/reg_alloc.c:allocate_non_interfering().

Yes.  This function seems to find variables whose live ranges (life
ranges) are restricted to a single basic block, and assign them reg
28,29, or 30.  My preference is that low hanging fruit like temporary
vars should be assigned by the algorithm.  Such variables will tend
not to have much interference with others, and are thus probably
easier to color.  My experience with coloring, is that assigning low
degree (aka arity, #neighbors) nodes to fixed colors first, will
actually reduce the performance of the graph coloring algorithm.  It's
usually best to color the easy stuff last.

The algorithm I'd like to use is one that happens to work very well
for interference graphs, and it is very simple.  This algorithm
selects a node that interferes with the fewest other variables (lowest
degree), and removes it from the interference graph, and puts it into
a stack.  It then recalculates the lower degrees of each neighbor, and
iteratively removes all nodes from the graph in this way, pushing them
onto the stack.  It then, pops off the nodes from the stack and colors
them in lifo order, using the first available color.  Nodes that need
a color higher that 30, or whatever the max. turns out to be, are
spilled.  The spilled nodes are always in the densest part of the
graph, where something has to spill.

I believe this is essentially the algorithm Briggs/Chaitin used, and
some variation of it has been used in most compilers since the 70's. 
One exception is that certain variables want certain colors
(want_regno) -- those should be colored first, as the current code
does.

A disadvantage of the above algorithm is that it tends to randomly
select which nodes get spilled, among the densest parts.  Another
optimization to the algorithm is to incorporate a score that causes
the most frequently accessed nodes not to get spilled.  (The