HP-UX 11.00 back on track again

2001-10-29 Thread H . Merijn Brand

Automated smoke report for patch Oct 28 20:00:01 2001 UTC
  v0.02 on hpux using cc version B.11.11.02
O = OK
F = Failure(s), extended report at the bottom
? = still running or test results not (yet) available
Build failures during:   - = unknown
c = Configure, m = make, t = make test-prep

 Configuration
---  
O O  
O O  nv=double
O O  iv=int
O O  iv=int --define nv=double
O O  iv=long
O O  iv=long --define nv=double
| |
| +- --debugging
+--- normal

-- 
H.Merijn Brand        Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.6.1, 5.7.2 & 629 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
  WinNT 4, Win2K pro & WinCE 2.11.  Smoking perl CORE: [EMAIL PROTECTED]
http:[EMAIL PROTECTED]/   [EMAIL PROTECTED]
send smoke reports to: [EMAIL PROTECTED], QA: http://qa.perl.org




Re: Parameter passing conventions

2001-10-29 Thread Gregor N. Purdy

Dan --

On Fri, 2001-10-26 at 16:38, Dan Sugalski wrote:
> Okay, here are the conventions.

Looks like I'm going to have to write some real logic in jakoc
pretty soon...

> *) The callee is responsible for saving and restoring non-scratch registers

Nice for callee since if its work fits into five regs of each type
it's not going to have to do any saves or restores. Caller, though, is
going to have to vacate those regs. So, if caller got args in those
regs and then calls anyone else, it has to move them from those regs (or
save them).

> *) The first five registers (I0-I4, S0-S4, P0-P4, N0-N4) are scratch and do
> not have to be preserved by the callee

Still thinking about this... We are reducing the overall number of reg
copies going on by adding these special cases. I just wish we had an
approach that was both uniform (simple, no special cases) and fast too.

> *) In *ALL* cases where the stack is used, things are put on the stack in
> *reverse* order. The topmost stack element *must* be the integer count of
> the number of elements on the stack

OK.
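For concreteness, the stack discipline described above (elements pushed in reverse order, with the count as the topmost entry, and the callee cleaning up) can be sketched in Python. The helper names are illustrative only, not Parrot ops:

```python
# Sketch of the stack convention: elements pushed in reverse order,
# with the element count as the topmost entry. Names are illustrative.

def push_frame(stack, args):
    """Caller pushes args in reverse order, then the count on top."""
    for a in reversed(args):
        stack.append(a)
    stack.append(len(args))

def pop_frame(stack):
    """Callee pops the count, then the args (which come off in order)."""
    n = stack.pop()
    return [stack.pop() for _ in range(n)]

stack = []
push_frame(stack, ["x", "y", "z"])
assert stack[-1] == 3                    # topmost element is the count
assert pop_frame(stack) == ["x", "y", "z"]
assert stack == []                       # callee left the stack clean
```

Note that the zero-args case falls out naturally: the frame is just a single `0` on top of the stack.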

> *) The callee is responsible for making sure the stack is cleaned off.

So, in the case of zero args, do we still push a zero on the stack to
make a proper frame? I think yes...

 
> Inbound args
>
> If the called subroutine has a fixed number of arguments, they will be
> placed in the first five registers of the appropriate register types. First
> integer goes in I0, second in I1, and so on.
>
> If there are too many arguments of a particular type the overflow go on the
> stack. If there are a variable number of arguments, all the *non* fixed
> args go on the stack.

So for right now, just pretend that all Jako subroutines take a variable
number of args... :) (Until I get the time to write fully compatible
conventions in jakoc, anyway).

Can we have ops to inquire on the type of the topmost stack entry?
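As a concrete reading of the inbound-args rule quoted above: with five argument registers per type, the first five args of each type land in registers and any overflow goes on the stack (reverse order, count on top). The register names and helper below are made up for illustration:

```python
# Sketch: place fixed args of one type into registers I0..I4,
# with overflow pushed on the stack per the convention. Illustrative only.

NUM_ARG_REGS = 5

def place_int_args(args):
    regs = {f"I{i}": v for i, v in enumerate(args[:NUM_ARG_REGS])}
    overflow = args[NUM_ARG_REGS:]
    # Overflow args go on the stack in reverse order, count on top.
    stack = list(reversed(overflow)) + [len(overflow)]
    return regs, stack

regs, stack = place_int_args([10, 20, 30, 40, 50, 60, 70])
assert regs == {"I0": 10, "I1": 20, "I2": 30, "I3": 40, "I4": 50}
assert stack == [70, 60, 2]   # two overflow args, count on top
```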

[snip]


Regards,
 
-- Gregor
 _ 
/ perl -e 'srand(-2091643526); print chr rand 90 for (0..4)'  \

   Gregor N. Purdy  [EMAIL PROTECTED]
   Focus Research, Inc.http://www.focusresearch.com/
   8080 Beckett Center Drive #203   513-860-3570 vox
   West Chester, OH 45069   513-860-3579 fax
\_/




Re: Parameter passing conventions

2001-10-29 Thread Gregor N. Purdy

Sam --

> > Okay, here are the conventions.
>
> Great.  Anyone want to offer up some examples or should I just wait for
> Jako support to see this in action?

I'll be working on making jakoc support the convention, but it may
take a while with my day job duties as they are. If I can get it in
quickly I will, but please continue breathing :)

The first step I'm going to take is to start putting the arg and
result counts on the stack, and remove the stack rotation stuff.
Then, I'll start thinking about how I want to wrap up the conventions
so I don't have to think about them more than once.

Hey! We should be thinking about the minimum amount of stuff we need
to do to support separate compilation so we can implement the
conventions in more than one of the Parrot-targeted languages and do
a demo of mixed language programming. Here's a partial list:

  * export table segment in packfile.

Put the subroutine entry points here.

  * import table segment in packfile (fixup table sufficient for this?)

Put the unresolved external symbols here.

  * possibly unify all this into symbol table segment.

  * linker that takes multiple pbc files and concatenates them, doing
    relocation to produce a single pbc file.
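The export-table-plus-linker idea above can be sketched as a toy model: each "pbc" module carries bytecode and an export table (symbol to offset), and a naive linker concatenates the bytecode while rebasing each module's exports. Everything here is illustrative; real packfile segments would look quite different:

```python
# Toy model of separate compilation: each module has bytecode plus an
# export table mapping symbol -> entry-point offset. A naive linker
# concatenates code and relocates each module's exports by the offset
# at which its code lands. Purely illustrative.

def link(modules):
    code, exports, base = [], {}, 0
    for mod in modules:
        for sym, off in mod["exports"].items():
            exports[sym] = base + off      # relocate entry point
        code.extend(mod["code"])
        base += len(mod["code"])
    return {"code": code, "exports": exports}

a = {"code": [1, 2, 3], "exports": {"main": 0}}
b = {"code": [4, 5],    "exports": {"helper": 1}}
linked = link([a, b])
assert linked["exports"] == {"main": 0, "helper": 4}
assert len(linked["code"]) == 5
```

An import table would be the mirror image: unresolved symbols recorded per module, patched from the merged export table at link time (or looked up lazily at runtime).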


Regards
 
-- Gregor




Re: Parameter passing conventions

2001-10-29 Thread Dan Sugalski

At 08:43 AM 10/29/2001 -0500, Gregor N. Purdy wrote:
> Dan --
>
> On Fri, 2001-10-26 at 16:38, Dan Sugalski wrote:
> > Okay, here are the conventions.
>
> Looks like I'm going to have to write some real logic in jakoc
> pretty soon...

Ahhh! The horror! :-)

Seriously, the conventions are geared towards full-blown compilers with a 
reasonable register ordering module at the very least, which isn't 
unreasonable to expect for a language implementation. (And folks that want 
to fake out using a stack will probably work with the top few registers to 
avoid having to deal with parameter conflicts)

> > *) The callee is responsible for saving and restoring non-scratch registers
>
> Nice for callee since if its work fits into five regs of each type
> it's not going to have to do any saves or restores. Caller, though, is
> going to have to vacate those regs. So, if caller got args in those
> regs and then calls anyone else, it has to move them from those regs (or
> save them).

Caller will only have to vacate those registers if they're being used and 
need to last past the call to the function. If the register assignment 
algorithm's clever (which is a big if) the lifetime of temporaries will 
keep function calls in mind.

> > *) The first five registers (I0-I4, S0-S4, P0-P4, N0-N4) are scratch
> > and do not have to be preserved by the callee
>
> Still thinking about this... We are reducing the overall number of reg
> copies going on by adding these special cases. I just wish we had an
> approach that was both uniform (simple, no special cases) and fast too.

You, and me, and about a zillion other people. Generally speaking the 
choices are fast, uniform, and scalable. Choose two.

This is really only an issue for folks writing code generators by hand, and 
with 32 of each register type most people won't hit it. Plain parser 
add-ons will use the core code generator, so they won't need to worry about it.

> > *) The callee is responsible for making sure the stack is cleaned off.
>
> So, in the case of zero args, do we still push a zero on the stack to
> make a proper frame? I think yes...

If the function is listed as taking a variable number of args, yes. 
Functions marked as taking no args at all don't get anything put on the stack.

 
> > Inbound args
> >
> > If the called subroutine has a fixed number of arguments, they will be
> > placed in the first five registers of the appropriate register types.
> > First integer goes in I0, second in I1, and so on.
> >
> > If there are too many arguments of a particular type the overflow go on
> > the stack. If there are a variable number of arguments, all the *non*
> > fixed args go on the stack.
>
> So for right now, just pretend that all Jako subroutines take a variable
> number of args... :) (Until I get the time to write fully compatible
> conventions in jakoc, anyway).

That's fine. A perfectly workable solution.

> Can we have ops to inquire on the type of the topmost stack entry?

In the works, yep.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Parameter passing conventions

2001-10-29 Thread Dan Sugalski

At 08:52 AM 10/29/2001 -0500, Gregor N. Purdy wrote:
> The first step I'm going to take is to start putting the arg and
> result counts on the stack, and remove the stack rotation stuff.

Leave the rotate opcode, though. That might come in handy for the 
Forth/Scheme/Postscript folks, once we have them.

> Hey! We should be thinking about the minimum amount of stuff we need
> to do to support separate compilation so we can implement the
> conventions in more than one of the Parrot-targeted languages and do
> a demo of mixed language programming.

Darned straight. Anyone want to take a shot at a proposed bytecode file 
format update?

> Here's a partial list:
>
>   * export table segment in packfile.
>
> Put the subroutine entry points here.

Yep.

>   * import table segment in packfile (fixup table sufficient for this?)
>
> Put the unresolved external symbols here.

Dunno if we need this. We can leave symbol resolution to runtime when we 
come across them, but we probably ought to have it for those languages that 
want full linktime resolution.

>   * possibly unify all this into symbol table segment.

That would be spiffy-keen. :)

>   * linker that takes multiple pbc files and concatenates them, doing
> relocating to produce a single pbc file.

While I don't think we need this for normal use, it could be quite handy. 
(I don't want to require linking before running--loading up module bytecode 
at runtime is definitely a requirement)

Dan





Re: Parameter passing conventions

2001-10-29 Thread Gregor N. Purdy

Dan --

> > Looks like I'm going to have to write some real logic in jakoc
> > pretty soon...
>
> Ahhh! The horror! :-)

:)

> Seriously, the conventions are geared towards full-blown compilers with a
> reasonable register ordering module at the very least, which isn't
> unreasonable to expect for a language implementation. (And folks that want
> to fake out using a stack will probably work with the top few registers to
> avoid having to deal with parameter conflicts)

I am thinking about having Jako take the position that it doesn't use
those regs except for calls, and values are immediately copied from
those regs to the real regs for the variables as the tail end of the
callee part of the subroutine linkage. That will at least permit Jako
to be correct even if it isn't as efficient as possible. Later I can
worry about being smarter. Heck, right now Jako sometimes generates
code with a branch to the next instruction (ah, the joy of simple
code generators...).

> > > *) The callee is responsible for saving and restoring non-scratch registers
> >
> > Nice for callee since if its work fits into five regs of each type
> > it's not going to have to do any saves or restores. Caller, though, is
> > going to have to vacate those regs. So, if caller got args in those
> > regs and then calls anyone else, it has to move them from those regs (or
> > save them).
>
> Caller will only have to vacate those registers if they're being used and
> need to last past the call to the function. If the register assignment
> algorithm's clever (which is a big if) the lifetime of temporaries will
> keep function calls in mind.

Big if indeed. At least for Jako's near future.

> > > *) The first five registers (I0-I4, S0-S4, P0-P4, N0-N4) are scratch
> > > and do not have to be preserved by the callee
> >
> > Still thinking about this... We are reducing the overall number of reg
> > copies going on by adding these special cases. I just wish we had an
> > approach that was both uniform (simple, no special cases) and fast too.
>
> You, and me, and about a zillion other people. Generally speaking the
> choices are fast, uniform, and scalable. Choose two.

Hmm. I tried reading section 29 (Subroutine Linkage) of the MMIXware
book (pages 32-34) for inspiration, but I didn't see how anything there
could help us. MMIX has 256 logical general-purpose 64-bit registers.
That's a handy reg size since a reasonable float can sit in there as
well as an unreasonable int. The local-marginal-global register
distinction used by MMIX is interesting, but I think it might lose its
appeal with 4 distinct typed register files.

Knuth does make the statement:

These conventions for parameter passing are admittedly a bit
confusing in the general case, and I suppose people who use them
extensively might sometime find themselves talking about the
infamous MMIX register shuffle. However, there is good use for
subroutines that convert a sequence of register contents like
(x, a, b, c) into (f, a, b, c) where f is a function of a, b,
and c but not x. Moreover PUSHGO and POP can be implemented with
great efficiency, and subroutine linkage tends to be a significant
bottleneck when other conventions are used.

It's that last sentence that got my attention... But, I still don't
know if we could make use of any of those ideas. I can imagine
having separate L and G for each register file, and otherwise
following the same procedure, but I suspect we'd be unhappy with
the MMIX conventions without having a larger number of registers.

BTW, how did you choose 32 for the number of regs?

> This is really only an issue for folks writing code generators by hand, and
> with 32 of each register type most people won't hit it. Plain parser
> add-ons will use the core code generator, so they won't need to worry about it.

Yeah. I'm trying very hard not to put anything really sophisticated
into jakoc (at least not yet). Right now I can still tweak things
reasonably well. If I add much more complexity, I'm going to have
to actually write a real compiler, and if I write a real compiler I
probably won't be able to resist the temptation to turn Jako into the
language I *really* wish I had, and that would be a bigger project.

> > > *) The callee is responsible for making sure the stack is cleaned off.
> >
> > So, in the case of zero args, do we still push a zero on the stack to
> > make a proper frame? I think yes...
>
> If the function is listed as taking a variable number of args, yes.
> Functions marked as taking no args at all don't get anything put on the stack.

I'm thinking yes because of stack unwinding. Don't we need to have
parity between return addresses on their stack and frames of args
on their stack? Oh wait. We're popping (restoring) those off the stack
on subroutine entry, so in general the arg stack should be empty
most of the time, right? Adding to that the fact that most of the
time our args and results will be passed in regs, and I guess I
can see that we won't need it. Except for 

Re: Parameter passing conventions

2001-10-29 Thread Dan Sugalski

At 11:17 AM 10/29/2001 -0500, Gregor N. Purdy wrote:
> > > > *) The first five registers (I0-I4, S0-S4, P0-P4, N0-N4) are scratch
> > > > and do not have to be preserved by the callee
> > >
> > > Still thinking about this... We are reducing the overall number of reg
> > > copies going on by adding these special cases. I just wish we had an
> > > approach that was both uniform (simple, no special cases) and fast too.
> >
> > You, and me, and about a zillion other people. Generally speaking the
> > choices are fast, uniform, and scalable. Choose two.

> Hmm. I tried reading section 29 (Subroutine Linkage) of the MMIXware
> book (pages 32-34) for inspiration, but I didn't see how anything there
> could help us. MMIX has 256 logical general-purpose 64-bit registers.
> That's a handy reg size since a reasonable float can sit in there as
> well as an unreasonable int. The local-marginal-global register
> distinction used by MMIX is interesting, but I think it might lose its
> appeal with 4 distinct typed register files.
>
> Knuth does make the statement:
>
>     These conventions for parameter passing are admittedly a bit
>     confusing in the general case, and I suppose people who use them
>     extensively might sometime find themselves talking about the
>     infamous MMIX register shuffle. However, there is good use for
>     subroutines that convert a sequence of register contents like
>     (x, a, b, c) into (f, a, b, c) where f is a function of a, b,
>     and c but not x. Moreover PUSHGO and POP can be implemented with
>     great efficiency, and subroutine linkage tends to be a significant
>     bottleneck when other conventions are used.
>
> It's that last sentence that got my attention... But, I still don't
> know if we could make use of any of those ideas. I can imagine
> having separate L and G for each register file, and otherwise
> following the same procedure, but I suspect we'd be unhappy with
> the MMIX conventions without having a larger number of registers.

I'll have to snag that manual next time I'm around a good bookstore. I've 
not read it as of yet, and Knuth generally has good things to say.

A split between local, marginal, and global registers would be an 
interesting thing to do, and I can see it making the code more elegant. I 
worry about it making things more complex, though, especially with us 
already having multiple register types. (We'd double or triple the number 
of register types essentially, and to some extent blow cache even more than 
we do now. Might be a win in other ways, though. I'll have to ponder a bit)

> BTW, how did you choose 32 for the number of regs?

Picked it out of the air. :)

Seriously, I wanted a power-of-two number, I wanted the resulting size of a 
register file to be equal to or smaller than your average page size (512 
bytes for most folks IIRC) and I wanted to be able to encode the register 
number and type in a single byte if it turned out that the overhead of 
decoding was smaller than the speed hit we took from the extra bus 
bandwidth wasting a full 32 bit word for each parameter.

So, the two-bit type limits us to 64 registers max, and that seemed a bit 
too big in the general case. 16 was too few by a bit (most of my compiler 
books say that's not quite enough for most code, and you'll end up with 
overflow to the stack to handle temps), so that left 32. Still a bit big in 
some cases, especially considering we have four full sets of registers, but 
we'll see how that goes.

> Yeah. I'm trying very hard not to put anything really sophisticated
> into jakoc (at least not yet). Right now I can still tweak things
> reasonably well. If I add much more complexity, I'm going to have
> to actually write a real compiler, and if I write a real compiler I
> probably won't be able to resist the temptation to turn Jako into the
> language I *really* wish I had, and that would be a bigger project.

And this would be a bad thing because? (Well, besides the demands on what 
little free time you might have now, but that's not our problem... :)

> > > > *) The callee is responsible for making sure the stack is cleaned off.
> > >
> > > So, in the case of zero args, do we still push a zero on the stack to
> > > make a proper frame? I think yes...
> >
> > If the function is listed as taking a variable number of args, yes.
> > Functions marked as taking no args at all don't get anything put on
> > the stack.
>
> I'm thinking yes because of stack unwinding. Don't we need to have
> parity between return addresses on their stack and frames of args
> on their stack?

Sort of. The only place we really need to have it is for the exception 
handling, which needs to quickly unwind the register stacks, but I'm 
thinking we'll push the addresses of the current register files when we 
push an exception handler, and restore them (along with the stack) when we 
catch an exception.

> Oh wait. We're popping (restoring) those off the stack
> on subroutine entry, so in general the arg stack should be empty
> most of the time, right?

I don't know that it'll be empty all the time, as 

Re: Parameter passing conventions

2001-10-29 Thread Gregor N. Purdy

Dan --

[snip]

> I'll have to snag that manual next time I'm around a good bookstore. I've
> not read it as of yet, and Knuth generally has good things to say.

You can grab PDFs here:

http://link.springer.de/link/service/series/0558/tocs/t1750.htm

Of course, you can also browse around on Knuth's site for other related
stuff...

http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html

> A split between local, marginal, and global registers would be an
> interesting thing to do, and I can see it making the code more elegant. I
> worry about it making things more complex, though, especially with us
> already having multiple register types. (We'd double or triple the number
> of register types essentially, and to some extent blow cache even more than
> we do now. Might be a win in other ways, though. I'll have to ponder a bit)

Yeah, I didn't like the idea of proliferating that more either. I still
sometimes dream about a single register file of N regs into which we can
put whatever we want. Each block of registers has room for the reg
contents and the type info too. Seems you've got some of the support for
that figured out in the stack already. Just declare that either (a) it
is illegal (or behavior undefined) to do

  set $2, 5
  set $3, "foo bar"
  add $1, $2, $3

[just because we have higher-level data types than a real machine
doesn't mean we can't still have general-purpose registers, I think]

or (b) that if you do something numeric with a register that is
non-numeric type mucking happens behind the scenes and throws an
exception if there is a problem. Certainly this wouldn't be surprising
to anyone who had been looking at what we do with PMCs and arithmetic
ops.

If we ever did move to such a single-register-file model, I'd support
looking seriously at the calling conventions of MMIX to see if we can
get the appropriate performance characteristics. And, BTW, we have
4*32 = 128 regs now. We could even match the logical register count
of MMIX (256) with only a doubling of total register count. And, if
we ever determined we needed another kind of register (such as one
that can be used for address arithmetic, since INTVAL doesn't cut it),
we wouldn't have to add a fifth file, we'd just add another type
(thinking again about the stack implementation).
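The single-register-file idea described above (each register slot carrying its contents plus a type tag, with option (b): numeric ops on non-numeric registers throwing) can be sketched as follows. Tags and behavior are illustrative, not a proposal for actual Parrot semantics:

```python
# Sketch of a single tagged register file: each slot holds a value plus
# a type tag, and a numeric op type-checks its operands, throwing on a
# mismatch (option (b) from the message). Illustrative only.

class Reg:
    def __init__(self, tag=None, value=None):
        self.tag, self.value = tag, value

def set_reg(regs, n, value):
    tag = "int" if isinstance(value, int) else "str"
    regs[n] = Reg(tag, value)

def add(regs, dst, a, b):
    if regs[a].tag != "int" or regs[b].tag != "int":
        raise TypeError("non-numeric register in add")
    regs[dst] = Reg("int", regs[a].value + regs[b].value)

regs = [Reg() for _ in range(8)]
set_reg(regs, 2, 5)
set_reg(regs, 3, "foo bar")
try:
    add(regs, 1, 2, 3)        # the example from the message: throws
except TypeError:
    pass
set_reg(regs, 3, 7)
add(regs, 1, 2, 3)
assert regs[1].value == 12
```

Adding a new register kind under this model is just a new tag, not a fifth register file.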

[snip]

> > Yeah. I'm trying very hard not to put anything really sophisticated
> > into jakoc (at least not yet). Right now I can still tweak things
> > reasonably well. If I add much more complexity, I'm going to have
> > to actually write a real compiler, and if I write a real compiler I
> > probably won't be able to resist the temptation to turn Jako into the
> > language I *really* wish I had, and that would be a bigger project.
>
> And this would be a bad thing because? (Well, besides the demands on what
> little free time you might have now, but that's not our problem... :)

It might be a bad thing because Jako would then not be a little demo
language. I suppose I could start from scratch, but then I'd have to
come up with another language name (oh the horrors!)

[snip]


Regards,
 
-- Gregor




Re: Parameter passing conventions

2001-10-29 Thread Gregor N. Purdy

Dan --

You can also look at section 1.4.1' of

http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz

for another view of subroutine linkage from the upcoming TAOCP.


Regards,

-- Gregor




Re: String rationale

2001-10-29 Thread Tom Hughes

In message [EMAIL PROTECTED]
  Dan Sugalski [EMAIL PROTECTED] wrote:

> At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote:
>
> > Attached is my first pass at this - it's not fully ready yet but
> > is something for people to cast an eye over before I spend lots of
> > time going down the wrong path ;-)
>
> It looks pretty good on first glance.

I've done a bit more work now, and the latest version is attached.

This version can do transcoding. The intention is that there will be
some sort of cache in chartype_lookup_transcoder to avoid repeating
the expensive lookups by name too much.

One interesting question is who is responsible for transcoding
from character set A to character set B - is it A or B? and how
about the other way?

My code currently allows either set to provide the transform on the
grounds that otherwise the unicode module would have to either know
how to convert to everything else or from everything else.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


# This is a patch for parrot to update it to parrot-ns
# 
# To apply this patch:
# STEP 1: Chdir to the source directory.
# STEP 2: Run the 'applypatch' program with this patch file as input.
#
# If you do not have 'applypatch', it is part of the 'makepatch' package
# that you can fetch from the Comprehensive Perl Archive Network:
# http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz
# In the above URL, 'x' should be 2 or higher.
#
# To apply this patch without the use of 'applypatch':
# STEP 1: Chdir to the source directory.
# If you have a decent Bourne-type shell:
# STEP 2: Run the shell with this file as input.
# If you don't have such a shell, you may need to manually create/delete
# the files/directories as shown below.
# STEP 3: Run the 'patch' program with this file as input.
#
# These are the commands needed to create/delete files/directories:
#
mkdir 'chartypes'
chmod 0755 'chartypes'
mkdir 'encodings'
chmod 0755 'encodings'
rm -f 'transcode.c'
rm -f 'strutf8.c'
rm -f 'strutf32.c'
rm -f 'strutf16.c'
rm -f 'strnative.c'
rm -f 'include/parrot/transcode.h'
rm -f 'include/parrot/strutf8.h'
rm -f 'include/parrot/strutf32.h'
rm -f 'include/parrot/strutf16.h'
rm -f 'include/parrot/strnative.h'
touch 'chartype.c'
chmod 0644 'chartype.c'
touch 'chartypes/unicode.c'
chmod 0644 'chartypes/unicode.c'
touch 'chartypes/usascii.c'
chmod 0644 'chartypes/usascii.c'
touch 'encoding.c'
chmod 0644 'encoding.c'
touch 'encodings/singlebyte.c'
chmod 0644 'encodings/singlebyte.c'
touch 'encodings/utf16.c'
chmod 0644 'encodings/utf16.c'
touch 'encodings/utf32.c'
chmod 0644 'encodings/utf32.c'
touch 'encodings/utf8.c'
chmod 0644 'encodings/utf8.c'
touch 'include/parrot/chartype.h'
chmod 0644 'include/parrot/chartype.h'
touch 'include/parrot/encoding.h'
chmod 0644 'include/parrot/encoding.h'
#
# This command terminates the shell and need not be executed manually.
exit
#
 End of Preamble 

 Patch data follows 
diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST'
Index: ./MANIFEST
*** ./MANIFEST  Sun Oct 28 17:11:21 2001
--- ./MANIFEST  Sun Oct 28 17:11:07 2001
***
*** 1,5 
--- 1,8 
  assemble.pl
  ChangeLog
+ chartype.c
+ chartypes/unicode.c
+ chartypes/usascii.c
  classes/genclass.pl
  classes/intclass.c
  classes/scalarclass.c
***
*** 15,20 
--- 18,28 
  docs/parrotbyte.pod
  docs/strings.pod
  docs/vtables.pod
+ encoding.c
+ encodings/singlebyte.c
+ encodings/utf8.c
+ encodings/utf16.c
+ encodings/utf32.c
  examples/assembly/bsr.pasm
  examples/assembly/call.pasm
  examples/assembly/euclid.pasm
***
*** 30,35 
--- 38,45 
  global_setup.c
  hints/mswin32.pl
  hints/vms.pl
+ include/parrot/chartype.h
+ include/parrot/encoding.h
  include/parrot/events.h
  include/parrot/exceptions.h
  include/parrot/global_setup.h
***
*** 46,56 
  include/parrot/runops_cores.h
  include/parrot/stacks.h
  include/parrot/string.h
- include/parrot/strnative.h
- include/parrot/strutf16.h
- include/parrot/strutf32.h
- include/parrot/strutf8.h
- include/parrot/transcode.h
  include/parrot/trace.h
  include/parrot/unicode.h
  interpreter.c
--- 56,61 
***
*** 108,117 
  runops_cores.c
  stacks.c
  string.c
- strnative.c
- strutf16.c
- strutf32.c
- strutf8.c
  test_c.in
  test_main.c
  Test/More.pm
--- 113,118 
***
*** 129,135 
  t/op/time.t
  t/op/trans.t
  trace.c
- transcode.c
  Types_pm.in
  vtable_h.pl
  vtable.tbl
--- 130,135 
diff -c 'parrot/Makefile.in' 'parrot-ns/Makefile.in'
Index: ./Makefile.in
*** ./Makefile.in   Wed Oct 24 19:23:47 2001
--- ./Makefile.in   Sat Oct 27 15:02:45 2001
***
*** 11,19 
  $(INC)/pmc.h $(INC)/resources.h
  
  O_FILES = global_setup$(O) interpreter$(O) parrot$(O) register$(O) \
! core_ops$(O) memory$(O) packfile$(O) stacks$(O) string$(O) strnative$(O) \
! strutf8$(O) strutf16$(O) strutf32$(O) transcode$(O) runops_cores$(O) \
! trace$(O) vtable_ops$(O) 

RE: String rationale

2001-10-29 Thread Stephen Howard

You might consider requiring all character sets be able to convert to
Unicode, and otherwise only have to know how to convert other
character sets to its own set.





RE: String rationale

2001-10-29 Thread Dan Sugalski

At 02:52 PM 10/29/2001 -0500, Stephen Howard wrote:
> You might consider requiring all character sets be able to convert to Unicode,

That's already a requirement. All character sets must be able to go to or 
come from Unicode. They can do others if they want, but it's not required. 
(And we'll have to figure out how to allow that reasonably efficiently)

Dan





RE: String rationale

2001-10-29 Thread Stephen Howard

right.  I had just keyed in on this from Tom's message:

> My code currently allows either set to provide the transform on the
> grounds that otherwise the unicode module would have to either know
> how to convert to everything else or from everything else.

...which seemed to posit that the Unicode module could be responsible for
all the transcodings to and from its own character set, which
seemed backwards to me.

-Stephen






Anybody write a threaded dispatcher yet?

2001-10-29 Thread Ken Fox

Anybody do a gcc-specific goto *pc dispatcher
for Parrot yet? On some architectures it really
cooks.

- Ken
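The gcc `goto *pc` trick itself is C-specific, but the idea behind threaded dispatch, that the instruction stream holds handler addresses directly so there is no central switch or table lookup, can be mimicked in Python by storing function references in the "bytecode". This is an illustrative analogy, not a Parrot dispatcher:

```python
# Analogy to threaded dispatch: the "bytecode" stream holds handler
# references inline, so the main loop jumps straight through them with
# no opcode -> handler table lookup. Illustrative only.

def make_ops(state):
    def push(pc, code):
        state.append(code[pc + 1]); return pc + 2   # operand follows op
    def add(pc, code):
        b, a = state.pop(), state.pop(); state.append(a + b); return pc + 1
    def halt(pc, code):
        return None                                 # stop the loop
    return push, add, halt

state = []
push, add, halt = make_ops(state)
code = [push, 2, push, 3, add, halt]   # handlers stored inline

pc = 0
while pc is not None:
    pc = code[pc](pc, code)            # dispatch through the reference

assert state == [5]
```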



Re: Anybody write a threaded dispatcher yet?

2001-10-29 Thread Dan Sugalski

At 03:33 PM 10/29/2001 -0500, Ken Fox wrote:
> Anybody do a gcc-specific goto *pc dispatcher
> for Parrot yet? On some architectures it really
> cooks.

That's a good question. There was talk and benchmark numbers from a variety 
of different dispatchers.

C'mon folks, kick in the code. I'll weld dispatch selection into 
configure.pl if I've got the dispatchers to work from...

Dan





New patch

2001-10-29 Thread Daniel Grunblatt

OK, there is another workaround to make pbc2c.pl work which still uses the
goto model, so speed is not affected, but it's harder to maintain since
it's not as generic as the other one.
Daniel.



Index: pbc2c.pl
===
RCS file: /home/perlcvs/parrot/pbc2c.pl,v
retrieving revision 1.3
diff -r1.3 pbc2c.pl
68a69
>  my $op;
70a72
>  my @pcs = ();
88c90
<  int    i;
---
>  int    cur_opcode, to;
142c144
<  my $op;
---
>  my $jump;
145a148
>  $jump .= "case " . $pc . ": goto PC_" . $pc . ";\n";
162a166
>  $source = "cur_opcode = " . $pc . ";\n" . $source if ($op->full_name eq 'bsr_ic');
172a177,187
>  JUMP: {
>  switch (to) {
>  case 0: goto PC_0;
>  END_C
> 
>  print $jump;
>  print <<END_C;
>  default: exit(0);
>  }
>  }
> 
189c204,208
<    return sprintf("goto PC_%d", $addr);
---
>    if ($op->full_name =~ 'ret') {
>      return sprintf("to = dest;\ngoto JUMP");
>    } else {
>      return sprintf("goto PC_%d", $addr);
>    }
201c220,224
<    return sprintf("goto PC_%d", $pc + $offset);
---
>    if ($op->full_name eq 'jump_i') {
>      return sprintf("to = " . $pc . "+" . $offset . ";\ngoto JUMP");
>    } else {
>      return sprintf("goto PC_%d", $pc + $offset);
>    }


Index: pbc2c.pl
===
RCS file: /home/perlcvs/parrot/pbc2c.pl,v
retrieving revision 1.3
diff -r1.3 pbc2c.pl
70a71
>  my @functions = ();
79a81,82
> 
>  void start();
85a89,90
>  struct Parrot_Interp * interpreter;
> 
88,89d92
<  int    i;
<  struct Parrot_Interp * interpreter;
134a138,142
>  print <<END_C;
>  start();
>  return 0;
> }
> END_C
163c171,172
<  printf("PC_%d: { /* %s */\n%s}\n\n", $pc, $op->full_name, $source);
---
>  push(@functions, $pc);
>  printf("int\nPC_%d(int cur_opcode) /* %s */\n{\n%s}\n\n", $pc, $op->full_name, $source);
168,171c177,181
<  PC_$new_pc:
<  PC_0: {
<  exit(0);
<  }
---
>  void
>  start()
>  {
>  int (*functions[$pc])(int);
>  int    j = 1;
173c183,191
<  return 0;
---
>  END_C
>  foreach (0..scalar(@functions) - 1) {
>  print "functions[" . $functions[$_] . "] = (int (*)(int))PC_" . $functions[$_] . ";\n";
>  }
> 
>  print <<END_C;
> 
>  while (j) { j = (*functions[j])(j); };
>  exit(0);
189c207
<    return sprintf("goto PC_%d", $addr);
---
>    return sprintf "return (" . $addr . ")";
201c219
<    return sprintf("goto PC_%d", $pc + $offset);
---
>    return sprintf "return (cur_opcode+" . $offset . ")";



Re: New patch

2001-10-29 Thread Daniel Grunblatt

Just to make it clear: both of them still need a LOT of work, but I don't
know which one I should stick with.

On Mon, 29 Oct 2001, Daniel Grunblatt wrote:

> OK, there is another workaround to make pbc2c.pl work which still uses the
> goto model so speed is not affected but it's harder to maintain since
> it's not as generic as the other one.
>   Daniel.






Re: Schedule of things to come

2001-10-29 Thread Nathan Torkington

John Siracusa writes:
> > I think we're due out in reasonably good alpha/beta shape for the summer.
> Heh, the phrase "suitably vague" springs to mind... :)

There's a good reason for that, for why I've tried hard to avoid
giving promises of when things would be ready.  Have you seen Apache 2
and Mozilla slip their schedules?  I'm making everyone take things
feature-by-feature, and we'll give a release schedule when we can see
the end in sight and not before.  What would be the point of naming an
arbitrary date when we don't even know when Larry will finish his
Apocalypses?  It seems crazy to have dates before you have
specifications of the final system.

Nat 




RE: String rationale

2001-10-29 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
          Stephen Howard <[EMAIL PROTECTED]> wrote:

> right.  I had just keyed in on this from Tom's message:
>
> > My code currently allows either set to provide the transform on the
> > grounds that otherwise the unicode module would have to either know
> > how to convert to everything else or from everything else.
>
> ...which seemed to posit that the Unicode module could be responsible for
> all the transcodings to and from its own character set, which seemed
> backwards to me.

I was only positing it long enough to acknowledge that such a rule
was untenable.

What it comes down to is that there are three possible rules, namely:

  1. Each character set defines transforms from itself to other
     character sets.

  2. Each character set defines transforms to itself from other
     character sets.

  3. Each character set defines transforms both from itself to
     other character sets and from other character sets to itself.

We have established that the first two will not work because of the
unicode problem.

That leaves the third, which is what I have implemented. When looking to
transcode from A to B it will first ask A if it can transcode to B and
if that fails then it will ask B if it can transcode from A.

That way each character set can manage its own translations both to
and from unicode as we require.

The problem it raises is: who is responsible for transcoding from ASCII to
Latin-1 and back again? If we're not careful both ends will implement
both translations and we will have effective duplication.
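[Editorial sketch: the lookup order Tom describes, in C. The registry table, names, and signatures here are illustrative inventions, not Parrot's actual API: try a direct A-to-B transcoder first, then fall back to pivoting through Unicode, which every set is required to support.]

```c
#include <assert.h>
#include <string.h>

typedef int (*transcoder_fn)(const char *in, char *out);

/* One registered direct conversion (hypothetical registry entry). */
struct entry { const char *src, *dst; transcoder_fn fn; };

/* Toy transcoders: real ones would actually re-encode the bytes. */
static int ascii_to_unicode(const char *in, char *out) { strcpy(out, in); return 0; }
static int unicode_to_latin1(const char *in, char *out) { strcpy(out, in); return 0; }

static struct entry table[] = {
    { "ascii",   "unicode", ascii_to_unicode },
    { "unicode", "latin1",  unicode_to_latin1 },
};

static transcoder_fn lookup(const char *src, const char *dst) {
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (!strcmp(table[i].src, src) && !strcmp(table[i].dst, dst))
            return table[i].fn;
    return 0;
}

/* Rule 3 plus the Unicode fallback: take a direct route if either
 * side provides one, otherwise go src -> unicode -> dst. */
static int transcode(const char *src, const char *dst,
                     const char *in, char *out) {
    char tmp[256];
    transcoder_fn f = lookup(src, dst), up, down;
    if (f) return f(in, out);
    up = lookup(src, "unicode");
    down = lookup("unicode", dst);
    if (up && down && up(in, tmp) == 0) return down(tmp, out);
    return -1;
}
```

With only the to-Unicode and from-Unicode entries registered, ASCII-to-Latin-1 still succeeds via the pivot; the reverse fails because Latin-1 registered no route here.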

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/




Re: New patch

2001-10-29 Thread Simon Cozens

On Mon, Oct 29, 2001 at 03:15:07PM -0300, Daniel Grunblatt wrote:
> Just to make it clear both of them still need a LOT of work, but I don't
> know to which should I stick.

Just in case anyone wonders what's up with this patch, I'm waiting for
some feedback from others before applying.

-- 
So i get the chance to reread my postings to asr at times, with a
corresponding conservation of the almighty leviam00se, Kai Henningsen.
-- Megahal (trained on asr), 1998-11-06



Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

A little while back I posted some code that
implemented a storage-to-storage architecture.
It was slow, but I tossed that off as an
implementation detail. Really. It was. :)

Well, I've tuned things up a bit. It's now
hitting 56 mops with the mops.pasm example. Parrot
turns in 24 mops on the same machine with the same
compiler options. This is not a fair comparison
because the Parrot dispatcher isn't optimal, but it
shows I'm not hand waving about the architecture
any more... ;)

Dan was right. It's a lot faster to emit explicit
scope change instructions than to include a scope
tag everywhere. Memory usage is about the same, but
the explicit instructions permit code threading
which is a *huge* win on some architectures. The
assembler does 99% of the optimizations, and it
still uses scope tagged instructions, so nothing is
really lost by ripping out the scope tags.

One thing I learned is that it's not necessary (or
desirable) to do enter/exit scope ops. I implemented
sync_scope which takes a scope id as an operand
and switches the VM into that scope, adjusting
the current lexical environment as necessary. This
works really well. The reason why sync_scope works
better than explicit enter/exit ops is because
sync_scope doesn't force any execution order on the
code. Compilers just worry about flow control and
the VM figures out how to adjust the environment
automatically. For example, Algol-style non-local
goto is very fast -- faster and cleaner than
exceptions for escaping from deep recursion.

One other thing I tested was subroutine calling.
This is an area where a storage-to-storage arch
really shines. I called a naive factorial(5) in a
loop 10 million times. Subroutine call performance
obviously dominates. Here's the code and the times:

Parrot: 237,000 fact(5)/sec

fact:   clonei
eq  I0, 1, done
set I1, I0
dec I0, 1
bsr fact
mul I0, I0, I1
done:   save    I0
popi
restore I0
ret

Kakapo: 467,000 fact(5)/sec

.begin
fact: arg   L0, 0
  cmp   L1, L0, 1
  brne  L1, else
  ret.i 1
else: sub   L2, L0, 1
  jsr   L3, fact, L2
  mul   L4, L0, L3
  ret.i L4
.end

I think the main thing that makes the storage-
to-storage architecture faster is that the callee
won't step on the caller's registers. The caller's
arguments can be fetched directly by the callee.
There's no argument stack or save/restore needed.

Here's the calling conventions for Kakapo.

On a sub call, the pc is saved in the ret_pc
register. Any frames not shared (lexically) between
the caller and callee are dumped to the stack (just
the frame pointers; the frames themselves are never
copied).

A sync_scope instruction at the start of a sub
takes care of building the callee's lexical
environment.

The caller passes arguments by reference. The arg
instruction uses the operands in the jsr instruction
as an argument list. (The jsr instruction is easy to
access because the ret_pc register points to it.)
arg works exactly like set except that it uses
the caller's lexical environment to fetch the source
value. Yes, this makes jsr a variable-size instruction,
but so what? There's no penalty on a software VM.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Dan Sugalski

At 04:44 PM 10/29/2001 -0500, Ken Fox wrote:

> Well, I've tuned things up a bit. It's now
> hitting 56 mops with the mops.pasm example. Parrot
> turns in 24 mops on the same machine with the same
> compiler options.

Damn. I hate it when things outside my comfort zone end up being faster. :)

> This is not a fair comparison
> because the Parrot dispatcher isn't optimal, but it
> shows I'm not hand waving about the architecture
> any more... ;)

I didn't think you were, unfortunately. (for me, at least) A SS 
architecture skips a level of indirection, and that'll end up being faster 
generally.

What sort of dispatch was your version using, and what sort was parrot 
using in your test?

> One thing I learned is that it's not necessary (or
> desirable) to do enter/exit scope ops.

Don't forget that you'll need those for higher-level constructs. For 
example, this code:

   {
  my Dog $spot is color('brindle'):breed('welsh corgi');
   }

will need to call Dog's constructor and attribute setting code every time 
you enter that scope.

You also potentially need to allocate a new scope object every time you 
enter a scope so you can remember it properly if any closures are created.

> I implemented
> sync_scope which takes a scope id as an operand
> and switches the VM into that scope, adjusting
> the current lexical environment as necessary.

How does this handle nested copies of a single scope? That's the spot a SS 
architecture needs to switch to indirect access from direct, otherwise you 
can only have a single instance of a particular scope active at any one 
time, and that won't work.

I'm curious as to whether the current bytecode could be translated on load 
to something a SS interpreter could handle.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: String rationale

2001-10-29 Thread James Mastros

On Mon, Oct 29, 2001 at 08:32:16PM +0000, Tom Hughes wrote:
> We have established that the first two will not work because of the
> unicode problem.
Hm.  I think instead of requiring Unicode to support everything, we should
require Unicode to support /nothing/.  If A and B have no mutual transcoding
function, we should use Unicode as an intermediary.  (This means that
charsets that are lossy to unicode need to transcode to each other directly,
like Far Eastern sets.  (And Klingon, but that can't transcode to anything.))

This still makes Unicode a special case, but not a terrible one.  (In fact,
unicode can be treated like any other charset, except when we want to
transcode between mutually incompatible sets, since we always try both A->B
and A<-B.)

(Notational note: A->B means that A is implementing a transcoding from itself
to B.  A<-B means that A is implementing a transcoding from B to A.)

> That leaves the third, which is what I have implemented. When looking to
> transcode from A to B it will first ask A if it can transcode to B and
> if that fails then it will ask B if it can transcode from A.
I propose another variant on this:
If that fails, it asks A to transcode to Unicode, and B to transcode from
Unicode.  (Not Unicode to transcode to B; Unicode implements no transcodings.)

> The problem it raises is: who is responsible for transcoding from ASCII to
> Latin-1 and back again? If we're not careful both ends will implement
> both translations and we will have effective duplication.
1) Neither.  Each must support transcoding to and from Unicode.
2) But either can support converting directly if it wants.

I also think that, for efficiency, we might want a "7-bit chars match ASCII"
flag, since most character sets do, and that means that we don't have to deal
with the overhead for strings that fit in 7 bits.  This smells of premature
optimization, though, so somebody just file this away in their heads for
future reference.

That would also mean that neither is responsible for converting between
Latin-1 and ASCII, because core will do it, most of the time, and the rest
of the time, it isn't possible.

Hm.  But it isn't possible _losslessly_, though it is possible lossily.
IMHO, there should be two ways to transcode, or the transcoding function
should flag to its caller somehow.

(Sorry for the train-of-thought, but I think it's decently clear.)

(BTW, for those paying attention, I'm waiting on this discussion for my
chr/ord patch, since I want them in terms of charsets, not encodings.)

   -=- James Mastros



Re: String rationale

2001-10-29 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
          James Mastros <[EMAIL PROTECTED]> wrote:

> > That leaves the third, which is what I have implemented. When looking to
> > transcode from A to B it will first ask A if it can transcode to B and
> > if that fails then it will ask B if it can transcode from A.
> I propose another variant on this:
> If that fails, it asks A to transcode to Unicode, and B to transcode from
> Unicode.  (Not Unicode to transcode to B; Unicode implements no transcodings.)

My code does that, though at a slightly higher level. If you look
at string_transcode() you will see that if it can't find a direct
mapping it will go via unicode. If C had closures then I'd have
buried that down in the chartype_lookup_transcoder() layer, but it
doesn't so I couldn't ;-)

> > The problem it raises is: who is responsible for transcoding from ASCII to
> > Latin-1 and back again? If we're not careful both ends will implement
> > both translations and we will have effective duplication.
> 1) Neither.  Each must support transcoding to and from Unicode.

Absolutely.

> 2) But either can support converting directly if it wants.

The danger is that everybody tries to be clever and support direct
conversion to and from as many other character sets as possible, which
leads to lots of duplication.

> I also think that, for efficiency, we might want a "7-bit chars match ASCII"
> flag, since most character sets do, and that means that we don't have to deal
> with the overhead for strings that fit in 7 bits.  This smells of premature
> optimization, though, so somebody just file this away in their heads for
> future reference.

I have already been thinking about this although it does get more
complicated as you have to consider the encoding as well - if you
have a single byte encoded ASCII string then transcoding to a single
byte encoded Latin-1 string is a no-op, but that may not be true for
other encodings if such a thing makes sense for those character types.

> (BTW, for those paying attention, I'm waiting on this discussion for my
> chr/ord patch, since I want them in terms of charsets, not encodings.)

I suspect that the encode and decode methods in the encoding vtable
are enough for doing chr/ord aren't they?

Surely chr() is just encoding the argument in the chosen encoding (which
can be the default encoding for the char type if you want) and then setting
the type and encoding of the resulting string appropriately.

Equally ord() is decoding the first character of the string to get a
number.
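[Editorial sketch: in encoding-vtable terms, chr and ord reduce to the encoding's encode and decode entries, as Tom says. Here UTF-8 stands in for "the chosen encoding"; the function names are invented for illustration, and codepoints above 0xFFFF are omitted for brevity.]

```c
#include <assert.h>
#include <stddef.h>

/* chr(): encode a codepoint into the chosen encoding's bytes. */
static size_t utf8_chr(unsigned cp, unsigned char *out) {
    if (cp < 0x80)  { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) { out[0] = 0xC0 | (cp >> 6);
                      out[1] = 0x80 | (cp & 0x3F); return 2; }
    out[0] = 0xE0 | (cp >> 12);
    out[1] = 0x80 | ((cp >> 6) & 0x3F);
    out[2] = 0x80 | (cp & 0x3F);
    return 3;                      /* codepoints above 0xFFFF omitted */
}

/* ord(): decode the first character of the string back to a number. */
static unsigned utf8_ord(const unsigned char *s) {
    if (s[0] < 0x80) return s[0];
    if ((s[0] & 0xE0) == 0xC0)
        return ((unsigned)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    return ((unsigned)(s[0] & 0x0F) << 12)
         | ((unsigned)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
}
```

Round-tripping a codepoint through the pair is exactly the chr/ord symmetry being discussed.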

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/




Re: String rationale

2001-10-29 Thread James Mastros

On Mon, Oct 29, 2001 at 11:20:47PM +0000, Tom Hughes wrote:
> > 2) But either can support converting directly if it wants.
> The danger is that everybody tries to be clever and support direct
> conversion to and from as many other character sets as possible, which
> leads to lots of duplication.
Yeah.  But that's a convention thing, I think.  I also think that most
people won't go to the bother of writing conversion functions that they
don't have to.  What we need to worry about is both, say, big5 and shiftjis
writing both of the conversions.  And it shouldn't come up all that much,
because Unicode is /supposed to be/ lossless for most things.

> I have already been thinking about this although it does get more
> complicated as you have to consider the encoding as well - if you
> have a single byte encoded ASCII string then transcoding to a single
> byte encoded Latin-1 string is a no-op, but that may not be true for
> other encodings if such a thing makes sense for those character types.
Hm.  All the encodings I can think of (which is rather limited -- the UTFs):
you can scan for units (i.e. ints of the proper size) > 0x7f, and if you
don't find any, it's 7-bit, and you can just change the charset marker
without doing any work.

In any case, it's up to the encoding to tell if we've got a pure 7-bit
string.  If that's complicated for it, it can just always return FALSE.
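[Editorial sketch: for byte-sized units the scan James describes is a one-liner; the helper name is hypothetical, not an agreed interface.]

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if every unit is <= 0x7f, i.e. the string could be
 * re-tagged as ASCII without touching the bytes.  An encoding for
 * which this test is hard can simply always return 0. */
static int is_7bit(const unsigned char *s, size_t n) {
    while (n--)
        if (*s++ > 0x7f)
            return 0;
    return 1;
}
```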

> I suspect that the encode and decode methods in the encoding vtable
> are enough for doing chr/ord aren't they?
Hmm... come to think of it, yes.  chr will always create a utf32-encoded
string with the given charset number (or unicode for the two-arg version),
ord will return the codepoint within the current charset.

(This, BTW, means that only encodings that feel like it have to provide
either, but all encodings must be able to convert to utf32.)

Powers-that-be (I'm looking at you, Dan), is that good?

   -=- James Mastros



Re: Parameter passing conventions

2001-10-29 Thread Michael L Maraist


  A split between local, marginal, and global registers would be an
  interesting thing to do, and I can see it making the code more elegant. I
  worry about it making things more complex, though, especially with us
  already having multiple register types. (We'd double or triple the number
  of register types essentially, and to some extent blow cache even more
  than we do now. Might be a win in other ways, though. I'll have to ponder
  a bit)

 Yeah, I didn't like the idea of proliferating that more either. I still
 sometimes dream about a single register file of N regs into which we can
 put whatever we want. Each block of registers has room for the reg
 contents and the type info too. Seems you've got some of the support for
 that figured out in the stack already. Just declare that either (a) it
 is illegal (or behavior undefined) to do

   set $2, 5
   set $3, foo bar
   add $1, $2, $3

 [just because we have higher-level data types than a real machine
 doesn't mean we can't still have general-purpose registers, I think]

 or (b) that if you do something numeric with a register that is
 non-numeric type mucking happens behind the scenes and throws an
 exception if there is a problem. Certainly this wouldn't be surprising
 to anyone who had been looking at what we do with PMCs and arithmetic
 ops.

 If we ever did move to such a single-register-file model, I'd support
 looking seriously at the calling conventions of MMIX to see if we can
 get the appropriate performance characteristics. And, BTW, we have
 4*32 = 128 regs now. We could even match the logical register count
 of MMIX (256) with only a doubling of total register count. And, if
 we ever determined we needed another kind of register (such as one
 that can be used for address arithmetic, since INTVAL doesn't cut it),
 we wouldn't have to add a fifth file, we'd just add another type
 (thinking again about the stack implementation).


After reading the entire MMIX chapter, my mind went back and forth.  First of
all, the only reason that 256 registers were used was because of the
byte-aligned register arguments to the op-codes (plus modern intuition is
that more registers = faster execution).  Currently we have 4-byte arguments
and don't perform boundary-condition checks, so this is neither here nor
there.  Currently we translate P1 to:

interpreter->num_reg->registers[ cur_opcode[1] ]

Which requires 4 indirections.  (Multiple Px instances should be optimized by
gcc so that subsequent accesses only require 2 indirections.)  To avoid
core-dumping on invalid arguments, we could up the reg-set to 256
and convert the above to:

interpreter->num_reg->registers[ (unsigned char)cur_opcode[1] ]

and adjust the assembler accordingly.  Alternatively, to work with 32 regs,
we'd have:

interpreter->num_reg->registers[ cur_opcode[1] & 0x001F ]

As for MMIX: I don't see a need for globals, since we're going to have
various global symbol stashes available to us.  Further, I don't see a value
in providing special trapping code to return zero when reading from a
Marginal, or extending the local variable space when writing.  For writing, an
explicit reserve (once (or less) per function call) shouldn't be too much
bother.  And if the code is silly enough to write or read from this
"marginal" region, then we'll pretend that they're using uninitialized
values.  Further, the reserver must require that there is enough space to
fully utilize the n-register set (currently 32), so that set $r31, 5 doesn't
spill into the tail of the register set (since that was previously handled by
the trapping code).  This modifies the MMIX spec such that we potentially
waste up to 31 register slots in the register window (which is trivial when
we have a window of size >= 1024).

The rolling register set can be accomplished via three methods.  First,
realloc the register stack every time an extend exceeds the size.  (This
sucks for recursive functions.)  Second, use paged register sets and copy
values during partial spillover (very complex).  Lastly, utilize a [pow2]
modulus on a fixed-size register stack.  This has a very interesting
implication: that we completely do away with the (push|pop)[ipsn] ops and
their associated data-structures.  Thus the P1 translation becomes:

// (for a 1K rolling stack)
#define STACK_MASK 0x03FF
interpreter->num_reg[ ( interp->num_offset + cur_opcode[1] ) & STACK_MASK ]

This has 4 indirections and two integer ops.  As above, for multiple uses of
Px in an op-code, this should be optimized to 2 indirections.  Since
indirections are significantly slower than bitwise logical operations, this
should be roughly equivalent in speed to our current interpreter.  If we were
hell-bent on speed, we could utilize STACK_SIZE-aligned memory regions and
perform direct memory arithmetic as with:

interp->x_reg_base = [ ... ] 1K chunk of memory aligned to 1K boundary
interp->x_reg = interp->x_reg_base + interp->x_offset

#define P1  *(int*)(((int)( interp->x_reg
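[Editorial sketch: the power-of-two modulus scheme above, exercised in isolation. Names approximate the post's, not actual Parrot structures; the point is that the mask makes the register file wrap instead of needing push/pop spill logic.]

```c
#include <assert.h>

#define STACK_SIZE 1024              /* must be a power of two */
#define STACK_MASK (STACK_SIZE - 1)

struct interp {
    int num_reg[STACK_SIZE];         /* rolling integer register file */
    unsigned num_offset;             /* window base, advanced per call */
};

/* Read register `r` of the current window; the mask wraps the index
 * back to the start of the file rather than overflowing it. */
static int get_reg(const struct interp *in, unsigned r) {
    return in->num_reg[(in->num_offset + r) & STACK_MASK];
}

static void set_reg(struct interp *in, unsigned r, int v) {
    in->num_reg[(in->num_offset + r) & STACK_MASK] = v;
}
```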

Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Dan Sugalski wrote:
> What sort of dispatch was your version using, and what sort was
> parrot using in your test?

Parrot used the standard function call dispatcher without bounds
checking.

Kakapo used a threaded dispatcher. There's a pre-processing phase
that does byte code verification because threading makes for some
outrageously unsafe code.

Parrot and Kakapo should have very similar mops when using the
same dispatcher. You all know what a Parrot add op looks like.
Here's the Kakapo add op:

op_add:
STORE(kvm_int32, pc[1]) = FETCH(kvm_int32, pc[2]) +
  FETCH(kvm_int32, pc[3]);
pc += 4;
NEXT_OP;

Ok, ok. You want to know what those macros do... ;)

op_add:
*(kvm_int32 *)(frame[pc[1].word.hi] + pc[1].word.lo) = 
   *(const kvm_int32 *)(frame[pc[2].word.hi] + pc[2].word.lo) +
   *(const kvm_int32 *)(frame[pc[3].word.hi] + pc[3].word.lo);
pc += 4;
goto *(pc->i_addr);

I haven't counted derefs, but Parrot and Kakapo should be close.
On architectures with very slow word instructions, some code bloat
to store hi/lo offsets in native ints might be worth faster
address calculations.
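[Editorial sketch: the shape of the gcc-specific threaded dispatch Ken is using, reduced to a toy two-op machine. This is the generic labels-as-values pattern (gcc/clang extension), not Kakapo's actual code; op names and layout are invented.]

```c
#include <assert.h>

enum { OP_ADD, OP_HALT };

static int run(const int *prog) {
    /* One label address per opcode; NEXT_OP is just `goto *labels[*pc]`. */
    static void *labels[] = { &&op_add, &&op_halt };
    int acc = 0;
    const int *pc = prog;
    goto *labels[*pc];

op_add:
    acc += pc[1];                /* one immediate operand */
    pc += 2;
    goto *labels[*pc];

op_halt:
    return acc;
}
```

Because each op jumps straight to the next op's code, there is no central switch or function-call return to go through on every instruction, which is where the speedup comes from.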

> Ken Fox wrote:
> > One thing I learned is that it's not necessary (or
> > desirable) to do enter/exit scope ops.
>
> Don't forget that you'll need those for higher-level constructs. For
> example, this code:
>
>    {
>       my Dog $spot is color('brindle'):breed('welsh corgi');
>    }
>
> will need to call Dog's constructor and attribute setting code every time
> you enter that scope.

Definitely. I didn't say Kakapo doesn't have enter/exit scope
semantics -- it does. There's no byte code enter scope op though.
What happens is more declarative. There's a sync_scope guard op
that means "the VM must be in lexical scope X to properly run the
following code". If the VM is already in scope X, then it's a nop.
If the VM is in the parent of X, then it's an enter scope. If the
VM is in a child of X, then it's an exit scope.

This makes it *very* easy for a compiler to generate flow control
instructions. For example:

{
   my Dog $spot ...

   {
  my Cat $fluffy ...

middle: $spot->chases($fluffy);

   }
}

What happens when you goto middle depends on where you started.
sync_scope might have to create both Dog and Cat scopes when code
jumps to the middle. Or, code might already be in a sub-scope of
Cat, so sync_scope would just pop scopes until it gets back to Cat.

This is where sync_scope is very useful. It allows the compiler
to say "this is the environment I want here" and delegates the job
to the VM on how it happens.
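[Editorial sketch: one way the VM could resolve a sync_scope guard. The scope tree and exit/enter counters are invented for illustration; the real work per enter/exit (building or popping frames) is elided.]

```c
#include <assert.h>

/* Static scope tree: each scope knows its parent and nesting depth. */
struct scope { const struct scope *parent; int depth; };

/* Walk both scopes up to their common ancestor, counting the exits
 * the VM must perform and the enters it must synthesize.  sync_scope
 * is a nop exactly when cur == target. */
static void sync_scope(const struct scope *cur, const struct scope *target,
                       int *exits, int *enters) {
    *exits = *enters = 0;
    while (cur->depth > target->depth) { cur = cur->parent; (*exits)++; }
    while (target->depth > cur->depth) { target = target->parent; (*enters)++; }
    while (cur != target) {
        cur = cur->parent;       (*exits)++;
        target = target->parent; (*enters)++;
    }
}
```

Jumping to `middle` from outside would report two enters (Dog then Cat); jumping out of a Cat sub-scope back to Cat reports only exits.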

> You also potentially need to allocate a new scope object every time you
> enter a scope so you can remember it properly if any closures are created.

Closures in Kakapo are simple. All it needs to do is:

1. copy any current stack frames to the heap
2. copy the display (array of frame pointers) to the heap
3. save the pc

Step #1 can be optimized because the assembler will have a pretty
good idea which frames escape -- the run-time can scribble a note
on the scope definition if it finds one the assembler missed.
Escaping frames will just be allocated on the heap to begin with.

This means that taking a closure is almost as cheap as calling
a subroutine. Calling a closure is also almost as cheap as
calling a subroutine because we just swap in an entirely new
frame display.

> How does this handle nested copies of a single scope? That's the spot a SS
> architecture needs to switch to indirect access from direct, otherwise you
> can only have a single instance of a particular scope active at any one
> time, and that won't work.

Calling a subroutine basically does this:

1. pushes previous return state on the stack
2. sets the return state registers
3. finds the deepest shared scope between caller and callee's parent
4. pushes the non-shared frames onto the stack
5. transfers control to the callee
6. sync_scope at the callee creates any frames it needs

> I'm curious as to whether the current bytecode could be translated on load
> to something a SS interpreter could handle.

Never thought of that -- I figured the advantage of an SS machine
is that brain-dead compilers can still generate fast code. Taking
a really smart compiler generating register-based code and then
translating it to an SS machine seems like a losing scenario.

I think this is why storage-to-storage architectures have lost
favor -- today's compilers are just too smart. Possibly with a
software VM the memory pressure argument favoring registers isn't
strong enough to offset the disadvantage of requiring smart
compilers.

I just put up the 0.2 version of Kakapo at
http://www.msen.com/~fox/Kakapo-0.2.tar.gz

This version has the sync_scope instruction, threaded dispatch,
immediate mode operands, and a really crappy rewrite technique
for instruction selection.

One other thing that I discovered is how sensitive the VM is
to dereferences. Adding the immediate mode versions of add and
cmp gave me 10 more mops in the 

Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
> and please don't bring in hardware comparisons again. a VM design
> cannot be compared in any way to a hardware design.

I have absolutely no idea what you are talking about. I didn't
say a single thing about hardware. My entire post was simply about
an alternative VM architecture. It's not a theory. You can go get
the code right now.

I'm just messing around on a storage-to-storage VM system I've
named Kakapo. It's a dead-end. A fat, flightless, endangered kind
of parrot. It's fun to experiment with ideas and I hope that good
ideas might make it into Parrot.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
> that is good. i wasn't disagreeing with your alternative architecture.
> i was just making sure that the priority was execution over compilation
> speed.

I use a snazzy quintuple-pass object-oriented assembler written
in equal parts spit and string (with a little RecDescent thrown in for
good measure). A real speed demon it is... ;)

The real motivation of my work is to see if a storage-to-storage
machine ends up using cache better and with less compiler effort
than a register machine. When I read about CRISP, the first thing
that came to mind was the top-of-stack-register-file could be
simulated exactly with high-speed cache in a software VM. Dropping
the stack-machine instructions in favor of Parrot's 3 operand ones
made it sound even better.

> ... then be mmap'ed in and run with hopefully impressive speed.

I'm impressed with the possibilities of the pbc-C translator. The
core modules on my system probably won't be mmap'ed byte code -- they'll
be mmap'ed executable. Reducing memory foot-print this way might take
some of the pressure off the need to share byte code. Lots of really
nice optimizations require frobbing the byte code, which definitely
hurts sharing.

- Ken



Request for new feature: attach a perl debugger to a running process

2001-10-29 Thread David Trusty

Hi,

I would like to request a new feature for perl:  The ability to
attach a perl debugger to a running process.

Also, it would be nice to have the capability to generate a
dump (core file) for post-mortem analysis.  The perl debugger
could then read the core file.

These capabilities would add a lot of value to perl.

Thanks in advance!!

David


_
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp




Re: Request for new feature: attach a perl debugger to a running process

2001-10-29 Thread Michael G Schwern

On Mon, Oct 29, 2001 at 05:27:30PM +0000, David Trusty wrote:
> I would like to request a new feature for perl:  The ability to
> attach a perl debugger to a running process.

The DB module gives you the tools to do this sort of thing, though
there is "some assembly required" for certain very large values of
"some".


-- 

Michael G. Schwern   [EMAIL PROTECTED]http://www.pobox.com/~schwern/
Perl6 Quality Assurance [EMAIL PROTECTED]   Kwalitee Is Job One
There is a disturbing lack of PASTE ENEMA on the internet.