Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dave Mitchell

Larry Wall [EMAIL PROTECTED] wrote:
 It may certainly be valuable to (not) think of it that way, but just
 don't be surprised if the regex folks come along and borrow a lot of
 your opcodes to make things that look like (in C):
 
while (s < send && isdigit(*s)) s++; 

This is the bit that scares me about unifying perl ops and regex ops:

I see perl ops as relatively heavyweight things that can absorb the
costs of 'heavyweight' dispatch (function call overhead, etc etc), while
regex stuff needs to be very lightweight, eg

while (op = *optr++) {
    switch (op) {
    case FOO:
        while (s < send && isdigit(*s)) s++;
        break;
    case BAR:
        while (s < send && isspace(*s)) s++;
        break;
    ...
    }
}


can we really unify them without taking a performance hit?




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
 This is the bit that scares me about unifying perl ops and regex ops:
 can we really unify them without taking a performance hit?

Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.

Second, Ruby uses a giant switch instead of function pointers for their
op despatch loop; Matz says it doesn't make that much difference in
terms of performance.

I don't know if I've mentioned this before, but 
http://www-6.ibm.com/jp/developerworks/linux/001027/ruby_qa.html
was my interview with Matsumoto about his ideas for Perl 6 and his experiences
from Ruby. It's in Japanese, so http://www.excite.co.jp/world/url/ may help.

-- 
Familiarity breeds facility.
-- Megahal (trained on asr), 1998-11-06



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dave Mitchell

Simon Cozens [EMAIL PROTECTED] opined:
 On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
  This is the bit that scares me about unifying perl ops and regex ops:
  can we really unify them without taking a performance hit?
 
 Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.
 
 Second, Ruby uses a giant switch instead of function pointers for their
 op despatch loop; Matz says it doesn't make that much difference in
 terms of performance.

I think it would be very messy to have both types of operands in the same
dispatch loop. I'd much rather have a 'regex start' opcode which
calls a separate dispatch loop function, and which then interprets any
further ops in the bytestream as regex ops. That way we double the number
of 8-bit ops, and can have all the regex-specific state variables (s, send
etc in the earlier example) and logic separated out.

 I don't know if I've mentioned this before, but 
 http://www-6.ibm.com/jp/developerworks/linux/001027/ruby_qa.html
 was my interview with Matsumoto about his ideas for Perl 6 and his experiences
 from Ruby. It's in Japanese, so http://www.excite.co.jp/world/url/ may help.

A talk of jewelry
Perl developer From Mr. Simon Cozens to Ruby developer It is also as a pine
It dies and is the question and reply to Mr. [squiggle]

:-)




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dave Storrs



On Tue, 5 Jun 2001, Dave Mitchell wrote:

 dispatch loop. I'd much rather have a 'regex start' opcode which
 calls a separate dispatch loop function, and which then interprets any
 further ops in the bytestream as regex ops. That way we double the number
 of 8-bit ops, and can have all the regex-specific state variables (s, send
 etc in the earlier example) and logic separated out.

This is an interesting idea...could we use this more generally to
multiply our number of opcodes?  Basically, you have one set of opcodes
for (e.g.) string parsing, one set for math, etc, all of which have the
same value.  Then you have a set of opcodes that tells the interpreter
which opcode table to look in.  The 'switching' opcodes then become
overhead, but if there aren't too many of those, perhaps it's
acceptable.  And it would mean that we could specialize the opcodes a
great deal more (if, of course, that is desirable), and still have them
fit in an octet.

(Sorry if this is a stupid question, but please be patient; I've
never done internals stuff before.)

Dave




RE: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Hong Zhang

 On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
  This is the bit that scares me about unifying perl ops and regex ops:
  can we really unify them without taking a performance hit?
 
 Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.
 
 Second, Ruby uses a giant switch instead of function pointers for their
 op despatch loop; Matz says it doesn't make that much difference in
 terms of performance.

Function pointer dispatch is normally as fast as or faster than a switch.
The main downside is the context: a typical regular expression engine can
pre-fetch many variables into register locals, which can then be used
efficiently by all the switch cases. However, the common context for
regular expressions is relatively small, so I am not sure of the
performance hit.

Hong



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Graham Barr

On Mon, Jun 04, 2001 at 06:04:10PM -0700, Larry Wall wrote:
 Well, other languages have explored that option, and I think that makes
 for an unnatural interface.  If you think of regexes as part of a
 larger language, you really want them to be as incestuous as possible,
 just as any other part of the language is incestuous with the rest of
 the language.  That's part of what I mean when I say that I'm trying to
 look at regular expressions as just a strange variant of Perl code.
 
 Looking at it from a slightly different angle, regular expressions are
 in great part control syntax, and library interfaces are lousy at
 implementing control.

Right. Having the regex opcodes be perl opcodes will certainly make
implementing (?{ ... }) much easier and probably faster too.

Also re references that we have now will become similar to subroutines
for pattern matching.

I think there are a lot of benefits to the re engine not to be
separate from the core perl ops.

Graham.



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread David L. Nicol

Graham Barr wrote:

 I think there are a lot of benefits to the re engine not to be
 separate from the core perl ops.


So does it start with a split(//,$bound_thing) or does it use
substr(...) with explicit offsets?




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Graham Barr

On Tue, Jun 05, 2001 at 03:31:24PM -0500, David L. Nicol wrote:
 Graham Barr wrote:
 
  I think there are a lot of benefits to the re engine not to be
  separate from the core perl ops.
 
 
 So does it start with a split(//,$bound_thing) or does it use
 substr(...) with explicit offsets?

Eh ?

Nobody is suggesting we implement re's using the current set of
perl ops, but that we extend the set with ops needed for re's, so
that they use the same dispatch loop and the ops can be intermixed.

Graham.



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dan Sugalski

At 07:40 AM 6/5/2001 -0700, Dave Storrs wrote:
On Tue, 5 Jun 2001, Dave Mitchell wrote:

  dispatch loop. I'd much rather have a 'regex start' opcode which
  calls a separate dispatch loop function, and which then interprets any
  further ops in the bytestream as regex ops. That way we double the number
  of 8-bit ops, and can have all the regex-specific state variables (s, send
  etc in the earlier example) and logic separated out.

 This is an interesting idea...could we use this more generally to
multiply our number of opcodes?  Basically, you have one set of opcodes
for (e.g.) string parsing, one set for math, etc, all of which have the
same value.  Then you have a set of opcodes that tells the interpreter
which opcode table to look in.

Nah, that's too much work. We just allow folks to define their own opcode 
functions and assign each a lexically unique number, and dispatch to the 
function as appropriate.

Adding and overriding opcodes is definitely in the cards, though in most 
cases it'll probably be an opcode version of a function call, since 
machine-level stuff would also require telling the compiler how to emit 
those opcodes. (Which folks writing python/ruby/rebol/cobol/fortran front 
ends for the interpreter might do)


Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Larry Wall

Dan Sugalski writes:
: Are you speaking of the nodes in regnode.h? I hadn't considered them as 
: regular perl opcodes--I figured they'd stay internal to the regex engine so 
: we could keep it reasonably modular.

I don't think that's a terribly strong argument--one could justify any
number of unfortunate architectural distinctions on the basis of
modularity.  Plus, I'd argue you can still retain modularity of your
code while unifying implementational philosophy.

It seems to me that the main reason for not considering such a
unification is that We've Never Done It That Way Before.  It's as if
regular expressions have always been second-class programs, so they'll
always be second-class programs, world without end, amen, amen.

But there is precedent for turning second-class code into first-class
code.  After all, that's just what we did for ordinary quotes in the
transition from Perl 4 to Perl 5.  Perl 4 had a string interpolation
engine, and it was a royal pain to deal with.

The fact that Perl 5's regex engine is a royal pain to deal with should
be a warning to us.

Much of the pain of dealing with the regex engine in Perl 5 has to do
with allocation of opcodes and temporary values in a non-standard
fashion, and dealing with the resultant non-reentrancy on an ad hoc
basis.  We've already tried that experiment, and it sucks.  I don't
want to see the regex engine get swept back under the complexity carpet
for Perl 6.  It will come back to haunt us if we do:

  Sure, you can download the object code for this 5 line Perl program
  into your toaster...but you'll also have to download this 5 gigabyte
  regex interpreter before it'll run.

That's a scenario I'd love to avoid.  And if we can manage to store
regex opcodes and state using mechanisms similar to ordinary opcodes,
maybe we'll not fall back into the situation where the regex engine is
understood by only three people, plus or minus four.

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Jarkko Hietaniemi

 The fact that Perl 5's regex engine is a royal pain to deal with should
 be a warning to us.
 
 Much of the pain of dealing with the regex engine in Perl 5 has to do
 with allocation of opcodes and temporary values in a non-standard
 fashion, and dealing with the resultant non-reentrancy on an ad hoc
 basis.  We've already tried that experiment, and it sucks.  I don't
 want to see the regex engine get swept back under the complexity carpet
 for Perl 6.  It will come back to haunt us if we do:

*standing ovations*

   Sure, you can download the object code for this 5 line Perl program
   into your toaster...but you'll also have to download this 5 gigabyte
   regex interpreter before it'll run.
 
 That's a scenario I'd love to avoid.  And if we can manage to store
 regex opcodes and state using mechanisms similar to ordinary opcodes,
 maybe we'll not fall back into the situation where the regex engine is
 understood by only three people, plus or minus four.
 
 Larry

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread David L. Nicol

Larry Wall wrote:

   Sure, you can download the object code for this 5 line Perl program
   into your toaster...but you'll also have to download this 5 gigabyte
   regex interpreter before it'll run.
 
 That's a scenario I'd love to avoid.  And if we can manage to store
 regex opcodes and state using mechanisms similar to ordinary opcodes,
 maybe we'll not fall back into the situation where the regex engine is
 understood by only three people, plus or minus four.
 
 Larry


Does anyone have on-their-shelves a regex-into-non-regex-perl translator?
run time is not an issue, correct behavior is



-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
  Obi-Wan taught me mysticism -- Luke Housego




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Simon Cozens

On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote:
 Does anyone have on-their-shelves a regex-into-non-regex-perl translator?

Does anyone have on-their-shelves a David-Nicol-into-English translator? :)

 run time is not an issue

Wrong.

-- 
I decided to spread the bad mood: Dress To Depress.
- Red Drag Diva



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Dan Sugalski

At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote:
On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote:
  Does anyone have on-their-shelves a regex-into-non-regex-perl translator?

Does anyone have on-their-shelves a David-Nicol-into-English translator? :)

I think he's looking for something that turns a regex into perl that 
doesn't involve regexes.

  run time is not an issue

Wrong.

Um... Presumably if he *was* interested in run time he wouldn't have said 
that. (Which isn't to say that I'm not interested in run time for something 
like that if it's to be used in perl 6, but that's a separate issue)

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Uri Guttman

 SC == Simon Cozens [EMAIL PROTECTED] writes:

  SC OK, here's how you do it. Run perl -Mre=debug -e '/your regexp/',
  SC and use Perl to parse the bit in the middle. That's a state machine,
  SC so we can emulate it with subroutines.

  SC So, for instance: perl -Mre=debug -e '/fo+bar/'
  SC ...
  SC 1: EXACT <f>(3)
  SC 3: PLUS(6)
  SC 4:   EXACT <o>(0)
  SC 6: EXACT <bar>(8)
  SC 8: END(0)
  SC ...

  SC Now we translate that to something like this:
  SC sub node_1 { 
  SC $state = shift; 
  SC while ($_ = substr($state->{string},0,1,"")) {
  SC if ($_ eq "f") {
  SC return "success" if $state->node_3;
  SC }
  SC }
  SC return "failure"; 
  SC }
  SC ... and so on ...

  SC In keeping with tradition, implementation is left as an exercise for
  SC the reader. :)

that doesn't address larry's point which is very important. the regex
compiler needs to be able to generate the equivalent of the above
directly into perl ops. it would eliminate the separation of the regex
runtime as a black box. we could create special ops that support node
making and traversal and others that execute the low level matches. an
optimizer could then combine some ops into fewer and more specialized
ones.

then regexes can be also debugged in detail with the perl debugger.
that assumes the debugger has access to single stepping ops which is an
intriguing idea. BTW this is the kind of feature that dan wanted the
debugger PDD to have. having this feature supported and running early
on, would allow us to debug the compiler and optimizer using our own
tool. in fact if the debugger supported remote control as well as
internal, then it could be developed in parallel in perl5. this would be
a very useful thing to have during the development of the full perl 6.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Dan Sugalski

At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
Dan Sugalski writes:
: Are you speaking of the nodes in regnode.h? I hadn't considered them as
: regular perl opcodes--I figured they'd stay internal to the regex engine so
: we could keep it reasonably modular.

I don't think that's a terribly strong argument--one could justify any
number of unfortunate architectural distinctions on the basis of
modularity.

Yeah I know. I mean, look at how those darned bricks have held back 
architecture all these years! :-P

Plus, I'd argue you can still retain modularity of your
code while unifying implementational philosophy.

I'm not entirely sure of that one--processing a full regex requires the 
perl interpreter, so it's not all that modular. Though whether being able to 
yank out the RE engine and treat it as a standalone library is important 
enough to warrant being treated as a design goal or not is a separate 
issue. (I think so, as it also means I can treat it as a black box for the 
moment so there's less to try and stuff in my head at once)

It seems to me that the main reason for not considering such a
unification is that We've Never Done It That Way Before.  It's as if
regular expressions have always been second-class programs, so they'll
always be second-class programs, world without end, amen, amen.

No, not really. The big reasons I wasn't planning on unification are:

*) It makes the amount of mental space the core interpreter takes up smaller
*) It can make performance tradeoffs separately from the main perl engine
*) We can probably snag the current perl 5 source without much change
*) The current RE engine's scared (or is that scarred?) me off enough that 
I'd as soon leave it to someone who's more temperamentally suited for such 
things.
*) Treating regexes as non-atomic operations brings some serious threading 
issues into things.

The fact that Perl 5's regex engine is a royal pain to deal with should
be a warning to us.

I can think of a couple of reasons that the current engine's a royal pain, 
and they don't have much to do with it as a separate entity...

Much of the pain of dealing with the regex engine in Perl 5 has to do
with allocation of opcodes and temporary values in a non-standard
fashion, and dealing with the resultant non-reentrancy on an ad hoc
basis.  We've already tried that experiment, and it sucks.  I don't
want to see the regex engine get swept back under the complexity carpet
for Perl 6.

Yeah, but those are mostly issues with the implementation, not with the 
separation.

That's a scenario I'd love to avoid.  And if we can manage to store
regex opcodes and state using mechanisms similar to ordinary opcodes,
maybe we'll not fall back into the situation where the regex engine is
understood by only three people, plus or minus four.

While I'm not sure I agree with it, if that's what you want, then that's 
what we'll do. Threading will complicate this some, since we'll need to 
guarantee atomicity across multiple opcodes, something I'd not planned on 
doing.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Dan Sugalski

At 04:20 PM 6/4/2001 -0400, Uri Guttman wrote:
then regexes can be also debugged in detail with the perl debugger.
that assumes the debugger has access to single stepping ops which is an
intriguing idea. BTW this is the kind of feature that dan wanted the
debugger PDD to have. having this feature supported and running early
on, would allow us to debug the compiler and optimizer using our own
tool. in fact if the debugger supported remote control as well as
internal, then it could be developed in parallel in perl5. this would be
a very useful thing to have during the development of the full perl 6.

Regexes will be able to be debugged in detail regardless of how the regex 
engine's implemented. If we go separate, you'll just single-step through 
the regex ops rather than the perl ones.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Dan Sugalski

At 03:08 PM 6/4/2001 -0500, Jarkko Hietaniemi wrote:
On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote:
  At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote:
  On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote:
Does anyone have on-their-shelves a regex-into-non-regex-perl 
 translator?
  
  Does anyone have on-their-shelves a David-Nicol-into-English 
 translator? :)
 
  I think he's looking for something that turns a regex into perl that
  doesn't involve regexes.

You must be sharing the pipe with David since I couldn't make head||tails
of it, either :-)

Hey, it's the standard, regulation-issue perl internals pipe. Didn't you 
get one with the pumpkin? (I think they put it inside as a joke... :)

Err...a regex that isn't a regex, is this a Zen koan...?  Ahhh, you
want to emulate the state machine in Pure Perl.  Okay... next thing
you want to do is to write symbolic assembler in C...? :-)

I know, I know, what *are* we thinking? Heck, next thing you know we'll 
want to go parsing text in C too... :-P

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread David L. Nicol

Jarkko Hietaniemi wrote:
 Err...a regex that isn't a regex, is this a Zen koan...?  Ahhh, you
 want to emulate the state machine in Pure Perl.  Okay... next thing
 you want to do is to write symbolic assembler in C...? :-)


I have my reasons :)


Actually, I want to write a c into perl compiler -- I have for years --
one of these long manic afternoons I'll do it, too




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS No, not really. The big reasons I wasn't planning on unification are:

  DS *) It makes the amount of mental space the core interpreter takes
  DS up smaller

not if the regex ops were developed separately. all it will do is expand
the op table, not make the core more complex. also the regex compiler
can be written separately as well.

  DS *) It can make performance tradeoffs separately from the main perl engine

that can be done with a regex optimizer, or creating more specialized
and efficient regex ops.

  DS *) We can probably snag the current perl 5 source without much change

is that worth keeping? maybe for bootstrapping the new setup but not
permanently. stuff like having direct access to regex engine for
debugging would be better designed in from scratch. and the moving of
the engine to the main op code loop solves that problem.

  DS *) Treating regexes as non-atomic operations brings some serious
  DS threading issues into things.

well, the main op loop is threaded (isn't one of the bugs with current
threads the fact that the regex engine is not thread safe?) so you get
that automatically. just marking the regex data as shared should
trigger all the right thread locks and stuff. what happens to $1 when
the data is shared and threaded? are thread local copies of the original
data made and $1 maps inside those?

   That's a scenario I'd love to avoid.  And if we can manage to store
   regex opcodes and state using mechanisms similar to ordinary opcodes,
   maybe we'll not fall back into the situation where the regex engine is
   understood by only three people, plus or minus four.

  DS While I'm not sure I agree with it, if that's what you want, then that's 
  DS what we'll do. Threading will complicate this some, since we'll need to 
  DS guarantee atomicity across multiple opcodes, something I'd not planned on 
  DS doing.

what is wrong with a single mutex (or equivalent) around the regex? that
is a simple way to gain atomicity over multiple ops. and we would have
to define what is atomic in detail. most regexes will operate on thread
specific data as most data should be thread specific. doing a regex on a
shared variable may be costly but that can be dealt with by copying it
first to a thread local variable.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Dan Sugalski

At 11:46 AM 6/1/2001 -0700, [EMAIL PROTECTED] wrote:
On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote:
 
  I really think we'll win if we have support for at least integers as well
  as PMCs. There's potentially a lot of integer work that'll be generated by
  the optimizer and, while integer opcodes might not make the interpreter
  much faster, they'll speed up TIL and generated C code significantly since
  they won't need to call opcodes.

How much integer arithmetic does the perl interpreter actually do?

Some, though not as much as it ought as most math's done as floating point.

Figuring out where the hot spots are in an interpreter for a general-
purpose programming language is hard. I'd recommend against special
cases in the registers, since it's not clear how much they'd help.

There are a few reasons to consider integer and floating point registers:

*) A good chunk of the optimizations I've been coming across seem to 
benefit from integer scratch space. (Granted, generally for languages with 
a different thrust than perl)
*) Code that does use it will translate more efficiently to .NET/Java 
bytecode/native machine code
*) Larry's expressed a desire to let us get lower-level with perl, which'll 
benefit from having integer scratch space
*) If the interpreter will support multiple source languages, they'll get a 
benefit from it. (Like Python and Ruby, to name two)

It might be that there's no advantage to having them, in which case we'll 
excise them and the opcodes that reference them, and that's fine. There 
seems to be some advantage to it, which is why I brought them up in the 
first place.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Dan Sugalski

At 01:09 AM 6/1/2001 -0700, [EMAIL PROTECTED] wrote:
In my experience, perl opcodes have not been the performance bottleneck
in perl5.

It seems it isn't actually the loop that's the bottleneck in perl5. I
profiled a whole bunch of different perl programs, using a lot of
different versions of perl5; and the runops loop was very rarely among
the top CPU users.  Many times, the opcode routines themselves weren't
the hot spots. It was the support routines like hash key calculation
and lookup, string comparisons, or the regular expression code.

Yeah, a lot of that's definitely a problem, as is the manipulation of the 
return stack and some assignment bits. (You can cut the time split takes in 
half by having the destination array presized, for example)

Profiling is almost always counter-intuitive, but if I had to
guess, I'd say that most of the per-opcode cost in perl5 was due to
setup/initialization as each opcode was entered, and that devious/clever
data structure design could avoid most of this. Also, opcode dispatch
might not be the right tree up which to be barking in seeking performance.

The setup/teardown costs are definitely a heavy expense. We can cut it down 
some, hopefully a lot in some cases, but there are limits to what can be 
done. If they're treated as a fixed cost (which they more or less are), 
then the less often we pay that cost the better.

Unfortunately there's also the problem of having *too* many opcode 
functions, at which point you pay a lot of cycles in cache miss times 
fetching in Specially Optimized Function #4327. It seems, though, that if 
we have the option of doing something in one opcode instead of 10 or 20, it 
makes sense to do so as we reduce the constant costs.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Larry Wall

Dan Sugalski writes:
: At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
: Dan Sugalski writes:
: : Are you speaking of the nodes in regnode.h? I hadn't considered them as
: : regular perl opcodes--I figured they'd stay internal to the regex engine so
: : we could keep it reasonably modular.
: 
: I don't think that's a terribly strong argument--one could justify any
: number of unfortunate architectural distinctions on the basis of
: modularity.
: 
: Yeah I know. I mean, look at how those darned bricks have held back 
: architecture all these years! :-P

Hey, I come from the coast where we try to avoid bricks, especially
when they're falling.

: Plus, I'd argue you can still retain modularity of your
: code while unifying implementational philosophy.
: 
: I'm not entirely sure of that one--processing a full regex requires the 
: perl interpreter, it's not all that modular.

These days I'm trying to see the regex as just a funny-looking kind of
Perl code.

: Though whether being able to 
: yank out the RE engine and treat it as a standalone library is important 
: enough to warrant being treated as a design goal or not is a separate 
: issue. (I think so, as it also means I can treat it as a black box for the 
: moment so there's less to try and stuff in my head at once)

As a fellow bear of very little brain, I'm just trying to point out that
we already have a good example of the dangers to that approach.

: It seems to me that the main reason for not considering such a
: unification is that We've Never Done It That Way Before.  It's as if
: regular expressions have always been second-class programs, so they'll
: always be second-class programs, world without end, amen, amen.
: 
: No, not really. The big reasons I wasn't planning on unification are:
: 
: *) It makes the amount of mental space the core interpreter takes up smaller

It may certainly be valuable to (not) think of it that way, but just
don't be surprised if the regex folks come along and borrow a lot of
your opcodes to make things that look like (in C):

while (s < send && isdigit(*s)) s++;

: *) It can make performance tradeoffs separately from the main perl engine

The option of doing its own thing its own way is always open to an
opcode, but when you do that the option of making efficient use of the
core infrastructure goes away.  As an honorary member of the regex
hacking team, I covet registers.  :-)

: *) We can probably snag the current perl 5 source without much change

Cough, cough.

: *) The current RE engine's scared (or is that scarred?) me off enough that 
: I'd as soon leave it to someone who's more temperamentally suited for such 
: things.

As an honorary member of the temperamentally suited team, allow me to
repeat myself.  Cough, cough.  We can certainly borrow the ideas from
Perl 5's regex engine, but even us temperamentally suited veterans are
sufficiently scared/scarred to want something that works better.

: *) Treating regexes as non-atomic operations brings some serious threading 
: issues into things.

Eh, treating regexes as atomic is the root of the re-entrancy problem.
If the regex has access to local storage, the re-entrancy and threading
problems pretty much solve themselves.

: The fact that Perl 5's regex engine is a royal pain to deal with should
: be a warning to us.
: 
: I can think of a couple of reasons that the current engine's a royal pain, 
: and they don't have much to do with it as a separate entity...

Sure, I'm just saying that at least two of those couple reasons are
that 1) it invents its own opcode storage mechanism, and 2) it uses
globals for efficiency when it should be using some efficient variety
of locals.

: Much of the pain of dealing with the regex engine in Perl 5 has to do
: with allocation of opcodes and temporary values in a non-standard
: fashion, and dealing with the resultant non-reentrancy on an ad hoc
: basis.  We've already tried that experiment, and it sucks.  I don't
: want to see the regex engine get swept back under the complexity carpet
: for Perl 6.
: 
: Yeah, but those are mostly issues with the implementation, not with the 
: separation.

That sounds suspiciously like what I'm trying to say.

: That's a scenario I'd love to avoid.  And if we can manage to store
: regex opcodes and state using mechanisms similar to ordinary opcodes,
: maybe we'll not fall back into the situation where the regex engine is
: understood by only three people, plus or minus four.
: 
: While I'm not sure I agree with it, if that's what you want, then that's 
: what we'll do. Threading will complicate this some, since we'll need to 
: guarantee atomicity across multiple opcodes, something I'd not planned on 
: doing.

How will we guarantee that my $foo stays my under threading?  Surely
the same mechanism could serve to keep my regex state variables sane
under the same circumstances.  (I oversimplify, of course...)

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Jarkko Hietaniemi

 : Though whether being able to 
 : yank out the RE engine and treat it as a standalone library is important 
 : enough to warrant being treated as a design goal or not is a separate 
 : issue. (I think so, as it also means I can treat it as a black box for the 
 : moment so there's less to try and stuff in my head at once)
 
 As a fellow bear of very little brain, I'm just trying to point out that
 we already have a good example of the dangers to that approach.

Still, having the regexen as a reusable component (i.e. a library)
wouldn't be a bad idea.  The current regex code most definitely isn't
detachable or reusable, so we can't be said to have explored that
option.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Larry Wall

Jarkko Hietaniemi writes:
:  : Though whether being able to 
:  : yank out the RE engine and treat it as a standalone library is important 
:  : enough to warrant being treated as a design goal or not is a separate 
:  : issue. (I think so, as it also means I can treat it as a black box for the 
:  : moment so there's less to try and stuff in my head at once)
:  
:  As a fellow bear of very little brain, I'm just trying to point out that
:  we already have a good example of the dangers to that approach.
: 
: Still, having the regexen as a reusable component (i.e. a library)
: wouldn't be a bad idea.  The current regex code most definitely isn't
: detachable or reusable, so we can't be said to have explored that
: option.

Well, other languages have explored that option, and I think that makes
for an unnatural interface.  If you think of regexes as part of a
larger language, you really want them to be as incestuous as possible,
just as any other part of the language is incestuous with the rest of
the language.  That's part of what I mean when I say that I'm trying to
look at regular expressions as just a strange variant of Perl code.

Looking at it from a slightly different angle, regular expressions are
in great part control syntax, and library interfaces are lousy at
implementing control.

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Bryan C . Warnock

On Monday 04 June 2001 08:38 pm, Jarkko Hietaniemi wrote:
  : Though whether being able to
  : yank out the RE engine and treat it as a standalone library is
  : important enough to warrant being treated as a design goal or not is a
  : separate issue. (I think so, as it also means I can treat it as a
  : black box for the moment so there's less to try and stuff in my head
  : at once)
 
  As a fellow bear of very little brain, I'm just trying to point out that
  we already have a good example of the dangers to that approach.

 Still, having the regexen as a reusable component (i.e. a library)
 wouldn't be a bad idea.  The current regex code most definitely isn't
 detachable or reusable, so we can't be said to have explored that
 option.

But to what extent?  I seem to recall one of the main reasons against 
complete separation was (speculated, perhaps) that it was too inefficient 
for the regex engine to be ignorant of Perl internals.

Given that regexen, in all their complexity, are among the more enviable 
capabilities of Perl (traditionally), it would certainly be nice (for 
others, I suppose) to be able to gain Perl Regex Handling via a couple of 
calls and a shared library.  But it was counter-argued that much of the regex's 
speed (and some of its functionality) was directly tied to Perl internal 
structures, and the layer of abstraction necessary to completely separate 
the two would not benefit either Perl, or the stand-alone engine, for both 
cases, so why bother?  IOW, if you're not going to completely exorcise Perl 
from its regexes, why try to exorcise the regexes from Perl?  

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Jarkko Hietaniemi

 Well, other languages have explored that option, and I think that makes
 for an unnatural interface.  If you think of regexes as part of a
 larger language, you really want them to be as incestuous as possible,

These days we can be that without feeling that guilty since pcre exists.

 just as any other part of the language is incestuous with the rest of
 the language.  That's part of what I mean when I say that I'm trying to
 look at regular expressions as just a strange variant of Perl code.
 
 Looking at it from a slightly different angle, regular expressions are
 in great part control syntax, and library interfaces are lousy at
 implementing control.

That's assuming one wants access to the regex low-level ops; if one
wants just match/no-match and submatches, a regex(3)-like interface is
quite enough.
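The regex(3)-style "library" level of service Jarkko describes looks roughly like this in C, using the POSIX regcomp/regexec calls; the `first_capture` helper is invented here for illustration. Compile once, ask match/no-match plus submatch offsets, free.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Sketch of a regex(3)-style interface: returns 1 on match and copies
   the first capture group into buf, 0 otherwise. */
static int first_capture(const char *pat, const char *subj,
                         char *buf, size_t buflen) {
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first capture */
    int ok = 0;
    if (regcomp(&re, pat, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, subj, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n < buflen) {
            memcpy(buf, subj + m[1].rm_so, n);
            buf[n] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}
```

That covers matches and submatches cleanly; what it can't give you, per Larry's point above, is the incestuous access to the host language's control flow and variables.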

 Larry

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-01 Thread mooring

On Wed, May 30, 2001 at 12:14:29PM -0400, Uri Guttman wrote:
  NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:
 
   NI The overhead of op dispatch is a self-proving issue - if you
   NI have complex ops they are expensive to dispatch.
 
 but as someone else said, we can design our own ops to be as high level
 as we want. lowering the number of op calls is the key. that loop will
 be a bottleneck as it is in perl5 unless we optimize it now.
 

In my experience, perl opcodes have not been the performance bottleneck
in perl5.

It seems it isn't actually the loop that's the bottleneck in perl5. I
profiled a whole bunch of different perl programs, using a lot of
different versions of perl5; and the runops loop was very rarely among
the top CPU users.  Many times, the opcode routines themselves weren't
the hot spots. It was the support routines like hash key calculation
and lookup, string comparisons, or the regular expression code.

Profiling is almost always counter-intuitive, but if I had to
guess, I'd say that most of the per-opcode cost in perl5 was due to
setup/initialization as each opcode was entered, and that devious/clever
data structure design could avoid most of this. Also, opcode dispatch
might not be the right tree up which to be barking in seeking performance.

-- 
Ed Mooring ([EMAIL PROTECTED])



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-01 Thread mooring

On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote:
 
 I really think we'll win if we have support for at least integers as well 
 as PMCs. There's potentially a lot of integer work that'll be generated by 
 the optimizer and, while integer opcodes might not make the interpreter 
 much faster, they'll speed up TIL and generated C code significantly since 
 they won't need to call opcodes.

How much integer arithmetic does the perl interpreter actually do?

I've profiled a quite a few perl5 versions on a lot of different perl
programs, and about the only programs where integer ops made it above
the background noise were things like Ackerman, iterative Fibonacci, and
some Q&D image processing on PBM files. Most of the programs I looked
at didn't do much of any integer arithmetic.

Figuring out where the hot spots are in an interpreter for a general-
purpose programming language is hard. I'd recommend against special
cases in the registers, since it's not clear how much they'd help.
-- 
Ed Mooring ([EMAIL PROTECTED])



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Randal L. Schwartz

 Larry == Larry Wall [EMAIL PROTECTED] writes:

Larry : Nah, bytecode'll have an endianness marker at the top. If you
Larry : load in bytecode with the wrong endianness, the loader will
Larry : have to swap for you.

Larry Er.  If they're not bytes, we can't call it bytecode.

Unless you want to raise the "octet means 8 bits, byte means whatever
is nice and chunky" argument. :)
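The loader-side swap Larry describes above could be sketched like this; the 32-bit `MAGIC` value and word-oriented layout are assumptions of this sketch, not anything the design has settled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical endianness marker at the top of a bytecode stream. */
#define MAGIC 0x01020304u

static uint32_t swap32(uint32_t w) {
    return (w >> 24) | ((w >> 8) & 0x0000ff00u)
         | ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* If the marker reads back byte-swapped, swap every word once at load
   time. Returns 1 if the stream was usable, 0 if it isn't ours. */
static int fix_endianness(uint32_t *words, size_t n) {
    if (n == 0) return 0;
    if (words[0] == MAGIC) return 1;          /* native order */
    if (swap32(words[0]) != MAGIC) return 0;  /* not our bytecode */
    for (size_t i = 0; i < n; i++)
        words[i] = swap32(words[i]);
    return 1;
}
```

The cost is a single pass at load time, after which the interpreter can assume native order everywhere.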

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Nick Ing-Simmons

Uri Guttman [EMAIL PROTECTED] writes:
  DS The one handy thing about push and pop is you don't need to go
  DS tracking the stack manually--that's taken care of by the push and
  DS pop opcodes. They can certainly be replaced with manipulations of
  DS a temp register and indirect register stores or loads, but that's
  DS more expensive--you do the same thing only with more dispatch
  DS overhead.

  DS And I'm considering the stack as a place to put registers
  DS temporarily when the compiler runs out and needs a spot to
  DS squirrel something away, rather than as a mechanism to pass
  DS parameters to subs or opcodes. This is a stack in the traditional
  DS scratch-space sense.

i agree with that. the stack here is mostly a call stack which
save/restores registers as we run out. with a large number like 64, we
won't run out until we do some deep calls. then the older registers (do
we have an LRU mechanism here?) get pushed by the sub call prologue which
then uses those registers for its my vars.

I don't like push/pop - they imply a lot of word-by-word stack limit
checking, when it is less overhead for the compiler to analyse the needs
of a whole basic block, check for/make space on the stack _once_, and
then just address it directly.
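Nick's once-per-basic-block idea might look like the following hypothetical sketch; the `vm_stack` type, the sizing policy, and the entry/exit helpers are all invented here. The compiler computes each block's worst-case stack need at compile time, the prologue reserves it once, and the ops inside the block index the frame with no further checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    intptr_t *base;      /* growable scratch stack */
    size_t used, cap;
} vm_stack;

/* One limit check per basic block; returns the frame base so the
   block's ops can address slots directly. */
static intptr_t *enter_block(vm_stack *st, size_t need) {
    if (st->used + need > st->cap) {
        size_t ncap = (st->used + need) * 2;   /* grow once, generously */
        intptr_t *p = realloc(st->base, ncap * sizeof *p);
        if (!p) abort();
        st->base = p;
        st->cap = ncap;
    }
    intptr_t *frame = st->base + st->used;
    st->used += need;
    return frame;
}

static void leave_block(vm_stack *st, size_t need) { st->used -= need; }
```

Within the block, a slot access is just `frame[i]` - no bounds test, no pointer bump, which is the saving over per-push/per-pop checking.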


is the sub call/return stack also the data (scratch) stack? i think
separate ones makes sense here. the data stack is just PMC pointers, the
code call stack has register info, context, etc.

One stack is more natural for translation to C (which has just one).
One problem with FORTH was allocating two growable segments for its 
two stacks - one always ended up 2nd class.

-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Nick Ing-Simmons

Uri Guttman [EMAIL PROTECTED] writes:

think of this as classic CISC code generation with plenty of registers
and a scratch stack. this is stable technology. we could even find a
code generator guru (i don't know any obvious ones in the perl6 world)

Classic CISC code generation taught us that CISC is a pain to code-gen.
(I am not a Guru but did design TMS320C80's RISC specifically to match 
gcc of that vintage, and dabbled in a p-code for Pascal way back.)


   special registers ($_, @_, events, etc.) are indexed with a starting
   offset of 64, so general registers are 0-63.

  DS I'd name them specially (S0-Snnn) rather than make them a chunk of the 
  DS normal register set.

All that dividing registers into sub-classes does is cause you to do 
register-register moves when things are in the wrong sort of register.
Its only real benefit is for encoding density as you can imply part
of the register number by requiring addresses to be in address registers
etc. It is not clear to me that perl special variables map well to that.
Mind you the names are just a human thing - it is the bit-pattern that 
compiler cares about.




oh, they have macro names which are special. something like:

#define MAX_PLAIN_REG   64  /* 0 - 63 are plain regs */
#define REG_ARG         64  /* $_ */
#define REG_SUB_ARG     65  /* @_ */
#define REG_ARGV        66  /* @ARGV */
#define REG_INT1        67  /* integer 1 */
#define REG_INT2        68  /* integer 2 */

uri
-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Dan Sugalski

At 02:08 PM 5/30/2001 +, Nick Ing-Simmons wrote:
Uri Guttman [EMAIL PROTECTED] writes:
 
 think of this as classic CISC code generation with plenty of registers
 and a scratch stack. this is stable technology. we could even find a
 code generator guru (i don't know any obvious ones in the perl6 world)

Classic CISC code generation taught us that CISC is a pain to code-gen.
(I am not a Guru but did design TMS320C80's RISC specifically to match
gcc of that vintage, and dabbled in a p-code for Pascal way back.)

Right, but in this case we have the advantage of tailoring the instruction 
set to the language, and given the overhead inherent in op dispatch we also 
have an incentive to hoist opcodes up to as high a level as we can manage.

special registers ($_, @_, events, etc.) are indexed with a starting
offset of 64, so general registers are 0-63.
 
   DS I'd name them specially (S0-Snnn) rather than make them a chunk of 
 the
   DS normal register set.

All that dividing registers into sub-classes does is cause you to do
register-register moves when things are in the wrong sort of register.
Its only real benefit is for encoding density as you can imply part
of the register number by requiring addresses to be in address registers
etc. It is not clear to me that perl special variables map well to that.
Mind you the names are just a human thing - it is the bit-pattern that
compiler cares about.

Fair enough. Having the equivalent to perl5's push stack (well, one of 'em) 
to save and restore named specials is probably a better idea. There's no 
real overriding performance reason to have the specials in the register set.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Nick Ing-Simmons

Dan Sugalski [EMAIL PROTECTED] writes:
At 02:08 PM 5/30/2001 +, Nick Ing-Simmons wrote:
Classic CISC code generation taught us that CISC is a pain to code-gen.
(I am not a Guru but did design TMS320C80's RISC specifically to match
gcc of that vintage, and dabbled in a p-code for Pascal way back.)

Right, but in this case we have the advantage of tailoring the instruction 
set to the language, and given the overhead inherent in op dispatch we also 
have an incentive to hoist opcodes up to as high a level as we can manage.

That is of course what they/we all say ;-)

The 68K for example matched quite well to the low-tech compiler technology
of its day, as did UCSD's p-code for USCD Pascal, and DSPs have their own 
reasons (inner loops are more important than generic C) for their CISC nature.

Even the horrible x86 architecture is quasi-sane if you assume all variables
are on the stack addressed by the Base Pointer.

It is interesting now that people are looking at building chips for JVM
how much cursing there is about certain features - though I don't have 
the references to hand.

The overhead of op dispatch is a self-proving issue - if you have complex
ops they are expensive to dispatch. 
In the limit FORTH-like threaded code 

   while (1) (*(*op_ptr++))();

is not really very expensive; it is then up to the op to adjust op_ptr
for in-line args etc. The down side is that each op is at least the size
of a pointer.

With a 16-bit opcode as-per-Uri that becomes:

   while (1) (*table[*op_ptr++])();

(Assuming we don't need to check bounds 'cos we won't generate bad code...)

One can then start adding decode to the loop:
 
   while (1) {
 op_t op = *op_ptr++;
 switch (NUM_ARGS(op)) {
  case 1:
   (*table[FUNC_NUM(op)])(*op_ptr++);
   break;
  case 3:
   (*table[FUNC_NUM(op)])(op_ptr[0],op_ptr[1],op_ptr[2]);
   op_ptr += 3;
   break;
  ...
 }
   }

Then one can do byte-ordering and mis-aligned hackery and index into reg-array

   while (1) {
 op_t op = GET16BITS(op_ptr);
 switch (NUM_ARGS(op)) {
  case 1:
   (*table[FUNC_NUM(op)])(reg_ptr[GET8BITS(op_ptr)]);
   break;
  ...
 }
   }



-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Uri Guttman

 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI Uri Guttman [EMAIL PROTECTED] writes:
   
   think of this as classic CISC code generation with plenty of registers
   and a scratch stack. this is stable technology. we could even find a
   code generator guru (i don't know any obvious ones in the perl6 world)

  NI Classic CISC code generation taught us that CISC is a pain to code-gen.
  NI (I am not a Guru but did design TMS320C80's RISC specifically to match 
  NI gcc of that vintage, and dabbled in a p-code for Pascal way back.)

i disagree. cisc went out of style for speed reasons, not because of
compilers. most risc compilers are much more complicated than cisc
compilers. they have to manage complex register usage, deal with delayed
branches, reorder instructions to fill pipelines, etc. what happened was
that the higher level instruction logic in a cisc chip was removed and
put into the compiler. but we are not doing a risc like IL so this is moot.

   
special registers ($_, @_, events, etc.) are indexed with a starting
offset of 64, so general registers are 0-63.
   
  DS I'd name them specially (S0-Snnn) rather than make them a chunk of the 
  DS normal register set.

  NI All that dividing registers into sub-classes does is cause you to do 
  NI register-register moves when things are in the wrong sort of register.
  NI Its only real benefit is for encoding density as you can imply part
  NI of the register number by requiring addresses to be in address registers
  NI etc. It is not clear to me that perl special variables map well to that.
  NI Mind you the names are just a human thing - it is the bit-pattern that 
  NI compiler cares about.

no register to register moves are needed. if you call tr with the
default $_ it gets generated as:

TR_OP   REG_ARG, map_reg, result_reg ; REG_ARG is $_

where the *_reg operands are register indexes generated by the compiler. it assumes
something is already in $_

if you use tr on $foo you code gen:

push reg        ; save reg if needed
load reg, $foo  ; load if needed - probably more code
                ; than this
TR_OP   reg, map_reg, result_reg

the first two lines are only there if $foo isn't in a register yet.
in both cases result_reg is picked by the compiler and must be free to
use. it gets the tr count return (the result of the op).

the idea is that coding for $_ or $foo is basically the same. you just
use the special register index for $_ vs. any regular one for $foo.

code generation with this type of IL is almost one to one with
perl. most of the language level operations map to one op code, so it is
mostly a case of managing the data in and out of the registers and
calling the op codes. a real compiler has to do so much more. TIL and c
back ends will need much more work than the basic perl to IL
translation.

idea: we have a VOID special register. it is like /dev/null. if you
execute an op in a void context, you pass in the VOID special register
to get the result. then either no result is generated or stored
depending on how we code it. this doesn't handle the runtime context of
sub calls.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Uri Guttman

 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI The overhead of op dispatch is a self-proving issue - if you
  NI have complex ops they are expensive to dispatch.

but as someone else said, we can design our own ops to be as high level
as we want. lowering the number of op calls is the key. that loop will
be a bottleneck as it is in perl5 unless we optimize it now.

  NI With a 16-bit opcode as-per-Uri that becomes:

  NI while (1) (*table[*op_ptr++])();

  NI (Assuming we don't need to check bounds 'cos we won't generate bad code...)

i dropped the 16 bit idea in favor of an extension byte code that zhong
mentioned. it has several wins, no ordering issues, it is pure 'byte'
code. 

  NI One can then start adding decode to the loop:
 
  NIwhile (1) {
  NI  op_t op = *op_ptr++;
  NI  switch(NUM_ARGS(op))

no switch, a simple lookup table:

op_cnt = op_counts[ op ] ;

  NI*(table[FUNC_NUM(op)])(*op_ptr++);

i don't have the c code in mind for that. something that uses op_cnt and
passes in that many args. 

op_ptr += op_cnt;
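a runnable toy of what that dispatch could look like - Nick's
function-pointer table plus the op_counts lookup - might be the
following. the opcodes, registers, and semantics are all invented for
illustration; in this sketch each op reads its own inline args through
the shared pc and the loop skips them afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_HALT, OP_LOADI, OP_ADD, NOPS };

static intptr_t regs[8];        /* toy register file */
static const uint8_t *pc;       /* points at current op's operands */
static int running;

static void op_halt(void)  { running = 0; }
static void op_loadi(void) { regs[pc[0]] = pc[1]; }  /* reg, immediate */
static void op_add(void)   { regs[pc[0]] = regs[pc[1]] + regs[pc[2]]; }

static void (*table[NOPS])(void) = { op_halt, op_loadi, op_add };
static const uint8_t op_counts[NOPS] = { 0, 2, 3 };  /* inline arg bytes */

static void run(const uint8_t *code) {
    pc = code;
    running = 1;
    while (running) {
        uint8_t op = *pc++;
        (*table[op])();        /* op reads its args through pc */
        pc += op_counts[op];   /* skip the inline operands */
    }
}
```

whether the op_counts load beats a decode branch is exactly the
memory-pipeline question Nick raises later in this thread.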


uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Nick Ing-Simmons

Dave Mitchell [EMAIL PROTECTED] writes:

There's no reason why you can't have a hybrid scheme. In fact I think
it's a big win over a pure register-addressing scheme. Consider...

Which was more or less my own position...


At the start of a new scope, the stack is extended by N to create a new
stack frame (including a one-off check that the stack can be
extended).  There is then a 'stack pointer' (sp) which is initialised
to the base of the new frame, or an initial offset thereof. (So sp is
really just a temporary index within the current frame.)

Then some opcodes can use explicit addressing, while others can be
implicit, or a mixture.

Explicit opcodes specify one or more 'registers' - ie indexes within the
current frame, while implicit opcodes use the current value of sp as an
implicit index, and may alter sp as a side effect. So an ADD opcode
would use sp[0], sp[-1] to find the 2 operands and would store a pointer
to the result at sp[-1], then sp--. The compiler plants code in such a way
that it will never allow sp to go outside the current stack frame.

This allows a big win on the size of the bytecode, and in terms of the
time required to decode each op.

Consider the following code.

$a = $x*$y+$z

Suppose we have r5 and r6 available for scratch use, and that for some
reason we wish to keep a pointer to $a in r1 at the end (perhaps we use
$a again a couple of lines later):


This might have the following bytecode with a pure register scheme:

GETSV('x',r5)  # get pointer to global $x, store in register 5
GETSV('y',r6)
MULT(r5,r5,r6)  # multiply the things pointed to by r5 and r6; store ptr to
   # result in r5
GETSV('z',r6)
ADD(r5,r5,r6)
GETSV('a',r1)
SASSIGN(r1,r5)

Globals are a pain. Consider this code:

sub foo
{
 my ($x,$y,$z) = @_;
 return $x*$y+$z;
}

In the pure register (RISC-oid) scheme the bytecode should be:

FOO:
  MULT(arg1,arg2,tmp1)
  ADD(tmp1,arg3,result)
  RETURN

That is lexicals get allocated registers at compile time, and ops
just go get them.

In the pure stack with alloc scheme (x86-oid) scheme it should be
  ENTER +1 # need a temp
  MULT SP[1],SP[2],SP[4]   # $x*$y
  ADD SP[4],SP[3],SP[1]# temp + $z - result
  RETURN -2# Lose temp and non-results

And in a pure stack (FORTH, PostScript) style it might be
  rot 3   # reorder stack to get x y on top
  mpy
  add
  ret   


but might be like this in a hybrid scheme:

SETSP(5)   # make sp point to r5
GETSV('x') # get pointer to global $a, store at *sp++
GETSV('y')
MULT
GETSV('z')
ADD
GETSV('a')
SASSIGN
SAVEREG(r1)# pop pointer at *sp, and store in register 1

The problem that the hybrid scheme glosses over is the reorder-the-args
issue, which is handled by register numbers, stack addressing, or 
FORTH/PostScript stack re-ordering.
It avoids it by expensive long range global fetches - which is indeed
what humans do when writing PostScript - use globals - but compilers
can keep track of such mess for us.



Both use the same registers, have the same net result, but the explicit
scheme requires an extra 11 numbers in the bytecode, not to mention all
the extra cycles required to extract those numbers from the bytecode
in the first place.
-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Larry Wall

Dan Sugalski writes:
: Right, but in this case we have the advantage of tailoring the instruction 
: set to the language, and given the overhead inherent in op dispatch we also 
: have an incentive to hoist opcodes up to as high a level as we can manage.

We basically tried this experiment with Perl 5, and it's only a partial
success.  Yes, you end up with a Perl VM that can run Perl pretty fast,
but it tends to complicate the mapping to other virtual machines.
(Enough so that we still don't have a reasonable way to run Perl on a
JVM, despite several attempts.)

I guess the real question is to what extent the world of the future will
use interpreters, and to what extent it'll settle on JIT compiling instead.
And that's a big enough dog that we can't wag it very easily.

By the way, have you folks considered how to unify the regex opcodes
with the normal Perl opcodes?  I suspect we might want to do that.

Larry



RE: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Hong Zhang


 There's no reason why you can't have a hybrid scheme. In fact I think
 it's a big win over a pure register-addressing scheme. Consider...

The hybrid scheme may be a win in some cases, but I am not sure if it
worth the complexity. I personally prefer a strict RISC style opcodes,
mainly load, store, and ops for common operators (+, -, * etc), plus
escaped opcode for complicated operators and functions.

 Consider the following code.
 
 $a = $x*$y+$z
 
 Suppose we have r5 and r6 available for scratch use, and that for some
 reason we wish to keep a pointer to $a in r1 at the end 
 (perhaps we use $a again a couple of lines later):
 
 
 This might have the following bytecode with a pure register scheme:
 
 GETSV('x',r5) # get pointer to global $x, store in register 5
 GETSV('y',r6)
 MULT(r5,r5,r6)  # multiply the things pointed to by r5 and r6; store
   # ptr to result in r5
 GETSV('z',r6)
 ADD(r5,r5,r6)
 GETSV('a',r1)
 SASSIGN(r1,r5)

Please note most of common operations will deal with locals, not
globals. Since almost all locals will fit into register set, the
generated bytecode will be very small and very fast.

The global access is doomed to be slower than locals, especially
considering the synchronization overhead associated with threading.

Hong



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Nick Ing-Simmons

Uri Guttman [EMAIL PROTECTED] writes:
 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI The overhead of op dispatch is a self-proving issue - if you
  NI have complex ops they are expensive to dispatch.

but as someone else said, we can design our own ops to be as high level
as we want. lowering the number of op calls is the key. that loop will
be a bottleneck as it is in perl5 unless we optimize it now.

  NI With a 16-bit opcode as-per-Uri that becomes:

  NI while (1) (*table[*op_ptr++])();

  NI (Assuming we don't need to check bounds 'cos we won't generate bad code...)

i dropped the 16 bit idea in favor of an extension byte code that zhong
mentioned. it has several wins, no ordering issues, it is pure 'byte'
code. 

  NI One can then start adding decode to the loop:
 
  NIwhile (1) {
  NI  op_t op = *op_ptr++;
  NI  switch(NUM_ARGS(op))

no switch, a simple lookup table:

   op_cnt = op_counts[ op ] ;

Myths of 21st Century Computing #1:
 Memory lookups are cheap

Most processors have only one memory unit, and it typically has
a long pipeline delay. But many have several units that can do 
compares etc.

A lookup table may or may-not be faster/denser than a switch.
A lookup may take 9 cycles down a memory pipe while

ans = (op >> 16) ? 2 : (op >> 8) ? 1 : 0;

might super-scalar issue in 1 cycle.  Code at a high level and let the 
C compiler decide what is best. C will give you a lookup if that 
is best.

Memory ops need not be expensive if they pipeline well, but 
making one memory op depend on the result of another is a bad idea
e.g.

   op   = *op_ptr++;
   arg1 = *op_ptr++;
   arg2 = *op_ptr++;

May appear to happen in 3 cycles, as all the loads can be issued in a pipelined
manner and ++s issued in parallel. While

   op  = *op_ptr++;
   ans = table[op];

could seem to take 18 cycles, as the 2nd load can't start till the 1st one completes.

I have been meaning to try and prove my point with 
a software-pipelined dispatch loop which fetches one op,
decodes the previous one, and executes the one before that.

-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/




RE: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Dan Sugalski

At 10:06 AM 5/30/2001 -0700, Hong Zhang wrote:

  There's no reason why you can't have a hybrid scheme. In fact I think
  it's a big win over a pure register-addressing scheme. Consider...

The hybrid scheme may be a win in some cases, but I am not sure if it
worth the complexity. I personally prefer a strict RISC style opcodes,
mainly load, store, and ops for common operators (+, -, * etc), plus
escaped opcode for complicated operators and functions.

While RISC style opcodes have a certain pleasant simplicity to them, the 
opcode overhead'll kill us. (If perl 5's anything to go by, and in this 
case I think it might be)

Please note most of common operations will deal with locals, not
globals. Since almost all locals will fit into register set, the
generated bytecode will be very small and very fast.

The size of the generated bytecode's likely not going to be directly an 
issue in determining speed. (And we can't count on keeping the locals 
exclusively in the register set either, as otherwise string eval won't be 
able to find them, and neither will the debugger)

The global access is doomed to be slower than locals, especially
considering the synchronization overhead associated with threading.

We still may have synchronization issues with threads and lexicals, but 
likely far less than with globals.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS The size of the generated bytecode's likely not going to be
  DS directly an issue in determining speed. (And we can't count on
  DS keep the locals exclusively in the register set either, as
  DS otherwise string eval won't be able to find them, and neither will
  DS the debugger)

but if registers are usually just a PMC pointer, that address can also
be accessed through the symbol table or pad or whatever. in my view what
the compiler is doing with registers is actually aliasing a register
index to some PMC somewhere. a load register doesn't copy the PMC, just
its address into the register slot.

the whole purpose of the register system is to support a simple
bytecode with integer offsets for arguments. that bytecode format has
many positive qualities so we should try to make the IL map well to
it. the register idea is not a speed thing in and of itself. the
registers are the logical way the VM operates on data. fetching a global
may require operations such as a symbol lookup and a load of the PMC into
a register. the compiler tracks register usage and does the push/pops to
manage them.

  DS We still may have synchronization issues with threads and lexicals, but 
  DS likely far less than with globals.

if we keep all thread-specific data in a single data structure, then
only global accesses will need to be synchronized. i think the VM data
(stack, registers, PC, etc.) will be part of this per-thread data
block.
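as a rough sketch of that idea (every name and field here is invented
for illustration, not a real design), the per-thread block might look
like:

```c
#include <stdint.h>

typedef struct PMC PMC;          /* opaque here; real layout is TBD */

/* hypothetical per-thread interpreter state: the VM data an op
 * touches by default lives here, so none of it needs locking.
 * only truly shared data (globals) would be synchronized. */
typedef struct {
    PMC      *regs[64];          /* register set: PMC pointers, so a
                                    "load register" is a pointer copy */
    PMC     **sp;                /* data (scratch) stack pointer */
    const uint8_t *pc;           /* program counter into the bytecode */
    /* ... call stack, event queue, etc. ... */
} ThreadState;

/* an op gets its thread's state explicitly; two threads running the
 * same bytecode never contend on registers, stack, or PC. */
static const uint8_t *op_reg_move(ThreadState *t, int dst, int src) {
    t->regs[dst] = t->regs[src]; /* aliasing the PMC, not copying it */
    return t->pc + 3;            /* opcode byte + two index bytes */
}
```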

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS If someone can make a compelling argument for details on how the
  DS registers should be done that should be visible to the bytecode
  DS (Like, say, a variant on SPARC's windowing system, or forcing them
  DS on the stack such that pushes and pops renumber the registers)
  DS that'd be keen. I can see the merits, but I'm not compelled one
  DS way or the other quite yet.

here is an idea. if we use a pure stack design but you can access the
stack values with an index, then the index number can get large. so a
fixed register set would allow us to limit the index to 8 bits. so the
byte code could look something like this:

16 bit op   (plenty of room for growth)
8 bit register index of arg 1
8 bit register index of arg 2
...

next op code ...

literal data support is needed (read only)

either each op code knows how many args it has, or we have an end marker
(e.g. 0xff which is never used as a register index).

the op code is stored in network endian order and the interpreter will
always build a 16 bit int from the 2 bytes.
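a minimal C sketch of that fetch (the helper name is invented):

```c
#include <stdint.h>

/* build a native 16-bit op from two bytes stored in network
 * (big-endian) order. composing from individual bytes like this
 * gives the same result on any host, so the on-disk bytecode
 * needs no conversion pass. */
static uint16_t fetch_op16(const uint8_t **pc) {
    uint16_t op = (uint16_t)(((*pc)[0] << 8) | (*pc)[1]);
    *pc += 2;
    return op;
}
```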

we have a simple set of load literal, push/pop (multiple) registers op
codes. 

each thread has its own register set.

all registers point to PMC's

passing lists to/from subs is via an array ref. the data list is on the
stack and the array ref is in @_ or passed by return().

special registers ($_, @_, events, etc.) are indexed with a starting
offset of 64, so general registers are 0-63.

this can be mmapped in, executed with NO changes, fairly easily
generated by the compiler front end, optimizable on or offline,
(dis)assembler can be simply written, etc.

simple to translate to TIL code by converting each op code to a call to
the op function itself and passing in the register indexes or even
the PMC pointers themselves.
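as a sketch, that TIL translation could be as simple as this (everything
here is invented for illustration):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct PMC PMC;

/* each translated op is a direct call: the op function plus its
 * already-decoded register indexes */
typedef void (*op_fn)(PMC **regs, uint8_t a, uint8_t b);
typedef struct { op_fn fn; uint8_t a, b; } TilOp;

/* example op: a register move (copies the PMC pointer) */
static void op_move(PMC **regs, uint8_t dst, uint8_t src) {
    regs[dst] = regs[src];
}

/* execution is then a straight run down the call list -- no opcode
 * decode left in the inner loop at all */
static void run_til(const TilOp *ops, size_t n, PMC **regs) {
    for (size_t i = 0; i < n; i++)
        ops[i].fn(regs, ops[i].a, ops[i].b);
}
```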

uri




RE: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Hong Zhang


 here is an idea. if we use a pure stack design but you can access the
 stack values with an index, then the index number can get large. so a
 fixed register set would allow us to limit the index to 8 bits. so the
 byte code could look something like this:
 
   16 bit op   (plenty of room for growth)
   8 bit register index of arg 1
   8 bit register index of arg 2
   ...
 
   next op code ...
 
   literal data support is needed (read only)

 either each op code knows how many args it has,

I'd prefer that; otherwise we will lose most of the performance gain.

 or we have an end marker (e.g. 0xff which is never used as a register
index).

If we have to use variable arguments, I strongly recommend adding one argc
byte immediately following the opcode. Linearly scanning the bytecode for
an end marker will be very slow.

 the op code is stored in network endian order and the interpreter will
 always build a 16 bit int from the 2 bytes.

The 16-bit op has both endian and alignment issues. Most RISC
machines cannot access byte-aligned opcodes, so we would have to add a lot
of padding. Either way, it will be fatter and slower than an 8-bit opcode.
I prefer using an escape opcode.
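A minimal sketch of that escape-opcode decode (the 0xff escape value is
an assumption here, not a decided design):

```c
#include <stdint.h>

#define OP_ESCAPE 0xFFu   /* hypothetical escape byte */

/* decode one opcode from a stream of 8-bit ops: common ops are a
 * single byte, and the escape byte promotes the following byte into
 * a second bank of 256 opcodes. the common case stays compact and
 * byte-aligned while the opcode space can still grow. */
static unsigned fetch_opcode(const uint8_t **pc) {
    uint8_t b = *(*pc)++;
    if (b != OP_ESCAPE)
        return b;                   /* ops 0..254 */
    return 256u + *(*pc)++;         /* ops 256..511, escaped bank */
}
```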

 we have a simple set of load literal, push/pop (multiple) registers op
 codes. 

There should be no push/pop opcodes. They are simply register moves.
 
 each thread has its own register set.
 
 all registers point to PMC's
 
 passing lists to/from subs is via an array ref. the data list 
 is on the
 stack and the array ref is in @_ or passed by return().
 
 special registers ($_, @_, events, etc.) are indexed with a starting
 offset of 64, so general registers are 0-63.
 
 this can be mmapped in, executed with NO changes, fairly easily
 generated by the compiler front end, optimizable on or offline,
 (dis)assembler can be simply written, etc.
 
 simple to translate to TIL code by converting each op code to a call to
 the the op function itself and passing in the register indexes or even
 the PMC pointers themselves.

Agreed.

Hong



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Dan Sugalski

At 05:32 PM 5/29/2001 -0400, Bryan C. Warnock wrote:
On Tuesday 29 May 2001 05:15 pm, Hong Zhang wrote:
   either each op code knows how many args it has,
 
  I like to do so, otherwise we will lose most of the performance gain.

I would think, though, that some ops would benefit more from varargs.
String assembly comes to mind.

If an opcode wants some undefined number of arguments passed, it'll get a 
single one (a list) passed to it. I don't know that we'll have too many 
opcodes other than perhaps sub and method call ones that'll take a variable 
number of arguments, and the ones that do will really take one--a list with 
potentially a variable number of entries in it. So instead of seeing:

   some_op a, b, c, d, e...

we'd see:

   templist r1
   push r1, array1
   push r1, scalar2
   push r1, 15
   some_op r1

Though that might have sufficient performance loss in opcode dispatch to 
warrant doing things in another way.

   or we have an end marker (e.g. 0xff which is never used as a register
   index).
 
  If we have to use variable arguments, I strongly recommend adding one
  argc byte immediately following the opcode. Linearly scanning the
  bytecode will be very slow.

Agreed.

I don't think this'll come up. Ops that take a variable number of arguments 
will have a count, though, and it'll probably have to be the first element.
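As a sketch, decoding such a count-prefixed op might look like this (all
names invented):

```c
#include <stdint.h>

/* read one variable-argument op: the byte after the opcode is the
 * argument count, followed by that many register indexes. knowing
 * the count up front avoids scanning for an end marker. */
static const uint8_t *decode_varargs(const uint8_t *pc,
                                     unsigned *argc, uint8_t args[256]) {
    unsigned n = *pc++;            /* count comes first */
    for (unsigned i = 0; i < n; i++)
        args[i] = *pc++;           /* one register index per argument */
    *argc = n;
    return pc;                     /* points at the next opcode */
}
```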

  The 16-bit op has both endian and alignment issues. Most RISC
  machines cannot access byte-aligned opcodes, so we would have to add a
  lot of padding. Either way, it will be fatter and slower than an 8-bit
  opcode. I prefer using an escape opcode.

This is a discussion we've had before.  It seems if we solve the Unicode
issue, we've solved the bytecode issue, too.  Better yet, make all the
opcodes map to UTF-8 characters, so you can write and execute bytecode as
simple strings.  ;-)

Nah, bytecode'll have an endianness marker at the top. If you load in 
bytecode with the wrong endianness, the loader will have to swap for you.
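A sketch of what that loader check might do (the magic value and the
word-at-a-time fixup granularity are assumptions for illustration):

```c
#include <stddef.h>
#include <stdint.h>

#define BYTECODE_MAGIC 0x013155A1u   /* hypothetical magic number */

static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0xFF00u)
         | ((v << 8) & 0xFF0000u) | (v << 24);
}

/* loader check: if the magic word reads back byte-swapped, the file
 * came from a platform of the other endianness, and every 32-bit
 * word is swapped once at load time. after that the interpreter
 * always runs native-endian code. returns 0 for native, 1 for
 * swapped-and-fixed, -1 for not-our-bytecode. */
static int fixup_bytecode(uint32_t *words, size_t n) {
    if (words[0] == BYTECODE_MAGIC)
        return 0;                          /* native: nothing to do */
    if (swap32(words[0]) != BYTECODE_MAGIC)
        return -1;                         /* unrecognized file */
    for (size_t i = 0; i < n; i++)
        words[i] = swap32(words[i]);       /* one-time conversion */
    return 1;
}
```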

Dan





Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Dan Sugalski

At 04:25 PM 5/29/2001 -0400, Uri Guttman wrote:
  DS == Dan Sugalski [EMAIL PROTECTED] writes:

   DS If someone can make a compelling argument for details on how the
   DS registers should be done that should be visible to the bytecode
   DS (Like, say, a variant on SPARC's windowing system, or forcing them
   DS on the stack such that pushes and pops renumber the registers)
   DS that'd be keen. I can see the merits, but I'm not compelled one
   DS way or the other quite yet.

here is an idea. if we use a pure stack design but you can access the
stack values with an index, then the index number can get large. so a
fixed register set would allow us to limit the index to 8 bits. so the
byte code could look something like this:

Right, but that means values can potentially have their offsets change 
based on what might happen during the course of a section of code, and I'm 
not too comfortable with that. The potential for error worries me. Plus it 
might make a mixed-type stack trickier. (And yes, I realize the irony of me 
arguing against one form of error-prone activity in the same paragraph that 
advocates another...)

 literal data support is needed (read only)

Yup. We'll need some sort of constant support, as well as some sort of 
sectional literal table.

the op code is stored in network endian order and the interpreter will
always build a 16 bit int from the 2 bytes.

Not network. Native. We can put a marker at the beginning of any bytecode 
stream, and provide an endian-swapper. That way we're always running at 
platform optimal encoding, and if we get bytecode from a platform with 
different endianness we can run the utility to swap things for us.

all registers point to PMC's

I really think we'll win if we have support for at least integers as well 
as PMCs. There's potentially a lot of integer work that'll be generated by 
the optimizer and, while integer opcodes might not make the interpreter 
much faster, they'll speed up TIL and generated C code significantly since 
they won't need to call opcodes.

passing lists to/from subs is via an array ref. the data list is on the
stack and the array ref is in @_ or passed by return().

PMC for a list, and the args are in the list. (Potentially aliased, of 
course)  Subs prototyped with input and return data won't use lists, 
they'll pass the parameters in registers. (Or something like that)

special registers ($_, @_, events, etc.) are indexed with a starting
offset of 64, so general registers are 0-63.

I'd name them specially (S0-Snnn) rather than make them a chunk of the 
normal register set.


Dan





Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Larry Wall

Dan Sugalski writes:
: Nah, bytecode'll have an endianness marker at the top. If you load in 
: bytecode with the wrong endianness, the loader will have to swap for you.

Er.  If they're not bytes, we can't call it bytecode.

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Nathan Torkington

Dan Sugalski writes:
 Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd 
 come up with a name that has a snappier acronym. Alas I'm not... :)

p-code.  The p stands for Parrot :-)

Nat




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS At 02:15 PM 5/29/2001 -0700, Hong Zhang wrote:
we have a simple set of load literal, push/pop (multiple) registers op
codes.
   
   There should be no push/pop opcodes. They are simply register moves.

  DS The one handy thing about push and pop is you don't need to go
  DS tracking the stack manually--that's taken care of by the push and
  DS pop opcodes. They can certainly be replaced with manipulations of
  DS a temp register and indirect register stores or loads, but that's
  DS more expensive--you do the same thing only with more dispatch
  DS overhead.

  DS And I'm considering the stack as a place to put registers
  DS temporarily when the compiler runs out and needs a spot to
  DS squirrel something away, rather than as a mechanism to pass
  DS parameters to subs or opcodes. This is a stack in the traditional
  DS scratch-space sense.

i agree with that. the stack here is mostly a call stack which
saves/restores registers as we run out. with a large number like 64, we
won't run out until we do some deep calls. then the older registers (do
we have an LRU mechanism here?) get pushed by the sub call prologue which
then uses those registers for its my vars.

is the sub call/return stack also the data (scratch) stack? i think
separate ones makes sense here. the data stack is just PMC pointers, the
code call stack has register info, context, etc.

uri




Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Jarkko Hietaniemi

On Tue, May 29, 2001 at 04:36:51PM -0600, Nathan Torkington wrote:
 Dan Sugalski writes:
  Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd 
  come up with a name that has a snappier acronym. Alas I'm not... :)
 
 p-code.  The p stands for Parrot :-)

No, it stands for pieces of eight.

The bar silver and the arms still lie, for all that I
know, where Flint buried them; and certainly they shall
lie there for me.  Oxen and wain-ropes would not bring
me back again to that accursed island; and the worst
dreams that ever I have are when I hear the surf
booming about its coasts or start upright in bed with
the sharp voice of Captain Flint still ringing in my
ears: Pieces of eight!  Pieces of eight!

 Nat

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS At 04:25 PM 5/29/2001 -0400, Uri Guttman wrote:

   here is an idea. if we use a pure stack design but you can access the
   stack values with an index, then the index number can get large. so a
   fixed register set would allow us to limit the index to 8 bits. so the
   byte code could look something like this:

  DS Right, but that means values can potentially have their offsets
  DS change based on what might happen during the course of a section
  DS of code, and I'm not too comfortable with that. The potential for
  DS error worries me. Plus it might make a mixed-type stack
  DS trickier. (And yes, I realize the irony of me arguing against one
  DS form of error-prone activity in the same paragraph that advocates
  DS another...)

current compilers do it all the time. they track register usage at
compile time and push/pop at runtime to keep the code working. a given
op is passed its args as register offsets. before/after that, there may
be ops to set up the registers. only during that single op call are the
needed registers known. this is not a sub call in perl with complex
stuff, but a single IL op call with a handful of registers passed to
it. most registers will be recycled after each perl level statement as
they are temps. only those that are mapped to real vars (and other live
things) are kept between statements.

think of this as classic CISC code generation with plenty of registers
and a scratch stack. this is stable technology. we could even find a
code generator guru (i don't know any obvious ones in the perl6 world)
to work with us on this.

   the op code is stored in network endian order and the interpreter will
   always build a 16 bit int from the 2 bytes.

  DS Not network. Native. We can put a marker at the beginning of any bytecode 
  DS stream, and provide an endian-swapper. That way we're always running at 
  DS platform optimal encoding, and if we get bytecode from a platform with 
  DS different endianness we can run the utility to swap things for us.

i disagree. bytecode should be portable without conversions. we store 16
bits in a known order (i picked network at random :) and build a native
16 bit int as we execute. or use only an 8 bit op code with one or more
escape code for the next 256 codes (like Hong Zhang said).

   all registers point to PMC's

  DS I really think we'll win if we have support for at least integers
  DS as well as PMCs. There's potentially a lot of integer work that'll
  DS be generated by the optimizer and, while integer opcodes might not
  DS make the interpreter much faster, they'll speed up TIL and
  DS generated C code significantly since they won't need to call
  DS opcodes.

ok, some special registers for hardware ops like add. but they should be
indexed off the same register set with the other special registers.

   passing lists to/from subs is via an array ref. the data list is on the
   stack and the array ref is in @_ or passed by return().

  DS PMC for a list, and the args are in the list. (Potentially
  DS aliased, of course) Subs prototyped with input and return data
  DS won't use lists, they'll pass the parameters in registers. (Or
  DS something like that)

same idea but you clarified it.

   special registers ($_, @_, events, etc.) are indexed with a starting
   offset of 64, so general registers are 0-63.

  DS I'd name them specially (S0-Snnn) rather than make them a chunk of the 
  DS normal register set.

oh, they have macro names which are special. something like:

#define MAX_PLAIN_REG  64  /* 0 - 63 are plain regs */
#define REG_ARG        64  /* $_ */
#define REG_SUB_ARG    65  /* @_ */
#define REG_ARGV       66  /* @ARGV */
#define REG_INT1       67  /* integer 1 */
#define REG_INT2       68  /* integer 2 */

uri




Re: Stacks registers

2001-05-27 Thread Nick Ing-Simmons

Uri Guttman [EMAIL PROTECTED] writes:
  NI No - you keep the window base handy and don't keep re-fetching it,
  NI same way you keep program counter and stack pointer handy.

  NI Getting  
  NIwindow[N] 
  NI is same cost as 
  NInext = *PC++; 

  NI My point is that to avoid keeping too-many things handy window
  NI base and stack pointer should be the same (real machine) register.

if we can control that. 

Maybe not directly, but most compilers will keep common base registers
in machine registers if you code things right.

but i see issues too. i mentioned the idea of
having $_ and other special vars and stuff would have their own PMC's in
this register set. 

Why does it have to be _this_ register set - globals can go in another
register set - SPARC's register scheme has global registers too.

That said, my guess is that $_ is usually saved/restored across sub/block
boundaries.

dan liked the idea. that doesn't map well to a window
as those vars may not change when you call subs. i just don't see
register windows as useful at the VM level.

Call it what you will - I am arguing for an addressable stack
not for windows as such.



   i am just saying register windows don't seem to be any win for us
   and cost an extra indirection for each data access. my view is let
   the compiler keep track of the register usage and just do
   individual push/pops as needed when registers run out.

  NI That makes sense if (and only if) virtual machine registers are real 
  NI machine registers. If virtual machine registers are in memory then 
  NI accessing them on the stack is just as efficient (perhaps more so)
  NI than at some other special location. And it avoids need for 
  NI memory-to-memory moves to push/pop them when we do spill.

no, the idea is the VM compiler keeps track of IL register use for the
purpose of code generating N-tuple op codes and their register
arguments. this is a pure IL design thing and has nothing to do with
machine registers. at this level, register windows don't win IMO.

That quote is a little misleading. My point is that UNLESS real (machine)
registers are involved, all IL registers are in memory. Given that they
are in memory they should be grouped with
and addressed-via-same-base-as other memory that a sub is accessing.
(The sub will be accessing the stack (or its PAD if you like), and the 
op-stream for sure, and possibly a few hot globals.)

The IL is going to be CISC-ish - so treat it like an x86 where 
you operate on things where-they-are (e.g. on the stack) 

   add 4,BP[4]

rather than RISC where you 

   ld BP[4],X
   add 4,X
   ST X,BP[4]
   
If registers are really memory the extra moves of a RISC scheme
are expensive.

What we _really_ don't want is the worst of both worlds:

   push BP[4];
   push 4
   add
   pop  BP[4] 
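The same contrast in C, with the virtual registers as slots in an
in-memory frame (all names invented for illustration):

```c
/* virtual registers as slots in an in-memory frame */
typedef struct { long slot[64]; } Frame;

/* CISC-ish: one dispatch, operands used where they live */
static void add_in_place(Frame *f, int dst, int src) {
    f->slot[dst] += f->slot[src];
}

/* worst of both worlds: four dispatches plus the extra
 * memory-to-memory moves through the scratch stack */
static void add_via_stack(Frame *f, long *stk, int *sp, int dst, int src) {
    stk[(*sp)++] = f->slot[dst];         /* push BP[dst] */
    stk[(*sp)++] = f->slot[src];         /* push src     */
    long b = stk[--*sp], a = stk[--*sp];
    stk[(*sp)++] = a + b;                /* add          */
    f->slot[dst] = stk[--*sp];           /* pop  BP[dst] */
}
```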


i am thinking about writing a short psuedo code post about the N-tuple
op codes and the register set design. the ideas are percolating in my
brane.

uri
-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/




Re: Stacks registers

2001-05-26 Thread Nick Ing-Simmons

Uri Guttman [EMAIL PROTECTED] writes:
  NI i.e. 
  NI  R4 = frame[N]
  NI is same cost as
  NI  R4 = per_thread[N]
  NI and about the same as
  NI  extern REGISTER GlobalRegs4 
  NI  R4 = GlobalRegs4;

well, if there is no multithreading then you don't need the per_thread
lookup. 

Well:
 (a) I thought the plan was to design threads in from the beginning this time.
 (b) I maintain that cost is about the same as global variables anyway.

The case for (b) is as follows:
on RISC hardware

R4 = SomeGlobal;

becomes two instructions:

loadhigh SomeGlobal.high,rp 
ld rp(SomeGlobal.low),R4

The C compiler will try and factor out the loadhigh instruction, leaving
you with an indexed load. In most cases 

ld rp(RegBase.low+4),R4

is just as valid and takes the same number of cycles, and there is normally
a form like

ld rp(rn),R4

Which allows index by variable amount.


On CISC machines, either there is an invisible RISC (e.g. Pentium)
which behaves as above or you get something akin to PDP-11 where indirection
reads a literal address via the program counter.

move [pc+n],r4

In such cases 

move [regbase+n],r4 

is going to be just as fast - the issue is the need for a (real machine)
register to hold 'regbase'.

and the window base is not accounted for. you would need 2
indirections, the first to get the window base and the second to get the
register in that window. 

No - you keep the window base handy and don't keep re-fetching it,
same way you keep program counter and stack pointer handy.

Getting  
   window[N] 
is same cost as 
   next = *PC++; 

My point is that to avoid keeping too-many things handy window base
and stack pointer should be the same (real machine) register.

i am just saying register windows don't seem to
be any win for us and cost an extra indirection for each data access. my
view is let the compiler keep track of the register usage and just do
individual push/pops as needed when registers run out.

That makes sense if (and only if) virtual machine registers are real 
machine registers. If virtual machine registers are in memory then 
accessing them on the stack is just as efficient (perhaps more so)
than at some other special location. And it avoids need for 
memory-to-memory moves to push/pop them when we do spill.
 




Re: Stacks registers

2001-05-26 Thread Uri Guttman

 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI Well:
  NI  (a) I thought the plan was to design threads in from the beginning this time.
  NI  (b) I maintain that cost is about the same as global variables anyway.

i agree with (a) but is that always compiled in/enabled even in a single
thread perl? is the code always thread enabled with the default being
just one thread running?

   and the window base is not accounted for. you would need 2
   indirections, the first to get the window base and the second to get the
   register in that window. 

  NI No - you keep the window base handy and don't keep re-fetching it,
  NI same way you keep program counter and stack pointer handy.

  NI Getting  
  NIwindow[N] 
  NI is same cost as 
  NInext = *PC++; 

  NI My point is that to avoid keeping too-many things handy window
  NI base and stack pointer should be the same (real machine) register.

if we can control that. but i see issues too. i mentioned the idea of
having $_ and other special vars and stuff would have their own PMC's in
this register set. dan liked the idea. that doesn't map well to a window
as those vars may not change when you call subs. i just don't see
register windows as useful at the VM level.

   i am just saying register windows don't seem to be any win for us
   and cost an extra indirection for each data access. my view is let
   the compiler keep track of the register usage and just do
   individual push/pops as needed when registers run out.

  NI That makes sense if (and only if) virtual machine registers are real 
  NI machine registers. If virtual machine registers are in memory then 
  NI accessing them on the stack is just as efficient (perhaps more so)
  NI than at some other special location. And it avoids need for 
  NI memory-to-memory moves to push/pop them when we do spill.

no, the idea is the VM compiler keeps track of IL register use for the
purpose of code generating N-tuple op codes and their register
arguments. this is a pure IL design thing and has nothing to do with
machine registers. at this level, register windows don't win IMO.

i am thinking about writing a short psuedo code post about the N-tuple
op codes and the register set design. the ideas are percolating in my
brane.

uri




Re: Stacks registers

2001-05-24 Thread Nick Ing-Simmons

Uri Guttman [EMAIL PROTECTED] writes:
 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI We need to decide where a perl6 sub's local variables are going
  NI to live (in the recursive case) - if we need a stack anyway it
  NI may make sense for VM to have ways of indexing the local frame
  NI rather than having global registers (set per thread by the way?)

i made that thread point too in my long reply to dan.

but indexing directly into a stack frame is effectively a register
window. the problem is that you need to do an indirection through the
window base for every access and that is slow in software (but free in
hardware).

It isn't free in hardware either, but cost may be lower.
Modern machines should be able to schedule indirection fairly efficiently.
But I would contend we are going to have at least one index operation
anyway - if only from the thread pointer, or global base - so with
careful design, putting the registers at the right offset from that base,
we can subsume the register lookup index into it.

i.e. 
 R4 = frame[N]
is same cost as
 R4 = per_thread[N]
and about the same as
 extern REGISTER GlobalRegs4 
 R4 = GlobalRegs4;








Re: Stacks registers

2001-05-24 Thread Nick Ing-Simmons

Alan Burlison [EMAIL PROTECTED] writes:
 1. When you call deep enough to fall off the end of the large register
file an expensive system call is needed to save some registers
at the other end to memory and wrap, and then again when you
come back to the now-in-memory registers.

Not a system call but a trap - they aren't the same thing (pedant mode off
;-).  The register spill trap handler copies the relevant registers onto the
stack - each stack frame has space allocated for this.

Pedant mode accepted - and I concur. But trap handler is still significant
overhead compared to just doing the moves (scheduled) inline 
as part of normal code. So register windows win if you stay in bounds
but lose quite seriously if you have active deep calls.

(My own style is to write small functions rather than #define or inline,
for cache reasons - which has tended to make the effect above show up. I
am delighted to say that _modern_ (Sun) SPARCs have deep enough windows
even for me - but SPARCStation1+ and some of the low-cost CPUs didn't.)



Alan Burlison




Re: Stacks registers

2001-05-24 Thread Uri Guttman

 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI Uri Guttman [EMAIL PROTECTED] writes:

   but indexing directly into a stack frame is effectively a register
   window. the problem is that you need to do an indirection through the
   window base for every access and that is slow in software (but free in
   hardware).

  NI It isn't free in hardware either, but cost may be lower.  Modern
  NI machines should be able to schedule indirection fairly
  NI efficiently.  But I would contend we are going to have at least
  NI one index operation anyway - if only from the thread pointer, or
  NI global base - so with careful design so that registers are at
  NI right offset from the base we can subsume the register lookup
  NI index into that.

  NI i.e. 
  NI  R4 = frame[N]
  NI is same cost as
  NI  R4 = per_thread[N]
  NI and about the same as
  NI  extern REGISTER GlobalRegs4 
  NI  R4 = GlobalRegs4;

well, if there is no multithreading then you don't need the per_thread
lookup. and the window base is not accounted for. you would need 2
indirections, the first to get the window base and the second to get the
register in that window. i am just saying register windows don't seem to
be any win for us and cost an extra indirection for each data access. my
view is let the compiler keep track of the register usage and just do
individual push/pops as needed when registers run out.

uri




Re: Stacks registers

2001-05-23 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS Should Parrot be a register or stack-based system, and if a
  DS register-based one, should we go with typed registers?

first question: is this for the intermediate language or the back end
VM? they don't have to be the same.

since our goal is to support the polymorphic front/back end design, the
intermediate language (IL) should be easy to generate and easy to
compile to various targets. also it needs to have other features like
being modifiable for optimization, storable on disk, debuggable (for dev
types), etc.

this IL could be the VM we use in perl itself and it can also be
compiled to other VM's or c, etc.

also it could be processed/compiled to a faster perl VM for execution.


  DS My personal preference here is to make Parrot a register based
  DS system. I can wrap my head around that a lot easier than a stack
  DS system, it's something I'm comfortable with, there's lots of
  DS literature on optimizing this sort of system, and I just generally
  DS like it better. (I cut my programming teeth on 6502s... so sue me
  DS :) Stack based systems have a certain appeal--they're simpler
  DS generally, which is fine. I'm not too worried about simpler as
  DS much as I am faster, and I think we can get faster out of
  DS registers. (Or, if you prefer, named temporaries instead of
  DS registers. Whatever)

the PL/I compiler suite i hacked on used that style of IL, but it
allocated as many named temps as it needed by using N-tuples for each op
code. each OP would leave its result in a named temp which could be
referred to by later OPs. i think they were generic (as PL/I could mung
many more types of data than perl) but the compiler knew what held what
so that wasn't an issue.

  DS If (or when, I suppose, barring a Really Good Counter-Argument) we
  DS go the register route, then, should we have typed registers like
  DS most CPUs do? A set of PMC pointer registers, a set of integer
  DS registers, a set of floating-point registers, a set of string
  DS registers? And if we do go with typed registers, should they be
  DS linked together? (So that the int in Iregister 1 matches the
  DS integer value of the PMC in Pregister 1, and the float value in
  DS Fregister 1?) And if they're linked (And should they all be
  DS linked, or only some of them) should we guarantee consistency, or
  DS should the bytecode explicitly make various registers valid or
  DS invalid?

the problem with a fixed number of registers is register overflow. this
is true in all register based machines and VM's and is a pain. if we go
with a pure named temp architecture, you don't need to worry about
that. but it is slower to access and manage named temps than a fixed set
of registers (or even just a stack which is like a single reg machine).

  DS My current thoughts are this:

  DS We have a set of N registers. They're all linked. Nothing
  DS implicitly sets values in any of the registers (if you want an
  DS integer value, you need to make one). Each register has a set of
  DS validity markers for each type (int, float, string, PMC) that may
  DS or may not be bits. We have a stack of sorts that we can push the
  DS registers on to if we need.

that is a good start. the stack can handle the overflow but that means
more complexity in the compiler to deal with that. let's do some study
of various IL designs (including both compiled and interpreted) and see
their wins and losses. i think pure stack based is out as it is too
limiting. i have some experience with N-tuples and it does work well for
compiling as you don't care about the managing of registers at run time
then. you just keep a fat namespace table at compile time and throw it
out later. we have a compile and run phase so managing a large number of
N-tuples may be slow running but easy to generate and manipulate. that
is a typical tradeoff.

so my current best idea is a fixed size set of active N-tuples/registers
(any number larger than most long statements would work, say 100?) as
the registers. each N-tuple has its op code and argument register
indexes and which register it stores its result in. we have a stack for
the overflow handling and that is the hardest part.
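a fixed active set plus a spill stack could be sketched in C roughly as below. the size 100, the struct layout, and the helper names are all assumptions for illustration, not a proposed Parrot API:

```c
#include <assert.h>

#define NREGS 100            /* fixed active set, per the suggestion above */

/* One N-tuple: opcode, two argument register indexes, result register. */
typedef struct { int op, arg1, arg2, result; } NTuple;

static int regs[NREGS];      /* the active registers                    */
static int spill[256];       /* overflow stack for the rare spill case  */
static int sp = 0;

/* Spill a register's value to the stack, freeing the register for reuse. */
static void spill_reg(int r)   { assert(sp < 256); spill[sp++] = regs[r]; }

/* Restore the most recently spilled value back into a register. */
static void unspill_reg(int r) { assert(sp > 0); regs[r] = spill[--sp]; }
```

the compiler would decide at compile time which registers spill, so the runtime only pays for the push/pop when overflow actually happens.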

we should have a small set of special registers to handle events,
booleans, the PC, signals, etc.

this makes compiling and optimization very easy. we just need to manage
the lifetime of registers for the N-tuples and deal with the (hopefully)
rare overflow. runtime interpretation is also fairly simple (not as easy
as a stack but not too bad) with decent speed. code generation to other
VM or storing to disk should be fairly straightforward.

  DS I'm definitely feeling unsure about this, so feel free (please!)
  DS to wade in with comments, criticisms, or personal attacks... :)

you are a doofus.

but a smart one. :)

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com

RE: Stacks & registers

2001-05-23 Thread Hong Zhang


 Register based. Untyped registers; I'm hoping that the vtable stuff can be
 sufficiently optimized that there'll be no major win in storing multiple
 copies of a PMC's data in different types knocking around.
 
 For those yet to be convinced by the benefits of registers over stacks, try
 grokking in fullness what op scratchpads are about. Ooh look, registers.

I think stack based =~ register based. If we don't have Java-like jsr
and ret, every bytecode inside one method always operates on the same
stack depth, therefore we can just treat the locals + stack as a flat
register file. A single pass can translate stack based code into register
based code.

For example:
  push local #3; => move #(max_local + opcode_stack_depth), #3

  push local #3; push local #4; add; pop local #5; => add #5, #3, #4

  push local #3; push local #4; call foo; pop #6; => call_2 #6, #3, #4

As long as the stack-based system is carefully designed, we can easily add a
linear-cost translation step to convert it into register based bytecode,
and run it faster.
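that single pass can be sketched in C roughly as follows. the opcode names, the RegIns struct, and the register numbering (locals in #0..max_local-1, stack slots above them) are illustrative assumptions, not Hong's or Parrot's actual encoding, and this naive version emits one register op per stack op without the copy-propagation that would fuse push/push/add/pop into a single three-operand add:

```c
#include <assert.h>

enum { OP_PUSH, OP_POP, OP_ADD,     /* stack-based opcodes    */
       OP_MOVE, OP_ADD3 };          /* register-based opcodes */

typedef struct { int op, dst, src1, src2; } RegIns;

/* One step of the linear-pass stack->register translation.
 * Stack slot i maps to register (max_local + i); *depth tracks the
 * operand-stack depth, which is statically known at each bytecode. */
static RegIns xlate(int op, int arg, int max_local, int *depth)
{
    RegIns r = {0, 0, 0, 0};
    switch (op) {
    case OP_PUSH:                       /* push local #arg        */
        r.op   = OP_MOVE;
        r.dst  = max_local + (*depth)++;
        r.src1 = arg;
        break;
    case OP_ADD:                        /* pops two, pushes one   */
        r.op   = OP_ADD3;
        r.src2 = max_local + --*depth;
        r.src1 = max_local + --*depth;
        r.dst  = max_local + (*depth)++;
        break;
    case OP_POP:                        /* pop local #arg         */
        r.op   = OP_MOVE;
        r.dst  = arg;
        r.src1 = max_local + --*depth;
        break;
    }
    return r;
}
```

since the depth never depends on runtime data, the whole method translates in one forward scan, which is the linear cost claimed above.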

Hong



Re: Stacks & registers

2001-05-23 Thread Bryan C . Warnock

On Wednesday 23 May 2001 12:59, Dan Sugalski wrote:
 Okay, folks, here's the current conundrum:

 Should Parrot be a register or stack-based system, and if a register-based
 one, should we go with typed registers?

 My personal preference here is to make Parrot a register based system. I
 can wrap my head around that a lot easier than a stack system, it's
 something I'm comfortable with, there's lots of literature on optimizing
 this sort of system, and I just generally like it better. (I cut my
 programming teeth on 6502s... so sue me :) Stack based systems have a
 certain appeal--they're simpler generally, which is fine. I'm not too
 worried about simpler as much as I am faster, and I think we can get
 faster out of registers. (Or, if you prefer, named temporaries instead
 of registers. Whatever)

 If (or when, I suppose, barring a Really Good Counter-Argument) we go the
 register route, then, should we have typed registers like most CPUs do? A
 set of PMC pointer registers, a set of integer registers, a set of
 floating-point registers, a set of string registers? And if we do go with
 typed registers, should they be linked together? (So that the int in
 Iregister 1 matches the integer value of the PMC in Pregister 1, and the
 float value in Fregister 1?) And if they're linked (And should they all be
 linked, or only some of them) should we guarantee consistency, or should
 the bytecode explicitly make various registers valid or invalid?

A little backwards my answer will be.  I'm fairly sure we won't need linked 
registers.  The PMC's vtable implementation should already take care of the 
int/float/string cross-referencing - by having the linked registers, you'd 
really only be creating a parallel structure to wade through.  Worst case, 
you're doing both - using the vtable to create the values for the other 
registers.  (After checking the register first to see if it were valid?)  
That doesn't sound like a win at all.  You could bypass the PMC's vtable and 
populate the registers directly, but then why have the PMC at all?  I don't 
see any reason for the linking.

Now, assuming I'm not way out in the left-field stands above, I don't see 
the necessity for typed registers (at least as described).  (I mean, one of 
the major reasons for typed registers is for hardware optimization, which we 
can't really borrow.  We're emulating that win with the vtables.)  However, 
another common use for registers is to save on allocation - like a hardware 
zero register.  You could certainly call PL_undef and its ilk (in its new 
PMC form) a dedicated register, which means it's a sort of type.  But I 
think you still save if you automatically assume that every register is 
simply a PMC.

The only use I could think of would be for constants that you know aren't 
going to need to be converted from one form to another.  But then you need 
to handle the two paths in the rest of your vtable entries.  Unless you make 
everything run through the registers.

You're not going to make opcodes {shudder} dependent on implicit registers, 
are you?  :-)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Stacks & registers

2001-05-23 Thread Graham Barr

On Wed, May 23, 2001 at 06:06:26PM +0100, Simon Cozens wrote:
 On Wed, May 23, 2001 at 12:59:01PM -0400, Dan Sugalski wrote:
  Should Parrot be a register or stack-based system, and if a register-based 
  one, should we go with typed registers?
 
 Register based. Untyped registers; I'm hoping that the vtable stuff can be
 sufficiently optimized that there'll be no major win in storing multiple
 copies of a PMC's data in different types knocking around. 

I would agree with that too.

Graham.



Re: Stacks & registers

2001-05-23 Thread Uri Guttman

 GB == Graham Barr [EMAIL PROTECTED] writes:

  GB On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote:
   I think stack based =~ register based. If we don't have Java-like jsr 

  GB That comment reminds me of how the register file is implemented in
  GB a sun sparc. They have a large register file, but only some are accessible
  GB at any given time, say 16. When you do a sub call you place your
  GB arguments in the high registers, say 4, and shift the window pointer
  GB by 12 (in this case).  What was r12-r15 now becomes r0-r3. On return
  GB the result is placed into r0-r3 which are then available to the
  GB caller as r12-r15.

  GB This allows very efficient argument passing without having to save
  GB registers to a stack and restore them later.

there have been arguments over whether register windows win or not. on
the plus side they make passing sub args very efficient (if you can fit
them all into the window). on the minus side, you have to do major
chunks of swapping registers to/from stack. the window size is a big
issue too and some think sparc has too small a window and too few total
registers.

and besides, register windows are a hardware-only solution (they just mung
the virtual base in a special register). i have not heard of them being
used in pure software, as the extra cost of having many registers or a
deep stack in software is nothing, while in hardware it has a real (and
heavy) cost.
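in software the "window" is just a base offset into a larger array, paid for on every register access. a minimal sketch, loosely following the shift-by-12 example quoted above (the sizes and helper names are illustrative, not any real ABI):

```c
#include <assert.h>

#define FILE_SIZE 128     /* the large register file                     */
#define WINDOW     16     /* registers visible at any one time           */
#define SHIFT      12     /* window shift per call, so the caller's
                             r12-r15 become the callee's r0-r3           */

static int regfile[FILE_SIZE];
static int base = 0;      /* window base: the indirection hardware gets
                             for free but software pays for every time   */

/* Access register r in the current window. */
static int *reg(int r) { return &regfile[base + r]; }

/* Shift the window on call/return; a real implementation would trap
 * and spill to a stack when base runs off the end of the file. */
static void win_call(void) { base += SHIFT; }
static void win_ret(void)  { base -= SHIFT; }
```

the `base + r` addition on every access is exactly the per-access cost the paragraph above is talking about.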

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks & registers

2001-05-23 Thread Simon Cozens

On Wed, May 23, 2001 at 01:36:14PM -0400, Bryan C. Warnock wrote:
 You could certainly call PL_undef and its ilk (in its new 
 PMC form) a dedicated register

Which is, incidentally, exactly what happens in disassembled Perl 5 bytecode.

-- 
Why waste negative entropy on comments, when you could use the same
entropy to create bugs instead?
-- Steve Elias



Re: Stacks & registers

2001-05-23 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS [I'm taking these all at once, so bear with me]
  DS Simon wrote:

   Register based. Untyped registers; I'm hoping that the vtable stuff can be
   sufficiently optimized that there'll be no major win in storing multiple
   copies of a PMC's data in different types knocking around.

  DS Maybe, but I'm thinking that adding two cached integers together (the 
  DS integer piece of the register being essentially a cache) will be faster 
  DS than two calls to get_integer.

that makes sense for common stuff like integer math and maybe even basic
string stuff like join, interpolation and (perl5) ..

we could have a string accumulator (shades of pdp8!) for building up
strings. the problem occurs when you have nested interpolation:

foo$bar{qq{$key1$key2}}

the compiler would have to convert that to inside out execution.

idea: there still is a special string accumulator but it is a pmc and
that is passed around instead of the full register. then a new pmc is
used to replace it. this should lower the copying of strings as they get
passed around. 
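the inside-out order the compiler would have to produce can be seen in this toy C rendering of foo$bar{qq{$key1$key2}}. the variable values and the bar_lookup helper are made-up stand-ins for the hash and the interpolated scalars, purely for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the %bar hash lookup. */
static const char *bar_lookup(const char *key)
{
    return strcmp(key, "xy") == 0 ? "BAR" : "";
}

/* Build the string inside-out: the innermost qq{} interpolation must
 * finish before the hash lookup, which must finish before the outer
 * concatenation -- the reverse of source order. */
static void interp(char *out)
{
    char acc[64];                  /* inner accumulator      */
    strcpy(acc, "x");              /* $key1                  */
    strcat(acc, "y");              /* $key2                  */
    strcpy(out, "foo");            /* outer literal          */
    strcat(out, bar_lookup(acc));  /* $bar{...}              */
}
```

in the pmc-accumulator scheme above, `acc` would be the pmc that gets handed to the lookup while a fresh accumulator takes over for the outer string.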

  DS Also, by having integer bits of registers, it means we can avoid
  DS making a full PMC for integer constants, or for integers (or
  DS floats, or strings) the interpreter wants to deal with
  DS internally. (scratch ints/floats/strings/whatever)

yep.

   I'm unclear what purpose having virtual registers for ints and nums serves:
   can't we just put those in real registers? Or perhaps I'm missing the pt?

  DS The point of having int/num/string registers is for cache and
  DS convenience reasons. If Parrot's dealing with an integer and it
  DS *knows* that it's an integer (because, for example, we're dealing
  DS with some sort of internal counter, or Python code) there seems
  DS little reason to promote it to a full PMC.

hmm, sort of like your special integer scratch registers too. but since
we are coding in C we can't force anything to real hardware registers,
we have to let the (g)cc handle that for us.

  DS Hong wrote:


   every bytecode inside one method always operates on the same stack
   depth, therefore we can just treat the locals + stack as a flat
   register file. A single pass can translate stack based code into
   register based code.

  DS Fair enough, but then registers are just oddly named stack
  DS entities, which makes their being on the stack a matter of
  DS implementation. (Whether that means anything useful is another
  DS matter)

as most compiler books will note, all/most IL's can be easily translated
to one another. our goal is to pick one that works best for our needs
and then translate to others as desired (like jvm or .net).

  DS Uri wrote:

   first question: is this for the intermediate language or the back end
   VM? they don't have to be the same.

  DS Bytecode. I want the bytecode to be what the Parrot VM eats--it
  DS will be Parrot's assembly language.

but that still doesn't answer my question. do we directly generate
bytecode or an IL and then generate bytecode from that? bytecode is not
an IL but a storable form of one. you can create bytecode for a stack or
register or N-tuple machine. i am saying that we can choose our IL
independently of the bytecode design. bytecode is harder to optimize and
manipulate than most IL's. so i say we compile to an IL which can be
interpreted directly and which can also be stored/retrieved in a
bytecode form. but the bytecode doesn't have to be directly
executable. it may need to be expanded back into the IL (which could be
very fast with a simple mapping). 

   since our goal is to support the polymorphic front/back end design,
   the intermediate language (IL) should be easy to generate and easy
   to compile to various targets. also it needs to have other features
   like being modifiable for optimization, storable on disk,
   debuggable (for dev types), etc.

  DS This is the one spot it falls down. Though arguably translating
  DS bytecodes dealing with PMCs will be more work than bytecodes that
  DS deal with ints or strings.

that was my previous point. we have to decide the tradeoff between
speed, ease of manipulation of the IL and translations (if any) to/from
bytecode.

  DS If someone thinks the whole multiple views per register thing is a bad 
  DS idea, this would be a good line of attack to start with. :)

i think it is ok but there doesn't need to be a V flag for each
type. any given N-tuple knows the type of the return value of any N-tuple it
references and so fetches it directly. just like in a real compiler, you
know that memory or register holds an int and don't check it each time. but
it will change over time like our registers will. a register's type is
tracked at compile time only and not looked at during runtime.

   the PL/I compiler suite i hacked on used that style of IL. but is
   allocated as many named temps as it needed by using N-tuples for
   each op code. each OP would leave its result in a named temp 

Re: Stacks & registers

2001-05-23 Thread Uri Guttman

 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI We need to decide where a perl6 sub's local variables are going
  NI to live (in the recursive case) - if we need a stack anyway it
  NI may make sense for VM to have ways of indexing the local frame
  NI rather than having global registers (set per thread by the way?)

i made that thread point too in my long reply to dan.

but indexing directly into a stack frame is effectively a register
window. the problem is that you need to do an indirection through the
window base for every access and that is slow in software (but free in
hardware).

  NI What I do agree with is that 
  NIpush a
  NIpush b
  NIadd
  NIpop  r 

  NI is a lousy way to code r = a + b - too much pointless copying. 
  NI We want 
  NIadd #a,#b,#r
  NI where #a is a small number indexing into somewhere where a is stored.

and that is an N-tuple opcode with a fixed size register set. register
windowing is not needed here, as pushing/popping as needed works well.

the win of a stack system is a cleaner and simpler op code api with
fewer op codes. but that doesn't gain us much as we have so many complex
ops that compiling them to simpler ones and all the extra runtime is a
loss.
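the difference shows up directly in the dispatch loop: the three-operand form costs one dispatch and no stack traffic, where push/push/add/pop costs four dispatches and three copies. a minimal sketch (the opcode names and the flat frame array are assumptions, not Parrot's actual design):

```c
#include <assert.h>

enum { OP_ADD3, OP_HALT };

/* Tiny dispatch loop for the add #a,#b,#r style: one decode per add,
 * with operands read and written in place in the frame. */
static void run(const int *pc, int *frame)
{
    for (;;) {
        switch (*pc++) {
        case OP_ADD3: {               /* add #a,#b,#r */
            int a = *pc++, b = *pc++, dst = *pc++;
            frame[dst] = frame[a] + frame[b];
            break;
        }
        case OP_HALT:
            return;
        }
    }
}
```

the trade is wider instructions and more opcodes in exchange for fewer trips around the dispatch loop.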

BTW, an N-tuple op code is easily converted to/from an asm form which is
a feature that has been mentioned as desired. you can look at higher
level CISC instruction sets (e.g. vax) for examples. hell, the vax had a
polynomial op code and high level string ops!

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks & registers

2001-05-23 Thread Alan Burlison

Nick Ing-Simmons wrote:

 That comment reminds me of how the register file is implemented in
 a sun sparc. They have a large register file, but only some are accessible
 at any given time, say 16.
 
 32 IIRC but principle is correct.

8 global registers, 8 out registers, 8 local registers and 8 in registers. 
Some are set aside for special purposes, e.g. %r14 is the stack pointer,
%r15 is the called subroutine's return addr etc.  Effectively you have 6
'in' and 6 'out' registers.  Extra arguments above 6 are passed in the
caller's stack frame.

 1. When you call deep enough to fall off the end of the large register
file an expensive system call is needed to save some registers
at the other end to memory and wrap, and then again when you
come back to the now-in-memory registers.

Not a system call but a trap - they aren't the same thing (pedant mode off
;-).  The register spill trap handler copies the relevant registers onto the
stack - each stack frame has space allocated for this.

Alan Burlison



Re: Stacks & registers

2001-05-23 Thread Uri Guttman

 AB == Alan Burlison [EMAIL PROTECTED] writes:

  AB Uri Guttman wrote:
   but indexing directly into a stack frame is effectively a register
   window. the problem is that you need to do an indirection through the
   window base for every access and that is slow in software (but free in
   hardware).

  AB Not quite - a register window system has real registers, it's just
  AB at some point they may end up in memory.  That's not the same as
  AB indexing into memory on every access.

what i meant is that the indirection to access a given windowed register
using the window base is free (speed wise) in hardware, but doing the
same thing in software costs.

but comparing the hardware design with a pure software design is not too
useful. the costs and goals are so different. some conceptual analogies
are useful but saying register windows are fast in hardware (BTW that is
risc stuff) but not a speed win in software (where parrot is leaning
towards looking cisc-like) is not relevant.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html