Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski <[EMAIL PROTECTED]> writes:

> I'm not entirely sure of that one--processing a full regex requires the
> perl interpreter, it's not all that modular. Though whether being able to
> yank out the RE engine and treat it as a standalone library is important
> enough to warrant being treated as a design goal or not is a separate
> issue. (I think so, as it also means I can treat it as a black box for the
> moment so there's less to try and stuff in my head at once)

We are way past that point in perl5 - having tried to use perl's regexps
in a non-perl app, you need most of perl there: not the interpreter,
perhaps, but bits of the tokenizer and of course SV *. So to make it
modular in perl6 you have to re-write it, and I am with Larry - let's
make the main op-state machine handle those ops too. That will help with
the need to special-case regexps for signal despatch etc.

> *) It makes the amount of mental space the core interpreter takes up smaller

But surely we are considering an "expandable" interpreter already?
Adding regexp ops is just one such extension.

> *) It can make performance tradeoffs separately from the main perl engine
> *) We can probably snag the current perl 5 source without much change

I doubt that.

> *) The current RE engine's scared (or is that scarred?) me off enough that
> I'd as soon leave it to someone who's more temperamentally suited for such
> things.
> *) Treating regexes as non-atomic operations brings some serious threading
> issues into things.

Leaving them atomic does as well - I will switch threads as soon as the
regexp completes ...
--
Nick Ing-Simmons
Re: Stacks, registers, and bytecode. (Oh, my!)
At 07:40 AM 6/5/2001 -0700, Dave Storrs wrote:

> On Tue, 5 Jun 2001, Dave Mitchell wrote:
> > dispatch loop. I'd much rather have a 'regex start' opcode which
> > calls a separate dispatch loop function, and which then interprets any
> > further ops in the bytestream as regex ops. That way we double the number
> > of 8-bit ops, and can have all the regex-specific state variables (s, send
> > etc in the earlier example) and logic separated out.
>
> This is an interesting idea...could we use this more generally to
> multiply our number of opcodes? Basically, you have one set of opcodes
> for (e.g.) string parsing, one set for math, etc, all of which have the
> same values. Then you have a set of opcodes that tells the interpreter
> which opcode table to look in.

Nah, that's too much work. We just allow folks to define their own
opcode functions and assign each a lexically unique number, and dispatch
to the function as appropriate.

Adding and overriding opcodes is definitely in the cards, though in most
cases it'll probably be an opcode version of a function call, since
machine-level stuff would also require telling the compiler how to emit
those opcodes. (Which folks writing python/ruby/rebol/cobol/fortran
front ends for the interpreter might do)

Dan

--"it's like this"---
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, Jun 05, 2001 at 03:31:24PM -0500, David L. Nicol wrote:
> Graham Barr wrote:
> > I think there are a lot of benefits to the re engine not to be
> > separate from the core perl ops.
>
> So does it start with a split(//,$bound_thing) or does it use
> substr(...) with explicit offsets?

Eh? Nobody is suggesting we implement re's using the current set of perl
ops, but that we extend the set with ops needed for re's, so they use
the same dispatch loop and the ops can be intermixed.

Graham.
Re: Stacks, registers, and bytecode. (Oh, my!)
Graham Barr wrote:
> I think there are a lot of benefits to the re engine not to be
> separate from the core perl ops.

So does it start with a split(//,$bound_thing) or does it use
substr(...) with explicit offsets?
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 06:04:10PM -0700, Larry Wall wrote:
> Well, other languages have explored that option, and I think that makes
> for an unnatural interface. If you think of regexes as part of a
> larger language, you really want them to be as incestuous as possible,
> just as any other part of the language is incestuous with the rest of
> the language. That's part of what I mean when I say that I'm trying to
> look at regular expressions as just a strange variant of Perl code.
>
> Looking at it from a slightly different angle, regular expressions are
> in great part control syntax, and library interfaces are lousy at
> implementing control.

Right. Having the regex opcodes be perl opcodes will certainly make
implementing (?{ ... }) much easier, and probably faster too. Also the
re references that we have now will become similar to subroutines for
pattern matching.

I think there are a lot of benefits to the re engine not being separate
from the core perl ops.

Graham.
RE: Stacks, registers, and bytecode. (Oh, my!)
> On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
> > This is the bit that scares me about unifying perl ops and regex ops:
> > can we really unify them without taking a performance hit?
>
> Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.
>
> Second, Ruby uses a giant switch instead of function pointers for their
> op despatch loop; Matz says it doesn't make that much difference in
> terms of performance.

Function pointer dispatch is normally as fast as or faster than a
switch. The main downside is the context: a typical regular expression
engine can pre-fetch many variables into "register locals", which can
then be used efficiently by all the switch cases. However, the common
context for regular expressions is relatively small, so I am not sure
of the size of the performance hit.

Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, 5 Jun 2001, Dave Mitchell wrote:
> dispatch loop. I'd much rather have a 'regex start' opcode which
> calls a separate dispatch loop function, and which then interprets any
> further ops in the bytestream as regex ops. That way we double the number
> of 8-bit ops, and can have all the regex-specific state variables (s, send
> etc in the earlier example) and logic separated out.

This is an interesting idea...could we use this more generally to
multiply our number of opcodes? Basically, you have one set of opcodes
for (e.g.) string parsing, one set for math, etc, all of which have the
same values. Then you have a set of opcodes that tells the interpreter
which opcode table to look in.

The 'switching' opcodes then become overhead, but if there aren't too
many of those, perhaps it's acceptable. And it would mean that we could
specialize the opcodes a great deal more (if, of course, that is
desirable), and still have them fit in an octet.

(Sorry if this is a stupid question, but please be patient; I've never
done internals stuff before.)

Dave
Re: Stacks, registers, and bytecode. (Oh, my!)
Simon Cozens <[EMAIL PROTECTED]> opined:
> On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
> > This is the bit that scares me about unifying perl ops and regex ops:
> > can we really unify them without taking a performance hit?
>
> Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.
>
> Second, Ruby uses a giant switch instead of function pointers for their
> op despatch loop; Matz says it doesn't make that much difference in
> terms of performance.

I think it would be very messy to have both types of ops in the same
dispatch loop. I'd much rather have a 'regex start' opcode which calls
a separate dispatch loop function, and which then interprets any further
ops in the bytestream as regex ops. That way we double the number of
8-bit ops, and can have all the regex-specific state variables (s, send
etc in the earlier example) and logic separated out.

> I don't know if I've mentioned this before, but
> http://www-6.ibm.com/jp/developerworks/linux/001027/ruby_qa.html
> was my interview with Matsumoto about his ideas for Perl 6 and his
> experiences from Ruby. It's in Japanese, so
> http://www.excite.co.jp/world/url/ may help.

"A talk of jewelry Perl developer From Mr. Simon Cozens to Ruby developer
It is also as a pine It dies and is the question and reply to Mr.
[squiggle]" :-)
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
> This is the bit that scares me about unifying perl ops and regex ops:
> can we really unify them without taking a performance hit?

Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.

Second, Ruby uses a giant switch instead of function pointers for their
op despatch loop; Matz says it doesn't make that much difference in
terms of performance.

I don't know if I've mentioned this before, but
http://www-6.ibm.com/jp/developerworks/linux/001027/ruby_qa.html
was my interview with Matsumoto about his ideas for Perl 6 and his
experiences from Ruby. It's in Japanese, so
http://www.excite.co.jp/world/url/ may help.
--
Familiarity breeds facility.
-- Megahal (trained on asr), 1998-11-06
Re: Stacks, registers, and bytecode. (Oh, my!)
Larry Wall <[EMAIL PROTECTED]> wrote:
> It may certainly be valuable to (not) think of it that way, but just
> don't be surprised if the regex folks come along and borrow a lot of
> your opcodes to make things that look like (in C):
>
> while (s < send && isdigit(*s)) s++;

This is the bit that scares me about unifying perl ops and regex ops: I
see perl ops as relatively heavyweight things that can absorb the costs
of 'heavyweight' dispatch (function call overhead, etc etc), while regex
stuff needs to be very lightweight, eg

    while (op = *optr++) {
        switch (op) {
        case FOO:
            while (s < send && isdigit(*s)) s++;
            break;
        case BAR:
            while (s < send && isspace(*s)) s++;
            break;
        ...
        }
    }

can we really unify them without taking a performance hit?
Re: Stacks, registers, and bytecode. (Oh, my!)
> Well, other languages have explored that option, and I think that makes
> for an unnatural interface. If you think of regexes as part of a
> larger language, you really want them to be as incestuous as possible,

These days we can be that incestuous without feeling too guilty, since
pcre exists.

> just as any other part of the language is incestuous with the rest of
> the language. That's part of what I mean when I say that I'm trying to
> look at regular expressions as just a strange variant of Perl code.
>
> Looking at it from a slightly different angle, regular expressions are
> in great part control syntax, and library interfaces are lousy at
> implementing control.

That assumes one wants access to the regex low-level ops; if one wants
just match/no-match and submatches, a regex(3)-like interface is quite
enough.

> Larry
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
On Monday 04 June 2001 08:38 pm, Jarkko Hietaniemi wrote:
> > : Though whether being able to
> > : yank out the RE engine and treat it as a standalone library is
> > : important enough to warrant being treated as a design goal or not is
> > : a separate issue. (I think so, as it also means I can treat it as a
> > : black box for the moment so there's less to try and stuff in my head
> > : at once)
> >
> > As a fellow bear of very little brain, I'm just trying to point out
> > that we already have a good example of the dangers of that approach.
>
> Still, having the regexen as "reusable component" (i.e. library)
> wouldn't be a bad idea. The current regex code most definitely isn't
> detachable or reusable, so we can't be said to have explored that
> option.

But to what extent? I seem to recall one of the main reasons against
complete separation was (speculated, perhaps) that it was too
inefficient for the regex engine to be ignorant of Perl internals.

Given that the regex engine, in all its complexity, is one of the more
enviable capabilities of Perl (traditionally), it would certainly be
nice (for others, I suppose) to be able to gain Perl regex handling via
a couple of calls and a shared library. But it was counter-argued that
much of the regex's speed (and some of its functionality) was directly
tied to Perl internal structures, and that the layer of abstraction
necessary to completely separate the two would benefit neither Perl nor
the stand-alone engine, so why bother?

IOW, if you're not going to completely exorcise Perl from its regexes,
why try to exorcise the regexes from Perl?
--
Bryan C. Warnock
[EMAIL PROTECTED]
Re: Stacks, registers, and bytecode. (Oh, my!)
Jarkko Hietaniemi writes:
: > : Though whether being able to
: > : yank out the RE engine and treat it as a standalone library is
: > : important enough to warrant being treated as a design goal or not is
: > : a separate issue. (I think so, as it also means I can treat it as a
: > : black box for the moment so there's less to try and stuff in my head
: > : at once)
: >
: > As a fellow bear of very little brain, I'm just trying to point out
: > that we already have a good example of the dangers of that approach.
:
: Still, having the regexen as "reusable component" (i.e. library)
: wouldn't be a bad idea. The current regex code most definitely isn't
: detachable or reusable, so we can't be said to have explored that
: option.

Well, other languages have explored that option, and I think that makes
for an unnatural interface. If you think of regexes as part of a larger
language, you really want them to be as incestuous as possible, just as
any other part of the language is incestuous with the rest of the
language. That's part of what I mean when I say that I'm trying to look
at regular expressions as just a strange variant of Perl code.

Looking at it from a slightly different angle, regular expressions are
in great part control syntax, and library interfaces are lousy at
implementing control.

Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes:
: Yeah, a lot of that's definitely a problem, as is the manipulation of
: the return stack and some assignment bits. (You can cut the time split
: takes in half by having the destination array presized, for example)

That's why the current version of Perl goes to a great deal of trouble
to hang onto its allocations from previous iterations/invocations, on
the theory that if you needed N elements last time, you'll probably need
about N elements again this time. In changing how locals are generated
for Perl 6, it would be easy to lose such optimizations accidentally,
and that might be unfortunate.

It may be that there's a middle ground, where by saving only the
information and not the actual allocation, we can estimate how much to
reallocate the next time through without actually hanging on to the
memory in question.

Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
> : Though whether being able to
> : yank out the RE engine and treat it as a standalone library is
> : important enough to warrant being treated as a design goal or not is a
> : separate issue. (I think so, as it also means I can treat it as a
> : black box for the moment so there's less to try and stuff in my head
> : at once)
>
> As a fellow bear of very little brain, I'm just trying to point out that
> we already have a good example of the dangers of that approach.

Still, having the regexen as "reusable component" (i.e. library)
wouldn't be a bad idea. The current regex code most definitely isn't
detachable or reusable, so we can't be said to have explored that
option.
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes:
: At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
: >Dan Sugalski writes:
: >: Are you speaking of the nodes in regnode.h? I hadn't considered them
: >: as regular perl opcodes--I figured they'd stay internal to the regex
: >: engine so we could keep it reasonably modular.
: >
: >I don't think that's a terribly strong argument--one could justify any
: >number of unfortunate architectural distinctions on the basis of
: >modularity.
:
: Yeah I know. I mean, look at how those darned bricks have held back
: architecture all these years! :-P

Hey, I come from the coast where we try to avoid bricks, especially when
they're falling.

: >Plus, I'd argue you can still retain modularity of your
: >code while unifying implementational philosophy.
:
: I'm not entirely sure of that one--processing a full regex requires the
: perl interpreter, it's not all that modular.

These days I'm trying to see the regex as just a funny-looking kind of
Perl code.

: Though whether being able to
: yank out the RE engine and treat it as a standalone library is
: important enough to warrant being treated as a design goal or not is a
: separate issue. (I think so, as it also means I can treat it as a black
: box for the moment so there's less to try and stuff in my head at once)

As a fellow bear of very little brain, I'm just trying to point out that
we already have a good example of the dangers of that approach.

: >It seems to me that the main reason for not considering such a
: >unification is that We've Never Done It That Way Before. It's as if
: >regular expressions have always been second-class programs, so they'll
: >always be second-class programs, world without end, amen, amen.
:
: No, not really. The big reasons I wasn't planning on unification are:
:
: *) It makes the amount of mental space the core interpreter takes up
: smaller

It may certainly be valuable to (not) think of it that way, but just
don't be surprised if the regex folks come along and borrow a lot of
your opcodes to make things that look like (in C):

    while (s < send && isdigit(*s)) s++;

: *) It can make performance tradeoffs separately from the main perl
: engine

The option of doing its own thing its own way is always open to an
opcode, but when you do that the option of making efficient use of the
core infrastructure goes away. As an honorary member of the regex
hacking team, I covet registers. :-)

: *) We can probably snag the current perl 5 source without much change

Cough, cough.

: *) The current RE engine's scared (or is that scarred?) me off enough
: that I'd as soon leave it to someone who's more temperamentally suited
: for such things.

As an honorary member of the temperamentally suited team, allow me to
repeat myself. Cough, cough. We can certainly borrow the ideas from
Perl 5's regex engine, but even us temperamentally suited veterans are
sufficiently scared/scarred to want something that works better.

: *) Treating regexes as non-atomic operations brings some serious
: threading issues into things.

Eh, treating regexes as atomic is the root of the re-entrancy problem.
If the regex has access to local storage, the re-entrancy and threading
problems pretty much solve themselves.

: >The fact that Perl 5's regex engine is a royal pain to deal with should
: >be a warning to us.
:
: I can think of a couple of reasons that the current engine's a royal
: pain, and they don't have much to do with it as a separate entity...

Sure, I'm just saying that at least two of those couple reasons are that
1) it invents its own opcode storage mechanism, and 2) it uses globals
for efficiency when it should be using some efficient variety of locals.

: >Much of the pain of dealing with the regex engine in Perl 5 has to do
: >with allocation of opcodes and temporary values in a non-standard
: >fashion, and dealing with the resultant non-reentrancy on an ad hoc
: >basis. We've already tried that experiment, and it sucks. I don't
: >want to see the regex engine get swept back under the complexity
: >carpet for Perl 6.
:
: Yeah, but those are mostly issues with the implementation, not with the
: separation.

That sounds suspiciously like what I'm trying to say.

: >That's a scenario I'd love to avoid. And if we can manage to store
: >regex opcodes and state using mechanisms similar to ordinary opcodes,
: >maybe we'll not fall back into the situation where the regex engine is
: >understood by only three people, plus or minus four.
:
: While I'm not sure I agree with it, if that's what you want, then
: that's what we'll do. Threading will complicate this some, since we'll
: need to guarantee atomicity across multiple opcodes, something I'd not
: planned on doing.

How will we guarantee that my $foo stays "my" under threading? Surely
the same mechanism could serve to keep "my" regex state variables sane
under the same circumstances. (I oversimplify, of course...)

Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
At 01:09 AM 6/1/2001 -0700, [EMAIL PROTECTED] wrote:
> In my experience, perl opcodes have not been the performance bottleneck
> in perl5.
>
> It seems it isn't actually the loop that's the bottleneck in perl5. I
> profiled a whole bunch of different perl programs, using a lot of
> different versions of perl5; and the runops loop was very rarely among
> the top CPU users. Many times, the opcode routines themselves weren't
> the hot spots. It was the support routines like hash key calculation
> and lookup, string comparisons, or the regular expression code.

Yeah, a lot of that's definitely a problem, as is the manipulation of
the return stack and some assignment bits. (You can cut the time split
takes in half by having the destination array presized, for example)

> Profiling is almost always counter-intuitive, but if I had to
> guess, I'd say that most of the per-opcode cost in perl5 was due to
> setup/initialization as each opcode was entered, and that devious/clever
> data structure design could avoid most of this. Also, opcode dispatch
> might not be the right tree up which to be barking in seeking
> performance.

The setup/teardown costs are definitely a heavy expense. We can cut it
down some, hopefully a lot in some cases, but there are limits to what
can be done. If they're treated as a fixed cost (which they more or less
are), then the less often we pay that cost the better.

Unfortunately there's also the problem of having *too* many opcode
functions, at which point you pay a lot of cycles in cache-miss time
fetching in Specially Optimized Function #4327. It seems, though, that
if we have the option of doing something in one opcode instead of 10 or
20, it makes sense to do so, as we reduce the constant costs.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
At 11:46 AM 6/1/2001 -0700, [EMAIL PROTECTED] wrote:
> On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote:
> >
> > I really think we'll win if we have support for at least integers as
> > well as PMCs. There's potentially a lot of integer work that'll be
> > generated by the optimizer and, while integer opcodes might not make
> > the interpreter much faster, they'll speed up TIL and generated C code
> > significantly since they won't need to call opcodes.
>
> How much integer arithmetic does the perl interpreter actually do?

Some, though not as much as it ought, as most math's done as floating
point.

> Figuring out where the hot spots are in an interpreter for a general-
> purpose programming language is hard. I'd recommend against special
> cases in the registers, since it's not clear how much they'd help.

There are a few reasons to consider integer and floating point registers:

*) A good chunk of the optimizations I've been coming across seem to
   benefit from integer scratch space. (Granted, generally for languages
   with a different thrust than perl)
*) Code that does use it will translate more efficiently to .NET/Java
   bytecode/native machine code
*) Larry's expressed a desire to let us get lower-level with perl,
   which'll benefit from having integer scratch space
*) If the interpreter will support multiple source languages, they'll
   get a benefit from it. (Like Python and Ruby, to name two)

It might be that there's no advantage to having them, in which case
we'll excise them and the opcodes that reference them, and that's fine.
There seems to be some advantage to it, which is why I brought them up
in the first place.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> No, not really. The big reasons I wasn't planning on unification are: DS> *) It makes the amount of mental space the core interpreter takes DS> up smaller not if the regex ops were developed separately. all it will do is expand the op table, not make the core more complex. also the regex compiler can be written separately as well. DS> *) It can make performance tradeoffs separately from the main perl engine that can be done with a regex optimizer, or creating more specialized and efficient regex ops. DS> *) We can probably snag the current perl 5 source without much change is that worth keeping? maybe for bootstrapping the new setup but not permanently. stuff like having direct access to regex engine for debugging would be better designed in from scratch. and the moving of the engine to the main op code loop solves that problem. DS> *) Treating regexes as non-atomic operations brings some serious DS> threading issues into things. well, the main op loop is threaded (isn't one of the bugs with current threads the fact that the regex engine is not thread safe?) so you get that automatically. by just marking the regex data as shared should trigger all the right thread locks and stuff. what happens to $1 when the data is shared and threaded? are thread local copies of the original data made and $1 maps inside those? >> That's a scenario I'd love to avoid. And if we can manage to store >> regex opcodes and state using mechanisms similar to ordinary opcodes, >> maybe we'll not fall back into the situation where the regex engine is >> understood by only three people, plus or minus four. DS> While I'm not sure I agree with it, if that's what you want, then that's DS> what we'll do. Threading will complicate this some, since we'll need to DS> guarantee atomicity across multiple opcodes, something I'd not planned on DS> doing. what is wrong with a single mutex (or equivilent) around the regex? 
that is a simple way to gain atomicity over multiple ops. and we would have to define what is atomic in detail. most regexes will operate on thread specific data as most data should be thread specific. doing a regex on a shared variable may be costly but that can be dealt with by copying it first to a thread local variable. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
Jarkko Hietaniemi wrote:
> Err...a regex that isn't a regex, is this a Zen koan...? Ahhh, you
> want to emulate the state machine in Pure Perl. Okay... next thing
> you want to do is to write symbolic assembler in C...? :-)

I have my reasons :) Actually, I want to write a C-into-Perl compiler --
I have for years -- one of these long manic afternoons I'll do it, too
Re: Stacks, registers, and bytecode. (Oh, my!)
At 03:08 PM 6/4/2001 -0500, Jarkko Hietaniemi wrote:
> On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote:
> > At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote:
> > > On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote:
> > > > Does anyone have on-their-shelves a regex-into-non-regex-perl
> > > > translator?
> > >
> > > Does anyone have on-their-shelves a David-Nicol-into-English
> > > translator? :)
> >
> > I think he's looking for something that turns a regex into perl that
> > doesn't involve regexes.
>
> You must be sharing the pipe with David since I couldn't make
> head||tails about it, either :-)

Hey, it's the standard, regulation-issue perl internals pipe. Didn't you
get one with the pumpkin? (I think they put it inside as a joke... :)

> Err...a regex that isn't a regex, is this a Zen koan...? Ahhh, you
> want to emulate the state machine in Pure Perl. Okay... next thing
> you want to do is to write symbolic assembler in C...? :-)

I know, I know, what *are* we thinking? Heck, next thing you know we'll
want to go parsing text in C too... :-P

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
At 04:20 PM 6/4/2001 -0400, Uri Guttman wrote:
> then regexes can be also debugged in detail with the perl debugger.
> that assumes the debugger has access to single stepping ops which is an
> intriguing idea. BTW this is the kind of feature that dan wanted the
> debugger PDD to have. having this feature supported and running early
> on, would allow us to debug the compiler and optimizer using our own
> tool. in fact if the debugger supported remote control as well as
> internal, then it could be developed in parallel in perl5. this would
> be a very useful thing to have during the development of the full perl 6.

Regexes will be able to be debugged in detail regardless of how the
regex engine's implemented. If we go separate, you'll just single-step
through the regex ops rather than the perl ones.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
> Dan Sugalski writes:
> : Are you speaking of the nodes in regnode.h? I hadn't considered them
> : as regular perl opcodes--I figured they'd stay internal to the regex
> : engine so we could keep it reasonably modular.
>
> I don't think that's a terribly strong argument--one could justify any
> number of unfortunate architectural distinctions on the basis of
> modularity.

Yeah, I know. I mean, look at how those darned bricks have held back
architecture all these years! :-P

> Plus, I'd argue you can still retain modularity of your
> code while unifying implementational philosophy.

I'm not entirely sure of that one--processing a full regex requires the
perl interpreter, it's not all that modular. Though whether being able
to yank out the RE engine and treat it as a standalone library is
important enough to warrant being treated as a design goal or not is a
separate issue. (I think so, as it also means I can treat it as a black
box for the moment so there's less to try and stuff in my head at once)

> It seems to me that the main reason for not considering such a
> unification is that We've Never Done It That Way Before. It's as if
> regular expressions have always been second-class programs, so they'll
> always be second-class programs, world without end, amen, amen.

No, not really. The big reasons I wasn't planning on unification are:

*) It makes the amount of mental space the core interpreter takes up
   smaller
*) It can make performance tradeoffs separately from the main perl
   engine
*) We can probably snag the current perl 5 source without much change
*) The current RE engine's scared (or is that scarred?) me off enough
   that I'd as soon leave it to someone who's more temperamentally
   suited for such things.
*) Treating regexes as non-atomic operations brings some serious
   threading issues into things.

> The fact that Perl 5's regex engine is a royal pain to deal with should
> be a warning to us.

I can think of a couple of reasons that the current engine's a royal
pain, and they don't have much to do with it as a separate entity...

> Much of the pain of dealing with the regex engine in Perl 5 has to do
> with allocation of opcodes and temporary values in a non-standard
> fashion, and dealing with the resultant non-reentrancy on an ad hoc
> basis. We've already tried that experiment, and it sucks. I don't
> want to see the regex engine get swept back under the complexity carpet
> for Perl 6.

Yeah, but those are mostly issues with the implementation, not with the
separation.

> That's a scenario I'd love to avoid. And if we can manage to store
> regex opcodes and state using mechanisms similar to ordinary opcodes,
> maybe we'll not fall back into the situation where the regex engine is
> understood by only three people, plus or minus four.

While I'm not sure I agree with it, if that's what you want, then that's
what we'll do. Threading will complicate this some, since we'll need to
guarantee atomicity across multiple opcodes, something I'd not planned
on doing.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 04:20:12PM -0400, Uri Guttman wrote: > that doesn't address larry's point which is very important. No, I was replying to David, so I thought I'd address his point instead. Conventional, I know. :) > the regex compiler needs to be able to generate the equivalent of the > above directly into perl ops. Yes, of course. It's easy enough to do if, for instance, you have a Perl op called "exact", "plus", etc. ... -- "Everything's working, try again in half an hour."-chorus.net tech support
Re: Stacks, registers, and bytecode. (Oh, my!)
> "SC" == Simon Cozens <[EMAIL PROTECTED]> writes:

SC> OK, here's how you do it. Run perl -Mre=debug -e '/your regexp/',
SC> and use Perl to parse the bit in the middle. That's a state machine,
SC> so we can emulate it with subroutines.

SC> So, for instance: perl -Mre=debug -e '/fo+bar/'
SC> ...
SC>    1: EXACT <f>(3)
SC>    3: PLUS(6)
SC>    4: EXACT <o>(0)
SC>    6: EXACT <bar>(8)
SC>    8: END(0)
SC> ...

SC> Now we translate that to something like this:

SC>    sub node_1 {
SC>        $state = shift;
SC>        while ($_ = substr($state->{string},0,1,"")) {
SC>            if ($_ eq "f") {
SC>                return success if $state->node_3;
SC>            }
SC>        }
SC>        return failure;
SC>    }
SC>    ... and so on ...

SC> In keeping with tradition, implementation is left as an exercise for
SC> the reader. :)

that doesn't address larry's point which is very important. the regex compiler needs to be able to generate the equivalent of the above directly into perl ops. it would eliminate the separation of the regex runtime as a black box. we could create special ops that support node making and traversal and others that execute the low level matches. an optimizer could then combine some ops into fewer and more specialized ones. then regexes can also be debugged in detail with the perl debugger. that assumes the debugger has access to single stepping ops which is an intriguing idea. BTW this is the kind of feature that dan wanted the debugger PDD to have. having this feature supported and running early on would allow us to debug the compiler and optimizer using our own tool. in fact if the debugger supported remote control as well as internal, then it could be developed in parallel in perl5. this would be a very useful thing to have during the development of the full perl 6.

uri

-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote: > At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote: > >On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote: > > > Does anyone have on-their-shelves a regex-into-non-regex-perl translator? > > > >Does anyone have on-their-shelves a David-Nicol-into-English translator? :) > > I think he's looking for something that turns a regex into perl that > doesn't involve regexes. You must be sharing the pipe with David since I couldn't make head||tails about it, either :-) Err...a regex that isn't a regex, is this a Zen koan...? Ahhh, you want to emulate the state machine in Pure Perl. Okay... next thing you want to do is to write symbolic assembler in C...? :-) > > > run time is not an issue > > > >Wrong. > > Um Presumably if he *was* interested in run time he wouldn't have said > that. (Which isn't to say that I'm not interested in run time for something > like that if it's to be used in perl 6, but that's a separate issue) > > Dan > > --"it's like this"--- > Dan Sugalski even samurai > [EMAIL PROTECTED] have teddy bears and even > teddy bears get drunk -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote:
> I think he's looking for something that turns a regex into perl that
> doesn't involve regexes.

Oh, well, if we follow Larry's suggestion and have regexp matching ops, then we could claim that the regex *is* Perl, it's just written in a slightly different language. :)

OK, here's how you do it. Run perl -Mre=debug -e '/your regexp/', and use Perl to parse the bit in the middle. That's a state machine, so we can emulate it with subroutines.

So, for instance: perl -Mre=debug -e '/fo+bar/'

    ...
    1: EXACT <f>(3)
    3: PLUS(6)
    4: EXACT <o>(0)
    6: EXACT <bar>(8)
    8: END(0)
    ...

Now we translate that to something like this:

    sub node_1 {
        $state = shift;
        while ($_ = substr($state->{string},0,1,"")) {
            if ($_ eq "f") {
                return success if $state->node_3;
            }
        }
        return failure;
    }
    ... and so on ...

In keeping with tradition, implementation is left as an exercise for the reader. :)

> (Which isn't to say that I'm not interested in run time for something
> like that if it's to be used in perl 6, but that's a separate issue)

Oh, I see, I see. Sorry, was confusing "implementation of this thing for Perl 6" and "implementation of this handy tool for exploration purposes".

Simon -- "The elder gods went to Suggoth and all I got was this lousy T-shirt."
Re: Stacks, registers, and bytecode. (Oh, my!)
At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote: >On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote: > > Does anyone have on-their-shelves a regex-into-non-regex-perl translator? > >Does anyone have on-their-shelves a David-Nicol-into-English translator? :) I think he's looking for something that turns a regex into perl that doesn't involve regexes. > > run time is not an issue > >Wrong. Um Presumably if he *was* interested in run time he wouldn't have said that. (Which isn't to say that I'm not interested in run time for something like that if it's to be used in perl 6, but that's a separate issue) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote: > Does anyone have on-their-shelves a regex-into-non-regex-perl translator? Does anyone have on-their-shelves a David-Nicol-into-English translator? :) > run time is not an issue Wrong. -- I decided to spread the bad mood: Dress To Depress. - Red Drag Diva
Re: Stacks, registers, and bytecode. (Oh, my!)
Larry Wall wrote: > "Sure, you can download the object code for this 5 line Perl program > into your toaster...but you'll also have to download this 5 gigabyte > regex interpreter before it'll run." > > That's a scenario I'd love to avoid. And if we can manage to store > regex opcodes and state using mechanisms similar to ordinary opcodes, > maybe we'll not fall back into the situation where the regex engine is > understood by only three people, plus or minus four. > > Larry Does anyone have on-their-shelves a regex-into-non-regex-perl translator? run time is not an issue, correct behavior is -- David Nicol 816.235.1187 [EMAIL PROTECTED] "Obi-Wan taught me mysticism" -- Luke Housego
Re: Stacks, registers, and bytecode. (Oh, my!)
> The fact that Perl 5's regex engine is a royal pain to deal with should > be a warning to us. > > Much of the pain of dealing with the regex engine in Perl 5 has to do > with allocation of opcodes and temporary values in a non-standard > fashion, and dealing with the resultant non-reentrancy on an ad hoc > basis. We've already tried that experiment, and it sucks. I don't > want to see the regex engine get swept back under the complexity carpet > for Perl 6. It will come back to haunt us if we do: *standing ovations* > "Sure, you can download the object code for this 5 line Perl program > into your toaster...but you'll also have to download this 5 gigabyte > regex interpreter before it'll run." > > That's a scenario I'd love to avoid. And if we can manage to store > regex opcodes and state using mechanisms similar to ordinary opcodes, > maybe we'll not fall back into the situation where the regex engine is > understood by only three people, plus or minus four. > > Larry -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: : Are you speaking of the nodes in regnode.h? I hadn't considered them as : regular perl opcodes--I figured they'd stay internal to the regex engine so : we could keep it reasonably modular. I don't think that's a terribly strong argument--one could justify any number of unfortunate architectural distinctions on the basis of modularity. Plus, I'd argue you can still retain modularity of your code while unifying implementational philosophy. It seems to me that the main reason for not considering such a unification is that We've Never Done It That Way Before. It's as if regular expressions have always been second-class programs, so they'll always be second-class programs, world without end, amen, amen. But there is precedent for turning second-class code into first-class code. After all, that's just what we did for ordinary quotes in the transition from Perl 4 to Perl 5. Perl 4 had a string interpolation engine, and it was a royal pain to deal with. The fact that Perl 5's regex engine is a royal pain to deal with should be a warning to us. Much of the pain of dealing with the regex engine in Perl 5 has to do with allocation of opcodes and temporary values in a non-standard fashion, and dealing with the resultant non-reentrancy on an ad hoc basis. We've already tried that experiment, and it sucks. I don't want to see the regex engine get swept back under the complexity carpet for Perl 6. It will come back to haunt us if we do: "Sure, you can download the object code for this 5 line Perl program into your toaster...but you'll also have to download this 5 gigabyte regex interpreter before it'll run." That's a scenario I'd love to avoid. And if we can manage to store regex opcodes and state using mechanisms similar to ordinary opcodes, maybe we'll not fall back into the situation where the regex engine is understood by only three people, plus or minus four. Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote: > > I really think we'll win if we have support for at least integers as well > as PMCs. There's potentially a lot of integer work that'll be generated by > the optimizer and, while integer opcodes might not make the interpreter > much faster, they'll speed up TIL and generated C code significantly since > they won't need to call opcodes. How much integer arithmetic does the perl interpreter actually do? I've profiled a quite a few perl5 versions on a lot of different perl programs, and about the only programs where integer ops made it above the background noise were things like Ackerman, iterative Fibonacci, and some Q&D image processing on PBM files. Most of the programs I looked at didn't do much of any integer arithmetic. Figuring out where the hot spots are in an interpreter for a general- purpose programming language is hard. I'd recommend against special cases in the registers, since it's not clear how much they'd help. -- Ed Mooring ([EMAIL PROTECTED])
Re: Stacks, registers, and bytecode. (Oh, my!)
On Wed, May 30, 2001 at 12:14:29PM -0400, Uri Guttman wrote: > > "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: > > NI> The "overhead of op dispatch" is a self-proving issue - if you > NI> have complex ops they are expensive to dispatch. > > but as someone else said, we can design our own ops to be as high level > as we want. lowering the number of op calls is the key. that loop will > be a bottleneck as it is in perl5 unless we optimize it now. > In my experience, perl opcodes have not been the performance bottleneck in perl5. It seems it isn't actually the loop that's the bottleneck in perl5. I profiled a whole bunch of different perl programs, using a lot of different versions of perl5; and the runops loop was very rarely among the top CPU users. Many times, the opcode routines themselves weren't the hot spots. It was the support routines like hash key calculation and lookup, string comparisons, or the regular expression code. Profiling is almost always counter-intuitive, but if I had to guess, I'd say that most of the per-opcode cost in perl5 was due to setup/initialization as each opcode was entered, and that devious/clever data structure design could avoid most of this. Also, opcode dispatch might not be the right tree up which to be barking in seeking performance. -- Ed Mooring ([EMAIL PROTECTED])
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> The size of the generated bytecode's likely not going to be DS> directly an issue in determining speed. (And we can't count on DS> keep the locals exclusively in the register set either, as DS> otherwise string eval won't be able to find them, and neither will DS> the debugger) but if registers are usually just a PMC pointer, that address can also be accessed through the symbol table or pad or whatever. in my view what the compiler is doing with registers is actually aliasing a register index to some PMC somewhere. a load register doesn't copy the PMC, just its address into the register slot. the whole purpose of the register system is to support a simple bytecode with integer offsets for arguments. that bytecode format has many positive qualities so we should try to make the IL map well to it. the register idea is not a speed thing in and of itself. the registers are the logical way the VM operates on data. fetching a global may require operations such as a symbol lookup and a load of the PMC into a register. the compiler tracks register usage and does the push/pops to manage them. DS> We still may have synchronization issues with threads and lexicals, but DS> likely far less than with globals. by keeping all thread specific data in a single data structure, then only global accesses will need to be synchronized. i think the VM data (stack, registers, PC, etc.) will be a part of this per thread data block. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
RE: Stacks, registers, and bytecode. (Oh, my!)
> While RISC style opcodes have a certain pleasant simplicity to them, the > opcode overhead'll kill us. (If perl 5's anything to go by, and in this > case I think it might be) I don't understand what opcode overhead you meant. It cannot be worse than a stack-based opcode. > The size of the generated bytecode's likely not going to be directly an > issue in determining speed. (And we can't count on keep the locals > exclusively in the register set either, as otherwise string eval won't be > able to find them, and neither will the debugger) If we have 64 registers, I bet 99.99% of locals will fit into the register set. For the debug information, we just need a hash to map a my variable's name to a register id. This is needed even for stack based code. How to deal with optimization is a different story. However, it has nothing to do with the register based scheme itself. For example, a local variable is not used by any of the code, however an eval($s) may refer to it. So should the optimizer eliminate the local? Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
> "LW" == Larry Wall <[EMAIL PROTECTED]> writes: LW> We basically tried this experiment with Perl 5, and it's only a LW> partial success. Yes, you end up with a Perl VM that can run Perl LW> pretty fast, but it tends to complicate the mapping to other LW> virtual machines. (Enough so that we still don't have a LW> reasonable way to run Perl on a JVM, despite several attempts.) good point. does that mean we should think more risc like? my (very limited) impression of the jvm is that it is lower level and not too dissimilar to p-code. how much closer to jvm (and $DIETY help us, .NET) should our IL design go? uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
RE: Stacks, registers, and bytecode. (Oh, my!)
At 10:06 AM 5/30/2001 -0700, Hong Zhang wrote: > > There's no reason why you can't have a hybrid scheme. In fact I think > > it's a big win over a pure register-addressing scheme. Consider... > >The hybrid scheme may be a win in some cases, but I am not sure if it >is worth the complexity. I personally prefer strict RISC style opcodes, >mainly load, store, and ops for common operators (+, -, * etc), plus >an escaped opcode for complicated operators and functions. While RISC style opcodes have a certain pleasant simplicity to them, the opcode overhead'll kill us. (If perl 5's anything to go by, and in this case I think it might be) >Please note most of common operations will deal with locals, not >globals. Since almost all locals will fit into the register set, the >generated bytecode will be very small and very fast. The size of the generated bytecode's likely not going to be directly an issue in determining speed. (And we can't count on keep the locals exclusively in the register set either, as otherwise string eval won't be able to find them, and neither will the debugger) >The global access is doomed to be slower than locals, especially >considering the synchronization overhead associated with threading. We still may have synchronization issues with threads and lexicals, but likely far less than with globals. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman <[EMAIL PROTECTED]> writes:
>> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:
>
> NI> The "overhead of op dispatch" is a self-proving issue - if you
> NI> have complex ops they are expensive to dispatch.
>
>but as someone else said, we can design our own ops to be as high level
>as we want. lowering the number of op calls is the key. that loop will
>be a bottleneck as it is in perl5 unless we optimize it now.
>
> NI> With a 16-bit opcode as-per-Uri that becomes:
>
> NI>    while (1) *(table[*op_ptr++])();
>
> NI> (Assuming we don't need to check bounds 'cos we won't generate bad code...)
>
>i dropped the 16 bit idea in favor of an extension byte code that zhong
>mentioned. it has several wins, no ordering issues, it is pure 'byte'
>code.
>
> NI> One can then start adding decode to the loop:
>
> NI>    while (1) {
> NI>       op_t op = *op_ptr++;
> NI>       switch(NUM_ARGS(op))
>
>no switch, a simple lookup table:
>
>    op_cnt = op_counts[ op ] ;

Myths of 21st Century Computing #1: "Memory lookups are cheap"

Most processors have only one memory unit, and it typically has a long pipeline delay. But many have several units that can do compare etc. A lookup table may or may not be faster/denser than a switch. A lookup may take 9 cycles down a memory pipe while

    ans = (op > 16) ? 2 : (op > 8) ? 1 : 0;

might super-scalar issue in 1 cycle. Code at a high level and let the C compiler know what is best. C will give you a lookup if that is best. Memory ops need not be expensive if they pipeline well, but making one memory op depend on the result of another is a bad idea, e.g.

    op   = *op_ptr++;
    arg1 = *op_ptr++;
    arg2 = *op_ptr++;

may appear to happen in 3 cycles, as all the loads can be issued in a pipelined manner and the ++s issued in parallel. While

    op  = *op_ptr++;
    ans = table[op];

could seem to take 18 cycles, as we can't start the 2nd load till the 1st one completes.
I have been meaning to try and prove my point with a software-pipelined dispatch loop which is fetching one op, decoding previous one and executing one before that. -- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
At 09:35 AM 5/30/2001 -0700, Larry Wall wrote: >Dan Sugalski writes: >: Right, but in this case we have the advantage of tailoring the instruction >: set to the language, and given the overhead inherent in op dispatch we also >: have an incentive to hoist opcodes up to as high a level as we can manage. > >We basically tried this experiment with Perl 5, and it's only a partial >success. Yes, you end up with a Perl VM that can run Perl pretty fast, >but it tends to complicate the mapping to other virtual machines. >(Enough so that we still don't have a reasonable way to run Perl on a >JVM, despite several attempts.) I think, if we have a solid definition of the abstract machine the p-code's targeted at, that a JIT/.NET/whatever translation should be reasonably straightforward. Except for the regex engine, which is probably going to drive someone (else) mad... >I guess the real question is to what extent the world of the future will >use interpreters, and to what extent it'll settle on JIT compiling instead. >And that's a big enough dog that we can't wag it very easily. If I had to bet, I'd wager that interpreter usage will increase as the power of processors increase. How much of that usage is co-opted by JIT versions of interpreters is something I'm not sure of. >By the way, have you folks considered how to unify the regex opcodes >with the normal Perl opcodes? I suspect we might want to do that. Are you speaking of the nodes in regnode.h? I hadn't considered them as regular perl opcodes--I figured they'd stay internal to the regex engine so we could keep it reasonably modular. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Stacks, registers, and bytecode. (Oh, my!)
> There's no reason why you can't have a hybrid scheme. In fact I think
> it's a big win over a pure register-addressing scheme. Consider...

The hybrid scheme may be a win in some cases, but I am not sure if it is worth the complexity. I personally prefer strict RISC style opcodes, mainly load, store, and ops for common operators (+, -, * etc), plus an escaped opcode for complicated operators and functions.

> Consider the following code.
>
>    $a = $x*$y+$z
>
> Suppose we have r5 and r6 available for scratch use, and that for some
> reason we wish to keep a pointer to $a in r1 at the end
> (perhaps we use $a again a couple of lines later):
>
> This might have the following bytecode with a pure register scheme:
>
>    GETSV('x',r5)  # get pointer to global $x, store in register 5
>    GETSV('y',r6)
>    MULT(r5,r5,r6) # multiply the things pointed to by r5 and r6; store ptr to
>                   # result in r5
>    GETSV('z',r6)
>    ADD(r5,r5,r6)
>    GETSV('a',r1)
>    SASSIGN(r1,r5)

Please note most of common operations will deal with locals, not globals. Since almost all locals will fit into the register set, the generated bytecode will be very small and very fast. The global access is doomed to be slower than locals, especially considering the synchronization overhead associated with threading.

Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: : Right, but in this case we have the advantage of tailoring the instruction : set to the language, and given the overhead inherent in op dispatch we also : have an incentive to hoist opcodes up to as high a level as we can manage. We basically tried this experiment with Perl 5, and it's only a partial success. Yes, you end up with a Perl VM that can run Perl pretty fast, but it tends to complicate the mapping to other virtual machines. (Enough so that we still don't have a reasonable way to run Perl on a JVM, despite several attempts.) I guess the real question is to what extent the world of the future will use interpreters, and to what extent it'll settle on JIT compiling instead. And that's a big enough dog that we can't wag it very easily. By the way, have you folks considered how to unify the regex opcodes with the normal Perl opcodes? I suspect we might want to do that. Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
Dave Mitchell <[EMAIL PROTECTED]> writes:
>There's no reason why you can't have a hybrid scheme. In fact I think
>it's a big win over a pure register-addressing scheme. Consider...

Which was more or less my own position...

>At the start of a new scope, the stack is extended by N to create a new
>stack frame (including a one-off check that the stack can be
>extended). There is then a 'stack pointer' (sp) which is initialised
>to the base of the new frame, or an initial offset thereof. (So sp is
>really just a temporary index within the current frame.)
>
>Then some opcodes can use explicit addressing, while others can be
>implicit, or a mixture.
>
>Explicit opcodes specify one or more 'registers' - ie indexes within the
>current frame, while implicit opcodes use the current value of sp as an
>implicit index, and may alter sp as a side effect. So an ADD opcode
>would use sp[0], sp[-1] to find the 2 operands and would store a pointer
>to the result at sp[-1], then sp--. The compiler plants code in such a way
>that it will never allow sp to go outside the current stack frame.
>
>This allows a big win on the size of the bytecode, and in terms of the
>time required to decode each op.
>
>Consider the following code.
>
>    $a = $x*$y+$z
>
>Suppose we have r5 and r6 available for scratch use, and that for some
>reason we wish to keep a pointer to $a in r1 at the end (perhaps we use
>$a again a couple of lines later):
>
>This might have the following bytecode with a pure register scheme:
>
>    GETSV('x',r5)  # get pointer to global $x, store in register 5
>    GETSV('y',r6)
>    MULT(r5,r5,r6) # multiply the things pointed to by r5 and r6; store ptr to
>                   # result in r5
>    GETSV('z',r6)
>    ADD(r5,r5,r6)
>    GETSV('a',r1)
>    SASSIGN(r1,r5)

Globals are a pain.
Consider this code:

    sub foo {
        my ($x,$y,$z) = @_;
        return $x*$y+$z;
    }

In the pure register (RISC-oid) scheme the bytecode should be:

    FOO: MULT(arg1,arg2,tmp1)
         ADD(tmp1,arg3,result)
         RETURN

That is, lexicals get allocated registers at compile time, and ops just go get them. In the pure stack-with-alloc (x86-oid) scheme it should be

    ENTER +1                 # need a temp
    MULT SP[1],SP[2],SP[4]   # $x*$y
    ADD  SP[4],SP[3],SP[1]   # temp + $z -> result
    RETURN -2                # lose temp and non-results

And in a pure stack (FORTH, PostScript) style it might be

    rot 3    # reorder stack to get x y on top
    mpy
    add
    ret

>but might be like this in a hybrid scheme:
>
>    SETSP(5)    # make sp point to r5
>    GETSV('x')  # get pointer to global $x, store at *sp++
>    GETSV('y')
>    MULT
>    GETSV('z')
>    ADD
>    GETSV('a')
>    SASSIGN
>    SAVEREG(r1) # pop pointer at *sp, and store in register 1

The problem that the hybrid scheme glosses over is the reorder-the-args issue that is handled by register numbers, stack addressing or FORTH/PostScript stack re-ordering. It avoids it by expensive long range global fetches - which is indeed what humans do when writing PostScript - use globals - but compilers can keep track of such mess for us.

>Both use the same registers, have the same net result, but the explicit
>scheme requires an extra 11 numbers in the bytecode, not to mention all
>the extra cycles required to extract out those numbers from the bytecode
>in the first place.

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:

NI> The "overhead of op dispatch" is a self-proving issue - if you
NI> have complex ops they are expensive to dispatch.

but as someone else said, we can design our own ops to be as high level as we want. lowering the number of op calls is the key. that loop will be a bottleneck as it is in perl5 unless we optimize it now.

NI> With a 16-bit opcode as-per-Uri that becomes:

NI>    while (1) *(table[*op_ptr++])();

NI> (Assuming we don't need to check bounds 'cos we won't generate bad code...)

i dropped the 16 bit idea in favor of an extension byte code that zhong mentioned. it has several wins, no ordering issues, it is pure 'byte' code.

NI> One can then start adding decode to the loop:

NI>    while (1) {
NI>       op_t op = *op_ptr++;
NI>       switch(NUM_ARGS(op))

no switch, a simple lookup table:

    op_cnt = op_counts[ op ] ;

NI>    *(table[FUNC_NUM(op)])(*op_ptr++);

i don't have the c code in mind for that. something that uses op_cnt and passes in that many args.

    op_ptr += op_cnt;

uri

-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:

NI> Uri Guttman <[EMAIL PROTECTED]> writes:
>>
>> think of this as classic CISC code generation with plenty of registers
>> and a scratch stack. this is stable technology. we could even find a
>> code generator guru (i don't know any obvious ones in the perl6 world)

NI> Classic CISC code generation taught "us" that CISC is a pain to code-gen.
NI> (I am not a Guru but did design TMS320C80's RISC specifically to match
NI> gcc of that vintage, and dabbled in a p-code for Pascal way back.)

i disagree. cisc went out of style for speed reasons, not because of compilers. most risc compilers are much more complicated than cisc compilers. they have to manage complex register usage, deal with delayed branches, reorder instructions to fill pipelines, etc. what happened was that the higher level instruction logic in a cisc chip was removed and put into the compiler. but we are not doing a risc like IL so this is moot.

>> >> special registers ($_, @_, events, etc.) are indexed with a starting
>> >> offset of 64, so general registers are 0-63.

DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the
DS> normal register set.

NI> All that dividing registers into sub-classes does is cause you to do
NI> register-register moves when things are in the wrong sort of register.
NI> Its only real benefit is for encoding density as you can "imply" part
NI> of the register number by requiring addresses to be in address registers
NI> etc. It is not clear to me that perl special variables map well to that.
NI> Mind you the names are just a human thing - it is the bit-pattern that
NI> the compiler cares about.

no register to register moves are needed. if you call tr with the default $_ it gets generated as:

    TR_OP REG_ARG, <reg2>, <reg3>   ; REG_ARG is $_

where the <...> are register indexes generated by the compiler. it assumes something is already in $_. if you use tr on $foo you code gen:

    push <reg1>        ; save reg if needed
    load <reg1>, $foo  ; load if needed - probably more code
                       ; than this
    TR_OP <reg1>, <reg2>, <reg3>

the first two lines are only there if $foo isn't in a register yet. in both cases <reg3> is picked by the compiler and must be free to use. it gets the tr count return (the result of the op). the idea is that coding for $_ or $foo is basically the same. you just use the special register index for $_ vs. any regular one for $foo.

code generation with this type of IL is almost one to one with perl. most of the language level operations map to one op code, so it is mostly a case of managing the data in and out of the registers and calling the op codes. a real compiler has to do so much more. TIL and c back ends will need much more work than the basic perl to IL translation.

idea: we have a VOID special register. it is like /dev/null. if you execute an op in a void context, you pass in the VOID special register to get the result. then either no result is generated or stored depending on how we code it. this doesn't handle the runtime context of sub calls.

uri

-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
> >the op code is stored in network endian order and the interpreter will
> >always build a 16 bit int from the 2 bytes.
>
> Not network. Native. We can put a marker at the beginning of any bytecode
> stream, and provide an endian-swapper. That way we're always running at
> platform optimal encoding, and if we get bytecode from a platform with
> different endianness we can run the utility to swap things for us.

Another way would be to have an 8-bit opcode, with extensions, e.g.
opcode 0xff implies that the next 16 bits are an extended opcode. The
most common opcodes would be 8-bit. This would also mean the initial
dispatch could be done without any range-checks even, as long as unused
entries in the 8-bit table pointed to an error function:

    char *byte_ptr = BYTECODE_START;
    while (byte_ptr) {
        byte_ptr = (dispatch_table8[*byte_ptr++])(byte_ptr, ...);
    }

with the function at dispatch_table8[255] responsible for reading the
next 2 bytes, forming them into an int, checking the ranges, then using
another dispatch table.
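The escape-opcode scheme described above can be made concrete. The following is a hedged sketch, not actual Parrot code: the op signatures, table sizes, and all names (`op_noop`, `op_escape`, `demo`, etc.) are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of 8-bit dispatch with a 0xff escape into a 16-bit
 * extended-opcode table.  Each op returns the next byte pointer, or
 * NULL to halt; unused slots point at an error function, so the main
 * loop needs no range checks. */

typedef const uint8_t *(*op_fn)(const uint8_t *pc);

static int trace[8], ntrace;                       /* records which ops ran */

static const uint8_t *op_noop(const uint8_t *pc)  { trace[ntrace++] = 1; return pc; }
static const uint8_t *op_halt(const uint8_t *pc)  { (void)pc; trace[ntrace++] = 2; return NULL; }
static const uint8_t *op_error(const uint8_t *pc) { (void)pc; return NULL; }

static op_fn dispatch_table8[256];
static op_fn ext_table[65536];                     /* extended (16-bit) ops */

/* Slot 255: form the next two bytes into an extended opcode number,
 * then dispatch through the second table. */
static const uint8_t *op_escape(const uint8_t *pc)
{
    uint16_t ext = (uint16_t)((pc[0] << 8) | pc[1]);
    return ext_table[ext](pc + 2);
}

static void run(const uint8_t *pc)
{
    while (pc) {
        uint8_t op = *pc++;
        pc = dispatch_table8[op](pc);
    }
}

static int demo(void)
{
    for (int i = 0; i < 256; i++)   dispatch_table8[i] = op_error;
    for (int i = 0; i < 65536; i++) ext_table[i] = op_error;
    dispatch_table8[0]   = op_noop;
    dispatch_table8[255] = op_escape;
    ext_table[0x0102]    = op_halt;

    /* one common 8-bit op, then an escape to extended op 0x0102 */
    static const uint8_t code[] = { 0, 255, 0x01, 0x02 };
    ntrace = 0;
    run(code);
    return ntrace;
}
```

Only the 256-entry table sits on the hot path; the rare extended ops pay for the extra two-byte fetch, matching the assumption that the most common opcodes stay 8-bit.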
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski <[EMAIL PROTECTED]> writes:
>At 02:08 PM 5/30/2001 +0000, Nick Ing-Simmons wrote:
>>Classic CISC code generation taught "us" that CISC is a pain to code-gen.
>>(I am not a Guru but did design TMS320C80's RISC specifically to match
>>gcc of that vintage, and dabbled in a p-code for Pascal way back.)
>
>Right, but in this case we have the advantage of tailoring the instruction
>set to the language, and given the overhead inherent in op dispatch we also
>have an incentive to hoist opcodes up to as high a level as we can manage.

That is of course what they/we all say ;-)

The 68K for example matched quite well to the low-tech compiler
technology of its day, as did UCSD's p-code for UCSD Pascal, and DSPs
have their own reasons (inner loops are more important than generic C)
for their CISC nature. Even the horrible x86 architecture is quasi-sane
if you assume all variables are on the stack addressed by the Base
Pointer.

It is interesting now that people are looking at building chips for JVM
how much cursing there is about certain features - though I don't have
the references to hand.

The "overhead of op dispatch" is a self-proving issue - if you have
complex ops they are expensive to dispatch. In the limit FORTH-like
threaded code

    while (1)
       (*(*op_ptr++))();

is not really very expensive; it is then up to the "op" to adjust op_ptr
for in-line args etc. The down side is size: an op is at least the size
of a pointer.

With a 16-bit opcode as-per-Uri that becomes:

    while (1)
       (*(table[*op_ptr++]))();

(Assuming we don't need to check bounds 'cos we won't generate bad code...)

One can then start adding decode to the loop:

    while (1) {
       op_t op = *op_ptr++;
       switch (NUM_ARGS(op)) {
       case 1:
          (*(table[FUNC_NUM(op)]))(*op_ptr++);
          break;
       case 3:
          (*(table[FUNC_NUM(op)]))(op_ptr[0], op_ptr[1], op_ptr[2]);
          op_ptr += 3;
          break;
       ...
    }

Then one can do byte-ordering and mis-aligned hackery and index into the
reg-array:

    while (1) {
       op_t op = GET16BITS(*op_ptr);
       switch (NUM_ARGS(op)) {
       case 1:
          (*(table[FUNC_NUM(op)]))(reg_ptr[GET8BITS(*op_ptr)]);
          break;
       ...
    }

-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/
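The `NUM_ARGS`/`FUNC_NUM` decode loop above can be fleshed out into a runnable toy. The bit split here (top 2 bits carry the argument count, low 14 bits the function number) and the op names are assumptions for illustration, not the proposed encoding.

```c
#include <stdint.h>

/* Toy decode loop: arg count packed into the top bits of each
 * 16-bit op word, function number in the rest. */
#define NUM_ARGS(op)  ((op) >> 14)
#define FUNC_NUM(op)  ((op) & 0x3fff)
#define MAKE_OP(f, n) ((uint16_t)(((n) << 14) | (f)))

typedef void (*fn0)(void);
typedef void (*fn1)(int);
typedef void (*anyfn)(void);          /* stored type; cast back per arity */

static int acc;
static void op_inc(void)  { acc++; }
static void op_add(int n) { acc += n; }

static anyfn table[] = { (anyfn)op_inc, (anyfn)op_add };

static void run_ops(const uint16_t *op_ptr, int nops)
{
    while (nops-- > 0) {
        uint16_t op = *op_ptr++;
        switch (NUM_ARGS(op)) {
        case 0: ((fn0)table[FUNC_NUM(op)])();               break;
        case 1: ((fn1)table[FUNC_NUM(op)])((int)*op_ptr++); break;
        }
    }
}

static int decode_demo(void)
{
    /* op 0 (inc, no args), then op 1 (add, one in-line arg) */
    static const uint16_t code[] = { MAKE_OP(0, 0), MAKE_OP(1, 1), 41 };
    acc = 0;
    run_ops(code, 2);
    return acc;
}
```

Casting each function pointer back to its true type before the call keeps the single-table dispatch legal C while still letting different arities share one table.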
Re: Stacks, registers, and bytecode. (Oh, my!)
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
> I don't like push/pop - they imply a lot of stack limit checking word-by-word
> when it is less overhead for compiler to analyse the needs of whole basic-block
> check-for/make-space-on the stack _once_ then just address it.

There's no reason why you can't have a hybrid scheme. In fact I think
it's a big win over a pure register-addressing scheme. Consider...

At the start of a new scope, the stack is extended by N to create a new
stack frame (including a one-off check that the stack can be extended).
There is then a 'stack pointer' (sp) which is initialised to the base of
the new frame, or an initial offset thereof. (So sp is really just a
temporary index within the current frame.)

Then some opcodes can use explicit addressing, while others can be
implicit, or a mixture. Explicit opcodes specify one or more 'registers'
- ie indexes within the current frame - while implicit opcodes use the
current value of sp as an implicit index, and may alter sp as a side
effect. So an ADD opcode would use sp[0], sp[-1] to find the 2 operands
and would store a pointer to the result at sp[-1], then sp--. The
compiler plants code in such a way that it will never allow sp to go
outside the current stack frame.

This allows a big win on the size of the bytecode, and in terms of the
time required to decode each op. Consider the following code.
    $a = $x*$y+$z

Suppose we have r5 and r6 available for scratch use, and that for some
reason we wish to keep a pointer to $a in r1 at the end (perhaps we use
$a again a couple of lines later). This might have the following
bytecode with a pure register scheme:

    GETSV('x',r5)   # get pointer to global $x, store in register 5
    GETSV('y',r6)
    MULT(r5,r5,r6)  # multiply the things pointed to by r5 and r6;
                    # store ptr to result in r5
    GETSV('z',r6)
    ADD(r5,r5,r6)
    GETSV('a',r1)
    SASSIGN(r1,r5)

but might be like this in a hybrid scheme:

    SETSP(5)        # make sp point to r5
    GETSV('x')      # get pointer to global $x, store at *sp++
    GETSV('y')
    MULT
    GETSV('z')
    ADD
    GETSV('a')
    SASSIGN
    SAVEREG(r1)     # pop pointer at *sp, and store in register 1

Both use the same registers and have the same net result, but the
explicit scheme requires an extra 11 numbers in the bytecode, not to
mention all the extra cycles required to extract those numbers from the
bytecode in the first place.
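A tiny interpreter makes the implicit-sp half of the hybrid concrete. This is a sketch under simplifying assumptions: values are plain ints rather than SV pointers, constants are pushed directly instead of fetched via GETSV, and the op and function names are hypothetical.

```c
/* Implicit-addressing ops: MULT and ADD take their operands from the
 * top two frame slots and leave the result one slot down, exactly the
 * sp[0]/sp[-1], sp-- behaviour described above. */

enum { PUSHC, MULT, ADD, HALT };

static int eval(const int *code)
{
    int frame[16];
    int sp = -1;                        /* index of top slot in frame */

    for (;;) {
        switch (*code++) {
        case PUSHC: frame[++sp] = *code++;            break;
        case MULT:  frame[sp-1] *= frame[sp]; sp--;   break;
        case ADD:   frame[sp-1] += frame[sp]; sp--;   break;
        case HALT:  return frame[sp];
        }
    }
}

static int hybrid_demo(void)
{
    /* $x*$y + $z  with  x=3, y=4, z=5 */
    static const int code[] = { PUSHC,3, PUSHC,4, MULT, PUSHC,5, ADD, HALT };
    return eval(code);
}
```

Each implicit op is one number in the bytecode versus three for the explicit `MULT(r5,r5,r6)` form, which is where the "extra 11 numbers" saving comes from.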
Re: Stacks, registers, and bytecode. (Oh, my!)
At 02:08 PM 5/30/2001 +0000, Nick Ing-Simmons wrote:
>Uri Guttman <[EMAIL PROTECTED]> writes:
> >think of this as classic CISC code generation with plenty of registers
> >and a scratch stack. this is stable technology. we could even find a
> >code generator guru (i don't know any obvious ones in the perl6 world)
>
>Classic CISC code generation taught "us" that CISC is a pain to code-gen.
>(I am not a Guru but did design TMS320C80's RISC specifically to match
>gcc of that vintage, and dabbled in a p-code for Pascal way back.)

Right, but in this case we have the advantage of tailoring the
instruction set to the language, and given the overhead inherent in op
dispatch we also have an incentive to hoist opcodes up to as high a
level as we can manage.

> > >> special registers ($_, @_, events, etc.) are indexed with a starting
> > >> offset of 64, so general registers are 0-63.
>
> DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the
> DS> normal register set.
>
>All that dividing registers into sub-classes does is cause you to do
>register-register moves when things are in the wrong sort of register.
>Its only real benefit is for encoding density as you can "imply" part
>of the register number by requiring addresses to be in address registers
>etc. It is not clear to me that perl special variables map well to that.
>Mind you the names are just a human thing - it is the bit-pattern that
>the compiler cares about.

Fair enough. Having the equivalent to perl5's push stack (well, one of
'em) to save and restore named specials is probably a better idea.
There's no real overriding performance reason to have the specials in
the register set.

					Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman <[EMAIL PROTECTED]> writes:
>
>think of this as classic CISC code generation with plenty of registers
>and a scratch stack. this is stable technology. we could even find a
>code generator guru (i don't know any obvious ones in the perl6 world)

Classic CISC code generation taught "us" that CISC is a pain to code-gen.
(I am not a Guru but did design TMS320C80's RISC specifically to match
gcc of that vintage, and dabbled in a p-code for Pascal way back.)

> >> special registers ($_, @_, events, etc.) are indexed with a starting
> >> offset of 64, so general registers are 0-63.
>
> DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the
> DS> normal register set.

All that dividing registers into sub-classes does is cause you to do
register-register moves when things are in the wrong sort of register.
Its only real benefit is for encoding density as you can "imply" part
of the register number by requiring addresses to be in address registers
etc. It is not clear to me that perl special variables map well to that.
Mind you the names are just a human thing - it is the bit-pattern that
the compiler cares about.

>
>oh, they have macro names which are special. something like:
>
>    #define MAX_PLAIN_REG 64  /* 0 - 63 are plain regs */
>    #define REG_ARG       64  /* $_ */
>    #define REG_SUB_ARG   65  /* @_ */
>    #define REG_ARGV      66  /* @ARGV */
>    #define REG_INT1      67  /* integer 1 */
>    #define REG_INT2      68  /* integer 2 */
>
>uri
-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman <[EMAIL PROTECTED]> writes:
> DS> The one handy thing about push and pop is you don't need to go
> DS> tracking the stack manually--that's taken care of by the push and
> DS> pop opcodes. They can certainly be replaced with manipulations of
> DS> a temp register and indirect register stores or loads, but that's
> DS> more expensive--you do the same thing only with more dispatch
> DS> overhead.
>
> DS> And I'm considering the stack as a place to put registers
> DS> temporarily when the compiler runs out and needs a spot to
> DS> squirrel something away, rather than as a mechanism to pass
> DS> parameters to subs or opcodes. This is a stack in the traditional
> DS> scratch-space sense.
>
>i agree with that. the stack here is mostly a call stack which
>save/restores registers as we run out. with a large number like 64, we
>won't run out until we do some deep calls. then the older registers (do
>we have an LRU mechanism here?) get pushed by the sub call prologue
>which then uses those registers for its my vars.

I don't like push/pop - they imply a lot of stack limit checking
word-by-word when it is less overhead for the compiler to analyse the
needs of a whole basic-block, check-for/make-space-on the stack _once_,
then just address it.

>
>is the sub call/return stack also the data (scratch) stack? i think
>separate ones makes sense here. the data stack is just PMC pointers, the
>code call stack has register info, context, etc.

One stack is more natural for translation to C (which has just one).
One problem with FORTH was allocating two growable segments for its two
stacks - one always ended up 2nd class.
-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
> "Larry" == Larry Wall <[EMAIL PROTECTED]> writes: Larry> : Nah, bytecode'll have an endianness marker at the top. If you Larry> : load in bytecode with the wrong endianness, the loader will Larry> : have to swap for you. Larry> Er. If they're not bytes, we can't call it bytecode. Unless you want to raise the "octet means 8 bits, byte means whatever is nice and chunky" argument. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <[EMAIL PROTECTED]> http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> At 04:25 PM 5/29/2001 -0400, Uri Guttman wrote: >> here is an idea. if we use a pure stack design but you can access the >> stack values with an index, then the index number can get large. so a >> fixed register set would allow us to limit the index to 8 bits. so the >> byte code could look something like this: DS> Right, but that means values can potentially have their offsets DS> change based on what might happen during the course of a section DS> of code, and I'm not too comfortable with that. The potential for DS> error worries me. Plus it might make a mixed-type stack DS> trickier. (And yes, I realize the irony of me arguing against one DS> form of error-prone activity in the same paragraph that advocates DS> another...) current compilers do it all the time. they track register usage at compile time and push/pop at runtime to keep the code working. a given op is passed its args as register offsets. before/after that, there may be ops to set up the registers. only during that single op call are the needed registers known. this is not a sub call in perl with complex stuff, but a single IL op call with a handful of registers passed to it. most registers will be recycled after each perl level statement as they are temps. only those that are mapped to real vars (and other live things) are kept between statements. think of this as classic CISC code generation with plenty of registers and a scratch stack. this is stable technology. we could even find a code generator guru (i don't know any obvious ones in the perl6 world) to work with us on this. >> the op code is stored in network endian order and the interpreter will >> always build a 16 bit int from the 2 bytes. DS> Not network. Native. We can put a marker at the beginning of any bytecode DS> stream, and provide an endian-swapper. 
That way we're always running at DS> platform optimal encoding, and if we get bytecode from a platform with DS> different endianness we can run the utility to swap things for us. i disagree. bytecode should be portable without conversions. we store 16 bits in a known order (i picked network at random :) and build a native 16 bit int as we execute. or use only an 8 bit op code with one or more escape code for the next 256 codes (like Hong Zhang said). >> all registers point to PMC's DS> I really think we'll win if we have support for at least integers DS> as well as PMCs. There's potentially a lot of integer work that'll DS> be generated by the optimizer and, while integer opcodes might not DS> make the interpreter much faster, they'll speed up TIL and DS> generated C code significantly since they won't need to call DS> opcodes. ok, some special registers for hardware ops like add. but they should be indexed off the same register set with the other special registers. >> passing lists to/from subs is via an array ref. the data list is on the >> stack and the array ref is in @_ or passed by return(). DS> PMC for a list, and the args are in the list. (Potentially DS> aliased, of course) Subs prototyped with input and return data DS> won't use lists, they'll pass the parameters in registers. (Or DS> something like that) same idea but you clarified it. >> special registers ($_, @_, events, etc.) are indexed with a starting >> offset of 64, so general registers are 0-63. DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the DS> normal register set. oh, they have macro names which are special. 
something like: #define MAX_PLAIN REG 64 /* 0 - 63 are plain regs */ #define REG_ARG 64 /* $_ */ #define REG_SUB_ARG 65 /* @_ */ #define REG_ARGV66 /* @ARGV */ #define REG_INT167 /* integer 1 */ #define REG_INT268 /* integer 1 */ uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
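The special-register layout Uri describes amounts to one register file with a boundary at index 64. A minimal sketch, assuming a per-thread file of PMC pointers (`PMC` here is a stand-in struct; the file size and `is_special` helper are hypothetical):

```c
/* One per-thread file of PMC pointers: plain registers 0-63,
 * specials from 64 up, all one index space. */

typedef struct PMC { int dummy; } PMC;

#define MAX_PLAIN_REG 64      /* 0 - 63 are plain regs */
#define REG_ARG       64      /* $_ */
#define REG_SUB_ARG   65      /* @_ */
#define REG_ARGV      66      /* @ARGV */

typedef struct Interp {
    PMC *reg[128];            /* plain + special registers */
} Interp;

/* The compiler only needs the index: $_ is simply index 64. */
static int is_special(int r) { return r >= MAX_PLAIN_REG; }
```

Because the specials share the plain registers' index space, an op like `TR_OP` takes `REG_ARG` or any scratch register through the same operand slot, with no register-class moves.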
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, May 29, 2001 at 04:36:51PM -0600, Nathan Torkington wrote: > Dan Sugalski writes: > > Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd > > come up with a name that has a snappier acronym. Alas I'm not... :) > > p-code. The p stands for Parrot :-) No, it stands for "pieces of eight". The bar silver and the arms still lie, for all that I know, where Flint buried them; and certainly they shall lie there for me. Oxen and wain-ropes would not bring me back again to that accursed island; and the worst dreams that ever I have are when I hear the surf booming about its coasts or start upright in bed with the sharp voice of Captain Flint still ringing in my ears: "Pieces of eight! Pieces of eight!" > Nat -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> At 02:15 PM 5/29/2001 -0700, Hong Zhang wrote: >> > we have a simple set of load literal, push/pop (multiple) registers op >> > codes. >> >> There should be no push/pop opcodes. They are simply register moves. DS> The one handy thing about push and pop is you don't need to go DS> tracking the stack manually--that's taken care of by the push and DS> pop opcodes. They can certainly be replaced with manipulations of DS> a temp register and indirect register stores or loads, but that's DS> more expensive--you do the same thing only with more dispatch DS> overhead. DS> And I'm considering the stack as a place to put registers DS> temporarily when the compiler runs out and needs a spot to DS> squirrel something away, rather than as a mechanism to pass DS> parameters to subs or opcodes. This is a stack in the traditional DS> scratch-space sense. i agree with that. the stack here is mostly a call stack which save/restores registers as we run out. with a large number like 64, we won't run out until we do some deep calls. then the older registers (do we have an LRU mechnism here?) get pushed by the sub call prologue which then uses those registers for its my vars. is the sub call/return stack also the data (scratch) stack? i think separate ones makes sense here. the data stack is just PMC pointers, the code call stack has register info, context, etc. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: > Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd > come up with a name that has a snappier acronym. Alas I'm not... :) p-code. The p stands for Parrot :-) Nat
Re: Stacks, registers, and bytecode. (Oh, my!)
At 03:26 PM 5/29/2001 -0700, Larry Wall wrote: >Dan Sugalski writes: >: Nah, bytecode'll have an endianness marker at the top. If you load in >: bytecode with the wrong endianness, the loader will have to swap for you. > >Er. If they're not bytes, we can't call it bytecode. Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd come up with a name that has a snappier acronym. Alas I'm not... :) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: : Nah, bytecode'll have an endianness marker at the top. If you load in : bytecode with the wrong endianness, the loader will have to swap for you. Er. If they're not bytes, we can't call it bytecode. Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
At 04:25 PM 5/29/2001 -0400, Uri Guttman wrote: > > "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: > > DS> If someone can make a compelling argument for details on how the > DS> registers should be done that should be visible to the bytecode > DS> (Like, say, a variant on SPARC's windowing system, or forcing them > DS> on the stack such that pushes and pops renumber the registers) > DS> that'd be keen. I can see the merits, but I'm not compelled one > DS> way or the other quite yet. > >here is an idea. if we use a pure stack design but you can access the >stack values with an index, then the index number can get large. so a >fixed register set would allow us to limit the index to 8 bits. so the >byte code could look something like this: Right, but that means values can potentially have their offsets change based on what might happen during the course of a section of code, and I'm not too comfortable with that. The potential for error worries me. Plus it might make a mixed-type stack trickier. (And yes, I realize the irony of me arguing against one form of error-prone activity in the same paragraph that advocates another...) > literal data support is needed (read only) Yup. We'll need some sort of constant support, as well as some sort of sectional literal table. >the op code is stored in network endian order and the interpreter will >always build a 16 bit int from the 2 bytes. Not network. Native. We can put a marker at the beginning of any bytecode stream, and provide an endian-swapper. That way we're always running at platform optimal encoding, and if we get bytecode from a platform with different endianness we can run the utility to swap things for us. >all registers point to PMC's I really think we'll win if we have support for at least integers as well as PMCs. 
There's potentially a lot of integer work that'll be generated by the optimizer and, while integer opcodes might not make the interpreter much faster, they'll speed up TIL and generated C code significantly since they won't need to call opcodes. >passing lists to/from subs is via an array ref. the data list is on the >stack and the array ref is in @_ or passed by return(). PMC for a list, and the args are in the list. (Potentially aliased, of course) Subs prototyped with input and return data won't use lists, they'll pass the parameters in registers. (Or something like that) >special registers ($_, @_, events, etc.) are indexed with a starting >offset of 64, so general registers are 0-63. I'd name them specially (S0-Snnn) rather than make them a chunk of the normal register set. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Stacks, registers, and bytecode. (Oh, my!)
At 02:15 PM 5/29/2001 -0700, Hong Zhang wrote: > > we have a simple set of load literal, push/pop (multiple) registers op > > codes. > >There should be no push/pop opcodes. They are simply register moves. The one handy thing about push and pop is you don't need to go tracking the stack manually--that's taken care of by the push and pop opcodes. They can certainly be replaced with manipulations of a temp register and indirect register stores or loads, but that's more expensive--you do the same thing only with more dispatch overhead. And I'm considering the stack as a place to put registers temporarily when the compiler runs out and needs a spot to squirrel something away, rather than as a mechanism to pass parameters to subs or opcodes. This is a stack in the traditional scratch-space sense. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
At 05:32 PM 5/29/2001 -0400, Bryan C. Warnock wrote:
>On Tuesday 29 May 2001 05:15 pm, Hong Zhang wrote:
> > > either each op code knows how many args it has,
> >
> > I like to do so, otherwise we will lose most of the performance gain.
>
>I would think, though, that some ops would benefit more from varargs.
>String assembly comes to mind.

If an opcode wants some undefined number of arguments passed, it'll get
a single one (a list) passed to it. I don't know that we'll have too
many opcodes other than perhaps sub and method call ones that'll take a
variable number of arguments, and the ones that do will really take
one--a list with potentially a variable number of entries in it. So
instead of seeing:

    some_op a, b, c, d, e...

we'd see:

    templist r1
    push r1, array1
    push r1, scalar2
    push r1, 15
    some_op r1

Though that might have sufficient performance loss in opcode dispatch to
warrant doing things in another way.

> > > or we have an end marker (e.g 0xff which is never used as a register
> > > index).
> >
> > If we have to use variable arguments, I strongly recommend to add one
> > "argc" byte immediately following the opcode. Linear scan bytecode
> > will be very slow.
>
>Agreed.

I don't think this'll come up. Ops that take a variable number of
arguments will have a count, though, and it'll probably have to be the
first element.

> > The 16-bit op has both endian issue and alignment issue. Most of RISC
> > machine can not access byte-aligned opcode, so we have to add a lot
> > of padding. Anyway, it will be fatter and slower than 8-bit opcode.
> > I prefer to using escape opcode.
>
>This is a discussion we've had before. It seems if we solve the Unicode
>issue, we've solved the bytecode issue, too. Better yet, make all the
>opcodes map to UTF-8 characters, so you can write and execute bytecode
>as simple strings. ;-)

Nah, bytecode'll have an endianness marker at the top. If you load in
bytecode with the wrong endianness, the loader will have to swap for
you.
Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
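Dan's "one list argument" convention can be sketched as follows. The `List` layout, register file, and op names are hypothetical stand-ins, and `op_sum` plays the role of `some_op`:

```c
/* Instead of a variable-length operand list, the compiler builds a
 * temporary list in a register and passes that single register. */

typedef struct { int n; int item[8]; } List;

static List list_regs[4];              /* toy file of list registers */
static int  op_result;

static void op_templist(int r)    { list_regs[r].n = 0; }
static void op_push(int r, int v) { list_regs[r].item[list_regs[r].n++] = v; }

static void op_sum(int r)              /* stand-in for some_op */
{
    op_result = 0;
    for (int i = 0; i < list_regs[r].n; i++)
        op_result += list_regs[r].item[i];
}

static int list_demo(void)
{
    op_templist(1);                    /* templist r1  */
    op_push(1, 10);                    /* push r1, ... */
    op_push(1, 5);
    op_push(1, 27);
    op_sum(1);                         /* some_op r1   */
    return op_result;
}
```

Every op keeps a fixed operand count this way; the cost moves into the extra `templist`/`push` dispatches, which is exactly the performance trade-off Dan flags.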
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tuesday 29 May 2001 05:15 pm, Hong Zhang wrote: > > either each op code knows how many args it has, > > I like to do so, otherwise we will lose most of the performance gain. I would think, though, that some ops would benefit more from varargs. String assembly comes to mind. > > > or we have an end marker (e.g 0xff which is never used as a register > > index). > > If we have to use variable arguments, I strongly recommend to add one > "argc" byte immediately following the opcode. Linear scan bytecode will be > very slow. Agreed. > The 16-bit op has both endian issue and alignment issue. Most of RISC > machine can not access byte-aligned opcode, so we have to add a lot > of padding. Anyway, it will be fatter and slower than 8-bit opcode. > I prefer to using escape opcode. This is a discussion we've had before. It seems if we solve the Unicode issue, we've solved the bytecode issue, too. Better yet, make all the opcodes map to UTF-8 characters, so you can write and execute bytecode as simple strings. ;-) -- Bryan C. Warnock [EMAIL PROTECTED]
RE: Stacks, registers, and bytecode. (Oh, my!)
> here is an idea. if we use a pure stack design but you can access the
> stack values with an index, then the index number can get large. so a
> fixed register set would allow us to limit the index to 8 bits. so the
> byte code could look something like this:
>
>     16 bit op (plenty of room for growth)
>     8 bit register index of arg 1
>     8 bit register index of arg 2
>     ...
>
>     next op code ...
>
> literal data support is needed (read only)
>
> either each op code knows how many args it has,

I like to do so, otherwise we will lose most of the performance gain.

> or we have an end marker (e.g 0xff which is never used as a register
> index).

If we have to use variable arguments, I strongly recommend adding one
"argc" byte immediately following the opcode. Linear scanning the
bytecode will be very slow.

> the op code is stored in network endian order and the interpreter will
> always build a 16 bit int from the 2 bytes.

The 16-bit op has both endian and alignment issues. Most RISC machines
cannot access a byte-aligned opcode, so we would have to add a lot of
padding. Anyway, it will be fatter and slower than an 8-bit opcode.
I prefer using an escape opcode.

> we have a simple set of load literal, push/pop (multiple) registers op
> codes.

There should be no push/pop opcodes. They are simply register moves.

> each thread has its own register set.
>
> all registers point to PMC's
>
> passing lists to/from subs is via an array ref. the data list is on the
> stack and the array ref is in @_ or passed by return().
>
> special registers ($_, @_, events, etc.) are indexed with a starting
> offset of 64, so general registers are 0-63.
>
> this can be mmapped in, executed with NO changes, fairly easily
> generated by the compiler front end, optimizable on or offline,
> (dis)assembler can be simply written, etc.
>
> simple to translate to TIL code by converting each op code to a call to
> the op function itself and passing in the register indexes or even
> the PMC pointers themselves.
Agreed. Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> If someone can make a compelling argument for details on how the DS> registers should be done that should be visible to the bytecode DS> (Like, say, a variant on SPARC's windowing system, or forcing them DS> on the stack such that pushes and pops renumber the registers) DS> that'd be keen. I can see the merits, but I'm not compelled one DS> way or the other quite yet. here is an idea. if we use a pure stack design but you can access the stack values with an index, then the index number can get large. so a fixed register set would allow us to limit the index to 8 bits. so the byte code could look something like this: 16 bit op (plenty of room for growth) 8 bit register index of arg 1 8 bit register index of arg 2 ... next op code ... literal data support is needed (read only) either each op code knows how many args it has, or we have an end marker (e.g 0xff which is never used as a register index). the op code is stored in network endian order and the interpreter will always build a 16 bit int from the 2 bytes. we have a simple set of load literal, push/pop (multiple) registers op codes. each thread has its own register set. all registers point to PMC's passing lists to/from subs is via an array ref. the data list is on the stack and the array ref is in @_ or passed by return(). special registers ($_, @_, events, etc.) are indexed with a starting offset of 64, so general registers are 0-63. this can be mmapped in, executed with NO changes, fairly easily generated by the compiler front end, optimizable on or offline, (dis)assembler can be simply written, etc. simple to translate to TIL code by converting each op code to a call to the the op function itself and passing in the register indexes or even the PMC pointers themselves. 
uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
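The fetch side of Uri's layout is simple to sketch: a 16-bit op read byte-by-byte in a fixed ("network") order, followed by 8-bit register indexes. The function names here are hypothetical.

```c
#include <stdint.h>

/* Build a native 16-bit int from two bytes stored big-endian, so the
 * same bytecode stream reads identically on any host. */
static uint16_t fetch_op16(const uint8_t **pc)
{
    const uint8_t *p = *pc;
    *pc += 2;
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint8_t fetch_reg(const uint8_t **pc)
{
    return *(*pc)++;
}

static int fetch_demo(void)
{
    /* op 300 (0x012c), then arg registers 7 and 9 */
    static const uint8_t code[] = { 0x01, 0x2c, 7, 9 };
    const uint8_t *pc = code;
    uint16_t op = fetch_op16(&pc);
    uint8_t  r1 = fetch_reg(&pc);
    uint8_t  r2 = fetch_reg(&pc);
    return op == 300 && r1 == 7 && r2 == 9;
}
```

Reading the two bytes explicitly sidesteps both the alignment and the endianness objections raised later in the thread, at the price of a shift and an OR per op. Dan's counter-proposal (native order plus a marker and a swap utility) trades that per-op cost for a one-time conversion.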
Stacks, registers, and bytecode. (Oh, my!)
I've been out of touch this weekend (and currently sorta swamped now) so
I've not been commenting, but here's a quick sum-up of what I've been
thinking about:

1) The paired register thing's silly. Forget I mentioned it.

2) The interpreter will have some int, float, and string registers. Some
stuff will be faster because of it, and it'll make the generated TIL or
C code (when we do a TILperl or perl2c version) faster since we won't
need to call opcode functions to add 3 and 4 together...

3) Whether the registers are really stack-based or not's an
implementation detail. They'll be based off of some per-interpreter
thing, of course, so they'll be thread-local.

4) We will have some sort of register push/pop system independent of the
register implementation. (Probably, like the 68K family, with the
ability to move multiple registers in one go)

5) The bytecode should be really, really close to the final executable
form. I'd really like to be able to read in the bytecode in one big
chunk and start executing it without change. (We'll end up with some
sections that'll need to be changed--that's inevitable. If we can mmap
in the non-fixup section pieces, though, that'd be great)

6) We may formally split the registers used to pass parameters from the
working registers. I'm not sure if that'll ultimately be a win or not.
(I can foresee lots of pointless register->register moving, and I'm not
keen on pointless anything)

If someone can make a compelling argument for details on how the
registers should be done that should be visible to the bytecode (Like,
say, a variant on SPARC's windowing system, or forcing them on the stack
such that pushes and pops renumber the registers) that'd be keen. I can
see the merits, but I'm not compelled one way or the other quite yet.

					Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: Stacks & registers
Uri Guttman <[EMAIL PROTECTED]> writes: > NI> No - you keep the window base "handy" and don't keep re-fetching it, > NI> same way you keep "program counter" and "stack pointer" "handy". > > NI> Getting > NI>window[N] > NI> is same cost as > NI>next = *PC++; > > NI> My point is that to avoid keeping too-many things "handy" window > NI> base and stack pointer should be the same (real machine) register. > >if we can control that. Maybe not directly, but most compilers will keep common base registers in machine registers if you code things right. >but i see issues too. i mentioned the idea of >having $_ and other special vars and stuff would have their own PMC's in >this register set. Why does it have to be _this_ register set - globals can go in another register set - SPARC's register scheme has global registers too. That said my guess is that $_ is usually save/restored across sub/block boundaries. >dan like the idea. that doesn't map well to a window >as those vars may not change when you call subs. i just don't see >register windows as useful at the VM level. Call it what you will - I am arguing for an addressable stack not for windows as such. > > >> i am just saying register windows don't seem to be any win for us > >> and cost an extra indirection for each data access. my view is let > >> the compiler keep track of the register usage and just do > >> individual push/pops as needed when registers run out. > > NI> That makes sense if (and only if) virtual machine registers are real > NI> machine registers. If virtual machine registers are in memory then > NI> accessing them "on the stack" is just as efficient (perhaps more so) > NI> than at some other "special" location. And it avoids need for > NI> memory-to-memory moves to push/pop them when we do "spill". > >no, the idea is the VM compiler keeps track of IL register use for the >purpose of code generating N-tuple op codes and their register >arguments. 
this is a pure IL design thing and has nothing to do with >machine registers. at this level, register windows don't win IMO.

That quote is a little misleading. My point is that UNLESS machine (real) machine registers are involved then all IL "registers" are in memory. Given that they are in memory they should be grouped with and addressed-via-same-base-as other "memory" that a sub is accessing. (The sub will be accessing the stack (or its PAD if you like), and the op-stream for sure, and possibly a few hot globals.)

The IL is going to be CISC-ish - so treat it like an x86 where you operate on things where-they-are (e.g. "on the stack"):

    add 4,BP[4]

rather than RISC where you:

    ld  BP[4],X
    add 4,X
    st  X,BP[4]

If "registers" are really memory, the extra "moves" of a RISC scheme are expensive. What we _really_ don't want is the worst of both worlds:

    push BP[4]
    push 4
    add
    pop  BP[4]

>i am thinking about writing a short pseudo code post about the N-tuple
>op codes and the register set design. the ideas are percolating in my
>brane.
>
>uri

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: NI> Well: NI> (a) I thought the plan was to design threads in from the begining this time. NI> (b) I maintain that cost is about the same as global variables anyway. i agree with (a) but is that always compiled in/enabled even in a single thread perl? is the code always thread enabled with the default being just one thread running? >> and the window base is not accounted for. you would need 2 >> indirections, the first to get the window base and the second to get the >> register in that window. NI> No - you keep the window base "handy" and don't keep re-fetching it, NI> same way you keep "program counter" and "stack pointer" "handy". NI> Getting NI>window[N] NI> is same cost as NI>next = *PC++; NI> My point is that to avoid keeping too-many things "handy" window NI> base and stack pointer should be the same (real machine) register. if we can control that. but i see issues too. i mentioned the idea of having $_ and other special vars and stuff would have their own PMC's in this register set. dan like the idea. that doesn't map well to a window as those vars may not change when you call subs. i just don't see register windows as useful at the VM level. >> i am just saying register windows don't seem to be any win for us >> and cost an extra indirection for each data access. my view is let >> the compiler keep track of the register usage and just do >> individual push/pops as needed when registers run out. NI> That makes sense if (and only if) virtual machine registers are real NI> machine registers. If virtual machine registers are in memory then NI> accessing them "on the stack" is just as efficient (perhaps more so) NI> than at some other "special" location. And it avoids need for NI> memory-to-memory moves to push/pop them when we do "spill". no, the idea is the VM compiler keeps track of IL register use for the purpose of code generating N-tuple op codes and their register arguments. 
this is a pure IL design thing and has nothing to do with machine registers. at this level, register windows don't win IMO.

i am thinking about writing a short pseudo code post about the N-tuple op codes and the register set design. the ideas are percolating in my brane.

uri
-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
Uri Guttman <[EMAIL PROTECTED]> writes:
> NI> i.e.
> NI> R4 = frame[N]
> NI> is same cost as
> NI> R4 = per_thread[N]
> NI> and about the same as
> NI> extern REGISTER GlobalRegs4
> NI> R4 = GlobalRegs4;
>
>well, if there is no multithreading then you don't need the per_thread
>lookup.

Well:
(a) I thought the plan was to design threads in from the beginning this time.
(b) I maintain that cost is about the same as global variables anyway.

The case for (b) is as follows: on RISC hardware

    R4 = SomeGlobal;

becomes two instructions:

    loadhigh SomeGlobal.high,rp
    ld       rp(SomeGlobal.low),R4

The C compiler will try and factor out the loadhigh instruction, leaving you with an indexed load. In most cases

    ld rp(RegBase.low+4),R4

is just as valid and takes the same number of cycles, and there is normally a form like

    ld rp(rn),R4

which allows "index" by a variable amount. On CISC machines, either there is an invisible RISC (e.g. Pentium) which behaves as above, or you get something akin to the PDP-11 where indirection reads a literal address via the "program counter":

    move [pc+n],r4

In such cases

    move [regbase+n],r4

is going to be just as fast - the issue is the need for a (real machine) register to hold 'regbase'.

>and the window base is not accounted for. you would need 2
>indirections, the first to get the window base and the second to get the
>register in that window.

No - you keep the window base "handy" and don't keep re-fetching it, same way you keep "program counter" and "stack pointer" "handy". Getting

    window[N]

is same cost as

    next = *PC++;

My point is that to avoid keeping too many things "handy", window base and stack pointer should be the same (real machine) register.

>i am just saying register windows don't seem to
>be any win for us and cost an extra indirection for each data access. my
>view is let the compiler keep track of the register usage and just do
>individual push/pops as needed when registers run out.
That makes sense if (and only if) virtual machine registers are real machine registers. If virtual machine registers are in memory then accessing them "on the stack" is just as efficient (perhaps more so) than at some other "special" location. And it avoids need for memory-to-memory moves to push/pop them when we do "spill". -- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: NI> Uri Guttman <[EMAIL PROTECTED]> writes: >> but indexing directly into a stack frame is effectively a register >> window. the problem is that you need to do an indirection through the >> window base for every access and that is slow in software (but free in >> hardware). NI> It isn't free in hardware either, but cost may be lower. Modern NI> machines should be able to schedule indirection fairly NI> efficiently. But I would contend we are going to have at least NI> one index operation anyway - if only from the "thread" pointer, or NI> "global base" - so with careful design so that "registers" are at NI> right "offset" from the "base" we can subsume the register lookup NI> index into that. NI> i.e. NI> R4 = frame[N] NI> is same cost as NI> R4 = per_thread[N] NI> and about the same as NI> extern REGISTER GlobalRegs4 NI> R4 = GlobalRegs4; well, if there is no multithreading then you don't need the per_thread lookup. and the window base is not accounted for. you would need 2 indirections, the first to get the window base and the second to get the register in that window. i am just saying register windows don't seem to be any win for us and cost an extra indirection for each data access. my view is let the compiler keep track of the register usage and just do individual push/pops as needed when registers run out. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
Alan Burlison <[EMAIL PROTECTED]> writes:
>> 1. When you call deep enough to fall off the end of the large register
>>file an expensive "system call" is needed to save some registers
>>at the other end to memory and "wrap", and then again when you
>>come "back" to the now-in-memory registers.
>
>Not a system call but a trap - they aren't the same thing (pedant mode off
>;-). The register spill trap handler copies the relevant registers onto the
>stack - each stack frame has space allocated for this.

Pedant mode accepted - and I concur. But a trap handler is still significant overhead compared to just doing the moves (scheduled) inline as part of normal code. So register windows win if you stay in bounds but lose quite seriously if you have "active" deep calls.

(My own style is to write small functions rather than #define or inline, for cache reasons - this has tended to make the above show up. I am delighted to say that _modern_ (Sun) SPARCs have deep enough windows even for me - but the SPARCstation 1+ and some of the low-cost CPUs didn't.)

>
>Alan Burlison
-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
Uri Guttman <[EMAIL PROTECTED]> writes: >> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: > > NI> "We" need to decide where a perl6 sub's local variables are going > NI> to live (in the recursive case) - if we need a "stack" anyway it > NI> may make sense for VM to have ways of indexing the local "frame" > NI> rather than having "global" registers (set per thread by the way?) > >i made that thread point too in my long reply to dan. > >but indexing directly into a stack frame is effectively a register >window. the problem is that you need to do an indirection through the >window base for every access and that is slow in software (but free in >hardware). It isn't free in hardware either, but cost may be lower. Modern machines should be able to schedule indirection fairly efficiently. But I would contend we are going to have at least one index operation anyway - if only from the "thread" pointer, or "global base" - so with careful design so that "registers" are at right "offset" from the "base" we can subsume the register lookup index into that. i.e. R4 = frame[N] is same cost as R4 = per_thread[N] and about the same as extern REGISTER GlobalRegs4 R4 = GlobalRegs4; > -- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
> "AB" == Alan Burlison <[EMAIL PROTECTED]> writes: AB> Uri Guttman wrote: >> but indexing directly into a stack frame is effectively a register >> window. the problem is that you need to do an indirection through the >> window base for every access and that is slow in software (but free in >> hardware). AB> Not quite - a register window system has real registers, it's just AB> at some point they may end up in memory. That's not the same as AB> indexing into memory on every access. what i meant is that the indirection to access a given windowed register using the window base is free (speed wise) in hardware, but doing the same thing in software costs. but comparing the hardware design with a pure software design is not too useful. the costs and goals are so different. some conceptual analogies are useful but saying register windows are fast in hardware (BTW that is risc stuff) but not a speed win in software (where parrot is leaning towards looking cisc-like) is not relevant. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
Uri Guttman wrote: > but indexing directly into a stack frame is effectively a register > window. the problem is that you need to do an indirection through the > window base for every access and that is slow in software (but free in > hardware). Not quite - a register window system has real registers, it's just at some point they may end up in memory. That's not the same as indexing into memory on every access. Alan Burlison
Re: Stacks & registers
Nick Ing-Simmons wrote:
> >That comment reminds me of how the register file is implemented in
> >a sun sparc. They have a large register file, but only some are accessable
> >at any given time, say 16.
>
> 32 IIRC but principle is correct.

8 global registers, 8 out registers, 8 local registers and 8 in registers. Some are set aside for special purposes, e.g. %r14 is the stack pointer, %r15 is the called subroutine return addr etc. Effectively you have 6 'in' and 6 'out' registers. Extra arguments above 6 are passed in the caller's stack frame.

> 1. When you call deep enough to fall off the end of the large register
>file an expensive "system call" is needed to save some registers
>at the other end to memory and "wrap", and then again when you
>come "back" to the now-in-memory registers.

Not a system call but a trap - they aren't the same thing (pedant mode off ;-). The register spill trap handler copies the relevant registers onto the stack - each stack frame has space allocated for this.

Alan Burlison
Re: Stacks & registers
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: NI> "We" need to decide where a perl6 sub's local variables are going NI> to live (in the recursive case) - if we need a "stack" anyway it NI> may make sense for VM to have ways of indexing the local "frame" NI> rather than having "global" registers (set per thread by the way?) i made that thread point too in my long reply to dan. but indexing directly into a stack frame is effectively a register window. the problem is that you need to do an indirection through the window base for every access and that is slow in software (but free in hardware). NI> What I do agree with is that NI>push a NI>push b NI>add NI>pop r NI> is lousy way to code r = a + b - too much pointless copying. NI> We want NI>add #a,#b,#r NI> were #a is a small number indexing into "somewhere" where a is stored. and that is an N-tuple opcode with a fixed size register set. register windowing is not needed here, as pushing/popping as needed works well. the win of a stack system is a cleaner and simpler op code api with fewer op codes. but that doesn't gain us much as we have so many complex ops that compiling them to simpler ones and all the extra runtime is a loss. BTW, an N-tuple op code is easily converted to/from an asm form which is a feature that has been mentioned as desired. you can look at higher level CISC instruction sets (e.g. vax) for examples. hell, the vax had a polynomial op code and high level string ops! uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> [I'm taking these all at once, so bear with me] DS> Simon wrote: >> Register based. Untyped registers; I'm hoping that the vtable stuff can be >> sufficiently optimized that there'll be no major win in storing multiple >> copies of a PMC's data in different types knocking around. DS> Maybe, but I'm thinking that adding two cached integers together (the DS> integer piece of the register being essentially a cache) will be faster DS> than two calls to get_integer. that makes sense for common stuff like integer math and maybe even basic string stuff like join, interpolation and (perl5) .. we could have a string accumulator (shades of pdp8!) for building up strings. the problem occurs when you have nested interpolation: "foo$bar{qq{$key1$key2}}" the compiler would have to convert that to inside out execution. idea: there still is a special string accumulator but it is a pmc and that is passed around instead of the full register. then a new pmc is used to replace it. this should lower the copying of strings as they get passed around. DS> Also, by having integer bits of registers, it means we can avoid DS> making a full PMC for integer constants, or for integers (or DS> floats, or strings) the interpreter wants to deal with DS> internally. (scratch ints/floats/strings/whatever) yep. >> I'm unclear what purpose having virtual resisters for ints and nums serves: >> can't we just put those in real registers? Or perhaps I'm missing the pt? DS> The point of having int/num/string registers is for cache and DS> convenience reasons. If Parrot's dealing with an integer and it DS> *knows* that it's an integer (because, for example, we're dealing DS> with some sort of internal counter, or Python code) there seems DS> little reason to promote it to a full PMC. hmm, sort of like your special integer scratch registers too. 
but since we are coding in C we can't force anything to real hardware registers, we have to let the (g)cc handle that for us. DS> Hong wrote: >> every bytecode inside one method always operates on the same stack >> depth, therefore we can just treat the "locals + stack" as a flat >> register file. A single pass can translate stack based code into >> register based code. DS> Fair enough, but then registers are just oddly named stack DS> entities, which makes their being on the stack a matter of DS> implementation. (Whether that means anything useful is another DS> matter) as most compiler books will note, all/most IL's can be easily translated to one another. our goal is to pick one that works best for our needs and then translate to others as desired (like jvm or .net). DS> Uri wrote: >> first question: is this for the intermediate language or the back end >> VM? they don't have to be the same. DS> Bytecode. I want the bytecode to be what the Parrot VM eats--it DS> will be Parrot's assembly language. but that still doesn't answer my question. do we directly generate bytecode or an IL and then generate bytecode from that? bytecode is not an IL but a storable form of one. you can create bytecode for a stack or register or N-tuple machine. i am saying that we can choose our IL independently of the bytecode design. bytecode is harder to optimize and manipulate than most IL's. so i say we compile to an IL which can be interpreted directly and which can also be stored/retrieved in a bytecode form. but the bytecode doesn't have to be directly executable. it may need to be expanded back into the IL (which could be very fast with a simple mapping). >> since our goal is to support the polymorphic front/back end design, >> the intermediate language (IL) should be easy to generate and easy >> to compile to various targets. also it needs to have other features >> like being modifiable for optimization, storable on disk, >> debuggable (for dev types), etc. 
DS> This is the one spot it falls down. Though arguably translating DS> bytecodes dealing with PMCs will be more work than bytecodes that DS> deal with ints or strings. that was my previous point. we have to decide the tradeoff between speed, ease of manipulation of the IL and translations (if any) to/from bytecode. DS> If someone thinks the whole "multiple views per register" thing is a bad DS> idea, this would be a good line of attack to start with. :) i think it is ok but there doesn't need to be a V flag for each type. any given N-tuple knows knows the return value of any N-tuple it references and so fetches it directly. just like in a real compiler, you know that memory or register is an int and don't check it each time. but it will change over time like our registers will. a register type is tracked at compile time only and not looked at during runtime. >> the PL/I compiler suite i hacked on used that style of IL. but is >> allocated as many named temps as it needed by using N
Re: Stacks & registers
[I'm taking these all at once, so bear with me] Simon wrote: >Register based. Untyped registers; I'm hoping that the vtable stuff can be >sufficiently optimized that there'll be no major win in storing multiple >copies of a PMC's data in different types knocking around. Maybe, but I'm thinking that adding two cached integers together (the integer piece of the register being essentially a cache) will be faster than two calls to get_integer. Also, by having integer bits of registers, it means we can avoid making a full PMC for integer constants, or for integers (or floats, or strings) the interpreter wants to deal with internally. (scratch ints/floats/strings/whatever) >For those yet to be convinced by the benefits of registers over stacks, try >grokking in fullness what op scratchpads are about. Ooh look, registers. Or just dig through the code for pp_split. (While it might not convince you of the joys of registers, it will make your head hurt sufficiently to convince you that Things Must Be Done. Preferably things involving lots of analgesics... :) Dave Mitchell wrote: >For low-level assembly, registers are important because using them >avoids memory accesses. For high-level perl 6 bytecode, I suppose the >analogy would be having a pool of scratch PMCs that dont have >to be continually alloced+GCed. Whether we can avoid the allocs depends >on the fine detail of how how PMCs are allocated in general. If the allocs >can't be avoided, then I think we might as well stick with a stack >architecture. We'll have a stack, certainly, or something like it. If the register file is limited in size (and it'll have to be, reality being what it is) there'll have to be a scratch spot to stick things we aren't using. We could call 'em temporaries, or named temporaries, or whatever, but that's just a roundabout way of saying register. 
:) (Or vice versa, I suppose, depending on which direction you're coming from) >I'm unclear what purpose having virtual resisters for ints and nums serves: >can't we just put those in real registers? Or perhaps I'm missing the pt? The point of having int/num/string registers is for cache and convenience reasons. If Parrot's dealing with an integer and it *knows* that it's an integer (because, for example, we're dealing with some sort of internal counter, or Python code) there seems little reason to promote it to a full PMC. Hong wrote: >I think "stack based =~ register based". If we don't have Java-like "jsr" >and "ret", We might, though. Probably will, as it will make translation to JVM and native machine code easier. (I think. Could be wrong) > every bytecode inside one method always operates on the same >stack >depth, therefore we can just treat the "locals + stack" as a flat register >file. A single pass can translate stack based code into register based code. Fair enough, but then registers are just oddly named stack entities, which makes their being on the stack a matter of implementation. (Whether that means anything useful is another matter) Uri wrote: >first question: is this for the intermediate language or the back end >VM? they don't have to be the same. Bytecode. I want the bytecode to be what the Parrot VM eats--it will be Parrot's assembly language. >since our goal is to support the polymorphic front/back end design, the >intermediate language (IL) should be easy to generate and easy to >compile to various targets. also it needs to have other features like >being modifiable for optimization, storable on disk, debuggable (for dev >types), etc. This is the one spot it falls down. Though arguably translating bytecodes dealing with PMCs will be more work than bytecodes that deal with ints or strings. If someone thinks the whole "multiple views per register" thing is a bad idea, this would be a good line of attack to start with. 
:) >the PL/I compiler suite i hacked on used that style of IL. but is >allocated as many named temps as it needed by using N-tuples for each op >code. each OP would leave its result in a named temp which could be >referred to by later OPs. i think they were generic (as PL/I could mung >many more types of data than perl) but the compiler knew what held what >so that wasn't an issue. That's still reasonably stack-based, and assumes a single return value per sub, which we don't have. (Unless we consider a list as a single value, which is reasonable) >the problem with a fixed number of registers is register overflow. this >is true in all register based machines and VM's and is a pain. if we go >with a pure named temp architecture, you don't need to worry about >that. but it is slower to access and manage named temps than a fixed set >of registers (or even just a stack which is like a single reg machine). From all the literature I've dug through, a largish register set (like 32 or 64) gets around 90+% of the overflow issues, and we can still have register push/pop opcodes. >i have some experience with N-tuple
Re: Stacks & registers
Graham Barr <[EMAIL PROTECTED]> writes:
>On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote:
>> I think "stack based =~ register based". If we don't have Java-like "jsr"
>
>That comment reminds me of how the register file is implemented in
>a sun sparc. They have a large register file, but only some are accessable
>at any given time, say 16.

32 IIRC but principle is correct.

>When you do a sub call you place your
>arguments in the high registers, say 4, and shift the window pointer
>by 12 (in this case). What was r12-r15 now becomes r0-r3. On return
>the result is placed into r0-r3 which are then available to the
>caller as r12-r15.
>
>This allows very efficient argument passing without having to save
>registers to a stack and restor them later.

That cost can be over-played - most compilers need "scratch" registers whose values don't need to be saved. If your RISC is set up to use arg registers as scratch then most of the save/restores can be eliminated, and others can be hidden in delay slots or super-scalar executed in parallel with other operations.

The problems with the SPARC scheme are:

1. When you call deep enough to fall off the end of the large register file an expensive "system call" is needed to save some registers at the other end to memory and "wrap", and then again when you come "back" to the now-in-memory registers.

2. The large file has a large decode which is in the "logical" critical path - so all kinds of bypass tricks have to be played to get speed back.

Neither is too bad for a virtual machine though. "We" need to decide where a perl6 sub's local variables are going to live (in the recursive case) - if we need a "stack" anyway it may make sense for the VM to have ways of indexing the local "frame" rather than having "global" registers (set per thread by the way?)

What I do agree with is that

    push a
    push b
    add
    pop r

is lousy way to code r = a + b - too much pointless copying.
We want

    add #a,#b,#r

where #a is a small number indexing into "somewhere" where a is stored.

>
>Graham.
-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
On Wed, May 23, 2001 at 01:36:14PM -0400, Bryan C. Warnock wrote: > You could certainly call PL_undef and the ilk (in its new > PMC form) a dedicated register Which is, incidentally, exactly what happens in disassembled Perl 5 bytecode. -- "Why waste negative entropy on comments, when you could use the same entropy to create bugs instead?" -- Steve Elias
Re: Stacks & registers
> "GB" == Graham Barr <[EMAIL PROTECTED]> writes: GB> On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote: >> I think "stack based =~ register based". If we don't have Java-like "jsr" GB> That comment reminds me of how the register file is implemented in GB> a sun sparc. They have a large register file, but only some are accessable GB> at any given time, say 16. When you do a sub call you place your GB> arguments in the high registers, say 4, and shift the window pointer GB> by 12 (in this case). What was r12-r15 now becomes r0-r3. On return GB> the result is placed into r0-r3 which are then available to the GB> caller as r12-r15. GB> This allows very efficient argument passing without having to save GB> registers to a stack and restor them later. there have been arguments over whether register windows win or not. on the plus side they make passing sub args very efficient (if you can fit them all into the window). on the minus side, you have to do major chunks of swapping registers to/from stack. the window size is a big issue too and some think sparc has too small a window and too few total registers. and besides register windows is a hardware only solution (they just mung the virtual base in a special register). i have not heard of it being used in pure software as the extra costs of having many registers or a deep stack in software is nothing while in hardware is has a real (and heavy) cost. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote:
> I think "stack based =~ register based". If we don't have Java-like "jsr"

That comment reminds me of how the register file is implemented in a sun sparc. They have a large register file, but only some are accessible at any given time, say 16. When you do a sub call you place your arguments in the high registers, say 4, and shift the window pointer by 12 (in this case). What was r12-r15 now becomes r0-r3. On return the result is placed into r0-r3 which are then available to the caller as r12-r15.

This allows very efficient argument passing without having to save registers to a stack and restore them later.

Graham.
Re: Stacks & registers
On Wed, May 23, 2001 at 06:06:26PM +0100, Simon Cozens wrote: > On Wed, May 23, 2001 at 12:59:01PM -0400, Dan Sugalski wrote: > > Should Parrot be a register or stack-based system, and if a register-based > > one, should we go with typed registers? > > Register based. Untyped registers; I'm hoping that the vtable stuff can be > sufficiently optimized that there'll be no major win in storing multiple > copies of a PMC's data in different types knocking around. I would agree with that too. Graham.
Re: Stacks & registers
At 12:59 PM 05-23-2001 -0400, Dan Sugalski wrote:
>Okay, folks, here's the current conundrum:
>
>Should Parrot be a register or stack-based system, and if a register-based
>one, should we go with typed registers?
>
>My current thoughts are this:
>
>We have a set of N registers. They're all linked. Nothing implicitly sets
>values in any of the registers (if you want an integer value, you need to
>make one). Each register has a set of validity markers for each type (int,
>float, string, PMC) that may or may not be bits. We have a stack of sorts
>that we can push the registers on to if we need.

In the section I snipped, you described "linked" registers in relation to multiple sets of typed registers, with linking meaning that IntR1 would have the same value as FloatR1, etc. What do you mean by "linked" here, with each register being (as I read it) dynamically typed? Is N fixed, or can we have a different number of visible registers at a time? When we push registers onto the stack, do we push them individually, or as a set?
I mean, can we get away with something like (assuming C++-style overloading on "Register"):

--
Register rFile[MAXREGSTACK][N];
int rDepth = 0;

ParrotOp add(int addend, int addor, int sum)
{
    rFile[rDepth][sum] = rFile[rDepth][addor] + rFile[rDepth][addend];
}

ParrotOp pushrframe(void)
{
    if (rDepth == MAXREGSTACK)
        die("Register Stack exceeded");
    rDepth++;
}
--

Or we could go the SPARC register window route: (Note: SPARC register windows overlap: they have three sets of 8 registers, and when a push happens, the old 3rd set becomes the new 1st set, allowing the caller and callee to share a set of 8 registers)

--
Register *rFile = NULL;
Register *rFrame = NULL;
int rFileSize = 0;

ParrotOp add(unsigned int addend, unsigned int addor, unsigned int sum)
{
    assert(sum < 3*N);
    assert(addend < 3*N);
    assert(addor < 3*N);
    rFrame[sum] = rFrame[addend] + rFrame[addor];
}

ParrotOp pushRframe(void)
{
    if ((rFrame - rFile) >= rFileSize - 5*N) {   /* running out of room */
        if (resize(rFile, rFileSize*2)) {
            rFileSize *= 2;
        } else {
            die("Register Stack Frame out of memory");
        }
    }
    rFrame += 2*N;
}
--

>I'm definitely feeling unsure about this, so feel free (please!) to wade
>in with comments, criticisms, or personal attacks... :)
>
> Dan
>
>--"it's like this"---
>Dan Sugalski even samurai
>[EMAIL PROTECTED] have teddy bears and even
> teddy bears get drunk
Re: Stacks & registers
On Wednesday 23 May 2001 12:59, Dan Sugalski wrote:
> Okay, folks, here's the current conundrum:
>
> Should Parrot be a register or stack-based system, and if a register-based
> one, should we go with typed registers?
>
> My personal preference here is to make Parrot a register based system. I
> can wrap my head around that a lot easier than a stack system, it's
> something I'm comfortable with, there's lots of literature on optimizing
> this sort of system, and I just generally like it better. (I cut my
> programming teeth on 6502s... so sue me :) Stack based systems have a
> certain appeal--they're simpler generally, which is fine. I'm not too
> worried about simpler as much as I am faster, and I think we can get
> faster out of registers. (Or, if you prefer, "named temporaries" instead
> of registers. Whatever)
>
> If (or when, I suppose, barring a Really Good Counter-Argument) we go the
> register route, then, should we have typed registers like most CPUs do? A
> set of PMC pointer registers, a set of integer registers, a set of
> floating-point registers, a set of string registers? And if we do go with
> typed registers, should they be linked together? (So that the int in
> Iregister 1 matches the integer value of the PMC in Pregister 1, and the
> float value in Fregister 1?) And if they're linked (And should they all be
> linked, or only some of them) should we guarantee consistency, or should
> the bytecode explicitly make various registers valid or invalid?

A little backwards my answer will be.

I'm fairly sure we won't need linked registers. The PMC's vtable implementation should already take care of the int/float/string cross-referencing - by having the linked registers, you'd really only be creating a parallel structure to wade through. Worst case, you're doing both - using the vtable to create the values for the other registers. (After checking the register first to see if it were valid?) That doesn't sound like a win at all.
You could bypass the PMC's vtable and populate the registers directly, but then why have the PMC? I don't see any reason for the linking.

Now, assuming I'm not way out in the left-field stands above, I don't see the necessity for typed registers (at least as described). (I mean, one of the major reasons for typed registers is hardware optimization, which we can't really borrow. We're emulating that win with the vtables.)

However, another common use for registers is to save on allocation - like a hardware zero register. You could certainly call PL_undef and the ilk (in its new PMC form) a dedicated register, which makes it a sort of type. But I think you still come out ahead if you simply assume that every register is a PMC.

The only use I could think of would be for constants that you know aren't going to need to be converted from one form to another. But then you need to handle the two paths in the rest of your vtable entries. Unless you make everything run through the registers. You're not going to make opcodes {shudder} dependent on implicit registers, are you? :-)

--
Bryan C. Warnock
[EMAIL PROTECTED]
RE: Stacks & registers
> Register based. Untyped registers; I'm hoping that the vtable stuff can be
> sufficiently optimized that there'll be no major win in storing multiple
> copies of a PMC's data in different types knocking around.
>
> For those yet to be convinced by the benefits of registers over stacks, try
> grokking in fullness what op scratchpads are about. Ooh look, registers.

I think "stack based =~ register based". If we don't have Java-like "jsr" and "ret", every bytecode position inside one method always sees the same stack depth, therefore we can just treat the "locals + stack" as a flat register file. A single pass can translate stack-based code into register-based code. For example:

push local #3;
    => move #(max_local + opcode_stack_depth), #3

push local #3; push local #4; add; pop local #5;
    => add #5, #3, #4

push local #3; push local #4; call foo; pop #6;
    => call_2 #6, #3, #4

As long as the stack-based system is carefully designed, we can easily add a linear-cost translation step to convert it into register-based bytecode, and run it faster.

Hong
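Hong's single-pass translation can be sketched concretely. The toy opcode names (`push_local`, `add`, `pop_local`) and the tuple encoding below are invented for illustration; the sketch tracks which register each stack slot aliases, and a one-instruction peephole retargets a result directly into a local, reproducing Hong's fused `add #5, #3, #4`:

```python
# Toy one-pass stack->register translator in Hong's style: without
# jsr/ret the stack depth at each bytecode is statically known, so
# stack slot d can map to register max_local + d, and pushes of locals
# become pure aliases rather than moves.

def translate(code, max_local):
    stack = []   # each entry: the register currently holding that slot
    out = []     # register-based output, one list per instruction
    for op in code:
        if op[0] == "push_local":        # slot aliases the local's register
            stack.append(op[1])
        elif op[0] == "add":             # pop two sources, push a temp result
            b = stack.pop()
            a = stack.pop()
            dest = max_local + len(stack)
            out.append(["add", dest, a, b])
            stack.append(dest)
        elif op[0] == "pop_local":
            src = stack.pop()
            # peephole: if the previous op just produced src, retarget it
            if out and out[-1][1] == src:
                out[-1][1] = op[1]
            else:
                out.append(["move", op[1], src])
    return out

prog = [("push_local", 3), ("push_local", 4), ("add",), ("pop_local", 5)]
assert translate(prog, max_local=10) == [["add", 5, 3, 4]]   # add #5, #3, #4
```

The whole pass is one linear scan over the bytecode, matching Hong's "linear-cost translation step" claim.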
Re: Stacks & registers
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

DS> Should Parrot be a register or stack-based system, and if a
DS> register-based one, should we go with typed registers?

first question: is this for the intermediate language or the back end VM? they don't have to be the same. since our goal is to support the polymorphic front/back end design, the intermediate language (IL) should be easy to generate and easy to compile to various targets. also it needs to have other features like being modifiable for optimization, storable on disk, debuggable (for dev types), etc. this IL could be the VM we use in perl itself and it can also be compiled to other VM's or c, etc. also it could be processed/compiled to a faster perl VM for execution.

DS> My personal preference here is to make Parrot a register based
DS> system. I can wrap my head around that a lot easier than a stack
DS> system, it's something I'm comfortable with, there's lots of
DS> literature on optimizing this sort of system, and I just generally
DS> like it better. (I cut my programming teeth on 6502s... so sue me
DS> :) Stack based systems have a certain appeal--they're simpler
DS> generally, which is fine. I'm not too worried about simpler as
DS> much as I am faster, and I think we can get faster out of
DS> registers. (Or, if you prefer, "named temporaries" instead of
DS> registers. Whatever)

the PL/I compiler suite i hacked on used that style of IL, but it allocated as many named temps as it needed by using N-tuples for each op code. each OP would leave its result in a named temp which could be referred to by later OPs. i think they were generic (as PL/I could mung many more types of data than perl) but the compiler knew what held what, so that wasn't an issue.

DS> If (or when, I suppose, barring a Really Good Counter-Argument) we
DS> go the register route, then, should we have typed registers like
DS> most CPUs do? A set of PMC pointer registers, a set of integer
DS> registers, a set of floating-point registers, a set of string
DS> registers? And if we do go with typed registers, should they be
DS> linked together? (So that the int in Iregister 1 matches the
DS> integer value of the PMC in Pregister 1, and the float value in
DS> Fregister 1?) And if they're linked (And should they all be
DS> linked, or only some of them) should we guarantee consistency, or
DS> should the bytecode explicitly make various registers valid or
DS> invalid?

the problem with a fixed number of registers is register overflow. this is true in all register based machines and VM's and is a pain. if we go with a pure named temp architecture, you don't need to worry about that. but it is slower to access and manage named temps than a fixed set of registers (or even just a stack, which is like a single-register machine).

DS> My current thoughts are this:

DS> We have a set of N registers. They're all linked. Nothing
DS> implicitly sets values in any of the registers (if you want an
DS> integer value, you need to make one). Each register has a set of
DS> validity markers for each type (int, float, string, PMC) that may
DS> or may not be bits. We have a stack of sorts that we can push the
DS> registers on to if we need.

that is a good start. the stack can handle the overflow, but that means more complexity in the compiler to deal with it.

let's do some study of various IL designs (including both compiled and interpreted) and see their wins and losses. i think pure stack based is out as it is too limiting. i have some experience with N-tuples and it does work well for compiling, as you don't care about managing registers at run time then. you just keep a fat namespace table at compile time and throw it out later. we have a compile and run phase, so managing a large number of N-tuples may be slow running but easy to generate and manipulate. that is a typical tradeoff.

so my current best idea is a fixed-size set of active N-tuples/registers (any number larger than most long statements would work, say 100?) as the registers. each N-tuple has its op code and argument register indexes and which register it stores its result in. we have a stack for the overflow handling, and that is the hardest part. we should have a small set of special registers to handle events, booleans, the PC, signals, etc.

this makes compiling and optimization very easy. we just need to manage the lifetime of registers for the N-tuples and deal with the (hopefully) rare overflow. runtime interpretation is also fairly simple (not as easy as a stack, but not too bad) with decent speed. code generation to another VM or storing to disk should be fairly straightforward.

DS> I'm definitely feeling unsure about this, so feel free (please!)
DS> to wade in with comments, criticisms, or personal attacks... :)

you are a doofus. but a smart one. :)

uri

--
Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
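Uri's N-tuples are essentially three-address code: each tuple carries an op code, its argument register indexes, and the register it stores its result in. A minimal interpreter sketch (Python for illustration; the op names and tuple layout are invented, not any proposed Parrot encoding):

```python
# Toy N-tuple IL in Uri's style: each tuple is (opcode, result, args...).
# Register numbers are just indexes the compiler handed out; at run time
# the interpreter only reads and writes slots, with no allocation.

def run(tuples, nregs):
    regs = [0] * nregs
    for op, dest, *args in tuples:
        if op == "const":                  # ("const", dest, literal)
            regs[dest] = args[0]
        elif op == "add":                  # ("add", dest, a, b)
            regs[dest] = regs[args[0]] + regs[args[1]]
        elif op == "mul":                  # ("mul", dest, a, b)
            regs[dest] = regs[args[0]] * regs[args[1]]
    return regs

# (3 + 4) * 2 compiled to N-tuples: t0=3, t1=4, t2=t0+t1, t3=2, t4=t2*t3
il = [("const", 0, 3), ("const", 1, 4), ("add", 2, 0, 1),
      ("const", 3, 2), ("mul", 4, 2, 3)]
assert run(il, 8)[4] == 14
```

The "fat namespace table" Uri mentions lives entirely in the compiler: it maps temps to the small integer indexes above and is thrown away before run time, which is why interpretation stays simple.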
Re: Stacks & registers
> Okay, folks, here's the current conundrum:
>
> Should Parrot be a register or stack-based system, and if a register-based
> one, should we go with typed registers?

For low-level assembly, registers are important because using them avoids memory accesses. For high-level perl 6 bytecode, I suppose the analogy would be having a pool of scratch PMCs that don't have to be continually alloced+GCed. Whether we can avoid the allocs depends on the fine detail of how PMCs are allocated in general. If the allocs can't be avoided, then I think we might as well stick with a stack architecture.

I'm unclear what purpose having virtual registers for ints and nums serves: can't we just put those in real registers? Or perhaps I'm missing the point?

Dave "even unclearer about this than Dan" M.
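Dave's "pool of scratch PMCs" can be illustrated with a simple free list, so an op grabs a temporary and releases it instead of allocating and GCing one per operation. Everything here (the `PMC` class, `grab`/`release` names) is invented for the sketch, not Parrot's actual API:

```python
# Illustrative free-list pool of scratch PMCs: temporaries are recycled,
# so steady-state op execution does no allocation at all.

class PMC:
    def __init__(self):
        self.value = None

class ScratchPool:
    def __init__(self, size):
        self.free = [PMC() for _ in range(size)]
        self.allocated = 0          # fallback allocations, for the demo

    def grab(self):
        if self.free:
            return self.free.pop()
        self.allocated += 1         # pool exhausted: fall back to alloc
        return PMC()

    def release(self, pmc):
        pmc.value = None            # scrub so stale data can't leak out
        self.free.append(pmc)

pool = ScratchPool(4)
for _ in range(1000):               # 1000 "ops", zero fallback allocations
    t = pool.grab()
    t.value = 42
    pool.release(t)
assert pool.allocated == 0
```

This only pays off if, as Dave says, the op stream's peak number of live temporaries is small and bounded; otherwise `grab` degenerates into the very allocation the pool was meant to avoid.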
Re: Stacks & registers
On Wed, May 23, 2001 at 12:59:01PM -0400, Dan Sugalski wrote:
> Should Parrot be a register or stack-based system, and if a register-based
> one, should we go with typed registers?

Register based. Untyped registers; I'm hoping that the vtable stuff can be sufficiently optimized that there'll be no major win in storing multiple copies of a PMC's data in different types knocking around.

For those yet to be convinced by the benefits of registers over stacks, try grokking in fullness what op scratchpads are about. Ooh look, registers.

--
> I'm a person, not a piece of property. Happily, I'm both!
- Lionel and Stephen Harris.
Stacks & registers
Okay, folks, here's the current conundrum:

Should Parrot be a register or stack-based system, and if a register-based one, should we go with typed registers?

My personal preference here is to make Parrot a register based system. I can wrap my head around that a lot easier than a stack system, it's something I'm comfortable with, there's lots of literature on optimizing this sort of system, and I just generally like it better. (I cut my programming teeth on 6502s... so sue me :) Stack based systems have a certain appeal--they're simpler generally, which is fine. I'm not too worried about simpler as much as I am faster, and I think we can get faster out of registers. (Or, if you prefer, "named temporaries" instead of registers. Whatever)

If (or when, I suppose, barring a Really Good Counter-Argument) we go the register route, then, should we have typed registers like most CPUs do? A set of PMC pointer registers, a set of integer registers, a set of floating-point registers, a set of string registers? And if we do go with typed registers, should they be linked together? (So that the int in Iregister 1 matches the integer value of the PMC in Pregister 1, and the float value in Fregister 1?) And if they're linked (And should they all be linked, or only some of them) should we guarantee consistency, or should the bytecode explicitly make various registers valid or invalid?

My current thoughts are this:

We have a set of N registers. They're all linked. Nothing implicitly sets values in any of the registers (if you want an integer value, you need to make one). Each register has a set of validity markers for each type (int, float, string, PMC) that may or may not be bits. We have a stack of sorts that we can push the registers on to if we need.

I'm definitely feeling unsure about this, so feel free (please!) to wade in with comments, criticisms, or personal attacks...
:)

Dan

--"it's like this"---
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even
 teddy bears get drunk
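Dan's linked registers with validity markers can be sketched as one register carrying a slot per type plus validity bits: writing one view invalidates the others, and reading an invalid view converts lazily and marks it valid. A minimal Python sketch with all names invented (the lazy conversion stands in for the vtable machinery discussed elsewhere in the thread):

```python
# Sketch of one "linked" register: one slot per type plus validity
# flags. Setting one view invalidates the rest; reading an invalid
# view converts on demand from a valid one and caches the result.

class LinkedReg:
    def __init__(self):
        self.slots = {"int": 0, "float": 0.0, "str": ""}
        self.valid = {"int": False, "float": False, "str": False}

    def set(self, kind, value):
        self.slots[kind] = value
        for k in self.valid:            # only the written view is valid now
            self.valid[k] = (k == kind)

    def get(self, kind):
        if not self.valid[kind]:        # lazy conversion from a valid view
            src = next(k for k, v in self.valid.items() if v)
            caster = {"int": int, "float": float, "str": str}[kind]
            self.slots[kind] = caster(self.slots[src])
            self.valid[kind] = True
        return self.slots[kind]

r = LinkedReg()
r.set("int", 42)
assert r.get("float") == 42.0   # converted on demand, then cached
assert r.get("str") == "42"
```

This makes the trade-off in the thread concrete: the explicit validity bits buy cached conversions, at the cost of exactly the parallel structure Bryan argues the PMC vtable already provides.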