Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski <[EMAIL PROTECTED]> writes:

> I'm not entirely sure of that one--processing a full regex requires the
> perl interpreter, it's not all that modular. Though whether being able to
> yank out the RE engine and treat it as a standalone library is important
> enough to warrant being treated as a design goal or not is a separate
> issue. (I think so, as it also means I can treat it as a black box for the
> moment so there's less to try and stuff in my head at once)

We are way past that point in perl5 - having tried to use perl's regexps
in a non-perl app, you need most of perl there: not the interpreter,
perhaps, but bits of the tokenizer and of course SV *. So to make it
modular in perl6 you have to re-write it, and I am with Larry - let's
make the main op-state machine handle those ops too. That will help with
the need to special-case regexps for signal despatch etc.

> *) It makes the amount of mental space the core interpreter takes up smaller

But surely we are considering an "expandable" interpreter already?
Adding regexp ops is just one such extension.

> *) It can make performance tradeoffs separately from the main perl engine
> *) We can probably snag the current perl 5 source without much change

I doubt that.

> *) The current RE engine's scared (or is that scarred?) me off enough that
> I'd as soon leave it to someone who's more temperamentally suited for such
> things.
> *) Treating regexes as non-atomic operations brings some serious threading
> issues into things.

Leaving them atomic does as well - I will switch threads as soon as the
regexp completes ...
--
Nick Ing-Simmons
Re: Stacks, registers, and bytecode. (Oh, my!)
At 07:40 AM 6/5/2001 -0700, Dave Storrs wrote:

> On Tue, 5 Jun 2001, Dave Mitchell wrote:
> > dispatch loop. I'd much rather have a 'regex start' opcode which
> > calls a separate dispatch loop function, and which then interprets any
> > further ops in the bytestream as regex ops. That way we double the number
> > of 8-bit ops, and can have all the regex-specific state variables (s, send
> > etc in the earlier example) and logic separated out.
>
> This is an interesting idea...could we use this more generally to
> multiply our number of opcodes? Basically, you have one set of opcodes
> for (e.g.) string parsing, one set for math, etc, all of which have the
> same values. Then you have a set of opcodes that tells the interpreter
> which opcode table to look in.

Nah, that's too much work. We just allow folks to define their own
opcode functions and assign each a lexically unique number, and dispatch
to the function as appropriate.

Adding and overriding opcodes is definitely in the cards, though in most
cases it'll probably be an opcode version of a function call, since
machine-level stuff would also require telling the compiler how to emit
those opcodes. (Which folks writing python/ruby/rebol/cobol/fortran
front ends for the interpreter might do)

Dan

--"it's like this"---
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, Jun 05, 2001 at 03:31:24PM -0500, David L. Nicol wrote:
> Graham Barr wrote:
> > I think there are a lot of benefits to the re engine not to be
> > separate from the core perl ops.
>
> So does it start with a split(//,$bound_thing) or does it use
> substr(...) with explicit offsets?

Eh? Nobody is suggesting we implement re's using the current set of perl
ops, but that we extend the set with ops needed for re's, so they use
the same dispatch loop and the ops can be intermixed.

Graham.
Re: Stacks, registers, and bytecode. (Oh, my!)
Graham Barr wrote:
> I think there are a lot of benefits to the re engine not to be
> separate from the core perl ops.

So does it start with a split(//,$bound_thing) or does it use
substr(...) with explicit offsets?
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 06:04:10PM -0700, Larry Wall wrote:
> Well, other languages have explored that option, and I think that makes
> for an unnatural interface. If you think of regexes as part of a
> larger language, you really want them to be as incestuous as possible,
> just as any other part of the language is incestuous with the rest of
> the language. That's part of what I mean when I say that I'm trying to
> look at regular expressions as just a strange variant of Perl code.
>
> Looking at it from a slightly different angle, regular expressions are
> in great part control syntax, and library interfaces are lousy at
> implementing control.

Right. Having the regex opcodes be perl opcodes will certainly make
implementing (?{ ... }) much easier, and probably faster too. Also the
re references that we have now will become similar to subroutines for
pattern matching.

I think there are a lot of benefits to the re engine not being separate
from the core perl ops.

Graham.
RE: Stacks, registers, and bytecode. (Oh, my!)
> On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
> > This is the bit that scares me about unifying perl ops and regex ops:
> > can we really unify them without taking a performance hit?
>
> Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.
>
> Second, Ruby uses a giant switch instead of function pointers for their
> op despatch loop; Matz says it doesn't make that much difference in
> terms of performance.

Function pointer dispatch is normally as fast as or faster than a
switch. The main downside is the context: a typical regular expression
engine can pre-fetch many variables into "register locals", which can
then be used efficiently by all the switch cases. However, the common
context for regular expressions is relatively small, so I am not sure
of the size of the performance hit.

Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, 5 Jun 2001, Dave Mitchell wrote:
> dispatch loop. I'd much rather have a 'regex start' opcode which
> calls a separate dispatch loop function, and which then interprets any
> further ops in the bytestream as regex ops. That way we double the number
> of 8-bit ops, and can have all the regex-specific state variables (s, send
> etc in the earlier example) and logic separated out.

This is an interesting idea...could we use this more generally to
multiply our number of opcodes? Basically, you have one set of opcodes
for (e.g.) string parsing, one set for math, etc, all of which have the
same values. Then you have a set of opcodes that tells the interpreter
which opcode table to look in.

The 'switching' opcodes then become overhead, but if there aren't too
many of those, perhaps it's acceptable. And it would mean that we could
specialize the opcodes a great deal more (if, of course, that is
desirable), and still have them fit in an octet.

(Sorry if this is a stupid question, but please be patient; I've never
done internals stuff before.)

Dave
Re: Stacks, registers, and bytecode. (Oh, my!)
Simon Cozens <[EMAIL PROTECTED]> opined:
> On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
> > This is the bit that scares me about unifying perl ops and regex ops:
> > can we really unify them without taking a performance hit?
>
> Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.
>
> Second, Ruby uses a giant switch instead of function pointers for their
> op despatch loop; Matz says it doesn't make that much difference in
> terms of performance.

I think it would be very messy to have both types of ops in the same
dispatch loop. I'd much rather have a 'regex start' opcode which calls
a separate dispatch loop function, and which then interprets any further
ops in the bytestream as regex ops. That way we double the number of
8-bit ops, and can have all the regex-specific state variables (s, send
etc in the earlier example) and logic separated out.

> I don't know if I've mentioned this before, but
> http://www-6.ibm.com/jp/developerworks/linux/001027/ruby_qa.html
> was my interview with Matsumoto about his ideas for Perl 6 and his
> experiences from Ruby. It's in Japanese, so
> http://www.excite.co.jp/world/url/ may help.

"A talk of jewelry Perl developer From Mr. Simon Cozens to Ruby developer
It is also as a pine It dies and is the question and reply to Mr.
[squiggle]" :-)
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote:
> This is the bit that scares me about unifying perl ops and regex ops:
> can we really unify them without taking a performance hit?

Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like.

Second, Ruby uses a giant switch instead of function pointers for their
op despatch loop; Matz says it doesn't make that much difference in
terms of performance.

I don't know if I've mentioned this before, but
http://www-6.ibm.com/jp/developerworks/linux/001027/ruby_qa.html
was my interview with Matsumoto about his ideas for Perl 6 and his
experiences from Ruby. It's in Japanese, so
http://www.excite.co.jp/world/url/ may help.
--
Familiarity breeds facility.
-- Megahal (trained on asr), 1998-11-06
Re: Stacks, registers, and bytecode. (Oh, my!)
Larry Wall <[EMAIL PROTECTED]> wrote:
> It may certainly be valuable to (not) think of it that way, but just
> don't be surprised if the regex folks come along and borrow a lot of
> your opcodes to make things that look like (in C):
>
> while (s < send && isdigit(*s)) s++;

This is the bit that scares me about unifying perl ops and regex ops: I
see perl ops as relatively heavyweight things that can absorb the costs
of 'heavyweight' dispatch (function call overhead, etc etc), while regex
stuff needs to be very lightweight, eg

    while (op = *optr++) {
        switch (op) {
        case FOO:
            while (s < send && isdigit(*s)) s++;
            break;
        case BAR:
            while (s < send && isspace(*s)) s++;
            break;
        ...
        }
    }

can we really unify them without taking a performance hit?
Re: Stacks, registers, and bytecode. (Oh, my!)
> Well, other languages have explored that option, and I think that makes
> for an unnatural interface. If you think of regexes as part of a
> larger language, you really want them to be as incestuous as possible,

These days we can be that incestuous without feeling too guilty, since
pcre exists.

> just as any other part of the language is incestuous with the rest of
> the language. That's part of what I mean when I say that I'm trying to
> look at regular expressions as just a strange variant of Perl code.
>
> Looking at it from a slightly different angle, regular expressions are
> in great part control syntax, and library interfaces are lousy at
> implementing control.

That assumes one wants access to the regex low-level ops; if one wants
just match/no-match and submatches, a regex(3)-like interface is quite
enough.

> Larry
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
On Monday 04 June 2001 08:38 pm, Jarkko Hietaniemi wrote:
> > : Though whether being able to
> > : yank out the RE engine and treat it as a standalone library is
> > : important enough to warrant being treated as a design goal or not is
> > : a separate issue. (I think so, as it also means I can treat it as a
> > : black box for the moment so there's less to try and stuff in my head
> > : at once)
> >
> > As a fellow bear of very little brain, I'm just trying to point out
> > that we already have a good example of the dangers of that approach.
>
> Still, having the regexen as "reusable component" (i.e. library)
> wouldn't be a bad idea. The current regex code most definitely isn't
> detachable or reusable, so we can't be said to have explored that
> option.

But to what extent? I seem to recall one of the main reasons against
complete separation was (speculated, perhaps) that it was too
inefficient for the regex engine to be ignorant of Perl internals.

Given that the regex engine, in all its complexity, is one of the more
enviable capabilities of Perl (traditionally), it would certainly be
nice (for others, I suppose) to be able to gain Perl regex handling via
a couple of calls and a shared library. But it was counter-argued that
much of the regex's speed (and some of its functionality) was directly
tied to Perl internal structures, and that the layer of abstraction
necessary to completely separate the two would benefit neither Perl nor
the stand-alone engine, so why bother?

IOW, if you're not going to completely exorcise Perl from its regexes,
why try to exorcise the regexes from Perl?
--
Bryan C. Warnock
[EMAIL PROTECTED]
Re: Stacks, registers, and bytecode. (Oh, my!)
Jarkko Hietaniemi writes:
: > : Though whether being able to
: > : yank out the RE engine and treat it as a standalone library is
: > : important enough to warrant being treated as a design goal or not is
: > : a separate issue. (I think so, as it also means I can treat it as a
: > : black box for the moment so there's less to try and stuff in my head
: > : at once)
: >
: > As a fellow bear of very little brain, I'm just trying to point out
: > that we already have a good example of the dangers of that approach.
:
: Still, having the regexen as "reusable component" (i.e. library)
: wouldn't be a bad idea. The current regex code most definitely isn't
: detachable or reusable, so we can't be said to have explored that
: option.

Well, other languages have explored that option, and I think that makes
for an unnatural interface. If you think of regexes as part of a larger
language, you really want them to be as incestuous as possible, just as
any other part of the language is incestuous with the rest of the
language. That's part of what I mean when I say that I'm trying to look
at regular expressions as just a strange variant of Perl code.

Looking at it from a slightly different angle, regular expressions are
in great part control syntax, and library interfaces are lousy at
implementing control.

Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes:
: Yeah, a lot of that's definitely a problem, as is the manipulation of
: the return stack and some assignment bits. (You can cut the time split
: takes in half by having the destination array presized, for example)

That's why the current version of Perl goes to a great deal of trouble
to hang onto its allocations from previous iterations/invocations, on
the theory that if you needed N elements last time, you'll probably need
about N elements again this time. In changing how locals are generated
for Perl 6, it would be easy to lose such optimizations accidentally,
and that might be unfortunate.

It may be that there's a middle ground, where by saving only the
information and not the actual allocation, we can estimate how much to
reallocate the next time through without actually hanging on to the
memory in question.

Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
> : Though whether being able to
> : yank out the RE engine and treat it as a standalone library is
> : important enough to warrant being treated as a design goal or not is a
> : separate issue. (I think so, as it also means I can treat it as a
> : black box for the moment so there's less to try and stuff in my head
> : at once)
>
> As a fellow bear of very little brain, I'm just trying to point out that
> we already have a good example of the dangers of that approach.

Still, having the regexen as "reusable component" (i.e. library)
wouldn't be a bad idea. The current regex code most definitely isn't
detachable or reusable, so we can't be said to have explored that
option.
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes:
: At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
: >Dan Sugalski writes:
: >: Are you speaking of the nodes in regnode.h? I hadn't considered them
: >: as regular perl opcodes--I figured they'd stay internal to the regex
: >: engine so we could keep it reasonably modular.
: >
: >I don't think that's a terribly strong argument--one could justify any
: >number of unfortunate architectural distinctions on the basis of
: >modularity.
:
: Yeah I know. I mean, look at how those darned bricks have held back
: architecture all these years! :-P

Hey, I come from the coast where we try to avoid bricks, especially when
they're falling.

: >Plus, I'd argue you can still retain modularity of your
: >code while unifying implementational philosophy.
:
: I'm not entirely sure of that one--processing a full regex requires the
: perl interpreter, it's not all that modular.

These days I'm trying to see the regex as just a funny-looking kind of
Perl code.

: Though whether being able to
: yank out the RE engine and treat it as a standalone library is
: important enough to warrant being treated as a design goal or not is a
: separate issue. (I think so, as it also means I can treat it as a black
: box for the moment so there's less to try and stuff in my head at once)

As a fellow bear of very little brain, I'm just trying to point out that
we already have a good example of the dangers of that approach.

: >It seems to me that the main reason for not considering such a
: >unification is that We've Never Done It That Way Before. It's as if
: >regular expressions have always been second-class programs, so they'll
: >always be second-class programs, world without end, amen, amen.
:
: No, not really. The big reasons I wasn't planning on unification are:
:
: *) It makes the amount of mental space the core interpreter takes up
: smaller

It may certainly be valuable to (not) think of it that way, but just
don't be surprised if the regex folks come along and borrow a lot of
your opcodes to make things that look like (in C):

    while (s < send && isdigit(*s)) s++;

: *) It can make performance tradeoffs separately from the main perl
: engine

The option of doing its own thing its own way is always open to an
opcode, but when you do that the option of making efficient use of the
core infrastructure goes away. As an honorary member of the regex
hacking team, I covet registers. :-)

: *) We can probably snag the current perl 5 source without much change

Cough, cough.

: *) The current RE engine's scared (or is that scarred?) me off enough
: that I'd as soon leave it to someone who's more temperamentally suited
: for such things.

As an honorary member of the temperamentally suited team, allow me to
repeat myself. Cough, cough. We can certainly borrow the ideas from
Perl 5's regex engine, but even us temperamentally suited veterans are
sufficiently scared/scarred to want something that works better.

: *) Treating regexes as non-atomic operations brings some serious
: threading issues into things.

Eh, treating regexes as atomic is the root of the re-entrancy problem.
If the regex has access to local storage, the re-entrancy and threading
problems pretty much solve themselves.

: >The fact that Perl 5's regex engine is a royal pain to deal with should
: >be a warning to us.
:
: I can think of a couple of reasons that the current engine's a royal
: pain, and they don't have much to do with it as a separate entity...

Sure, I'm just saying that at least two of those couple reasons are that
1) it invents its own opcode storage mechanism, and 2) it uses globals
for efficiency when it should be using some efficient variety of locals.

: >Much of the pain of dealing with the regex engine in Perl 5 has to do
: >with allocation of opcodes and temporary values in a non-standard
: >fashion, and dealing with the resultant non-reentrancy on an ad hoc
: >basis. We've already tried that experiment, and it sucks. I don't
: >want to see the regex engine get swept back under the complexity
: >carpet for Perl 6.
:
: Yeah, but those are mostly issues with the implementation, not with the
: separation.

That sounds suspiciously like what I'm trying to say.

: >That's a scenario I'd love to avoid. And if we can manage to store
: >regex opcodes and state using mechanisms similar to ordinary opcodes,
: >maybe we'll not fall back into the situation where the regex engine is
: >understood by only three people, plus or minus four.
:
: While I'm not sure I agree with it, if that's what you want, then
: that's what we'll do. Threading will complicate this some, since we'll
: need to guarantee atomicity across multiple opcodes, something I'd not
: planned on doing.

How will we guarantee that my $foo stays "my" under threading? Surely
the same mechanism could serve to keep "my" regex state variables sane
under the same circumstances. (I oversimplify, of course...)

Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
At 01:09 AM 6/1/2001 -0700, [EMAIL PROTECTED] wrote:
> In my experience, perl opcodes have not been the performance bottleneck
> in perl5.
>
> It seems it isn't actually the loop that's the bottleneck in perl5. I
> profiled a whole bunch of different perl programs, using a lot of
> different versions of perl5; and the runops loop was very rarely among
> the top CPU users. Many times, the opcode routines themselves weren't
> the hot spots. It was the support routines like hash key calculation
> and lookup, string comparisons, or the regular expression code.

Yeah, a lot of that's definitely a problem, as is the manipulation of
the return stack and some assignment bits. (You can cut the time split
takes in half by having the destination array presized, for example)

> Profiling is almost always counter-intuitive, but if I had to
> guess, I'd say that most of the per-opcode cost in perl5 was due to
> setup/initialization as each opcode was entered, and that devious/clever
> data structure design could avoid most of this. Also, opcode dispatch
> might not be the right tree up which to be barking in seeking
> performance.

The setup/teardown costs are definitely a heavy expense. We can cut it
down some, hopefully a lot in some cases, but there are limits to what
can be done. If they're treated as a fixed cost (which they more or less
are), then the less often we pay that cost the better.

Unfortunately there's also the problem of having *too* many opcode
functions, at which point you pay a lot of cycles in cache-miss time
fetching in Specially Optimized Function #4327. It seems, though, that
if we have the option of doing something in one opcode instead of 10 or
20, it makes sense to do so, as we reduce the constant costs.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
At 11:46 AM 6/1/2001 -0700, [EMAIL PROTECTED] wrote:
> On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote:
> >
> > I really think we'll win if we have support for at least integers as
> > well as PMCs. There's potentially a lot of integer work that'll be
> > generated by the optimizer and, while integer opcodes might not make
> > the interpreter much faster, they'll speed up TIL and generated C code
> > significantly since they won't need to call opcodes.
>
> How much integer arithmetic does the perl interpreter actually do?

Some, though not as much as it ought, as most math's done as floating
point.

> Figuring out where the hot spots are in an interpreter for a general-
> purpose programming language is hard. I'd recommend against special
> cases in the registers, since it's not clear how much they'd help.

There are a few reasons to consider integer and floating point registers:

*) A good chunk of the optimizations I've been coming across seem to
   benefit from integer scratch space. (Granted, generally for languages
   with a different thrust than perl)
*) Code that does use it will translate more efficiently to .NET/Java
   bytecode/native machine code
*) Larry's expressed a desire to let us get lower-level with perl,
   which'll benefit from having integer scratch space
*) If the interpreter will support multiple source languages, they'll
   get a benefit from it. (Like Python and Ruby, to name two)

It might be that there's no advantage to having them, in which case
we'll excise them and the opcodes that reference them, and that's fine.
There seems to be some advantage to it, which is why I brought them up
in the first place.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> No, not really. The big reasons I wasn't planning on unification are: DS> *) It makes the amount of mental space the core interpreter takes DS> up smaller not if the regex ops were developed separately. all it will do is expand the op table, not make the core more complex. also the regex compiler can be written separately as well. DS> *) It can make performance tradeoffs separately from the main perl engine that can be done with a regex optimizer, or creating more specialized and efficient regex ops. DS> *) We can probably snag the current perl 5 source without much change is that worth keeping? maybe for bootstrapping the new setup but not permanently. stuff like having direct access to regex engine for debugging would be better designed in from scratch. and the moving of the engine to the main op code loop solves that problem. DS> *) Treating regexes as non-atomic operations brings some serious DS> threading issues into things. well, the main op loop is threaded (isn't one of the bugs with current threads the fact that the regex engine is not thread safe?) so you get that automatically. by just marking the regex data as shared should trigger all the right thread locks and stuff. what happens to $1 when the data is shared and threaded? are thread local copies of the original data made and $1 maps inside those? >> That's a scenario I'd love to avoid. And if we can manage to store >> regex opcodes and state using mechanisms similar to ordinary opcodes, >> maybe we'll not fall back into the situation where the regex engine is >> understood by only three people, plus or minus four. DS> While I'm not sure I agree with it, if that's what you want, then that's DS> what we'll do. Threading will complicate this some, since we'll need to DS> guarantee atomicity across multiple opcodes, something I'd not planned on DS> doing. what is wrong with a single mutex (or equivilent) around the regex? 
that is a simple way to gain atomicity over multiple ops. and we would have to define what is atomic in detail. most regexes will operate on thread specific data as most data should be thread specific. doing a regex on a shared variable may be costly but that can be dealt with by copying it first to a thread local variable. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
Jarkko Hietaniemi wrote:
> Err...a regex that isn't a regex, is this a Zen koan...? Ahhh, you
> want to emulate the state machine in Pure Perl. Okay... next thing
> you want to do is to write symbolic assembler in C...? :-)

I have my reasons :) Actually, I want to write a C-into-Perl compiler --
I have for years -- one of these long manic afternoons I'll do it, too
Re: Stacks, registers, and bytecode. (Oh, my!)
At 03:08 PM 6/4/2001 -0500, Jarkko Hietaniemi wrote:
> On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote:
> > At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote:
> > > On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote:
> > > > Does anyone have on-their-shelves a regex-into-non-regex-perl
> > > > translator?
> > >
> > > Does anyone have on-their-shelves a David-Nicol-into-English
> > > translator? :)
> >
> > I think he's looking for something that turns a regex into perl that
> > doesn't involve regexes.
>
> You must be sharing the pipe with David since I couldn't make
> head||tails about it, either :-)

Hey, it's the standard, regulation-issue perl internals pipe. Didn't you
get one with the pumpkin? (I think they put it inside as a joke... :)

> Err...a regex that isn't a regex, is this a Zen koan...? Ahhh, you
> want to emulate the state machine in Pure Perl. Okay... next thing
> you want to do is to write symbolic assembler in C...? :-)

I know, I know, what *are* we thinking? Heck, next thing you know we'll
want to go parsing text in C too... :-P

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
At 04:20 PM 6/4/2001 -0400, Uri Guttman wrote:
> then regexes can be also debugged in detail with the perl debugger.
> that assumes the debugger has access to single stepping ops which is an
> intriguing idea. BTW this is the kind of feature that dan wanted the
> debugger PDD to have. having this feature supported and running early
> on, would allow us to debug the compiler and optimizer using our own
> tool. in fact if the debugger supported remote control as well as
> internal, then it could be developed in parallel in perl5. this would
> be a very useful thing to have during the development of the full perl 6.

Regexes will be able to be debugged in detail regardless of how the
regex engine's implemented. If we go separate, you'll just single-step
through the regex ops rather than the perl ones.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
> Dan Sugalski writes:
> : Are you speaking of the nodes in regnode.h? I hadn't considered them
> : as regular perl opcodes--I figured they'd stay internal to the regex
> : engine so we could keep it reasonably modular.
>
> I don't think that's a terribly strong argument--one could justify any
> number of unfortunate architectural distinctions on the basis of
> modularity.

Yeah, I know. I mean, look at how those darned bricks have held back
architecture all these years! :-P

> Plus, I'd argue you can still retain modularity of your
> code while unifying implementational philosophy.

I'm not entirely sure of that one--processing a full regex requires the
perl interpreter, it's not all that modular. Though whether being able
to yank out the RE engine and treat it as a standalone library is
important enough to warrant being treated as a design goal or not is a
separate issue. (I think so, as it also means I can treat it as a black
box for the moment so there's less to try and stuff in my head at once)

> It seems to me that the main reason for not considering such a
> unification is that We've Never Done It That Way Before. It's as if
> regular expressions have always been second-class programs, so they'll
> always be second-class programs, world without end, amen, amen.

No, not really. The big reasons I wasn't planning on unification are:

*) It makes the amount of mental space the core interpreter takes up
   smaller
*) It can make performance tradeoffs separately from the main perl
   engine
*) We can probably snag the current perl 5 source without much change
*) The current RE engine's scared (or is that scarred?) me off enough
   that I'd as soon leave it to someone who's more temperamentally
   suited for such things.
*) Treating regexes as non-atomic operations brings some serious
   threading issues into things.

> The fact that Perl 5's regex engine is a royal pain to deal with should
> be a warning to us.

I can think of a couple of reasons that the current engine's a royal
pain, and they don't have much to do with it as a separate entity...

> Much of the pain of dealing with the regex engine in Perl 5 has to do
> with allocation of opcodes and temporary values in a non-standard
> fashion, and dealing with the resultant non-reentrancy on an ad hoc
> basis. We've already tried that experiment, and it sucks. I don't
> want to see the regex engine get swept back under the complexity carpet
> for Perl 6.

Yeah, but those are mostly issues with the implementation, not with the
separation.

> That's a scenario I'd love to avoid. And if we can manage to store
> regex opcodes and state using mechanisms similar to ordinary opcodes,
> maybe we'll not fall back into the situation where the regex engine is
> understood by only three people, plus or minus four.

While I'm not sure I agree with it, if that's what you want, then that's
what we'll do. Threading will complicate this some, since we'll need to
guarantee atomicity across multiple opcodes, something I'd not planned
on doing.

Dan
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 04:20:12PM -0400, Uri Guttman wrote: > that doesn't address larry's point which is very important. No, I was replying to David, so I thought I'd address his point instead. Conventional, I know. :) > the regex compiler needs to be able to generate the equivalent of the > above directly into perl ops. Yes, of course. It's easy enough to do if, for instance, you have a Perl op called "exact", "plus", etc. ... -- "Everything's working, try again in half an hour."-chorus.net tech support
Re: Stacks, registers, and bytecode. (Oh, my!)
> "SC" == Simon Cozens <[EMAIL PROTECTED]> writes:

SC> OK, here's how you do it. Run perl -Mre=debug -e '/your regexp/',
SC> and use Perl to parse the bit in the middle. That's a state machine,
SC> so we can emulate it with subroutines.

SC> So, for instance: perl -Mre=debug -e '/fo+bar/'
SC> ...
SC>    1: EXACT <f>(3)
SC>    3: PLUS(6)
SC>    4: EXACT <o>(0)
SC>    6: EXACT <bar>(8)
SC>    8: END(0)
SC> ...

SC> Now we translate that to something like this:

SC>    sub node_1 {
SC>        $state = shift;
SC>        while ($_ = substr($state->{string},0,1,"")) {
SC>            if ($_ eq "f") {
SC>                return success if $state->node_3;
SC>            }
SC>        }
SC>        return failure;
SC>    }
SC>    ... and so on ...

SC> In keeping with tradition, implementation is left as an exercise for
SC> the reader. :)

that doesn't address larry's point which is very important. the regex compiler needs to be able to generate the equivalent of the above directly into perl ops. it would eliminate the separation of the regex runtime as a black box. we could create special ops that support node making and traversal and others that execute the low level matches. an optimizer could then combine some ops into fewer and more specialized ones. then regexes can also be debugged in detail with the perl debugger. that assumes the debugger has access to single stepping ops which is an intriguing idea. BTW this is the kind of feature that dan wanted the debugger PDD to have. having this feature supported and running early on would allow us to debug the compiler and optimizer using our own tool. in fact if the debugger supported remote control as well as internal, then it could be developed in parallel in perl5. this would be a very useful thing to have during the development of the full perl 6.

uri

-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote: > At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote: > >On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote: > > > Does anyone have on-their-shelves a regex-into-non-regex-perl translator? > > > >Does anyone have on-their-shelves a David-Nicol-into-English translator? :) > > I think he's looking for something that turns a regex into perl that > doesn't involve regexes. You must be sharing the pipe with David since I couldn't make head||tails about it, either :-) Err...a regex that isn't a regex, is this a Zen koan...? Ahhh, you want to emulate the state machine in Pure Perl. Okay... next thing you want to do is to write symbolic assembler in C...? :-) > > > run time is not an issue > > > >Wrong. > > Um Presumably if he *was* interested in run time he wouldn't have said > that. (Which isn't to say that I'm not interested in run time for something > like that if it's to be used in perl 6, but that's a separate issue) > > Dan > > --"it's like this"--- > Dan Sugalski even samurai > [EMAIL PROTECTED] have teddy bears and even > teddy bears get drunk -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote:
> I think he's looking for something that turns a regex into perl that
> doesn't involve regexes.

Oh, well, if we follow Larry's suggestion and have regexp matching ops, then we could claim that the regex *is* Perl, it's just written in a slightly different language. :)

OK, here's how you do it. Run perl -Mre=debug -e '/your regexp/', and use Perl to parse the bit in the middle. That's a state machine, so we can emulate it with subroutines.

So, for instance: perl -Mre=debug -e '/fo+bar/'

    ...
    1: EXACT <f>(3)
    3: PLUS(6)
    4: EXACT <o>(0)
    6: EXACT <bar>(8)
    8: END(0)
    ...

Now we translate that to something like this:

    sub node_1 {
        $state = shift;
        while ($_ = substr($state->{string},0,1,"")) {
            if ($_ eq "f") {
                return success if $state->node_3;
            }
        }
        return failure;
    }
    ... and so on ...

In keeping with tradition, implementation is left as an exercise for the reader. :)

> (Which isn't to say that I'm not interested in run time for something
> like that if it's to be used in perl 6, but that's a separate issue)

Oh, I see, I see. Sorry, was confusing "implementation of this thing for Perl 6" and "implementation of this handy tool for exploration purposes".

Simon -- "The elder gods went to Suggoth and all I got was this lousy T-shirt."
Re: Stacks, registers, and bytecode. (Oh, my!)
At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote: >On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote: > > Does anyone have on-their-shelves a regex-into-non-regex-perl translator? > >Does anyone have on-their-shelves a David-Nicol-into-English translator? :) I think he's looking for something that turns a regex into perl that doesn't involve regexes. > > run time is not an issue > >Wrong. Um Presumably if he *was* interested in run time he wouldn't have said that. (Which isn't to say that I'm not interested in run time for something like that if it's to be used in perl 6, but that's a separate issue) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote: > Does anyone have on-their-shelves a regex-into-non-regex-perl translator? Does anyone have on-their-shelves a David-Nicol-into-English translator? :) > run time is not an issue Wrong. -- I decided to spread the bad mood: Dress To Depress. - Red Drag Diva
Re: Stacks, registers, and bytecode. (Oh, my!)
Larry Wall wrote: > "Sure, you can download the object code for this 5 line Perl program > into your toaster...but you'll also have to download this 5 gigabyte > regex interpreter before it'll run." > > That's a scenario I'd love to avoid. And if we can manage to store > regex opcodes and state using mechanisms similar to ordinary opcodes, > maybe we'll not fall back into the situation where the regex engine is > understood by only three people, plus or minus four. > > Larry Does anyone have on-their-shelves a regex-into-non-regex-perl translator? run time is not an issue, correct behavior is -- David Nicol 816.235.1187 [EMAIL PROTECTED] "Obi-Wan taught me mysticism" -- Luke Housego
Re: Stacks, registers, and bytecode. (Oh, my!)
> The fact that Perl 5's regex engine is a royal pain to deal with should > be a warning to us. > > Much of the pain of dealing with the regex engine in Perl 5 has to do > with allocation of opcodes and temporary values in a non-standard > fashion, and dealing with the resultant non-reentrancy on an ad hoc > basis. We've already tried that experiment, and it sucks. I don't > want to see the regex engine get swept back under the complexity carpet > for Perl 6. It will come back to haunt us if we do: *standing ovations* > "Sure, you can download the object code for this 5 line Perl program > into your toaster...but you'll also have to download this 5 gigabyte > regex interpreter before it'll run." > > That's a scenario I'd love to avoid. And if we can manage to store > regex opcodes and state using mechanisms similar to ordinary opcodes, > maybe we'll not fall back into the situation where the regex engine is > understood by only three people, plus or minus four. > > Larry -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: : Are you speaking of the nodes in regnode.h? I hadn't considered them as : regular perl opcodes--I figured they'd stay internal to the regex engine so : we could keep it reasonably modular. I don't think that's a terribly strong argument--one could justify any number of unfortunate architectural distinctions on the basis of modularity. Plus, I'd argue you can still retain modularity of your code while unifying implementational philosophy. It seems to me that the main reason for not considering such a unification is that We've Never Done It That Way Before. It's as if regular expressions have always been second-class programs, so they'll always be second-class programs, world without end, amen, amen. But there is precedent for turning second-class code into first-class code. After all, that's just what we did for ordinary quotes in the transition from Perl 4 to Perl 5. Perl 4 had a string interpolation engine, and it was a royal pain to deal with. The fact that Perl 5's regex engine is a royal pain to deal with should be a warning to us. Much of the pain of dealing with the regex engine in Perl 5 has to do with allocation of opcodes and temporary values in a non-standard fashion, and dealing with the resultant non-reentrancy on an ad hoc basis. We've already tried that experiment, and it sucks. I don't want to see the regex engine get swept back under the complexity carpet for Perl 6. It will come back to haunt us if we do: "Sure, you can download the object code for this 5 line Perl program into your toaster...but you'll also have to download this 5 gigabyte regex interpreter before it'll run." That's a scenario I'd love to avoid. And if we can manage to store regex opcodes and state using mechanisms similar to ordinary opcodes, maybe we'll not fall back into the situation where the regex engine is understood by only three people, plus or minus four. Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote: > > I really think we'll win if we have support for at least integers as well > as PMCs. There's potentially a lot of integer work that'll be generated by > the optimizer and, while integer opcodes might not make the interpreter > much faster, they'll speed up TIL and generated C code significantly since > they won't need to call opcodes. How much integer arithmetic does the perl interpreter actually do? I've profiled a quite a few perl5 versions on a lot of different perl programs, and about the only programs where integer ops made it above the background noise were things like Ackerman, iterative Fibonacci, and some Q&D image processing on PBM files. Most of the programs I looked at didn't do much of any integer arithmetic. Figuring out where the hot spots are in an interpreter for a general- purpose programming language is hard. I'd recommend against special cases in the registers, since it's not clear how much they'd help. -- Ed Mooring ([EMAIL PROTECTED])
Re: Stacks, registers, and bytecode. (Oh, my!)
On Wed, May 30, 2001 at 12:14:29PM -0400, Uri Guttman wrote: > > "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: > > NI> The "overhead of op dispatch" is a self-proving issue - if you > NI> have complex ops they are expensive to dispatch. > > but as someone else said, we can design our own ops to be as high level > as we want. lowering the number of op calls is the key. that loop will > be a bottleneck as it is in perl5 unless we optimize it now. > In my experience, perl opcodes have not been the performance bottleneck in perl5. It seems it isn't actually the loop that's the bottleneck in perl5. I profiled a whole bunch of different perl programs, using a lot of different versions of perl5; and the runops loop was very rarely among the top CPU users. Many times, the opcode routines themselves weren't the hot spots. It was the support routines like hash key calculation and lookup, string comparisons, or the regular expression code. Profiling is almost always counter-intuitive, but if I had to guess, I'd say that most of the per-opcode cost in perl5 was due to setup/initialization as each opcode was entered, and that devious/clever data structure design could avoid most of this. Also, opcode dispatch might not be the right tree up which to be barking in seeking performance. -- Ed Mooring ([EMAIL PROTECTED])
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> The size of the generated bytecode's likely not going to be DS> directly an issue in determining speed. (And we can't count on DS> keep the locals exclusively in the register set either, as DS> otherwise string eval won't be able to find them, and neither will DS> the debugger) but if registers are usually just a PMC pointer, that address can also be accessed through the symbol table or pad or whatever. in my view what the compiler is doing with registers is actually aliasing a register index to some PMC somewhere. a load register doesn't copy the PMC, just its address into the register slot. the whole purpose of the register system is to support a simple bytecode with integer offsets for arguments. that bytecode format has many positive qualities so we should try to make the IL map well to it. the register idea is not a speed thing in and of itself. the registers are the logical way the VM operates on data. fetching a global may require operations such as a symbol lookup and a load of the PMC into a register. the compiler tracks register usage and does the push/pops to manage them. DS> We still may have synchronization issues with threads and lexicals, but DS> likely far less than with globals. by keeping all thread specific data in a single data structure, then only global accesses will need to be synchronized. i think the VM data (stack, registers, PC, etc.) will be a part of this per thread data block. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
RE: Stacks, registers, and bytecode. (Oh, my!)
> While RISC style opcodes have a certain pleasant simplicity to them, the > opcode overhead'll kill us. (If perl 5's anything to go by, and in this > case I think it might be) I don't understand what opcode overhead you meant. It cannot be worse than a stack-based opcode. > The size of the generated bytecode's likely not going to be directly an > issue in determining speed. (And we can't count on keep the locals > exclusively in the register set either, as otherwise string eval won't be > able to find them, and neither will the debugger) If we have 64 registers, I bet 99.99% of locals will fit into the register set. For the debug information, we just need a hash to map a my variable's name to a register id. This is needed even for stack based code. How to deal with optimization is a different story. However, it has nothing to do with the register based scheme itself. For example, a local variable is not used by any of the code, however an eval($s) may refer to it. So should the optimizer eliminate the local? Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
> "LW" == Larry Wall <[EMAIL PROTECTED]> writes: LW> We basically tried this experiment with Perl 5, and it's only a LW> partial success. Yes, you end up with a Perl VM that can run Perl LW> pretty fast, but it tends to complicate the mapping to other LW> virtual machines. (Enough so that we still don't have a LW> reasonable way to run Perl on a JVM, despite several attempts.) good point. does that mean we should think more risc like? my (very limited) impression of the jvm is that it is lower level and not too dissimilar to p-code. how much closer to jvm (and $DIETY help us, .NET) should our IL design go? uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
RE: Stacks, registers, and bytecode. (Oh, my!)
At 10:06 AM 5/30/2001 -0700, Hong Zhang wrote: > > There's no reason why you can't have a hybrid scheme. In fact I think > > it's a big win over a pure register-addressing scheme. Consider... > >The hybrid scheme may be a win in some cases, but I am not sure if it >is worth the complexity. I personally prefer strict RISC style opcodes, >mainly load, store, and ops for common operators (+, -, * etc), plus >an escaped opcode for complicated operators and functions. While RISC style opcodes have a certain pleasant simplicity to them, the opcode overhead'll kill us. (If perl 5's anything to go by, and in this case I think it might be) >Please note most of common operations will deal with locals, not >globals. Since almost all locals will fit into the register set, the >generated bytecode will be very small and very fast. The size of the generated bytecode's likely not going to be directly an issue in determining speed. (And we can't count on keep the locals exclusively in the register set either, as otherwise string eval won't be able to find them, and neither will the debugger) >The global access is doomed to be slower than locals, especially >considering the synchronization overhead associated with threading. We still may have synchronization issues with threads and lexicals, but likely far less than with globals. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman <[EMAIL PROTECTED]> writes:
>> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:
>
> NI> The "overhead of op dispatch" is a self-proving issue - if you
> NI> have complex ops they are expensive to dispatch.
>
>but as someone else said, we can design our own ops to be as high level
>as we want. lowering the number of op calls is the key. that loop will
>be a bottleneck as it is in perl5 unless we optimize it now.
>
> NI> With a 16-bit opcode as-per-Uri that becomes:
>
> NI>    while (1) *(table[*op_ptr++])();
>
> NI> (Assuming we don't need to check bounds 'cos we won't generate bad code...)
>
>i dropped the 16 bit idea in favor of an extension byte code that zhong
>mentioned. it has several wins, no ordering issues, it is pure 'byte'
>code.
>
> NI> One can then start adding decode to the loop:
>
> NI>    while (1) {
> NI>       op_t op = *op_ptr++;
> NI>       switch(NUM_ARGS(op))
>
>no switch, a simple lookup table:
>
>    op_cnt = op_counts[ op ] ;

Myths of 21st Century Computing #1: "Memory lookups are cheap"

Most processors have only one memory unit, and it typically has a long pipeline delay. But many have several units that can do compare etc. A lookup table may or may not be faster/denser than a switch. A lookup may take 9 cycles down a memory pipe while

    ans = (op > 16) ? 2 : (op > 8) ? 1 : 0;

might super-scalar issue in 1 cycle. Code at a high level and let the C compiler know what is best. C will give you a lookup if that is best. Memory ops need not be expensive if they pipeline well, but making one memory op depend on the result of another is a bad idea, e.g.

    op   = *op_ptr++;
    arg1 = *op_ptr++;
    arg2 = *op_ptr++;

may appear to happen in 3 cycles, as all the loads can be issued in a pipelined manner and the ++s issued in parallel. While

    op  = *op_ptr++;
    ans = table[op];

could seem to take 18 cycles, as we can't start the 2nd load till the 1st one completes.
I have been meaning to try and prove my point with a software-pipelined dispatch loop which is fetching one op, decoding previous one and executing one before that. -- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
At 09:35 AM 5/30/2001 -0700, Larry Wall wrote: >Dan Sugalski writes: >: Right, but in this case we have the advantage of tailoring the instruction >: set to the language, and given the overhead inherent in op dispatch we also >: have an incentive to hoist opcodes up to as high a level as we can manage. > >We basically tried this experiment with Perl 5, and it's only a partial >success. Yes, you end up with a Perl VM that can run Perl pretty fast, >but it tends to complicate the mapping to other virtual machines. >(Enough so that we still don't have a reasonable way to run Perl on a >JVM, despite several attempts.) I think, if we have a solid definition of the abstract machine the p-code's targeted at, that a JIT/.NET/whatever translation should be reasonably straightforward. Except for the regex engine, which is probably going to drive someone (else) mad... >I guess the real question is to what extent the world of the future will >use interpreters, and to what extent it'll settle on JIT compiling instead. >And that's a big enough dog that we can't wag it very easily. If I had to bet, I'd wager that interpreter usage will increase as the power of processors increase. How much of that usage is co-opted by JIT versions of interpreters is something I'm not sure of. >By the way, have you folks considered how to unify the regex opcodes >with the normal Perl opcodes? I suspect we might want to do that. Are you speaking of the nodes in regnode.h? I hadn't considered them as regular perl opcodes--I figured they'd stay internal to the regex engine so we could keep it reasonably modular. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Stacks, registers, and bytecode. (Oh, my!)
> There's no reason why you can't have a hybrid scheme. In fact I think
> it's a big win over a pure register-addressing scheme. Consider...

The hybrid scheme may be a win in some cases, but I am not sure if it is worth the complexity. I personally prefer strict RISC style opcodes, mainly load, store, and ops for common operators (+, -, * etc), plus an escaped opcode for complicated operators and functions.

> Consider the following code.
>
>    $a = $x*$y+$z
>
> Suppose we have r5 and r6 available for scratch use, and that for some
> reason we wish to keep a pointer to $a in r1 at the end
> (perhaps we use $a again a couple of lines later):
>
> This might have the following bytecode with a pure register scheme:
>
>    GETSV('x',r5)  # get pointer to global $x, store in register 5
>    GETSV('y',r6)
>    MULT(r5,r5,r6) # multiply the things pointed to by r5 and r6; store ptr to
>                   # result in r5
>    GETSV('z',r6)
>    ADD(r5,r5,r6)
>    GETSV('a',r1)
>    SASSIGN(r1,r5)

Please note most of common operations will deal with locals, not globals. Since almost all locals will fit into the register set, the generated bytecode will be very small and very fast. The global access is doomed to be slower than locals, especially considering the synchronization overhead associated with threading.

Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: : Right, but in this case we have the advantage of tailoring the instruction : set to the language, and given the overhead inherent in op dispatch we also : have an incentive to hoist opcodes up to as high a level as we can manage. We basically tried this experiment with Perl 5, and it's only a partial success. Yes, you end up with a Perl VM that can run Perl pretty fast, but it tends to complicate the mapping to other virtual machines. (Enough so that we still don't have a reasonable way to run Perl on a JVM, despite several attempts.) I guess the real question is to what extent the world of the future will use interpreters, and to what extent it'll settle on JIT compiling instead. And that's a big enough dog that we can't wag it very easily. By the way, have you folks considered how to unify the regex opcodes with the normal Perl opcodes? I suspect we might want to do that. Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
Dave Mitchell <[EMAIL PROTECTED]> writes:
>There's no reason why you can't have a hybrid scheme. In fact I think
>it's a big win over a pure register-addressing scheme. Consider...

Which was more or less my own position...

>At the start of a new scope, the stack is extended by N to create a new
>stack frame (including a one-off check that the stack can be
>extended). There is then a 'stack pointer' (sp) which is initialised
>to the base of the new frame, or an initial offset thereof. (So sp is
>really just a temporary index within the current frame.)
>
>Then some opcodes can use explicit addressing, while others can be
>implicit, or a mixture.
>
>Explicit opcodes specify one or more 'registers' - ie indexes within the
>current frame, while implicit opcodes use the current value of sp as an
>implicit index, and may alter sp as a side effect. So an ADD opcode
>would use sp[0], sp[-1] to find the 2 operands and would store a pointer
>to the result at sp[-1], then sp--. The compiler plants code in such a way
>that it will never allow sp to go outside the current stack frame.
>
>This allows a big win on the size of the bytecode, and in terms of the
>time required to decode each op.
>
>Consider the following code.
>
>    $a = $x*$y+$z
>
>Suppose we have r5 and r6 available for scratch use, and that for some
>reason we wish to keep a pointer to $a in r1 at the end (perhaps we use
>$a again a couple of lines later):
>
>This might have the following bytecode with a pure register scheme:
>
>    GETSV('x',r5)  # get pointer to global $x, store in register 5
>    GETSV('y',r6)
>    MULT(r5,r5,r6) # multiply the things pointed to by r5 and r6; store ptr to
>                   # result in r5
>    GETSV('z',r6)
>    ADD(r5,r5,r6)
>    GETSV('a',r1)
>    SASSIGN(r1,r5)

Globals are a pain.
Consider this code:

    sub foo {
        my ($x,$y,$z) = @_;
        return $x*$y+$z;
    }

In the pure register (RISC-oid) scheme the bytecode should be:

    FOO: MULT(arg1,arg2,tmp1)
         ADD(tmp1,arg3,result)
         RETURN

That is, lexicals get allocated registers at compile time, and ops just go get them. In the pure stack-with-alloc (x86-oid) scheme it should be

    ENTER +1                 # need a temp
    MULT SP[1],SP[2],SP[4]   # $x*$y
    ADD  SP[4],SP[3],SP[1]   # temp + $z -> result
    RETURN -2                # lose temp and non-results

And in a pure stack (FORTH, PostScript) style it might be

    rot 3    # reorder stack to get x y on top
    mpy
    add
    ret

>but might be like this in a hybrid scheme:
>
>    SETSP(5)    # make sp point to r5
>    GETSV('x')  # get pointer to global $x, store at *sp++
>    GETSV('y')
>    MULT
>    GETSV('z')
>    ADD
>    GETSV('a')
>    SASSIGN
>    SAVEREG(r1) # pop pointer at *sp, and store in register 1

The problem that the hybrid scheme glosses over is the reorder-the-args issue that is handled by register numbers, stack addressing or FORTH/PostScript stack re-ordering. It avoids it by expensive long range global fetches - which is indeed what humans do when writing PostScript - use globals - but compilers can keep track of such mess for us.

>Both use the same registers, have the same net result, but the explicit
>scheme requires an extra 11 numbers in the bytecode, not to mention all
>the extra cycles required to extract out those numbers from the bytecode
>in the first place.

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:

NI> The "overhead of op dispatch" is a self-proving issue - if you
NI> have complex ops they are expensive to dispatch.

but as someone else said, we can design our own ops to be as high level as we want. lowering the number of op calls is the key. that loop will be a bottleneck as it is in perl5 unless we optimize it now.

NI> With a 16-bit opcode as-per-Uri that becomes:

NI>    while (1) *(table[*op_ptr++])();

NI> (Assuming we don't need to check bounds 'cos we won't generate bad code...)

i dropped the 16 bit idea in favor of an extension byte code that zhong mentioned. it has several wins, no ordering issues, it is pure 'byte' code.

NI> One can then start adding decode to the loop:

NI>    while (1) {
NI>       op_t op = *op_ptr++;
NI>       switch(NUM_ARGS(op))

no switch, a simple lookup table:

    op_cnt = op_counts[ op ] ;

NI>    *(table[FUNC_NUM(op)])(*op_ptr++);

i don't have the c code in mind for that. something that uses op_cnt and passes in that many args.

    op_ptr += op_cnt;

uri

-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:

NI> Uri Guttman <[EMAIL PROTECTED]> writes:
>>
>> think of this as classic CISC code generation with plenty of registers
>> and a scratch stack. this is stable technology. we could even find a
>> code generator guru (i don't know any obvious ones in the perl6 world)

NI> Classic CISC code generation taught "us" that CISC is a pain to code-gen.
NI> (I am not a Guru but did design TMS320C80's RISC specifically to match
NI> gcc of that vintage, and dabbled in a p-code for Pascal way back.)

i disagree. cisc went out of style for speed reasons, not because of compilers. most risc compilers are much more complicated than cisc compilers. they have to manage complex register usage, deal with delayed branches, reorder instructions to fill pipelines, etc. what happened was that the higher level instruction logic in a cisc chip was removed and put into the compiler. but we are not doing a risc like IL so this is moot.

>> >> special registers ($_, @_, events, etc.) are indexed with a starting
>> >> offset of 64, so general registers are 0-63.

DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the
DS> normal register set.

NI> All that dividing registers into sub-classes does is cause you to do
NI> register-register moves when things are in the wrong sort of register.
NI> Its only real benefit is for encoding density as you can "imply" part
NI> of the register number by requiring addresses to be in address registers
NI> etc. It is not clear to me that perl special variables map well to that.
NI> Mind you the names are just a human thing - it is the bit-pattern that
NI> the compiler cares about.

no register to register moves are needed. if you call tr with the default $_ it gets generated as:

    TR_OP REG_ARG, <reg2>, <reg3>   ; REG_ARG is $_

where the <...> are register indexes generated by the compiler. it assumes something is already in $_. if you use tr on $foo you code gen:

    push <reg1>        ; save reg if needed
    load <reg1>, $foo  ; load if needed - probably more code
                       ; than this
    TR_OP <reg1>, <reg2>, <reg3>

the first two lines are only there if $foo isn't in a register yet. in both cases <reg3> is picked by the compiler and must be free to use. it gets the tr count return (the result of the op). the idea is that coding for $_ or $foo is basically the same. you just use the special register index for $_ vs. any regular one for $foo.

code generation with this type of IL is almost one to one with perl. most of the language level operations map to one op code, so it is mostly a case of managing the data in and out of the registers and calling the op codes. a real compiler has to do so much more. TIL and c back ends will need much more work than the basic perl to IL translation.

idea: we have a VOID special register. it is like /dev/null. if you execute an op in a void context, you pass in the VOID special register to get the result. then either no result is generated or stored depending on how we code it. this doesn't handle the runtime context of sub calls.

uri

-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
> >the op code is stored in network endian order and the interpreter will
> >always build a 16 bit int from the 2 bytes.
>
> Not network. Native. We can put a marker at the beginning of any bytecode
> stream, and provide an endian-swapper. That way we're always running at
> platform optimal encoding, and if we get bytecode from a platform with
> different endianness we can run the utility to swap things for us.

Another way would be to have an 8-bit opcode, with extensions, e.g.
opcode 0xff implies that the next 16 bits are an extended opcode. The
most common opcodes would be 8-bit. This would also mean the initial
dispatch could be done without any range-checks even, as long as unused
entries in the 8-bit table pointed to an error function:

    char *byte_ptr = BYTECODE_START;
    while (byte_ptr) {
        byte_ptr = (dispatch_table8[*byte_ptr++])(byte_ptr, ...);
    }

with the function at dispatch_table8[255] responsible for reading the
next 2 bytes, forming them into an int, checking the ranges, then using
another dispatch table.
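The escape-opcode scheme described above can be made concrete. The following is a hedged sketch, not actual Parrot code: the op signatures, table sizes, and all names (`op_noop`, `op_escape`, `demo`, etc.) are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of 8-bit dispatch with a 0xff escape into a 16-bit
 * extended-opcode table.  Each op returns the next byte pointer, or
 * NULL to halt; unused slots point at an error function, so the main
 * loop needs no range checks. */

typedef const uint8_t *(*op_fn)(const uint8_t *pc);

static int trace[8], ntrace;                       /* records which ops ran */

static const uint8_t *op_noop(const uint8_t *pc)  { trace[ntrace++] = 1; return pc; }
static const uint8_t *op_halt(const uint8_t *pc)  { (void)pc; trace[ntrace++] = 2; return NULL; }
static const uint8_t *op_error(const uint8_t *pc) { (void)pc; return NULL; }

static op_fn dispatch_table8[256];
static op_fn ext_table[65536];                     /* extended (16-bit) ops */

/* Slot 255: form the next two bytes into an extended opcode number,
 * then dispatch through the second table. */
static const uint8_t *op_escape(const uint8_t *pc)
{
    uint16_t ext = (uint16_t)((pc[0] << 8) | pc[1]);
    return ext_table[ext](pc + 2);
}

static void run(const uint8_t *pc)
{
    while (pc) {
        uint8_t op = *pc++;
        pc = dispatch_table8[op](pc);
    }
}

static int demo(void)
{
    for (int i = 0; i < 256; i++)   dispatch_table8[i] = op_error;
    for (int i = 0; i < 65536; i++) ext_table[i] = op_error;
    dispatch_table8[0]   = op_noop;
    dispatch_table8[255] = op_escape;
    ext_table[0x0102]    = op_halt;

    /* one common 8-bit op, then an escape to extended op 0x0102 */
    static const uint8_t code[] = { 0, 255, 0x01, 0x02 };
    ntrace = 0;
    run(code);
    return ntrace;
}
```

Only the 256-entry table sits on the hot path; the rare extended ops pay for the extra two-byte fetch, matching the assumption that the most common opcodes stay 8-bit.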
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski <[EMAIL PROTECTED]> writes:
>At 02:08 PM 5/30/2001 +0000, Nick Ing-Simmons wrote:
>>Classic CISC code generation taught "us" that CISC is a pain to code-gen.
>>(I am not a Guru but did design TMS320C80's RISC specifically to match
>>gcc of that vintage, and dabbled in a p-code for Pascal way back.)
>
>Right, but in this case we have the advantage of tailoring the instruction
>set to the language, and given the overhead inherent in op dispatch we also
>have an incentive to hoist opcodes up to as high a level as we can manage.

That is of course what they/we all say ;-)

The 68K for example matched quite well to the low-tech compiler
technology of its day, as did UCSD's p-code for UCSD Pascal, and DSPs
have their own reasons (inner loops are more important than generic C)
for their CISC nature. Even the horrible x86 architecture is quasi-sane
if you assume all variables are on the stack addressed by the Base
Pointer.

It is interesting now that people are looking at building chips for JVM
how much cursing there is about certain features - though I don't have
the references to hand.

The "overhead of op dispatch" is a self-proving issue - if you have
complex ops they are expensive to dispatch. In the limit FORTH-like
threaded code

    while (1)
       (*(*op_ptr++))();

is not really very expensive; it is then up to the "op" to adjust op_ptr
for in-line args etc. The down side is size: an op is at least the size
of a pointer.

With a 16-bit opcode as-per-Uri that becomes:

    while (1)
       (*(table[*op_ptr++]))();

(Assuming we don't need to check bounds 'cos we won't generate bad code...)

One can then start adding decode to the loop:

    while (1) {
       op_t op = *op_ptr++;
       switch (NUM_ARGS(op)) {
       case 1:
          (*(table[FUNC_NUM(op)]))(*op_ptr++);
          break;
       case 3:
          (*(table[FUNC_NUM(op)]))(op_ptr[0], op_ptr[1], op_ptr[2]);
          op_ptr += 3;
          break;
       ...
    }

Then one can do byte-ordering and mis-aligned hackery and index into the
reg-array:

    while (1) {
       op_t op = GET16BITS(*op_ptr);
       switch (NUM_ARGS(op)) {
       case 1:
          (*(table[FUNC_NUM(op)]))(reg_ptr[GET8BITS(*op_ptr)]);
          break;
       ...
    }

-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/
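The `NUM_ARGS`/`FUNC_NUM` decode loop above can be fleshed out into a runnable toy. The bit split here (top 2 bits carry the argument count, low 14 bits the function number) and the op names are assumptions for illustration, not the proposed encoding.

```c
#include <stdint.h>

/* Toy decode loop: arg count packed into the top bits of each
 * 16-bit op word, function number in the rest. */
#define NUM_ARGS(op)  ((op) >> 14)
#define FUNC_NUM(op)  ((op) & 0x3fff)
#define MAKE_OP(f, n) ((uint16_t)(((n) << 14) | (f)))

typedef void (*fn0)(void);
typedef void (*fn1)(int);
typedef void (*anyfn)(void);          /* stored type; cast back per arity */

static int acc;
static void op_inc(void)  { acc++; }
static void op_add(int n) { acc += n; }

static anyfn table[] = { (anyfn)op_inc, (anyfn)op_add };

static void run_ops(const uint16_t *op_ptr, int nops)
{
    while (nops-- > 0) {
        uint16_t op = *op_ptr++;
        switch (NUM_ARGS(op)) {
        case 0: ((fn0)table[FUNC_NUM(op)])();               break;
        case 1: ((fn1)table[FUNC_NUM(op)])((int)*op_ptr++); break;
        }
    }
}

static int decode_demo(void)
{
    /* op 0 (inc, no args), then op 1 (add, one in-line arg) */
    static const uint16_t code[] = { MAKE_OP(0, 0), MAKE_OP(1, 1), 41 };
    acc = 0;
    run_ops(code, 2);
    return acc;
}
```

Casting each function pointer back to its true type before the call keeps the single-table dispatch legal C while still letting different arities share one table.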
Re: Stacks, registers, and bytecode. (Oh, my!)
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
> I don't like push/pop - they imply a lot of stack limit checking word-by-word
> when it is less overhead for compiler to analyse the needs of whole basic-block
> check-for/make-space-on the stack _once_ then just address it.

There's no reason why you can't have a hybrid scheme. In fact I think
it's a big win over a pure register-addressing scheme. Consider...

At the start of a new scope, the stack is extended by N to create a new
stack frame (including a one-off check that the stack can be extended).
There is then a 'stack pointer' (sp) which is initialised to the base of
the new frame, or an initial offset thereof. (So sp is really just a
temporary index within the current frame.)

Then some opcodes can use explicit addressing, while others can be
implicit, or a mixture. Explicit opcodes specify one or more 'registers'
- ie indexes within the current frame - while implicit opcodes use the
current value of sp as an implicit index, and may alter sp as a side
effect. So an ADD opcode would use sp[0], sp[-1] to find the 2 operands
and would store a pointer to the result at sp[-1], then sp--. The
compiler plants code in such a way that it will never allow sp to go
outside the current stack frame.

This allows a big win on the size of the bytecode, and in terms of the
time required to decode each op. Consider the following code.
    $a = $x*$y+$z

Suppose we have r5 and r6 available for scratch use, and that for some
reason we wish to keep a pointer to $a in r1 at the end (perhaps we use
$a again a couple of lines later). This might have the following
bytecode with a pure register scheme:

    GETSV('x',r5)   # get pointer to global $x, store in register 5
    GETSV('y',r6)
    MULT(r5,r5,r6)  # multiply the things pointed to by r5 and r6;
                    # store ptr to result in r5
    GETSV('z',r6)
    ADD(r5,r5,r6)
    GETSV('a',r1)
    SASSIGN(r1,r5)

but might be like this in a hybrid scheme:

    SETSP(5)        # make sp point to r5
    GETSV('x')      # get pointer to global $x, store at *sp++
    GETSV('y')
    MULT
    GETSV('z')
    ADD
    GETSV('a')
    SASSIGN
    SAVEREG(r1)     # pop pointer at *sp, and store in register 1

Both use the same registers and have the same net result, but the
explicit scheme requires an extra 11 numbers in the bytecode, not to
mention all the extra cycles required to extract those numbers from the
bytecode in the first place.
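A tiny interpreter makes the implicit-sp half of the hybrid concrete. This is a sketch under simplifying assumptions: values are plain ints rather than SV pointers, constants are pushed directly instead of fetched via GETSV, and the op and function names are hypothetical.

```c
/* Implicit-addressing ops: MULT and ADD take their operands from the
 * top two frame slots and leave the result one slot down, exactly the
 * sp[0]/sp[-1], sp-- behaviour described above. */

enum { PUSHC, MULT, ADD, HALT };

static int eval(const int *code)
{
    int frame[16];
    int sp = -1;                        /* index of top slot in frame */

    for (;;) {
        switch (*code++) {
        case PUSHC: frame[++sp] = *code++;            break;
        case MULT:  frame[sp-1] *= frame[sp]; sp--;   break;
        case ADD:   frame[sp-1] += frame[sp]; sp--;   break;
        case HALT:  return frame[sp];
        }
    }
}

static int hybrid_demo(void)
{
    /* $x*$y + $z  with  x=3, y=4, z=5 */
    static const int code[] = { PUSHC,3, PUSHC,4, MULT, PUSHC,5, ADD, HALT };
    return eval(code);
}
```

Each implicit op is one number in the bytecode versus three for the explicit `MULT(r5,r5,r6)` form, which is where the "extra 11 numbers" saving comes from.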
Re: Stacks, registers, and bytecode. (Oh, my!)
At 02:08 PM 5/30/2001 +0000, Nick Ing-Simmons wrote:
>Uri Guttman <[EMAIL PROTECTED]> writes:
> >think of this as classic CISC code generation with plenty of registers
> >and a scratch stack. this is stable technology. we could even find a
> >code generator guru (i don't know any obvious ones in the perl6 world)
>
>Classic CISC code generation taught "us" that CISC is a pain to code-gen.
>(I am not a Guru but did design TMS320C80's RISC specifically to match
>gcc of that vintage, and dabbled in a p-code for Pascal way back.)

Right, but in this case we have the advantage of tailoring the
instruction set to the language, and given the overhead inherent in op
dispatch we also have an incentive to hoist opcodes up to as high a
level as we can manage.

> > >> special registers ($_, @_, events, etc.) are indexed with a starting
> > >> offset of 64, so general registers are 0-63.
>
> DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the
> DS> normal register set.
>
>All that dividing registers into sub-classes does is cause you to do
>register-register moves when things are in the wrong sort of register.
>Its only real benefit is for encoding density as you can "imply" part
>of the register number by requiring addresses to be in address registers
>etc. It is not clear to me that perl special variables map well to that.
>Mind you the names are just a human thing - it is the bit-pattern that
>the compiler cares about.

Fair enough. Having the equivalent to perl5's push stack (well, one of
'em) to save and restore named specials is probably a better idea.
There's no real overriding performance reason to have the specials in
the register set.

					Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman <[EMAIL PROTECTED]> writes:
>
>think of this as classic CISC code generation with plenty of registers
>and a scratch stack. this is stable technology. we could even find a
>code generator guru (i don't know any obvious ones in the perl6 world)

Classic CISC code generation taught "us" that CISC is a pain to code-gen.
(I am not a Guru but did design TMS320C80's RISC specifically to match
gcc of that vintage, and dabbled in a p-code for Pascal way back.)

> >> special registers ($_, @_, events, etc.) are indexed with a starting
> >> offset of 64, so general registers are 0-63.
>
> DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the
> DS> normal register set.

All that dividing registers into sub-classes does is cause you to do
register-register moves when things are in the wrong sort of register.
Its only real benefit is for encoding density as you can "imply" part
of the register number by requiring addresses to be in address registers
etc. It is not clear to me that perl special variables map well to that.
Mind you the names are just a human thing - it is the bit-pattern that
the compiler cares about.

>
>oh, they have macro names which are special. something like:
>
>    #define MAX_PLAIN_REG 64  /* 0 - 63 are plain regs */
>    #define REG_ARG       64  /* $_ */
>    #define REG_SUB_ARG   65  /* @_ */
>    #define REG_ARGV      66  /* @ARGV */
>    #define REG_INT1      67  /* integer 1 */
>    #define REG_INT2      68  /* integer 2 */
>
>uri
-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman <[EMAIL PROTECTED]> writes:
> DS> The one handy thing about push and pop is you don't need to go
> DS> tracking the stack manually--that's taken care of by the push and
> DS> pop opcodes. They can certainly be replaced with manipulations of
> DS> a temp register and indirect register stores or loads, but that's
> DS> more expensive--you do the same thing only with more dispatch
> DS> overhead.
>
> DS> And I'm considering the stack as a place to put registers
> DS> temporarily when the compiler runs out and needs a spot to
> DS> squirrel something away, rather than as a mechanism to pass
> DS> parameters to subs or opcodes. This is a stack in the traditional
> DS> scratch-space sense.
>
>i agree with that. the stack here is mostly a call stack which
>save/restores registers as we run out. with a large number like 64, we
>won't run out until we do some deep calls. then the older registers (do
>we have an LRU mechanism here?) get pushed by the sub call prologue
>which then uses those registers for its my vars.

I don't like push/pop - they imply a lot of stack limit checking
word-by-word when it is less overhead for the compiler to analyse the
needs of a whole basic-block, check-for/make-space-on the stack _once_,
then just address it.

>
>is the sub call/return stack also the data (scratch) stack? i think
>separate ones makes sense here. the data stack is just PMC pointers, the
>code call stack has register info, context, etc.

One stack is more natural for translation to C (which has just one).
One problem with FORTH was allocating two growable segments for its two
stacks - one always ended up 2nd class.
-- 
Nick Ing-Simmons
who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
> "Larry" == Larry Wall <[EMAIL PROTECTED]> writes: Larry> : Nah, bytecode'll have an endianness marker at the top. If you Larry> : load in bytecode with the wrong endianness, the loader will Larry> : have to swap for you. Larry> Er. If they're not bytes, we can't call it bytecode. Unless you want to raise the "octet means 8 bits, byte means whatever is nice and chunky" argument. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <[EMAIL PROTECTED]> http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> At 04:25 PM 5/29/2001 -0400, Uri Guttman wrote: >> here is an idea. if we use a pure stack design but you can access the >> stack values with an index, then the index number can get large. so a >> fixed register set would allow us to limit the index to 8 bits. so the >> byte code could look something like this: DS> Right, but that means values can potentially have their offsets DS> change based on what might happen during the course of a section DS> of code, and I'm not too comfortable with that. The potential for DS> error worries me. Plus it might make a mixed-type stack DS> trickier. (And yes, I realize the irony of me arguing against one DS> form of error-prone activity in the same paragraph that advocates DS> another...) current compilers do it all the time. they track register usage at compile time and push/pop at runtime to keep the code working. a given op is passed its args as register offsets. before/after that, there may be ops to set up the registers. only during that single op call are the needed registers known. this is not a sub call in perl with complex stuff, but a single IL op call with a handful of registers passed to it. most registers will be recycled after each perl level statement as they are temps. only those that are mapped to real vars (and other live things) are kept between statements. think of this as classic CISC code generation with plenty of registers and a scratch stack. this is stable technology. we could even find a code generator guru (i don't know any obvious ones in the perl6 world) to work with us on this. >> the op code is stored in network endian order and the interpreter will >> always build a 16 bit int from the 2 bytes. DS> Not network. Native. We can put a marker at the beginning of any bytecode DS> stream, and provide an endian-swapper. 
That way we're always running at DS> platform optimal encoding, and if we get bytecode from a platform with DS> different endianness we can run the utility to swap things for us. i disagree. bytecode should be portable without conversions. we store 16 bits in a known order (i picked network at random :) and build a native 16 bit int as we execute. or use only an 8 bit op code with one or more escape code for the next 256 codes (like Hong Zhang said). >> all registers point to PMC's DS> I really think we'll win if we have support for at least integers DS> as well as PMCs. There's potentially a lot of integer work that'll DS> be generated by the optimizer and, while integer opcodes might not DS> make the interpreter much faster, they'll speed up TIL and DS> generated C code significantly since they won't need to call DS> opcodes. ok, some special registers for hardware ops like add. but they should be indexed off the same register set with the other special registers. >> passing lists to/from subs is via an array ref. the data list is on the >> stack and the array ref is in @_ or passed by return(). DS> PMC for a list, and the args are in the list. (Potentially DS> aliased, of course) Subs prototyped with input and return data DS> won't use lists, they'll pass the parameters in registers. (Or DS> something like that) same idea but you clarified it. >> special registers ($_, @_, events, etc.) are indexed with a starting >> offset of 64, so general registers are 0-63. DS> I'd name them specially (S0-Snnn) rather than make them a chunk of the DS> normal register set. oh, they have macro names which are special. 
something like: #define MAX_PLAIN REG 64 /* 0 - 63 are plain regs */ #define REG_ARG 64 /* $_ */ #define REG_SUB_ARG 65 /* @_ */ #define REG_ARGV66 /* @ARGV */ #define REG_INT167 /* integer 1 */ #define REG_INT268 /* integer 1 */ uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
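The special-register layout Uri describes amounts to one register file with a boundary at index 64. A minimal sketch, assuming a per-thread file of PMC pointers (`PMC` here is a stand-in struct; the file size and `is_special` helper are hypothetical):

```c
/* One per-thread file of PMC pointers: plain registers 0-63,
 * specials from 64 up, all one index space. */

typedef struct PMC { int dummy; } PMC;

#define MAX_PLAIN_REG 64      /* 0 - 63 are plain regs */
#define REG_ARG       64      /* $_ */
#define REG_SUB_ARG   65      /* @_ */
#define REG_ARGV      66      /* @ARGV */

typedef struct Interp {
    PMC *reg[128];            /* plain + special registers */
} Interp;

/* The compiler only needs the index: $_ is simply index 64. */
static int is_special(int r) { return r >= MAX_PLAIN_REG; }
```

Because the specials share the plain registers' index space, an op like `TR_OP` takes `REG_ARG` or any scratch register through the same operand slot, with no register-class moves.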
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, May 29, 2001 at 04:36:51PM -0600, Nathan Torkington wrote: > Dan Sugalski writes: > > Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd > > come up with a name that has a snappier acronym. Alas I'm not... :) > > p-code. The p stands for Parrot :-) No, it stands for "pieces of eight". The bar silver and the arms still lie, for all that I know, where Flint buried them; and certainly they shall lie there for me. Oxen and wain-ropes would not bring me back again to that accursed island; and the worst dreams that ever I have are when I hear the surf booming about its coasts or start upright in bed with the sharp voice of Captain Flint still ringing in my ears: "Pieces of eight! Pieces of eight!" > Nat -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> At 02:15 PM 5/29/2001 -0700, Hong Zhang wrote: >> > we have a simple set of load literal, push/pop (multiple) registers op >> > codes. >> >> There should be no push/pop opcodes. They are simply register moves. DS> The one handy thing about push and pop is you don't need to go DS> tracking the stack manually--that's taken care of by the push and DS> pop opcodes. They can certainly be replaced with manipulations of DS> a temp register and indirect register stores or loads, but that's DS> more expensive--you do the same thing only with more dispatch DS> overhead. DS> And I'm considering the stack as a place to put registers DS> temporarily when the compiler runs out and needs a spot to DS> squirrel something away, rather than as a mechanism to pass DS> parameters to subs or opcodes. This is a stack in the traditional DS> scratch-space sense. i agree with that. the stack here is mostly a call stack which save/restores registers as we run out. with a large number like 64, we won't run out until we do some deep calls. then the older registers (do we have an LRU mechnism here?) get pushed by the sub call prologue which then uses those registers for its my vars. is the sub call/return stack also the data (scratch) stack? i think separate ones makes sense here. the data stack is just PMC pointers, the code call stack has register info, context, etc. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: > Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd > come up with a name that has a snappier acronym. Alas I'm not... :) p-code. The p stands for Parrot :-) Nat
Re: Stacks, registers, and bytecode. (Oh, my!)
At 03:26 PM 5/29/2001 -0700, Larry Wall wrote: >Dan Sugalski writes: >: Nah, bytecode'll have an endianness marker at the top. If you load in >: bytecode with the wrong endianness, the loader will have to swap for you. > >Er. If they're not bytes, we can't call it bytecode. Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd come up with a name that has a snappier acronym. Alas I'm not... :) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski writes: : Nah, bytecode'll have an endianness marker at the top. If you load in : bytecode with the wrong endianness, the loader will have to swap for you. Er. If they're not bytes, we can't call it bytecode. Larry
Re: Stacks, registers, and bytecode. (Oh, my!)
At 04:25 PM 5/29/2001 -0400, Uri Guttman wrote: > > "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: > > DS> If someone can make a compelling argument for details on how the > DS> registers should be done that should be visible to the bytecode > DS> (Like, say, a variant on SPARC's windowing system, or forcing them > DS> on the stack such that pushes and pops renumber the registers) > DS> that'd be keen. I can see the merits, but I'm not compelled one > DS> way or the other quite yet. > >here is an idea. if we use a pure stack design but you can access the >stack values with an index, then the index number can get large. so a >fixed register set would allow us to limit the index to 8 bits. so the >byte code could look something like this: Right, but that means values can potentially have their offsets change based on what might happen during the course of a section of code, and I'm not too comfortable with that. The potential for error worries me. Plus it might make a mixed-type stack trickier. (And yes, I realize the irony of me arguing against one form of error-prone activity in the same paragraph that advocates another...) > literal data support is needed (read only) Yup. We'll need some sort of constant support, as well as some sort of sectional literal table. >the op code is stored in network endian order and the interpreter will >always build a 16 bit int from the 2 bytes. Not network. Native. We can put a marker at the beginning of any bytecode stream, and provide an endian-swapper. That way we're always running at platform optimal encoding, and if we get bytecode from a platform with different endianness we can run the utility to swap things for us. >all registers point to PMC's I really think we'll win if we have support for at least integers as well as PMCs. 
There's potentially a lot of integer work that'll be generated by the optimizer and, while integer opcodes might not make the interpreter much faster, they'll speed up TIL and generated C code significantly since they won't need to call opcodes. >passing lists to/from subs is via an array ref. the data list is on the >stack and the array ref is in @_ or passed by return(). PMC for a list, and the args are in the list. (Potentially aliased, of course) Subs prototyped with input and return data won't use lists, they'll pass the parameters in registers. (Or something like that) >special registers ($_, @_, events, etc.) are indexed with a starting >offset of 64, so general registers are 0-63. I'd name them specially (S0-Snnn) rather than make them a chunk of the normal register set. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Stacks, registers, and bytecode. (Oh, my!)
At 02:15 PM 5/29/2001 -0700, Hong Zhang wrote: > > we have a simple set of load literal, push/pop (multiple) registers op > > codes. > >There should be no push/pop opcodes. They are simply register moves. The one handy thing about push and pop is you don't need to go tracking the stack manually--that's taken care of by the push and pop opcodes. They can certainly be replaced with manipulations of a temp register and indirect register stores or loads, but that's more expensive--you do the same thing only with more dispatch overhead. And I'm considering the stack as a place to put registers temporarily when the compiler runs out and needs a spot to squirrel something away, rather than as a mechanism to pass parameters to subs or opcodes. This is a stack in the traditional scratch-space sense. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Stacks, registers, and bytecode. (Oh, my!)
At 05:32 PM 5/29/2001 -0400, Bryan C. Warnock wrote:
>On Tuesday 29 May 2001 05:15 pm, Hong Zhang wrote:
> > > either each op code knows how many args it has,
> >
> > I like to do so, otherwise we will lose most of the performance gain.
>
>I would think, though, that some ops would benefit more from varargs.
>String assembly comes to mind.

If an opcode wants some undefined number of arguments passed, it'll get
a single one (a list) passed to it. I don't know that we'll have too
many opcodes other than perhaps sub and method call ones that'll take a
variable number of arguments, and the ones that do will really take
one--a list with potentially a variable number of entries in it. So
instead of seeing:

    some_op a, b, c, d, e...

we'd see:

    templist r1
    push r1, array1
    push r1, scalar2
    push r1, 15
    some_op r1

Though that might have sufficient performance loss in opcode dispatch to
warrant doing things in another way.

> > > or we have an end marker (e.g 0xff which is never used as a register
> > > index).
> >
> > If we have to use variable arguments, I strongly recommend to add one
> > "argc" byte immediately following the opcode. Linear scan bytecode
> > will be very slow.
>
>Agreed.

I don't think this'll come up. Ops that take a variable number of
arguments will have a count, though, and it'll probably have to be the
first element.

> > The 16-bit op has both endian issue and alignment issue. Most of RISC
> > machine can not access byte-aligned opcode, so we have to add a lot
> > of padding. Anyway, it will be fatter and slower than 8-bit opcode.
> > I prefer to using escape opcode.
>
>This is a discussion we've had before. It seems if we solve the Unicode
>issue, we've solved the bytecode issue, too. Better yet, make all the
>opcodes map to UTF-8 characters, so you can write and execute bytecode
>as simple strings. ;-)

Nah, bytecode'll have an endianness marker at the top. If you load in
bytecode with the wrong endianness, the loader will have to swap for
you.
Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
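Dan's "one list argument" convention can be sketched as follows. The `List` layout, register file, and op names are hypothetical stand-ins, and `op_sum` plays the role of `some_op`:

```c
/* Instead of a variable-length operand list, the compiler builds a
 * temporary list in a register and passes that single register. */

typedef struct { int n; int item[8]; } List;

static List list_regs[4];              /* toy file of list registers */
static int  op_result;

static void op_templist(int r)    { list_regs[r].n = 0; }
static void op_push(int r, int v) { list_regs[r].item[list_regs[r].n++] = v; }

static void op_sum(int r)              /* stand-in for some_op */
{
    op_result = 0;
    for (int i = 0; i < list_regs[r].n; i++)
        op_result += list_regs[r].item[i];
}

static int list_demo(void)
{
    op_templist(1);                    /* templist r1  */
    op_push(1, 10);                    /* push r1, ... */
    op_push(1, 5);
    op_push(1, 27);
    op_sum(1);                         /* some_op r1   */
    return op_result;
}
```

Every op keeps a fixed operand count this way; the cost moves into the extra `templist`/`push` dispatches, which is exactly the performance trade-off Dan flags.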
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tuesday 29 May 2001 05:15 pm, Hong Zhang wrote: > > either each op code knows how many args it has, > > I like to do so, otherwise we will lose most of the performance gain. I would think, though, that some ops would benefit more from varargs. String assembly comes to mind. > > > or we have an end marker (e.g 0xff which is never used as a register > > index). > > If we have to use variable arguments, I strongly recommend to add one > "argc" byte immediately following the opcode. Linear scan bytecode will be > very slow. Agreed. > The 16-bit op has both endian issue and alignment issue. Most of RISC > machine can not access byte-aligned opcode, so we have to add a lot > of padding. Anyway, it will be fatter and slower than 8-bit opcode. > I prefer to using escape opcode. This is a discussion we've had before. It seems if we solve the Unicode issue, we've solved the bytecode issue, too. Better yet, make all the opcodes map to UTF-8 characters, so you can write and execute bytecode as simple strings. ;-) -- Bryan C. Warnock [EMAIL PROTECTED]
RE: Stacks, registers, and bytecode. (Oh, my!)
> here is an idea. if we use a pure stack design but you can access the
> stack values with an index, then the index number can get large. so a
> fixed register set would allow us to limit the index to 8 bits. so the
> byte code could look something like this:
>
>     16 bit op (plenty of room for growth)
>     8 bit register index of arg 1
>     8 bit register index of arg 2
>     ...
>
>     next op code ...
>
> literal data support is needed (read only)
>
> either each op code knows how many args it has,

I like to do so, otherwise we will lose most of the performance gain.

> or we have an end marker (e.g 0xff which is never used as a register
> index).

If we have to use variable arguments, I strongly recommend adding one
"argc" byte immediately following the opcode. Linear scanning the
bytecode will be very slow.

> the op code is stored in network endian order and the interpreter will
> always build a 16 bit int from the 2 bytes.

The 16-bit op has both endian and alignment issues. Most RISC machines
cannot access a byte-aligned opcode, so we would have to add a lot of
padding. Anyway, it will be fatter and slower than an 8-bit opcode.
I prefer using an escape opcode.

> we have a simple set of load literal, push/pop (multiple) registers op
> codes.

There should be no push/pop opcodes. They are simply register moves.

> each thread has its own register set.
>
> all registers point to PMC's
>
> passing lists to/from subs is via an array ref. the data list is on the
> stack and the array ref is in @_ or passed by return().
>
> special registers ($_, @_, events, etc.) are indexed with a starting
> offset of 64, so general registers are 0-63.
>
> this can be mmapped in, executed with NO changes, fairly easily
> generated by the compiler front end, optimizable on or offline,
> (dis)assembler can be simply written, etc.
>
> simple to translate to TIL code by converting each op code to a call to
> the op function itself and passing in the register indexes or even
> the PMC pointers themselves.
Agreed. Hong
Re: Stacks, registers, and bytecode. (Oh, my!)
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> If someone can make a compelling argument for details on how the DS> registers should be done that should be visible to the bytecode DS> (Like, say, a variant on SPARC's windowing system, or forcing them DS> on the stack such that pushes and pops renumber the registers) DS> that'd be keen. I can see the merits, but I'm not compelled one DS> way or the other quite yet. here is an idea. if we use a pure stack design but you can access the stack values with an index, then the index number can get large. so a fixed register set would allow us to limit the index to 8 bits. so the byte code could look something like this: 16 bit op (plenty of room for growth) 8 bit register index of arg 1 8 bit register index of arg 2 ... next op code ... literal data support is needed (read only) either each op code knows how many args it has, or we have an end marker (e.g 0xff which is never used as a register index). the op code is stored in network endian order and the interpreter will always build a 16 bit int from the 2 bytes. we have a simple set of load literal, push/pop (multiple) registers op codes. each thread has its own register set. all registers point to PMC's passing lists to/from subs is via an array ref. the data list is on the stack and the array ref is in @_ or passed by return(). special registers ($_, @_, events, etc.) are indexed with a starting offset of 64, so general registers are 0-63. this can be mmapped in, executed with NO changes, fairly easily generated by the compiler front end, optimizable on or offline, (dis)assembler can be simply written, etc. simple to translate to TIL code by converting each op code to a call to the the op function itself and passing in the register indexes or even the PMC pointers themselves. 
uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
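The fetch side of Uri's layout is simple to sketch: a 16-bit op read byte-by-byte in a fixed ("network") order, followed by 8-bit register indexes. The function names here are hypothetical.

```c
#include <stdint.h>

/* Build a native 16-bit int from two bytes stored big-endian, so the
 * same bytecode stream reads identically on any host. */
static uint16_t fetch_op16(const uint8_t **pc)
{
    const uint8_t *p = *pc;
    *pc += 2;
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint8_t fetch_reg(const uint8_t **pc)
{
    return *(*pc)++;
}

static int fetch_demo(void)
{
    /* op 300 (0x012c), then arg registers 7 and 9 */
    static const uint8_t code[] = { 0x01, 0x2c, 7, 9 };
    const uint8_t *pc = code;
    uint16_t op = fetch_op16(&pc);
    uint8_t  r1 = fetch_reg(&pc);
    uint8_t  r2 = fetch_reg(&pc);
    return op == 300 && r1 == 7 && r2 == 9;
}
```

Reading the two bytes explicitly sidesteps both the alignment and the endianness objections raised later in the thread, at the price of a shift and an OR per op. Dan's counter-proposal (native order plus a marker and a swap utility) trades that per-op cost for a one-time conversion.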
Stacks, registers, and bytecode. (Oh, my!)
I've been out of touch this weekend (and currently sorta swamped now) so
I've not been commenting, but here's a quick sum-up of what I've been
thinking about:

1) The paired register thing's silly. Forget I mentioned it.

2) The interpreter will have some int, float, and string registers. Some
stuff will be faster because of it, and it'll make the generated TIL or
C code (when we do a TILperl or perl2c version) faster since we won't
need to call opcode functions to add 3 and 4 together...

3) Whether the registers are really stack-based or not's an
implementation detail. They'll be based off of some per-interpreter
thing, of course, so they'll be thread-local.

4) We will have some sort of register push/pop system independent of the
register implementation. (Probably, like the 68K family, with the
ability to move multiple registers in one go)

5) The bytecode should be really, really close to the final executable
form. I'd really like to be able to read in the bytecode in one big
chunk and start executing it without change. (We'll end up with some
sections that'll need to be changed--that's inevitable. If we can mmap
in the non-fixup section pieces, though, that'd be great)

6) We may formally split the registers used to pass parameters from the
working registers. I'm not sure if that'll ultimately be a win or not.
(I can foresee lots of pointless register->register moving, and I'm not
keen on pointless anything)

If someone can make a compelling argument for details on how the
registers should be done that should be visible to the bytecode (Like,
say, a variant on SPARC's windowing system, or forcing them on the stack
such that pushes and pops renumber the registers) that'd be keen. I can
see the merits, but I'm not compelled one way or the other quite yet.

					Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: Stacks & registers
Uri Guttman <[EMAIL PROTECTED]> writes: > NI> No - you keep the window base "handy" and don't keep re-fetching it, > NI> same way you keep "program counter" and "stack pointer" "handy". > > NI> Getting > NI>window[N] > NI> is same cost as > NI>next = *PC++; > > NI> My point is that to avoid keeping too-many things "handy" window > NI> base and stack pointer should be the same (real machine) register. > >if we can control that. Maybe not directly, but most compilers will keep common base registers in machine registers if you code things right. >but i see issues too. i mentioned the idea of >having $_ and other special vars and stuff would have their own PMC's in >this register set. Why does it have to be _this_ register set - globals can go in another register set - SPARC's register scheme has global registers too. That said my guess is that $_ is usually save/restored across sub/block boundaries. >dan like the idea. that doesn't map well to a window >as those vars may not change when you call subs. i just don't see >register windows as useful at the VM level. Call it what you will - I am arguing for an addressable stack not for windows as such. > > >> i am just saying register windows don't seem to be any win for us > >> and cost an extra indirection for each data access. my view is let > >> the compiler keep track of the register usage and just do > >> individual push/pops as needed when registers run out. > > NI> That makes sense if (and only if) virtual machine registers are real > NI> machine registers. If virtual machine registers are in memory then > NI> accessing them "on the stack" is just as efficient (perhaps more so) > NI> than at some other "special" location. And it avoids need for > NI> memory-to-memory moves to push/pop them when we do "spill". > >no, the idea is the VM compiler keeps track of IL register use for the >purpose of code generating N-tuple op codes and their register >arguments. 
this is a pure IL design thing and has nothing to do with >machine registers. at this level, register windows don't win IMO.

That quote is a little misleading. My point is that UNLESS machine (real) machine registers are involved then all IL "registers" are in memory. Given that they are in memory they should be grouped with and addressed-via-same-base-as other "memory" that a sub is accessing. (The sub will be accessing the stack (or its PAD if you like), and the op-stream for sure, and possibly a few hot globals.)

The IL is going to be CISC-ish - so treat it like an x86 where you operate on things where-they-are (e.g. "on the stack"):

    add 4,BP[4]

rather than RISC where you:

    ld  BP[4],X
    add 4,X
    st  X,BP[4]

If "registers" are really memory, the extra "moves" of a RISC scheme are expensive. What we _really_ don't want is the worst of both worlds:

    push BP[4]
    push 4
    add
    pop  BP[4]

>i am thinking about writing a short pseudo code post about the N-tuple
>op codes and the register set design. the ideas are percolating in my
>brane.
>
>uri

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: NI> Well: NI> (a) I thought the plan was to design threads in from the begining this time. NI> (b) I maintain that cost is about the same as global variables anyway. i agree with (a) but is that always compiled in/enabled even in a single thread perl? is the code always thread enabled with the default being just one thread running? >> and the window base is not accounted for. you would need 2 >> indirections, the first to get the window base and the second to get the >> register in that window. NI> No - you keep the window base "handy" and don't keep re-fetching it, NI> same way you keep "program counter" and "stack pointer" "handy". NI> Getting NI>window[N] NI> is same cost as NI>next = *PC++; NI> My point is that to avoid keeping too-many things "handy" window NI> base and stack pointer should be the same (real machine) register. if we can control that. but i see issues too. i mentioned the idea of having $_ and other special vars and stuff would have their own PMC's in this register set. dan like the idea. that doesn't map well to a window as those vars may not change when you call subs. i just don't see register windows as useful at the VM level. >> i am just saying register windows don't seem to be any win for us >> and cost an extra indirection for each data access. my view is let >> the compiler keep track of the register usage and just do >> individual push/pops as needed when registers run out. NI> That makes sense if (and only if) virtual machine registers are real NI> machine registers. If virtual machine registers are in memory then NI> accessing them "on the stack" is just as efficient (perhaps more so) NI> than at some other "special" location. And it avoids need for NI> memory-to-memory moves to push/pop them when we do "spill". no, the idea is the VM compiler keeps track of IL register use for the purpose of code generating N-tuple op codes and their register arguments. 
this is a pure IL design thing and has nothing to do with machine registers. at this level, register windows don't win IMO.

i am thinking about writing a short pseudo code post about the N-tuple op codes and the register set design. the ideas are percolating in my brane.

uri
-- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
Uri Guttman <[EMAIL PROTECTED]> writes:
> NI> i.e.
> NI> R4 = frame[N]
> NI> is same cost as
> NI> R4 = per_thread[N]
> NI> and about the same as
> NI> extern REGISTER GlobalRegs4
> NI> R4 = GlobalRegs4;
>
>well, if there is no multithreading then you don't need the per_thread
>lookup.

Well:
(a) I thought the plan was to design threads in from the beginning this time.
(b) I maintain that cost is about the same as global variables anyway.

The case for (b) is as follows: on RISC hardware

    R4 = SomeGlobal;

becomes two instructions:

    loadhigh SomeGlobal.high,rp
    ld       rp(SomeGlobal.low),R4

The C compiler will try and factor out the loadhigh instruction, leaving you with an indexed load. In most cases

    ld rp(RegBase.low+4),R4

is just as valid and takes the same number of cycles, and there is normally a form like

    ld rp(rn),R4

which allows "index" by a variable amount. On CISC machines, either there is an invisible RISC (e.g. Pentium) which behaves as above, or you get something akin to the PDP-11 where indirection reads a literal address via the "program counter":

    move [pc+n],r4

In such cases

    move [regbase+n],r4

is going to be just as fast - the issue is the need for a (real machine) register to hold 'regbase'.

>and the window base is not accounted for. you would need 2
>indirections, the first to get the window base and the second to get the
>register in that window.

No - you keep the window base "handy" and don't keep re-fetching it, same way you keep "program counter" and "stack pointer" "handy". Getting

    window[N]

is same cost as

    next = *PC++;

My point is that to avoid keeping too many things "handy", window base and stack pointer should be the same (real machine) register.

>i am just saying register windows don't seem to
>be any win for us and cost an extra indirection for each data access. my
>view is let the compiler keep track of the register usage and just do
>individual push/pops as needed when registers run out.
That makes sense if (and only if) virtual machine registers are real machine registers. If virtual machine registers are in memory then accessing them "on the stack" is just as efficient (perhaps more so) than at some other "special" location. And it avoids need for memory-to-memory moves to push/pop them when we do "spill". -- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: NI> Uri Guttman <[EMAIL PROTECTED]> writes: >> but indexing directly into a stack frame is effectively a register >> window. the problem is that you need to do an indirection through the >> window base for every access and that is slow in software (but free in >> hardware). NI> It isn't free in hardware either, but cost may be lower. Modern NI> machines should be able to schedule indirection fairly NI> efficiently. But I would contend we are going to have at least NI> one index operation anyway - if only from the "thread" pointer, or NI> "global base" - so with careful design so that "registers" are at NI> right "offset" from the "base" we can subsume the register lookup NI> index into that. NI> i.e. NI> R4 = frame[N] NI> is same cost as NI> R4 = per_thread[N] NI> and about the same as NI> extern REGISTER GlobalRegs4 NI> R4 = GlobalRegs4; well, if there is no multithreading then you don't need the per_thread lookup. and the window base is not accounted for. you would need 2 indirections, the first to get the window base and the second to get the register in that window. i am just saying register windows don't seem to be any win for us and cost an extra indirection for each data access. my view is let the compiler keep track of the register usage and just do individual push/pops as needed when registers run out. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
Alan Burlison <[EMAIL PROTECTED]> writes:
>> 1. When you call deep enough to fall off the end of the large register
>>file an expensive "system call" is needed to save some registers
>>at the other end to memory and "wrap", and then again when you
>>come "back" to the now-in-memory registers.
>
>Not a system call but a trap - they aren't the same thing (pedant mode off
>;-). The register spill trap handler copies the relevant registers onto the
>stack - each stack frame has space allocated for this.

Pedant mode accepted - and I concur. But a trap handler is still significant overhead compared to just doing the moves (scheduled) inline as part of normal code. So register windows win if you stay in bounds but lose quite seriously if you have "active" deep calls.

(My own style is to write small functions rather than #define or inline, for cache reasons - this has tended to make the above show up. I am delighted to say that _modern_ (Sun) SPARCs have deep enough windows even for me - but the SPARCstation 1+ and some of the low-cost CPUs didn't.)

>
>Alan Burlison
-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
Uri Guttman <[EMAIL PROTECTED]> writes: >> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: > > NI> "We" need to decide where a perl6 sub's local variables are going > NI> to live (in the recursive case) - if we need a "stack" anyway it > NI> may make sense for VM to have ways of indexing the local "frame" > NI> rather than having "global" registers (set per thread by the way?) > >i made that thread point too in my long reply to dan. > >but indexing directly into a stack frame is effectively a register >window. the problem is that you need to do an indirection through the >window base for every access and that is slow in software (but free in >hardware). It isn't free in hardware either, but cost may be lower. Modern machines should be able to schedule indirection fairly efficiently. But I would contend we are going to have at least one index operation anyway - if only from the "thread" pointer, or "global base" - so with careful design so that "registers" are at right "offset" from the "base" we can subsume the register lookup index into that. i.e. R4 = frame[N] is same cost as R4 = per_thread[N] and about the same as extern REGISTER GlobalRegs4 R4 = GlobalRegs4; > -- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
> "AB" == Alan Burlison <[EMAIL PROTECTED]> writes: AB> Uri Guttman wrote: >> but indexing directly into a stack frame is effectively a register >> window. the problem is that you need to do an indirection through the >> window base for every access and that is slow in software (but free in >> hardware). AB> Not quite - a register window system has real registers, it's just AB> at some point they may end up in memory. That's not the same as AB> indexing into memory on every access. what i meant is that the indirection to access a given windowed register using the window base is free (speed wise) in hardware, but doing the same thing in software costs. but comparing the hardware design with a pure software design is not too useful. the costs and goals are so different. some conceptual analogies are useful but saying register windows are fast in hardware (BTW that is risc stuff) but not a speed win in software (where parrot is leaning towards looking cisc-like) is not relevant. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
Uri Guttman wrote: > but indexing directly into a stack frame is effectively a register > window. the problem is that you need to do an indirection through the > window base for every access and that is slow in software (but free in > hardware). Not quite - a register window system has real registers, it's just at some point they may end up in memory. That's not the same as indexing into memory on every access. Alan Burlison
Re: Stacks & registers
Nick Ing-Simmons wrote:
> >That comment reminds me of how the register file is implemented in
> >a sun sparc. They have a large register file, but only some are accessable
> >at any given time, say 16.
>
> 32 IIRC but principle is correct.

8 global registers, 8 out registers, 8 local registers and 8 in registers. Some are set aside for special purposes, e.g. %r14 is the stack pointer, %r15 is the called subroutine return addr etc. Effectively you have 6 'in' and 6 'out' registers. Extra arguments above 6 are passed in the caller's stack frame.

> 1. When you call deep enough to fall off the end of the large register
>file an expensive "system call" is needed to save some registers
>at the other end to memory and "wrap", and then again when you
>come "back" to the now-in-memory registers.

Not a system call but a trap - they aren't the same thing (pedant mode off ;-). The register spill trap handler copies the relevant registers onto the stack - each stack frame has space allocated for this.

Alan Burlison
Re: Stacks & registers
> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes: NI> "We" need to decide where a perl6 sub's local variables are going NI> to live (in the recursive case) - if we need a "stack" anyway it NI> may make sense for VM to have ways of indexing the local "frame" NI> rather than having "global" registers (set per thread by the way?) i made that thread point too in my long reply to dan. but indexing directly into a stack frame is effectively a register window. the problem is that you need to do an indirection through the window base for every access and that is slow in software (but free in hardware). NI> What I do agree with is that NI>push a NI>push b NI>add NI>pop r NI> is lousy way to code r = a + b - too much pointless copying. NI> We want NI>add #a,#b,#r NI> were #a is a small number indexing into "somewhere" where a is stored. and that is an N-tuple opcode with a fixed size register set. register windowing is not needed here, as pushing/popping as needed works well. the win of a stack system is a cleaner and simpler op code api with fewer op codes. but that doesn't gain us much as we have so many complex ops that compiling them to simpler ones and all the extra runtime is a loss. BTW, an N-tuple op code is easily converted to/from an asm form which is a feature that has been mentioned as desired. you can look at higher level CISC instruction sets (e.g. vax) for examples. hell, the vax had a polynomial op code and high level string ops! uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> [I'm taking these all at once, so bear with me] DS> Simon wrote: >> Register based. Untyped registers; I'm hoping that the vtable stuff can be >> sufficiently optimized that there'll be no major win in storing multiple >> copies of a PMC's data in different types knocking around. DS> Maybe, but I'm thinking that adding two cached integers together (the DS> integer piece of the register being essentially a cache) will be faster DS> than two calls to get_integer. that makes sense for common stuff like integer math and maybe even basic string stuff like join, interpolation and (perl5) .. we could have a string accumulator (shades of pdp8!) for building up strings. the problem occurs when you have nested interpolation: "foo$bar{qq{$key1$key2}}" the compiler would have to convert that to inside out execution. idea: there still is a special string accumulator but it is a pmc and that is passed around instead of the full register. then a new pmc is used to replace it. this should lower the copying of strings as they get passed around. DS> Also, by having integer bits of registers, it means we can avoid DS> making a full PMC for integer constants, or for integers (or DS> floats, or strings) the interpreter wants to deal with DS> internally. (scratch ints/floats/strings/whatever) yep. >> I'm unclear what purpose having virtual resisters for ints and nums serves: >> can't we just put those in real registers? Or perhaps I'm missing the pt? DS> The point of having int/num/string registers is for cache and DS> convenience reasons. If Parrot's dealing with an integer and it DS> *knows* that it's an integer (because, for example, we're dealing DS> with some sort of internal counter, or Python code) there seems DS> little reason to promote it to a full PMC. hmm, sort of like your special integer scratch registers too. 
but since we are coding in C we can't force anything to real hardware registers, we have to let the (g)cc handle that for us. DS> Hong wrote: >> every bytecode inside one method always operates on the same stack >> depth, therefore we can just treat the "locals + stack" as a flat >> register file. A single pass can translate stack based code into >> register based code. DS> Fair enough, but then registers are just oddly named stack DS> entities, which makes their being on the stack a matter of DS> implementation. (Whether that means anything useful is another DS> matter) as most compiler books will note, all/most IL's can be easily translated to one another. our goal is to pick one that works best for our needs and then translate to others as desired (like jvm or .net). DS> Uri wrote: >> first question: is this for the intermediate language or the back end >> VM? they don't have to be the same. DS> Bytecode. I want the bytecode to be what the Parrot VM eats--it DS> will be Parrot's assembly language. but that still doesn't answer my question. do we directly generate bytecode or an IL and then generate bytecode from that? bytecode is not an IL but a storable form of one. you can create bytecode for a stack or register or N-tuple machine. i am saying that we can choose our IL independently of the bytecode design. bytecode is harder to optimize and manipulate than most IL's. so i say we compile to an IL which can be interpreted directly and which can also be stored/retrieved in a bytecode form. but the bytecode doesn't have to be directly executable. it may need to be expanded back into the IL (which could be very fast with a simple mapping). >> since our goal is to support the polymorphic front/back end design, >> the intermediate language (IL) should be easy to generate and easy >> to compile to various targets. also it needs to have other features >> like being modifiable for optimization, storable on disk, >> debuggable (for dev types), etc. 
DS> This is the one spot it falls down. Though arguably translating DS> bytecodes dealing with PMCs will be more work than bytecodes that DS> deal with ints or strings. that was my previous point. we have to decide the tradeoff between speed, ease of manipulation of the IL and translations (if any) to/from bytecode. DS> If someone thinks the whole "multiple views per register" thing is a bad DS> idea, this would be a good line of attack to start with. :) i think it is ok but there doesn't need to be a V flag for each type. any given N-tuple knows knows the return value of any N-tuple it references and so fetches it directly. just like in a real compiler, you know that memory or register is an int and don't check it each time. but it will change over time like our registers will. a register type is tracked at compile time only and not looked at during runtime. >> the PL/I compiler suite i hacked on used that style of IL. but is >> allocated as many named temps as it needed by using N
Re: Stacks & registers
[I'm taking these all at once, so bear with me] Simon wrote: >Register based. Untyped registers; I'm hoping that the vtable stuff can be >sufficiently optimized that there'll be no major win in storing multiple >copies of a PMC's data in different types knocking around. Maybe, but I'm thinking that adding two cached integers together (the integer piece of the register being essentially a cache) will be faster than two calls to get_integer. Also, by having integer bits of registers, it means we can avoid making a full PMC for integer constants, or for integers (or floats, or strings) the interpreter wants to deal with internally. (scratch ints/floats/strings/whatever) >For those yet to be convinced by the benefits of registers over stacks, try >grokking in fullness what op scratchpads are about. Ooh look, registers. Or just dig through the code for pp_split. (While it might not convince you of the joys of registers, it will make your head hurt sufficiently to convince you that Things Must Be Done. Preferably things involving lots of analgesics... :) Dave Mitchell wrote: >For low-level assembly, registers are important because using them >avoids memory accesses. For high-level perl 6 bytecode, I suppose the >analogy would be having a pool of scratch PMCs that dont have >to be continually alloced+GCed. Whether we can avoid the allocs depends >on the fine detail of how how PMCs are allocated in general. If the allocs >can't be avoided, then I think we might as well stick with a stack >architecture. We'll have a stack, certainly, or something like it. If the register file is limited in size (and it'll have to be, reality being what it is) there'll have to be a scratch spot to stick things we aren't using. We could call 'em temporaries, or named temporaries, or whatever, but that's just a roundabout way of saying register. 
:) (Or vice versa, I suppose, depending on which direction you're coming from) >I'm unclear what purpose having virtual resisters for ints and nums serves: >can't we just put those in real registers? Or perhaps I'm missing the pt? The point of having int/num/string registers is for cache and convenience reasons. If Parrot's dealing with an integer and it *knows* that it's an integer (because, for example, we're dealing with some sort of internal counter, or Python code) there seems little reason to promote it to a full PMC. Hong wrote: >I think "stack based =~ register based". If we don't have Java-like "jsr" >and "ret", We might, though. Probably will, as it will make translation to JVM and native machine code easier. (I think. Could be wrong) > every bytecode inside one method always operates on the same >stack >depth, therefore we can just treat the "locals + stack" as a flat register >file. A single pass can translate stack based code into register based code. Fair enough, but then registers are just oddly named stack entities, which makes their being on the stack a matter of implementation. (Whether that means anything useful is another matter) Uri wrote: >first question: is this for the intermediate language or the back end >VM? they don't have to be the same. Bytecode. I want the bytecode to be what the Parrot VM eats--it will be Parrot's assembly language. >since our goal is to support the polymorphic front/back end design, the >intermediate language (IL) should be easy to generate and easy to >compile to various targets. also it needs to have other features like >being modifiable for optimization, storable on disk, debuggable (for dev >types), etc. This is the one spot it falls down. Though arguably translating bytecodes dealing with PMCs will be more work than bytecodes that deal with ints or strings. If someone thinks the whole "multiple views per register" thing is a bad idea, this would be a good line of attack to start with. 
:) >the PL/I compiler suite i hacked on used that style of IL. but is >allocated as many named temps as it needed by using N-tuples for each op >code. each OP would leave its result in a named temp which could be >referred to by later OPs. i think they were generic (as PL/I could mung >many more types of data than perl) but the compiler knew what held what >so that wasn't an issue. That's still reasonably stack-based, and assumes a single return value per sub, which we don't have. (Unless we consider a list as a single value, which is reasonable) >the problem with a fixed number of registers is register overflow. this >is true in all register based machines and VM's and is a pain. if we go >with a pure named temp architecture, you don't need to worry about >that. but it is slower to access and manage named temps than a fixed set >of registers (or even just a stack which is like a single reg machine). From all the literature I've dug through, a largish register set (like 32 or 64) gets around 90+% of the overflow issues, and we can still have register push/pop opcodes. >i have some experience with N-tuple
Re: Stacks & registers
Graham Barr <[EMAIL PROTECTED]> writes:
>On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote:
>> I think "stack based =~ register based". If we don't have Java-like "jsr"
>
>That comment reminds me of how the register file is implemented in
>a sun sparc. They have a large register file, but only some are accessable
>at any given time, say 16.

32 IIRC but principle is correct.

>When you do a sub call you place your
>arguments in the high registers, say 4, and shift the window pointer
>by 12 (in this case). What was r12-r15 now becomes r0-r3. On return
>the result is placed into r0-r3 which are then available to the
>caller as r12-r15.
>
>This allows very efficient argument passing without having to save
>registers to a stack and restor them later.

That cost can be over-played - most compilers need "scratch" registers whose values don't need to be saved. If your RISC is set up to use arg registers as scratch then most of the save/restores can be eliminated, and others can be hidden in delay slots or super-scalar executed in parallel with other operations.

The problems with the SPARC scheme are:

1. When you call deep enough to fall off the end of the large register file an expensive "system call" is needed to save some registers at the other end to memory and "wrap", and then again when you come "back" to the now-in-memory registers.

2. The large file has a large decode which is in the "logical" critical path - so all kinds of bypass tricks have to be played to get speed back.

Neither is too bad for a virtual machine though. "We" need to decide where a perl6 sub's local variables are going to live (in the recursive case) - if we need a "stack" anyway it may make sense for the VM to have ways of indexing the local "frame" rather than having "global" registers (set per thread by the way?)

What I do agree with is that

    push a
    push b
    add
    pop r

is lousy way to code r = a + b - too much pointless copying.
We want

    add #a,#b,#r

where #a is a small number indexing into "somewhere" where a is stored.

>
>Graham.
-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
On Wed, May 23, 2001 at 01:36:14PM -0400, Bryan C. Warnock wrote: > You could certainly call PL_undef and the ilk (in its new > PMC form) a dedicated register Which is, incidentally, exactly what happens in disassembled Perl 5 bytecode. -- "Why waste negative entropy on comments, when you could use the same entropy to create bugs instead?" -- Steve Elias
Re: Stacks & registers
> "GB" == Graham Barr <[EMAIL PROTECTED]> writes: GB> On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote: >> I think "stack based =~ register based". If we don't have Java-like "jsr" GB> That comment reminds me of how the register file is implemented in GB> a sun sparc. They have a large register file, but only some are accessable GB> at any given time, say 16. When you do a sub call you place your GB> arguments in the high registers, say 4, and shift the window pointer GB> by 12 (in this case). What was r12-r15 now becomes r0-r3. On return GB> the result is placed into r0-r3 which are then available to the GB> caller as r12-r15. GB> This allows very efficient argument passing without having to save GB> registers to a stack and restor them later. there have been arguments over whether register windows win or not. on the plus side they make passing sub args very efficient (if you can fit them all into the window). on the minus side, you have to do major chunks of swapping registers to/from stack. the window size is a big issue too and some think sparc has too small a window and too few total registers. and besides register windows is a hardware only solution (they just mung the virtual base in a special register). i have not heard of it being used in pure software as the extra costs of having many registers or a deep stack in software is nothing while in hardware is has a real (and heavy) cost. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks & registers
On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote:
> I think "stack based =~ register based". If we don't have Java-like "jsr"

That comment reminds me of how the register file is implemented in a sun sparc. They have a large register file, but only some are accessible at any given time, say 16. When you do a sub call you place your arguments in the high registers, say 4, and shift the window pointer by 12 (in this case). What was r12-r15 now becomes r0-r3. On return the result is placed into r0-r3 which are then available to the caller as r12-r15.

This allows very efficient argument passing without having to save registers to a stack and restore them later.

Graham.
Re: Stacks & registers
On Wed, May 23, 2001 at 06:06:26PM +0100, Simon Cozens wrote: > On Wed, May 23, 2001 at 12:59:01PM -0400, Dan Sugalski wrote: > > Should Parrot be a register or stack-based system, and if a register-based > > one, should we go with typed registers? > > Register based. Untyped registers; I'm hoping that the vtable stuff can be > sufficiently optimized that there'll be no major win in storing multiple > copies of a PMC's data in different types knocking around. I would agree with that too. Graham.
Re: Stacks & registers
At 12:59 PM 05-23-2001 -0400, Dan Sugalski wrote:
>Okay, folks, here's the current conundrum:
>
>Should Parrot be a register or stack-based system, and if a register-based
>one, should we go with typed registers?
>
>My current thoughts are this:
>
>We have a set of N registers. They're all linked. Nothing implicitly sets
>values in any of the registers (if you want an integer value, you need to
>make one). Each register has a set of validity markers for each type (int,
>float, string, PMC) that may or may not be bits. We have a stack of sorts
>that we can push the registers on to if we need.

In the section I snipped, you described "linked" registers in relation to multiple sets of typed registers, with linking meaning that IntR1 would have the same value as FloatR1, etc. What do you mean by "linked" here, with each register being (as I read it) dynamically typed? Is N fixed, or can we have a different number of visible registers at a time? When we push registers onto the stack, do we push them individually, or as a set?
I mean, can we get away with something like (assuming C++-style overloading on "Register"):

--
Register rFile[MAXREGSTACK][N];
int rDepth = 0;

ParrotOp add(int addend, int addor, int sum)
{
    rFile[rDepth][sum] = rFile[rDepth][addor] + rFile[rDepth][addend];
}

ParrotOp pushrframe(void)
{
    if (rDepth == MAXREGSTACK)
        die("Register Stack exceeded");
    rDepth++;
}
--

Or we could go the SPARC register window route: (Note: SPARC register windows overlap: they have three sets of 8 registers, and when a push happens, the old 3rd set becomes the new 1st set, allowing the caller and callee to share a set of 8 registers)

--
Register *rFile = NULL;
Register *rFrame = NULL;
int rFileSize = 0;

ParrotOp add(unsigned int addend, unsigned int addor, unsigned int sum)
{
    assert(sum < 3*N);
    assert(addend < 3*N);
    assert(addor < 3*N);
    rFrame[sum] = rFrame[addend] + rFrame[addor];
}

ParrotOp pushRframe(void)
{
    if ((rFrame - rFile) >= rFileSize - 5*N) {   /* running out of room */
        if (resize(rFile, rFileSize*2)) {
            rFileSize *= 2;
        } else {
            die("Register Stack Frame out of memory");
        }
    }
    rFrame += 2*N;
}
--

>I'm definitely feeling unsure about this, so feel free (please!) to wade
>in with comments, criticisms, or personal attacks... :)
>
> Dan
>
>--"it's like this"---
>Dan Sugalski even samurai
>[EMAIL PROTECTED] have teddy bears and even
> teddy bears get drunk
Re: Stacks & registers
On Wednesday 23 May 2001 12:59, Dan Sugalski wrote:
> Okay, folks, here's the current conundrum:
>
> Should Parrot be a register or stack-based system, and if a register-based
> one, should we go with typed registers?
>
> My personal preference here is to make Parrot a register based system. I
> can wrap my head around that a lot easier than a stack system, it's
> something I'm comfortable with, there's lots of literature on optimizing
> this sort of system, and I just generally like it better. (I cut my
> programming teeth on 6502s... so sue me :) Stack based systems have a
> certain appeal--they're simpler generally, which is fine. I'm not too
> worried about simpler as much as I am faster, and I think we can get
> faster out of registers. (Or, if you prefer, "named temporaries" instead
> of registers. Whatever)
>
> If (or when, I suppose, barring a Really Good Counter-Argument) we go the
> register route, then, should we have typed registers like most CPUs do? A
> set of PMC pointer registers, a set of integer registers, a set of
> floating-point registers, a set of string registers? And if we do go with
> typed registers, should they be linked together? (So that the int in
> Iregister 1 matches the integer value of the PMC in Pregister 1, and the
> float value in Fregister 1?) And if they're linked (And should they all be
> linked, or only some of them) should we guarantee consistency, or should
> the bytecode explicitly make various registers valid or invalid?

A little backwards my answer will be.

I'm fairly sure we won't need linked registers. The PMC's vtable implementation should already take care of the int/float/string cross-referencing - by having the linked registers, you'd really only be creating a parallel structure to wade through. Worst case, you're doing both - using the vtable to create the values for the other registers. (After checking the register first to see if it were valid?) That doesn't sound like a win at all.
You could bypass the PMC's vtable and populate the registers directly, but then why have the PMC? I don't see any reason for the linking.

Now, assuming I'm not way out in the left-field stands above, I don't see the necessity for typed registers (at least as described). (I mean, one of the major reasons for typed registers is hardware optimization, which we can't really borrow. We're emulating that win with the vtables.)

However, another common use for registers is to save on allocation - like a hardware zero register. You could certainly call PL_undef and the ilk (in its new PMC form) a dedicated register, which makes it a sort of type. But I think you still come out ahead if you simply assume that every register is a PMC.

The only use I could think of would be for constants that you know aren't going to need to be converted from one form to another. But then you need to handle the two paths in the rest of your vtable entries. Unless you make everything run through the registers. You're not going to make opcodes {shudder} dependent on implicit registers, are you? :-)

--
Bryan C. Warnock
[EMAIL PROTECTED]
RE: Stacks & registers
> Register based. Untyped registers; I'm hoping that the vtable stuff can be
> sufficiently optimized that there'll be no major win in storing multiple
> copies of a PMC's data in different types knocking around.
>
> For those yet to be convinced by the benefits of registers over stacks, try
> grokking in fullness what op scratchpads are about. Ooh look, registers.

I think "stack based =~ register based". If we don't have Java-like "jsr" and "ret", every bytecode position inside one method always sees the same stack depth, therefore we can just treat the "locals + stack" as a flat register file. A single pass can translate stack-based code into register-based code. For example:

push local #3;
    => move #(max_local + opcode_stack_depth), #3

push local #3; push local #4; add; pop local #5;
    => add #5, #3, #4

push local #3; push local #4; call foo; pop #6;
    => call_2 #6, #3, #4

As long as the stack-based system is carefully designed, we can easily add a linear-cost translation step to convert it into register-based bytecode, and run it faster.

Hong
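Hong's single-pass translation can be sketched concretely. The toy opcode names (`push_local`, `add`, `pop_local`) and the tuple encoding below are invented for illustration; the sketch tracks which register each stack slot aliases, and a one-instruction peephole retargets a result directly into a local, reproducing Hong's fused `add #5, #3, #4`:

```python
# Toy one-pass stack->register translator in Hong's style: without
# jsr/ret the stack depth at each bytecode is statically known, so
# stack slot d can map to register max_local + d, and pushes of locals
# become pure aliases rather than moves.

def translate(code, max_local):
    stack = []   # each entry: the register currently holding that slot
    out = []     # register-based output, one list per instruction
    for op in code:
        if op[0] == "push_local":        # slot aliases the local's register
            stack.append(op[1])
        elif op[0] == "add":             # pop two sources, push a temp result
            b = stack.pop()
            a = stack.pop()
            dest = max_local + len(stack)
            out.append(["add", dest, a, b])
            stack.append(dest)
        elif op[0] == "pop_local":
            src = stack.pop()
            # peephole: if the previous op just produced src, retarget it
            if out and out[-1][1] == src:
                out[-1][1] = op[1]
            else:
                out.append(["move", op[1], src])
    return out

prog = [("push_local", 3), ("push_local", 4), ("add",), ("pop_local", 5)]
assert translate(prog, max_local=10) == [["add", 5, 3, 4]]   # add #5, #3, #4
```

The whole pass is one linear scan over the bytecode, matching Hong's "linear-cost translation step" claim.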
Re: Stacks & registers
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

DS> Should Parrot be a register or stack-based system, and if a
DS> register-based one, should we go with typed registers?

first question: is this for the intermediate language or the back end VM? they don't have to be the same. since our goal is to support the polymorphic front/back end design, the intermediate language (IL) should be easy to generate and easy to compile to various targets. also it needs to have other features like being modifiable for optimization, storable on disk, debuggable (for dev types), etc. this IL could be the VM we use in perl itself and it can also be compiled to other VM's or c, etc. also it could be processed/compiled to a faster perl VM for execution.

DS> My personal preference here is to make Parrot a register based
DS> system. I can wrap my head around that a lot easier than a stack
DS> system, it's something I'm comfortable with, there's lots of
DS> literature on optimizing this sort of system, and I just generally
DS> like it better. (I cut my programming teeth on 6502s... so sue me
DS> :) Stack based systems have a certain appeal--they're simpler
DS> generally, which is fine. I'm not too worried about simpler as
DS> much as I am faster, and I think we can get faster out of
DS> registers. (Or, if you prefer, "named temporaries" instead of
DS> registers. Whatever)

the PL/I compiler suite i hacked on used that style of IL, but it allocated as many named temps as it needed by using N-tuples for each op code. each OP would leave its result in a named temp which could be referred to by later OPs. i think they were generic (as PL/I could mung many more types of data than perl) but the compiler knew what held what, so that wasn't an issue.

DS> If (or when, I suppose, barring a Really Good Counter-Argument) we
DS> go the register route, then, should we have typed registers like
DS> most CPUs do? A set of PMC pointer registers, a set of integer
DS> registers, a set of floating-point registers, a set of string
DS> registers? And if we do go with typed registers, should they be
DS> linked together? (So that the int in Iregister 1 matches the
DS> integer value of the PMC in Pregister 1, and the float value in
DS> Fregister 1?) And if they're linked (And should they all be
DS> linked, or only some of them) should we guarantee consistency, or
DS> should the bytecode explicitly make various registers valid or
DS> invalid?

the problem with a fixed number of registers is register overflow. this is true in all register based machines and VM's and is a pain. if we go with a pure named temp architecture, you don't need to worry about that. but it is slower to access and manage named temps than a fixed set of registers (or even just a stack, which is like a single-register machine).

DS> My current thoughts are this:

DS> We have a set of N registers. They're all linked. Nothing
DS> implicitly sets values in any of the registers (if you want an
DS> integer value, you need to make one). Each register has a set of
DS> validity markers for each type (int, float, string, PMC) that may
DS> or may not be bits. We have a stack of sorts that we can push the
DS> registers on to if we need.

that is a good start. the stack can handle the overflow, but that means more complexity in the compiler to deal with it.

let's do some study of various IL designs (including both compiled and interpreted) and see their wins and losses. i think pure stack based is out as it is too limiting. i have some experience with N-tuples and it does work well for compiling, as you don't care about managing registers at run time then. you just keep a fat namespace table at compile time and throw it out later. we have a compile and run phase, so managing a large number of N-tuples may be slow running but easy to generate and manipulate. that is a typical tradeoff.

so my current best idea is a fixed-size set of active N-tuples/registers (any number larger than most long statements would work, say 100?) as the registers. each N-tuple has its op code and argument register indexes and which register it stores its result in. we have a stack for the overflow handling, and that is the hardest part. we should have a small set of special registers to handle events, booleans, the PC, signals, etc.

this makes compiling and optimization very easy. we just need to manage the lifetime of registers for the N-tuples and deal with the (hopefully) rare overflow. runtime interpretation is also fairly simple (not as easy as a stack, but not too bad) with decent speed. code generation to another VM or storing to disk should be fairly straightforward.

DS> I'm definitely feeling unsure about this, so feel free (please!)
DS> to wade in with comments, criticisms, or personal attacks... :)

you are a doofus. but a smart one. :)

uri

--
Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
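Uri's N-tuples are essentially three-address code: each tuple carries an op code, its argument register indexes, and the register it stores its result in. A minimal interpreter sketch (Python for illustration; the op names and tuple layout are invented, not any proposed Parrot encoding):

```python
# Toy N-tuple IL in Uri's style: each tuple is (opcode, result, args...).
# Register numbers are just indexes the compiler handed out; at run time
# the interpreter only reads and writes slots, with no allocation.

def run(tuples, nregs):
    regs = [0] * nregs
    for op, dest, *args in tuples:
        if op == "const":                  # ("const", dest, literal)
            regs[dest] = args[0]
        elif op == "add":                  # ("add", dest, a, b)
            regs[dest] = regs[args[0]] + regs[args[1]]
        elif op == "mul":                  # ("mul", dest, a, b)
            regs[dest] = regs[args[0]] * regs[args[1]]
    return regs

# (3 + 4) * 2 compiled to N-tuples: t0=3, t1=4, t2=t0+t1, t3=2, t4=t2*t3
il = [("const", 0, 3), ("const", 1, 4), ("add", 2, 0, 1),
      ("const", 3, 2), ("mul", 4, 2, 3)]
assert run(il, 8)[4] == 14
```

The "fat namespace table" Uri mentions lives entirely in the compiler: it maps temps to the small integer indexes above and is thrown away before run time, which is why interpretation stays simple.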
Re: Stacks & registers
> Okay, folks, here's the current conundrum:
>
> Should Parrot be a register or stack-based system, and if a register-based
> one, should we go with typed registers?

For low-level assembly, registers are important because using them avoids memory accesses. For high-level perl 6 bytecode, I suppose the analogy would be having a pool of scratch PMCs that don't have to be continually alloced+GCed. Whether we can avoid the allocs depends on the fine detail of how PMCs are allocated in general. If the allocs can't be avoided, then I think we might as well stick with a stack architecture.

I'm unclear what purpose having virtual registers for ints and nums serves: can't we just put those in real registers? Or perhaps I'm missing the point?

Dave "even unclearer about this than Dan" M.
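Dave's "pool of scratch PMCs" can be illustrated with a simple free list, so an op grabs a temporary and releases it instead of allocating and GCing one per operation. Everything here (the `PMC` class, `grab`/`release` names) is invented for the sketch, not Parrot's actual API:

```python
# Illustrative free-list pool of scratch PMCs: temporaries are recycled,
# so steady-state op execution does no allocation at all.

class PMC:
    def __init__(self):
        self.value = None

class ScratchPool:
    def __init__(self, size):
        self.free = [PMC() for _ in range(size)]
        self.allocated = 0          # fallback allocations, for the demo

    def grab(self):
        if self.free:
            return self.free.pop()
        self.allocated += 1         # pool exhausted: fall back to alloc
        return PMC()

    def release(self, pmc):
        pmc.value = None            # scrub so stale data can't leak out
        self.free.append(pmc)

pool = ScratchPool(4)
for _ in range(1000):               # 1000 "ops", zero fallback allocations
    t = pool.grab()
    t.value = 42
    pool.release(t)
assert pool.allocated == 0
```

This only pays off if, as Dave says, the op stream's peak number of live temporaries is small and bounded; otherwise `grab` degenerates into the very allocation the pool was meant to avoid.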
Re: Stacks & registers
On Wed, May 23, 2001 at 12:59:01PM -0400, Dan Sugalski wrote:
> Should Parrot be a register or stack-based system, and if a register-based
> one, should we go with typed registers?

Register based. Untyped registers; I'm hoping that the vtable stuff can be sufficiently optimized that there'll be no major win in storing multiple copies of a PMC's data in different types knocking around.

For those yet to be convinced by the benefits of registers over stacks, try grokking in fullness what op scratchpads are about. Ooh look, registers.

--
> I'm a person, not a piece of property. Happily, I'm both!
- Lionel and Stephen Harris.
Stacks & registers
Okay, folks, here's the current conundrum:

Should Parrot be a register or stack-based system, and if a register-based one, should we go with typed registers?

My personal preference here is to make Parrot a register based system. I can wrap my head around that a lot easier than a stack system, it's something I'm comfortable with, there's lots of literature on optimizing this sort of system, and I just generally like it better. (I cut my programming teeth on 6502s... so sue me :) Stack based systems have a certain appeal--they're simpler generally, which is fine. I'm not too worried about simpler as much as I am faster, and I think we can get faster out of registers. (Or, if you prefer, "named temporaries" instead of registers. Whatever)

If (or when, I suppose, barring a Really Good Counter-Argument) we go the register route, then, should we have typed registers like most CPUs do? A set of PMC pointer registers, a set of integer registers, a set of floating-point registers, a set of string registers? And if we do go with typed registers, should they be linked together? (So that the int in Iregister 1 matches the integer value of the PMC in Pregister 1, and the float value in Fregister 1?) And if they're linked (And should they all be linked, or only some of them) should we guarantee consistency, or should the bytecode explicitly make various registers valid or invalid?

My current thoughts are this:

We have a set of N registers. They're all linked. Nothing implicitly sets values in any of the registers (if you want an integer value, you need to make one). Each register has a set of validity markers for each type (int, float, string, PMC) that may or may not be bits. We have a stack of sorts that we can push the registers on to if we need.

I'm definitely feeling unsure about this, so feel free (please!) to wade in with comments, criticisms, or personal attacks...
:)

Dan

--"it's like this"---
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even
 teddy bears get drunk
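Dan's linked registers with validity markers can be sketched as one register carrying a slot per type plus validity bits: writing one view invalidates the others, and reading an invalid view converts lazily and marks it valid. A minimal Python sketch with all names invented (the lazy conversion stands in for the vtable machinery discussed elsewhere in the thread):

```python
# Sketch of one "linked" register: one slot per type plus validity
# flags. Setting one view invalidates the rest; reading an invalid
# view converts on demand from a valid one and caches the result.

class LinkedReg:
    def __init__(self):
        self.slots = {"int": 0, "float": 0.0, "str": ""}
        self.valid = {"int": False, "float": False, "str": False}

    def set(self, kind, value):
        self.slots[kind] = value
        for k in self.valid:            # only the written view is valid now
            self.valid[k] = (k == kind)

    def get(self, kind):
        if not self.valid[kind]:        # lazy conversion from a valid view
            src = next(k for k, v in self.valid.items() if v)
            caster = {"int": int, "float": float, "str": str}[kind]
            self.slots[kind] = caster(self.slots[src])
            self.valid[kind] = True
        return self.slots[kind]

r = LinkedReg()
r.set("int", 42)
assert r.get("float") == 42.0   # converted on demand, then cached
assert r.get("str") == "42"
```

This makes the trade-off in the thread concrete: the explicit validity bits buy cached conversions, at the cost of exactly the parallel structure Bryan argues the PMC vtable already provides.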