Dan Sugalski wrote:
> What sort of dispatch was your version using, and what sort was
> parrot using in your test?
Parrot used the standard function call dispatcher without bounds
checking. Kakapo used a threaded dispatcher. There's a pre-processing
phase that does byte code verification, because threading makes for
some outrageously unsafe code. Parrot and Kakapo should have very
similar mops when using the same dispatcher.

You all know what a Parrot "add" op looks like. Here's the Kakapo add
op:

    op_add:
        STORE(kvm_int32, pc[1]) = FETCH(kvm_int32, pc[2]) +
                                  FETCH(kvm_int32, pc[3]);
        pc += 4;
        NEXT_OP;

Ok, ok. You want to know what those macros do... ;)

    op_add:
        *(kvm_int32 *)(frame[pc[1].word.hi] + pc[1].word.lo) =
            *(const kvm_int32 *)(frame[pc[2].word.hi] + pc[2].word.lo) +
            *(const kvm_int32 *)(frame[pc[3].word.hi] + pc[3].word.lo);
        pc += 4;
        goto *(pc->i_addr);

I haven't counted derefs, but Parrot and Kakapo should be close. On
architectures with very slow word instructions, some code bloat to
store hi/lo offsets in native ints might be worth faster address
calculations.

> Ken Fox wrote:
> > One thing I learned is that it's not necessary (or
> > desirable) to do enter/exit scope ops.
>
> Don't forget that you'll need those for higher-level constructs. For
> example, this code:
>
>     {
>         my Dog $spot is color('brindle'):breed('welsh corgi');
>     }
>
> will need to call Dog's constructor and attribute setting code every
> time you enter that scope.

Definitely. I didn't say Kakapo doesn't have enter/exit scope
semantics -- it does. There's no byte code "enter scope" op though.
What happens is more declarative. There's a sync_scope guard op that
means "the VM must be in lexical scope X to properly run the following
code." If the VM is already in scope X, then it's a nop. If the VM is
in the parent of X, then it's an enter scope. If the VM is in a child
of X, then it's an exit scope. This makes it *very* easy for a
compiler to generate flow control instructions. For example:

    {
        my Dog $spot ...
        {
            my Cat $fluffy ...
            middle:
            $spot->chases($fluffy);
        }
    }

What happens when you "goto middle" depends on where you started.
sync_scope might have to create both the Dog and Cat scopes when code
jumps to the middle. Or code might already be in a sub-scope of Cat,
so sync_scope would just pop scopes until it gets back to Cat. This is
where sync_scope is very useful: it allows the compiler to say "this
is the environment I want here" and delegate to the VM the job of
making it happen.

> You also potentially need to allocate a new scope object every time
> you enter a scope so you can remember it properly if any closures
> are created.

Closures in Kakapo are simple. All it needs to do is:

  1. copy any current stack frames to the heap
  2. copy the display (array of frame pointers) to the heap
  3. save the pc

Step #1 can be optimized because the assembler will have a pretty good
idea which frames escape -- the run-time can scribble a note on the
scope definition if it finds one the assembler missed. Escaping frames
will just be allocated on the heap to begin with. This means that
taking a closure is almost as cheap as calling a subroutine. Calling a
closure is also almost as cheap as calling a subroutine because we
just swap in an entirely new frame display.

> How does this handle nested copies of a single scope? That's the
> spot a SS architecture needs to switch to indirect access from
> direct, otherwise you can only have a single instance of a
> particular scope active at any one time, and that won't work.

Calling a subroutine basically does this:

  1. pushes the previous return state on the stack
  2. sets the return state registers
  3. finds the deepest shared scope between the caller and the
     callee's parent
  4. pushes the non-shared frames onto the stack
  5. transfers control to the callee
  6. sync_scope at the callee creates any frames it needs

> I'm curious as to whether the current bytecode could be translated
> on load to something a SS interpreter could handle.
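The closure steps and the display swap described above can be sketched in C roughly like this. The names and sizes here (kvm_closure, take_closure, call_closure, NFRAMES, FRAMESIZE) are invented for illustration; they are not Kakapo's actual data structures:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NFRAMES   4   /* max lexical nesting depth in this toy */
#define FRAMESIZE 8   /* slots per frame */

typedef struct {
    int *display[NFRAMES];  /* array of frame pointers ("the display") */
    int  pc;                /* saved program counter */
} kvm_closure;

/* Taking a closure: copy the live frames to the heap, copy the
 * display, save the pc -- the three steps listed above. */
kvm_closure *take_closure(int *display[NFRAMES], int depth, int pc)
{
    kvm_closure *c = malloc(sizeof *c);
    for (int i = 0; i < NFRAMES; i++) {
        if (i < depth) {
            c->display[i] = malloc(FRAMESIZE * sizeof(int));
            memcpy(c->display[i], display[i], FRAMESIZE * sizeof(int));
        } else {
            c->display[i] = NULL;
        }
    }
    c->pc = pc;
    return c;
}

/* Calling the closure: swap in the saved frame display and restore
 * the pc.  No per-variable fixups -- one memcpy of pointers. */
void call_closure(const kvm_closure *c, int *display[NFRAMES], int *pc)
{
    memcpy(display, c->display, sizeof c->display);
    *pc = c->pc;
}
```

A real run-time would garbage-collect or reference-count the heap frames rather than leak them; the point of the sketch is only that capture is a couple of memcpys plus a saved pc, which is why it's nearly as cheap as a plain call.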
Never thought of that -- I figured the advantage of an SS machine is
that brain-dead compilers can still generate fast code. Taking a
really smart compiler that generates register-based code and then
translating its output to an SS machine seems like a losing scenario.
I think this is why storage-to-storage architectures have lost favor
-- today's compilers are just too smart. Possibly, with a software VM,
the memory pressure argument favoring registers isn't strong enough to
offset the disadvantage of requiring smart compilers.

I just put up the 0.2 version of Kakapo at

    <http://www.msen.com/~fox/Kakapo-0.2.tar.gz>

This version has the sync_scope instruction, threaded dispatch,
immediate mode operands, and a really crappy "rewrite" technique for
instruction selection.

One other thing that I discovered is how sensitive the VM is to
dereferences. Adding the immediate mode versions of "add" and "cmp"
gave me 10 more mops in the simple timing loop. I ran a simple
experiment with a version of "add" that looked like this:

    op_add:
        ++i;
        pc += 4;
        NEXT_OP;

The other ops in the timing loop were similarly changed. Result? A
whopping 230 mops! So, the moral of this story is that derefs hurt,
bad. There's a cross-over point somewhere where I-cache misses start
hurting worse than the derefs, but I bet we can have a *lot* of
specialized versions of ops before we hit that point.

I think the next version of Kakapo is going to experiment with
unification for instruction selection. This should allow lots of
special-case instructions in the VM without complicating the assembler
syntax. There's also going to be closures and non-local gotos in
there. I'd like to play around with Damian's yield syntax too -- I may
do yield first and then write the unification algorithm using it.

- Ken