A while back I wondered if a higher-level VM might be useful for implementing higher-level languages. I proposed a lexically scoped machine without registers or stacks. The response wasn't encouraging.
A quick tour through the library turned up a few machine designs that sounded very similar to what I was thinking. There's even a name for them: storage-to-storage architectures. The one I found the most literature on was CRISP. AT&T based the Hobbit chips on the CRISP design. CRISP is a RISC architecture without named registers. It maps the top of the stack to a large register file and provides a few addressing modes for accessing the stack. This hugely simplifies a compiler because it doesn't have to worry about register conventions -- in particular, it doesn't have to worry about saving and restoring register sets across function calls. The effect is something like custom-fit register windows.

One thing that makes these architectures particularly well suited to a software VM is that they never copy data. A software register architecture needs to be *very* smart about register usage, otherwise all it's doing is loading two copies of everything into the L2 cache.

I became interested enough that I implemented a throw-away VM based on the concept. It's a Perl 5 module that implements a lexically scoped, storage-to-storage VM. Parts of the system ended up being more of a framework -- there's a parser, assembler, byte-code generator and VM all integrated together. It's very easy to hack on.

I'm calling the module Kakapo. The kakapo is the heaviest parrot in the world. It's critically endangered because the *flightless* bird uses a "stand and freeze" defense to evade predators. As you can see, I have high hopes for this code. ;) I realize I'm too slow to have any direct impact on Parrot, but maybe some of the ideas will find a home.

Here are a couple of factorial examples in Kakapo:

  ; code a stupid compiler might generate
  .begin
  fact1:  arg L0, 0
          cmp L1, L0, 1
          brne L1, else
          ret 1
  else:   sub L2, L0, 1
          jsr L3, fact1, L2
          mul L4, L0, L3
          ret L4
  .end

  ; code a smart compiler might generate
  .begin
  fact2:  arg L0, 0
          sub L1, L0, 1
          bre L1, done
          jsr L1, fact2, L1
          mul L0, L0, L1
  done:   ret L0
  .end

The ".begin" and ".end" commands are assembly directives that introduce a lexical scope. Every time a scope is entered (via a subroutine call or branch), the VM compares the current lexical scope state to the one the instruction requires and automatically syncs up. This works almost exactly like make-your-own register windows. Here's an example:

  .begin
    set L0, 1
    .begin
      set L0, L0@1
    .end
  .end

The inner scope has its own register set -- it only uses "L0", so the set holds just one register. The "L0@1" syntax asks for the "L0" register in the parent scope. (This isn't the calling scope, the way register windows work on the Sparc -- this is lexical scoping. The "arg" op is used to reference the caller's scope: it treats the argument list of the "jsr" op as an export statement, and imports from that list.)

Using scopes allows nested subroutines to be written very easily:

  .begin
  tens:   arg L0, 0
          set L1, 10
          set L2, 0
          jmp test
  next:   jsr L2, add, L2
          sub L0, L0, 1
  test:   cmp L3, L0, 0
          brg L3, next
          ret L2

    .begin
    add:  arg L0, 0
          add L0, L0, L1@1
          ret L0
    .end
  .end

Here "tens" is the address of a subroutine that takes one input argument and returns ten times its value. It calls the "add" subroutine in a loop. "add" takes one argument and adds "L1@1" -- the L1 register in the parent lexical scope.

If you could write nested subroutines in Perl, the code would look about like this:

  sub tens {
      my $n = $_[0];
      my $i = 10;
      my $r = 0;

      sub add { return $_[0] + $i }

      while ($n > 0) { $r = add($r); $n-- }
      return $r;
  }

Another cool thing about scopes is that they can be marked "static". (This is how global and module-global scopes work.) When a scope is static -- even if it's in the middle of a bunch of dynamic scopes -- the VM re-uses a single instance of the scope. Here's a simple example:

  .begin
    set L0, 1
    .begin static
      add L0, L0, L0@1
    .end
  .end

The inner scope is static, so "L0" is actually an accumulator that keeps incrementing by the value of the parent's "L0". This is pretty handy for implementing class and function static variables.
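Since the VM itself is written in Perl, the scope-walking is easy to picture in Perl terms. Here's a minimal sketch -- my guess at the mechanism, not Kakapo's actual internals -- of how an "L0@1"-style operand could be resolved by following parent links:

  # Hypothetical model: each scope is a register array plus a link to
  # its lexical parent; "Ln@d" reads register n from the scope d levels up.
  package Scope;

  sub new {
      my ($class, $parent) = @_;
      return bless { parent => $parent, regs => [] }, $class;
  }

  # Walk $depth parent links (0 = the current scope).
  sub lookup {
      my ($self, $depth) = @_;
      my $scope = $self;
      $scope = $scope->{parent} for 1 .. $depth;
      return $scope;
  }

  sub get { my ($self, $reg, $depth) = @_; $self->lookup($depth)->{regs}[$reg] }
  sub set { my ($self, $reg, $depth, $val) = @_; $self->lookup($depth)->{regs}[$reg] = $val }

  package main;

  my $outer = Scope->new(undef);
  my $inner = Scope->new($outer);
  $outer->set(0, 0, 1);                 # set L0, 1    (outer scope)
  $inner->set(0, 0, $inner->get(0, 1)); # set L0, L0@1 (inner scope)
  print $inner->get(0, 0), "\n";        # prints 1

Note that nothing gets copied: a parent-scope operand is just an extra pointer chase, and since the @-depths are fixed at assembly time the lookups could even be resolved ahead of time.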
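The static behavior can be modeled the same way by caching one instance of the scope and handing it back on every entry. Again, this is a sketch under assumed names, not the module's real code:

  my %static_instance;   # one shared register set per static scope

  sub enter_scope {
      my ($scope_id, $is_static, $parent) = @_;
      if ($is_static) {
          # Create the instance once, then keep handing back the same
          # one, so its registers survive across entries.
          return $static_instance{$scope_id} ||=
              { parent => $parent, regs => [0] };
      }
      # A dynamic scope gets fresh registers on every entry.
      return { parent => $parent, regs => [0] };
  }

  # Modeling the example above: each entry to the static scope sees the
  # same L0, so "add L0, L0, L0@1" accumulates.
  my $outer = { parent => undef, regs => [1] };   # set L0, 1
  for (1 .. 3) {
      my $s = enter_scope('inner', 1, $outer);
      $s->{regs}[0] += $s->{parent}{regs}[0];     # add L0, L0, L0@1
  }
  # The static scope's L0 is now 3.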
Performance is slower than Parrot on some things, but faster on others. For example, Kakapo is about 50% slower on tight loops like:

  ; Kakapo
  loop1:  add L0, L0, 1
          cmp L1, L0, 100000000
          brl L1, loop1

  # Parrot
  REDO:   eq I2, I4, DONE
          add I2, I2, I3
          branch REDO

However, on subroutine calls, where register save and restore ops are required, Kakapo can be 50% faster. I suspect that the performance of both systems depends more on the implementation than on the architecture. Neither Parrot nor Kakapo is anywhere close to being optimized yet.

Anyway, a lexically scoped, storage-to-storage architecture is a really sweet assembly language to compile to, and it doesn't seem to carry much (if any) performance penalty. I'll probably keep hacking on it just to see what it can do. If anybody is interested, the code is available at <http://www.msen.com/~fox/Kakapo-0.1.tar.gz>.

- Ken