A while back I wondered if a higher-level VM might be
useful for implementing higher-level languages. I
proposed a lexically scoped machine without registers
or stacks. The response wasn't encouraging.

A quick tour through the library turned up a few
machine designs very similar to what I had in mind.
There's even a name for them: storage-to-storage
architectures. The one I found the most literature
on was CRISP. AT&T based the Hobbit chips on the
CRISP design.

CRISP is a RISC architecture without named
registers. It maps the top of the stack onto a
large register file and provides a few addressing
modes for accessing the stack. This hugely
simplifies a compiler: it doesn't have to worry
about register conventions, and in particular it
doesn't have to save and restore register sets
around function calls. The effect is sort of like
custom-fit register windows.
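
Loosely, an instruction names its operands by stack
offset instead of by register number -- something
like this Perl sketch (my notation, not CRISP's
actual encoding):

 # "add 0(sp), 1(sp), 2(sp)": all three operands live on
 # the stack; the hardware keeps the hot top-of-stack
 # words in the register file, so no explicit loads,
 # stores or register numbers appear in the instructions.
 my @stack = (0) x 64;
 my $sp    = 32;

 $stack[$sp + 0] = $stack[$sp + 1] + $stack[$sp + 2];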

One thing that makes these particularly well suited
for a software VM is that they never copy data.
A software register architecture needs to be *very*
smart about register usage; otherwise all it's
doing is loading two copies of everything into the
L2 cache.
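
Here's the difference as a quick Perl sketch (mine,
not Kakapo's actual dispatch loop):

 my @mem = (0) x 16;    # the VM's memory image (a frame)
 my @reg = (0) x 4;     # a simulated register file
 @mem[1, 2] = (2, 3);

 # Register-style add: operands get loaded into the
 # simulated register file first, so every value sits in
 # the host's cache twice -- once in @mem, once in @reg.
 $reg[0] = $mem[1];
 $reg[1] = $mem[2];
 $reg[2] = $reg[0] + $reg[1];
 $mem[0] = $reg[2];

 # Storage-to-storage add: one fused op reads and writes
 # the memory image directly; nothing is duplicated.
 $mem[0] = $mem[1] + $mem[2];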

I became interested enough that I implemented a
throw-away VM based on the concept. It's a Perl 5
module that implements a lexically scoped,
storage-to-storage VM. Parts of the system ended up
being more of a framework -- there's a parser, an
assembler, a byte-code generator and a VM, all
integrated. It's very easy to hack on.

I'm calling the module Kakapo. The Kakapo is the
heaviest parrot in the world. It's critically endangered
because the *flightless* bird uses a "stand and freeze"
defense to evade predators.

As you can see I have high hopes for this code. ;)

I realize I'm too slow to have any direct impact on
Parrot, but maybe some of the ideas will find a home.

Here are a couple of factorial examples in Kakapo:

        ; code a stupid compiler might generate

        .begin
fact1:    arg   L0, 0
          cmp   L1, L0, 1
          brne  L1, else
          ret   1
else:     sub   L2, L0, 1
          jsr   L3, fact1, L2
          mul   L4, L0, L3
          ret   L4
        .end

        ; code a smart compiler might generate

        .begin
fact2:    arg   L0, 0
          sub   L1, L0, 1
          bre   L1, done
          jsr   L1, fact2, L1
          mul   L0, L0, L1
done:     ret   L0
        .end
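
Both routines compute the usual recursive factorial
(for n >= 1), which in Perl is:

 sub fact {
     my $n = $_[0];
     return 1 if $n == 1;
     return $n * fact($n - 1);
 }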

The ".begin" and ".end" commands are assembly
directives that introduce a lexical scope. Every
time a scope is entered (via a subroutine call
or branch) the VM compares the current lexical
scope state to the one the instruction requires.
The VM will automatically sync up. This works
almost exactly like make-your-own register
windows. Here's an example:

  .begin
     set L0, 1
     .begin
        set L0, L0@1
     .end
  .end

The inner scope has its own register set -- it
only uses "L0", so the set holds just one register.
The "L0@1" syntax asks for the "L0" register in the
parent scope. (This isn't the calling scope, the
way register windows work on the SPARC -- this is
lexical scoping. The "arg" op is what references
the caller's scope: it treats the argument list of
the "jsr" op as an export statement and imports
from that list.)
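
Here's a tiny Perl model of that export/import idea
(a sketch of the semantics, not the module's actual
code):

 # "jsr" publishes the caller's argument list; "arg"
 # imports from that list by position in the callee.
 my @exports;

 sub jsr {
     my ($target, @args) = @_;
     @exports = @args;             # the export statement
     return $target->();
 }

 sub arg { return $exports[ $_[0] ] }  # import arg N

 my $double = sub { return arg(0) * 2 };
 print jsr($double, 21), "\n";         # prints 42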

Using scopes allows nested subroutines to be
written very easily:

        .begin
tens:     arg   L0, 0
          set   L1, 10
          set   L2, 0

          jmp   test
next:     jsr   L2, add, L2
          sub   L0, L0, 1
test:     cmp   L3, L0, 0
          brg   L3, next
          ret   L2

          .begin
add:        arg L0, 0
            add L0, L0, L1@1
            ret L0
          .end
        .end

Here "tens" is the address of a subroutine that
takes one input argument and return 10x the value.
It calls the "add" subroutine in a loop. add
takes one argument and adds "L1@1" -- or the L1
register in the parent lexical scope. If you
could write nested subroutines in Perl, the code
would look about like this:

 sub tens {
     my $n = $_[0];
     my $i = 10;
     my $r = 0;

     sub add {
        return $_[0] + $i
     }

     while ($n > 0) {
        $r = add($r);
        $n--
     }

     return $r
 }
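
Real Perl 5 doesn't quite allow that: a named inner
sub captures only the first instance of "$i" and
warns "Variable \"$i\" will not stay shared". The
working equivalent closes over an anonymous sub:

 sub tens {
     my $n = $_[0];
     my $i = 10;
     my $r = 0;

     # an anonymous sub re-captures $i on each call
     my $add = sub { return $_[0] + $i };

     while ($n > 0) {
         $r = $add->($r);
         $n--;
     }

     return $r;
 }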

Another cool thing about scopes is that they can
be marked "static". (This is how global and
module-global scopes work.) When a scope is
static -- even if it's in the middle of a bunch of
dynamic scopes -- the VM re-uses a single instance
of the scope. Here's a simple example:

  .begin
     set L0, 1
     .begin static
        add L0, L0, L0@1
     .end
  .end

The inner scope is static so "L0" is actually an
accumulator that will keep incrementing by the value
of the parent's "L0". This is pretty handy for
implementing class and function static variables.
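
In Perl terms, a static scope behaves roughly like
a lexical that's closed over once and shared by
every entry (again a sketch of the behavior, not
the implementation):

 my $bump = do {
     my $L0 = 0;                # the single, re-used scope
     sub { $L0 += $_[0]; $L0 }  # add the parent's value
 };

 print $bump->(1), "\n";  # 1
 print $bump->(1), "\n";  # 2 -- L0 accumulates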

Performance is mixed: Kakapo is slower than Parrot
on some things, but faster on others. For example,
it's about 50% slower on tight loops like:

  ; Kakapo
  loop1:  add    L0, L0, 1
          cmp    L1, L0, 100000000
          brl    L1, loop1

  # Parrot
  REDO:   eq     I2, I4, DONE
          add    I2, I2, I3
          branch REDO

However, on subroutine calls where register save
and restore ops are required, Kakapo can be 50%
faster. I suspect that performance depends more on
the implementation than on the architecture;
neither Parrot nor Kakapo is anywhere close to
being optimized yet.

Anyway, a lexically scoped, storage-to-storage
architecture makes a really sweet assembly language
to compile to, and it doesn't seem to carry much
(if any) performance penalty.

I'll probably keep hacking on it just to see what
it can do.

If anybody is interested, the code is available
at <http://www.msen.com/~fox/Kakapo-0.1.tar.gz>.

- Ken
