Okay, here's a really, really evil idea. (And yes, bluntly, it's triggered by the pie-thon bytecode translator's needs.) I need a stack, and one that's faster than our current stack which, while snappy for what it does, is still burdened by generality. I also need a stack that's generally not very big. So...

The fake stack. The idea is this. PMC registers 18-29 are used as a 12-element stack, with P18 as the top. PMC register 31 is used as a stack overflow array. Integer register I31 is used as the stack depth register. We add in the ops:

 fakepush [Px|Ix]

This pushes the contents of Px onto the top of the stack, or pushes the stack down by Ix entries. Note that the Ix form does *not* set these new slots to anything! They're left holding whatever they held before, which can be an issue for the GC.

 fakepop [Ix]

This pops Ix entries off the stack, default of one. Note that there's no need for any destination--if you're popping the TOS into a register then just do a set first.
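
To pin down what the two ops do, here's a toy model in Python. It's a sketch under my own assumptions, not how the ops would actually be written: P18 is the top of the stack (as in the swap and add examples), entries pushed past P29 spill into the overflow array in P31 and come back on pop, slots vacated by a pop are cleared, and counts never exceed 12. The class and method names are made up for illustration.

```python
TOP = 18        # P18 is the top of the stack (assumed from the swap/add examples)
SLOTS = 12      # P18..P29
OVERFLOW = 31   # P31 holds the overflow array


class FakeStack:
    """Toy model of the proposed ops; all names here are illustrative."""

    def __init__(self):
        self.P = [None] * 32      # PMC registers P0..P31
        self.P[OVERFLOW] = []     # stack overflow array
        self.I31 = 0              # stack depth register

    def fakepush_n(self, n):
        """fakepush Ix: push the stack down n slots (assumes 0 <= n <= 12)."""
        old = self.P[TOP:TOP + SLOTS]
        # Live entries that fall off the bottom spill to the overflow array,
        # deepest first, so the array's last element is the shallowest spill.
        for i in range(SLOTS - 1, SLOTS - 1 - n, -1):
            if i < self.I31:
                self.P[OVERFLOW].append(old[i])
        # Everything moves down n slots; the top n slots are left as-is.
        for i in range(n, SLOTS):
            self.P[TOP + i] = old[i - n]
        self.I31 += n

    def fakepush(self, pmc):
        """fakepush Px: make room for one entry, then store it in P18."""
        self.fakepush_n(1)
        self.P[TOP] = pmc

    def fakepop(self, n=1):
        """fakepop [Ix]: drop n entries (assumes 0 <= n <= min(depth, 12))."""
        old = self.P[TOP:TOP + SLOTS]
        for i in range(SLOTS - n):
            self.P[TOP + i] = old[i + n]
        # Refill the bottom slots from the overflow array, shallowest first.
        # Clearing slots that have nothing to refill them is an assumption.
        for i in range(SLOTS - n, SLOTS):
            spill = self.P[OVERFLOW]
            self.P[TOP + i] = spill.pop() if spill else None
        self.I31 -= n


s = FakeStack()
for value in range(14):
    s.fakepush(value)
# Depth is now 14: P18 holds 13, P29 holds 2, and 0 and 1 have spilled to P31.
s.fakepop(3)
# Depth is now 11: P18 holds 10 and P28 holds 0.
```

Note that fakepush_n moves everything in one step rather than one slot at a time; that's what lets a two-slot push leave copies of the top two entries in place.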

 iset Px, Iy

This sets Px to the PMC register whose number is in Iy.

 iset Ix, Py

This sets the PMC register whose number is in Ix to Py.

The last two are pretty standard indirect register access. If someone wants to propose a more generic syntax for it we might want to generalize on, that's fine--do it *after* OSCON, thanks. :)
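
For what it's worth, the two iset forms amount to the following, with Python lists standing in for the register banks. This is only a sketch: the function names are invented, the banks are assumed to hold 32 registers each, and the range check is my assumption, since nothing above says what an out-of-range register number should do.

```python
NUM_REGS = 32  # assumed registers per bank: P0..P31 and I0..I31


def _check(regno):
    # Assumption: an out-of-range register number is an error.
    if not 0 <= regno < NUM_REGS:
        raise IndexError(f"no such PMC register: {regno}")
    return regno


def iset_load(P, I, x, y):
    """iset Px, Iy: set Px to the PMC register whose number is in Iy."""
    P[x] = P[_check(I[y])]


def iset_store(P, I, x, y):
    """iset Ix, Py: set the PMC register whose number is in Ix to Py."""
    P[_check(I[x])] = P[y]


P = [None] * NUM_REGS
I = [0] * NUM_REGS
P[20] = "third from top"     # P18 is the top, so P20 is two entries below it

I[0] = 18 + 2                # pick a stack slot by computed register number
iset_load(P, I, 5, 0)        # iset P5, I0: P5 now holds P20's contents

I[1] = 18
iset_store(P, I, 1, 5)       # iset I1, P5: P18 now holds P5's contents
```

The obvious use with the fake stack is reaching an entry whose position is only known at runtime.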

The reason for the odd register count and register number usage is twofold. First, it leaves some of the top-half registers free for other things. Second, on systems with 4-byte pointers the registers used will start on an 8-byte boundary and take up a multiple of 8 bytes. Not likely a huge deal, but it may shave a cycle off here or there.
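
The alignment point is just arithmetic, and can be checked in a few lines. This assumes the PMC registers are laid out as contiguous pointer-sized slots starting at an 8-byte-aligned P0, which is my reading rather than something spelled out here.

```python
POINTER_SIZE = 4                 # bytes, the 4-byte-pointer case
FIRST_REG, LAST_REG = 18, 29     # the fake stack's registers

# Assumption: registers are contiguous pointer-sized slots and P0 is aligned.
start_offset = FIRST_REG * POINTER_SIZE                    # bytes past P0
block_size = (LAST_REG - FIRST_REG + 1) * POINTER_SIZE     # bytes in the block

starts_aligned = start_offset % 8 == 0
whole_multiple = block_size % 8 == 0
```

Under those assumptions the block starts 72 bytes past P0 and is 48 bytes long, both multiples of 8.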

Note that this stack does *not* need rot, swap, dup, or other funky 'move the things around' ops, since that can already be done. For example:

  dup:
    fakepush 1

  dup2 (the top two entries are duplicated):
    fakepush 2

  swap:
    exchange P18, P19

  rot3:
    exchange P18, P20
    exchange P19, P20
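
A quick way to convince yourself these sequences do what the labels say is to run them against a list, with index 0 standing in for P18. A sketch only: the helper names are invented, and the list ignores the 12-slot limit and the overflow array.

```python
def fakepush_n(stack, n):
    """fakepush Ix: push down n slots, leaving the top n slots as they were."""
    stack[0:0] = stack[:n]


def exchange(stack, a, b):
    """exchange Pa, Pb, with slot 0 = P18, slot 1 = P19, and so on."""
    stack[a], stack[b] = stack[b], stack[a]


def dup(stack):
    fakepush_n(stack, 1)       # fakepush 1


def dup2(stack):
    fakepush_n(stack, 2)       # fakepush 2


def swap(stack):
    exchange(stack, 0, 1)      # exchange P18, P19


def rot3(stack):
    exchange(stack, 0, 2)      # exchange P18, P20
    exchange(stack, 1, 2)      # exchange P19, P20
```

Starting from ['a', 'b', 'c'] (top first), rot3 leaves ['c', 'a', 'b']: the third entry comes to the top and the other two move down one.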

And it means that stack-based ops turn into:

  add (pop the top two entries, add them, and push the result):
    new P17
    add P17, P18, P19
    fakepop
    set P18, P17

and something like add-in-place (new TOS = oldTOS+1 + oldTOS):
    add P19, P19, P18
    fakepop
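
The same list trick shows the net effect of the two sequences: both take two entries and leave one. This is a sketch with a small mutable Box class standing in for a PMC, index 0 standing in for P18, and a local variable for the P17 temporary; the names are invented, and I'm assuming the three-argument add stores into the PMC already sitting in its destination register. The in-place form has to leave its result in the P19 slot, since that's the slot that becomes the top of the stack after the pop.

```python
class Box:
    """Stand-in for a PMC: a mutable container holding a number."""

    def __init__(self, value=None):
        self.value = value


def add(dest, left, right):
    # Assumption: add stores its result into the existing destination PMC.
    dest.value = left.value + right.value


def stack_add(stack):
    """Pop the top two entries, add them, and push a fresh result."""
    p17 = Box()                          # new P17
    add(p17, stack[0], stack[1])         # add P17, P18, P19
    del stack[0]                         # fakepop
    stack[0] = p17                       # set P18, P17


def stack_add_in_place(stack):
    """Accumulate into the second entry, then pop the top."""
    add(stack[1], stack[1], stack[0])    # result lands in the P19 slot
    del stack[0]                         # fakepop: old P19 is the new P18
```

With 3 on top of 4 on top of 9, either version leaves 7 on top of 9. The difference is that the in-place one reuses the PMC that was second from the top, while the other leaves a freshly created PMC in P18.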

It seems sensible, which of course worries me, as do so many things I think are sensible, but I think we should do this--it'll be useful for other languages that like being stack based. (If I'd any sense, I'd have done this ages ago, as it'd have made the Forth implementation nicer.) We should get x86 JIT code for the new ops, since I expect they'll be used rather a lot.

Full-fledged tracking of used stack slots within basic blocks with register coloring and cross-block register exchanges would, of course, make a lot of sense and be faster, but I'm a bit pressed for time here, so we'll make do. Only the fake push and pop ops are at all odd, so I'll put those in experimental.ops for now. The iset ops will go into set.ops, though if we decide that we want a more general indirect register access scheme we can see about renaming them. (We probably should put in an opname aliasing feature to the PIR and PASM compilers, but we'll deal with that later, unless someone's feeling like a project.)
--
Dan


--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
