I don't have the time right now to do this myself, so here is a simple
idea to evaluate.

Currently, the computed goto decode and dispatch is essentially:

goto *ops_addr[ *cur_opcode ];

Now a big part of the gain of the prederef runops core comes from decoding each
op once instead of each time it is executed.  The prederef core does this by
creating an array shadowing the byte code which stores pointers to the op
functions for the decoded ops.

One could modify the computed goto runops analogously, by creating a parallel
array that stores the decoded label address of each op. Suppose the parallel
array is pointed to by decoded_ops; op dispatch would then look like:

goto *decoded_ops[ cur_opcode - start_of_bytecode ];

The C compiler might be able to optimize away the explicit subtraction. If not,
one can do the equivalent pointer math, but I won't try to write that here.
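
A minimal sketch of the parallel-array idea, to make the dispatch line above
concrete. Everything here other than ops_addr, cur_opcode, decoded_ops and
start_of_bytecode is made up for illustration: the three-op instruction set
(OP_END, OP_ADD, OP_JLT), the op_len table, the opcode_t typedef and the
function name are assumptions, not the real core. It relies on GCC's
labels-as-values extension and assumes well-formed bytecode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef long opcode_t;            /* hypothetical stand-in for the real type */

enum { OP_END, OP_ADD, OP_JLT };  /* hypothetical toy instruction set */

/* Decode the bytecode once into a parallel array of label addresses,
 * then run it.  Returns the accumulator when OP_END is reached. */
long run_predecoded(const opcode_t *start_of_bytecode, size_t n_words)
{
    /* Label address and total length (op + operands) for each op number. */
    static void *const ops_addr[] = { &&op_end, &&op_add, &&op_jlt };
    static const size_t op_len[]  = { 1, 2, 3 };

    /* One slot per bytecode word; operand slots are simply left NULL. */
    void **decoded_ops = calloc(n_words, sizeof *decoded_ops);
    assert(decoded_ops != NULL);  /* sketch only: no real error handling */

    /* Decode pass: each op is looked up in ops_addr exactly once. */
    for (size_t i = 0; i < n_words; i += op_len[start_of_bytecode[i]])
        decoded_ops[i] = ops_addr[start_of_bytecode[i]];

    const opcode_t *cur_opcode = start_of_bytecode;
    long acc = 0;

/* Dispatch reads the pre-decoded address instead of ops_addr[*cur_opcode]. */
#define DISPATCH() goto *decoded_ops[cur_opcode - start_of_bytecode]

    DISPATCH();

op_add:                           /* OP_ADD n: acc += n */
    acc += cur_opcode[1];
    cur_opcode += 2;
    DISPATCH();

op_jlt:                           /* OP_JLT limit target: jump if acc < limit */
    if (acc < cur_opcode[1])
        cur_opcode = start_of_bytecode + cur_opcode[2];
    else
        cur_opcode += 3;
    DISPATCH();

op_end:                           /* OP_END: stop and return the accumulator */
    free(decoded_ops);
    return acc;
#undef DISPATCH
}
```

For example, the program { OP_ADD, 3, OP_JLT, 10, 0, OP_END } loops through
OP_ADD four times but looks each op up in ops_addr only once. Whether this
actually beats the plain computed goto core would have to be measured.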

In the ideal case, where sizeof(opcode_t) == sizeof(void *), one could possibly
cheat like the JIT compiler does and overwrite the original bytecode instead of
using a parallel array, but that may not be a good idea.
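
A rough sketch of that in-place variant, under the same made-up assumptions
(hypothetical toy ops OP_END, OP_ADD, OP_JLT, hypothetical op_len table and
function name, GCC labels-as-values). Here opcode_t is assumed to be
intptr_t so that a label address fits in a bytecode word. It also shows one
reason this may not be good: the op numbers are destroyed, so the same
buffer cannot be decoded or inspected again afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t opcode_t;        /* assumption: a word can hold a pointer */

enum { OP_END, OP_ADD, OP_JLT };  /* hypothetical toy instruction set */

/* Overwrite each op word with its label address, then run.
 * The bytecode buffer is modified and must not be decoded a second time. */
long run_in_place(opcode_t *start_of_bytecode, size_t n_words)
{
    static void *const ops_addr[] = { &&op_end, &&op_add, &&op_jlt };
    static const size_t op_len[]  = { 1, 2, 3 };

    assert(sizeof(opcode_t) == sizeof(void *));

    /* Decode pass: replace op numbers in place; operands are untouched. */
    for (size_t i = 0; i < n_words; ) {
        opcode_t op = start_of_bytecode[i];
        start_of_bytecode[i] = (opcode_t)ops_addr[op];
        i += op_len[op];
    }

    opcode_t *cur_opcode = start_of_bytecode;
    long acc = 0;

/* No table lookup and no subtraction: the word itself is the address. */
#define DISPATCH() goto *(void *)*cur_opcode

    DISPATCH();

op_add:                           /* OP_ADD n: acc += n */
    acc += (long)cur_opcode[1];
    cur_opcode += 2;
    DISPATCH();

op_jlt:                           /* OP_JLT limit target: jump if acc < limit */
    if (acc < (long)cur_opcode[1])
        cur_opcode = start_of_bytecode + cur_opcode[2];
    else
        cur_opcode += 3;
    DISPATCH();

op_end:                           /* OP_END: stop and return the accumulator */
    return acc;
#undef DISPATCH
}
```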

-- 
Jason
