I don't have the time right now to do this myself, so here is a simple idea to evaluate.
Currently, the computed goto decode and dispatch is essentially:

    goto *ops_addr[ *cur_opcode ];

Now a big part of the gain of the prederef runops core comes from decoding each op once instead of each time it is executed. The prederef core does this by creating an array shadowing the bytecode, which stores pointers to the op functions for the decoded ops.

One could modify the computed goto runops analogously, by creating a parallel array that stores the decoded label address of each op. If the parallel array is pointed to by decoded_ops, op dispatch would then look like:

    goto *decoded_ops[ cur_opcode - start_of_bytecode ];

The C compiler might be able to optimize away the explicit subtraction. If not, one can do the equivalent pointer math, but I won't try to write that here.

In the ideal case, where sizeof(opcode_t) == sizeof(void *), one could possibly cheat like the JIT compiler does and overwrite the original bytecode instead of using a parallel array, but that may not be a good idea.

-- Jason
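A minimal sketch of the predecoded dispatch described above, for anyone who wants to evaluate it. It is not taken from the actual runops core: the three toy ops (OP_END, OP_INC, OP_DEC), the opcode_t typedef, and the run_predecoded function are all made up for illustration, and the ops take no operands. It relies on GCC's labels-as-values extension, and it does the "equivalent pointer math" by walking a pointer through the parallel array rather than subtracting start_of_bytecode on every dispatch.

```c
#include <stddef.h>
#include <stdlib.h>

typedef long opcode_t;              /* assumption: stand-in for the real opcode type */

enum { OP_END, OP_INC, OP_DEC };    /* hypothetical toy ops, no operands */

/*
 * Runs `code` (n opcodes, terminated by OP_END) and returns the final
 * accumulator value. Returns 0 without running if allocation fails.
 */
static long run_predecoded(const opcode_t *code, size_t n)
{
    /* Label addresses indexed by opcode (GCC extension: &&label). */
    static void *ops_addr[] = { &&do_end, &&do_inc, &&do_dec };

    void **decoded_ops = malloc(n * sizeof *decoded_ops);
    void **pc;
    long acc = 0;
    size_t i;

    if (decoded_ops == NULL)
        return 0;

    /* Decode each op exactly once into the parallel array. */
    for (i = 0; i < n; i++)
        decoded_ops[i] = ops_addr[code[i]];

    /* Dispatch: one load and one indirect jump, no table lookup. */
    pc = decoded_ops;
    goto **pc;

do_inc:
    acc++;
    pc++;
    goto **pc;

do_dec:
    acc--;
    pc++;
    goto **pc;

do_end:
    free(decoded_ops);
    return acc;
}
```

Calling run_predecoded on the five-op program { OP_INC, OP_INC, OP_INC, OP_DEC, OP_END } increments three times and decrements once. With real bytecode, operands are interleaved with opcodes, so the matching slots of the parallel array would simply go unused and a handler would find its operands at code + (pc - decoded_ops).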