On 26-Jul-12 11:56, Don Clugston wrote:
But doing that screws up the CPU's stack prediction so badly that it
will dominate the timing.
At least do something like:

jump_table:
       mov EAX, [ESP]
       ret


BTW this seems to be a roundabout way to get the address of a label,
which I can use to do a threaded code interpreter.
Basically every branch is:

L_opcode_1:
        asm { mov EAX, [ESP];
                ret;
        }
        ... real code here
L_opcode_2:
        asm { mov EAX, [ESP];
                ret;
        }
        ... real code here

Then the "compile" step looks like this:
while(not_last_opcode(code[pc])){
        size_t c = code[pc];
        switch(c){
        case op_1:
                asm{ call L_opcode_1; add EAX, 4; mov c, EAX; }
                break;
        case op_2:
                ...
        }
        code[pc] = c; //now we have label address
        pc++;
}

//interpret:
pc = 0;
size_t op = code[pc];
asm { mov EAX, op; jmp EAX; } //here we go, almost computed goto

Obviously it's backwards and awful. Makes me wonder why we can't take the label address directly. What's the limitation?
How about allowing it, at least in inline assembly?
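
For comparison, here is a minimal sketch of what taking label addresses directly could look like. It is written in C using the GNU labels-as-values extension (GCC and Clang; not standard C), which is what "computed goto" usually refers to. The opcode names, the run function and the 16-slot code buffer are made up for illustration and are not from the code above.

```c
#include <stddef.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Runs a tiny program over an accumulator using direct threading.
 * Assumes the program ends with OP_HALT and has at most 16 opcodes. */
static int run(const int *bytecode, size_t n, int acc)
{
    /* GNU extension: &&label yields the label's address as a void*. */
    static void *const labels[] = { &&L_inc, &&L_double, &&L_halt };
    void *code[16];
    size_t pc;

    if (n == 0 || n > 16)
        return -1;

    /* "Compile" step: replace each opcode with its handler address. */
    for (pc = 0; pc < n; pc++) {
        if (bytecode[pc] < 0 || bytecode[pc] > OP_HALT)
            return -1;
        code[pc] = labels[bytecode[pc]];
    }

    /* Interpret: every handler jumps straight to the next handler. */
    pc = 0;
    goto *code[pc];

L_inc:
    acc += 1;
    goto *code[++pc];
L_double:
    acc *= 2;
    goto *code[++pc];
L_halt:
    return acc;
}
```

With the program { OP_INC, OP_INC, OP_DOUBLE, OP_HALT } and a starting accumulator of 1, run returns 6. There is no call/ret pair involved, so the return-stack predictor is left alone.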


--
Dmitry Olshansky
