[Bug tree-optimization/91898] New: [optimization] switch statements for state machines could be better

nathan at acm dot org Wed, 25 Sep 2019 05:50:38 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91898


            Bug ID: 91898
           Summary: [optimization] switch statements for state machines
                    could be better
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nathan at acm dot org
  Target Milestone: ---

Interesting talk at cppcon'19 by Miro Knejp 'Non-conforming C++: the Secrets
the Committee Is Hiding From You'  It'll be on youtube at some point.


This PR concerns his observations about a common VM implementation technique.

We have an array of instructions, and we execute these one after eachother. 
The natural way to write this would be:

for (int index = 0; ; index++)
  switch (insns[index]) {
    case ADD: ... break;
    case SUB: ... break;
    ...
    case END: goto end;
 } end:;

this leads to code gen that looks like:

loop:
  r = insn[index];
  index = index + 1;
  [range check on r]
  addr = jmptable[r];
  jmp [addr];
ADD: ...
  jmp loop

etc.  This performs poorly because all the dispatches go through the single
jmp[addr] instruction, and the CPU's indirect branch predictor has no idea
where it'll go.  But there are patterns in the sequence of virtual
instructions, just as there are in real code!

Hence, people use the GCC label-address extension and write:
static void *const dispatch[] = {&&ADD, &&SUB, ... };

index = 0; goto *dispatch[insn[index++]];

ADD: ... goto *dispatch[insn[index++]];
SUB: ... goto *dispatch[insn[index++]];

code generation looks like:

ADD: ...
   r = insn[index];
   index++;
   addr = dispatch[r];
   jmp [addr];

This is faster because there are per-virtual-insn indirect jumps, and the
branch predictor does much better.

(We're also eliding the range check, I do not know how much that would cost if
it was added).

Hence, it might be worthwhile duplicating the switch's dispatch logic at the
end of every case block.

[Bug tree-optimization/91898] New: [optimization] switch statements for state machines could be better

Reply via email to