https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123524
--- Comment #8 from mikulas at artax dot karlin.mff.cuni.cz ---

Hongtao Liu: There is no ordinary loop in the function u_run(). There are about 3600 labels and a static table that holds pointers to these labels; see "static const void *dispatch" in the source code.

The variable "register const code_t *ip" holds a pointer to the byte-code (or, I should say, two-byte-code, because each opcode takes two bytes). At the beginning of u_run(), it jumps to the first instruction of the byte-code using "code = *ip; next_label = dispatch[code & 0xffff]; goto *(void *)next_label;".

Each label represents one interpreter instruction. It performs the requested operation, advances the variable "ip", loads the next opcode from "*ip", loads the label address from the table and jumps to it using the computed goto feature.

The disassembly shows that the instructions "mov $0x20002,%edx; movd %edx,%xmm4; pshufd $0x0,%xmm4,%xmm7; movaps %xmm7,(%rsp)" are always executed between loading the address of the next instruction and jumping to it. This slows down every interpreter instruction, whether or not it needs the constant in (%rsp).
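For reference, here is a portable C model of what the four quoted instructions compute. movd places 0x20002 in the low 32 bits of %xmm4 and zeroes the rest, pshufd $0x0 copies that lane into all four 32-bit lanes of %xmm7, and movaps stores the resulting 16 bytes to (%rsp). The function name model_constant_store() is hypothetical. The model only shows the value being stored; it says nothing about why the compiler emits the sequence on the dispatch path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* An XMM register, and the movaps store, cover 16 bytes. */
static_assert(sizeof(uint32_t[4]) == 16, "an XMM register holds 16 bytes");

/* Hypothetical helper: writes into out[] the 16 bytes that
   "mov $0x20002,%edx; movd %edx,%xmm4; pshufd $0x0,%xmm4,%xmm7;
   movaps %xmm7,(%rsp)" stores at (%rsp). */
void model_constant_store(unsigned char out[16])
{
    uint32_t edx = 0x20002;               /* mov $0x20002,%edx */
    uint32_t xmm4[4] = { edx, 0, 0, 0 };  /* movd: low lane = edx, upper lanes zeroed */
    uint32_t xmm7[4];

    for (int i = 0; i < 4; i++)
        xmm7[i] = xmm4[0];                /* pshufd $0x0: every lane copies lane 0 */

    memcpy(out, xmm7, sizeof xmm7);       /* movaps: 16-byte store to the stack slot */
}
```

Since 0x00020002 consists of two 16-bit halves that both equal 2, the stored value reads as eight 16-bit lanes of 2 on any byte order.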

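Below is a minimal sketch of the dispatch scheme described above. It assumes GCC or Clang, because labels as values and computed goto are GNU C extensions. The toy stack machine, its opcodes and the names run(), DISPATCH(), OP_PUSH etc. are hypothetical. Only the dispatch sequence follows the one quoted from u_run(), including the cast in "goto *(void *)next_label".

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t code_t;  /* each opcode occupies two bytes */

/* Hypothetical opcode set; u_run() has about 3600 labels, this toy has four. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, N_OPS };

/* Runs a tiny stack machine. OP_PUSH is followed by one code_t operand.
   Returns the top of the stack when OP_HALT is reached.
   No stack overflow/underflow checking: this is only a sketch. */
long run(const code_t *prog)
{
    /* Table of label addresses indexed by opcode (cf. "dispatch" in u_run()). */
    static const void *const dispatch[N_OPS] = {
        [OP_PUSH] = &&op_push,
        [OP_ADD]  = &&op_add,
        [OP_MUL]  = &&op_mul,
        [OP_HALT] = &&op_halt,
    };
    long stack[64];
    long *sp = stack;          /* points one past the top element */
    const code_t *ip = prog;
    code_t code;
    const void *next_label;

/* Load the opcode at ip, look up its label and jump there. Every handler
   ends with its own copy of this sequence; there is no central loop. */
#define DISPATCH()                                  \
    do {                                            \
        code = *ip;                                 \
        assert((code & 0xffff) < N_OPS);            \
        next_label = dispatch[code & 0xffff];       \
        goto *(void *)next_label;                   \
    } while (0)

    DISPATCH();

op_push:
    *sp++ = ip[1];   /* operand follows the opcode */
    ip += 2;
    DISPATCH();
op_add:
    sp--;
    sp[-1] += sp[0];
    ip += 1;
    DISPATCH();
op_mul:
    sp--;
    sp[-1] *= sp[0];
    ip += 1;
    DISPATCH();
op_halt:
    return sp[-1];
#undef DISPATCH
}
```

For example, passing { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT } to run() evaluates (2 + 3) * 4. Each handler ends with its own copy of the dispatch sequence, and there is no shared loop header. So any instruction the compiler places on that path, such as the constant store reported here, executes once per interpreted instruction.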