Skip Montanaro <s...@pobox.com> added the comment:

I'm sure this is the wrong place to bring this up, but I had a
thought about simple JIT compilation coupled with the opcode
dispatch changes in this issue.

Consider this silly function:

    >>> def f(a, b):
    ...   result = 0
    ...   while b:
    ...     result += a
    ...     b -= 1
    ...   return result
    ... 

which compiles to

      2           0 LOAD_CONST               1 (0)
                  3 STORE_FAST               2 (result)

      3           6 SETUP_LOOP              32 (to 41)
            >>    9 LOAD_FAST                1 (b)
                 12 JUMP_IF_FALSE           24 (to 39)
                 15 POP_TOP             

      4          16 LOAD_FAST                2 (result)
                 19 LOAD_FAST                0 (a)
                 22 INPLACE_ADD         
                 23 STORE_FAST               2 (result)

      5          26 LOAD_FAST                1 (b)
                 29 LOAD_CONST               2 (1)
                 32 INPLACE_SUBTRACT    
                 33 STORE_FAST               1 (b)
                 36 JUMP_ABSOLUTE            9
            >>   39 POP_TOP             
                 40 POP_BLOCK           

      6     >>   41 LOAD_FAST                2 (result)
                 44 RETURN_VALUE        
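
A listing like the above (modulo opcode changes between interpreter
versions; this one is from a 2.x-era CPython, with SETUP_LOOP and
JUMP_IF_FALSE) can be reproduced with the dis module:

```python
import dis

def f(a, b):
    result = 0
    while b:
        result += a
        b -= 1
    return result

dis.dis(f)        # prints the bytecode listing for f
print(f(3, 4))    # sanity check: adds a to result b times -> 12
```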

What if you built and compiled a "Mini Me" version of
PyEval_EvalFrameEx on-the-fly which contained only the prologue and
epilogue of the real function and a small switch statement which only
knew about the byte-code instructions used by f()?  Would the
compiler be better able to optimize the code?  Would placing the
instructions nearer to each other provide better cache behavior?
Would the CPU's branch prediction be improved?

Another possibility would be to eliminate the for(;;) ... switch
altogether and just inline the code for the individual instructions.
It would help if the body of each bytecode instruction were
implemented as a macro, e.g.:

    #define _POP_TOP() \
        PREDICTED(POP_TOP); \
        TARGET(POP_TOP) \
        v = POP(); \
        Py_DECREF(v); \
        FAST_DISPATCH();

The above function could (lots of hand-waving here) be "compiled" to
something like

    PyObject *
    _MiniMe(PyFrameObject *f, int throwflag)
    {
        _PyEVAL_FRAMEEX_PROLOG

        _LOAD_CONST(1)
        _STORE_FAST(2)
        _SETUP_LOOP(_41)
    _9:
        _LOAD_FAST(1)
        _JUMP_IF_FALSE(_39)
        _POP_TOP()
        _LOAD_FAST(2)
        _LOAD_FAST(0)
        _INPLACE_ADD()
        _STORE_FAST(2)
    _26:
        _LOAD_FAST(1)
        _LOAD_CONST(2)
        _INPLACE_SUBTRACT()
        _STORE_FAST(1)
        _JUMP_ABSOLUTE(_9)
    _39:
        _POP_TOP()
        _POP_BLOCK()
    _41:
        _LOAD_FAST(2)
        _RETURN_VALUE()

        _PyEVAL_FRAMEEX_EPILOG
    }

and the resulting binary code saved as an attribute of the code
object.  Presumably there would be some decision made about whether
to compile a function into this form (maybe only after it's been
called N times?).
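
That trigger could be sketched as follows (in Python, purely for
illustration; compile_native() is a hypothetical stand-in for emitting
and loading the specialized evaluator, not anything CPython provides):

```python
# Hypothetical sketch of the "compile after N calls" policy.

def compile_native(func):
    # Stand-in: a real implementation would generate and load native
    # code for func's bytecode.  Here it just returns func unchanged.
    return func

def jit_after(n):
    """Swap in a 'compiled' version once func has been called n times."""
    def wrap(func):
        state = {"calls": 0, "compiled": None}
        def wrapper(*args, **kwargs):
            if state["compiled"] is not None:
                return state["compiled"](*args, **kwargs)
            state["calls"] += 1
            if state["calls"] >= n:
                state["compiled"] = compile_native(func)
            return func(*args, **kwargs)
        return wrapper
    return wrap

@jit_after(3)
def f(a, b):
    result = 0
    while b:
        result += a
        b -= 1
    return result
```

In CPython itself the counter would presumably live on the code object
and the check would happen on frame entry rather than in a wrapper.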

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________