Dave Malcolm <[email protected]> added the comment:
Sorry again for another epic-length comment...
I'm attaching the latest work-in-progress on this.
The code is still fairly messy (embarrassingly so in places), but it's better to
have it out in public in this tracker than stuck on my hard drive.
Symbol tables
=============
- I've moved the optimization pass to after symbol table generation, passing
  the symtable to mod2obj et al and adding an "ste" attribute to all objects
  that have their own PySTEntryObject (I added a PySymtable_TryLookup function
  to Python/symtable.c to help with this). The symtable pass is rerun after
  optimization.
- the "ste" attribute isn't exposed anywhere in the grammar (do other VMs
implement symbol table entries?)
- in the graphviz debug output, add boxes around scopes (i.e. those nodes
with an "ste")
- use the symbol table info: only inline functions that purely reference
  locals and globals, not free/cellvars (see the sketch after this list)
- only rename locals when inlining, not globals
- don't inline functions that use the "locals" or "dir" builtins
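
For illustration only (this isn't the C code in the patch): a rough
Python-level sketch of the kind of check the symbol table makes possible,
using the stdlib symtable module; is_inlinable and the example source are
made up here::

    import symtable

    # Hypothetical illustration of the inlinability test: a function is a
    # candidate only if it has no free variables, no cell variables (nothing
    # captured by nested scopes), and never names locals() or dir().
    def is_inlinable(source, funcname, filename='<example>'):
        mod = symtable.symtable(source, filename, 'exec')
        func = next(t for t in mod.get_children() if t.get_name() == funcname)
        if func.get_frees():
            return False                  # reads free variables
        if any(child.get_frees() for child in func.get_children()):
            return False                  # has cell variables
        if {'locals', 'dir'} & set(func.get_identifiers()):
            return False                  # may inspect the local namespace
        return True

    src = """
    def f(a, b):
        return a + b

    def g(a):
        def h():
            return a      # 'a' becomes a cell variable of g
        return h
    """
    print(is_inlinable(src, 'f'))   # True
    print(is_inlinable(src, 'g'))   # False
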
Specialization
==============
I've implemented a first pass at the specialization ideas in the original
comment.
I've added a new AST node: "Specialize". It's an expression, but contains
statements. In ASDL terms::

    -- Generated by the optimizer:
    | Specialize(expr name,
                 stmt* specialized_body,
                 expr specialized_result,
                 expr generalized)    -- "generalized" must be a Call

The idea is that:

- "name" should be dynamically evaluated, then compared in some (not yet well
  specified) way against the expected value recorded during inlining.
- If we have a match:
  - then execute the statements in "specialized_body", then evaluate
    "specialized_result", the latter giving the overall value of this
    expression.
- If we don't have a match:
  - then evaluate "generalized", which currently must be a Call; this gives
    the overall value of the expression.
I've added a JUMP_IF_SPECIALIZABLE opcode, mostly as described in the initial
comment.
This allows us to implement "specialized" bytecode that makes an assumption,
but has a fallback ("generalized") for the case when the assumption fails.
To wire this up for function inlining, we need some way of recording the
"original" version of a function versus what it's now bound to. For now, I'm
simply adding an assignment after each function def, equivalent to::

    def foo():
        print('hello world')

    __saved__foo = foo

    def callsite():
        foo()
and the "Specialize" in "callsite" is compiled to:
LOAD_GLOBAL (foo) ; dynamic lookup
LOAD_GLOBAL (__saved__foo__) ; get saved version
JUMP_IF_SPECIALIZABLE -> inline impl
CALL_FUNCTION 0
POP_TOP
JUMP_FORWARD
; inline implementation
LOAD_GLOBAL ('print')
LOAD_CONST ('hello world')
CALL_FUNCTION 1
POP_TOP
LOAD_CONST (None)
RETURN_VALUE
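
Spelled out as ordinary Python, the bytecode above behaves roughly like the
sketch below. The identity test against the saved binding stands in for
whatever comparison JUMP_IF_SPECIALIZABLE ends up performing (the exact guard
semantics aren't settled yet), so treat this as an illustration rather than
the real design::

    def foo():
        print('hello world')

    __saved__foo = foo            # binding recorded at function-definition time

    def callsite():
        if foo is __saved__foo:   # the JUMP_IF_SPECIALIZABLE guard (illustrative)
            print('hello world')  # inline implementation of the original foo()
        else:
            foo()                 # generalized fallback: the original Call

    callsite()                    # assumption holds: takes the specialized path

    def foo():                    # rebinding foo invalidates the assumption
        print('patched')

    callsite()                    # guard fails: falls back to calling foo()
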
Additional optimizations
========================
I've added:
- a very simple copy-propagation pass, which only runs on functions with a
  single basic block (to keep the dataflow analysis simple: no edges in the
  CFG, since we don't have an explicit CFG yet); a toy sketch of this pass
  follows the list
- a pass which removes locals that are only ever written to, never read from,
  provided that the function doesn't use "globals", "locals", or "dir"
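
As a toy sketch of the first pass (not the patch's actual implementation,
which works on the AST), here is copy propagation over a single basic block,
using a simplified (dest, op, args) instruction form::

    # Caveat: a real pass must also invalidate copies whose *source* is later
    # redefined; that is omitted here for brevity.
    def propagate_copies(block):
        copies = {}                            # dest name -> source name/const
        out = []
        for dest, op, args in block:
            args = tuple(copies.get(a, a) for a in args)   # rewrite operands
            if op == 'copy':
                # Record the copy; with a single basic block and all later
                # uses rewritten, the copy instruction itself can be dropped.
                copies[dest] = args[0]
                continue
            copies.pop(dest, None)             # dest redefined: old copy is stale
            out.append((dest, op, args))
        return out

    block = [('a', 'copy', ("'fee'",)),
             ('b', 'copy', ("'fi'",)),
             ('t', 'add',  ('a', 'b')),
             ('r', 'copy', ('t',)),
             ('_', 'call', ('print', 'r'))]
    print(propagate_copies(block))
    # -> [('t', 'add', ("'fee'", "'fi'")), ('_', 'call', ('print', 't'))]
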
These passes combine nicely with the function inlining. For example, given this code
(from "test_copy_propagation")::
    def function_to_be_inlined(a, b, c, d):
        return (a + b + c + d)

    def callsite():
        print(function_to_be_inlined('fee', 'fi', 'fo', 'fum'))
then "callsite" compiles down to this bytecode::

  5           0 LOAD_GLOBAL              0 (print)
              3 LOAD_GLOBAL              1 (function_to_be_inlined)
              6 LOAD_GLOBAL              2 (__saved__function_to_be_inlined)
              9 JUMP_IF_SPECIALIZABLE   30
             12 LOAD_CONST               1 ('fee')
             15 LOAD_CONST               2 ('fi')
             18 LOAD_CONST               3 ('fo')
             21 LOAD_CONST               4 ('fum')
             24 CALL_FUNCTION            4
             27 JUMP_FORWARD            15 (to 45)
        >>   30 LOAD_CONST               0 (None)
             33 STORE_FAST               0 (__inline_function_to_be_inlined_1976300____returnval__)
             36 LOAD_CONST               7 ('feefifofum')
             39 STORE_FAST               0 (__inline_function_to_be_inlined_1976300____returnval__)
             42 LOAD_FAST                0 (__inline_function_to_be_inlined_1976300____returnval__)
        >>   45 CALL_FUNCTION            1
             48 POP_TOP
             49 LOAD_CONST               0 (None)
             52 RETURN_VALUE

Note how the specialized inline implementation is able to do::

    LOAD_CONST 7 ('feefifofum')

rather than having to compute the concatenation at runtime. (The return value
handling is still a little messy, alas.)
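
To spell out how the passes arrive at that constant, here is the same
transformation sketched at the source level; the stage functions and the
abbreviated temporary names are purely illustrative::

    # Stage 1 -- naive inlining introduces parameter-binding copies plus the
    # inlined body (the real temporaries carry the long
    # __inline_function_to_be_inlined_...__ prefix seen above):
    def callsite_stage1():
        __a, __b, __c, __d = 'fee', 'fi', 'fo', 'fum'
        __retval__ = __a + __b + __c + __d
        print(__retval__)

    # Stage 2 -- copy propagation rewrites the uses of __a..__d to their
    # constant values, and the concatenation of the now-constant operands is
    # folded; the stores to __a..__d are now only ever written, never read:
    def callsite_stage2():
        __a, __b, __c, __d = 'fee', 'fi', 'fo', 'fum'
        __retval__ = 'feefifofum'
        print(__retval__)

    # Stage 3 -- the write-only-locals pass removes __a..__d; the remaining
    # __retval__ store/load pair is the "still a little messy" return-value
    # handling visible in the bytecode above:
    def callsite_stage3():
        __retval__ = 'feefifofum'
        print(__retval__)
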
Other stuff
===========
- don't try to optimize anything until the end of Py_InitializeEx, since we
  can't run arbitrary Python code until modules such as "sys" and "codecs"
  have been brought up
- don't inline for functions that have rebound names
- co_lnotab's encoding of line numbers, described in Objects/lnotab_notes.txt,
  only supports monotonically increasing line numbers within a code object's
  bytecode, but with inlining we can jump around arbitrarily within one file.
  I've disabled the check for now, and line numbers sometimes come out wrong
  (see the sketch below).
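
For context, a sketch of why the format is increase-only: in the classic
co_lnotab encoding each entry is a pair of unsigned byte deltas, so a decoder
like the one below can only ever move the line number forward (this models the
format as documented in Objects/lnotab_notes.txt at the time; later Python
versions changed the encoding)::

    # Decode the classic co_lnotab format: a flat bytes object of
    # (bytecode offset delta, line number delta) pairs, both unsigned --
    # which is why line numbers could only increase within one code object,
    # exactly the assumption that inlining breaks.
    def decode_lnotab(code):
        offset, line = 0, code.co_firstlineno
        table = [(offset, line)]
        for off_incr, line_incr in zip(code.co_lnotab[::2], code.co_lnotab[1::2]):
            offset += off_incr
            line += line_incr
            table.append((offset, line))
        return table
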
Correctness
===========
Just a proof-of-concept for now. There's a "safety switch" in
Lib/__optimizer__.py:is_test_code which restricts optimizations to just a few
places. I've been occasionally turning it off and running regrtest: many
tests pass, many fail; I've been fixing failures as I see them.
Performance
===========
The optimizer is currently slow, and adds a noticeable delay when compiling a
.py file. Potentially it could itself be optimized. Alternatively, we could add
a command-line flag telling Python to optimize its bytecode more heavily. This
is particularly relevant with my downstream-distributor hat on: in Fedora and
RHEL we ship hundreds of Python packages (and their .pyc/.pyo files) via RPM,
and it would be simple to turn up the optimizations when we do a build.
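
For illustration, the natural hook on the distro side is the byte-compilation
step that already happens at package build time; something along these lines,
where the idea that a higher optimization setting would also run the new AST
optimizer is purely hypothetical (as is the path)::

    import compileall

    # Byte-compile an installed tree at build time.  The 'optimize' argument
    # corresponds to today's -O/-OO levels; the suggestion is that a heavier
    # (hypothetical) setting could additionally enable the AST optimizer.
    compileall.compile_dir('build/lib/mypkg', optimize=2, quiet=True)
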
bm_call_simple.py gets significantly faster: a 49% speedup (albeit for a
do-nothing benchmark, and this isn't counting the start-up time)::

    ### call_simple ###
    Min: 0.320839 -> 0.214237: 1.4976x faster
    Avg: 0.322125 -> 0.215405: 1.4954x faster
    Significant (t=155.641488)
    Stddev: 0.00246 -> 0.00099: 2.4894x smaller
    Timeline: http://tinyurl.com/2abok5l

Other benchmarks show no change, slight slowdowns, or crashes. I hope that
this will improve once method-call inlining is implemented.
----------
Added file:
http://bugs.python.org/file19649/py3k-ast-pyoptimize-2010-11-19-006.patch