Dave Malcolm <[email protected]> added the comment:
Sorry again for another epic-length comment...
I'm attaching the latest work-in-progress on this.
The code is still fairly messy (embarrassingly so in places), but it's better to
have it out in public in this tracker than stuck on my hard drive.
Symbol tables
=============
- I've moved the optimization pass to after symbol table generation, passing
  the symtable to mod2obj et al and adding an "ste" attribute to all objects
  that have their own PySTEntryObject (I added a PySymtable_TryLookup function
  to Python/symtable.c to help with this). The symtable pass is rerun after
  optimization.
- the "ste" attribute isn't exposed anywhere in the grammar (do other VMs
implement symbol table entries?)
- in the graphviz debug output, add boxes around scopes (i.e. those nodes
with an "ste")
- use the symbol table info: only inline functions that purely reference
  locals and globals, not free/cellvars (see the sketch after this list)
- only rename locals when inlining, not globals
- don't inline functions that use the "locals" or "dir" builtins
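
For illustration only (this isn't the C code in the patch): a rough
Python-level sketch of the kind of check the symbol table makes possible,
using the stdlib symtable module; is_inlinable and the example source are
made up here::

    import symtable

    # Hypothetical illustration of the inlinability test: a function is a
    # candidate only if it has no free variables, no cell variables (nothing
    # captured by nested scopes), and never names locals() or dir().
    def is_inlinable(source, funcname, filename='<example>'):
        mod = symtable.symtable(source, filename, 'exec')
        func = next(t for t in mod.get_children() if t.get_name() == funcname)
        if func.get_frees():
            return False                  # reads free variables
        if any(child.get_frees() for child in func.get_children()):
            return False                  # has cell variables
        if {'locals', 'dir'} & set(func.get_identifiers()):
            return False                  # may inspect the local namespace
        return True

    src = """
    def f(a, b):
        return a + b

    def g(a):
        def h():
            return a      # 'a' becomes a cell variable of g
        return h
    """
    print(is_inlinable(src, 'f'))   # True
    print(is_inlinable(src, 'g'))   # False
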
Specialization
==============
I've implemented a first pass at the specialization ideas in the original
comment.
I've added a new AST node: "Specialize". It's an expression, but contains
statements. In ASDL terms::

    -- Generated by the optimizer:
    | Specialize(expr name,
                 stmt* specialized_body,
                 expr specialized_result,
                 expr generalized)    -- "generalized" must be a Call

The idea is that:

- "name" should be dynamically evaluated, then compared in some (not yet well
  specified) way against the expected value recorded during inlining.
- If we have a match:
  - then execute the statements in "specialized_body", then evaluate
    "specialized_result", the latter giving the overall value of this
    expression.
- If we don't have a match:
  - then evaluate "generalized", which currently must be a Call; this gives
    the overall value of the expression.
I've added a JUMP_IF_SPECIALIZABLE opcode, mostly as described in the initial
comment.
This allows us to implement "specialized" bytecode that makes an assumption,
but has a fallback ("generalized") for the case when the assumption fails.
To wire this up for function inlining, we need some way of recording the
"original" version of a function versus what it's now bound to. For now, I'm
simply adding an assignment after each function def, equivalent to::

    def foo():
        print('hello world')

    __saved__foo = foo

    def callsite():
        foo()
and the "Specialize" in "callsite" is compiled to:
LOAD_GLOBAL (foo) ; dynamic lookup
LOAD_GLOBAL (__saved__foo__) ; get saved version
JUMP_IF_SPECIALIZABLE -> inline impl
CALL_FUNCTION 0
POP_TOP
JUMP_FORWARD
; inline implementation
LOAD_GLOBAL ('print')
LOAD_CONST ('hello world')
CALL_FUNCTION 1
POP_TOP
LOAD_CONST (None)
RETURN_VALUE
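
Spelled out as ordinary Python, the bytecode above behaves roughly like the
sketch below. The identity test against the saved binding stands in for
whatever comparison JUMP_IF_SPECIALIZABLE ends up performing (the exact guard
semantics aren't settled yet), so treat this as an illustration rather than
the real design::

    def foo():
        print('hello world')

    __saved__foo = foo            # binding recorded at function-definition time

    def callsite():
        if foo is __saved__foo:   # the JUMP_IF_SPECIALIZABLE guard (illustrative)
            print('hello world')  # inline implementation of the original foo()
        else:
            foo()                 # generalized fallback: the original Call

    callsite()                    # assumption holds: takes the specialized path

    def foo():                    # rebinding foo invalidates the assumption
        print('patched')

    callsite()                    # guard fails: falls back to calling foo()
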
Additional optimizations
========================
I've added:
- a very simple copy-propagation pass, which only runs on functions with a
  single basic block (to keep the dataflow analysis simple: no edges in the
  CFG, since we don't have an explicit CFG yet); a toy sketch of this pass
  follows the list
- a pass which removes locals that are only ever written to, never read from,
  provided that the function doesn't use "globals", "locals", or "dir"
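
As a toy sketch of the first pass (not the patch's actual implementation,
which works on the AST), here is copy propagation over a single basic block,
using a simplified (dest, op, args) instruction form::

    # Caveat: a real pass must also invalidate copies whose *source* is later
    # redefined; that is omitted here for brevity.
    def propagate_copies(block):
        copies = {}                            # dest name -> source name/const
        out = []
        for dest, op, args in block:
            args = tuple(copies.get(a, a) for a in args)   # rewrite operands
            if op == 'copy':
                # Record the copy; with a single basic block and all later
                # uses rewritten, the copy instruction itself can be dropped.
                copies[dest] = args[0]
                continue
            copies.pop(dest, None)             # dest redefined: old copy is stale
            out.append((dest, op, args))
        return out

    block = [('a', 'copy', ("'fee'",)),
             ('b', 'copy', ("'fi'",)),
             ('t', 'add',  ('a', 'b')),
             ('r', 'copy', ('t',)),
             ('_', 'call', ('print', 'r'))]
    print(propagate_copies(block))
    # -> [('t', 'add', ("'fee'", "'fi'")), ('_', 'call', ('print', 't'))]
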
These passes combine nicely with the function inlining. For example, given this code
(from "test_copy_propagation")::
    def function_to_be_inlined(a, b, c, d):
        return (a + b + c + d)

    def callsite():
        print(function_to_be_inlined('fee', 'fi', 'fo', 'fum'))
then "callsite" compiles down to this bytecode::

  5           0 LOAD_GLOBAL              0 (print)
              3 LOAD_GLOBAL              1 (function_to_be_inlined)
              6 LOAD_GLOBAL              2 (__saved__function_to_be_inlined)
              9 JUMP_IF_SPECIALIZABLE   30
             12 LOAD_CONST               1 ('fee')
             15 LOAD_CONST               2 ('fi')
             18 LOAD_CONST               3 ('fo')
             21 LOAD_CONST               4 ('fum')
             24 CALL_FUNCTION            4
             27 JUMP_FORWARD            15 (to 45)
        >>   30 LOAD_CONST               0 (None)
             33 STORE_FAST               0 (__inline_function_to_be_inlined_1976300____returnval__)
             36 LOAD_CONST               7 ('feefifofum')
             39 STORE_FAST               0 (__inline_function_to_be_inlined_1976300____returnval__)
             42 LOAD_FAST                0 (__inline_function_to_be_inlined_1976300____returnval__)
        >>   45 CALL_FUNCTION            1
             48 POP_TOP
             49 LOAD_CONST               0 (None)
             52 RETURN_VALUE

Note how the specialized inline implementation is able to do::

    LOAD_CONST 7 ('feefifofum')

rather than having to compute the concatenation at runtime. (The return value
handling is still a little messy, alas.)
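
To spell out how the passes arrive at that constant, here is the same
transformation sketched at the source level; the stage functions and the
abbreviated temporary names are purely illustrative::

    # Stage 1 -- naive inlining introduces parameter-binding copies plus the
    # inlined body (the real temporaries carry the long
    # __inline_function_to_be_inlined_...__ prefix seen above):
    def callsite_stage1():
        __a, __b, __c, __d = 'fee', 'fi', 'fo', 'fum'
        __retval__ = __a + __b + __c + __d
        print(__retval__)

    # Stage 2 -- copy propagation rewrites the uses of __a..__d to their
    # constant values, and the concatenation of the now-constant operands is
    # folded; the stores to __a..__d are now only ever written, never read:
    def callsite_stage2():
        __a, __b, __c, __d = 'fee', 'fi', 'fo', 'fum'
        __retval__ = 'feefifofum'
        print(__retval__)

    # Stage 3 -- the write-only-locals pass removes __a..__d; the remaining
    # __retval__ store/load pair is the "still a little messy" return-value
    # handling visible in the bytecode above:
    def callsite_stage3():
        __retval__ = 'feefifofum'
        print(__retval__)
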
Other stuff
===========
- don't try to optimize anything until the end of Py_InitializeEx, since we
  can't run arbitrary Python code until modules such as "sys" and "codecs"
  have been brought up
- don't inline for functions that have rebound names
- co_lnotab's encoding of line numbers, described in Objects/lnotab_notes.txt,
  only supports monotonically increasing line numbers within a code object's
  bytecode, but with inlining we can jump around arbitrarily within one file.
  I've disabled the check for now, and line numbers sometimes come out wrong
  (see the sketch below).
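
For context, a sketch of why the format is increase-only: in the classic
co_lnotab encoding each entry is a pair of unsigned byte deltas, so a decoder
like the one below can only ever move the line number forward (this models the
format as documented in Objects/lnotab_notes.txt at the time; later Python
versions changed the encoding)::

    # Decode the classic co_lnotab format: a flat bytes object of
    # (bytecode offset delta, line number delta) pairs, both unsigned --
    # which is why line numbers could only increase within one code object,
    # exactly the assumption that inlining breaks.
    def decode_lnotab(code):
        offset, line = 0, code.co_firstlineno
        table = [(offset, line)]
        for off_incr, line_incr in zip(code.co_lnotab[::2], code.co_lnotab[1::2]):
            offset += off_incr
            line += line_incr
            table.append((offset, line))
        return table
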
Correctness
===========
Just a proof-of-concept for now. There's a "safety switch" in
Lib/__optimizer__.py:is_test_code which restricts optimizations to just a few
places. I've been occasionally turning it off and running regrtest: many
tests pass, many fail; I've been fixing failures as I see them.
Performance
===========
The optimizer is currently slow, and adds a noticeable delay when compiling a
.py file. Potentially it could itself be optimized. Alternatively, we could add
a command-line flag telling Python to optimize its bytecode more heavily. This
is particularly relevant with my downstream-distributor hat on: in Fedora and
RHEL we ship hundreds of Python packages (and their .pyc/.pyo files) via RPM,
and it would be simple to turn up the optimizations when we do a build.
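
For illustration, the natural hook on the distro side is the byte-compilation
step that already happens at package build time; something along these lines,
where the idea that a higher optimization setting would also run the new AST
optimizer is purely hypothetical (as is the path)::

    import compileall

    # Byte-compile an installed tree at build time.  The 'optimize' argument
    # corresponds to today's -O/-OO levels; the suggestion is that a heavier
    # (hypothetical) setting could additionally enable the AST optimizer.
    compileall.compile_dir('build/lib/mypkg', optimize=2, quiet=True)
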
bm_call_simple.py gets significantly faster: a 49% speedup (albeit for a
do-nothing benchmark, and this isn't counting the start-up time)::

    ### call_simple ###
    Min: 0.320839 -> 0.214237: 1.4976x faster
    Avg: 0.322125 -> 0.215405: 1.4954x faster
    Significant (t=155.641488)
    Stddev: 0.00246 -> 0.00099: 2.4894x smaller
    Timeline: http://tinyurl.com/2abok5l

Other benchmarks show no change, slight slowdowns, or crashes. I hope that
this will improve once method-call inlining is implemented.
----------
Added file:
http://bugs.python.org/file19649/py3k-ast-pyoptimize-2010-11-19-006.patch