stefan brunthaler, 23.07.2010 08:48:
I guess it would be a good idea to quickly outline my inline caching
approach, so that we all have a basic understanding of how it works.

Yes, that certainly makes it easier to discuss.


If we take, for instance, the BINARY_ADD instruction, the interpreter
evaluates the actual operand types at runtime and chooses the matching
operation implementation: operands that are unicode strings are
concatenated via unicode_concatenate, whereas for float operands the
interpreter ends up invoking float_add via binary_op1. Now, a very
efficient way to achieve purely interpretative inline caching is to
quicken the type-generic BINARY_ADD instruction to a type-dependent
FLOAT_ADD instruction (this technique, i.e., inline caching via
quickening, is the primary contribution of my ECOOP paper). Hence, I
have a very simple code generator that generates type-dependent
interpreter instructions in a pre-compile step of the interpreter, and
uses runtime type information to quicken/rewrite instructions.
Aside from the operators, I have implemented this quickening technique
for the FOR_ITER, COMPARE_OP and CALL_FUNCTION instructions.
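To make the discussion concrete, here is a toy bytecode interpreter that sketches the quickening idea. The opcode names mirror the CPython opcodes mentioned above, but the interpreter itself is a simplified illustration, not Stefan's actual implementation (which rewrites CPython's threaded-code dispatch in C):

```python
# Toy sketch of inline caching via quickening: a generic BINARY_ADD
# rewrites itself in place to a type-specialized FLOAT_ADD once it has
# observed float operands, so later executions skip the type dispatch.

BINARY_ADD, FLOAT_ADD, LOAD_CONST, RETURN = range(4)

def run(code, consts):
    stack = []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == BINARY_ADD:
            b, a = stack.pop(), stack.pop()
            if type(a) is float and type(b) is float:
                # Quicken: rewrite this instruction in place so the
                # next execution dispatches straight to the fast path.
                code[pc] = (FLOAT_ADD, arg)
            stack.append(a + b)  # generic path handles this execution
        elif op == FLOAT_ADD:
            b, a = stack.pop(), stack.pop()
            if type(a) is float and type(b) is float:
                stack.append(a + b)  # type-specialized fast path
            else:
                # Cache miss: de-quicken back to the generic add.
                code[pc] = (BINARY_ADD, arg)
                stack.append(a + b)
        elif op == RETURN:
            return stack.pop()
        pc += 1

code = [(LOAD_CONST, 0), (LOAD_CONST, 1), (BINARY_ADD, 0), (RETURN, 0)]
print(run(code, [1.5, 2.5]))    # 4.0
print(code[2][0] == FLOAT_ADD)  # True: the add instruction was quickened
```

Note the fallback branch in FLOAT_ADD: since the specialization is only a cache, a miss must still produce the correct generic result.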

This sounds like wpython (a CPython derivative with a wider set of byte code commands) could benefit from it.

Do I understand correctly that you modify the byte code of modules/functions at runtime?


I'm absolutely interested, although not for the CPython project but for
Cython. I wonder how you do inline caching in Python if the methods of a
type can be replaced by whatever at runtime. Could you elaborate on that?

Currently, I only provide optimized derivatives for several distinct
kinds of call targets, e.g., a C function with varargs versus a Python
function/method -- this already eliminates a lot of the overhead of
invoking call_function.
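A hypothetical sketch of that split, in Python rather than the interpreter's C (the handler names are made up for illustration): the call target's kind is classified once, and a quickened call instruction would then cache the chosen handler instead of re-checking on every invocation.

```python
import types

def call_cfunction(func, args):
    # C builtins (e.g. len, ord) need no Python frame setup.
    return func(*args)

def call_python(func, args):
    # Python functions/methods would get a frame in the real VM.
    return func(*args)

def call_generic(func, args):
    # Fallback for arbitrary callables (classes, partials, ...).
    return func(*args)

def classify_call_target(func):
    """Choose a specialized handler for this call target's kind."""
    if isinstance(func, types.BuiltinFunctionType):
        return call_cfunction
    if isinstance(func, (types.FunctionType, types.MethodType)):
        return call_python
    return call_generic

print(classify_call_target(len).__name__)          # call_cfunction
print(classify_call_target(lambda x: x).__name__)  # call_python
print(classify_call_target(int).__name__)          # call_generic
```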

Ah, yes, that makes good sense. So you basically add an intermediate step to calls that provides faster dispatch for known C functions.


Or do you restrict yourself to builtin types?

Currently, my approach provides optimized derivative instructions for
the standard library, e.g., unicode strings, numerical objects,
containers, and iterators.

I'm interested in the code that determines what can be optimised in what way. I read that Jython recently received a contribution that provides type information for lots of modules and builtins, but having something like that for CPython would be cool.


That might be worth it already; just think of list.append(). We have an
optimistic optimisation for object.append() in Cython that gives us
massive speed-ups in loops that build lists, even if we don't know at
compile time that we are dealing with lists.
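The optimistic pattern looks roughly like the following sketch (the real Cython optimisation is generated C code; this Python version, with a made-up helper name, only illustrates the idea):

```python
# Optimistic fast path for obj.append(item): guess that obj is a list,
# verify cheaply with an exact type check, and fall back to the generic
# attribute lookup when the guess is wrong.

def optimistic_append(obj, item):
    if type(obj) is list:        # exact check, no subclass surprises
        list.append(obj, item)   # direct call, skips attribute lookup
    else:
        obj.append(item)         # generic fallback for list-likes

data = []
for i in range(3):
    optimistic_append(data, i)
print(data)  # [0, 1, 2]
```

The exact `type(obj) is list` check (rather than `isinstance`) is what keeps the guard cheap and the fast path safe: subclasses with an overridden append still take the generic branch.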

Yes, that sounds like a reasonable thing to do. I could provide much
more optimized derivatives based on application profiles, too. Since I
use a simple code generator for generating the derivatives, it would
also be possible to provide end-users with the means to analyze their
apps and generate optimized instruction derivatives matching their
profile.

Such an approach would also be very useful for Cython. Think of a profiler that runs a program in CPython and tells you exactly what static type annotations to put where in your Python code to make it compile to a fast binary with Cython. Or, even better, it could just spit out a .pxd file that you drop next to your .py file and that provides the static type information for you.

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev