Yury Selivanov wrote on 27.01.2016 at 19:25:
> tl;dr The summary is that I have a patch that improves CPython performance
> by up to 5-10% on macro benchmarks.  Benchmark results on Macbook Pro/Mac OS
> X, desktop CPU/Linux, server CPU/Linux are available at [1].  There are no
> slowdowns that I could reproduce consistently.
> 
> There are two different optimizations that yield this speedup:
> LOAD_METHOD/CALL_METHOD opcodes and per-opcode cache in ceval loop.
> 
> LOAD_METHOD & CALL_METHOD
> -------------------------
> 
> We had a lot of conversations with Victor about his PEP 509, and he sent me
> a link to his amazing compilation of notes about CPython performance [2]. 
> One optimization that he pointed out to me was LOAD/CALL_METHOD opcodes, an
> idea that originated in PyPy.
> 
> There is a patch that implements this optimization; it's tracked here:
> [3].  There are some low level details that I explained in the issue, but
> I'll go over the high level design in this email as well.
> 
> Every time you access a method attribute on an object, a BoundMethod object
> is created. It is a fairly expensive operation, despite a freelist of
> BoundMethods (so that memory allocation is generally avoided).  The idea is
> to detect what looks like a method call in the compiler, and emit a pair of
> specialized bytecodes for that.
> 
> So instead of LOAD_GLOBAL/LOAD_ATTR/CALL_FUNCTION we will have
> LOAD_GLOBAL/LOAD_METHOD/CALL_METHOD.
> 
> LOAD_METHOD looks at the object on top of the stack, and checks if the name
> resolves to a method or to a regular attribute.  If it's a method, then we
> push the unbound method object and the object to the stack.  If it's an
> attribute, we push the resolved attribute and NULL.
> 
> When CALL_METHOD looks at the stack it knows how to call the unbound method
> properly (pushing the object as a first arg), or how to call a regular
> callable.
> 
> This idea does make CPython around 2-4% faster.  And it surely doesn't make
> it slower.  I think it's a safe bet to at least implement this optimization
> in CPython 3.6.
> 
> So far, the patch only optimizes positional-only method calls. It's
> possible to optimize all kinds of calls, but this will necessitate 3 more
> opcodes (explained in the issue).  We'll need to do some careful
> benchmarking to see if it's really needed.

I implemented a similar but simpler optimisation in Cython a while back:

http://blog.behnel.de/posts/faster-python-calls-in-cython-021.html

Instead of avoiding the creation of method objects, as you proposed, it
calls getattr as usual and, if that returns a bound method object, uses
inlined calling code that avoids re-packing the argument tuple.
Interestingly, I got speedups of 5-15% for some of the Python benchmarks,
but I don't quite remember which ones (at least raytrace and richards, I
think), nor do I recall the overall gain, which (I assume) is what you are
referring to with your 2-4% above. It might have been of the same order.
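
To make the difference concrete, here is a rough Python rendering of the
control flow. The helper name call_attr is hypothetical, and the real
thing is generated C code: in pure Python this shows only the logic and
gives none of the speedup, since the gain comes from skipping the
argument tuple re-packing at the C level.

```python
import types


def call_attr(obj, name, *args):
    """Sketch of the approach: normal getattr, then unpack a bound method."""
    attr = getattr(obj, name)  # ordinary lookup; may create a bound method
    if isinstance(attr, types.MethodType):
        # Bound method: call the underlying function directly and pass
        # the instance as the first positional argument.
        return attr.__func__(attr.__self__, *args)
    # Anything else (builtin method, plain callable) is called as usual.
    return attr(*args)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def shifted(self, dx, dy):
        return (self.x + dx, self.y + dy)
```

Here call_attr(Point(1, 2), "shifted", 10, 20) goes through the
bound-method branch, whereas call_attr([3, 1, 2], "index", 1) falls
through to the generic call because list.index is not a Python-level
bound method.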

Stefan

