Yury Selivanov wrote on 27.01.2016 at 19:25:
> tl;dr The summary is that I have a patch that improves CPython performance
> by up to 5-10% on macro benchmarks. Benchmark results on MacBook Pro/Mac OS
> X, desktop CPU/Linux, server CPU/Linux are available at [1]. There are no
> slowdowns that I could reproduce consistently.
>
> There are two different optimizations that yield this speedup:
> LOAD_METHOD/CALL_METHOD opcodes and per-opcode cache in ceval loop.
>
> LOAD_METHOD & CALL_METHOD
> -------------------------
>
> We had a lot of conversations with Victor about his PEP 509, and he sent me
> a link to his amazing compilation of notes about CPython performance [2].
> One optimization that he pointed out to me was LOAD/CALL_METHOD opcodes, an
> idea that originated in PyPy.
>
> There is a patch that implements this optimization; it's tracked here:
> [3]. There are some low level details that I explained in the issue, but
> I'll go over the high level design in this email as well.
>
> Every time you access a method attribute on an object, a BoundMethod object
> is created. It is a fairly expensive operation, despite a freelist of
> BoundMethods (so that memory allocation is generally avoided). The idea is
> to detect what looks like a method call in the compiler, and emit a pair of
> specialized bytecodes for that.
>
> So instead of LOAD_GLOBAL/LOAD_ATTR/CALL_FUNCTION we will have
> LOAD_GLOBAL/LOAD_METHOD/CALL_METHOD.
>
> LOAD_METHOD looks at the object on top of the stack, and checks if the name
> resolves to a method or to a regular attribute. If it's a method, then we
> push the unbound method object and the object to the stack. If it's an
> attribute, we push the resolved attribute and NULL.
>
> When CALL_METHOD looks at the stack it knows how to call the unbound method
> properly (pushing the object as a first arg), or how to call a regular
> callable.
>
> This idea makes CPython around 2-4% faster. And it surely doesn't make
> it slower. I think it's a safe bet to at least implement this optimization
> in CPython 3.6.
>
> So far, the patch only optimizes positional-only method calls. It's
> possible to optimize all kinds of calls, but this will necessitate 3 more
> opcodes (explained in the issue). We'll need to do some careful
> benchmarking to see if it's really needed.
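
The stack protocol described in the quoted text can be sketched as a toy
model in pure Python. This is only an illustration of the idea, not the code
from the patch: the names NULL, load_method and call_method are hypothetical,
the "stack" is a plain list, and the method check is a simplification (a
plain function found on the type and not shadowed in the instance dict).

```python
import types

NULL = object()  # hypothetical stand-in for the C-level NULL stack marker


def load_method(stack, name):
    """Toy LOAD_METHOD: pop the object, push two slots in its place."""
    obj = stack.pop()
    # Look the name up on the type without triggering descriptor binding,
    # so that no bound method object is created.
    cls_attr = None
    for klass in type(obj).__mro__:
        if name in vars(klass):
            cls_attr = vars(klass)[name]
            break
    instance_dict = getattr(obj, "__dict__", {})
    if isinstance(cls_attr, types.FunctionType) and name not in instance_dict:
        # Method case: push the unbound function, then the object.
        stack.append(cls_attr)
        stack.append(obj)
    else:
        # Regular attribute case: push the resolved attribute, then NULL.
        stack.append(getattr(obj, name))
        stack.append(NULL)


def call_method(stack, nargs):
    """Toy CALL_METHOD: pop the arguments and both slots, push the result."""
    args = [stack.pop() for _ in range(nargs)][::-1]
    self_or_null = stack.pop()
    func = stack.pop()
    if self_or_null is NULL:
        result = func(*args)                # regular callable
    else:
        result = func(self_or_null, *args)  # object becomes the first arg
    stack.append(result)


class Greeter:
    def greet(self, name):
        return "hello " + name


stack = [Greeter()]
load_method(stack, "greet")   # stack: [Greeter.greet function, instance]
stack.append("world")         # positional argument
call_method(stack, 1)         # stack: [result of the call]
```

In the method case no bound method object is ever created, which is the
saving the quoted text refers to; the attribute case falls back to an
ordinary getattr.
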
I implemented a similar but simpler optimisation in Cython a while back:

http://blog.behnel.de/posts/faster-python-calls-in-cython-021.html

Instead of avoiding the creation of method objects, as you proposed, it
simply calls getattr as usual and, if that returns a bound method object,
uses inlined calling code that avoids re-packing the argument tuple.

Interestingly, I got speedups of 5-15% for some of the Python benchmarks,
but I don't quite remember which ones (at least raytrace and richards, I
think), nor do I recall the overall gain, which (I assume) is what you are
referring to with your 2-4% above. It might have been of the same order.

Stefan
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev