STINNER Victor added the comment:

"While I feel your work is great, performance benefit seems very small,
compared complexity of this patch."

I have to agree. I spent a lot of time benchmarking these tp_fast* 
changes. While one or two benchmarks are faster, that's not really the case 
for the others.

I also agree about the complexity. In Python 3.6, most FASTCALL changes were 
internal. For example, PyObject_CallFunctionObjArgs() now uses FASTCALL 
internally, without callers of the API having to be modified. I tried to use 
_PyObject_FastCallDict/Keywords() only in the few places where the speedup was 
significant.

The main visible change of Python 3.6 FASTCALL is the new METH_FASTCALL 
calling convention for C functions. Your change modifying print() to use 
METH_FASTCALL has a significant impact on the telco benchmark, without any 
drawback. I tested further changes to use METH_FASTCALL in the struct and 
decimal modules, and they optimize telco even more.

To continue the optimization work, I guess that using METH_FASTCALL in more 
cases, using Argument Clinic whenever possible, would have a more concrete and 
measurable impact on performance than this big tp_fastcall patch.
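
As a rough illustration of how "measurable impact" could be checked for a 
single C function before and after converting it, here is a minimal 
micro-benchmark sketch. It is not the benchmark suite used in this issue 
(telco is a full benchmark); the bench_call() helper and the chosen 
statements are hypothetical, and the numbers only make sense when the same 
script is run on two builds of the interpreter.

```python
import timeit


def bench_call(stmt, setup="pass", repeat=5, number=100000):
    """Return the best observed time per call, in seconds.

    Hypothetical helper: takes the minimum over several runs to reduce
    noise from the rest of the system.
    """
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    # Illustrative call sites only; pick the C functions you actually changed.
    cases = [
        ("len(x)", "x = [1, 2, 3]"),
        ('pack("<i", 42)', "from struct import pack"),
        ("isinstance(x, int)", "x = 1"),
    ]
    for stmt, setup in cases:
        per_call = bench_call(stmt, setup)
        print("%-22s %8.1f ns per call" % (stmt, per_call * 1e9))
```

Comparing the per-call times from a patched and an unpatched build gives a 
first hint, but a difference of a few nanoseconds is easily within noise, so 
a full benchmark remains the better judge.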

But I'm not ready to abandon the whole approach yet, so I'm changing the 
status to Pending. I may come back in one or two months to check whether I 
missed anything obvious that would unlock even more optimizations ;-)

----------
status: open -> pending

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29259>
_______________________________________
