STINNER Victor added the comment: pyobject_fastcall-4.patch combines a lot of changes. I wrote it to experiment _PyObject_FastCall().
I will rewrite these changes as a patch serie with smaller changes. The design of PyObject_FastCall*() is to be short and simple. Changes: * Add _PyObject_FastCall(), _PyFunction_FastCall(), _PyCFunction_FastCall(): similar to the _PyObject_FastCallKeywords() but without keyword arguments. Without keyword arguments, the checks on keyword arguments can be removed, and it allows to reduce the stack usage: one less parameter to C functions. * Move the slow path out of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() in a new subfunction which is tagged with _Py_NO_INLINE to reduce the stack consumption of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() in their fast paths, which are the most code paths. * Move all "call" functions into a new Objects/call.c function. This change should help code placement to enhance the usage of the CPU caches (especially the CPU L1 instruction cache). In my long experience with benchmarking last year, I notice huge performance differences caused by code placement. See my blog post: https://haypo.github.io/analysis-python-performance-issue.html Sadly, _Py_HOT_FUNCTION didn't fix the issue completely. * _PyObject_FastCallDict() and _PyObject_FastCallKeywords() "call" _PyObject_FastCall(). In fact, _PyObject_FastCall() is inlined manually in these functions, against, to minimize the stack usage. Similar change in _PyCFunction_FastCallDict() and PyCFunction_Call(). * Since the slow path is moved to a subfunction, I removed _Py_NO_INLINE from _PyStack_AsTuple() to allow the compiler to inline it if it wants to ;-) Since the stack usage is better with the patch, it seems like this strange has no negative impact on the stack usage. * Optimize PyMethodDescr_Type (call _PyMethodDescr_FastCallKeywords()) in _PyObject_FastCallKeywords() * I moved Py_EnterRecursiveCall() closer to the final function call, to simplify error handling. It would be nice to fix the issue #29306 before :-/ * I had to copy/paste the null_error() function from Objects/abstract.c. I don't think that it's worth it to add a _PyErr_NullError() function shared by abstract.c and call.c. Compilers are even able to merge duplicated functions ;-) * A few other changes. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29465> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com