Marc-Andre Lemburg <m...@egenix.com> added the comment:

FWIW: Back in the days of Python 1.5.2, the ceval loop was already too big for 
CPU caches, and one of the things I experimented with at the time was 
rearranging the opcodes based on how often they were used and splitting the 
whole switch statement we had back then into two parts. This resulted in a 
10-20% speedup.

CPU caches have since gotten much larger, but the size of the loop is still 
something to keep in mind and optimize for, as more and more logic gets added 
to the inner loop of Python.

IMO, we should definitely keep forced inlines / macros where they are used 
inside hot loops, perhaps even in all of the CPython code, since the conversion 
to inline functions is mostly about hiding internals from extensions, not about 
hiding them from CPython itself.

@neonene: Could you provide more details about the CPU you're using to run the 
tests?

BTW: Perhaps the PSF could get a few sponsors to add more hosts to 
speed.python.org, to provide a better overview. It looks as if the system is 
only compiling on Ubuntu 14.04 and running on an 11-year-old system 
(https://speed.python.org/about/). If that's the case, the system uses a server 
CPU with a 12MB cache 
(https://www.intel.com/content/www/us/en/products/sku/47916/intel-xeon-processor-x5680-12m-cache-3-33-ghz-6-40-gts-intel-qpi/specifications.html).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45116>
_______________________________________