New submission from STINNER Victor <vstin...@python.org>:

The Fedora packaging has been modified to compile libpython with 
-fno-semantic-interposition flag: it makes Python up to 1.3x faster without 
having to touch any line of the C code! See pyperformance results:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup#Benefit_to_Fedora

The main drawback is that -fno-semantic-interposition prevents overriding 
Python symbols with a custom library preloaded through LD_PRELOAD, for 
example overriding the PyErr_Occurred() function.

We (authors of the Fedora change) failed to find any use case for LD_PRELOAD.

To be honest, I found *one* user in the last 10 years who used LD_PRELOAD to 
track memory allocations in Python 2.7. This use case is no longer relevant in 
Python 3 with PEP 445 which provides a supported C API to override Python 
memory allocators or to install hooks on Python memory allocators. Moreover, 
tracemalloc is a nice way to track memory allocations.

Is there anyone aware of any special use of LD_PRELOAD for libpython?

To be clear: -fno-semantic-interposition only impacts libpython. All other 
libraries still respect LD_PRELOAD. For example, it is still possible to 
override glibc malloc/free.

Why does -fno-semantic-interposition make Python faster? There are multiple 
reasons. First of all, libpython makes a lot of function calls to libpython. 
Like really a lot, especially in the hot code paths. Without 
-fno-semantic-interposition, function calls to libpython have to go 
through "interposition": for example "Procedure Linkage Table" (PLT) 
indirection on Linux. Interposition also prevents function inlining, which 
has a major impact on performance (missed optimization). In short, even with 
PGO and LTO, libpython function calls have two performance "penalties":

* indirect function calls (PLT)
* no inlining

I'm comparing the performance of a "statically linked Python" (Debian/Ubuntu 
choice: ./configure without --enable-shared, python is not linked to 
libpython) to a "dynamically linked Python" (Fedora choice: "./configure 
--enable-shared", python is dynamically linked to libpython).

With -fno-semantic-interposition, function calls are direct and can be inlined 
when appropriate. You don't have to trust me, look at pyperformance benchmark 
results ;-)
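
To illustrate the two penalties, here is a minimal sketch using a 
hypothetical two-function library. The names demo_add_one() and 
demo_add_three() are made up for this example and are not part of CPython. 
It assumes GCC on Linux: built as a shared library with -fPIC, the exported 
function may be interposed, so GCC is expected to keep the inner calls as 
real calls through the PLT. Adding -fno-semantic-interposition lets GCC bind 
them directly and inline them.

```c
/* demo.c: hypothetical library used only for illustration, not CPython code.
 *
 * Build it twice and compare the disassembly of demo_add_three():
 *   gcc -O2 -fPIC -shared demo.c -o libdemo.so
 *   gcc -O2 -fPIC -shared -fno-semantic-interposition demo.c -o libdemo.so
 *   objdump -d libdemo.so
 */

/* Exported symbol. By default, a library loaded with LD_PRELOAD may
 * provide its own demo_add_one() and replace this one at run time. */
int demo_add_one(int x)
{
    return x + 1;
}

/* Calls an exported function of the same library three times.
 * Because demo_add_one() may be interposed, GCC cannot assume that the
 * definition above is the one that runs, so by default it does not
 * inline these calls. With -fno-semantic-interposition it can. */
int demo_add_three(int x)
{
    return demo_add_one(demo_add_one(demo_add_one(x)));
}
```

The result is the same in both builds; only the generated code differs. 
libpython is in the position of demo_add_three(): most of its hot paths call 
other exported libpython functions.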

When using ./configure --enable-shared (libpython), the "python" binary is 
exactly one function call and that's all:

int main(int argc, char **argv)
{ return Py_BytesMain(argc, argv); }

So 100% of the time is spent in libpython.

For a longer rationale, see the accepted Fedora change:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup

----------
components: Build
messages: 357856
nosy: inada.naoki, pablogsal, serhiy.storchaka, vstinner
priority: normal
severity: normal
status: open
title: Compile libpython with -fno-semantic-interposition
type: performance
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38980>
_______________________________________