New submission from STINNER Victor <vstin...@python.org>:

The glibc pthread_exit() function loads an unwind function from libgcc_s.so.1
using dlopen(). dlopen() can fail to open the libgcc_s.so.1 file for various
reasons, but the most likely seems to be that the process has run out of
available file descriptors (EMFILE error).

If the glibc pthread_exit() fails to open libgcc_s.so.1, it aborts the process. 
Extract of pthread_cancel():

  /* Trigger an error if libgcc_s cannot be loaded.  */
  {
    struct unwind_link *unwind_link = __libc_unwind_link_get ();
    if (unwind_link == NULL)
      __libc_fatal (LIBGCC_S_SO
                    " must be installed for pthread_cancel to work\n");
  }

Sometimes, the libgcc_s.so.1 library is loaded early in Python startup.
Sometimes, it is only loaded when the first Python thread exits.
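
To see which of the two cases applies to a given process, one possibility is
to look at the memory mappings of the process. This is only a minimal sketch:
it assumes Linux with /proc mounted, and the helper name libgcc_s_is_loaded()
is hypothetical, it is not part of the attached scripts.

```python
def libgcc_s_is_loaded(maps_path="/proc/self/maps"):
    """Return True if a libgcc_s mapping is listed for the current process.

    Return None if the mappings cannot be read (for example, no /proc).
    Assumption: the library file name contains "libgcc_s".
    """
    try:
        with open(maps_path) as maps:
            return any("libgcc_s" in line for line in maps)
    except OSError:
        return None


if __name__ == "__main__":
    print("libgcc_s loaded?", libgcc_s_is_loaded())
```

Calling it at startup and again after the first thread has exited should show
whether the library was loaded early or late.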

In a multithreaded real-world application, dlopen() failing with EMFILE is not
deterministic. It depends on precise timing and on the order in which threads
run. It is unlikely in a small application, but it is more likely on a network
server which has thousands of open sockets (file descriptors).
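
To get a rough idea of how close a process is to the limit, the number of
open file descriptors can be compared to the soft RLIMIT_NOFILE limit. This is
a sketch under assumptions: it relies on the Linux-specific /proc/self/fd
directory, and fd_headroom() is a hypothetical helper name.

```python
import os
import resource


def fd_headroom():
    """Return roughly how many more file descriptors can be opened.

    Return None if the limit is unlimited or /proc/self/fd is unavailable.
    The count is approximate: listing the directory uses a descriptor itself.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    try:
        used = len(os.listdir("/proc/self/fd"))
    except OSError:
        return None
    return soft - used


if __name__ == "__main__":
    print("file descriptors left:", fd_headroom())
```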

--

The attached scripts reproduce the issue. You may need to run the scripts
(especially pthread_cancel_emfile.py) multiple times to trigger the issue.
Sometimes the libgcc_s library is loaded early for an unknown reason, which
works around the issue.

(1) pthread_cancel_bug.py:

$ python3.10 pthread_cancel_bug.py 
libgcc_s.so.1 must be installed for pthread_cancel to work
Abandon (core dumped)


(2) pthread_cancel_emfile.py:

$ python3.10 ~/pthread_cancel_emfile.py 
spawn thread
os.open failed: OSError(24, 'Too many open files')
FDs open by the thread: 2 (max FD: 4)
fd 0 valid? True
fd 1 valid? True
fd 2 valid? True
fd 3 valid? True
fd 4 valid? True
libgcc_s.so.1 must be installed for pthread_cancel to work
Abandon (core dumped)

--

Example of real world issue on RHEL8:
https://bugzilla.redhat.com/show_bug.cgi?id=1972293

The RHEL reproducer uses a very low RLIMIT_NOFILE (5 file descriptors) to 
trigger the bug faster. It simulates a busy server application.
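
The effect of a low RLIMIT_NOFILE can be sketched in pure Python. This is not
the RHEL reproducer and not one of the attached scripts, only an illustration
of how EMFILE shows up. The function name open_until_emfile() and the default
limit of 16 are arbitrary choices.

```python
import errno
import os
import resource


def open_until_emfile(soft_limit=16):
    """Lower the soft RLIMIT_NOFILE, open os.devnull until os.open() fails,
    and return the errno of the failure. The previous limit is restored."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # The soft limit cannot exceed the hard limit.
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno
    finally:
        # Close what was opened and put the original soft limit back.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    err = open_until_emfile()
    print(errno.errorcode.get(err, err), os.strerror(err))
```

A dlopen() call made while the process is in that state fails the same way,
because opening the shared library needs a free file descriptor.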

--

There are different options:

(*) Modify thread_run() of Modules/_threadmodule.c to remove the *redundant* 
PyThread_exit_thread() call.

This is the simplest option and it sounds perfectly safe to me. I'm not sure
why PyThread_exit_thread() is called explicitly. We don't pass any parameter to
the function.


(*) Link the Python _thread extension to libgcc_s.so if Python is built with
the glibc.

Checking if Python is linked to the glibc is non-trivial and we have to
hardcode the "libgcc_s" library name. I expect a painful maintenance burden
with this option.

(*) Explicitly load the libgcc_s.so library in _thread.start_new_thread(), when
the first thread is created.

We need to detect at runtime that we are running on the glibc, by calling
confstr('CS_GNU_LIBC_VERSION') for example. The problem is that the
"libgcc_s.so.1" filename may change depending on the Linux distribution. It
will likely have a different filename on macOS (".dylib"). In short, it's
tricky to get it right.
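
As a rough illustration of this option at the Python level (the real change
would be in C, in Modules/_threadmodule.c), the detection and the preload
could look like the sketch below. The helper names are hypothetical, and the
hardcoded "libgcc_s.so.1" default is exactly the fragile assumption described
above.

```python
import ctypes
import os


def running_on_glibc():
    """Best-effort check using confstr('CS_GNU_LIBC_VERSION')."""
    try:
        version = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # No os.confstr(), or the name is unknown on this platform.
        return False
    return bool(version) and version.startswith("glibc")


def preload_libgcc_s(name="libgcc_s.so.1"):
    """Try to load libgcc_s while file descriptors are still available.

    Return the ctypes handle, or None if not on glibc or if loading failed.
    Assumption: the library file name is "libgcc_s.so.1".
    """
    if not running_on_glibc():
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


if __name__ == "__main__":
    print("glibc?", running_on_glibc())
    print("preloaded:", preload_libgcc_s())
```

Keeping the returned handle alive should keep the library loaded, so a later
dlopen() of the same library would no longer need a new file descriptor.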

(*) Fix the glibc!

I discussed this with glibc developers, who explained to me that there are
good reasons to keep the unwind code in the compiler (GCC), and so to load it
dynamically in the glibc. In short, this is not going to change.

--

The attached PR implements the most straightforward option: remove the
redundant PyThread_exit_thread() call in thread_run().

----------
components: Library (Lib)
files: pthread_cancel_bug.py
messages: 395924
nosy: vstinner
priority: normal
severity: normal
status: open
title: _thread module: Remove redundant PyThread_exit_thread() call to avoid 
glibc fatal error: libgcc_s.so.1 must be installed for pthread_cancel to work
versions: Python 3.11
Added file: https://bugs.python.org/file50112/pthread_cancel_bug.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44434>
_______________________________________