Jeremy Maitin-Shepard <jer...@jeremyms.com> added the comment:

In general, I view hanging threads as the least bad thing to do when faced with 
either acquiring the GIL or not returning at all.  There is a lot of existing 
usage of Python that currently poses a risk of random crashes and memory 
corruption while Python is exiting, and I would like to fix that.

However, I would certainly recommend that code using the Python C API attempt 
to avoid threads getting to that state in the first place.  I added a "finalize 
block" mechanism to that PR which is intended to provide a way to attempt to 
acquire the GIL in a way that ensures the GIL won't get hung.  I would welcome 
feedback on that.  A common use case for that API might be a non-Python created 
thread that wants to invoke some sort of asynchronous callback handler via 
Python APIs.

For Python daemon threads that you control, you can avoid them hanging by 
registering an atexit function that signals them to exit and then waits until 
they do.

vsteinner: Regarding processing the Windows messages, I updated the PR to 
include a link to the MSDN documentation that led me to think it was a good 
idea.

vstinner: As for random code outside of Python itself that is using 
`PyThread_exit_thread`: although I suppose there are legitimate use cases for 
`pthread_exit` and `_endthreadex`, these functions are only safe with the 
cooperation of the entire call stack of the thread.  Additionally, while 
`pthread_exit` and `_endthreadex` have similar behavior for C code, they don't 
have the same behavior for C++ code, and that difference seems like a likely 
source of problems.  Finally, I would say Python itself does not guarantee that 
its call stacks safely cooperate with `pthread_exit` (at the very least, there 
are sure to be memory leaks).  Therefore, I think Python should not encourage 
its use by leaving it as a non-deprecated public API.  If a user wishes to 
terminate a thread, they can invoke `pthread_exit` or `_endthreadex` directly, 
ideally without any Python functions in the call stack, and can refer to the 
documentation of those functions to understand the implications.

gps: The reasons I believe hanging the thread is better than `pthread_exit`:
- `pthread_exit` essentially throws an exception.  In theory that means you 
could do proper cleanup via C++ destructors and/or re-throwing catch blocks.  
But in practice existing extension code is not designed to do that, and it 
would be quite a complex task to modify it to do proper cleanup, and on Windows 
the cleanup wouldn't run anyway.
- Additionally, throwing an exception means if there is a `noexcept` function 
in the call stack, the program terminates.  We would need to document that you 
aren't allowed to call Python APIs if there is a `noexcept` function in the 
call stack.  If you have a `catch(...)` in the call stack, then you may 
inadvertently catch the exception and return control back to Python at a point 
that assumes it owns the GIL, which will cause all sorts of havoc.  We would 
likewise need to document that you can't have a non-rethrowing `catch(...)` 
block in the call stack (I believe pybind11 has some of those).
- Throwing an exception also means C++ destructors run.  pybind11 has a smart 
pointer type that holds a `PyObject` and whose destructor calls `Py_DECREF`.  
That causes a crash when `pthread_exit` unwinds the stack, since the thread 
doesn't own the GIL.

Those are the additional problems specific to `pthread_exit`.  As gps noted, 
there is the additional problem of memory corruption common to both 
`pthread_exit` and `_endthreadex`:
- Data structures that are accessible from other threads may contain pointers 
to data on the thread's stack.  For example, with certain types of 
locks/signalling mechanisms it is common to store a linked list node on the 
stack that as then added to a list of waiting threads.  If we destroy the 
thread stack without proper cleanup (and that proper cleanup definitely won't 
happen with `_endthreadex`, and probably in most cases still won't happen with 
`pthread_exit`), the data structure has now become corrupted.

I don't think hanging the thread really increases the risk of deadlock over the 
status quo.  In theory someone could have a C++ destructor that does some 
cleanup that safely pevents deadlock, but that is not portable to Windows, and 
I expect that properly implemented `pthread_exit`-safe code is extremely rare.

I think we would want to ensure that Python itself is implemented in such a way 
as to not deadlock if Python-created threads with only Python functions in the 
call stack hang.  Mostly that would amount to not holding mutexes while calling 
functions that may transitively attempt to acquire the GIL (or release and then 
re-acquire the GIL).  That is probably a good practice for avoiding deadlock 
even when not finalizing.

We would also want to document that external code using the Python API should 
be safe from deadlock if a thread hangs, or should use the new "finalize block" 
mechanism to ensure that it doesn't hang, but I would say that is much easier 
to achieve than `pthread_exit`/`_endthreadex` safety, and I expect is already 
satisfied by most code.

Regarding vstinner's point that we would leak hung threads in an application 
that embeds Python and keeps running after `Py_FinalizeEx`:
What are the existing use cases for initializing/finalizing the interpreter 
multiple times?   Under the status quo we leak a bunch of memory when 
finalizing, e.g. because any Python object references held by the threads that 
are killed will be leaked, among many other things.  That would seem to rule 
out its suitability for use in production code.  At best it might be suitable 
for test code.  Leaking a hung thread basically just amounts to leaking the 
thread stack.  Additionally, if the thread is not created by Python, and the 
application is not exiting, then only the application should decide whether to 
kill one of its threads.

Adding a `Py_SetThreadExitCallback(func)` function seems reasonable, but I 
would propose that the only acceptable behaviors are: abort, quick exit, and 
hang.  I don't think we should promise that Python's own code base is safe for 
use with `pthread_exit` / `_endthreadex`, nor should we require that extension 
code is safe for that.  Furthermore, `abort` would seem to only be useful if 
you have already taken measures (such as signaling daemon threads to exit and 
using the "finalize block" mechanism) to ensure it is never reached, and it 
just serves as an additional sanity check.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42969>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to