New submission from STINNER Victor <vstin...@python.org>: Bernát Gábor found an interesting bug on Windows. Sometimes, when a process is interrupted on Windows with CTRL+C, its parent process hangs in thread.join(): https://twitter.com/gjbernat/status/1440949682759426050
Reproducer: * Install https://github.com/gaborbernat/tox/tree/2159 * Make an empty folder and put the above tox.ini in it * Invoke python -m tox and once that command is sleeping press CTRL+C (the app should lock there indefinitely). tox.ini: --- [testenv] skip_install = true commands = python -c 'import os; print(f"start {os.getpid()}"); import time; time.sleep(100); print("end");' --- Source: https://gist.github.com/gaborbernat/f1e1aff0f2ee514b50a92a4d019d4d13 I tried to attach the Python process in Python: there is a single thread, the main thread which is blocked in thread.join(). You can also see it in the faulthandler traceback. I did a long analysis of the _tstate_lock and I checked that thread really completed. Raw debug traces: == thread 6200 exit == thread_run[pid=3984, thread_id=6200]: clear PyThreadState_Clear[pid=3984, thread_id=6200]: on_delete() release_sentinel[pid=3984, thread_id=6200]: enter release_sentinel[pid=3984, thread_id=6200]: release(obj=000001C1122669C0, lock=000001C110BBDA00) release_sentinel[pid=3984, thread_id=6200]: exit PyThreadState_Clear[pid=3984, thread_id=6200]: on_delete()-- == main thread is calling join() but gets a KeyboardInterrupt == [pid=3984, thread_id=8000] Lock<obj=000001C1122669C0>.acquire() -> ACQUIRED Current thread 0x00001f40 (most recent call first): File "C:\vstinner\python\3.10\lib\threading.py", line 1118 in _wait_for_tstate_lock File "C:\vstinner\python\3.10\lib\threading.py", line 1092 in join File "C:\vstinner\env\lib\site-packages\tox\session\cmthread_run[pid=3984, thread_id=6200]: exit d\run\common.py", line 203 in execute File "C:\vstinner\env\lib\site-packages\tox\session\cmd\run\sequential.py", line 20 in run_sequential File "C:\vstinner\env\lib\site-packages\tox\session\cmd\legacy.py", line 104 in legacy File "C:\vstinner\env\lib\site-packages\tox\run.py", line 49 in main File "C:\vstinner\env\lib\site-packages\tox\run.py", line 23 in run File "C:\vstinner\env\lib\site-packages\tox\__main__.py", line 4 in <module> File "C:\vstinner\python\3.10\lib\runpy.py", line 86 in _run_code File "C:\vstinner\python\3.10\lib\runpy.py", line 196 in _run_module_as_main _wait_for_tstate_lock[pid=3984, current thread_id=8000, self thread_id=6200]: EXC: KeyboardInterrupt(); acquired? None == main thread calls repr(thread) == ROOT: [3984] KeyboardInterrupt - teardown started _wait_for_tstate_lock[pid=3984, current thread_id=8000, self thread_id=6200]: acquire(block=False, timeout=-1): lock obj= 0x1c1122669c0 File "C:\vstinner\python\3.10\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\vstinner\python\3.10\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\vstinner\env\lib\site-packages\tox\__main__.py", line 4, in <module> run() File "C:\vstinner\env\lib\site-packages\tox\run.py", line 23, in run result = main(sys.argv[1:] if args is None else args) File "C:\vstinner\env\lib\site-packages\tox\run.py", line 49, in main result = handler(state) File "C:\vstinner\env\lib\site-packages\tox\session\cmd\legacy.py", line 104, in legacy return run_sequential(state) File "C:\vstinner\env\lib\site-packages\tox\session\cmd\run\sequential.py", line 20, in run_sequential return execute(state, max_workers=1, has_spinner=False, live=True) File "C:\vstinner\env\lib\site-packages\tox\session\cmd\run\common.py", line 217, in execute print(f'join {thread}') File "C:\vstinner\python\3.10\lib\threading.py", line 901, in __repr__ self.is_alive() # easy way to get ._is_stopped set when appropriate File "C:\vstinner\python\3.10\lib\threading.py", line 1181, in is_alive self._wait_for_tstate_lock(False) File "C:\vstinner\python\3.10\lib\threading.py", line 1113, in _wait_for_tstate_lock traceback.print_stack() _wait_for_tstate_lock[pid=3984, current thread_id=8000, self thread_id=6200]: failed to acquire 0x1c1122669c0 _wait_for_tstate_lock[pid=3984, current thread_id=8000, self thread_id=6200]: exit 0x1c1122669c0 join <Thread(tox-interrupt, started 6200)> == main thread calls thread.join()... which hangs == _wait_for_tstate_lock[pid=3984, current thread_id=8000, self thread_id=6200]: acquire(block=True, timeout=-1): lock obj= 0x1c1122669c0 File "C:\vstinner\python\3.10\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\vstinner\python\3.10\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\vstinner\env\lib\site-packages\tox\__main__.py", line 4, in <module> run() File "C:\vstinner\env\lib\site-packages\tox\run.py", line 23, in run result = main(sys.argv[1:] if args is None else args) File "C:\vstinner\env\lib\site-packages\tox\run.py", line 49, in main result = handler(state) File "C:\vstinner\env\lib\site-packages\tox\session\cmd\legacy.py", line 104, in legacy return run_sequential(state) File "C:\vstinner\env\lib\site-packages\tox\session\cmd\run\sequential.py", line 20, in run_sequential return execute(state, max_workers=1, has_spinner=False, live=True) File "C:\vstinner\env\lib\site-packages\tox\session\cmd\run\common.py", line 218, in execute thread.join() File "C:\vstinner\python\3.10\lib\threading.py", line 1092, in join self._wait_for_tstate_lock() File "C:\vstinner\python\3.10\lib\threading.py", line 1113, in _wait_for_tstate_lock traceback.print_stack() Explanation: * Context: The main thread is waiting on thread.join() * The thread 6200 completes: release_sentinel() is called to to release _tstate_lock * The main thread succeed to acquire _tstate_lock (of the thread 6200) since it was just release * Ooops oops oops, the main thread gets KeyboardInterrupt in Thread._wait_for_tstate_lock() before being able to release the lock. As if the function if interrupted here: def _wait_for_tstate_lock(self, block=True, timeout=-1): lock = self._tstate_lock if lock is None: assert self._is_stopped elif lock.acquire(block, timeout): # -- got KeyboardInterrupt here --- lock.release() self._stop() * (tox does something in the main thread) * (there are only one remaining thread: the main thread) * tox catchs KeyboardInterrupt and calls thread.join() again * thread.join() hangs because the _tstate_lock was already acquire, so lock.acquire() hangs forever => NOT GOOD You can reproduce the issue on Linux with attached patch and script: * Apply threading_bug.patch on Python * Run threading_bug.py * See that the script hangs forever Example: --- $ git apply threading_bug.patch $ ./python threading_bug.py join... join failed with: KeyboardInterrupt() join again... --- I'm now working on a PR to fix the race condition. ---------- components: Library (Lib) files: threading_bug.patch keywords: patch messages: 402523 nosy: vstinner priority: normal severity: normal status: open title: Race condition in Thread._wait_for_tstate_lock() versions: Python 3.11 Added file: https://bugs.python.org/file50299/threading_bug.patch _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue45274> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com