[issue38106] [2.7] Race in PyThread_release_lock on macOS - can lead to memory corruption and deadlock
Kirill Smelkov added the comment:

:) Yes and no. PyPy has not made a new release with the fix yet.

--
___ Python tracker <https://bugs.python.org/issue38106> ___
___ Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38106] [2.7] Race in PyThread_release_lock on macOS - can lead to memory corruption and deadlock
Kirill Smelkov added the comment:

Victor, thanks for merging.

> I'm surprised that we still find new bugs in this code which is supposed to
> be battle tested! Maybe recent Darwin changes made the bug more likely.

As discussed above (https://bugs.python.org/issue38106#msg351917, https://bugs.python.org/issue38106#msg351970), due to the GIL the bug cannot be triggered from pure-Python code; it can be hit only by using the C API directly. I indeed started to observe the problem after moving the locks and other code that implement channels in Pygolang from Python to C level (see "0.0.3" in https://pypi.org/project/pygolang/#pygolang-change-history).

The bug has been there since 1994 and, from my point of view, it went undiscovered because the locking functionality was not hammer-tested enough. The bug was also impossible to explain without taking lock lifetime into account, because without create/destroy the lock/unlock sequence alone was race-free. https://bugs.python.org/issue433625 confirms that.

I cannot say whether recent macOS changes are relevant. My primary platform is Linux, and I only recently started to use macOS under QEMU for testing. However, from my brief study of https://github.com/apple/darwin-libpthread, I believe the difference between macOS and Linux in scheduling related to pthread condition variable signalling has been there for a long time.

> PyPy: it's now your turn to fix it ;-)

PyPy people fixed the bug the same day it was reported: https://bitbucket.org/pypy/pypy/issues/3072 :)

Kirill

P.S. Mariusz, thanks also for your feedback.

--
___ Python tracker <https://bugs.python.org/issue38106> ___
[issue26360] Deadlock in thread.join on Python 2.7/macOS
Kirill Smelkov added the comment:

> > Maybe issue38106 is related.
>
> That looks plausible, but unfortunately I'm still able to reproduce the hang
> with your PR (commit 9b135c02aa1edab4c99c915c43cd62d988f1f9c1, macOS 10.14.6).

Thanks for the feedback. Then this bug is probably a deadlock of another kind.

--
___ Python tracker <https://bugs.python.org/issue26360> ___
[issue38106] Race in PyThread_release_lock - can lead to memory corruption and deadlock
Kirill Smelkov added the comment:

Ok, I did https://github.com/python/cpython/pull/16047.

--
___ Python tracker <https://bugs.python.org/issue38106> ___
[issue38106] Race in PyThread_release_lock - can lead to memory corruption and deadlock
Change by Kirill Smelkov:

--
keywords: +patch
pull_requests: +15669
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16047

___ Python tracker <https://bugs.python.org/issue38106> ___
[issue38106] Race in PyThread_release_lock - can lead to memory corruption and deadlock
Kirill Smelkov added the comment:

I agree it seems like a design mistake. Not only does it lead to suboptimal implementations but, more importantly, it shifts misuse risks onto the user.

--
___ Python tracker <https://bugs.python.org/issue38106> ___
[issue26360] Deadlock in thread.join on Python 2.7/Mac OS X 10.9, 10.10
Kirill Smelkov added the comment:

Maybe issue38106 is related.

--
nosy: +navytux

___ Python tracker <https://bugs.python.org/issue26360> ___
[issue8410] Fix emulated lock to be 'fair'
Kirill Smelkov added the comment:

Thanks for the feedback.

--
___ Python tracker <https://bugs.python.org/issue8410> ___
[issue38106] Race in PyThread_release_lock - can lead to memory corruption and deadlock
Kirill Smelkov added the comment:

And it is indeed better to always do pthread_cond_signal() from under the mutex. Many pthread implementations delay the actual wakeup until the associated mutex is unlocked, so there should be no performance penalty here, and correctness becomes much easier to reason about.

--
___ Python tracker <https://bugs.python.org/issue38106> ___
[issue38106] Race in PyThread_release_lock - can lead to memory corruption and deadlock
Kirill Smelkov added the comment:

Thanks for the feedback. Yes, since for a Python-level lock PyThread_release_lock() is called with the GIL held:

https://github.com/python/cpython/blob/v2.7.16-129-g58d61efd4cd/Modules/threadmodule.c#L69-L82

the GIL effectively serves as the synchronization device between T2 releasing the lock and T1 proceeding after the second sema.acquire(), when it gets to execute Python-level code with `del sema`. However:

a) there is no sign that this aspect - that release must be called under the GIL - is explicitly relied upon by the PyThread_release_lock() code, and

b) e.g. _testcapimodule.c already has a test which calls PyThread_release_lock() with the GIL released:

https://github.com/python/cpython/blob/v2.7.16-129-g58d61efd4cd/Modules/_testcapimodule.c#L1972-L2053
https://github.com/python/cpython/blob/v2.7.16-129-g58d61efd4cd/Modules/_testcapimodule.c#L1998-L2002

Thus, I believe, there is a bug in PyThread_release_lock(), and we were just lucky not to hit it thanks to the GIL and Python-level usage. For reference, I indeed started to observe the problem when I moved the locks and other code that implement channels in Pygolang from Python to C level:

https://lab.nexedi.com/kirr/pygolang/commit/69db91bf
https://lab.nexedi.com/kirr/pygolang/commit/3b241983?expand_all_diffs=1

--
___ Python tracker <https://bugs.python.org/issue38106> ___
[issue8410] Fix emulated lock to be 'fair'
Kirill Smelkov added the comment:

At the very least, the condition variable signalling has to be moved under the mutex for correctness: https://bugs.python.org/issue38106.

--
nosy: +navytux

___ Python tracker <https://bugs.python.org/issue8410> ___
[issue433625] bug in PyThread_release_lock()
Kirill Smelkov added the comment:

I still believe there is a race here, and it can lead to MEMORY CORRUPTION and DEADLOCK: https://bugs.python.org/issue38106.

--
nosy: +navytux

___ Python tracker <https://bugs.python.org/issue433625> ___
[issue38106] Race in PyThread_release_lock - can lead to MEMORY CORRUPTION and DEADLOCK
New submission from Kirill Smelkov:

Hello up there. I believe I've discovered a race in PyThread_release_lock on Python 2.7 that, on systems where POSIX semaphores are not available and Python locks are implemented with mutexes and condition variables, can lead to MEMORY CORRUPTION and DEADLOCK. The particular system I've discovered the bug on is macOS Mojave 10.14.6.

The bug is already fixed on Python 3, and the fix for Python 2 is easy:

    git cherry-pick 187aa545165d

Thanks beforehand,
Kirill

Bug description

(Please see the attached pylock_bug.pyx for a program that triggers the bug for real.)

On Darwin, even though it is considered POSIX, Python uses a mutex + condition variable to implement its lock, and, as of 20190828, the Py2.7 implementation - even though a similar issue was fixed for Py3 in 2012 - contains a synchronization bug: the condition is signalled after the mutex unlock, while the correct protocol is to signal the condition from under the mutex:

https://github.com/python/cpython/blob/v2.7.16-127-g0229b56d8c0/Python/thread_pthread.h#L486-L506
https://github.com/python/cpython/commit/187aa545165d (py3 fix)

PyPy has the same bug for both pypy2 and pypy3:

https://bitbucket.org/pypy/pypy/src/578667b3fef9/rpython/translator/c/src/thread_pthread.c#lines-443:465
https://bitbucket.org/pypy/pypy/src/5b42890d48c3/rpython/translator/c/src/thread_pthread.c#lines-443:465

Signalling the condition outside of the corresponding mutex is considered OK by POSIX, but in the Python context it can lead to at least memory corruption if we consider the whole lifetime of a Python-level lock. For example, the following logical scenario:

    T1                                  T2

    sema = Lock()
    sema.acquire()

                                        sema.release()

    sema.acquire()
    free(sema)
    ...

can translate to the next C-level calls:

    T1                                  T2

    # sema = Lock()
    sema = malloc(...)
    sema.locked = 0
    pthread_mutex_init()
    pthread_cond_init(_released)

    # sema.acquire()
    pthread_mutex_lock()
    # sees sema.locked == 0
    sema.locked = 1
    pthread_mutex_unlock()

                                        # sema.release()
                                        pthread_mutex_lock()
                                        sema.locked = 0
                                        pthread_mutex_unlock()

                                        # OS scheduler gets in and
                                        # relinquishes control from T2
                                        # to another process
                                        ...

    # second sema.acquire()
    pthread_mutex_lock()
    # sees sema.locked == 0
    sema.locked = 1
    pthread_mutex_unlock()

    # free(sema)
    pthread_mutex_destroy()
    pthread_cond_destroy(_released)
    free(sema)

    # ... e.g. malloc() which returns
    # memory where sema was
    ...

                                        # OS scheduler returns control to T2;
                                        # sema.release() continues.
                                        #
                                        # BUT sema was already freed, and writing anywhere
                                        # inside the sema block CORRUPTS MEMORY. In particular,
                                        # if _another_ Python-level lock was allocated where the
                                        # sema block was, writing into that memory can affect
                                        # further synchronization correctness and in particular
                                        # lead to deadlock on the lock that was allocated next.
                                        pthread_cond_signal(_released)

Note that T2's pthread_cond_signal(_released) CORRUPTS MEMORY, as it is called when the sema memory has already been freed and is potentially reallocated for another object.

The fix is to move pthread_cond_signal to be done under the corresponding mutex:

    # sema.release()
    pthread_mutex_lock()
    sema.locked = 0
    pthread_cond_signal(_released)
    pthread_mutex_unlock()

by cherry-picking commit 187aa545165d ("Signal condition variables with the mutex held. Destroy condition variables before their mutexes").

Bug history

The bug has been there since 1994 - since at least [1]. It was discussed in 2001 with the original code author [2], but the code was still considered to be race-free. In 2010 the question of where pthread_cond_signal should be - before or after pthread_mutex_unlock - was discussed with the rationale of avoiding thread bouncing [3,4,5], and in 2012 pthread_cond_signal was moved to be called from under the mutex, but only for CPython 3 [6,7]. In 2019 the bug was (re-)discovered while testing Pygolang [8] on macOS with CPython2, PyPy2 and PyPy3.

[1] https://github.com/python/cpython/commit/2c8cb9f3d240
[2] https://bugs.python.org/issue433625
[3] https://bugs.python.org/issue8299#msg103224
[4] https://bugs.python.org/issue8410#msg103313
[5] https://bugs.python.org/issue8411#msg113301
[6] https://bugs.python.org/issue15038#msg163187
[7] https://github.com/python/cpython/commit/187aa545165d
[8] https: