[issue11618] Locks broken wrt timeouts on Windows
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Possibly the patch had a mixup; I'm going to rework it a bit and post it as a separate issue. -- status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11618 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Antoine Pitrou pit...@free.fr added the comment: I agree with Martin that it's not a good idea to add dead code. Furthermore, your patch has:

    +#ifndef _PY_EMULATED_WIN_CV
    +#define _PY_EMULATED_WIN_CV 0 /* use emulated condition variables */
    +#endif
    +
    +#if !defined NTDDI_VISTA || NTDDI_VERSION < NTDDI_VISTA
    +#undef _PY_EMULATED_WIN_CV
    +#define _PY_EMULATED_WIN_CV 1
    +#endif

so am I right to understand that when compiled under Vista or later, it will produce an XP-incompatible binary?
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Again, to clarify, because this seems to have been put to sleep by Martin's unfortunate dismissal. A recap of the patch:
1) Extract the condition variable functions on Windows out of ceval_gil.h and into thread_nt_cv.h, so that they can be used in more places.
2) Implement the Lock primitive in Python using CriticalSection and condition variables, rather than Windows mutexes. This gives a large performance boost on uncontended locks.
3) Provide an alternate implementation of the condition variable for a build target of Vista/Server 2008, using the native condition variable objects available on that platform.
I think Martin got distracted by 3) and thought that was the only thing this patch is about. The important parts are 1) and 2), whereas 3) is provided as a bonus (and to make sure that 1) is future-safe). So, can we get this reviewed, please?
Martin v. Löwis mar...@v.loewis.de added the comment: As it stands, the patch is pointless, and can safely be rejected. We will just not have defined NTDDI_VERSION at NTDDI_VISTA for any foreseeable future, so all the Vista-specific code can be eliminated from the patch. Python has been using dynamic API checks forever. In 2.5, there was a check for the presence of GetFileAttributesExA; in 2.4, there was a check for CryptAcquireContextA.
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Martin, I think you misunderstand completely. The patch is _not_ about using the Vista features. It is about not using a mutex for threading.Lock. Currently, the locks in Python use Mutex objects, and a WaitForSingleObject() system call to acquire them. This patch replaces these locks with user-level objects (critical sections and condition variables). This drops the time needed for an uncontended acquire/release by 60%, since there is no kernel transition and scheduling. The patch comes in two flavors. The current version _emulates_ condition variables on Windows by the same mechanism as I introduced for the new GIL, that is, using a combination of critical section objects and a construct made of a semaphore and a counter. Also provided, for those that want it, and for future reference, is a version that uses native system objects (Windows condition variables and SRWLocks). I can drop them from the patch to make you happy, but they are dormant and nicely show how conditional compilation can switch in more modern features for a different target architecture. K
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Antoine: of course, sorry for rushing you. Martin, this is an XP patch. The Vista option is put in there as a compile-time option, and disabled by hand. I'm not adding any APIs that weren't already in use since the new GIL (Windows semaphores). Incidentally, we should make sure that Python defines NTDDI_VERSION as NTDDI_WINXP (0x05010000), either in the sources before including windows.h (tricky) or in the solution (probably in the .prefs files). This will ensure that we don't attempt to use non-existent features unless we dynamically check for them.
Antoine Pitrou pit...@free.fr added the comment:
> This is an XP patch. The vista option is put in there as a compile time option, and disabled by hand. I'm not adding any apis that weren't already in use since the new gil (windows Semaphores).
Martin means that you shouldn't use #ifdefs but runtime detection, so that we can provide a single installer for all Windows versions.
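To make the run-time alternative concrete, here is a minimal sketch in Python rather than C, assuming that probing kernel32 through ctypes is a fair stand-in for the GetProcAddress()-style check a real patch would do in thread_nt.h. has_native_condvars() and pick_lock_backend() are hypothetical names; InitializeConditionVariable is the Vista-and-later Win32 export being probed.

```python
import ctypes
import sys


def has_native_condvars() -> bool:
    """Return True if the running Windows exposes native condition variables.

    Hypothetical helper: it probes kernel32 at run time instead of relying on
    a compile-time NTDDI_VERSION check, so one binary can serve XP and Vista+.
    """
    if sys.platform != "win32":
        return False  # not Windows: the Win32 API is simply absent
    kernel32 = ctypes.WinDLL("kernel32")
    # Attribute lookup on a ctypes DLL raises AttributeError when the export
    # is missing (as on XP), which hasattr() turns into False.
    return hasattr(kernel32, "InitializeConditionVariable")


def pick_lock_backend() -> str:
    """Choose an implementation once, at start-up (hypothetical names)."""
    if has_native_condvars():
        return "native-cv"    # Vista+: CONDITION_VARIABLE / SRWLOCK
    return "emulated-cv"      # XP fallback: critical section + semaphore


if __name__ == "__main__":
    print(pick_lock_backend())
```

The cost Kristján raises next is real in C: with run-time selection the lock structure can no longer embed one fixed primitive type, so it needs indirection or dynamic allocation.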
Kristján Valur Jónsson krist...@ccpgames.com added the comment: I understand what he meant, but that wasn't the intent of the patch. The patch is to use emulated condition variables built on a semaphore, same as the new GIL implementation already does. If you want dynamic runtime detection, then this is a feature request :) I'm not sure we do it elsewhere in Python, and the benefit is doubtful...
Brian Curtin br...@python.org added the comment: We do the runtime checks for a few things in winreg, as well as in the os.symlink implementation and, I think, a few other supplemental functions for symlinking.
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Ok, but the patch as provided would become more complicated. For general consumption, the primitives would need to become dynamically allocated structures, and so on. I'm not sure that it's worth the effort, but I can have a look. (I thought the patch was radical enough, tbh.)
Kristján Valur Jónsson krist...@ccpgames.com added the comment: So, what do you think, should this go in? Any qualms about the thread_nt_cv.h header?
Antoine Pitrou pit...@free.fr added the comment:
> So, what do you think, should this go in? Any qualms about the thread_nt_cv.h header?
On the principle it's ok, but I'd like to do a review before it goes in :)
Martin v. Löwis mar...@v.loewis.de added the comment: -1. Choice of operating system must be a run-time decision, not a compile-time decision. We will have to support XP for quite some time.
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Here is a new patch. I've factored out the NT condition variable code into thread_nt_cv.h, which is now used by both thread_nt.h and ceval_gil.h. -- Added file: http://bugs.python.org/file25351/ntlocks.patch
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Any thoughts? Is a 60% performance increase for the common case of acquiring an uncontested lock worth doing? Btw, for our console game I also opted for non-semaphore-based locks in thread_pthread.h, because our console profilers were alarmed at all the kernel transitions caused by the GIL being ticked.
Antoine Pitrou pit...@free.fr added the comment:
> Is a 60% performance increase for the common case of acquiring an uncontested lock worth doing?
Yes, I agree it is. However, the Vista-specific path seems uninteresting, if it's really only 2% faster.
> our console profilers were alarmed at all the kernel transitions caused by the GIL being ticked
That's the old GIL. The new GIL uses a fixed timeout with a condition variable.
Kristján Valur Jónsson krist...@ccpgames.com added the comment: The Vista-specific path is included there for completeness, if and when code moves to that platform, besides showing what the emulated CV is actually emulating. Also, I am aware of the old/new GIL, but our console game uses Python 2.7, and the old GIL will be living on for many a good year, just like 2.7 will. But you make a good point. I had forgotten that the new GIL is actually implemented with emulated condition variables (current version contributed by myself :). I think a different patch is in order, where ceval_gil.h makes use of the platform-specific condition variable services as declared in thread_platform.h. There is no point in duplicating code.
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Two runs with standard locks:
    D:\pydev\hg\cpython2>pcbuild\amd64\python.exe -m timeit -s "import _thread; l = _thread.allocate_lock()" "l.acquire();l.release()"
    100 loops, best of 3: 0.746 usec per loop
    D:\pydev\hg\cpython2>pcbuild\amd64\python.exe -m timeit -s "import _thread; l = _thread.allocate_lock()" "l.acquire();l.release()"
    100 loops, best of 3: 0.749 usec per loop
Two runs with CV locks (emulated):
    D:\pydev\hg\cpython2>pcbuild\amd64\python.exe -m timeit -s "import _thread; l = _thread.allocate_lock()" "l.acquire();l.release()"
    100 loops, best of 3: 0.278 usec per loop
    D:\pydev\hg\cpython2>pcbuild\amd64\python.exe -m timeit -s "import _thread; l = _thread.allocate_lock()" "l.acquire();l.release()"
    100 loops, best of 3: 0.279 usec per loop
Two runs with CV locks targeted for Vista:
    D:\pydev\hg\cpython2>pcbuild\amd64\python.exe -m timeit -s "import _thread; l = _thread.allocate_lock()" "l.acquire();l.release()"
    100 loops, best of 3: 0.272 usec per loop
    D:\pydev\hg\cpython2>pcbuild\amd64\python.exe -m timeit -s "import _thread; l = _thread.allocate_lock()" "l.acquire();l.release()"
    100 loops, best of 3: 0.272 usec per loop
You can see the big win from not doing kernel switches all the time, shedding 60% of the time. Once in user space, moving from CriticalSection objects to SRWLock objects is less beneficial, being overshadowed by Python overhead. Still, 2% overall is not to be frowned upon.
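The "60%" and "2%" figures can be checked directly against the timings above, and the same micro-benchmark can be driven from Python with timeit.repeat() instead of the command line. A minimal sketch; averaging the two runs and the helper names reduction() and bench_uncontended() are this sketch's own choices, not part of the patch:

```python
import timeit

# Per-loop timings (usec) copied from the runs above.
standard = (0.746, 0.749)      # Windows mutex-based locks
emulated_cv = (0.278, 0.279)   # critical section + emulated condition variable
vista_cv = (0.272, 0.272)      # build targeted at Vista (native primitives)


def mean(xs):
    return sum(xs) / len(xs)


def reduction(old, new):
    """Fraction of the old per-loop time that the new version removes."""
    return (mean(old) - mean(new)) / mean(old)


def bench_uncontended(number=1_000_000):
    """Best-of-3 usec per acquire/release pair, like `python -m timeit`."""
    times = timeit.repeat(
        "l.acquire(); l.release()",
        setup="import _thread; l = _thread.allocate_lock()",
        repeat=3,
        number=number,
    )
    return min(times) / number * 1e6


if __name__ == "__main__":
    print(f"mutex -> emulated CV:  {reduction(standard, emulated_cv):.1%} less time")
    print(f"emulated -> native CV: {reduction(emulated_cv, vista_cv):.1%} less time")
    print(f"this interpreter: {bench_uncontended():.3f} usec per loop")
```

With the numbers above it reports a 62.7% reduction for the emulated condition variable version and a further 2.3% for the Vista build, consistent with the figures quoted. bench_uncontended() measures whatever lock implementation the running interpreter has, so it only says something about the patch when run against both patched and unpatched builds.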
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Here is a new patch. This uses critical sections and condition variables to avoid kernel mode switches for locks. Windows mutexes are expensive, and for uncontended locks this offers a big win. It also adds an internal set of critical section/condition variable structures that can be used on Windows to do other such things without resorting to explicit kernel objects. This code works on XP and newer, since it relies on the semaphore kernel object being present. In addition, if compiled to target Vista or greater, it will use the built-in condition variable primitives and SRWLock objects (which are faster still than CriticalSection objects, and more robust). -- status: pending -> open Added file: http://bugs.python.org/file25271/ntlocks.patch
Antoine Pitrou pit...@free.fr added the comment:
> This uses critical sections and condition variables to avoid kernel mode switches for locks. Windows mutexes are expensive and for uncontented locks, this offers a big win.
Can you post some numbers?
Roundup Robot devnull@devnull added the comment:
New changeset 9b12af6e9ea9 by Antoine Pitrou in branch '3.2': Issue #11618: Fix the timeout logic in threading.Lock.acquire() under http://hg.python.org/cpython/rev/9b12af6e9ea9
New changeset 9d658f000419 by Antoine Pitrou in branch 'default': Issue #11618: Fix the timeout logic in threading.Lock.acquire() under http://hg.python.org/cpython/rev/9d658f000419
-- nosy: +python-dev
Antoine Pitrou pit...@free.fr added the comment: I have now committed the semaphore implementation, so as to fix the issue. Potential performance optimizations can still be discussed, of course (either here or in a new issue, I'm not sure). -- resolution: -> fixed stage: -> committed/rejected status: open -> pending
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Martin: I wouldn't worry too much about replacing a Mutex with a Semaphore. There is no reason to believe that they behave in any way different scheduling wise, and if they did, then any python code that this would affect would be extremely poorly written. sbt: Look, I really hate to be a pain but please consider: In line 50 of your patch the thread may pause at any point, perhaps even a number of times. Meanwhile, a number of locks/unlocks may go by. The values of owned and timeouts that the reader sees may be from any number of different lock states that the lock goes through during this, including any number of different reset cycles of these counters. In short, there is no guarantee that the values read represent any kind of mutually consistent state. They might as well be from two different locks. Please allow me to repeat: Lockless programming is notoriously hard and there is almost always one subtlety or other that is overlooked. I can't begin to count the number of times I've reluctantly had to admit defeat to its devious manipulations.
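The hazard described here, two counters loaded at different moments, can be shown deterministically. This is a toy single-threaded model, not the patch's code: ToyCounters, its encoding of owned and timeouts, and the scripted interleaving are assumptions chosen to mirror the shape of the bookkeeping under discussion. It does not model memory ordering; it only shows what the reader computes when it is pre-empted between its two loads.

```python
from dataclasses import dataclass


@dataclass
class ToyCounters:
    """Toy model (an assumption, not the patch's struct).

    owned:    -1 when free, 0 when held, +1 for every thread queued behind it
    timeouts: how many of those queued threads have given up
    So (owned - timeouts) is -1 only when nobody really holds the lock.
    """
    owned: int = -1
    timeouts: int = 0

    def effective(self) -> int:
        return self.owned - self.timeouts


def torn_read_demo():
    c = ToyCounters(owned=0)            # thread A holds the lock throughout
    consistent = [c.effective()]        # value of every real, consistent state

    owned_seen = c.owned                # reader: first load (owned == 0)

    # --- the reader is pre-empted; other threads run in the meantime ---
    c.owned += 1                        # thread B queues behind A
    consistent.append(c.effective())
    c.timeouts += 1                     # thread B's wait times out
    consistent.append(c.effective())

    # --- the reader resumes ---
    timeouts_seen = c.timeouts          # reader: second load (timeouts == 1)
    return owned_seen - timeouts_seen, consistent


torn, consistent = torn_read_demo()
print(torn, consistent)  # -1 [0, 1, 0]
```

Every consistent state has owned - timeouts >= 0, because thread A holds the lock the whole time, yet the reader computes -1 ("free"). Whether such a torn value matters depends on what the caller does with it, which is the question the rest of this exchange turns on.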
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Sbt: I re-read the code, and while I still maintain that the evaluation in line 50 is meaningless, I agree that the worst that can happen is an incorrect timeout. It is probably harmless, because this state is only encountered for timeout==0, and it is only incorrect in the face of lock contention, while a 0 timeout provides no guarantees between two threads. So, I suggest a change in the comments: Do not claim that the value is never an underestimate, and explain how falsely returning a WAIT_TIMEOUT is safe and only occurs when the lock is heavily contended. Sorry for being so nitpicky, but having this stuff correct is crucial.
sbt shibt...@gmail.com added the comment: krisvale wrote:
> So, I suggest a change in the comments: Do not claim that the value is never an underestimate, and explain how falsely returning a WAIT_TIMEOUT is safe and only occurs when the lock is heavily contented. Sorry for being so nitpicky but having this stuff correct is crucial.
Nitpickiness is a necessity ;-) I've done a new version which replaces the meaningless racy test on line 50 with the simpler test

    else if (mutex->timeouts == 0)

As with the old meaningless test, if the test succeeds then there must at least have been very recent contention for the lock, so timing out is reasonable. Also, the new patch only considers rezeroing mutex->timeouts if we acquire the lock on the slow path. The patch contains more comments than before. -- Added file: http://bugs.python.org/file21335/locktimeout3.patch
Changes by sbt shibt...@gmail.com: Removed file: http://bugs.python.org/file21335/locktimeout3.patch
sbt shibt...@gmail.com added the comment: krisvale wrote:
> So, I suggest a change in the comments: Do not claim that the value is never an underestimate, and explain how falsely returning a WAIT_TIMEOUT is safe and only occurs when the lock is heavily contented. Sorry for being so nitpicky but having this stuff correct is crucial.
Nitpickiness is a necessity ;-) I've done a new version which replaces the meaningless racy test on line 50 with the simpler test

    else if (mutex->timeouts == 0)

As with the old meaningless test, if the test succeeds then there must at least have been very recent contention for the lock, so timing out is reasonable. Also, the new patch only considers rezeroing mutex->timeouts if we acquire the lock on the slow path. The patch contains more comments than before. -- Added file: http://bugs.python.org/file21336/locktimeout3.patch
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Yes, the race condition with the timeout is a problem. Here is a patch that implements this lock using a condition variable. I agree that one must consider performance/simplicity when doing this. -- Added file: http://bugs.python.org/file21322/locktimeout2.patch
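A minimal Python sketch of the condition-variable approach, assuming threading.Condition as a stand-in for a critical section plus (emulated) condition variable. CondVarLock is a hypothetical name, and this illustrates the idea rather than reproducing the C code in locktimeout2.patch:

```python
import threading
import time


class CondVarLock:
    """Non-recursive lock with timeout built from a mutex + condition variable."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._locked = False  # the only shared state; always accessed under _cond

    def acquire(self, blocking=True, timeout=-1):
        with self._cond:
            if not self._locked:          # uncontended fast path
                self._locked = True
                return True
            if not blocking:
                return False
            deadline = None if timeout < 0 else time.monotonic() + timeout
            while self._locked:           # loop guards against spurious wakeups
                if deadline is None:
                    self._cond.wait()
                else:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return False      # timed out; no counters to repair
                    self._cond.wait(remaining)
            self._locked = True
            return True

    def release(self):
        with self._cond:
            if not self._locked:
                raise RuntimeError("release unlocked lock")
            self._locked = False
            self._cond.notify()           # wake at most one waiter
```

What makes this easy to reason about is that a timed-out waiter leaves nothing behind: all state is one flag read and written under one mutex, so a release can never be charged to a waiter that has already given up.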
Antoine Pitrou pit...@free.fr added the comment:
> Yes, the race condition with the timeout is a problem. Here is a patch that implements this lock using a condition variable. I agree that one must consider performance/simplicity when doing this.
I don't understand why you need something that complicated. A simple semaphore should be enough (as in the POSIX implementation).
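For comparison, a lock built on a binary semaphore needs no extra bookkeeping: the count is 1 when the lock is free and 0 when it is held. A minimal sketch, assuming threading.BoundedSemaphore as a stand-in for a Win32 semaphore object; SemaphoreLock is a hypothetical name:

```python
import threading


class SemaphoreLock:
    """Non-recursive lock = binary semaphore (count 1 means 'free')."""

    def __init__(self):
        # BoundedSemaphore raises ValueError if released more often than
        # acquired, which catches "release of an unlocked lock" bugs.
        self._sem = threading.BoundedSemaphore(1)

    def acquire(self, blocking=True, timeout=-1):
        if not blocking:
            return self._sem.acquire(blocking=False)
        # A timed-out wait leaves the count untouched: nothing to undo.
        return self._sem.acquire(timeout=None if timeout < 0 else timeout)

    def release(self):
        self._sem.release()  # any thread may release, as with threading.Lock
```

A timed-out wait simply leaves the count unchanged, which is why this version avoids the timeout races discussed in this thread. The trade-off is the one Kristján raises: on Windows every wait on a kernel semaphore is a kernel call.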
Kristján Valur Jónsson krist...@ccpgames.com added the comment: I'm just providing this as a fast alternative to the Semaphore, which as far as I know, will cause a kernel call every time. Complicated is relative. In terms of the condition variable api, I wouldn't say that it is. But given the fact that we have to emulate condition variables on older windows, then yes, it is complex. If we are rolling our own instead of using Semaphores (as has been suggested for performance reasons) then using a Condition variable is IMHO safer than a custom solution because the correctness of that approach is so easily provable.
Antoine Pitrou pit...@free.fr added the comment:
> I'm just providing this as a fast alternative to the Semaphore, which as far as I know, will cause a kernel call every time.
A Semaphore might be slow, but I'm not sure other primitives are faster. For the record, I tried another implementation using a critical section, and it's not significantly faster under a VM (even though MSDN claims critical sections are fast). Have you timed your solution?
sbt shibt...@gmail.com added the comment:
> If we are rolling our own instead of using Semaphores (as has been suggested for performance reasons) then using a Condition variable is IMHO safer than a custom solution because the correctness of that approach is so easily provable.
Assuming that you trust the implementation of condition variables, then I agree. Unfortunately, implementing condition variables correctly on Windows is notoriously difficult. The patch contains the lines

    + Generic emulations of the pthread_cond_* API using
    + Win32 functions can be found on the Web.
    + The following read can be edificating (or not):
    + http://www.cse.wustl.edu/~schmidt/win32-cv-1.html

Apparently all the examples from that web page are faulty one way or another. http://newsgroups.derkeiler.com/Archive/Comp/comp.programming.threads/2008-07/msg00025.html contains the following quote:

    Perhaps this list should provide links to a reliable windows condition variable implementation instead of continuously bad mouthing the ~schmidt/win32-cv-1.html page and thereby raising it's page rank. It would greatly help out all us newbies out here.

pthreads-w32 used to use a solution depending on that paper but changed to something else. The following is a long but relevant read: ftp://sourceware.org/pub/pthreads-win32/sources/pthreads-w32-2-8-0-release/README.CV
Of course, implementing condition variables is a whole lot easier if you don't need to broadcast and you only need weak guarantees on the behaviour. So Python's implementation may be quite sufficient. (It does appear that a thread which calls COND_SIGNAL() may consume that signal with a later call of COND_WAIT(). A proper implementation should never allow that, because it can cause deadlocks in code depending on normal pthread semantics.)
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Emulating condition variables on Windows became easy once semaphores were provided by the OS, because they provide a way around the lost wakeup problem. The current implementation in CPython was submitted by me :) The source material is provided for reference only.
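A sketch of the technique described here, assuming the same general shape as the emulation in ceval_gil.h: a waiter count protected by the caller's mutex, plus a semaphore whose internal count remembers a wakeup issued before the waiter actually blocks. It is written in Python with threading primitives standing in for the Win32 ones; EmulatedCond is a hypothetical name and this is not CPython's code:

```python
import threading


class EmulatedCond:
    """Condition variable emulated with a waiter counter and a semaphore.

    All methods must be called with `mutex` held, as with pthread_cond_*.
    """

    def __init__(self, mutex):
        self.mutex = mutex
        self.waiting = 0                      # protected by mutex
        self.sem = threading.Semaphore(0)     # counts pending wakeups

    def wait(self, timeout=None):
        self.waiting += 1
        self.mutex.release()
        # If signal() runs right here, before we block, the semaphore keeps
        # the count, so the wakeup is not lost.
        woken = self.sem.acquire(timeout=timeout)
        self.mutex.acquire()
        if not woken:
            # Benign race: signal() may also have decremented `waiting` and
            # released the semaphore; that only causes a spurious wakeup later.
            self.waiting -= 1
        return woken

    def signal(self):
        if self.waiting > 0:                  # no-op when nobody waits
            self.waiting -= 1
            self.sem.release()

    def broadcast(self):
        while self.waiting > 0:
            self.signal()
```

The model also reproduces sbt's caveat: if thread B calls signal() while A is waiting and then immediately calls wait() itself, B can take the semaphore count it just released, leaving A asleep. sbt's point is that this may be tolerable for Python's restricted use, but it is not a faithful pthread_cond_* emulation.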
sbt shibt...@gmail.com added the comment: Benchmarks (on an old laptop running XP without a VM), running
    D:\Repos\cpython\PCbuild>python -m timeit -s "from threading import Lock; l = Lock()" "l.acquire(); l.release()"
    100 loops, best of 3: 0.934 usec per loop
Results (usec per loop):
    default:            0.934
    locktimeout.patch:  0.965
    semlocknt.patch:    2.76
    locktimeout2.patch: 2.03
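Expressed relative to the unpatched build, these XP timings give the following slowdown factors. A trivial computation, included only to make the comparison explicit; the keys are the patch names from sbt's measurements:

```python
# usec per acquire/release pair on XP, from sbt's measurements
timings = {
    "default": 0.934,
    "locktimeout.patch": 0.965,
    "semlocknt.patch": 2.76,
    "locktimeout2.patch": 2.03,
}

baseline = timings["default"]
slowdown = {name: t / baseline for name, t in timings.items()}

# Print from fastest to slowest, relative to the unpatched build.
for name, factor in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name:<20}{factor:.2f}x")
```

It prints roughly 1.03x for locktimeout.patch, 2.17x for locktimeout2.patch and 2.96x for semlocknt.patch, i.e. on that machine both alternatives to the default cost about two to three times as much per uncontended acquire/release.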
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Btw, the locktimeout.patch appears to have a race condition. LeaveNonRecursiveMutex may SetEvent when there is no thread waiting (because a timeout just occurred, but the thread on which it happened is still somewhere around line #62 ). This will cause the next WaitForSingleObject() to succeed, when it shouldn't. It is this race between the timeout occurring, and the ability of us being able to register that in the lock's bookkeeping, that is the source of all the race problems with the timeout. This is what prompted me to submit the condition variable version.
Antoine Pitrou pit...@free.fr added the comment: Just for the record, here is the critical section-based version. I would still favour committing the semaphore-based version first (especially in 3.2), and then discussing performance improvements if desired. -- Added file: http://bugs.python.org/file21325/critlocknt.patch
sbt shibt...@gmail.com added the comment:
> Btw, the locktimeout.patch appears to have a race condition. LeaveNonRecursiveMutex may SetEvent when there is no thread waiting (because a timeout just occurred, but the thread on which it happened is still somewhere around line #62 ). This will cause the next WaitForSingleObject() to succeed, when it shouldn't.
I believe the lock is still in a consistent state. If this race happens and SetEvent() is called, then we must have mutex->owned > -1, because the timed-out waiter is still counted by mutex->owned. This prevents the tests involving interlocked functions from giving true. Thus WaitForSingleObject() is the ONLY way for a waiter to get the lock. In other words, as soon as a timeout happens, the fast interlocked path gets blocked. It is only unblocked again after a call to WaitForSingleObject() succeeds: then the thread which now owns the lock fixes mutex->owned using mutex->timeouts, and the interlocked path is operational again (unless another timeout happens). I can certainly understand the desire to follow the KISS principle.
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Antoine: I agree, the semaphore is the quick and robust solution. sbt: I see your point. Still, I think we still may have a flaw: The statement that (owned-timeouts) is never an under-estimate isn't true on modern architectures, I think. The order of the atomic decrement operations in the code means nothing and cannot be depended on to guarantee such a claim: The thread doing the reading may see the individual updates in any order, and so the estimate may be an over- or an underestimate. It would fix this and simplify things a lot to take the special case for timeout==0 out of the code.
[issue11618] Locks broken wrt timeouts on Windows
sbt shibt...@gmail.com added the comment: krisvale wrote:
> I see your point. Still, I think we still may have a flaw: The statement that (owned-timeouts) is never an under-estimate isn't true on modern architectures, I think. The order of the atomic decrement operations in the code means nothing and cannot be depended on to guarantee such a claim: The thread doing the reading may see the individual updates in any order, and so the estimate may be an over- or an underestimate.

The interlocked functions act as read (and write) memory barriers, so mutex->timeouts is never any staler than the value of owned obtained from the preceding interlocked function call. As you say, my claim that (owned - timeouts) is never an underestimate is dubious. But the only time I use this quantity is in this bit:

    else if (owned - mutex->timeouts != -1)  /* harmless race */
        return WAIT_TIMEOUT;

If this test gives a false negative we just fall through to the slow path (no problem). If we get a false positive it is because one of the two following races happened:
1) Another thread just got the lock: letting the non-blocking acquire fail is clearly the right thing to do.
2) Another thread just timed out: this means that a third thread must have held the lock up until very recently, so allowing a non-blocking acquire to fail is entirely reasonable (even if WaitForSingleObject() might now succeed).
[issue11618] Locks broken wrt timeouts on Windows
Kristján Valur Jónsson krist...@ccpgames.com added the comment: There is no barrier in use on the read part. I realize that this is a subtle point, but in fact, the atomic functions make no memory barrier guarantees either (I think). And even if they did, you are not using a memory barrier when you read the 'timeouts' to perform the subtraction. On a multiprocessor machine the two values can easily fall on two cache lines and become visible to the other cpu in a random fashion. In other words: one cpu decreases the owner and timeouts at about the same time. A different thread, on a different cpu, may see the decrease in owner but not the decrease in timeouts until some random later point. Lockless algorithms are notoriously hard, precisely because of subtle pitfalls like these. I could even be wrong about the above, but that would not be blindingly obvious either. I'm sure you've read something similar, but this is where I remember seeing some of this stuff mentioned: http://msdn.microsoft.com/en-us/library/ee418650(v=vs.85).aspx
[issue11618] Locks broken wrt timeouts on Windows
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Antoine: I notice that even the fast path contains a ResetEvent() call. I think this is a kernel call and so just as expensive as directly using a semaphore :). Otherwise, the logic looks robust, although ResetEvent() and Event objects always give me an uneasy feeling.
[issue11618] Locks broken wrt timeouts on Windows
Antoine Pitrou pit...@free.fr added the comment:
> Antoine: I notice that even the fast path contains a ResetEvent() call. I think this is a kernel call and so just as expensive as directly using a semaphore :)

Yes, in my timings it doesn't show significant improvements compared to the semaphore approach (although again it's on a VM, so I'm not sure how much this reflects a native Windows system).
[issue11618] Locks broken wrt timeouts on Windows
sbt shibt...@gmail.com added the comment: krisvale wrote:
> There is no barrier in use on the read part. I realize that this is a subtle point, but in fact, the atomic functions make no memory barrier guarantees either (I think). And even if they did, you are not using a memory barrier when you read the 'timeouts' to perform the subtraction. On a multiprocessor machine the two values can easily fall on two cache lines and become visible to the other cpu in a random fashion. In other words: One cpu decreases the owner and timeouts at about the same time. A different thread, on a different cpu may see the decrease in owner but not the decrease in timeouts until at some random later point.

From the webpage you linked to:
> Sometimes the read or write that acquires or releases a resource is done using one of the InterlockedXxx functions. On Windows this simplifies things, because on Windows, the InterlockedXxx functions are all full-memory barriers—they effectively have a CPU memory barrier both before and after them, which means that they are a full read-acquire or write-release barrier all by themselves.

Interlocked functions would be pretty useless for implementing mutexes if they did not also act as some kind of barrier: preventing two threads from manipulating an object at the same time is not much use if they don't also get up-to-date views of that object while they own the lock. Given that mutex->timeouts is only modified by interlocked functions, an unprotected read of mutex->timeouts will get a value which is at least as fresh as the one available the last time we crossed a barrier by calling InterlockedXXX() or WaitForSingleObject(). Note that if the read of mutex->timeouts in this line

    if ((timeouts = mutex->timeouts) != 0)

gives the wrong answer, it will be an underestimate, because we own the lock and the only other threads which might interfere will be incrementing the counter.
The worst that can happen is that the fast path remains blocked: consistency is not affected.
[issue11618] Locks broken wrt timeouts on Windows
Martin v. Löwis mar...@v.loewis.de added the comment:
> I would still favour committing the semaphore-based version first (especially in 3.2), and then discussing performance improvements if desired.

For 3.2, I would prefer a solution that makes the fewest changes to the current code. This is better than fundamentally replacing the synchronization mechanism which locks are based on. For 3.3, I predict that any semaphore-based version will shortly be replaced by something faster. Benchmarks seem to indicate that you can get much faster than semaphores.
[issue11618] Locks broken wrt timeouts on Windows
Martin v. Löwis mar...@v.loewis.de added the comment:
> I realize that this is a subtle point, but in fact, the atomic functions make no memory barrier guarantees either (I think).

No need to guess: http://msdn.microsoft.com/en-us/library/ms683560(v=vs.85).aspx
> This function generates a full memory barrier (or fence) to ensure that memory operations are completed in order.
[issue11618] Locks broken wrt timeouts on Windows
New submission from sbt shibt...@gmail.com: In thread_nt.h, when the WaitForSingleObject() call in EnterNonRecursiveMutex() fails with WAIT_TIMEOUT (or WAIT_FAILED), the mutex is left in an inconsistent state. Note that the first line of EnterNonRecursiveMutex() is the comment

    /* Assume that the thread waits successfully */

Allowing EnterNonRecursiveMutex() to fail with a timeout obviously violates this promise ;-) I think the problem was introduced to Python 3.2 with Issue7316: "Add a timeout functionality to common locking operations". The following Windows session demonstrates unexpected behaviour:

    Python 3.3a0 (default, Mar 19 2011, 18:16:48) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import threading
    >>> l = threading.Lock()
    >>> l.acquire()
    True
    >>> l.acquire(timeout=1)
    False
    >>> l.release()
    >>> l.locked()                 # should return False
    True
    >>> l.acquire(blocking=False)  # should return True
    False

Also, after a timeout, uncontended acquires/releases always take the slow path:

    D:\Repos\cpython\PCbuild>python -m timeit ^
    More? -s "from threading import Lock; l = Lock()" ^
    More? "l.acquire();l.release()"
    100 loops, best of 3: 0.974 usec per loop

    D:\Repos\cpython\PCbuild>python -m timeit ^
    More? -s "from threading import Lock; l = Lock()" ^
    More? -s "l.acquire();l.acquire(timeout=0.1);l.release()" ^
    More? "l.acquire();l.release()"
    10 loops, best of 3: 2.18 usec per loop

A unit test is attached which passes on Linux but has three failures on Windows. The owned field of NRMUTEX is a count of the number of threads waiting for the mutex (not including the owner). owned will overestimate the number of waiters if a timeout occurs, because the timed-out thread will still be counted as a waiter. The obvious fix is to decrement mutex->owned when a timeout occurs. Unfortunately, that would introduce a race which might allow two threads to think they own the lock at the same time.
I also notice that EnterNonRecursiveMutex() wrongly sets mutex->thread_id to the current thread even when it fails with a timeout. It appears that the thread_id field is never actually used -- is it there to help with debugging? Perhaps it should just be removed. BTW, only thread_pthread.h and thread_nt.h have implementations of PyThread_acquire_lock_timed(). Since this function appears to be required by _threadmodule.c, does this mean that in Python 3.2 threads are only supported with pthreads and win32? If so, you can get rid of all those other thread_*.h files. -- files: test-timeout.py messages: 131515 nosy: sbt priority: normal severity: normal status: open title: Locks broken wrt timeouts on Windows type: behavior versions: Python 3.2, Python 3.3 Added file: http://bugs.python.org/file21304/test-timeout.py
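The overestimate of owned described in this submission can be reproduced with a small single-threaded Python model of the pre-fix logic. This is an illustrative sketch, not the thread_nt.h code: the class and method names are hypothetical, the Win32 auto-reset event is a boolean flag, and a timed acquire of a held lock always times out because no other thread exists to signal it. Running it gives the same True/False/True/False sequence as the interactive session in the report.

```python
class BuggyNRMutex:
    """Single-threaded model of the pre-fix NRMUTEX counter logic.

    owned == -1 means unlocked; otherwise it counts threads waiting behind
    the owner.  The auto-reset event is modelled as a boolean flag.
    """

    def __init__(self):
        self.owned = -1
        self.event = False

    def acquire_timed(self):
        self.owned += 1          # InterlockedIncrement: register as a waiter
        if self.owned == 0:
            return True          # lock was free: fast path
        if self.event:           # WaitForSingleObject() succeeds
            self.event = False   # auto-reset
            return True
        # Timeout.  BUG: this thread is still counted in owned.
        return False

    def acquire_nonblocking(self):
        if self.owned == -1:     # compare-and-swap -1 -> 0
            self.owned = 0
            return True
        return False

    def release(self):
        self.owned -= 1          # InterlockedDecrement
        if self.owned >= 0:
            self.event = True    # SetEvent() for a waiter that may be gone

    def locked(self):
        # Like lock.locked(): try a non-blocking acquire and undo it.
        if self.acquire_nonblocking():
            self.release()
            return False
        return True


if __name__ == "__main__":
    m = BuggyNRMutex()
    print(m.acquire_timed())        # True
    print(m.acquire_timed())        # False: timed out, but still counted
    m.release()                     # sets the event with nobody waiting
    print(m.locked())               # True, although nobody holds the lock
    print(m.acquire_nonblocking())  # False
```

A further acquire_timed() would succeed by consuming the stale event but leave owned at 1 with no real waiters, which matches the report's observation that every later uncontended acquire/release pair takes the slow path.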
[issue11618] Locks broken wrt timeouts on Windows
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +krisvale, pitrou
[issue11618] Locks broken wrt timeouts on Windows
Antoine Pitrou pit...@free.fr added the comment:
> It appears that the thread_id field is never actually used -- is it there to help with debugging? Perhaps it should just be removed.

True, I think we can remove it.

> does this mean that in Python 3.2 threads are only supported with pthreads and win32? If so you can get rid of all those other thread_*.h files.

Getting rid of them is scheduled for 3.3.
[issue11618] Locks broken wrt timeouts on Windows
sbt shibt...@gmail.com added the comment: First stab at a fix. Gets rid of mutex->thread_id and adds a mutex->timeouts counter. Does not try to prevent mutex->owned from overflowing. When no timeouts have occurred I don't think it changes behaviour, and it uses the same number of Interlocked functions. -- keywords: +patch Added file: http://bugs.python.org/file21306/locktimeout.patch
[issue11618] Locks broken wrt timeouts on Windows
Antoine Pitrou pit...@free.fr added the comment: Well, Windows 2000 has semaphores, so why not use them? It makes the code much simpler. Patch attached (including test). -- nosy: +loewis Added file: http://bugs.python.org/file21308/semlocknt.patch
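The appeal of the semaphore approach is that a timed-out waiter leaves no bookkeeping behind. A minimal Python sketch of the idea follows; it is not the code of semlocknt.patch, and the SemaphoreLock class is hypothetical. In Win32 terms the natural counterpart would be a semaphore created with CreateSemaphore() with an initial and maximum count of 1, acquired with WaitForSingleObject() and released with ReleaseSemaphore(); here threading.BoundedSemaphore(1) plays that role.

```python
import threading


class SemaphoreLock:
    """Hypothetical non-recursive lock on a semaphore capped at count 1."""

    def __init__(self):
        self._sem = threading.BoundedSemaphore(1)

    def acquire(self, blocking=True, timeout=-1):
        if not blocking:
            return self._sem.acquire(blocking=False)
        # A timed-out waiter just returns False: unlike the NRMUTEX counter,
        # there is no waiter count that must be repaired afterwards.
        return self._sem.acquire(timeout=None if timeout < 0 else timeout)

    def release(self):
        # BoundedSemaphore raises ValueError on over-release, roughly like
        # the RuntimeError Lock.release() raises for an unlocked lock.
        self._sem.release()

    def locked(self):
        # Same trick as lock.locked(): non-blocking acquire, then undo it.
        if self._sem.acquire(blocking=False):
            self._sem.release()
            return False
        return True
```

Replaying the session from the original report against this sketch gives the expected answers: locked() is False after the release and a non-blocking acquire succeeds.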
[issue11618] Locks broken wrt timeouts on Windows
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +brian.curtin, tim.golden
[issue11618] Locks broken wrt timeouts on Windows
sbt shibt...@gmail.com added the comment: Have you tried benchmarking it? Interlocked functions are *much* faster than Win32 mutex/semaphores in the uncontended case. It only doubles the time taken for an l.acquire(); l.release() loop in Python code, but at the C level it is probably 10 times slower. Do you really want the GIL to be 10 times slower in the uncontended case? ;-)
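For anyone wanting to repeat the uncontended measurements being debated here, the two `python -m timeit` runs from the original report can be scripted with the standard timeit module. This is a sketch: the helper name usec_per_pair is hypothetical, and the 0.01 s timeout in the setup is shorter than the report's 0.1 s only to keep the run fast. On an interpreter without the bug the two figures should be close; the report shows the second more than doubling on the affected Windows build.

```python
import timeit

# Same statement and setups as the two timeit runs in the original report.
STMT = "l.acquire(); l.release()"
SETUP_FRESH = "from threading import Lock; l = Lock()"
SETUP_AFTER_TIMEOUT = (
    "from threading import Lock; l = Lock()\n"
    "l.acquire(); l.acquire(timeout=0.01); l.release()"
)


def usec_per_pair(setup, number=100_000, repeat=3):
    """Best-of-`repeat` time for one uncontended acquire/release, in usec."""
    times = timeit.repeat(STMT, setup=setup, number=number, repeat=repeat)
    return min(times) / number * 1e6


if __name__ == "__main__":
    fresh = usec_per_pair(SETUP_FRESH)
    after = usec_per_pair(SETUP_AFTER_TIMEOUT)
    print(f"fresh lock:      {fresh:.3f} usec per acquire/release")
    print(f"after a timeout: {after:.3f} usec per acquire/release")
```

Taking the minimum of several repeats follows timeit's own convention of reporting the best run, since slower runs mostly reflect interference from other processes.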
[issue11618] Locks broken wrt timeouts on Windows
Antoine Pitrou pit...@free.fr added the comment:
> Have you tried benchmarking it? Interlocked functions are *much* faster than Win32 mutex/semaphores in the uncontended case.

Well, I'd rather have obviously correct code than difficult-to-understand speedy code. The patch I've posted takes less than a microsecond per acquire/release pair, and that's in a virtual machine to begin with.

> Do you really want the GIL to be 10 times slower in the uncontended case? ;-)

The GIL doesn't use these functions (see ceval_gil.h).
[issue11618] Locks broken wrt timeouts on Windows
Martin v. Löwis mar...@v.loewis.de added the comment: Interestingly, it used to be a Semaphore up to [5e6e9e893acd]; in [cde4da18c4fa], Yakov Markovitch rewrote this to be the faster implementation we have today.
[issue11618] Locks broken wrt timeouts on Windows
Antoine Pitrou pit...@free.fr added the comment:
> Interestingly, it used to be a Semaphore up to [5e6e9e893acd]; in [cde4da18c4fa], Yakov Markovitch rewrote this to be the faster implementation we have today.

At that time, the PyThread_* functions were still in use by the GIL implementation, and it made a difference judging by the commit message.
[issue11618] Locks broken wrt timeouts on Windows
Martin v. Löwis mar...@v.loewis.de added the comment:
> At that time, the PyThread_* functions were still in use by the GIL implementation, and it made a difference judging by the commit message.

Hmm. And if some application uses thread.lock heavily, won't it still make a difference?
[issue11618] Locks broken wrt timeouts on Windows
Antoine Pitrou pit...@free.fr added the comment:
>> At that time, the PyThread_* functions were still in use by the GIL implementation, and it made a difference judging by the commit message.
>
> Hmm. And if some application uses thread.lock heavily, won't it still make a difference?

An acquire/release pair takes less than one microsecond here. Compared to the evaluation overhead of Python code, that does not seem very significant. That said, if someone can guarantee that the complex approach is correct, why not.