On Tuesday, April 12, 2005 11:11:29 AM -0400 Ted Anderson <[EMAIL PROTECTED]> wrote:
Just to make sure I understand this problem, let me try providing a specific scenario: thread A thread B -------- -------- 1: lock M; 2: check queue; 3: cond_wait(C,M); // queue is empty 4: -> kernel_scheduler { 5: unlock M; 6: lock M; ... 7: add item; ... 8: unlock M; ... 9: broadcast on C; ... // noop, nobody blocked on C 10: put B on C; 11: desched B; 12: DEADLOCK
Is this a problem that can occur with a corrently operating implementation of POSIX threads, i.e. an example of correct but non-deterministic scheduling? I've expanded the operation of cond_wait as I understand it. I observe that M correctly protects the queue itself. Treating "broadcast C" and "put B on C" as black boxes, however, it doesn't seem to help to move broadcast(C) inside M, i.e. by swapping 8 & 9. On the other hand, tt would seem a simple matter to reverse the sequence of 5 & 10, but perhaps that is inconvenient for some multi-processor schedulers.
As I understand your comment, you are saying that the broadcast must be protected by the mutex or race conditions are possible. What am I missing here?
What you're missing here is that cond_wait() is atomic. The lines you list as 5 and 10 happen at the same time, so it's not possible for thread A to acquire the lock before thread B is on the wait-list for the CV. This is why it is necessary to hold the lock through the broadcast in order to prevent a race -- otherwise, the broadcast could indeed occur between lines 2 and 10, and thread B would miss the wakeup.
Well, I always thought cond_wait() was atomic. But if that is true then I don't see why broadcast() needs to be covered by the mutex. In the above example (ignoring the kernel_scheduler stuff), in thread A's execution the broadcast (9) follows the locked item addition (6-8). So, either thread B goes first, gets added to the CV's queue and then gets awoken, or thread A goes first and there the item is added to the queue and B won't sleep. How does moving broadcast() inside the mutex change anything?
What I was responding to way this: On 4/11/2005 16:26, Horst Birthelmer wrote: > On Apr 11, 2005, at 9:21 PM, Tom Keiser wrote: >> It is a race condition. pthread_cond_wait() is not an atomic >> operation; it relies on the mutex to synchronize a set of operations >> that enqueue the calling thread onto the list of threads blocked on >> the cond var, drop the associated lock, and become blocked by the >> kernel scheduler. If you call signal/broadcast without holding the >> lock, you can race against these bookkeeping operations in >> pthread_cond_wait(). Thus, a thread in the process of blocking will >> never find out about that signal/broadcast, because cond vars are not >> stateful like semaphores. > > The mutex passed to a condition variable isn't for guarding the > bookkeeping, but that's another story, I guess.
It seemed that both Horst and Tom agreed that broadcast() should be protected, though neither was clear on the reason. The quotes from POSIX didn't clarify it for me. So what I tried to do was break out the sub-operations that must compose cond_wait() to try to see how protecting broadcast() with the mutex would help, but I couldn't construct a failing case.
Maybe one of you could outline a scenario where broadcast() must be protected to avoid a race condition?
Ted Anderson
_______________________________________________ OpenAFS-devel mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-devel
