Thanks Rob, that helps a lot.  But I'm still confused: why do I need to
use a separate flag value in addition to the mutex?  In other words, why
can't I just do something like:
The code in T1:

    ns_mutex lock $mutex
    ns_cond wait $event_id $mutex_id $timeout

The code in T2:

    ns_mutex lock $mutex
    ns_cond broadcast $cond
    ns_mutex unlock $mutex

On Fri, Feb 22, 2002 at 01:10:16PM -0600, Rob Mayoff wrote:
> The code in T1 should look something like this:
>
>     set mutex [nsv_get the_flag mutex]
>     set cond [nsv_get the_flag cond]
>
>     ns_mutex lock $mutex
>
>     # Use a while loop to account for
>     # some other thread getting woken
>     # first and unsetting the flag!
>     while {![nsv_get the_flag value]} {
>         ns_cond wait $cond $mutex
>     }
>
>     ns_mutex unlock $mutex

Wait, Rob, you said ns_cond wait implicitly unlocks the mutex for us, so
why are you calling ns_mutex unlock afterwards?

> The code in T2 should look something like this:
>
>     set mutex [nsv_get the_flag mutex]
>     set cond [nsv_get the_flag cond]
>     ns_mutex lock $mutex
>     nsv_set the_flag value 1
>     ns_cond broadcast $cond
>     ns_mutex unlock $mutex

I think it would also work if T2 unlocks the mutex immediately BEFORE
calling ns_cond broadcast, rather than after.  True?

Also, I don't understand your use of the while loop in T1.  The meaning
we've assigned to the_flag is "event X has occurred", right?  So is your
use of the while loop because of some assumptions specific to the
application you took this example from?  E.g., if some other thread
already "handled" the event, then you want T1 to continue waiting for
the next such event.  Or is it something more general, that you always
need when using ns_cond, that I'm not understanding here?

And is this while loop business related somehow to the comment in
"aolserver/thread/pthread.cpp" below?

    /*
     * On Solaris, we have noticed that when the condition and/or
     * mutex are process-shared instead of process-private that
     * pthread_cond_wait may incorrectly return ETIME.  Because
     * we're not sure why ETIME is being returned (perhaps it's
     * from an underlying _lwp_cond_timedwait???), we allow
     * the condition to return.  This should be o.k. because
     * properly written condition code must be in a while
     * loop capable of handling spurious wakeups.
     */

Hm, this whole business with ns_cond and mutexes sounds awfully similar
to, but somewhat worse than, using the old ns_share command.  (The fact
that "ns_cond wait" automagically unlocks the mutex without this being
mentioned at all in the docs is particularly disturbing.  Actually, it's
not even clear to me from looking at the code that that's actually what
it does - I'm just taking Rob on faith here.)

Would it be possible for ns_cond to handle all use of mutexes itself,
like nsv does?  Or is there some difference between the problems that
ns_share and ns_cond are solving which would prevent that?  I suspect
that mutex use by ns_share and ns_cond is in fact equivalent, and an
nsv-style ns_cond command would work fine, as long as you were willing
to accept the same limitations as with nsv.  E.g., there's no way for
the caller to guarantee atomicity when doing several nsv_set statements
at once, since you're not manually controlling the mutexes around the
nsv buckets.

On Fri, Feb 22, 2002 at 01:10:16PM -0600, Rob Mayoff wrote:
> +---------- On Feb 22, Andrew Piskorski said:
> > In the 'ns_cond wait' command:
> > http://www.aolserver.com/docs/devel/tcl/tcl-api.adp#ns_cond
> >
> > What is the mutex lock FOR?  Do I need to use one mutex per
> > cond/event, or one mutex per thread per event, or what?
>
> Let's use this as an example: thread T1's computation must wait for some
> flag to be set.  Maybe the flag is already set, in which case T1 can
> proceed; maybe the flag is not set, in which case T1 must wait for some
> other thread T2 to set the flag.
>
> Now suppose that T1 could just call "ns_cond wait $cond", with no mutex.
> Here's a scenario:
>
>     T1 checks the flag
>     - flag not set
>                                T2 sets the flag
>
>                                T2 calls ns_cond broadcast $cond
>     T1 calls ns_cond wait $cond
>
>     T1 hangs indefinitely, but
>     the flag is set!
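(To check that I'm following the lost-wakeup scenario and the while-loop
idiom, here's my own translation of Rob's T1/T2 code into Python's
threading module - purely illustrative, Python's API rather than
anything AOLserver provides, so correct me if I've mangled it:

```python
import threading

# Shared state, standing in for the nsv array "the_flag".
flag = {"value": 0}
mutex = threading.Lock()              # ns_mutex create
cond = threading.Condition(mutex)     # ns_cond create, tied to the mutex
log = []

def t1():
    with mutex:                       # ns_mutex lock $mutex
        # The while loop re-checks the flag after every wakeup, so a
        # spurious wakeup (or another thread consuming the event
        # first) just sends T1 back to waiting.
        while not flag["value"]:      # while {![nsv_get the_flag value]}
            cond.wait()               # ns_cond wait $cond $mutex:
                                      # atomically unlocks the mutex,
                                      # waits, re-locks before returning
        log.append("T1 saw the flag")
    # leaving the with block = ns_mutex unlock $mutex

def t2():
    with mutex:                       # ns_mutex lock $mutex
        flag["value"] = 1             # nsv_set the_flag value 1
        cond.notify_all()             # ns_cond broadcast $cond
    # ns_mutex unlock $mutex

a = threading.Thread(target=t1)
b = threading.Thread(target=t2)
a.start()
b.start()
a.join()
b.join()
print(log)  # → ['T1 saw the flag']
```

The mutex-plus-while-loop pattern makes the scheduling order irrelevant:
if T2 happens to run first, T1 simply finds the flag already set and
never waits at all.)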
> In other words, there's an interval between T1 checking the flag and T1
> calling ns_cond wait, during which T2 might set the flag and signal the
> condition.  If that happens, T1 misses the signal and never sees that
> the flag is set.
>
> You need T1 and T2 to cooperate using a mutex to eliminate the interval:
>
>     T1 locks $mutex
>
>     T1 checks the flag
>     - flag not set
>                                T2 tries to lock $mutex
>                                - blocks because T1 owns
>                                  the lock
>     T1 calls ns_cond wait $cond $mutex
>     - this atomically unlocks $mutex
>       and starts waiting for $cond
>       to be signalled
>                                T2 unblocks and locks $mutex
>
>                                T2 sets the flag
>
>                                T2 calls ns_cond broadcast $cond
>     T1 wakes up, still in
>     ns_cond wait, which tries
>     to lock $mutex again and
>     blocks because T2 still
>     owns the lock
>                                T2 unlocks $mutex
>
>     T1 unblocks and locks $mutex
>
>     T1 checks the flag - flag set,
>     T1 carries on
>
> Of course you need to initialize the_flag at server startup:
>
>     nsv_array set the_flag [list \
>         value 0 \
>         mutex [ns_mutex create the_flag] \
>         cond [ns_cond create]]

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com