Hi Julian,

"Julian Graham" <[EMAIL PROTECTED]> writes:

> ...
>   scm_i_pthread_mutex_unlock (&m->lock);
>   SCM_TICK;
>   scm_i_scm_pthread_mutex_lock (&m->lock);
> }
> block_self (m->waiting, mutex, &m->lock, timeout);
>
> ...which means that if the loop is entered while the mutex is still
> locked but the owner unlocks it after the locking thread releases the
> administrative lock to run the tick, the locking thread will sleep
> forever because it doesn't re-check the state of the mutex.  I've made
> a small change (blocking before doing the tick instead of after) that
> seems to resolve the issue (so far no lock-ups using Han-Wen's x.test
> for a couple of hours).  There's a patch attached.

I think I understand your description, assuming "the mutex" is M, "the
administrative lock" is `M->lock', and "the state" is the rest of the
`fat_mutex' structure.

Let me rephrase it: what can happen is that, during the tick, another
thread could actually take M, increase `M->level' and mark itself as the
owner.  After the tick, our primary thread takes `M->lock' back,
thinking it now owns M, and goes to sleep; but M is actually already
taken by that other thread, so our primary thread never wakes up.

(Not sure this description is any clearer...)

I guess it can be applied to 1.8 as well?

Another question: why is there this mixture of `scm_i_pthread' and
`scm_i_scm_pthread' calls?

Thanks for tracking it down!

Ludo'.
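
PS: To make the general rule concrete (re-check the protected state after
re-taking the administrative lock, before going to sleep), here is a rough
sketch using plain pthreads.  It is not Guile's code and not the attached
patch: `toy_fat_mutex', `toy_lock', `toy_unlock' and `toy_demo' are made-up
names, `sched_yield' merely stands in for SCM_TICK, and a condition variable
stands in for `block_self'.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fat mutex: `lock' is the administrative
   lock; `owned' and `level' are the state it protects.  */
struct toy_fat_mutex
{
  pthread_mutex_t lock;
  pthread_cond_t released;
  bool owned;
  int level;
};

void
toy_init (struct toy_fat_mutex *m)
{
  pthread_mutex_init (&m->lock, NULL);
  pthread_cond_init (&m->released, NULL);
  m->owned = false;
  m->level = 0;
}

void
toy_lock (struct toy_fat_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  while (m->owned)
    {
      /* Stand-in for the tick: the administrative lock is dropped, so
         the owner may release the mutex in this window.  */
      pthread_mutex_unlock (&m->lock);
      sched_yield ();
      pthread_mutex_lock (&m->lock);

      /* Re-check the state before sleeping; without this test, a
         release that happened during the tick would be missed.  */
      if (!m->owned)
        break;

      /* Sleep; the loop condition re-checks the state on wake-up.  */
      pthread_cond_wait (&m->released, &m->lock);
    }
  m->owned = true;
  m->level = 1;
  pthread_mutex_unlock (&m->lock);
}

void
toy_unlock (struct toy_fat_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  m->owned = false;
  m->level = 0;
  pthread_cond_signal (&m->released);
  pthread_mutex_unlock (&m->lock);
}

static struct toy_fat_mutex demo_mutex;
static long demo_counter;

static void *
demo_worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < 10000; i++)
    {
      toy_lock (&demo_mutex);
      demo_counter++;           /* protected by the toy mutex */
      toy_unlock (&demo_mutex);
    }
  return NULL;
}

/* Run two contending workers and return the final counter value.  */
long
toy_demo (void)
{
  pthread_t a, b;
  int rc;

  toy_init (&demo_mutex);
  demo_counter = 0;
  rc = pthread_create (&a, NULL, demo_worker, NULL);
  assert (rc == 0);
  rc = pthread_create (&b, NULL, demo_worker, NULL);
  assert (rc == 0);
  (void) rc;
  pthread_join (a, NULL);
  pthread_join (b, NULL);
  return demo_counter;
}
```

If the `if (!m->owned) break;' test is removed, a thread can go to sleep
after the only signal has already been sent, which is the kind of lock-up
described above.  `toy_demo' just hammers the lock from two threads and
returns the sum of their increments.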