Julian Graham wrote:
> Okay, I think I know what the problem is: Part of the SRFI-18 thread
> start / creation process involves contention for a mutex, and there's
> a bug in fat_mutex_lock code that causes the locking thread to
> sometimes miss an unlocking thread's notification that a mutex is
> available. So it's actually a mutex bug -- specifically, in the loop
> code in fat_mutex_lock that ends with the following snippet:
>
>     ...
>       scm_i_pthread_mutex_unlock (&m->lock);
>       SCM_TICK;
>       scm_i_scm_pthread_mutex_lock (&m->lock);
>     }
>     block_self (m->waiting, mutex, &m->lock, timeout);
>
> ...which means that if the loop is entered while the mutex is still
> locked but the owner unlocks it after the locking thread releases the
> administrative lock to run the tick, the locking thread will sleep
> forever because it doesn't re-check the state of the mutex. I've made
> a small change (blocking before doing the tick instead of after) that
> seems to resolve the issue (so far no lock-ups using Han-Wen's x.test
> for a couple of hours). There's a patch attached.
>
> (Sorry, should have noticed this earlier; the problem existed before
> the changes I introduced to support SRFI-18...)
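
For anyone following along, here is a minimal sketch of the general rule behind the missed notification described above: the waiter must re-check the mutex state every time it reacquires the administrative lock, before deciding to sleep. This is not Guile's code and not the attached patch (which moves the blocking call ahead of the tick). The names toy_mutex, toy_init, toy_lock and toy_unlock are hypothetical, and plain pthreads stand in for the scm_i_* wrappers and block_self.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fat mutex: an administrative lock that
   guards an "owned" flag, plus a condition variable for waiters.  */
typedef struct
{
  pthread_mutex_t admin;   /* plays the role of m->lock */
  pthread_cond_t wakeup;   /* plays the role of the m->waiting queue */
  bool owned;              /* true while some thread holds the mutex */
} toy_mutex;

void
toy_init (toy_mutex *m)
{
  pthread_mutex_init (&m->admin, NULL);
  pthread_cond_init (&m->wakeup, NULL);
  m->owned = false;
}

void
toy_lock (toy_mutex *m)
{
  pthread_mutex_lock (&m->admin);
  /* The state is re-tested after every wakeup, with the admin lock
     held.  pthread_cond_wait releases the admin lock and starts
     waiting as one atomic step, so an unlock cannot slip in between
     the test and the sleep.  */
  while (m->owned)
    pthread_cond_wait (&m->wakeup, &m->admin);
  m->owned = true;
  pthread_mutex_unlock (&m->admin);
}

void
toy_unlock (toy_mutex *m)
{
  pthread_mutex_lock (&m->admin);
  assert (m->owned);       /* unlocking an unowned mutex is a caller bug */
  m->owned = false;
  pthread_cond_signal (&m->wakeup);
  pthread_mutex_unlock (&m->admin);
}
```

The failure mode in the quoted snippet corresponds to testing the state, dropping the admin lock for the tick, and then sleeping without testing again.
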

Would this also explain the 'corruption' in the evaluator we have been seeing ("bad bindings at .. ")?

-- 
Han-Wen Nienhuys - [EMAIL PROTECTED] - http://www.xs4all.nl/~hanwen