On Thu, 8 Apr 2010 00:53:02 +0200, Samuel Thibault <samuel.thiba...@gnu.org> wrote:
> Hello,
>
> Sergio Lopez, on Wed 07 Apr 2010 12:43:15 +0200, wrote:
> > On Sat, 27 Mar 2010 00:39:19 +0100
> > Samuel Thibault <samuel.thiba...@gnu.org> wrote:
> > > From time to time, ext2fs deadlocks on the pager->interlock
> > > mutex. This is an excerpt of what I could find in the process:
> > >
> > > #2 0x08106e59 in memory_object_lock_request ()
> > > #3 0x0806fdeb in _pager_lock_object (p=0x81c97b8, offset=0,
> > >    size=827392, should_return=2, should_flush=0, lock_value=8,
> > >    sync=0) at /var/tmp/hurd-20090404/./libpager/lock-object.c:68
> > > #4 0x0806da18 in pager_sync (p=0x81c97b8, wait=0)
> > >    at /var/tmp/hurd-20090404/./libpager/pager-sync.c:31
> > > ...
> > > #9 0x0805a9ac in periodic_sync (interval=5)
> > >    at /var/tmp/hurd-20090404/./libdiskfs/sync-interval.c:119
> > >
> > > This is the periodic sync, calling memory_object_lock_request()
> > > on the pager. Note that before doing this, _pager_lock_object
> > > takes pager->interlock.
> >
> > AFAIK, m_o_lock_request is an asynchronous operation, so it should
> > not block in any case. Perhaps the cthreads package is behaving
> > weird?
>
> Above #2 0x08106e59 in memory_object_lock_request (), there is
> #0 0x080bf22c in mach_msg_trap ()
> #1 0x0808666e in mach_msg ()
>
> So it's really hung in the kernel. And indeed, even if from
> the interface it would seem like it could be asynchronous,
> the memory_object_lock_completed() call is done from the
> memory_object_lock_request function itself...
>

But even if m_o_lock_completed is called from m_o_lock_request, that answer should come in another message, which arrives at another user thread (in libpager, this is processed in lock-completed.c). So if _pager_lock_object is called with sync=1 and is waiting for the kernel to reply with an m_o_lock_completed, the thread should be waiting at the "condition_wait (&p->wakeup, &p->interlock);" just a few lines below. And condition_wait releases the interlock until it is woken up by another thread.
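
To illustrate the point about condition_wait, here is a minimal sketch. It uses POSIX threads as a stand-in for the cthreads calls (that substitution is my assumption, made so the example builds outside the Hurd), and struct demo_pager and the demo_* functions are hypothetical names, not the real libpager code. It only shows that a second thread can take the interlock while the first one sits in the condition wait.

```c
/* Sketch only: pthreads stand in for cthreads (mutex_lock,
   condition_wait, condition_broadcast). struct demo_pager and the
   demo_* functions are hypothetical, not libpager's definitions. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct demo_pager {
    pthread_mutex_t interlock;
    pthread_cond_t wakeup;
    bool lock_pending;      /* true until the reply has been processed */
    bool handler_got_lock;  /* set by the handler while holding interlock */
};

/* Plays the role of the thread that handles m_o_lock_completed. */
static void *demo_lock_completed(void *arg)
{
    struct demo_pager *p = arg;

    /* This can only succeed once the waiter has released interlock
       inside pthread_cond_wait. */
    pthread_mutex_lock(&p->interlock);
    p->handler_got_lock = true;
    p->lock_pending = false;
    pthread_cond_broadcast(&p->wakeup);
    pthread_mutex_unlock(&p->interlock);
    return NULL;
}

/* Plays the role of _pager_lock_object called with sync=1.
   Returns 1 if the handler ran while we waited, -1 on error. */
int demo_lock_object_sync(void)
{
    struct demo_pager p;
    pthread_t handler;
    int result;

    pthread_mutex_init(&p.interlock, NULL);
    pthread_cond_init(&p.wakeup, NULL);
    p.lock_pending = true;
    p.handler_got_lock = false;

    pthread_mutex_lock(&p.interlock);

    /* The "request" goes out while interlock is still held. */
    if (pthread_create(&handler, NULL, demo_lock_completed, &p) != 0) {
        pthread_mutex_unlock(&p.interlock);
        return -1;
    }

    /* The wait drops interlock atomically and re-takes it on wakeup. */
    while (p.lock_pending)
        pthread_cond_wait(&p.wakeup, &p.interlock);

    assert(!p.lock_pending);
    result = p.handler_got_lock ? 1 : 0;
    pthread_mutex_unlock(&p.interlock);

    pthread_join(handler, NULL);
    pthread_cond_destroy(&p.wakeup);
    pthread_mutex_destroy(&p.interlock);
    return result;
}
```

Calling demo_lock_object_sync() from a small main (built with -pthread) should return 1: the handler thread blocks on the interlock until the waiter enters the condition wait, then gets through and wakes it up.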
If a thread is stalled in mach_msg_trap(), that means the kernel can't enqueue the message for some reason (and this is very, very bad). Is it possible that ext2fs had a huge number of threads at that moment?