Re: libpager deadlock

Sergio Lopez Thu, 08 Apr 2010 04:17:00 -0700

On Thu, 8 Apr 2010 00:53:02 +0200
Samuel Thibault <samuel.thiba...@gnu.org> wrote:


> Hello,
> 
> Sergio Lopez, on Wed 07 Apr 2010 12:43:15 +0200, wrote:
> > On Sat, 27 Mar 2010 00:39:19 +0100
> > Samuel Thibault <samuel.thiba...@gnu.org> wrote:
> > > From time to time, ext2fs deadlocks on the pager->interlock
> > > mutex. This is an excerpt of what I could find in the process:
> > > 
> > > #2  0x08106e59 in memory_object_lock_request ()
> > > #3  0x0806fdeb in _pager_lock_object (p=0x81c97b8, offset=0,
> > >     size=827392, should_return=2, should_flush=0, lock_value=8,
> > >     sync=0) at /var/tmp/hurd-20090404/./libpager/lock-object.c:68
> > > #4  0x0806da18 in pager_sync (p=0x81c97b8, wait=0)
> > >     at /var/tmp/hurd-20090404/./libpager/pager-sync.c:31
> > > ...
> > > #9  0x0805a9ac in periodic_sync (interval=5)
> > >     at /var/tmp/hurd-20090404/./libdiskfs/sync-interval.c:119
> > > 
> > > This is the periodic sync, calling memory_object_lock_request()
> > > on the pager.  Note that before doing this, _pager_lock_object
> > > takes pager->interlock.
> > 
> > AFAIK, m_o_lock_request is an asynchronous operation, so it should
> > not block in any case. Perhaps the cthreads package is behaving
> > weirdly?
> 
> Above #2  0x08106e59 in memory_object_lock_request (), there is
> #0  0x080bf22c in mach_msg_trap ()
> #1  0x0808666e in mach_msg ()
> 
> So it's really hung in the kernel. And indeed, even if from
> the interface it would seem like it could be asynchronous,
> the memory_object_lock_completed() call is done from the
> memory_object_lock_request function itself...
> 

But even if m_o_lock_completed is called from m_o_lock_request, that
answer should come in another message, which arrives at another user
thread (in libpager, this is processed in lock-completed.c). So if
_pager_lock_object is called with sync=1 and is waiting for the kernel
to reply with an m_o_lock_completed, the thread should be waiting at the
"condition_wait (&p->wakeup, &p->interlock);" just a few lines below. And
condition_wait releases the interlock until it is woken up by another
thread.
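
To illustrate the point about condition_wait, here is a minimal sketch
of the same pattern. It is not the libpager code: it uses POSIX threads
as a stand-in for cthreads, and struct pager, lock_completed_thread and
run_demo are hypothetical names made up for this example. The second
thread is started while the interlock is held, so it can only take the
interlock (and deliver the "reply") once the waiter has released it
inside the condition wait.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the libpager pager structure. */
struct pager
{
  pthread_mutex_t interlock;
  pthread_cond_t wakeup;
  int lock_completed;           /* set once the simulated reply arrives */
};

/* Plays the role of the thread that handles m_o_lock_completed. */
static void *
lock_completed_thread (void *arg)
{
  struct pager *p = arg;

  /* Blocks here until the waiter releases interlock in the wait below. */
  pthread_mutex_lock (&p->interlock);
  p->lock_completed = 1;
  pthread_cond_broadcast (&p->wakeup);
  pthread_mutex_unlock (&p->interlock);
  return NULL;
}

/* Plays the role of a synchronous lock request (sync=1).
   Returns 1 if the waiter was woken up by the other thread.  */
int
run_demo (void)
{
  struct pager p;
  pthread_t t;
  int done;

  pthread_mutex_init (&p.interlock, NULL);
  pthread_cond_init (&p.wakeup, NULL);
  p.lock_completed = 0;

  pthread_mutex_lock (&p.interlock);

  /* Start the replier while we still hold interlock. */
  if (pthread_create (&t, NULL, lock_completed_thread, &p) != 0)
    {
      pthread_mutex_unlock (&p.interlock);
      return 0;
    }

  /* The wait atomically releases interlock and reacquires it on wakeup. */
  while (!p.lock_completed)
    pthread_cond_wait (&p.wakeup, &p.interlock);

  assert (p.lock_completed);
  done = p.lock_completed;
  pthread_mutex_unlock (&p.interlock);

  pthread_join (t, NULL);
  pthread_cond_destroy (&p.wakeup);
  pthread_mutex_destroy (&p.interlock);
  return done;
}
```

If the wait did not release the interlock, the second thread could never
set the flag and run_demo would hang instead of returning. So a thread
sitting in the condition wait should not, by itself, keep anyone else
off pager->interlock.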

If a thread is stalled in mach_msg_trap(), that means the kernel can't
enqueue the message for some reason (and this is very, very bad). Is it
possible that ext2fs had a huge number of threads at that moment?
