Arun Sharma writes:
 > The daemons which are involved in freeing up pages during low memory
 > conditions qualify as system daemons. Making sure that these daemons
 > don't block avoids the deadlock.
 > 
 >      -Arun

The second solution involves a little more than that: it also means
blessing "normal" jobs just enough to allow them to get sufficient
resources to avoid a deadlock.

One instance of the mmap lockup involves a case where you've got a
single process dirtying a memory mapped file which is larger than
physical memory.  Assuming an otherwise idle system, nearly all
available memory in the system will belong to the file's object & it
will all be dirty.
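
For anyone who wants to see the access pattern, here is a minimal
sketch of the kind of program I mean.  The function name, the path
and the length are all made up for illustration; it only shows "map a
file MAP_SHARED and write to every page".  To get into the situation
described above, the length would have to exceed physical memory, and
whether a given kernel then locks up is not something this sketch
promises.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any real program): create or extend
 * the file at `path` to `len` bytes, map it shared and writable, and
 * write one byte into every page so that each page becomes dirty.
 * Returns 0 on success, -1 on any error.
 */
static int
dirty_mapped_file(const char *path, size_t len)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	size_t off;
	int fd, rc;

	if (pagesz <= 0 || len == 0)
		return (-1);

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return (-1);

	/* Make the file as large as the mapping we are about to create. */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return (-1);
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}

	/* Touch each page once; every store faults the page in and dirties it. */
	for (off = 0; off < len; off += (size_t)pagesz)
		p[off] = 1;

	rc = munmap(p, len);
	close(fd);
	return (rc);
}
```

Calling it with a small length just exercises the code path; the
interesting case is a length well beyond the amount of RAM on an
otherwise idle machine.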

At some point, the process will trigger a fault on a non-resident
page.  vm_fault will call vnode_pager_getpages to read in the
faulting page.  ffs_getpages (let's assume we're using ffs)
will then call ffs_read to read in the pages.  ffs_read will try to
build a cluster.  The deadlock occurs when allocbuf cannot allocate a
page for one of the pages in the cluster.  Here's a stack trace (from
a long, long time ago, May 12th):

db> tr
vm_page_alloc(caa0a074,d1d,0,c58f7ba0,1fc) at vm_page_alloc
allocbuf(c58f7ba0,2000,0,c58c4588,5) at allocbuf+0x3ae
getblk(caa0f8c0,68e,2000,0,0) at getblk+0x32e
cluster_rbuild(caa0f8c0,8000001,0,689,370b0) at cluster_rbuild+0x1df
cluster_read(caa0f8c0,8000001,0,689,2000) at cluster_read+0x2cc
ffs_read(caa12e28) at ffs_read+0x3ea
ffs_getpages(caa12e80) at ffs_getpages+0x22c
vnode_pager_getpages(caa0a074,caa12f14,1,0,c9fcdce0) at vnode_pager_getpages+0x4e
vm_fault(c9fd28c0,48df9000,3,8,c9fcdce0) at vm_fault+0x484
trap_pfault(caa12fb8,1,48df9000) at trap_pfault+0xaa
trap(2f,2f,2f,48df9000,48df9000) at trap+0x1aa
calltrap() at calltrap+0x1c

The real problem is that the pageout daemon cannot push any pages
because (nearly) all the pages available to user processes are held by
the mmap'ed object.  The killer is that they are all dirty and, because
we're in the middle of doing a cluster read, the vnode is locked, so
the pageout daemon cannot touch them.

One solution would be to allow the faulting process to dip into the
system reserves just far enough for the vm_page_alloc to succeed, which
will allow the cluster read to complete.  This will avoid deadlock.
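
To make the idea concrete, here is a toy userland model of a page
pool with a reserve.  It is only a sketch: toy_pool, toy_req,
toy_page_alloc and toy_cluster_read are invented names, not the
kernel's allocator, and the "cluster read" is reduced to "needs N
pages".  Normal requests fail once the free count reaches the reserve;
a request that is allowed to dip may take pages from the reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request classes for this model only. */
enum toy_req { TOY_REQ_NORMAL, TOY_REQ_RESERVE };

struct toy_pool {
	size_t free_pages;	/* pages currently free */
	size_t reserved_pages;	/* floor that normal requests may not cross */
};

/* Take one page if the request class permits it; true on success. */
static bool
toy_page_alloc(struct toy_pool *pool, enum toy_req req)
{
	size_t floor;

	assert(pool != NULL);
	floor = (req == TOY_REQ_RESERVE) ? 0 : pool->reserved_pages;
	if (pool->free_pages <= floor)
		return (false);
	pool->free_pages--;
	return (true);
}

/*
 * Model a cluster read that needs `npages` pages.  If `may_dip` is
 * set, a failed normal allocation is retried against the reserve.
 * On failure the pages taken so far are given back and false is
 * returned; in the scenario from this thread the process would
 * instead sleep forever waiting for a page.
 */
static bool
toy_cluster_read(struct toy_pool *pool, size_t npages, bool may_dip)
{
	size_t got = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		if (toy_page_alloc(pool, TOY_REQ_NORMAL) ||
		    (may_dip && toy_page_alloc(pool, TOY_REQ_RESERVE))) {
			got++;
			continue;
		}
		pool->free_pages += got;	/* undo the partial cluster */
		return (false);
	}
	return (true);
}
```

With 6 free pages and a reserve of 4, a 4-page cluster fails when
dipping is not allowed and succeeds when it is, leaving 2 pages free.
That remaining headroom is the cost: the reserve is smaller until the
pageout daemon can flush something.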

I personally think the first solution (always taking write faults)
would be far, far better.  This would allow the system to avoid
getting anywhere near a deadlock situation & to remain responsive.

I'm afraid that if we go with the second solution, the system would be
unresponsive until the cluster read completed & the pageout daemon was
able to begin flushing the dirty pages in the offending object.

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University                         Email: galla...@cs.duke.edu
Department of Computer Science          Phone: (919) 660-6590

