Arun Sharma writes:
 > The daemons which are involved in freeing up pages during low memory
 > conditions qualify as system daemons. Making sure that these daemons
 > don't block avoids the deadlock.
 >
 > -Arun
The second solution involves a little more than that, such as blessing
"normal" jobs just enough to allow them to get sufficient resources to
avoid a deadlock.

One instance of the mmap lockup involves a case where you've got a
single process dirtying a memory mapped file which is larger than
physical memory. Assuming an otherwise idle system, nearly all
available memory in the system will belong to the file's object & it
will all be dirty.

At some point, the process will trigger a fault on a non-resident
page. vm_fault will call vnode_pager_getpages to read in the faulting
page. ffs_getpages (let's assume we're using ffs) will then call
ffs_read to read in the pages. ffs_read will try to build a cluster.
The deadlock occurs when allocbuf cannot allocate a page for one of
the pages in the cluster. Here's a stack trace (from a long, long time
ago, May 12th):

db> tr
vm_page_alloc(caa0a074,d1d,0,c58f7ba0,1fc) at vm_page_alloc
allocbuf(c58f7ba0,2000,0,c58c4588,5) at allocbuf+0x3ae
getblk(caa0f8c0,68e,2000,0,0) at getblk+0x32e
cluster_rbuild(caa0f8c0,8000001,0,689,370b0) at cluster_rbuild+0x1df
cluster_read(caa0f8c0,8000001,0,689,2000) at cluster_read+0x2cc
ffs_read(caa12e28) at ffs_read+0x3ea
ffs_getpages(caa12e80) at ffs_getpages+0x22c
vnode_pager_getpages(caa0a074,caa12f14,1,0,c9fcdce0) at vnode_pager_getpages+0x4e
vm_fault(c9fd28c0,48df9000,3,8,c9fcdce0) at vm_fault+0x484
trap_pfault(caa12fb8,1,48df9000) at trap_pfault+0xaa
trap(2f,2f,2f,48df9000,48df9000) at trap+0x1aa
calltrap() at calltrap+0x1c

The real problem is that the pageout daemon cannot push any pages
because (nearly) all the pages available to user-processes are held by
the mmap'ed object. The killer is that they are all dirty & that,
because we're in the middle of doing a cluster read, the vnode is
locked, so the pageout daemon cannot touch them.
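To make the stuck state concrete, here is a rough toy model in C. It is
only a sketch under stated assumptions: struct toy_vm, the ALLOC_NORMAL
and ALLOC_BLESSED classes, and the toy_* functions are all made up for
illustration and are not the real vm_page_alloc or pageout daemon code.
It shows a normal request failing once only the reserve is left, the
pageout step cleaning nothing while the vnode is locked, and a
"blessed" request getting through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical request classes; NOT the real FreeBSD VM request classes. */
enum alloc_class { ALLOC_NORMAL, ALLOC_BLESSED };

/* Toy view of the VM state described above (all fields are assumptions). */
struct toy_vm {
    int  free_pages;     /* pages currently on the free list */
    int  reserved_pages; /* floor that normal requests may not dip below */
    int  dirty_pages;    /* dirty pages held by the mmap'ed object */
    bool vnode_locked;   /* true while the cluster read holds the vnode */
};

/* Take one page. Normal requests fail once only the reserve is left;
 * blessed requests may use the reserve as well. */
static bool toy_page_alloc(struct toy_vm *vm, enum alloc_class cls)
{
    int floor = (cls == ALLOC_BLESSED) ? 0 : vm->reserved_pages;

    assert(vm->free_pages >= 0);
    if (vm->free_pages <= floor)
        return false;
    vm->free_pages--;
    return true;
}

/* Toy pageout pass: dirty pages can be cleaned and freed only when the
 * vnode is not locked. Returns the number of pages freed. */
static int toy_pageout(struct toy_vm *vm)
{
    int cleaned;

    if (vm->vnode_locked)
        return 0;
    cleaned = vm->dirty_pages;
    vm->free_pages += cleaned;
    vm->dirty_pages = 0;
    return cleaned;
}

/* Walk through the scenario: stuck, then unstuck by a blessed request. */
void toy_demo(void)
{
    struct toy_vm vm = { .free_pages = 4, .reserved_pages = 4,
                         .dirty_pages = 1000, .vnode_locked = true };

    printf("normal alloc ok:  %d\n", toy_page_alloc(&vm, ALLOC_NORMAL));
    printf("pageout freed:    %d\n", toy_pageout(&vm));
    printf("blessed alloc ok: %d\n", toy_page_alloc(&vm, ALLOC_BLESSED));
    vm.vnode_locked = false; /* cluster read finished, vnode released */
    printf("pageout freed:    %d\n", toy_pageout(&vm));
}
```

Calling toy_demo() from a main() walks through the sequence. The point
of the model is only that neither the faulting process nor the pageout
step can make progress until one of them is allowed past the reserve.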
A solution would be allowing the faulting process to dip into the
system reserves enough so that the vm_page_alloc will succeed, which
will allow the cluster read to complete. This will avoid deadlock.

I personally think the first solution (always taking write faults)
would be far, far better. This would allow the system to avoid getting
anywhere near a deadlock situation & to remain responsive. I'm afraid
that if we go with the second solution, the system would be
unresponsive until the cluster read completed & the pageout daemon was
able to begin flushing the dirty pages in the offending object.

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University                         Email: galla...@cs.duke.edu
Department of Computer Science          Phone: (919) 660-6590