Another thought occurs to me: maybe afs_linux_vma_close() doesn't need to hold the file lock at all, since any change to mapcnt will be protected by AFS_GLOCK() anyway, right?
See the comment in afs_linux_release() about flushcnt. Can the same rationale apply to mapcnt?

-- 
George T. Talbot
<[EMAIL PROTECTED]>

On Thu, 2003-07-10 at 14:02, George Talbot wrote:
> Hi,
>
> We have a problem here with clients that make heavy use of mmap() on
> files stored on an AFS server. The program doing the mmap() access will
> hang, as will top, ps, etc. I'm using Linux kernel 2.4.20, with OpenAFS
> 1.2.8, and OpenMosix, though I don't think it's OpenMosix, because if I
> recompile the kernel without OpenMosix, I still get hangs, just not as
> frequently.
>
> So I found this patch:
>
> https://lists.openafs.org/pipermail/openafs-devel/2002-December/003624.html
>
> This patch does not work for us. I did some further investigation of
> where the program is hanging. The programs hang in
> afs_linux_vma_close() right when this function tries to acquire a write
> lock on the vcache entry for the file. I added some instrumentation,
> and found that the holder of the lock is afs_GetDCache().
> afs_GetDCache(), when the problem occurs, has acquired the lock at
> position #66 (search for ",66)" in the source code), and this lock has
> been converted to a shared lock.
>
> The sequence of events, I believe, is this:
>
> afs_GetDCache() has the AFS_GLOCK(), acquires the write lock for the
> file, converts the write lock to a shared lock, and drops AFS_GLOCK()
> while still holding the shared lock on the file, and starts reading
> blocks.
>
> At this point afs_linux_vma_close() gets called because the application
> is unmapping the file, acquires the AFS_GLOCK(), and blocks trying to
> acquire the write lock on the file, which afs_GetDCache() still holds
> as a shared lock.
>
> Then, I believe that afs_GetDCache() runs again after the read
> completes, tries to acquire the AFS_GLOCK() and blocks.
>
> Classic deadlock.
>
> Any ideas how to fix this? I think that afs_linux_mmap() and
> afs_linux_vma_close() are using the write lock to mutually exclude each
> other, so I think the code still needs to hold the lock. However, it
> seems to me a classic case of deadlock to drop the global lock out of
> order with the file lock. Should afs_linux_vma_close() somehow wait for
> any pending reads to complete? Is there a way to do that?
>
> The previous patch seemed only to change the timing a bit.
>
> Thanks for any insight.
>
> --
> George T. Talbot
> <[EMAIL PROTECTED]>
>
> _______________________________________________
> OpenAFS-devel mailing list
> [EMAIL PROTECTED]
> https://lists.openafs.org/mailman/listinfo/openafs-devel
