AHA! After commenting out the write locking in afs_linux_vma_close(), I got the same deadlock as the one reported in December 2002! So I applied the December patch again, re-commented out the write locking, and I seem to be doing OK.
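The change above relies on the idea from the second message below: if mapcnt is only ever read or written while AFS_GLOCK() is held, then afs_linux_vma_close() does not need the per-file write lock just to update it. The following is a minimal user-space sketch of that idea, not OpenAFS code. A pthread mutex named `glock` stands in for AFS_GLOCK(), and `struct vcache_model`, `model_mmap()`, `model_vma_close()` and `run_model()` are hypothetical names for illustration only. It shows only that mapcnt updates stay consistent under the global lock alone. It says nothing about any other state the write lock may have protected in afs_linux_mmap().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space model, not OpenAFS code: glock stands in for
 * AFS_GLOCK() and struct vcache_model for the file's vcache entry. */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

struct vcache_model {
    int mapcnt; /* number of live mappings of the file */
};

/* Assumed mapcnt update on the mmap path, done under glock only. */
static void model_mmap(struct vcache_model *v)
{
    pthread_mutex_lock(&glock);
    v->mapcnt++;
    pthread_mutex_unlock(&glock);
}

/* Models afs_linux_vma_close() with the per-file write lock removed:
 * the mapcnt update is serialized by glock alone, so this path never
 * waits on a file lock that a reader is holding. */
static void model_vma_close(struct vcache_model *v)
{
    pthread_mutex_lock(&glock);
    assert(v->mapcnt > 0); /* each close matches an earlier mmap */
    v->mapcnt--;
    pthread_mutex_unlock(&glock);
}

struct worker_args {
    struct vcache_model *v;
    int iters;
};

/* Each worker repeatedly maps and unmaps the same file. */
static void *worker(void *arg)
{
    struct worker_args *a = arg;
    for (int i = 0; i < a->iters; i++) {
        model_mmap(a->v);
        model_vma_close(a->v);
    }
    return NULL;
}

/* Runs up to 16 workers concurrently and returns the final mapcnt. */
static int run_model(int nthreads, int iters)
{
    struct vcache_model v = { 0 };
    struct worker_args args = { &v, iters };
    pthread_t t[16];

    if (nthreads > 16)
        nthreads = 16;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, &args);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return v.mapcnt;
}
```

For example, `run_model(4, 100000)` returns 0: every increment and decrement happens under `glock`, so no update is lost even though no per-file lock is taken.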
Please, if the original author of afs_linux_vma_close() and afs_linux_release() is out there, can you tell me whether what I'm doing is OK? Thanks.

-- 
George T. Talbot
<[EMAIL PROTECTED]>

On Thu, 2003-07-10 at 14:34, George Talbot wrote:
> Another thought occurs to me:
>
> Maybe afs_linux_vma_close() doesn't need to hold the file lock at all,
> since changes to mapcnt will be protected by AFS_GLOCK() anyway, right?
>
> See the comment in afs_linux_release() for the flushcnt. Can the same
> rationale apply to mapcnt?
>
> -- 
> George T. Talbot
> <[EMAIL PROTECTED]>
>
>
> On Thu, 2003-07-10 at 14:02, George Talbot wrote:
> > Hi,
> >
> > We have a problem here with clients that make heavy use of mmap() on
> > files stored on an AFS server. The program doing the mmap() access
> > hangs, as do top, ps, etc. I'm using Linux kernel 2.4.20 with OpenAFS
> > 1.2.8 and OpenMosix. I don't think OpenMosix is the cause: if I
> > recompile the kernel without OpenMosix, I still get hangs, just less
> > frequently.
> >
> > So I found this patch:
> >
> > https://lists.openafs.org/pipermail/openafs-devel/2002-December/003624.html
> >
> > This patch does not work for us. I did some further investigation of
> > where the program is hanging. The programs hang in
> > afs_linux_vma_close() right when this function tries to acquire a
> > write lock on the vcache entry for the file. I added some
> > instrumentation and found that the holder of the lock is
> > afs_GetDCache(). When the problem occurs, afs_GetDCache() has acquired
> > the lock at position #66 (search for ",66)" in the source code), and
> > this lock has been converted to a shared lock.
> >
> > The sequence of events, I believe, is this:
> >
> > afs_GetDCache() holds AFS_GLOCK(), acquires the write lock for the
> > file, converts the write lock to a shared lock, drops AFS_GLOCK()
> > while still holding the shared lock on the file, and starts reading
> > blocks.
> >
> > At this point afs_linux_vma_close() gets called because the
> > application is unmapping the file. It acquires AFS_GLOCK() and blocks
> > trying to acquire the write lock, which afs_GetDCache() still holds
> > in shared mode.
> >
> > Then, I believe, afs_GetDCache() runs again after the read completes,
> > tries to reacquire AFS_GLOCK(), and blocks.
> >
> > Classic deadlock.
> >
> > Any ideas how to fix this? I think that afs_linux_mmap() and
> > afs_linux_vma_close() use the write lock to exclude each other, so
> > the code still needs to hold the lock. However, dropping the global
> > lock out of order with the file lock looks to me like a classic
> > recipe for deadlock. Should afs_linux_vma_close() somehow wait for
> > any pending reads to complete? Is there a way to do that?
> >
> > The previous patch seemed only to change the timing a bit.
> >
> > Thanks for any insight.
> >
> > -- 
> > George T. Talbot
> > <[EMAIL PROTECTED]>
> >
> > _______________________________________________
> > OpenAFS-devel mailing list
> > [EMAIL PROTECTED]
> > https://lists.openafs.org/mailman/listinfo/openafs-devel
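On the question of whether afs_linux_vma_close() could wait for pending reads to complete: one general pattern is to count in-flight reads under the global lock and have the closer wait on a condition variable that releases the global lock while it sleeps. The deadlock described above arises because the closer sits on AFS_GLOCK() while blocked, so the reader can never get it back. The sketch below is a user-space model built on POSIX threads, not OpenAFS code, and whether it maps onto OpenAFS's own locking primitives is an assumption. `glock` stands in for AFS_GLOCK(); `pending_reads`, `mapcnt`, `model_fetch()`, `model_vma_close()` and `run_demo()` are hypothetical names for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical user-space model, not OpenAFS code:
 *   glock         stands in for AFS_GLOCK();
 *   pending_reads counts fetches that dropped glock to do I/O;
 *   mapcnt        stands in for the vcache mapping count. */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reads_done = PTHREAD_COND_INITIALIZER;
static int pending_reads = 0;
static int mapcnt = 0;

/* Models a fetch path like afs_GetDCache(): register the read under
 * glock, drop glock for the slow I/O, then retake glock to finish. */
static void *model_fetch(void *arg)
{
    struct timespec io = { 0, 50L * 1000 * 1000 }; /* 50 ms of "I/O" */
    (void)arg;

    pthread_mutex_lock(&glock);
    pending_reads++;
    pthread_mutex_unlock(&glock);

    nanosleep(&io, NULL);

    /* Cannot block forever: the closer releases glock while it waits. */
    pthread_mutex_lock(&glock);
    pending_reads--;
    if (pending_reads == 0)
        pthread_cond_broadcast(&reads_done);
    pthread_mutex_unlock(&glock);
    return NULL;
}

/* Models an unmap path that waits for in-flight reads. pthread_cond_wait()
 * atomically releases glock while sleeping and retakes it on wakeup. */
static int model_vma_close(void)
{
    int remaining;

    pthread_mutex_lock(&glock);
    while (pending_reads > 0)
        pthread_cond_wait(&reads_done, &glock);
    assert(mapcnt > 0); /* every close must match an earlier map */
    remaining = --mapcnt;
    pthread_mutex_unlock(&glock);
    return remaining;
}

/* Maps once, then races one fetch against the close. Returns mapcnt as
 * seen by the close; both threads have finished when this returns. */
static int run_demo(void)
{
    pthread_t reader;
    int remaining;

    pthread_mutex_lock(&glock);
    mapcnt = 1; /* one live mapping, as after a successful mmap() */
    pthread_mutex_unlock(&glock);

    pthread_create(&reader, NULL, model_fetch, NULL);
    remaining = model_vma_close();
    pthread_join(reader, NULL);
    return remaining;
}
```

Whichever thread gets `glock` first, `run_demo()` returns 0 and leaves `pending_reads` at 0. If the fetch registers first, the close waits for it; otherwise the close simply proceeds. The design choice that matters is that the waiting side never sleeps while holding the global lock.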
