Those running HA NFS should be aware of the following two NFSD open leaks.
The first is the nfs4_open_downgrade leak:

http://marc.info/?l=linux-nfs&m=131077202109185&w=2
https://bugzilla.redhat.com/show_bug.cgi?id=714153

Red Hat supposedly fixed this, but I never saw the errata go by. While we
waited for them to fix it, we went to an upstream kernel and got bitten by
this one:

http://marc.info/?l=linux-nfs&m=131077202109185&w=2

NFSD open leaks will cause your filesystems to fail to umount, even after
waiting through your lease time. You'll see that the device's open count is
non-zero (dmsetup info <device>), even though the filesystem is unexported
and the kernel nfsds are stopped.

We've been running our NFS4 HA cluster for a few months now on a 3.2.5
kernel, and failover/recovery works well.

Ben

On May 16, 2012, at 2:19 PM, Colin Simpson wrote:

> This is interesting.
>
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.
>
> What is the nature of the "real fix"?
>
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
>
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
>
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
>
> , but I doubt this is the fix you have in mind.
>
> Colin
>
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>> This solves different issues at startup, relocation and recovery
>>
>> Also note that there is known limitation in nfsd (both rhel5/6) that
>> could cause some problems in some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>>
>> Without extreme details, you might have 2 of those services running on
>> the same node and attempting to relocate one of them can fail because
>> the fs cannot be unmounted. This is due to nfsd holding a lock (at
>> kernel level) to the FS. Changing config to the suggested one, mask the
>> problem pretty well, but more testing for a real fix is in progress.
>>
>> Fabio
>>
>> --
>> Linux-cluster mailing list
>> [email protected]
>> https://www.redhat.com/mailman/listinfo/linux-cluster
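
For anyone who wants to script the open-count check mentioned in this thread (dmsetup info <device>), here is a minimal sketch in Python. It assumes the text output of dmsetup info contains a line of the form "Open count: N". The function names and the sample device name are hypothetical, and the sample output is illustrative rather than captured from a real cluster. Running dmsetup normally requires root.

```python
import re
import subprocess


def parse_open_count(dmsetup_output: str) -> int:
    """Extract the 'Open count' value from `dmsetup info` text output."""
    match = re.search(r"^Open count:[ \t]*(\d+)[ \t]*$",
                      dmsetup_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Open count' field found in dmsetup output")
    return int(match.group(1))


def device_open_count(device: str) -> int:
    """Run `dmsetup info <device>` and return the reported open count."""
    result = subprocess.run(
        ["dmsetup", "info", device],
        capture_output=True, text=True, check=True,
    )
    return parse_open_count(result.stdout)


# Illustrative sample only; the device name is made up.
SAMPLE = """\
Name:              vg_nfs-lv_export
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
"""

if __name__ == "__main__":
    count = parse_open_count(SAMPLE)
    if count != 0:
        print(f"device still held open (open count {count})")
    else:
        print("device is no longer held open")
```

On a real node you would call device_open_count() with your own device name after unexporting the filesystem and stopping the kernel nfsds. A non-zero result there matches the symptom described above, though it does not by itself identify what is holding the device.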
