Those running HA NFS should be aware of the following two NFSD open leaks.

The first is the nfs4_open_downgrade leak:
http://marc.info/?l=linux-nfs&m=131077202109185&w=2
https://bugzilla.redhat.com/show_bug.cgi?id=714153

Red Hat supposedly fixed this, but I never saw the errata go by. While we
waited for them to fix it, we went to an upstream kernel and got bitten
by the second leak:

http://marc.info/?l=linux-nfs&m=131077202109185&w=2

NFSD open leaks will cause your filesystems to fail to umount, even after
waiting through your lease time.  You'll see that the device's open count
is non-zero (dmsetup info <device>), even though the filesystem
is unexported and the kernel nfsds are stopped.
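
For anyone who wants to script that check, here is a minimal sketch in Python.
It assumes the default (non-columnar) output of "dmsetup info <device>", which
includes an "Open count:" line. The function names parse_open_count and
device_open_count are made up for illustration, and dmsetup normally needs
root.

```python
import re
import subprocess


def parse_open_count(dmsetup_output: str) -> int:
    """Extract the 'Open count' field from `dmsetup info <device>` output."""
    match = re.search(r"^Open count:\s*(\d+)\s*$", dmsetup_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Open count' line found in dmsetup output")
    return int(match.group(1))


def device_open_count(device: str) -> int:
    """Run `dmsetup info <device>` and return the device's open count.

    Assumption: dmsetup is on PATH and the caller has sufficient privileges.
    """
    result = subprocess.run(
        ["dmsetup", "info", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_open_count(result.stdout)
```

Calling device_open_count() on the device after unexporting and stopping nfsd
should return 0 on a healthy node; a value that stays above zero past the
lease time is consistent with the leak described above, though other holders
of the device would show up the same way.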

We've been running our NFS4 HA cluster for a few months now on
a 3.2.5 kernel, and failover/recovery works well.

Ben

On May 16, 2012, at 2:19 PM, Colin Simpson wrote:

> This is interesting.
> 
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.
> 
> What is the nature of the "real fix"?
> 
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
> 
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
> 
> , but I doubt this is the fix you have in mind.
> 
> Colin
> 
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>> This solves different issues at startup, relocation and recovery
>> 
>> Also note that there is a known limitation in nfsd (both rhel5/6) that
>> could cause some problems in some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>> 
>> Without extreme details, you might have 2 of those services running on
>> the same node and attempting to relocate one of them can fail because
>> the fs cannot be unmounted. This is due to nfsd holding a lock (at
>> kernel level) to the FS. Changing config to the suggested one masks the
>> problem pretty well, but more testing for a real fix is in progress.
>> 
>> Fabio
>> 
>> --
>> Linux-cluster mailing list
>> [email protected]
>> https://www.redhat.com/mailman/listinfo/linux-cluster

