Hi Colin,

On 5/17/2012 11:47 AM, Colin Simpson wrote:
> Thanks for all the useful information on this.
>
> I realise the bz is not for this issue, I just included it as it has
> the suggestion that nfsd should actually live in user space (which
> seems sensible).
Understood. I can't really say if userland or kernel would make any
difference in this specific unmount issue, but for "safety reasons" I
need to assume their design is the same and behaves the same way.
When/if there is a switch, we will need to look into it more deeply.
With the current kernel implementation we (cluster guys) need to use
this approach.

> Out of interest is there a bz # for this issue?

Yes, one for rhel5 and one for rhel6, but they are both private at the
moment because they contain customer data. I expect that the
workaround/fix (whatever you want to label it) will be available via
RHN in 2-3 weeks.

Fabio

> Colin
>
> On Thu, 2012-05-17 at 10:26 +0200, Fabio M. Di Nitto wrote:
>> On 05/16/2012 08:19 PM, Colin Simpson wrote:
>>> This is interesting.
>>>
>>> We very often see the filesystems fail to umount on busy clustered
>>> NFS servers.
>>
>> Yes, I am aware of the issue, since I have been investigating it in
>> detail for the past couple of weeks.
>>
>>> What is the nature of the "real fix"?
>>
>> First, the bz you mention below is unrelated to the unmount problem
>> we are discussing. Clustered nfsd locks are a slightly different
>> story.
>>
>> There are two issues here:
>>
>> 1) cluster users' expectations
>> 2) nfsd internal design
>>
>> (and note I am not blaming either cluster or nfsd here)
>>
>> Generally cluster users expect to be able to do things like (fake
>> meta config):
>>
>> <service1..
>>   <fs1..
>>     <nfsexport1..
>>       <nfsclient1..
>>   <ip1..
>> ....
>> <service2..
>>   <fs2..
>>     <nfsexport2..
>>       <nfsclient2..
>>   <ip2..
>>
>> and be able to move services around cluster nodes without problems.
>> Note that this is irrespective of the fs used; it can be clustered
>> or not.
>>
>> This setup does unfortunately clash with nfsd's design.
>>
>> When a service is shut down (whether for a stop or a relocation
>> makes no difference):
>>
>> ip is removed
>> exportfs -u .....
>> (and that's where we hit the nfsd design limitation)
>> umount fs..
>>
>> By design (though I can't say exactly why it is done this way
>> without speculating), nfsd will continue to serve open sessions via
>> rpc; exportfs -u only stops new incoming requests.
>>
>> If nfsd is serving a client, it will continue to hold a lock on the
>> filesystem (in kernel) that prevents the fs from being unmounted.
>>
>> The only ways to effectively close the sessions are:
>>
>> - drop the VIP and wait for the connections to time out (nfsd would
>>   effectively also drop the lock on the fs), but this is slow and
>>   how long it takes is not always consistent
>>
>> - restart nfsd
>>
>> The "real fix" here would be to wait for nfsd containers, which
>> support exactly this scenario: unexporting a single fs, dropping its
>> locks, etc. This work is still in very early stages upstream, which
>> makes it unsuitable for production for now.
>>
>> The patch I am working on is basically a way to handle the clash as
>> gracefully as possible.
>>
>> A new nfsrestart="" option will be added to both fs and clusterfs:
>> if force_unmount is set and the filesystem cannot be unmounted, it
>> will perform an extremely fast restart of nfslock and nfsd.
>>
>> We can argue that it is not the final solution (I think we can agree
>> it is more of a workaround), but:
>>
>> 1) it will allow service migration instead of service failure
>> 2) it will match cluster users' expectations (allowing different
>>    exports to live peacefully together)
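To make the fs/clusterfs part above a bit more concrete, a service
using the new option would look roughly like the sketch below. The
names, device, mountpoint and address are made up, and the exact value
accepted by nfsrestart may still change while the patch is being
tested:

<service autostart="1" name="nfssvc1">
  <fs name="fs1" device="/dev/cluster/lv1" mountpoint="/export/fs1"
      fstype="ext4" force_unmount="1" nfsrestart="1">
    <!-- nfsrestart only kicks in when force_unmount is set and the
         umount fails; it then restarts nfslock and nfsd -->
    <nfsexport name="export1">
      <nfsclient name="clients1" target="*" options="rw,sync"/>
    </nfsexport>
  </fs>
  <ip address="192.168.1.10" monitor_link="1"/>
</service>

The restart is only the fallback path, so in the common case (the
umount succeeds) nothing changes for the other exports on the node.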

>> The only negative impact that we have been able to evaluate so far
>> (the patch is still in a heavy testing phase), besides having to add
>> a config option to enable it, is that there will be a small window
>> in which all clients connected to a certain node, for all nfs
>> services, will not be served because nfsd is restarting.
>>
>> So if you are migrating export1 and there are clients using export2,
>> export2 will also be affected for those few ms required to restart
>> nfsd (assuming export1 and 2 are running on the same node of
>> course).
>>
>> Placing things in perspective for a cluster, I think that it is a
>> lot better to be able to unmount a fs and relocate services as
>> necessary vs a service failing completely and maybe the node being
>> fenced.
>>
>>> I like the idea of NFSD fully being in user space, so killing it
>>> would definitely free the fs.
>>>
>>> Alan Brown (who's on this list) recently posted to a RH BZ that he
>>> was one of the people who moved it into kernel space for
>>> performance reasons in the past (that are no longer relevant):
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
>>>
>>> , but I doubt this is the fix you have in mind.
>>
>> No, that's a totally different issue.
>>
>>> Colin
>>>
>>> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>>>> This solves different issues at startup, relocation and recovery.
>>>>
>>>> Also note that there is a known limitation in nfsd (both rhel5/6)
>>>> that could cause some problems in some conditions in your current
>>>> configuration. A permanent fix is being worked on atm.
>>>>
>>>> Without extreme details, you might have 2 of those services
>>>> running on the same node and attempting to relocate one of them
>>>> can fail because the fs cannot be unmounted. This is due to nfsd
>>>> holding a lock (at kernel level) on the FS. Changing the config to
>>>> the suggested one masks the problem pretty well, but more testing
>>>> for a real fix is in progress.
>>>>
>>>> Fabio
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster