Hi Colin,

On 5/17/2012 11:47 AM, Colin Simpson wrote:
> Thanks for all the useful information on this.
> 
> I realise the bz is not for this issue, I just included it as it has the
> suggestion that nfsd should actually live in user space (which seems
> sensible).

Understood. I can't really say whether a userland or kernel nfsd would make
any difference for this specific unmount issue, but to be safe I need to
assume the design and behaviour would be the same. When/if a switch happens,
we will need to look into it more deeply. With the current kernel
implementation we (cluster guys) need to use this approach.

> 
> Out of interest is there a bz # for this issue?

Yes, one for rhel5 and one for rhel6, but they are both private at the
moment because they contain customer data.

I expect the workaround/fix (whatever you want to label it) to be available
via RHN in 2-3 weeks.

Fabio

> 
> Colin
> 
> 
> On Thu, 2012-05-17 at 10:26 +0200, Fabio M. Di Nitto wrote:
>> On 05/16/2012 08:19 PM, Colin Simpson wrote:
>>> This is interesting.
>>>
>>> We very often see the filesystems fail to umount on busy clustered NFS
>>> servers.
>>
>> Yes, I am aware of the issue; I have been investigating it in detail
>> for the past couple of weeks.
>>
>>>
>>> What is the nature of the "real fix"?
>>
>> First, the bz you mention below is unrelated to the unmount problem we
>> are discussing; clustered nfsd locks are a slightly different story.
>>
>> There are two issues here:
>>
>> 1) cluster users' expectations
>> 2) nfsd internal design
>>
>> (and note I am not blaming either cluster or nfsd here)
>>
>> Generally, cluster users expect to be able to do things like this (fake
>> meta config):
>>
>> <service1..
>>  <fs1..
>>   <nfsexport1..
>>    <nfsclient1..
>>     <ip1..
>> ....
>> <service2..
>>  <fs2..
>>   <nfsexport2..
>>    <nfsclient2..
>>     <ip2..
>>
>> and be able to move services around cluster nodes without problems. Note
>> that this holds regardless of the fs used; it can be clustered or not.
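>>
>> In a real cluster.conf that pattern would look roughly like the sketch
>> below (names, devices and addresses are made up for illustration; the
>> nesting simply mirrors the meta config above):
>>
>>   <service name="service1">
>>     <fs name="fs1" device="/dev/vg0/lv1" mountpoint="/mnt/export1"
>>         force_unmount="1">
>>       <nfsexport name="nfsexport1">
>>         <nfsclient name="nfsclient1" target="*" options="rw">
>>           <ip address="192.168.1.10" monitor_link="1"/>
>>         </nfsclient>
>>       </nfsexport>
>>     </fs>
>>   </service>
>>   <!-- service2 follows the same pattern with fs2/nfsexport2/nfsclient2/ip2 -->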
>>
>> Unfortunately, this setup clashes with nfsd's design.
>>
>> When a service is shut down (whether due to a stop or a relocation makes
>> no difference):
>>
>> - the ip is removed
>> - exportfs -u ..... (and that's where we hit the nfsd design limitation)
>> - umount fs..
>>
>> By design (though I can't say exactly why it is done this way without
>> speculating), nfsd will continue to serve already-open sessions via rpc;
>> exportfs -u only stops new incoming requests.
>>
>> If nfsd is serving a client, it will continue to hold a lock on the
>> filesystem (in kernel) that prevents the fs from being unmounted.
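>>
>> To make that concrete, the stop path boils down to something like the
>> following (mount point, export and address are made up for illustration):
>>
>>   ip addr del 192.168.1.10/24 dev eth0   # drop the service VIP
>>   exportfs -u '*:/mnt/export1'           # stop accepting new requests
>>   umount /mnt/export1                    # fails ("device is busy") while
>>                                          # nfsd still holds its reference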
>>
>> The only ways to effectively close the sessions are:
>>
>> - drop the VIP and wait for the connections to time out (nfsd would then
>>   also drop the lock on the fs), but this is slow and how long it takes is
>>   not always consistent
>>
>> - restart nfsd.
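>>
>> On rhel5/6 that second option boils down to roughly the following (what
>> the resource agent will actually run may differ slightly):
>>
>>   service nfslock restart   # restart the locking daemons (rpc.statd)
>>   service nfs restart       # restart nfsd, which drops its fs references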
>>
>>
>> The "real fix" here would be to wait for nfsd containers, which are meant
>> to support exactly this scenario: unexporting a single fs, dropping its
>> locks, and so on. That work is still at a very early stage upstream, so it
>> is not yet suitable for production.
>>
>> The patch I am working on is basically a way to handle the clash in the
>> best way possible.
>>
>> A new nfsrestart="" option will be added to both fs and clusterfs that,
>> if force_unmount is set and the filesystem cannot be unmounted, will
>> perform an extremely fast restart of nfslock and nfsd.
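>>
>> In cluster.conf that would end up looking something like this (illustrative
>> values; the exact syntax may still change while the patch is finalized):
>>
>>   <fs name="fs1" device="/dev/vg0/lv1" mountpoint="/mnt/export1"
>>       force_unmount="1" nfsrestart="1">
>>     <!-- nfsexport/nfsclient/ip children as before -->
>>   </fs>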
>>
>> We can argue that it is not the final solution, and I think we can agree
>> that it is more of a workaround, but:
>>
>> 1) it will allow service migration instead of service failure
>> 2) it will match cluster users' expectations (allowing different exports
>> to live together peacefully).
>>
>> The only negative impact we have been able to evaluate so far (the patch
>> is still in a heavy testing phase), besides having to add a config option
>> to enable it, is that there will be a small window in which all clients
>> connected to a given node, for any of its nfs services, will not be served
>> because nfsd is restarting.
>>
>> So if you are migrating export1 and there are clients using export2,
>> export2 will also be affected for the few ms required to restart nfsd
>> (assuming export1 and export2 are running on the same node, of course).
>>
>> Putting things in perspective for a cluster, I think it is a lot better
>> to be able to unmount a fs and relocate services as necessary than to have
>> a service fail completely and maybe the node fenced.
>>
>>
>>
>>
>>>
>>> I like the idea of NFSD fully being in user space, so killing it would
>>> definitely free the fs.
>>>
>>> Alan Brown (who's on this list) recently posted to a RH BZ that he was
>>> one of the people who moved it into kernel space in the past for
>>> performance reasons (which are no longer relevant):
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
>>>
>>> but I doubt this is the fix you have in mind.
>>
>> No that's a totally different issue.
>>
>>>
>>> Colin
>>>
>>> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>>>> This solves different issues at startup, relocation and recovery
>>>>
>>>> Also note that there is a known limitation in nfsd (both rhel5/6) that
>>>> could cause problems, under some conditions, with your current
>>>> configuration. A permanent fix is being worked on atm.
>>>>
>>>> Without going into extreme detail: you might have 2 of those services
>>>> running on the same node, and attempting to relocate one of them can
>>>> fail because the fs cannot be unmounted. This is due to nfsd holding a
>>>> lock (at kernel level) on the FS. Changing the config to the suggested
>>>> one masks the problem pretty well, but more testing for a real fix is
>>>> in progress.
>>>>
>>>> Fabio
>>>>
>>>
>>>
>>
> 
> 
> 

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
