Re: [Linux-HA] Antw: Managed Failovers w/ NFS HA Cluster

Charles Taylor Mon, 21 Jul 2014 07:42:11 -0700

On Jul 21, 2014, at 8:57 AM, Ulrich Windl wrote:

>>>> Charles Taylor <chas...@ufl.edu> wrote on 17.07.2014 at 17:24 in message
> <761ce39a-57d8-47d2-860d-2af1936cc...@ufl.edu>:
>> I feel like this is something that must have been covered extensively
>> already but I've done a lot of googling, looked at a lot of cluster configs,
>> but have not found the solution.
>> 
>> I have an HA NFS cluster (corosync+pacemaker).  The relevant rpms are listed 
>> below but I'm not sure they are that important to the question which is 
>> this...
>> 
>> When performing managed failovers of the NFS-exported file system resource
>> from one node to the other (crm resource move), any active NFS clients
>> experience an I/O error when the file system is unexported.  In other words,
>> you must unexport it to unmount it.  As soon as it is unexported, clients
>> are no longer able to write to it and experience an I/O error (rather than
>> just blocking).
> 
> Do you hard-mount or soft-mount NFS? Do you use NFSv3 or NFSv4?


Hard mounts.  We are supporting both NFSv3 and NFSv4 mounts.  I tested both and
the behavior was the same.  There seemed to be no way to avoid an I/O error
on the clients when umounting the file system as part of a managed (crm
resource move) failover.  I'm wondering if this is expected or if there is
some way around it that I'm simply missing.  We'd like to be able to "move"
resources back and forth among the servers for maintenance without disrupting
client I/O.

Just to summarize, the Filesystem agent must umount the volume to migrate it.
The umount only succeeds once the volume has been unexported.  As soon as the
exportfs agent runs its "stop" operation, any clients actively doing I/O are
interrupted and error out rather than blocking as they would if the server
went down.  So far, I've been unable to find a way around this.

As I write this, I'm thinking that perhaps the way to achieve this is to change
the order of the services so that the VIP is started last and stopped first
when starting/stopping the resource group.  That should make it appear to the
client that the server just "went away", as would happen in a failure scenario.
The client should then not know that the file system has been unexported, since
it can't talk to the server.
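
For illustration only, here is a minimal Python sketch of the ordering idea
above.  It assumes the usual Pacemaker group behavior (members start in the
order listed and stop in reverse).  The resource names fs_export,
exportfs_share and vip_nfs are hypothetical and not taken from the
configuration discussed here.

```python
# Sketch of group ordering; all resource names below are hypothetical.
# Assumption: a group starts members in listed order and stops in reverse.
GROUP = ["fs_export", "exportfs_share", "vip_nfs"]  # VIP listed last


def start_order(group):
    """Members start in the order they are listed."""
    return list(group)


def stop_order(group):
    """Members stop in the reverse of the listed order."""
    return list(reversed(group))


def client_sees_unexport(group, vip="vip_nfs", export="exportfs_share"):
    """True if the export is stopped while the VIP is still reachable."""
    stops = stop_order(group)
    return stops.index(export) < stops.index(vip)


if __name__ == "__main__":
    print("start:", start_order(GROUP))
    print("stop: ", stop_order(GROUP))
    print("client sees unexport:", client_sees_unexport(GROUP))
```

With the VIP listed last, it is the first thing stopped, so the export is
only removed after clients can no longer reach the server.  Whether this
actually avoids the client I/O errors still needs to be tested.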

Perhaps I just made a rookie mistake in the ordering of the services within
the resource group.  I'll try that and report back.

Regards,

Charlie

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
