I have tested the exportfs RA with an NFSv4 server (the version from the
mercurial repo).

I have a few remarks, though.

1. It uses grep -P (Perl regexes). This feature of grep is marked as
experimental and is not compiled in on some Linux distributions. My
suggestion is to replace the only occurrence of "grep -P ..." with the
equivalent "grep -E ...":

  showmount -e | grep -E "^${OCF_RESKEY_directory}[[:space:]]+${OCF_RESKEY_clientspec}$"

2. It seems that /var/lib/nfs/rmtab is not used in NFSv4, so I just deleted
the backup/restore procedures. Maybe this could be a configuration option,
or a runtime check for version 4 that disables these procedures (a small
sketch of such a check follows below).
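A minimal sketch of such a runtime check (backup_rmtab/restore_rmtab are
illustrative helper names, not necessarily what the RA uses; the enabled NFS
versions are read from /proc/fs/nfsd/versions):

  # rmtab is only meaningful for NFSv2/v3 mounts tracked by mountd,
  # so skip the backup/restore when no v2/v3 export is served.
  nfs_pre_v4_enabled() {
      # /proc/fs/nfsd/versions lists e.g. "-2 +3 +4"; '+' marks an enabled version
      grep -qE '[+](2|3)' /proc/fs/nfsd/versions 2>/dev/null
  }

  if nfs_pre_v4_enabled; then
      backup_rmtab    # illustrative stand-in for the existing backup procedure
  fi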

The exportfs RA itself worked as expected, though there was some unexpected
interaction between the NFSv4 server and the underlying FS. If some NFSv4
clients keep a file open on an exported directory, then even after I
completely stop nfsd there is some timeout before I can umount the
underlying FS on the server (XFS here). In the meantime I keep getting a
"device busy" error.
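A rough way to observe this by hand on the server (the mount point is just
an example):

  /etc/init.d/nfsserver stop     # or however nfsd is stopped on your distro
  umount /srv/nfs/export0        # keeps failing with "device busy" for a while
  fuser -vm /srv/nfs/export0     # typically shows no user-space process holding
                                 # the mount; the busy reference apparently sits
                                 # in kernel NFSv4 state and is released only
                                 # after some timeout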

The interaction of this mis-feature of the NFSv4 server with Pacemaker is
pretty nasty: if you migrate such a resource group to another node, it cannot
stop properly on the current node, so the cluster hangs without providing the
service. If you have configured preferred nodes for the different NFS exports
(services) plus node fencing (in order to avoid data corruption), it gets a
lot worse.

Example of a possible scenario.
Setup: a cluster of 2 nodes (node0, node1). You have 2 devices replicated by
DRBD between the nodes (drbd0, drbd1), each configured as a master/slave (ms)
resource. Over these devices you colocate resource groups of Filesystem,
exportfs and IPaddr. You set a preferred location for drbd0 to run on node0
and for drbd1 to run on node1. You set up a stonith device in order to shut
down a misbehaving node appropriately.
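A rough crm configure sketch of such a setup (only the drbd0 side is shown;
resource names, the DRBD resource name, device paths, the client network and
the IP are placeholders, and the stonith resources are omitted):

  primitive drbd0 ocf:linbit:drbd \
      params drbd_resource="r0" \
      op monitor interval="30s"
  ms ms-drbd0 drbd0 \
      meta master-max="1" clone-max="2" notify="true"
  primitive fs0 ocf:heartbeat:Filesystem \
      params device="/dev/drbd0" directory="/srv/nfs/export0" fstype="xfs"
  primitive export0 ocf:heartbeat:exportfs \
      params directory="/srv/nfs/export0" clientspec="10.0.0.0/24" fsid="1"
  primitive ip0 ocf:heartbeat:IPaddr \
      params ip="10.0.0.100"
  group grp0 fs0 export0 ip0
  colocation grp0-with-drbd0-master inf: grp0 ms-drbd0:Master
  order grp0-after-drbd0 inf: ms-drbd0:promote grp0:start
  location drbd0-prefers-node0 ms-drbd0 100: node0
  # the drbd1/grp1 side mirrors this with a location preference for node1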

On this setup you try to migrate drbd0 from node0 to node1 (crm node
standby). The resource group fails to stop because the Filesystem RA cannot
umount the busy filesystem, so node1 shoots down node0 in order to bring the
service back. Now the 2 volumes and the associated RGs run on node1. When
node0 comes back online, drbd0 is scheduled for migration onto it (location
preference). Again the RG fails to stop properly on node1, so node1 is shot
down by node0. Now all volumes and RGs run on node0. When node1 restarts, the
cluster manager tries to migrate drbd1 to node1 (location preference). It
fails, and so on ... the cluster keeps shooting itself automatically.

What I have done: in the Filesystem RA I have increased the sleep interval
in the "stop" op. I have also configured a 4-minute timeout for the stop op.
Maybe this sleep value could be made configurable: I can imagine other
scenarios where tweaking it would be useful.
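For reference, the 4-minute stop timeout can be set on the Filesystem
primitive like this (reusing the placeholder names from the sketch above):

  primitive fs0 ocf:heartbeat:Filesystem \
      params device="/dev/drbd0" directory="/srv/nfs/export0" fstype="xfs" \
      op stop interval="0" timeout="240s"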

With these tweaks I get NFSv4 client failover and an active/active nfsd setup.


Thanks for the great work

Luben



