On 03/23/2010 01:18 PM, Greg Woods wrote:
>
>>> On one node, I can get all services to start (and they work fine), but
>>> whenever failover occurs, there are NFS-related handles left open, thus
>>> inhibiting/hanging the failover. More specifically, the file systems fail
>>> to unmount.
>
> If you are referring to file systems on the server that are made
> available for NFS mounting that hang on unmount (it's not clear from the
> above if your cluster nodes are NFS servers or clients), then you need
> to unexport the file systems first, then you can umount them. I handled
> this by writing my own nfs-exports RA that basically just does an
> "exportfs -u" with the appropriate parameters, and used an "order" line
> in crm shell to make sure that the Filesystem resource is ordered before
> the nfs-exports resource. The nfs-exports resource will export the file
> system on start, and unexport it on stop.
>
> --Greg
>
>
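For reference, this is how I read that suggestion. It is only a minimal
sketch of the start/stop half of such an agent, not Greg's actual RA: the
function names, the EXPORTFS override, and the client spec and options in
the usage line are my own assumptions, and a real OCF agent would also need
monitor and meta-data actions.

```shell
#!/bin/sh
# Sketch of the start/stop actions of an export-only resource agent.
# EXPORTFS can be overridden to do a dry run with a stand-in command.
EXPORTFS="${EXPORTFS:-exportfs}"

# nfs_export_start CLIENTSPEC DIRECTORY OPTIONS
# Exports DIRECTORY to CLIENTSPEC; meant to run after the file system is mounted.
nfs_export_start() {
    $EXPORTFS -o "$3" "$1:$2"
}

# nfs_export_stop CLIENTSPEC DIRECTORY
# Unexports DIRECTORY; meant to run before the file system is unmounted.
nfs_export_stop() {
    $EXPORTFS -u "$1:$2"
}
```

Usage would be something like `nfs_export_start 10.0.10.0/24 /data rw,sync`
on start and `nfs_export_stop 10.0.10.0/24 /data` on stop (the network and
options here are made up for the example).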
For clarity, the unmount I am speaking of is on the server, right before the
resource group is to be moved to the other node.
Looks like the Debian Lenny init script already takes care of unexporting of
file systems.
<snip from debian lenny:/etc/init.d/nfs-kernel-server>
    log_begin_msg "Unexporting directories for $DESC..."
    $PREFIX/sbin/exportfs -au
    if [ $? != 0 ]; then
        log_end_msg $?
        exit $?
</snip>
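To narrow down what keeps /data busy at that point, something like the
following could be run on the active node just before a migration. It is
only a sketch: the function names are mine, and the optional mounts-file
argument is there so it can be pointed at a saved copy of /proc/mounts.

```shell
#!/bin/sh
# is_mounted DIR [MOUNTS_FILE]
# Succeeds if DIR appears as a mount point (second field) in MOUNTS_FILE,
# which defaults to /proc/mounts.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# show_holders DIR
# Lists processes using the file system mounted on DIR, if it is mounted.
# Kernel-side users such as nfsd may not show up in this listing.
show_holders() {
    if is_mounted "$1"; then
        fuser -vm "$1"
    else
        echo "$1 is not mounted"
    fi
}
```

Running `show_holders /data` right after stopping nfs-kernel-server by hand
should show whether any userspace process still has the file system open.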
What I'm seeing seems to be related to stale NFS handles or lock files. When I
completely shut down heartbeat on one of the nodes (it doesn't matter which
node), the failover works perfectly. If I try to move/migrate resources from
one node to the other, all hell breaks loose.
To my inexperienced eye, it would appear that either something is wrong with
the ordering of the teardown of the resources and resource group, or the
nfs-kernel-server init script simply doesn't perform all the necessary tasks.
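As far as I understand it, an ordered group is stopped in the reverse of its
start order, so on a migration the file system should be the last member to
go down. A small helper (hypothetical name, just for illustration) that
prints that expected stop order for a list of group members:

```shell
#!/bin/sh
# stop_order RESOURCE...
# Prints the given resources in reverse, one per line: the order in which
# an ordered group is expected to stop them.
stop_order() {
    rev=""
    for r in "$@"; do
        # Prepend each name so the last one given ends up first.
        rev="$r${rev:+ $rev}"
    done
    # Intentionally unquoted so each name is printed on its own line.
    printf '%s\n' $rev
}
```

Called with the members of my group in their configured order,
`stop_order fileserver_fs0 fileserver_vip0 fileserver_nfs-common fileserver_nfs fileserver_notify_admin`
lists fileserver_notify_admin first and fileserver_fs0 last, so both NFS
init scripts should already have been stopped by the time the unmount is
attempted.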
I've attached my crm configuration, which I import with `crm < fileserver.crm`.

Software versions used:
pacemaker/lenny uptodate 1.0.7+hg20100203-1
heartbeat/lenny uptodate 1:3.0.2-1~bpo50+1
libheartbeat2/lenny uptodate 1:3.0.2-1~bpo50+1
drbd8-2.6.26-2-xen-686 2:8.3.7-1~bpo50+1+2.6.26-21lenny4 installed: No available version in archive
drbd8-source/lenny-backports uptodate 2:8.3.7-1~bpo50+1
drbd8-utils/lenny-backports uptodate 2:8.3.7-1~bpo50+1
nfs-common/lenny uptodate 1:1.1.2-6lenny1
nfs-kernel-server/lenny uptodate 1:1.1.2-6lenny1
kind regards,
Terry
configure
# quorum doesn't make sense in two-node clusters, so disable quorum
property no-quorum-policy=ignore
# we don't have any stonith devices, so disable stonith
property stonith-enabled=false
# this is the IPaddr resource that fileserver clients will reference for connections
primitive fileserver_vip0 ocf:heartbeat:IPaddr \
params ip="10.0.10.1" cidr_netmask="24" nic="eth1" \
op monitor interval="5s" timeout="20s"
# this is the filesystem resource for our fileserver
primitive fileserver_fs0 ocf:heartbeat:Filesystem \
params fstype="ext3" directory="/data" device="/dev/drbd1"
# simple resource migration notification
primitive fileserver_notify_admin ocf:heartbeat:MailTo \
params email="[email protected]" subject="*** Cluster Resource Failover Occurred ***" \
op monitor interval="3s" timeout="20s"
# drbd resource for our filesystem
primitive drbd1 ocf:linbit:drbd \
params drbd_resource="fileserver" \
op monitor interval="59s" role="Master" timeout="30s" \
op monitor interval="60s" role="Slave" timeout="30s"
primitive fileserver_nfs lsb:nfs-kernel-server op monitor interval="1min"
primitive fileserver_nfs-common lsb:nfs-common op monitor interval="1min"
# resource group definition
group fileserver_cluster_group fileserver_fs0 fileserver_vip0 fileserver_nfs-common fileserver_nfs fileserver_notify_admin \
meta colocated="true" ordered="true" migration-threshold="1" failure-timeout="10s" resource-stickiness="10"
# master/slave drbd definition (needs more in depth explanation)
master master-drbd1 drbd1 meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
# colocation constraint for master-drbd1
colocation fileserver_cluster_group-on-master-drbd1 inf: fileserver_cluster_group master-drbd1:Master
# ordering definition (needs more in depth explanation)
order master-drbd1-before-fileserver_cluster_group inf: master-drbd1:promote fileserver_cluster_group:start
commit
quit
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems