On 03/23/2010 01:18 PM, Greg Woods wrote:
> 
>>> On one node, I can get all services to start (and they work fine), but
>>> whenever fail over occurs, there are NFS-related handles left open, thus
>>> inhibiting/hanging the fail over. More specifically, the file system fails
>>> to unmount.
> 
> If you are referring to file systems on the server that are made
> available for NFS mounting that hang on unmount (it's not clear from the
> above if your cluster nodes are NFS servers or clients), then you need
> to unexport the file systems first, then you can umount them. I handled
> this by writing my own nfs-exports RA that basically just does an
> "exportfs -u" with the appropriate parameters, and used an "order" line
> in crm shell to make sure that the Filesystem resource is ordered before
> the nfs-exports resource. The nfs-exports resource will export the file
> system on start, and unexport it on stop.
> 
> --Greg
> 
> 
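
If I'm reading your suggestion right, that nfs-exports agent boils down to
something like the sketch below (my own untested, LSB-style approximation, not
your actual RA; the directory, client spec and options are made-up examples
from my setup):

#!/bin/sh
# rough sketch of an "unexport before umount" agent as I understand it;
# DIR, CLIENTS and OPTS stand in for whatever parameters the real RA takes
DIR=/data
CLIENTS="*"
OPTS="rw,no_root_squash"

case "$1" in
    start)
        # export the file system when the resource starts
        exportfs -o "$OPTS" "${CLIENTS}:${DIR}"
        ;;
    stop)
        # unexport on stop, so the Filesystem resource can umount afterwards
        exportfs -u "${CLIENTS}:${DIR}"
        ;;
    status|monitor)
        # treat "directory shows up in the export list" as running
        exportfs | grep -q "^${DIR}"
        ;;
    *)
        echo "usage: $0 {start|stop|status|monitor}"
        exit 1
        ;;
esac

plus the order line you mention, so the exports resource is stopped (unexported)
before the Filesystem resource is stopped (unmounted).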

For clarity, the unmount I am speaking of is on the server, right before the 
resource group is to be moved to the other node.

Looks like the Debian Lenny init script already takes care of unexporting of 
file systems.

<snip from debian lenny:/etc/init.d/nfs-kernel-server>
        log_begin_msg "Unexporting directories for $DESC..."
        $PREFIX/sbin/exportfs -au
        if [ $? != 0 ]; then
                log_end_msg $?
                exit $?
</snip>

What I'm seeing seems to be related to stale NFS handles or lock files. When I
completely shut down heartbeat on one of the nodes (it doesn't matter which
node), the fail over works perfectly. If I try to move/migrate resources from
one node to the next, all hell breaks loose.
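
In case it matters, this is roughly how I am triggering the two cases (node
name is a placeholder, commands from memory):

    /etc/init.d/heartbeat stop                           # on the active node: fail over works
    crm resource migrate fileserver_cluster_group nodeb  # explicit move: the Filesystem stop hangs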

To an inexperienced eye, it would appear that something is wrong with the
ordering of the teardown of the resources and resource group, or that the
nfs-kernel-server init script simply doesn't perform all the necessary tasks.
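
If it is the ordering, I suppose the explicit form of what the group is already
meant to enforce would look something like this (untested guess on my part):

    # nfs-kernel-server should stop (and unexport) before the Filesystem is
    # unmounted; the resource listed first starts first and stops last, so
    # fileserver_fs0 goes first here
    order stop-nfs-before-umount inf: fileserver_fs0 fileserver_nfs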

I've attached my crm configuration, which I import with `crm < fileserver.crm`.

software versions used:

pacemaker/lenny uptodate 1.0.7+hg20100203-1
heartbeat/lenny uptodate 1:3.0.2-1~bpo50+1
libheartbeat2/lenny uptodate 1:3.0.2-1~bpo50+1
drbd8-2.6.26-2-xen-686 2:8.3.7-1~bpo50+1+2.6.26-21lenny4 installed: No available version in archive
drbd8-source/lenny-backports uptodate 2:8.3.7-1~bpo50+1
drbd8-utils/lenny-backports uptodate 2:8.3.7-1~bpo50+1
nfs-common/lenny uptodate 1:1.1.2-6lenny1
nfs-kernel-server/lenny uptodate 1:1.1.2-6lenny1


kind regards,

Terry





configure

# quorum doesn't make sense in two-node clusters, so disable quorum
property no-quorum-policy=ignore

# we don't have any stonith devices, so disable stonith
property stonith-enabled=false

# this is the IPaddr resource that fileserver clients will reference for connections
primitive fileserver_vip0 ocf:heartbeat:IPaddr \
         params ip="10.0.10.1" cidr_netmask="24" nic="eth1" \
         op monitor interval="5s" timeout="20s"

# this is the filesystem resource for our fileserver
primitive fileserver_fs0 ocf:heartbeat:Filesystem \
         params fstype="ext3" directory="/data" device="/dev/drbd1"

# simple resource migration notification
primitive fileserver_notify_admin ocf:heartbeat:MailTo \
         params email="inbo...@domain.com" subject="*** Cluster Resource Failover Occurred ***" \
         op monitor interval="3s" timeout="20s"

# drbd resource for our filesystem
primitive drbd1 ocf:linbit:drbd \
         params drbd_resource="fileserver" \
         op monitor interval="59s" role="Master" timeout="30s" \
         op monitor interval="60s" role="Slave" timeout="30s"

primitive fileserver_nfs lsb:nfs-kernel-server op monitor interval="1min"
primitive fileserver_nfs-common lsb:nfs-common op monitor interval="1min"

# resource group definition
group fileserver_cluster_group fileserver_fs0 fileserver_vip0 fileserver_nfs-common fileserver_nfs fileserver_notify_admin \
        meta colocated="true" ordered="true" migration-threshold="1" failure-timeout="10s" resource-stickiness="10"

# master/slave drbd definition (needs more in-depth explanation)
master master-drbd1 drbd1 meta clone-max="2" notify="true" globally-unique="false" target-role="Started"

# colocation constraint for master-drbd1
colocation fileserver_cluster_group-on-master-drbd1 inf: fileserver_cluster_group master-drbd1:Master

# ordering definition (needs more in-depth explanation)
order master-drbd1-before-fileserver_cluster_group inf: master-drbd1:promote fileserver_cluster_group:start

commit
quit
