I'm not sure if this is a "Linux-HA" question; please direct me to the
appropriate list if it's not.

I'm setting up a two-node cman+pacemaker+gfs2 cluster as described in "Clusters
From Scratch." Fencing is done by forcibly rebooting a node: its power is cut
and then restored via the UPS.

My fencing/failover tests have revealed a problem. If I gracefully turn off one
node ("crm node standby"; "service pacemaker stop"; "shutdown -r now"), all the
resources transfer to the other node with no problems. If I cut power to one
node (as would happen if it were fenced), the lsb::clvmd resource on the
remaining node eventually fails. Since all the other resources depend on clvmd,
all the resources on the remaining node stop and the cluster is left with
nothing running.

I've traced why the lsb::clvmd resource fails: its monitor/status command runs
"vgdisplay", which hangs indefinitely. As a result, the monitor operation
always times out.
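To confirm the hang outside of pacemaker, something like the following minimal
Python sketch could be run on the surviving node. It starts a command the way a
monitor would and reports whether it finishes within a deadline. The helper name
command_hangs and the timeout value are my own assumptions for illustration;
they are not part of pacemaker, clvmd, or LVM.

```python
import subprocess


def command_hangs(cmd, timeout_s):
    """Return True if cmd has not exited after timeout_s seconds.

    Hypothetical helper: it mimics what a resource monitor sees when its
    status command (e.g. "vgdisplay") blocks instead of returning.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return False  # the command exited on its own before the deadline
    except subprocess.TimeoutExpired:
        proc.kill()  # send SIGKILL to the stuck command
        try:
            # Reap the child if it dies promptly; a process blocked inside
            # the kernel may not, so don't wait forever.
            proc.wait(timeout=1)
        except subprocess.TimeoutExpired:
            pass
        return True
```

For example, command_hangs(["vgdisplay"], 30) would be expected to return True
while the problem is present, and False once vgdisplay responds normally again.
The 30-second limit is arbitrary, not the monitor timeout from my configuration.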

So this isn't a problem with pacemaker, but with clvmd/dlm: If a node is cut
off, the cluster isn't handling it properly. Has anyone on this list seen this
before? Any ideas?

Details:

versions:
Red Hat Linux 6.2 (kernel 2.6.32)
cman-3.0.12.1
corosync-1.4.1
pacemaker-1.1.6
lvm2-2.02.87
lvm2-cluster-2.02.87

cluster.conf: <http://pastebin.com/w5XNYyAX>
output of "crm configure show": <http://pastebin.com/atVkXjkn>
output of "lvm dumpconfig": <http://pastebin.com/rtw8c3Pf>

/var/log/cluster/dlm_controld.log and /var/log/cluster/gfs_controld.log show
nothing. When I cut power to one of the nodes (orestes-tb), the output of
grep -E "(dlm|gfs2|clvmd)" /var/log/messages is <http://pastebin.com/vjpvCFeN>.

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems