I'm not sure if this is a "Linux-HA" question; please direct me to the appropriate list if it's not.
I'm setting up a two-node cman+pacemaker+gfs2 cluster as described in "Clusters From Scratch." Fencing is done by forcibly rebooting a node: its power is cut and then restored via the UPS.

My fencing/failover tests have revealed a problem. If I shut down one node gracefully ("crm node standby"; "service pacemaker stop"; "shutdown -r now"), all the resources transfer to the other node with no problems. If I instead cut power to one node (as would happen if it were fenced), the lsb::clvmd resource on the remaining node eventually fails. Since all the other resources depend on clvmd, they all stop as well, and the cluster is left with nothing running.

I've traced why the lsb::clvmd resource fails: its monitor/status command includes a "vgdisplay" call, which hangs indefinitely, so the monitor operation always times out. So this isn't a problem with pacemaker but with clvmd/dlm: when a node is cut off, the cluster isn't handling it properly.

Has anyone on this list seen this before? Any ideas?

Details:

Versions:
  Redhat Linux 6.2 (kernel 2.6.32)
  cman-3.0.12.1
  corosync-1.4.1
  pacemaker-1.1.6
  lvm2-2.02.87
  lvm2-cluster-2.02.87

cluster.conf: <http://pastebin.com/w5XNYyAX>
output of "crm configure show": <http://pastebin.com/atVkXjkn>
output of "lvm dumpconfig": <http://pastebin.com/rtw8c3Pf>

/var/log/cluster/dlm_controld.log and /var/log/cluster/gfs_controld.log show nothing.

When I shut down power to one node (orestes-tb), the output of

  grep -E "(dlm|gfs2|clvmd)" /var/log/messages

is at <http://pastebin.com/vjpvCFeN>.

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
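To check that vgdisplay itself is what blocks, independent of the resource agent and Pacemaker's monitor timeout, the same command can be run by hand on the surviving node with a bounded wait. Here is a minimal Python sketch of such a check; the helper name `probe`, its "ok"/"failed"/"hung" labels, and the timeout values are my own illustrative choices, not anything provided by clvmd, LVM, or Pacemaker.

```python
import subprocess
from typing import Sequence


def probe(cmd: Sequence[str], timeout_s: float = 10.0) -> str:
    """Run cmd once and classify how it ended.

    Returns "ok" for exit status 0, "failed" for a non-zero exit status,
    and "hung" if the command is still running after timeout_s seconds.
    """
    proc = subprocess.Popen(
        list(cmd),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Analogous to the monitor operation timing out: give up and kill it.
        proc.kill()
        try:
            # Reap it if it dies promptly; don't block forever if it is in
            # uninterruptible sleep and the kill cannot be delivered yet.
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            pass
        return "hung"
    return "ok" if returncode == 0 else "failed"
```

For example, running `probe(["vgdisplay"], timeout_s=30)` on the surviving node after cutting power to the other one should return "hung" if vgdisplay is blocking the way the monitor suggests; the 30-second limit is arbitrary.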
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems