Hello,

We had many unresolved issues with Pacemaker some time ago. I think almost all of them were solved by fixing the link between the cluster nodes (we removed the media converters, replaced them with NICs with SFP+, and upgraded to 10 Gbps).
Now it seems to be working fine, with a few exceptions:

- if I kill one node manually (power off, but IPMI is still operational, so stonith is working fine), or
- if I move one of the nodes to standby while it is running a few Xen domUs,

the node becomes UNCLEAN. The funny thing is that if I kill node B (or put it in standby), node A also becomes UNCLEAN. So I end up with crm_mon showing Node-A: UNCLEAN (online), Node-B: UNCLEAN (offline).

To be honest, I have a lot of trouble diagnosing it (BTW: is there any documentation on how to read Pacemaker logs?). One thing I found that worries me is:

    Mar 20 04:16:39 rivendell-A kernel: [ 774.635312] stonithd[10089]: segfault at 0 ip 00007f51a1aa5bd4 sp 00007fff20c7fb50 error 4 in libcrmcommon.so.2.0.0[7f51a1a93000+2d000]

It happens on both nodes. It also seems to happen only when I define a manual fencing device (meatware) like this:

    primitive manual-fencing-of-A stonith:meatware \
            params hostlist="rivendell-B" \
            op monitor interval="60s" \
            meta target-role="Started"
    primitive manual-fencing-of-B stonith:meatware \
            params hostlist="rivendell-A" \
            op monitor interval="60s" \
            meta target-role="Started"
    location location-manual-fencing-of-A manual-fencing-of-A -inf: rivendell-A
    location location-manual-fencing-of-B manual-fencing-of-B -inf: rivendell-B

Here is the configuration we are currently using (without manual fencing): http://pastebin.com/CudX6wx3

BTW, is there a way to recover from such a situation? I can only fix it by restarting corosync or rebooting a node, but that then kills the other node because of the UNCLEAN state.

Also, if it is a Pacemaker bug, how can I debug it or fix it? We are currently using Pacemaker 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff on Debian Wheezy. I see there are more up-to-date versions, but not packaged for Debian. Should I consider upgrading?

Thank you!
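P.S. In case it helps anyone decode the stonithd segfault line quoted above: here is a minimal Python sketch that splits the kernel message into its fields and computes the offset of the faulting instruction inside libcrmcommon (instruction pointer minus the library's load address). The helper name parse_segfault and the dictionary keys are made up for illustration, and the regular expression only assumes the message layout seen in that one log line.

```python
import re

# The kernel log line from this report, copied verbatim.
LINE = (
    "Mar 20 04:16:39 rivendell-A kernel: [ 774.635312] stonithd[10089]: "
    "segfault at 0 ip 00007f51a1aa5bd4 sp 00007fff20c7fb50 error 4 "
    "in libcrmcommon.so.2.0.0[7f51a1a93000+2d000]"
)

# Assumed layout: "<proc>[<pid>]: segfault at <addr> ip <ip> sp <sp>
# error <code> in <lib>[<base>+<size>]", all addresses in hex.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the parsed fields of a kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "error_code": int(m.group("err")),
        "library": m.group("lib"),
        # Offset of the faulting instruction within the mapped library.
        "offset": ip - base,
        "mapping_size": int(m.group("size"), 16),
    }


if __name__ == "__main__":
    info = parse_segfault(LINE)
    print(info["process"], info["library"], hex(info["offset"]))
```

A fault address of 0 points to a NULL pointer dereference, and the computed offset could be handed to a tool such as addr2line together with the library file, assuming debug symbols for libcrmcommon are installed.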
-- 
Michał Margula, alche...@uznam.net.pl, http://alchemyx.uznam.net.pl/
"W życiu piękne są tylko chwile" [Ryszard Riedel]