On 10/01/17 05:24 AM, Stefan Schloesser wrote:
> Hi,
>
> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup seems
> to be working ok including the STONITH.
> For test purposes I issued a "pkill -f pace" killing all pacemaker processes
> on one node.
>
> Result:
> The node is marked as "pending", all resources stay on it. If I manually kill
> a resource it is not noticed. On the other node a drbd "promote" command
> fails (drbd is still running as master on the first node).
>
> Killing the corosync process works as expected -> STONITH.
>
> Could someone shed some light on this behavior?
>
> Thanks,
>
> Stefan
A good way to test fencing is to crash the OS with 'echo c > /proc/sysrq-trigger', which immediately crashes the kernel. The only recovery is a reboot, so it's excellent for simulating a hung node. Also make sure that you've hooked DRBD's fencing into pacemaker with 'fencing resource-and-stonith;' and the crm-fence-peer.sh / crm-unfence-peer.sh handlers.

If these are bare-iron nodes, also test by cutting the power to a node entirely while it is running. If you can pass both of these tests, you will have simulated almost all possible node failure modes (I say 'almost' because it is impossible to think of everything :) ).

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
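For reference, a minimal sketch of how DRBD's fencing might be wired into Pacemaker, assuming DRBD 8.4 syntax (what Ubuntu 16.04 ships). The resource name r0 is hypothetical, and the script paths assume the standard /usr/lib/drbd/ install location. In 8.4 the unfence script is hooked as the after-resync-target handler. DRBD 9 changes some of this, so check the DRBD user guide for your version.

```
# Hypothetical resource "r0"; DRBD 8.4 syntax assumed.
resource r0 {
  disk {
    # On losing the peer, suspend I/O and call the fence-peer handler
    fencing resource-and-stonith;
  }
  handlers {
    # Adds a Pacemaker constraint so the peer's possibly outdated data can't be promoted
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint once resync to this node has completed
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... device, disk, and address settings omitted
}
```

With this in place, a promote on the surviving node should only succeed once the failed peer has actually been fenced. That is the behaviour the crash and power-pull tests are meant to confirm.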