Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

Digimer Mon, 30 Jan 2017 18:02:03 -0800

On 10/01/17 05:24 AM, Stefan Schloesser wrote:
> Hi,
> 
> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup seems 
> to be working ok including the STONITH.
> For test purposes I issued a "pkill -f pace" killing all pacemaker processes 
> on one node.
> 
> Result:
> The node is marked as "pending", all resources stay on it. If I manually kill 
> a resource it is not noticed. On the other node a drbd "promote" command 
> fails (drbd is still running as master on the first node).
> 
> Killing the corosync process works as expected -> STONITH.
> 
> Could someone shed some light on this behavior? 
> 
> Thanks,
> 
> Stefan


A good way to test fencing is to crash the OS with 'echo c >
/proc/sysrq-trigger', which panics the kernel immediately. The only
recovery is a reboot, so it's excellent for simulating a hung node.
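
As a rough sketch of that test (the crash_node function and the
CONFIRM_CRASH variable are names made up for this example, not part of
pacemaker or any other tool), something like the following could be run
as root on the node you want to see fenced. It does nothing destructive
unless you explicitly confirm:

```shell
#!/bin/sh
# Sketch only: crash this node's kernel so the peer has to fence it.
# crash_node and CONFIRM_CRASH are hypothetical names for illustration.

crash_node() {
    # Anything other than "yes" is treated as a dry run.
    if [ "$1" != "yes" ]; then
        echo "dry run: would write 'c' to /proc/sysrq-trigger"
        return 0
    fi
    # Requires root. The kernel panics immediately; nothing after
    # this line runs, and unsaved data on this node is lost.
    echo c > /proc/sysrq-trigger
}

crash_node "${CONFIRM_CRASH:-no}"
```

Saved as, say, crash-test.sh, running 'CONFIRM_CRASH=yes sh crash-test.sh'
performs the real crash. On the surviving node, 'crm_mon -1' should then
show the peer being fenced and the resources recovering.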

Make sure, too, that you've hooked DRBD's fencing into pacemaker with
'fencing resource-and-stonith;' and that you are using the
crm-fence-peer.sh and crm-unfence-peer.sh scripts as the fence and
unfence handlers.
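
For illustration, a minimal DRBD 8.4-style fragment along those lines
might look like the following. The resource name 'r0' is a placeholder,
and the script paths are the ones commonly used by DRBD packages, so
check where your distribution installs them. DRBD 9 places these options
differently, so treat this as a sketch rather than a drop-in config:

```
resource r0 {
    disk {
        # Freeze I/O and call the fence-peer handler when the peer is lost.
        fencing resource-and-stonith;
    }
    handlers {
        # Adds a pacemaker constraint so the outdated peer can't be promoted.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # Removes that constraint once the peer has resynced.
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    # ... remaining sections (net, on <host>, etc.) omitted ...
}
```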

If these are bare-iron nodes, also test by pulling the power out of the
node entirely while it is running. If you can pass both of these tests,
you will have simulated almost all possible node failure modes (I say
'almost' because it is impossible to think of everything :) ).

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
