Re: [Pacemaker] SBD kills both nodes in a two node cluster.

Andrew Beekhof Wed, 20 Apr 2011 00:04:06 -0700

On Tue, Apr 19, 2011 at 12:04 PM, Ulf <m...@gmx.net> wrote:
> I have two nodes with shared storage and multipathing, but the SBD device 
> doesn't work as expected.
> My idea was that in case of a split brain, one node kills the other and 
> one survives.
> In my case, though, I get a double kill: both nodes are killed at the same 
> time.


http://ourobengr.com/ha might be of some assistance.

> I simulated the split brain with "ip link set down eth0" on one node. I 
> tested it several times.
>
> The sbd daemon is running on both nodes.
> My configuration:
> primitive stonith_sbd stonith:external/sbd params 
> sbd_device="/dev/disk/by-id/scsi-36..."
> clone stonith_sbd-clone stonith_sbd
>
> /var/log/messages:
> Node A:
> Apr 19 10:37:09 nodeA crmd: [7690]: info: te_fence_node: Executing reboot 
> fencing operation (17) on nodeB (timeout=180000)
> Apr 19 10:37:09 nodeA stonith-ng: [7685]: info: initiate_remote_stonith_op: 
> Initiating remote operation reboot for nodeB: 
> d4226746-fef1-4d29-bc85-2d33e9bf7f94
> Apr 19 10:37:09 nodeA stonith-ng: [7685]: info: stonith_queryQuery 
> <stonith_command t="stonith-ng" 
> st_async_id="d4226746-fef1-4d29-bc85-2d33e9bf7f94" st_op="st_query" 
> st_callid="0" st_callopt="0" 
> st_remote_op="d4226746-fef1-4d29-bc85-2d33e9bf7f94" st_target="nodeB" 
> st_device_action="reboot" st_clientid="3b1b3feb-5e4e-4a3c-ae8e-2131ea2ae588" 
> st_timeout="18000" src="nodeA" seq="1" />
>
>
> Node B:
> Apr 19 10:37:09 nodeB crmd: [7851]: info: te_fence_node: Executing reboot 
> fencing operation (17) on nodeA (timeout=180000)
> Apr 19 10:37:09 nodeB stonith-ng: [7846]: info: initiate_remote_stonith_op: 
> Initiating remote operation reboot for nodeA: 
> e361b3b6-2890-474d-8671-b73eea62d1ab
> Apr 19 10:37:09 nodeB stonith-ng: [7846]: info: stonith_queryQuery 
> <stonith_command t="stonith-ng" 
> st_async_id="e361b3b6-2890-474d-8671-b73eea62d1ab" st_op="st_query" 
> st_callid="0" st_callopt="0" 
> st_remote_op="e361b3b6-2890-474d-8671-b73eea62d1ab" st_target="nodeA" 
> st_device_action="reboot" st_clientid="a0d67d7e-5e30-44fe-bc88-e733019e594d" 
> st_timeout="18000" src="nodeB" seq="1" />
>
>
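
The two crmd lines quoted above carry the same timestamp and point at each other, which is the signature of a fencing race. A minimal sketch of how one might check a log for that pattern follows. The helper names (`fence_requests`, `mutual_fencing`), the one-second window, and the hard-coded year are assumptions for illustration and not part of Pacemaker; the two log lines are copied from the quoted output with the mail line wraps removed.

```python
import re
from datetime import datetime

# The two crmd lines from the quoted logs, with the mail line wraps removed.
LOG_LINES = [
    "Apr 19 10:37:09 nodeA crmd: [7690]: info: te_fence_node: "
    "Executing reboot fencing operation (17) on nodeB (timeout=180000)",
    "Apr 19 10:37:09 nodeB crmd: [7851]: info: te_fence_node: "
    "Executing reboot fencing operation (17) on nodeA (timeout=180000)",
]

# Matches only the te_fence_node message format seen in the quoted logs.
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<src>\S+) crmd: .*"
    r"te_fence_node: Executing \w+ fencing operation \(\d+\) on (?P<target>\S+)"
)


def fence_requests(lines, year=2011):
    """Return (timestamp, source node, target node) for each fencing request.

    Syslog timestamps carry no year, so one is supplied as an assumption.
    """
    requests = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            ts = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
            requests.append((ts, match["src"], match["target"]))
    return requests


def mutual_fencing(requests, window_seconds=1):
    """Return pairs of nodes that requested fencing of each other within the window."""
    pairs = []
    for i, (ts_a, src_a, tgt_a) in enumerate(requests):
        for ts_b, src_b, tgt_b in requests[i + 1:]:
            close = abs((ts_a - ts_b).total_seconds()) <= window_seconds
            if close and src_a == tgt_b and tgt_a == src_b:
                pairs.append((src_a, src_b))
    return pairs


if __name__ == "__main__":
    print(mutual_fencing(fence_requests(LOG_LINES)))  # [('nodeA', 'nodeB')]
```

An empty result would mean only one side started fencing inside the window; here both did.
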
> On both nodes I started a "sbd -d /dev/disk/by-id/scsi-36... list" in an 
> endless loop and these are the last SBD commands I get.
> As you can see both nodes request a reset at the same time and both will 
> succeed => double kill.
> Node A:
> 0       nodeB clear
> 1       nodeA clear
> 0       nodeB clear
> 1       nodeA reset   nodeB
> 0       nodeB reset   nodeA
> 1       nodeA reset   nodeB
>
> Node B:
> 0       nodeB clear
> 1       nodeA reset   nodeB
> 0       nodeB clear
> 1       nodeA reset   nodeB
> 0       nodeB clear
> 1       nodeA reset   nodeB
> 0       nodeB reset   nodeA
> 1       nodeA reset   nodeB
> 0       nodeB reset   nodeA
> 1       nodeA reset   nodeB
>
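
The slot listings above show each node's slot receiving a reset from the other before either node has gone down. As a rough illustration of why that ends in a double kill, here is a toy model. It is not how sbd is implemented: the tick-based timing, the `simulate` function, and the `fence_delay` parameter are all assumptions made for the sketch, and how a delay would actually be configured in a cluster is outside its scope.

```python
def simulate(fence_delay, horizon=20):
    """Toy model of two nodes fencing each other through per-node message slots.

    fence_delay maps node name -> number of ticks that node waits after the
    split before writing "reset" into its peer's slot (hypothetical parameter).
    Returns the list of nodes still alive after `horizon` ticks.
    """
    nodes = sorted(fence_delay)
    if len(nodes) != 2:
        raise ValueError("this toy model covers exactly two nodes")
    peer = {nodes[0]: nodes[1], nodes[1]: nodes[0]}
    slots = {n: "clear" for n in nodes}
    alive = {n: True for n in nodes}

    for tick in range(horizon):
        # Phase 1: a live node whose delay has elapsed writes to the peer's slot.
        for n in nodes:
            if alive[n] and tick == fence_delay[n]:
                slots[peer[n]] = "reset"
        # Phase 2: each live node reads its own slot and goes down on "reset".
        for n in nodes:
            if alive[n] and slots[n] == "reset":
                alive[n] = False

    return [n for n in nodes if alive[n]]


if __name__ == "__main__":
    # Both nodes fence at the same tick: both writes land before either read.
    print(simulate({"nodeA": 0, "nodeB": 0}))  # []
    # nodeB waits three ticks: it is already down before it can write.
    print(simulate({"nodeA": 0, "nodeB": 3}))  # ['nodeA']
```

In this model the only thing that separates one survivor from none is whether the two writes happen in the same tick, which matches what the listings from the two nodes suggest.
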
>
> Cheers,
> Ulf
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>

