On Tue, Apr 19, 2011 at 12:04 PM, Ulf <m...@gmx.net> wrote:
> I have two nodes with shared storage and multipathing, but the SBD device
> doesn't work as expected.
> My idea was that in case of a split brain, one node kills the other node and
> one survives.
> But in my case I get a double kill: both nodes are killed at the same
> time.

http://ourobengr.com/ha might be of some assistance.

> I simulated the split brain with "ip link set down eth0" on one node. I
> tested it several times.
>
> The sbd daemon is running on both nodes.
> My configuration:
> primitive stonith_sbd stonith:external/sbd \
>     params sbd_device="/dev/disk/by-id/scsi-36..."
> clone stonith_sbd-clone stonith_sbd
>
> /var/log/messages:
> Node A:
> Apr 19 10:37:09 nodeA crmd: [7690]: info: te_fence_node: Executing reboot
> fencing operation (17) on nodeB (timeout=180000)
> Apr 19 10:37:09 nodeA stonith-ng: [7685]: info: initiate_remote_stonith_op:
> Initiating remote operation reboot for nodeB:
> d4226746-fef1-4d29-bc85-2d33e9bf7f94
> Apr 19 10:37:09 nodeA stonith-ng: [7685]: info: stonith_queryQuery
> <stonith_command t="stonith-ng"
> st_async_id="d4226746-fef1-4d29-bc85-2d33e9bf7f94" st_op="st_query"
> st_callid="0" st_callopt="0"
> st_remote_op="d4226746-fef1-4d29-bc85-2d33e9bf7f94" st_target="nodeB"
> st_device_action="reboot" st_clientid="3b1b3feb-5e4e-4a3c-ae8e-2131ea2ae588"
> st_timeout="18000" src="nodeA" seq="1" />
>
> Node B:
> Apr 19 10:37:09 nodeB crmd: [7851]: info: te_fence_node: Executing reboot
> fencing operation (17) on nodeA (timeout=180000)
> Apr 19 10:37:09 nodeB stonith-ng: [7846]: info: initiate_remote_stonith_op:
> Initiating remote operation reboot for nodeA:
> e361b3b6-2890-474d-8671-b73eea62d1ab
> Apr 19 10:37:09 nodeB stonith-ng: [7846]: info: stonith_queryQuery
> <stonith_command t="stonith-ng"
> st_async_id="e361b3b6-2890-474d-8671-b73eea62d1ab" st_op="st_query"
> st_callid="0" st_callopt="0"
> st_remote_op="e361b3b6-2890-474d-8671-b73eea62d1ab" st_target="nodeA"
> st_device_action="reboot" st_clientid="a0d67d7e-5e30-44fe-bc88-e733019e594d"
> st_timeout="18000" src="nodeB" seq="1" />
>
> On both nodes I ran "sbd -d /dev/disk/by-id/scsi-36... list" in an
> endless loop, and these are the last SBD commands I get.
> As you can see, both nodes request a reset at the same time and both
> succeed => double kill.
> Node A:
> 0 nodeB clear
> 1 nodeA clear
> 0 nodeB clear
> 1 nodeA reset nodeB
> 0 nodeB reset nodeA
> 1 nodeA reset nodeB
>
> Node B:
> 0 nodeB clear
> 1 nodeA reset nodeB
> 0 nodeB clear
> 1 nodeA reset nodeB
> 0 nodeB clear
> 1 nodeA reset nodeB
> 0 nodeB reset nodeA
> 1 nodeA reset nodeB
> 0 nodeB reset nodeA
> 1 nodeA reset nodeB
>
> Cheers,
> Ulf
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
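
For anyone reproducing the test above, here is a rough sketch of the polling loop and a check for the "double kill" pattern in the slot listing. Only the "sbd -d <device> list" call comes from the thread. The function names poll_sbd_slots and mutual_reset are made up for illustration, and the check assumes the listing has the slot number, node name and message in the first three whitespace-separated columns, as in the output quoted above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of sbd.

# Dump the SBD slot table repeatedly.
# $1 = SBD device, $2 = number of polls, $3 = seconds between polls (default 1)
# (The original test used an endless loop; a bounded count is used here.)
poll_sbd_slots() {
    device=$1
    count=$2
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        sbd -d "$device" list
        i=$((i + 1))
        sleep "$interval"
    done
}

# Read one slot listing on stdin. Exit status 0 if two or more slots
# carry a pending "reset" message (both nodes fencing each other),
# exit status 1 otherwise.
mutual_reset() {
    awk '$3 == "reset" { n++ } END { exit !(n >= 2) }'
}
```

Something like "sbd -d /dev/disk/by-id/scsi-36... list | mutual_reset && echo 'double kill pending'" would then flag the situation shown in the last lines of both listings, where slot 0 and slot 1 each hold a reset at the same time.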