Hi,

On Mon, Jun 14, 2010 at 02:26:57PM +0200, Oliver Heinz wrote:
> 
> I configured an sbd fencing device on the shared storage to prevent data
> corruption. It basically works, but when I pull the network plugs on one node
> to simulate a failure, one of the nodes is fenced (not necessarily the one
> that was unplugged). After the fenced node reboots, it fences the other node,
> and this goes on and on...

The networking is still down between the nodes? If so, then this
is expected.

> I configured pingd and location constraints so that the resources on the
> shared device are not started on the node without network connectivity, but
> this node still fences the other node.

Yes. With pingd you can influence the resource placement, but it
can't fix split brain.

> What I would like to achieve is that, in case of a network problem on a node,
> this node is fenced (and not some randomly chosen node), and that after a
> reboot this node just sits there waiting for the network to come up again
> (and does not fence other nodes). Once the network comes up, this node could
> automatically rejoin the cluster.
> 
> Is this possible?

No. You need to make your network connectivity between the nodes
redundant. Split brain is bad news. The cluster will try its best to
deal with it, but, as you have seen, the result won't always please
the users.
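
For example (just a sketch; the interface names here are assumptions,
adjust them to your setup), with Heartbeat you can list a second
communication path in ha.cf so the nodes can still talk to each other
if one link dies:

    # /etc/ha.d/ha.cf: two independent cluster communication paths
    bcast eth0
    bcast eth1    # e.g. a dedicated back-to-back link between the nodes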

> Or do I have to disable the cluster stack on bootup, to sort things out
> manually before joining the cluster?

I think that's a good idea.
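
For example (assuming a Debian-style init, since your dc-version string
mentions sid), something like this should keep heartbeat from starting
automatically, so you can start it by hand once you have checked that
the network is up:

    # stop heartbeat from starting at boot
    update-rc.d -f heartbeat remove
    # later, after verifying the network, start the cluster stack manually
    /etc/init.d/heartbeat start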

Oh, and don't make sbd a clone; it doesn't like parallel operations
on the device.
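
In your configuration that would mean dropping the cloneSBD clone and
keeping only the primitive, roughly (a sketch based on the config you
posted):

    primitive resSBD stonith:external/sbd \
            params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
    # no "clone cloneSBD resSBD" line; the cluster runs a single instance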

Thanks,

Dejan

> Can someone please point me in the right direction? Maybe I'm just
> overlooking the obvious?
> 
> TIA,
> Oliver
> 
> node $id="00b61c9a-22c6-4689-9930-1fd65d5729fa" server-d \
>         attributes standby="off"
> node $id="0d11e934-91b9-400d-9820-feb2f5895b55" server-c
> primitive resDATA ocf:heartbeat:LVM \
>         params volgrpname="data"
> primitive resDataC ocf:heartbeat:Filesystem \
>         params device="/dev/mapper/data-C" directory="/srv/data/C" fstype="ext4" \
>         meta is-managed="true"
> primitive resDataD ocf:heartbeat:Filesystem \
>         params device="/dev/mapper/data-D" directory="/srv/data/D" fstype="ext4"
> primitive resPingGateway ocf:pacemaker:pingd \
>         params host_list="gateway"
> primitive resSBD stonith:external/sbd \
>         params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
> primitive resVserverTestFramelos ocf:heartbeat:VServer \
>         params vserver="test-framelos" \
>         meta is-managed="true"
> group grpVserverC resDataC resVserverTestFramelos \
>         meta target-role="Started"
> group grpVserverD resDataD \
>         meta target-role="Started"
> clone cloneDATA resDATA
> clone clonePingGateway resPingGateway \
>         meta target-role="Started"
> clone cloneSBD resSBD
> location cli-prefer-grpVserverC grpVserverC \
>         rule $id="cli-prefer-rule-grpVserverC" inf: #uname eq server-c
> location cli-prefer-grpVserverD grpVserverD \
>         rule $id="cli-prefer-rule-grpVserverD" inf: #uname eq server-d and #uname eq server-d
> location cli-prefer-resVserverTestFramelos resVserverTestFramelos \
>         rule $id="cli-prefer-rule-resVserverTestFramelos" inf: #uname eq server-c
> location locPingVserverC grpVserverC \
>         rule $id="locPingVserverC-rule" -inf: not_defined pingd or pingd lte 0
> location locPingVserverD grpVserverD \
>         rule $id="locPingVserverD-rule" -inf: not_defined pingd or pingd lte 0
> order ordDataC inf: cloneDATA grpVserverC
> order ordDataD inf: cloneDATA grpVserverD
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.8-f2ca9dd92b1d+ sid tip" \
>         cluster-infrastructure="Heartbeat" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="true" \
>         default-resource-stickiness="INFINITY" \
>         last-lrm-refresh="1276514035"

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
