Hi,

On Mon, Jun 14, 2010 at 06:29:59PM +0200, Oliver Heinz wrote:
> On Monday, June 14, 2010 at 16:43:54, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Mon, Jun 14, 2010 at 02:26:57PM +0200, Oliver Heinz wrote:
> > > I configured an sbd fencing device on the shared storage to prevent
> > > data corruption. It basically works, but when I pull the network plugs
> > > on one node to simulate a failure, one of the nodes is fenced (not
> > > necessarily the one that was unplugged). After the fenced node reboots
> > > it fences the other node, and this goes on and on...
> >
> > The networking is still down between the nodes? If so, then this
> > is expected.
> >
> > > I configured pingd and location constraints so that the resources on
> > > the shared device are not started on the node that is without network
> > > connectivity, but still this node fences the other node.
> >
> > Yes. With pingd you can influence the resource placement, but it
> > can't fix split brain.
> >
> > > What I would like to achieve is that in case of a network problem on a
> > > node, this node is fenced (and not some randomly chosen node), and that
> > > after a reboot this node just sits there waiting for the network to
> > > come up again (and not fencing other nodes). Once the network comes up,
> > > this node could automatically join the cluster again.
> > >
> > > Is this possible?
> >
> > No. You need to make your network connectivity between the nodes
> > redundant.
>
> It will be, but I'm testing the worst case scenario. I once had split brain
> because I plugged in a firewire device and the kernel oops made it block
> any i/o (network and even the dumb serial line I had for redundancy) for
> longer than my configured dead time.
>
> > Split brain is bad news. The cluster will try its best to
> > deal with it, but, as you could see, it won't always please the
> > users.
> >
> > > Or do I have to disable the cluster stack on bootup, to
> > > sort things out manually before joining the cluster?
> >
> > I think that that's a good idea.
>
> I really like the idea of having the other node sitting there waiting for
> network recovery and integrating seamlessly once the network is up again.
>
> I guess pacemaker fences the other node after it becomes DC, right? So what
> I would probably want is some rules that prevent this node from becoming DC
> and starting to do things. Something like "hmm, I can't reach my gateway, I
> should keep looking for a DC and not elect myself."

There is no way to influence the DC election process.

> Meanwhile I'll add some checks to the init script: "if we can't reach the
> gateway and the other node, we'd better not start the cluster stack".
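A rough, untested sketch of such a check, to be run before the cluster stack
is started (the gateway and peer names are only examples taken from your
configuration, adjust to your environment):

#!/bin/sh
# Untested sketch of a pre-start check, e.g. a wrapper around the
# heartbeat init script. Host names below are examples.
GATEWAY="gateway"     # same host as in the pingd host_list
PEER="server-d"       # the other cluster node (use server-c on server-d)

if ! ping -c 3 -w 10 "$GATEWAY" >/dev/null 2>&1 \
   && ! ping -c 3 -w 10 "$PEER" >/dev/null 2>&1; then
    echo "neither $GATEWAY nor $PEER reachable, not starting the cluster stack" >&2
    exit 1
fi

# otherwise start heartbeat as usual (path may differ on your distribution)
exec /etc/init.d/heartbeat start

Of course this only delays the problem until the network fails while the
cluster is already running, so redundant links remain the real fix.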
> > Oh, and don't make sbd a clone, it doesn't like parallel
> > operations to a device.
>
> Thanks for the clarification. After reading this
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg03851.html
> I thought it wouldn't hurt either.

I think that in the meantime our knowledge about the way sbd works has
improved. So it may hurt, and you're covered just as well with a single
sbd instance.
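In your configuration that would just mean keeping the resSBD primitive and
dropping the clone wrapper, roughly (untested; crm configure edit should let
you remove it in place):

primitive resSBD stonith:external/sbd \
        params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
# and no "clone cloneSBD resSBD" line any more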
Thanks,

Dejan

> Thanks,
> Oliver
>
> > Thanks,
> >
> > Dejan
> >
> > > Can someone please point me in the right direction? Maybe I'm just
> > > overlooking the obvious?
> > >
> > > TIA,
> > > Oliver
> > >
> > > node $id="00b61c9a-22c6-4689-9930-1fd65d5729fa" server-d \
> > >         attributes standby="off"
> > > node $id="0d11e934-91b9-400d-9820-feb2f5895b55" server-c
> > > primitive resDATA ocf:heartbeat:LVM \
> > >         params volgrpname="data"
> > > primitive resDataC ocf:heartbeat:Filesystem \
> > >         params device="/dev/mapper/data-C" directory="/srv/data/C" fstype="ext4" \
> > >         meta is-managed="true"
> > > primitive resDataD ocf:heartbeat:Filesystem \
> > >         params device="/dev/mapper/data-D" directory="/srv/data/D" fstype="ext4"
> > > primitive resPingGateway ocf:pacemaker:pingd \
> > >         params host_list="gateway"
> > > primitive resSBD stonith:external/sbd \
> > >         params sbd_device="/dev/mapper/3600c0ff000d8d78802faa14b01000000-part1"
> > > primitive resVserverTestFramelos ocf:heartbeat:VServer \
> > >         params vserver="test-framelos" \
> > >         meta is-managed="true"
> > > group grpVserverC resDataC resVserverTestFramelos \
> > >         meta target-role="Started"
> > > group grpVserverD resDataD \
> > >         meta target-role="Started"
> > > clone cloneDATA resDATA
> > > clone clonePingGateway resPingGateway \
> > >         meta target-role="Started"
> > > clone cloneSBD resSBD
> > > location cli-prefer-grpVserverC grpVserverC \
> > >         rule $id="cli-prefer-rule-grpVserverC" inf: #uname eq server-c
> > > location cli-prefer-grpVserverD grpVserverD \
> > >         rule $id="cli-prefer-rule-grpVserverD" inf: #uname eq server-d and #uname eq server-d
> > > location cli-prefer-resVserverTestFramelos resVserverTestFramelos \
> > >         rule $id="cli-prefer-rule-resVserverTestFramelos" inf: #uname eq server-c
> > > location locPingVserverC grpVserverC \
> > >         rule $id="locPingVserverC-rule" -inf: not_defined pingd or pingd lte 0
> > > location locPingVserverD grpVserverD \
> > >         rule $id="locPingVserverD-rule" -inf: not_defined pingd or pingd lte 0
> > > order ordDataC inf: cloneDATA grpVserverC
> > > order ordDataD inf: cloneDATA grpVserverD
> > > property $id="cib-bootstrap-options" \
> > >         dc-version="1.0.8-f2ca9dd92b1d+ sid tip" \
> > >         cluster-infrastructure="Heartbeat" \
> > >         expected-quorum-votes="2" \
> > >         no-quorum-policy="ignore" \
> > >         stonith-enabled="true" \
> > >         default-resource-stickiness="INFINITY" \
> > >         last-lrm-refresh="1276514035"

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker