On Wed, Apr 27, 2011 at 11:52 AM, Stallmann, Andreas <astallm...@conet.de> wrote:
> Hi Lars,
>
>> You are exercising complete cluster communication loss.
>> Which is cluster split brain.
>
> Correct, yes.
>
>> If you are specifically exercising cluster split brain, why are you
>> surprised that you get exactly that?
>
> Because ping(d) is supposed to keep resources from starting on nodes which
> are not properly connected to the network. Thus: Still split brain, but no
> possibility for concurrent (and possibly damaging) access to resources.

According to your configuration, it can take up to 60s before we detect a
change in external connectivity. That's plenty of time for the cluster to
start resources. Shortening the monitor interval may help you; it couldn't
hurt.

>> You need to reduce the probability to run into complete communication loss,
>> by
>> - using multiple communication links.
>
> There will be *one* dedicated (mpls) line between the two sites. No
> possibility for any "real" redundant links; honestly, believe me. The only
> way would be the usage of GSM modems or other wireless links, which is not
> possible for several other reasons (which I can't discuss here).
>
>> - using a real quorum
>>   (there is no quorum in a two node failover cluster)
>
> Yes, and there is no proper way to use DRBD in a three node cluster.

How is one related to the other?
No-one said the third node had to run anything.

> Until then (unless we have a dedicated, replicated, shared storage, which we
> don't have, unfortunately), it's a two node cluster or nothing. This -
> inevitably - leads to the need for an "external quorum", and ping(d) seems to
> do that, as far as I understood the docs. Please correct me if I'm wrong.
>
>> You may want to still guard against the ugly effects of cluster split brain,
>> by
>> - implementing stonith
>> - configuring stonith properly
>
> There's no proper way for doing stonith in a split-site scenario, besides
> "meatware". If the link is down between the two sites, you won't be able to
> access any ILO, UPS or other stonith device.
>
>> - additionally configuring fencing in DRBD
>
> Yes, I'm going to try that.
>
> Still: Please tell me if ping(d) is behaving properly or if it isn't. You've
> seen my configuration. I think it should work (and, indeed, it did a while
> ago; it could well be that we misconfigured something after that, but I just
> can't find what it is...)
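
For illustration only, here is a minimal crm shell sketch of the kind of setup
under discussion: a cloned ocf:pacemaker:ping resource with a short monitor
interval, plus a location rule that keeps a resource off nodes without
external connectivity. This is not the configuration from this thread, which
is not reproduced here. The ping target address, the resource names (p_ping,
cl_ping, g_services) and all timing values are hypothetical placeholders, and
the rule assumes the agent's default attribute name "pingd".

```
# Hypothetical example; names, address and timings are placeholders.
primitive p_ping ocf:pacemaker:ping \
    params host_list="192.0.2.1" multiplier="100" dampen="5s" \
    op monitor interval="10s" timeout="30s"
clone cl_ping p_ping
# Keep g_services off any node whose "pingd" attribute is missing or zero.
location loc_g_services_needs_ping g_services \
    rule -inf: not_defined pingd or pingd lte 0
```

A shorter monitor interval means the attribute drops sooner after connectivity
is lost. That narrows, but does not close, the window in which the cluster can
start resources on an isolated node.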
>
> THANKS,
>
> Andreas
>
>> --
>> : Lars Ellenberg
>> : LINBIT | Your Way to High Availability
>> : DRBD/HA support and consulting http://www.linbit.com
>>
>> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>
> ------------------------
> CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
> Geschäftsführer/Managing Directors: Jürgen Zender (Sprecher/Chairman), Anke
> Höfer
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems