On Wed, Apr 27, 2011 at 11:52 AM, Stallmann, Andreas
<astallm...@conet.de> wrote:
> Hi Lars,
>
>> You are exercising complete cluster communication loss.
>> Which is cluster split brain.
> Correct, yes.
>
>> If you are specifically exercising cluster split brain, why are you 
>> surprised that you get exactly that?
>
> Because ping(d) is supposed to keep resources from starting on nodes which
> are not properly connected to the network. Thus: still split brain, but no
> possibility of concurrent (and possibly damaging) access to resources.

According to your configuration, it can be up to 60s before we
detect a change in external connectivity.
That's plenty of time for the cluster to start resources.

Maybe shortening the monitor interval will help you.
Couldn't hurt.
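
A rough back-of-the-envelope model shows why the monitor interval dominates this window. It is a sketch under stated assumptions: the ping agent only notices a loss at its next monitor run, each ping attempt may take up to its full timeout (a simplification), and the attribute change is then held back by the agent's dampen delay before the cluster acts on it. The parameter values below are hypothetical, not taken from the configuration discussed in this thread.

```python
def worst_case_detection_s(monitor_interval_s: float, dampen_s: float,
                           ping_timeout_s: float = 0.0,
                           attempts: int = 1) -> float:
    """Rough upper bound on how long a connectivity loss goes unnoticed.

    Assumes the loss happens just after a monitor run finished, so it is
    only seen at the next run; every ping attempt is charged its full
    timeout, and the attribute update is then delayed by `dampen`.
    """
    return monitor_interval_s + attempts * ping_timeout_s + dampen_s


# Hypothetical values: 5s dampen, 3 attempts with a 2s timeout each.
print(worst_case_detection_s(30, 5, ping_timeout_s=2, attempts=3))  # 41
print(worst_case_detection_s(10, 5, ping_timeout_s=2, attempts=3))  # 21
```

Under these assumptions, cutting the interval from 30s to 10s shrinks the worst case by the same 20s; the dampen delay and ping timeouts set a floor that no interval can remove.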

>
>> You need to reduce the probability of running into complete
>> communication loss by
>> - using multiple communication links.
>
> There will be *one* dedicated (MPLS) line between the two sites. There is no
> possibility of any "real" redundant links; honestly, believe me. The only
> alternative would be GSM modems or other wireless links, which are not
> possible for several other reasons (which I can't discuss here).
>
>> - using a real quorum
>>  (there is no quorum in a two node failover cluster)
>
> Yes, and there is no proper way to use DRBD in a three node cluster.

How is one related to the other?
No-one said the third node had to run anything.
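
The point about the third node can be illustrated with simple majority-vote arithmetic. A minimal sketch, assuming one vote per node and a strict-majority rule; the third node only contributes a vote and runs no resources (and no DRBD).

```python
def has_quorum(total_votes: int, visible_votes: int) -> bool:
    """Strict majority: a partition needs more than half of all votes."""
    return visible_votes > total_votes // 2


# Two-node cluster with the inter-site link down: each side sees only itself.
print(has_quorum(total_votes=2, visible_votes=1))  # False (on both sides)

# Three nodes, the third being a vote-only tie-breaker running no resources:
# the side that still reaches it sees 2 of 3 votes, the other side 1 of 3.
print(has_quorum(total_votes=3, visible_votes=2))  # True
print(has_quorum(total_votes=3, visible_votes=1))  # False
```

With two votes a split leaves neither side with a majority, while a third vote lets exactly one partition keep quorum.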

> Until then (unless we get dedicated, replicated, shared storage, which we
> unfortunately don't have), it's a two-node cluster or nothing. This
> inevitably leads to the need for an "external quorum", and ping(d) seems to
> provide that, as far as I understood the docs. Please correct me if I'm wrong.
>
>> You may still want to guard against the ugly effects of cluster split
>> brain by
>> - implementing stonith
>> - configuring stonith properly
>
> There's no proper way of doing stonith in a split-site scenario besides
> "meatware". If the link between the two sites is down, you won't be able to
> reach any iLO, UPS or other stonith device.
>
>> - additionally configuring fencing in DRBD
>
> Yes, I'm going to try that.
>
> Still, please tell me whether ping(d) is behaving properly or not. You've
> seen my configuration. I think it should work (and indeed it did a while
> ago; it could well be that we misconfigured something since then, but I just
> can't find what it is).
>
> THANKS,
>
> Andreas
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
> ------------------------
> CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
> Geschäftsführer/Managing Directors: Jürgen Zender (Sprecher/Chairman), Anke Höfer