On Wed, Dec 21, 2011 at 11:18:23AM +0100, Ulrich Windl wrote:
> >>> Andreas Kurz <andr...@hastexo.com> wrote on 20.12.2011 at 22:57 in message
> <4ef104b3.7050...@hastexo.com>:
> > Hello,
> > 
> > On 12/20/2011 02:47 PM, Ulrich Windl wrote:
> > > Hi!
> > > 
> > > I have a dual-primary DRBD that is not working well: It was working, then I
> > > shut it down and restarted it. DRBD complained about split brain and fenced
> > > the other node. When coming up, the other node fenced this node. IMHO
> > > neither node should have fenced the other.
> > > 
> > 
> > no config from drbd, no cluster config, partial/filtered logs ...
> > fragments ... you have _all_ information and can't find the problem ...
> > sorry, but I can't see how anyone can help you based on that information.
> 
> Well,
> 
> to me the problem looks like this: When starting, both DRBDs talk to each
> other successfully, then they say "we just talked about not being able to
> talk to each other, so let's commit suicide, because afterwards we can talk
> better to each other".
> 
> I think the diagnosis of "split brain" is based on disk content, not on a
> communication failure, because the nodes had just talked to each other. So a
> sync, not suicide, would be the proper solution for the conflict.
> 
> And as far as the DRBD logs are concerned, they are complete for the interval
> that's of interest.
> 
> I have only heard third-party rumors that "this and that" isn't working, but
> nobody could actually tell me why. I was hoping to get some insight here.
> 
> > 
> > I personally think it is part of the free community support deal to
> > share as much information as possible if one wants help for free.
> 
> Well, if anybody has a dual-primary DRBD (with OCFS on top) working with 
> pacemaker, would you share your configuration with me to find out what's 
> different?
> 
> Here's my configuration:
> # grep -v '^[      ]*#' *
> global_common.conf:global {
> global_common.conf:     usage-count no;
> global_common.conf:}
> global_common.conf:
> global_common.conf:common {
> global_common.conf:     protocol C;
> global_common.conf:
> global_common.conf:     handlers {
> global_common.conf:             pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; sync; echo b > /proc/sysrq-trigger ; reboot -f";
> global_common.conf:             pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; sync; echo b > /proc/sysrq-trigger ; reboot -f";
> global_common.conf:             local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; sync; echo o > /proc/sysrq-trigger ; halt -f";
> global_common.conf:             split-brain "/usr/lib/drbd/notify-split-brain.sh root";
> global_common.conf:             out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
> global_common.conf:     }
> global_common.conf:
> global_common.conf:     startup {
> global_common.conf:             become-primary-on both;
> global_common.conf:             wfc-timeout 15;
> global_common.conf:     }
> global_common.conf:
> global_common.conf:     disk {
> global_common.conf:             use-bmbv;
> global_common.conf:     }

So you do not even have DRBD fencing configured,
yet claim that DRBD fencing was shooting your nodes.
Yeah, right. 
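For comparison, here is a minimal sketch of what DRBD-level resource fencing for a Pacemaker-managed dual-primary resource could look like. It assumes DRBD 8.3 and that the crm-fence-peer.sh / crm-unfence-peer.sh handler scripts shipped with DRBD are installed under /usr/lib/drbd/ (check the paths on your system). The resource-and-stonith policy further assumes that STONITH is actually configured and working in Pacemaker. None of this is taken from the configuration quoted in this thread.

```
common {
        disk {
                # On loss of the replication link: freeze I/O and call the
                # fence-peer handler. Assumes Pacemaker STONITH is configured.
                fencing resource-and-stonith;
        }
        handlers {
                # Adds a constraint to the CIB so the outdated peer
                # cannot be promoted while it is out of date.
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # Removes that constraint once this node's resync has finished.
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
}
```

With fencing in place, a lost connection is handled by the cluster manager rather than surfacing later as a split brain on reconnect.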

> global_common.conf:
> global_common.conf:     net {
> global_common.conf:             allow-two-primaries;
> global_common.conf:             after-sb-0pri discard-zero-changes;
> global_common.conf:             after-sb-1pri discard-secondary;
> global_common.conf:             after-sb-2pri disconnect;
> global_common.conf:     }
> global_common.conf:
> global_common.conf:     syncer {
> global_common.conf:     }
> global_common.conf:}
> r0.res:resource r0 {
> r0.res: device /dev/drbd_r0 minor 0;
> r0.res: disk /dev/sys/samba;
> r0.res: meta-disk internal;
> r0.res: on h02 {
> r0.res:         address 172.20.78.2:7780;
> r0.res: }
> r0.res: on h06 {
> r0.res:         address 172.20.78.6:7780;
> r0.res: }
> r0.res: syncer {
> r0.res:         rate 7M;
> r0.res: }
> r0.res:}

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
