On Mon, Oct 10, 2011 at 9:12 PM, Florian Haas <flor...@hastexo.com> wrote:
> On 2011-10-08 15:55, Bart Coninckx wrote:
>> On 10/08/11 00:25, Lars Ellenberg wrote:
>>> On Fri, Oct 07, 2011 at 10:21:08PM +0200, Bart Coninckx wrote:
>>>> On 10/06/11 22:03, Florian Haas wrote:
>>>>> On 2011-10-06 21:43, Bart Coninckx wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> would you mind sending me examples of your crm config for a dual
>>>>>> primary DRBD resource?
>>>>>>
>>>>>> I used the one on
>>>>>>
>>>>>> http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
>>>>>>
>>>>>> and on
>>>>>>
>>>>>> http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
>>>>>>
>>>>>> and they both result in split brain, except for when I start drbd
>>>>>> manually first.
>>>>>
>>>>> They clearly should not. Rather than soliciting other people's
>>>>> configurations and then trying to adapt yours based on that, why don't you
>>>>> upload _your_ CIB (not just a "crm configure dump", but a full
>>>>> "cibadmin -Q") and your DRBD configuration to your pastebin/pastie/fpaste
>>>>> and let people tell you where your problem is?
>>>>
>>>> OK, I posted the drbd.conf on http://pastebin.com/SQe9YxhY
>>>>
>>>> cibadmin -Q is on http://pastebin.com/gTZqsACq
>>>>
>>>> The split brain logging is on http://pastebin.com/7unKKkdi .
>>>
>>> I somehow think you added some "--force" or "--overwrite-data-of-peer"
>>> to some drbdadm/drbdsetup primary invocation?
>>>
>>>> Could this be some sort of timing issue? Manually things are fine,
>>>> but there are some seconds in between the primary promotions.
>>>
>>
>> OK, seems to be some sort of timing issue. I "fixed" this by adding a
>> "sleep 1" in the RA right before the "do_drbdadm primary $DRBD_RESOURCE"
>> line.
>>
>> I'm surprised though that I'm the first one to run into this.
>
> Er, wait. I'm cross-posting this to the Pacemaker list on a hunch.
>
> Andrew, in Boston last year you mentioned you were planning to implement
> a change to Master/Slave sets in which, iirc, startup and promotion
> would happen in one fell swoop (I believe the NTT folks made a
> compelling case for this). Has that change ever been implemented?
Alas no.
I still have intentions of doing so, but I was consumed with Matahari for
most of this year and have been playing catch-up ever since.
If you were inclined, you could (re)create a bug for this in
http://bugs.clusterlabs.org

> And if so, at which Pacemaker version? Is there a configuration option to
> revert back to the old behavior where the resource would be started
> first, and then promotion would occur some time after that?
>
> Cheers,
> Florian
>
> --
> Need help with High Availability?
> http://www.hastexo.com/now
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
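For readers arriving with Bart's original question, here is a minimal sketch of a dual-primary DRBD master/slave set in crm shell syntax, along the lines of the DRBD User's Guide page linked in the thread. It is not taken from Bart's pastebins, which are not reproduced here. The names p_drbd_r0, ms_drbd_r0 and the DRBD resource name r0 are hypothetical, and the monitor intervals are illustrative only. The settings that matter for dual-primary operation are master-max="2", which lets Pacemaker promote the resource on both nodes, and notify="true", which the ocf:linbit:drbd agent expects. DRBD itself must also permit two primaries, via allow-two-primaries in the resource's net section of drbd.conf.

```
# Hypothetical names: p_drbd_r0, ms_drbd_r0, DRBD resource "r0".
# The Master and Slave monitors need distinct intervals.
primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="20" role="Master" \
        op monitor interval="30" role="Slave"
# master-max="2" allows promotion on both nodes (dual primary).
ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="2" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"
```

This sketch does not by itself address the start/promote timing question raised above; it only shows the shape of a dual-primary master/slave definition.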