On 10/11/11 04:35, Andrew Beekhof wrote:
On Mon, Oct 10, 2011 at 9:12 PM, Florian Haas <flor...@hastexo.com> wrote:
On 2011-10-08 15:55, Bart Coninckx wrote:
On 10/08/11 00:25, Lars Ellenberg wrote:
On Fri, Oct 07, 2011 at 10:21:08PM +0200, Bart Coninckx wrote:
On 10/06/11 22:03, Florian Haas wrote:
On 2011-10-06 21:43, Bart Coninckx wrote:
Hi all,

would you mind sending me examples of your crm config for a dual-primary
DRBD resource?

I used the one on

http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html

and on

http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2

and they both result in split brain, except when I start DRBD
manually first.
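For readers following along, here is a minimal sketch of the kind of dual-primary master/slave definition the two linked guides build on. It is entered at the `crm configure` prompt. The DRBD resource name `r0` and the IDs `p_drbd_r0`/`ms_drbd_r0` are placeholders, not taken from the configurations in this thread. The DRBD resource itself must also permit two primaries (`allow-two-primaries` in its `net` section).

```text
primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="20" role="Master" \
        op monitor interval="30" role="Slave"
ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="2" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"
```

`master-max="2"` is what allows Pacemaker to promote both instances. The two monitor operations use different intervals because Pacemaker does not accept two operations with the same name and interval.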

They clearly should not. Rather than soliciting other people's
configurations and then trying to adapt yours based on them, why don't you
upload _your_ CIB (not just a "crm configure dump", but a full
"cibadmin -Q") and your DRBD configuration to pastebin/pastie/fpaste and let
people tell you where your problem is?

OK, I posted the drbd.conf on http://pastebin.com/SQe9YxhY

cibadmin -Q is on http://pastebin.com/gTZqsACq

The split brain logging is on http://pastebin.com/7unKKkdi .

I somehow think you added some "--force" or "--overwrite-data-of-peer"
to some drbdadm/drbdsetup primary invocation?

Could this be some sort of timing issue? Done manually, things are fine,
but then there are a few seconds between the two primary promotions.


OK, seems to be some sort of timing issue. I "fixed" this by adding a
"sleep 1" in the RA right before the "do_drbdadm primary $DRBD_RESOURCE"
line.
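The workaround described above can be sketched as follows. This is a hypothetical, self-contained illustration, not an excerpt of the actual ocf:linbit:drbd agent. `do_drbdadm` is stubbed so the script runs without DRBD installed. The function name `promote_with_delay` and the default resource name `r0` are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical default; in the real agent the resource name comes from the
# drbd_resource parameter.
DRBD_RESOURCE=${DRBD_RESOURCE:-r0}

# Stub standing in for the agent's drbdadm wrapper: it only prints the
# command it would run, so the sketch is safe to execute anywhere.
do_drbdadm() {
    echo "drbdadm $*"
}

# Hypothetical promote step with the workaround applied: wait one second,
# then ask DRBD to make this node Primary.
promote_with_delay() {
    sleep 1
    do_drbdadm primary "$DRBD_RESOURCE"
}

promote_with_delay
```

As the quotes around "fixed" suggest, a fixed delay only shifts when the promotion happens. It does not by itself explain why two near-simultaneous promotions ended in split brain.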

I'm surprised though that I'm the first one to run into this.

Er, wait. I'm cross-posting this to the Pacemaker list on a hunch.

Andrew, in Boston last year you mentioned you were planning to implement
a change to Master/Slave sets in which, iirc, startup and promotion
would happen in one fell swoop (I believe the NTT folks made a
compelling case for this). Has that change ever been implemented?

Alas no.
I still have intentions of doing so, but I was consumed with Matahari
for most of this year and have been playing catch-up ever since.

If you were inclined, you could (re)create a bug for this in
http://bugs.clusterlabs.org

And if so, at which Pacemaker version? Is there a configuration option to
revert to the old behavior, where the resource would be started
first and then promoted some time after that?

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

_______________________________________________
drbd-user mailing list
drbd-u...@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

Florian,

Does this mean you thought this problem could have been caused by changes Andrew made to the DRBD RA? But since he hasn't made them yet, that can't be it, can it?

thx,

B.
