Hello Lars,

thanks for your reply.

> Your "Problem" is this:
> 
>       DRBD config:
>              allow-two-primaries,
>              but *NO* fencing policy,
>              and *NO* fencing handler.
> 
>       And, as if that was not bad enough already,
>       Pacemaker config:
>               no-quorum-policy="ignore" \
>               stonith-enabled="false"

Yes, as I wrote, it's just a test cluster on virtual machines, therefore no
fencing devices.

However, I don't think that's the whole source of the problem. I tried
starting node2 much later than node1 (node1 had actually been running for
about a day) and ended up in the same situation.
Pacemaker just doesn't wait long enough for the DRBDs to connect at all, and
seems to promote them both.
It really looks like a regression to me, as this always used to work well...
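
To put a number on "doesn't wait long enough", here is a small sketch that
measures the gap between Start and Promote for the second DRBD instance. The
two log lines are copied from the pengine output quoted further down, with the
mail wrapping undone; the helper name action_times and the placeholder year
are assumptions made only for this example.

```python
import re
from datetime import datetime

# Log excerpts copied from the quoted pengine output (wrapped lines rejoined).
LOG = """\
Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Start   drbd-sas0:1      (vmnci21)
Jul 10 06:01:36 vmnci20 pengine: [3568]: notice: LogActions: Promote drbd-sas0:1      (Slave -> Master vmnci21)
"""

# Timestamp, then the LogActions verb and the resource instance it applies to.
LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ pengine: \[\d+\]: notice: "
    r"LogActions: (?P<action>\w+)\s+(?P<rsc>\S+)"
)

# Syslog timestamps carry no year; this value is an arbitrary placeholder
# and only matters for parsing, not for the time difference.
PLACEHOLDER_YEAR = "2000"


def action_times(log, resource):
    """Map each LogActions verb seen for `resource` to its first timestamp."""
    times = {}
    for line in log.splitlines():
        m = LINE.match(line)
        if m and m.group("rsc") == resource:
            ts = datetime.strptime(
                PLACEHOLDER_YEAR + " " + m.group("ts"), "%Y %b %d %H:%M:%S"
            )
            times.setdefault(m.group("action"), ts)
    return times


times = action_times(LOG, "drbd-sas0:1")
gap = (times["Promote"] - times["Start"]).total_seconds()
print(f"Start -> Promote gap for drbd-sas0:1: {gap:.0f} s")
```

The same function can be pointed at a full log to compare the gap for each
instance; it only looks at pengine LogActions lines and ignores everything
else.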

Even though I've set no-quorum-policy to freeze, the problem returns as soon
as the cluster becomes quorate.
I have intentionally disabled all split-brain and fencing scripts in DRBD so
that I have a chance to investigate; otherwise one of the nodes always
committed suicide, but there should be no reason for a split brain.
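
For reference, the DRBD side of the fencing setup under discussion would look
roughly like the fragment below. It is only a sketch in DRBD 8.4 syntax: the
resource name r0 and the /usr/lib/drbd handler paths are assumptions (they
vary by distribution), and nothing here is taken from the actual config of
this cluster.

```
resource r0 {                       # "r0" is a hypothetical resource name
  net {
    allow-two-primaries;            # dual-primary, as in the quoted config
  }
  disk {
    fencing resource-and-stonith;   # freeze I/O until the peer is fenced
  }
  handlers {
    # handler paths are assumed; adjust to where the scripts are installed
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With resource-and-stonith, a working stonith device on the Pacemaker side is
also needed, which a test cluster on virtual machines without fencing devices
does not have.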

cheers!

nik




> D'oh.
> 
> And then, well,
> your nodes come up some minute+ after each other,
> and Pacemaker and DRBD behave exactly as configured:
> 
> 
> Jul 10 06:00:12 vmnci20 crmd: [3569]: info: do_state_transition: All 1 
> cluster nodes are eligible to run resources.
> 
> 
> Note the *1* ...
> 
> So it starts:
> Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Start   
> drbd-sas0:0      (vmnci20)
> 
> But leaves:
> Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Leave   
> drbd-sas0:1      (Stopped)
> as there is no peer node yet.
> 
> 
> And on the next iteration, we still have only one node:
> Jul 10 06:00:15 vmnci20 crmd: [3569]: info: do_state_transition: All 1 
> cluster nodes are eligible to run resources.
> 
> So we promote:
> Jul 10 06:00:15 vmnci20 pengine: [3568]: notice: LogActions: Promote 
> drbd-sas0:0      (Slave -> Master vmnci20)
> 
> 
> And only some minute later, the peer node joins:
> Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: State 
> transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: All 2 
> cluster nodes responded to the join offer.
> 
> So now we can start the peer:
> 
> Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Leave   
> drbd-sas0:0      (Master vmnci20)
> Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Start   
> drbd-sas0:1      (vmnci21)
> 
> 
> And it even is promoted right away:
> Jul 10 06:01:36 vmnci20 pengine: [3568]: notice: LogActions: Promote 
> drbd-sas0:1      (Slave -> Master vmnci21)
> 
> And within those 3 seconds, DRBD was not able to establish the connection yet.
> 
> 
> You configured DRBD and Pacemaker to produce data divergence.
> Not surprisingly, that is exactly what you get.
> 
> 
> 
> Fix your Problem.
> See above; hint: fencing resource-and-stonith,
> crm-fence-peer.sh + stonith_admin,
> add stonith, maybe add a third node so you don't need to ignore quorum,
> ...
> 
> And all will be well.
> 
> 
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-------------------------------------

