Florian (and all), thanks for the reply. I've gone over past threads on the DRBD list as you suggested, and found only this one: http://archives.free.net.ph/message/20090909.131635.ef640f6a.en.html
I am not entirely certain what specific problem the one-separate-cluster-at-each-site design addresses that one-node-on-each-site does not. From the above thread, the only roadblock explicitly mentioned was setting up cross-site multicast routing, which needs to be made to work. Fair enough.

I'd like to get a clear idea of what the roadblocks *actually are* (not on a "the WAN link" level, but what the WAN link *actually breaks*) to doing what I suggested. Assuming I can get multicast routing to work, are there any other specific reasons the design wouldn't work?

To recap, in my proposed solution an outage will result in four things:

1. A "race" by both nodes to a 3rd site, to perform an atomic operation (a mkdir, for instance). Following it, it will be abundantly clear to both nodes "who is right, and who is dead".

2. A hard-iLO-poweroff STONITH (NOT reboot!) from the winner to the loser's iLO. The winner can also iptables-block all comms from the loser until further notice as an extra safety net.

3. A hard-own-iLO-poweroff-else-kernel-halt SMITH (NOT reboot!) suicide by the loser (SMITH is our pet acronym for Shoot-Myself-...).

4. A "WAN-PROBLEM=[true|false]" flag immediately raised (locally) by the winner, based on pinging the OTHER SITE's ROUTER. A separate resource on the winner will, in the presence of this flag, monitor that same router at the other site for signs of life. When the other site comes back up (perhaps and-stays-up-for-an-hour, or some similar flap-avoiding logic), it issues a POWERON to the other node's iLO. That node will come back up as a DRBD slave, resync, and get re-promoted to master.

As an attractive side benefit, this is a deathmatch-proof design.

NOTE: There's a departure from common wisdom here, and I am not sure whether this is one of the issues you're pointing at.
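To make steps 1 and 4 concrete, here is a rough sketch in Python. It is an illustration only, not what we actually run: try_claim, StableUpDetector, the incident id and the one-hour hold time are all made-up names and values, and a local temporary directory stands in for the 3rd site. Whether mkdir really is atomic on whatever filesystem the 3rd site exports is an assumption that would need to be verified.

```python
import os
import tempfile


def try_claim(arbiter_dir: str, incident_id: str) -> bool:
    """Step 1: return True if this node won the race for this incident.

    os.mkdir either creates the directory or raises FileExistsError,
    so at most one caller can succeed for a given path (assuming the
    underlying filesystem makes mkdir atomic).
    """
    try:
        os.mkdir(os.path.join(arbiter_dir, incident_id))
        return True
    except FileExistsError:
        return False


class StableUpDetector:
    """Step 4: flap avoidance for the "other site is back" decision.

    observe() returns True only once the remote router has been
    continuously reachable for at least hold_seconds.
    """

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds
        self._up_since = None  # timestamp of the first ping in the current "up" run

    def observe(self, reachable: bool, now: float) -> bool:
        if not reachable:
            # Any failed ping resets the clock.
            self._up_since = None
            return False
        if self._up_since is None:
            self._up_since = now
        return now - self._up_since >= self.hold_seconds


if __name__ == "__main__":
    arbiter = tempfile.mkdtemp()  # hypothetical stand-in for the 3rd site
    print("node A wins:", try_claim(arbiter, "outage-1"))
    print("node B wins:", try_claim(arbiter, "outage-1"))

    detector = StableUpDetector(hold_seconds=3600)
    for t, ok in [(0, True), (1800, True), (2000, False), (2100, True), (5700, True)]:
        print(t, ok, "-> power on peer:", detector.observe(ok, now=t))
```

The winner would go on to step 2 and the loser to step 3. The detector only decides when it is safe to issue the POWERON; the pinging itself and the iLO calls are left out on purpose.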
Common wisdom states: SMITH BAD, not reliable (obvious reasons: no success/failure confirmation, etc.)

In this solution I claim: SMITH BAD, not reliable, except in one specific failure mode (WAN outage), where SMITH GOOD, is reliable, and its shortcomings can be worked around.

Both steps [2] and [3] are issued on EVERY TYPE of outage, regardless of whether it's WAN-related or not. In non-WAN issues the loser is considered compromised, which makes [3] unreliable, but [2] is reliable. In WAN issues the WAN is considered compromised, which makes [2] unreliable, but the node itself is sound, so [3] is still reliable.

To sum up, it looks to me like the "data safety" is provided by the layer underneath DRBD, not by DRBD itself. If that layer works as advertised, DRBD should have no problem, and we have a system sufficiently reliable to withstand any scenario short of a double failure.

... thoughts?

--

-----Original Message-----
From: Florian Haas [mailto:florian.h...@linbit.com]
Sent: Monday, 18 January 2010 9:36 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Split Site 2-way clusters

On 2010-01-18 11:14, Andrew Beekhof wrote:
> On Thu, Jan 14, 2010 at 11:44 PM, Miki Shapiro
> <miki.shap...@coles.com.au> wrote:
>> Confused.
>>
>> I *am* running DRBD in dual-master mode
>
> /me cringes... this sounds to me like an impossibly dangerous idea.
> Can someone from linbit comment on this please? Am I imagining this?

Dual-Primary DRBD in a split site cluster? Really really bad idea. Anyone attempting this, please search the drbd-user archives for multiple discussions about this in the past. Then reconsider.

Hope that makes it clear enough.
Florian

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker