Matthew Palmer wrote:
> On Thu, Mar 11, 2010 at 03:34:50PM +0800, Martin Aspeli wrote:
>> I was wondering, though, if fencing at the DRBD level would get around
>> the possible problem of a full power outage taking the fencing device
>> down.

>> In my poor understanding of things, it'd work like this:
>>
>>   - Pacemaker runs on master and slave
>>   - Master loses all power
>>   - Pacemaker on slave notices something is wrong, and prepares to
>>     start up postgres on slave, which will now also be the one writing
>>     to the DRBD disk
>>   - Before it can do that, it wants to fence off DRBD
>>   - It does that by saying to the local DRBD, "even if the other node
>>     tries to send you stuff, ignore it". This would avoid the risk of
>>     data corruption on slave. Before master could come back up, it'd
>>     need to wipe its local partition and re-sync from slave (which is
>>     now the new primary).

> The old master shouldn't need to "wipe" anything, as it should have no data
> that the new master didn't have at the time of the power failure.

I was just thinking that if the failure was, e.g., the connection between the master and the rest of the cluster, postgres on the old master could stay up and merrily keep writing to the filesystem on the DRBD device.

In the case of a power failure, that wouldn't happen, of course. But in a total power failure, the fencing device (an IPMI device, a Dell DRAC) would be inaccessible too, so the cluster would not fail postgres over.

> The piece of the puzzle I think you're missing is that DRBD will never be
> ready for service on a node unless one of the following conditions is true:
>
> * Both nodes have talked to each other and agreed that they're ready to
>   exchange data (either because of a clean start on both sides, because
>   you've manually prodded a rebooted node into operation again, or because
>   a split-brain handler dealt with any issues); or
>
> * A failed node has been successfully fenced and the cluster manager has
>   notified DRBD of this fact.

Right.
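
If I'm reading the DRBD docs correctly, the second condition corresponds to a drbd.conf fragment along these lines (the resource name "r0" is just a placeholder; the handler paths are the ones the DRBD 8.3 docs give for Pacemaker integration):

```
resource r0 {
  disk {
    # Suspend I/O on the survivor until the peer is known to be fenced:
    fencing resource-and-stonith;
  }
  handlers {
    # Run before I/O is resumed; with Pacemaker this script adds a
    # constraint that keeps the peer from becoming primary:
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Run after a successful resync, to lift that constraint again:
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With `fencing resource-only;` instead, DRBD would invoke the handler but not block I/O while waiting, which is the weaker variant.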

> In the case you suggest, where the whole of node "A" disappears, you may
> well have a fencing problem: because node "B" can't positively confirm that
> "A" is, in fact, dead (because the DRAC went away too), it may refuse to
> confirm the fencing operation (this is why using DRAC/IPMI as a STONITH
> device isn't such a win).

From what I'm reading, the only truly reliable fencing device is a UPS that can cut power to an individual node. Unfortunately, we don't have such a device and can't get one. We do have a UPS with a backup generator, and dual PSUs, so a total power outage is unlikely. But someone could also just pull the (two) power cables out of the UPS, and Pacemaker would be none the wiser.

> On the other hand, the DRAC STONITH handler may
> assume that if it can't talk to a DRAC unit, that the machine is fenced (I
> don't know which way it goes, I haven't looked).

The docs say it will assume the device is *not* fenced, and will keep trying to fence it "forever", hence never actually failing over.

What I don't get is: if this happens, why can't the slave just say, "I'm going to assume master is gone, take over postgres, and refuse to let anyone else write anything to my disk"? To my mind, this is similar to having a shared SAN where the fencing operation is "node master is no longer allowed to mount or write to the SAN disk, even if it tries".
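
From what I can tell, that is roughly what DRBD's crm-fence-peer.sh handler does: it inserts a Pacemaker location constraint forbidding the master role anywhere but the surviving node. In crm shell terms it would look something like this (the master/slave resource name "ms_drbd_r0" and the node name "slave" are placeholders for our setup):

```
# Sketch of the constraint the fence-peer handler effectively places:
# "nobody except node 'slave' may run ms_drbd_r0 in the Master role"
location drbd-fence-by-handler-ms_drbd_r0 ms_drbd_r0 \
    rule $role="Master" -inf: #uname ne slave
```

The matching after-resync-target handler then removes the constraint once the old master has reconnected and resynced.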

Martin

--
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book


_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
