Re: [ClusterLabs] Recovering after split-brain

Dmitri Maziuk Mon, 20 Jun 2016 08:20:12 -0700

On 2016-06-20 09:13, Jehan-Guillaume de Rorthais wrote:

I've heard this kind of argument multiple times in the field, but sooner or
later these clusters actually had a split-brain scenario with clients connected
on both sides, some very bad corruption, data loss, etc.

I'm sure it's a very helpful answer but the question was about suspending pacemaker while I manually fix a problem with the resource.

I too would very much like to know how to get pacemaker to "unmonitor" my resources and not get in the way while I'm updating and/or fixing them.

In heartbeat, mon was a completely separate component that could be moved out of the way when needed.

In pacemaker I have now had to power-cycle the nodes several times because in a 2-node active/passive cluster without quorum and fencing, set up like

- drbd master-slave
- drbd filesystem (colocated and ordered after the master)
- symlink (colocated and ordered after the fs)
- service (colocated and ordered after the symlink)

-- when the service fails to start due to user error, pacemaker fscks up everything up to and including the master-slave drbd, and "clearing" errors on the service does not fix the symlink and the rest of it. (So far I've been unable to reliably reproduce it in testing environments; Murphy made sure it only happens on production clusters.)

Right now it seems to me that for drbd split brain I'll have to stop the cluster on the victim node, do manual split brain recovery, and restart the cluster after sync is complete. Is that correct?
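
Spelled out, the sequence I have in mind would look roughly like the sketch below. It only prints the commands for the victim node, it does not run anything. The resource name "r0" and the pcs stop/start commands are placeholders and not taken from my actual setup; the drbdadm calls assume DRBD 8.4 syntax (8.3 spells the discard step differently), and I'm also assuming the drbd device is still up after the cluster is stopped, which may well not hold if the stop action takes it down.

```python
import shlex


def split_brain_recovery_plan(resource, cluster_stop, cluster_start):
    """Return the ordered shell commands for the split-brain victim node.

    Nothing is executed here: the list is meant to be reviewed and run by hand.
    `resource`, `cluster_stop` and `cluster_start` are supplied by the caller
    because they differ per site (placeholders in the example below).
    """
    res = shlex.quote(resource)
    return [
        cluster_stop,                                # get pacemaker out of the way first
        f"drbdadm disconnect {res}",                 # drop the broken connection
        f"drbdadm secondary {res}",                  # the victim must not be primary
        f"drbdadm connect --discard-my-data {res}",  # DRBD 8.4 syntax (assumption)
        "cat /proc/drbd",                            # re-check by hand until sync is complete
        cluster_start,                               # hand control back to pacemaker
    ]


if __name__ == "__main__":
    # "r0" and the pcs commands are hypothetical examples.
    for cmd in split_brain_recovery_plan("r0", "pcs cluster stop", "pcs cluster start"):
        print(cmd)
```

If the surviving node has also dropped to StandAlone, it would presumably need a plain "drbdadm connect" on that side before the resync starts.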


Dimitri


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
