On 2016-06-20 09:13, Jehan-Guillaume de Rorthais wrote:

I've heard this kind of argument in the field many times, but sooner or later
these clusters did end up in a split-brain scenario with clients connected on
both sides, some very bad corruption, lost data, etc.

I'm sure it's a very helpful answer, but the question was about suspending pacemaker while I manually fix a problem with the resource.

I too would very much like to know how to get pacemaker to "unmonitor" my resources and not get in the way while I'm updating and/or fixing them.

In heartbeat, mon was a completely separate component that could be moved out of the way when needed.

With pacemaker I have now had to power-cycle the nodes several times. In a 2-node active/passive cluster without quorum and fencing, set up like this:
- drbd master-slave
- drbd filesystem (colocated and ordered after the master)
- symlink (colocated and ordered after the fs)
- service (colocated and ordered after the symlink)
when the service fails to start due to user error, pacemaker fscks up everything up to and including the master-slave drbd, and "clearing" the errors on the service does not fix the symlink and the rest of it. (So far I've been unable to reliably reproduce it in testing environments; Murphy made sure it only happens on production clusters.)
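
To make the dependency chain above concrete, here is a small illustrative sketch in Python. It is not pacemaker code or configuration: the resource names (drbd_ms, fs, symlink, service) and both helper functions are made up for illustration. The only behavior it assumes is that ordering constraints are symmetrical by default, so resources stop in the reverse of their start order.

```python
# Ordering chain from the list above: key starts after value.
# All names are hypothetical stand-ins for the real resource ids.
ORDER_AFTER = {
    "fs": "drbd_ms",       # filesystem after the promoted drbd master
    "symlink": "fs",       # symlink after the filesystem
    "service": "symlink",  # service after the symlink
}


def start_order(order_after):
    """Return the resources of a simple linear chain in start order."""
    # The root is the single resource that is not ordered after anything.
    (root,) = set(order_after.values()) - set(order_after)
    next_of = {before: after for after, before in order_after.items()}
    chain = [root]
    while chain[-1] in next_of:
        chain.append(next_of[chain[-1]])
    return chain


def must_stop_first(resource, order_after):
    """Return what has to stop before `resource` can stop, in stop order."""
    chain = start_order(order_after)
    return list(reversed(chain[chain.index(resource) + 1:]))


print(start_order(ORDER_AFTER))
# ['drbd_ms', 'fs', 'symlink', 'service']
print(must_stop_first("drbd_ms", ORDER_AFTER))
# ['service', 'symlink', 'fs']
```

The sketch only shows how tightly the four resources are coupled: nothing can be done to the drbd master without first taking down everything stacked on top of it. It does not model colocation scores or failure handling, so it does not by itself explain the behavior described above.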

Right now it seems to me that for a drbd split brain I'll have to stop the cluster on the victim node, do the manual split-brain recovery, and restart the cluster after the sync is complete. Is that correct?

Dimitri


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
