I wouldn't be doing anything without corosync2 and its option that requires all nodes to be online before quorum is granted. Otherwise I can imagine ways that the old master might try to promote itself.
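(For reference, the corosync 2 behaviour in question is presumably votequorum's wait_for_all; a minimal corosync.conf sketch, with the expected_votes value as an illustrative assumption only:)

  quorum {
          provider: corosync_votequorum
          expected_votes: 3
          # quorum is first granted only after all nodes have been seen at
          # the same time, so a rebooted old master cannot win quorum alone
          wait_for_all: 1
  }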
On 02/07/2013, at 7:18 PM, Michael Schwartzkopff <mi...@clusterbau.com> wrote:

> On Tuesday, 2 July 2013 at 09:47:31, Stefano Sasso wrote:
> > Hello folks,
> >   I have the following setup in mind, but I need some advice and one
> > hint on how to realize a particular function.
> >
> > I have an N (>= 2) node cluster, with data storage on postgresql.
> > I would like to manage postgres master-slave replication in this way:
> > one node is the "master", one is the "slave", and the others are
> > "standby" nodes.
> > If the master fails, the slave becomes the master, and one of the
> > standbys becomes the slave.
> > If the slave fails, one of the standbys becomes the new slave.
> > If one of the "standby" nodes fails, no problem :)
> > I can correctly manage this configuration with ms and a custom script
> > (using ocf:pacemaker:Stateful as an example). If the cluster is
> > already operational, the failover works fine.
> >
> > My problem is about cluster start-up: only the previously running
> > master and slave own the most up-to-date data, so I would like the new
> > master to be the "old master" (or even the old slave), and the new
> > slave to be the "old slave" (but this one is not mandatory). The
> > important thing is that the new master should have up-to-date data.
> > This should happen even if the servers are booted up with some minutes
> > of delay between them (users are very stupid sometimes).
> >
> > My idea is the following:
> > the MS resource is not started when the cluster comes up; on startup
> > there is only one "arbitrator" resource (started on only one node).
> > This resource reads from somewhere which node was the previous master
> > and which was the previous slave, and it waits up to 5 minutes to see
> > if one of them comes up. In the positive case, it forces the MS master
> > resource to run on that node (and starts it); in the negative case,
> > once the wait timer has expired, it starts the master resource on a
> > random node.
>
> Hi,
>
> another possible way to achieve your goal is to add resource-level
> fencing to your custom postgresql resource agent.
>
> If a node leaves the cluster and this node is NOT the running master or
> slave of the postgresql instance, the surviving nodes add a location
> constraint to the CIB that prevents a postgresql instance from running
> on that lost node:
>
> loc ResFence_postgresql msPostgresql -inf: <lost node>
>
> When the node comes back online and is visible in the cluster again,
> your resource agent should remove that constraint again.
>
> This constraint can also be set when the resource is stopping on a
> node, so you can use the notify action to achieve this from another
> node. Only on the nodes where postgresql is running is there NO
> location constraint. If there are any changes in the config (a node
> leaves), the notify action checks whether adding or removing such a
> location constraint is necessary.
>
> I did not work this out in full. Perhaps you have to think about the
> problem again.
>
> Please see the resource-level fencing of the drbd agent from Linbit and
> the drbd configuration:
> http://www.drbd.org/users-guide/s-pacemaker-fencing.html, section 8.3.2
>
> --
> Dr. Michael Schwartzkopff
> Guardinistr. 63
> 81375 München
>
> Tel: (0163) 172 50 98

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
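(For reference, a rough crm-shell sketch of the constraint add/remove cycle Michael describes; the ResFence_postgresql and msPostgresql names come from his example, while the per-node constraint id and the helper function names are hypothetical, not part of any existing agent:)

  # hypothetical helpers a notify action could call; not a complete agent
  fence_node() {
      local lost_node="$1"
      # forbid msPostgresql on the departed node so a stale copy never starts there
      crm configure location ResFence_postgresql_${lost_node} msPostgresql -inf: ${lost_node}
  }

  unfence_node() {
      local returned_node="$1"
      # lift the ban once the node is visible in the cluster again
      crm configure delete ResFence_postgresql_${returned_node}
  }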