Thank you, Ken, for such a detailed response. I truly appreciate it. Cheers.
On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> > Hi,
> >
> > I am evaluating whether it is feasible to use Pacemaker + Corosync to add
> > support for clustering/redundancy into our product.
>
> Most definitely.
>
> > Our objectives:
> > 1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.
>
> You can do this with location constraints and scores. See:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
>
> Basically, you give the standby node a lower score than the other nodes.
>
> > 2) Each node has some different configuration parameters.
> > 3) Whenever any active node goes down, the standby node comes up with the
> > same configuration that the active had.
>
> How you solve this requirement depends on the specifics of your
> situation. Ideally, you can use OCF resource agents that take the
> configuration location as a parameter. You may have to write your own,
> if none is available for your services.
>
> > 4) There is no one single process/service for which we need redundancy,
> > rather it is the entire system (multiple processes running together).
>
> This is trivially implemented using either groups or ordering and
> colocation constraints.
>
> Order constraint = start service A before starting service B (and stop
> in reverse order)
>
> Colocation constraint = keep services A and B on the same node
>
> Group = shortcut to specify several services that need to start/stop in
> order and be kept together
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
>
> > 5) I would also want to be notified when any active<->standby state
> > transition happens as I would want to take some steps at the application
> > level.
>
> There are multiple approaches.
>
> If you don't mind compiling your own packages, the latest master branch
> (which will be part of the upcoming 1.1.14 release) has built-in
> notification capability. See:
> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>
> Otherwise, you can use SNMP or e-mail if your packages were compiled
> with those options, or you can use the ocf:pacemaker:ClusterMon resource
> agent:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
>
> > I went through the documents/blogs, but all had examples for the 1 active
> > and 1 standby use case, and only for a standard service like httpd.
>
> Pacemaker is incredibly versatile, and the use cases are far too varied
> to cover more than a small subset. Those simple examples show the basic
> building blocks, and can usually point you to the specific features you
> need to investigate further.
>
> > One additional question: if I have multiple actives, can a virtual IP
> > configuration still be used? Is it possible for the N actives to have
> > different IP addresses, with the standby taking over the IP address of
> > the failed node when it becomes active?
>
> Yes, there are a few approaches here, too.
>
> The simplest is to assign a virtual IP to each active, and include it in
> your group of resources. The whole group will fail over to the standby
> node if the original goes down.
>
> If you want a single virtual IP that is used by all your actives, one
> alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned,
> that resource agent will use iptables' CLUSTERIP functionality, which
> relies on multicast Ethernet addresses (not to be confused with
> multicast IP). Since multicast Ethernet has limitations, this is not
> often used in production.
>
> A more complicated method is to use a virtual IP in combination with a
> load-balancer such as haproxy. Pacemaker can manage haproxy and the real
> services, and haproxy manages distributing requests to the real services.
>
> > Thanks in advance.
> > Nikhil
>
> A last word of advice: Fencing (aka STONITH) is important for proper
> recovery from difficult failure conditions. Without it, it is possible
> to have data loss or corruption in a split-brain situation.
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
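
To make the scoring advice for objective 1 concrete, here is a minimal Python sketch of the idea that each group prefers its own node and falls back to a lower-scored standby. It is only a toy model and not Pacemaker's actual placement logic: the node names, group names, score values and the place() helper are all hypothetical, and it does not model fencing or stop two groups from landing on the standby at once.

```python
# Toy model of score-based placement. This is NOT Pacemaker's scheduler;
# node names, group names and score values are made up for illustration.

def place(scores, online):
    """Return the online node with the highest score, or None if none is online."""
    candidates = {node: score for node, score in scores.items() if node in online}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Each group prefers its own node (100) over the shared standby (50),
# i.e. the standby gets a lower score than the other nodes.
location_scores = {
    "group-a": {"node1": 100, "standby": 50},
    "group-b": {"node2": 100, "standby": 50},
}

def placement(online):
    """Map every group to the node it would run on, given the online nodes."""
    return {group: place(scores, online) for group, scores in location_scores.items()}

all_up = placement({"node1", "node2", "standby"})
node1_down = placement({"node2", "standby"})
print(all_up)
print(node1_down)
```

In this model group-a moves to the standby only when node1 is offline, and moves straight back once node1 returns. In a real cluster, whether a resource moves back also depends on settings such as resource stickiness.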