I have prepared a write-up explaining my requirements and the solution I am currently proposing based on my understanding so far. Kindly let me know whether what I am proposing is good or whether there is a better way to achieve the same.
https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing

Let me know if you face any issue in accessing the above link.

Thanks.

On Thu, Dec 3, 2015 at 11:34 PM, Ken Gaillot <kgail...@redhat.com> wrote:

> On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> > Ken,
> >
> > One more question: if I have to propagate configuration changes between
> > the nodes, then is CPG (closed process group) the right way?
> > For example:
> > Active Node1 has config A=1, B=2
> > Active Node2 has config A=3, B=4
> > The Standby Node needs to have the configuration for all the nodes, such
> > that whichever goes down, it comes up with those values.
> > Here the configuration is not static but can be updated at run-time.
>
> Being unfamiliar with the specifics of your case, I can't say what the
> best approach is, but it sounds like you will need to write a custom OCF
> resource agent to manage your service.
>
> A resource agent is similar to an init script:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
>
> The RA will start the service with the appropriate configuration. It can
> use per-resource options configured in Pacemaker or external information
> to do that.
>
> How does your service get its configuration currently?
>
> > BTW, I'm a little confused between OpenAIS and Corosync. For my purpose
> > I should be able to use either, right?
>
> Corosync started out as a subset of OpenAIS, optimized for use with
> Pacemaker. Corosync 2 is now the preferred membership layer for
> Pacemaker for most uses, though other layers are still supported.
>
> > Thanks.
> >
> > On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> >
> >> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> >>> Hi,
> >>>
> >>> I am evaluating whether it is feasible to use Pacemaker + Corosync to
> >>> add support for clustering/redundancy into our product.
> >>
> >> Most definitely.
> >>
> >>> Our objectives:
> >>> 1) Support N+1 redundancy, i.e. N active and (up to) 1 standby.
> >>
> >> You can do this with location constraints and scores. See:
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
> >>
> >> Basically, you give the standby node a lower score than the other nodes.
> >>
> >>> 2) Each node has some different configuration parameters.
> >>> 3) Whenever any active node goes down, the standby node comes up with
> >>> the same configuration that the active had.
> >>
> >> How you solve this requirement depends on the specifics of your
> >> situation. Ideally, you can use OCF resource agents that take the
> >> configuration location as a parameter. You may have to write your own,
> >> if none is available for your services.
> >>
> >>> 4) There is no one single process/service for which we need redundancy;
> >>> rather, it is the entire system (multiple processes running together).
> >>
> >> This is trivially implemented using either groups or ordering and
> >> colocation constraints.
> >>
> >> Order constraint = start service A before starting service B (and stop
> >> in reverse order)
> >>
> >> Colocation constraint = keep services A and B on the same node
> >>
> >> Group = shortcut to specify several services that need to start/stop in
> >> order and be kept together
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
> >>
> >>> 5) I would also want to be notified when any active<->standby state
> >>> transition happens, as I would want to take some steps at the
> >>> application level.
> >>
> >> There are multiple approaches.
> >>
> >> If you don't mind compiling your own packages, the latest master branch
> >> (which will be part of the upcoming 1.1.14 release) has built-in
> >> notification capability. See:
> >> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
> >>
> >> Otherwise, you can use SNMP or e-mail if your packages were compiled
> >> with those options, or you can use the ocf:pacemaker:ClusterMon
> >> resource agent:
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
> >>
> >>> I went through the documents/blogs, but all had examples for the
> >>> 1 active and 1 standby use-case, and that too for some standard
> >>> service like httpd.
> >>
> >> Pacemaker is incredibly versatile, and the use cases are far too varied
> >> to cover more than a small subset. Those simple examples show the basic
> >> building blocks, and can usually point you to the specific features you
> >> need to investigate further.
> >>
> >>> One additional question: if I have multiple actives, can Virtual IP
> >>> configuration still be used? Is it possible such that N actives have
> >>> different IP addresses, but whenever the standby becomes active it
> >>> uses the IP address of the failed node?
> >>
> >> Yes, there are a few approaches here, too.
> >>
> >> The simplest is to assign a virtual IP to each active, and include it in
> >> your group of resources. The whole group will fail over to the standby
> >> node if the original goes down.
> >>
> >> If you want a single virtual IP that is used by all your actives, one
> >> alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned,
> >> that resource agent will use iptables' CLUSTERIP functionality, which
> >> relies on multicast Ethernet addresses (not to be confused with
> >> multicast IP). Since multicast Ethernet has limitations, this is not
> >> often used in production.
> >>
> >> A more complicated method is to use a virtual IP in combination with a
> >> load balancer such as haproxy. Pacemaker can manage haproxy and the
> >> real services, and haproxy manages distributing requests to the real
> >> services.
> >>
> >>> Thanks in advance.
> >>> Nikhil
> >>
> >> A last word of advice: Fencing (aka STONITH) is important for proper
> >> recovery from difficult failure conditions. Without it, it is possible
> >> to have data loss or corruption in a split-brain situation.
>
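To make the custom resource agent idea above concrete, a minimal sketch might look roughly like the following. Everything here is hypothetical (the myservice name, daemon path, pidfile, and the config parameter); a production agent would also implement validate-all, wait for the daemon to actually exit in stop, and set realistic timeouts.

#!/bin/sh
# Minimal OCF resource agent sketch; names, paths, and the "config"
# parameter are hypothetical placeholders.

: "${OCF_ROOT:=/usr/lib/ocf}"
. "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs"   # defines OCF_SUCCESS etc.

CONF="${OCF_RESKEY_config:-/etc/myservice/myservice.conf}"  # per-resource parameter
PIDFILE="/var/run/myservice.pid"
DAEMON="/usr/sbin/myservice"                                # hypothetical daemon

myservice_monitor() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null \
        && return "$OCF_SUCCESS"
    return "$OCF_NOT_RUNNING"
}

myservice_start() {
    myservice_monitor && return "$OCF_SUCCESS"      # already running
    "$DAEMON" --config "$CONF" --pidfile "$PIDFILE" || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

myservice_stop() {
    myservice_monitor || return "$OCF_SUCCESS"      # already stopped
    kill "$(cat "$PIDFILE")"
    rm -f "$PIDFILE"
    return "$OCF_SUCCESS"
}

case "$1" in
    start)     myservice_start ;;
    stop)      myservice_stop ;;
    monitor)   myservice_monitor ;;
    meta-data)
        # Abbreviated; a real agent documents each parameter and action.
        cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="myservice">
  <version>0.1</version>
  <longdesc lang="en">Example agent for an N+1 service</longdesc>
  <shortdesc lang="en">Example agent</shortdesc>
  <parameters>
    <parameter name="config" unique="1">
      <longdesc lang="en">Path to this instance's config file</longdesc>
      <shortdesc lang="en">config file</shortdesc>
      <content type="string" default="/etc/myservice/myservice.conf"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
        ;;
    *)         exit "$OCF_ERR_UNIMPLEMENTED" ;;
esac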
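The location scores, groups, and per-node virtual IPs described above could then be tied together with pcs along these lines, assuming two actives (node1, node2) plus one standby (node3); all resource names, node names, and addresses are made up for illustration:

# Placeholder names throughout: node1/node2 are the actives, node3 is the
# shared standby; ocf:mycompany:myservice is the custom agent sketched above.

# One group per active instance: its virtual IP plus the service itself.
# Group members start in the listed order and always run on the same node.
pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=192.168.1.101 cidr_netmask=24
pcs resource create app1 ocf:mycompany:myservice config=/etc/myservice/node1.conf
pcs resource group add grp1 vip1 app1

pcs resource create vip2 ocf:heartbeat:IPaddr2 ip=192.168.1.102 cidr_netmask=24
pcs resource create app2 ocf:mycompany:myservice config=/etc/myservice/node2.conf
pcs resource group add grp2 vip2 app2

# Give each group's home node the highest score and the standby a lower one,
# so a group moves to node3 only when its own node fails.
pcs constraint location grp1 prefers node1=200
pcs constraint location grp1 prefers node3=100
pcs constraint location grp2 prefers node2=200
pcs constraint location grp2 prefers node3=100

# Optionally keep each group off the other active node entirely:
pcs constraint location grp1 avoids node2
pcs constraint location grp2 avoids node1

The explicit scores simply make node3 the least-preferred location for every group, which is what gives the N+1 behaviour: each group runs on its own node while healthy and falls back to the shared standby on failure.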
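For the notification requirement, one shape the ocf:pacemaker:ClusterMon approach mentioned above might take is a cloned resource that invokes an external script via crm_mon's external-agent option; the handler script path here is a placeholder:

# Runs crm_mon as a daemon on every node and calls the given script (a
# placeholder path) whenever a cluster event occurs; the script receives
# event details in CRM_notify_* environment variables.
pcs resource create cluster-notify ocf:pacemaker:ClusterMon \
    extra_options="-E /usr/local/bin/cluster_event_handler.sh" --clone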