Re: [ClusterLabs] Help required for N+1 redundancy setup
Thanks Ken for the detailed response. I suppose I could even use some of
the pcs/crm CLI commands then. Cheers.
Re: [ClusterLabs] Help required for N+1 redundancy setup
On 03/16/2016 05:22 AM, Nikhil Utane wrote:
> I see following info gets updated in CIB. Can I use this or there is better
> way?
>
> crm-debug-origin="peer_update_callback" join="*down*" expected="member">

in_ccm/crmd/join reflect the current state of the node (as known by the
partition that you're looking at the CIB on), so if the node went down
and came back up, it won't tell you anything about being down.

- in_ccm indicates that the node is part of the underlying cluster layer
  (heartbeat/cman/corosync)

- crmd indicates that the node is communicating at the pacemaker layer

- join indicates what phase of the join process the node is at

There's not a direct way to see what node went down after the fact.
There are ways however:

- if the node was running resources, those will be failed, and those
  failures (including node) will be shown in the cluster status

- the logs show all node membership events; you can search for logs such
  as "state is now lost" and "left us"

- "stonith -H $NODE_NAME" will show the fence history for a given node,
  so if the node went down due to fencing, it will show up there

- you can configure an ocf:pacemaker:ClusterMon resource to run crm_mon
  periodically and run a script for node events, and you can write the
  script to do whatever you want (email you, etc.) (in the upcoming 1.1.15
  release, built-in notifications will make this more reliable and easier,
  but any script you use with ClusterMon will still be usable with the new
  method)
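The log-search approach described above can be scripted. The sketch below greps sample log entries for the membership-loss messages Ken mentions; the log path and exact wording vary by distribution and Pacemaker version, so the sample entries are illustrative only:

```shell
# Sample entries standing in for /var/log/messages or the pacemaker/corosync
# log (message wording varies by version).
cat > /tmp/sample-cluster.log <<'EOF'
Mar 16 08:15:02 node6 crmd[1234]:   notice: Node node1 state is now lost
Mar 16 08:15:02 node6 pacemakerd[1230]:  warning: node1 left us
EOF

# Pull the node name out of membership-loss messages
grep -oE 'Node [^ ]+ state is now lost' /tmp/sample-cluster.log \
  | awk '{print $2}'
```

On a live cluster the same grep can be pointed at the real log, or run from a ClusterMon-triggered script.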
Re: [ClusterLabs] Help required for N+1 redundancy setup
I see following info gets updated in CIB. Can I use this or there is
better way?
Re: [ClusterLabs] Help required for N+1 redundancy setup
Hi Ken,

Sorry about the long delay. This activity was de-focussed but now it's
back on track.

One part of the question that is still not answered is: on the newly
active node, how to find out which was the node that went down?
Anything that gets updated in the status section that can be read and
figured out?

Thanks.
Nikhil
Re: [ClusterLabs] Help required for N+1 redundancy setup
On 01/08/2016 11:13 AM, Nikhil Utane wrote:
>> I think stickiness will do what you want here. Set a stickiness higher
>> than the original node's preference, and the resource will want to stay
>> where it is.
>
> Not sure I understand this. Stickiness will ensure that resources don't
> move back when original node comes back up, isn't it?
> But in my case, I want the newly standby node to become the backup node for
> all other nodes. i.e. it should now be able to run all my resource groups
> albeit with a lower score. How do I achieve that?

Oh right. I forgot to ask whether you had an opt-out
(symmetric-cluster=true, the default) or opt-in
(symmetric-cluster=false) cluster. If you're opt-out, every node can run
every resource unless you give it a negative preference.

Partly it depends on whether there is a good reason to give each
instance a "home" node. Often, there's not. If you just want to balance
resources across nodes, the cluster will do that by default.

If you prefer to put certain resources on certain nodes because the
resources require more physical resources (RAM/CPU/whatever), you can
set node attributes for that and use rules to set node preferences.

Either way, you can decide whether you want stickiness with it.

> Also can you answer, how to get the values of node that goes active and the
> node that goes down inside the OCF agent? Do I need to use notification or
> some simpler alternative is available?
> Thanks.
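The opt-out-plus-stickiness setup described above can be expressed as a small configuration fragment. This is a sketch in crmsh-style syntax (the stickiness value of 1000 is an assumption chosen to outweigh the 500-point "home" preference from the thread's example; adjust for your scores):

```
# crm configure fragment (crmsh syntax assumed):
# opt-out cluster, with stickiness high enough that a resource stays
# on the standby node rather than failing back automatically
property symmetric-cluster=true
rsc_defaults resource-stickiness=1000
```

With pcs, the equivalent would be `pcs property set` and `pcs resource defaults`.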
Re: [ClusterLabs] Help required for N+1 redundancy setup
Would like to validate my final config.

As I mentioned earlier, I will be having (up to) 5 active servers and 1
standby server.
The standby server should take up the role of the active that went down.
Each active has some unique configuration that needs to be preserved.

1) So I will create a total of 5 groups. Each group has a
"heartbeat::IPaddr2" resource (for virtual IP) and my custom resource.
2) The virtual IP needs to be read inside my custom OCF agent, so I will
make use of attribute reference and point to the value of IPaddr2 inside
my custom resource to avoid duplication.
3) I will then configure location constraints to run the group resources
on the 5 active nodes with a higher score and a lesser score on standby.
For e.g.:

Group      Node    Score
------------------------
MyGroup1   node1   500
MyGroup1   node6   0

MyGroup2   node2   500
MyGroup2   node6   0
..
MyGroup5   node5   500
MyGroup5   node6   0

4) Now if say node1 were to go down, then the stop action on node1 will
first get called. Haven't decided if I need to do anything specific here.
5) But when the start action on node 6 gets called, then using the crm
command-line interface, I will modify the above config to swap node 1 and
node 6, i.e.:

MyGroup1   node6   500
MyGroup1   node1   0

MyGroup2   node2   500
MyGroup2   node1   0

6) To do the above, I need the newly active and newly standby node names
to be passed to my start action. What's the best way to get this
information inside my OCF agent?
7) Apart from node name, there will be other information which I plan to
pass by making use of node attributes. What's the best way to get this
information inside my OCF agent? Use a crm command to query?

Thank You.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
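The score table in step 3 above maps directly to location constraints. A sketch in crmsh syntax, using the group and node names from the example (constraint IDs are made up):

```
location loc-MyGroup1-node1 MyGroup1 500: node1
location loc-MyGroup1-node6 MyGroup1   0: node6
location loc-MyGroup2-node2 MyGroup2 500: node2
location loc-MyGroup2-node6 MyGroup2   0: node6
```

The same pattern repeats for MyGroup3 through MyGroup5, each preferring its own node at 500 with the standby node6 at 0.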
Re: [ClusterLabs] Help required for N+1 redundancy setup
On 01/08/2016 06:55 AM, Nikhil Utane wrote:
> Would like to validate my final config.
>
> As I mentioned earlier, I will be having (upto) 5 active servers and 1
> standby server.
> The standby server should take up the role of active that went down. Each
> active has some unique configuration that needs to be preserved.
> [...]
> 4) Now if say node1 were to go down, then stop action on node1 will first
> get called. Haven't decided if I need to do anything specific here.

To clarify, if node1 goes down intentionally (e.g. standby or stop),
then all resources on it will be stopped first. But if node1 becomes
unavailable (e.g. crash or communication outage), it will get fenced.

> 5) But when the start action of node 6 gets called, then using crm command
> line interface, I will modify the above config to swap node 1 and node 6.
> [...]
> 6) To do the above, I need the newly active and newly standby node names to
> be passed to my start action. What's the best way to get this information
> inside my OCF agent?

Modifying the configuration from within an agent is dangerous -- too
much potential for feedback loops between pacemaker and the agent.

I think stickiness will do what you want here. Set a stickiness higher
than the original node's preference, and the resource will want to stay
where it is.

> 7) Apart from node name, there will be other information which I plan to
> pass by making use of node attributes. What's the best way to get this
> information inside my OCF agent? Use crm command to query?

Any of the command-line interfaces for doing so should be fine, but I'd
recommend using one of the lower-level tools (crm_attribute or
attrd_updater) so you don't have a dependency on a higher-level tool
that may not always be installed.
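Querying a node attribute from inside an agent with crm_attribute might look like the sketch below. The `scope=... name=... value=...` output format is assumed from crm_attribute's query mode, and `my_param` is a hypothetical attribute; on a live cluster, replace the simulated output with the real command:

```shell
# Simulated output of:
#   crm_attribute --type nodes --node node6 --name my_param --query
query_output='scope=nodes  name=my_param value=standby-for-node1'

# Strip everything up to "value=" to get just the value
value=${query_output##*value=}
echo "$value"
```

The same parsing works for attrd_updater's query output with a different field layout, so check the format on your version first.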
Re: [ClusterLabs] Help required for N+1 redundancy setup
Thanks to you Ken for giving all the pointers.
Yes, I can use service start/stop which should be a lot simpler. Thanks
again. :)

On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot wrote:
> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
>> I have prepared a write-up explaining my requirements and current solution
>> that I am proposing based on my understanding so far.
>> Kindly let me know if what I am proposing is good or there is a better way
>> to achieve the same.
>>
>> https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing
>>
>> Let me know if you face any issue in accessing the above link. Thanks.
>
> This looks great. Very well thought-out.
>
> One comment:
>
> "8. In the event of any failover, the standby node will get notified
> through an event and it will execute a script that will read the
> configuration specific to the node that went down (again using
> crm_attribute) and become active."
>
> It may not be necessary to use the notifications for this. Pacemaker
> will call your resource agent with the "start" action on the standby
> node, after ensuring it is stopped on the previous node. Hopefully the
> resource agent's start action has (or can have, with configuration
> options) all the information you need.
>
> If you do end up needing notifications, be aware that the feature will
> be disabled by default in the 1.1.14 release, because changes in syntax
> are expected in further development. You can define a compile-time
> constant to enable them.
Re: [ClusterLabs] Help required for N+1 redundancy setup
I have prepared a write-up explaining my requirements and the solution I am proposing based on my understanding so far. Kindly let me know if what I am proposing is good or there is a better way to achieve the same.

https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing

Let me know if you face any issue in accessing the above link. Thanks.

On Thu, Dec 3, 2015 at 11:34 PM, Ken Gaillot wrote:

> On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> > Ken,
> >
> > One more question: if I have to propagate configuration changes between the nodes, is cpg (closed process group) the right way?
> > For example:
> > Active Node1 has config A=1, B=2
> > Active Node2 has config A=3, B=4
> > The standby node needs to have the configuration for all the nodes, so that whichever goes down, it comes up with those values.
> > Here the configuration is not static but can be updated at run-time.
>
> Being unfamiliar with the specifics of your case, I can't say what the best approach is, but it sounds like you will need to write a custom OCF resource agent to manage your service.
>
> A resource agent is similar to an init script:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
>
> The RA will start the service with the appropriate configuration. It can use per-resource options configured in Pacemaker or external information to do that.
>
> How does your service get its configuration currently?
>
> > BTW, I'm a little confused between OpenAIS and Corosync. For my purpose I should be able to use either, right?
>
> Corosync started out as a subset of OpenAIS, optimized for use with Pacemaker. Corosync 2 is now the preferred membership layer for Pacemaker for most uses, though other layers are still supported.
>
> > Thanks.
> >
> > On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot wrote:
> >
> >> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> >>> Hi,
> >>>
> >>> I am evaluating whether it is feasible to use Pacemaker + Corosync to add support for clustering/redundancy into our product.
> >>
> >> Most definitely.
> >>
> >>> Our objectives:
> >>> 1) Support N+1 redundancy, i.e. N active and (up to) 1 standby.
> >>
> >> You can do this with location constraints and scores. See:
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
> >>
> >> Basically, you give the standby node a lower score than the other nodes.
> >>
> >>> 2) Each node has some different configuration parameters.
> >>> 3) Whenever any active node goes down, the standby node comes up with the same configuration that the active had.
> >>
> >> How you solve this requirement depends on the specifics of your situation. Ideally, you can use OCF resource agents that take the configuration location as a parameter. You may have to write your own, if none is available for your services.
> >>
> >>> 4) There is no one single process/service for which we need redundancy; rather, it is the entire system (multiple processes running together).
> >>
> >> This is trivially implemented using either groups or ordering and colocation constraints.
> >>
> >> Order constraint = start service A before starting service B (and stop in reverse order)
> >>
> >> Colocation constraint = keep services A and B on the same node
> >>
> >> Group = shortcut to specify several services that need to start/stop in order and be kept together
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
> >>
> >>> 5) I would also want to be notified when any active<->standby state transition happens, as I would want to take some steps at the application level.
> >>
> >> There are multiple approaches.
> >>
> >> If you don't mind compiling your own packages, the latest master branch (which will be part of the upcoming 1.1.14 release) has built-in notification capability. See:
> >> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
> >>
> >> Otherwise, you can use SNMP or e-mail if your packages were compiled with those options, or you can use the ocf:pacemaker:ClusterMon resource agent:
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
> >>
> >>> I went through the documents/blogs, but all had examples for a 1-active/1-standby use case, and that too for some standard service like httpd.
> >>
> >> Pacemaker is incredibly versatile, and the use cases are far too varied to cover more than a small subset. Those simple examples show the basic building blocks, and can usually point you to the specific features you need to investigate further.
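The "give the standby node a lower score" advice quoted above can be expressed with pcs location constraints. A hedged sketch, assuming an opt-out cluster; the group, node, and score values are hypothetical:

```
# Each resource group strongly prefers its own "home" node and weakly
# prefers the shared standby, so the standby picks up whichever group
# loses its home node:
pcs constraint location group1 prefers node1=200
pcs constraint location group1 prefers standby1=10
pcs constraint location group2 prefers node2=200
pcs constraint location group2 prefers standby1=10
```

Combined with resource stickiness, this keeps a failed-over group on the standby even after its home node rejoins.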
Re: [ClusterLabs] Help required for N+1 redundancy setup
On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> Ken,
>
> One more question: if I have to propagate configuration changes between the nodes, is cpg (closed process group) the right way?
> For example:
> Active Node1 has config A=1, B=2
> Active Node2 has config A=3, B=4
> The standby node needs to have the configuration for all the nodes, so that whichever goes down, it comes up with those values.
> Here the configuration is not static but can be updated at run-time.

Being unfamiliar with the specifics of your case, I can't say what the best approach is, but it sounds like you will need to write a custom OCF resource agent to manage your service.

A resource agent is similar to an init script:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

The RA will start the service with the appropriate configuration. It can use per-resource options configured in Pacemaker or external information to do that.

How does your service get its configuration currently?

> BTW, I'm a little confused between OpenAIS and Corosync. For my purpose I should be able to use either, right?

Corosync started out as a subset of OpenAIS, optimized for use with Pacemaker. Corosync 2 is now the preferred membership layer for Pacemaker for most uses, though other layers are still supported.

> Thanks.
>
> On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot wrote:
>
>> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
>>> Hi,
>>>
>>> I am evaluating whether it is feasible to use Pacemaker + Corosync to add support for clustering/redundancy into our product.
>>
>> Most definitely.
>>
>>> Our objectives:
>>> 1) Support N+1 redundancy, i.e. N active and (up to) 1 standby.
>>
>> You can do this with location constraints and scores. See:
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
>>
>> Basically, you give the standby node a lower score than the other nodes.
>>
>>> 2) Each node has some different configuration parameters.
>>> 3) Whenever any active node goes down, the standby node comes up with the same configuration that the active had.
>>
>> How you solve this requirement depends on the specifics of your situation. Ideally, you can use OCF resource agents that take the configuration location as a parameter. You may have to write your own, if none is available for your services.
>>
>>> 4) There is no one single process/service for which we need redundancy; rather, it is the entire system (multiple processes running together).
>>
>> This is trivially implemented using either groups or ordering and colocation constraints.
>>
>> Order constraint = start service A before starting service B (and stop in reverse order)
>>
>> Colocation constraint = keep services A and B on the same node
>>
>> Group = shortcut to specify several services that need to start/stop in order and be kept together
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
>>
>>> 5) I would also want to be notified when any active<->standby state transition happens, as I would want to take some steps at the application level.
>>
>> There are multiple approaches.
>>
>> If you don't mind compiling your own packages, the latest master branch (which will be part of the upcoming 1.1.14 release) has built-in notification capability. See:
>> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>>
>> Otherwise, you can use SNMP or e-mail if your packages were compiled with those options, or you can use the ocf:pacemaker:ClusterMon resource agent:
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
>>
>>> I went through the documents/blogs, but all had examples for a 1-active/1-standby use case, and that too for some standard service like httpd.
>>
>> Pacemaker is incredibly versatile, and the use cases are far too varied to cover more than a small subset. Those simple examples show the basic building blocks, and can usually point you to the specific features you need to investigate further.
>>
>>> One additional question: if I have multiple actives, then virtual IP configuration cannot be used? Is it possible such that N actives have different IP addresses, but whenever the standby becomes active it uses the IP address of the failed node?
>>
>> Yes, there are a few approaches here, too.
>>
>> The simplest is to assign a virtual IP to each active, and include it in your group of resources. The whole group will fail over to the standby node if the original goes down.
>>
>> If you want a single virtual IP that is used by all your actives, one alternative is to clone the ocf:heartbeat:IPaddr2 resource.
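The custom OCF resource agent recommended above follows a small, well-defined contract: the cluster invokes it with an action name (start, stop, monitor, meta-data) and it replies with OCF exit codes. A minimal standalone sketch; a real agent would source ocf-shellfuncs from $OCF_ROOT and implement start/stop, and every name here is illustrative:

```shell
#!/bin/sh
# Minimal sketch of a custom OCF resource agent. Only meta-data and
# monitor are shown; OCF exit codes are hardcoded rather than taken
# from ocf-shellfuncs. All names (myapp, PIDFILE) are hypothetical.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

PIDFILE="${PIDFILE:-/var/run/myapp.pid}"

meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="myapp" version="0.1">
  <parameters>
    <parameter name="config" required="1">
      <longdesc lang="en">Path to this node's configuration</longdesc>
      <shortdesc lang="en">config file</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}

monitor() {
    # The service counts as running iff the pidfile names a live process.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null \
        && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

case "${1:-}" in
    meta-data) meta_data ;;
    monitor)   monitor ;;
    # start/stop would launch/kill the service using "$OCF_RESKEY_config"
esac
```

The per-resource "config" parameter is how the agent's start action can receive node-specific configuration, as suggested in the reply above.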
Re: [ClusterLabs] Help required for N+1 redundancy setup
On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> Hi,
>
> I am evaluating whether it is feasible to use Pacemaker + Corosync to add support for clustering/redundancy into our product.

Most definitely.

> Our objectives:
> 1) Support N+1 redundancy, i.e. N active and (up to) 1 standby.

You can do this with location constraints and scores. See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on

Basically, you give the standby node a lower score than the other nodes.

> 2) Each node has some different configuration parameters.
> 3) Whenever any active node goes down, the standby node comes up with the same configuration that the active had.

How you solve this requirement depends on the specifics of your situation. Ideally, you can use OCF resource agents that take the configuration location as a parameter. You may have to write your own, if none is available for your services.

> 4) There is no one single process/service for which we need redundancy; rather, it is the entire system (multiple processes running together).

This is trivially implemented using either groups or ordering and colocation constraints.

Order constraint = start service A before starting service B (and stop in reverse order)

Colocation constraint = keep services A and B on the same node

Group = shortcut to specify several services that need to start/stop in order and be kept together

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources

> 5) I would also want to be notified when any active<->standby state transition happens, as I would want to take some steps at the application level.

There are multiple approaches.

If you don't mind compiling your own packages, the latest master branch (which will be part of the upcoming 1.1.14 release) has built-in notification capability. See:
http://blog.clusterlabs.org/blog/2015/reliable-notifications/

Otherwise, you can use SNMP or e-mail if your packages were compiled with those options, or you can use the ocf:pacemaker:ClusterMon resource agent:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928

> I went through the documents/blogs, but all had examples for a 1-active/1-standby use case, and that too for some standard service like httpd.

Pacemaker is incredibly versatile, and the use cases are far too varied to cover more than a small subset. Those simple examples show the basic building blocks, and can usually point you to the specific features you need to investigate further.

> One additional question: if I have multiple actives, then virtual IP configuration cannot be used? Is it possible such that N actives have different IP addresses, but whenever the standby becomes active it uses the IP address of the failed node?

Yes, there are a few approaches here, too.

The simplest is to assign a virtual IP to each active, and include it in your group of resources. The whole group will fail over to the standby node if the original goes down.

If you want a single virtual IP that is used by all your actives, one alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned, that resource agent will use iptables' CLUSTERIP functionality, which relies on multicast Ethernet addresses (not to be confused with multicast IP). Since multicast Ethernet has limitations, this is not often used in production.

A more complicated method is to use a virtual IP in combination with a load balancer such as haproxy. Pacemaker can manage haproxy and the real services, and haproxy manages distributing requests to the real services.

> Thanks in advance.
> Nikhil

A last word of advice: Fencing (aka STONITH) is important for proper recovery from difficult failure conditions. Without it, it is possible to have data loss or corruption in a split-brain situation.
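The "virtual IP per active, grouped with its services" approach described above can be sketched with pcs. The resource names and address are hypothetical, and ocf:pacemaker:Dummy stands in for the real application service:

```
# One virtual IP per active node, grouped with that node's service so
# the IP moves to the standby together with the rest of the group:
pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=192.168.1.101 cidr_netmask=24 op monitor interval=30s
pcs resource create app1 ocf:pacemaker:Dummy
pcs resource group add group1 vip1 app1
```

When node1 fails, Pacemaker starts group1 (IP and service, in order) on the standby, which is exactly the "standby takes over the failed node's IP address" behavior asked about.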