Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-19 Thread Nikhil Utane
Thanks Ken for the detailed response.
I suppose I could even use some of the pcs/crm CLI commands then.
Cheers.

On Wed, Mar 16, 2016 at 8:27 PM, Ken Gaillot  wrote:

> On 03/16/2016 05:22 AM, Nikhil Utane wrote:
> > I see following info gets updated in CIB. Can I use this or there is
> better
> > way?
> > <node_state ... crm-debug-origin="peer_update_callback" join="*down*" expected="member">
>
> in_ccm/crmd/join reflect the current state of the node (as known by the
> partition that you're looking at the CIB on), so if the node went down
> and came back up, it won't tell you anything about being down.
>
> - in_ccm indicates that the node is part of the underlying cluster layer
> (heartbeat/cman/corosync)
>
> - crmd indicates that the node is communicating at the pacemaker layer
>
> - join indicates what phase of the join process the node is at
>
> There's not a direct way to see what node went down after the fact.
> There are ways however:
>
> - if the node was running resources, those will be failed, and those
> failures (including node) will be shown in the cluster status
>
> - the logs show all node membership events; you can search for logs such
> as "state is now lost" and "left us"
>
> - "stonith -H $NODE_NAME" will show the fence history for a given node,
> so if the node went down due to fencing, it will show up there
>
> - you can configure an ocf:pacemaker:ClusterMon resource to run crm_mon
> periodically and run a script for node events, and you can write the
> script to do whatever you want (email you, etc.) (in the upcoming 1.1.15
> release, built-in notifications will make this more reliable and easier,
> but any script you use with ClusterMon will still be usable with the new
> method)
>
> > On Wed, Mar 16, 2016 at 12:40 PM, Nikhil Utane <
> nikhil.subscri...@gmail.com>
> > wrote:
> >
> >> Hi Ken,
> >>
> >> Sorry about the long delay. This activity was de-focussed but now it's
> >> back on track.
> >>
> >> One part of question that is still not answered is on the newly active
> >> node, how to find out which was the node that went down?
> >> Anything that gets updated in the status section that can be read and
> >> figured out?
> >>
> >> Thanks.
> >> Nikhil
> >>
> >> On Sat, Jan 9, 2016 at 3:31 AM, Ken Gaillot 
> wrote:
> >>
> >>> On 01/08/2016 11:13 AM, Nikhil Utane wrote:
> > I think stickiness will do what you want here. Set a stickiness
> higher
> > than the original node's preference, and the resource will want to
> stay
> > where it is.
> 
>  Not sure I understand this. Stickiness will ensure that resources
> don't
>  move back when original node comes back up, isn't it?
>  But in my case, I want the newly standby node to become the backup
> node
> >>> for
>  all other nodes. i.e. it should now be able to run all my resource
> >>> groups
>  albeit with a lower score. How do I achieve that?
> >>>
> >>> Oh right. I forgot to ask whether you had an opt-out
> >>> (symmetric-cluster=true, the default) or opt-in
> >>> (symmetric-cluster=false) cluster. If you're opt-out, every node can
> run
> >>> every resource unless you give it a negative preference.
> >>>
> >>> Partly it depends on whether there is a good reason to give each
> >>> instance a "home" node. Often, there's not. If you just want to balance
> >>> resources across nodes, the cluster will do that by default.
> >>>
> >>> If you prefer to put certain resources on certain nodes because the
> >>> resources require more physical resources (RAM/CPU/whatever), you can
> >>> set node attributes for that and use rules to set node preferences.
> >>>
> >>> Either way, you can decide whether you want stickiness with it.
> >>>
>  Also can you answer, how to get the values of node that goes active
> and
> >>> the
>  node that goes down inside the OCF agent?  Do I need to use
> >>> notification or
>  some simpler alternative is available?
>  Thanks.
> 
> 
>  On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot 
> >>> wrote:
> 
> > On 01/08/2016 06:55 AM, Nikhil Utane wrote:
> >> Would like to validate my final config.
> >>
> >> As I mentioned earlier, I will be having (upto) 5 active servers
> and 1
> >> standby server.
> >> The standby server should take up the role of active that went down.
> >>> Each
> >> active has some unique configuration that needs to be preserved.
> >>
> >> 1) So I will create total 5 groups. Each group has a
> >>> "heartbeat::IPaddr2
> >> resource (for virtual IP) and my custom resource.
> >> 2) The virtual IP needs to be read inside my custom OCF agent, so I
> >>> will
> >> make use of attribute reference and point to the value of IPaddr2
> >>> inside
> > my
> >> custom resource to avoid duplication.
> >> 3) I will then configure location constraint to run the group
> resource
> > on 5
> >> active nodes with higher score and lesser score on standby.

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-19 Thread Ken Gaillot
On 03/16/2016 05:22 AM, Nikhil Utane wrote:
> I see following info gets updated in CIB. Can I use this or there is better
> way?
> <node_state ... crm-debug-origin="peer_update_callback" join="*down*" expected="member">

in_ccm/crmd/join reflect the current state of the node (as known by the
partition that you're looking at the CIB on), so if the node went down
and came back up, it won't tell you anything about being down.

- in_ccm indicates that the node is part of the underlying cluster layer
(heartbeat/cman/corosync)

- crmd indicates that the node is communicating at the pacemaker layer

- join indicates what phase of the join process the node is at
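
For example (using "node1" only as a placeholder name), one way to look at
a node's entry in the status section of the live CIB, or at a one-shot
status summary, is:

  # dump the node_state entry for one node from the live CIB
  cibadmin --query --xpath "//node_state[@uname='node1']"

  # one-shot status summary, including which nodes are online/offline
  crm_mon -1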

There's not a direct way to see what node went down after the fact.
There are ways however:

- if the node was running resources, those will be failed, and those
failures (including node) will be shown in the cluster status

- the logs show all node membership events; you can search for logs such
as "state is now lost" and "left us"

- "stonith -H $NODE_NAME" will show the fence history for a given node,
so if the node went down due to fencing, it will show up there

- you can configure an ocf:pacemaker:ClusterMon resource to run crm_mon
periodically and run a script for node events, and you can write the
script to do whatever you want (email you, etc.) (in the upcoming 1.1.15
release, built-in notifications will make this more reliable and easier,
but any script you use with ClusterMon will still be usable with the new
method)
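
A rough sketch of those last three options (the log path, node name and
script path are only examples and will vary by setup):

  # search the logs for membership-loss messages
  grep -e "state is now lost" -e "left us" /var/log/messages

  # fence history for a given node
  stonith -H node1

  # run crm_mon in the background and call an external script on events
  crm configure primitive cluster-mon ocf:pacemaker:ClusterMon \
      params extra_options="-E /usr/local/bin/node-event.sh" \
      op monitor interval=10s
  crm configure clone cluster-mon-clone cluster-mon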

> On Wed, Mar 16, 2016 at 12:40 PM, Nikhil Utane 
> wrote:
> 
>> Hi Ken,
>>
>> Sorry about the long delay. This activity was de-focussed but now it's
>> back on track.
>>
>> One part of question that is still not answered is on the newly active
>> node, how to find out which was the node that went down?
>> Anything that gets updated in the status section that can be read and
>> figured out?
>>
>> Thanks.
>> Nikhil
>>
>> On Sat, Jan 9, 2016 at 3:31 AM, Ken Gaillot  wrote:
>>
>>> On 01/08/2016 11:13 AM, Nikhil Utane wrote:
> I think stickiness will do what you want here. Set a stickiness higher
> than the original node's preference, and the resource will want to stay
> where it is.

 Not sure I understand this. Stickiness will ensure that resources don't
 move back when original node comes back up, isn't it?
 But in my case, I want the newly standby node to become the backup node
>>> for
 all other nodes. i.e. it should now be able to run all my resource
>>> groups
 albeit with a lower score. How do I achieve that?
>>>
>>> Oh right. I forgot to ask whether you had an opt-out
>>> (symmetric-cluster=true, the default) or opt-in
>>> (symmetric-cluster=false) cluster. If you're opt-out, every node can run
>>> every resource unless you give it a negative preference.
>>>
>>> Partly it depends on whether there is a good reason to give each
>>> instance a "home" node. Often, there's not. If you just want to balance
>>> resources across nodes, the cluster will do that by default.
>>>
>>> If you prefer to put certain resources on certain nodes because the
>>> resources require more physical resources (RAM/CPU/whatever), you can
>>> set node attributes for that and use rules to set node preferences.
>>>
>>> Either way, you can decide whether you want stickiness with it.
>>>
 Also can you answer, how to get the values of node that goes active and
>>> the
 node that goes down inside the OCF agent?  Do I need to use
>>> notification or
 some simpler alternative is available?
 Thanks.


 On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot 
>>> wrote:

> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
>> Would like to validate my final config.
>>
>> As I mentioned earlier, I will be having (upto) 5 active servers and 1
>> standby server.
>> The standby server should take up the role of active that went down.
>>> Each
>> active has some unique configuration that needs to be preserved.
>>
>> 1) So I will create total 5 groups. Each group has a
>>> "heartbeat::IPaddr2
>> resource (for virtual IP) and my custom resource.
>> 2) The virtual IP needs to be read inside my custom OCF agent, so I
>>> will
>> make use of attribute reference and point to the value of IPaddr2
>>> inside
> my
>> custom resource to avoid duplication.
>> 3) I will then configure location constraint to run the group resource
> on 5
>> active nodes with higher score and lesser score on standby.
>> For e.g.
>> Group      Node    Score
>> -------------------------
>> MyGroup1   node1   500
>> MyGroup1   node6   0
>>
>> MyGroup2   node2   500
>> MyGroup2   node6   0
>> ..
>> MyGroup5   node5   500
>> MyGroup5   node6   0
>>
>> 4) Now if say node1 were to go down, then stop action on node1 will
>> first get called.

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-16 Thread Nikhil Utane
I see following info gets updated in CIB. Can I use this or there is better
way?



On Wed, Mar 16, 2016 at 12:40 PM, Nikhil Utane 
wrote:

> Hi Ken,
>
> Sorry about the long delay. This activity was de-focussed but now it's
> back on track.
>
> One part of question that is still not answered is on the newly active
> node, how to find out which was the node that went down?
> Anything that gets updated in the status section that can be read and
> figured out?
>
> Thanks.
> Nikhil
>
> On Sat, Jan 9, 2016 at 3:31 AM, Ken Gaillot  wrote:
>
>> On 01/08/2016 11:13 AM, Nikhil Utane wrote:
>> >> I think stickiness will do what you want here. Set a stickiness higher
>> >> than the original node's preference, and the resource will want to stay
>> >> where it is.
>> >
>> > Not sure I understand this. Stickiness will ensure that resources don't
>> > move back when original node comes back up, isn't it?
>> > But in my case, I want the newly standby node to become the backup node
>> for
>> > all other nodes. i.e. it should now be able to run all my resource
>> groups
>> > albeit with a lower score. How do I achieve that?
>>
>> Oh right. I forgot to ask whether you had an opt-out
>> (symmetric-cluster=true, the default) or opt-in
>> (symmetric-cluster=false) cluster. If you're opt-out, every node can run
>> every resource unless you give it a negative preference.
>>
>> Partly it depends on whether there is a good reason to give each
>> instance a "home" node. Often, there's not. If you just want to balance
>> resources across nodes, the cluster will do that by default.
>>
>> If you prefer to put certain resources on certain nodes because the
>> resources require more physical resources (RAM/CPU/whatever), you can
>> set node attributes for that and use rules to set node preferences.
>>
>> Either way, you can decide whether you want stickiness with it.
>>
>> > Also can you answer, how to get the values of node that goes active and
>> the
>> > node that goes down inside the OCF agent?  Do I need to use
>> notification or
>> > some simpler alternative is available?
>> > Thanks.
>> >
>> >
>> > On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot 
>> wrote:
>> >
>> >> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
>> >>> Would like to validate my final config.
>> >>>
>> >>> As I mentioned earlier, I will be having (upto) 5 active servers and 1
>> >>> standby server.
>> >>> The standby server should take up the role of active that went down.
>> Each
>> >>> active has some unique configuration that needs to be preserved.
>> >>>
>> >>> 1) So I will create total 5 groups. Each group has a
>> "heartbeat::IPaddr2
>> >>> resource (for virtual IP) and my custom resource.
>> >>> 2) The virtual IP needs to be read inside my custom OCF agent, so I
>> will
>> >>> make use of attribute reference and point to the value of IPaddr2
>> inside
>> >> my
>> >>> custom resource to avoid duplication.
>> >>> 3) I will then configure location constraint to run the group resource
>> >> on 5
>> >>> active nodes with higher score and lesser score on standby.
>> >>> For e.g.
>> >>> Group      Node    Score
>> >>> -------------------------
>> >>> MyGroup1   node1   500
>> >>> MyGroup1   node6   0
>> >>>
>> >>> MyGroup2   node2   500
>> >>> MyGroup2   node6   0
>> >>> ..
>> >>> MyGroup5   node5   500
>> >>> MyGroup5   node6   0
>> >>>
>> >>> 4) Now if say node1 were to go down, then stop action on node1 will
>> first
>> >>> get called. Haven't decided if I need to do anything specific here.
>> >>
>> >> To clarify, if node1 goes down intentionally (e.g. standby or stop),
>> >> then all resources on it will be stopped first. But if node1 becomes
>> >> unavailable (e.g. crash or communication outage), it will get fenced.
>> >>
>> >>> 5) But when the start action of node 6 gets called, then using crm
>> >> command
>> >>> line interface, I will modify the above config to swap node 1 and
>> node 6.
>> >>> i.e.
>> >>> MyGroup1   node6   500
>> >>> MyGroup1   node1   0
>> >>>
>> >>> MyGroup2   node2   500
>> >>> MyGroup2   node1   0
>> >>>
>> >>> 6) To do the above, I need the newly active and newly standby node
>> names
>> >> to
>> >>> be passed to my start action. What's the best way to get this
>> information
>> >>> inside my OCF agent?
>> >>
>> >> Modifying the configuration from within an agent is dangerous -- too
>> >> much potential for feedback loops between pacemaker and the agent.
>> >>
>> >> I think stickiness will do what you want here. Set a stickiness higher
>> >> than the original node's preference, and the resource will want to stay
>> >> where it is.
>> >>
>> >>> 7) Apart from node name, there will be other information which I plan
>> to
>> >>> pass by making use of node attributes. What's the best way to get this
>> >>> information inside my OCF agent? Use crm command to query?

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-16 Thread Nikhil Utane
Hi Ken,

Sorry about the long delay. This activity was de-focussed but now it's back
on track.

One part of question that is still not answered is on the newly active
node, how to find out which was the node that went down?
Anything that gets updated in the status section that can be read and
figured out?

Thanks.
Nikhil

On Sat, Jan 9, 2016 at 3:31 AM, Ken Gaillot  wrote:

> On 01/08/2016 11:13 AM, Nikhil Utane wrote:
> >> I think stickiness will do what you want here. Set a stickiness higher
> >> than the original node's preference, and the resource will want to stay
> >> where it is.
> >
> > Not sure I understand this. Stickiness will ensure that resources don't
> > move back when original node comes back up, isn't it?
> > But in my case, I want the newly standby node to become the backup node
> for
> > all other nodes. i.e. it should now be able to run all my resource groups
> > albeit with a lower score. How do I achieve that?
>
> Oh right. I forgot to ask whether you had an opt-out
> (symmetric-cluster=true, the default) or opt-in
> (symmetric-cluster=false) cluster. If you're opt-out, every node can run
> every resource unless you give it a negative preference.
>
> Partly it depends on whether there is a good reason to give each
> instance a "home" node. Often, there's not. If you just want to balance
> resources across nodes, the cluster will do that by default.
>
> If you prefer to put certain resources on certain nodes because the
> resources require more physical resources (RAM/CPU/whatever), you can
> set node attributes for that and use rules to set node preferences.
>
> Either way, you can decide whether you want stickiness with it.
>
> > Also can you answer, how to get the values of node that goes active and
> the
> > node that goes down inside the OCF agent?  Do I need to use notification
> or
> > some simpler alternative is available?
> > Thanks.
> >
> >
> > On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot  wrote:
> >
> >> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
> >>> Would like to validate my final config.
> >>>
> >>> As I mentioned earlier, I will be having (upto) 5 active servers and 1
> >>> standby server.
> >>> The standby server should take up the role of active that went down.
> Each
> >>> active has some unique configuration that needs to be preserved.
> >>>
> >>> 1) So I will create total 5 groups. Each group has a
> "heartbeat::IPaddr2
> >>> resource (for virtual IP) and my custom resource.
> >>> 2) The virtual IP needs to be read inside my custom OCF agent, so I
> will
> >>> make use of attribute reference and point to the value of IPaddr2
> inside
> >> my
> >>> custom resource to avoid duplication.
> >>> 3) I will then configure location constraint to run the group resource
> >> on 5
> >>> active nodes with higher score and lesser score on standby.
> >>> For e.g.
> >>> Group      Node    Score
> >>> -------------------------
> >>> MyGroup1   node1   500
> >>> MyGroup1   node6   0
> >>>
> >>> MyGroup2   node2   500
> >>> MyGroup2   node6   0
> >>> ..
> >>> MyGroup5   node5   500
> >>> MyGroup5   node6   0
> >>>
> >>> 4) Now if say node1 were to go down, then stop action on node1 will
> first
> >>> get called. Haven't decided if I need to do anything specific here.
> >>
> >> To clarify, if node1 goes down intentionally (e.g. standby or stop),
> >> then all resources on it will be stopped first. But if node1 becomes
> >> unavailable (e.g. crash or communication outage), it will get fenced.
> >>
> >>> 5) But when the start action of node 6 gets called, then using crm
> >> command
> >>> line interface, I will modify the above config to swap node 1 and node
> 6.
> >>> i.e.
> >>> MyGroup1   node6   500
> >>> MyGroup1   node1   0
> >>>
> >>> MyGroup2   node2   500
> >>> MyGroup2   node1   0
> >>>
> >>> 6) To do the above, I need the newly active and newly standby node
> names
> >> to
> >>> be passed to my start action. What's the best way to get this
> information
> >>> inside my OCF agent?
> >>
> >> Modifying the configuration from within an agent is dangerous -- too
> >> much potential for feedback loops between pacemaker and the agent.
> >>
> >> I think stickiness will do what you want here. Set a stickiness higher
> >> than the original node's preference, and the resource will want to stay
> >> where it is.
> >>
> >>> 7) Apart from node name, there will be other information which I plan
> to
> >>> pass by making use of node attributes. What's the best way to get this
> >>> information inside my OCF agent? Use crm command to query?
> >>
> >> Any of the command-line interfaces for doing so should be fine, but I'd
> >> recommend using one of the lower-level tools (crm_attribute or
> >> attrd_updater) so you don't have a dependency on a higher-level tool
> >> that may not always be installed.

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-01-08 Thread Ken Gaillot
On 01/08/2016 11:13 AM, Nikhil Utane wrote:
>> I think stickiness will do what you want here. Set a stickiness higher
>> than the original node's preference, and the resource will want to stay
>> where it is.
> 
> Not sure I understand this. Stickiness will ensure that resources don't
> move back when original node comes back up, isn't it?
> But in my case, I want the newly standby node to become the backup node for
> all other nodes. i.e. it should now be able to run all my resource groups
> albeit with a lower score. How do I achieve that?

Oh right. I forgot to ask whether you had an opt-out
(symmetric-cluster=true, the default) or opt-in
(symmetric-cluster=false) cluster. If you're opt-out, every node can run
every resource unless you give it a negative preference.

Partly it depends on whether there is a good reason to give each
instance a "home" node. Often, there's not. If you just want to balance
resources across nodes, the cluster will do that by default.

If you prefer to put certain resources on certain nodes because the
resources require more physical resources (RAM/CPU/whatever), you can
set node attributes for that and use rules to set node preferences.

Either way, you can decide whether you want stickiness with it.
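
Roughly, in crmsh, that could look like the following (the "capacity"
attribute and its values are made up for illustration):

  # opt-out cluster: any node may run any resource unless told otherwise
  crm configure property symmetric-cluster=true

  # describe the nodes with a custom attribute...
  crm node attribute node1 set capacity big
  crm node attribute node6 set capacity small

  # ...and prefer matching nodes via a rule-based location constraint
  crm configure location MyGroup1-placement MyGroup1 \
      rule 500: capacity eq big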

> Also can you answer, how to get the values of node that goes active and the
> node that goes down inside the OCF agent?  Do I need to use notification or
> some simpler alternative is available?
> Thanks.
> 
> 
> On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot  wrote:
> 
>> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
>>> Would like to validate my final config.
>>>
>>> As I mentioned earlier, I will be having (upto) 5 active servers and 1
>>> standby server.
>>> The standby server should take up the role of active that went down. Each
>>> active has some unique configuration that needs to be preserved.
>>>
>>> 1) So I will create total 5 groups. Each group has a "heartbeat::IPaddr2
>>> resource (for virtual IP) and my custom resource.
>>> 2) The virtual IP needs to be read inside my custom OCF agent, so I will
>>> make use of attribute reference and point to the value of IPaddr2 inside
>> my
>>> custom resource to avoid duplication.
>>> 3) I will then configure location constraint to run the group resource
>> on 5
>>> active nodes with higher score and lesser score on standby.
>>> For e.g.
>>> Group      Node    Score
>>> -------------------------
>>> MyGroup1   node1   500
>>> MyGroup1   node6   0
>>>
>>> MyGroup2   node2   500
>>> MyGroup2   node6   0
>>> ..
>>> MyGroup5   node5   500
>>> MyGroup5   node6   0
>>>
>>> 4) Now if say node1 were to go down, then stop action on node1 will first
>>> get called. Haven't decided if I need to do anything specific here.
>>
>> To clarify, if node1 goes down intentionally (e.g. standby or stop),
>> then all resources on it will be stopped first. But if node1 becomes
>> unavailable (e.g. crash or communication outage), it will get fenced.
>>
>>> 5) But when the start action of node 6 gets called, then using crm
>> command
>>> line interface, I will modify the above config to swap node 1 and node 6.
>>> i.e.
>>> MyGroup1   node6   500
>>> MyGroup1   node1   0
>>>
>>> MyGroup2   node2   500
>>> MyGroup2   node1   0
>>>
>>> 6) To do the above, I need the newly active and newly standby node names
>> to
>>> be passed to my start action. What's the best way to get this information
>>> inside my OCF agent?
>>
>> Modifying the configuration from within an agent is dangerous -- too
>> much potential for feedback loops between pacemaker and the agent.
>>
>> I think stickiness will do what you want here. Set a stickiness higher
>> than the original node's preference, and the resource will want to stay
>> where it is.
>>
>>> 7) Apart from node name, there will be other information which I plan to
>>> pass by making use of node attributes. What's the best way to get this
>>> information inside my OCF agent? Use crm command to query?
>>
>> Any of the command-line interfaces for doing so should be fine, but I'd
>> recommend using one of the lower-level tools (crm_attribute or
>> attrd_updater) so you don't have a dependency on a higher-level tool
>> that may not always be installed.
>>
>>> Thank You.
>>>
>>> On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane <
>> nikhil.subscri...@gmail.com>
>>> wrote:
>>>
 Thanks to you Ken for giving all the pointers.
 Yes, I can use service start/stop which should be a lot simpler. Thanks
 again. :)

 On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot 
>> wrote:

> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
>> I have prepared a write-up explaining my requirements and current
> solution
>> that I am proposing based on my understanding so far.
>> Kindly let me know if what I am proposing is good or there is a better
>> way to achieve the same.

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-01-08 Thread Nikhil Utane
Would like to validate my final config.

As I mentioned earlier, I will be having (up to) 5 active servers and 1
standby server.
The standby server should take up the role of active that went down. Each
active has some unique configuration that needs to be preserved.

1) So I will create a total of 5 groups. Each group has an ocf:heartbeat:IPaddr2
resource (for virtual IP) and my custom resource.
2) The virtual IP needs to be read inside my custom OCF agent, so I will
make use of attribute reference and point to the value of IPaddr2 inside my
custom resource to avoid duplication.
3) I will then configure location constraint to run the group resource on 5
active nodes with higher score and lesser score on standby.
For e.g.
Group      Node    Score
-------------------------
MyGroup1   node1   500
MyGroup1   node6   0

MyGroup2   node2   500
MyGroup2   node6   0
..
MyGroup5   node5   500
MyGroup5   node6   0

4) Now if say node1 were to go down, then stop action on node1 will first
get called. Haven't decided if I need to do anything specific here.
5) But when the start action of node 6 gets called, then using crm command
line interface, I will modify the above config to swap node 1 and node 6.
i.e.
MyGroup1   node6   500
MyGroup1   node1   0

MyGroup2   node2   500
MyGroup2   node1   0

6) To do the above, I need the newly active and newly standby node names to
be passed to my start action. What's the best way to get this information
inside my OCF agent?
7) Apart from node name, there will be other information which I plan to
pass by making use of node attributes. What's the best way to get this
information inside my OCF agent? Use crm command to query?

Thank You.

On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane 
wrote:

> Thanks to you Ken for giving all the pointers.
> Yes, I can use service start/stop which should be a lot simpler. Thanks
> again. :)
>
> On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot  wrote:
>
>> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
>> > I have prepared a write-up explaining my requirements and current
>> solution
>> > that I am proposing based on my understanding so far.
>> > Kindly let me know if what I am proposing is good or there is a better
>> way
>> > to achieve the same.
>> >
>> >
>> https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing
>> >
>> > Let me know if you face any issue in accessing the above link. Thanks.
>>
>> This looks great. Very well thought-out.
>>
>> One comment:
>>
>> "8. In the event of any failover, the standby node will get notified
>> through an event and it will execute a script that will read the
>> configuration specific to the node that went down (again using
>> crm_attribute) and become active."
>>
>> It may not be necessary to use the notifications for this. Pacemaker
>> will call your resource agent with the "start" action on the standby
>> node, after ensuring it is stopped on the previous node. Hopefully the
>> resource agent's start action has (or can have, with configuration
>> options) all the information you need.
>>
>> If you do end up needing notifications, be aware that the feature will
>> be disabled by default in the 1.1.14 release, because changes in syntax
>> are expected in further development. You can define a compile-time
>> constant to enable them.
>>
>>


Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-01-08 Thread Ken Gaillot
On 01/08/2016 06:55 AM, Nikhil Utane wrote:
> Would like to validate my final config.
> 
> As I mentioned earlier, I will be having (upto) 5 active servers and 1
> standby server.
> The standby server should take up the role of active that went down. Each
> active has some unique configuration that needs to be preserved.
> 
> 1) So I will create total 5 groups. Each group has a "heartbeat::IPaddr2
> resource (for virtual IP) and my custom resource.
> 2) The virtual IP needs to be read inside my custom OCF agent, so I will
> make use of attribute reference and point to the value of IPaddr2 inside my
> custom resource to avoid duplication.
> 3) I will then configure location constraint to run the group resource on 5
> active nodes with higher score and lesser score on standby.
> For e.g.
> Group      Node    Score
> -------------------------
> MyGroup1   node1   500
> MyGroup1   node6   0
> 
> MyGroup2   node2   500
> MyGroup2   node6   0
> ..
> MyGroup5   node5   500
> MyGroup5   node6   0
> 
> 4) Now if say node1 were to go down, then stop action on node1 will first
> get called. Haven't decided if I need to do anything specific here.

To clarify, if node1 goes down intentionally (e.g. standby or stop),
then all resources on it will be stopped first. But if node1 becomes
unavailable (e.g. crash or communication outage), it will get fenced.

> 5) But when the start action of node 6 gets called, then using crm command
> line interface, I will modify the above config to swap node 1 and node 6.
> i.e.
> MyGroup1   node6   500
> MyGroup1   node1   0
> 
> MyGroup2   node2   500
> MyGroup2   node1   0
> 
> 6) To do the above, I need the newly active and newly standby node names to
> be passed to my start action. What's the best way to get this information
> inside my OCF agent?

Modifying the configuration from within an agent is dangerous -- too
much potential for feedback loops between pacemaker and the agent.

I think stickiness will do what you want here. Set a stickiness higher
than the original node's preference, and the resource will want to stay
where it is.
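
For instance, with the 500-point location preferences discussed earlier, a
default stickiness above that score would keep the group where it lands
after a failover:

  crm configure rsc_defaults resource-stickiness=1000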

> 7) Apart from node name, there will be other information which I plan to
> pass by making use of node attributes. What's the best way to get this
> information inside my OCF agent? Use crm command to query?

Any of the command-line interfaces for doing so should be fine, but I'd
recommend using one of the lower-level tools (crm_attribute or
attrd_updater) so you don't have a dependency on a higher-level tool
that may not always be installed.
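
For example ("my_config" is a hypothetical attribute name), the agent
could query with:

  # permanent node attribute (stored in the nodes section of the CIB)
  crm_attribute --type nodes --node "$(crm_node -n)" --name my_config \
      --query --quiet

  # transient attribute maintained by attrd
  attrd_updater --name my_config --query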

> Thank You.
> 
> On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane 
> wrote:
> 
>> Thanks to you Ken for giving all the pointers.
>> Yes, I can use service start/stop which should be a lot simpler. Thanks
>> again. :)
>>
>> On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot  wrote:
>>
>>> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
 I have prepared a write-up explaining my requirements and current
>>> solution
 that I am proposing based on my understanding so far.
 Kindly let me know if what I am proposing is good or there is a better
>>> way
 to achieve the same.


>>> https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing

 Let me know if you face any issue in accessing the above link. Thanks.
>>>
>>> This looks great. Very well thought-out.
>>>
>>> One comment:
>>>
>>> "8. In the event of any failover, the standby node will get notified
>>> through an event and it will execute a script that will read the
>>> configuration specific to the node that went down (again using
>>> crm_attribute) and become active."
>>>
>>> It may not be necessary to use the notifications for this. Pacemaker
>>> will call your resource agent with the "start" action on the standby
>>> node, after ensuring it is stopped on the previous node. Hopefully the
>>> resource agent's start action has (or can have, with configuration
>>> options) all the information you need.
>>>
>>> If you do end up needing notifications, be aware that the feature will
>>> be disabled by default in the 1.1.14 release, because changes in syntax
>>> are expected in further development. You can define a compile-time
>>> constant to enable them.
>>>
>>>
> 




[ClusterLabs] Help required for N+1 redundancy setup

2016-01-07 Thread Rishin Gangadharan
Hi
   Can anybody tell me how to configure Corosync/Pacemaker with crmsh for an
active-active and N+1 redundancy setup for Kamailio?

Thanks
Rishin










Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-22 Thread Nikhil Utane
Thanks to you Ken for giving all the pointers.
Yes, I can use service start/stop which should be a lot simpler. Thanks
again. :)

On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot  wrote:

> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
> > I have prepared a write-up explaining my requirements and current
> solution
> > that I am proposing based on my understanding so far.
> > Kindly let me know if what I am proposing is good or there is a better
> way
> > to achieve the same.
> >
> >
> https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing
> >
> > Let me know if you face any issue in accessing the above link. Thanks.
>
> This looks great. Very well thought-out.
>
> One comment:
>
> "8. In the event of any failover, the standby node will get notified
> through an event and it will execute a script that will read the
> configuration specific to the node that went down (again using
> crm_attribute) and become active."
>
> It may not be necessary to use the notifications for this. Pacemaker
> will call your resource agent with the "start" action on the standby
> node, after ensuring it is stopped on the previous node. Hopefully the
> resource agent's start action has (or can have, with configuration
> options) all the information you need.
>
> If you do end up needing notifications, be aware that the feature will
> be disabled by default in the 1.1.14 release, because changes in syntax
> are expected in further development. You can define a compile-time
> constant to enable them.
>
>


Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-21 Thread Nikhil Utane
I have prepared a write-up explaining my requirements and current solution
that I am proposing based on my understanding so far.
Kindly let me know if what I am proposing is good or there is a better way
to achieve the same.

https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing

Let me know if you face any issue in accessing the above link. Thanks.

On Thu, Dec 3, 2015 at 11:34 PM, Ken Gaillot  wrote:

> On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> > Ken,
> >
> > One more question, if i have to propagate configuration changes between
> the
> > nodes then is cpg (closed process group) the right way?
> > For e.g.
> > Active Node1 has config A=1, B=2
> > Active Node2 has config A=3, B=4
> > Standby Node needs to have configuration for all the nodes such that
> > whichever goes down, it comes up with those values.
> > Here configuration is not static but can be updated at run-time.
>
> Being unfamiliar with the specifics of your case, I can't say what the
> best approach is, but it sounds like you will need to write a custom OCF
> resource agent to manage your service.
>
> A resource agent is similar to an init script:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
>
> The RA will start the service with the appropriate configuration. It can
> use per-resource options configured in pacemaker or external information
> to do that.
>
> How does your service get its configuration currently?
>
> > BTW, I'm little confused between OpenAIS and Corosync. For my purpose I
> > should be able to use either, right?
>
> Corosync started out as a subset of OpenAIS, optimized for use with
> Pacemaker. Corosync 2 is now the preferred membership layer for
> Pacemaker for most uses, though other layers are still supported.
>
> > Thanks.
> >
> > On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot  wrote:
> >
> >> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> >>> Hi,
> >>>
> >>> I am evaluating whether it is feasible to use Pacemaker + Corosync to
> add
> >>> support for clustering/redundancy into our product.
> >>
> >> Most definitely
> >>
> >>> Our objectives:
> >>> 1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.
> >>
> >> You can do this with location constraints and scores. See:
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
> >>
> >> Basically, you give the standby node a lower score than the other nodes.
> >>
> >>> 2) Each node has some different configuration parameters.
> >>> 3) Whenever any active node goes down, the standby node comes up with
> the
> >>> same configuration that the active had.
> >>
> >> How you solve this requirement depends on the specifics of your
> >> situation. Ideally, you can use OCF resource agents that take the
> >> configuration location as a parameter. You may have to write your own,
> >> if none is available for your services.
> >>
> >>> 4) There is no one single process/service for which we need redundancy,
> >>> rather it is the entire system (multiple processes running together).
> >>
> >> This is trivially implemented using either groups or ordering and
> >> colocation constraints.
> >>
> >> Order constraint = start service A before starting service B (and stop
> >> in reverse order)
> >>
> >> Colocation constraint = keep services A and B on the same node
> >>
> >> Group = shortcut to specify several services that need to start/stop in
> >> order and be kept together
> >>
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
> >>
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
> >>
> >>
> >>> 5) I would also want to be notified when any active<->standby state
> >>> transition happens as I would want to take some steps at the
> application
> >>> level.
> >>
> >> There are multiple approaches.
> >>
> >> If you don't mind compiling your own packages, the latest master branch
> >> (which will be part of the upcoming 1.1.14 release) has built-in
> >> notification capability. See:
> >> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
> >>
> >> Otherwise, you can use SNMP or e-mail if your packages were compiled
> >> with those options, or you can use the ocf:pacemaker:ClusterMon resource
> >> agent:
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
> >>
> >>> I went through the documents/blogs but all had example for 1 active
> and 1
> >>> standby use-case and that too for some standard service like httpd.
> >>
> >> Pacemaker is incredibly versatile, and the use cases are far too varied
> >> to cover more than a small subset. Those simple examples show the basic
> >> building blocks, and can usually point you to the specific features you
> >> need to investigate further.

Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-03 Thread Ken Gaillot
On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> Ken,
> 
> One more question, if i have to propagate configuration changes between the
> nodes then is cpg (closed process group) the right way?
> For e.g.
> Active Node1 has config A=1, B=2
> Active Node2 has config A=3, B=4
> Standby Node needs to have configuration for all the nodes such that
> whichever goes down, it comes up with those values.
> Here configuration is not static but can be updated at run-time.

Being unfamiliar with the specifics of your case, I can't say what the
best approach is, but it sounds like you will need to write a custom OCF
resource agent to manage your service.

A resource agent is similar to an init script:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

The RA will start the service with the appropriate configuration. It can
use per-resource options configured in pacemaker or external information
to do that.

How does your service get its configuration currently?
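
Very roughly, such an agent is a shell script along these lines (the
"myapp" binary, its flags and the "config" parameter are placeholders,
and a real agent must also implement the meta-data action and proper
monitoring):

  #!/bin/sh
  # Minimal OCF resource agent sketch -- illustration only.
  : ${OCF_FUNCTIONS_DIR=${OCF_ROOT:-/usr/lib/ocf}/lib/heartbeat}
  . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

  start() {
      # OCF_RESKEY_config holds the resource's "config" parameter
      /usr/local/bin/myapp --config "${OCF_RESKEY_config}" \
          && return $OCF_SUCCESS
      return $OCF_ERR_GENERIC
  }

  stop() {
      /usr/local/bin/myapp --stop && return $OCF_SUCCESS
      return $OCF_ERR_GENERIC
  }

  monitor() {
      /usr/local/bin/myapp --status && return $OCF_SUCCESS
      return $OCF_NOT_RUNNING
  }

  case "$1" in
      start)   start ;;
      stop)    stop ;;
      monitor) monitor ;;
      *)       exit $OCF_ERR_UNIMPLEMENTED ;;
  esac

It could then be configured with something like "crm configure primitive
myapp ocf:custom:myapp params config=/etc/myapp/node1.conf op monitor
interval=30s" (again, names and paths are hypothetical).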

> BTW, I'm little confused between OpenAIS and Corosync. For my purpose I
> should be able to use either, right?

Corosync started out as a subset of OpenAIS, optimized for use with
Pacemaker. Corosync 2 is now the preferred membership layer for
Pacemaker for most uses, though other layers are still supported.

> Thanks.
> 
> On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot  wrote:
> 
>> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
>>> Hi,
>>>
>>> I am evaluating whether it is feasible to use Pacemaker + Corosync to add
>>> support for clustering/redundancy into our product.
>>
>> Most definitely
>>
>>> Our objectives:
>>> 1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.
>>
>> You can do this with location constraints and scores. See:
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
>>
>> Basically, you give the standby node a lower score than the other nodes.
>>
>>> 2) Each node has some different configuration parameters.
>>> 3) Whenever any active node goes down, the standby node comes up with the
>>> same configuration that the active had.
>>
>> How you solve this requirement depends on the specifics of your
>> situation. Ideally, you can use OCF resource agents that take the
>> configuration location as a parameter. You may have to write your own,
>> if none is available for your services.
>>
>>> 4) There is no one single process/service for which we need redundancy,
>>> rather it is the entire system (multiple processes running together).
>>
>> This is trivially implemented using either groups or ordering and
>> colocation constraints.
>>
>> Order constraint = start service A before starting service B (and stop
>> in reverse order)
>>
>> Colocation constraint = keep services A and B on the same node
>>
>> Group = shortcut to specify several services that need to start/stop in
>> order and be kept together
>>
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
>>
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
>>
>>
>>> 5) I would also want to be notified when any active<->standby state
>>> transition happens as I would want to take some steps at the application
>>> level.
>>
>> There are multiple approaches.
>>
>> If you don't mind compiling your own packages, the latest master branch
>> (which will be part of the upcoming 1.1.14 release) has built-in
>> notification capability. See:
>> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>>
>> Otherwise, you can use SNMP or e-mail if your packages were compiled
>> with those options, or you can use the ocf:pacemaker:ClusterMon resource
>> agent:
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
>>
>>> I went through the documents/blogs but all had example for 1 active and 1
>>> standby use-case and that too for some standard service like httpd.
>>
>> Pacemaker is incredibly versatile, and the use cases are far too varied
>> to cover more than a small subset. Those simple examples show the basic
>> building blocks, and can usually point you to the specific features you
>> need to investigate further.
>>
>>> One additional question, If I am having multiple actives, then Virtual IP
>>> configuration cannot be used? Is it possible such that N actives have
>>> different IP addresses but whenever standby becomes active it uses the IP
>>> address of the failed node?
>>
>> Yes, there are a few approaches here, too.
>>
>> The simplest is to assign a virtual IP to each active, and include it in
>> your group of resources. The whole group will fail over to the standby
>> node if the original goes down.
>>
>> If you want a single virtual IP that is used by all your actives, one
>> alternative is to clone the ocf:heartbeat:IPaddr2 resource. 

Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-01 Thread Ken Gaillot
On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> Hi,
> 
> I am evaluating whether it is feasible to use Pacemaker + Corosync to add
> support for clustering/redundancy into our product.

Most definitely

> Our objectives:
> 1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.

You can do this with location constraints and scores. See:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on

Basically, you give the standby node a lower score than the other nodes.
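
For example, in crmsh (resource and node names are placeholders):

  crm configure location MyGroup1-prefers-node1 MyGroup1 500: node1
  crm configure location MyGroup1-can-run-node6 MyGroup1 0: node6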

> 2) Each node has some different configuration parameters.
> 3) Whenever any active node goes down, the standby node comes up with the
> same configuration that the active had.

How you solve this requirement depends on the specifics of your
situation. Ideally, you can use OCF resource agents that take the
configuration location as a parameter. You may have to write your own,
if none is available for your services.

> 4) There is no one single process/service for which we need redundancy,
> rather it is the entire system (multiple processes running together).

This is trivially implemented using either groups or ordering and
colocation constraints.

Order constraint = start service A before starting service B (and stop
in reverse order)

Colocation constraint = keep services A and B on the same node

Group = shortcut to specify several services that need to start/stop in
order and be kept together

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
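
A hedged crmsh sketch of one such group (the IP address and the
ocf:custom:myapp agent are placeholders):

  crm configure primitive vip1 ocf:heartbeat:IPaddr2 \
      params ip=192.168.1.101 cidr_netmask=24 op monitor interval=10s
  crm configure primitive app1 ocf:custom:myapp op monitor interval=30s
  crm configure group MyGroup1 vip1 app1

The group starts vip1 before app1, stops them in reverse order, and keeps
them on the same node, which is equivalent to writing separate order and
colocation constraints.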


> 5) I would also want to be notified when any active<->standby state
> transition happens as I would want to take some steps at the application
> level.

There are multiple approaches.

If you don't mind compiling your own packages, the latest master branch
(which will be part of the upcoming 1.1.14 release) has built-in
notification capability. See:
http://blog.clusterlabs.org/blog/2015/reliable-notifications/

Otherwise, you can use SNMP or e-mail if your packages were compiled
with those options, or you can use the ocf:pacemaker:ClusterMon resource
agent:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928

> I went through the documents/blogs but all had example for 1 active and 1
> standby use-case and that too for some standard service like httpd.

Pacemaker is incredibly versatile, and the use cases are far too varied
to cover more than a small subset. Those simple examples show the basic
building blocks, and can usually point you to the specific features you
need to investigate further.

> One additional question, If I am having multiple actives, then Virtual IP
> configuration cannot be used? Is it possible such that N actives have
> different IP addresses but whenever standby becomes active it uses the IP
> address of the failed node?

Yes, there are a few approaches here, too.

The simplest is to assign a virtual IP to each active, and include it in
your group of resources. The whole group will fail over to the standby
node if the original goes down.

If you want a single virtual IP that is used by all your actives, one
alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned,
that resource agent will use iptables' CLUSTERIP functionality, which
relies on multicast Ethernet addresses (not to be confused with
multicast IP). Since multicast Ethernet has limitations, this is not
often used in production.
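
As a rough crmsh sketch (address and clone counts are only examples):

  crm configure primitive shared-ip ocf:heartbeat:IPaddr2 \
      params ip=192.168.1.100 cidr_netmask=24 clusterip_hash=sourceip \
      op monitor interval=10s
  crm configure clone shared-ip-clone shared-ip \
      meta globally-unique=true clone-max=5 clone-node-max=1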

A more complicated method is to use a virtual IP in combination with a
load-balancer such as haproxy. Pacemaker can manage haproxy and the real
services, and haproxy manages distributing requests to the real services.

> Thanking in advance.
> Nikhil

A last word of advice: Fencing (aka STONITH) is important for proper
recovery from difficult failure conditions. Without it, it is possible
to have data loss or corruption in a split-brain situation.



[ClusterLabs] Help required for N+1 redundancy setup

2015-12-01 Thread Nikhil Utane
Hi,

I am evaluating whether it is feasible to use Pacemaker + Corosync to add
support for clustering/redundancy into our product.

Our objectives:
1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.
2) Each node has some different configuration parameters.
3) Whenever any active node goes down, the standby node comes up with the
same configuration that the active had.
4) There is no one single process/service for which we need redundancy,
rather it is the entire system (multiple processes running together).
5) I would also want to be notified when any active<->standby state
transition happens as I would want to take some steps at the application
level.

I went through the documents/blogs but all had example for 1 active and 1
standby use-case and that too for some standard service like httpd.

One additional question, If I am having multiple actives, then Virtual IP
configuration cannot be used? Is it possible such that N actives have
different IP addresses but whenever standby becomes active it uses the IP
address of the failed node?

Thanking in advance.
Nikhil
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org