Re: [ClusterLabs] corosync - CS_ERR_BAD_HANDLE when multiple nodes are starting up

2015-10-01 Thread Jan Friesse

Hi,

Thomas Lamprecht wrote:

Hello,

we are using corosync version needle (2.3.5) for our cluster filesystem
(pmxcfs).
The situation is the following. First we start up pmxcfs, which is
a FUSE filesystem. If there is a cluster configuration, we also start
corosync.
This allows the filesystem to exist on one-node 'clusters' or to be forced
into a local mode. We use CPG to send our messages to all members;
the filesystem lives in RAM and all fs operations are sent 'over the
wire'.

The problem is now the following:
When we restart all nodes (3 in my test case) at the same time, in about
1 out of 10 cases I get only CS_ERR_BAD_HANDLE back when calling


I'm not sure I understand what you are doing. You are
restarting all nodes and get CS_ERR_BAD_HANDLE? I mean, if you are
restarting all nodes, which node returns CS_ERR_BAD_HANDLE? Or are you
restarting just pmxcfs? Or just corosync?



cpg_mcast_joined to send out the data, but only on one node.
corosync-quorumtool shows that we have quorum, and the logs also
show a healthy connection to the corosync cluster. The failing handle is
initialized once, at the initialization of our filesystem. Should it be
reinitialized on every reconnect?


Again, I'm unsure what you mean by reconnect. On Corosync shutdown you
have to reconnect (I believe this is not the case here, because you are
getting the error with only 10% probability).



Restarting the filesystem solves this problem. The strange thing is that
it isn't clearly reproducible and often works just fine.

Are there some known problems or steps we should look for?


Hard to tell, but generally:
- Make sure cpg_init really returns CS_OK. If not, the returned handle is
invalid.
- Make sure there is no memory corruption and the handle is really valid
(valgrind may be helpful).


Regards,
  Honza




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org





[ClusterLabs] stopping a particular resource throughout the cluster

2015-10-01 Thread Vijay Partha
Hi.

Is it possible to stop a resource running on all nodes from a single node?
Say that I have resource A running on node A and resource A running on node
B. Is it possible to disable resource A from one node so that it no longer
runs on either node?

-- 
With Regards
P.Vijay


Re: [ClusterLabs] [Linux-HA] fence_ec2 agent

2015-10-01 Thread Dejan Muhamedagic
Hi Kazuhiko-san,

On Mon, Sep 28, 2015 at 02:22:02PM +0900, 東一彦 wrote:
> Hi Dejan,
> 
> I made a patch file as unified diff by "hg export tip" command.
> 
> Would you please merge it?

Merged. I just modified the summary and patch description a bit
beforehand. Many thanks!

Cheers,

Dejan

> 
> 
> Regards,
> Kazuhiko Higashi
> 
> On 2015/09/25 0:04, Dejan Muhamedagic wrote:
> >Hi Kazuhiko-san,
> >
> >On Wed, Mar 25, 2015 at 10:47:01AM +0900, 東一彦 wrote:
> >>Hi Markus,
> >>
> >>I implemented it for trial.
> >>
> >>[diff from http://hg.linux-ha.org/glue/rev/9da0680bc9c0 ]
> >>50d49
> >>< port_default=""
> >>60c59
> >>< ec2_tag=${tag}
> >>---
> >>>[ -n "$tag" ] && ec2_tag="$tag"
> >>63d61
> >>< : ${port=${port_default}}
> >>97c95
> >><   
> >>---
> >>>   
> >>105c103
> >><   
> >>---
> >>>   
> >>132c130
> >><   
> >>---
> >>>   
> >>142c140
> >><   
> >>---
> >>>   
> >>221a220,224
> >>>function monitor()
> >>>{
> >>>   # Is the device ok?
> >>>   aws ec2 describe-instances $options | grep INSTANCES &> /dev/null
> >>>}
> >>267a271
> >>>[ -n "$2" ] && node_to_fence=$2
> >>326a331,334
> >>>if [ -z "$port" ]; then
> >>>   port="$node_to_fence"
> >>>fi
> >>>
> >>379,380c387
> >><   # Is the device ok?
> >><   aws ec2 describe-instances $options | grep INSTANCES &> /dev/null
> >>---
> >>>   monitor
> >>391c398
> >><   instance_status $instance > /dev/null
> >>---
> >>>   monitor
> >>
> >>
> >>
> >>It works fine in my environment with the 2 settings patterns below.
> >>
> >>[pattern No.1]
> >>Without the "port" and "tag" parameters.
> >>The instances have a "Name=" tag.
> >>
> >>
> >>primitive prmStonith1-2 stonith:external/ec2 \
> >>  params \
> >>  pcmk_off_timeout="120s" \
> >>  op start interval="0s" timeout="60s" \
> >>  op monitor interval="3600s" timeout="60s" \
> >>  op stop interval="0s" timeout="60s"
> >>
> >>
> >>
> >>[pattern No.2]
> >>With only the "tag" parameter (without the "port" parameter).
> >>The 1st instance (node01) has the "Cluster1=node01" tag.
> >>The 2nd instance (node02) has the "Cluster1=node02" tag.
> >>
> >>
> >>primitive prmStonith1-2 stonith:external/ec2 \
> >>  params \
> >>  pcmk_off_timeout="120s" \
> >>  tag="Cluster1" \
> >>  op start interval="0s" timeout="60s" \
> >>  op monitor interval="3600s" timeout="60s" \
> >>  op stop interval="0s" timeout="60s"
> >>
> >
> >Sounds good. Sorry for the delay, but would it be possible that
> >you provide a patch as unified diff or similar so that we can
> >apply it.
> >
> >Cheers,
> >
> >Dejan
> >
> >>
> >>Regards,
> >>Kazuhiko Higashi
> >>
> >>
> >>On 2015/03/24 20:48, 東一彦 wrote:
> >>>Hi Markus,
> >>>
> >>>Thank you for the comment.
> >>>
> Would it be possible, to implement this idea as an additional 
> configuration method to the fence_ec2 agent?
> >>>I think that your idea is good.
> >>>
> >>>So I will try to implement it.
> >>>I'm going to change fence_ec2 (ec2) in the following points.
> >>>
> >>>  - the "tag" and the "port" options will no longer be required.
> >>>
> >>>  - if the "port" option is not set, the 2nd argument of ec2 will be used as
> >>> the "port".
> >>>- the 2nd argument of ec2 is the "node to fence".
> >>>
> >>>  - the "stat" and "status" actions will be the same as the "monitor" action.
> >>>(so that the "port" parameter is not needed in the "stat" action.)
> >>>
> >>>
> >>>With the above modifications, if the uname is set in the Name tag,
> >>>the "tag" and "port" parameters are no longer necessary.
> >>>
> >>>
> >>>primitive prmStonith1-2 stonith:external/ec2 \
> >>> params \
> >>> pcmk_off_timeout="120s" \
> >>> op start interval="0s" timeout="60s" \
> >>> op monitor interval="3600s" timeout="60s" \
> >>> op stop interval="0s" timeout="60s"
> >>>
> >>>
> >>>
> >>>You can use the "tag" parameter like your "Clustername" tag.
> >>>If the cluster nodes (instances) have a "Cluster1" tag, and the uname is set in
> >>>that tag,
> >>>it works just as you expect.
> >>>
> >>>
> >>>primitive prmStonith1-2 stonith:external/ec2 \
> >>> params \
> >>> pcmk_off_timeout="120s" \
> >>> tag="Cluster1" \
> >>> op start interval="0s" timeout="60s" \
> >>> op monitor interval="3600s" timeout="60s" \
> >>> op stop interval="0s" timeout="60s"
> >>>
> >>>
> >>>The 1st instance has the "Cluster1=node01" tag.
> >>>The 2nd instance has the "Cluster1=node02" tag.
> >>>The 3rd instance has the "Cluster1=node03" tag.
> >>>...
> >>>The prmStonith1-2 resource can then fence node01, node02 and node03.
> >>>
> >>>
> >>>If you like the above, I will implement it.
> >>>
> >>>
> >>>Regards,
> >>>Kazuhiko Higashi
> >>>
> >>>
> >>>On 2015/03/19 1:03, Markus Guertler wrote:
> Hi Kazuhiko, Dejan,
> 

Re: [ClusterLabs] IPaddr2 Unkown interface cause a failover that didn't work

2015-10-01 Thread Dejan Muhamedagic
Hi,

On Wed, Sep 30, 2015 at 02:24:32PM -0400, Luc Paulin wrote:
> Hi Everyone,
> I experienced a weird issue last night where our cluster tried to
> fail over due to an "Unknown interface" error.
> 
> It looks like when the IPaddr2 monitor tried to check the status of eth0, it
> didn't find the device. Both nodes are VMs. I haven't found any reason why
> eth0 would have disappeared.
> 
> 
> [...]
> Sep 29 21:25:06 node-02 pengine[3240]:error: unpack_rsc_op: Preventing
> vip_v207_174 from re-starting anywhere: operation monitor failed 'not
> configured' (6)

The RA exits with the error code which says that the resource
configuration is invalid. Hence PE won't try to start that
resource again. Normally, we don't expect network interfaces to
disappear, but this should probably be the "not installed" error,
so that the resource can be started on another node. Or even the
"generic" error in case it may be expected that interfaces can
come and go. Did you figure out why the interface disappeared?
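
To illustrate the difference between the exit codes discussed above, here is a minimal shell sketch. It is not the IPaddr2 code: the helper name check_nic and its optional second parameter are hypothetical, and only the numeric return codes are the standard OCF values. It returns "not installed" (5) for a missing interface, so the cluster may still try another node, and keeps "not configured" (6) for a genuinely invalid configuration.

```shell
#!/bin/sh
# Standard OCF return codes used by resource agents.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1      # soft error: recovery is attempted
OCF_ERR_INSTALLED=5    # hard error: not usable on this node, other nodes may be tried
OCF_ERR_CONFIGURED=6   # fatal error: invalid configuration, not started anywhere

# Hypothetical helper: check that a network interface exists.
# $1 = interface name
# $2 = directory listing the interfaces (assumption; defaults to /sys/class/net)
check_nic() {
    nic=$1
    sysfs_root=${2:-/sys/class/net}
    if [ -z "$nic" ]; then
        # No interface name at all is a real configuration error.
        return "$OCF_ERR_CONFIGURED"
    fi
    if [ ! -e "$sysfs_root/$nic" ]; then
        # The interface is missing on this node only.
        return "$OCF_ERR_INSTALLED"
    fi
    return "$OCF_SUCCESS"
}
```

If interfaces are expected to come and go, returning OCF_ERR_GENERIC in the second branch would be the softer alternative mentioned above.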

Thanks,

Dejan

> I know that I found some posts that say to run sysctl -w
> net.ipv4.conf.all.promote_secondaries=1 to prevent the secondary nic from being
> removed when the primary is gone, but in this case eth0 is a single nic that is
> managed through IPaddr2 within the crm configuration
> 
> Here's the configuration of the nodes:
> 
> 
> Cluster Name: nodecluster1
> Corosync Nodes:
>  node-01 node-02
> Pacemaker Nodes:
>  node-01 node-02
> 
> Resources:
>  Group: lbpcivip
>   Resource: vip_v207_174 (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=x.x.x.174 cidr_netmask=27 broadcast=x.x.x.191 nic=eth0
>Operations: monitor interval=10s (vip_v207_174-monitor-interval-10s)
>   Resource: vip_v26_1 (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=x.x.26.1
>Operations: monitor interval=10s (vip_v26_1-monitor-interval-10s)
>   Resource: vip_v27_1 (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=x.x.27.1
>Operations: monitor interval=10s (vip_v27_1-monitor-interval-10s)
>   Resource: vip_v254_230 (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=x.x.254.230
>Operations: monitor interval=10s (vip_v254_230-monitor-interval-10s)
>   Resource: change-default-fw (class=lsb type=fwdefaultgw)
>Operations: monitor interval=60s (change-default-fw-monitor-interval-60s)
>   Resource: fwcorp-mailto-sysadmin (class=ocf provider=heartbeat
> type=MailTo)
>Attributes: email=i...@touchtunes.com subject="[node - Clustered
> services]"
>Operations: monitor interval=60s
> (fwcorp-mailto-sysadmin-monitor-interval-60s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
> Colocation Constraints:
> 
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.11-97629de
>  last-lrm-refresh: 1412269491
>  no-quorum-policy: ignore
>  stonith-enabled: false
> 
> 
> Does anyone have a suggestion on how I can solve this issue? Why didn't the
> failover from node1 to node2 work?
> 
> If more information is required, let me know; any suggestion would be
> appreciated!
> 
> Thanx!
> 
> 
> --
>  !
>( o o )
>  --oOO(_)OOo--
>Luc Paulin
>email: paulinster(at)gmail.com
>Skype: paulinster





Re: [ClusterLabs] stopping a particular resource throughout the cluster

2015-10-01 Thread Dejan Muhamedagic
Hi,

On Thu, Oct 01, 2015 at 06:20:32PM +0530, Vijay Partha wrote:
> Hi.
> 
> Is it possible to stop a resource running on all nodes from a single node.
> Say that i have resource A running on node A and resource A running on node
> B. Is it possible to disable the resource A from one node so that the
> resource A does not run on both nodes.

That sounds like a cloned resource. You can just stop it and it
won't run anywhere.
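
As a minimal sketch of what "just stop it" could look like, assuming the cluster is managed with pcs (with crmsh, crm resource stop would play the same role). The resource ID A-clone and the helper names are hypothetical; setting DRY_RUN=1 only prints the command instead of running it.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop a resource (or clone) on all nodes by disabling it cluster-wide.
# $1 = resource or clone ID, e.g. the hypothetical A-clone
stop_everywhere() {
    run pcs resource disable "$1"
}

# Allow the resource to run again.
start_everywhere() {
    run pcs resource enable "$1"
}
```

The command can be issued from any single node, since the change is made in the shared cluster configuration.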

Thanks,

Dejan

> -- 
> With Regards
> P.Vijay





[ClusterLabs] Antw: stopping a particular resource throughout the cluster

2015-10-01 Thread Ulrich Windl
>>> Ulrich Windl wrote on 01.10.2015 at 16:04 in message <560D3D5F.860 : 161 : 60728>:
> >>> Vijay Partha wrote on 01.10.2015 at 14:50:
> > Hi.
> > 
> > Is it possible to stop a resource running on all nodes from a single node.
> > Say that i have resource A running on node A and resource A running on node
> > B. Is it possible to disable the resource A from one node so that the
> > resource A does not run on both nodes.
> 
> 1) Not using clone
> 2) use fencing
> 3) use a location constraint
> 4) ;-)

4) Means: "don't use a broken resource agent", of course...






[ClusterLabs] Antw: stopping a particular resource throughout the cluster

2015-10-01 Thread Ulrich Windl
>>> Vijay Partha wrote on 01.10.2015 at 14:50:
> Hi.
> 
> Is it possible to stop a resource running on all nodes from a single node.
> Say that i have resource A running on node A and resource A running on node
> B. Is it possible to disable the resource A from one node so that the
> resource A does not run on both nodes.

1) Not using clone
2) use fencing
3) use a location constraint
4) ;-)

> 
> -- 
> With Regards
> P.Vijay







[ClusterLabs] disable failover

2015-10-01 Thread Vijay Partha
Hi,

I want to know how to disable failover. If a node undergoes a failover, the
resources running on that node should not be started on the other node in
the cluster. How can this be achieved?

-- 
With Regards
P.Vijay


Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha  wrote:
> Hi,
>
> I want to know how to disable failover. If a node undergoes a failover the
> resources running on the node should not be started on the other node in the
> cluster. How can this be achieved.
>

What exactly does "node undergoes failover" mean? Nodes do not fail over -
resources may fail over between nodes.



Re: [ClusterLabs] disable failover

2015-10-01 Thread Vijay Partha
For example, let's take a cluster of 2 nodes, node A and node B. Say on node
A I have resource A running. If node A goes down, I don't want resource A
to start on node B.

On Thu, Oct 1, 2015 at 8:18 PM, Andrei Borzenkov 
wrote:

> On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha 
> wrote:
> > Hi,
> >
> > I want to know how to disable failover. If a node undergoes a failover
> the
> > resources running on the node should not be started on the other node in
> the
> > cluster. How can this be achieved.
> >
>
> What exactly "node undergoes failover" means? Nodes do not failover -
> resources may fail over between nodes.
>



-- 
With Regards
P.Vijay


Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
On Thu, Oct 1, 2015 at 5:54 PM, Vijay Partha  wrote:
> For example. Lets have a cluster of 2 nodes node A and node B. Say on node A
> i have resource A running. If node A goes down i dont want the resource A to
> start on node B.
>

Do you want it temporarily (e.g. during maintenance) or permanently?
Permanently, you can define constraints. Temporarily, you can set
is-managed to false for resources on this node (do not forget to undo
it later). Or set global maintenance mode (but this affects all
resources on all nodes).
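
For the permanent case, one possible constraint is a location constraint that bans the resource from the other node. The following is only a sketch assuming pcs is used; the resource ID A, the node name nodeB and the helper names are hypothetical. With DRY_RUN=1 the command is printed instead of executed.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Permanently keep a resource off a node. "avoids" without an explicit
# score creates a location constraint with score -INFINITY.
# $1 = resource ID, $2 = node the resource must never run on
ban_from_node() {
    run pcs constraint location "$1" avoids "$2"
}
```

With such a constraint in place, the resource simply stays stopped while its only allowed node is down.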

> On Thu, Oct 1, 2015 at 8:18 PM, Andrei Borzenkov 
> wrote:
>>
>> On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha 
>> wrote:
>> > Hi,
>> >
>> > I want to know how to disable failover. If a node undergoes a failover
>> > the
>> > resources running on the node should not be started on the other node in
>> > the
>> > cluster. How can this be achieved.
>> >
>>
>> What exactly "node undergoes failover" means? Nodes do not failover -
>> resources may fail over between nodes.
>>
>
>
>
>
> --
> With Regards
> P.Vijay
>



Re: [ClusterLabs] disable failover

2015-10-01 Thread Vijay Partha
I want this to be done on a permanent basis. Could you tell me the constraints
that have to be defined to achieve this?

On Thu, Oct 1, 2015 at 8:36 PM, Andrei Borzenkov 
wrote:

> On Thu, Oct 1, 2015 at 5:54 PM, Vijay Partha 
> wrote:
> > For example. Lets have a cluster of 2 nodes node A and node B. Say on
> node A
> > i have resource A running. If node A goes down i dont want the resource
> A to
> > start on node B.
> >
>
> Do you want it temporary (e.g. during maintenance) or permanently?
> Permanently you can define constraints. Temporary you can set
> is-managed to false for resources on this node (do not forget to undo
> it later). Or set global maintenance mode (but this affects all
> resources on all nodes).
>
> > On Thu, Oct 1, 2015 at 8:18 PM, Andrei Borzenkov 
> > wrote:
> >>
> >> On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha 
> >> wrote:
> >> > Hi,
> >> >
> >> > I want to know how to disable failover. If a node undergoes a failover
> >> > the
> >> > resources running on the node should not be started on the other node
> in
> >> > the
> >> > cluster. How can this be achieved.
> >> >
> >>
> >> What exactly "node undergoes failover" means? Nodes do not failover -
> >> resources may fail over between nodes.
> >>
> >
> >
> >
> >
> > --
> > With Regards
> > P.Vijay
> >
>



-- 
With Regards
P.Vijay


Re: [ClusterLabs] disable failover

2015-10-01 Thread Ken Gaillot
On 10/01/2015 09:54 AM, Vijay Partha wrote:
> For example. Lets have a cluster of 2 nodes node A and node B. Say on node
> A i have resource A running. If node A goes down i dont want the resource A
> to start on node B.

I assume the goal is to do this temporarily, for example, to perform
some maintenance on resource A? (If not, why put it in HA in the first
place?)

You have a few options for temporary maintenance:

* You can make a particular resource or resources "unmanaged", which
means Pacemaker will no longer try to start or stop them. To do this,
set the resource's "is-managed" meta-attribute to false. You might also
want to disable any recurring monitor operations on them, by setting the
monitor operation's "enabled" option to false.

* You can put the entire cluster into maintenance mode, in which case
all resources are made unmanaged. To do this, set the "maintenance-mode"
cluster option to true.

You can start and stop services as desired at that point, however you
shouldn't move a service when it is unmanaged (i.e. start a service on a
different node than the cluster last thought it was on).

You can also put a node into standby mode to do maintenance on the node
itself (e.g. reboot for a kernel update), but that will move all
resources to the other node.

Of course, remember to undo those changes when done with maintenance,
and realize that Pacemaker may then decide to move resources around if
circumstances call for it.
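
The three maintenance options above could be sketched as follows, assuming pcs is the management tool. The helper names are hypothetical, and pcs cluster standby is the older spelling (newer pcs releases use pcs node standby instead). With DRY_RUN=1 each helper only prints its command.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1: make a single resource unmanaged (is-managed=false) and back.
unmanage_resource() { run pcs resource unmanage "$1"; }
manage_resource()   { run pcs resource manage "$1"; }

# Option 2: switch cluster-wide maintenance mode on or off.
# $1 = true or false
set_maintenance() { run pcs property set "maintenance-mode=$1"; }

# Option 3: put a node into standby (its resources move away) and back.
standby_node()   { run pcs cluster standby "$1"; }
unstandby_node() { run pcs cluster unstandby "$1"; }
```

Each "on" helper has a matching "off" helper, as a reminder to undo the change once maintenance is finished.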

> On Thu, Oct 1, 2015 at 8:18 PM, Andrei Borzenkov 
> wrote:
> 
>> On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha 
>> wrote:
>>> Hi,
>>>
>>> I want to know how to disable failover. If a node undergoes a failover
>> the
>>> resources running on the node should not be started on the other node in
>> the
>>> cluster. How can this be achieved.
>>>
>>
>> What exactly "node undergoes failover" means? Nodes do not failover -
>> resources may fail over between nodes.





Re: [ClusterLabs] disable failover

2015-10-01 Thread Vijay Partha
Could you help me out with this, please?


On Thu, Oct 1, 2015 at 8:38 PM, Vijay Partha 
wrote:

> I want this to be done on permanent basis. Could you tell me the
> constaints that has to be given for this to be achieved.
>
> On Thu, Oct 1, 2015 at 8:36 PM, Andrei Borzenkov 
> wrote:
>
>> On Thu, Oct 1, 2015 at 5:54 PM, Vijay Partha 
>> wrote:
>> > For example. Lets have a cluster of 2 nodes node A and node B. Say on
>> node A
>> > i have resource A running. If node A goes down i dont want the resource
>> A to
>> > start on node B.
>> >
>>
>> Do you want it temporary (e.g. during maintenance) or permanently?
>> Permanently you can define constraints. Temporary you can set
>> is-managed to false for resources on this node (do not forget to undo
>> it later). Or set global maintenance mode (but this affects all
>> resources on all nodes).
>>
>> > On Thu, Oct 1, 2015 at 8:18 PM, Andrei Borzenkov 
>> > wrote:
>> >>
>> >> On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I want to know how to disable failover. If a node undergoes a
>> failover
>> >> > the
>> >> > resources running on the node should not be started on the other
>> node in
>> >> > the
>> >> > cluster. How can this be achieved.
>> >> >
>> >>
>> >> What exactly "node undergoes failover" means? Nodes do not failover -
>> >> resources may fail over between nodes.
>> >>
>> >
>> >
>> >
>> >
>> > --
>> > With Regards
>> > P.Vijay
>> >
>>
>
>
>
> --
> With Regards
> P.Vijay
>



-- 
With Regards
P.Vijay


Re: [ClusterLabs] disable failover

2015-10-01 Thread Kai Dupke
On 10/01/2015 05:35 PM, Vijay Partha wrote:
> Could u help me out on this please.

It would help if you could elaborate on the wish for an HA stack, if you
don't want to use the stack.

But if you don't want HA, then just do not install HA & do not configure
this application as resource in the HA stack and start it on the command
line / use the standard start-stop system of your Linux.

greetings
Kai Dupke
Senior Product Manager
Server Product Line
-- 
Sell not virtue to purchase wealth, nor liberty to purchase power.
Phone:  +49-(0)5102-9310828 Mail: kdu...@suse.com
Mobile: +49-(0)173-5876766  WWW:  www.suse.com

SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



Re: [ClusterLabs] Asterisk as a resource

2015-10-01 Thread H Yavari
Hi,
I'm a newbie, so sorry for these questions, but I can't find any useful documentation.
I added the OCF resource agent for asterisk to my heartbeat lib. I used this command
to add a resource:
pcs resource create pbx ocf:heartbeat:asterisk params user="root" group="root" maxfiles="65536" op start interval="1" timeout="30s" op monitor interval="5s" timeout="30s"

But when I run "pcs status", I get "FAILED (unmanaged)" and the error
"pbx_start_0 on ha-1 'unknown error' (1): call=12, status=Timed Out, exitreason='none',
    last-rc-change='Thu Oct  1 23:40:53 2015', queued=0ms, exec=20003ms".
So what is the problem?

(I configured IPaddr2 too, and it works.)

Thanks for any reply.

From: H Yavari

Hi,
I want to add Asterisk PBX as a resource to pacemaker/corosync. I'm using the
latest version (version 1.1.13-a14efad). I searched, but I could only find
configuration for old versions. Can you give me some hints for the config? Thanks.
Regards.


   



Re: [ClusterLabs] disable failover

2015-10-01 Thread Vijay Partha
I want pacemaker to monitor the resources running on each node and
restart them when needed. They should keep running on the same node.

On Thu, Oct 1, 2015 at 9:17 PM, Kai Dupke  wrote:

> On 10/01/2015 05:35 PM, Vijay Partha wrote:
> > Could u help me out on this please.
>
> It would help if you could elaborate on the wish for an HA stack, if you
> don't want to use the stack.
>
> But if you don't want HA, then just do not install HA & do not configure
> this application as resource in the HA stack and start it on the command
> line / use the standard start-stop system of your Linux.
>
> greetings
> Kai Dupke
> Senior Product Manager
> Server Product Line
> --
> Sell not virtue to purchase wealth, nor liberty to purchase power.
> Phone:  +49-(0)5102-9310828 Mail: kdu...@suse.com
> Mobile: +49-(0)173-5876766  WWW:  www.suse.com
>
> SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
> GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
>



-- 
With Regards
P.Vijay


Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov

01.10.2015 19:09, Vijay Partha wrote:

i want pacemaker to monitor the resources running on each node and at the
same time restart it. It should run on the same node.



Then create a single-node cluster. Why do you add a second node if you do
not want to use it?



On Thu, Oct 1, 2015 at 9:17 PM, Kai Dupke  wrote:


On 10/01/2015 05:35 PM, Vijay Partha wrote:

Could u help me out on this please.


It would help if you could elaborate on the wish for an HA stack, if you
don't want to use the stack.

But if you don't want HA, then just do not install HA & do not configure
this application as resource in the HA stack and start it on the command
line / use the standard start-stop system of your Linux.

greetings
Kai Dupke
Senior Product Manager
Server Product Line
--
Sell not virtue to purchase wealth, nor liberty to purchase power.
Phone:  +49-(0)5102-9310828 Mail: kdu...@suse.com
Mobile: +49-(0)173-5876766  WWW:  www.suse.com

SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)














Re: [ClusterLabs] Asterisk as a resource

2015-10-01 Thread Ken Gaillot
On 10/01/2015 11:04 AM, H Yavari wrote:
> Hi,
> I'm a newbie, so sorry for these questions, but I can't find any useful docs.
> I added the OCF resource agent for asterisk to my heartbeat lib. I used this
> command to add a resource:
>
>   pcs resource create pbx ocf:heartbeat:asterisk params user="root"
>   group="root" maxfiles="65536" op start interval="1" timeout="30s"
>   op monitor interval="5s" timeout="30s"
>
> But when I run "pcs status", I get "FAILED (unmanaged)" and this error:
>
>   pbx_start_0 on ha-1 'unknown error' (1): call=12, status=Timed Out,
>   exitreason='none', last-rc-change='Thu Oct  1 23:40:53 2015', queued=0ms,
>   exec=20003ms
>
> So what is the problem?

I'd take the asterisk resource out of the cluster first, and make sure
it can be started manually with no errors. If so, I'd next try calling
the resource agent directly to see what error it reports.
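
As a rough illustration of "calling the resource agent directly": OCF agents
take the action (start, stop, monitor, validate-all, ...) as their only
argument and read resource parameters from OCF_RESKEY_<name> environment
variables. The sketch below assumes the agent lives in the usual place,
/usr/lib/ocf/resource.d/heartbeat/asterisk; that path and the helper names
agent_env and run_agent are assumptions for illustration, not something taken
from the asterisk agent itself.

```python
import os
import subprocess

# Assumed default locations; adjust for your distribution.
OCF_ROOT = "/usr/lib/ocf"
AGENT = os.path.join(OCF_ROOT, "resource.d", "heartbeat", "asterisk")


def agent_env(params, base_env=None):
    """Build the environment an OCF agent expects: OCF_ROOT plus one
    OCF_RESKEY_<name> variable per resource parameter."""
    env = dict(os.environ if base_env is None else base_env)
    env["OCF_ROOT"] = OCF_ROOT
    for name, value in params.items():
        env["OCF_RESKEY_" + name] = str(value)
    return env


def run_agent(action, params, agent=AGENT, timeout=60):
    """Run a single agent action outside the cluster and return
    (exit_code, combined stdout/stderr) so the real error is visible."""
    proc = subprocess.run(
        [agent, action],
        env=agent_env(params),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout
```

With the parameters from the pcs command quoted above, something like
run_agent("start", {"user": "root", "group": "root", "maxfiles": "65536"})
run as root on the node would show the agent's own exit code and output
instead of the cluster's generic "Timed Out".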

I haven't used the asterisk resource agent so I can't be much more
specific than that. FYI, some issues to consider when running asterisk HA:

* The easiest setup is pure SIP. If you have a physical line (T1, ISDN,
whatever), that complicates the situation significantly.

* It's best to have two SIP providers so that you don't have a single
point of failure there. FreePBX (based on asterisk) has some nice
features to simplify this.

* You need shared/replicated storage for asterisk's files (voice mails,
etc.).

* In the past, I've run FreePBX inside a VM, and made the VM the HA
resource instead of asterisk directly. That can simplify the HA setup.
VMs have more startup time, but there's the possible benefit of live
migration. I expect using a Docker container would be another good
alternative.

> (I configured IPaddr2 too, and it works.)
>
> Thanks for the reply.
>
> From: H Yavari
>
> Hi,
> I want to add Asterisk PBX as a resource to pacemaker/corosync. I'm using the
> latest version (version 1.1.13-a14efad). I searched, but I could find only
> configurations for old versions. Can you give me some hints for the config?
> Thanks.
> Regards.




Re: [ClusterLabs] IPaddr2 Unkown interface cause a failover that didn't work

2015-10-01 Thread Luc Paulin
2015-10-01 9:30 GMT-04:00 Dejan Muhamedagic:

> Hi,
>
> On Wed, Sep 30, 2015 at 02:24:32PM -0400, Luc Paulin wrote:
> > Hi Everyone,
> > I experienced a weird issue last night where our cluster tried to
> > fail over due to an "Unknown interface" error.
> >
> > It looks like when the IPaddr2 monitor tried to check the status of eth0,
> > it didn't find the device. Both nodes are VMs. I haven't found any reason
> > why eth0 would have "disappeared".
> >
> > [...]
> > Sep 29 21:25:06 node-02 pengine[3240]: error: unpack_rsc_op: Preventing vip_v207_174 from re-starting anywhere: operation monitor failed 'not configured' (6)
>
> The RA exits with the error code which says that the resource
> configuration is invalid. Hence PE won't try to start that
> resource again. Normally, we don't expect network interfaces to
> disappear, but this should probably be the "not installed" error,
> so that the resource can be started on another node. Or even the
> "generic" error in case it may be expected that interfaces can
> come and go. Did you figure out why the interface disappeared?
>
>
No, we haven't been able to figure out why the interface disappeared.
Actually, it doesn't seem to have disappeared at all: the kernel log shows no
evidence that the interface was gone. As you say, this should probably have
been a "not installed" or "generic" error so that the cluster tries to start
the resource on another node, but obviously a network interface that
disappears is not something we expect to see.
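
To make the difference between the three return codes discussed here
concrete, here is a small sketch. The code numbers and names follow the OCF
convention (1 = generic, 5 = not installed, 6 = not configured), and the
soft/hard/fatal labels follow how Pacemaker's documentation classifies them;
the helper names and the regular expression are my own and only cover log
lines shaped like the pengine message quoted above.

```python
import re

# Only the three codes discussed in this thread; names follow the OCF spec.
OCF_FAILURES = {
    1: ("OCF_ERR_GENERIC", "soft"),      # "generic": recover, here or elsewhere
    5: ("OCF_ERR_INSTALLED", "hard"),    # "not installed": move off this node
    6: ("OCF_ERR_CONFIGURED", "fatal"),  # "not configured": banned everywhere
}

# Matches e.g.: operation monitor failed 'not configured' (6)
FAILED_OP = re.compile(r"operation (\w+) failed '([^']*)' \((\d+)\)")


def parse_failed_op(log_line):
    """Return (operation, return_code) from a pengine failure line,
    or None if the line does not have the expected shape."""
    match = FAILED_OP.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(3))


def may_start_elsewhere(rc):
    """True if the policy engine may still try the resource on another
    node after a failure with this return code."""
    _name, severity = OCF_FAILURES[rc]
    return severity != "fatal"
```

For the log line in this thread, parse_failed_op gives ("monitor", 6), and
may_start_elsewhere(6) is False, which matches the "Preventing vip_v207_174
from re-starting anywhere" message. With 5 or 1 the resource would still be
eligible to start on the other node.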



> Thanks,
>
> Dejan
>
> > I know that I found some posts that say to run sysctl -w
> > net.ipv4.conf.all.promote_secondaries=1 to keep secondary addresses from
> > being removed when the primary address is gone, but in this case eth0 has
> > a single address that is managed through IPaddr2 within the crm
> > configuration.
> >
> > Here's the configuration of the nodes:
> >
> > 
> > Cluster Name: nodecluster1
> > Corosync Nodes:
> >  node-01 node-02
> > Pacemaker Nodes:
> >  node-01 node-02
> >
> > Resources:
> >  Group: lbpcivip
> >   Resource: vip_v207_174 (class=ocf provider=heartbeat type=IPaddr2)
> >    Attributes: ip=x.x.x.174 cidr_netmask=27 broadcast=x.x.x.191 nic=eth0
> >    Operations: monitor interval=10s (vip_v207_174-monitor-interval-10s)
> >   Resource: vip_v26_1 (class=ocf provider=heartbeat type=IPaddr2)
> >    Attributes: ip=x.x.26.1
> >    Operations: monitor interval=10s (vip_v26_1-monitor-interval-10s)
> >   Resource: vip_v27_1 (class=ocf provider=heartbeat type=IPaddr2)
> >    Attributes: ip=x.x.27.1
> >    Operations: monitor interval=10s (vip_v27_1-monitor-interval-10s)
> >   Resource: vip_v254_230 (class=ocf provider=heartbeat type=IPaddr2)
> >    Attributes: ip=x.x.254.230
> >    Operations: monitor interval=10s (vip_v254_230-monitor-interval-10s)
> >   Resource: change-default-fw (class=lsb type=fwdefaultgw)
> >    Operations: monitor interval=60s (change-default-fw-monitor-interval-60s)
> >   Resource: fwcorp-mailto-sysadmin (class=ocf provider=heartbeat type=MailTo)
> >    Attributes: email=i...@touchtunes.com subject="[node - Clustered services]"
> >    Operations: monitor interval=60s (fwcorp-mailto-sysadmin-monitor-interval-60s)
> >
> > Stonith Devices:
> > Fencing Levels:
> >
> > Location Constraints:
> > Ordering Constraints:
> > Colocation Constraints:
> >
> > Cluster Properties:
> >  cluster-infrastructure: cman
> >  dc-version: 1.1.11-97629de
> >  last-lrm-refresh: 1412269491
> >  no-quorum-policy: ignore
> >  stonith-enabled: false
> > 
> >
> > Does anyone have a suggestion on how I can solve this issue? Why didn't
> > the failover from node1 to node2 work?
> >
> > If more information is required, let me know. Any suggestion would be
> > appreciated!
> >
> > Thanks!
> >
> >
> > --
> >  !
> >( o o )
> >  --oOO(_)OOo--
> >Luc Paulin
> >email: paulinster(at)gmail.com
> >Skype: paulinster
>


[ClusterLabs] Antw: Re: disable failover

2015-10-01 Thread Ulrich Windl
>>> Kai Dupke wrote on 01.10.2015 at 17:47 in message
<560d5598.2080...@suse.com>:
> On 10/01/2015 05:35 PM, Vijay Partha wrote:
>> Could you help me out on this, please?
> 
> It would help if you could elaborate on the wish for an HA stack, if you
> don't want to use the stack.
> 
> But if you don't want HA, then just do not install HA, do not configure
> this application as a resource in the HA stack, and start it on the command
> line or use the standard start-stop system of your Linux.

Maybe monit is the solution for this case.
