Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 10:23:17PM +0000, Eric Robinson wrote: > I'm looking through the docs but I don't see how to set the on-fail value for > a resource. It is not set on the resource itself but on each of the actions (monitor, start, stop). -- Valentin
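
To illustrate Valentin's point, here is a minimal sketch in Python that builds a CIB-style resource definition in which on-fail sits on the individual op entries rather than on the resource. The resource name p_mysql_002 and the lsb class come from this thread. The init script name, the intervals, and the chosen on-fail values are assumptions made up for the example, not taken from Eric's configuration.

```python
import xml.etree.ElementTree as ET


def build_primitive(resource_id, on_fail_by_action):
    """Return a CIB-style <primitive> whose <op> children carry on-fail.

    on_fail_by_action maps an action name (monitor, start, stop) to a
    tuple of (interval, on-fail value).
    """
    # "mysql_002" as the lsb script name is hypothetical.
    primitive = ET.Element(
        "primitive", {"id": resource_id, "class": "lsb", "type": "mysql_002"}
    )
    operations = ET.SubElement(primitive, "operations")
    for action, (interval, on_fail) in on_fail_by_action.items():
        ET.SubElement(
            operations,
            "op",
            {
                "id": f"{resource_id}-{action}-interval-{interval}",
                "name": action,
                "interval": interval,
                # on-fail is an attribute of the operation, not the resource.
                "on-fail": on_fail,
            },
        )
    return primitive


# Hypothetical policy: ignore monitor failures, block on a failed stop.
primitive = build_primitive(
    "p_mysql_002",
    {"monitor": ("30s", "ignore"), "stop": ("0s", "block")},
)
print(ET.tostring(primitive, encoding="unicode"))
```

Each action gets its own on-fail value, so a failed monitor and a failed stop can be handled differently for the same resource.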

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
17.02.2019 0:44, Eric Robinson wrote: > Thanks for the feedback, Andrei. > > I only want cluster failover to occur if the filesystem or drbd resources > fail, or if the cluster messaging layer detects a complete node failure. Is > there a way to tell PaceMaker not to trigger a cluster failover

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
I'm looking through the docs but I don't see how to set the on-fail value for a resource. > -----Original Message----- > From: Users On Behalf Of Eric Robinson > Sent: Saturday, February 16, 2019 1:47 PM > To: Cluster Labs - All topics related to open-source clustering welcomed > > Subject:

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> On Sat, Feb 16, 2019 at 09:33:42PM +0000, Eric Robinson wrote: > > I just noticed that. I also noticed that the lsb init script has a > > hard-coded stop timeout of 30 seconds. So if the init script waits > > longer than the cluster resource timeout of 15s, that would cause the > > Yes, you

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
Thanks for the feedback, Andrei. I only want cluster failover to occur if the filesystem or drbd resources fail, or if the cluster messaging layer detects a complete node failure. Is there a way to tell PaceMaker not to trigger a cluster failover if any of the p_mysql resources fail? >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 09:33:42PM +0000, Eric Robinson wrote: > I just noticed that. I also noticed that the lsb init script has a > hard-coded stop timeout of 30 seconds. So if the init script waits > longer than the cluster resource timeout of 15s, that would cause the Yes, you should use

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
17.02.2019 0:03, Eric Robinson wrote: > Here are the relevant corosync logs. > > It appears that the stop action for resource p_mysql_002 failed, and that > caused a cascading series of service changes. However, I don't understand > why, since no other resources are dependent on p_mysql_002. >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> -----Original Message----- > From: Users On Behalf Of Valentin Vidic > Sent: Saturday, February 16, 2019 1:28 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > On Sat, Feb 16, 2019 at 09:03:43PM +0000, Eric Robinson wrote: >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 09:03:43PM +0000, Eric Robinson wrote: > Here are the relevant corosync logs. > > It appears that the stop action for resource p_mysql_002 failed, and > that caused a cascading series of service changes. However, I don't > understand why, since no other resources are

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 08:50:57PM +0000, Eric Robinson wrote: > Which logs? You mean /var/log/cluster/corosync.log? On the DC node pacemaker will be logging the actions it is trying to run (start or stop some resources). > But even if the stop action is resulting in an error, why would the >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
Here are the relevant corosync logs. It appears that the stop action for resource p_mysql_002 failed, and that caused a cascading series of service changes. However, I don't understand why, since no other resources are dependent on p_mysql_002. [root@001db01a cluster]# cat

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 08:34:21PM +0000, Eric Robinson wrote: > Why is it that when one of the resources that start with p_mysql_* > goes into a FAILED state, all the other MySQL services also stop? Perhaps stop is not working correctly for these lsb services, so for example stopping

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
As a follow-up, here is the whole config. [root@001db01a ~]# pcs config Cluster Name: 001db01ab Corosync Nodes: 001db01a 001db01b Pacemaker Nodes: 001db01a 001db01b Resources: Resource: p_vip_clust01 (class=ocf provider=heartbeat type=IPaddr2) Attributes: cidr_netmask=32 ip=10.51.14.75

[ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
These are the resources on our cluster. [root@001db01a ~]# pcs status Cluster name: 001db01ab Stack: corosync Current DC: 001db01a (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum Last updated: Sat Feb 16 15:24:55 2019 Last change: Sat Feb 16 15:10:21 2019 by root via cibadmin on

Re: [ClusterLabs] Summit

2019-02-16 Thread Digimer
I'm only one person, and I wouldn't want the schedule set to my conveniences. That said, I'd vote for September as I will be pretty busy in May. Also, for both practical and self-serving reasons, September gives more time to plan things. :) On 2019-02-16 11:35 a.m., Chris Feist wrote: > Yes, we

[ClusterLabs] Pacemaker do not schedule resource which is in docker container after docker is restarted but the pacemaker cluster show the resource is started!

2019-02-16 Thread ma.jinfeng
There is an issue where pacemaker doesn't schedule a resource that is in a docker container after docker is restarted, but the pacemaker cluster shows the resource as started. It seems to be a bug in pacemaker. I am very confused about what happened when pengine printed those logs (pengine: notice:

Re: [ClusterLabs] Summit

2019-02-16 Thread Chris Feist
Yes, we are definitely looking at coordinating another cluster summit. Possibly in May or September, but we're just in the very early idea stages. Thanks, Chris On Fri, Feb 15, 2019 at 10:25 AM Mark Syms wrote: > Just been asked whether there were any plans for another Clusterlabs > Summit as