> -----Original Message-----
> From: Users <users-boun...@clusterlabs.org> On Behalf Of Ken Gaillot
> Sent: Tuesday, February 19, 2019 10:31 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?
>
> On Tue, 2019-02-19 at 17:40 +0000, Eric Robinson wrote:
> > > -----Original Message-----
> > > From: Users <users-boun...@clusterlabs.org> On Behalf Of Andrei Borzenkov
> > > Sent: Sunday, February 17, 2019 11:56 AM
> > > To: users@clusterlabs.org
> > > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?
> > >
> > > On 17.02.2019 0:44, Eric Robinson wrote:
> > > > Thanks for the feedback, Andrei.
> > > >
> > > > I only want cluster failover to occur if the filesystem or DRBD resources fail, or if the cluster messaging layer detects a complete node failure. Is there a way to tell Pacemaker not to trigger a cluster failover if any of the p_mysql resources fail?
> > >
> > > Let's look at this differently. If all these applications depend on each other, you should not be able to stop an individual resource in the first place -- you need to group them or define dependencies so that stopping any resource stops everything.
> > >
> > > If these applications are independent, they should not share resources. Each MySQL application should have its own IP, its own FS, and its own block device for that FS, so that they can be moved between cluster nodes independently.
> > >
> > > Anything else will lead to trouble, as you have already observed.
> >
> > FYI, the MySQL services do not depend on each other. All of them depend on the floating IP, which depends on the filesystem, which depends on DRBD, but they do not depend on each other.
> > Ideally, the failure of p_mysql_002 should not cause failure of the other MySQL resources, but now I understand why it happened. Pacemaker wanted to start it on the other node, so it needed to move the floating IP, filesystem, and DRBD primary, which had the cascade effect of stopping the other MySQL resources.
> >
> > I think I also understand why the p_vip_clust01 resource blocked.
> >
> > FWIW, we've been using Linux HA since 2006, originally Heartbeat, then Corosync+Pacemaker. The past 12 years have been relatively problem free. This symptom is new for us, appearing only within the past year. Our cluster nodes run many separate instances of MySQL, so it is not practical to have that many filesystems, IPs, etc. We are content with the way things are, except for this new troubling behavior.
> >
> > If I understand the thread correctly, on-fail=stop will not work, because the cluster will still try to stop the resources that are implied dependencies.
> >
> > Bottom line is, how do we configure the cluster so that there are no cascading consequences when a MySQL resource fails? Basically, if a MySQL resource fails, it fails. We'll deal with that on an ad-hoc basis. I don't want the whole cluster to barf. What about on-fail=ignore? Earlier, you suggested symmetrical=false might also do the trick, but you said it comes with its own can of worms. What are the downsides of on-fail=ignore or symmetrical=false?
> >
> > --Eric
>
> Even adding on-fail=ignore to the recurring monitors may not do what you want, because I suspect that even an ignored failure will make the node less preferable for all the other resources. But it's worth testing.
>
> Otherwise, your best option is to remove all the recurring monitors from the mysql resources and rely on external monitoring (e.g. Nagios, Icinga, Monit, ...) to detect problems.
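To make the question below concrete, here is roughly what I would try, sketched with pcs (the interval value is just an example from our config, and the exact syntax may vary with the pcs version):

    # Keep the monitor but tell Pacemaker to ignore its failures
    pcs resource update p_mysql_002 op monitor interval=60s on-fail=ignore

    # Or remove the recurring monitor entirely; the cluster still
    # starts and stops the resource, it just stops health-checking it
    pcs resource op remove p_mysql_002 monitor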
This is probably a dumb question, but can we remove just the monitor operation while leaving the resource configured in the cluster? If a node fails over, we do want the resources to start automatically on the new primary node.

> --
> Ken Gaillot <kgail...@redhat.com>
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org