Hi,

Thanks Ken and Ulrich for your replies. With your suggestions I ended up finding out about ocf:heartbeat:ethmonitor and will try to set this up as an additional resource within our cluster.
I can share more information once (if!) I have it working the way I want it to.

Cheers,
Alex

> On 07.12.2023, at 08:59, Windl, Ulrich <u.wi...@ukr.de> wrote:
>
> Hi!
>
> What about this: Run a ping node for a remote resource to set up some score
> value. If the remote is unreachable, the score will reflect that.
> Then add a rule checking that score, deciding whether to run the virtual IP or
> not.
>
> Regards,
> Ulrich
>
> -----Original Message-----
> From: Users <users-boun...@clusterlabs.org> On Behalf Of Alexander Eastwood
> Sent: Wednesday, December 6, 2023 5:56 PM
> To: users@clusterlabs.org
> Subject: [EXT] [ClusterLabs] Prevent cluster transition when resource
> unavailable on both nodes
>
> Hello,
>
> I administer a Pacemaker cluster consisting of 2 nodes, which are connected
> to each other via ethernet cable to ensure that they are always able to
> communicate with each other. A network switch is also connected to each node
> via ethernet cable and provides external access.
>
> One of the managed resources of the cluster is a virtual IP, which is
> assigned to a physical network interface card and thus depends on the network
> switch being available. The virtual IP is always hosted on the active node.
>
> We had the situation where the network switch lost power or was rebooted; as
> a result, both servers reported `NIC Link is Down`. The recover operation on
> the virtual IP resource then failed repeatedly on the active node, and a
> transition was initiated. Since the other node was also unable to start the
> resource, the cluster was swaying between the 2 nodes until the NIC links
> were up again.
>
> Is there a way to change this behaviour? I am thinking of the following
> sequence of events, but have not been able to find a way to configure this:
>
> 1. active node detects NIC Link is Down, which affects a resource managed by
>    the cluster (monitor operation on the resource starts to fail)
> 2. active node checks if the other (passive) node in the cluster would be
>    able to start the resource
> 3. if passive node can start the resource, transition all resources to
>    passive node
> 4. if passive node is unable to start the resource, then there is nothing to
>    be gained by a transition, so no action should be taken
>
> Any pointers or advice will be much appreciated!
>
> Thank you and kind regards,
>
> Alex Eastwood
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
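
The four-step sequence in the original question boils down to a small decision rule. As a rough illustration only, here is a toy model of that rule in Python. It is not Pacemaker configuration and not how the Pacemaker scheduler works; the function name, the node names and the `link_up` mapping are all made up for the example.

```python
from typing import Mapping


def next_active_node(active: str, passive: str, link_up: Mapping[str, bool]) -> str:
    """Return the node that should host the virtual IP after a link check.

    Toy model only: `link_up` is a hypothetical mapping from node name to
    whether that node's NIC link is currently up.
    """
    # Step 1: nothing to do while the active node still has its link.
    if link_up[active]:
        return active
    # Steps 2-3: move only if the passive node could actually run the resource.
    if link_up[passive]:
        return passive
    # Step 4: both links are down, so a transition gains nothing; stay put.
    return active


if __name__ == "__main__":
    # Switch outage: both NIC links are down, resources stay where they are.
    print(next_active_node("node1", "node2", {"node1": False, "node2": False}))
    # Only the active node lost its link: fail over to the passive node.
    print(next_active_node("node1", "node2", {"node1": False, "node2": True}))
```

In a real cluster this rule would presumably have to be expressed through a per-node connectivity value and a rule that checks it, along the lines Ulrich suggests, rather than through code like this.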