On Mon, 7 Nov 2022 14:06:51 +0000 Robert Hayden <robert.h.hay...@oracle.com> wrote:

> > -----Original Message-----
> > From: Users <users-boun...@clusterlabs.org> On Behalf Of Valentin Vidic
> > via Users
> > Sent: Sunday, November 6, 2022 5:20 PM
> > To: users@clusterlabs.org
> > Cc: Valentin Vidić <vvi...@valentin-vidic.from.hr>
> > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> >
> > On Sun, Nov 06, 2022 at 09:08:19PM +0000, Robert Hayden wrote:
> > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity
> > > to the node would be seen and acted upon by the remote nodes (evicts and
> > > takes over ownership of the resources). But the impacted node would just
> > > sit logging IO errors. Pacemaker would keep updating the /dev/watchdog
> > > device so SBD would not self evict. Once I re-enabled the network, then
> > > the
> >
> > Interesting, not sure if this is the expected behaviour based on:
> >
> > https://lists.clusterlabs.org/pipermail/users/2017-August/022699.html
> >
> > Does SBD log "Majority of devices lost - surviving on pacemaker" or
> > some other messages related to Pacemaker?
>
> Yes.
>
> >
> > Also what is the status of Pacemaker when the network is down? Does it
> > report no quorum or something else?
> >
>
> Pacemaker on the failing node shows quorum even though it has lost
> communication to the Quorum Device and to the other node in the cluster.

This is the main issue. Maybe inspecting the corosync-cmapctl output could
shed some light on a setting we are missing?

> The non-failing node of the cluster can see the Quorum Device system and
> thus correctly determines to fence the failing node and take over its
> resources.

Normal.

> Only after I run firewall-cmd --panic-off, will the failing node start to log
> messages about loss of TOTEM and getting a new consensus with the
> now visible members.
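
To show why a quorate failing node is surprising, here is a back-of-the-envelope sketch of the vote arithmetic. It assumes (the thread does not say so explicitly) two nodes with one vote each plus a quorum device carrying one vote, and it models corosync votequorum's simple-majority rule, expected_votes // 2 + 1. The helper names are hypothetical; the real vote counts would have to be checked in the corosync-cmapctl output.

```python
# Hypothetical model of corosync votequorum's simple-majority rule, for
# illustration only. Assumed topology: two cluster nodes with 1 vote each,
# plus a quorum device (qdevice) contributing 1 vote, so expected_votes = 3.


def quorum_threshold(expected_votes: int) -> int:
    """Simple majority of expected votes: expected_votes // 2 + 1."""
    return expected_votes // 2 + 1


def is_quorate(visible_votes: int, expected_votes: int) -> bool:
    """A partition is quorate if the votes it can see reach the threshold."""
    return visible_votes >= quorum_threshold(expected_votes)


NODE_VOTES = 1     # assumed: each node carries one vote
QDEVICE_VOTES = 1  # assumed: the qdevice carries one vote
EXPECTED = 2 * NODE_VOTES + QDEVICE_VOTES  # 3 expected votes in total

# Non-failing node: sees its own vote and the qdevice, not the isolated node.
healthy = NODE_VOTES + QDEVICE_VOTES
# Failing node after `firewall-cmd --panic-on`: sees only its own vote.
isolated = NODE_VOTES

print("threshold:", quorum_threshold(EXPECTED))
print("healthy node quorate:", is_quorate(healthy, EXPECTED))
print("isolated node quorate:", is_quorate(isolated, EXPECTED))
```

Under these assumptions the isolated node holds 1 of 3 votes, below the threshold of 2, so it should report no quorum. That it still reports quorum is the surprising part.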
>
> I think all of that explains the lack of self-fencing when the sbd setting of
> SBD_PACEMAKER=yes is used.

I'm not sure. If I understand correctly, SBD_PACEMAKER=yes only instructs sbd
to keep an eye on the pacemaker+corosync processes (as described upthread). It
doesn't explain why Pacemaker keeps holding the quorum, but I might be missing
something...
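
A simplified, hypothetical model of the point above may help: with Pacemaker integration, sbd can survive the loss of its device majority as long as the node is still in a quorate partition and considered healthy by Pacemaker. This is not sbd's source code. The NodeView fields, the simple-majority rule for devices and the single-device scenario are assumptions made for illustration.

```python
# Simplified, hypothetical model of sbd's survival decision with
# SBD_PACEMAKER=yes. NOT sbd's actual code; field names are illustrative.
from dataclasses import dataclass


@dataclass
class NodeView:
    devices_ok: int              # shared sbd devices still reachable
    devices_total: int           # configured sbd devices (assumed count)
    pacemaker_integration: bool  # SBD_PACEMAKER=yes
    partition_quorate: bool      # what the node *believes* about quorum
    node_healthy: bool           # node online and healthy per Pacemaker


def has_device_majority(v: NodeView) -> bool:
    # Assumed simple majority of configured devices.
    return v.devices_ok > v.devices_total // 2


def sbd_keeps_node_alive(v: NodeView) -> bool:
    if has_device_majority(v):
        return True
    # Device majority lost: only "surviving on pacemaker" can keep it alive.
    return v.pacemaker_integration and v.partition_quorate and v.node_healthy


# Scenario from the thread: network cut, shared device unreachable, yet the
# isolated node still believes its partition is quorate.
stuck = NodeView(devices_ok=0, devices_total=1, pacemaker_integration=True,
                 partition_quorate=True, node_healthy=True)
# Same node if it had correctly noticed the loss of quorum.
expected = NodeView(devices_ok=0, devices_total=1, pacemaker_integration=True,
                    partition_quorate=False, node_healthy=True)

print(sbd_keeps_node_alive(stuck))     # True: no self-fencing
print(sbd_keeps_node_alive(expected))  # False: sbd would self-fence
```

In this model everything hinges on partition_quorate: SBD_PACEMAKER=yes only makes sbd rely on what Pacemaker and corosync report, so a node that wrongly believes it is quorate keeps running.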