On Wed, Jul 29, 2020 at 2:48 AM Ulrich Windl
<ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>> Reid Wahl <nw...@redhat.com> wrote on 29.07.2020 at 11:39 in message
> <capiuu98adagzdunkasr3rchrc+o9mh8uqatsn36q633ndxa...@mail.gmail.com>:
> > "As it stated in the comments, we don't want to halt or boot via ssh,
> > only reboot."
> >
> > Generally speaking, a stonith reboot action consists of the following
> > basic sequence of events:
> >
> > 1. Execute the fence agent with the "off" action.
> > 2. Poll the power status of the fenced node until it is powered off.
> > 3. Execute the fence agent with the "on" action.
> > 4. Poll the power status of the fenced node until it is powered on.
> >
> > So a custom fence agent that supports reboots actually needs to
> > support off and on actions.
>
> Are you sure? Sbd can do the "off" action, but when the node is off, it
> cannot perform an "on" action. So either you use "off" and the node
> will remain off, or you use "reboot" and the node will be reset (and
> come up again, hopefully).

I'm referring to conventional power fencing agents; sorry for not
clarifying. Conventional power fencing (e.g., fence_ipmilan and
fence_vmware_soap) is most of what I see deployed on a daily basis. (A
rough sketch of the action handling such an agent needs is at the end of
this message.)

> > As Andrei noted, ssh is **not** a reliable method by which to ensure
> > a node gets rebooted or stops using cluster-managed resources. You
> > can't depend on the ability to SSH to an unhealthy node that needs to
> > be fenced.
> >
> > The only way to guarantee that an unhealthy or unresponsive node
> > stops all access to shared resources is to power off or reboot the
> > node. (In the case of resources that rely on shared storage, I/O
> > fencing instead of power fencing can also work, but that's not
> > ideal.)
> >
> > As others have said, SBD is a great option. Use it if you can. There
> > are also power fencing methods (one example is fence_ipmilan, but the
> > options available depend on your hardware or virt platform) that are
> > reliable under most circumstances.
> >
> > You said that when you stop corosync on node 2, Pacemaker tries to
> > fence node 2. There are a couple of possible reasons for that. One
> > possibility is that you stopped or killed corosync without stopping
> > Pacemaker first. (If you use pcs, then try `pcs cluster stop`.)
> > Another possibility is that resources failed to stop during cluster
> > shutdown on node 2, causing node 2 to be fenced.
> >
> > On Wed, Jul 29, 2020 at 12:47 AM Andrei Borzenkov
> > <arvidj...@gmail.com> wrote:
> >
> >> On Wed, Jul 29, 2020 at 9:01 AM Gabriele Bulfon
> >> <gbul...@sonicle.com> wrote:
> >>
> >>> That one was taken from a specific implementation on Solaris 11.
> >>> The situation is a dual-node server with a shared storage
> >>> controller: both nodes see the same disks concurrently.
> >>> Here we must be sure that the two nodes are not going to
> >>> import/mount the same zpool at the same time, or we will encounter
> >>> data corruption:
> >>
> >> ssh-based "stonith" cannot guarantee it.
> >>
> >>> node 1 will be preferred for pool 1, node 2 for pool 2; only in
> >>> case one of the nodes goes down or is taken offline should the
> >>> resources first be freed by the leaving node and taken by the other
> >>> node.
> >>>
> >>> Would you suggest one of the available stonith methods in this
> >>> case?
> >>
> >> IPMI, managed PDU, SBD ...
> >>
> >> In practice, the only stonith method that works in case of complete
> >> node outage, including any power supply, is SBD.
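
For illustration, here is a rough, untested sketch of the action
handling a conventional power fence agent needs, following the
off -> verify -> on -> verify sequence above. The power_on, power_off,
and power_status hooks are hypothetical placeholders for whatever your
PDU or BMC actually exposes; the key=value-on-stdin parameter passing
follows the common fence agent convention:

    #!/usr/bin/env python3
    # Hypothetical minimal power fence agent -- a sketch, not a
    # production agent. Real agents in the fence-agents package build
    # on the shared "fencing" Python library instead.
    import sys
    import time

    # Hypothetical hooks: implement these against your PDU/BMC API.
    def power_on(node):
        raise NotImplementedError

    def power_off(node):
        raise NotImplementedError

    def power_status(node):  # should return "on" or "off"
        raise NotImplementedError

    def wait_for(node, state, timeout=60):
        # Poll the node's power state until it matches, or time out.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if power_status(node) == state:
                return True
            time.sleep(2)
        return False

    def main():
        # Fence agents conventionally receive options as key=value
        # lines on stdin (e.g. "action=reboot", "port=node2").
        opts = dict(line.strip().split("=", 1)
                    for line in sys.stdin if "=" in line)
        action = opts.get("action", "reboot")
        node = opts.get("port", "")

        if action in ("status", "monitor"):
            # Convention: exit 0 if the node is on, 2 if it is off.
            sys.exit(0 if power_status(node) == "on" else 2)
        elif action == "off":
            power_off(node)
            sys.exit(0 if wait_for(node, "off") else 1)
        elif action == "on":
            power_on(node)
            sys.exit(0 if wait_for(node, "on") else 1)
        elif action == "reboot":
            # The off -> verify -> on -> verify sequence from above.
            power_off(node)
            if not wait_for(node, "off"):
                sys.exit(1)  # never claim success while the node may be up
            power_on(node)
            sys.exit(0 if wait_for(node, "on") else 1)
        sys.exit(1)  # unknown action

    if __name__ == "__main__":
        main()

A production agent would also have to answer the "metadata" action with
an XML description of its parameters; the agents shipped in the
fence-agents package get that (and the option parsing above) from the
shared fencing library. The key point is the reboot branch: the agent
must never report success unless the node is verifiably off.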
--
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA