On 23/04/17 12:51 AM, Andrei Borzenkov wrote:
> 22.04.2017 23:33, Dmitri Maziuk wrote:
>> On 4/22/2017 12:02 PM, Digimer wrote:
>>
>>> Having SBD properly configured is *massively* safer than no fencing at
>>> all. So for people where other fence methods are not available for
>>> whatever reason, SBD is the way to go.
>>
>> Now you're talking. IMO in a 2-node cluster, a node that kills itself in
>> response to, say, losing link on eth0 is infinitely preferable to a node
>> that tries to shoot the other node when it can't ping it.
>>
>
> How do you know whether node actually killed itself? How do you know
> when it is safe to takeover resources from this node?

Watchdog timers work outside the OS. They're hardware devices that will
reboot the host unless told not to. So it doesn't matter what state the
host is in; it can be hung, panicked, whatever. If the watchdog timer
isn't kicked in time, the host effectively gets its reset button pressed.
That's why, if you know the watchdog's timeout (the kick time), you just
have to wait longer than that to know that the lost node is no longer
operational.

> As a real life example (not Linux/pacemaker) - panicking node flushed
> disk buffers, so it was not safe to access shared filesystem until
> this was complete. This could take quite a lot of time, so without agent
> on *surviving* node(s) that monitors and acknowledges this process this
> resulted in data corruption.
>
> The problem is not so much how to put node in known state, but how other
> node(s) can ensure it was done.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
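
The "wait longer than the kick time" rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 10-second timeout and the 5-second margin are all hypothetical and are not taken from SBD, Pacemaker or any watchdog driver.

```python
import time


def safe_takeover_delay(watchdog_timeout_s: float, margin_s: float = 5.0) -> float:
    """Return how long a survivor waits after losing a peer before takeover.

    Both parameters are hypothetical illustration values, not settings
    taken from SBD or Pacemaker.
    """
    if watchdog_timeout_s <= 0:
        raise ValueError("watchdog timeout must be positive")
    if margin_s < 0:
        raise ValueError("margin must not be negative")
    # Wait strictly longer than the watchdog timeout: a timer that was
    # not kicked will have reset the host by then.
    return watchdog_timeout_s + margin_s


def peer_presumed_reset(last_seen_s: float, now_s: float,
                        watchdog_timeout_s: float, margin_s: float = 5.0) -> bool:
    """True once the peer's silence has outlasted the timeout plus margin."""
    silence_s = now_s - last_seen_s
    return silence_s > safe_takeover_delay(watchdog_timeout_s, margin_s)


if __name__ == "__main__":
    # Use a monotonic clock so wall-clock adjustments cannot shorten the wait.
    last_seen = time.monotonic()
    print(safe_takeover_delay(10.0))
    print(peer_presumed_reset(last_seen, last_seen + 16.0, 10.0))
```

The sketch covers only the waiting side. It assumes the watchdog on the lost node was actually armed with the timeout the survivor believes it has, which the survivor cannot verify from the timer alone.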