On 2013-05-15T22:55:43, Andreas Kurz <andr...@hastexo.com> wrote:

> start-delay is an option of the monitor operation ... in fact means
> "don't trust that start was successful, wait for the initial monitor
> some more time"

It can be used on start here though to avoid exactly this situation; and
it works fine for that, effectively being equivalent to the "delay"
option on stonith (since the start always precedes the fence).

> The problem is, this would only make sense for one single stonith
> resource that can fence more nodes. In case of a split-brain that would
> delay the start on that node where the stonith resource was not running
> before and gives that node a "penalty".

Sure. In a split-brain scenario, one side will receive a penalty; that's
the whole point of this exercise. This applies in particular to the
external/sbd agent. It can also be done by grouping all fencing
resources to always run on one node, if you don't have access to RHT
fence agents, for example.

external/sbd also has code to avoid a death-match cycle in case of
persistent split-brain scenarios now; after a reboot, the node that was
fenced will not join unless the fence is cleared first. (The RHT world
calls that "unfence", I believe.)

That should be a win for the fence_sbd that I hope to get around to
sometime in the next few months, too ;-)

> In your example with two stonith resources running all the time,
> Digimer's suggestion is a good idea: use one of the redhat fencing
> agents, most of them have some sort of "stonith-delay" parameter that
> you can use with one instance.

It'd make sense to have logic for this embedded at a higher level,
somehow; the problem is all too common.

Of course, it is most relevant in scenarios where "split brain" is
significantly more probable than "node down". That is true for most
test scenarios (admins love yanking cables), but in practice, it's
mostly truly the node that is down.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes."
-- Oscar Wilde
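
As an illustration of the "penalty" argument in the message above, here is a minimal Python sketch of a two-node fence race. It is a toy model and not Pacemaker code: the function name `fence_race`, the node names, and the constant `latency` are all hypothetical, and treating a tie as mutual fencing is a modelling assumption.

```python
def fence_race(delays, latency=1.0):
    """Return the set of nodes that survive a two-node split-brain fence race.

    delays  -- dict mapping node name to the seconds that node waits before
               it fires its fence request at the peer (the "penalty").
    latency -- assumed constant time for a fence request to take effect.

    Toy model: whichever node's shot lands first survives. If both shots
    land at the same instant, both nodes are assumed to be fenced.
    """
    if len(delays) != 2:
        raise ValueError("this sketch models exactly two nodes")
    (a, delay_a), (b, delay_b) = sorted(delays.items())
    shot_a_lands = delay_a + latency  # when a's fence of b takes effect
    shot_b_lands = delay_b + latency  # when b's fence of a takes effect
    if shot_a_lands < shot_b_lands:
        return {a}
    if shot_b_lands < shot_a_lands:
        return {b}
    return set()  # simultaneous shots: no survivor in this model


if __name__ == "__main__":
    # node2 carries a 5 second penalty, so node1 wins the race.
    print(fence_race({"node1": 0.0, "node2": 5.0}))
    # No delay on either side: the model reports no survivor.
    print(fence_race({"node1": 0.0, "node2": 0.0}))
```

In this model the delay does nothing in the plain "node down" case, since only one side is shooting; it only decides the outcome when both sides try to fence each other at once.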