On Fri, May 25, 2012 at 5:41 PM, Lars Marowsky-Bree <l...@suse.com> wrote:
> On 2012-05-25T17:31:52, Florian Haas <flor...@hastexo.com> wrote:
>
>> > That aside, what do you think of the idea/approach?
>> Um, right now I have no opinion. Your commit messages are pretty
>> terse, and there's no README in the repo. Mind adding one?
>
> Good point. I wasn't aware the commit messages were terse ;-)
>
> To sketch this out:
>
> Basically though SBD continues as it always did.
>
> If you specify "-P" to the daemon start-up (usually via
> /etc/sysconfig/sbd SBD_OPTS), the following will happen:
>
> sbd will start (in addition to the worker processes that monitor the
> disks) a process that signs in with pacemaker (and corosync). This
> process monitors that the partition the local node is part of is
> quorate, and that the local node (according to the CIB as run through
> pengine) is "healthy".
>
> If so, the master thread will not self-fence even if the majority of
> devices is currently unavailable.
>
> That's it, nothing more. Does that help?
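To check my own reading of the above, here is a minimal sketch of the decision as I understand it. This is not SBD's code: the names `NodeView` and `must_self_fence` are hypothetical, and I am assuming "majority of devices unavailable" means strictly more than half of the configured devices.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """Hypothetical snapshot of what the local sbd master process knows."""
    devices_total: int        # number of configured SBD devices
    devices_unavailable: int  # devices the disk workers cannot reach right now
    pacemaker_check: bool     # True if sbd was started with -P
    quorate: bool             # local partition has quorum
    healthy: bool             # local node is "healthy" per the CIB/pengine run


def must_self_fence(view: NodeView) -> bool:
    """Return True if the node should self-fence under the described rule."""
    # Assumption: "majority unavailable" means strictly more than half.
    majority_unavailable = view.devices_unavailable * 2 > view.devices_total
    if not majority_unavailable:
        # Enough devices are reachable; behaviour is as it always was.
        return False
    # With -P, a quorate partition and a healthy local node override
    # the loss of the device majority.
    if view.pacemaker_check and view.quorate and view.healthy:
        return False
    return True


# Single device lost, -P set, cluster view is fine: no self-fence.
print(must_self_fence(NodeView(1, 1, True, True, True)))
# Same device loss without -P: self-fence, as before.
print(must_self_fence(NodeView(1, 1, False, True, True)))
```

If that matches the intent, the only behavioural change from -P is the second branch; without the option, or with a non-quorate partition or unhealthy node, losing the device majority still leads to a self-fence.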

It does. One naive question: what's the rationale for tying in with
Pacemaker's view of things? Couldn't you just consume the quorum and
membership information from Corosync alone?

> It became needed because customers had scenarios with just one device
> (which experienced intermittent problems), where MPIO acted up (I've
> seen IO stuck for minutes), or even three devices where failures were
> correlated. Then, SBD would self-fence, and the customer would be unhappy.
>
> (I have opinions on particularly the last failure mode. This seems to
> arise specifically when customers have built setups with two HBAs, two
> SANs, two storages, but then cross-linked the SANs, connected the HBAs
> to each, and the storages too. That seems to frequently lead to
> hiccups where the *entire* fabric is affected. I'm thinking this
> cross-linking is a case of sham redundancy; it *looks* as if it makes
> things more redundant, but in reality reduces it since faults are no
> longer independent. Alas, they've not wanted to change that.)

Henceforth, I'm going to dangle this thread in front of everyone who
believes their SAN can never fail. Thanks. :)

Are there any SUSEisms in SBD or would you expect it to be packageable
on any platform?

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/