On Fri, May 25, 2012 at 5:41 PM, Lars Marowsky-Bree <l...@suse.com> wrote:
> On 2012-05-25T17:31:52, Florian Haas <flor...@hastexo.com> wrote:
>
>> > That aside, what do you think of the idea/approach?
>> Um, right now I have no opinion. Your commit messages are pretty
>> terse, and there's no README in the repo. Mind adding one?
>
> Good point. I wasn't aware the commit messages were terse ;-)
>
> To sketch this out:
>
> Basically, though, SBD continues to behave as it always did.
>
> If you pass "-P" at daemon start-up (usually via SBD_OPTS in
> /etc/sysconfig/sbd), the following will happen:
>
> sbd will start (in addition to the worker processes that monitor the
> disks) a process that signs in with pacemaker (and corosync). This
> process monitors that the partition the local node is part of is
> quorate, and that the local node (according to the CIB as run through
> pengine) is "healthy".
>
> If so, the master thread will not self-fence even if the majority of
> devices is currently unavailable.
>
> That's it, nothing more. Does that help?
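
The decision described in the quoted text can be sketched roughly as
follows. This is only an illustration of the logic as stated, not sbd's
actual code: the names NodeView and must_self_fence are hypothetical, the
"more than half" device threshold is an assumption about what "majority"
means here, and poison-pill messages on the devices are ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """Hypothetical snapshot of what the sbd master process knows."""
    devices_total: int     # number of configured SBD devices
    devices_ok: int        # devices currently readable by their workers
    pacemaker_check: bool  # True if the daemon was started with "-P"
    quorate: bool          # local partition has quorum
    healthy: bool          # local node looks healthy per the CIB/pengine


def must_self_fence(view: NodeView) -> bool:
    """Return True if the node should self-fence under the described rules."""
    # Classic behaviour: a majority of devices (assumed: strictly more
    # than half) being reachable is enough to stay up.
    if view.devices_ok * 2 > view.devices_total:
        return False
    # With "-P": losing the device majority is tolerated as long as the
    # cluster view says the partition is quorate and the node is healthy.
    if view.pacemaker_check and view.quorate and view.healthy:
        return False
    return True


# A single device that has gone away, with and without "-P":
print(must_self_fence(NodeView(1, 0, False, True, True)))
print(must_self_fence(NodeView(1, 0, True, True, True)))
```

Without "-P" the second check never applies, so the sketch reduces to the
device-majority rule alone, which matches "SBD continues as it always did".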

It does. One naive question: what's the rationale for tying in with
Pacemaker's view of things? Couldn't you just consume the quorum and
membership information from Corosync alone?

> It became necessary because customers had scenarios with just one
> device (which experienced intermittent problems), where MPIO acted up
> (I've seen IO stuck for minutes), or even three devices where failures
> were correlated. Then SBD would self-fence, and the customer would be
> unhappy.
>
>
> (I have opinions on the last failure mode in particular. This seems to
> arise specifically when customers have built setups with two HBAs, two
> SANs, and two storage systems, but then cross-linked the SANs,
> connected the HBAs to each, and the storage systems too. That seems to
> frequently lead to hiccups where the *entire* fabric is affected. I'm
> thinking this cross-linking is a case of sham redundancy; it *looks* as
> if it makes things more redundant, but in reality it reduces redundancy,
> since faults are no longer independent. Alas, they've not wanted to
> change that.)

Henceforth, I'm going to dangle this thread in front of everyone who
believes their SAN can never fail. Thanks. :)

Are there any SUSEisms in SBD or would you expect it to be packageable
on any platform?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
