On 2012-05-25T21:44:25, Florian Haas <[email protected]> wrote:
> > If so, the master thread will not self-fence even if the majority of
> > devices is currently unavailable.
> >
> > That's it, nothing more. Does that help?
>
> It does. One naive question: what's the rationale of tying in with
> Pacemaker's view of things? Couldn't you just consume the quorum and
> membership information from Corosync alone?
Yes and no.
On SLE HA 11 (which, alas, is still the prime motivator for this),
corosync actually gets that state from Pacemaker. And, ultimately, it is
Pacemaker's view of the cluster (as recorded in the CIB) that the pengine
bases its fencing decisions on, so that's where we need to look.
Further, quorum alone isn't enough. Even if we have quorum, the local node
could still be dirty (as in: stop failures, an unclean state, ...) in ways
that imply it should self-fence, pronto.
Since this overrides the decision to self-fence when the devices are gone
(meaning a real poison pill may no longer be deliverable), we must take
steps to minimize that risk.
But yes, what it does now is sign in with both corosync/ais and the CIB,
querying quorum state from both.
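To make that concrete, here's a rough sketch in Python against the
command-line tools (the watcher itself signs on through the C client
libraries, so take this as an illustration of the checks, not the actual
code):

    import subprocess

    def membership_has_quorum():
        # crm_node -q prints "1" when the local partition has quorum,
        # "0" otherwise; it asks the membership layer directly.
        out = subprocess.check_output(["crm_node", "-q"])
        return out.strip() == b"1"

    def cib_has_quorum():
        # Pacemaker records its own view on the CIB root element as
        # have-quorum="0|1"; cibadmin -Q dumps the live CIB.
        xml = subprocess.check_output(["cibadmin", "-Q"])
        return b'have-quorum="1"' in xml

    if __name__ == "__main__":
        print("membership layer says quorate:", membership_has_quorum())
        print("CIB says quorate:", cib_has_quorum())

Only if both views agree that we're quorate (and the local node isn't
otherwise dirty) does it make sense to hold off on the self-fence.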
Fun anecdote: I originally thought being notification-driven might be
good enough - until the testers started SIGSTOPping corosync/cib and
complaining that the pacemaker watcher didn't pick up on that ;-)
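The lesson, I suppose: a purely notification-driven watcher needs an
active liveness probe with its own deadline next to it. Schematically
(made-up interval/timeout values, not the actual sbd code):

    import subprocess
    import time

    POLL_INTERVAL = 2   # seconds between active probes (illustrative)
    TIMEOUT = 10        # how long an unresponsive CIB is tolerated

    def cib_answers():
        # A SIGSTOPped cib simply stops sending notifications; only an
        # active query that can itself time out will notice that.
        try:
            subprocess.check_output(["cibadmin", "-Q", "-o", "crm_config"],
                                    timeout=POLL_INTERVAL)
            return True
        except Exception:
            return False

    last_ok = time.time()
    while True:
        if cib_answers():
            last_ok = time.time()
        elif time.time() - last_ok > TIMEOUT:
            print("CIB unresponsive, treat pacemaker as unhealthy")
            break
        time.sleep(POLL_INTERVAL)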
I know this is bound to have some holes. It can't perform a
comprehensive health check of pacemaker's stack; but this only matters
for as long as the loss of devices persists. During that degraded phase,
the system is a bit more fragile. I'm a bit wary of this, because I'm
*sure* these holes will all get reported one after another and further
obfuscate the code, but such is reality ...
> > (I have opinions on particularly the last failure mode. This seems to
> > arise specifically when customers have built setups with two HBAs, two
> > SANs, two storages, but then cross-linked the SANs, connected the HBAs
> > to each, and the storages too. That seems to frequently lead to
> > hiccups where the *entire* fabric is affected. I'm thinking this
> > cross-linking is a case of sham redundancy; it *looks* as if it makes
> > things more redundant, but in reality reduces it since faults are no
> > longer independent. Alas, they've not wanted to change that.)
>
> Henceforth, I'm going to dangle this thread in front of everyone who
> believes their SAN can never fail. Thanks. :)
Heh. Please dangle it in front of them and explain the benefits of
separation/isolation to them. ;-)
If they followed our recommendation - 2 independent SANs, and a third
iSCSI device over the network (okok, effectively that makes 3 SANs) -
they'd never experience this.
(Since that's how my lab is actually set up, I had some trouble
following the problems they reported initially. Oh, and *don't* get me
started on async IO handling in Linux.)
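For reference, such a three-device setup would look roughly like this in
/etc/sysconfig/sbd (the device paths are placeholders, obviously):

    # One small LUN from each SAN plus an iSCSI LU served over the
    # network, so a majority (2 of 3) survives the loss of any single
    # fabric.
    SBD_DEVICE="/dev/disk/by-id/san-a-lun0;/dev/disk/by-id/san-b-lun0;/dev/disk/by-id/iscsi-arb-lun0"

Each device gets initialized once with "sbd -d <device> create"; the
daemon then watches all three, and only the loss of a majority of them
(modulo the pacemaker override discussed above) is grounds to self-fence.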
> Are there any SUSEisms in SBD or would you expect it to be packageable
> on any platform?
Should be packageable on every platform, though I admit that I've not
tried building the pacemaker module against anything but the
corosync+pacemaker+openais stuff we ship on SLE HA 11 so far.
I assume that this may need further work; at least the places I stole
code from had special treatment, and so does the source code to crm_node
(ccm_epoche.c) ... I *think* this may indicate opportunities for
improving the client libraries in pacemaker to hide all that stuff
better.
Best,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/