Re: [Linux-HA] Looking for a suitable Stonith Solution

Stallmann, Andreas Fri, 25 Feb 2011 03:51:56 -0800

Hi!

I'll consolidate both of your answers into one mail; I hope that's all right 
with you.


> >For now, I need an interim solution, which is, as of now, stonith via 
> >suicide.
> Doesn't work as suicide is not considered reliable - by definition the 
> remaining nodes have no way to verify that the fencing operation was 
> successful.
> Suspect it will still fail though, suicide isn't a supported fencing option - 
> since obviously the other nodes can't confirm it happened.

OK then, I know I'm being a little provocative right now:

If "suicide" is not a supported fencing option, why is it still included with 
stonith? It's badly documented, and I didn't find a single (official) document 
on how to implement a (stable!) suicide-stonith, but it's there, and thus it 
should be usable. If it isn't, the maintainer should please (please!) remove it 
or supply something that works. I do know that's quite demanding, because 
the maintainer will probably do the development in his (or her) free time. 
Still...

I also agree that "suicide" is a very special way of keeping a cluster 
consistent, very different from the other stonith methods. I wouldn't expect it 
under stonith; I'd rather think...

> Yes no-quorum-policy=suicide means that all nodes in the partition will end 
> up being shot, but you still require a real stonith device so that 
> _someone_else_ can perform it.
...that if you set "no-quorum-policy=suicide", the suicide script is executed 
by the node itself. It should be an *extra* feature *besides* stonith. The 
procedure should be something like:

1) node1: All right, I have no quorum anymore. Let's wait for a while...
2) ... a while passes
3) node1: OK, I'm still without quorum, no contact with my peers whatsoever. 
I'd rather shut myself down before I cause a mess.

If, during (2), the other nodes find a way to shut down the node externally 
(whether through ssh, a power switch, a virtualisation host...), that's even 
better, because then the cluster "knows" that it's still consistent. I'm with 
you here.
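
To make the three steps above concrete, here is a minimal Python sketch of the 
decision logic I have in mind. It only illustrates the proposal; it is not how 
Pacemaker or any stonith plugin actually behaves. The class name 
QuorumWatchdog, the grace_seconds parameter and the "run"/"wait"/"suicide" 
return values are all made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuorumWatchdog:
    """Hypothetical helper: decides when a node should shut itself down."""

    grace_seconds: float                # how long "a while" is (step 2)
    lost_since: Optional[float] = None  # time at which quorum was lost

    def decide(self, have_quorum: bool, now: float) -> str:
        """Return "run", "wait" or "suicide" for the current moment."""
        if have_quorum:
            # Quorum is (back) in place: forget any earlier loss.
            self.lost_since = None
            return "run"
        if self.lost_since is None:
            # Step 1: quorum was just lost, start the grace period.
            self.lost_since = now
            return "wait"
        if now - self.lost_since >= self.grace_seconds:
            # Step 3: still no quorum after the grace period.
            return "suicide"
        # Step 2: grace period still running; peers may fence us externally.
        return "wait"
```

A caller would poll decide() with the current quorum state and a monotonic 
timestamp, and trigger the local shutdown only when it returns "suicide". 
If the peers manage to fence the node during the "wait" phase, the question 
never arises.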

If a split brain happens in a split-site scenario, a "suicide" might be the 
only way to maintain consistency, because no one will be able to reach any 
device on the other site... Please correct me if I'm wrong. What do you do in 
such a case? What would your reference implementation of Linux-HA look like 
then?

On the other hand, it doesn't make any sense to name a "no-quorum-policy" 
"suicide" if it's anything but a suicide (if anything, one could call it 
"assisted suicide").

Please correct me: Do I have an utterly wrong understanding of the whole 
process (that could very well be the case), is the implementation not entirely 
thought through, or is the naming of certain components not as good as it 
could be?

I might point you to 
http://osdir.com/ml/linux.highavailability.devel/2007-11/msg00026.html, because 
the same thing was discussed back then, and I very much think that Lars was 
right in what he wrote. Has anything changed in the concept of 
suicide/quorum-loss/stonith since then? That's not a provocative question; 
well, maybe it is, but it's not meant to be.

In addition: Something that's missing from the manuals is a "case study" (or 
something like it) on how to implement a split-site scenario. How should the 
cluster be built then? If you have two sites? If you have one? How should the 
storage replication be set up? Is synchronous replication as in drbd really a 
good idea then, performance-wise? I think I'll finally have to buy a book. :-) 
Any recommendations (either English or German preferred)?

Well, thanks a lot again. My brain didn't explode (that's a good thing, I 
feel), but I'm not entirely happy either.

Cheers and have a nice weekend,

Andreas


------------------------
CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Geschäftsführer/Managing Directors: Jürgen Zender (Sprecher/Chairman), Anke 
Höfer
Vorsitzender des Aufsichtsrates/Chairman of the Supervisory Board: Hans Jürgen 
Niemeier

CONET Technologies AG, Theodor-Heuss-Allee 19, 53773 Hennef.
Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 10328 )
Vorstand/Member of the Managementboard: Rüdiger Zeyen (Sprecher/Chairman), 
Wilfried Pütz
Vorsitzender des Aufsichtsrates/Chairman of the Supervisory Board: Dr. Gerd 
Jakob
