Hi Andrew,

>> If "suicide" is no supported fencing option, why is it still included with
>> stonith?
> Left over from heartbeat v1 days I guess.
> Could also be a testing-only device like ssh.

www.clusterlabs.org tells me you're the Pacemaker project leader. Would you, by chance, know who maintains or maintained the suicide stonith plugin? It may be "testing-only", yes. But at least ssh is working as intended.

>> It's badly documented, and I didn't find a single (official) document
>> on how to implement a (stable!) suicide-stonith,
> Because you can't. Suicide is not, will not, can not be reliable.

Yes, you're right. But under certain circumstances it might be a good addition to a "regular" stonith:

1. the nodes are still alive,
2. both redundant communication channels (networks) are down,
3. policy requires that no node without quorum stays up.

If (2) happens, pacemaker/stonith will probably not be able to control a network power switch etc. Could we agree on that? If not: what's your recommended setup for (or rather against) such situations? Think of "split sites" here!

> The whole point of stonith is to create a known node state (off) in
> situations where you cannot be sure if your peer is alive, dead or some
> state in-between.

Yes, so don't file "suicide" under "stonith"!

We implemented a different approach in a two-node cluster: we wrote a script that, run from cron, checks connectivity by ping, first to the peer (if it is reachable, everything is fine) and then, if the peer is not reachable, to some quorum nodes. If either the peer or a majority of the quorum nodes are alive, nothing happens. If "quorum" is lost, the node shuts itself down. We did that because drbd tended to misbehave in situations where all network connectivity was lost. We'd rather have a clean shutdown on both sides than a corrupt filesystem.

I have always considered this solution inelegant, mainly because it wasn't controllable via crm. Thus I hoped I could forget this solution when using pacemaker. It seems I cannot.

If there's any interest from the community in our "suicide by cron" solution, tell me if and how to contribute.

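A minimal sketch of the check described above, in Python (this is not the actual script). The host names, the Linux iputils `ping` flags, the `shutdown -h now` call, and the `--run` switch are all assumptions made for illustration; only the decision rule (peer reachable, or a majority of quorum nodes reachable, otherwise shut down) comes from the description.

```python
import subprocess
import sys

# Hypothetical addresses; replace with the real peer and quorum nodes.
PEER = "peer.example.org"
QUORUM_HOSTS = ["q1.example.org", "q2.example.org", "q3.example.org"]


def ping(host, timeout_s=2):
    """Return True if a single ICMP echo to host succeeds.

    Assumes Linux iputils ping (-c count, -W reply timeout in seconds).
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def has_quorum(peer, quorum_hosts, probe=ping):
    """Peer reachable, or a strict majority of quorum nodes reachable."""
    if probe(peer):
        return True
    hosts = list(quorum_hosts)
    alive = sum(1 for host in hosts if probe(host))
    return alive * 2 > len(hosts)


def main():
    if not has_quorum(PEER, QUORUM_HOSTS):
        # Quorum lost: prefer a clean shutdown over a corrupt filesystem.
        subprocess.run(["shutdown", "-h", "now"], check=False)


# Only act when explicitly asked to, e.g. from a cron entry passing --run.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    main()
```

Passing the probe as a parameter keeps the decision rule testable without touching the network. Like any self-fencing scheme, this only helps if the node is healthy enough to run cron and shut itself down.
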
> It requires a "sick" node to suddenly start functioning correctly - so
> attempting to self-terminate makes some sense, relying on it to succeed does
> not seem prudent.

Yeeees! But it's not always the node that's sick. Sometimes (even with the best and most redundant network) the connectivity between the nodes is the problem, not a marauding pacemaker or openais! Again: please tell me, what's your solution in that case?

>> On the other hand, it doesn't make any other sense to name a
>> "no-quorum-policy" "suicide", if it's anything but a suicide (if at all,
>> one could name it "assisted suicide").

This question is still unanswered. Does "no-quorum-policy suicide" really have a meaning? Or is it also a leftover from the heartbeat days? Is it still functional?

Cheers,
Andreas

------------------------
CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef