>> Ok, I can understand the need to make really sure the server is
>> offline. Unfortunately, the stonith reset command will always fail
>> in this case, as the server is broken and cannot be turned on anymore.
>>
>> Should the stonith reset command return a success even if the server
>> cannot be turned on anymore?
>
> No - because it didn't perform the action.
> Lie to the cluster and it will always bite you in the end - in this
> case, when your iLO board (maybe even the network) fails and the node
> really is still running in some capacity.
In this case the IPMI board is working fine, and reports that the server
is off. It just cannot turn the server on again because of a power supply
failure. I don't think it would hurt in this case to report a success on
the stonith action, as we can be reasonably sure the server is off.

Unfortunately there is no way to report a partial success on a stonith
reset action (i.e. the server can be turned off, but not on again).
There also seems to be no stonith action to report the status of the
managed device. Maybe there should be a stonith monitor action, which
reports the status of the managed device?

>
>> This is the only way I can think of to get an
>> automatic failover in case of a hardware failure.
>
> Then that's not a good stonith agent/hardware setup.
> It's the same reason the SSH agent isn't recommended.

The only way to get 100% certainty that a server is down is to pull the
power plug yourself. In any other case there can always be some failure
in the stonith device where it reports the server is off while it
actually isn't.

>
>> There is also no way to tell heartbeat manually that the server is
>> offline. This means that there seems to be no nice way to recover from
>> this situation.
>
> remove the node from the cluster?
> hb_del_node does that i think

According to the comment in the script, this only works for a 2-node
cluster. Can I use cibadmin to just remove the node details of the
failed node?

Niels
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
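To make the "partial success" idea above concrete, here is a minimal sketch of how a reset attempt could be classified into three outcomes instead of a plain success/failure. This is not the stonith plugin API: the names `ResetOutcome`, `classify_reset` and `is_safe_to_fail_over` are hypothetical, and the two boolean inputs stand in for whatever the management board reports. As the reply quoted above points out, that report can itself be wrong, so "off confirmed" is only as trustworthy as the device reporting it.

```python
from enum import Enum


class ResetOutcome(Enum):
    """Hypothetical outcomes of a reset attempt (not a real stonith API)."""
    RESTARTED = "off confirmed, power-on succeeded"
    OFF_ONLY = "off confirmed, power-on failed"
    NOT_FENCED = "off not confirmed"


def classify_reset(off_confirmed: bool, on_succeeded: bool) -> ResetOutcome:
    """Classify a reset attempt from two observations.

    off_confirmed: the management board reported the server as powered off.
    on_succeeded:  the subsequent power-on request succeeded.
    """
    if not off_confirmed:
        # Without a confirmed power-off the node may still be running.
        return ResetOutcome.NOT_FENCED
    if on_succeeded:
        return ResetOutcome.RESTARTED
    # The case from this thread: the server is off but cannot be turned on.
    return ResetOutcome.OFF_ONLY


def is_safe_to_fail_over(outcome: ResetOutcome) -> bool:
    # Assumption of this sketch: a confirmed power-off is what matters for
    # failover; whether the node came back up is a separate question.
    return outcome is not ResetOutcome.NOT_FENCED
```

With this split, the broken power supply case maps to `OFF_ONLY`, which the sketch treats as safe for failover, while an unconfirmed power-off is never reported as fenced.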
