>> The front003 has a hardware failure, so it is to be expected that the
>> stonith action will fail. (This is a custom stonith script, so there
>> might be some bugs left in it. The xen ocf script is also a custom one.)
>>
>> The real problem is that it shows 2 resources running on the front003,
>> while this server is obviously offline. It should move the resources to
>> one of the other servers, but doesn't for some reason.
>
> How can it? It's offline, remember.
> Or at least it _appears_ offline, which is the whole point of
> STONITH... to make _sure_ it's offline before starting the resources
> elsewhere.
>
> So until the STONITH command succeeds, the resources won't be moved. They
> show up as running on that node because as far as the cluster can
> confirm... they still are.

OK, I can understand the need to make really sure the server is offline.
Unfortunately, the stonith reset command will always fail in this case,
because the server is broken and cannot be turned on anymore.

Should the stonith reset command return success even if the server cannot
be turned back on? That is the only way I can think of to get an automatic
failover after a hardware failure.

There is also no way to tell heartbeat manually that the server is
offline, so there seems to be no nice way to recover from this situation.

Niels
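
One way to make the question above concrete is the following minimal Python sketch. It is not the actual custom stonith script and not Heartbeat's API. The function name reset_node and the three callables are hypothetical, and it assumes the common convention that exit status 0 means the fencing action succeeded. The idea it illustrates: a reset could report success as soon as the node is confirmed to be powered off, and treat the power-on step as best effort.

```python
from typing import Callable


def reset_node(power_off: Callable[[], bool],
               is_off: Callable[[], bool],
               power_on: Callable[[], bool]) -> int:
    """Hypothetical reset action: return 0 on success, 1 on failure.

    All three callables are placeholders for whatever the real
    fencing device offers (assumption, not taken from the thread).
    """
    # Ask the device to cut power; the result is checked separately below.
    power_off()

    # If the off state cannot be confirmed, the reset must fail so that
    # the cluster does not start the resources a second time elsewhere.
    if not is_off():
        return 1

    # Power-on is best effort: a node that stays off cannot touch shared
    # resources, so a failure here does not change the result.
    power_on()
    return 0
```

Whether this helps with front003 depends on whether the fencing device can still confirm the powered-off state of a machine with broken hardware. If it cannot, this sketch fails the reset as well.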
