Re: [Linux-HA] weird heartbeat status

Niels de Carpentier Fri, 28 Mar 2008 08:37:20 -0700

>>
>>  The front003 has a hardware failure, so it is to be expected that the
>>  stonith action will fail. (This is a custom stonith script, so there
>>  might be some bugs left in it. The xen ocf script is also a custom one.)
>>
>>  The real problem is that it shows 2 resources running on the front003,
>>  while this server is obviously offline. It should move the resources
>>  to one of the other servers, but doesn't for some reason.
>
> How can it?  It's offline, remember.
> Or at least it _appears_ offline, which is the whole point of
> STONITH... to make _sure_ it's offline before starting the resources
> elsewhere.
>
> So until the STONITH command succeeds, the resources won't be moved. They
> show up as running on that node because as far as the cluster can
> confirm... they still are.


Ok, I can understand the need to make really sure the server is offline.
Unfortunately, the stonith reset command will always fail in this case, as
the server is broken and cannot be turned on anymore.

Should the stonith reset command return a success even if the server
cannot be turned on anymore? This is the only way I can think of to get an
automatic failover in case of a hardware failure.
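
To make the question concrete, here is a rough sketch of the behaviour I have
in mind, written in Python purely for illustration. The helper names
power_off, is_powered_off and power_on are hypothetical stand-ins for whatever
the custom stonith script uses to talk to the power device; this is not the
actual script, and the exit codes (0 for success, 1 for failure) are an
assumption about what the caller expects.

```python
from typing import Callable


def stonith_reset(power_off: Callable[[], bool],
                  is_powered_off: Callable[[], bool],
                  power_on: Callable[[], bool]) -> int:
    """Sketch of a reset that succeeds once the node is confirmed off.

    All three callables are hypothetical helpers supplied by the caller.
    Returns 0 if the node is known to be down, 1 otherwise.
    """
    # Ask the power device to cut power. The return value is ignored on
    # purpose: only the independent status check below decides the outcome.
    power_off()

    # If we cannot confirm the node is down, fencing has failed and the
    # cluster must not start the resources elsewhere.
    if not is_powered_off():
        return 1

    # Try to bring the node back, but do not treat a failure here as a
    # fencing failure: a dead node that stays off is still safely fenced.
    power_on()
    return 0
```

With this logic a node with broken hardware would still count as fenced, as
long as the power device can confirm it is off, so the resources could be
moved. Whether that is acceptable is exactly what I am asking.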

There is also no way to tell heartbeat manually that the server is
offline. This means that there seems to be no nice way to recover from
this situation.

Niels



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
