On 07/13/2016 03:10 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> wrote on 12.07.2016 at 21:19 in message
> <578542bf.9010...@redhat.com>:
>> On 07/12/2016 01:16 AM, Ulrich Windl wrote:
> 
> [...]
>>> What I mean is: there is no "success status" for STONITH; it is assumed that
>>> the node will be down after issuing a successful stonith command. You are
>>> claiming your stonith command was not logging any error, so the cluster will
>>> assume STONITH was successful after a timeout.
>>
>> Fence agents do return success/failure; the cluster considers a timeout
>> to be a failure. The only time the cluster assumes a successful fence is
>> when sbd-based watchdog is in use.
> 
> Hi!
> 
> Sorry, but I don't see the difference: If SBD delivers a command 
> successfully, there is no guarantee that the victim node actually executes 
> the command and resets.
> If you use any other fencing command (like submitting some command to an 
> external device) the situation is not different: Successfully submitting the 
> command does not mean the STONITH will succeed in every case (you could even 
> turn off power in the wrong PDU, which is still a "success" from the cluster's 
> perspective)
> [...]
> 
> What I really wanted to say is:
> If the fencing command logged an error, try to fix it; if it did not, try to 
> find out why fencing did not work.
> 
> Regards,
> Ulrich

Yes, I understand your point now, and agree completely.

The cluster can only respond to the status code (or timeout) it receives
from the fence agent. There may be problems beyond that point (in the
fence agent and/or the device itself) that result in success being
returned incorrectly, and that must be investigated separately.
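
To illustrate the point, here is a minimal sketch of that decision, not the actual Pacemaker code. The function name fence_result, the result strings, and the timeout handling are all made up for the example; the only thing taken from the discussion above is that the caller sees an exit status or a timeout and nothing else.

```python
import subprocess
import sys


def fence_result(cmd, timeout_s):
    """Classify a fence attempt using only what the caller can observe.

    Hypothetical helper: `cmd` stands in for a fence agent invocation,
    `timeout_s` for the configured fencing timeout.
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # No status arrived in time: treated as a failure, never as success.
        return "failed (timeout)"
    if proc.returncode == 0:
        # Exit status 0 is all we know; whether the node is really down
        # (or the right PDU outlet was switched) is not visible here.
        return "success"
    return "failed (exit %d)" % proc.returncode


if __name__ == "__main__":
    # Stand-in commands, not real fence agents.
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(1)"]
    print(fence_result(ok, 10))
    print(fence_result(bad, 10))
```

Note that an agent pointed at the wrong device and exiting 0 would still be classified as "success" here, which is exactly the case that has to be investigated outside the cluster.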
