On 12/05/2017 03:11 PM, Ulrich Windl wrote:
"Gao,Yan" wrote on 05.12.2017 at 15:04:
On 12/05/2017 12:41 PM, Ulrich Windl wrote:
"Gao,Yan" wrote on 01.12.2017 at 20:36:
[...]
I meant: There are three delays:
1) The delay until data is on the disk
It takes several IOs for the sender to do this -- read the device
header, look up the slot, write the message and verify that the message
is written (timeout_io defaults to 3s).
As mentioned, the msgwait timer of the sender starts only after the
message has been verified to be written. We just need to make sure
stonith-timeout is configured sufficiently longer than the sum.
2) Delay until data is read from the disk
It's already taken into account with msgwait. Since the recipient
keeps reading in a loop, we don't know exactly when it starts the read
for this specific message. But once it starts a read, the read has to
complete within timeout_watchdog, otherwise the watchdog triggers. So
even in a bad case, the message should be read within 2 *
timeout_watchdog. That's the reason why the sender has to wait msgwait,
which is 2 * timeout_watchdog.
3) Delay until the host is killed
The kill is triggered basically immediately once the poison pill is read.
Considering that the response time of a SAN disk system with cache is typically
just a few microseconds, writing to disk may be even "more immediate" than
killing the node via watchdog reset ;-)
Well, it's possible :) Timeouts matter for "bad cases" though. Compared
with disk IO facing difficulties like path failure and so on,
triggering the watchdog is trivial.
So you can't easily say one is immediate, while the other has to be waited for
IMHO.
Of course an even longer msgwait, with all the factors that you can think
of taken into account, will be even safer.
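
The timing relations discussed above can be sketched as a small
calculation. This is only an illustration, not sbd or Pacemaker code: the
function names are hypothetical, and the assumption that the sender needs
exactly four IOs (read header, look up slot, write, verify), each bounded
by timeout_io, is mine; the thread only says "several IOs". The 5s
watchdog timeout in the usage note is an arbitrary example value.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of sbd or Pacemaker.

def msgwait(timeout_watchdog: float) -> float:
    """msgwait as described in the thread: 2 * timeout_watchdog."""
    return 2 * timeout_watchdog


def sender_io_bound(timeout_io: float = 3.0, io_steps: int = 4) -> float:
    """Upper bound on the time the sender spends writing the message.

    Assumption: io_steps IOs (read header, look up slot, write, verify),
    each taking at most timeout_io seconds (3s default per the thread).
    """
    return io_steps * timeout_io


def stonith_timeout_floor(timeout_watchdog: float,
                          timeout_io: float = 3.0,
                          io_steps: int = 4) -> float:
    """Sum that stonith-timeout has to exceed: sender IOs plus msgwait."""
    return sender_io_bound(timeout_io, io_steps) + msgwait(timeout_watchdog)


def stonith_timeout_ok(stonith_timeout: float,
                       timeout_watchdog: float,
                       timeout_io: float = 3.0,
                       io_steps: int = 4) -> bool:
    """True if stonith-timeout is strictly longer than the sum."""
    floor = stonith_timeout_floor(timeout_watchdog, timeout_io, io_steps)
    return stonith_timeout > floor
```

With timeout_watchdog = 5, msgwait(5) is 10 and stonith_timeout_floor(5)
is 22.0 under these assumptions, so a stonith-timeout of 20 would be too
short while 30 would leave some margin. How much margin is "sufficient" is
a judgment call that this sketch does not answer.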
Regards,
Yan
Regards,
Ulrich
A confirmation before 3) could shorten the total wait that includes 2) and 3),
right?
As mentioned in another email, a node that is alive, even one that has
indeed come back from death, cannot confirm its own death or even
confirm whether it was ever dead. And successful fencing means the node
is dead.
Regards,
Yan
Regards,
Ulrich
Regards,
Yan
[...]
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org