Russell,

Russell Jones wrote:
> Hi all,
>
> I am trying to understand how the corosync token, token_retransmit, and
> token_retransmits_before_loss_const variables all tie in together.
>

Definitely look at the corosync.conf man page. Summary:

token: How long to wait to receive the token. If it is not received in that time, the node starts forming a new cluster membership.

token_retransmit is computed automatically from token_retransmits_before_loss_const. It is used to make membership more stable: if the token is not received within the given time, the previous token is retransmitted. So if the token was lost on the line (and because of UDP that is possible), it can be sent again. This value is SMALLER than token (usually 1/4 of token), which means 4 tokens are sent before the node tries to re-create the membership.

Generally, don't modify token_retransmit and token_retransmits_before_loss_const. Just modify token if you have high latency. Some setups (very rarely) also need to modify send_join and join.

> I have a standard RHCS v3 cluster set up and running. The token timeout
> is set to 10000. When testing it seems to detect failed members pretty
> consistently within 10 seconds. What I am not understanding is *when* a
> node is declared dead, and a fence call is actually made. The man pages
> show that the cluster is reconfigured when the "token" time is reached,
> and also when token_retransmits_before_loss_const is reached. This is
> confusing :-)

As I said, the formula is token / token_retransmits_before_loss_const = token_retransmit. So just set token if you need something special. If you set token_retransmit incorrectly, either it or token may take precedence (whichever is smaller).

>
>
> Which one is it that will reform the cluster? Both? When does one take
> precedence over the other?
>

Both. Smaller one.

>
> Thanks!
>

Regards,
Honza
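
For anyone who wants to check the arithmetic, here is a rough Python sketch of the relationship described above. The function names are made up for illustration and are not part of corosync. The formula is the simplified one from this mail, so corosync's own computed value may differ slightly. Reading "smaller one" as min(token, token_retransmit * token_retransmits_before_loss_const) is an assumption about what the reply means, not something taken from the corosync source.

```python
def derived_token_retransmit(token_ms, retransmits_before_loss_const=4):
    """Simplified formula from this thread (hypothetical helper):
    token_retransmit = token / token_retransmits_before_loss_const."""
    if token_ms <= 0 or retransmits_before_loss_const <= 0:
        raise ValueError("both values must be positive")
    return token_ms / retransmits_before_loss_const


def loss_timeout(token_ms, token_retransmit_ms=None,
                 retransmits_before_loss_const=4):
    """Assumed model of 'Both. Smaller one.': membership is re-formed when
    either the token timeout expires or all retransmits have been used up,
    whichever happens first."""
    if token_retransmit_ms is None:
        # Normal case: let the retransmit interval be derived from token.
        token_retransmit_ms = derived_token_retransmit(
            token_ms, retransmits_before_loss_const)
    retransmit_budget_ms = token_retransmit_ms * retransmits_before_loss_const
    return min(token_ms, retransmit_budget_ms)


if __name__ == "__main__":
    # token = 10000 as in the question, retransmit interval left derived.
    print(derived_token_retransmit(10000))   # 2500.0
    print(loss_timeout(10000))               # both limits coincide at 10 s
    # token_retransmit overridden by hand to 1000 ms: the retransmit
    # budget (4 * 1000 ms) is now the smaller limit.
    print(loss_timeout(10000, token_retransmit_ms=1000))
```

With only token set, both limits land on the same value, which matches the roughly 10 second detection time reported in the question. The two limits only diverge when token_retransmit is overridden by hand, which is why the advice is to leave it alone.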