I have about a dozen corosync+pacemaker clusters and I am just now getting around to understanding timeouts.
Most of my corosync.conf files look something like this:

    version: 2
    token: 5000
    token_retransmits_before_loss_const: 10
    join: 1000
    consensus: 7500
    vsftype: none
    max_messages: 20
    secauth: off
    threads: 0
    clear_node_high_bit: yes
    rrp_mode: active

If I understand this correctly, this means the node will wait 50 seconds (5000 ms x 10) before deciding that a cluster reconfiguration is necessary (perhaps after a link failure). Is that correct?

I'm trying to understand how this works together with my bonded NIC's arp_interval settings. I normally set arp_interval=1000. My question is: how many ARP losses are required before the bonding driver decides to fail over to the other link? That is, if arp_interval=1000, how many times does the driver send an ARP request and fail to receive a reply before it decides that the link is dead?

I think I need to know this so I can set my corosync.conf values correctly and avoid "false positive" cluster failovers. In other words, if there is a link or switch failure, I want to make sure that the cluster allows plenty of time for link communication to recover before deciding that a node has actually died.

--
Eric Robinson
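
A rough sketch of the timing arithmetic behind the question above, in Python. It rests on two assumptions that are not confirmed anywhere in this post. First, going by the corosync.conf(5) man page, `token` appears to be the total token-loss timeout, with `token_retransmits_before_loss_const` dividing that timeout into retransmit intervals rather than multiplying it, and `consensus` needs to be at least 1.2 x `token`. Second, the number of missed ARP intervals the bonding driver tolerates is unknown here, so `ASSUMED_MISSED_INTERVALS` is a purely hypothetical placeholder; the function names are illustrative as well.

```python
# Values taken from the corosync.conf and bonding settings quoted above.
TOKEN_MS = 5000
RETRANSMITS_BEFORE_LOSS = 10
CONSENSUS_MS = 7500
ARP_INTERVAL_MS = 1000

# ASSUMPTION: placeholder only. The real number of missed ARP intervals
# before the bonding driver fails over is exactly what the post asks.
ASSUMED_MISSED_INTERVALS = 3


def token_retransmit_ms(token_ms, retransmits_before_loss):
    """Retransmit interval as the man page seems to derive it:
    the token timeout is divided by the retransmit count, not multiplied."""
    return token_ms / (retransmits_before_loss + 0.2)


def min_consensus_ms(token_ms):
    """Smallest consensus timeout the man page allows (1.2 x token)."""
    return 1.2 * token_ms


def bond_failover_ms(arp_interval_ms, missed_intervals):
    """Time for the bond to give up on a link, given an assumed
    number of consecutive missed ARP intervals."""
    return arp_interval_ms * missed_intervals


def bond_recovers_before_token_loss(token_ms, arp_interval_ms, missed_intervals):
    """True if the bond would switch links before corosync declares
    the token lost, under the assumptions stated above."""
    return bond_failover_ms(arp_interval_ms, missed_intervals) < token_ms


if __name__ == "__main__":
    print("token retransmit interval (ms):",
          round(token_retransmit_ms(TOKEN_MS, RETRANSMITS_BEFORE_LOSS), 1))
    print("minimum consensus (ms):", round(min_consensus_ms(TOKEN_MS)))
    print("configured consensus is large enough:",
          CONSENSUS_MS >= min_consensus_ms(TOKEN_MS))
    print("assumed bond failover time (ms):",
          bond_failover_ms(ARP_INTERVAL_MS, ASSUMED_MISSED_INTERVALS))
    print("bond fails over before token loss:",
          bond_recovers_before_token_loss(
              TOKEN_MS, ARP_INTERVAL_MS, ASSUMED_MISSED_INTERVALS))
```

If those assumptions hold, the figure to compare against the bonding failover time is the 5000 ms token timeout rather than 50 seconds. Changing `ASSUMED_MISSED_INTERVALS` shows how much margin is left once the real bonding behavior is known.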