I am trying to understand how these timers interact with each other.
In a RHEL4 cluster the heartbeat defaults are:

hello_timer: 5
max_retries: 5
deadnode_timeout: 21

As I understand it, this means a heartbeat message is sent every 5 seconds. If a node fails to receive a response, it starts a deadnode counter at 21 seconds and also tries to send 5 more heartbeat requests. What is the interval between those retries?

If none of those requests receives a response, 5 seconds pass, there are 16 seconds left on the deadnode timer, and we again try up to 5 times to get a response. This goes on until the 4th iteration of the hello timer, which again tries up to 5 times and fails. We then hit 21 seconds on the deadnode timer, fenced takes over, and wham, the node is rebooted.

Is my understanding of this correct?

Thanks for any help.

Michael
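
P.S. To make the timeline I have in mind concrete, here is a rough Python sketch. It only models my own reading of the defaults above (a hello every 5 seconds, a 21 second deadnode timeout counted from the first missed response), not how the cluster manager actually behaves. The function name is made up for illustration, and the retries are left out because their interval is exactly what I am asking about.

```python
def missed_hello_timeline(hello_timer=5, deadnode_timeout=21):
    """Return (elapsed, remaining) pairs for each hello interval that
    passes without a response before the deadnode timer expires.

    Assumption: the deadnode timer starts when the first hello goes
    unanswered, and hellos keep firing every hello_timer seconds.
    """
    timeline = []
    elapsed = hello_timer
    while elapsed < deadnode_timeout:
        timeline.append((elapsed, deadnode_timeout - elapsed))
        elapsed += hello_timer
    return timeline


if __name__ == "__main__":
    deadnode_timeout = 21
    for elapsed, remaining in missed_hello_timeline(5, deadnode_timeout):
        print(f"t={elapsed:2d}s  hello unanswered, {remaining:2d}s left on deadnode timer")
    print(f"t={deadnode_timeout}s  deadnode timer expires, fencing would start")
```

If this model is right, four hello intervals go unanswered before the deadnode timer runs out.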
-- Linux-cluster mailing list Linux-cluster@redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster