[
https://issues.apache.org/jira/browse/CLOUDSTACK-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sheng Yang resolved CLOUDSTACK-1653.
------------------------------------
Resolution: Fixed
> Redundant router: check_heartbeat.sh malfunction caused by delayed cron job
> ---------------------------------------------------------------------------
>
> Key: CLOUDSTACK-1653
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-1653
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public (Anyone can view this level - this is the default.)
> Affects Versions: 4.1.0
> Reporter: Sheng Yang
> Assignee: Sheng Yang
> Fix For: 4.2.0
>
>
> According to https://bugzilla.redhat.com/show_bug.cgi?id=159441, cron only
> guarantees a minimum interval between job executions, so two consecutive
> runs of check_heartbeat.sh may be more than 1 minute apart.
> keepalived should update keepalived.ts every 10 seconds, so if the timestamp
> gap observed between any two executions is less than 60 seconds, the check
> should fail.
> The current logic in check_heartbeat.sh is wrong: it only guarantees that
> cron was not delayed, not that keepalived is alive.
> It passed the original test because that was an NFS disconnection test, in
> which case the disk is corrupted, so cron was delayed, which meant the
> network was down.
> Changing the condition to "less than 60" (30 is probably safer, because cron
> sometimes seems to have a bug where it does not meet the minimum interval
> requirement) should work for that case too, because the script should find
> that keepalived is dead the second time it is executed after NFS recovers.
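
The proposed condition can be sketched roughly as below. This is only an illustration of the idea in the description, not the actual check_heartbeat.sh: the function names, the way file paths are passed in, and the saved-copy file are hypothetical, and it assumes keepalived.ts holds an epoch timestamp in seconds. The default threshold of 30 follows the "30 is safer" suggestion above.

```shell
#!/bin/sh
# Sketch only: helper names and file layout are assumptions, not the
# real check_heartbeat.sh.

# Succeeds (exit 0) when the timestamp advanced by at least min_advance
# seconds between the previous run and this run.
heartbeat_alive() {
    last_ts=$1
    this_ts=$2
    min_advance=${3:-30}   # 30 per the ticket's "safer" suggestion
    diff=$((this_ts - last_ts))
    [ "$diff" -ge "$min_advance" ]
}

# Compares the timestamp file written by keepalived with the copy saved
# by the previous run, then saves the current value for the next run.
# Returns 0 = alive (or first run), 1 = keepalived looks dead,
# 2 = timestamp file unreadable.
check_heartbeat() {
    ts_file=$1     # e.g. keepalived.ts, updated by keepalived
    prev_file=$2   # hypothetical copy kept from the previous run
    min_advance=${3:-30}

    [ -r "$ts_file" ] || return 2
    this_ts=$(cat "$ts_file")

    status=0
    if [ -r "$prev_file" ]; then
        last_ts=$(cat "$prev_file")
        heartbeat_alive "$last_ts" "$this_ts" "$min_advance" || status=1
    fi

    # Remember what we saw so the next cron run can compare against it.
    printf '%s\n' "$this_ts" > "$prev_file"
    return $status
}
```

Because cron only guarantees the minimum interval, a delayed run makes the observed advance larger, never smaller, so a lower bound on the advance says something about keepalived itself rather than about cron's punctuality.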