GitHub user remibergsma opened a pull request:
https://github.com/apache/cloudstack/pull/1486
Reimplement router.redundant.vrrp.interval setting
Global setting `router.redundant.vrrp.interval` is not used any more and it
is now set to a hardcoded 1.
This results in a failover from master->backup when the backup doesn't hear
from the master in ~3.6sec. This is a bit too tight, as we've seen failovers
during live migrations. We could reproduce it in about half of the cases.
Setting this to 2 (tested by hardcoding it in the systemvms)
gives twice as much time and we didn't see issues any more. Instead of updating
the hardcoded setting from 1 to 2, I reimplemented the global setting by
sending it to the router with the cmd_line, as the non-VPC router also does.
Background:
Why is the maximum failover time in the example 3.6 seconds? This comes
from the advertisement interval and the skew time. The default advertisement
interval is 1 second (configurable in keepalived.conf). The skew time helps to
keep everyone from trying to transition at once. It is a number between 0 and
1, based on the formula (256 - priority) / 256.
As defined in the RFC, the backup must receive an advertisement from the
master every (3 * advert_int) + skew_time seconds. If it doesn't hear anything
from the master, it takes over. With a backup router priority of 100 (as in the
example), the failover will happen at most 3.6 seconds after the master goes
down.
Source: http://www.hollenback.net/KeepalivedForNetworkReliability
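As a quick check of the arithmetic above, here is a minimal Python sketch of the timeout formula. The function names are made up for illustration and are not part of CloudStack or keepalived; the priority of 100 is the one used in the example.

```python
def skew_time(priority):
    """VRRP skew time in seconds: (256 - priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(advert_int, priority):
    """Seconds a backup waits without advertisements before taking over."""
    return 3 * advert_int + skew_time(priority)


# Compare the hardcoded interval (1) with the tested value (2),
# using the backup priority of 100 from the example above.
for advert_int in (1, 2):
    timeout = master_down_interval(advert_int, priority=100)
    print(f"advert_int={advert_int}: failover after at most {timeout:.2f} s")
```

With an interval of 1 this gives about 3.6 seconds, and with an interval of 2 about 6.6 seconds, since only the advertisement term doubles while the skew time stays the same.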
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/remibergsma/cloudstack reimplement-vrrp-setting-47
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/cloudstack/pull/1486.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1486
----
commit c33358db848faf8c8891e00e0100a2627b177407
Author: Remi Bergsma <[email protected]>
Date: 2016-03-23T15:33:20Z
Have rVPCs use the router.redundant.vrrp.interval setting
It defaults to 1, which is hardcoded in the template:
./cosmic/cosmic-core/systemvm/patches/debian/config/opt/cloud/templates/keepalived.conf.templ
As non-VPC redundant routers use this setting, I think it makes sense to
use it for rVPCs as well.
We also need a change to pick up the cmd_line parameter and use it in the
Python code that configures the router.
commit 408478413ad0469265dfa0ce9101d6337f558ab2
Author: Remi Bergsma <[email protected]>
Date: 2016-03-23T15:56:54Z
Configure rVPC for router.redundant.vrrp.interval advert_int setting
----