GitHub user wido opened a pull request:
https://github.com/apache/cloudstack/pull/1707
CLOUDSTACK-9397: Add Watchdog timer to KVM Instance
The watchdog timer lets the Hypervisor (HV) detect when an Instance has
crashed or stopped responding.
When the Instance has the 'watchdog' daemon running, the daemon sends
heartbeats to the /dev/watchdog device.
If the HV stops receiving these heartbeats, it resets the Instance.
If the Instance never sends a heartbeat, the HV takes no action. It only
acts once heartbeats that had started stop arriving.
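
The "armed only after the first heartbeat" behaviour described above can be
sketched as a toy model. This is not QEMU, Libvirt or CloudStack code: the
class name WatchdogModel, its methods and the timeout value are hypothetical
and exist only to illustrate the semantics.

```java
import java.util.OptionalLong;

/** Toy model of the watchdog semantics; all names here are hypothetical. */
public class WatchdogModel {
    private final long timeoutMillis; // assumed timeout, not taken from the PR
    private OptionalLong lastHeartbeat = OptionalLong.empty();

    public WatchdogModel(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called when the guest daemon writes to /dev/watchdog; arms or re-arms the timer. */
    public void heartbeat(long nowMillis) {
        lastHeartbeat = OptionalLong.of(nowMillis);
    }

    /** True only if the timer was armed and the last heartbeat is older than the timeout. */
    public boolean shouldFire(long nowMillis) {
        return lastHeartbeat.isPresent()
                && nowMillis - lastHeartbeat.getAsLong() > timeoutMillis;
    }

    public static void main(String[] args) {
        WatchdogModel w = new WatchdogModel(30_000);
        System.out.println(w.shouldFire(1_000_000)); // never armed, no action
        w.heartbeat(0);
        System.out.println(w.shouldFire(30_001));    // armed and heartbeats stopped
    }
}
```

An Instance without the daemon never calls heartbeat(), so shouldFire()
stays false no matter how much time passes.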
This is supported since Libvirt 0.7.3 and can be defined in the XML format
as described in the docs:
https://libvirt.org/formatdomain.html#elementsWatchdog
The following element is added to the 'devices' section of the domain XML:
<watchdog model='i6300esb' action='reset'/>
The action to take can be set in agent.properties:
vm.watchdog.action=reset
The model can be set the same way. The Intel i6300esb is, however, the
most commonly used.
vm.watchdog.model=i6300esb
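
As a rough illustration of how the two properties map onto the XML element,
here is a minimal sketch. It is not the code in this pull request: the class
WatchdogConfigSketch and the method watchdogXml are hypothetical, the
fallback values (i6300esb, reset) are an assumption, and the accepted models
are only a subset of what the Libvirt docs list.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper; not the implementation from the pull request. */
public class WatchdogConfigSketch {
    // Subset of the models and the actions named in the Libvirt domain XML docs.
    static final Set<String> MODELS = Set.of("i6300esb", "ib700");
    static final Set<String> ACTIONS = Set.of(
            "reset", "shutdown", "poweroff", "pause", "none", "dump", "inject-nmi");

    /** Builds the watchdog element from agent.properties-style settings. */
    static String watchdogXml(Properties props) {
        // Fallback values are an assumption made for this sketch.
        String model = props.getProperty("vm.watchdog.model", "i6300esb").trim();
        String action = props.getProperty("vm.watchdog.action", "reset").trim();
        if (!MODELS.contains(model)) {
            throw new IllegalArgumentException("Unknown watchdog model: " + model);
        }
        if (!ACTIONS.contains(action)) {
            throw new IllegalArgumentException("Unknown watchdog action: " + action);
        }
        return "<watchdog model='" + model + "' action='" + action + "'/>";
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "vm.watchdog.action=reset\nvm.watchdog.model=i6300esb\n"));
        System.out.println(watchdogXml(props));
    }
}
```

Validating both values before building the element keeps a typo in
agent.properties from producing XML that Libvirt would reject when the
Instance is started.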
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wido/cloudstack watchdog-timer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/cloudstack/pull/1707.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1707
----
commit 8046ba679be53abd7a70657d7f8ed00f2225cf46
Author: Wido den Hollander <[email protected]>
Date: 2016-05-31T09:31:27Z
CLOUDSTACK-9397: Add Watchdog timer to KVM Instance
----