Ganeti is currently unable to detect a legitimate shutdown request issued by a user from inside a Xen domain.
This patch provides a design document for a mechanism able to cope with such events.

Signed-off-by: Michele Tartara <[email protected]>
---
 Makefile.am                      |  1 +
 doc/design-draft.rst             |  1 +
 doc/design-internal-shutdown.rst | 72 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+)
 create mode 100644 doc/design-internal-shutdown.rst

diff --git a/Makefile.am b/Makefile.am
index 037cf53..f66624e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -410,6 +410,7 @@ docinput = \
 	doc/design-htools-2.3.rst \
 	doc/design-http-server.rst \
 	doc/design-impexp2.rst \
+	doc/design-internal-shutdown.rst \
 	doc/design-lu-generated-jobs.rst \
 	doc/design-linuxha.rst \
 	doc/design-multi-reloc.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index ccb2f93..9a1d2b1 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -19,6 +19,7 @@ Design document drafts
    design-storagetypes.rst
    design-reason-trail.rst
    design-device-uuid-name.rst
+   design-internal-shutdown.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
new file mode 100644
index 0000000..836d00c
--- /dev/null
+++ b/doc/design-internal-shutdown.rst
@@ -0,0 +1,72 @@
+============================================================
+Detection of user-initiated shutdown from inside an instance
+============================================================
+
+.. contents:: :depth: 2
+
+This is a design document detailing the implementation of a way for Ganeti to
+detect whether a machine marked as up but not running was shut down gracefully
+by the user from inside the machine itself.
+
+Current state and shortcomings
+==============================
+
+Ganeti keeps track of the desired status of instances in order to be able to
+take proper actions (e.g. reboot) on the ones that happen to crash.
+Currently, the only way to properly shut down a machine is through Ganeti's own
+commands, which mark the instance as ``ADMIN_down``.
+If a user shuts down an instance from inside, through the proper command of the
+operating system it is running, the instance will be shut down gracefully, but
+Ganeti is not aware of that: the desired status of the instance will still be
+marked as ``running``, so when the watcher realises that the instance is down,
+it will restart it. This behaviour is usually not what the user expects.
+
+Proposed changes
+================
+
+We propose to modify Ganeti in such a way that it will detect when an instance
+was shut down because of an explicit user request. When such a situation is
+detected, the state of the instance will be set to ``ADMIN_down``, as
+intended by the user.
+
+This design document applies to the Xen backend of Ganeti, because it uses
+features specific to that hypervisor.
+
+Implementation
+==============
+
+Xen knows why a domain is being shut down (a crash or an explicit shutdown
+or poweroff request), but such information is not usually readily available
+externally, because all such cases lead to the virtual machine being destroyed
+immediately after the event is detected.
+
+Still, Xen allows the instance configuration file to define which action is
+taken in each of those cases through the ``on_poweroff``, ``on_shutdown`` and
+``on_crash`` variables. By setting them to ``preserve``, Xen will avoid
+destroying the domains automatically.
+
+When the domain is not destroyed, it can be viewed by using ``xm list`` (or
+``xl list`` in newer Xen versions), and the ``State`` field of the output will
+provide useful information.
+
+If the state is ``----c-`` it means the instance has crashed.
+
+If the state is ``---s--`` it means the instance was properly shut down.
+
+If the instance was properly shut down and it is still marked as ``running`` by
+Ganeti, it means that it was shut down from inside by the user, and the Ganeti
+status of the instance needs to be changed to ``ADMIN_down``.
+
+This will be done at regular intervals by the group watcher, just before
+deciding which instances to reboot.
+
+In addition, at the same time, the watcher will also need to issue ``xm
+destroy`` commands for all the domains that are in crashed or shutdown state,
+since this will no longer be done automatically by Xen because of the
+``preserve`` setting in their config files.
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
-- 
1.8.2.1
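
For illustration only, the watcher-side check described in the design could
look roughly like the sketch below. It is not part of the patch and not
Ganeti code: the function names (``parse_domain_list``, ``classify_state``,
``plan_actions``) and the ``desired_status`` mapping are hypothetical, and the
sketch assumes that the ``State`` flags are the second-to-last column of the
``xm list`` text output.

```python
# Hypothetical sketch of the watcher logic from the design document.
# None of these names exist in Ganeti; they are assumptions for illustration.

RUNNING = "running"
CRASHED = "crashed"
SHUTDOWN = "shutdown"


def parse_domain_list(output):
    """Return {domain_name: state_flags} from `xm list`-style text output.

    Assumes a header line followed by rows whose columns are
    Name, ID, Mem, VCPUs, State, Time(s).
    """
    domains = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed rows
        # State is taken as the second-to-last column (assumption).
        domains[fields[0]] = fields[-2]
    return domains


def classify_state(state):
    """Map state flags such as '----c-' or '---s--' to a coarse category."""
    if "c" in state:
        return CRASHED
    if "s" in state:
        return SHUTDOWN
    return RUNNING


def plan_actions(domains, desired_status):
    """Decide what the watcher should do for preserved domains.

    Returns (mark_down, destroy):
      mark_down: instances shut down from inside while Ganeti still wants
                 them running, so their status should become ADMIN_down.
      destroy:   domains in crashed or shutdown state, which Xen no longer
                 destroys by itself once 'preserve' is configured.
    """
    mark_down, destroy = [], []
    for name, state in sorted(domains.items()):
        category = classify_state(state)
        if category == RUNNING:
            continue
        destroy.append(name)
        if category == SHUTDOWN and desired_status.get(name) == "running":
            mark_down.append(name)
    return mark_down, destroy
```

Keeping the decision separate from the hypervisor calls means the policy can
be exercised on captured ``xm list`` output without a Xen host; actually
changing the instance status and issuing ``xm destroy`` would remain the
watcher's job.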
