On 17 May 2013 19:30, Michele Tartara <[email protected]> wrote:
> On Fri, May 17, 2013 at 6:05 PM, Bernardo Dal Seno <[email protected]> wrote:
>>
>> On 17 May 2013 11:46, Michele Tartara <[email protected]> wrote:
>> > Ganeti is currently not able to detect a legit shutdown request performed by a
>> > user from inside a Xen domain.
>> >
>> > This patch provides a design document to implement a mechanism able to cope with
>> > such events.
>> >
>> > Signed-off-by: Michele Tartara <[email protected]>
>> > ---
>> >  Makefile.am                      |  1 +
>> >  doc/design-draft.rst             |  1 +
>> >  doc/design-internal-shutdown.rst | 72 ++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 74 insertions(+)
>> >  create mode 100644 doc/design-internal-shutdown.rst
>> >
>> > diff --git a/Makefile.am b/Makefile.am
>> > index 037cf53..f66624e 100644
>> > --- a/Makefile.am
>> > +++ b/Makefile.am
>> > @@ -410,6 +410,7 @@ docinput = \
>> >    doc/design-htools-2.3.rst \
>> >    doc/design-http-server.rst \
>> >    doc/design-impexp2.rst \
>> > +  doc/design-internal-shutdown.rst \
>> >    doc/design-lu-generated-jobs.rst \
>> >    doc/design-linuxha.rst \
>> >    doc/design-multi-reloc.rst \
>> > diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>> > index ccb2f93..9a1d2b1 100644
>> > --- a/doc/design-draft.rst
>> > +++ b/doc/design-draft.rst
>> > @@ -19,6 +19,7 @@ Design document drafts
>> >     design-storagetypes.rst
>> >     design-reason-trail.rst
>> >     design-device-uuid-name.rst
>> > +   design-internal-shutdown.rst
>> >
>> >  .. vim: set textwidth=72 :
>> >  .. Local Variables:
>> > diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
>> > new file mode 100644
>> > index 0000000..836d00c
>> > --- /dev/null
>> > +++ b/doc/design-internal-shutdown.rst
>> > @@ -0,0 +1,72 @@
>> > +============================================================
>> > +Detection of user-initiated shutdown from inside an instance
>> > +============================================================
>> > +
>> > +.. contents:: :depth: 2
>> > +
>> > +This is a design document detailing the implementation of a way for Ganeti to
>> > +detect whether a machine marked as up but not running was shutdown gracefully
>> > +by the user from inside the machine itself.
>> > +
>> > +Current state and shortcomings
>> > +==============================
>> > +
>> > +Ganeti keeps track of the desired status of instances in order to be able to
>> > +take proper actions (e.g.: reboot) on the ones that happen to crash.
>> > +Currently, the only way to properly shut down a machine is through Ganeti's own
>> > +commands, that will mark an instance as ``ADMIN_down``.
>> > +If a user shuts down an instance from inside, through the proper command of the
>> > +operating system it is running, the instance will be shutdown gracefully, but
>> > +Ganeti is not aware of that: the desired status of the instance will still be
>> > +marked as ``running``, so when the watcher realises that the instance is down,
>> > +it will restart it. This behaviour is usually not what the user expects.
>> > +
>> > +Proposed changes
>> > +================
>> > +
>> > +We propose to modify Ganeti in such a way that it will detect when an instance
>> > +was shutdown because of an explicit user request. When such a situation is
>> > +detected, the state of the instance will be set to ADMIN_down, as intended by
>> > +the user.
>> > +
>> > +This design document applies to the Xen backend of Ganeti, because it uses
>> > +features specific of such hypervisor.
>> > +
>> > +Implementation
>> > +==============
>> > +
>> > +Xen knows why a domain is being shut down (a crash or an explicit shutdown
>> > +or poweroff request), but such information is not usually readily available
>> > +externally, because all such cases lead to the virtual machine being destroyed
>> > +immediately after the event is detected.
>> > +
>> > +Still, Xen allows the instance configuration file to define what action to be
>> > +taken in all those cases through the ``on_poweroff``, ``on_shutdown`` and
>> > +``on_crash`` variables. By setting them to ``preserve``, Xen will avoid
>> > +destroying the domains automatically.
>> > +
>> > +When the domain is not destroyed, it can be viewed by using ``xm list`` (or ``xl
>> > +list`` in newer Xen versions), and the ``State`` field of the output will
>> > +provide useful information.
>> > +
>> > +If the state is ``----c-`` it means the instance has crashed.
>> > +
>> > +If the state is ``---s--`` it means the instance was properly shutdown.
>> > +
>> > +If the instance was properly shutdown and it is still marked as ``running`` by
>> > +Ganeti, it means that it was shutdown from inside by the user, and the ganeti
>> > +status of the instance needs to be changed to ``ADMIN_down``.
>> > +
>> > +This will be done at regular intervals by the group watcher, just before
>> > +deciding which instances to reboot.
>> > +
>> > +On top of that, at the same times, the watcher will also need to issue ``xm
>> > +destroy`` commands for all the domains that are in crashed or shutdown state,
>> > +since this will not be done automatically by Xen anymore because of the
>> > +``preserve`` setting in their config files.
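
For illustration only, the watcher step described in the quoted design could
be sketched roughly as follows. This is not code from the patch or from
Ganeti: the function names (classify_state, parse_xm_list, plan_actions) and
the admin_states mapping are hypothetical, and the column layout assumed for
``xm list`` (Name, ID, Mem, VCPUs, State, Time(s)) should be checked against
the Xen version in use.

```python
# Sketch only: every name here is hypothetical, not a Ganeti or Xen API.


def classify_state(state):
    """Map an ``xm list`` State string (e.g. ``---s--``) to a label."""
    if "c" in state:
        return "crashed"
    if "s" in state:
        return "shutdown"
    return "other"


def parse_xm_list(output):
    """Return {domain name: State string} from ``xm list`` text output."""
    states = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            # Assumed columns: Name ID Mem VCPUs State Time(s)
            states[fields[0]] = fields[4]
    return states


def plan_actions(xm_states, admin_states):
    """Decide what the watcher would do for each preserved domain.

    Returns (mark_down, destroy): instances whose status should become
    ADMIN_down, and domains that need an explicit ``xm destroy``.
    """
    mark_down, destroy = [], []
    for name, state in sorted(xm_states.items()):
        kind = classify_state(state)
        if kind == "other":
            continue
        # Preserved domains are no longer cleaned up by Xen itself.
        destroy.append(name)
        # A clean shutdown of an instance still marked as running means
        # the user shut it down from inside.
        if kind == "shutdown" and admin_states.get(name) == "running":
            mark_down.append(name)
    return mark_down, destroy
```

In this sketch a crashed instance that is still marked as running is only
destroyed, so the usual watcher logic can restart it afterwards.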
>>
>> I think that that should be done also by gnt-instance start and
>> similar commands, as they could be issued before the watcher runs.
>>
>> Also, what happens to output of gnt-instance list? Will it be correct?
>>
> Read my reply to Guido's emails and you'll find the answer to your
> questions. :-)

If only I found them. But I guess I'll wait for the revised doc. :-)

Bernardo

>
> Thanks for pointing it out, though.
> I'll soon send a revised design doc containing those clarifications.
>
> Thanks,
> Michele
