+list
On Fri, May 17, 2013 at 4:26 PM, Michele Tartara <[email protected]>wrote: > On Fri, May 17, 2013 at 2:50 PM, Guido Trotter <[email protected]>wrote: > >> >> >> >> On Fri, May 17, 2013 at 2:30 PM, Michele Tartara <[email protected]>wrote: >> >>> On Fri, May 17, 2013 at 10:54 AM, Guido Trotter <[email protected]>wrote: >>> >>>> >>>> >>>> >>>> On Fri, May 17, 2013 at 10:46 AM, Michele Tartara >>>> <[email protected]>wrote: >>>> >>>>> Ganeti is currently not able to detect a legit shutdown request >>>>> performed by a >>>>> user from inside a Xen domain. >>>>> >>>>> This patch provides a design document to implement a mechanism able to >>>>> cope with >>>>> such events. >>>>> >>>>> Signed-off-by: Michele Tartara <[email protected]> >>>>> --- >>>>> Makefile.am | 1 + >>>>> doc/design-draft.rst | 1 + >>>>> doc/design-internal-shutdown.rst | 72 >>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>> 3 files changed, 74 insertions(+) >>>>> create mode 100644 doc/design-internal-shutdown.rst >>>>> >>>>> diff --git a/Makefile.am b/Makefile.am >>>>> index 037cf53..f66624e 100644 >>>>> --- a/Makefile.am >>>>> +++ b/Makefile.am >>>>> @@ -410,6 +410,7 @@ docinput = \ >>>>> doc/design-htools-2.3.rst \ >>>>> doc/design-http-server.rst \ >>>>> doc/design-impexp2.rst \ >>>>> + doc/design-internal-shutdown.rst \ >>>>> doc/design-lu-generated-jobs.rst \ >>>>> doc/design-linuxha.rst \ >>>>> doc/design-multi-reloc.rst \ >>>>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst >>>>> index ccb2f93..9a1d2b1 100644 >>>>> --- a/doc/design-draft.rst >>>>> +++ b/doc/design-draft.rst >>>>> @@ -19,6 +19,7 @@ Design document drafts >>>>> design-storagetypes.rst >>>>> design-reason-trail.rst >>>>> design-device-uuid-name.rst >>>>> + design-internal-shutdown.rst >>>>> >>>>> .. vim: set textwidth=72 : >>>>> .. Local Variables: >>>>> diff --git a/doc/design-internal-shutdown.rst >>>>> b/doc/design-internal-shutdown.rst >>>>> new file mode 100644 >>>>> index 0000000..836d00c >>>>> --- /dev/null >>>>> +++ b/doc/design-internal-shutdown.rst >>>>> @@ -0,0 +1,72 @@ >>>>> +============================================================ >>>>> +Detection of user-initiated shutdown from inside an instance >>>>> +============================================================ >>>>> + >>>>> +.. contents:: :depth: 2 >>>>> + >>>>> +This is a design document detailing the implementation of a way for >>>>> Ganeti to >>>>> +detect whether a machine marked as up but not running was shutdown >>>>> gracefully >>>>> +by the user from inside the machine itself. >>>>> + >>>>> +Current state and shortcomings >>>>> +============================== >>>>> + >>>>> +Ganeti keeps track of the desired status of instances in order to be >>>>> able to >>>>> +take proper actions (e.g.: reboot) on the ones that happen to crash. >>>>> +Currently, the only way to properly shut down a machine is through >>>>> Ganeti's own >>>>> +commands, that will mark an instance as ``ADMIN_down``. >>>>> +If a user shuts down an instance from inside, through the proper >>>>> command of the >>>>> +operating system it is running, the instance will be shutdown >>>>> gracefully, but >>>>> +Ganeti is not aware of that: the desired status of the instance will >>>>> still be >>>>> +marked as ``running``, so when the watcher realises that the instance >>>>> is down, >>>>> +it will restart it. This behaviour is usually not what the user >>>>> expects. >>>>> + >>>>> +Proposed changes >>>>> +================ >>>>> + >>>>> +We propose to modify Ganeti in such a way that it will detect when an >>>>> instance >>>>> +was shutdown because of an explicit user request. When such a >>>>> situation is >>>>> +detected, the state of the instance will be set to ADMIN_down, as >>>>> intended by >>>>> +the user. >>>>> + >>>>> >>>> >>>> Should we provide an option to just restart it in that case? (as an hv >>>> parameter) >>>> The default could still be to leave it down. >>>> >>> >>> Do you mean to have an option to tell the system to behave as it does >>> currently? Is that really useful? >>> >>> >> Yes. One thing is "we don't want these to appear as errors, like if the >> instance have crashed", and a different one is "we're ok if they don't come >> back up". That said, perhaps if the user does "halt" it's ok to halt, and >> if they do reboot to reboot. So that should be the default. >> > > Ack. > > >> >> >>> >>>> >>>>> +This design document applies to the Xen backend of Ganeti, because it >>>>> uses >>>>> +features specific of such hypervisor. >>>>> + >>>>> >>>> >>>> Is there any way to do something similar at least for kvm? I think this >>>> is worth investigating. >>>> >>> >>> I had a brief look, and unfortunately it seems like it's not possible. >>> But I'll spend some more time trying to do it. >>> >> >> I see a -no-shutdown option in qemu. Not sure what kind of information >> the monitor gives, in that case. >> > > Thanks for the hint, I'll have a look at it. > > >> >> >> >>> >>>> >>>> >>>>> +Implementation >>>>> +============== >>>>> + >>>>> +Xen knows why a domain is being shut down (a crash or an explicit >>>>> shutdown >>>>> +or poweroff request), but such information is not usually readily >>>>> available >>>>> +externally, because all such cases lead to the virtual machine being >>>>> destroyed >>>>> +immediately after the event is detected. >>>>> + >>>>> +Still, Xen allows the instance configuration file to define what >>>>> action to be >>>>> +taken in all those cases through the ``on_poweroff``, ``on_shutdown`` >>>>> and >>>>> +``on_crash`` variables. By setting them to ``preserve``, Xen will >>>>> avoid >>>>> +destroying the domains automatically. >>>>> + >>>>> +When the domain is not destroyed, it can be viewed by using ``xm >>>>> list`` (or ``xl >>>>> +list`` in newer Xen versions), and the ``State`` field of the output >>>>> will >>>>> +provide useful information. >>>>> + >>>>> +If the state is ``----c-`` it means the instance has crashed. >>>>> + >>>>> +If the state is ``---s--`` it means the instance was properly >>>>> shutdown. >>>>> + >>>>> +If the instance was properly shutdown and it is still marked as >>>>> ``running`` by >>>>> +Ganeti, it means that it was shutdown from inside by the user, and >>>>> the ganeti >>>>> +status of the instance needs to be changed to ``ADMIN_down``. >>>>> + >>>>> +This will be done at regular intervals by the group watcher, just >>>>> before >>>>> +deciding which instances to reboot. >>>> >>>> + >>>>> +On top of that, at the same times, the watcher will also need to >>>>> issue ``xm >>>>> +destroy`` commands for all the domains that are in crashed or >>>>> shutdown state, >>>>> +since this will not be done automatically by Xen anymore because of >>>>> the >>>>> +``preserve`` setting in their config files. >>>>> + >>>>> >>>> >>>> Does this mean the memory will be freed only 5 minutes later? >>>> >>> >>> Yes, that's the downside of this approach. >>> Still, I think the impact is quite limited. First, because I don't think >>> that shutting down VMs from inside happens so frequently (but we probably >>> need some data to confirm this), and 5 minutes is the upper limit anyway. >>> Second, because if we see that it is required, we could have the cleanup >>> function running not only in the watcher, but also "on demand"every time it >>> is needed, such as before creating a new instance, so that all the >>> operations that require memory to be available are sure to be executed in a >>> clean state. >>> >>> >> Ack. >> > >> >>> Can you add a note saying that this will not impact the "ganeti admin" >>>> issued path, as that can check "xm list" and will do "the right thing" >>>> finishing with the destruction? >>>> >>> >>> Sure, this is not a problem. >>> >>> Also, what will happen when gnt-instance list is called beteween the >>>> shutdown and the watcher run? Do we need to display a different state, make >>>> the cleanup happen then (not sure I like this), or just display ADMIN_Down >>>> "as if", although the domain is not destroyed yet? >>>> >>> >>> Given that gnt-instance list shows the status of instances according to >>> Ganeti, I think the right thing would be to show the exact situation: the >>> expected status, the actual status, and a note saying that an internal >>> shutdown was detected. >>> >>> >> Ack, but this needs to be explicit in this design. What exactly is going >> to appear in the "status" field? What about in oper_state and admin_state ? >> Is there any new field you're adding? (I assume not, but this is important >> to specify). >> > > Ok, I'll detail all of this. > > >> >> >> >>> >>>> Finally, can you comment on changes to gnt-instance >>>> start/failover/migrate and other ops, when they encounter an instance in >>>> this state? Will the RPC layer be changed to make this transparent to them >>>> (how?) or will they need to deal with this explicitly? >>>> >>> >>> My idea is to run the cleanup function for a single instance just before >>> executing commands that would affect the running state of that instance. >>> This way, we'd end up being sure we always are in a clean state before >>> performing an operation. >>> I don't see a need to modify the RPC layer: it seems to me that this is >>> an hypervisor specific think, and could therefore be implemented directly >>> in the affected hypervisor-specific functions. >>> >>> >> Ack, but please add this to the design doc, specifying that it is only >> for functions that perform action on an instance, and not for ones who just >> query it (if this is the case as I assume it is). >> > > Ok. Yes, of course, that is the case. > > Thanks, > Michele >
