On Sat, May 18, 2013 at 10:07 AM, Michele Tartara <[email protected]> wrote:

> +list
>
> On Fri, May 17, 2013 at 4:26 PM, Michele Tartara <[email protected]> wrote:
>
>> On Fri, May 17, 2013 at 2:50 PM, Guido Trotter <[email protected]> wrote:
>>
>>>
>>> On Fri, May 17, 2013 at 2:30 PM, Michele Tartara <[email protected]> wrote:
>>>
>>>> On Fri, May 17, 2013 at 10:54 AM, Guido Trotter
>>>> <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Fri, May 17, 2013 at 10:46 AM, Michele Tartara <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> Ganeti is currently not able to detect a legit shutdown request
>>>>>> performed by a
>>>>>> user from inside a Xen domain.
>>>>>>
>>>>>> This patch provides a design document to implement a mechanism able
>>>>>> to cope with
>>>>>> such events.
>>>>>>
>>>>>> Signed-off-by: Michele Tartara <[email protected]>
>>>>>> ---
>>>>>>  Makefile.am                      |  1 +
>>>>>>  doc/design-draft.rst             |  1 +
>>>>>>  doc/design-internal-shutdown.rst | 72
>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>  3 files changed, 74 insertions(+)
>>>>>>  create mode 100644 doc/design-internal-shutdown.rst
>>>>>>
>>>>>> diff --git a/Makefile.am b/Makefile.am
>>>>>> index 037cf53..f66624e 100644
>>>>>> --- a/Makefile.am
>>>>>> +++ b/Makefile.am
>>>>>> @@ -410,6 +410,7 @@ docinput = \
>>>>>>    doc/design-htools-2.3.rst \
>>>>>>    doc/design-http-server.rst \
>>>>>>    doc/design-impexp2.rst \
>>>>>> +  doc/design-internal-shutdown.rst \
>>>>>>    doc/design-lu-generated-jobs.rst \
>>>>>>    doc/design-linuxha.rst \
>>>>>>    doc/design-multi-reloc.rst \
>>>>>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>>>>>> index ccb2f93..9a1d2b1 100644
>>>>>> --- a/doc/design-draft.rst
>>>>>> +++ b/doc/design-draft.rst
>>>>>> @@ -19,6 +19,7 @@ Design document drafts
>>>>>>     design-storagetypes.rst
>>>>>>     design-reason-trail.rst
>>>>>>     design-device-uuid-name.rst
>>>>>> +   design-internal-shutdown.rst
>>>>>>
>>>>>>  .. vim: set textwidth=72 :
>>>>>>  .. Local Variables:
>>>>>> diff --git a/doc/design-internal-shutdown.rst
>>>>>> b/doc/design-internal-shutdown.rst
>>>>>> new file mode 100644
>>>>>> index 0000000..836d00c
>>>>>> --- /dev/null
>>>>>> +++ b/doc/design-internal-shutdown.rst
>>>>>> @@ -0,0 +1,72 @@
>>>>>> +============================================================
>>>>>> +Detection of user-initiated shutdown from inside an instance
>>>>>> +============================================================
>>>>>> +
>>>>>> +.. contents:: :depth: 2
>>>>>> +
>>>>>> +This is a design document detailing the implementation of a way for
>>>>>> Ganeti to
>>>>>> +detect whether a machine marked as up but not running was shutdown
>>>>>> gracefully
>>>>>> +by the user from inside the machine itself.
>>>>>> +
>>>>>> +Current state and shortcomings
>>>>>> +==============================
>>>>>> +
>>>>>> +Ganeti keeps track of the desired status of instances in order to be
>>>>>> able to
>>>>>> +take proper actions (e.g.: reboot) on the ones that happen to crash.
>>>>>> +Currently, the only way to properly shut down a machine is through
>>>>>> Ganeti's own
>>>>>> +commands, that will mark an instance as ``ADMIN_down``.
>>>>>> +If a user shuts down an instance from inside, through the proper
>>>>>> command of the
>>>>>> +operating system it is running, the instance will be shutdown
>>>>>> gracefully, but
>>>>>> +Ganeti is not aware of that: the desired status of the instance will
>>>>>> still be
>>>>>> +marked as ``running``, so when the watcher realises that the
>>>>>> instance is down,
>>>>>> +it will restart it. This behaviour is usually not what the user
>>>>>> expects.
>>>>>> +
>>>>>> +Proposed changes
>>>>>> +================
>>>>>> +
>>>>>> +We propose to modify Ganeti in such a way that it will detect when
>>>>>> an instance
>>>>>> +was shutdown because of an explicit user request. When such a
>>>>>> situation is
>>>>>> +detected, the state of the instance will be set to ADMIN_down, as
>>>>>> intended by
>>>>>> +the user.
>>>>>> +
>>>>>>
>>>>>
>>>>> Should we provide an option to just restart it in that case? (as an hv
>>>>> parameter)
>>>>> The default could still be to leave it down.
>>>>>
>>>>
>>>> Do you mean to have an option to tell the system to behave as it does
>>>> currently? Is that really useful?
>>>>
>>>>
>>> Yes. One thing is "we don't want these to appear as errors, like if the
>>> instance have crashed", and a different one is "we're ok if they don't come
>>> back up". That said, perhaps if the user does "halt" it's ok to halt, and
>>> if they do reboot to reboot. So that should be the default.
>>>
>>
>> Ack.
>>
>>>
>>>>
>>>>>
>>>>>> +This design document applies to the Xen backend of Ganeti, because
>>>>>> it uses
>>>>>> +features specific of such hypervisor.
>>>>>> +
>>>>>>
>>>>>
>>>>> Is there any way to do something similar at least for kvm? I think
>>>>> this is worth investigating.
>>>>>
>>>>
>>>> I had a brief look, and unfortunately it seems like it's not possible.
>>>> But I'll spend some more time trying to do it.
>>>>
>>>
>>> I see a -no-shutdown option in qemu. Not sure what kind of information
>>> the monitor gives, in that case.
>>>
>>
>> Thanks for the hint, I'll have a look at it.
>>
>>>
>>>>
>>>>>
>>>>>> +Implementation
>>>>>> +==============
>>>>>> +
>>>>>> +Xen knows why a domain is being shut down (a crash or an explicit
>>>>>> shutdown
>>>>>> +or poweroff request), but such information is not usually readily
>>>>>> available
>>>>>> +externally, because all such cases lead to the virtual machine being
>>>>>> destroyed
>>>>>> +immediately after the event is detected.
>>>>>> +
>>>>>> +Still, Xen allows the instance configuration file to define what
>>>>>> action to be
>>>>>> +taken in all those cases through the ``on_poweroff``,
>>>>>> ``on_shutdown`` and
>>>>>> +``on_crash`` variables. By setting them to ``preserve``, Xen will
>>>>>> avoid
>>>>>> +destroying the domains automatically.
>>>>>> +
>>>>>> +When the domain is not destroyed, it can be viewed by using ``xm
>>>>>> list`` (or ``xl
>>>>>> +list`` in newer Xen versions), and the ``State`` field of the output
>>>>>> will
>>>>>> +provide useful information.
>>>>>> +
>>>>>> +If the state is ``----c-`` it means the instance has crashed.
>>>>>> +
>>>>>> +If the state is ``---s--`` it means the instance was properly
>>>>>> shutdown.
>>>>>> +
>>>>>> +If the instance was properly shutdown and it is still marked as
>>>>>> ``running`` by
>>>>>> +Ganeti, it means that it was shutdown from inside by the user, and
>>>>>> the ganeti
>>>>>> +status of the instance needs to be changed to ``ADMIN_down``.
>>>>>> +
>>>>>> +This will be done at regular intervals by the group watcher, just
>>>>>> before
>>>>>> +deciding which instances to reboot.
>>>>>
>>>>> +
>>>>>> +On top of that, at the same times, the watcher will also need to
>>>>>> issue ``xm
>>>>>> +destroy`` commands for all the domains that are in crashed or
>>>>>> shutdown state,
>>>>>> +since this will not be done automatically by Xen anymore because of
>>>>>> the
>>>>>> +``preserve`` setting in their config files.
>>>>>> +
>>>>>>
>>>>>
>>>>> Does this mean the memory will be freed only 5 minutes later?
>>>>>
>>>>
>>>> Yes, that's the downside of this approach.
>>>> Still, I think the impact is quite limited. First, because I don't
>>>> think that shutting down VMs from inside happens so frequently (but we
>>>> probably need some data to confirm this), and 5 minutes is the upper limit
>>>> anyway. Second, because if we see that it is required, we could have the
>>>> cleanup function running not only in the watcher, but also "on demand" every
>>>> time it is needed, such as before creating a new instance, so that all the
>>>> operations that require memory to be available are sure to be executed in a
>>>> clean state.
>>>>
>>>>
>>> Ack.
>>>
>>
>>>
>>>> Can you add a note saying that this will not impact the "ganeti
>>>>> admin" issued path, as that can check "xm list" and will do "the right
>>>>> thing" finishing with the destruction?
>>>>>
>>>>
>>>> Sure, this is not a problem.
>>>>
>>>> Also, what will happen when gnt-instance list is called between the
>>>>> shutdown and the watcher run? Do we need to display a different state,
>>>>> make
>>>>> the cleanup happen then (not sure I like this), or just display ADMIN_Down
>>>>> "as if", although the domain is not destroyed yet?
>>>>>
>>>>
>>>> Given that gnt-instance list shows the status of instances according to
>>>> Ganeti, I think the right thing would be to show the exact situation: the
>>>> expected status, the actual status, and a note saying that an internal
>>>> shutdown was detected.
>>>>
>>>>
>>> Ack, but this needs to be explicit in this design. What exactly is going
>>> to appear in the "status" field? What about in oper_state and admin_state?
>>> Is there any new field you're adding? (I assume not, but this is important
>>> to specify).
>>>
>>
>> Ok, I'll detail all of this.
>>
>>>
>>>>
>>>>> Finally, can you comment on changes to gnt-instance
>>>>> start/failover/migrate and other ops, when they encounter an instance in
>>>>> this state? Will the RPC layer be changed to make this transparent to them
>>>>> (how?) or will they need to deal with this explicitly?
>>>>>
>>>>
>>>> My idea is to run the cleanup function for a single instance just
>>>> before executing commands that would affect the running state of that
>>>> instance. This way, we'd end up being sure we always are in a clean state
>>>> before performing an operation.
>>>> I don't see a need to modify the RPC layer: it seems to me that this is
>>>> an hypervisor specific thing, and could therefore be implemented directly
>>>> in the affected hypervisor-specific functions.
>>>>
>>>>
>>> Ack, but please add this to the design doc, specifying that it is only
>>> for functions that perform action on an instance, and not for ones who just
>>> query it (if this is the case as I assume it is).
>>>
>>
>> Ok. Yes, of course, that is the case.
>>
>> Thanks,
>> Michele
>>
>

Interdiff:

diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
index 836d00c..a92d837 100644
--- a/doc/design-internal-shutdown.rst
+++ b/doc/design-internal-shutdown.rst
 We propose to modify Ganeti in such a way that it will detect when an instance
 was shutdown because of an explicit user request. When such a situation is
-detected, the state of the instance will be set to ADMIN_down, as intended by
-the user.
+detected, instead of presenting an error as it happens now, either the state
+of the instance will be set to ADMIN_down, or the instance will be
+automatically rebooted, depending on an instance-specific configuration value.
+The default behavior in case no such parameter is found will be to follow
+the apparent will of the user, and setting to ADMIN_down an instance that
+was shut down correctly from inside.
 
 This design document applies to the Xen backend of Ganeti, because it uses
-features specific of such hypervisor.
+features specific of such hypervisor. Initial analysis suggests that a similar
+approach might be used for KVM as well, so this design document will be later
+extended to add more details about it.
 
 Implementation
 ==============
@@ -65,6 +71,40 @@ destroy`` commands for all the domains that are in crashed or shutdown state,
 since this will not be done automatically by Xen anymore because of the
 ``preserve`` setting in their config files.
 
+This behavior will be limited to the domains shut down from inside, because it
+will actually keep the resources of the domain busy until the watcher will do
+the cleaning job (that, with the default setting, is up to every 5 minutes).
+Still, this is considered acceptable, because it is not frequent for a domain
+to be shut down this way. The cleanup function will be also run
+automatically just before performing any job that requires resources to be
+available (such as when creating a new instance), in order to ensure that the
+new resource allocation happens starting from a clean state. Functionalities
+that only query the state of instances will not run the cleanup function.
+
+These changes are hypervisor-specific and will not affect the external RPC
+interface.
+
+Side effects of the modification
+++++++++++++++++++++++++++++++++
+
+The implementation of this design document will require some commands to be
+changed in order to cope with the new shutdown procedure.
+
+With this modification, also the Ganeti command for shutting down instances
+would leave them in a shutdown but preserved state. Therefore, it will be
+changed in such a way to immediately perform the cleanup of the instance
+after verifying its correct shutdown.
+
+The ``gnt-instance list`` command will need to be able to handle the situation
+where an instance was shutdown internally but not yet cleaned up.
+The admin_state field will maintain the current meaning unchanged. The
+oper_state field will get a new possible state, ``S``, meaning that the instance
+was shutdown internally.
+
+The ``gnt-instance info`` command ``State`` field, in such case, will show a
+message stating that the instance was supposed to be run but was shut down
+internally.
+
 .. vim: set textwidth=72 :
 .. Local Variables:
 .. mode: rst

Cheers,
Michele
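
For illustration only, the decision the design describes (read the ``State`` flags reported by ``xm list``, compare them with the state Ganeti expects, then choose between cleanup, ``ADMIN_down`` and restart) might be sketched as below. This is not Ganeti code: ``classify_state``, ``decide``, ``Decision`` and the ``restart_on_internal_shutdown`` parameter are hypothetical names, and the only facts taken from the thread are the two flag strings ``----c-`` and ``---s--``, the ``running``/``ADMIN_down`` states, and the default of leaving an internally shut down instance down.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    """What a (hypothetical) watcher pass would do for one domain."""
    destroy_domain: bool   # issue the destroy command to free resources
    new_admin_state: str   # desired state recorded by Ganeti afterwards
    restart: bool          # start the instance again after cleanup


def classify_state(flags: str) -> str:
    """Map the six-character State column to a coarse condition.

    Per the design text: "----c-" means crashed, "---s--" means shut down.
    Anything else is treated here as a domain that still exists normally.
    """
    if len(flags) != 6:
        raise ValueError("expected six state flags, got %r" % flags)
    if flags[4] == "c":
        return "crashed"
    if flags[3] == "s":
        return "shutdown"
    return "alive"


def decide(flags: str, admin_state: str,
           restart_on_internal_shutdown: bool = False) -> Decision:
    """Choose an action for a preserved domain (assumed policy sketch)."""
    condition = classify_state(flags)
    if condition == "alive":
        # Nothing to clean up; leave everything as it is.
        return Decision(False, admin_state, False)
    if admin_state != "running":
        # Ganeti itself asked for the shutdown: just clean up.
        return Decision(True, admin_state, False)
    if condition == "crashed":
        # Unchanged behaviour: a crashed instance that should run is restarted.
        return Decision(True, "running", True)
    # Shut down from inside while Ganeti still expects it to run.
    if restart_on_internal_shutdown:
        return Decision(True, "running", True)
    return Decision(True, "ADMIN_down", False)
```

With the default setting, ``decide("---s--", "running")`` marks the instance ``ADMIN_down`` and requests cleanup, while a crash (``----c-``) still leads to a restart.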
