On Tue, May 21, 2013 at 2:45 PM, Guido Trotter <[email protected]> wrote:
> > > > On Tue, May 21, 2013 at 2:37 PM, Michele Tartara <[email protected]>wrote: > >> On Sat, May 18, 2013 at 10:07 AM, Michele Tartara <[email protected]>wrote: >> >>> +list >>> >>> >>> >>> On Fri, May 17, 2013 at 4:26 PM, Michele Tartara <[email protected]>wrote: >>> >>>> On Fri, May 17, 2013 at 2:50 PM, Guido Trotter <[email protected]>wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Fri, May 17, 2013 at 2:30 PM, Michele Tartara >>>>> <[email protected]>wrote: >>>>> >>>>>> On Fri, May 17, 2013 at 10:54 AM, Guido Trotter <[email protected] >>>>>> > wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, May 17, 2013 at 10:46 AM, Michele Tartara < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Ganeti is currently not able to detect a legit shutdown request >>>>>>>> performed by a >>>>>>>> user from inside a Xen domain. >>>>>>>> >>>>>>>> This patch provides a design document to implement a mechanism able >>>>>>>> to cope with >>>>>>>> such events. >>>>>>>> >>>>>>>> Signed-off-by: Michele Tartara <[email protected]> >>>>>>>> --- >>>>>>>> Makefile.am | 1 + >>>>>>>> doc/design-draft.rst | 1 + >>>>>>>> doc/design-internal-shutdown.rst | 72 >>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>> 3 files changed, 74 insertions(+) >>>>>>>> create mode 100644 doc/design-internal-shutdown.rst >>>>>>>> >>>>>>>> diff --git a/Makefile.am b/Makefile.am >>>>>>>> index 037cf53..f66624e 100644 >>>>>>>> --- a/Makefile.am >>>>>>>> +++ b/Makefile.am >>>>>>>> @@ -410,6 +410,7 @@ docinput = \ >>>>>>>> doc/design-htools-2.3.rst \ >>>>>>>> doc/design-http-server.rst \ >>>>>>>> doc/design-impexp2.rst \ >>>>>>>> + doc/design-internal-shutdown.rst \ >>>>>>>> doc/design-lu-generated-jobs.rst \ >>>>>>>> doc/design-linuxha.rst \ >>>>>>>> doc/design-multi-reloc.rst \ >>>>>>>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst >>>>>>>> index ccb2f93..9a1d2b1 100644 >>>>>>>> --- a/doc/design-draft.rst >>>>>>>> +++ b/doc/design-draft.rst >>>>>>>> @@ -19,6 +19,7 @@ Design document drafts >>>>>>>> design-storagetypes.rst >>>>>>>> design-reason-trail.rst >>>>>>>> design-device-uuid-name.rst >>>>>>>> + design-internal-shutdown.rst >>>>>>>> >>>>>>>> .. vim: set textwidth=72 : >>>>>>>> .. Local Variables: >>>>>>>> diff --git a/doc/design-internal-shutdown.rst >>>>>>>> b/doc/design-internal-shutdown.rst >>>>>>>> new file mode 100644 >>>>>>>> index 0000000..836d00c >>>>>>>> --- /dev/null >>>>>>>> +++ b/doc/design-internal-shutdown.rst >>>>>>>> @@ -0,0 +1,72 @@ >>>>>>>> +============================================================ >>>>>>>> +Detection of user-initiated shutdown from inside an instance >>>>>>>> +============================================================ >>>>>>>> + >>>>>>>> +.. contents:: :depth: 2 >>>>>>>> + >>>>>>>> +This is a design document detailing the implementation of a way >>>>>>>> for Ganeti to >>>>>>>> +detect whether a machine marked as up but not running was shutdown >>>>>>>> gracefully >>>>>>>> +by the user from inside the machine itself. >>>>>>>> + >>>>>>>> +Current state and shortcomings >>>>>>>> +============================== >>>>>>>> + >>>>>>>> +Ganeti keeps track of the desired status of instances in order to >>>>>>>> be able to >>>>>>>> +take proper actions (e.g.: reboot) on the ones that happen to >>>>>>>> crash. >>>>>>>> +Currently, the only way to properly shut down a machine is through >>>>>>>> Ganeti's own >>>>>>>> +commands, that will mark an instance as ``ADMIN_down``. >>>>>>>> +If a user shuts down an instance from inside, through the proper >>>>>>>> command of the >>>>>>>> +operating system it is running, the instance will be shutdown >>>>>>>> gracefully, but >>>>>>>> +Ganeti is not aware of that: the desired status of the instance >>>>>>>> will still be >>>>>>>> +marked as ``running``, so when the watcher realises that the >>>>>>>> instance is down, >>>>>>>> +it will restart it. This behaviour is usually not what the user >>>>>>>> expects. >>>>>>>> + >>>>>>>> +Proposed changes >>>>>>>> +================ >>>>>>>> + >>>>>>>> +We propose to modify Ganeti in such a way that it will detect when >>>>>>>> an instance >>>>>>>> +was shutdown because of an explicit user request. When such a >>>>>>>> situation is >>>>>>>> +detected, the state of the instance will be set to ADMIN_down, as >>>>>>>> intended by >>>>>>>> +the user. >>>>>>>> + >>>>>>>> >>>>>>> >>>>>>> Should we provide an option to just restart it in that case? (as an >>>>>>> hv parameter) >>>>>>> The default could still be to leave it down. >>>>>>> >>>>>> >>>>>> Do you mean to have an option to tell the system to behave as it does >>>>>> currently? Is that really useful? >>>>>> >>>>>> >>>>> Yes. One thing is "we don't want these to appear as errors, like if >>>>> the instance have crashed", and a different one is "we're ok if they don't >>>>> come back up". That said, perhaps if the user does "halt" it's ok to halt, >>>>> and if they do reboot to reboot. So that should be the default. >>>>> >>>> >>>> Ack. >>>> >>>> >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> +This design document applies to the Xen backend of Ganeti, because >>>>>>>> it uses >>>>>>>> +features specific of such hypervisor. >>>>>>>> + >>>>>>>> >>>>>>> >>>>>>> Is there any way to do something similar at least for kvm? I think >>>>>>> this is worth investigating. >>>>>>> >>>>>> >>>>>> I had a brief look, and unfortunately it seems like it's not >>>>>> possible. But I'll spend some more time trying to do it. >>>>>> >>>>> >>>>> I see a -no-shutdown option in qemu. Not sure what kind of information >>>>> the monitor gives, in that case. >>>>> >>>> >>>> Thanks for the hint, I'll have a look at it. >>>> >>>> >>>>> >>>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> +Implementation >>>>>>>> +============== >>>>>>>> + >>>>>>>> +Xen knows why a domain is being shut down (a crash or an explicit >>>>>>>> shutdown >>>>>>>> +or poweroff request), but such information is not usually readily >>>>>>>> available >>>>>>>> +externally, because all such cases lead to the virtual machine >>>>>>>> being destroyed >>>>>>>> +immediately after the event is detected. >>>>>>>> + >>>>>>>> +Still, Xen allows the instance configuration file to define what >>>>>>>> action to be >>>>>>>> +taken in all those cases through the ``on_poweroff``, >>>>>>>> ``on_shutdown`` and >>>>>>>> +``on_crash`` variables. By setting them to ``preserve``, Xen will >>>>>>>> avoid >>>>>>>> +destroying the domains automatically. >>>>>>>> + >>>>>>>> +When the domain is not destroyed, it can be viewed by using ``xm >>>>>>>> list`` (or ``xl >>>>>>>> +list`` in newer Xen versions), and the ``State`` field of the >>>>>>>> output will >>>>>>>> +provide useful information. >>>>>>>> + >>>>>>>> +If the state is ``----c-`` it means the instance has crashed. >>>>>>>> + >>>>>>>> +If the state is ``---s--`` it means the instance was properly >>>>>>>> shutdown. >>>>>>>> + >>>>>>>> +If the instance was properly shutdown and it is still marked as >>>>>>>> ``running`` by >>>>>>>> +Ganeti, it means that it was shutdown from inside by the user, and >>>>>>>> the ganeti >>>>>>>> +status of the instance needs to be changed to ``ADMIN_down``. >>>>>>>> + >>>>>>>> +This will be done at regular intervals by the group watcher, just >>>>>>>> before >>>>>>>> +deciding which instances to reboot. >>>>>>> >>>>>>> + >>>>>>>> +On top of that, at the same times, the watcher will also need to >>>>>>>> issue ``xm >>>>>>>> +destroy`` commands for all the domains that are in crashed or >>>>>>>> shutdown state, >>>>>>>> +since this will not be done automatically by Xen anymore because >>>>>>>> of the >>>>>>>> +``preserve`` setting in their config files. >>>>>>>> + >>>>>>>> >>>>>>> >>>>>>> Does this mean the memory will be freed only 5 minutes later? >>>>>>> >>>>>> >>>>>> Yes, that's the downside of this approach. >>>>>> Still, I think the impact is quite limited. First, because I don't >>>>>> think that shutting down VMs from inside happens so frequently (but we >>>>>> probably need some data to confirm this), and 5 minutes is the upper >>>>>> limit >>>>>> anyway. Second, because if we see that it is required, we could have the >>>>>> cleanup function running not only in the watcher, but also "on >>>>>> demand"every >>>>>> time it is needed, such as before creating a new instance, so that all >>>>>> the >>>>>> operations that require memory to be available are sure to be executed >>>>>> in a >>>>>> clean state. >>>>>> >>>>>> >>>>> Ack. >>>>> >>>> >>>>> >>>>>> Can you add a note saying that this will not impact the "ganeti >>>>>>> admin" issued path, as that can check "xm list" and will do "the right >>>>>>> thing" finishing with the destruction? >>>>>>> >>>>>> >>>>>> Sure, this is not a problem. >>>>>> >>>>>> Also, what will happen when gnt-instance list is called beteween >>>>>>> the shutdown and the watcher run? Do we need to display a different >>>>>>> state, >>>>>>> make the cleanup happen then (not sure I like this), or just display >>>>>>> ADMIN_Down "as if", although the domain is not destroyed yet? >>>>>>> >>>>>> >>>>>> Given that gnt-instance list shows the status of instances according >>>>>> to Ganeti, I think the right thing would be to show the exact situation: >>>>>> the expected status, the actual status, and a note saying that an >>>>>> internal >>>>>> shutdown was detected. >>>>>> >>>>>> >>>>> Ack, but this needs to be explicit in this design. What exactly is >>>>> going to appear in the "status" field? What about in oper_state and >>>>> admin_state ? Is there any new field you're adding? (I assume not, but >>>>> this >>>>> is important to specify). >>>>> >>>> >>>> Ok, I'll detail all of this. >>>> >>>> >>>>> >>>>> >>>>> >>>>>> >>>>>>> Finally, can you comment on changes to gnt-instance >>>>>>> start/failover/migrate and other ops, when they encounter an instance in >>>>>>> this state? Will the RPC layer be changed to make this transparent to >>>>>>> them >>>>>>> (how?) or will they need to deal with this explicitly? >>>>>>> >>>>>> >>>>>> My idea is to run the cleanup function for a single instance just >>>>>> before executing commands that would affect the running state of that >>>>>> instance. This way, we'd end up being sure we always are in a clean state >>>>>> before performing an operation. >>>>>> I don't see a need to modify the RPC layer: it seems to me that this >>>>>> is an hypervisor specific think, and could therefore be implemented >>>>>> directly in the affected hypervisor-specific functions. >>>>>> >>>>>> >>>>> Ack, but please add this to the design doc, specifying that it is only >>>>> for functions that perform action on an instance, and not for ones who >>>>> just >>>>> query it (if this is the case as I assume it is). >>>>> >>>> >>>> Ok. Yes, of course, that is the case. >>>> >>>> Thanks, >>>> Michele >>>> >>> >>> >> Interdiff: >> diff --git a/doc/design-internal-shutdown.rst >> b/doc/design-internal-shutdown.rst >> index 836d00c..a92d837 100644 >> --- a/doc/design-internal-shutdown.rst >> +++ b/doc/design-internal-shutdown.rst >> We propose to modify Ganeti in such a way that it will detect when an >> instance >> was shutdown because of an explicit user request. When such a situation >> is >> -detected, the state of the instance will be set to ADMIN_down, as >> intended by >> -the user. >> +detected, instead of presenting an error as it happens now, either the >> state >> +of the instance will be set to ADMIN_down, or the instance will be >> +automatically rebooted, depending on a instance-specific configuration >> value. >> +The default behavior in case no such parameter is found will be to follow >> +the apparent will of the user, and setting to ADMIN_down an instance that >> +was shut down correctly from inside. >> >> This design document applies to the Xen backend of Ganeti, because it >> uses >> -features specific of such hypervisor. >> +features specific of such hypervisor. Initial analysis suggests that a >> similar >> +approach might be used for KVM as well, so this design document will be >> later >> +extended to add more details about it. >> >> Implementation >> ============== >> @@ -65,6 +71,40 @@ destroy`` commands for all the domains that are in >> crashed or shutdown state, >> since this will not be done automatically by Xen anymore because of the >> ``preserve`` setting in their config files. >> >> +This behavior will be limited to the domains shut down from inside, >> because it >> +will actually keep the resources of the domain busy until the watcher >> will do >> +the cleaning job (that, with the default setting, is up to every 5 >> minutes). >> +Still, this is considered acceptable, because it is not frequent for a >> domain >> +to be shut down this way. The cleanup function will be also run >> +automatically just before performing any job that requires resources to >> be >> +available (such as when creating a new instance), in order to ensure >> that the >> +new resource allocation happens starting from a clean state. >> Functionalities >> +that only query the state of instances will not run the cleanup function. >> + >> +This changes are hypervisor-specific and will not affect the external RPC >> +interface. >> > > Here you're talking about the RAPI I believe. The RPC interface (the > internal one, between noded and masterd) will need to be changed to support > this operation. Also the Hypervisor API will need changes for this, I > believe. > > Finally, will there be a new LU, or how will the watcher perform the > check/cleanup? > > Interdiff to explain all this: diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst index a92d837..8b5e3c3 100644 --- a/doc/design-internal-shutdown.rst +++ b/doc/design-internal-shutdown.rst @@ -81,11 +81,28 @@ available (such as when creating a new instance), in order to en new resource allocation happens starting from a clean state. Functionalities that only query the state of instances will not run the cleanup function. -This changes are hypervisor-specific and will not affect the external RPC -interface. - -Side effects of the modification -++++++++++++++++++++++++++++++++ +The cleanup operation includes both node-specific operations (the actual +destruction of the stopped domains) and configuration changes, to be performed +on the master node (marking as offline an instance that was shut down +internally). Therefore, it will be implemented by adding a LU in cmdlib +(``LUCleanupInstances``). A Job executing such an opcode will be submitted by +the watcher to perform the cleanup. + +The node daemon will have to be modified in order to support at least the +following RPC calls: + * A call to list all the instances that have been shutdown from inside + * A call to destroy a domain + +The base hypervisor class (and all the deriving classes) will need two methods +for implementing such functionalities in a hypervisor-specific way. + +LUs performing operations other than an explicit cleanup will have to be +modified to perform the cleanup as well, either by submitting a job to perform +the cleanup (to be completed before actually performing the task at hand) or by +explicitly performing the cleanup themselves through the RPC calls. + +Other required changes +++++++++++++++++++++++ The implementation of this design document will require some commands to be changed in order to cope with the new shutdown procedure. > + >> +Side effects of the modification >> +++++++++++++++++++++++++++++++++ >> + >> > > Other changes required, I would say, rather than "side effects". > > >> +The implementation of this design document will require some commands to >> be >> +changed in order to cope with the new shutdown procedure. >> + >> +With this modification, also the Ganeti command for shutting down >> instances >> +would leave them in a shutdown but preserved state. Therefore, it will be >> +changed in such a way to immediately perform the cleanup of the instance >> +after verifying its correct shutdown. >> + >> +The ``gnt-instance list`` command will need to be able to handle the >> situation >> +where an instance was shutdown internally but not yet cleaned up. >> +The admin_state field will maintain the current meaning unchanged. The >> +oper_state field will get a new possible state, ``S``, meaning that the >> instance >> +was shutdown internally. >> + >> > +The ``gnt-instance info`` command ``State`` field, in such case, will >> show a >> +message stating that the instance was supposed to be run but was shut >> down >> +internally. >> + >> > > ACK. > > Thanks, > > Guido > > Thanks, Michele
