On Tue, May 21, 2013 at 2:37 PM, Michele Tartara <[email protected]>wrote:
> On Sat, May 18, 2013 at 10:07 AM, Michele Tartara <[email protected]>wrote: > >> +list >> >> >> >> On Fri, May 17, 2013 at 4:26 PM, Michele Tartara <[email protected]>wrote: >> >>> On Fri, May 17, 2013 at 2:50 PM, Guido Trotter <[email protected]>wrote: >>> >>>> >>>> >>>> >>>> On Fri, May 17, 2013 at 2:30 PM, Michele Tartara >>>> <[email protected]>wrote: >>>> >>>>> On Fri, May 17, 2013 at 10:54 AM, Guido Trotter >>>>> <[email protected]>wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, May 17, 2013 at 10:46 AM, Michele Tartara < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Ganeti is currently not able to detect a legit shutdown request >>>>>>> performed by a >>>>>>> user from inside a Xen domain. >>>>>>> >>>>>>> This patch provides a design document to implement a mechanism able >>>>>>> to cope with >>>>>>> such events. >>>>>>> >>>>>>> Signed-off-by: Michele Tartara <[email protected]> >>>>>>> --- >>>>>>> Makefile.am | 1 + >>>>>>> doc/design-draft.rst | 1 + >>>>>>> doc/design-internal-shutdown.rst | 72 >>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>> 3 files changed, 74 insertions(+) >>>>>>> create mode 100644 doc/design-internal-shutdown.rst >>>>>>> >>>>>>> diff --git a/Makefile.am b/Makefile.am >>>>>>> index 037cf53..f66624e 100644 >>>>>>> --- a/Makefile.am >>>>>>> +++ b/Makefile.am >>>>>>> @@ -410,6 +410,7 @@ docinput = \ >>>>>>> doc/design-htools-2.3.rst \ >>>>>>> doc/design-http-server.rst \ >>>>>>> doc/design-impexp2.rst \ >>>>>>> + doc/design-internal-shutdown.rst \ >>>>>>> doc/design-lu-generated-jobs.rst \ >>>>>>> doc/design-linuxha.rst \ >>>>>>> doc/design-multi-reloc.rst \ >>>>>>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst >>>>>>> index ccb2f93..9a1d2b1 100644 >>>>>>> --- a/doc/design-draft.rst >>>>>>> +++ b/doc/design-draft.rst >>>>>>> @@ -19,6 +19,7 @@ Design document drafts >>>>>>> design-storagetypes.rst >>>>>>> design-reason-trail.rst >>>>>>> design-device-uuid-name.rst >>>>>>> + design-internal-shutdown.rst >>>>>>> >>>>>>> .. vim: set textwidth=72 : >>>>>>> .. Local Variables: >>>>>>> diff --git a/doc/design-internal-shutdown.rst >>>>>>> b/doc/design-internal-shutdown.rst >>>>>>> new file mode 100644 >>>>>>> index 0000000..836d00c >>>>>>> --- /dev/null >>>>>>> +++ b/doc/design-internal-shutdown.rst >>>>>>> @@ -0,0 +1,72 @@ >>>>>>> +============================================================ >>>>>>> +Detection of user-initiated shutdown from inside an instance >>>>>>> +============================================================ >>>>>>> + >>>>>>> +.. contents:: :depth: 2 >>>>>>> + >>>>>>> +This is a design document detailing the implementation of a way for >>>>>>> Ganeti to >>>>>>> +detect whether a machine marked as up but not running was shutdown >>>>>>> gracefully >>>>>>> +by the user from inside the machine itself. >>>>>>> + >>>>>>> +Current state and shortcomings >>>>>>> +============================== >>>>>>> + >>>>>>> +Ganeti keeps track of the desired status of instances in order to >>>>>>> be able to >>>>>>> +take proper actions (e.g.: reboot) on the ones that happen to crash. >>>>>>> +Currently, the only way to properly shut down a machine is through >>>>>>> Ganeti's own >>>>>>> +commands, that will mark an instance as ``ADMIN_down``. >>>>>>> +If a user shuts down an instance from inside, through the proper >>>>>>> command of the >>>>>>> +operating system it is running, the instance will be shutdown >>>>>>> gracefully, but >>>>>>> +Ganeti is not aware of that: the desired status of the instance >>>>>>> will still be >>>>>>> +marked as ``running``, so when the watcher realises that the >>>>>>> instance is down, >>>>>>> +it will restart it. This behaviour is usually not what the user >>>>>>> expects. >>>>>>> + >>>>>>> +Proposed changes >>>>>>> +================ >>>>>>> + >>>>>>> +We propose to modify Ganeti in such a way that it will detect when >>>>>>> an instance >>>>>>> +was shutdown because of an explicit user request. When such a >>>>>>> situation is >>>>>>> +detected, the state of the instance will be set to ADMIN_down, as >>>>>>> intended by >>>>>>> +the user. >>>>>>> + >>>>>>> >>>>>> >>>>>> Should we provide an option to just restart it in that case? (as an >>>>>> hv parameter) >>>>>> The default could still be to leave it down. >>>>>> >>>>> >>>>> Do you mean to have an option to tell the system to behave as it does >>>>> currently? Is that really useful? >>>>> >>>>> >>>> Yes. One thing is "we don't want these to appear as errors, like if the >>>> instance have crashed", and a different one is "we're ok if they don't come >>>> back up". That said, perhaps if the user does "halt" it's ok to halt, and >>>> if they do reboot to reboot. So that should be the default. >>>> >>> >>> Ack. >>> >>> >>>> >>>> >>>>> >>>>>> >>>>>>> +This design document applies to the Xen backend of Ganeti, because >>>>>>> it uses >>>>>>> +features specific of such hypervisor. >>>>>>> + >>>>>>> >>>>>> >>>>>> Is there any way to do something similar at least for kvm? I think >>>>>> this is worth investigating. >>>>>> >>>>> >>>>> I had a brief look, and unfortunately it seems like it's not possible. >>>>> But I'll spend some more time trying to do it. >>>>> >>>> >>>> I see a -no-shutdown option in qemu. Not sure what kind of information >>>> the monitor gives, in that case. >>>> >>> >>> Thanks for the hint, I'll have a look at it. >>> >>> >>>> >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>>> +Implementation >>>>>>> +============== >>>>>>> + >>>>>>> +Xen knows why a domain is being shut down (a crash or an explicit >>>>>>> shutdown >>>>>>> +or poweroff request), but such information is not usually readily >>>>>>> available >>>>>>> +externally, because all such cases lead to the virtual machine >>>>>>> being destroyed >>>>>>> +immediately after the event is detected. >>>>>>> + >>>>>>> +Still, Xen allows the instance configuration file to define what >>>>>>> action to be >>>>>>> +taken in all those cases through the ``on_poweroff``, >>>>>>> ``on_shutdown`` and >>>>>>> +``on_crash`` variables. By setting them to ``preserve``, Xen will >>>>>>> avoid >>>>>>> +destroying the domains automatically. >>>>>>> + >>>>>>> +When the domain is not destroyed, it can be viewed by using ``xm >>>>>>> list`` (or ``xl >>>>>>> +list`` in newer Xen versions), and the ``State`` field of the >>>>>>> output will >>>>>>> +provide useful information. >>>>>>> + >>>>>>> +If the state is ``----c-`` it means the instance has crashed. >>>>>>> + >>>>>>> +If the state is ``---s--`` it means the instance was properly >>>>>>> shutdown. >>>>>>> + >>>>>>> +If the instance was properly shutdown and it is still marked as >>>>>>> ``running`` by >>>>>>> +Ganeti, it means that it was shutdown from inside by the user, and >>>>>>> the ganeti >>>>>>> +status of the instance needs to be changed to ``ADMIN_down``. >>>>>>> + >>>>>>> +This will be done at regular intervals by the group watcher, just >>>>>>> before >>>>>>> +deciding which instances to reboot. >>>>>> >>>>>> + >>>>>>> +On top of that, at the same times, the watcher will also need to >>>>>>> issue ``xm >>>>>>> +destroy`` commands for all the domains that are in crashed or >>>>>>> shutdown state, >>>>>>> +since this will not be done automatically by Xen anymore because of >>>>>>> the >>>>>>> +``preserve`` setting in their config files. >>>>>>> + >>>>>>> >>>>>> >>>>>> Does this mean the memory will be freed only 5 minutes later? >>>>>> >>>>> >>>>> Yes, that's the downside of this approach. >>>>> Still, I think the impact is quite limited. First, because I don't >>>>> think that shutting down VMs from inside happens so frequently (but we >>>>> probably need some data to confirm this), and 5 minutes is the upper limit >>>>> anyway. Second, because if we see that it is required, we could have the >>>>> cleanup function running not only in the watcher, but also "on >>>>> demand"every >>>>> time it is needed, such as before creating a new instance, so that all the >>>>> operations that require memory to be available are sure to be executed in >>>>> a >>>>> clean state. >>>>> >>>>> >>>> Ack. >>>> >>> >>>> >>>>> Can you add a note saying that this will not impact the "ganeti >>>>>> admin" issued path, as that can check "xm list" and will do "the right >>>>>> thing" finishing with the destruction? >>>>>> >>>>> >>>>> Sure, this is not a problem. >>>>> >>>>> Also, what will happen when gnt-instance list is called beteween the >>>>>> shutdown and the watcher run? Do we need to display a different state, >>>>>> make >>>>>> the cleanup happen then (not sure I like this), or just display >>>>>> ADMIN_Down >>>>>> "as if", although the domain is not destroyed yet? >>>>>> >>>>> >>>>> Given that gnt-instance list shows the status of instances according >>>>> to Ganeti, I think the right thing would be to show the exact situation: >>>>> the expected status, the actual status, and a note saying that an internal >>>>> shutdown was detected. >>>>> >>>>> >>>> Ack, but this needs to be explicit in this design. What exactly is >>>> going to appear in the "status" field? What about in oper_state and >>>> admin_state ? Is there any new field you're adding? (I assume not, but this >>>> is important to specify). >>>> >>> >>> Ok, I'll detail all of this. >>> >>> >>>> >>>> >>>> >>>>> >>>>>> Finally, can you comment on changes to gnt-instance >>>>>> start/failover/migrate and other ops, when they encounter an instance in >>>>>> this state? Will the RPC layer be changed to make this transparent to >>>>>> them >>>>>> (how?) or will they need to deal with this explicitly? >>>>>> >>>>> >>>>> My idea is to run the cleanup function for a single instance just >>>>> before executing commands that would affect the running state of that >>>>> instance. This way, we'd end up being sure we always are in a clean state >>>>> before performing an operation. >>>>> I don't see a need to modify the RPC layer: it seems to me that this >>>>> is an hypervisor specific think, and could therefore be implemented >>>>> directly in the affected hypervisor-specific functions. >>>>> >>>>> >>>> Ack, but please add this to the design doc, specifying that it is only >>>> for functions that perform action on an instance, and not for ones who just >>>> query it (if this is the case as I assume it is). >>>> >>> >>> Ok. Yes, of course, that is the case. >>> >>> Thanks, >>> Michele >>> >> >> > Interdiff: > diff --git a/doc/design-internal-shutdown.rst > b/doc/design-internal-shutdown.rst > index 836d00c..a92d837 100644 > --- a/doc/design-internal-shutdown.rst > +++ b/doc/design-internal-shutdown.rst > We propose to modify Ganeti in such a way that it will detect when an > instance > was shutdown because of an explicit user request. When such a situation is > -detected, the state of the instance will be set to ADMIN_down, as > intended by > -the user. > +detected, instead of presenting an error as it happens now, either the > state > +of the instance will be set to ADMIN_down, or the instance will be > +automatically rebooted, depending on a instance-specific configuration > value. > +The default behavior in case no such parameter is found will be to follow > +the apparent will of the user, and setting to ADMIN_down an instance that > +was shut down correctly from inside. > > This design document applies to the Xen backend of Ganeti, because it uses > -features specific of such hypervisor. > +features specific of such hypervisor. Initial analysis suggests that a > similar > +approach might be used for KVM as well, so this design document will be > later > +extended to add more details about it. > > Implementation > ============== > @@ -65,6 +71,40 @@ destroy`` commands for all the domains that are in > crashed or shutdown state, > since this will not be done automatically by Xen anymore because of the > ``preserve`` setting in their config files. > > +This behavior will be limited to the domains shut down from inside, > because it > +will actually keep the resources of the domain busy until the watcher > will do > +the cleaning job (that, with the default setting, is up to every 5 > minutes). > +Still, this is considered acceptable, because it is not frequent for a > domain > +to be shut down this way. The cleanup function will be also run > +automatically just before performing any job that requires resources to be > +available (such as when creating a new instance), in order to ensure that > the > +new resource allocation happens starting from a clean state. > Functionalities > +that only query the state of instances will not run the cleanup function. > + > +This changes are hypervisor-specific and will not affect the external RPC > +interface. > Here you're talking about the RAPI I believe. The RPC interface (the internal one, between noded and masterd) will need to be changed to support this operation. Also the Hypervisor API will need changes for this, I believe. Finally, will there be a new LU, or how will the watcher perform the check/cleanup? + > +Side effects of the modification > +++++++++++++++++++++++++++++++++ > + > Other changes required, I would say, rather than "side effects". > +The implementation of this design document will require some commands to > be > +changed in order to cope with the new shutdown procedure. > + > +With this modification, also the Ganeti command for shutting down > instances > +would leave them in a shutdown but preserved state. Therefore, it will be > +changed in such a way to immediately perform the cleanup of the instance > +after verifying its correct shutdown. > + > +The ``gnt-instance list`` command will need to be able to handle the > situation > +where an instance was shutdown internally but not yet cleaned up. > +The admin_state field will maintain the current meaning unchanged. The > +oper_state field will get a new possible state, ``S``, meaning that the > instance > +was shutdown internally. > + > +The ``gnt-instance info`` command ``State`` field, in such case, will show > a > +message stating that the instance was supposed to be run but was shut down > +internally. > + > ACK. Thanks, Guido
