Re: [PATCH master] Design doc for internal shutdown detection

Guido Trotter Tue, 21 May 2013 05:45:33 -0700

On Tue, May 21, 2013 at 2:37 PM, Michele Tartara <[email protected]>wrote:


> On Sat, May 18, 2013 at 10:07 AM, Michele Tartara <[email protected]>wrote:
>
>> +list
>>
>>
>>
>> On Fri, May 17, 2013 at 4:26 PM, Michele Tartara <[email protected]>wrote:
>>
>>> On Fri, May 17, 2013 at 2:50 PM, Guido Trotter <[email protected]>wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Fri, May 17, 2013 at 2:30 PM, Michele Tartara 
>>>> <[email protected]>wrote:
>>>>
>>>>> On Fri, May 17, 2013 at 10:54 AM, Guido Trotter 
>>>>> <[email protected]>wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 17, 2013 at 10:46 AM, Michele Tartara <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Ganeti is currently not able to detect a legit shutdown request
>>>>>>> performed by a
>>>>>>> user from inside a Xen domain.
>>>>>>>
>>>>>>> This patch provides a design document to implement a mechanism able
>>>>>>> to cope with
>>>>>>> such events.
>>>>>>>
>>>>>>> Signed-off-by: Michele Tartara <[email protected]>
>>>>>>> ---
>>>>>>>  Makefile.am                      |  1 +
>>>>>>>  doc/design-draft.rst             |  1 +
>>>>>>>  doc/design-internal-shutdown.rst | 72
>>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>>  3 files changed, 74 insertions(+)
>>>>>>>  create mode 100644 doc/design-internal-shutdown.rst
>>>>>>>
>>>>>>> diff --git a/Makefile.am b/Makefile.am
>>>>>>> index 037cf53..f66624e 100644
>>>>>>> --- a/Makefile.am
>>>>>>> +++ b/Makefile.am
>>>>>>> @@ -410,6 +410,7 @@ docinput = \
>>>>>>>         doc/design-htools-2.3.rst \
>>>>>>>         doc/design-http-server.rst \
>>>>>>>         doc/design-impexp2.rst \
>>>>>>> +       doc/design-internal-shutdown.rst \
>>>>>>>         doc/design-lu-generated-jobs.rst \
>>>>>>>         doc/design-linuxha.rst \
>>>>>>>         doc/design-multi-reloc.rst \
>>>>>>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>>>>>>> index ccb2f93..9a1d2b1 100644
>>>>>>> --- a/doc/design-draft.rst
>>>>>>> +++ b/doc/design-draft.rst
>>>>>>> @@ -19,6 +19,7 @@ Design document drafts
>>>>>>>     design-storagetypes.rst
>>>>>>>     design-reason-trail.rst
>>>>>>>     design-device-uuid-name.rst
>>>>>>> +   design-internal-shutdown.rst
>>>>>>>
>>>>>>>  .. vim: set textwidth=72 :
>>>>>>>  .. Local Variables:
>>>>>>> diff --git a/doc/design-internal-shutdown.rst
>>>>>>> b/doc/design-internal-shutdown.rst
>>>>>>> new file mode 100644
>>>>>>> index 0000000..836d00c
>>>>>>> --- /dev/null
>>>>>>> +++ b/doc/design-internal-shutdown.rst
>>>>>>> @@ -0,0 +1,72 @@
>>>>>>> +============================================================
>>>>>>> +Detection of user-initiated shutdown from inside an instance
>>>>>>> +============================================================
>>>>>>> +
>>>>>>> +.. contents:: :depth: 2
>>>>>>> +
>>>>>>> +This is a design document detailing the implementation of a way for
>>>>>>> Ganeti to
>>>>>>> +detect whether a machine marked as up but not running was shutdown
>>>>>>> gracefully
>>>>>>> +by the user from inside the machine itself.
>>>>>>> +
>>>>>>> +Current state and shortcomings
>>>>>>> +==============================
>>>>>>> +
>>>>>>> +Ganeti keeps track of the desired status of instances in order to
>>>>>>> be able to
>>>>>>> +take proper actions (e.g.: reboot) on the ones that happen to crash.
>>>>>>> +Currently, the only way to properly shut down a machine is through
>>>>>>> Ganeti's own
>>>>>>> +commands, that will mark an instance as ``ADMIN_down``.
>>>>>>> +If a user shuts down an instance from inside, through the proper
>>>>>>> command of the
>>>>>>> +operating system it is running, the instance will be shutdown
>>>>>>> gracefully, but
>>>>>>> +Ganeti is not aware of that: the desired status of the instance
>>>>>>> will still be
>>>>>>> +marked as ``running``, so when the watcher realises that the
>>>>>>> instance is down,
>>>>>>> +it will restart it. This behaviour is usually not what the user
>>>>>>> expects.
>>>>>>> +
>>>>>>> +Proposed changes
>>>>>>> +================
>>>>>>> +
>>>>>>> +We propose to modify Ganeti in such a way that it will detect when
>>>>>>> an instance
>>>>>>> +was shutdown because of an explicit user request. When such a
>>>>>>> situation is
>>>>>>> +detected, the state of the instance will be set to ADMIN_down, as
>>>>>>> intended by
>>>>>>> +the user.
>>>>>>> +
>>>>>>>
>>>>>>
>>>>>> Should we provide an option to just restart it in that case? (as an
>>>>>> hv parameter)
>>>>>> The default could still be to leave it down.
>>>>>>
>>>>>
>>>>> Do you mean to have an option to tell the system to behave as it does
>>>>> currently? Is that really useful?
>>>>>
>>>>>
>>>> Yes. One thing is "we don't want these to appear as errors, like if the
>>>> instance have crashed", and a different one is "we're ok if they don't come
>>>> back up". That said, perhaps if the user does "halt" it's ok to halt, and
>>>> if they do reboot to reboot. So that should be the default.
>>>>
>>>
>>> Ack.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> +This design document applies to the Xen backend of Ganeti, because
>>>>>>> it uses
>>>>>>> +features specific of such hypervisor.
>>>>>>> +
>>>>>>>
>>>>>>
>>>>>> Is there any way to do something similar at least for kvm? I think
>>>>>> this is worth investigating.
>>>>>>
>>>>>
>>>>> I had a brief look, and unfortunately it seems like it's not possible.
>>>>> But I'll spend some more time trying to do it.
>>>>>
>>>>
>>>> I see a -no-shutdown option in qemu. Not sure what kind of information
>>>> the monitor gives, in that case.
>>>>
>>>
>>> Thanks for the hint, I'll have a look at it.
>>>
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> +Implementation
>>>>>>> +==============
>>>>>>> +
>>>>>>> +Xen knows why a domain is being shut down (a crash or an explicit
>>>>>>> shutdown
>>>>>>> +or poweroff request), but such information is not usually readily
>>>>>>> available
>>>>>>> +externally, because all such cases lead to the virtual machine
>>>>>>> being destroyed
>>>>>>> +immediately after the event is detected.
>>>>>>> +
>>>>>>> +Still, Xen allows the instance configuration file to define what
>>>>>>> action to be
>>>>>>> +taken in all those cases through the ``on_poweroff``,
>>>>>>> ``on_shutdown`` and
>>>>>>> +``on_crash`` variables. By setting them to ``preserve``, Xen will
>>>>>>> avoid
>>>>>>> +destroying the domains automatically.
>>>>>>> +
>>>>>>> +When the domain is not destroyed, it can be viewed by using ``xm
>>>>>>> list`` (or ``xl
>>>>>>> +list`` in newer Xen versions), and the ``State`` field of the
>>>>>>> output will
>>>>>>> +provide useful information.
>>>>>>> +
>>>>>>> +If the state is ``----c-`` it means the instance has crashed.
>>>>>>> +
>>>>>>> +If the state is ``---s--`` it means the instance was properly
>>>>>>> shutdown.
>>>>>>> +
>>>>>>> +If the instance was properly shutdown and it is still marked as
>>>>>>> ``running`` by
>>>>>>> +Ganeti, it means that it was shutdown from inside by the user, and
>>>>>>> the ganeti
>>>>>>> +status of the instance needs to be changed to ``ADMIN_down``.
>>>>>>> +
>>>>>>> +This will be done at regular intervals by the group watcher, just
>>>>>>> before
>>>>>>> +deciding which instances to reboot.
>>>>>>
>>>>>> +
>>>>>>> +On top of that, at the same times, the watcher will also need to
>>>>>>> issue ``xm
>>>>>>> +destroy`` commands for all the domains that are in crashed or
>>>>>>> shutdown state,
>>>>>>> +since this will not be done automatically by Xen anymore because of
>>>>>>> the
>>>>>>> +``preserve`` setting in their config files.
>>>>>>> +
>>>>>>>
>>>>>>
>>>>>> Does this mean the memory will be freed only 5 minutes later?
>>>>>>
>>>>>
>>>>> Yes, that's the downside of this approach.
>>>>> Still, I think the impact is quite limited. First, because I don't
>>>>> think that shutting down VMs from inside happens so frequently (but we
>>>>> probably need some data to confirm this), and 5 minutes is the upper limit
>>>>> anyway. Second, because if we see that it is required, we could have the
>>>>> cleanup function running not only in the watcher, but also "on 
>>>>> demand"every
>>>>> time it is needed, such as before creating a new instance, so that all the
>>>>> operations that require memory to be available are sure to be executed in 
>>>>> a
>>>>> clean state.
>>>>>
>>>>>
>>>> Ack.
>>>>
>>>
>>>>
>>>>>  Can you add a note saying that this will not impact the "ganeti
>>>>>> admin" issued path, as that can check "xm list" and will do "the right
>>>>>> thing" finishing with the destruction?
>>>>>>
>>>>>
>>>>> Sure, this is not a problem.
>>>>>
>>>>>  Also, what will happen when gnt-instance list is called beteween the
>>>>>> shutdown and the watcher run? Do we need to display a different state, 
>>>>>> make
>>>>>> the cleanup happen then (not sure I like this), or just display 
>>>>>> ADMIN_Down
>>>>>> "as if", although the domain is not destroyed yet?
>>>>>>
>>>>>
>>>>> Given that gnt-instance list shows the status of instances according
>>>>> to Ganeti, I think the right thing would be to show the exact situation:
>>>>> the expected status, the actual status, and a note saying that an internal
>>>>> shutdown was detected.
>>>>>
>>>>>
>>>> Ack, but this needs to be explicit in this design. What exactly is
>>>> going to appear in the "status" field? What about in oper_state and
>>>> admin_state ? Is there any new field you're adding? (I assume not, but this
>>>> is important to specify).
>>>>
>>>
>>> Ok, I'll detail all of this.
>>>
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>> Finally, can you comment on changes to gnt-instance
>>>>>> start/failover/migrate and other ops, when they encounter an instance in
>>>>>> this state? Will the RPC layer be changed to make this transparent to 
>>>>>> them
>>>>>> (how?) or will they need to deal with this explicitly?
>>>>>>
>>>>>
>>>>> My idea is to run the cleanup function for a single instance just
>>>>> before executing commands that would affect the running state of that
>>>>> instance. This way, we'd end up being sure we always are in a clean state
>>>>> before performing an operation.
>>>>> I don't see a need to modify the RPC layer: it seems to me that this
>>>>> is an hypervisor specific think, and could therefore be implemented
>>>>> directly in the affected hypervisor-specific functions.
>>>>>
>>>>>
>>>> Ack, but please add this to the design doc, specifying that it is only
>>>> for functions that perform action on an instance, and not for ones who just
>>>> query it (if this is the case as I assume it is).
>>>>
>>>
>>> Ok. Yes, of course, that is the case.
>>>
>>> Thanks,
>>> Michele
>>>
>>
>>
> Interdiff:
> diff --git a/doc/design-internal-shutdown.rst
> b/doc/design-internal-shutdown.rst
> index 836d00c..a92d837 100644
> --- a/doc/design-internal-shutdown.rst
> +++ b/doc/design-internal-shutdown.rst
>  We propose to modify Ganeti in such a way that it will detect when an
> instance
>  was shutdown because of an explicit user request. When such a situation is
> -detected, the state of the instance will be set to ADMIN_down, as
> intended by
> -the user.
> +detected, instead of presenting an error as it happens now, either the
> state
> +of the instance will be set to ADMIN_down, or the instance will be
> +automatically rebooted, depending on a instance-specific configuration
> value.
> +The default behavior in case no such parameter is found will be to follow
> +the apparent will of the user, and setting to ADMIN_down an instance that
> +was shut down correctly from inside.
>
>  This design document applies to the Xen backend of Ganeti, because it uses
> -features specific of such hypervisor.
> +features specific of such hypervisor. Initial analysis suggests that a
> similar
> +approach might be used for KVM as well, so this design document will be
> later
> +extended to add more details about it.
>
>  Implementation
>  ==============
> @@ -65,6 +71,40 @@ destroy`` commands for all the domains that are in
> crashed or shutdown state,
>  since this will not be done automatically by Xen anymore because of the
>  ``preserve`` setting in their config files.
>
> +This behavior will be limited to the domains shut down from inside,
> because it
> +will actually keep the resources of the domain busy until the watcher
> will do
> +the cleaning job (that, with the default setting, is up to every 5
> minutes).
> +Still, this is considered acceptable, because it is not frequent for a
> domain
> +to be shut down this way. The cleanup function will be also run
> +automatically just before performing any job that requires resources to be
> +available (such as when creating a new instance), in order to ensure that
> the
> +new resource allocation happens starting from a clean state.
> Functionalities
> +that only query the state of instances will not run the cleanup function.
> +
> +This changes are hypervisor-specific and will not affect the external RPC
> +interface.
>

Here you're talking about the RAPI I believe. The RPC interface (the
internal one, between noded and masterd) will need to be changed to support
this operation. Also the Hypervisor API will need changes for this, I
believe.

Finally, will there be a new LU, or how will the watcher perform the
check/cleanup?

+
> +Side effects of the modification
> +++++++++++++++++++++++++++++++++
> +
>

Other changes required, I would say, rather than "side effects".


> +The implementation of this design document will require some commands to
> be
> +changed in order to cope with the new shutdown procedure.
> +
> +With this modification, also the Ganeti command for shutting down
> instances
> +would leave them in a shutdown but preserved state. Therefore, it will be
> +changed in such a way to immediately perform the cleanup of the instance
> +after verifying its correct shutdown.
> +
> +The ``gnt-instance list`` command will need to be able to handle the
> situation
> +where an instance was shutdown internally but not yet cleaned up.
> +The admin_state field will maintain the current meaning unchanged. The
> +oper_state field will get a new possible state, ``S``, meaning that the
> instance
> +was shutdown internally.
> +
>
+The ``gnt-instance info`` command ``State`` field, in such case, will show
> a
> +message stating that the instance was supposed to be run but was shut down
> +internally.
> +
>

ACK.

Thanks,

Guido

Re: [PATCH master] Design doc for internal shutdown detection

Reply via email to