Re: [PATCH master] Design doc for internal shutdown detection

Michele Tartara Sat, 18 May 2013 01:08:16 -0700

+list


On Fri, May 17, 2013 at 4:26 PM, Michele Tartara <[email protected]>wrote:

> On Fri, May 17, 2013 at 2:50 PM, Guido Trotter <[email protected]>wrote:
>
>>
>>
>>
>> On Fri, May 17, 2013 at 2:30 PM, Michele Tartara <[email protected]>wrote:
>>
>>> On Fri, May 17, 2013 at 10:54 AM, Guido Trotter <[email protected]>wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Fri, May 17, 2013 at 10:46 AM, Michele Tartara 
>>>> <[email protected]>wrote:
>>>>
>>>>> Ganeti is currently not able to detect a legit shutdown request
>>>>> performed by a
>>>>> user from inside a Xen domain.
>>>>>
>>>>> This patch provides a design document to implement a mechanism able to
>>>>> cope with
>>>>> such events.
>>>>>
>>>>> Signed-off-by: Michele Tartara <[email protected]>
>>>>> ---
>>>>>  Makefile.am                      |  1 +
>>>>>  doc/design-draft.rst             |  1 +
>>>>>  doc/design-internal-shutdown.rst | 72
>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>  3 files changed, 74 insertions(+)
>>>>>  create mode 100644 doc/design-internal-shutdown.rst
>>>>>
>>>>> diff --git a/Makefile.am b/Makefile.am
>>>>> index 037cf53..f66624e 100644
>>>>> --- a/Makefile.am
>>>>> +++ b/Makefile.am
>>>>> @@ -410,6 +410,7 @@ docinput = \
>>>>>         doc/design-htools-2.3.rst \
>>>>>         doc/design-http-server.rst \
>>>>>         doc/design-impexp2.rst \
>>>>> +       doc/design-internal-shutdown.rst \
>>>>>         doc/design-lu-generated-jobs.rst \
>>>>>         doc/design-linuxha.rst \
>>>>>         doc/design-multi-reloc.rst \
>>>>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>>>>> index ccb2f93..9a1d2b1 100644
>>>>> --- a/doc/design-draft.rst
>>>>> +++ b/doc/design-draft.rst
>>>>> @@ -19,6 +19,7 @@ Design document drafts
>>>>>     design-storagetypes.rst
>>>>>     design-reason-trail.rst
>>>>>     design-device-uuid-name.rst
>>>>> +   design-internal-shutdown.rst
>>>>>
>>>>>  .. vim: set textwidth=72 :
>>>>>  .. Local Variables:
>>>>> diff --git a/doc/design-internal-shutdown.rst
>>>>> b/doc/design-internal-shutdown.rst
>>>>> new file mode 100644
>>>>> index 0000000..836d00c
>>>>> --- /dev/null
>>>>> +++ b/doc/design-internal-shutdown.rst
>>>>> @@ -0,0 +1,72 @@
>>>>> +============================================================
>>>>> +Detection of user-initiated shutdown from inside an instance
>>>>> +============================================================
>>>>> +
>>>>> +.. contents:: :depth: 2
>>>>> +
>>>>> +This is a design document detailing the implementation of a way for
>>>>> Ganeti to
>>>>> +detect whether a machine marked as up but not running was shutdown
>>>>> gracefully
>>>>> +by the user from inside the machine itself.
>>>>> +
>>>>> +Current state and shortcomings
>>>>> +==============================
>>>>> +
>>>>> +Ganeti keeps track of the desired status of instances in order to be
>>>>> able to
>>>>> +take proper actions (e.g.: reboot) on the ones that happen to crash.
>>>>> +Currently, the only way to properly shut down a machine is through
>>>>> Ganeti's own
>>>>> +commands, that will mark an instance as ``ADMIN_down``.
>>>>> +If a user shuts down an instance from inside, through the proper
>>>>> command of the
>>>>> +operating system it is running, the instance will be shutdown
>>>>> gracefully, but
>>>>> +Ganeti is not aware of that: the desired status of the instance will
>>>>> still be
>>>>> +marked as ``running``, so when the watcher realises that the instance
>>>>> is down,
>>>>> +it will restart it. This behaviour is usually not what the user
>>>>> expects.
>>>>> +
>>>>> +Proposed changes
>>>>> +================
>>>>> +
>>>>> +We propose to modify Ganeti in such a way that it will detect when an
>>>>> instance
>>>>> +was shutdown because of an explicit user request. When such a
>>>>> situation is
>>>>> +detected, the state of the instance will be set to ADMIN_down, as
>>>>> intended by
>>>>> +the user.
>>>>> +
>>>>>
>>>>
>>>> Should we provide an option to just restart it in that case? (as an hv
>>>> parameter)
>>>> The default could still be to leave it down.
>>>>
>>>
>>> Do you mean to have an option to tell the system to behave as it does
>>> currently? Is that really useful?
>>>
>>>
>> Yes. One thing is "we don't want these to appear as errors, like if the
>> instance have crashed", and a different one is "we're ok if they don't come
>> back up". That said, perhaps if the user does "halt" it's ok to halt, and
>> if they do reboot to reboot. So that should be the default.
>>
>
> Ack.
>
>
>>
>>
>>>
>>>>
>>>>> +This design document applies to the Xen backend of Ganeti, because it
>>>>> uses
>>>>> +features specific of such hypervisor.
>>>>> +
>>>>>
>>>>
>>>> Is there any way to do something similar at least for kvm? I think this
>>>> is worth investigating.
>>>>
>>>
>>> I had a brief look, and unfortunately it seems like it's not possible.
>>> But I'll spend some more time trying to do it.
>>>
>>
>> I see a -no-shutdown option in qemu. Not sure what kind of information
>> the monitor gives, in that case.
>>
>
> Thanks for the hint, I'll have a look at it.
>
>
>>
>>
>>
>>>
>>>>
>>>>
>>>>> +Implementation
>>>>> +==============
>>>>> +
>>>>> +Xen knows why a domain is being shut down (a crash or an explicit
>>>>> shutdown
>>>>> +or poweroff request), but such information is not usually readily
>>>>> available
>>>>> +externally, because all such cases lead to the virtual machine being
>>>>> destroyed
>>>>> +immediately after the event is detected.
>>>>> +
>>>>> +Still, Xen allows the instance configuration file to define what
>>>>> action to be
>>>>> +taken in all those cases through the ``on_poweroff``, ``on_shutdown``
>>>>> and
>>>>> +``on_crash`` variables. By setting them to ``preserve``, Xen will
>>>>> avoid
>>>>> +destroying the domains automatically.
>>>>> +
>>>>> +When the domain is not destroyed, it can be viewed by using ``xm
>>>>> list`` (or ``xl
>>>>> +list`` in newer Xen versions), and the ``State`` field of the output
>>>>> will
>>>>> +provide useful information.
>>>>> +
>>>>> +If the state is ``----c-`` it means the instance has crashed.
>>>>> +
>>>>> +If the state is ``---s--`` it means the instance was properly
>>>>> shutdown.
>>>>> +
>>>>> +If the instance was properly shutdown and it is still marked as
>>>>> ``running`` by
>>>>> +Ganeti, it means that it was shutdown from inside by the user, and
>>>>> the ganeti
>>>>> +status of the instance needs to be changed to ``ADMIN_down``.
>>>>> +
>>>>> +This will be done at regular intervals by the group watcher, just
>>>>> before
>>>>> +deciding which instances to reboot.
>>>>
>>>> +
>>>>> +On top of that, at the same times, the watcher will also need to
>>>>> issue ``xm
>>>>> +destroy`` commands for all the domains that are in crashed or
>>>>> shutdown state,
>>>>> +since this will not be done automatically by Xen anymore because of
>>>>> the
>>>>> +``preserve`` setting in their config files.
>>>>> +
>>>>>
>>>>
>>>> Does this mean the memory will be freed only 5 minutes later?
>>>>
>>>
>>> Yes, that's the downside of this approach.
>>> Still, I think the impact is quite limited. First, because I don't think
>>> that shutting down VMs from inside happens so frequently (but we probably
>>> need some data to confirm this), and 5 minutes is the upper limit anyway.
>>> Second, because if we see that it is required, we could have the cleanup
>>> function running not only in the watcher, but also "on demand"every time it
>>> is needed, such as before creating a new instance, so that all the
>>> operations that require memory to be available are sure to be executed in a
>>> clean state.
>>>
>>>
>> Ack.
>>
>
>>
>>>  Can you add a note saying that this will not impact the "ganeti admin"
>>>> issued path, as that can check "xm list" and will do "the right thing"
>>>> finishing with the destruction?
>>>>
>>>
>>> Sure, this is not a problem.
>>>
>>>  Also, what will happen when gnt-instance list is called beteween the
>>>> shutdown and the watcher run? Do we need to display a different state, make
>>>> the cleanup happen then (not sure I like this), or just display ADMIN_Down
>>>> "as if", although the domain is not destroyed yet?
>>>>
>>>
>>> Given that gnt-instance list shows the status of instances according to
>>> Ganeti, I think the right thing would be to show the exact situation: the
>>> expected status, the actual status, and a note saying that an internal
>>> shutdown was detected.
>>>
>>>
>> Ack, but this needs to be explicit in this design. What exactly is going
>> to appear in the "status" field? What about in oper_state and admin_state ?
>> Is there any new field you're adding? (I assume not, but this is important
>> to specify).
>>
>
> Ok, I'll detail all of this.
>
>
>>
>>
>>
>>>
>>>> Finally, can you comment on changes to gnt-instance
>>>> start/failover/migrate and other ops, when they encounter an instance in
>>>> this state? Will the RPC layer be changed to make this transparent to them
>>>> (how?) or will they need to deal with this explicitly?
>>>>
>>>
>>> My idea is to run the cleanup function for a single instance just before
>>> executing commands that would affect the running state of that instance.
>>> This way, we'd end up being sure we always are in a clean state before
>>> performing an operation.
>>> I don't see a need to modify the RPC layer: it seems to me that this is
>>> an hypervisor specific think, and could therefore be implemented directly
>>> in the affected hypervisor-specific functions.
>>>
>>>
>> Ack, but please add this to the design doc, specifying that it is only
>> for functions that perform action on an instance, and not for ones who just
>> query it (if this is the case as I assume it is).
>>
>
> Ok. Yes, of course, that is the case.
>
> Thanks,
> Michele
>

Re: [PATCH master] Design doc for internal shutdown detection

Reply via email to