Re: [PATCH master] Design doc for internal shutdown detection

Michele Tartara Tue, 21 May 2013 09:29:42 -0700

On Tue, May 21, 2013 at 2:45 PM, Guido Trotter <[email protected]> wrote:


>
>
>
> On Tue, May 21, 2013 at 2:37 PM, Michele Tartara <[email protected]>wrote:
>
>> On Sat, May 18, 2013 at 10:07 AM, Michele Tartara <[email protected]>wrote:
>>
>>> +list
>>>
>>>
>>>
>>> On Fri, May 17, 2013 at 4:26 PM, Michele Tartara <[email protected]>wrote:
>>>
>>>> On Fri, May 17, 2013 at 2:50 PM, Guido Trotter <[email protected]>wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 17, 2013 at 2:30 PM, Michele Tartara 
>>>>> <[email protected]>wrote:
>>>>>
>>>>>> On Fri, May 17, 2013 at 10:54 AM, Guido Trotter <[email protected]
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 17, 2013 at 10:46 AM, Michele Tartara <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Ganeti is currently not able to detect a legit shutdown request
>>>>>>>> performed by a
>>>>>>>> user from inside a Xen domain.
>>>>>>>>
>>>>>>>> This patch provides a design document to implement a mechanism able
>>>>>>>> to cope with
>>>>>>>> such events.
>>>>>>>>
>>>>>>>> Signed-off-by: Michele Tartara <[email protected]>
>>>>>>>> ---
>>>>>>>>  Makefile.am                      |  1 +
>>>>>>>>  doc/design-draft.rst             |  1 +
>>>>>>>>  doc/design-internal-shutdown.rst | 72
>>>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  3 files changed, 74 insertions(+)
>>>>>>>>  create mode 100644 doc/design-internal-shutdown.rst
>>>>>>>>
>>>>>>>> diff --git a/Makefile.am b/Makefile.am
>>>>>>>> index 037cf53..f66624e 100644
>>>>>>>> --- a/Makefile.am
>>>>>>>> +++ b/Makefile.am
>>>>>>>> @@ -410,6 +410,7 @@ docinput = \
>>>>>>>>         doc/design-htools-2.3.rst \
>>>>>>>>         doc/design-http-server.rst \
>>>>>>>>         doc/design-impexp2.rst \
>>>>>>>> +       doc/design-internal-shutdown.rst \
>>>>>>>>         doc/design-lu-generated-jobs.rst \
>>>>>>>>         doc/design-linuxha.rst \
>>>>>>>>         doc/design-multi-reloc.rst \
>>>>>>>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>>>>>>>> index ccb2f93..9a1d2b1 100644
>>>>>>>> --- a/doc/design-draft.rst
>>>>>>>> +++ b/doc/design-draft.rst
>>>>>>>> @@ -19,6 +19,7 @@ Design document drafts
>>>>>>>>     design-storagetypes.rst
>>>>>>>>     design-reason-trail.rst
>>>>>>>>     design-device-uuid-name.rst
>>>>>>>> +   design-internal-shutdown.rst
>>>>>>>>
>>>>>>>>  .. vim: set textwidth=72 :
>>>>>>>>  .. Local Variables:
>>>>>>>> diff --git a/doc/design-internal-shutdown.rst
>>>>>>>> b/doc/design-internal-shutdown.rst
>>>>>>>> new file mode 100644
>>>>>>>> index 0000000..836d00c
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/doc/design-internal-shutdown.rst
>>>>>>>> @@ -0,0 +1,72 @@
>>>>>>>> +============================================================
>>>>>>>> +Detection of user-initiated shutdown from inside an instance
>>>>>>>> +============================================================
>>>>>>>> +
>>>>>>>> +.. contents:: :depth: 2
>>>>>>>> +
>>>>>>>> +This is a design document detailing the implementation of a way
>>>>>>>> for Ganeti to
>>>>>>>> +detect whether a machine marked as up but not running was shutdown
>>>>>>>> gracefully
>>>>>>>> +by the user from inside the machine itself.
>>>>>>>> +
>>>>>>>> +Current state and shortcomings
>>>>>>>> +==============================
>>>>>>>> +
>>>>>>>> +Ganeti keeps track of the desired status of instances in order to
>>>>>>>> be able to
>>>>>>>> +take proper actions (e.g.: reboot) on the ones that happen to
>>>>>>>> crash.
>>>>>>>> +Currently, the only way to properly shut down a machine is through
>>>>>>>> Ganeti's own
>>>>>>>> +commands, that will mark an instance as ``ADMIN_down``.
>>>>>>>> +If a user shuts down an instance from inside, through the proper
>>>>>>>> command of the
>>>>>>>> +operating system it is running, the instance will be shutdown
>>>>>>>> gracefully, but
>>>>>>>> +Ganeti is not aware of that: the desired status of the instance
>>>>>>>> will still be
>>>>>>>> +marked as ``running``, so when the watcher realises that the
>>>>>>>> instance is down,
>>>>>>>> +it will restart it. This behaviour is usually not what the user
>>>>>>>> expects.
>>>>>>>> +
>>>>>>>> +Proposed changes
>>>>>>>> +================
>>>>>>>> +
>>>>>>>> +We propose to modify Ganeti in such a way that it will detect when
>>>>>>>> an instance
>>>>>>>> +was shutdown because of an explicit user request. When such a
>>>>>>>> situation is
>>>>>>>> +detected, the state of the instance will be set to ADMIN_down, as
>>>>>>>> intended by
>>>>>>>> +the user.
>>>>>>>> +
>>>>>>>>
>>>>>>>
>>>>>>> Should we provide an option to just restart it in that case? (as an
>>>>>>> hv parameter)
>>>>>>> The default could still be to leave it down.
>>>>>>>
>>>>>>
>>>>>> Do you mean to have an option to tell the system to behave as it does
>>>>>> currently? Is that really useful?
>>>>>>
>>>>>>
>>>>> Yes. One thing is "we don't want these to appear as errors, like if
>>>>> the instance have crashed", and a different one is "we're ok if they don't
>>>>> come back up". That said, perhaps if the user does "halt" it's ok to halt,
>>>>> and if they do reboot to reboot. So that should be the default.
>>>>>
>>>>
>>>> Ack.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> +This design document applies to the Xen backend of Ganeti, because
>>>>>>>> it uses
>>>>>>>> +features specific of such hypervisor.
>>>>>>>> +
>>>>>>>>
>>>>>>>
>>>>>>> Is there any way to do something similar at least for kvm? I think
>>>>>>> this is worth investigating.
>>>>>>>
>>>>>>
>>>>>> I had a brief look, and unfortunately it seems like it's not
>>>>>> possible. But I'll spend some more time trying to do it.
>>>>>>
>>>>>
>>>>> I see a -no-shutdown option in qemu. Not sure what kind of information
>>>>> the monitor gives, in that case.
>>>>>
>>>>
>>>> Thanks for the hint, I'll have a look at it.
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> +Implementation
>>>>>>>> +==============
>>>>>>>> +
>>>>>>>> +Xen knows why a domain is being shut down (a crash or an explicit
>>>>>>>> shutdown
>>>>>>>> +or poweroff request), but such information is not usually readily
>>>>>>>> available
>>>>>>>> +externally, because all such cases lead to the virtual machine
>>>>>>>> being destroyed
>>>>>>>> +immediately after the event is detected.
>>>>>>>> +
>>>>>>>> +Still, Xen allows the instance configuration file to define what
>>>>>>>> action to be
>>>>>>>> +taken in all those cases through the ``on_poweroff``,
>>>>>>>> ``on_shutdown`` and
>>>>>>>> +``on_crash`` variables. By setting them to ``preserve``, Xen will
>>>>>>>> avoid
>>>>>>>> +destroying the domains automatically.
>>>>>>>> +
>>>>>>>> +When the domain is not destroyed, it can be viewed by using ``xm
>>>>>>>> list`` (or ``xl
>>>>>>>> +list`` in newer Xen versions), and the ``State`` field of the
>>>>>>>> output will
>>>>>>>> +provide useful information.
>>>>>>>> +
>>>>>>>> +If the state is ``----c-`` it means the instance has crashed.
>>>>>>>> +
>>>>>>>> +If the state is ``---s--`` it means the instance was properly
>>>>>>>> shutdown.
>>>>>>>> +
>>>>>>>> +If the instance was properly shutdown and it is still marked as
>>>>>>>> ``running`` by
>>>>>>>> +Ganeti, it means that it was shutdown from inside by the user, and
>>>>>>>> the ganeti
>>>>>>>> +status of the instance needs to be changed to ``ADMIN_down``.
>>>>>>>> +
>>>>>>>> +This will be done at regular intervals by the group watcher, just
>>>>>>>> before
>>>>>>>> +deciding which instances to reboot.
>>>>>>>
>>>>>>> +
>>>>>>>> +On top of that, at the same times, the watcher will also need to
>>>>>>>> issue ``xm
>>>>>>>> +destroy`` commands for all the domains that are in crashed or
>>>>>>>> shutdown state,
>>>>>>>> +since this will not be done automatically by Xen anymore because
>>>>>>>> of the
>>>>>>>> +``preserve`` setting in their config files.
>>>>>>>> +
>>>>>>>>
>>>>>>>
>>>>>>> Does this mean the memory will be freed only 5 minutes later?
>>>>>>>
>>>>>>
>>>>>> Yes, that's the downside of this approach.
>>>>>> Still, I think the impact is quite limited. First, because I don't
>>>>>> think that shutting down VMs from inside happens so frequently (but we
>>>>>> probably need some data to confirm this), and 5 minutes is the upper 
>>>>>> limit
>>>>>> anyway. Second, because if we see that it is required, we could have the
>>>>>> cleanup function running not only in the watcher, but also "on 
>>>>>> demand"every
>>>>>> time it is needed, such as before creating a new instance, so that all 
>>>>>> the
>>>>>> operations that require memory to be available are sure to be executed 
>>>>>> in a
>>>>>> clean state.
>>>>>>
>>>>>>
>>>>> Ack.
>>>>>
>>>>
>>>>>
>>>>>>  Can you add a note saying that this will not impact the "ganeti
>>>>>>> admin" issued path, as that can check "xm list" and will do "the right
>>>>>>> thing" finishing with the destruction?
>>>>>>>
>>>>>>
>>>>>> Sure, this is not a problem.
>>>>>>
>>>>>>  Also, what will happen when gnt-instance list is called beteween
>>>>>>> the shutdown and the watcher run? Do we need to display a different 
>>>>>>> state,
>>>>>>> make the cleanup happen then (not sure I like this), or just display
>>>>>>> ADMIN_Down "as if", although the domain is not destroyed yet?
>>>>>>>
>>>>>>
>>>>>> Given that gnt-instance list shows the status of instances according
>>>>>> to Ganeti, I think the right thing would be to show the exact situation:
>>>>>> the expected status, the actual status, and a note saying that an 
>>>>>> internal
>>>>>> shutdown was detected.
>>>>>>
>>>>>>
>>>>> Ack, but this needs to be explicit in this design. What exactly is
>>>>> going to appear in the "status" field? What about in oper_state and
>>>>> admin_state ? Is there any new field you're adding? (I assume not, but 
>>>>> this
>>>>> is important to specify).
>>>>>
>>>>
>>>> Ok, I'll detail all of this.
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> Finally, can you comment on changes to gnt-instance
>>>>>>> start/failover/migrate and other ops, when they encounter an instance in
>>>>>>> this state? Will the RPC layer be changed to make this transparent to 
>>>>>>> them
>>>>>>> (how?) or will they need to deal with this explicitly?
>>>>>>>
>>>>>>
>>>>>> My idea is to run the cleanup function for a single instance just
>>>>>> before executing commands that would affect the running state of that
>>>>>> instance. This way, we'd end up being sure we always are in a clean state
>>>>>> before performing an operation.
>>>>>> I don't see a need to modify the RPC layer: it seems to me that this
>>>>>> is an hypervisor specific think, and could therefore be implemented
>>>>>> directly in the affected hypervisor-specific functions.
>>>>>>
>>>>>>
>>>>> Ack, but please add this to the design doc, specifying that it is only
>>>>> for functions that perform action on an instance, and not for ones who 
>>>>> just
>>>>> query it (if this is the case as I assume it is).
>>>>>
>>>>
>>>> Ok. Yes, of course, that is the case.
>>>>
>>>> Thanks,
>>>> Michele
>>>>
>>>
>>>
>> Interdiff:
>> diff --git a/doc/design-internal-shutdown.rst
>> b/doc/design-internal-shutdown.rst
>> index 836d00c..a92d837 100644
>> --- a/doc/design-internal-shutdown.rst
>> +++ b/doc/design-internal-shutdown.rst
>>  We propose to modify Ganeti in such a way that it will detect when an
>> instance
>>  was shutdown because of an explicit user request. When such a situation
>> is
>> -detected, the state of the instance will be set to ADMIN_down, as
>> intended by
>>  -the user.
>> +detected, instead of presenting an error as it happens now, either the
>> state
>> +of the instance will be set to ADMIN_down, or the instance will be
>> +automatically rebooted, depending on a instance-specific configuration
>> value.
>> +The default behavior in case no such parameter is found will be to follow
>> +the apparent will of the user, and setting to ADMIN_down an instance that
>> +was shut down correctly from inside.
>>
>>  This design document applies to the Xen backend of Ganeti, because it
>> uses
>> -features specific of such hypervisor.
>> +features specific of such hypervisor. Initial analysis suggests that a
>> similar
>> +approach might be used for KVM as well, so this design document will be
>> later
>> +extended to add more details about it.
>>
>>  Implementation
>>  ==============
>> @@ -65,6 +71,40 @@ destroy`` commands for all the domains that are in
>> crashed or shutdown state,
>>   since this will not be done automatically by Xen anymore because of the
>>  ``preserve`` setting in their config files.
>>
>> +This behavior will be limited to the domains shut down from inside,
>> because it
>> +will actually keep the resources of the domain busy until the watcher
>> will do
>> +the cleaning job (that, with the default setting, is up to every 5
>> minutes).
>> +Still, this is considered acceptable, because it is not frequent for a
>> domain
>> +to be shut down this way. The cleanup function will be also run
>> +automatically just before performing any job that requires resources to
>> be
>> +available (such as when creating a new instance), in order to ensure
>> that the
>> +new resource allocation happens starting from a clean state.
>> Functionalities
>> +that only query the state of instances will not run the cleanup function.
>> +
>> +This changes are hypervisor-specific and will not affect the external RPC
>> +interface.
>>
>
> Here you're talking about the RAPI I believe. The RPC interface (the
> internal one, between noded and masterd) will need to be changed to support
> this operation. Also the Hypervisor API will need changes for this, I
> believe.
>
> Finally, will there be a new LU, or how will the watcher perform the
> check/cleanup?
>
>
Interdiff to explain all this:
diff --git a/doc/design-internal-shutdown.rst
b/doc/design-internal-shutdown.rst
index a92d837..8b5e3c3 100644
--- a/doc/design-internal-shutdown.rst
+++ b/doc/design-internal-shutdown.rst
@@ -81,11 +81,28 @@ available (such as when creating a new instance), in
order to en
 new resource allocation happens starting from a clean state.
Functionalities
 that only query the state of instances will not run the cleanup function.

-This changes are hypervisor-specific and will not affect the external RPC
-interface.
-
-Side effects of the modification
-++++++++++++++++++++++++++++++++
+The cleanup operation includes both node-specific operations (the actual
+destruction of the stopped domains) and configuration changes, to be
performed
+on the master node (marking as offline an instance that was shut down
+internally). Therefore, it will be implemented by adding a LU in cmdlib
+(``LUCleanupInstances``). A Job executing such an opcode will be submitted
by
+the watcher to perform the cleanup.
+
+The node daemon will have to be modified in order to support at least the
+following RPC calls:
+ * A call to list all the instances that have been shutdown from inside
+ * A call to destroy a domain
+
+The base hypervisor class (and all the deriving classes) will need two
methods
+for implementing such functionalities in a hypervisor-specific way.
+
+LUs performing operations other than an explicit cleanup will have to be
+modified to perform the cleanup as well, either by submitting a job to
perform
+the cleanup (to be completed before actually performing the task at hand)
or by
+explicitly performing the cleanup themselves through the RPC calls.
+
+Other required changes
+++++++++++++++++++++++

 The implementation of this design document will require some commands to be
 changed in order to cope with the new shutdown procedure.




>  +
>> +Side effects of the modification
>> +++++++++++++++++++++++++++++++++
>> +
>>
>
> Other changes required, I would say, rather than "side effects".
>
>
>> +The implementation of this design document will require some commands to
>> be
>> +changed in order to cope with the new shutdown procedure.
>>  +
>> +With this modification, also the Ganeti command for shutting down
>> instances
>> +would leave them in a shutdown but preserved state. Therefore, it will be
>> +changed in such a way to immediately perform the cleanup of the instance
>> +after verifying its correct shutdown.
>> +
>> +The ``gnt-instance list`` command will need to be able to handle the
>> situation
>> +where an instance was shutdown internally but not yet cleaned up.
>> +The admin_state field will maintain the current meaning unchanged. The
>> +oper_state field will get a new possible state, ``S``, meaning that the
>> instance
>> +was shutdown internally.
>> +
>>
> +The ``gnt-instance info`` command ``State`` field, in such case, will
>> show a
>> +message stating that the instance was supposed to be run but was shut
>> down
>> +internally.
>> +
>>
>
> ACK.
>
> Thanks,
>
> Guido
>
>
Thanks,
Michele

Re: [PATCH master] Design doc for internal shutdown detection

Reply via email to