On Wed, May 22, 2013 at 4:13 PM, Michele Tartara <[email protected]> wrote:
> On Wed, May 22, 2013 at 3:57 PM, Guido Trotter <[email protected]> wrote:
>
>> On Wed, May 22, 2013 at 1:35 PM, Michele Tartara <[email protected]> wrote:
>>
>>> On Wed, May 22, 2013 at 1:30 PM, Guido Trotter <[email protected]> wrote:
>>>
>>>> On Wed, May 22, 2013 at 1:25 PM, Michele Tartara <[email protected]> wrote:
>>>>
>>>>> On Wed, May 22, 2013 at 1:07 PM, Guido Trotter <[email protected]> wrote:
>>>>>
>>>>>> Ack, thanks.
>>>>>>
>>>>>> This introduced a race condition in instance start then, but there's
>>>>>> nothing we can do about it, except documenting it, I guess.
>>>>>
>>>>> Why in instance start? When you are starting an instance, either the
>>>>> instance is not running (and everything is fine), or it is in the
>>>>> preserved state (and therefore it's first cleaned and then started
>>>>> again).
>>>>>
>>>>> If it is already running, it will be detected as such, and the start
>>>>> job will not run. If after that the instance is shut down, I don't
>>>>> think this introduces any race condition.
>>>>
>>>> Ack, true. Well, this sounds sensible anyway, so let's update the
>>>> design and see where we get.
>>>
>>> Interdiff:
>>>
>>> diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
>>> index 8b5e3c3..8b4d0fb 100644
>>> --- a/doc/design-internal-shutdown.rst
>>> +++ b/doc/design-internal-shutdown.rst
>>> @@ -84,14 +84,12 @@ that only query the state of instances will not run the cleanup function.
>>>  The cleanup operation includes both node-specific operations (the actual
>>>  destruction of the stopped domains) and configuration changes, to be performed
>>>  on the master node (marking as offline an instance that was shut down
>>> -internally). Therefore, it will be implemented by adding a LU in cmdlib
>>> -(``LUCleanupInstances``). A Job executing such an opcode will be submitted by
>>> -the watcher to perform the cleanup.
>>> -
>>> -The node daemon will have to be modified in order to support at least the
>>> -following RPC calls:
>>> - * A call to list all the instances that have been shutdown from inside
>>> - * A call to destroy a domain
>>> +internally). The watcher (that runs on every node) will be able to detect the
>>> +instances that have been shutdown from inside by directly querying the
>>> +hypervisor. It will then submit to the master node a series of
>>> +``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
>>> +and clean them up (after the functionality of ``InstanceShutdown`` will have
>>> +been extended as specified in this design document).
>>
>> Actually I think you don't want the watcher to do this from each node,
>> but just from the master, fetching the instance list and submitting the
>> relevant jobs. Am I wrong? Why should the watcher query the hypervisor
>> directly?
>
> Given that it's running on all the nodes, I thought this could reduce the
> number of additional jobs and queries, but it's no problem to have it run
> from the master. On the other hand, it's actually easier that way.
>
> Interdiff:
>
> diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
> index 8b4d0fb..e1cc864 100644
> --- a/doc/design-internal-shutdown.rst
> +++ b/doc/design-internal-shutdown.rst
> @@ -84,15 +84,12 @@ that only query the state of instances will not run the cleanup
>  The cleanup operation includes both node-specific operations (the actual
>  destruction of the stopped domains) and configuration changes, to be performed
>  on the master node (marking as offline an instance that was shut down
> -internally). The watcher (that runs on every node) will be able to detect the
> -instances that have been shutdown from inside by directly querying the
> -hypervisor. It will then submit to the master node a series of
> -``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
> -and clean them up (after the functionality of ``InstanceShutdown`` will have
> -been extended as specified in this design document).
> -
> -The base hypervisor class (and all the deriving classes) will need two methods
> -for implementing such functionalities in a hypervisor-specific way.
> +internally). The watcher, on the master node, will fetch the list of instances
> +that have been shutdown from inside (recognizable by their ``oper_state``
> +as described below). It will then submit a series of ``InstanceShutdown`` jobs
> +that will mark such instances as ``ADMIN_down`` and clean them up (after
> +the functionality of ``InstanceShutdown`` will have been extended as specified
> +in the rest of this design document).
>
>  LUs performing operations other than an explicit cleanup will have to be
>  modified to perform the cleanup as well, either by submitting a job to perform
>
> Cheers,
> Michele
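For clarity, the master-side watcher pass described in the second interdiff can be sketched roughly as follows. This is a minimal sketch of the design being discussed, not Ganeti code: the `Instance` dataclass, the `oper_state` values, the `"OP_INSTANCE_SHUTDOWN"` string, and the `submit_job` callback are all illustrative assumptions.

```python
# Illustrative sketch of the design discussed above: the watcher, running
# on the master, fetches the instance list, spots instances that were shut
# down from inside the guest (admin state still "up", oper_state reporting
# a user-initiated shutdown), and submits one cleanup job per instance.
# None of these names are actual Ganeti APIs.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Instance:
    name: str
    admin_state: str   # what the cluster configuration says: "up" / "down"
    oper_state: str    # what the hypervisor reports: "running" / "user_down"

def find_internally_shutdown(instances: List[Instance]) -> List[str]:
    """Instances marked administratively up whose operational state shows
    they were shut down from inside."""
    return [i.name for i in instances
            if i.admin_state == "up" and i.oper_state == "user_down"]

def watcher_pass(instances: List[Instance],
                 submit_job: Callable[[str, str], None]) -> List[str]:
    """One watcher iteration: submit an InstanceShutdown job (which, per the
    design, marks the instance ADMIN_down and cleans it up) for each
    internally-shutdown instance found. Returns the affected names."""
    targets = find_internally_shutdown(instances)
    for name in targets:
        submit_job("OP_INSTANCE_SHUTDOWN", name)
    return targets
```

Running everything from the master this way matches the conclusion of the thread: no new per-node RPC calls are needed, and the watcher only touches the job queue it already talks to.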
