Re: [PATCH master] Design doc for internal shutdown detection

Guido Trotter Wed, 22 May 2013 07:49:19 -0700

LGTM,

Thanks,


Guido
On 22 May 2013 16:27, "Michele Tartara" <[email protected]> wrote:

> On Wed, May 22, 2013 at 4:13 PM, Michele Tartara <[email protected]>wrote:
>
>> On Wed, May 22, 2013 at 3:57 PM, Guido Trotter <[email protected]>wrote:
>>
>>>
>>>
>>>
>>> On Wed, May 22, 2013 at 1:35 PM, Michele Tartara <[email protected]>wrote:
>>>
>>>> On Wed, May 22, 2013 at 1:30 PM, Guido Trotter <[email protected]>wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 22, 2013 at 1:25 PM, Michele Tartara 
>>>>> <[email protected]>wrote:
>>>>>
>>>>>> On Wed, May 22, 2013 at 1:07 PM, Guido Trotter 
>>>>>> <[email protected]>wrote:
>>>>>>
>>>>>>> Ack thanks.
>>>>>>>
>>>>>>> This introduced a race condition in instance start then, but there's
>>>>>>> nothing we can do about it, except documenting it, I guess.
>>>>>>>
>>>>>>>
>>>>>> Why in instance start? When you are starting an instance, either the
>>>>>> instance is not running (and everything is fine), or it is in the 
>>>>>> preserved
>>>>>> state (and therefore it's first cleaned and then started again).
>>>>>>
>>>>>> If it is running already, it will be detected as such, and the start
>>>>>> job will not run. If after that the instance is shutdown, I don't think
>>>>>> this introduces any race condition.
>>>>>>
>>>>>>
>>>>> Ack, true. Well, this anyway sounds sensible, so let's update the
>>>>> design and see where we get.
>>>>>
>>>>>
>>>> Interdiff:
>>>>
>>>> diff --git a/doc/design-internal-shutdown.rst
>>>> b/doc/design-internal-shutdown.rst
>>>> index 8b5e3c3..8b4d0fb 100644
>>>> --- a/doc/design-internal-shutdown.rst
>>>> +++ b/doc/design-internal-shutdown.rst
>>>> @@ -84,14 +84,12 @@ that only query the state of instances will not run
>>>> the cleanup function.
>>>>  The cleanup operation includes both node-specific operations (the
>>>> actual
>>>>  destruction of the stopped domains) and configuration changes, to be
>>>> performed
>>>>  on the master node (marking as offline an instance that was shut down
>>>> -internally). Therefore, it will be implemented by adding a LU in cmdlib
>>>> -(``LUCleanupInstances``). A Job executing such an opcode will be
>>>> submitted by
>>>> -the watcher to perform the cleanup.
>>>> -
>>>> -The node daemon will have to be modified in order to support at least
>>>> the
>>>> -following RPC calls:
>>>> - * A call to list all the instances that have been shutdown from inside
>>>> - * A call to destroy a domain
>>>> +internally). The watcher (that runs on every node) will be able to
>>>> detect the
>>>> +instances that have been shutdown from inside by directly querying the
>>>> +hypervisor. It will then submit to the master node a series of
>>>> +``InstanceShutdown`` jobs that will mark such instances as
>>>> ``ADMIN_down``
>>>> +and clean them up (after the functionality of ``InstanceShutdown``
>>>> will have
>>>> +been extended as specified in this design document).
>>>>
>>>
>>> Actually I think you don't want the watcher to do this from each node,
>>> but just from the master fetching the instance list and submitting the
>>> relevant jobs. Am I wrong? Why should the watcher query the hypervisor
>>> directly??
>>>
>>
>> Given that it's running on all the nodes, I though this could reduce the
>> number of additional jobs and queries, but it's no problem to have it run
>> from master. On the other hand, it's actually quite easier.
>>
>
> Interdiff:
> diff --git a/doc/design-internal-shutdown.rst
> b/doc/design-internal-shutdown.rst
> index 8b4d0fb..e1cc864 100644
> --- a/doc/design-internal-shutdown.rst
> +++ b/doc/design-internal-shutdown.rst
> @@ -84,15 +84,12 @@ that only query the state of instances will not run
> the cleanup
>  The cleanup operation includes both node-specific operations (the actual
>  destruction of the stopped domains) and configuration changes, to be
> performed
>  on the master node (marking as offline an instance that was shut down
> -internally). The watcher (that runs on every node) will be able to detect
> the
> -instances that have been shutdown from inside by directly querying the
> -hypervisor. It will then submit to the master node a series of
> -``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
> -and clean them up (after the functionality of ``InstanceShutdown`` will
> have
> -been extended as specified in this design document).
> -
> -The base hypervisor class (and all the deriving classes) will need two
> methods
> -for implementing such functionalities in a hypervisor-specific way.
> +internally). The watcher, on the master node, will fetch the list of
> instances
> +that have been shutdown from inside (recognizable by their ``oper_state``
> +as described below). It will then submit a series of ``InstanceShutdown``
> jobs
> +that will mark such instances as ``ADMIN_down`` and clean them up (after
> +the functionality of ``InstanceShutdown`` will have been extended as
> specified
> +in the rest of this design document).
>
>  LUs performing operations other than an explicit cleanup will have to be
>  modified to perform the cleanup as well, either by submitting a job to
> perform
>
>
> Cheers,
> Michele
>

Re: [PATCH master] Design doc for internal shutdown detection

Reply via email to