LGTM, Thanks,
Guido On 22 May 2013 16:27, "Michele Tartara" <[email protected]> wrote: > On Wed, May 22, 2013 at 4:13 PM, Michele Tartara <[email protected]>wrote: > >> On Wed, May 22, 2013 at 3:57 PM, Guido Trotter <[email protected]>wrote: >> >>> >>> >>> >>> On Wed, May 22, 2013 at 1:35 PM, Michele Tartara <[email protected]>wrote: >>> >>>> On Wed, May 22, 2013 at 1:30 PM, Guido Trotter <[email protected]>wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Wed, May 22, 2013 at 1:25 PM, Michele Tartara >>>>> <[email protected]>wrote: >>>>> >>>>>> On Wed, May 22, 2013 at 1:07 PM, Guido Trotter >>>>>> <[email protected]>wrote: >>>>>> >>>>>>> Ack thanks. >>>>>>> >>>>>>> This introduced a race condition in instance start then, but there's >>>>>>> nothing we can do about it, except documenting it, I guess. >>>>>>> >>>>>>> >>>>>> Why in instance start? When you are starting an instance, either the >>>>>> instance is not running (and everything is fine), or it is in the >>>>>> preserved >>>>>> state (and therefore it's first cleaned and then started again). >>>>>> >>>>>> If it is running already, it will be detected as such, and the start >>>>>> job will not run. If after that the instance is shutdown, I don't think >>>>>> this introduces any race condition. >>>>>> >>>>>> >>>>> Ack, true. Well, this anyway sounds sensible, so let's update the >>>>> design and see where we get. >>>>> >>>>> >>>> Interdiff: >>>> >>>> diff --git a/doc/design-internal-shutdown.rst >>>> b/doc/design-internal-shutdown.rst >>>> index 8b5e3c3..8b4d0fb 100644 >>>> --- a/doc/design-internal-shutdown.rst >>>> +++ b/doc/design-internal-shutdown.rst >>>> @@ -84,14 +84,12 @@ that only query the state of instances will not run >>>> the cleanup function. >>>> The cleanup operation includes both node-specific operations (the >>>> actual >>>> destruction of the stopped domains) and configuration changes, to be >>>> performed >>>> on the master node (marking as offline an instance that was shut down >>>> -internally). Therefore, it will be implemented by adding a LU in cmdlib >>>> -(``LUCleanupInstances``). A Job executing such an opcode will be >>>> submitted by >>>> -the watcher to perform the cleanup. >>>> - >>>> -The node daemon will have to be modified in order to support at least >>>> the >>>> -following RPC calls: >>>> - * A call to list all the instances that have been shutdown from inside >>>> - * A call to destroy a domain >>>> +internally). The watcher (that runs on every node) will be able to >>>> detect the >>>> +instances that have been shutdown from inside by directly querying the >>>> +hypervisor. It will then submit to the master node a series of >>>> +``InstanceShutdown`` jobs that will mark such instances as >>>> ``ADMIN_down`` >>>> +and clean them up (after the functionality of ``InstanceShutdown`` >>>> will have >>>> +been extended as specified in this design document). >>>> >>> >>> Actually I think you don't want the watcher to do this from each node, >>> but just from the master fetching the instance list and submitting the >>> relevant jobs. Am I wrong? Why should the watcher query the hypervisor >>> directly?? >>> >> >> Given that it's running on all the nodes, I though this could reduce the >> number of additional jobs and queries, but it's no problem to have it run >> from master. On the other hand, it's actually quite easier. >> > > Interdiff: > diff --git a/doc/design-internal-shutdown.rst > b/doc/design-internal-shutdown.rst > index 8b4d0fb..e1cc864 100644 > --- a/doc/design-internal-shutdown.rst > +++ b/doc/design-internal-shutdown.rst > @@ -84,15 +84,12 @@ that only query the state of instances will not run > the cleanup > The cleanup operation includes both node-specific operations (the actual > destruction of the stopped domains) and configuration changes, to be > performed > on the master node (marking as offline an instance that was shut down > -internally). The watcher (that runs on every node) will be able to detect > the > -instances that have been shutdown from inside by directly querying the > -hypervisor. It will then submit to the master node a series of > -``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down`` > -and clean them up (after the functionality of ``InstanceShutdown`` will > have > -been extended as specified in this design document). > - > -The base hypervisor class (and all the deriving classes) will need two > methods > -for implementing such functionalities in a hypervisor-specific way. > +internally). The watcher, on the master node, will fetch the list of > instances > +that have been shutdown from inside (recognizable by their ``oper_state`` > +as described below). It will then submit a series of ``InstanceShutdown`` > jobs > +that will mark such instances as ``ADMIN_down`` and clean them up (after > +the functionality of ``InstanceShutdown`` will have been extended as > specified > +in the rest of this design document). > > LUs performing operations other than an explicit cleanup will have to be > modified to perform the cleanup as well, either by submitting a job to > perform > > > Cheers, > Michele >
