On Wed, May 22, 2013 at 4:13 PM, Michele Tartara <[email protected]> wrote:
> On Wed, May 22, 2013 at 3:57 PM, Guido Trotter <[email protected]> wrote:
>
>> On Wed, May 22, 2013 at 1:35 PM, Michele Tartara <[email protected]> wrote:
>>
>>> On Wed, May 22, 2013 at 1:30 PM, Guido Trotter <[email protected]> wrote:
>>>
>>>> On Wed, May 22, 2013 at 1:25 PM, Michele Tartara <[email protected]> wrote:
>>>>
>>>>> On Wed, May 22, 2013 at 1:07 PM, Guido Trotter <[email protected]> wrote:
>>>>>
>>>>>> Ack, thanks.
>>>>>>
>>>>>> This introduced a race condition in instance start then, but there's
>>>>>> nothing we can do about it, except documenting it, I guess.
>>>>>
>>>>> Why in instance start? When you are starting an instance, either the
>>>>> instance is not running (and everything is fine), or it is in the
>>>>> preserved state (and therefore it's first cleaned and then started
>>>>> again).
>>>>>
>>>>> If it is already running, it will be detected as such, and the start
>>>>> job will not run. If after that the instance is shut down, I don't
>>>>> think this introduces any race condition.
>>>>
>>>> Ack, true. Well, this sounds sensible anyway, so let's update the
>>>> design and see where we get.
>>>
>>> Interdiff:
>>>
>>> diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
>>> index 8b5e3c3..8b4d0fb 100644
>>> --- a/doc/design-internal-shutdown.rst
>>> +++ b/doc/design-internal-shutdown.rst
>>> @@ -84,14 +84,12 @@ that only query the state of instances will not run the cleanup function.
>>>  The cleanup operation includes both node-specific operations (the actual
>>>  destruction of the stopped domains) and configuration changes, to be performed
>>>  on the master node (marking as offline an instance that was shut down
>>> -internally). Therefore, it will be implemented by adding a LU in cmdlib
>>> -(``LUCleanupInstances``). A Job executing such an opcode will be submitted by
>>> -the watcher to perform the cleanup.
>>> -
>>> -The node daemon will have to be modified in order to support at least the
>>> -following RPC calls:
>>> - * A call to list all the instances that have been shutdown from inside
>>> - * A call to destroy a domain
>>> +internally). The watcher (that runs on every node) will be able to detect the
>>> +instances that have been shutdown from inside by directly querying the
>>> +hypervisor. It will then submit to the master node a series of
>>> +``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
>>> +and clean them up (after the functionality of ``InstanceShutdown`` will have
>>> +been extended as specified in this design document).
>>
>> Actually I think you don't want the watcher to do this from each node,
>> but just from the master, fetching the instance list and submitting the
>> relevant jobs. Am I wrong? Why should the watcher query the hypervisor
>> directly?
>
> Given that it's running on all the nodes, I thought this could reduce the
> number of additional jobs and queries, but it's no problem to have it run
> from the master. On the other hand, it's actually easier that way.
>
> Interdiff:
>
> diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
> index 8b4d0fb..e1cc864 100644
> --- a/doc/design-internal-shutdown.rst
> +++ b/doc/design-internal-shutdown.rst
> @@ -84,15 +84,12 @@ that only query the state of instances will not run the cleanup
>  The cleanup operation includes both node-specific operations (the actual
>  destruction of the stopped domains) and configuration changes, to be performed
>  on the master node (marking as offline an instance that was shut down
> -internally). The watcher (that runs on every node) will be able to detect the
> -instances that have been shutdown from inside by directly querying the
> -hypervisor. It will then submit to the master node a series of
> -``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
> -and clean them up (after the functionality of ``InstanceShutdown`` will have
> -been extended as specified in this design document).
> -
> -The base hypervisor class (and all the deriving classes) will need two methods
> -for implementing such functionalities in a hypervisor-specific way.
> +internally). The watcher, on the master node, will fetch the list of instances
> +that have been shutdown from inside (recognizable by their ``oper_state``
> +as described below). It will then submit a series of ``InstanceShutdown`` jobs
> +that will mark such instances as ``ADMIN_down`` and clean them up (after
> +the functionality of ``InstanceShutdown`` will have been extended as specified
> +in the rest of this design document).
>
>  LUs performing operations other than an explicit cleanup will have to be
>  modified to perform the cleanup as well, either by submitting a job to perform
>
> Cheers,
> Michele
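For clarity, the master-side watcher pass described in the second interdiff can be sketched roughly as follows. This is a minimal sketch of the design being discussed, not Ganeti code: the `Instance` dataclass, the `oper_state` values, the `"OP_INSTANCE_SHUTDOWN"` string, and the `submit_job` callback are all illustrative assumptions.

```python
# Illustrative sketch of the design discussed above: the watcher, running
# on the master, fetches the instance list, spots instances that were shut
# down from inside the guest (admin state still "up", oper_state reporting
# a user-initiated shutdown), and submits one cleanup job per instance.
# None of these names are actual Ganeti APIs.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Instance:
    name: str
    admin_state: str   # what the cluster configuration says: "up" / "down"
    oper_state: str    # what the hypervisor reports: "running" / "user_down"

def find_internally_shutdown(instances: List[Instance]) -> List[str]:
    """Instances marked administratively up whose operational state shows
    they were shut down from inside."""
    return [i.name for i in instances
            if i.admin_state == "up" and i.oper_state == "user_down"]

def watcher_pass(instances: List[Instance],
                 submit_job: Callable[[str, str], None]) -> List[str]:
    """One watcher iteration: submit an InstanceShutdown job (which, per the
    design, marks the instance ADMIN_down and cleans it up) for each
    internally-shutdown instance found. Returns the affected names."""
    targets = find_internally_shutdown(instances)
    for name in targets:
        submit_job("OP_INSTANCE_SHUTDOWN", name)
    return targets
```

Running everything from the master this way matches the conclusion of the thread: no new per-node RPC calls are needed, and the watcher only touches the job queue it already talks to.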
