On Wed, May 22, 2013 at 3:57 PM, Guido Trotter <[email protected]> wrote:
> On Wed, May 22, 2013 at 1:35 PM, Michele Tartara <[email protected]> wrote:
>
>> On Wed, May 22, 2013 at 1:30 PM, Guido Trotter <[email protected]> wrote:
>>
>>> On Wed, May 22, 2013 at 1:25 PM, Michele Tartara <[email protected]> wrote:
>>>
>>>> On Wed, May 22, 2013 at 1:07 PM, Guido Trotter <[email protected]> wrote:
>>>>
>>>>> Ack thanks.
>>>>>
>>>>> This introduced a race condition in instance start then, but there's
>>>>> nothing we can do about it, except documenting it, I guess.
>>>>
>>>> Why in instance start? When you are starting an instance, either the
>>>> instance is not running (and everything is fine), or it is in the
>>>> preserved state (and therefore it's first cleaned and then started
>>>> again).
>>>>
>>>> If it is running already, it will be detected as such, and the start
>>>> job will not run. If the instance is shut down after that, I don't
>>>> think this introduces any race condition.
>>>
>>> Ack, true. Well, this sounds sensible anyway, so let's update the
>>> design and see where we get.
>>
>> Interdiff:
>>
>> diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
>> index 8b5e3c3..8b4d0fb 100644
>> --- a/doc/design-internal-shutdown.rst
>> +++ b/doc/design-internal-shutdown.rst
>> @@ -84,14 +84,12 @@ that only query the state of instances will not run
>>  the cleanup function.
>>  The cleanup operation includes both node-specific operations (the actual
>>  destruction of the stopped domains) and configuration changes, to be performed
>>  on the master node (marking as offline an instance that was shut down
>> -internally). Therefore, it will be implemented by adding a LU in cmdlib
>> -(``LUCleanupInstances``). A Job executing such an opcode will be submitted by
>> -the watcher to perform the cleanup.
>> -
>> -The node daemon will have to be modified in order to support at least the
>> -following RPC calls:
>> - * A call to list all the instances that have been shutdown from inside
>> - * A call to destroy a domain
>> +internally). The watcher (that runs on every node) will be able to detect the
>> +instances that have been shutdown from inside by directly querying the
>> +hypervisor. It will then submit to the master node a series of
>> +``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
>> +and clean them up (after the functionality of ``InstanceShutdown`` will have
>> +been extended as specified in this design document).
>
> Actually, I think you don't want the watcher to do this from each node,
> but just from the master, fetching the instance list and submitting the
> relevant jobs. Am I wrong? Why should the watcher query the hypervisor
> directly?

Given that it's running on all the nodes, I thought this could reduce the
number of additional jobs and queries, but it's no problem to have it run
from the master. On the other hand, that is actually quite a bit simpler.

Thanks,
Michele
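For reference, the master-side flow discussed above (compare the configured admin state against what the hypervisor reports, then submit one cleanup job per internally-stopped instance) could be sketched roughly as follows. This is only an illustration of the logic, not the real Ganeti watcher code: the function names, state strings, and job dictionaries are all hypothetical placeholders.

```python
# Illustrative sketch only -- names and state values are hypothetical,
# not the actual Ganeti API.

def find_internally_shutdown(config_state, hypervisor_state):
    """Instances the config expects to be up, but whose domain was
    shut down from inside the guest (per the hypervisor's report)."""
    return sorted(
        name
        for name, admin in config_state.items()
        if admin == "up" and hypervisor_state.get(name) == "user shutdown"
    )

def build_cleanup_jobs(instances):
    """One InstanceShutdown-style job per affected instance; the job
    would mark the instance ADMIN_down and destroy the leftover domain."""
    return [{"op": "OP_INSTANCE_SHUTDOWN", "instance_name": n}
            for n in instances]

if __name__ == "__main__":
    # Master's configured view vs. the state reported by the hypervisors.
    config = {"web1": "up", "db1": "up", "batch1": "down"}
    observed = {"web1": "running", "db1": "user shutdown"}
    jobs = build_cleanup_jobs(find_internally_shutdown(config, observed))
    print(jobs)  # only db1 needs cleanup
```

Running from the master this way avoids one watcher per node querying its local hypervisor and racing each other to submit jobs, at the cost of one instance-list fetch per watcher run.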
