On Wed, May 22, 2013 at 1:35 PM, Michele Tartara <[email protected]> wrote:
> On Wed, May 22, 2013 at 1:30 PM, Guido Trotter <[email protected]> wrote:
>>
>> On Wed, May 22, 2013 at 1:25 PM, Michele Tartara <[email protected]> wrote:
>>>
>>> On Wed, May 22, 2013 at 1:07 PM, Guido Trotter <[email protected]> wrote:
>>>>
>>>> Ack thanks.
>>>>
>>>> This introduced a race condition in instance start then, but there's
>>>> nothing we can do about it, except documenting it, I guess.
>>>>
>>> Why in instance start? When you are starting an instance, either the
>>> instance is not running (and everything is fine), or it is in the
>>> preserved state (and therefore it's first cleaned and then started again).
>>>
>>> If it is running already, it will be detected as such, and the start job
>>> will not run. If after that the instance is shut down, I don't think this
>>> introduces any race condition.
>>>
>> Ack, true. Well, this anyway sounds sensible, so let's update the design
>> and see where we get.
>>
> Interdiff:
>
> diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
> index 8b5e3c3..8b4d0fb 100644
> --- a/doc/design-internal-shutdown.rst
> +++ b/doc/design-internal-shutdown.rst
> @@ -84,14 +84,12 @@ that only query the state of instances will not run
>  the cleanup function.
>  The cleanup operation includes both node-specific operations (the actual
>  destruction of the stopped domains) and configuration changes, to be performed
>  on the master node (marking as offline an instance that was shut down
> -internally). Therefore, it will be implemented by adding a LU in cmdlib
> -(``LUCleanupInstances``). A Job executing such an opcode will be submitted by
> -the watcher to perform the cleanup.
> -
> -The node daemon will have to be modified in order to support at least the
> -following RPC calls:
> - * A call to list all the instances that have been shutdown from inside
> - * A call to destroy a domain
> +internally). The watcher (that runs on every node) will be able to detect the
> +instances that have been shutdown from inside by directly querying the
> +hypervisor. It will then submit to the master node a series of
> +``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
> +and clean them up (after the functionality of ``InstanceShutdown`` will have
> +been extended as specified in this design document).

Actually I think you don't want the watcher to do this from each node, but
just from the master, fetching the instance list and submitting the relevant
jobs. Am I wrong? Why should the watcher query the hypervisor directly?

Thanks,

Guido
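For illustration, the per-node flow the interdiff describes could be sketched roughly as below. All names here (`list_user_shutdown_instances`, `submit_instance_shutdown`, the fake classes) are hypothetical stand-ins invented for this sketch, not Ganeti's actual hypervisor or luxi API; the point is only the shape of the loop: query the local hypervisor for internally shut-down instances, then submit one ``InstanceShutdown`` job per instance to the master.

```python
# Minimal sketch of the per-node watcher pass from the design interdiff.
# All class and method names are hypothetical, not Ganeti's real API.

class FakeHypervisor:
    """Stand-in for a hypervisor backend that can report which domains
    were shut down from inside the guest."""

    def __init__(self, user_shutdown_instances):
        self._user_shutdown = list(user_shutdown_instances)

    def list_user_shutdown_instances(self):
        """Return instance names that were shut down internally."""
        return list(self._user_shutdown)


class FakeMasterClient:
    """Stand-in for a client that submits jobs to the master daemon."""

    def __init__(self):
        self.submitted = []

    def submit_instance_shutdown(self, instance_name):
        """Record an InstanceShutdown job and return a fake job id."""
        self.submitted.append(("InstanceShutdown", instance_name))
        return len(self.submitted)


def watcher_cleanup_pass(hypervisor, master_client):
    """One watcher pass: detect internally shut-down instances on this
    node and submit a cleanup job for each to the master."""
    job_ids = []
    for inst in hypervisor.list_user_shutdown_instances():
        job_ids.append(master_client.submit_instance_shutdown(inst))
    return job_ids
```

Guido's alternative would move this loop to the master only: instead of each node querying its hypervisor, the master fetches the cluster-wide instance state and submits the same jobs, avoiding per-node job submission entirely.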
