On Wed, May 22, 2013 at 1:30 PM, Guido Trotter <[email protected]> wrote:
> On Wed, May 22, 2013 at 1:25 PM, Michele Tartara <[email protected]> wrote:
>> On Wed, May 22, 2013 at 1:07 PM, Guido Trotter <[email protected]> wrote:
>>> Ack thanks.
>>>
>>> This introduced a race condition in instance start then, but there's
>>> nothing we can do about it, except documenting it, I guess.
>>
>> Why in instance start? When you are starting an instance, either the
>> instance is not running (and everything is fine), or it is in the
>> preserved state (and therefore it's first cleaned and then started
>> again).
>>
>> If it is already running, it will be detected as such, and the start job
>> will not run. If the instance is shut down after that, I don't think
>> this introduces any race condition.
>
> Ack, true. Well, this anyway sounds sensible, so let's update the design
> and see where we get.

Interdiff:

diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
index 8b5e3c3..8b4d0fb 100644
--- a/doc/design-internal-shutdown.rst
+++ b/doc/design-internal-shutdown.rst
@@ -84,14 +84,12 @@ that only query the state of instances will not run the cleanup function.
 The cleanup operation includes both node-specific operations (the actual
 destruction of the stopped domains) and configuration changes, to be performed
 on the master node (marking as offline an instance that was shut down
-internally). Therefore, it will be implemented by adding a LU in cmdlib
-(``LUCleanupInstances``). A Job executing such an opcode will be submitted by
-the watcher to perform the cleanup.
-
-The node daemon will have to be modified in order to support at least the
-following RPC calls:
- * A call to list all the instances that have been shutdown from inside
- * A call to destroy a domain
+internally). The watcher (which runs on every node) will be able to detect
+the instances that have been shut down from inside by directly querying the
+hypervisor. It will then submit to the master node a series of
+``InstanceShutdown`` jobs that will mark such instances as ``ADMIN_down``
+and clean them up (once the functionality of ``InstanceShutdown`` has been
+extended as specified in this design document).
 
 The base hypervisor class (and all the deriving classes) will need two methods
 for implementing such functionalities in a hypervisor-specific way.
@@ -107,16 +105,20 @@ Other required changes
 The implementation of this design document will require some commands to be
 changed in order to cope with the new shutdown procedure.
 
-With this modification, also the Ganeti command for shutting down instances
-would leave them in a shutdown but preserved state. Therefore, it will be
-changed in such a way to immediately perform the cleanup of the instance
-after verifying its correct shutdown.
+With the default shutdown action in Xen set to ``preserve``, the Ganeti
+command for shutting down instances would leave them in a shutdown but
+preserved state. Therefore, it will have to be changed in such a way as to
+immediately perform the cleanup of the instance after verifying its correct
+shutdown. It will also correctly deal with instances that have been shut
+down from inside but are still active according to Ganeti, by detecting this
+situation, destroying the instance and carrying out the rest of the Ganeti
+shutdown procedure as usual.
 
 The ``gnt-instance list`` command will need to be able to handle the situation
 where an instance was shutdown internally but not yet cleaned up.
-The admin_state field will maintain the current meaning unchanged. The
-oper_state field will get a new possible state, ``S``, meaning that the instance
-was shutdown internally.
+The ``admin_state`` field will maintain the current meaning unchanged. The
+``oper_state`` field will get a new possible state, ``S``, meaning that the
+instance was shutdown internally.
 
 The ``gnt-instance info`` command ``State`` field, in such case, will show
 a message stating that the instance was supposed to be run but was shut down

Thanks,
Michele
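[Editor's illustration] The watcher-side detection described in the interdiff could be sketched roughly as below. This is only an illustrative sketch: the names `list_preserved_domains`, `submit_job`, and the opcode tuple format are hypothetical stand-ins, not the actual Ganeti watcher/luxi API.

```python
# Rough sketch of the watcher flow from the interdiff: query the
# hypervisor for internally-stopped domains, then submit one
# InstanceShutdown job per instance to the master node.
# NOTE: list_preserved_domains(), submit_job() and the opcode format
# are hypothetical stand-ins, NOT real Ganeti APIs.

def find_internally_shutdown(hypervisor, admin_states):
    """Return instances the hypervisor reports as shut down from inside.

    hypervisor.list_preserved_domains() stands in for the new
    hypervisor-specific query; admin_states maps instance names to
    their Ganeti admin state.
    """
    preserved = hypervisor.list_preserved_domains()
    # Only instances Ganeti still considers running need cleanup:
    # anything already ADMIN_down was stopped through Ganeti itself.
    return [name for name in preserved
            if admin_states.get(name) != "ADMIN_down"]


def submit_cleanup_jobs(client, instances):
    """Submit one InstanceShutdown job per internally-stopped instance."""
    return [client.submit_job([("OP_INSTANCE_SHUTDOWN", name)])
            for name in instances]
```

The point of the filter is the one Michele makes above: instances already marked `ADMIN_down` were stopped through Ganeti and need no cleanup, so only instances still "running" according to Ganeti but preserved according to the hypervisor get an ``InstanceShutdown`` job.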
