New paragraph in further considerations section:

Interdiff:

diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
index eaf21d0..062dece 100644
--- a/doc/design-kvmd.rst
+++ b/doc/design-kvmd.rst
@@ -175,6 +175,18 @@ the KVM daemon has a chance to add a watch to the KVM control directory,
 thus causing this daemon to miss the ``inotify`` creation event for the
 QMP socket.
 
+Another race condition occurs when Ganeti shuts down a KVM instance
+using force.  Ganeti uses ``TERM`` signals to stop KVM instances when
+force is specified or ACPI is not enabled.  However, as mentioned
+before, ``TERM`` signals are interpreted by the KVM daemon as a user
+shutdown.  As a result, the KVM daemon creates a shutdown file which
+then must be removed by Ganeti.  The race condition occurs because the
+KVM daemon might create the shutdown file after the hypervisor code that
+tries to remove this file has already run.  In practice, the race
+condition seems unlikely because Ganeti stops the KVM instance in a
+retry loop, which allows Ganeti to stop the instance and clean up its
+runtime information.
+
 There are other problems which arise from the limitations of
 ``inotify``.  For example, if the KVM daemon is started after the first
 Ganeti instance has been created, then the ``inotify`` will not produce
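
To make the race in the new paragraph concrete, here is a minimal sketch of a cleanup step that retries the removal of the shutdown file. This is not the Ganeti hypervisor code: the helper names ``shutdown_file_for`` and ``remove_shutdown_file`` and the ``attempts`` and ``delay`` parameters are assumptions made for illustration. Only the ``.qmp`` to ``.shutdown`` naming rule comes from the design document.

```python
import errno
import os
import time

QMP_SUFFIX = ".qmp"
SHUTDOWN_SUFFIX = ".shutdown"


def shutdown_file_for(qmp_socket_path):
    """Return the shutdown file path that matches a QMP socket path."""
    if not qmp_socket_path.endswith(QMP_SUFFIX):
        raise ValueError("not a QMP socket path: %r" % qmp_socket_path)
    return qmp_socket_path[:-len(QMP_SUFFIX)] + SHUTDOWN_SUFFIX


def remove_shutdown_file(qmp_socket_path, attempts=5, delay=0.1):
    """Remove the shutdown file, retrying in case it is created late.

    Hypothetical helper: 'attempts' and 'delay' are illustrative values.
    Returns True if a file was removed, False if none ever appeared.
    """
    path = shutdown_file_for(qmp_socket_path)
    for attempt in range(attempts):
        try:
            os.unlink(path)
            return True
        except OSError as err:
            # A missing file is expected: the KVM daemon may not have
            # written it yet, or the shutdown was not a user shutdown.
            if err.errno != errno.ENOENT:
                raise
        if attempt + 1 < attempts:
            time.sleep(delay)
    return False
```

Retrying only narrows the window. A shutdown file written after the last attempt would still be left behind, which matches the "unlikely in practice" wording of the paragraph.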

Thanks,
Jose

On Thu, Nov 28, 2013 at 09:00:29AM +0100, Thomas Thrainer wrote:
> What are the advantages of creating a new daemon compared to including this
> functionality in noded?
> noded already holds a runtime configuration file for each instance where it
> would be easy to add the shutdown state. It also knows when an instance is
> started / stopped through Ganeti and when the respective directories are
> created, so there is no need for inotify watches.
> It also already supports communicating via QMP with the instance, which
> could be reused.
> 
> 
> On Wed, Nov 27, 2013 at 6:49 PM, Jose A. Lopes <[email protected]> wrote:
> 
> > Design document for KVM daemon which is needed by the instance
> > shutdown detection for KVM.
> >
> > Signed-off-by: Jose A. Lopes <[email protected]>
> > ---
> >  Makefile.am         |   1 +
> >  doc/design-kvmd.rst | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  doc/index.rst       |   1 +
> >  3 files changed, 148 insertions(+)
> >  create mode 100644 doc/design-kvmd.rst
> >
> > diff --git a/Makefile.am b/Makefile.am
> > index 10e9962..1249bd5 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -520,6 +520,7 @@ docinput = \
> >         doc/design-hugepages-support.rst \
> >         doc/design-impexp2.rst \
> >         doc/design-internal-shutdown.rst \
> > +       doc/design-kvmd.rst \
> >         doc/design-linuxha.rst \
> >         doc/design-lu-generated-jobs.rst \
> >         doc/design-monitoring-agent.rst \
> > diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
> > new file mode 100644
> > index 0000000..079c0b3
> > --- /dev/null
> > +++ b/doc/design-kvmd.rst
> > @@ -0,0 +1,146 @@
> > +==========
> > +KVM daemon
> > +==========
> > +
> > +.. Last updated for Ganeti 2.10
> > +
> > +.. toctree::
> > +   :maxdepth: 2
> > +
> > +This design document describes the KVM daemon, which is responsible for
> > +determining whether a given KVM instance was shut down by an
> > +administrator or a user.
> > +
> > +
> > +Current state and shortcomings
> > +==============================
> > +
> > +This design document describes the KVM daemon which addresses the KVM
> > +side of the user-initiated shutdown problem introduced in
> > +:doc:`design-internal-shutdown`.
> > +
> > +
> > +Proposed changes
> > +================
> > +
> > +The instance shutdown feature for KVM requires listening on events from
> > +the Qemu Machine Protocol (QMP) Unix socket, which is created together
> > +with a KVM instance.  A QMP socket typically looks like
> > +``/var/run/ganeti/kvm-hypervisor/ctrl/<instance>.qmp`` and implements
> > +the QMP protocol.  This is a bidirectional protocol that allows Ganeti
> > +to send commands, such as system powerdown, as well as receive events,
> > +such as the powerdown and shutdown events.
> > +
> > +Listening in on these events allows Ganeti to determine whether a given
> > +KVM instance was shut down by an administrator, either through
> > +``gnt-instance stop|remove <instance>`` or ``kill -KILL
> > +<instance-pid>``, or by a user, through ``poweroff`` from inside the
> > +instance.  Upon an administrator powerdown, the QMP protocol sends two
> > +events, namely, a powerdown event and a shutdown event, whereas upon a
> > +user shutdown only the shutdown event is sent.  This is enough to
> > +distinguish between an administrator and a user shutdown.  However,
> > +there is one limitation, which is, ``kill -TERM <instance-pid>``.  Even
> > +though this is an action performed by the administrator, it will be
> > +considered a user shutdown by the approach described in this document.
> > +
> > +Several design strategies were considered.  Most of these strategies
> > +consisted of spawning some process listening on the QMP socket when a
> > +KVM instance is created.  However, having a listener process per KVM
> > +instance is not scalable.  Therefore, a different strategy is proposed,
> > +namely, having a single process, called the KVM daemon, listening on the
> > +QMP sockets of all KVM instances within a node.  That also means there
> > +is an instance of the KVM daemon on each node.
> > +
> > +In order to implement the KVM daemon, two problems need to be addressed,
> > +namely, how the KVM daemon knows when to open a connection to a given
> > +QMP socket and how the KVM daemon communicates with Ganeti whether a
> > +given instance was shut down by an administrator or a user.
> >
> 
> Having the functionality in noded would avoid those problems, right?
> 
> 
> > +
> > +QMP connections management
> > +--------------------------
> > +
> > +As mentioned before, the QMP sockets reside in the KVM control
> > +directory, which is usually located under
> > +``/var/run/ganeti/kvm-hypervisor/ctrl/``.  When a KVM instance is
> > +created, a new QMP socket for this instance is also created in this
> > +directory.
> > +
> > +In order to simplify the design of the KVM daemon, instead of having
> > +Ganeti communicate to this daemon through a pipe or socket the creation
> > +of a new KVM instance, and thus a new QMP socket, this daemon will
> > +monitor the KVM control directory using ``inotify``.  As a result, the
> > +daemon is not only able to deal with KVM instances being created and
> > +removed, but also capable of overcoming other problematic situations
> > +concerning the filesystem, such as, the case when the KVM control
> > +directory does not exist because, for example, Ganeti was not yet
> > +started, or the KVM control directory was removed, for example, as a
> > +result of a Ganeti reinstallation.
> > +
> > +Shutdown detection
> > +------------------
> > +
> > +As mentioned before, the KVM daemon is responsible for opening a
> > +connection to the QMP socket of a given instance and listening in on the
> > +shutdown and powerdown events, which allow the KVM daemon to determine
> > +whether the instance stopped because of an administrator or user
> > +shutdown.  Once the instance is stopped, the KVM daemon needs to
> > +communicate to Ganeti whether the user was responsible for shutting down
> > +the instance.
> > +
> > +In order to achieve this, the KVM daemon writes an empty file, called
> > +the shutdown file, in the KVM control directory with a name similar to
> > +the QMP socket file but with the extension ``.qmp`` replaced with
> > +``.shutdown``.  The presence of this file indicates that the shutdown
> > +was initiated by a user, whereas the absence of this file indicates that
> > +the shutdown was caused by an administrator.  This strategy also allows
> > +crashes and signals, such as ``SIGKILL``, to be handled correctly,
> > +given that in these cases the KVM daemon never receives the powerdown
> > +and shutdown events and, therefore, never creates the shutdown file.
> > +
> > +KVM daemon launch
> > +-----------------
> > +
> > +With the above issues addressed, a question remains as to when the KVM
> > +daemon should be started.  The KVM daemon is different from other Ganeti
> > +daemons, which start together with the Ganeti service, because the KVM
> > +daemon is optional, given that it is specific to KVM and should not be
> > +run on installations containing only Xen, and, even in a KVM
> > +installation, the user might still choose not to enable it.  Finally,
> > +the KVM daemon is not really necessary until the first KVM
> > +instance is started.  For these reasons, the KVM daemon is started from
> > +within Ganeti when a KVM instance is started, and the job process
> > +spawned by the node daemon is responsible for starting the KVM daemon.
> > +
> > +Given the current design of Ganeti, in which the node daemon spawns a
> > +job process to handle the creation of the instance, when launching the
> > +KVM daemon it is necessary to first check whether an instance of this
> > +daemon is already running and, if this is not the case, then the KVM
> > +daemon can be safely started.
> >
> 
> The node daemon does not spawn a process for handling requests, AFAIK.
> Instead, a dedicated thread is created for each request.
> KVM shutdown detection could run in a separate thread which communicates
> with the request-processing threads via a queue but writes the shutdown
> "result" directly into the KVM runtime files (with a minimal amount of
> locking required there).
> 
> 
> > +
> > +Further considerations
> > +======================
> > +
> > +There are potential race conditions between Ganeti and the KVM daemon;
> > +however, in practice they seem unlikely.  For example, the KVM daemon
> > +needs to add and remove watches to the parent directories of the KVM
> > +control directory until this directory is finally created.  It is
> > +possible that Ganeti creates this directory and a KVM instance before
> > +the KVM daemon has a chance to add a watch to the KVM control directory,
> > +thus causing this daemon to miss the ``inotify`` creation event for the
> > +QMP socket.
> >
> 
> This would not be a problem either, right?
> 
> 
> > +
> > +There are other problems which arise from the limitations of
> > +``inotify``.  For example, if the KVM daemon is started after the first
> > +Ganeti instance has been created, then the ``inotify`` will not produce
> > +any event for the creation of the QMP socket.  This can happen, for
> > +example, if the KVM daemon needs to be restarted or upgraded.  As a
> > +result, it might be necessary to have an additional mechanism that runs
> > +at KVM daemon startup or at regular intervals to ensure that the current
> > +KVM internal state is consistent with the actual contents of the KVM
> > +control directory.
> >
> 
> As the node daemon has the authoritative state anyway (stored in runtime
> files, so they survive restarts), this would be simpler too.
> 
> 
> > +
> > +.. vim: set textwidth=72 :
> > +.. Local Variables:
> > +.. mode: rst
> > +.. fill-column: 72
> > +.. End:
> > diff --git a/doc/index.rst b/doc/index.rst
> > index 7ec8162..df443e0 100644
> > --- a/doc/index.rst
> > +++ b/doc/index.rst
> > @@ -116,6 +116,7 @@ Draft designs
> >     design-device-uuid-name.rst
> >     design-hroller.rst
> >     design-hotplug.rst
> > +   design-kvmd.rst
> >     design-linuxha.rst
> >     design-lu-generated-jobs.rst
> >     design-monitoring-agent.rst
> > --
> > 1.8.4.1
> >
> >
> Thanks,
> Thomas
> 
> -- 
> Thomas Thrainer | Software Engineer | [email protected] |
> 
> Google Germany GmbH
> Dienerstr. 12
> 80331 München
> 
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores

-- 
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
