New paragraph in further considerations section:

Interdiff:

diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
index eaf21d0..062dece 100644
--- a/doc/design-kvmd.rst
+++ b/doc/design-kvmd.rst
@@ -175,6 +175,18 @@ the KVM daemon has a chance to add a watch to the KVM control directory,
 thus causing this daemon to miss the ``inotify`` creation event for the
 QMP socket.
 
+Another race condition occurs when Ganeti shuts down a KVM instance
+using force.  Ganeti uses ``TERM`` signals to stop KVM instances when
+force is specified or ACPI is not enabled.  However, as mentioned
+before, ``TERM`` signals are interpreted by the KVM daemon as a user
+shutdown.  As a result, the KVM daemon creates a shutdown file which
+then must be removed by Ganeti.  The race condition occurs because the
+KVM daemon might create the shutdown file after the hypervisor code that
+tries to remove this file has already run.  In practice, the race
+condition seems unlikely because Ganeti stops the KVM instance in a
+retry loop, which allows Ganeti to stop the instance and cleanup its
+runtime information.
+
 There are other problems which arise from the limitations of
 ``inotify``.  For example, if the KVM daemon is started after the first
 Ganeti instance has been created, then the ``inotify`` will not produce

Thanks,
Jose

On Thu, Nov 28, 2013 at 09:00:29AM +0100, Thomas Thrainer wrote:
> What are the advantages of creating a new daemon compared to including
> this functionality in noded?
> noded already holds a runtime configuration file for each instance where it
> would be easy to add the shutdown state. It also knows when an instance is
> started / stopped through Ganeti and when the respective directories are
> created, so there is no need for inotify watches.
> It also already has support for communicating via QMP with the instance,
> which could be reused.
>
>
> On Wed, Nov 27, 2013 at 6:49 PM, Jose A. Lopes <[email protected]> wrote:
>
> > Design document for KVM daemon which is needed by the instance
> > shutdown detection for KVM.
> >
> > Signed-off-by: Jose A. Lopes <[email protected]>
> > ---
> >  Makefile.am         |   1 +
> >  doc/design-kvmd.rst | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  doc/index.rst       |   1 +
> >  3 files changed, 148 insertions(+)
> >  create mode 100644 doc/design-kvmd.rst
> >
> > diff --git a/Makefile.am b/Makefile.am
> > index 10e9962..1249bd5 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -520,6 +520,7 @@ docinput = \
> >  	doc/design-hugepages-support.rst \
> >  	doc/design-impexp2.rst \
> >  	doc/design-internal-shutdown.rst \
> > +	doc/design-kvmd.rst \
> >  	doc/design-linuxha.rst \
> >  	doc/design-lu-generated-jobs.rst \
> >  	doc/design-monitoring-agent.rst \
> > diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
> > new file mode 100644
> > index 0000000..079c0b3
> > --- /dev/null
> > +++ b/doc/design-kvmd.rst
> > @@ -0,0 +1,146 @@
> > +==========
> > +KVM daemon
> > +==========
> > +
> > +.. Last updated for Ganeti 2.10
> > +
> > +.. toctree::
> > +   :maxdepth: 2
> > +
> > +This design document describes the KVM daemon, which is responsible for
> > +determining whether a given KVM instance was shutdown by an
> > +administrator or a user.
> > +
> > +
> > +Current state and shortcomings
> > +==============================
> > +
> > +This design document describes the KVM daemon which addresses the KVM
> > +side of the user-initiated shutdown problem introduced in
> > +:doc:`design-internal-shutdown`
> > +
> > +
> > +Proposed changes
> > +================
> > +
> > +The instance shutdown feature for KVM requires listening on events from
> > +the Qemu Machine Protocol (QMP) Unix socket, which is created together
> > +with a KVM instance.  A QMP socket typically looks like
> > +``/var/run/ganeti/kvm-hypervisor/ctrl/<instance>.qmp`` and implements
> > +the QMP protocol.  This is a bidirectional protocol that allows Ganeti
> > +to send commands, such as, system powerdown, as well as, receive events,
> > +such as, the powerdown and shutdown events.
> > +
> > +Listening in on these events allows Ganeti to determine whether a given
> > +KVM instance was shutdown by an administrator, either through
> > +``gnt-instance stop|remove <instance>`` or ``kill -KILL
> > +<instance-pid>``, or by a user, through ``poweroff`` from inside the
> > +instance.  Upon an administrator powerdown, the QMP protocol sends two
> > +events, namely, a powerdown event and a shutdown event, whereas upon a
> > +user shutdown only the shutdown event is sent.  This is enough to
> > +distinguish between an administrator and a user shutdown.  However,
> > +there is one limitation, which is, ``kill -TERM <instance-pid>``.  Even
> > +though this is an action performed by the administrator, it will be
> > +considered a user shutdown by the approach described in this document.
> > +
> > +Several design strategies were considered.  Most of these strategies
> > +consisted of spawning some process listening on the QMP socket when a
> > +KVM instance is created.  However, having a listener process per KVM
> > +instance is not scalable.  Therefore, a different strategy is proposed,
> > +namely, having a single process, called the KVM daemon, listening on the
> > +QMP sockets of all KVM instances within a node.  That also means there
> > +is an instance of the KVM daemon on each node.
> > +
> > +In order to implement the KVM daemon, two problems need to be addressed,
> > +namely, how the KVM daemon knows when to open a connection to a given
> > +QMP socket and how the KVM daemon communicates with Ganeti whether a
> > +given instance was shutdown by an administrator or a user.
> >
>
> Having the functionality in noded would avoid those problems, right?
>
>
> > +
> > +QMP connections management
> > +--------------------------
> > +
> > +As mentioned before, the QMP sockets reside in the KVM control
> > +directory, which is usually located under
> > +``/var/run/ganeti/kvm-hypervisor/ctrl/``.  When a KVM instance is
> > +created, a new QMP socket for this instance is also created in this
> > +directory.
> > +
> > +In order to simplify the design of the KVM daemon, instead of having
> > +Ganeti communicate to this daemon through a pipe or socket the creation
> > +of a new KVM instance, and thus a new QMP socket, this daemon will
> > +monitor the KVM control directory using ``inotify``.  As a result, the
> > +daemon is not only able to deal with KVM instances being created and
> > +removed, but also capable of overcoming other problematic situations
> > +concerning the filesystem, such as, the case when the KVM control
> > +directory does not exist because, for example, Ganeti was not yet
> > +started, or the KVM control directory was removed, for example, as a
> > +result of a Ganeti reinstallation.
> > +
> > +Shutdown detection
> > +------------------
> > +
> > +As mentioned before, the KVM daemon is responsbile for opening a
> > +connection to the QMP socket of a given instance and listening in on the
> > +shutdown and powerdown events, which allow the KVM daemon to determine
> > +whether the instance stopped because of an administrator or user
> > +shutdown.  Once the instance is stopped, the KVM daemon needs to
> > +communicate to Ganeti whether the user was responsible for shutting down
> > +the instance.
> > +
> > +In order to achieve this, the KVM daemon writes an empty file, called
> > +the shutdown file, in the KVM control directory with a name similar to
> > +the QMP socket file but with the extension ``.qmp`` replaced with
> > +``.shutdown``.  The presence of this file indicates that the shutdown
> > +was initiated by a user, whereas the absence of this file indicates that
> > +the shutdown was caused by an administrator.  This strategy also handles
> > +crashes and signals, such as, ``SIGKILL``, to be handled correctly,
> > +given that in these cases the KVM daemon never receives the powerdown
> > +and shutdown events and, therefore, never creates the shutdown file.
> > +
> > +KVM daemon launch
> > +-----------------
> > +
> > +With the above issues addressed, a question remains as to when the KVM
> > +daemon should be started.  The KVM daemon is different from other Ganeti
> > +daemons, which start together with the Ganeti service, because the KVM
> > +daemon is optional, given that it is specific to KVM and should not be
> > +run on installations containing only Xen, and, even in a KVM
> > +installation, the user might still choose not to enable it.  And finally
> > +because the KVM daemon is not really necessary until the first KVM
> > +instance is started.  For these reasons, the KVM daemon is started from
> > +within Ganeti when a KVM instance is started.  And the job process
> > +spawned by the node daemon is responsible for starting the KVM daemon.
> > +
> > +Given the current design of Ganeti, in which the node daemon spawns a
> > +job process to handle the creation of the instance, when launching the
> > +KVM daemon it is necessary to first check whether an instance of this
> > +daemon is already running and, if this is not the case, then the KVM
> > +daemon can be safely started.
> >
>
> The node daemon does not spawn a process for handling requests, AFAIK.
> Instead, a dedicated thread is created for each request.
> KVM shutdown detection could run in a separate thread which communicates
> with the request-processing threads via a queue but writes the shutdown
> "result" directly into the KVM runtime files (with a minimal amount of
> locking required there).
>
>
> > +
> > +Further considerations
> > +======================
> > +
> > +There are potential race conditions between Ganeti and the KVM daemon,
> > +however, in practice they seem unlikely.  For example, the KVM daemon
> > +needs to add and remove watches to the parent directories of the KVM
> > +control directory until this directory is finally created.  It is
> > +possible that Ganeti creates this directory and a KVM instance before
> > +the KVM daemon has a chance to add a watch to the KVM control directory,
> > +thus causing this daemon to miss the ``inotify`` creation event for the
> > +QMP socket.
> >
>
> This would not be a problem either, right?
>
>
> > +
> > +There are other problems which arise from the limitations of
> > +``inotify``.  For example, if the KVM daemon is started after the first
> > +Ganeti instance has been created, then the ``inotify`` will not produce
> > +any event for the creation of the QMP socket.  This can happen, for
> > +example, if the KVM daemon needs to be restarted or upgraded.  As a
> > +result, it might be necessary to have an additional mechanism that runs
> > +at KVM daemon startup or at regular intervals to ensure that the current
> > +KVM internal state is consistent with the actual contents of the KVM
> > +control directory.
> >
>
> As the node daemon has the authoritative state anyway (stored in runtime
> files, so they survive restarts), this would be simpler too.
>
>
> > +
> > +.. vim: set textwidth=72 :
> > +.. Local Variables:
> > +.. mode: rst
> > +.. fill-column: 72
> > +.. End:
> > diff --git a/doc/index.rst b/doc/index.rst
> > index 7ec8162..df443e0 100644
> > --- a/doc/index.rst
> > +++ b/doc/index.rst
> > @@ -116,6 +116,7 @@ Draft designs
> >     design-device-uuid-name.rst
> >     design-hroller.rst
> >     design-hotplug.rst
> > +   design-kvmd.rst
> >     design-linuxha.rst
> >     design-lu-generated-jobs.rst
> >     design-monitoring-agent.rst
> > --
> > 1.8.4.1
> >
> >
> Thanks,
> Thomas
>
> --
> Thomas Thrainer | Software Engineer | [email protected] |
>
> Google Germany GmbH
> Dienerstr. 12
> 80331 München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores

-- 
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
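
PS: For readers following the thread, here is a rough Python sketch of the
shutdown file convention from the quoted design: a shutdown event preceded
by a powerdown event is treated as an administrator shutdown, a shutdown
event alone as a user shutdown, and only the latter leaves a ``.shutdown``
file next to the QMP socket.  It is not the proposed implementation.  All
function names are hypothetical, and the event names ``SHUTDOWN`` and
``POWERDOWN`` are assumed to be the QMP event names the daemon would see.

```python
import os

QMP_EXT = ".qmp"
SHUTDOWN_EXT = ".shutdown"


def shutdown_file_for(qmp_path):
    """Map '<instance>.qmp' to '<instance>.shutdown' in the same directory."""
    if not qmp_path.endswith(QMP_EXT):
        raise ValueError("not a QMP socket path: %r" % (qmp_path,))
    return qmp_path[:-len(QMP_EXT)] + SHUTDOWN_EXT


def classify_shutdown(events):
    """Classify a shutdown from the QMP event names seen for one instance.

    Returns 'admin' if both a powerdown and a shutdown event were seen,
    'user' if only a shutdown event was seen, and None if no shutdown
    event was seen at all (for example, a crash or SIGKILL).
    """
    if "SHUTDOWN" not in events:
        return None
    return "admin" if "POWERDOWN" in events else "user"


def record_shutdown(qmp_path, events):
    """Create the empty shutdown file only for a user shutdown."""
    kind = classify_shutdown(events)
    if kind == "user":
        with open(shutdown_file_for(qmp_path), "w"):
            pass  # the file's presence is the signal; it stays empty
    return kind


def was_user_shutdown(qmp_path):
    """What the hypervisor code would check: is the shutdown file there?"""
    return os.path.exists(shutdown_file_for(qmp_path))
```

Note that under this rule ``kill -TERM <instance-pid>`` still classifies as
a user shutdown, which is exactly the case the interdiff paragraph is about.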
