This really is an undocumented feature; I could barely find any
information about it.  I have added what I found to the design doc.
Thanks.

diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
index eaf21d0..b627b35 100644
--- a/doc/design-kvmd.rst
+++ b/doc/design-kvmd.rst
@@ -82,7 +82,7 @@ result of a Ganeti reinstallation.
 Shutdown detection
 ------------------
 
-As mentioned before, the KVM daemon is responsbile for opening a
+As mentioned before, the KVM daemon is responsible for opening a
 connection to the QMP socket of a given instance and listening in on the
 shutdown and powerdown events, which allow the KVM daemon to determine
 whether the instance stopped because of an administrator or user
@@ -185,6 +185,30 @@ at KVM daemon startup or at regular intervals to ensure that the current
 KVM internal state is consistent with the actual contents of the KVM
 control directory.
 
+Another race condition occurs when Ganeti shuts down a KVM instance
+using force.  Ganeti uses ``TERM`` signals to stop KVM instances when
+force is specified or ACPI is not enabled.  However, as mentioned
+before, ``TERM`` signals are interpreted by the KVM daemon as a user
+shutdown.  As a result, the KVM daemon creates a shutdown file which
+then must be removed by Ganeti.  The race condition occurs because the
+KVM daemon might create the shutdown file after the hypervisor code that
+tries to remove this file has already run.  In practice, the race
+condition seems unlikely because Ganeti stops the KVM instance in a
+retry loop, which allows Ganeti to stop the instance and clean up its
+runtime information.
+
+It is possible to determine if a process, in this particular case the
+KVM process, was terminated by a ``TERM`` signal, using the `proc
+connector and socket filters
+<https://web.archive.org/web/20121025062848/http://netsplit.com/2011/02/09/the-proc-connector-and-socket-filters/>`_.
+The proc connector is a socket connected between a userspace process and
+the kernel through the netlink protocol and can be used to receive
+notifications of process events; socket filters are a mechanism for
+subscribing only to the events that are relevant.  There are several
+`process events <http://lwn.net/Articles/157150/>`_ which can be
+subscribed to; in this case, however, we are interested only in the
+exit event, which carries information about the exit signal.
+

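To make the proc connector paragraph above more concrete, here is a
minimal Python sketch of subscribing to process events and picking out
the exit event.  It is not Ganeti code: the names ``subscribe`` and
``parse_exit_event`` are made up for illustration, the numeric constants
are copied by hand from the kernel headers, and the sketch assumes (from
my reading of the kernel source) that ``exit_code`` holds a wait()-style
status whose low 7 bits are the terminating signal.  It also skips the
socket filter entirely and filters in userspace instead.

```python
import os
import socket
import struct

# Values copied by hand from <linux/netlink.h>, <linux/connector.h> and
# <linux/cn_proc.h>; Python's socket module does not export them.
NETLINK_CONNECTOR = 11
NLMSG_DONE = 3
CN_IDX_PROC = 1
CN_VAL_PROC = 1
PROC_CN_MCAST_LISTEN = 1
PROC_EVENT_EXIT = 0x80000000

NLMSGHDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid
CN_MSG = struct.Struct("=IIIIHH")    # idx, val, seq, ack, len, flags
EVENT_HDR = struct.Struct("=IIQ")    # what, cpu, timestamp_ns
EXIT_DATA = struct.Struct("=iiII")   # pid, tgid, exit_code, exit_signal


def subscribe():
    """Open a proc connector socket and ask the kernel for process events."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_CONNECTOR)
    sock.bind((os.getpid(), CN_IDX_PROC))
    payload = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    body = CN_MSG.pack(CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(payload), 0)
    body += payload
    header = NLMSGHDR.pack(NLMSGHDR.size + len(body), NLMSG_DONE, 0, 0,
                           os.getpid())
    sock.send(header + body)
    return sock


def parse_exit_event(data):
    """Return (pid, terminating_signal) for an exit event, else None.

    terminating_signal is 0 when the process was not killed by a signal.
    """
    offset = NLMSGHDR.size + CN_MSG.size
    if len(data) < offset + EVENT_HDR.size + EXIT_DATA.size:
        return None
    what, _cpu, _timestamp = EVENT_HDR.unpack_from(data, offset)
    if what != PROC_EVENT_EXIT:
        return None
    pid, _tgid, exit_code, _exit_signal = EXIT_DATA.unpack_from(
        data, offset + EVENT_HDR.size)
    # Assumption: exit_code is a wait()-style status, so the low 7 bits
    # carry the signal that terminated the process.
    return pid, exit_code & 0x7F
```

A caller would loop over ``parse_exit_event(sock.recv(4096))`` on the
socket returned by ``subscribe()`` and compare the second element with
``signal.SIGTERM`` for the PID of the KVM process.  As far as I can
tell, subscribing requires root privileges.
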
On Tue, Dec 10, 2013 at 01:01:35PM +0200, Apollon Oikonomopoulos wrote:
> Hi Jose,
> 
> On 10:47 Mon 09 Dec     , Jose A. Lopes wrote:
> > New paragraph in further considerations section:
> > 
> > Interdiff:
> > 
> > diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
> > index eaf21d0..062dece 100644
> > --- a/doc/design-kvmd.rst
> > +++ b/doc/design-kvmd.rst
> > @@ -175,6 +175,18 @@ the KVM daemon has a chance to add a watch to the KVM control directory,
> >  thus causing this daemon to miss the ``inotify`` creation event for the
> >  QMP socket.
> >  
> > +Another race condition occurs when Ganeti shuts down a KVM instance
> > +using force.  Ganeti uses ``TERM`` signals to stop KVM instances when
> > +force is specified or ACPI is not enabled.  However, as mentioned
> > +before, ``TERM`` signals are interpreted by the KVM daemon as a user
> > +shutdown.  As a result, the KVM daemon creates a shutdown file which
> > +then must be removed by Ganeti.  The race condition occurs because the
> > +KVM daemon might create the shutdown file after the hypervisor code that
> > +tries to remove this file has already run.  In practice, the race
> > +condition seems unlikely because Ganeti stops the KVM instance in a
> > +retry loop, which allows Ganeti to stop the instance and clean up its
> > +runtime information.
> > +
> 
> There's an interesting (yet mostly undocumented) feature of the Linux 
> kernel called "Process Events Connector". It basically allows you to get 
> process events directly from the kernel using a regular Netlink socket.  
> A quick look at the kernel source indicates that the PROC_EVENT_EXIT 
> event carries all necessary information, namely the process ID, the exit 
> code and the signal that (possibly) caused process termination. It's 
> like wait(), but it works for all processes, not only children.
> 
> I'm not saying it will make your life necessarily easier, but I think 
> you should have a look at it. Having things like the actual signal 
> available, could also help with debugging (e.g. log that the process 
> actually aborted or segfaulted). A detailed article describing the above 
> (together with event filtering using BPF) can be found at (original site 
> seems to be down):
> 
> https://web.archive.org/web/20121025062848/http://netsplit.com/2011/02/09/the-proc-connector-and-socket-filters/
> 
> For the record, I had done a hackish implementation of process-exit 
> notifications for ganeti using the release handler facility of cgroups 
> (and placing each KVM instance in its own cgroup), but this is a route I 
> wouldn't recommend (mostly because the cgroups subsystem is currently 
> changing to not support multiple hierarchies anymore).
> 
> Regards,
> Apollon

-- 
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
