This really is an undocumented feature; I could barely find any information about it. I have included this information in the design doc. Thanks.

diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
index eaf21d0..b627b35 100644
--- a/doc/design-kvmd.rst
+++ b/doc/design-kvmd.rst
@@ -82,7 +82,7 @@ result of a Ganeti reinstallation.
 Shutdown detection
 ------------------
 
-As mentioned before, the KVM daemon is responsbile for opening a
+As mentioned before, the KVM daemon is responsible for opening a
 connection to the QMP socket of a given instance and listening in on the
 shutdown and powerdown events, which allow the KVM daemon to determine
 whether the instance stopped because of an administrator or user
@@ -185,6 +185,30 @@ at KVM daemon startup or at regular intervals to ensure that the
 current KVM internal state is consistent with the actual contents of the
 KVM control directory.
 
+Another race condition occurs when Ganeti shuts down a KVM instance
+using force.  Ganeti uses ``TERM`` signals to stop KVM instances when
+force is specified or ACPI is not enabled.  However, as mentioned
+before, ``TERM`` signals are interpreted by the KVM daemon as a user
+shutdown.  As a result, the KVM daemon creates a shutdown file which
+then must be removed by Ganeti.  The race condition occurs because the
+KVM daemon might create the shutdown file after the hypervisor code that
+tries to remove this file has already run.  In practice, the race
+condition seems unlikely because Ganeti stops the KVM instance in a
+retry loop, which allows Ganeti to stop the instance and cleanup its
+runtime information.
+
+It is possible to determine if a process, in this particular case the
+KVM process, was terminated by a ``TERM`` signal, using the `proc
+connector and socket filters
+<https://web.archive.org/web/20121025062848/http://netsplit.com/2011/02/09/the-proc-connector-and-socket-filters/>`_.
+The proc connector is a socket connected between a userspace process and
+the kernel through the netlink protocol and can be used to receive
+notifications of process events, and the socket filters is a mechanism
+for subscribing only to events that are relevant.  There are several
+`process events <http://lwn.net/Articles/157150/>`_ which can be
+subscribed to, however, in this case, we are interested only in the exit
+event, which carries information about the exit signal.
+

On Tue, Dec 10, 2013 at 01:01:35PM +0200, Apollon Oikonomopoulos wrote:
> Hi Jose,
>
> On 10:47 Mon 09 Dec , Jose A. Lopes wrote:
> > New paragraph in further considerations section:
> >
> > Interdiff:
> >
> > diff --git a/doc/design-kvmd.rst b/doc/design-kvmd.rst
> > index eaf21d0..062dece 100644
> > --- a/doc/design-kvmd.rst
> > +++ b/doc/design-kvmd.rst
> > @@ -175,6 +175,18 @@ the KVM daemon has a chance to add a watch to the KVM
> > control directory,
> >  thus causing this daemon to miss the ``inotify`` creation event for the
> >  QMP socket.
> >
> > +Another race condition occurs when Ganeti shuts down a KVM instance
> > +using force.  Ganeti uses ``TERM`` signals to stop KVM instances when
> > +force is specified or ACPI is not enabled.  However, as mentioned
> > +before, ``TERM`` signals are interpreted by the KVM daemon as a user
> > +shutdown.  As a result, the KVM daemon creates a shutdown file which
> > +then must be removed by Ganeti.  The race condition occurs because the
> > +KVM daemon might create the shutdown file after the hypervisor code that
> > +tries to remove this file has already run.  In practice, the race
> > +condition seems unlikely because Ganeti stops the KVM instance in a
> > +retry loop, which allows Ganeti to stop the instance and cleanup its
> > +runtime information.
> > +
>
> There's an interesting (yet mostly undocumented) feature of the Linux
> kernel called "Process Events Connector". It basically allows you to get
> process events directly from the kernel using a regular Netlink socket.
> A quick look at the kernel source indicates that the PROC_EVENT_EXIT
> event carries all necessary information, namely the process ID, the exit
> code and the signal that (possibly) caused process termination. It's
> like wait(), but it works for all processes, not only children.
>
> I'm not saying it will make your life necessarily easier, but I think
> you should have a look at it. Having things like the actual signal
> available, could also help with debugging (e.g. log that the process
> actually aborted or segfaulted). A detailed article describing the above
> (together with event filtering using BPF) can be found at (original site
> seems to be down):
>
> https://web.archive.org/web/20121025062848/http://netsplit.com/2011/02/09/the-proc-connector-and-socket-filters/
>
> For the record, I had done a hackish implementation of process-exit
> notifications for ganeti using the release handler facility of cgroups
> (and placing each KVM instance in its own cgroup), but this is a route I
> wouldn't recommend (mostly because the cgroups subsystem is currently
> changing to not support multiple hierarchies anymore).
>
> Regards,
> Apollon

-- 
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
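
P.S. For illustration only, here is a rough Python sketch of how the
PROC_EVENT_EXIT message discussed above might be decoded once it has been
read from the netlink socket. It is not part of the patch. The names
ExitInfo and parse_exit_event are made up for this example. The field layout
is my reading of <linux/cn_proc.h> (newer kernels append parent IDs, which
are ignored here), and I am assuming that exit_code uses the same encoding
as a wait() status, so the terminating signal sits in the low 7 bits.
Opening the socket and subscribing (which needs root) is left out.

```python
import struct
from typing import NamedTuple, Optional

# Constants as I understand them from <linux/connector.h> and
# <linux/cn_proc.h>; treat them as assumptions and check your headers.
CN_IDX_PROC = 1
CN_VAL_PROC = 1
PROC_EVENT_EXIT = 0x80000000

NLMSGHDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid
CN_MSG = struct.Struct("=IIIIHH")    # cb_id.idx, cb_id.val, seq, ack, len, flags
EVENT_HDR = struct.Struct("=IIQ")    # what, cpu, timestamp_ns
EXIT_EVENT = struct.Struct("=iiII")  # pid, tgid, exit_code, exit_signal


class ExitInfo(NamedTuple):
    """Hypothetical result type for this sketch."""
    pid: int
    tgid: int
    term_signal: Optional[int]  # signal that killed the process, if any
    exit_status: Optional[int]  # value passed to exit(), if not signalled


def parse_exit_event(data: bytes) -> Optional[ExitInfo]:
    """Return ExitInfo for a proc connector exit event, else None."""
    needed = NLMSGHDR.size + CN_MSG.size + EVENT_HDR.size + EXIT_EVENT.size
    if len(data) < needed:
        return None

    # Skip the netlink header and check that this is a proc connector message.
    offset = NLMSGHDR.size
    idx, val, _seq, _ack, _len, _flags = CN_MSG.unpack_from(data, offset)
    if (idx, val) != (CN_IDX_PROC, CN_VAL_PROC):
        return None

    # Only exit events are of interest; fork, exec, etc. are dropped.
    offset += CN_MSG.size
    what, _cpu, _timestamp = EVENT_HDR.unpack_from(data, offset)
    if what != PROC_EVENT_EXIT:
        return None

    offset += EVENT_HDR.size
    pid, tgid, exit_code, _exit_signal = EXIT_EVENT.unpack_from(data, offset)

    # Assumption: exit_code is encoded like a wait() status.
    term_signal = exit_code & 0x7F
    if term_signal:
        return ExitInfo(pid, tgid, term_signal, None)
    return ExitInfo(pid, tgid, None, (exit_code >> 8) & 0xFF)
```

With something like this, the KVM daemon could compare term_signal against
signal.SIGTERM for the PID of the KVM process. Note that the exit_signal
field is, as far as I can tell, the signal delivered to the parent (usually
SIGCHLD), not the one that killed the process, so the sketch ignores it.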
