It seems that we have the following execution flow:

1. Something calls the function
static void qemuMonitorIO(int watch, int fd, int events, void *opaque)

which contains the following piece of code:

qemu_monitor.c:
 599         if (!error &&
 600             events & VIR_EVENT_HANDLE_HANGUP) {
 601             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
 602                            _("End of file from monitor"));
 603             eof = 1;
 604             events &= ~VIR_EVENT_HANDLE_HANGUP;
 605         }

Somehow VIR_EVENT_HANDLE_HANGUP is already set by the time we reach this piece
of code, and the monitor file gets marked as EOF-ed.
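
(As a side note, here is a minimal standalone sketch of our own, not libvirt code, showing that poll() reports POLLHUP as soon as the peer of a UNIX stream socket closes its end; this is the raw condition the event loop maps to VIR_EVENT_HANDLE_HANGUP on the monitor fd.)

/* Minimal standalone sketch (not libvirt code): poll() reports POLLHUP on
 * a UNIX stream socket once the peer has closed its end.  This is the raw
 * condition the libvirt event loop translates into VIR_EVENT_HANDLE_HANGUP
 * for the monitor fd. */
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;

    close(sv[1]);                      /* peer goes away, as if qemu exited */

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    if (poll(&pfd, 1, 1000) > 0) {
        if (pfd.revents & POLLHUP) {
            printf("POLLHUP: peer closed -> VIR_EVENT_HANDLE_HANGUP\n");
        } else if (pfd.revents & POLLIN) {
            char c;
            if (read(sv[0], &c, 1) == 0)   /* plain EOF, no POLLHUP */
                printf("EOF on read\n");
        }
    }

    close(sv[0]);
    return 0;
}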

2. A few lines below we come to the following instructions:

qemu_monitor.c:
 649     if (eof) {
 650         void (*eofNotify)(qemuMonitorPtr, virDomainObjPtr)
 651             = mon->cb->eofNotify;
 652         virDomainObjPtr vm = mon->vm;
 653 
 654         /* Make sure anyone waiting wakes up now */
 655         virCondSignal(&mon->notify);
 656         virObjectUnlock(mon);
 657         virObjectUnref(mon);
 658         VIR_DEBUG("Triggering EOF callback");
 659         (eofNotify)(mon, vm);
 660     } 

and run the eofNotify callback, which leads us to:

3. qemu_process.c:
 292 static void
 293 qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
 294                             virDomainObjPtr vm)
 295 {
 296     virQEMUDriverPtr driver = qemu_driver;
 297     virDomainEventPtr event = NULL;
 298     qemuDomainObjPrivatePtr priv;
 299     int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN;
 300     int stopReason = VIR_DOMAIN_SHUTOFF_SHUTDOWN;
 301     const char *auditReason = "shutdown";
 302 
 303     VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
 304 
 305     qemuDriverLock(driver);
 306     virObjectLock(vm);
 307 
 308     priv = vm->privateData;
 309 
 310     if (priv->beingDestroyed) {
 311         VIR_DEBUG("Domain is being destroyed, EOF is expected");
 312         goto unlock;
 313     }
 314 
 315     if (!virDomainObjIsActive(vm)) {
 316         VIR_DEBUG("Domain %p is not active, ignoring EOF", vm);
 317         goto unlock;
 318     }
 319 
 320     if (priv->monJSON && !priv->gotShutdown) {
 321         VIR_DEBUG("Monitor connection to '%s' closed without SHUTDOWN event; "
 322                   "assuming the domain crashed", vm->def->name);
 323         eventReason = VIR_DOMAIN_EVENT_STOPPED_FAILED;
 324         stopReason = VIR_DOMAIN_SHUTOFF_CRASHED;
 325         auditReason = "failed";
 326     }
 327 
 328     event = virDomainEventNewFromObj(vm,
 329                                      VIR_DOMAIN_EVENT_STOPPED,
 330                                      eventReason);
 331     qemuProcessStop(driver, vm, stopReason, 0);
 332     virDomainAuditStop(vm, auditReason);
 333 
 334     if (!vm->persistent) {
 335         qemuDomainRemoveInactive(driver, vm);
 336         goto cleanup;
 337     }
 338 
 339 unlock:
 340     virObjectUnlock(vm);
 341 
 342 cleanup:
 343     if (event)
 344         qemuDomainEventQueue(driver, event);
 345     qemuDriverUnlock(driver);
 346 }

… where we mistakenly stop the domain with the following call:
 331     qemuProcessStop(driver, vm, stopReason, 0);
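
(For completeness, and only as our simplified approximation rather than the actual libvirt shutdown path: the end effect of qemuProcessStop() on a still-running qemu is that the process gets signalled from libvirtd's pid, which matches the "terminating on signal 15 from pid 45766" lines in the per-VM logs quoted below.)

#include <signal.h>
#include <sys/types.h>

/* Simplified sketch (ours, not the real libvirt code path): stopping a
 * running domain ultimately delivers SIGTERM (signal 15) to the qemu
 * process from libvirtd's own pid. */
void stop_qemu_sketch(pid_t qemu_pid)
{
    kill(qemu_pid, SIGTERM);
}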


This flow seems to be the only one possible.
So the question is how a running domain may generate a VIR_EVENT_HANDLE_HANGUP
event.
Could there be a race condition, or some other reason, that causes this state
on libvirtd startup?
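
(One ad-hoc diagnostic we could add locally, purely our own idea and not a libvirt API: when the EOF fires, check whether the qemu process behind the monitor is actually gone. If it is still alive, the HANGUP came from the socket side, e.g. a reconnect race after the libvirtd restart, rather than from qemu exiting.)

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Ad-hoc diagnostic sketch (ours): kill(pid, 0) sends no signal, it only
 * checks whether the process still exists. */
bool qemu_pid_alive(pid_t pid)
{
    if (pid <= 0)
        return false;
    if (kill(pid, 0) == 0)
        return true;               /* process exists */
    return errno == EPERM;         /* exists, but owned by someone else */
}

int main(void)
{
    /* trivial self-check: our own pid is obviously alive */
    printf("self alive: %d\n", qemu_pid_alive(getpid()));
    return 0;
}
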
Thank you for your help.

On 02.03.2013, at 13:35, Igor Lukyanov <i...@lukyanov.org> wrote:

> Hello.
> We are seeing very upsetting behavior in how libvirtd deals with running
> domains after being restarted.
> Namely, within 10-20 s of starting, libvirtd may accidentally kill a random
> number of running (qemu) domains with SIGTERM.
> 
> Here is what we get in the libvirt log:
> 
> 2013-02-27 12:35:10.415+0000: 45777: info : libvirt version: 1.0.2
> 2013-02-27 12:35:10.415+0000: 45777: error : virNWFilterSnoopLeaseFileRefresh:1903 : open("/var/run/libvirt/network/nwfilter.ltmp"): No such file or directory
> 2013-02-27 12:35:21.772+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:21.772+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet34" not in key map
> 2013-02-27 12:35:21.905+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet34: No such device
> 2013-02-27 12:35:22.029+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet35" not in key map
> 2013-02-27 12:35:22.032+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet35: No such device
> 2013-02-27 12:35:22.430+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:22.430+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet4" not in key map
> 2013-02-27 12:35:22.433+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet4: No such device
> 2013-02-27 12:35:22.554+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet5" not in key map
> 2013-02-27 12:35:22.557+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet5: No such device
> 2013-02-27 12:35:22.794+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:22.794+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet10" not in key map
> 2013-02-27 12:35:22.797+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet10: No such device
> 2013-02-27 12:35:22.944+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet11" not in key map
> 2013-02-27 12:35:22.947+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet11: No such device
> 2013-02-27 12:35:23.194+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:23.194+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet14" not in key map
> 2013-02-27 12:35:23.197+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet14: No such device
> 2013-02-27 12:35:23.361+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet15" not in key map
> 2013-02-27 12:35:23.364+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet15: No such device
> 
> 
> And this is what we get by grepping for 45766 in the per-VM logs:
> 
> drafter@10-5-0-11:/var/log/libvirt/qemu$ sudo grep -A 1 45766 * 
> vm010-001-206-002.log:qemu: terminating on signal 15 from pid 45766
> vm010-001-206-002.log-2013-02-27 12:35:22.430+0000: shutting down
> --
> vm010-001-207-002.log:qemu: terminating on signal 15 from pid 45766
> vm010-001-207-002.log-2013-02-27 12:35:22.794+0000: shutting down
> --
> vm010-001-228-006.log:qemu: terminating on signal 15 from pid 45766
> vm010-001-228-006.log-2013-02-27 12:35:23.194+0000: shutting down
> --
> vm010-002-011-002.log:qemu: terminating on signal 15 from pid 45766
> vm010-002-011-002.log-2013-02-27 12:35:21.772+0000: shutting down
> 
> Specifically, libvirtd killed 4 VMs here right after start.
> Before that we had simply shut libvirtd off and started it again (stop/start).
> 
> We will be very grateful for any ideas on this matter.
> --
> wbr, Igor Lukyanov


