Re: [libvirt] [PATCH] fix deadlock when qemu cannot start
Quoting Daniel Veillard (veill...@redhat.com): > On Fri, Mar 30, 2012 at 08:45:35AM -0500, Serge Hallyn wrote: > > Quoting Daniel Veillard (veill...@redhat.com): > > > On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote: > > > > When qemu cannot start, we may call qemuProcessStop() twice. > > > > We have check whether the vm is running at the beginning of > > > > qemuProcessStop() to avoid libvirt deadlock. We call > > > > qemuProcessStop() with driver and vm locked. It seems that > > > > we can avoid libvirt deadlock. But unfortunately we may > > > > unlock driver and vm in the function qemuProcessKill() while > > > > vm->def->id is not -1. So qemuProcessStop() will be run twice, > > > > and monitor will be freed unexpectedly. So we should set > > > > vm->def->id to -1 at the beginning of qemuProcessStop(). > > > > Oh, uh, Huh. This seems like it could be responsible for what I was > > reporting a few days ago (*1). I spent most of yesterday trying to > > track it down, only to finally realize that the QemuProcessStart would > > silently die at various places. So i was getting ready to send an email > > postulating that what's happening is that a virsh list starts, then > > a virsh start starts, and when the virsh list ends it somehow causes > > the virsh start to be killed. > > > > I'll test and see if this patch fixes it. > > I guess I need to resurect my patch checker, tune it and > make it send warning on patches not ACK'ed or reviewed automatically, > or even better make it a web page to avoid boring people. > http://libvirt.org/git/?p=patchchecker.git;a=summary > > I had that patch on my radar and though I took an awful long time > to review it it's there. > > BTW I'm late for rc2, I will make it tomorrow I guess, sorry about > it this is due to events out of my control :-\ > > Daniel > Alas, this patch (by itself) doesn't fix the issue for me. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] fix deadlock when qemu cannot start
On Fri, Mar 30, 2012 at 08:45:35AM -0500, Serge Hallyn wrote: > Quoting Daniel Veillard (veill...@redhat.com): > > On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote: > > > When qemu cannot start, we may call qemuProcessStop() twice. > > > We have check whether the vm is running at the beginning of > > > qemuProcessStop() to avoid libvirt deadlock. We call > > > qemuProcessStop() with driver and vm locked. It seems that > > > we can avoid libvirt deadlock. But unfortunately we may > > > unlock driver and vm in the function qemuProcessKill() while > > > vm->def->id is not -1. So qemuProcessStop() will be run twice, > > > and monitor will be freed unexpectedly. So we should set > > > vm->def->id to -1 at the beginning of qemuProcessStop(). > > Oh, uh, Huh. This seems like it could be responsible for what I was > reporting a few days ago (*1). I spent most of yesterday trying to > track it down, only to finally realize that the QemuProcessStart would > silently die at various places. So i was getting ready to send an email > postulating that what's happening is that a virsh list starts, then > a virsh start starts, and when the virsh list ends it somehow causes > the virsh start to be killed. > > I'll test and see if this patch fixes it. I guess I need to resurect my patch checker, tune it and make it send warning on patches not ACK'ed or reviewed automatically, or even better make it a web page to avoid boring people. http://libvirt.org/git/?p=patchchecker.git;a=summary I had that patch on my radar and though I took an awful long time to review it it's there. BTW I'm late for rc2, I will make it tomorrow I guess, sorry about it this is due to events out of my control :-\ Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ dan...@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] fix deadlock when qemu cannot start
Quoting Daniel Veillard (veill...@redhat.com): > On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote: > > When qemu cannot start, we may call qemuProcessStop() twice. > > We have check whether the vm is running at the beginning of > > qemuProcessStop() to avoid libvirt deadlock. We call > > qemuProcessStop() with driver and vm locked. It seems that > > we can avoid libvirt deadlock. But unfortunately we may > > unlock driver and vm in the function qemuProcessKill() while > > vm->def->id is not -1. So qemuProcessStop() will be run twice, > > and monitor will be freed unexpectedly. So we should set > > vm->def->id to -1 at the beginning of qemuProcessStop(). Oh, uh, Huh. This seems like it could be responsible for what I was reporting a few days ago (*1). I spent most of yesterday trying to track it down, only to finally realize that the QemuProcessStart would silently die at various places. So i was getting ready to send an email postulating that what's happening is that a virsh list starts, then a virsh start starts, and when the virsh list ends it somehow causes the virsh start to be killed. I'll test and see if this patch fixes it. thanks, -serge *1: If I create 4 vms and do for i in `seq 1 4`; do virsh start o$i > /tmp/o$i 2>&1 &; done; for i in `seq 1 4`; do virsh list > /dev/null 2>&1; done; sleep 1; virsh list most of the time at least one vm won't start. > > --- > > src/qemu/qemu_process.c | 20 ++-- > > src/qemu/qemu_process.h |2 ++ > > 2 files changed, 16 insertions(+), 6 deletions(-) > > > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > > index 0af3751..44814df 100644 > > --- a/src/qemu/qemu_process.c > > +++ b/src/qemu/qemu_process.c > > @@ -3633,10 +3633,11 @@ qemuProcessKill(struct qemud_driver *driver, > > VIR_DEBUG("vm=%s pid=%d flags=%x", > >vm->def->name, vm->pid, flags); > > > > -if (!virDomainObjIsActive(vm)) { > > -VIR_DEBUG("VM '%s' not active", vm->def->name); > > -return 0; > > -} > > +if (!(flags & VIR_QEMU_PROCESS_KILL_NOCHECK)) > > +if (!virDomainObjIsActive(vm)) { > > +VIR_DEBUG("VM '%s' not active", vm->def->name); > > +return 0; > > +} > > > > /* This loop sends SIGTERM (or SIGKILL if flags has > > * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT), > > @@ -3739,6 +3740,13 @@ void qemuProcessStop(struct qemud_driver *driver, > > return; > > } > > > > +/* > > + * We may unlock driver and vm in qemuProcessKill(), so the other > > thread > > + * can lock driver and vm, and then call qemuProcessStop(). So we > > should > > + * set vm->def->id to -1 here to avoid qemuProcessStop() called twice. > > + */ > > +vm->def->id = -1; > > + > > if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) { > > /* To not break the normal domain shutdown process, skip the > > * timestamp log writing if failed on opening log file. */ > > @@ -3801,7 +3809,8 @@ void qemuProcessStop(struct qemud_driver *driver, > > } > > > > /* shut it off for sure */ > > -ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE)); > > +ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE| > > + > > VIR_QEMU_PROCESS_KILL_NOCHECK)); > > > > /* Stop autodestroy in case guest is restarted */ > > qemuProcessAutoDestroyRemove(driver, vm); > > @@ -3892,7 +3901,6 @@ retry: > > > > vm->taint = 0; > > vm->pid = -1; > > -vm->def->id = -1; > > virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); > > VIR_FREE(priv->vcpupids); > > priv->nvcpupids = 0; > > diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h > > index 761db6f..891f7a5 100644 > > --- a/src/qemu/qemu_process.h > > +++ b/src/qemu/qemu_process.h > > @@ -72,6 +72,8 @@ int qemuProcessAttach(virConnectPtr conn, > > typedef enum { > > VIR_QEMU_PROCESS_KILL_FORCE = 1 << 0, > > VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1, > > + VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* donot check whether the vm is > > + running */ > > } virQemuProcessKillMode; > > > > int qemuProcessKill(struct qemud_driver *driver, > > Hi Wen, > > sorry for the delay in reviewing the patch. I think I understand the > issue, and removing the pid and using an extra flag to qemuProcessKill > to avoid the race sounds reasonable. I rebased the patch a bit, made > some cosmetic changes especially on the comments and pushed it so it is > included in rc2 for more testing, > > thanks ! > > Daniel > > -- > Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ > dan...@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ > http://veillard.com/ | virtualization library http://libvirt.org/ > > -- > libvir-list mailing list > libvir-list@
Re: [libvirt] [PATCH] fix deadlock when qemu cannot start
On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote: > When qemu cannot start, we may call qemuProcessStop() twice. > We have check whether the vm is running at the beginning of > qemuProcessStop() to avoid libvirt deadlock. We call > qemuProcessStop() with driver and vm locked. It seems that > we can avoid libvirt deadlock. But unfortunately we may > unlock driver and vm in the function qemuProcessKill() while > vm->def->id is not -1. So qemuProcessStop() will be run twice, > and monitor will be freed unexpectedly. So we should set > vm->def->id to -1 at the beginning of qemuProcessStop(). > > --- > src/qemu/qemu_process.c | 20 ++-- > src/qemu/qemu_process.h |2 ++ > 2 files changed, 16 insertions(+), 6 deletions(-) > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > index 0af3751..44814df 100644 > --- a/src/qemu/qemu_process.c > +++ b/src/qemu/qemu_process.c > @@ -3633,10 +3633,11 @@ qemuProcessKill(struct qemud_driver *driver, > VIR_DEBUG("vm=%s pid=%d flags=%x", >vm->def->name, vm->pid, flags); > > -if (!virDomainObjIsActive(vm)) { > -VIR_DEBUG("VM '%s' not active", vm->def->name); > -return 0; > -} > +if (!(flags & VIR_QEMU_PROCESS_KILL_NOCHECK)) > +if (!virDomainObjIsActive(vm)) { > +VIR_DEBUG("VM '%s' not active", vm->def->name); > +return 0; > +} > > /* This loop sends SIGTERM (or SIGKILL if flags has > * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT), > @@ -3739,6 +3740,13 @@ void qemuProcessStop(struct qemud_driver *driver, > return; > } > > +/* > + * We may unlock driver and vm in qemuProcessKill(), so the other thread > + * can lock driver and vm, and then call qemuProcessStop(). So we should > + * set vm->def->id to -1 here to avoid qemuProcessStop() called twice. > + */ > +vm->def->id = -1; > + > if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) { > /* To not break the normal domain shutdown process, skip the > * timestamp log writing if failed on opening log file. */ > @@ -3801,7 +3809,8 @@ void qemuProcessStop(struct qemud_driver *driver, > } > > /* shut it off for sure */ > -ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE)); > +ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE| > + VIR_QEMU_PROCESS_KILL_NOCHECK)); > > /* Stop autodestroy in case guest is restarted */ > qemuProcessAutoDestroyRemove(driver, vm); > @@ -3892,7 +3901,6 @@ retry: > > vm->taint = 0; > vm->pid = -1; > -vm->def->id = -1; > virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); > VIR_FREE(priv->vcpupids); > priv->nvcpupids = 0; > diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h > index 761db6f..891f7a5 100644 > --- a/src/qemu/qemu_process.h > +++ b/src/qemu/qemu_process.h > @@ -72,6 +72,8 @@ int qemuProcessAttach(virConnectPtr conn, > typedef enum { > VIR_QEMU_PROCESS_KILL_FORCE = 1 << 0, > VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1, > + VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* donot check whether the vm is > + running */ > } virQemuProcessKillMode; > > int qemuProcessKill(struct qemud_driver *driver, Hi Wen, sorry for the delay in reviewing the patch. I think I understand the issue, and removing the pid and using an extra flag to qemuProcessKill to avoid the race sounds reasonable. I rebased the patch a bit, made some cosmetic changes especially on the comments and pushed it so it is included in rc2 for more testing, thanks ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ dan...@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] fix deadlock when qemu cannot start
ping At 03/16/2012 02:37 PM, Wen Congyang Wrote: > When qemu cannot start, we may call qemuProcessStop() twice. > We have check whether the vm is running at the beginning of > qemuProcessStop() to avoid libvirt deadlock. We call > qemuProcessStop() with driver and vm locked. It seems that > we can avoid libvirt deadlock. But unfortunately we may > unlock driver and vm in the function qemuProcessKill() while > vm->def->id is not -1. So qemuProcessStop() will be run twice, > and monitor will be freed unexpectedly. So we should set > vm->def->id to -1 at the beginning of qemuProcessStop(). > > --- > src/qemu/qemu_process.c | 20 ++-- > src/qemu/qemu_process.h |2 ++ > 2 files changed, 16 insertions(+), 6 deletions(-) > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > index 0af3751..44814df 100644 > --- a/src/qemu/qemu_process.c > +++ b/src/qemu/qemu_process.c > @@ -3633,10 +3633,11 @@ qemuProcessKill(struct qemud_driver *driver, > VIR_DEBUG("vm=%s pid=%d flags=%x", >vm->def->name, vm->pid, flags); > > -if (!virDomainObjIsActive(vm)) { > -VIR_DEBUG("VM '%s' not active", vm->def->name); > -return 0; > -} > +if (!(flags & VIR_QEMU_PROCESS_KILL_NOCHECK)) > +if (!virDomainObjIsActive(vm)) { > +VIR_DEBUG("VM '%s' not active", vm->def->name); > +return 0; > +} > > /* This loop sends SIGTERM (or SIGKILL if flags has > * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT), > @@ -3739,6 +3740,13 @@ void qemuProcessStop(struct qemud_driver *driver, > return; > } > > +/* > + * We may unlock driver and vm in qemuProcessKill(), so the other thread > + * can lock driver and vm, and then call qemuProcessStop(). So we should > + * set vm->def->id to -1 here to avoid qemuProcessStop() called twice. > + */ > +vm->def->id = -1; > + > if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) { > /* To not break the normal domain shutdown process, skip the > * timestamp log writing if failed on opening log file. */ > @@ -3801,7 +3809,8 @@ void qemuProcessStop(struct qemud_driver *driver, > } > > /* shut it off for sure */ > -ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE)); > +ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE| > + VIR_QEMU_PROCESS_KILL_NOCHECK)); > > /* Stop autodestroy in case guest is restarted */ > qemuProcessAutoDestroyRemove(driver, vm); > @@ -3892,7 +3901,6 @@ retry: > > vm->taint = 0; > vm->pid = -1; > -vm->def->id = -1; > virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); > VIR_FREE(priv->vcpupids); > priv->nvcpupids = 0; > diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h > index 761db6f..891f7a5 100644 > --- a/src/qemu/qemu_process.h > +++ b/src/qemu/qemu_process.h > @@ -72,6 +72,8 @@ int qemuProcessAttach(virConnectPtr conn, > typedef enum { > VIR_QEMU_PROCESS_KILL_FORCE = 1 << 0, > VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1, > + VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* donot check whether the vm is > + running */ > } virQemuProcessKillMode; > > int qemuProcessKill(struct qemud_driver *driver, -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list