Re: [libvirt] [PATCH] fix deadlock when qemu cannot start

2012-03-30 Thread Serge Hallyn
Quoting Daniel Veillard (veill...@redhat.com):
> On Fri, Mar 30, 2012 at 08:45:35AM -0500, Serge Hallyn wrote:
> > Quoting Daniel Veillard (veill...@redhat.com):
> > > On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote:
> > > > When qemu cannot start, we may call qemuProcessStop() twice.
> > > > We have check whether the vm is running at the beginning of
> > > > qemuProcessStop() to avoid libvirt deadlock. We call
> > > > qemuProcessStop() with driver and vm locked. It seems that
> > > > we can avoid libvirt deadlock. But unfortunately we may
> > > > unlock driver and vm in the function qemuProcessKill() while
> > > > vm->def->id is not -1. So qemuProcessStop() will be run twice,
> > > > and monitor will be freed unexpectedly. So we should set
> > > > vm->def->id to -1 at the beginning of qemuProcessStop().
> > 
> > Oh, uh, Huh.  This seems like it could be responsible for what I was
> > reporting a few days ago (*1).  I spent most of yesterday trying to
> > track it down, only to finally realize that the QemuProcessStart would
> > silently die at various places.  So i was getting ready to send an email
> > postulating that what's happening is that a virsh list starts, then
> > a virsh start starts, and when the virsh list ends it somehow causes
> > the virsh start to be killed.
> > 
> > I'll test and see if this patch fixes it.
> 
>   I guess I need to resurect my patch checker, tune it and
> make it send warning on patches not ACK'ed or reviewed automatically,
> or even better make it a web page to avoid boring people.
>   http://libvirt.org/git/?p=patchchecker.git;a=summary
> 
>   I had that patch on my radar and though I took an awful long time
> to review it it's there.
> 
>   BTW I'm late for rc2, I will make it tomorrow I guess, sorry about
> it this is due to events out of my control :-\
> 
> Daniel
> 

Alas, this patch (by itself) doesn't fix the issue for me.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] fix deadlock when qemu cannot start

2012-03-30 Thread Daniel Veillard
On Fri, Mar 30, 2012 at 08:45:35AM -0500, Serge Hallyn wrote:
> Quoting Daniel Veillard (veill...@redhat.com):
> > On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote:
> > > When qemu cannot start, we may call qemuProcessStop() twice.
> > > We have check whether the vm is running at the beginning of
> > > qemuProcessStop() to avoid libvirt deadlock. We call
> > > qemuProcessStop() with driver and vm locked. It seems that
> > > we can avoid libvirt deadlock. But unfortunately we may
> > > unlock driver and vm in the function qemuProcessKill() while
> > > vm->def->id is not -1. So qemuProcessStop() will be run twice,
> > > and monitor will be freed unexpectedly. So we should set
> > > vm->def->id to -1 at the beginning of qemuProcessStop().
> 
> Oh, uh, Huh.  This seems like it could be responsible for what I was
> reporting a few days ago (*1).  I spent most of yesterday trying to
> track it down, only to finally realize that the QemuProcessStart would
> silently die at various places.  So i was getting ready to send an email
> postulating that what's happening is that a virsh list starts, then
> a virsh start starts, and when the virsh list ends it somehow causes
> the virsh start to be killed.
> 
> I'll test and see if this patch fixes it.

  I guess I need to resurect my patch checker, tune it and
make it send warning on patches not ACK'ed or reviewed automatically,
or even better make it a web page to avoid boring people.
  http://libvirt.org/git/?p=patchchecker.git;a=summary

  I had that patch on my radar and though I took an awful long time
to review it it's there.

  BTW I'm late for rc2, I will make it tomorrow I guess, sorry about
it this is due to events out of my control :-\

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] fix deadlock when qemu cannot start

2012-03-30 Thread Serge Hallyn
Quoting Daniel Veillard (veill...@redhat.com):
> On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote:
> > When qemu cannot start, we may call qemuProcessStop() twice.
> > We have check whether the vm is running at the beginning of
> > qemuProcessStop() to avoid libvirt deadlock. We call
> > qemuProcessStop() with driver and vm locked. It seems that
> > we can avoid libvirt deadlock. But unfortunately we may
> > unlock driver and vm in the function qemuProcessKill() while
> > vm->def->id is not -1. So qemuProcessStop() will be run twice,
> > and monitor will be freed unexpectedly. So we should set
> > vm->def->id to -1 at the beginning of qemuProcessStop().

Oh, uh, Huh.  This seems like it could be responsible for what I was
reporting a few days ago (*1).  I spent most of yesterday trying to
track it down, only to finally realize that the QemuProcessStart would
silently die at various places.  So i was getting ready to send an email
postulating that what's happening is that a virsh list starts, then
a virsh start starts, and when the virsh list ends it somehow causes
the virsh start to be killed.

I'll test and see if this patch fixes it.

thanks,
-serge

*1: If I create 4 vms and do
for i in `seq 1 4`; do virsh start o$i > /tmp/o$i 2>&1 &; done; for i in `seq 1 
4`; do virsh list > /dev/null 2>&1; done; sleep 1; virsh list
most of the time at least one vm won't start.

> > ---
> >  src/qemu/qemu_process.c |   20 ++--
> >  src/qemu/qemu_process.h |2 ++
> >  2 files changed, 16 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 0af3751..44814df 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -3633,10 +3633,11 @@ qemuProcessKill(struct qemud_driver *driver,
> >  VIR_DEBUG("vm=%s pid=%d flags=%x",
> >vm->def->name, vm->pid, flags);
> >  
> > -if (!virDomainObjIsActive(vm)) {
> > -VIR_DEBUG("VM '%s' not active", vm->def->name);
> > -return 0;
> > -}
> > +if (!(flags & VIR_QEMU_PROCESS_KILL_NOCHECK))
> > +if (!virDomainObjIsActive(vm)) {
> > +VIR_DEBUG("VM '%s' not active", vm->def->name);
> > +return 0;
> > +}
> >  
> >  /* This loop sends SIGTERM (or SIGKILL if flags has
> >   * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT),
> > @@ -3739,6 +3740,13 @@ void qemuProcessStop(struct qemud_driver *driver,
> >  return;
> >  }
> >  
> > +/*
> > + * We may unlock driver and vm in qemuProcessKill(), so the other 
> > thread
> > + * can lock driver and vm, and then call qemuProcessStop(). So we 
> > should
> > + * set vm->def->id to -1 here to avoid qemuProcessStop() called twice.
> > + */
> > +vm->def->id = -1;
> > +
> >  if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) {
> >  /* To not break the normal domain shutdown process, skip the
> >   * timestamp log writing if failed on opening log file. */
> > @@ -3801,7 +3809,8 @@ void qemuProcessStop(struct qemud_driver *driver,
> >  }
> >  
> >  /* shut it off for sure */
> > -ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE));
> > +ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE|
> > + 
> > VIR_QEMU_PROCESS_KILL_NOCHECK));
> >  
> >  /* Stop autodestroy in case guest is restarted */
> >  qemuProcessAutoDestroyRemove(driver, vm);
> > @@ -3892,7 +3901,6 @@ retry:
> >  
> >  vm->taint = 0;
> >  vm->pid = -1;
> > -vm->def->id = -1;
> >  virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
> >  VIR_FREE(priv->vcpupids);
> >  priv->nvcpupids = 0;
> > diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
> > index 761db6f..891f7a5 100644
> > --- a/src/qemu/qemu_process.h
> > +++ b/src/qemu/qemu_process.h
> > @@ -72,6 +72,8 @@ int qemuProcessAttach(virConnectPtr conn,
> >  typedef enum {
> > VIR_QEMU_PROCESS_KILL_FORCE  = 1 << 0,
> > VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1,
> > +   VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* donot check whether the vm is
> > +  running */
> >  } virQemuProcessKillMode;
> >  
> >  int qemuProcessKill(struct qemud_driver *driver,
> 
>   Hi Wen,
> 
> sorry for the delay in reviewing the patch. I think I understand the
> issue, and removing the pid and using an extra flag to qemuProcessKill
> to avoid the race sounds reasonable. I rebased the patch a bit, made
> some cosmetic changes especially on the comments and pushed it so it is
> included in rc2 for more testing,
> 
>   thanks !
> 
> Daniel
> 
> -- 
> Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
> dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
> http://veillard.com/ | virtualization library  http://libvirt.org/
> 
> --
> libvir-list mailing list
> libvir-list@

Re: [libvirt] [PATCH] fix deadlock when qemu cannot start

2012-03-29 Thread Daniel Veillard
On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote:
> When qemu cannot start, we may call qemuProcessStop() twice.
> We have check whether the vm is running at the beginning of
> qemuProcessStop() to avoid libvirt deadlock. We call
> qemuProcessStop() with driver and vm locked. It seems that
> we can avoid libvirt deadlock. But unfortunately we may
> unlock driver and vm in the function qemuProcessKill() while
> vm->def->id is not -1. So qemuProcessStop() will be run twice,
> and monitor will be freed unexpectedly. So we should set
> vm->def->id to -1 at the beginning of qemuProcessStop().
> 
> ---
>  src/qemu/qemu_process.c |   20 ++--
>  src/qemu/qemu_process.h |2 ++
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 0af3751..44814df 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -3633,10 +3633,11 @@ qemuProcessKill(struct qemud_driver *driver,
>  VIR_DEBUG("vm=%s pid=%d flags=%x",
>vm->def->name, vm->pid, flags);
>  
> -if (!virDomainObjIsActive(vm)) {
> -VIR_DEBUG("VM '%s' not active", vm->def->name);
> -return 0;
> -}
> +if (!(flags & VIR_QEMU_PROCESS_KILL_NOCHECK))
> +if (!virDomainObjIsActive(vm)) {
> +VIR_DEBUG("VM '%s' not active", vm->def->name);
> +return 0;
> +}
>  
>  /* This loop sends SIGTERM (or SIGKILL if flags has
>   * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT),
> @@ -3739,6 +3740,13 @@ void qemuProcessStop(struct qemud_driver *driver,
>  return;
>  }
>  
> +/*
> + * We may unlock driver and vm in qemuProcessKill(), so the other thread
> + * can lock driver and vm, and then call qemuProcessStop(). So we should
> + * set vm->def->id to -1 here to avoid qemuProcessStop() called twice.
> + */
> +vm->def->id = -1;
> +
>  if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) {
>  /* To not break the normal domain shutdown process, skip the
>   * timestamp log writing if failed on opening log file. */
> @@ -3801,7 +3809,8 @@ void qemuProcessStop(struct qemud_driver *driver,
>  }
>  
>  /* shut it off for sure */
> -ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE));
> +ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE|
> + VIR_QEMU_PROCESS_KILL_NOCHECK));
>  
>  /* Stop autodestroy in case guest is restarted */
>  qemuProcessAutoDestroyRemove(driver, vm);
> @@ -3892,7 +3901,6 @@ retry:
>  
>  vm->taint = 0;
>  vm->pid = -1;
> -vm->def->id = -1;
>  virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
>  VIR_FREE(priv->vcpupids);
>  priv->nvcpupids = 0;
> diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
> index 761db6f..891f7a5 100644
> --- a/src/qemu/qemu_process.h
> +++ b/src/qemu/qemu_process.h
> @@ -72,6 +72,8 @@ int qemuProcessAttach(virConnectPtr conn,
>  typedef enum {
> VIR_QEMU_PROCESS_KILL_FORCE  = 1 << 0,
> VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1,
> +   VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* donot check whether the vm is
> +  running */
>  } virQemuProcessKillMode;
>  
>  int qemuProcessKill(struct qemud_driver *driver,

  Hi Wen,

sorry for the delay in reviewing the patch. I think I understand the
issue, and removing the pid and using an extra flag to qemuProcessKill
to avoid the race sounds reasonable. I rebased the patch a bit, made
some cosmetic changes especially on the comments and pushed it so it is
included in rc2 for more testing,

  thanks !

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] fix deadlock when qemu cannot start

2012-03-25 Thread Wen Congyang
ping

At 03/16/2012 02:37 PM, Wen Congyang Wrote:
> When qemu cannot start, we may call qemuProcessStop() twice.
> We have check whether the vm is running at the beginning of
> qemuProcessStop() to avoid libvirt deadlock. We call
> qemuProcessStop() with driver and vm locked. It seems that
> we can avoid libvirt deadlock. But unfortunately we may
> unlock driver and vm in the function qemuProcessKill() while
> vm->def->id is not -1. So qemuProcessStop() will be run twice,
> and monitor will be freed unexpectedly. So we should set
> vm->def->id to -1 at the beginning of qemuProcessStop().
> 
> ---
>  src/qemu/qemu_process.c |   20 ++--
>  src/qemu/qemu_process.h |2 ++
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 0af3751..44814df 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -3633,10 +3633,11 @@ qemuProcessKill(struct qemud_driver *driver,
>  VIR_DEBUG("vm=%s pid=%d flags=%x",
>vm->def->name, vm->pid, flags);
>  
> -if (!virDomainObjIsActive(vm)) {
> -VIR_DEBUG("VM '%s' not active", vm->def->name);
> -return 0;
> -}
> +if (!(flags & VIR_QEMU_PROCESS_KILL_NOCHECK))
> +if (!virDomainObjIsActive(vm)) {
> +VIR_DEBUG("VM '%s' not active", vm->def->name);
> +return 0;
> +}
>  
>  /* This loop sends SIGTERM (or SIGKILL if flags has
>   * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT),
> @@ -3739,6 +3740,13 @@ void qemuProcessStop(struct qemud_driver *driver,
>  return;
>  }
>  
> +/*
> + * We may unlock driver and vm in qemuProcessKill(), so the other thread
> + * can lock driver and vm, and then call qemuProcessStop(). So we should
> + * set vm->def->id to -1 here to avoid qemuProcessStop() called twice.
> + */
> +vm->def->id = -1;
> +
>  if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) {
>  /* To not break the normal domain shutdown process, skip the
>   * timestamp log writing if failed on opening log file. */
> @@ -3801,7 +3809,8 @@ void qemuProcessStop(struct qemud_driver *driver,
>  }
>  
>  /* shut it off for sure */
> -ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE));
> +ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE|
> + VIR_QEMU_PROCESS_KILL_NOCHECK));
>  
>  /* Stop autodestroy in case guest is restarted */
>  qemuProcessAutoDestroyRemove(driver, vm);
> @@ -3892,7 +3901,6 @@ retry:
>  
>  vm->taint = 0;
>  vm->pid = -1;
> -vm->def->id = -1;
>  virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
>  VIR_FREE(priv->vcpupids);
>  priv->nvcpupids = 0;
> diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
> index 761db6f..891f7a5 100644
> --- a/src/qemu/qemu_process.h
> +++ b/src/qemu/qemu_process.h
> @@ -72,6 +72,8 @@ int qemuProcessAttach(virConnectPtr conn,
>  typedef enum {
> VIR_QEMU_PROCESS_KILL_FORCE  = 1 << 0,
> VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1,
> +   VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* donot check whether the vm is
> +  running */
>  } virQemuProcessKillMode;
>  
>  int qemuProcessKill(struct qemud_driver *driver,

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list