Re: [libvirt] Failed to terminate process 1275 with SIGTERM: Device or resource busy

Richard W.M. Jones Tue, 19 Jan 2016 05:42:14 -0800

On Tue, Jan 19, 2016 at 12:31:48PM +0100, Kashyap Chamarthy wrote:
> On Mon, Jan 18, 2016 at 04:19:58PM +0000, Richard W.M. Jones wrote:
> > On Mon, Jan 18, 2016 at 03:33:25PM +0000, Richard W.M. Jones wrote:
> > > I tried another workaround which was to get virt-resize to fsync the
> > > output file before closing the libvirt connection, but that doesn't
> > > work for reasons I don't understand so far - still studying this.
> > 
> > I worked out what was happening here -- I'd inserted the fsync at the
> > wrong place in virt-resize.  So I have now successfully worked around
> > this for the virt-resize case, however it's still a problem that could
> > manifest itself in other uses of libvirt + qemu + slow devices.
> 
> We've seen the "Failed to terminate process 1275 with SIGTERM: Device or
> resource busy" error occur in context of OpenStack as well[1][2].
> 
> The behavior is from virDomainDestroy() API (src/libvirt-domain.c):
> 
>     [...]
>     * virDomainDestroy first requests that a guest terminate (e.g.
>     * SIGTERM), then waits for it to comply. After a reasonable timeout,
>     * if the guest still exists, virDomainDestroy will forcefully
>     * terminate the guest (e.g. SIGKILL) if necessary (which may produce
>     * undesirable results, for example unflushed disk cache in the
>     * guest). To avoid this possibility, it's recommended to instead
>     * call virDomainDestroyFlags, sending the
>     * VIR_DOMAIN_DESTROY_GRACEFUL flag.
>     [...]
> 
> Dan Berrange explains[1]:
> 
>   There are two reasons why you'd get this failure ("Failed to terminate
>   process: Device or resource busy") from libvirt. 
>    
>     - The host is so overloaded that the kernel was not able to clean up
>       the process in the time that libvirt was prepared to wait. If this
>       is the case, the process should eventually go away on its own
>       after a short while longer and everything should return to normal
> 
>     - There is some problem, causing the process to get stuck in an
>       uninterruptible wait state. This is usually due to something going
>       wrong in the storage stack, causing some I/O read/write operation
>       to hang in kernel space. In this case the process will stay around
>       in the zombie state forever, or until the storage problem is
>       resolved.


Thanks for finding this documentation.

The problem with this theory is that libguestfs already passes the
VIR_DOMAIN_DESTROY_GRACEFUL flag, which would suggest that the flag
itself is buggy.
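
To make the documented semantics concrete, here is a rough Python model
of the terminate-then-wait sequence described in the virDomainDestroy
comment quoted above.  It is only a sketch, not libvirt's code: the
names terminate() and spawn_stubborn_child() and the timeout values are
made up for illustration, and the EBUSY error merely mirrors the
"Device or resource busy" text in the message.

```python
import errno
import signal
import subprocess
import sys
import time


def terminate(proc, graceful, term_wait=1.0, kill_wait=1.0, poll=0.05):
    """Send SIGTERM, wait, and escalate to SIGKILL unless graceful.

    Returns the name of the signal that ended the process.  Raises
    OSError(EBUSY) if the process is still alive after the wait.
    The timeouts are placeholders, not libvirt's real values.
    """
    def wait_for_exit(timeout):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if proc.poll() is not None:
                return True
            time.sleep(poll)
        return proc.poll() is not None

    proc.send_signal(signal.SIGTERM)
    if wait_for_exit(term_wait):
        return "SIGTERM"
    if graceful:
        # Graceful mode never escalates; it reports failure instead.
        raise OSError(errno.EBUSY,
                      f"Failed to terminate process {proc.pid} with SIGTERM")
    proc.kill()  # SIGKILL
    if wait_for_exit(kill_wait):
        return "SIGKILL"
    raise OSError(errno.EBUSY,
                  f"Failed to terminate process {proc.pid} with SIGKILL")


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM (stand-in for a stuck process)."""
    code = ("import signal, time; "
            "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
            "print('ready', flush=True); time.sleep(60)")
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the handler is installed
    return proc
```

Calling terminate(spawn_stubborn_child(), graceful=True) raises the
EBUSY error after the SIGTERM wait, while graceful=False goes on to
SIGKILL.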

I think what we need is a test case, so here goes.  Note you must run
these steps as *non-root*.

(1) Download the attachment to /var/tmp

(2) chmod +x /var/tmp/qemu.sh

(3) killall libvirtd             ;# kills the session libvirtd

(4) LIBGUESTFS_HV=/var/tmp/qemu.sh guestfish -N fs exit -vx

You should see at the end of the output:

libguestfs: calling virDomainDestroy "guestfs-q94hsiz89t8jp418" flags=VIR_DOMAIN_DESTROY_GRACEFUL
[pause of a few seconds]
libguestfs: error: could not destroy libvirt domain: Failed to terminate process 11412 with SIGTERM: Device or resource busy [code=38 domain=0]

If someone else can reproduce this, then I will file a bug.
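
When reproducing this, it may help to tell the two cases in the quoted
explanation apart by looking at the state of the qemu process while
libvirt is waiting.  A minimal sketch, assuming a Linux host with /proc
mounted; the function names are hypothetical:

```python
def parse_proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The second field (comm) is parenthesised and may itself contain
    # spaces or ')', so split only after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def describe_state(state):
    """Map a state letter to a short description."""
    names = {
        "R": "running",
        "S": "interruptible sleep",
        "D": "uninterruptible sleep (usually blocked on I/O)",
        "Z": "zombie",
        "T": "stopped",
    }
    return names.get(state, "other")


def read_proc_state(pid):
    """Read the current state letter of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_state(f.read())
```

A result of "D" from read_proc_state() on the qemu PID would point at
the stuck-I/O case; "R" or "S" would suggest the process is merely slow
to exit.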

Rich.


> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1205647 --
>     nova.virt.libvirt.driver fails to shutdown reboot instance with
>     error 'Code=38 Error=Failed to terminate process 4260 with SIGKILL:
>     Device or resource busy' 
> [2] https://bugs.launchpad.net/nova/+bug/1353939 -- Rescue fails with
>     'Failed to terminate process: Device or resource busy' in the n-cpu
>     log
> 
> -- 
> /kashyap

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

qemu.sh
Description: Bourne shell script
