Re: [Qemu-discuss] Fwd: qemu VM cannot be killed
Does this represent a security vulnerability (or two)?

Specifically:

- If this can be triggered by code running in the Guest OS (as opposed to
  via the qemu commandline/config) it could be considered a violation of
  the sandboxing.

- If this can be done when qemu is not started by root, it could be
  considered a violation of basic kernel security on the host, since a
  non-root process should have no way to cause a kernel hang that cannot
  be resolved by kill -9, regardless if the related syscalls are made by
  a qemu binary or by a custom exploit program.

(Note: Such vulnerabilities would not normally be discussed in public,
but since the report is already public there is no further harm).

On 15/11/2017 15:38, Michael S. Tsirkin wrote:

Yes - I suspect a packet is stuck somewhere in networking stack.
This is what vhost is waiting for.

Yes, host reboot is the only way out.

RHEL disables zero copy tx in vhost to avoid these issues.

On Wed, Nov 15, 2017 at 02:55:32PM +0100, Lukáš Kubín wrote:

CC-ing Michael and Jason as I was suggested in OFTC:#virt forum. Thanks!

-- Forwarded message --
From: Lukáš Kubín
Date: Wed, Nov 15, 2017 at 1:39 PM
Subject: qemu VM cannot be killed
To: qemu-discuss@nongnu.org

Hi, we've experienced an issue with kvm instance which got stuck at reboot.
It's an OpenStack environment, with OpenContrail networking (vrouter agent
running on host), Ubuntu 16.04.

Machine was first called to reboot from guest OS by user, had issues with NFS
unmount during that, user sent a hard-reboot call from OpenStack again then.
Then we (platform operator) got involved, tried to "virsh destroy" it with
this output:

error: Failed to destroy domain instance-4243
error: Failed to terminate process 140529 with SIGKILL: Device or resource busy

Neither "kill -9" sent to the qemu process helped.

Good guys at OFTC:#virt have guided me to collect the following traces and ask
for help here:

# cat /proc/140529/wchan
vhost_net_ubuf_put_and_wait

# cat /proc/140529/stack
[] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
[] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
[] do_vfs_ioctl+0xa1/0x5f0
[] SyS_ioctl+0x79/0x90
[] entry_SYSCALL_64_fastpath+0x1e/0xa8
[] 0x

The versions we use are:

• kernel 4.8.0-41-generic
• qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
• libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1

What can be the cause for this error? What can we do in such a situation to
destroy the VM - is physical server reboot the only option?

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Re: [Qemu-discuss] Fwd: qemu VM cannot be killed
OK, thanks Michael! We'll consider disabling it too.

Lukas

On Wed, Nov 15, 2017 at 3:38 PM, Michael S. Tsirkin wrote:

> Yes - I suspect a packet is stuck somewhere in networking stack.
> This is what vhost is waiting for.
>
> Yes, host reboot is the only way out.
>
> RHEL disables zero copy tx in vhost to avoid these issues.
>
> On Wed, Nov 15, 2017 at 02:55:32PM +0100, Lukáš Kubín wrote:
> > CC-ing Michael and Jason as I was suggested in OFTC:#virt forum. Thanks!
> >
> > -- Forwarded message --
> > From: Lukáš Kubín
> > Date: Wed, Nov 15, 2017 at 1:39 PM
> > Subject: qemu VM cannot be killed
> > To: qemu-discuss@nongnu.org
> >
> > Hi, we've experienced an issue with kvm instance which got stuck at reboot.
> > It's an OpenStack environment, with OpenContrail networking (vrouter agent
> > running on host), Ubuntu 16.04.
> >
> > Machine was first called to reboot from guest OS by user, had issues with NFS
> > unmount during that, user sent a hard-reboot call from OpenStack again then.
> > Then we (platform operator) got involved, tried to "virsh destroy" it with
> > this output:
> >
> > error: Failed to destroy domain instance-4243
> > error: Failed to terminate process 140529 with SIGKILL: Device or resource busy
> >
> > Neither "kill -9" sent to the qemu process helped.
> >
> > Good guys at OFTC:#virt have guided me to collect the following traces and ask
> > for help here:
> >
> > # cat /proc/140529/wchan
> > vhost_net_ubuf_put_and_wait
> >
> > # cat /proc/140529/stack
> > [] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
> > [] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
> > [] do_vfs_ioctl+0xa1/0x5f0
> > [] SyS_ioctl+0x79/0x90
> > [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> > [] 0x
> >
> > The versions we use are:
> >
> > • kernel 4.8.0-41-generic
> > • qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
> > • libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1
> >
> > What can be the cause for this error? What can we do in such a situation to
> > destroy the VM - is physical server reboot the only option?
> >
> > Thanks and greetings,
> >
> > Lukáš
Re: [Qemu-discuss] Fwd: qemu VM cannot be killed
Yes - I suspect a packet is stuck somewhere in networking stack.
This is what vhost is waiting for.

Yes, host reboot is the only way out.

RHEL disables zero copy tx in vhost to avoid these issues.

On Wed, Nov 15, 2017 at 02:55:32PM +0100, Lukáš Kubín wrote:
> CC-ing Michael and Jason as I was suggested in OFTC:#virt forum. Thanks!
>
> -- Forwarded message --
> From: Lukáš Kubín
> Date: Wed, Nov 15, 2017 at 1:39 PM
> Subject: qemu VM cannot be killed
> To: qemu-discuss@nongnu.org
>
> Hi, we've experienced an issue with kvm instance which got stuck at reboot.
> It's an OpenStack environment, with OpenContrail networking (vrouter agent
> running on host), Ubuntu 16.04.
>
> Machine was first called to reboot from guest OS by user, had issues with NFS
> unmount during that, user sent a hard-reboot call from OpenStack again then.
> Then we (platform operator) got involved, tried to "virsh destroy" it with
> this output:
>
> error: Failed to destroy domain instance-4243
> error: Failed to terminate process 140529 with SIGKILL: Device or resource busy
>
> Neither "kill -9" sent to the qemu process helped.
>
> Good guys at OFTC:#virt have guided me to collect the following traces and ask
> for help here:
>
> # cat /proc/140529/wchan
> vhost_net_ubuf_put_and_wait
>
> # cat /proc/140529/stack
> [] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
> [] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
> [] do_vfs_ioctl+0xa1/0x5f0
> [] SyS_ioctl+0x79/0x90
> [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> [] 0x
>
> The versions we use are:
>
> • kernel 4.8.0-41-generic
> • qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
> • libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1
>
> What can be the cause for this error? What can we do in such a situation to
> destroy the VM - is physical server reboot the only option?
>
> Thanks and greetings,
>
> Lukáš
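
The reply above mentions that RHEL disables zero copy tx in vhost. As a rough sketch of how one might check the setting on a given host: the upstream vhost_net module exposes a read-only parameter named experimental_zcopytx under /sys/module, and the script below simply reads it. The helper name zcopytx_state and the returned labels are made up for illustration, and whether the parameter exists (and what its default is) depends on the kernel build, so treat the result as a hint only.

```python
from pathlib import Path

# sysfs location of the vhost_net zero-copy TX module parameter.
# Assumption: the module is loaded and the kernel exposes this parameter.
ZCOPY_PARAM = Path("/sys/module/vhost_net/parameters/experimental_zcopytx")


def zcopytx_state(param_path=ZCOPY_PARAM):
    """Return 'enabled', 'disabled' or 'unknown' (hypothetical labels)."""
    try:
        raw = Path(param_path).read_text().strip()
    except OSError:
        # Module not loaded, parameter absent, or not running on Linux.
        return "unknown"
    if raw == "0":
        return "disabled"
    if raw.isdigit():
        return "enabled"
    return "unknown"
```

Calling zcopytx_state() with no argument reads the live value. If it reports "enabled", the usual way to turn the feature off is a modprobe option such as "options vhost_net experimental_zcopytx=0" in a file under /etc/modprobe.d/, which takes effect the next time the module is loaded; the thread does not say how RHEL itself implements the change.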
[Qemu-discuss] Fwd: qemu VM cannot be killed
CC-ing Michael and Jason as I was suggested in OFTC:#virt forum. Thanks!

-- Forwarded message --
From: Lukáš Kubín
Date: Wed, Nov 15, 2017 at 1:39 PM
Subject: qemu VM cannot be killed
To: qemu-discuss@nongnu.org

Hi, we've experienced an issue with kvm instance which got stuck at reboot.
It's an OpenStack environment, with OpenContrail networking (vrouter agent
running on host), Ubuntu 16.04.

Machine was first called to reboot from guest OS by user, had issues with NFS
unmount during that, user sent a hard-reboot call from OpenStack again then.
Then we (platform operator) got involved, tried to "virsh destroy" it with
this output:

error: Failed to destroy domain instance-4243
error: Failed to terminate process 140529 with SIGKILL: Device or resource busy

Neither "kill -9" sent to the qemu process helped.

Good guys at OFTC:#virt have guided me to collect the following traces and ask
for help here:

# cat /proc/140529/wchan
vhost_net_ubuf_put_and_wait

# cat /proc/140529/stack
[] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
[] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
[] do_vfs_ioctl+0xa1/0x5f0
[] SyS_ioctl+0x79/0x90
[] entry_SYSCALL_64_fastpath+0x1e/0xa8
[] 0x

The versions we use are:

- kernel 4.8.0-41-generic
- qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
- libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1

What can be the cause for this error? What can we do in such a situation to
destroy the VM - is physical server reboot the only option?

Thanks and greetings,

Lukáš
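
The /proc checks in the message above can be wrapped in a small script for anyone who needs to repeat the diagnosis. This is only a sketch: the function names parse_state and describe are invented here, and the reading that a process which ignores SIGKILL is usually in uninterruptible sleep (state D) is general Linux behaviour, not something the original report states. Reading /proc/PID/stack normally needs root, so the sketch sticks to stat and wchan.

```python
from pathlib import Path


def parse_state(stat_text):
    """Extract the one-letter process state from /proc/<pid>/stat content.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' before taking the next field.
    """
    after_comm = stat_text.rpartition(")")[2]
    return after_comm.split()[0]


def describe(pid, proc_root="/proc"):
    """Summarise why a process may be unkillable (hypothetical helper)."""
    base = Path(proc_root) / str(pid)
    state = parse_state((base / "stat").read_text())
    try:
        # Kernel function the task is sleeping in, if the kernel reports it.
        wchan = (base / "wchan").read_text().strip()
    except OSError:
        wchan = ""
    return {
        "pid": str(pid),
        "state": state,
        "wchan": wchan,
        # 'D' means uninterruptible sleep; signals, including SIGKILL,
        # are not acted on until the kernel wait finishes.
        "uninterruptible": state == "D",
    }
```

For the case in this thread one would call describe(140529) on the host. A wchan of vhost_net_ubuf_put_and_wait together with an uninterruptible state would match the traces quoted above.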