Re: [Qemu-discuss] Fwd: qemu VM cannot be killed

2017-11-16 Thread Jakob Bohm

Does this represent a security vulnerability (or two)?

Specifically:

 - If this can be triggered by code running in the guest OS (as opposed
  to via the qemu command line/config), it could be considered a violation
  of the sandboxing.

 - If this can be done when qemu is not started by root, it could be
  considered a violation of basic kernel security on the host, since a
  non-root process should have no way to cause a kernel hang that cannot
  be resolved by kill -9, regardless of whether the related syscalls are
  made by a qemu binary or by a custom exploit program.

(Note: Such vulnerabilities would not normally be discussed in public,
but since the report is already public, there is no further harm.)

On 15/11/2017 15:38, Michael S. Tsirkin wrote:

> Yes - I suspect a packet is stuck somewhere in networking stack.
> This is what vhost is waiting for.
>
> Yes, host reboot is the only way out.
>
> RHEL disables zero copy tx in vhost to avoid these issues.
>
> On Wed, Nov 15, 2017 at 02:55:32PM +0100, Lukáš Kubín wrote:
> > CC-ing Michael and Jason as I was suggested in OFTC:#virt forum. Thanks!
> >
> > -- Forwarded message --
> > From: Lukáš Kubín
> > Date: Wed, Nov 15, 2017 at 1:39 PM
> > Subject: qemu VM cannot be killed
> > To: qemu-discuss@nongnu.org
> >
> >
> > Hi, we've experienced an issue with kvm instance which got stuck at reboot.
> > It's an OpenStack environment, with OpenContrail networking (vrouter agent
> > running on host), Ubuntu 16.04.
> >
> > Machine was first called to reboot from guest OS by user, had issues with NFS
> > unmount during that, user sent a hard-reboot call from OpenStack again then.
> > Then we (platform operator) got involved, tried to "virsh destroy" it with this
> > output:
> >
> >
> > error: Failed to destroy domain instance-4243
> > error: Failed to terminate process 140529 with SIGKILL: Device or resource
> > busy
> >
> >
> > Neither "kill -9" sent to the qemu process helped.
> >
> > Good guys at OFTC:#virt have guided me to collect the following traces and ask
> > for help here:
> >
> >
> > # cat /proc/140529/wchan
> > vhost_net_ubuf_put_and_wait
> >
> > # cat /proc/140529/stack
> > [] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
> > [] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
> > [] do_vfs_ioctl+0xa1/0x5f0
> > [] SyS_ioctl+0x79/0x90
> > [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> > [] 0x
> >
> >
> > The versions we use are:
> >
> >   • kernel 4.8.0-41-generic
> >   • qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
> >   • libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1
> >
> > What can be the cause for this error? What can we do in such a situation to
> > destroy the VM - is physical server reboot the only option?



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded




Re: [Qemu-discuss] Fwd: qemu VM cannot be killed

2017-11-16 Thread Lukáš Kubín
OK, thanks Michael! We'll consider disabling it too.

Lukas

On Wed, Nov 15, 2017 at 3:38 PM, Michael S. Tsirkin wrote:

> Yes - I suspect a packet is stuck somewhere in networking stack.
> This is what vhost is waiting for.
>
> Yes, host reboot is the only way out.
>
> RHEL disables zero copy tx in vhost to avoid these issues.
>
> On Wed, Nov 15, 2017 at 02:55:32PM +0100, Lukáš Kubín wrote:
> > CC-ing Michael and Jason as I was suggested in OFTC:#virt forum. Thanks!
> >
> > -- Forwarded message --
> > From: Lukáš Kubín 
> > Date: Wed, Nov 15, 2017 at 1:39 PM
> > Subject: qemu VM cannot be killed
> > To: qemu-discuss@nongnu.org
> >
> >
> > Hi, we've experienced an issue with kvm instance which got stuck at reboot.
> > It's an OpenStack environment, with OpenContrail networking (vrouter agent
> > running on host), Ubuntu 16.04.
> >
> > Machine was first called to reboot from guest OS by user, had issues with NFS
> > unmount during that, user sent a hard-reboot call from OpenStack again then.
> > Then we (platform operator) got involved, tried to "virsh destroy" it with this
> > output:
> >
> >
> > error: Failed to destroy domain instance-4243
> > error: Failed to terminate process 140529 with SIGKILL: Device or resource
> > busy
> >
> >
> > Neither "kill -9" sent to the qemu process helped.
> >
> > Good guys at OFTC:#virt have guided me to collect the following traces and ask
> > for help here:
> >
> >
> > # cat /proc/140529/wchan
> > vhost_net_ubuf_put_and_wait
> >
> > # cat /proc/140529/stack
> > [] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
> > [] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
> > [] do_vfs_ioctl+0xa1/0x5f0
> > [] SyS_ioctl+0x79/0x90
> > [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> > [] 0x
> >
> >
> > The versions we use are:
> >
> >   • kernel 4.8.0-41-generic
> >   • qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
> >   • libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1
> >
> > What can be the cause for this error? What can we do in such a situation to
> > destroy the VM - is physical server reboot the only option?
> >
> > Thanks and greetings,
> >
> > Lukáš
> >
>


Re: [Qemu-discuss] Fwd: qemu VM cannot be killed

2017-11-15 Thread Michael S. Tsirkin
Yes - I suspect a packet is stuck somewhere in networking stack.
This is what vhost is waiting for.

Yes, host reboot is the only way out.

RHEL disables zero copy tx in vhost to avoid these issues.
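
For readers who want to check this on their own host, here is a minimal
sketch that is not taken from this thread. It assumes that the zero copy tx
setting mentioned above is the vhost_net module parameter
experimental_zcopytx, exposed under /sys/module/vhost_net/parameters/. The
helper name zcopytx_status and the modprobe.d file name in the comments are
hypothetical.

```shell
#!/bin/sh
# Report whether vhost_net zero-copy tx appears to be enabled.
# Assumption: the setting is the module parameter "experimental_zcopytx".
# The optional argument (an alternative parameter file) exists only so the
# function can be exercised on a machine without the module loaded.
zcopytx_status() {
    param_file="${1:-/sys/module/vhost_net/parameters/experimental_zcopytx}"
    if [ ! -r "$param_file" ]; then
        # Module not loaded, or the parameter is not exposed on this kernel.
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

# To disable it for future module loads (assumed file name, run as root):
#   echo "options vhost_net experimental_zcopytx=0" > /etc/modprobe.d/vhost-net.conf
# The parameter is, as far as I know, read-only at runtime, so the module
# would need to be reloaded (with no VM using vhost-net) or the host rebooted.
```

Calling zcopytx_status with no argument reads the live sysfs value. This only
helps for future VMs; it does not free a process that is already stuck.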

On Wed, Nov 15, 2017 at 02:55:32PM +0100, Lukáš Kubín wrote:
> CC-ing Michael and Jason as I was suggested in OFTC:#virt forum. Thanks!
> 
> -- Forwarded message --
> From: Lukáš Kubín 
> Date: Wed, Nov 15, 2017 at 1:39 PM
> Subject: qemu VM cannot be killed
> To: qemu-discuss@nongnu.org
> 
> 
> Hi, we've experienced an issue with kvm instance which got stuck at reboot.
> It's an OpenStack environment, with OpenContrail networking (vrouter agent
> running on host), Ubuntu 16.04.
> 
> Machine was first called to reboot from guest OS by user, had issues with NFS
> unmount during that, user sent a hard-reboot call from OpenStack again then.
> Then we (platform operator) got involved, tried to "virsh destroy" it with this
> output:
> 
> 
> error: Failed to destroy domain instance-4243
> error: Failed to terminate process 140529 with SIGKILL: Device or resource
> busy
> 
> 
> Neither "kill -9" sent to the qemu process helped.
> 
> Good guys at OFTC:#virt have guided me to collect the following traces and ask
> for help here:
> 
> 
> # cat /proc/140529/wchan
> vhost_net_ubuf_put_and_wait
> 
> # cat /proc/140529/stack
> [] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
> [] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
> [] do_vfs_ioctl+0xa1/0x5f0
> [] SyS_ioctl+0x79/0x90
> [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> [] 0x
> 
> 
> The versions we use are:
> 
>   • kernel 4.8.0-41-generic
>   • qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
>   • libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1
> 
> What can be the cause for this error? What can we do in such a situation to
> destroy the VM - is physical server reboot the only option?
> 
> Thanks and greetings,
> 
> Lukáš
> 



[Qemu-discuss] Fwd: qemu VM cannot be killed

2017-11-15 Thread Lukáš Kubín
CC-ing Michael and Jason, as was suggested to me in the OFTC:#virt forum. Thanks!

-- Forwarded message --
From: Lukáš Kubín 
Date: Wed, Nov 15, 2017 at 1:39 PM
Subject: qemu VM cannot be killed
To: qemu-discuss@nongnu.org


Hi, we've experienced an issue with a kvm instance which got stuck at reboot.
It's an OpenStack environment with OpenContrail networking (vrouter agent
running on host), on Ubuntu 16.04.

The machine was first rebooted from the guest OS by the user and had issues
with an NFS unmount during that. The user then sent a hard-reboot call from
OpenStack. Then we (the platform operator) got involved and tried to
"virsh destroy" it, with this output:

error: Failed to destroy domain instance-4243
error: Failed to terminate process 140529 with SIGKILL: Device or resource
busy


Sending "kill -9" to the qemu process did not help either.

The good people at OFTC:#virt guided me to collect the following traces and
ask for help here:

# cat /proc/140529/wchan
vhost_net_ubuf_put_and_wait

# cat /proc/140529/stack
[] vhost_net_ubuf_put_and_wait+0x54/0xa0 [vhost_net]
[] vhost_net_ioctl+0x354/0x8a0 [vhost_net]
[] do_vfs_ioctl+0xa1/0x5f0
[] SyS_ioctl+0x79/0x90
[] entry_SYSCALL_64_fastpath+0x1e/0xa8
[] 0x
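
A process that does not react to SIGKILL is typically blocked inside the
kernel in uninterruptible sleep (state D), which would be consistent with the
ioctl frames in the stack above. As a minimal sketch (the helper name
proc_state is hypothetical and not part of any tool mentioned in this
thread), the state letter can be read from /proc/<pid>/status:

```shell
#!/bin/sh
# Print the one-letter scheduler state of a process (R, S, D, Z, T, ...).
# Usage: proc_state PID [PROC_ROOT]
# PROC_ROOT defaults to /proc; the second argument exists only so the
# function can be tried against a copy of a status file.
proc_state() {
    pid="$1"
    proc_root="${2:-/proc}"
    # The "State:" line looks like "State:<TAB>D (disk sleep)";
    # the second whitespace-separated field is the state letter.
    awk '/^State:/ { print $2; exit }' "$proc_root/$pid/status"
}
```

For example, "proc_state 140529" prints the state letter of the qemu process
from this report. A "D" there means signals, including SIGKILL, stay pending
until the kernel wait finishes.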

The versions we use are:

   - kernel 4.8.0-41-generic
   - qemu-kvm 1:2.5+dfsg-5ubuntu10.2~xenial0+contrail1
   - libvirt-bin 1.3.1-1ubuntu10.1~xenial1+contrail1

What can be the cause of this error? What can we do in such a situation to
destroy the VM? Is a physical server reboot the only option?

Thanks and greetings,

Lukáš