On Mon, Apr 22, 2019 at 10:59 PM Jason Wang <jasow...@redhat.com> wrote:
>
>
> On 2019/4/23 4:14 AM, Dan Streetman wrote:
> > On Sun, Apr 21, 2019 at 10:50 PM Jason Wang <jasow...@redhat.com> wrote:
> >>
> >> On 2019/4/17 2:46 AM, Dan Streetman wrote:
> >>> From: Dan Streetman <ddstr...@canonical.com>
> >>>
> >>> Buglink: https://launchpad.net/bugs/1823458
> >>>
> >>> There is a race condition when using the vhost-user driver, between a guest
> >>> shutdown and the vhost-user interface being closed. This is explained in
> >>> more detail at the bug link above; the short explanation is the vhost-user
> >>> device can be closed while the main thread is in the middle of stopping
> >>> the vhost_net. In this case, the main thread handling shutdown will
> >>> enter virtio_net_vhost_status() and move into the n->vhost_started (else)
> >>> block, and call vhost_net_stop(); while it is running that function,
> >>> another thread is notified that the vhost-user device has been closed,
> >>> and (indirectly) calls into virtio_net_vhost_status() also.
> >>
> >> I think we need to figure out why there are multiple vhost_net_stop() calls
> >> simultaneously. E.g. vhost-user registers fd handlers like:
> >>
> >>     qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> >>                              net_vhost_user_event, NULL, nc0->name, NULL,
> >>                              true);
> >>
> >> which uses the default main context, so it should only be called in the
> >> main thread.
> > net_vhost_user_event() schedules chr_closed_bh() to do its bottom half
> > work; does aio_bh_schedule_oneshot() execute its events from the main
> > thread?
>
>
> I think so if net_vhost_user_event() was called in the main thread (it calls
> qemu_get_current_aio_context()).

ok, I'll check that, thanks!

I think my other patch, to remove the vhost_user_stop() call completely
from the net_vhost_user_event() handler for CHR_EVENT_CLOSED, is still
relevant; do you have thoughts on that?

>
>
> >
> > For reference, the call chain is:
> >
> > chr_closed_bh()
> > qmp_set_link()
> > nc->info->link_status_changed() -> virtio_net_set_link_status()
> > virtio_net_set_status()
> > virtio_net_vhost_status()
>
>
> The code was added by Marc since:
>
> commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
> Author: Marc-André Lureau <marcandre.lur...@redhat.com>
> Date: Mon Feb 27 14:49:56 2017 +0400
>
>     vhost-user: delay vhost_user_stop
>
> Cc him for more thoughts.
>
> Thanks
>
>
> >> Thanks
> >>
> >>
> >>> Since the
> >>> vhost_net status hasn't yet changed, the second thread also enters
> >>> the n->vhost_started block, and also calls vhost_net_stop(). This
> >>> causes problems for the second thread when it tries to stop the network
> >>> that's already been stopped.
> >>>
> >>> This adds a flag to the struct that's atomically set to prevent more than
> >>> one thread from calling vhost_net_stop(). The atomic_fetch_inc() is likely
> >>> overkill and probably could be done with a simple check-and-set, but
> >>> since it's a race condition there would still be a (very, very) small
> >>> window without using an atomic to set it.
> >>>
> >>> Signed-off-by: Dan Streetman <ddstr...@canonical.com>
> >>> ---
> >>>  hw/net/virtio-net.c            | 3 ++-
> >>>  include/hw/virtio/virtio-net.h | 1 +
> >>>  2 files changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>> index ffe0872fff..d36f50d5dd 100644
> >>> --- a/hw/net/virtio-net.c
> >>> +++ b/hw/net/virtio-net.c
> >>> @@ -13,6 +13,7 @@
> >>>
> >>>  #include "qemu/osdep.h"
> >>>  #include "qemu/iov.h"
> >>> +#include "qemu/atomic.h"
> >>>  #include "hw/virtio/virtio.h"
> >>>  #include "net/net.h"
> >>>  #include "net/checksum.h"
> >>> @@ -240,7 +241,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>>                           "falling back on userspace virtio", -r);
> >>>              n->vhost_started = 0;
> >>>          }
> >>> -    } else {
> >>> +    } else if (atomic_fetch_inc(&n->vhost_stopped) == 0) {
> >>>          vhost_net_stop(vdev, n->nic->ncs, queues);
> >>>          n->vhost_started = 0;
> >>>      }
> >>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> >>> index b96f0c643f..d03fd933d0 100644
> >>> --- a/include/hw/virtio/virtio-net.h
> >>> +++ b/include/hw/virtio/virtio-net.h
> >>> @@ -164,6 +164,7 @@ struct VirtIONet {
> >>>      uint8_t nouni;
> >>>      uint8_t nobcast;
> >>>      uint8_t vhost_started;
> >>> +    int vhost_stopped;
> >>>      struct {
> >>>          uint32_t in_use;
> >>>          uint32_t first_multi;
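
For illustration, the "only the first caller stops" guard from the commit message can be sketched outside of QEMU roughly as below. This is only a sketch under assumptions: it uses C11 <stdatomic.h> and atomic_fetch_add() instead of QEMU's qemu/atomic.h and atomic_fetch_inc(), and struct fake_net, fake_vhost_net_stop() and stop_once() are hypothetical names made up for the example, not QEMU code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant VirtIONet fields. */
struct fake_net {
    bool vhost_started;
    atomic_int vhost_stopped;   /* 0 until the first stop request */
    int stop_calls;             /* how many times the stop path actually ran */
};

/* Stand-in for vhost_net_stop(); it only records that it was called. */
static void fake_vhost_net_stop(struct fake_net *n)
{
    n->stop_calls++;
}

/*
 * Returns true if this caller performed the stop.
 * atomic_fetch_add() returns the value held before the increment, so
 * exactly one caller observes 0, even if several race into this function.
 */
static bool stop_once(struct fake_net *n)
{
    if (atomic_fetch_add(&n->vhost_stopped, 1) != 0) {
        return false;           /* another caller already claimed the stop */
    }
    fake_vhost_net_stop(n);
    n->vhost_started = false;
    return true;
}
```

A plain "if (!flag) { flag = 1; ... }" leaves a gap between the read and the write where a second caller can still get in; the fetch-and-add closes that gap because the read and the update are a single operation. Note that in this sketch the counter is never reset, so a version meant for real use would presumably have to clear it whenever the backend is started again.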