On Mon, 14 Nov 2016 14:41:42 +0100 Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 11/11/2016 22:03, Alex Williamson wrote: > > On Fri, 11 Nov 2016 21:24:33 +0100 > > Paolo Bonzini <pbonz...@redhat.com> wrote: > >> If you can post a backtrace of all threads at the time of the hang, from > >> origin/master (so without vhost, and not at ad07cd6) that could help. > > > > Yes, it occurs with all of the vfio devices removed using VNC/Cirrus. > > I cannot reproduce it anyway. :( > > As you said on IRC it's a pretty standard "event loop doing nothing" > backtrace, so it seems that an eventfd write was lost. > > Since I was lucky with the vhost patch, perhaps this can help: > > diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c > index f2ea29d..22d6cd5 100644 > --- a/hw/scsi/virtio-scsi-dataplane.c > +++ b/hw/scsi/virtio-scsi-dataplane.c > @@ -202,13 +202,15 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev) > > aio_context_acquire(s->ctx); > virtio_scsi_clear_aio(s); > - aio_context_release(s->ctx); > - > - blk_drain_all(); /* ensure there are no in-flight requests */ > > for (i = 0; i < vs->conf.num_queues + 2; i++) { > + VirtQueue *vq = virtio_get_queue(vdev, i); > virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false); > + > virtio_queue_host_notifier_aio_read(virtio_queue_get_guest_notifier(vq)); > } > + aio_context_release(s->ctx); > + > + blk_drain_all(); /* ensure there are no in-flight requests */ > > /* Clean up guest notifier (irq) */ > k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false); > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index 89b0b80..9c894d7 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -2018,7 +2018,7 @@ EventNotifier > *virtio_queue_get_guest_notifier(VirtQueue *vq) > return &vq->guest_notifier; > } > > -static void virtio_queue_host_notifier_aio_read(EventNotifier *n) > +void virtio_queue_host_notifier_aio_read(EventNotifier *n) > { > VirtQueue *vq = container_of(n, VirtQueue, host_notifier); > if (event_notifier_test_and_clear(n)) { > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index 35ede30..d3dfc69 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -274,6 +274,7 @@ int virtio_device_grab_ioeventfd(VirtIODevice *vdev); > void virtio_device_release_ioeventfd(VirtIODevice *vdev); > bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev); > EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq); > +void virtio_queue_host_notifier_aio_read(EventNotifier *n); > void virtio_queue_host_notifier_read(EventNotifier *n); > void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext > *ctx, > void (*fn)(VirtIODevice *, > Hmm, fixed yet not fixed. I get a nice shutdown and it even eliminates a cpu spike shown in virt-manager at the end of shutdown that was typical previously, but then I noticed dmesg showing me segfaults, so I hooked up gdb and: Thread 3 "CPU 0/KVM" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fb4f73ba700 (LWP 2713)] 0x00005593dacc2800 in virtio_queue_notify_aio_vq (vq=0x5593dd4a7378) at /net/gimli/home/alwillia/Work/qemu.git/hw/virtio/virtio.c:1242 1242 trace_virtio_queue_notify(vdev, vq - vdev->vq, vq); (gdb) bt #0 0x00005593dacc2800 in virtio_queue_notify_aio_vq (vq=0x5593dd4a7378) at /net/gimli/home/alwillia/Work/qemu.git/hw/virtio/virtio.c:1242 #1 0x00005593dacc4a4e in virtio_queue_host_notifier_aio_read (n=0x5593dd4a73d8) at /net/gimli/home/alwillia/Work/qemu.git/hw/virtio/virtio.c:2025 #2 0x00005593daca4997 in virtio_scsi_dataplane_stop (vdev=0x5593dc0cc0f0) at /net/gimli/home/alwillia/Work/qemu.git/hw/scsi/virtio-scsi-dataplane.c:209 #3 0x00005593daf6a4b7 in virtio_bus_stop_ioeventfd (bus=0x5593dc0cc078) at hw/virtio/virtio-bus.c:219 #4 0x00005593daf64279 in virtio_pci_stop_ioeventfd (proxy=0x5593dc0c3ce0) at hw/virtio/virtio-pci.c:344 #5 0x00005593daf643d5 in virtio_ioport_write (opaque=0x5593dc0c3ce0, addr=18, val=0) at hw/virtio/virtio-pci.c:380 #6 0x00005593daf6484d in virtio_pci_config_write (opaque=0x5593dc0c3ce0, addr=18, val=0, size=1) at hw/virtio/virtio-pci.c:508 #7 0x00005593dac592fd in memory_region_write_accessor (mr=0x5593dc0c45d0, addr=18, value=0x7fb4f73b74b8, size=1, shift=0, mask=255, attrs=...) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:526 #8 0x00005593dac59515 in access_with_adjusted_size (addr=18, value=0x7fb4f73b74b8, size=1, access_size_min=1, access_size_max=4, access= 0x5593dac59213 <memory_region_write_accessor>, mr=0x5593dc0c45d0, attrs=...) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:592 #9 0x00005593dac5bc55 in memory_region_dispatch_write (mr=0x5593dc0c45d0, addr=18, data=0, size=1, attrs=...) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:1323 #10 0x00005593dac07583 in address_space_write_continue (as=0x5593db727de0 <address_space_io>, addr=49298, attrs=..., buf=0x7fb520354000 "", len=1, addr1=18, l=1, mr=0x5593dc0c45d0) at /net/gimli/home/alwillia/Work/qemu.git/exec.c:2621 #11 0x00005593dac076cb in address_space_write (as=0x5593db727de0 <address_space_io>, addr=49298, attrs=..., buf=0x7fb520354000 "", len=1) at /net/gimli/home/alwillia/Work/qemu.git/exec.c:2666 #12 0x00005593dac07a57 in address_space_rw (as=0x5593db727de0 <address_space_io>, addr=49298, attrs=..., buf=0x7fb520354000 "", len=1, is_write=true) at /net/gimli/home/alwillia/Work/qemu.git/exec.c:2768 #13 0x00005593dac558d7 in kvm_handle_io (port=49298, attrs=..., data=0x7fb520354000, direction=1, size=1, count=1) at /net/gimli/home/alwillia/Work/qemu.git/kvm-all.c:1800 #14 0x00005593dac55ddd in kvm_cpu_exec (cpu=0x5593dc0a6490) at /net/gimli/home/alwillia/Work/qemu.git/kvm-all.c:1958 #15 0x00005593dac3cc58 in qemu_kvm_cpu_thread_fn (arg=0x5593dc0a6490) at /net/gimli/home/alwillia/Work/qemu.git/cpus.c:998 #16 0x00007fb5054715ca in start_thread (arg=0x7fb4f73ba700) at pthread_create.c:333 #17 0x00007fb5051ab0ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 > And if it doesn't work here is some printf debugging. It's pretty verbose but > the interesting part starts pretty much where you issue the virsh shutdown or > system_powerdown command: > > diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c > index f2ea29d..ec0f750 100644 > --- a/hw/scsi/virtio-scsi-dataplane.c > +++ b/hw/scsi/virtio-scsi-dataplane.c > @@ -108,11 +108,13 @@ static void virtio_scsi_clear_aio(VirtIOSCSI *s) > VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s); > int i; > > + printf("before clear\n"); > virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx, NULL); > virtio_queue_aio_set_host_notifier_handler(vs->event_vq, s->ctx, NULL); > for (i = 0; i < vs->conf.num_queues; i++) { > virtio_queue_aio_set_host_notifier_handler(vs->cmd_vqs[i], s->ctx, > NULL); > } > + printf("after clear\n"); > } > > /* Context: QEMU global mutex held */ > @@ -202,15 +204,20 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev) > > aio_context_acquire(s->ctx); > virtio_scsi_clear_aio(s); > - aio_context_release(s->ctx); > - > - blk_drain_all(); /* ensure there are no in-flight requests */ > > for (i = 0; i < vs->conf.num_queues + 2; i++) { > + VirtQueue *vq = virtio_get_queue(vdev, i); > virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false); > + > virtio_queue_host_notifier_aio_read(virtio_queue_get_guest_notifier(vq)); > } > + aio_context_release(s->ctx); > + > + printf("before drain\n"); > + blk_drain_all(); /* ensure there are no in-flight requests */ > + printf("after drain\n"); > > /* Clean up guest notifier (irq) */ > + printf("end of virtio_scsi_dataplane_stop\n"); > k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false); > s->dataplane_stopping = false; > s->dataplane_started = false; > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c > index 3e5ae6a..e8b83d4 100644 > --- a/hw/scsi/virtio-scsi.c > +++ b/hw/scsi/virtio-scsi.c > @@ -75,6 +75,7 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req) > } > > if (req->sreq) { > + printf("finish %x\n", req->sreq->tag); > req->sreq->hba_private = NULL; > scsi_req_unref(req->sreq); > } > @@ -549,6 +549,7 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI > *s, VirtIOSCSIReq *req) > return -ENOENT; > } > virtio_scsi_ctx_check(s, d); > + printf("prepare %lx %x\n", req->req.cmd.tag, req->req.cmd.cdb[0]); > req->sreq = scsi_req_new(d, req->req.cmd.tag, > virtio_scsi_get_lun(req->req.cmd.lun), > req->req.cmd.cdb, req); > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index 62001b4..c75dec3 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -336,11 +336,13 @@ static int virtio_pci_ioeventfd_assign(DeviceState *d, > EventNotifier *notifier, > > static void virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy) > { > + printf("start ioeventfd %s\n", > object_class_get_name(object_get_class(OBJECT(proxy)))); > virtio_bus_start_ioeventfd(&proxy->bus); > } > > static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy) > { > + printf("stop ioeventfd %s\n", > object_class_get_name(object_get_class(OBJECT(proxy)))); > virtio_bus_stop_ioeventfd(&proxy->bus); > } > > @@ -376,6 +378,7 @@ static void virtio_ioport_write(void *opaque, uint32_t > addr, uint32_t val) > } > break; > case VIRTIO_PCI_STATUS: > + printf("set status %s %x\n", > object_class_get_name(object_get_class(OBJECT(proxy))), val & 0xFF); > if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { > virtio_pci_stop_ioeventfd(proxy); > } > @@ -1274,6 +1277,7 @@ static void virtio_pci_common_write(void *opaque, > hwaddr addr, > vdev->config_vector = val; > break; > case VIRTIO_PCI_COMMON_STATUS: > + printf("set status %s %x\n", > object_class_get_name(object_get_class(OBJECT(proxy))), (uint8_t)val); > if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { > virtio_pci_stop_ioeventfd(proxy); > } > This required making virtio_queue_host_notifier_aio_read() non-static and adding the forward declaration, stolen from the first patch. The attached log starts at the point where there guest is idle and I issue a virsh shutdown. This also results in a segfault nearly identical to above: Thread 4 "CPU 1/KVM" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f7194901700 (LWP 3804)] 0x000056358070788a in virtio_queue_notify_aio_vq (vq=0x56358310d378) at /net/gimli/home/alwillia/Work/qemu.git/hw/virtio/virtio.c:1242 1242 trace_virtio_queue_notify(vdev, vq - vdev->vq, vq); (gdb) bt #0 0x000056358070788a in virtio_queue_notify_aio_vq (vq=0x56358310d378) at /net/gimli/home/alwillia/Work/qemu.git/hw/virtio/virtio.c:1242 #1 0x0000563580709ad8 in virtio_queue_host_notifier_aio_read (n=0x56358310d3d8) at /net/gimli/home/alwillia/Work/qemu.git/hw/virtio/virtio.c:2025 #2 0x00005635806e99fd in virtio_scsi_dataplane_stop (vdev=0x563581d320f0) at /net/gimli/home/alwillia/Work/qemu.git/hw/scsi/virtio-scsi-dataplane.c:211 #3 0x00005635809af5fe in virtio_bus_stop_ioeventfd (bus=0x563581d32078) at hw/virtio/virtio-bus.c:219 #4 0x00005635809a9353 in virtio_pci_stop_ioeventfd (proxy=0x563581d29ce0) at hw/virtio/virtio-pci.c:346 #5 0x00005635809a94e0 in virtio_ioport_write (opaque=0x563581d29ce0, addr=18, val=0) at hw/virtio/virtio-pci.c:383 #6 0x00005635809a995d in virtio_pci_config_write (opaque=0x563581d29ce0, addr=18, val=0, size=1) at hw/virtio/virtio-pci.c:511 #7 0x000056358069e2fd in memory_region_write_accessor (mr=0x563581d2a5d0, addr=18, value=0x7f71948fe4b8, size=1, shift=0, mask=255, attrs=...) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:526 #8 0x000056358069e515 in access_with_adjusted_size (addr=18, value=0x7f71948fe4b8, size=1, access_size_min=1, access_size_max=4, access= 0x56358069e213 <memory_region_write_accessor>, mr=0x563581d2a5d0, attrs=...) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:592 #9 0x00005635806a0c55 in memory_region_dispatch_write (mr=0x563581d2a5d0, addr=18, data=0, size=1, attrs=...) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:1323 #10 0x000056358064c583 in address_space_write_continue (as=0x56358116cde0 <address_space_io>, addr=49298, attrs=..., buf=0x7f71be099000 "", len=1, addr1=18, l=1, mr=0x563581d2a5d0) at /net/gimli/home/alwillia/Work/qemu.git/exec.c:2621 #11 0x000056358064c6cb in address_space_write (as=0x56358116cde0 <address_space_io>, addr=49298, attrs=..., buf=0x7f71be099000 "", len=1) at /net/gimli/home/alwillia/Work/qemu.git/exec.c:2666 #12 0x000056358064ca57 in address_space_rw (as=0x56358116cde0 <address_space_io>, addr=49298, attrs=..., buf=0x7f71be099000 "", len=1, is_write=true) at /net/gimli/home/alwillia/Work/qemu.git/exec.c:2768 #13 0x000056358069a8d7 in kvm_handle_io (port=49298, attrs=..., data=0x7f71be099000, direction=1, size=1, count=1) at /net/gimli/home/alwillia/Work/qemu.git/kvm-all.c:1800 #14 0x000056358069addd in kvm_cpu_exec (cpu=0x563581d6d030) at /net/gimli/home/alwillia/Work/qemu.git/kvm-all.c:1958 #15 0x0000563580681c58 in qemu_kvm_cpu_thread_fn (arg=0x563581d6d030) at /net/gimli/home/alwillia/Work/qemu.git/cpus.c:998 #16 0x00007f71a31b95ca in start_thread (arg=0x7f7194901700) at pthread_create.c:333 #17 0x00007f71a2ef30ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 If you care to match line numbers, my tree is based on 6bbcb76301a72dc80c8d29af13d40bb9a759c9c6, it includes you patch: virtio: introduce grab/release_ioeventfd to fix vhost Plus your first fix removing the assert and return 0 case from virtio_bus_set_host_notifier(). Thanks, Alex
shutdown.log.bz2
Description: application/bzip