Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
On Sat, 26 Nov 2022 at 09:18, Alex Bennée wrote:
>
> Stefan Hajnoczi writes:
>
> > On Sat, 26 Nov 2022 at 04:45, Alex Bennée wrote:
> >>
> >> Alex Bennée writes:
> >>
> >> > Alex Bennée writes:
> >> >
> >> >> Hi,
> >> >>
> >> >> I can replicate some of the other failures I've been seeing in CI by
> >> >> running:
> >> >>
> >> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
> >> >>
> >> >> however this seems to run everything in parallel and maybe is better
> >> >> at exposing race conditions. Perhaps the CI system makes those races
> >> >> easier to hit? Unfortunately I've not been able to figure out exactly
> >> >> how things go wrong in the failure case.
> >> >
> >> > There is a circular call - we are in vu_gpio_stop which triggers a write
> >> > to vhost-user which allows us to catch a disconnect event:
> >> >
> >> > #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >> > #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> >> > #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
> >> > #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
> >>
> >> I suspect the best choice here is to schedule the cleanup at a later
> >> date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
> >> event?
> >>
> >> Paolo, any suggestions?
> >>
> >> > #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> >> > #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> >> > #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> >> > #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
> >
> > Does this mean the backend closed the connection before receiving all
> > the vhost-user protocol messages sent by QEMU?
> >
> > This looks like a backend bug. It prevents QEMU's vhost-user client
> > from cleanly stopping the virtqueue (vhost_virtqueue_stop()).
>
> Well the backend in this case is the qtest framework, so not the world's
> most complete implementation.
>
> > QEMU is still broken if it cannot handle disconnect at any time. Maybe
> > a simple solution for that is to check for reentrancy (either by
> > checking an existing variable or adding a new one to prevent
> > vu_gpio_stop() from calling itself).
>
> vhost-user-blk introduced an additional flag:
>
>     /*
>      * There are at least two steps of initialization of the
>      * vhost-user device. The first is a "connect" step and the
>      * second is a "start" step. Make a separation between
>      * those initialization phases by using two fields.
>      */
>     /* vhost_user_blk_connect/vhost_user_blk_disconnect */
>     bool connected;
>     /* vhost_user_blk_start/vhost_user_blk_stop */
>     bool started_vu;
>
> but that in itself is not enough. If you look at the various cases of
> handling CHR_EVENT_CLOSED you'll see some schedule the shutdown with aio
> and some don't even bother (so will probably break the same way).
> Rather than have a mish-mash of solutions maybe we should introduce a
> new vhost function - vhost_user_async_close() - which can take care of
> the scheduling and wrap it with a check for a valid vhost structure in
> case it gets shut down in the meantime?

Handling this in core vhost code would be great.

I suggested checking a variable because it's not async. Async is more
complicated because it creates a new in-between state while waiting for
the operation to complete. Async approaches are more likely to have bugs
for this reason. vhost-user-blk.c's async shutdown is a good example of
that:

    case CHR_EVENT_CLOSED:
        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
            /*
             * A close event may happen during a read/write, but vhost
             * code assumes the vhost_dev remains setup, so delay the
             * stop & clear.
             */
            AioContext *ctx = qemu_get_current_aio_context();

            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
                                     NULL, NULL, false);
            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh,
                                    opaque);

            /*
             * Move vhost device to the stopped state. The vhost-user device
             * will be cleaned up and disconnected in BH. This can be useful
             * in the vhost migration code. If disconnect was caught there is
             * an option for the general vhost code to get the dev state
             * without knowing its type (in this case vhost-user).
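For concreteness, one possible shape for the proposed helper. This is a
hedged sketch, not code from this series: the VhostAsyncCallback struct,
the vu_async_close_fn typedef and the vdev validity check are all
assumptions about how such a function might be structured.

    /* Hypothetical sketch of a generic vhost_user_async_close() helper. */
    typedef void (*vu_async_close_fn)(DeviceState *dev);

    typedef struct {
        DeviceState *dev;           /* device being torn down */
        struct vhost_dev *vhost;    /* its vhost state */
        vu_async_close_fn cb;       /* the device's disconnect handler */
    } VhostAsyncCallback;

    static void vhost_user_async_close_bh(void *opaque)
    {
        VhostAsyncCallback *data = opaque;

        /*
         * The vhost_dev may already have been torn down by the time the
         * BH runs; checking it still points at a vdev guards that race.
         */
        if (data->vhost->vdev) {
            data->cb(data->dev);
        }
        g_free(data);
    }

    void vhost_user_async_close(DeviceState *d, CharBackend *chardev,
                                struct vhost_dev *vhost,
                                vu_async_close_fn cb)
    {
        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
            VhostAsyncCallback *data = g_new0(VhostAsyncCallback, 1);

            /* Stop further chardev events re-entering the device. */
            qemu_chr_fe_set_handlers(chardev, NULL, NULL, NULL, NULL,
                                     NULL, NULL, false);
            data->dev = d;
            data->vhost = vhost;
            data->cb = cb;
            aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                    vhost_user_async_close_bh, data);
        }
    }

With something like this, each device's CHR_EVENT_CLOSED case collapses
to a single call, and the scheduling policy lives in one place instead
of being copied (inconsistently) per device.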
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Stefan Hajnoczi writes:

> On Sat, 26 Nov 2022 at 04:45, Alex Bennée wrote:
>>
>> Alex Bennée writes:
>>
>> > Alex Bennée writes:
>> >
>> >> Hi,
>> >>
>> >> I can replicate some of the other failures I've been seeing in CI by
>> >> running:
>> >>
>> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>> >>
>> >> however this seems to run everything in parallel and maybe is better
>> >> at exposing race conditions. Perhaps the CI system makes those races
>> >> easier to hit? Unfortunately I've not been able to figure out exactly
>> >> how things go wrong in the failure case.
>> >
>> > There is a circular call - we are in vu_gpio_stop which triggers a write
>> > to vhost-user which allows us to catch a disconnect event:
>> >
>> > #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>> > #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
>> > #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
>> > #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>>
>> I suspect the best choice here is to schedule the cleanup at a later
>> date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
>> event?
>>
>> Paolo, any suggestions?
>>
>> > #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
>> > #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
>> > #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
>> > #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
>
> Does this mean the backend closed the connection before receiving all
> the vhost-user protocol messages sent by QEMU?
>
> This looks like a backend bug. It prevents QEMU's vhost-user client
> from cleanly stopping the virtqueue (vhost_virtqueue_stop()).

Well the backend in this case is the qtest framework, so not the world's
most complete implementation.

> QEMU is still broken if it cannot handle disconnect at any time. Maybe
> a simple solution for that is to check for reentrancy (either by
> checking an existing variable or adding a new one to prevent
> vu_gpio_stop() from calling itself).

vhost-user-blk introduced an additional flag:

    /*
     * There are at least two steps of initialization of the
     * vhost-user device. The first is a "connect" step and the
     * second is a "start" step. Make a separation between
     * those initialization phases by using two fields.
     */
    /* vhost_user_blk_connect/vhost_user_blk_disconnect */
    bool connected;
    /* vhost_user_blk_start/vhost_user_blk_stop */
    bool started_vu;

but that in itself is not enough (a sketch of how such a flag gates the
stop path follows this message). If you look at the various cases of
handling CHR_EVENT_CLOSED you'll see some schedule the shutdown with aio
and some don't even bother (so will probably break the same way).

Rather than have a mish-mash of solutions maybe we should introduce a
new vhost function - vhost_user_async_close() - which can take care of
the scheduling and wrap it with a check for a valid vhost structure in
case it gets shut down in the meantime?
>> > #8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
>> > #9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
>> > #10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
>> > #11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
>> > #12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
>> > #13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
>> > #14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
>> > #15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
>> > #16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>> > #17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>> > #18 0x557adb
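The point of the connected/started_vu split quoted above is that each
teardown path can bail out cleanly depending on which initialization
phase was actually reached. A rough sketch of the gate on the stop path
(abridged - the real vhost-user-blk stop logic has more steps than
shown here):

    /* Rough sketch of how started_vu gates the stop path. */
    static void vhost_user_blk_stop(VirtIODevice *vdev)
    {
        VHostUserBlk *s = VHOST_USER_BLK(vdev);

        if (!s->started_vu) {
            /* Connected but never started (or already stopped):
             * there is nothing vring-related to undo. */
            return;
        }
        s->started_vu = false;

        /* ... stop the vhost_dev and tear down guest notifiers ... */
    }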
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
On Sat, 26 Nov 2022 at 04:45, Alex Bennée wrote:
>
> Alex Bennée writes:
>
> > Alex Bennée writes:
> >
> >> Hi,
> >>
> >> I can replicate some of the other failures I've been seeing in CI by
> >> running:
> >>
> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
> >>
> >> however this seems to run everything in parallel and maybe is better
> >> at exposing race conditions. Perhaps the CI system makes those races
> >> easier to hit? Unfortunately I've not been able to figure out exactly
> >> how things go wrong in the failure case.
> >
> > There is a circular call - we are in vu_gpio_stop which triggers a write
> > to vhost-user which allows us to catch a disconnect event:
> >
> > #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> > #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> > #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
> > #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>
> I suspect the best choice here is to schedule the cleanup at a later
> date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
> event?
>
> Paolo, any suggestions?
>
> > #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> > #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> > #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> > #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129

Does this mean the backend closed the connection before receiving all
the vhost-user protocol messages sent by QEMU?

This looks like a backend bug. It prevents QEMU's vhost-user client
from cleanly stopping the virtqueue (vhost_virtqueue_stop()).

QEMU is still broken if it cannot handle disconnect at any time. Maybe
a simple solution for that is to check for reentrancy (either by
checking an existing variable or adding a new one to prevent
vu_gpio_stop() from calling itself) - see the sketch below the quoted
trace.
> > #8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> > #9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> > #10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> > #11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> > #12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> > #13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> > #14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> > #15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> > #16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> > #17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> > #18 0x557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> > #19 0x557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> > #20 0x557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> > #21 0x557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> > #22 0x557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> > #23 0x557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> > #24 0x557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> > (rr) p hdev->started
> > $9 = true
> > (rr) info thread
> >   Id   Target Id                                Frame
> > * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >   2    Thread 2140414.2140439 (qemu-system-aar) 0x7002 in syscall_traced ()
> >   3    Thread 214
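A minimal version of the reentrancy check suggested above might look
like the following sketch. The `stopping` field is an invented name for
a new flag in VHostUserGPIO, not something in the tree; the real fix
could equally reuse an existing variable.

    /* Hypothetical sketch: stop vu_gpio_stop() re-entering itself when
     * the vhost-user write it triggers delivers CHR_EVENT_CLOSED. */
    static void vu_gpio_stop(VirtIODevice *vdev)
    {
        VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
        struct vhost_dev *vhost_dev = &gpio->vhost_dev;

        if (gpio->stopping) {
            return;             /* already mid-stop: ignore the re-entry */
        }
        gpio->stopping = true;

        vhost_dev_stop(vhost_dev, vdev, false); /* may recurse via CLOSED */
        vhost_dev_disable_notifiers(vhost_dev, vdev);

        gpio->stopping = false;
    }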
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Alex Bennée writes:

> Alex Bennée writes:
>
>> Hi,
>>
>> I can replicate some of the other failures I've been seeing in CI by
>> running:
>>
>>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>>
>> however this seems to run everything in parallel and maybe is better
>> at exposing race conditions. Perhaps the CI system makes those races
>> easier to hit? Unfortunately I've not been able to figure out exactly
>> how things go wrong in the failure case.
>
> There is a circular call - we are in vu_gpio_stop which triggers a write
> to vhost-user which allows us to catch a disconnect event:
>
> #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
> #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274

I suspect the best choice here is to schedule the cleanup at a later
date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
event? (A sketch of the aio_bh approach follows below the quoted trace.)

Paolo, any suggestions?

> #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
> #8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> #9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> #10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> #11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> #12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> #13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> #14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> #15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> #16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> #17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> #18 0x557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> #19 0x557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> #20 0x557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> #21 0x557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> #22 0x557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> #23 0x557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> #24 0x557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> (rr) p hdev->started
> $9 = true
> (rr) info thread
>   Id   Target Id                                Frame
> * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>   2    Thread 2140414.2140439 (qemu-system-aar) 0x7002 in syscall_traced ()
>   3    Thread 2140414.2140442 (qemu-system-aar) 0x7002 in syscall_traced ()
>   4    Thread 2140414.2140443 (qemu-system-aar) 0x7002 in syscall_traced ()
>
> During which we eliminate the vhost_dev with a memset:
>
> Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30
>
> Old value = 2
> New value = 0
> __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
> Download failed: Invalid argument. Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
> 220  ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
> (rr) bt
> #0 __memset_avx2_
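On the aio_bh one-shot question, the deferral could look something like
the following sketch. vu_gpio_chr_closed_bh is a hypothetical helper
modelled on vhost-user-blk's vhost_user_blk_chr_closed_bh, not code from
this series.

    static void vu_gpio_event(void *opaque, QEMUChrEvent event);

    /* Sketch: push the disconnect out of the chardev event handler into
     * a bottom half so the vhost_dev is not torn down mid-call. */
    static void vu_gpio_chr_closed_bh(void *opaque)
    {
        DeviceState *dev = opaque;
        VHostUserGPIO *gpio = VHOST_USER_GPIO(dev);

        vu_gpio_disconnect(dev);
        /* Re-arm the event handler so a later reconnect is noticed. */
        qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, vu_gpio_event,
                                 NULL, dev, NULL, true);
    }

    static void vu_gpio_event(void *opaque, QEMUChrEvent event)
    {
        switch (event) {
        case CHR_EVENT_CLOSED:
            /* Defer: we may be deep inside vhost_dev_stop() right now. */
            aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                    vu_gpio_chr_closed_bh, opaque);
            break;
        default:
            /* CHR_EVENT_OPENED etc. handled as before. */
            break;
        }
    }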
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Stefan Weil writes:

> On 25.11.22 at 18:30, Alex Bennée wrote:
>> Hi,
>>
>> This is continuing to attempt to fix the various vhost-user issues
>> that are currently plaguing the release. One concrete bug I've come
>> across is that all qtest MMIO devices were being treated as legacy,
>> which caused the VIRTIO_F_VERSION_1 flag to get missed, causing s390x
>> to fall back to trying to set the endian value for the virt-queues.
>
> Do you want to add my 4 commits which fix format strings for
> libvhost-user to your series, or should they be handled separately?

I'm going to leave the choice of which VirtIO patches to take to MST
given he is the maintainer and I've obviously broken it enough this
release :-/

--
Alex Bennée
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Alex Bennée writes:

> Hi,
>
> I can replicate some of the other failures I've been seeing in CI by
> running:
>
>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>
> however this seems to run everything in parallel and maybe is better
> at exposing race conditions. Perhaps the CI system makes those races
> easier to hit? Unfortunately I've not been able to figure out exactly
> how things go wrong in the failure case.

There is a circular call - we are in vu_gpio_stop which triggers a write
to vhost-user which allows us to catch a disconnect event:

#0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
#1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
#2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
#3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
#4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
#5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
#6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
#7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
#8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
#9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
#10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
#11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
#12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
#13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
#14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
#15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
#16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
#17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
#18 0x557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
#19 0x557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
#20 0x557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
#21 0x557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
#22 0x557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
#23 0x557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
#24 0x557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
(rr) p hdev->started
$9 = true
(rr) info thread
  Id   Target Id                                Frame
* 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
  2    Thread 2140414.2140439 (qemu-system-aar) 0x7002 in syscall_traced ()
  3    Thread 2140414.2140442 (qemu-system-aar) 0x7002 in syscall_traced ()
  4    Thread 2140414.2140443 (qemu-system-aar) 0x7002 in syscall_traced ()

During which we eliminate the vhost_dev with a memset:

Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30

Old value = 2
New value = 0
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
Download failed: Invalid argument. Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
220  ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(rr) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
#1  0x557adbdd67f8 in vhost_dev_cleanup (hdev=0x557adf80d878) at ../../hw/virtio/vhost.c:1501
#2  0x557adbe04d68 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:256
#3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVE
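The memset that trips the watchpoint is the one at the end of
vhost_dev_cleanup() (frame #1 above), which zeroes the whole vhost_dev;
the outer vu_gpio_stop() (frame #15 in the earlier trace) then carries
on reading a structure that has just been wiped. In outline - a
paraphrase of the cleanup's shape, not the exact source:

    void vhost_dev_cleanup(struct vhost_dev *hdev)
    {
        /* ... release virtqueues, memory listener, file descriptors ... */

        /* The final step clears the whole struct, invalidating fields
         * such as hdev->started and hdev->vdev that callers further up
         * the stack (the interrupted vu_gpio_stop()) still rely on. */
        memset(hdev, 0, sizeof(struct vhost_dev));
    }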
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
On 25.11.22 at 18:30, Alex Bennée wrote:
> Hi,
>
> This is continuing to attempt to fix the various vhost-user issues
> that are currently plaguing the release. One concrete bug I've come
> across is that all qtest MMIO devices were being treated as legacy,
> which caused the VIRTIO_F_VERSION_1 flag to get missed, causing s390x
> to fall back to trying to set the endian value for the virt-queues.

Do you want to add my 4 commits which fix format strings for
libvhost-user to your series, or should they be handled separately?

Regards

Stefan
[PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Hi,

This is continuing to attempt to fix the various vhost-user issues
that are currently plaguing the release. One concrete bug I've come
across is that all qtest MMIO devices were being treated as legacy,
which caused the VIRTIO_F_VERSION_1 flag to get missed, causing s390x
to fall back to trying to set the endian value for the virt-queues.
I've patched it for the GPIO tests and raised a tracking bug (#1342)
for the general problem. This might explain why the only other VirtIO
vhost-user MMIO devices tested via qtest are the virtio-net tests. The
vhost networking support is its own special implementation, so it's
hard to compare the code with GPIO. It does make me wonder if disabling
the mmio version of the test for now would be worthwhile. FWIW I did
try disabling force-legacy for all machine types and that caused a
bunch of the other tests to fail.

I made some progress in tracking down the memory leak that clang
complains about. It comes down to the line:

  gpio->vhost_dev.vqs = g_new0(struct vhost_virtqueue, gpio->vhost_dev.nvqs);

which is never cleaned up because we never call
vu_gpio_device_unrealize() in the test (see the sketch of the expected
unrealize cleanup after this message). However it's unclear why this is
the case. We don't seem to unrealize the vhost-user-network tests
either and clang doesn't complain about that.

I can replicate some of the other failures I've been seeing in CI by
running:

  ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test

however this seems to run everything in parallel and maybe is better
at exposing race conditions. Perhaps the CI system makes those races
easier to hit? Unfortunately I've not been able to figure out exactly
how things go wrong in the failure case.

I've included Stefano's:

  vhost: enable vrings in vhost_dev_start() for vhost-user devices

in this series as it makes sense and improves the vring state errors.
However it's up to you if you want to include it in the eventual PR.

There are still CI errors I'm trying to track down but I thought it
would be worth posting the current state of my tree.

Please review.

Alex Bennée (4):
  include/hw: attempt to document VirtIO feature variables
  include/hw: VM state takes precedence in virtio_device_should_start
  tests/qtests: override "force-legacy" for gpio virtio-mmio tests
  hw/virtio: ensure a valid host_feature set for virtio-user-gpio

Stefano Garzarella (1):
  vhost: enable vrings in vhost_dev_start() for vhost-user devices

 include/hw/virtio/vhost.h        | 31 ++
 include/hw/virtio/virtio.h       | 43 ++-
 backends/cryptodev-vhost.c       |  4 +--
 backends/vhost-user.c            |  4 +--
 hw/block/vhost-user-blk.c        |  4 +--
 hw/net/vhost_net.c               |  8 +++---
 hw/scsi/vhost-scsi-common.c      |  4 +--
 hw/virtio/vhost-user-fs.c        |  4 +--
 hw/virtio/vhost-user-gpio.c      | 10 ++--
 hw/virtio/vhost-user-i2c.c       |  4 +--
 hw/virtio/vhost-user-rng.c       |  4 +--
 hw/virtio/vhost-vsock-common.c   |  4 +--
 hw/virtio/vhost.c                | 44
 tests/qtest/libqos/virtio-gpio.c |  3 ++-
 hw/virtio/trace-events           |  4 +--
 15 files changed, 134 insertions(+), 41 deletions(-)

--
2.34.1
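On the leak above: the g_new0() allocation would normally be paired
with a free on the unrealize path. A minimal sketch of what that
cleanup might look like - an assumed shape, not the actual patch; field
names such as command_vq and chardev follow the existing
vhost-user-gpio code as best I can tell, and note the vqs pointer has
to be saved before vhost_dev_cleanup() zeroes the struct:

    static void vu_gpio_device_unrealize(DeviceState *dev)
    {
        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
        VHostUserGPIO *gpio = VHOST_USER_GPIO(dev);
        struct vhost_virtqueue *vqs = gpio->vhost_dev.vqs;

        vu_gpio_set_status(vdev, 0);
        qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, NULL, NULL,
                                 NULL, NULL, false);
        /* vhost_dev_cleanup() memsets the vhost_dev, so the vqs pointer
         * was taken above; this g_free() pairs with the g_new0() in
         * realize that clang reports as leaked. */
        vhost_dev_cleanup(&gpio->vhost_dev);
        g_free(vqs);

        virtio_delete_queue(gpio->command_vq);
        virtio_cleanup(vdev);
    }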