Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
On Sat, 26 Nov 2022 at 09:18, Alex Bennée wrote:
>
> Stefan Hajnoczi writes:
>
> > On Sat, 26 Nov 2022 at 04:45, Alex Bennée wrote:
> >>
> >> Alex Bennée writes:
> >>
> >> > Alex Bennée writes:
> >> >
> >> >> Hi,
> >> >>
> >> >> I can replicate some of the other failures I've been seeing in CI by
> >> >> running:
> >> >>
> >> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
> >> >>
> >> >> however this seems to run everything in parallel and maybe is better
> >> >> at exposing race conditions. Perhaps the CI system makes those races
> >> >> easier to hit? Unfortunately I've not been able to figure out exactly
> >> >> how things go wrong in the failure case.
> >> >
> >> > There is a circular call - we are in vu_gpio_stop which triggers a write
> >> > to vhost-user which allows us to catch a disconnect event:
> >> >
> >> > #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >> > #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> >> > #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
> >> > #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
> >>
> >> I suspect the best choice here is to schedule the cleanup at a later
> >> date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
> >> event?
> >>
> >> Paolo, any suggestions?
> >>
> >> > #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> >> > #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> >> > #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> >> > #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
> >
> > Does this mean the backend closed the connection before receiving all
> > the vhost-user protocol messages sent by QEMU?
> >
> > This looks like a backend bug. It prevents QEMU's vhost-user client
> > from cleanly stopping the virtqueue (vhost_virtqueue_stop()).
>
> Well the backend in this case is the qtest framework, so not the world's
> most complete implementation.
>
> > QEMU is still broken if it cannot handle disconnect at any time. Maybe
> > a simple solution for that is to check for reentrancy (either by
> > checking an existing variable or adding a new one to prevent
> > vu_gpio_stop() from calling itself).
>
> vhost-user-blk introduced an additional flag:
>
>     /*
>      * There are at least two steps of initialization of the
>      * vhost-user device. The first is a "connect" step and the
>      * second is a "start" step. Make a separation between
>      * those initialization phases by using two fields.
>      */
>     /* vhost_user_blk_connect/vhost_user_blk_disconnect */
>     bool connected;
>     /* vhost_user_blk_start/vhost_user_blk_stop */
>     bool started_vu;
>
> but that in itself is not enough. If you look at the various cases of
> handling CHR_EVENT_CLOSED you'll see some schedule the shutdown with aio
> and some don't even bother (so will probably break the same way).
> Rather than have a mish-mash of solutions maybe we should introduce a
> new vhost function - vhost_user_async_close() - which can take care of
> the scheduling and wrap it with a check for a valid vhost structure in
> case it gets shut down in the meantime?

Handling this in core vhost code would be great.

I suggested checking a variable because it's not async. Async is more
complicated because it creates a new in-between state while waiting for
the operation to complete. Async approaches are more likely to have bugs
for this reason. vhost-user-blk.c's async shutdown is a good example of
that:

    case CHR_EVENT_CLOSED:
        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
            /*
             * A close event may happen during a read/write, but vhost
             * code assumes the vhost_dev remains setup, so delay the
             * stop & clear.
             */
            AioContext *ctx = qemu_get_current_aio_context();

            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
                                     NULL, NULL, false);
            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh,
                                    opaque);

            /*
             * Move vhost device to the stopped state. The vhost-user device
             * will be cleaned up and disconnected in BH. This can be useful
             * in the vhost migration code. If disconnect was caught there is
             * an option for the general vhost code to get the dev state
             * without knowing its type (in this case vhost-user).
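For concreteness, one possible shape for the proposed helper. This is a
hedged sketch, not code from this series: the VhostAsyncCallback struct,
the vu_async_close_fn typedef and the vdev validity check are all
assumptions about how such a function might be structured.

    /* Hypothetical sketch of a generic vhost_user_async_close() helper. */
    typedef void (*vu_async_close_fn)(DeviceState *dev);

    typedef struct {
        DeviceState *dev;           /* device being torn down */
        struct vhost_dev *vhost;    /* its vhost state */
        vu_async_close_fn cb;       /* the device's disconnect handler */
    } VhostAsyncCallback;

    static void vhost_user_async_close_bh(void *opaque)
    {
        VhostAsyncCallback *data = opaque;

        /*
         * The vhost_dev may already have been torn down by the time the
         * BH runs; checking it still points at a vdev guards that race.
         */
        if (data->vhost->vdev) {
            data->cb(data->dev);
        }
        g_free(data);
    }

    void vhost_user_async_close(DeviceState *d, CharBackend *chardev,
                                struct vhost_dev *vhost,
                                vu_async_close_fn cb)
    {
        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
            VhostAsyncCallback *data = g_new0(VhostAsyncCallback, 1);

            /* Stop further chardev events re-entering the device. */
            qemu_chr_fe_set_handlers(chardev, NULL, NULL, NULL, NULL,
                                     NULL, NULL, false);
            data->dev = d;
            data->vhost = vhost;
            data->cb = cb;
            aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                    vhost_user_async_close_bh, data);
        }
    }

With something like this, each device's CHR_EVENT_CLOSED case collapses
to a single call, and the scheduling policy lives in one place instead
of being copied (inconsistently) per device.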
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Stefan Hajnoczi writes:

> On Sat, 26 Nov 2022 at 04:45, Alex Bennée wrote:
>>
>> Alex Bennée writes:
>>
>> > Alex Bennée writes:
>> >
>> >> Hi,
>> >>
>> >> I can replicate some of the other failures I've been seeing in CI by
>> >> running:
>> >>
>> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>> >>
>> >> however this seems to run everything in parallel and maybe is better
>> >> at exposing race conditions. Perhaps the CI system makes those races
>> >> easier to hit? Unfortunately I've not been able to figure out exactly
>> >> how things go wrong in the failure case.
>> >
>> > There is a circular call - we are in vu_gpio_stop which triggers a write
>> > to vhost-user which allows us to catch a disconnect event:
>> >
>> > #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>> > #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
>> > #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
>> > #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>>
>> I suspect the best choice here is to schedule the cleanup at a later
>> date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
>> event?
>>
>> Paolo, any suggestions?
>>
>> > #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
>> > #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
>> > #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
>> > #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
>
> Does this mean the backend closed the connection before receiving all
> the vhost-user protocol messages sent by QEMU?
>
> This looks like a backend bug. It prevents QEMU's vhost-user client
> from cleanly stopping the virtqueue (vhost_virtqueue_stop()).

Well the backend in this case is the qtest framework, so not the world's
most complete implementation.

> QEMU is still broken if it cannot handle disconnect at any time. Maybe
> a simple solution for that is to check for reentrancy (either by
> checking an existing variable or adding a new one to prevent
> vu_gpio_stop() from calling itself).

vhost-user-blk introduced an additional flag:

    /*
     * There are at least two steps of initialization of the
     * vhost-user device. The first is a "connect" step and the
     * second is a "start" step. Make a separation between
     * those initialization phases by using two fields.
     */
    /* vhost_user_blk_connect/vhost_user_blk_disconnect */
    bool connected;
    /* vhost_user_blk_start/vhost_user_blk_stop */
    bool started_vu;

but that in itself is not enough (a sketch of how such a flag gates the
stop path follows this message). If you look at the various cases of
handling CHR_EVENT_CLOSED you'll see some schedule the shutdown with aio
and some don't even bother (so will probably break the same way).

Rather than have a mish-mash of solutions maybe we should introduce a
new vhost function - vhost_user_async_close() - which can take care of
the scheduling and wrap it with a check for a valid vhost structure in
case it gets shut down in the meantime?
>> > #8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
>> > #9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
>> > #10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
>> > #11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
>> > #12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
>> > #13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
>> > #14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
>> > #15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
>> > #16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>> > #17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>> > #18 0x557adb
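The point of the connected/started_vu split quoted above is that each
teardown path can bail out cleanly depending on which initialization
phase was actually reached. A rough sketch of the gate on the stop path
(abridged - the real vhost-user-blk stop logic has more steps than
shown here):

    /* Rough sketch of how started_vu gates the stop path. */
    static void vhost_user_blk_stop(VirtIODevice *vdev)
    {
        VHostUserBlk *s = VHOST_USER_BLK(vdev);

        if (!s->started_vu) {
            /* Connected but never started (or already stopped):
             * there is nothing vring-related to undo. */
            return;
        }
        s->started_vu = false;

        /* ... stop the vhost_dev and tear down guest notifiers ... */
    }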
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
On Sat, 26 Nov 2022 at 04:45, Alex Bennée wrote:
>
> Alex Bennée writes:
>
> > Alex Bennée writes:
> >
> >> Hi,
> >>
> >> I can replicate some of the other failures I've been seeing in CI by
> >> running:
> >>
> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
> >>
> >> however this seems to run everything in parallel and maybe is better
> >> at exposing race conditions. Perhaps the CI system makes those races
> >> easier to hit? Unfortunately I've not been able to figure out exactly
> >> how things go wrong in the failure case.
> >
> > There is a circular call - we are in vu_gpio_stop which triggers a write
> > to vhost-user which allows us to catch a disconnect event:
> >
> > #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> > #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> > #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
> > #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>
> I suspect the best choice here is to schedule the cleanup at a later
> date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
> event?
>
> Paolo, any suggestions?
>
> > #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> > #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> > #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> > #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129

Does this mean the backend closed the connection before receiving all
the vhost-user protocol messages sent by QEMU?

This looks like a backend bug. It prevents QEMU's vhost-user client
from cleanly stopping the virtqueue (vhost_virtqueue_stop()).

QEMU is still broken if it cannot handle disconnect at any time. Maybe
a simple solution for that is to check for reentrancy (either by
checking an existing variable or adding a new one to prevent
vu_gpio_stop() from calling itself) - see the sketch below the quoted
trace.
> > #8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> > #9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> > #10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> > #11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> > #12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> > #13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> > #14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> > #15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> > #16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> > #17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> > #18 0x557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> > #19 0x557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> > #20 0x557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> > #21 0x557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> > #22 0x557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> > #23 0x557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> > #24 0x557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> > (rr) p hdev->started
> > $9 = true
> > (rr) info thread
> >   Id   Target Id                                Frame
> > * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >   2    Thread 2140414.2140439 (qemu-system-aar) 0x7002 in syscall_traced ()
> >   3    Thread 214
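A minimal version of the reentrancy check suggested above might look
like the following sketch. The `stopping` field is an invented name for
a new flag in VHostUserGPIO, not something in the tree; the real fix
could equally reuse an existing variable.

    /* Hypothetical sketch: stop vu_gpio_stop() re-entering itself when
     * the vhost-user write it triggers delivers CHR_EVENT_CLOSED. */
    static void vu_gpio_stop(VirtIODevice *vdev)
    {
        VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
        struct vhost_dev *vhost_dev = &gpio->vhost_dev;

        if (gpio->stopping) {
            return;             /* already mid-stop: ignore the re-entry */
        }
        gpio->stopping = true;

        vhost_dev_stop(vhost_dev, vdev, false); /* may recurse via CLOSED */
        vhost_dev_disable_notifiers(vhost_dev, vdev);

        gpio->stopping = false;
    }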
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Alex Bennée writes:

> Alex Bennée writes:
>
>> Hi,
>>
>> I can replicate some of the other failures I've been seeing in CI by
>> running:
>>
>>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>>
>> however this seems to run everything in parallel and maybe is better
>> at exposing race conditions. Perhaps the CI system makes those races
>> easier to hit? Unfortunately I've not been able to figure out exactly
>> how things go wrong in the failure case.
>
> There is a circular call - we are in vu_gpio_stop which triggers a write
> to vhost-user which allows us to catch a disconnect event:
>
> #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> #1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> #2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
> #3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274

I suspect the best choice here is to schedule the cleanup at a later
date. Should I use the aio_bh one-shots for this or maybe an rcu cleanup
event? (A sketch of the aio_bh approach follows below the quoted trace.)

Paolo, any suggestions?

> #4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> #5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> #6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> #7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
> #8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> #9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> #10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> #11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> #12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> #13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> #14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> #15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> #16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> #17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> #18 0x557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> #19 0x557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> #20 0x557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> #21 0x557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> #22 0x557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> #23 0x557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> #24 0x557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> (rr) p hdev->started
> $9 = true
> (rr) info thread
>   Id   Target Id                                Frame
> * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>   2    Thread 2140414.2140439 (qemu-system-aar) 0x7002 in syscall_traced ()
>   3    Thread 2140414.2140442 (qemu-system-aar) 0x7002 in syscall_traced ()
>   4    Thread 2140414.2140443 (qemu-system-aar) 0x7002 in syscall_traced ()
>
> During which we eliminate the vhost_dev with a memset:
>
> Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30
>
> Old value = 2
> New value = 0
> __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
> Download failed: Invalid argument. Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
> 220  ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
> (rr) bt
> #0 __memset_avx2_
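On the aio_bh one-shot question, the deferral could look something like
the following sketch. vu_gpio_chr_closed_bh is a hypothetical helper
modelled on vhost-user-blk's vhost_user_blk_chr_closed_bh, not code from
this series.

    static void vu_gpio_event(void *opaque, QEMUChrEvent event);

    /* Sketch: push the disconnect out of the chardev event handler into
     * a bottom half so the vhost_dev is not torn down mid-call. */
    static void vu_gpio_chr_closed_bh(void *opaque)
    {
        DeviceState *dev = opaque;
        VHostUserGPIO *gpio = VHOST_USER_GPIO(dev);

        vu_gpio_disconnect(dev);
        /* Re-arm the event handler so a later reconnect is noticed. */
        qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, vu_gpio_event,
                                 NULL, dev, NULL, true);
    }

    static void vu_gpio_event(void *opaque, QEMUChrEvent event)
    {
        switch (event) {
        case CHR_EVENT_CLOSED:
            /* Defer: we may be deep inside vhost_dev_stop() right now. */
            aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                    vu_gpio_chr_closed_bh, opaque);
            break;
        default:
            /* CHR_EVENT_OPENED etc. handled as before. */
            break;
        }
    }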
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Stefan Weil writes:

> On 25.11.22 at 18:30, Alex Bennée wrote:
>> Hi,
>>
>> This is continuing to attempt to fix the various vhost-user issues
>> that are currently plaguing the release. One concrete bug I've come
>> across is that all qtest MMIO devices were being treated as legacy,
>> which caused the VIRTIO_F_VERSION_1 flag to get missed, causing s390x
>> to fall back to trying to set the endian value for the virt-queues.
>
> Do you want to add my 4 commits which fix format strings for
> libvhost-user to your series, or should they be handled separately?

I'm going to leave the choice of which VirtIO patches to take to MST
given he is the maintainer and I've obviously broken it enough this
release :-/

--
Alex Bennée
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Alex Bennée writes:

> Hi,
>
> I can replicate some of the other failures I've been seeing in CI by
> running:
>
>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>
> however this seems to run everything in parallel and maybe is better
> at exposing race conditions. Perhaps the CI system makes those races
> easier to hit? Unfortunately I've not been able to figure out exactly
> how things go wrong in the failure case.

There is a circular call - we are in vu_gpio_stop which triggers a write
to vhost-user which allows us to catch a disconnect event:

#0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
#1  0x557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
#2  0x557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
#3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
#4  0x557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
#5  0x557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
#6  0x557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
#7  0x557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
#8  0x557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
#9  0x557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
#10 0x557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
#11 0x557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
#12 0x557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
#13 0x557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
#14 0x557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
#15 0x557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
#16 0x557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
#17 0x557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
#18 0x557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
#19 0x557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
#20 0x557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
#21 0x557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
#22 0x557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
#23 0x557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
#24 0x557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
(rr) p hdev->started
$9 = true
(rr) info thread
  Id   Target Id                                Frame
* 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
  2    Thread 2140414.2140439 (qemu-system-aar) 0x7002 in syscall_traced ()
  3    Thread 2140414.2140442 (qemu-system-aar) 0x7002 in syscall_traced ()
  4    Thread 2140414.2140443 (qemu-system-aar) 0x7002 in syscall_traced ()

During which we eliminate the vhost_dev with a memset:

Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30

Old value = 2
New value = 0
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
Download failed: Invalid argument. Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
220  ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(rr) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
#1  0x557adbdd67f8 in vhost_dev_cleanup (hdev=0x557adf80d878) at ../../hw/virtio/vhost.c:1501
#2  0x557adbe04d68 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:256
#3  0x557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVE
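The memset that trips the watchpoint is the one at the end of
vhost_dev_cleanup() (frame #1 above), which zeroes the whole vhost_dev;
the outer vu_gpio_stop() (frame #15 in the earlier trace) then carries
on reading a structure that has just been wiped. In outline - a
paraphrase of the cleanup's shape, not the exact source:

    void vhost_dev_cleanup(struct vhost_dev *hdev)
    {
        /* ... release virtqueues, memory listener, file descriptors ... */

        /* The final step clears the whole struct, invalidating fields
         * such as hdev->started and hdev->vdev that callers further up
         * the stack (the interrupted vu_gpio_stop()) still rely on. */
        memset(hdev, 0, sizeof(struct vhost_dev));
    }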
Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
On 25.11.22 at 18:30, Alex Bennée wrote:
> Hi,
>
> This is continuing to attempt to fix the various vhost-user issues
> that are currently plaguing the release. One concrete bug I've come
> across is that all qtest MMIO devices were being treated as legacy,
> which caused the VIRTIO_F_VERSION_1 flag to get missed, causing s390x
> to fall back to trying to set the endian value for the virt-queues.

Do you want to add my 4 commits which fix format strings for
libvhost-user to your series, or should they be handled separately?

Regards

Stefan
[PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
Hi,

This is continuing to attempt to fix the various vhost-user issues
that are currently plaguing the release. One concrete bug I've come
across is that all qtest MMIO devices were being treated as legacy,
which caused the VIRTIO_F_VERSION_1 flag to get missed, causing s390x
to fall back to trying to set the endian value for the virt-queues.
I've patched it for the GPIO tests and raised a tracking bug (#1342)
for the general problem. This might explain why the only other VirtIO
vhost-user MMIO devices tested via qtest are the virtio-net tests. The
vhost networking support is its own special implementation, so it's
hard to compare the code with GPIO. It does make me wonder if disabling
the mmio version of the test for now would be worthwhile. FWIW I did
try disabling force-legacy for all machine types and that caused a
bunch of the other tests to fail.

I made some progress in tracking down the memory leak that clang
complains about. It comes down to the line:

  gpio->vhost_dev.vqs = g_new0(struct vhost_virtqueue, gpio->vhost_dev.nvqs);

which is never cleaned up because we never call
vu_gpio_device_unrealize() in the test (see the sketch of the expected
unrealize cleanup after this message). However it's unclear why this is
the case. We don't seem to unrealize the vhost-user-network tests
either and clang doesn't complain about that.

I can replicate some of the other failures I've been seeing in CI by
running:

  ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test

however this seems to run everything in parallel and maybe is better
at exposing race conditions. Perhaps the CI system makes those races
easier to hit? Unfortunately I've not been able to figure out exactly
how things go wrong in the failure case.

I've included Stefano's:

  vhost: enable vrings in vhost_dev_start() for vhost-user devices

in this series as it makes sense and improves the vring state errors.
However it's up to you if you want to include it in the eventual PR.

There are still CI errors I'm trying to track down but I thought it
would be worth posting the current state of my tree.

Please review.

Alex Bennée (4):
  include/hw: attempt to document VirtIO feature variables
  include/hw: VM state takes precedence in virtio_device_should_start
  tests/qtests: override "force-legacy" for gpio virtio-mmio tests
  hw/virtio: ensure a valid host_feature set for virtio-user-gpio

Stefano Garzarella (1):
  vhost: enable vrings in vhost_dev_start() for vhost-user devices

 include/hw/virtio/vhost.h        | 31 ++
 include/hw/virtio/virtio.h       | 43 ++-
 backends/cryptodev-vhost.c       |  4 +--
 backends/vhost-user.c            |  4 +--
 hw/block/vhost-user-blk.c        |  4 +--
 hw/net/vhost_net.c               |  8 +++---
 hw/scsi/vhost-scsi-common.c      |  4 +--
 hw/virtio/vhost-user-fs.c        |  4 +--
 hw/virtio/vhost-user-gpio.c      | 10 ++--
 hw/virtio/vhost-user-i2c.c       |  4 +--
 hw/virtio/vhost-user-rng.c       |  4 +--
 hw/virtio/vhost-vsock-common.c   |  4 +--
 hw/virtio/vhost.c                | 44
 tests/qtest/libqos/virtio-gpio.c |  3 ++-
 hw/virtio/trace-events           |  4 +--
 15 files changed, 134 insertions(+), 41 deletions(-)

--
2.34.1
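On the leak above: the g_new0() allocation would normally be paired
with a free on the unrealize path. A minimal sketch of what that
cleanup might look like - an assumed shape, not the actual patch; field
names such as command_vq and chardev follow the existing
vhost-user-gpio code as best I can tell, and note the vqs pointer has
to be saved before vhost_dev_cleanup() zeroes the struct:

    static void vu_gpio_device_unrealize(DeviceState *dev)
    {
        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
        VHostUserGPIO *gpio = VHOST_USER_GPIO(dev);
        struct vhost_virtqueue *vqs = gpio->vhost_dev.vqs;

        vu_gpio_set_status(vdev, 0);
        qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, NULL, NULL,
                                 NULL, NULL, false);
        /* vhost_dev_cleanup() memsets the vhost_dev, so the vqs pointer
         * was taken above; this g_free() pairs with the g_new0() in
         * realize that clang reports as leaked. */
        vhost_dev_cleanup(&gpio->vhost_dev);
        g_free(vqs);

        virtio_delete_queue(gpio->command_vq);
        virtio_cleanup(vdev);
    }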