Re: vhost-user-blk reconnect issue
On 4/2/2024 4:44 PM, Li Feng wrote:

*External email: Use caution opening links or attachments*

Hi,

I tested it today and there is indeed a problem in this scenario. It seems that the first version of the patch is the best and can handle all scenarios. With this patch, the previously merged patches are no longer useful. I will revert that patch and submit a new fix. Do you have any comments?

Revert: https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/
New: https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

Looks good to me.

Thanks,
Yajun

Thanks,
Li

On April 1, 2024 at 16:43, Yajun Wu wrote:

On 4/1/2024 4:34 PM, Li Feng wrote:

Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end that solution was not accepted and other solutions were adopted to fix it:

[PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

I think this fix is valid.

This is the merged fix:

[PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/

My tests are with this fix; they failed in the two scenarios I mentioned.

Thanks,
Li

On April 1, 2024 at 10:08, Yajun Wu wrote:

On 3/27/2024 6:47 PM, Stefano Garzarella wrote:

Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:

Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect failure scenarios:

Do you know if it has ever worked (so this is a regression), or have we always had this problem?

I am afraid this commit: "71e076a07d (2022-12-01 02:30:13 -0500) hw/virtio: generalise CHR_EVENT_CLOSED handling" caused both failures. The previous hash is good.

I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the cause; the previous code didn't have this check.

Thanks,
Stefano

1. Disconnect the vhost-user-blk backend before the guest driver probes the vblk device, then reconnect the backend after the guest driver has probed the device. QEMU won't send out any vhost messages to restore the backend. This is because vhost->vdev is NULL before the guest driver probes the vblk device, so vhost_user_blk_disconnect won't be called and s->connected is still true. The next vhost_user_blk_connect will simply return without doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then reconnect the backend, then modprobe virtio-blk. QEMU won't send messages in vhost_dev_init. This is because rmmod makes QEMU call vhost_user_blk_stop, and vhost->vdev also becomes NULL (in vhost_dev_stop), so vhost_user_blk_disconnect won't be called. Again s->connected is still true, even though the chr connection is closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be called when the chr connection closes?

Hope we can have a fix soon.

Thanks,
Yajun
Re: [PATCH] virtio-net: fix qemu set used ring flag even vhost started
On 4/3/2024 1:10 PM, Michael Tokarev wrote:

02.04.2024 07:51, Yajun Wu:

When vhost-user or vhost-kernel is handling the virtio net datapath, QEMU should not touch the used ring. But in the vhost-user socket reconnect scenario, in a very rare case (when there is a pending kick event), VRING_USED_F_NO_NOTIFY is set by QEMU in the following code path: ...

This issue causes the guest kernel to stop kicking the device, and traffic stops.

Add a vhost_started check in virtio_net_handle_tx_bh to fix this wrong VRING_USED_F_NO_NOTIFY setting.

This seems to be qemu-stable material, even if the case when the issue happens is "very rare" -- is it not?

If you mean this fix needs backporting to previous versions: yes.

Thanks,

/mjt

Signed-off-by: Yajun Wu
Reviewed-by: Jiri Pirko
---
 hw/net/virtio-net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a6ff000cd9..8035e01fdf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2865,6 +2865,10 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
     VirtIONet *n = VIRTIO_NET(vdev);
     VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
 
+    if (n->vhost_started) {
+        return;
+    }
+
     if (unlikely((n->status & VIRTIO_NET_S_LINK_UP) == 0)) {
         virtio_net_drop_tx_queue_data(vdev, vq);
         return;
Re: [PATCH] virtio-net: fix qemu set used ring flag even vhost started
On 4/2/2024 8:11 PM, Philippe Mathieu-Daudé wrote:

Hi Yajun,

On 2/4/24 06:51, Yajun Wu wrote:

When vhost-user or vhost-kernel is handling virtio net datapath, qemu

"QEMU"

Ack.

should not touch used ring. But with vhost-user socket reconnect scenario, in a very rare case (has pending kick event), VRING_USED_F_NO_NOTIFY is set by qemu in

"QEMU"

Ack.

following code path:

#0  virtio_queue_split_set_notification (vq=0x7ff5f4c920a8, enable=0) at ../hw/virtio/virtio.c:511
#1  0x559d6dbf033b in virtio_queue_set_notification (vq=0x7ff5f4c920a8, enable=0) at ../hw/virtio/virtio.c:576
#2  0x559d6dbbbdbc in virtio_net_handle_tx_bh (vdev=0x559d703a6aa0, vq=0x7ff5f4c920a8) at ../hw/net/virtio-net.c:2801
#3  0x559d6dbf4791 in virtio_queue_notify_vq (vq=0x7ff5f4c920a8) at ../hw/virtio/virtio.c:2248
#4  0x559d6dbf79da in virtio_queue_host_notifier_read (n=0x7ff5f4c9211c) at ../hw/virtio/virtio.c:3525
#5  0x559d6d9a5814 in virtio_bus_cleanup_host_notifier (bus=0x559d703a6a20, n=1) at ../hw/virtio/virtio-bus.c:321
#6  0x559d6dbf83c9 in virtio_device_stop_ioeventfd_impl (vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3774
#7  0x559d6d9a55c8 in virtio_bus_stop_ioeventfd (bus=0x559d703a6a20) at ../hw/virtio/virtio-bus.c:259
#8  0x559d6d9a53e8 in virtio_bus_grab_ioeventfd (bus=0x559d703a6a20) at ../hw/virtio/virtio-bus.c:199
#9  0x559d6dbf841c in virtio_device_grab_ioeventfd (vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3783
#10 0x559d6d9bde18 in vhost_dev_enable_notifiers (hdev=0x559d707edd70, vdev=0x559d703a6aa0) at ../hw/virtio/vhost.c:1592
#11 0x559d6d89a0b8 in vhost_net_start_one (net=0x559d707edd70, dev=0x559d703a6aa0) at ../hw/net/vhost_net.c:266
#12 0x559d6d89a6df in vhost_net_start (dev=0x559d703a6aa0, ncs=0x559d7048d890, data_queue_pairs=31, cvq=0) at ../hw/net/vhost_net.c:412
#13 0x559d6dbb5b89 in virtio_net_vhost_status (n=0x559d703a6aa0, status=15 '\017') at ../hw/net/virtio-net.c:311
#14 0x559d6dbb5e34 in virtio_net_set_status (vdev=0x559d703a6aa0, status=15 '\017') at ../hw/net/virtio-net.c:392
#15 0x559d6dbb60d8 in virtio_net_set_link_status (nc=0x559d7048d890) at ../hw/net/virtio-net.c:455
#16 0x559d6da64863 in qmp_set_link (name=0x559d6f0b83d0 "hostnet1", up=true, errp=0x7ffdd76569f0) at ../net/net.c:1459
#17 0x559d6da7226e in net_vhost_user_event (opaque=0x559d6f0b83d0, event=CHR_EVENT_OPENED) at ../net/vhost-user.c:301
#18 0x559d6ddc7f63 in chr_be_event (s=0x559d6f2ffea0, event=CHR_EVENT_OPENED) at ../chardev/char.c:62
#19 0x559d6ddc7fdc in qemu_chr_be_event (s=0x559d6f2ffea0, event=CHR_EVENT_OPENED) at ../chardev/char.c:82

This issue causes guest kernel stop kicking device and traffic stop. Add vhost_started check in virtio_net_handle_tx_bh to fix this wrong VRING_USED_F_NO_NOTIFY set.

Signed-off-by: Yajun Wu
Reviewed-by: Jiri Pirko
---
 hw/net/virtio-net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a6ff000cd9..8035e01fdf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2865,6 +2865,10 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
     VirtIONet *n = VIRTIO_NET(vdev);
     VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
 
+    if (n->vhost_started) {

Since you mentioned "in a very rare case", maybe use unlikely()?

Ack.

+        return;
+    }
+
     if (unlikely((n->status & VIRTIO_NET_S_LINK_UP) == 0)) {
         virtio_net_drop_tx_queue_data(vdev, vq);
         return;
[PATCH] virtio-net: fix qemu set used ring flag even vhost started
When vhost-user or vhost-kernel is handling the virtio net datapath, QEMU should not touch the used ring. But in the vhost-user socket reconnect scenario, in a very rare case (when there is a pending kick event), VRING_USED_F_NO_NOTIFY is set by QEMU in the following code path:

#0  virtio_queue_split_set_notification (vq=0x7ff5f4c920a8, enable=0) at ../hw/virtio/virtio.c:511
#1  0x559d6dbf033b in virtio_queue_set_notification (vq=0x7ff5f4c920a8, enable=0) at ../hw/virtio/virtio.c:576
#2  0x559d6dbbbdbc in virtio_net_handle_tx_bh (vdev=0x559d703a6aa0, vq=0x7ff5f4c920a8) at ../hw/net/virtio-net.c:2801
#3  0x559d6dbf4791 in virtio_queue_notify_vq (vq=0x7ff5f4c920a8) at ../hw/virtio/virtio.c:2248
#4  0x559d6dbf79da in virtio_queue_host_notifier_read (n=0x7ff5f4c9211c) at ../hw/virtio/virtio.c:3525
#5  0x559d6d9a5814 in virtio_bus_cleanup_host_notifier (bus=0x559d703a6a20, n=1) at ../hw/virtio/virtio-bus.c:321
#6  0x559d6dbf83c9 in virtio_device_stop_ioeventfd_impl (vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3774
#7  0x559d6d9a55c8 in virtio_bus_stop_ioeventfd (bus=0x559d703a6a20) at ../hw/virtio/virtio-bus.c:259
#8  0x559d6d9a53e8 in virtio_bus_grab_ioeventfd (bus=0x559d703a6a20) at ../hw/virtio/virtio-bus.c:199
#9  0x559d6dbf841c in virtio_device_grab_ioeventfd (vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3783
#10 0x559d6d9bde18 in vhost_dev_enable_notifiers (hdev=0x559d707edd70, vdev=0x559d703a6aa0) at ../hw/virtio/vhost.c:1592
#11 0x559d6d89a0b8 in vhost_net_start_one (net=0x559d707edd70, dev=0x559d703a6aa0) at ../hw/net/vhost_net.c:266
#12 0x559d6d89a6df in vhost_net_start (dev=0x559d703a6aa0, ncs=0x559d7048d890, data_queue_pairs=31, cvq=0) at ../hw/net/vhost_net.c:412
#13 0x559d6dbb5b89 in virtio_net_vhost_status (n=0x559d703a6aa0, status=15 '\017') at ../hw/net/virtio-net.c:311
#14 0x559d6dbb5e34 in virtio_net_set_status (vdev=0x559d703a6aa0, status=15 '\017') at ../hw/net/virtio-net.c:392
#15 0x559d6dbb60d8 in virtio_net_set_link_status (nc=0x559d7048d890) at ../hw/net/virtio-net.c:455
#16 0x559d6da64863 in qmp_set_link (name=0x559d6f0b83d0 "hostnet1", up=true, errp=0x7ffdd76569f0) at ../net/net.c:1459
#17 0x559d6da7226e in net_vhost_user_event (opaque=0x559d6f0b83d0, event=CHR_EVENT_OPENED) at ../net/vhost-user.c:301
#18 0x559d6ddc7f63 in chr_be_event (s=0x559d6f2ffea0, event=CHR_EVENT_OPENED) at ../chardev/char.c:62
#19 0x559d6ddc7fdc in qemu_chr_be_event (s=0x559d6f2ffea0, event=CHR_EVENT_OPENED) at ../chardev/char.c:82

This issue causes the guest kernel to stop kicking the device, and traffic stops.

Add a vhost_started check in virtio_net_handle_tx_bh to fix this wrong VRING_USED_F_NO_NOTIFY setting.

Signed-off-by: Yajun Wu
Reviewed-by: Jiri Pirko
---
 hw/net/virtio-net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a6ff000cd9..8035e01fdf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2865,6 +2865,10 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
     VirtIONet *n = VIRTIO_NET(vdev);
     VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
 
+    if (n->vhost_started) {
+        return;
+    }
+
     if (unlikely((n->status & VIRTIO_NET_S_LINK_UP) == 0)) {
         virtio_net_drop_tx_queue_data(vdev, vq);
         return;
-- 
2.27.0
Re: vhost-user-blk reconnect issue
On 4/1/2024 4:34 PM, Li Feng wrote:

Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end that solution was not accepted and other solutions were adopted to fix it:

[PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

I think this fix is valid.

This is the merged fix:

[PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/

My tests are with this fix; they failed in the two scenarios I mentioned.

Thanks,
Li

On April 1, 2024 at 10:08, Yajun Wu wrote:

On 3/27/2024 6:47 PM, Stefano Garzarella wrote:

Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:

Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect failure scenarios:

Do you know if it has ever worked (so this is a regression), or have we always had this problem?

I am afraid this commit: "71e076a07d (2022-12-01 02:30:13 -0500) hw/virtio: generalise CHR_EVENT_CLOSED handling" caused both failures. The previous hash is good.

I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the cause; the previous code didn't have this check.

Thanks,
Stefano

1. Disconnect the vhost-user-blk backend before the guest driver probes the vblk device, then reconnect the backend after the guest driver has probed the device. QEMU won't send out any vhost messages to restore the backend. This is because vhost->vdev is NULL before the guest driver probes the vblk device, so vhost_user_blk_disconnect won't be called and s->connected is still true. The next vhost_user_blk_connect will simply return without doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then reconnect the backend, then modprobe virtio-blk. QEMU won't send messages in vhost_dev_init. This is because rmmod makes QEMU call vhost_user_blk_stop, and vhost->vdev also becomes NULL (in vhost_dev_stop), so vhost_user_blk_disconnect won't be called. Again s->connected is still true, even though the chr connection is closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be called when the chr connection closes?

Hope we can have a fix soon.

Thanks,
Yajun
Re: vhost-user-blk reconnect issue
On 3/27/2024 6:47 PM, Stefano Garzarella wrote:

Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:

Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect failure scenarios:

Do you know if it has ever worked (so this is a regression), or have we always had this problem?

I am afraid this commit: "71e076a07d (2022-12-01 02:30:13 -0500) hw/virtio: generalise CHR_EVENT_CLOSED handling" caused both failures. The previous hash is good.

I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the cause; the previous code didn't have this check.

Thanks,
Stefano

1. Disconnect the vhost-user-blk backend before the guest driver probes the vblk device, then reconnect the backend after the guest driver has probed the device. QEMU won't send out any vhost messages to restore the backend. This is because vhost->vdev is NULL before the guest driver probes the vblk device, so vhost_user_blk_disconnect won't be called and s->connected is still true. The next vhost_user_blk_connect will simply return without doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then reconnect the backend, then modprobe virtio-blk. QEMU won't send messages in vhost_dev_init. This is because rmmod makes QEMU call vhost_user_blk_stop, and vhost->vdev also becomes NULL (in vhost_dev_stop), so vhost_user_blk_disconnect won't be called. Again s->connected is still true, even though the chr connection is closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be called when the chr connection closes?

Hope we can have a fix soon.

Thanks,
Yajun
vhost-user-blk reconnect issue
Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect failure scenarios:

1. Disconnect the vhost-user-blk backend before the guest driver probes the vblk device, then reconnect the backend after the guest driver has probed the device. QEMU won't send out any vhost messages to restore the backend. This is because vhost->vdev is NULL before the guest driver probes the vblk device, so vhost_user_blk_disconnect won't be called and s->connected is still true. The next vhost_user_blk_connect will simply return without doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then reconnect the backend, then modprobe virtio-blk. QEMU won't send messages in vhost_dev_init. This is because rmmod makes QEMU call vhost_user_blk_stop, and vhost->vdev also becomes NULL (in vhost_dev_stop), so vhost_user_blk_disconnect won't be called. Again s->connected is still true, even though the chr connection is closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be called when the chr connection closes?

Hope we can have a fix soon.

Thanks,
Yajun
Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load
On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:

On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu wrote:

This series of patches aims to minimize the downtime during live migration of a virtio-net device with a vhost-user backend. In the case of a hardware virtual Data Path Acceleration (vDPA) implementation, the hardware configuration, which includes tasks like VQ creation and RSS setting, may take above 200ms. This significantly increases the downtime of the VM, particularly in terms of networking.

Hi!

Sorry, I totally missed this email. Please CC me in next versions.

Just for completion, there is an ongoing plan to reduce the downtime in vhost-vdpa. You can find more details at [1]. Sending the state periodically is on the roadmap, but some benchmarking detected that memory pinning and unpinning affects downtime more. I'll send an RFC with this soon.

The plan was to continue with iterative state restoring, so I'm happy to know more people are looking into it!

In the case of vhost-vdpa, it already restores the state by not enabling the dataplane until migration completes. All the load is performed using CVQ, as you can see in net/vhost-vdpa.c:vhost_vdpa_net_load. After that, the whole dataplane is started again.

My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at the destination at the same moment the migration starts, as it will not have the dataplane enabled. After that, the source should send the virtio-net vmstate every time it changes. vhost-vdpa net is able to send and receive through CVQ, so it should be able to modify the net device configuration as many times as needed. I guess that could be done by calling something along the lines of your vhost_user_set_presetup_state.

This is a very good approach. How do you know when the virtio-net vmstate changes? vhost-user and vhost-vdpa should share the same code for early virtio-net vmstate sync.

This can be improved in vhost-vdpa by being able to send only the new state.

When all the migration is completed, the vhost-vdpa net dataplane should start as it does now.

If you are interested in saving changes to the vhost-user protocol, maybe qemu could just disable the dataplane too with VHOST_USER_SET_VRING_ENABLE? If not, I think both approaches have a lot in common, so I'm sure we can develop one backend on top of another.

Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg00659.html

I'm afraid that, just like DRIVER_OK serves as a hint for vhost-user vDPA to apply all the configuration to HW, vhost-user also needs the same kind of hint at the end of each round of vmstate sync to apply the configuration to HW. That's why I need to define a new protocol message. Because MQ can also change, VQ enable is a valid parameter to HW: HW creates only the enabled queues, and the number of enabled queues affects the RSS setting.

To reduce the VM downtime, the proposed approach involves capturing the basic device state/configuration during the VM's running stage and performing the initial device configuration (presetup). During the normal configuration process when the VM is in a stopped state, the second configuration is compared to the first one, and only the differences are applied to reduce downtime. Ideally, only the vring available index needs to be changed while the VM is stopped.

This feature is disabled by default, because a backend like DPDK also needs to add support for the new vhost message. The new device property "x-early-migration" can enable this feature.

1. Register a new vmstate for virtio-net with an early_setup flag to send the device state during migration setup.

2. After the device state load on the destination VM, the device status needs to be sent to the vhost backend in a new way. Introduce a new vhost-user message, VHOST_USER_PRESETUP, to notify the backend of presetup.

3. Let virtio-net, vhost-net, and vhost-dev support presetup. Main flow:
   a. vhost-dev sends presetup start.
   b. virtio-net sets the MTU.
   c. vhost-dev sends the vring configuration and sets dummy call/kick fds.
   d. vhost-net sends vring enable.
   e. vhost-dev sends presetup end.

TODOs:
======
- No vhost-vdpa/kernel support. Need to discuss/design a new kernel interface if there's the same requirement for vhost-vdpa.
- No vIOMMU support so far. If there is a need for vIOMMU support, it is planned to be addressed in a follow-up patchset.

Test:
=====
- Live migration of a VM with 2 virtio-net devices; ping can recover. Together with DPDK patch [1].
- The time consumption of the DPDK function dev_conf is reduced from 191.4 ms to 6.6 ms.

References:
===========
[1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37

Any comments or feedback are highly appreciated.

Thanks,
Yajun

Yajun Wu (5):
  vhost-user: Add presetup protocol feature and op
  vhost: Add support for presetup
  vhost-net: Add support for presetup
  virtio: Add VMState for early load
  virtio-net: Introduce LM early load

 docs/interop/vhost-user.rst
Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load
Ping.

On 9/18/2023 12:49 PM, Yajun Wu wrote:

This series of patches aims to minimize the downtime during live migration of a virtio-net device with a vhost-user backend. In the case of a hardware virtual Data Path Acceleration (vDPA) implementation, the hardware configuration, which includes tasks like VQ creation and RSS setting, may take above 200ms. This significantly increases the downtime of the VM, particularly in terms of networking.

To reduce the VM downtime, the proposed approach involves capturing the basic device state/configuration during the VM's running stage and performing the initial device configuration (presetup). During the normal configuration process when the VM is in a stopped state, the second configuration is compared to the first one, and only the differences are applied to reduce downtime. Ideally, only the vring available index needs to be changed while the VM is stopped.

This feature is disabled by default, because a backend like DPDK also needs to add support for the new vhost message. The new device property "x-early-migration" can enable this feature.

1. Register a new vmstate for virtio-net with an early_setup flag to send the device state during migration setup.

2. After the device state load on the destination VM, the device status needs to be sent to the vhost backend in a new way. Introduce a new vhost-user message, VHOST_USER_PRESETUP, to notify the backend of presetup.

3. Let virtio-net, vhost-net, and vhost-dev support presetup. Main flow:
   a. vhost-dev sends presetup start.
   b. virtio-net sets the MTU.
   c. vhost-dev sends the vring configuration and sets dummy call/kick fds.
   d. vhost-net sends vring enable.
   e. vhost-dev sends presetup end.

TODOs:
======
- No vhost-vdpa/kernel support. Need to discuss/design a new kernel interface if there's the same requirement for vhost-vdpa.
- No vIOMMU support so far. If there is a need for vIOMMU support, it is planned to be addressed in a follow-up patchset.

Test:
=====
- Live migration of a VM with 2 virtio-net devices; ping can recover. Together with DPDK patch [1].
- The time consumption of the DPDK function dev_conf is reduced from 191.4 ms to 6.6 ms.

References:
===========
[1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37

Any comments or feedback are highly appreciated.

Thanks,
Yajun

Yajun Wu (5):
  vhost-user: Add presetup protocol feature and op
  vhost: Add support for presetup
  vhost-net: Add support for presetup
  virtio: Add VMState for early load
  virtio-net: Introduce LM early load

 docs/interop/vhost-user.rst       |  10 ++
 hw/net/trace-events               |   1 +
 hw/net/vhost_net.c                |  40 +++
 hw/net/virtio-net.c               | 100 ++
 hw/virtio/vhost-user.c            |  30 ++
 hw/virtio/vhost.c                 | 166 +-
 hw/virtio/virtio.c                | 152 ---
 include/hw/virtio/vhost-backend.h |   3 +
 include/hw/virtio/vhost.h         |  12 +++
 include/hw/virtio/virtio-net.h    |   1 +
 include/hw/virtio/virtio.h        |  10 +-
 include/net/vhost_net.h           |   3 +
 12 files changed, 443 insertions(+), 85 deletions(-)
Re: [Virtio-fs] (no subject)
On 10/9/2023 5:13 PM, Hanna Czenczek wrote:

On 09.10.23 11:07, Hanna Czenczek wrote:

On 09.10.23 10:21, Hanna Czenczek wrote:

On 07.10.23 04:22, Yajun Wu wrote:

[...]

The main motivation for adding VHOST_USER_SET_STATUS is to let the backend (DPDK) know when the DRIVER_OK bit is valid. It's an indication that all VQ configuration has been sent; otherwise DPDK has to rely on the first queue pair being ready, then receiving/applying the VQ configuration one by one. During live migration, configuring VQs one by one is very time consuming.

One question I have here is why it wasn't then introduced in the live migration code, but in the general VM stop/cont code instead. It does seem time-consuming to do this every time the VM is paused and resumed.

Yes, VM stop/cont will call vhost_net_stop/vhost_net_start. Maybe because there's no device-level stop/cont vhost message?

For VIRTIO net vDPA, the HW needs to know how many VQs are enabled to set RSS (Receive-Side Scaling). If you don't want the SET_STATUS message, the backend can remove the protocol feature bit VHOST_USER_PROTOCOL_F_STATUS.

The problem isn't back-ends that don't want the message, the problem is that qemu uses the message wrongly, which prevents well-behaving back-ends from implementing the message.

DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device close/reset.

So the right thing to do for back-ends is to announce STATUS support and then not implement it correctly? GET_VRING_BASE should not close or reset the device, by the way. It should stop that one vring, not more. We have a RESET_DEVICE command for resetting.

I believe DPDK used GET_VRING_BASE long before qemu had RESET_DEVICE? It's a compatibility issue. For new backend implementations, we can have a better solution, right?

I'm not involved in the discussion about adding SET_STATUS to the vhost protocol. This feature is essential for vDPA (the same way vhost-vdpa implements VHOST_VDPA_SET_STATUS).

So from what I gather from your response, there is only a single use for SET_STATUS, which is the DRIVER_OK bit. If so, documenting that all other bits are to be ignored by both back-end and front-end would be fine by me. I'm not fully serious about that suggestion, but I hear the strong implication that nothing but DRIVER_OK was of any concern, and this is really important to note when we talk about the status of the STATUS feature in vhost today. It seems to me now that it was not intended to be the virtio-level status byte, but just a DRIVER_OK signalling path from front-end to back-end. That makes it a vhost-level protocol feature to me.

On second thought, it just is a pure vhost-level protocol feature, and has nothing to do with the virtio status byte as-is. The only stated purpose is for the front-end to send DRIVER_OK after migration, but migration is transparent to the guest, so the guest would never change the status byte during migration. Therefore, if this feature is essential, we will never be able to have a status byte that is transparently shared between guest and back-end device, i.e. the virtio status byte.

On third thought, scratch that. The guest wouldn't set it, but naturally, after migration, the front-end will need to restore the status byte from the source, so the front-end will always need to set it, even if it were otherwise controlled only by the guest and the back-end device. So technically, this doesn't prevent such a use case. (In practice, it isn't controlled by the guest right now, but that could be fixed.)

I only tested the feature with DPDK (the only backend using it today?). Max defined the protocol and added the corresponding code in DPDK before I added QEMU support. If another backend or a different device type wants to use this, we can have further discussion?

Cc-ing Alex on this mail, because to me, this seems like an important detail when he plans on using the byte in the future.

If we need a virtio status byte, I can't see how we could use the existing F_STATUS for it.

Hanna
Re: [Virtio-fs] (no subject)
On 10/9/2023 6:28 PM, German Maglione wrote:

On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu wrote:

On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 11:26, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 10:45, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:

On 05.10.23 19:15, Michael S. Tsirkin wrote:

On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:

On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:

There is no clearly defined purpose for the virtio status byte in vhost-user: For resetting, we already have RESET_DEVICE; and for virtio feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK protocol extension, it is possible for SET_FEATURES to return errors (SET_PROTOCOL_FEATURES may be called before SET_FEATURES).

As for implementations, SET_STATUS is not widely implemented. dpdk does implement it, but only uses it to signal feature negotiation failure. While it does log reset requests (SET_STATUS 0) as such, it effectively ignores them, in contrast to RESET_OWNER (which is deprecated, and today means the same thing as RESET_DEVICE).

While qemu superficially has support for [GS]ET_STATUS, it does not forward the guest-set status byte, but instead just makes it up internally, and actually completely ignores what the back-end returns, only using it as the template for a subsequent SET_STATUS to add single bits to it. Notably, after setting FEATURES_OK, it never reads it back to see whether the flag is still set, which is the only way in which dpdk uses the status byte.

As-is, no front-end or back-end can rely on the other side handling this field in a useful manner, and it also provides no practical use over other mechanisms the vhost-user protocol has, which are more clearly defined. Deprecate it.

Suggested-by: Stefan Hajnoczi
Signed-off-by: Hanna Czenczek
---
 docs/interop/vhost-user.rst | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi

SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. The fact that current backends never check errors does not mean they never will. So no, not applying this.

Can this not be done with REPLY_ACK? I.e., with the following message order:

1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is present
2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK
3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK
4. SET_FEATURES with need_reply

If not, the problem is that qemu has sent SET_STATUS 0 for a while when the vCPUs are stopped, which generally seems to request a device reset. If we don't state at least that SET_STATUS 0 is to be ignored, back-ends that implement SET_STATUS later may break with at least these qemu versions. But documenting that a particular use of the status byte is to be ignored would be really strange.

Hanna

Hmm, I guess. Though just following the virtio spec seems cleaner to me... vhost-user reconfigures the state fully on start.

Not the internal device state, though. virtiofsd has internal state, and other devices like vhost-gpu back-ends probably would, too.

Stefan has recently sent a series (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to put the reset (RESET_DEVICE) into virtio_reset() (when we really need a reset).

I really don't like our current approach with the status byte. Following the virtio specification to me would mean that the guest directly controls this byte, which it does not. qemu makes up values as it deems appropriate, and this includes sending a SET_STATUS 0 when the guest is just paused, i.e. when the guest really doesn't want a device reset. That means that qemu does not treat this as a virtio device field (because that would mean exposing it to the guest driver), but instead treats it as part of the vhost(-user) protocol. It doesn't feel right to me that we use a virtio-defined feature for communication on the vhost level, i.e. between front-end and back-end, and not between guest driver and device. I think all vhost-level protocol features should be fully defined in the vhost-user specification, which REPLY_ACK is.

Hmm, that makes sense. Maybe we should have done what Stefan's patch is doing. Do look at the original commit that introduced it to understand why it was added.

I don't understand why this was added to the stop/cont code, though. If it is time consuming to make these changes, why are they done every time the VM is paused and resumed? It makes sense that this would be done for the initial configuration (where a reset also wouldn't hurt), but here it seems wrong. (To be clear, a reset in the stop
was wrong even for non-stateful devices, because it occurred before we fetched the state (vring indices) so we could restore it later. I don’t know how 923b8921d21 was tested, but if the back-end used for testing implemented SET_STATUS 0 as a reset, it could not have survived either migration or a stop/cont in general, because the vring indices would have been reset to 0. What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all devices that would implement them as per virtio spec, and even today it’s broken for stateful devices. The mentioned performance issue is likely real, but we can’t address it by making up SET_STATUS calls that are wrong. I concede that I didn’t think about DRIVER_OK. Personally, I would do all final configuration that would happen upon a DRIVER_OK once the first vring is started (i.e. receives a kick). That has the added benefit of being asynchronous because it doesn’t block any vhost-user messages (which are synchronous, and thus block downtime). Hanna For better or worse kick is per ring. It's out of spec to start rings that were not kicked but I guess you could do configuration ... Seems somewhat asymmetrical though. Let's wait until next week, hopefully Yajun Wu will answer. The main motivation for adding VHOST_USER_SET_STATUS is to let the DPDK backend know when the DRIVER_OK bit is valid. It's an indication that all VQ configuration has been sent; otherwise DPDK has to rely on the first queue pair being ready, then receive/apply VQ configuration one by one. During live migration, configuring VQs one by one is very time consuming. For VIRTIO net vDPA, HW needs to know how many VQs are enabled to set RSS (Receive-Side Scaling). If you don’t want the SET_STATUS message, the backend can remove the protocol feature bit VHOST_USER_PROTOCOL_F_STATUS. DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device close/reset. I was not involved in the discussion about adding SET_STATUS to the vhost protocol. 
This feature is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS). Thanks, Yajun Now, we could hand full control of the status byte to the guest, and that would make me content. But I feel like that doesn’t really work, because qemu needs to intercept the status byte anyway (it needs to know when there is a reset, probably wants to know when the device is configured, etc.), so I don’t think having the status byte in vhost-user really gains us much when qemu could translate status byte changes to/from other vhost-user commands. Hanna well it intercepts it but I think it could pass it on unchanged. I guess symmetry was the point. So I don't see why SET_STATUS 0 has to be ignored. SET_STATUS was introduced by: commit 923b8921d210763359e96246a58658ac0db6c645 Author: Yajun Wu Date: Mon Oct 17 14:44:52 2022 +0800 vhost-user: Support vhost_dev_start CC the author.
[RFC PATCH 2/5] vhost: Add support for presetup
Add New API vhost_dev_start_presetup to notify backend the start and end of presetup. API vhost_dev_presetup to send out the device configurations: 1. acked_features 2. memory table 3. vring information 4. disable host/guest notifier. Signed-off-by: Yajun Wu Reviewed-by: Avihai Horon Reviewed-by: Jiri Pirko --- hw/virtio/vhost.c | 166 -- include/hw/virtio/vhost.h | 12 +++ 2 files changed, 152 insertions(+), 26 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index e2f6ffb446..5b162590fb 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -1138,24 +1138,71 @@ out: return ret; } -int vhost_virtqueue_start(struct vhost_dev *dev, - struct VirtIODevice *vdev, - struct vhost_virtqueue *vq, - unsigned idx) +static void vhost_virtqueue_memory_unmap(struct vhost_dev *dev, + struct VirtIODevice *vdev, + struct vhost_virtqueue *vq, + unsigned idx) +{ +if (vq->used) { +vhost_memory_unmap(dev, vq->used, + virtio_queue_get_used_size(vdev, idx), + 1, virtio_queue_get_used_size(vdev, idx)); +vq->used = NULL; +} + +if (vq->avail) { +vhost_memory_unmap(dev, vq->avail, + virtio_queue_get_avail_size(vdev, idx), + 0, virtio_queue_get_avail_size(vdev, idx)); +vq->avail = NULL; +} + +if (vq->desc) { +vhost_memory_unmap(dev, vq->desc, + virtio_queue_get_desc_size(vdev, idx), + 0, virtio_queue_get_desc_size(vdev, idx)); +vq->desc = NULL; +} +} + +static int vhost_virtqueue_disable_notify(struct vhost_dev *dev, + struct VirtIODevice *vdev, + struct vhost_virtqueue *vq, + unsigned idx) { -BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev))); -VirtioBusState *vbus = VIRTIO_BUS(qbus); -VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus); -hwaddr s, l, a; -int r; int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx); struct vhost_vring_file file = { .index = vhost_vq_index }; +int r; + +file.fd = -1; +r = dev->vhost_ops->vhost_set_vring_kick(dev, ); +if (r) { +VHOST_OPS_DEBUG(r, "vhost_set_vring_kick failed"); +return r; +} + +r = 
dev->vhost_ops->vhost_set_vring_call(dev, ); +if (r) { +VHOST_OPS_DEBUG(r, "vhost_set_vring_call failed"); +return r; +} + +return 0; +} + +static int vhost_virtqueue_vring_setup(struct vhost_dev *dev, + struct VirtIODevice *vdev, + struct vhost_virtqueue *vq, + unsigned idx) +{ +hwaddr s, l, a; +int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx); struct vhost_vring_state state = { .index = vhost_vq_index }; -struct VirtQueue *vvq = virtio_get_queue(vdev, idx); +int r; a = virtio_queue_get_desc_addr(vdev, idx); if (a == 0) { @@ -1186,6 +1233,10 @@ int vhost_virtqueue_start(struct vhost_dev *dev, } } +if (vq->desc) { +vhost_virtqueue_memory_unmap(dev, vdev, vq, idx); +} + vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx); vq->desc_phys = a; vq->desc = vhost_memory_map(dev, a, , false); @@ -1212,6 +1263,36 @@ int vhost_virtqueue_start(struct vhost_dev *dev, if (r < 0) { goto fail_alloc; } +return 0; + +fail_alloc: +fail_alloc_used: +fail_alloc_avail: +vhost_virtqueue_memory_unmap(dev, vdev, vq, idx); +fail_alloc_desc: +return r; +} + +int vhost_virtqueue_start(struct vhost_dev *dev, + struct VirtIODevice *vdev, + struct vhost_virtqueue *vq, + unsigned idx) +{ +BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev))); +VirtioBusState *vbus = VIRTIO_BUS(qbus); +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus); +int r; +int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx); +struct vhost_vring_file file = { +.index = vhost_vq_index +}; +struct VirtQueue *vvq = virtio_get_queue(vdev, idx); + +r = vhost_virtqueue_vring_setup(dev, vdev, vq, idx); +if (r) { +VHOST_OPS_DEBUG(r, "vhost_virtqueue_vring_setup failed"); +goto fail_vring_setup; +} file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq)); r = dev->vhost_ops->vhost_set_vring_kic
[RFC PATCH 3/5] vhost-net: Add support for presetup
Introduce New API vhost_net_presetup to send virtio net device configuration to backend in LM setup. Mainly calling vhost_dev_presetup, then sending out vring enable. Signed-off-by: Yajun Wu Reviewed-by: Avihai Horon Reviewed-by: Jiri Pirko --- hw/net/vhost_net.c | 40 include/net/vhost_net.h | 3 +++ 2 files changed, 43 insertions(+) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 6b958d6363..dcb818ccf1 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -345,6 +345,46 @@ static void vhost_net_stop_one(struct vhost_net *net, vhost_dev_disable_notifiers(>dev, dev); } +int vhost_net_presetup(VirtIODevice *dev, NetClientState *ncs, +int data_queue_pairs, int cvq) +{ +VirtIONet *n = VIRTIO_NET(dev); +int nvhosts = data_queue_pairs + cvq; +struct vhost_net *net; +int r = 0, i, index_end = data_queue_pairs * 2; +NetClientState *peer; + +if (cvq) { +index_end += 1; +} + +for (i = 0; i < nvhosts; i++) { +if (i < data_queue_pairs) { +peer = qemu_get_peer(ncs, i); +} else { /* Control Virtqueue */ +peer = qemu_get_peer(ncs, n->max_queue_pairs); +} + +net = get_vhost_net(peer); +vhost_net_set_vq_index(net, i * 2, index_end); + +r = vhost_dev_presetup(>dev, dev); +if (r < 0) { +return r; +} + +if (peer->vring_enable) { +/* restore vring enable state */ +r = vhost_set_vring_enable(peer, peer->vring_enable); +if (r < 0) { +return r; +} +} +} + +return r; +} + int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, int data_queue_pairs, int cvq) { diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h index c37aba35e6..2c9020c5a2 100644 --- a/include/net/vhost_net.h +++ b/include/net/vhost_net.h @@ -26,6 +26,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs, int data_queue_pairs, int cvq); +int vhost_net_presetup(VirtIODevice *dev, NetClientState *ncs, + int data_queue_pairs, int cvq); + void vhost_net_cleanup(VHostNetState *net); uint64_t vhost_net_get_features(VHostNetState 
*net, uint64_t features); -- 2.27.0
[RFC PATCH 0/5] virtio-net: Introduce LM early load
This series of patches aims to minimize the downtime during live migration of a virtio-net device with a vhost-user backend. In the case of a hardware virtual Data Path Acceleration (vDPA) implementation, the hardware configuration, which includes tasks like VQ creation and RSS setting, may take over 200 ms. This significantly increases the downtime of the VM, particularly in terms of networking.

To reduce the VM downtime, the proposed approach involves capturing the basic device state/configuration during the VM's running stage and performing the initial device configuration (presetup). During the normal configuration process when the VM is in a stopped state, the second configuration is compared to the first one, and only the differences are applied to reduce downtime. Ideally, only the vring available index needs to be changed during VM stop.

This feature is disabled by default, because backends like DPDK also need to add support for the new vhost message. The new device property "x-early-migration" enables this feature.

1. Register a new vmstate for virtio-net with an early_setup flag to send the device state during migration setup.
2. After the device state loads on the destination VM, send the device status to the vhost backend in a new way. Introduce a new vhost-user message, VHOST_USER_PRESETUP, to notify the backend of presetup.
3. Let virtio-net, vhost-net, and vhost-dev support presetup.

Main flow:
a. vhost-dev sending presetup start.
b. virtio-net setting the MTU.
c. vhost-dev sending vring configuration and setting dummy call/kick fd.
d. vhost-net sending vring enable.
e. vhost-dev sending presetup end.

TODOs:
======
- No vhost-vdpa/kernel support. A new kernel interface needs to be discussed/designed if vhost-vdpa has the same requirement.
- No vIOMMU support so far. If there is a need for vIOMMU support, it is planned to be addressed in a follow-up patchset.

Test:
=====
- Live-migrated a VM with 2 virtio-net devices; ping recovers. Together with DPDK patch [1]. 
- The time consumption of DPDK function dev_conf is reduced from 191.4 ms to 6.6 ms. References: === [1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37 Any comments or feedback are highly appreciated. Thanks, Yajun Yajun Wu (5): vhost-user: Add presetup protocol feature and op vhost: Add support for presetup vhost-net: Add support for presetup virtio: Add VMState for early load virtio-net: Introduce LM early load docs/interop/vhost-user.rst | 10 ++ hw/net/trace-events | 1 + hw/net/vhost_net.c| 40 +++ hw/net/virtio-net.c | 100 ++ hw/virtio/vhost-user.c| 30 ++ hw/virtio/vhost.c | 166 +- hw/virtio/virtio.c| 152 --- include/hw/virtio/vhost-backend.h | 3 + include/hw/virtio/vhost.h | 12 +++ include/hw/virtio/virtio-net.h| 1 + include/hw/virtio/virtio.h| 10 +- include/net/vhost_net.h | 3 + 12 files changed, 443 insertions(+), 85 deletions(-) -- 2.27.0
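The a..e main flow from the cover letter can be checked mechanically. A toy C sketch (the message names are illustrative, not the real QEMU/vhost-user identifiers) that validates a message trace is bracketed by a single presetup start/end pair:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative tags for the cover letter's flow a..e. */
typedef enum {
    MSG_PRESETUP_START,  /* a: vhost-dev sends presetup start       */
    MSG_SET_MTU,         /* b: virtio-net sets the MTU              */
    MSG_VRING_CONFIG,    /* c: vring config, dummy call/kick fds    */
    MSG_VRING_ENABLE,    /* d: vhost-net sends vring enable         */
    MSG_PRESETUP_END,    /* e: vhost-dev sends presetup end         */
} PresetupMsg;

/* True when the trace is one start/end bracket around configuration. */
bool presetup_trace_ok(const PresetupMsg *trace, int n)
{
    if (n < 2 || trace[0] != MSG_PRESETUP_START ||
        trace[n - 1] != MSG_PRESETUP_END) {
        return false;
    }
    for (int i = 1; i < n - 1; i++) {
        if (trace[i] == MSG_PRESETUP_START || trace[i] == MSG_PRESETUP_END) {
            return false;   /* nested or repeated brackets are not in the flow */
        }
    }
    return true;
}
```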
[RFC PATCH 1/5] vhost-user: Add presetup protocol feature and op
This patch implements VHOST_USER_PROTOCOL_F_PRESETUP protocol feature and VHOST_USER_PRESETUP, so that the backend can know the beginning and completion of the early setup phase for the virtio device. Unlike the regular device state load, which occurs in the VM stop phase, this pre-setup takes place in the live migration setup stage. Signed-off-by: Yajun Wu Reviewed-by: Avihai Horon Reviewed-by: Jiri Pirko --- docs/interop/vhost-user.rst | 10 ++ hw/virtio/vhost-user.c| 30 ++ include/hw/virtio/vhost-backend.h | 3 +++ 3 files changed, 43 insertions(+) diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst index 5a070adbc1..70b8e2694c 100644 --- a/docs/interop/vhost-user.rst +++ b/docs/interop/vhost-user.rst @@ -885,6 +885,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 #define VHOST_USER_PROTOCOL_F_STATUS 16 #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17 + #define VHOST_USER_PROTOCOL_F_PRESETUP 18 Front-end message types --- @@ -1440,6 +1441,15 @@ Front-end message types query the back-end for its device status as defined in the Virtio specification. +``VHOST_USER_PRESETUP`` + :id: 41 + :equivalent ioctl: N/A + :request payload: ``u64`` + :reply payload: N/A + + When the ``VHOST_USER_PROTOCOL_F_PRESETUP`` protocol feature has been + successfully negotiated, this message is submitted by the front-end to + indicate start or end early setup. Value 1 means start, 2 means end. Back-end message types -- diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 8dcf049d42..71018d06c1 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -74,6 +74,8 @@ enum VhostUserProtocolFeature { /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */ VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15, VHOST_USER_PROTOCOL_F_STATUS = 16, +/* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. 
*/ +VHOST_USER_PROTOCOL_F_PRESETUP = 18, VHOST_USER_PROTOCOL_F_MAX }; @@ -121,6 +123,7 @@ typedef enum VhostUserRequest { VHOST_USER_REM_MEM_REG = 38, VHOST_USER_SET_STATUS = 39, VHOST_USER_GET_STATUS = 40, +VHOST_USER_PRESETUP = 41, VHOST_USER_MAX } VhostUserRequest; @@ -132,6 +135,11 @@ typedef enum VhostUserBackendRequest { VHOST_USER_BACKEND_MAX } VhostUserBackendRequest; +typedef enum VhostUserPresetupState { +VHOST_USER_PRESETUP_START = 1, +VHOST_USER_PRESETUP_END = 2, +} VhostUserPresetupState; + typedef struct VhostUserMemoryRegion { uint64_t guest_phys_addr; uint64_t memory_size; @@ -2741,6 +2749,27 @@ static void vhost_user_reset_status(struct vhost_dev *dev) } } +static int vhost_user_set_presetup_state(struct vhost_dev *dev, bool start) +{ +if (start) { +return vhost_user_set_u64(dev, VHOST_USER_PRESETUP, + VHOST_USER_PRESETUP_START, false); +} else { +return vhost_user_set_u64(dev, VHOST_USER_PRESETUP, + VHOST_USER_PRESETUP_END, false); +} +} + +static int vhost_user_presetup(struct vhost_dev *dev, bool start) +{ +if (!virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_PRESETUP)) { +return -ENOTSUP; +} + +return vhost_user_set_presetup_state(dev, start); +} + const VhostOps user_ops = { .backend_type = VHOST_BACKEND_TYPE_USER, .vhost_backend_init = vhost_user_backend_init, @@ -2777,4 +2806,5 @@ const VhostOps user_ops = { .vhost_set_inflight_fd = vhost_user_set_inflight_fd, .vhost_dev_start = vhost_user_dev_start, .vhost_reset_status = vhost_user_reset_status, +.vhost_presetup = vhost_user_presetup, }; diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h index 31a251a9f5..00dd532df9 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -133,6 +133,8 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev, typedef void (*vhost_reset_status_op)(struct vhost_dev *dev); +typedef int (*vhost_presetup_op)(struct vhost_dev *dev, bool start); + typedef struct VhostOps { 
VhostBackendType backend_type; vhost_backend_init vhost_backend_init; @@ -181,6 +183,7 @@ typedef struct VhostOps { vhost_force_iommu_op vhost_force_iommu; vhost_set_config_call_op vhost_set_config_call; vhost_reset_status_op vhost_reset_status; +vhost_presetup_op vhost_presetup; } VhostOps; int vhost_backend_update_device_iotlb(struct vhost_dev *dev, -- 2.27.0
[RFC PATCH 5/5] virtio-net: Introduce LM early load
Register a new vmstate for virtio-net with an early_setup flag to send the device state during migration setup. This can reduce the migration downtime of a virtio-net device with a vhost-user backend. This feature is disabled by default and can be enabled by setting the "x-early-migration" device property to on. Signed-off-by: Yajun Wu Reviewed-by: Avihai Horon Reviewed-by: Jiri Pirko --- hw/net/trace-events| 1 + hw/net/virtio-net.c| 100 + include/hw/virtio/virtio-net.h | 1 + 3 files changed, 102 insertions(+) diff --git a/hw/net/trace-events b/hw/net/trace-events index 6b5ba669a2..ec89229044 100644 --- a/hw/net/trace-events +++ b/hw/net/trace-events @@ -399,6 +399,7 @@ virtio_net_post_load_device(void) virtio_net_rss_disable(void) virtio_net_rss_error(const char *msg, uint32_t value) "%s, value 0x%08x" virtio_net_rss_enable(uint32_t p1, uint16_t p2, uint8_t p3) "hashes 0x%x, table of %d, key of %d" +virtio_net_load_early_setup(void) "" # tulip.c tulip_reg_write(uint64_t addr, const char *name, int size, uint64_t val) "addr 0x%02"PRIx64" (%s) size %d value 0x%08"PRIx64 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 7102ec4817..d0b0cc2ffe 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -46,6 +46,7 @@ #include "net_rx_pkt.h" #include "hw/virtio/vhost.h" #include "sysemu/qtest.h" +#include "sysemu/runstate.h" #define VIRTIO_NET_VM_VERSION11 @@ -3568,6 +3569,95 @@ static bool failover_hide_primary_device(DeviceListener *listener, return qatomic_read(>failover_primary_hidden); } +static int virtio_net_load_early_setup(void *opaque, int version_id) +{ +VirtIONet *n = opaque; +VirtIODevice *vdev = VIRTIO_DEVICE(n); +NetClientState *nc = qemu_get_queue(n->nic); +int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1; +int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ? 
+n->max_ncs - n->max_queue_pairs : 0; +VHostNetState *net; +int r; + +assert(nc->peer); +assert(nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER); + +net = get_vhost_net(nc->peer); +assert(net); +assert(net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); + +trace_virtio_net_load_early_setup(); + +/* backend should support presetup */ +r = vhost_dev_set_presetup_state(>dev, true); +if (r < 0) { +error_report("Start presetup device fail: %d", r); +return r; +} + +if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_MTU)) { +r = vhost_net_set_mtu(get_vhost_net(nc->peer), n->net_conf.mtu); +if (r < 0) { +error_report("%uBytes MTU not supported by the backend", + n->net_conf.mtu); +goto error; +} +} + +r = vhost_net_presetup(vdev, n->nic->ncs, queue_pairs, cvq); +if (r < 0) { +error_report("Presetup device fail: %d", r); +goto error; +} + +r = vhost_dev_set_presetup_state(>dev, false); +if (r < 0) { +error_report("Finish presetup device fail: %d", r); +return r; +} +return 0; + +error: +vhost_dev_set_presetup_state(>dev, false); +return r; +} + +static bool virtio_net_early_setup_needed(void *opaque) +{ +VirtIONet *n = opaque; +NetClientState *nc = qemu_get_queue(n->nic); +VHostNetState *net = get_vhost_net(nc->peer); + +/* + * Presetup aims to reduce live migration downtime by sync device + * status in setup stage. So only do presetup when source VM is in + * running state. 
+ */ +if (runstate_is_running() && +nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER && +net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER && +!vhost_dev_has_iommu(>dev) && +n->vhost_started && +n->status & VIRTIO_NET_S_LINK_UP) { +return true; +} +return false; +} + +static const VMStateDescription vmstate_virtio_net_early = { +.name = "virtio-net-early", +.minimum_version_id = VIRTIO_NET_VM_VERSION, +.version_id = VIRTIO_NET_VM_VERSION, +.fields = (VMStateField[]) { +VMSTATE_EARLY_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, +.early_setup = true, +.post_load = virtio_net_load_early_setup, +.needed = virtio_net_early_setup_needed, +}; + static void virtio_net_device_realize(DeviceState *dev, Error **errp) { VirtIODevice *vdev = VIRTIO_DEVICE(dev); @@ -3743,6 +3833,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) { virtio_net_load_
[RFC PATCH 4/5] virtio: Add VMState for early load
Define new virtio device vmstate for early save/load (happens in LM setup stage). Same as original vmstate, except: In LM setup phase of the destination VM, the guest memory has not been synchronized yet. To address this, a flag has been added to virtio_load function to skip the index check. Signed-off-by: Yajun Wu Reviewed-by: Avihai Horon Reviewed-by: Jiri Pirko --- hw/virtio/virtio.c | 152 +++-- include/hw/virtio/virtio.h | 10 ++- 2 files changed, 103 insertions(+), 59 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 969c25f4cf..8c73c245dd 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -2832,7 +2832,17 @@ virtio_device_get(QEMUFile *f, void *opaque, size_t size, VirtIODevice *vdev = VIRTIO_DEVICE(opaque); DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev)); -return virtio_load(vdev, f, dc->vmsd->version_id); +return virtio_load(vdev, f, dc->vmsd->version_id, false); +} + +/* A wrapper for use as a VMState .get function */ +static int virtio_early_device_get(QEMUFile *f, void *opaque, size_t size, + const VMStateField *field) +{ +VirtIODevice *vdev = VIRTIO_DEVICE(opaque); +DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev)); + +return virtio_load(vdev, f, dc->vmsd->version_id, true); } const VMStateInfo virtio_vmstate_info = { @@ -2841,6 +2851,12 @@ const VMStateInfo virtio_vmstate_info = { .put = virtio_device_put, }; +const VMStateInfo virtio_early_vmstate_info = { +.name = "virtio-early", +.get = virtio_early_device_get, +.put = virtio_device_put, +}; + static int virtio_set_features_nocheck(VirtIODevice *vdev, uint64_t val) { VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev); @@ -2940,8 +2956,75 @@ size_t virtio_get_config_size(const VirtIOConfigSizeParams *params, return config_size; } +static int virtio_load_check_index(VirtIODevice *vdev, int num) +{ +int i; + +RCU_READ_LOCK_GUARD(); + +for (i = 0; i < num; i++) { +if (vdev->vq[i].vring.desc) { +uint16_t nheads; + +/* + * VIRTIO-1 devices migrate 
desc, used, and avail ring addresses so + * only the region cache needs to be set up. Legacy devices need + * to calculate used and avail ring addresses based on the desc + * address. + */ +if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) { +virtio_init_region_cache(vdev, i); +} else { +virtio_queue_update_rings(vdev, i); +} + +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) { +vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx; +vdev->vq[i].shadow_avail_wrap_counter = +vdev->vq[i].last_avail_wrap_counter; +continue; +} + +nheads = vring_avail_idx(>vq[i]) - vdev->vq[i].last_avail_idx; +/* Check it isn't doing strange things with descriptor numbers. */ +if (nheads > vdev->vq[i].vring.num) { +virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x " + "inconsistent with Host index 0x%x: delta 0x%x", + i, vdev->vq[i].vring.num, + vring_avail_idx(>vq[i]), + vdev->vq[i].last_avail_idx, nheads); +vdev->vq[i].used_idx = 0; +vdev->vq[i].shadow_avail_idx = 0; +vdev->vq[i].inuse = 0; +continue; +} +vdev->vq[i].used_idx = vring_used_idx(>vq[i]); +vdev->vq[i].shadow_avail_idx = vring_avail_idx(>vq[i]); + +/* + * Some devices migrate VirtQueueElements that have been popped + * from the avail ring but not yet returned to the used ring. + * Since max ring size < UINT16_MAX it's safe to use modulo + * UINT16_MAX + 1 subtraction. + */ +vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx - +vdev->vq[i].used_idx); +if (vdev->vq[i].inuse > vdev->vq[i].vring.num) { +error_report("VQ %d size 0x%x < last_avail_idx 0x%x - " + "used_idx 0x%x", + i, vdev->vq[i].vring.num, + vdev->vq[i].last_avail_idx, + vdev->vq[i].used_idx); +return -1; +} +} +} + +return 0; +} + int coroutine_mixed_fn -virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id) +virtio_load(VirtIODevice *vdev, QEMUFile
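The patch above threads an early-load flag through virtio_load so the avail/used index check is skipped while guest memory is still unsynchronized. A toy C model of that control flow (illustrative names, not the real QEMU API):

```c
#include <assert.h>
#include <stdbool.h>

static int index_checks_run;    /* counts how often the check ran */

static int toy_check_index(void)
{
    /* Stands in for virtio_load_check_index(): it reads the avail ring
     * in guest memory, which is not yet synchronized during LM setup. */
    index_checks_run++;
    return 0;
}

int toy_virtio_load(bool early)
{
    /* ... restore features, config space, vring addresses ... */
    if (early) {
        return 0;               /* setup stage: skip the index check */
    }
    return toy_check_index();   /* normal load: guest memory is valid */
}
```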
[PATCH] docs: vhost-user: VHOST_USER_GET_STATUS require reply
Add VHOST_USER_GET_STATUS to the list of requests that require a reply. Cc: Maxime Coquelin Cc: Michael S. Tsirkin Signed-off-by: Yajun Wu --- docs/interop/vhost-user.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst index 8a5924ea75..2d13108284 100644 --- a/docs/interop/vhost-user.rst +++ b/docs/interop/vhost-user.rst @@ -299,6 +299,7 @@ replies. Here is a list of the ones that do: * ``VHOST_USER_GET_VRING_BASE`` * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``) * ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``) +* ``VHOST_USER_GET_STATUS`` (if ``VHOST_USER_PROTOCOL_F_STATUS``) .. seealso:: -- 2.27.0
Re: [PATCH v2] vhost-user: send SET_STATUS 0 after GET_VRING_BASE
On 5/2/2023 7:04 AM, Stefan Hajnoczi wrote: Setting the VIRTIO Device Status Field to 0 resets the device. The device's state is lost, including the vring configuration. vhost-user.c currently sends SET_STATUS 0 before GET_VRING_BASE. This risks confusion about the lifetime of the vhost-user state (e.g. vring last_avail_idx) across VIRTIO device reset. Eugenio Pérez adjusted the order for vhost-vdpa.c in commit c3716f260bff ("vdpa: move vhost reset after get vring base") and in that commit description suggested doing the same for vhost-user in the future. Go ahead and adjust vhost-user.c now. I ran various online code searches to identify vhost-user backends implementing SET_STATUS. It seems only DPDK implements SET_STATUS and Yajun Wu has confirmed that it is safe to make this change. Fixes: commit 923b8921d210763359e96246a58658ac0db6c645 ("vhost-user: Support vhost_dev_start") Cc: Michael S. Tsirkin Cc: Cindy Lu Cc: Yajun Wu Signed-off-by: Stefan Hajnoczi --- v2: - Added VHOST_USER_PROTOCOL_F_STATUS check [Yajun Wu] - Added "Fixes:" tag [Michael] --- hw/virtio/vhost-user.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index e5285df4ba..40974afd06 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2677,7 +2677,20 @@ static int vhost_user_dev_start(struct vhost_dev *dev, bool started) VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK); } else { -return vhost_user_set_status(dev, 0); +return 0; +} +} + +static void vhost_user_reset_status(struct vhost_dev *dev) +{ +/* Set device status only for last queue pair */ +if (dev->vq_index + dev->nvqs != dev->vq_index_end) { +return; +} + +if (virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_STATUS)) { +vhost_user_set_status(dev, 0); } } @@ -2716,4 +2729,5 @@ const VhostOps user_ops = { .vhost_get_inflight_fd = vhost_user_get_inflight_fd, .vhost_set_inflight_fd = vhost_user_set_inflight_fd, .vhost_dev_start = 
vhost_user_dev_start, +.vhost_reset_status = vhost_user_reset_status, }; -- 2.40.1 Reviewed-by: Yajun Wu
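The reordering matters because a spec-conforming back-end treats SET_STATUS 0 as a full device reset. A toy C model (not DPDK's actual code) of why GET_VRING_BASE must run before the reset:

```c
#include <assert.h>
#include <stdint.h>

/* Toy back-end that implements SET_STATUS 0 as a device reset,
 * as the virtio spec requires (illustrative only). */
typedef struct {
    uint16_t last_avail_idx;
} ToyBackend;

void toy_set_status(ToyBackend *b, uint8_t status)
{
    if (status == 0) {
        b->last_avail_idx = 0;  /* reset: all vring state is lost */
    }
}

uint16_t toy_get_vring_base(ToyBackend *b)
{
    return b->last_avail_idx;
}

/* Old order: reset first, then read the base -- the index is gone. */
uint16_t stop_old_order(ToyBackend *b)
{
    toy_set_status(b, 0);
    return toy_get_vring_base(b);
}

/* New order (this patch): read the base, then reset. */
uint16_t stop_new_order(ToyBackend *b)
{
    uint16_t base = toy_get_vring_base(b);
    toy_set_status(b, 0);
    return base;
}
```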
Re: [PATCH] vhost-user: send SET_STATUS 0 after GET_VRING_BASE
On 4/20/2023 9:07 PM, Stefan Hajnoczi wrote: Setting the VIRTIO Device Status Field to 0 resets the device. The device's state is lost, including the vring configuration. vhost-user.c currently sends SET_STATUS 0 before GET_VRING_BASE. This risks confusion about the lifetime of the vhost-user state (e.g. vring last_avail_idx) across VIRTIO device reset. Eugenio Pérez adjusted the order for vhost-vdpa.c in commit c3716f260bff ("vdpa: move vhost reset after get vring base") and in that commit description suggested doing the same for vhost-user in the future. Go ahead and adjust vhost-user.c now. I ran various online code searches to identify vhost-user backends implementing SET_STATUS. It seems only DPDK implements SET_STATUS and Yajun Wu has confirmed that it is safe to make this change. Cc: Michael S. Tsirkin Cc: Cindy Lu Signed-off-by: Stefan Hajnoczi --- hw/virtio/vhost-user.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index e5285df4ba..2d40b1b3e7 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2677,10 +2677,20 @@ static int vhost_user_dev_start(struct vhost_dev *dev, bool started) VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK); } else { -return vhost_user_set_status(dev, 0); +return 0; } } +static void vhost_user_reset_status(struct vhost_dev *dev) +{ +/* Set device status only for last queue pair */ +if (dev->vq_index + dev->nvqs != dev->vq_index_end) { +return; +} + +vhost_user_set_status(dev, 0); +} + const VhostOps user_ops = { .backend_type = VHOST_BACKEND_TYPE_USER, .vhost_backend_init = vhost_user_backend_init, @@ -2716,4 +2726,5 @@ const VhostOps user_ops = { .vhost_get_inflight_fd = vhost_user_get_inflight_fd, .vhost_set_inflight_fd = vhost_user_set_inflight_fd, .vhost_dev_start = vhost_user_dev_start, +.vhost_reset_status = vhost_user_reset_status, }; -- 2.39.2 Thank you for this fix. 
Can you add a protocol feature bit check, just like we do in vhost_user_dev_start?

    if (!virtio_has_feature(dev->protocol_features,
                            VHOST_USER_PROTOCOL_F_STATUS)) {
        return 0;
    }
Re: Move vhost-user SET_STATUS 0 after get vring base?
On 4/18/2023 11:34 PM, Michael S. Tsirkin wrote: On Tue, Apr 18, 2023 at 11:18:11AM -0400, Stefan Hajnoczi wrote: Hi, Cindy's commit ca71db438bdc ("vhost: implement vhost_dev_start method") added SET_STATUS calls to vhost_dev_start() and vhost_dev_stop() for all vhost backends. Eugenio's commit c3716f260bff ("vdpa: move vhost reset after get vring base") deferred the SET_STATUS 0 call in vhost_dev_stop() until after GET_VRING_BASE for vDPA only. In that commit Eugenio said, "A patch to make vhost_user_dev_start more similar to vdpa is desirable, but it can be added on top". I agree and think it's a good idea to keep the vhost backends in sync where possible. vhost-user still has the old behavior where QEMU sends SET_STATUS 0 before GET_VRING_BASE. Most existing vhost-user backends don't implement the SET_STATUS message, so I think no one has tripped over this yet. Any thoughts on making vhost-user behave like vDPA here? Stefan Wow. Well SET_STATUS 0 resets the device, so yes, I think doing that before GET_VRING_BASE will lose state. Dunno how it does not trip people up; the only explanation is that people ignore SET_STATUS. -- MST For the DPDK vhost-user backend, SET_STATUS 0 (reset) is ignored. Yajun
[PATCH] chardev/char-socket: set s->listener = NULL in char_socket_finalize
After live migration with a virtio block device, QEMU crashes at:

#0  0x55914f46f795 in object_dynamic_cast_assert (obj=0x559151b7b090, typename=0x55914f80fbc4 "qio-channel", file=0x55914f80fb90 "/images/testvfe/sw/qemu.gerrit/include/io/channel.h", line=30, func=0x55914f80fcb8 <__func__.17257> "QIO_CHANNEL") at ../qom/object.c:872
#1  0x55914f480d68 in QIO_CHANNEL (obj=0x559151b7b090) at /images/testvfe/sw/qemu.gerrit/include/io/channel.h:29
#2  0x55914f4812f8 in qio_net_listener_set_client_func_full (listener=0x559151b7a720, func=0x55914f580b97 , data=0x5591519f4ea0, notify=0x0, context=0x0) at ../io/net-listener.c:166
#3  0x55914f580059 in tcp_chr_update_read_handler (chr=0x5591519f4ea0) at ../chardev/char-socket.c:637
#4  0x55914f583dca in qemu_chr_be_update_read_handlers (s=0x5591519f4ea0, context=0x0) at ../chardev/char.c:226
#5  0x55914f57b7c9 in qemu_chr_fe_set_handlers_full (b=0x559152bf23a0, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false, sync_state=true) at ../chardev/char-fe.c:279
#6  0x55914f57b86d in qemu_chr_fe_set_handlers (b=0x559152bf23a0, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false) at ../chardev/char-fe.c:304
#7  0x55914f378caf in vhost_user_async_close (d=0x559152bf21a0, chardev=0x559152bf23a0, vhost=0x559152bf2420, cb=0x55914f2fb8c1 ) at ../hw/virtio/vhost-user.c:2725
#8  0x55914f2fba40 in vhost_user_blk_event (opaque=0x559152bf21a0, event=CHR_EVENT_CLOSED) at ../hw/block/vhost-user-blk.c:395
#9  0x55914f58388c in chr_be_event (s=0x5591519f4ea0, event=CHR_EVENT_CLOSED) at ../chardev/char.c:61
#10 0x55914f583905 in qemu_chr_be_event (s=0x5591519f4ea0, event=CHR_EVENT_CLOSED) at ../chardev/char.c:81
#11 0x55914f581275 in char_socket_finalize (obj=0x5591519f4ea0) at ../chardev/char-socket.c:1083
#12 0x55914f46f073 in object_deinit (obj=0x5591519f4ea0, type=0x5591519055c0) at ../qom/object.c:680
#13 0x55914f46f0e5 in object_finalize (data=0x5591519f4ea0) at ../qom/object.c:694
#14 0x55914f46ff06 in object_unref (objptr=0x5591519f4ea0) at ../qom/object.c:1202
#15 0x55914f4715a4 in object_finalize_child_property (obj=0x559151b76c50, name=0x559151b7b250 "char3", opaque=0x5591519f4ea0) at ../qom/object.c:1747
#16 0x55914f46ee86 in object_property_del_all (obj=0x559151b76c50) at ../qom/object.c:632
#17 0x55914f46f0d2 in object_finalize (data=0x559151b76c50) at ../qom/object.c:693
#18 0x55914f46ff06 in object_unref (objptr=0x559151b76c50) at ../qom/object.c:1202
#19 0x55914f4715a4 in object_finalize_child_property (obj=0x559151b6b560, name=0x559151b76630 "chardevs", opaque=0x559151b76c50) at ../qom/object.c:1747
#20 0x55914f46ef67 in object_property_del_child (obj=0x559151b6b560, child=0x559151b76c50) at ../qom/object.c:654
#21 0x55914f46f042 in object_unparent (obj=0x559151b76c50) at ../qom/object.c:673
#22 0x55914f58632a in qemu_chr_cleanup () at ../chardev/char.c:1189
#23 0x55914f16c66c in qemu_cleanup () at ../softmmu/runstate.c:830
#24 0x55914eee7b9e in qemu_default_main () at ../softmmu/main.c:38
#25 0x55914eee7bcc in main (argc=86, argv=0x7ffc97cb8d88) at ../softmmu/main.c:48

In char_socket_finalize, after s->listener is freed, the event callback function vhost_user_blk_event is called to handle CHR_EVENT_CLOSED. vhost_user_blk_event calls qio_net_listener_set_client_func_full, which still uses s->listener. Setting s->listener = NULL after object_unref(OBJECT(s->listener)) solves this issue. 
Signed-off-by: Yajun Wu
Acked-by: Jiri Pirko
---
 chardev/char-socket.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index c2265436ac..8c58532171 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -1065,6 +1065,7 @@ static void char_socket_finalize(Object *obj)
         qio_net_listener_set_client_func_full(s->listener, NULL, NULL,
                                               NULL, chr->gcontext);
         object_unref(OBJECT(s->listener));
+        s->listener = NULL;
     }
     if (s->tls_creds) {
         object_unref(OBJECT(s->tls_creds));
--
2.27.0
RE: [PULL v4 76/83] vhost-user: Support vhost_dev_start
#37 0x55b709ad in vhost_dev_start (hdev=0x584dd600, vdev=0x57b965d0, vrings=false) at ../hw/virtio/vhost.c:1988
#38 0x5584291c in vhost_net_start_one (net=0x584dd600, dev=0x57b965d0) at ../hw/net/vhost_net.c:271
#39 0x55842f1e in vhost_net_start (dev=0x57b965d0, ncs=0x57bc09e0, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:412
#40 0x55b1bf61 in virtio_net_vhost_status (n=0x57b965d0, status=15 '\017') at ../hw/net/virtio-net.c:311
#41 0x55b1c20c in virtio_net_set_status (vdev=0x57b965d0, status=15 '\017') at ../hw/net/virtio-net.c:392
#42 0x55b1ed04 in virtio_net_handle_mq (n=0x57b965d0, cmd=0 '\000', iov=0x56c7ef50, iov_cnt=1) at ../hw/net/virtio-net.c:1497
#43 0x55b1eef0 in virtio_net_handle_ctrl_iov (vdev=0x57b965d0, in_sg=0x56a09880, in_num=1, out_sg=0x56a09890, out_num=1) at ../hw/net/virtio-net.c:1534
#44 0x55b1efe9 in virtio_net_handle_ctrl (vdev=0x57b965d0, vq=0x7fffc04ac140) at ../hw/net/virtio-net.c:1557
#45 0x55b63776 in virtio_queue_notify_vq (vq=0x7fffc04ac140) at ../hw/virtio/virtio.c:2249
#46 0x55b669dc in virtio_queue_host_notifier_read (n=0x7fffc04ac1b4) at ../hw/virtio/virtio.c:3529
#47 0x55e3f458 in aio_dispatch_handler (ctx=0x56a016c0, node=0x7ffd8800e430) at ../util/aio-posix.c:369
#48 0x55e3f613 in aio_dispatch_handlers (ctx=0x56a016c0) at ../util/aio-posix.c:412
#49 0x55e3f669 in aio_dispatch (ctx=0x56a016c0) at ../util/aio-posix.c:422
#50 0x55e585de in aio_ctx_dispatch (source=0x56a016c0, callback=0x0, user_data=0x0) at ../util/async.c:321
#51 0x7554895d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#52 0x55e5abea in glib_pollfds_poll () at ../util/main-loop.c:295
#53 0x55e5ac64 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:318
#54 0x55e5ad69 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:604
#55 0x559693de in qemu_main_loop () at ../softmmu/runstate.c:731
#56 0x556e7c06 in qemu_default_main () at ../softmmu/main.c:37
#57 0x556e7c3c in main (argc=71, argv=0x7fffcda8) at ../softmmu/main.c:48

-----Original Message-----
From: Maxime Coquelin
Sent: Thursday, January 12, 2023 5:26 PM
To: Laurent Vivier
Cc: qemu-devel@nongnu.org; Peter Maydell; Yajun Wu; Parav Pandit; Michael S. Tsirkin
Subject: Re: [PULL v4 76/83] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments

Hi Laurent,

On 1/11/23 10:50, Laurent Vivier wrote:
> On 1/9/23 11:55, Michael S. Tsirkin wrote:
>> On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:
>>> Hi,
>>>
>>> it seems this patch breaks vhost-user with DPDK.
>>>
>>> See https://bugzilla.redhat.com/show_bug.cgi?id=2155173
>>>
>>> it seems QEMU doesn't receive the expected commands sequence:
>>>
>>> Received unexpected msg type. Expected 22 received 40
>>> Fail to update device iotlb
>>> Received unexpected msg type. Expected 40 received 22
>>> Received unexpected msg type. Expected 22 received 11
>>> Fail to update device iotlb
>>> Received unexpected msg type. Expected 11 received 22
>>> vhost VQ 1 ring restore failed: -71: Protocol error (71)
>>> Received unexpected msg type. Expected 22 received 11
>>> Fail to update device iotlb
>>> Received unexpected msg type. Expected 11 received 22
>>> vhost VQ 0 ring restore failed: -71: Protocol error (71)
>>> unable to start vhost net: 71: falling back on userspace virtio
>>>
>>> It receives VHOST_USER_GET_STATUS (40) when it expects
>>> VHOST_USER_IOTLB_MSG (22), and VHOST_USER_IOTLB_MSG when it expects
>>> VHOST_USER_GET_STATUS, and VHOST_USER_GET_VRING_BASE (11) when it
>>> expects VHOST_USER_GET_STATUS, and so on.
>>>
>>> Any idea?

We only have a single thread on the DPDK side to handle Vhost-user requests. It reads a request, handles it, and replies to it; then it reads the next one, etc. So I don't think it is possible to mix up the request/reply order on the DPDK side.

Maybe there are two threads concurrently sending requests on the QEMU side?

Regards,
Maxime

>>> Thanks,
>>> Laurent
>>
>> So I am guessing it's coming from:
>>
>> if (msg.hdr.request != req
RE: [PULL v4 76/83] vhost-user: Support vhost_dev_start
Hi,

VHOST_USER_PROTOCOL_F_STATUS is enabled by default in DPDK (lib/vhost/vhost_user.h):

#define VHOST_USER_PROTOCOL_FEATURES ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
                                      (1ULL << VHOST_USER_PROTOCOL_F_STATUS))

Removing VHOST_USER_PROTOCOL_F_STATUS disables the VHOST_USER_SET_STATUS/VHOST_USER_GET_STATUS messages, which should work around this issue.

Thanks,
Yajun

-----Original Message-----
From: Laurent Vivier
Sent: Wednesday, January 11, 2023 5:50 PM
To: Maxime Coquelin
Cc: qemu-devel@nongnu.org; Peter Maydell; Yajun Wu; Parav Pandit; Michael S. Tsirkin
Subject: Re: [PULL v4 76/83] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments

On 1/9/23 11:55, Michael S. Tsirkin wrote:
> On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:
>> Hi,
>>
>> it seems this patch breaks vhost-user with DPDK.
>>
>> See https://bugzilla.redhat.com/show_bug.cgi?id=2155173
>>
>> it seems QEMU doesn't receive the expected commands sequence:
>>
>> Received unexpected msg type. Expected 22 received 40
>> Fail to update device iotlb
>> Received unexpected msg type. Expected 40 received 22
>> Received unexpected msg type. Expected 22 received 11
>> Fail to update device iotlb
>> Received unexpected msg type. Expected 11 received 22
>> vhost VQ 1 ring restore failed: -71: Protocol error (71)
>> Received unexpected msg type. Expected 22 received 11
>> Fail to update device iotlb
>> Received unexpected msg type. Expected 11 received 22
>> vhost VQ 0 ring restore failed: -71: Protocol error (71)
>> unable to start vhost net: 71: falling back on userspace virtio
>>
>> It receives VHOST_USER_GET_STATUS (40) when it expects
>> VHOST_USER_IOTLB_MSG (22), and VHOST_USER_IOTLB_MSG when it expects
>> VHOST_USER_GET_STATUS, and VHOST_USER_GET_VRING_BASE (11) when it
>> expects VHOST_USER_GET_STATUS, and so on.
>>
>> Any idea?
>>
>> Thanks,
>> Laurent
>
> So I am guessing it's coming from:
>
> if (msg.hdr.request != request) {
>     error_report("Received unexpected msg type. Expected %d received %d",
>                  request, msg.hdr.request);
>     return -EPROTO;
> }
>
> in process_message_reply and/or in vhost_user_get_u64.
>
>> On 11/7/22 23:53, Michael S. Tsirkin wrote:
>>> From: Yajun Wu
>>>
>>> The motivation of adding vhost-user vhost_dev_start support is to
>>> improve backend configuration speed and reduce live migration VM
>>> downtime.
>>>
>>> Today VQ configuration is issued one by one. For virtio net with
>>> multi-queue support, backend needs to update RSS (Receive side
>>> scaling) on every rx queue enable. Updating RSS is time-consuming
>>> (typical time like 7ms).
>>>
>>> Implement already defined vhost status and message in the vhost
>>> specification [1].
>>> (a) VHOST_USER_PROTOCOL_F_STATUS
>>> (b) VHOST_USER_SET_STATUS
>>> (c) VHOST_USER_GET_STATUS
>>>
>>> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK
>>> for device start and reset(0) for device stop.
>>>
>>> On reception of the DRIVER_OK message, backend can apply the needed
>>> setting only once (instead of incremental) and also utilize
>>> parallelism on enabling queues.
>>>
>>> This improves QEMU's live migration downtime with vhost user backend
>>> implementation by a great margin, especially for a large number of
>>> VQs (64): from 800 msec to 250 msec.
[PATCH] vhost-user-blk: Fix live migration crash during event handling
After live migration with virtio block device, qemu crashes at:

#0  0x7fe051e54269 in g_source_destroy () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x55cebaa5f37d in qio_net_listener_set_client_func_full (listener=0x55cebceab340, func=0x55cebab4f5f2, data=0x55cebcdfcc00, notify=0x0, context=0x0) at ../io/net-listener.c:157
#2  0x55cebab4ea99 in tcp_chr_update_read_handler (chr=0x55cebcdfcc00) at ../chardev/char-socket.c:639
#3  0x55cebab529fa in qemu_chr_be_update_read_handlers (s=0x55cebcdfcc00, context=0x0) at ../chardev/char.c:226
#4  0x55cebab4a04e in qemu_chr_fe_set_handlers_full (b=0x55cebdf52120, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false, sync_state=true) at ../chardev/char-fe.c:279
#5  0x55cebab4a0f6 in qemu_chr_fe_set_handlers (b=0x55cebdf52120, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false) at ../chardev/char-fe.c:304
#6  0x55ceba8ec3c8 in vhost_user_blk_event (opaque=0x55cebdf51f40, event=CHR_EVENT_CLOSED) at ../hw/block/vhost-user-blk.c:412
#7  0x55cebab524a1 in chr_be_event (s=0x55cebcdfcc00, event=CHR_EVENT_CLOSED) at ../chardev/char.c:61
#8  0x55cebab52519 in qemu_chr_be_event (s=0x55cebcdfcc00, event=CHR_EVENT_CLOSED) at ../chardev/char.c:81
#9  0x55cebab4fce4 in char_socket_finalize (obj=0x55cebcdfcc00) at ../chardev/char-socket.c:1085
#10 0x55cebaa4cde5 in object_deinit (obj=0x55cebcdfcc00, type=0x55cebcc67160) at ../qom/object.c:675
#11 0x55cebaa4ce5b in object_finalize (data=0x55cebcdfcc00) at ../qom/object.c:689
#12 0x55cebaa4dcec in object_unref (objptr=0x55cebcdfcc00) at ../qom/object.c:1192
#13 0x55cebaa4f3ee in object_finalize_child_property (obj=0x55cebcc6df40, name=0x55cebcead490 "char0", opaque=0x55cebcdfcc00) at ../qom/object.c:1735
#14 0x55cebaa4cbe4 in object_property_del_all (obj=0x55cebcc6df40) at ../qom/object.c:627
#15 0x55cebaa4ce48 in object_finalize (data=0x55cebcc6df40) at ../qom/object.c:688
#16 0x55cebaa4dcec in object_unref (objptr=0x55cebcc6df40) at ../qom/object.c:1192
#17 0x55cebaa4f3ee in object_finalize_child_property (obj=0x55cebce96e00, name=0x55cebceab300 "chardevs", opaque=0x55cebcc6df40) at ../qom/object.c:1735
#18 0x55cebaa4ccd1 in object_property_del_child (obj=0x55cebce96e00, child=0x55cebcc6df40) at ../qom/object.c:649
#19 0x55cebaa4cdb0 in object_unparent (obj=0x55cebcc6df40) at ../qom/object.c:668
#20 0x55cebab55124 in qemu_chr_cleanup () at ../chardev/char.c:1222
#21 0x55ceba79a561 in qemu_cleanup () at ../softmmu/runstate.c:823
#22 0x55ceba53d65f in qemu_main (argc=78, argv=0x7ffc9440bd98, envp=0x0) at ../softmmu/main.c:37
#23 0x55ceba53d68f in main (argc=78, argv=0x7ffc9440bd98) at ../softmmu/main.c:45

qemu_chr_fe_set_handlers should not be called from qemu_chr_cleanup, because the chardev has already been freed. The quick fix is to handle RUN_STATE_POSTMIGRATE the same as RUN_STATE_SHUTDOWN. A better solution would be to add a block device cleanup function, like net_cleanup, and call it from qemu_cleanup.

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/block/vhost-user-blk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 0d5190accf..b323d5820b 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -110,7 +110,7 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
     }
 
     /* valid for resize only */
-    if (blkcfg.capacity != s->blkcfg.capacity) {
+    if (s && blkcfg.capacity != s->blkcfg.capacity) {
         s->blkcfg.capacity = blkcfg.capacity;
         memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len);
         virtio_notify_config(dev->vdev);
@@ -398,7 +398,8 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
         }
         break;
     case CHR_EVENT_CLOSED:
-        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
+        if (!runstate_check(RUN_STATE_SHUTDOWN) &&
+            !runstate_check(RUN_STATE_POSTMIGRATE)) {
             /*
              * A close event may happen during a read/write, but vhost
              * code assumes the vhost_dev remains setup, so delay the
--
2.27.0
[PATCH] vhost-user: send set log base message only once
Vhost message VHOST_USER_SET_LOG_BASE is device wide. So only send it once, with the first queue pair.

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/virtio/vhost-user.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index abe23d4ebe..b3bdf7921d 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -526,6 +526,11 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
         .hdr.size = sizeof(msg.payload.log),
     };
 
+    /* Send only once with first queue pair */
+    if (dev->vq_index != 0) {
+        return 0;
+    }
+
     if (shmfd && log->fd != -1) {
         fds[fd_num++] = log->fd;
     }
--
2.27.0
RE: [PATCH v2 1/4] hw/virtio: incorporate backend features in features
Hi Alex,

Sorry for the late response, I missed your mail.

You can test together with DPDK and reproduce it. Steps:

1. DPDK:

   git clone https://github.com/DPDK/dpdk.git
   git checkout v22.07
   meson build -Dexamples=vhost_blk
   ninja -C build
   cd /tmp/
   sudo dpdk/build/examples/dpdk-vhost_blk   # it's a daemon

2. Add the blk device to qemu, then boot up the VM.

Without this commit, you get the device-ready log from dpdk-vhost_blk:

VHOST_CONFIG: (/tmp/vhost.socket) negotiated Vhost-user protocol features: 0x11ebf
VHOST_CONFIG: (/tmp/vhost.socket) negotiated Virtio features: 0x1
VHOST_CONFIG: (/tmp/vhost.socket) virtio is now ready for processing.
New Device /tmp/vhost.socket, Device ID 0
Ctrlr Worker Thread start

With this commit, the device won't become ready and the VM will hang. You can see the VHOST_USER_F_PROTOCOL_FEATURES bit is added:

VHOST_CONFIG: (/tmp/vhost.socket) negotiated Virtio features: 0x14000

Related DPDK code (lib/vhost/vhost_user.c):

/*
 * When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
 * the ring starts already enabled. Otherwise, it is enabled via
 * the SET_VRING_ENABLE message.
 */
if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
    vq->enabled = true;
}

Thanks,
Yajun

-----Original Message-----
From: Alex Bennée
Sent: Wednesday, October 26, 2022 3:42 PM
To: Yajun Wu
Cc: qemu-devel@nongnu.org; m...@redhat.com; Parav Pandit
Subject: Re: [PATCH v2 1/4] hw/virtio: incorporate backend features in features

External email: Use caution opening links or attachments

Yajun Wu writes:

> Hi Alex,
>
> With this change, the VHOST_USER_F_PROTOCOL_FEATURES bit will be set for
> the backend of a virtio block device (previously it was not).
>
> From the https://www.qemu.org/docs/master/interop/vhost-user.html spec:
> If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring starts
> directly in the enabled state.
> If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is
> initialized in a disabled state and is enabled by
> VHOST_USER_SET_VRING_ENABLE with parameter 1.
>
> vhost-user-blk won't send out VHOST_USER_SET_VRING_ENABLE today. The
> backend gets VHOST_USER_F_PROTOCOL_FEATURES negotiated but can't get
> VHOST_USER_SET_VRING_ENABLE, so the VQs stay in the disabled state.

If the backend advertises protocol features but the stub doesn't support it, how does it get enabled? The testing I did was mostly by hand with the gpio backend and using the qtests. I think we need to add some acceptance testing into avocado with some real daemons, because I don't think we have enough coverage with the current qtest approach.

> Can you check on this scenario?
>
> Thanks
>
> -----Original Message-----
> From: Qemu-devel On Behalf Of Alex Bennée
> Sent: Thursday, July 28, 2022 9:55 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; Alex Bennée
> Subject: [PATCH v2 1/4] hw/virtio: incorporate backend features in features
>
> External email: Use caution opening links or attachments
>
> There are some extra bits used over a vhost-user connection which are hidden
> from the device itself. We need to set them here to ensure we enable things
> like the protocol extensions.
>
> Currently net/vhost-user.c has its own inscrutable way of persisting this
> data but it really should live in the core vhost_user code.
>
> Signed-off-by: Alex Bennée
> Message-Id: <20220726192150.2435175-7-alex.ben...@linaro.org>
> ---
>  hw/virtio/vhost-user.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 75b8df21a4..1936a44e82 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1460,7 +1460,14 @@ static int vhost_user_set_features(struct vhost_dev *dev,
>       */
>      bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
>
> -    return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
> +    /*
> +     * We need to include any extra backend only feature bits that
> +     * mig
[PATCH] vhost: set mem table before device start
Today guest memory information (through VHOST_USER_SET_MEM_TABLE) is sent out during vhost device start, so memory pinning is delayed as well. For 4G of guest memory, a VFIO driver usually takes around 400+ msec to pin the memory, and this time is accounted towards the VM downtime: when live migrating a VM, vhost device start occurs in the vmstate load stage.

Moving set mem table to just after VM bootup, before device start, gives the backend enough time to pin the guest memory before the device starts. This improvement reduces VM downtime by 400+ msec.

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/virtio/vhost.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..73e473cd84 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -539,6 +539,14 @@ static void vhost_commit(MemoryListener *listener)
     }
 
     if (!dev->started) {
+        /* Backend can pin memory before device start, reduce LM downtime */
+        if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
+            dev->n_mem_sections) {
+            r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
+            if (r < 0) {
+                VHOST_OPS_DEBUG(r, "vhost_set_mem_table failed");
+            }
+        }
         goto out;
     }
--
2.27.0
[PATCH v4 1/2] vhost: Change the sequence of device start
This patch is part of adding vhost-user vhost_dev_start support. The motivation is to improve backend configuration speed and reduce live migration VM downtime.

Move the device start routines after finishing all the necessary device and VQ configuration, further aligning to the virtio specification's "device initialization sequence". The following patch will add vhost-user vhost_dev_start support.

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/block/vhost-user-blk.c | 18 +++++++++++-------
 hw/net/vhost_net.c        | 14 ++++++++------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 13bf5cc47a..28409c90f7 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -168,13 +168,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         goto err_guest_notifiers;
     }
 
-    ret = vhost_dev_start(&s->dev, vdev);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "Error starting vhost");
-        goto err_guest_notifiers;
-    }
-    s->started_vu = true;
-
     /* guest_notifier_mask/pending not used yet, so just unmask
      * everything here. virtio-pci will do the right thing by
      * enabling/disabling irqfd.
@@ -183,9 +176,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }
 
+    s->dev.vq_index_end = s->dev.nvqs;
+    ret = vhost_dev_start(&s->dev, vdev);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Error starting vhost");
+        goto err_guest_notifiers;
+    }
+    s->started_vu = true;
+
     return ret;
 
 err_guest_notifiers:
+    for (i = 0; i < s->dev.nvqs; i++) {
+        vhost_virtqueue_mask(&s->dev, vdev, i, true);
+    }
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d28f8b974b..0fe71ed309 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -387,21 +387,23 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-        if (r < 0) {
-            goto err_start;
-        }
 
         if (peer->vring_enable) {
             /* restore vring enable state */
             r = vhost_set_vring_enable(peer, peer->vring_enable);
             if (r < 0) {
-                vhost_net_stop_one(get_vhost_net(peer), dev);
                 goto err_start;
             }
         }
+
+        r = vhost_net_start_one(get_vhost_net(peer), dev);
+        if (r < 0) {
+            if (peer->vring_enable) {
+                vhost_set_vring_enable(peer, false);
+            }
+            goto err_start;
+        }
     }
 
     return 0;
--
2.27.0
[PATCH v4 0/2] vhost-user: Support vhost_dev_start
The motivation of adding vhost-user vhost_dev_start support is to improve backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue support, the backend needs to update RSS (Receive Side Scaling) on every rx queue enable. Updating RSS is time-consuming (a typical time is around 7 ms).

Implement the already defined vhost status and messages in the vhost specification [1]:

(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for device start and reset (0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed settings only once (instead of incrementally) and also utilize parallelism when enabling queues.

This improves QEMU's live migration downtime with a vhost-user backend implementation by a great margin, especially for a large number of VQs (64): from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the necessary device and VQ configuration, further aligning to the virtio specification's "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

v4:
- fix vhost_net_start_one fallback code

v3:
- rebase

v2:
- add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
- avoid adding status bits already set

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 ++++++++++-------
 hw/net/vhost_net.c        | 14 ++++++------
 hw/virtio/vhost-user.c    | 74 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 92 insertions(+), 14 deletions(-)

--
2.27.0
[PATCH v4 2/2] vhost-user: Support vhost_dev_start
The motivation of adding vhost-user vhost_dev_start support is to improve backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue support, the backend needs to update RSS (Receive Side Scaling) on every rx queue enable. Updating RSS is time-consuming (a typical time is around 7 ms).

Implement the already defined vhost status and messages in the vhost specification [1]:

(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for device start and reset (0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed settings only once (instead of incrementally) and also utilize parallelism when enabling queues.

This improves QEMU's live migration downtime with a vhost-user backend implementation by a great margin, especially for a large number of VQs (64): from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/virtio/vhost-user.c | 74 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 03415b6c95..bb5164b753 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
     /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
     VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+    VHOST_USER_PROTOCOL_F_STATUS = 16,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_MAX_MEM_SLOTS = 36,
     VHOST_USER_ADD_MEM_REG = 37,
     VHOST_USER_REM_MEM_REG = 38,
+    VHOST_USER_SET_STATUS = 39,
+    VHOST_USER_GET_STATUS = 40,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -1452,6 +1455,43 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64,
     return 0;
 }
 
+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+    return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+    uint64_t value;
+    int ret;
+
+    ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+    if (ret < 0) {
+        return ret;
+    }
+    *status = value;
+
+    return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+    uint8_t s;
+    int ret;
+
+    ret = vhost_user_get_status(dev, &s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if ((s & status) == status) {
+        return 0;
+    }
+    s |= status;
+
+    return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
                                    uint64_t features)
 {
@@ -1460,6 +1500,7 @@ static int vhost_user_set_features(struct vhost_dev *dev,
      * backend is actually logging changes */
     bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
+    int ret;
 
     /*
      * We need to include any extra backend only feature bits that
@@ -1467,9 +1508,18 @@ static int vhost_user_set_features(struct vhost_dev *dev,
      * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
      * features.
      */
-    return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
-                              features | dev->backend_features,
-                              log_enabled);
+    ret = vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+                             features | dev->backend_features,
+                             log_enabled);
+
+    if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_STATUS)) {
+        if (!ret) {
+            return vhost_user_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+        }
+    }
+
+    return ret;
 }
 
 static int vhost_user_set_protocol_features(struct vhost_dev *dev,
@@ -2615,6 +2665,27 @@ void vhost_user_cleanup(VhostUserState *user)
     user->chr = NULL;
 }
 
+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_STATUS)) {
+        return 0;
+    }
+
+    /* Set device status only for last queue pair */
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return 0;
+    }
+
+    if (started) {
+        return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                                          VIRTIO_CONFIG_S_DRIVER |
+                                          VIRTIO_CONFIG_S_DRIVER_OK);
+    } else {
+        return vhost_user_set_status(dev, 0);
+    }
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_backend_init,
@@ -2649,4 +2720,5 @@ const VhostOps user_ops = {
         .vhost_backend_mem_secti
RE: [PATCH v2 1/4] hw/virtio: incorporate backend features in features
Hi Alex,

With this change, the VHOST_USER_F_PROTOCOL_FEATURES bit will be set for the backend of a virtio block device (previously it was not).

From the https://www.qemu.org/docs/master/interop/vhost-user.html spec:

  If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring starts directly in the enabled state.
  If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized in a disabled state and is enabled by VHOST_USER_SET_VRING_ENABLE with parameter 1.

vhost-user-blk won't send out VHOST_USER_SET_VRING_ENABLE today. The backend gets VHOST_USER_F_PROTOCOL_FEATURES negotiated but can't get VHOST_USER_SET_VRING_ENABLE, so the VQs stay in the disabled state.

Can you check on this scenario?

Thanks

-----Original Message-----
From: Qemu-devel On Behalf Of Alex Bennée
Sent: Thursday, July 28, 2022 9:55 PM
To: qemu-devel@nongnu.org
Cc: m...@redhat.com; Alex Bennée
Subject: [PATCH v2 1/4] hw/virtio: incorporate backend features in features

External email: Use caution opening links or attachments

There are some extra bits used over a vhost-user connection which are hidden from the device itself. We need to set them here to ensure we enable things like the protocol extensions.

Currently net/vhost-user.c has its own inscrutable way of persisting this data but it really should live in the core vhost_user code.

Signed-off-by: Alex Bennée
Message-Id: <20220726192150.2435175-7-alex.ben...@linaro.org>
---
 hw/virtio/vhost-user.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 75b8df21a4..1936a44e82 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1460,7 +1460,14 @@ static int vhost_user_set_features(struct vhost_dev *dev,
      */
     bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
 
-    return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
+    /*
+     * We need to include any extra backend only feature bits that
+     * might be needed by our device. Currently this includes the
+     * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
+     * features.
+     */
+    return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+                              features | dev->backend_features,
                               log_enabled);
 }
--
2.30.2
[PATCH] vhost-user: Fix out of order vring host notification handling
The vhost backend sends a host notification for every VQ. If the backend creates VQs in parallel, the VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG messages may arrive at QEMU in a different order than incremental queue index order.

For example, VQ 1's message arrives earlier than VQ 0's:

After allocating the VhostUserHostNotifier for VQ 1, the GPtrArray becomes
[ nil, VQ1 pointer ].
After allocating the VhostUserHostNotifier for VQ 0, the GPtrArray becomes
[ VQ0 pointer, nil, VQ1 pointer ].

This is wrong: fetch_notifier will return NULL for VQ 1 in vhost_user_get_vring_base, which causes the host notifier to miss removal (a leak).

The fix is to remove the current element from the GPtrArray first, making the right position for the element to insert.

Fixes: 503e355465 ("virtio/vhost-user: dynamically assign VhostUserHostNotifiers")
Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/virtio/vhost-user.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 03415b6c95..d256ce589b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1543,6 +1543,11 @@ static VhostUserHostNotifier *fetch_or_create_notifier(VhostUserState *u,
 
     n = g_ptr_array_index(u->notifiers, idx);
     if (!n) {
+        /*
+         * In case notification arrive out-of-order,
+         * make room for current index.
+         */
+        g_ptr_array_remove_index(u->notifiers, idx);
         n = g_new0(VhostUserHostNotifier, 1);
         n->idx = idx;
         g_ptr_array_insert(u->notifiers, idx, n);
--
2.27.0
[PATCH v3 0/2] vhost-user: Support vhost_dev_start
The motivation for adding vhost-user vhost_dev_start support is to improve
backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue
support, the backend needs to update RSS (Receive Side Scaling) on every rx
queue enable. Updating RSS is time-consuming (typically around 7 ms).

Implement the vhost status and messages already defined in the vhost
specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send the message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start, and reset (0) for device stop. On reception of the DRIVER_OK
message, the backend can apply the needed settings only once (instead of
incrementally) and also enable queues in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs: with
64 VQs, from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the
necessary device and VQ configuration, further aligning with the virtio
specification's "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

v3:
- rebase

v2:
- add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
- avoid adding status bits already set

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 ++
 hw/net/vhost_net.c        | 12 +++
 hw/virtio/vhost-user.c    | 74 ++-
 3 files changed, 90 insertions(+), 14 deletions(-)

--
2.27.0
[PATCH v3 1/2] vhost: Change the sequence of device start
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.

Move the device start routines after finishing all the necessary device and
VQ configuration, further aligning with the virtio specification's "device
initialization sequence". The following patch will add vhost-user
vhost_dev_start support.

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/block/vhost-user-blk.c | 18 +++---
 hw/net/vhost_net.c        | 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 84902dde17..f4deb8cd5d 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -164,13 +157,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         goto err_guest_notifiers;
     }

-    ret = vhost_dev_start(&s->dev, vdev);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "Error starting vhost");
-        goto err_guest_notifiers;
-    }
-    s->started_vu = true;
-
     /* guest_notifier_mask/pending not used yet, so just unmask
      * everything here. virtio-pci will do the right thing by
      * enabling/disabling irqfd.
@@ -179,9 +172,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }

+    s->dev.vq_index_end = s->dev.nvqs;
+    ret = vhost_dev_start(&s->dev, vdev);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Error starting vhost");
+        goto err_guest_notifiers;
+    }
+    s->started_vu = true;
+
     return ret;

 err_guest_notifiers:
+    for (i = 0; i < s->dev.nvqs; i++) {
+        vhost_virtqueue_mask(&s->dev, vdev, i, true);
+    }
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d28f8b974b..d6924f5e57 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -387,21 +387,21 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-        if (r < 0) {
-            goto err_start;
-        }

         if (peer->vring_enable) {
             /* restore vring enable state */
             r = vhost_set_vring_enable(peer, peer->vring_enable);

             if (r < 0) {
-                vhost_net_stop_one(get_vhost_net(peer), dev);
                 goto err_start;
             }
         }
+
+        r = vhost_net_start_one(get_vhost_net(peer), dev);
+        if (r < 0) {
+            vhost_net_stop_one(get_vhost_net(peer), dev);
+            goto err_start;
+        }
     }

     return 0;
--
2.27.0
[PATCH v3 2/2] vhost-user: Support vhost_dev_start
The motivation for adding vhost-user vhost_dev_start support is to improve
backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue
support, the backend needs to update RSS (Receive Side Scaling) on every rx
queue enable. Updating RSS is time-consuming (typically around 7 ms).

Implement the vhost status and messages already defined in the vhost
specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send the message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start, and reset (0) for device stop. On reception of the DRIVER_OK
message, the backend can apply the needed settings only once (instead of
incrementally) and also enable queues in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs: with
64 VQs, from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/virtio/vhost-user.c | 74 +-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 03415b6c95..bb5164b753 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
     /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
     VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+    VHOST_USER_PROTOCOL_F_STATUS = 16,
     VHOST_USER_PROTOCOL_F_MAX
 };

@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_MAX_MEM_SLOTS = 36,
     VHOST_USER_ADD_MEM_REG = 37,
     VHOST_USER_REM_MEM_REG = 38,
+    VHOST_USER_SET_STATUS = 39,
+    VHOST_USER_GET_STATUS = 40,
     VHOST_USER_MAX
 } VhostUserRequest;

@@ -1452,6 +1455,43 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64,
     return 0;
 }

+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+    return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+    uint64_t value;
+    int ret;
+
+    ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+    if (ret < 0) {
+        return ret;
+    }
+    *status = value;
+
+    return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+    uint8_t s;
+    int ret;
+
+    ret = vhost_user_get_status(dev, &s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if ((s & status) == status) {
+        return 0;
+    }
+    s |= status;
+
+    return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
                                    uint64_t features)
 {
@@ -1460,6 +1500,7 @@ static int vhost_user_set_features(struct vhost_dev *dev,
      * backend is actually logging changes
      */
     bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
+    int ret;

     /*
      * We need to include any extra backend only feature bits that
@@ -1467,9 +1508,18 @@ static int vhost_user_set_features(struct vhost_dev *dev,
      * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
      * features.
      */
-    return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+    ret = vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
                               features | dev->backend_features,
                               log_enabled);
+
+    if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_STATUS)) {
+        if (!ret) {
+            return vhost_user_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+        }
+    }
+
+    return ret;
 }

 static int vhost_user_set_protocol_features(struct vhost_dev *dev,
@@ -2615,6 +2665,27 @@ void vhost_user_cleanup(VhostUserState *user)
     user->chr = NULL;
 }

+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_STATUS)) {
+        return 0;
+    }
+
+    /* Set device status only for last queue pair */
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return 0;
+    }
+
+    if (started) {
+        return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                                          VIRTIO_CONFIG_S_DRIVER |
+                                          VIRTIO_CONFIG_S_DRIVER_OK);
+    } else {
+        return vhost_user_set_status(dev, 0);
+    }
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_backend_init,
@@ -2649,4 +2720,5 @@ const VhostOps user_ops = {
         .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
         .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
         .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
+        .vhost_dev_start = vhost_user_dev_start,
 };
--
2.27.0
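The read-modify-write at the heart of this patch can be sketched in a few
lines of Python. This is a simulation, not QEMU code: the backend_status
variable and writes list stand in for the actual VHOST_USER_GET_STATUS /
VHOST_USER_SET_STATUS messages, and the bit values are the standard virtio
device-status bits.

```python
# Virtio device-status bits (from the virtio specification).
VIRTIO_CONFIG_S_ACKNOWLEDGE = 1
VIRTIO_CONFIG_S_DRIVER = 2
VIRTIO_CONFIG_S_DRIVER_OK = 4
VIRTIO_CONFIG_S_FEATURES_OK = 8

backend_status = 0   # status as the backend sees it
writes = []          # SET_STATUS messages actually put on the wire

def add_status(bits):
    """Model vhost_user_add_status(): GET, skip if already set, OR, SET."""
    global backend_status
    s = backend_status                  # VHOST_USER_GET_STATUS
    if (s & bits) == bits:
        return                          # avoid a redundant message (v2 change)
    backend_status = s | bits           # VHOST_USER_SET_STATUS
    writes.append(backend_status)

add_status(VIRTIO_CONFIG_S_FEATURES_OK)        # after SET_FEATURES succeeds
add_status(VIRTIO_CONFIG_S_ACKNOWLEDGE |
           VIRTIO_CONFIG_S_DRIVER |
           VIRTIO_CONFIG_S_DRIVER_OK)          # vhost_user_dev_start(started)
add_status(VIRTIO_CONFIG_S_DRIVER_OK)          # no-op: bit already set
print(writes)   # [8, 15] -- only two messages reach the backend
```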
RE: [PATCH v2 0/2] vhost-user: Support vhost_dev_start
Michael,

Don't forget to merge. Thanks!

-----Original Message-----
From: Yajun Wu
Sent: Monday, September 5, 2022 12:58 PM
To: Michael S. Tsirkin
Cc: qemu-devel@nongnu.org; Parav Pandit
Subject: RE: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

Michael, 7.1 released, please merge.

-----Original Message-----
From: Michael S. Tsirkin
Sent: Tuesday, July 26, 2022 10:56 PM
To: Yajun Wu
Cc: qemu-devel@nongnu.org; Parav Pandit
Subject: Re: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments

On Tue, Jul 12, 2022 at 01:54:59PM +0300, Yajun Wu wrote:
> The motivation of adding vhost-user vhost_dev_start support is to
> improve backend configuration speed and reduce live migration VM
> downtime.
>
> Today VQ configuration is issued one by one. For virtio net with
> multi-queue support, backend needs to update RSS (Receive side
> scaling) on every rx queue enable. Updating RSS is time-consuming
> (typical time like 7ms).
>
> Implement already defined vhost status and message in the vhost
> specification [1].
> (a) VHOST_USER_PROTOCOL_F_STATUS
> (b) VHOST_USER_SET_STATUS
> (c) VHOST_USER_GET_STATUS
>
> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
> device start and reset(0) for device stop.
>
> On reception of the DRIVER_OK message, backend can apply the needed
> setting only once (instead of incremental) and also utilize
> parallelism on enabling queues.
>
> This improves QEMU's live migration downtime with vhost user backend
> implementation by great margin, specially for the large number of VQs
> of 64 from 800 msec to 250 msec.
>
> Another change is to move the device start routines after finishing
> all the necessary device and VQ configuration, further aligning to the
> virtio specification for "device initialization sequence".

I think it's best to merge this after the 7.1 release. I've tagged this but
pls ping me after the release just to make sure it's not lost. Thanks!
> [1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction
>
> v2:
> - add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
> - avoid adding status bits already set
>
> Yajun Wu (2):
>   vhost: Change the sequence of device start
>   vhost-user: Support vhost_dev_start
>
>  hw/block/vhost-user-blk.c | 18 ++
>  hw/net/vhost_net.c        | 12 +++
>  hw/virtio/vhost-user.c    | 74 ++-
>  3 files changed, 90 insertions(+), 14 deletions(-)
>
> --
> 2.27.0
RE: [PATCH v2 0/2] vhost-user: Support vhost_dev_start
Michael, 7.1 released, please merge.

-----Original Message-----
From: Michael S. Tsirkin
Sent: Tuesday, July 26, 2022 10:56 PM
To: Yajun Wu
Cc: qemu-devel@nongnu.org; Parav Pandit
Subject: Re: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments

On Tue, Jul 12, 2022 at 01:54:59PM +0300, Yajun Wu wrote:
> The motivation of adding vhost-user vhost_dev_start support is to
> improve backend configuration speed and reduce live migration VM
> downtime.
>
> Today VQ configuration is issued one by one. For virtio net with
> multi-queue support, backend needs to update RSS (Receive side
> scaling) on every rx queue enable. Updating RSS is time-consuming
> (typical time like 7ms).
>
> Implement already defined vhost status and message in the vhost
> specification [1].
> (a) VHOST_USER_PROTOCOL_F_STATUS
> (b) VHOST_USER_SET_STATUS
> (c) VHOST_USER_GET_STATUS
>
> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
> device start and reset(0) for device stop.
>
> On reception of the DRIVER_OK message, backend can apply the needed
> setting only once (instead of incremental) and also utilize
> parallelism on enabling queues.
>
> This improves QEMU's live migration downtime with vhost user backend
> implementation by great margin, specially for the large number of VQs
> of 64 from 800 msec to 250 msec.
>
> Another change is to move the device start routines after finishing
> all the necessary device and VQ configuration, further aligning to the
> virtio specification for "device initialization sequence".

I think it's best to merge this after the 7.1 release. I've tagged this but
pls ping me after the release just to make sure it's not lost. Thanks!
> [1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction
>
> v2:
> - add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
> - avoid adding status bits already set
>
> Yajun Wu (2):
>   vhost: Change the sequence of device start
>   vhost-user: Support vhost_dev_start
>
>  hw/block/vhost-user-blk.c | 18 ++
>  hw/net/vhost_net.c        | 12 +++
>  hw/virtio/vhost-user.c    | 74 ++-
>  3 files changed, 90 insertions(+), 14 deletions(-)
>
> --
> 2.27.0
[PATCH v2 1/2] vhost: Change the sequence of device start
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.

Move the device start routines after finishing all the necessary device and
VQ configuration, further aligning with the virtio specification's "device
initialization sequence". The following patch will add vhost-user
vhost_dev_start support.

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/block/vhost-user-blk.c | 18 +++---
 hw/net/vhost_net.c        | 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..972ef46365 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -163,13 +156,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         goto err_guest_notifiers;
     }

-    ret = vhost_dev_start(&s->dev, vdev);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "Error starting vhost");
-        goto err_guest_notifiers;
-    }
-    s->started_vu = true;
-
     /* guest_notifier_mask/pending not used yet, so just unmask
      * everything here. virtio-pci will do the right thing by
      * enabling/disabling irqfd.
@@ -178,9 +171,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }

+    s->dev.vq_index_end = s->dev.nvqs;
+    ret = vhost_dev_start(&s->dev, vdev);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Error starting vhost");
+        goto err_guest_notifiers;
+    }
+    s->started_vu = true;
+
     return ret;

 err_guest_notifiers:
+    for (i = 0; i < s->dev.nvqs; i++) {
+        vhost_virtqueue_mask(&s->dev, vdev, i, true);
+    }
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..6a8a6082a2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -370,21 +370,21 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-        if (r < 0) {
-            goto err_start;
-        }

         if (peer->vring_enable) {
             /* restore vring enable state */
             r = vhost_set_vring_enable(peer, peer->vring_enable);

             if (r < 0) {
-                vhost_net_stop_one(get_vhost_net(peer), dev);
                 goto err_start;
             }
         }
+
+        r = vhost_net_start_one(get_vhost_net(peer), dev);
+        if (r < 0) {
+            vhost_net_stop_one(get_vhost_net(peer), dev);
+            goto err_start;
+        }
     }

     return 0;
--
2.27.0
[PATCH v2 0/2] vhost-user: Support vhost_dev_start
The motivation for adding vhost-user vhost_dev_start support is to improve
backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue
support, the backend needs to update RSS (Receive Side Scaling) on every rx
queue enable. Updating RSS is time-consuming (typically around 7 ms).

Implement the vhost status and messages already defined in the vhost
specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send the message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start, and reset (0) for device stop. On reception of the DRIVER_OK
message, the backend can apply the needed settings only once (instead of
incrementally) and also enable queues in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs: with
64 VQs, from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the
necessary device and VQ configuration, further aligning with the virtio
specification's "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

v2:
- add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
- avoid adding status bits already set

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 ++
 hw/net/vhost_net.c        | 12 +++
 hw/virtio/vhost-user.c    | 74 ++-
 3 files changed, 90 insertions(+), 14 deletions(-)

--
2.27.0
[PATCH v2 2/2] vhost-user: Support vhost_dev_start
The motivation for adding vhost-user vhost_dev_start support is to improve
backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue
support, the backend needs to update RSS (Receive Side Scaling) on every rx
queue enable. Updating RSS is time-consuming (typically around 7 ms).

Implement the vhost status and messages already defined in the vhost
specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send the message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start, and reset (0) for device stop. On reception of the DRIVER_OK
message, the backend can apply the needed settings only once (instead of
incrementally) and also enable queues in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs: with
64 VQs, from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/virtio/vhost-user.c | 74 +-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 75b8df21a4..e6ad4c05b8 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
     /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
     VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+    VHOST_USER_PROTOCOL_F_STATUS = 16,
     VHOST_USER_PROTOCOL_F_MAX
 };

@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_MAX_MEM_SLOTS = 36,
     VHOST_USER_ADD_MEM_REG = 37,
     VHOST_USER_REM_MEM_REG = 38,
+    VHOST_USER_SET_STATUS = 39,
+    VHOST_USER_GET_STATUS = 40,
     VHOST_USER_MAX
 } VhostUserRequest;

@@ -1451,6 +1454,43 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64,
     return 0;
 }

+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+    return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+    uint64_t value;
+    int ret;
+
+    ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+    if (ret < 0) {
+        return ret;
+    }
+    *status = value;
+
+    return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+    uint8_t s;
+    int ret;
+
+    ret = vhost_user_get_status(dev, &s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if ((s & status) == status) {
+        return 0;
+    }
+    s |= status;
+
+    return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
                                    uint64_t features)
 {
@@ -1459,9 +1499,19 @@ static int vhost_user_set_features(struct vhost_dev *dev,
      * backend is actually logging changes
      */
     bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
+    int ret;

-    return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
+    ret = vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
                               log_enabled);
+
+    if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_STATUS)) {
+        if (!ret) {
+            return vhost_user_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+        }
+    }
+
+    return ret;
 }

 static int vhost_user_set_protocol_features(struct vhost_dev *dev,
@@ -2607,6 +2657,27 @@ void vhost_user_cleanup(VhostUserState *user)
     user->chr = NULL;
 }

+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_STATUS)) {
+        return 0;
+    }
+
+    /* Set device status only for last queue pair */
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return 0;
+    }
+
+    if (started) {
+        return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                                          VIRTIO_CONFIG_S_DRIVER |
+                                          VIRTIO_CONFIG_S_DRIVER_OK);
+    } else {
+        return vhost_user_set_status(dev, 0);
+    }
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_backend_init,
@@ -2641,4 +2712,5 @@ const VhostOps user_ops = {
         .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
         .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
         .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
+        .vhost_dev_start = vhost_user_dev_start,
 };
--
2.27.0
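The "set device status only for last queue pair" gate above is worth
unpacking: with a multi-queue net device, each vhost_dev covers the VQ range
[vq_index, vq_index + nvqs), and only the device whose range ends at
vq_index_end sends SET_STATUS. A Python sketch, with a hypothetical
4-queue-pair layout (two VQs per vhost_dev):

```python
def is_last_queue_pair(vq_index, nvqs, vq_index_end):
    """Mirror the gate in vhost_user_dev_start(): true only for the
    vhost_dev whose VQ range ends at vq_index_end."""
    return vq_index + nvqs == vq_index_end

vq_index_end = 8   # 4 queue pairs => 8 VQs total
sends = [is_last_queue_pair(i, 2, vq_index_end) for i in (0, 2, 4, 6)]
print(sends)   # [False, False, False, True] -- exactly one SET_STATUS
```

This is how the series turns per-queue-pair configuration into a single
DRIVER_OK notification that the backend can act on once.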
[PATCH 0/2] vhost-user: Support vhost_dev_start
The motivation for adding vhost-user vhost_dev_start support is to improve
backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue
support, the backend needs to update RSS (Receive Side Scaling) on every rx
queue enable. Updating RSS is time-consuming (typically around 7 ms).

Implement the vhost status and messages already defined in the vhost
specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send the message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start, and reset (0) for device stop. On reception of the DRIVER_OK
message, the backend can apply the needed settings only once (instead of
incrementally) and also enable queues in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs: with
64 VQs, from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the
necessary device and VQ configuration, further aligning with the virtio
specification's "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 +++-
 hw/net/vhost_net.c        | 12
 hw/virtio/vhost-user.c    | 58 +++
 3 files changed, 75 insertions(+), 13 deletions(-)

--
2.30.2
[PATCH 1/2] vhost: Change the sequence of device start
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.

Move the device start routines after finishing all the necessary device and
VQ configuration, further aligning with the virtio specification's "device
initialization sequence". The following patch will add vhost-user
vhost_dev_start support.

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/block/vhost-user-blk.c | 18 +++---
 hw/net/vhost_net.c        | 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..972ef46365 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -163,13 +156,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         goto err_guest_notifiers;
     }

-    ret = vhost_dev_start(&s->dev, vdev);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "Error starting vhost");
-        goto err_guest_notifiers;
-    }
-    s->started_vu = true;
-
     /* guest_notifier_mask/pending not used yet, so just unmask
      * everything here. virtio-pci will do the right thing by
      * enabling/disabling irqfd.
@@ -178,9 +171,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }

+    s->dev.vq_index_end = s->dev.nvqs;
+    ret = vhost_dev_start(&s->dev, vdev);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Error starting vhost");
+        goto err_guest_notifiers;
+    }
+    s->started_vu = true;
+
     return ret;

 err_guest_notifiers:
+    for (i = 0; i < s->dev.nvqs; i++) {
+        vhost_virtqueue_mask(&s->dev, vdev, i, true);
+    }
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..6a8a6082a2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -370,21 +370,21 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-        if (r < 0) {
-            goto err_start;
-        }

         if (peer->vring_enable) {
             /* restore vring enable state */
             r = vhost_set_vring_enable(peer, peer->vring_enable);

             if (r < 0) {
-                vhost_net_stop_one(get_vhost_net(peer), dev);
                 goto err_start;
             }
         }
+
+        r = vhost_net_start_one(get_vhost_net(peer), dev);
+        if (r < 0) {
+            vhost_net_stop_one(get_vhost_net(peer), dev);
+            goto err_start;
+        }
     }

     return 0;
--
2.27.0
[PATCH 2/2] vhost-user: Support vhost_dev_start
The motivation for adding vhost-user vhost_dev_start support is to improve
backend configuration speed and reduce live migration VM downtime.

Today VQ configuration is issued one by one. For virtio net with multi-queue
support, the backend needs to update RSS (Receive Side Scaling) on every rx
queue enable. Updating RSS is time-consuming (typically around 7 ms).

Implement the vhost status and messages already defined in the vhost
specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send the message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start, and reset (0) for device stop. On reception of the DRIVER_OK
message, the backend can apply the needed settings only once (instead of
incrementally) and also enable queues in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs: with
64 VQs, from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
---
 hw/virtio/vhost-user.c | 58 ++
 1 file changed, 58 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 4b9be26e84..9f75c51dc2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
     /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
     VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+    VHOST_USER_PROTOCOL_F_STATUS = 16,
     VHOST_USER_PROTOCOL_F_MAX
 };

@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_MAX_MEM_SLOTS = 36,
     VHOST_USER_ADD_MEM_REG = 37,
     VHOST_USER_REM_MEM_REG = 38,
+    VHOST_USER_SET_STATUS = 39,
+    VHOST_USER_GET_STATUS = 40,
     VHOST_USER_MAX
 } VhostUserRequest;

@@ -1446,6 +1449,39 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64,
     return 0;
 }

+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+    return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+    uint64_t value;
+    int ret;
+
+    ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+    if (ret < 0) {
+        return ret;
+    }
+    *status = value;
+
+    return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+    uint8_t s;
+    int ret;
+
+    ret = vhost_user_get_status(dev, &s);
+    if (ret < 0) {
+        return ret;
+    }
+    s |= status;
+
+    return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
                                    uint64_t features)
 {
@@ -2602,6 +2638,27 @@ void vhost_user_cleanup(VhostUserState *user)
     user->chr = NULL;
 }

+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_STATUS)) {
+        return 0;
+    }
+
+    /* Set device status only for last queue pair */
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return 0;
+    }
+
+    if (started) {
+        return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                                          VIRTIO_CONFIG_S_DRIVER |
+                                          VIRTIO_CONFIG_S_DRIVER_OK);
+    } else {
+        return vhost_user_set_status(dev, 0);
+    }
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_backend_init,
@@ -2635,4 +2692,5 @@ const VhostOps user_ops = {
         .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
         .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
         .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
+        .vhost_dev_start = vhost_user_dev_start,
 };
--
2.27.0
RE: [PATCH] virtio/vhost-user: Fix wrong vhost notifier GPtrArray size
Hi Michael,

User space vhost clients have been broken for a few weeks now without this
fix. With Alex's review, can you please merge it if there are no further
comments? Thanks.

-----Original Message-----
From: Alex Bennée
Sent: Thursday, May 26, 2022 3:09 PM
To: Yajun Wu
Cc: qemu-devel@nongnu.org; m...@redhat.com; Parav Pandit
Subject: Re: [PATCH] virtio/vhost-user: Fix wrong vhost notifier GPtrArray size

External email: Use caution opening links or attachments

Yajun Wu writes:

> In fetch_or_create_notifier, idx begins with 0. So the GPtrArray size
> should be idx + 1 and g_ptr_array_set_size should be called with idx + 1.
>
> This wrong GPtrArray size causes fetch_or_create_notifier to return an
> invalid address. Passing this invalid pointer to
> vhost_user_host_notifier_remove causes an assert failure:
>
>   qemu/include/qemu/int128.h:27: int128_get64: Assertion `r == a' failed.
>   shutting down, reason=crashed
>
> Backends like dpdk-vdpa, which send out vhost notifier requests, almost
> always hit this qemu crash.

My bad. I was looking for ways to exercise this code but the internal tests
didn't do it. I guess I should look at getting a test setup for dpdk. Anyway:

Reviewed-by: Alex Bennée

> Fixes: 503e355465 ("virtio/vhost-user: dynamically assign
> VhostUserHostNotifiers")
> Signed-off-by: Yajun Wu
> Acked-by: Parav Pandit
> Change-Id: I87e0f7591ca9a59d210879b260704a2d9e9d6bcd
> ---
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index b040c1ad2b..dbc690d16c 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1525,7 +1525,7 @@ static VhostUserHostNotifier *fetch_or_create_notifier(VhostUserState *u,
>  {
>      VhostUserHostNotifier *n = NULL;
>      if (idx >= u->notifiers->len) {
> -        g_ptr_array_set_size(u->notifiers, idx);
> +        g_ptr_array_set_size(u->notifiers, idx + 1);
>      }
>
>      n = g_ptr_array_index(u->notifiers, idx);

--
Alex Bennée
[PATCH] virtio/vhost-user: Fix wrong vhost notifier GPtrArray size
In fetch_or_create_notifier, idx begins with 0. So the GPtrArray size
should be idx + 1, and g_ptr_array_set_size should be called with idx + 1.

This wrong GPtrArray size causes fetch_or_create_notifier to return an
invalid address. Passing this invalid pointer to
vhost_user_host_notifier_remove causes an assert failure:

  qemu/include/qemu/int128.h:27: int128_get64: Assertion `r == a' failed.
  shutting down, reason=crashed

Backends like dpdk-vdpa, which send out vhost notifier requests, almost
always hit this qemu crash.

Fixes: 503e355465 ("virtio/vhost-user: dynamically assign VhostUserHostNotifiers")
Signed-off-by: Yajun Wu
Acked-by: Parav Pandit
Change-Id: I87e0f7591ca9a59d210879b260704a2d9e9d6bcd
---
 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b040c1ad2b..dbc690d16c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1525,7 +1525,7 @@ static VhostUserHostNotifier *fetch_or_create_notifier(VhostUserState *u,
 {
     VhostUserHostNotifier *n = NULL;
     if (idx >= u->notifiers->len) {
-        g_ptr_array_set_size(u->notifiers, idx);
+        g_ptr_array_set_size(u->notifiers, idx + 1);
     }

     n = g_ptr_array_index(u->notifiers, idx);
--
2.36.0
[PATCH v2] hw/virtio: Fix leak of host-notifier memory-region
If the call to virtio_queue_set_host_notifier_mr fails, the host-notifier
memory region should be freed.

Fixes: 44866521bd ("vhost-user: support registering external host notifiers")
Signed-off-by: Yajun Wu
---
 hw/virtio/vhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index aec6cc1..3ae5297 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1474,6 +1474,7 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
     g_free(name);

     if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
+        object_unparent(OBJECT(&n->mr));
         munmap(addr, page_size);
         return -1;
     }
--
1.8.3.1
[PATCH] hw/virtio: Fix leak of host-notifier memory-region
If call virtio_queue_set_host_notifier_mr fails, should free host-notifier memory-region.

Signed-off-by: Yajun Wu
---
 hw/virtio/vhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index aec6cc1..3ae5297 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1474,6 +1474,7 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
     g_free(name);

     if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
+        object_unparent(OBJECT(&n->mr));
         munmap(addr, page_size);
         return -1;
     }
--
1.8.3.1
Any reason VIRTQUEUE_MAX_SIZE is 1024? Can we increase this limit?
Hi all,

I'm running iperf tests on a virtio-net device through vhost-user (HW vDPA), and I find the maximum acceptable tx_queue_size/rx_queue_size is 1024. Basically, increasing the queue size gives a better RX rate in my case.

Can we raise the limit (VIRTQUEUE_MAX_SIZE) to 8192 to possibly gain better performance?

Thanks,
Yajun