Re: vhost-user-blk reconnect issue

2024-04-10 Thread Yajun Wu


On 4/2/2024 4:44 PM, Li Feng wrote:


Hi,

I tested it today and there is indeed a problem in this scenario.
It seems that the first version of the patch is the best and can 
handle all scenarios.

With this patch, the previously merged patches are no longer useful.


I will revert this patch and submit a new fix. Do you have any comments?

Revert: https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/
New: https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/


Looks good to me.

Thanks,

Yajun



Thanks,
Li


On Apr 1, 2024, at 16:43, Yajun Wu wrote:


On 4/1/2024 4:34 PM, Li Feng wrote:



Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end
that solution was not accepted and other solutions were adopted to fix it.

[PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
<https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/>


I think this fix is valid.


This is the merged fix:


[PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
<https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/>


My tests are with this fix; they failed in the two scenarios I mentioned.



Thanks,
Li


On Apr 1, 2024, at 10:08, Yajun Wu wrote:


On 3/27/2024 6:47 PM, Stefano Garzarella wrote:



Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:

Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect
failure scenarios:

Do you know if it has ever worked (and so it's a regression), or have we
always had this problem?


I am afraid commit 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED
handling", 2022-12-01) caused both failures. The previous commit is good.


I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is
the cause; the previous code didn't have this check?




Thanks,
Stefano

1. Disconnect the vhost-user-blk backend before the guest driver probes
the vblk device, then reconnect the backend after the guest driver probes
the device. QEMU won't send out any vhost messages to restore the backend.
This is because vhost->vdev is NULL before the guest driver probes the
vblk device, so vhost_user_blk_disconnect won't be called and
s->connected is still true. The next vhost_user_blk_connect will simply
return without doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend,
then reconnect the backend, then modprobe virtio-blk. QEMU won't send
messages in vhost_dev_init.
This is because rmmod makes QEMU call vhost_user_blk_stop, vhost->vdev
also becomes NULL (in vhost_dev_stop), and vhost_user_blk_disconnect
won't be called. Again s->connected is still true, even though the chr
connection is closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be
called when the chr connection closes?
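
A minimal self-contained sketch of that suggested direction (stand-in
types and names, not the actual code in hw/block/vhost-user-blk.c): the
CLOSED event clears s->connected even when no virtio device is attached,
so a later reconnect really re-runs the vhost setup.

    #include <stdbool.h>
    #include <stddef.h>

    enum chr_event { CHR_EVENT_OPENED, CHR_EVENT_CLOSED };

    struct vublk_state {    /* stand-in for VHostUserBlk */
        bool connected;
        void *vdev;         /* NULL before driver probe or after vhost_dev_stop */
    };

    static void vublk_chr_event(struct vublk_state *s, enum chr_event ev)
    {
        if (ev == CHR_EVENT_CLOSED) {
            /* Do not gate this on s->vdev != NULL: run the disconnect
             * path unconditionally so s->connected is cleared. */
            s->connected = false;
        } else if (ev == CHR_EVENT_OPENED && !s->connected) {
            s->connected = true;    /* connect path re-sends the vhost messages */
        }
    }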

Hope we can have a fix soon.


Thanks,
Yajun





Re: [PATCH] virtio-net: fix qemu set used ring flag even vhost started

2024-04-02 Thread Yajun Wu



On 4/3/2024 1:10 PM, Michael Tokarev wrote:



02.04.2024 07:51, Yajun Wu:

When vhost-user or vhost-kernel is handling the virtio net datapath, qemu
should not touch the used ring.

But in the vhost-user socket reconnect scenario, in a very rare case (with
a pending kick event), VRING_USED_F_NO_NOTIFY is set by qemu in the
following code path:

...

This issue causes the guest kernel to stop kicking the device and traffic to stop.

Add a vhost_started check in virtio_net_handle_tx_bh to prevent this wrong
VRING_USED_F_NO_NOTIFY setting.

This seems to be qemu-stable material, even if the case where the issue
happens is "very rare" -- is it not?

If you mean this fix needs a backport to previous versions? Yes.

Thanks,

/mjt


Signed-off-by: Yajun Wu 
Reviewed-by: Jiri Pirko 
---
   hw/net/virtio-net.c | 4 ++++
   1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a6ff000cd9..8035e01fdf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2865,6 +2865,10 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
     VirtIONet *n = VIRTIO_NET(vdev);
     VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
 
+    if (n->vhost_started) {
+        return;
+    }
+
     if (unlikely((n->status & VIRTIO_NET_S_LINK_UP) == 0)) {
         virtio_net_drop_tx_queue_data(vdev, vq);
         return;




Re: [PATCH] virtio-net: fix qemu set used ring flag even vhost started

2024-04-02 Thread Yajun Wu



On 4/2/2024 8:11 PM, Philippe Mathieu-Daudé wrote:



Hi Yajun,

On 2/4/24 06:51, Yajun Wu wrote:

When vhost-user or vhost-kernel is handling the virtio net datapath, qemu

"QEMU"

Ack.

should not touch the used ring.

But in the vhost-user socket reconnect scenario, in a very rare case (with
a pending kick event), VRING_USED_F_NO_NOTIFY is set by qemu in

"QEMU"

Ack.

the following code path:

   #0  virtio_queue_split_set_notification (vq=0x7ff5f4c920a8, enable=0) at 
../hw/virtio/virtio.c:511
   #1  0x559d6dbf033b in virtio_queue_set_notification 
(vq=0x7ff5f4c920a8, enable=0) at ../hw/virtio/virtio.c:576
   #2  0x559d6dbbbdbc in virtio_net_handle_tx_bh (vdev=0x559d703a6aa0, 
vq=0x7ff5f4c920a8) at ../hw/net/virtio-net.c:2801
   #3  0x559d6dbf4791 in virtio_queue_notify_vq (vq=0x7ff5f4c920a8) at 
../hw/virtio/virtio.c:2248
   #4  0x559d6dbf79da in virtio_queue_host_notifier_read 
(n=0x7ff5f4c9211c) at ../hw/virtio/virtio.c:3525
   #5  0x559d6d9a5814 in virtio_bus_cleanup_host_notifier 
(bus=0x559d703a6a20, n=1) at ../hw/virtio/virtio-bus.c:321
   #6  0x559d6dbf83c9 in virtio_device_stop_ioeventfd_impl 
(vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3774
   #7  0x559d6d9a55c8 in virtio_bus_stop_ioeventfd (bus=0x559d703a6a20) 
at ../hw/virtio/virtio-bus.c:259
   #8  0x559d6d9a53e8 in virtio_bus_grab_ioeventfd (bus=0x559d703a6a20) 
at ../hw/virtio/virtio-bus.c:199
   #9  0x559d6dbf841c in virtio_device_grab_ioeventfd 
(vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3783
   #10 0x559d6d9bde18 in vhost_dev_enable_notifiers 
(hdev=0x559d707edd70, vdev=0x559d703a6aa0) at ../hw/virtio/vhost.c:1592
   #11 0x559d6d89a0b8 in vhost_net_start_one (net=0x559d707edd70, 
dev=0x559d703a6aa0) at ../hw/net/vhost_net.c:266
   #12 0x559d6d89a6df in vhost_net_start (dev=0x559d703a6aa0, 
ncs=0x559d7048d890, data_queue_pairs=31, cvq=0) at ../hw/net/vhost_net.c:412
   #13 0x559d6dbb5b89 in virtio_net_vhost_status (n=0x559d703a6aa0, 
status=15 '\017') at ../hw/net/virtio-net.c:311
   #14 0x559d6dbb5e34 in virtio_net_set_status (vdev=0x559d703a6aa0, 
status=15 '\017') at ../hw/net/virtio-net.c:392
   #15 0x559d6dbb60d8 in virtio_net_set_link_status (nc=0x559d7048d890) 
at ../hw/net/virtio-net.c:455
   #16 0x559d6da64863 in qmp_set_link (name=0x559d6f0b83d0 "hostnet1", 
up=true, errp=0x7ffdd76569f0) at ../net/net.c:1459
   #17 0x559d6da7226e in net_vhost_user_event (opaque=0x559d6f0b83d0, 
event=CHR_EVENT_OPENED) at ../net/vhost-user.c:301
   #18 0x559d6ddc7f63 in chr_be_event (s=0x559d6f2ffea0, 
event=CHR_EVENT_OPENED) at ../chardev/char.c:62
   #19 0x559d6ddc7fdc in qemu_chr_be_event (s=0x559d6f2ffea0, 
event=CHR_EVENT_OPENED) at ../chardev/char.c:82

This issue causes the guest kernel to stop kicking the device and traffic to stop.

Add a vhost_started check in virtio_net_handle_tx_bh to prevent this wrong
VRING_USED_F_NO_NOTIFY setting.

Signed-off-by: Yajun Wu 
Reviewed-by: Jiri Pirko 
---
   hw/net/virtio-net.c | 4 ++++
   1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a6ff000cd9..8035e01fdf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2865,6 +2865,10 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
     VirtIONet *n = VIRTIO_NET(vdev);
     VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];

+    if (n->vhost_started) {

Since you mentioned "in a very rare case", maybe use unlikely()?

Ack.



+        return;
+    }
+
   if (unlikely((n->status & VIRTIO_NET_S_LINK_UP) == 0)) {
   virtio_net_drop_tx_queue_data(vdev, vq);
   return;
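
Given the Ack, the respin would presumably fold the suggestion into the new
hunk, roughly (a sketch, not the posted v2):

    if (unlikely(n->vhost_started)) {
        return;
    }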




[PATCH] virtio-net: fix qemu set used ring flag even vhost started

2024-04-01 Thread Yajun Wu
When vhost-user or vhost-kernel is handling the virtio net datapath, qemu
should not touch the used ring.

But in the vhost-user socket reconnect scenario, in a very rare case (with
a pending kick event), VRING_USED_F_NO_NOTIFY is set by qemu in the
following code path:

#0  virtio_queue_split_set_notification (vq=0x7ff5f4c920a8, enable=0) 
at ../hw/virtio/virtio.c:511
#1  0x559d6dbf033b in virtio_queue_set_notification 
(vq=0x7ff5f4c920a8, enable=0) at ../hw/virtio/virtio.c:576
#2  0x559d6dbbbdbc in virtio_net_handle_tx_bh (vdev=0x559d703a6aa0, 
vq=0x7ff5f4c920a8) at ../hw/net/virtio-net.c:2801
#3  0x559d6dbf4791 in virtio_queue_notify_vq (vq=0x7ff5f4c920a8) at 
../hw/virtio/virtio.c:2248
#4  0x559d6dbf79da in virtio_queue_host_notifier_read 
(n=0x7ff5f4c9211c) at ../hw/virtio/virtio.c:3525
#5  0x559d6d9a5814 in virtio_bus_cleanup_host_notifier 
(bus=0x559d703a6a20, n=1) at ../hw/virtio/virtio-bus.c:321
#6  0x559d6dbf83c9 in virtio_device_stop_ioeventfd_impl 
(vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3774
#7  0x559d6d9a55c8 in virtio_bus_stop_ioeventfd 
(bus=0x559d703a6a20) at ../hw/virtio/virtio-bus.c:259
#8  0x559d6d9a53e8 in virtio_bus_grab_ioeventfd 
(bus=0x559d703a6a20) at ../hw/virtio/virtio-bus.c:199
#9  0x559d6dbf841c in virtio_device_grab_ioeventfd 
(vdev=0x559d703a6aa0) at ../hw/virtio/virtio.c:3783
#10 0x559d6d9bde18 in vhost_dev_enable_notifiers 
(hdev=0x559d707edd70, vdev=0x559d703a6aa0) at ../hw/virtio/vhost.c:1592
#11 0x559d6d89a0b8 in vhost_net_start_one (net=0x559d707edd70, 
dev=0x559d703a6aa0) at ../hw/net/vhost_net.c:266
#12 0x559d6d89a6df in vhost_net_start (dev=0x559d703a6aa0, 
ncs=0x559d7048d890, data_queue_pairs=31, cvq=0) at ../hw/net/vhost_net.c:412
#13 0x559d6dbb5b89 in virtio_net_vhost_status (n=0x559d703a6aa0, 
status=15 '\017') at ../hw/net/virtio-net.c:311
#14 0x559d6dbb5e34 in virtio_net_set_status (vdev=0x559d703a6aa0, 
status=15 '\017') at ../hw/net/virtio-net.c:392
#15 0x559d6dbb60d8 in virtio_net_set_link_status 
(nc=0x559d7048d890) at ../hw/net/virtio-net.c:455
#16 0x559d6da64863 in qmp_set_link (name=0x559d6f0b83d0 "hostnet1", 
up=true, errp=0x7ffdd76569f0) at ../net/net.c:1459
#17 0x559d6da7226e in net_vhost_user_event (opaque=0x559d6f0b83d0, 
event=CHR_EVENT_OPENED) at ../net/vhost-user.c:301
#18 0x559d6ddc7f63 in chr_be_event (s=0x559d6f2ffea0, 
event=CHR_EVENT_OPENED) at ../chardev/char.c:62
#19 0x559d6ddc7fdc in qemu_chr_be_event (s=0x559d6f2ffea0, 
event=CHR_EVENT_OPENED) at ../chardev/char.c:82

This issue causes the guest kernel to stop kicking the device and traffic to stop.

Add a vhost_started check in virtio_net_handle_tx_bh to prevent this wrong
VRING_USED_F_NO_NOTIFY setting.

Signed-off-by: Yajun Wu 
Reviewed-by: Jiri Pirko 
---
 hw/net/virtio-net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a6ff000cd9..8035e01fdf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2865,6 +2865,10 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
     VirtIONet *n = VIRTIO_NET(vdev);
     VirtIONetQueue *q = &n->vqs[vq2q(virtio_get_queue_index(vq))];
 
+    if (n->vhost_started) {
+        return;
+    }
+
     if (unlikely((n->status & VIRTIO_NET_S_LINK_UP) == 0)) {
         virtio_net_drop_tx_queue_data(vdev, vq);
         return;
-- 
2.27.0




Re: vhost-user-blk reconnect issue

2024-04-01 Thread Yajun Wu


On 4/1/2024 4:34 PM, Li Feng wrote:



Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end
that solution was not accepted and other solutions were adopted to fix it.

[PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
<https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/>


I think this fix is valid.


This is the merged fix:


[PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
<https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git@redhat.com/>


My tests are with this fix; they failed in the two scenarios I mentioned.



Thanks,
Li


On Apr 1, 2024, at 10:08, Yajun Wu wrote:


On 3/27/2024 6:47 PM, Stefano Garzarella wrote:



Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:

Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect
failure scenarios:

Do you know if it has ever worked (and so it's a regression), or have we
always had this problem?


I am afraid commit 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED
handling", 2022-12-01) caused both failures. The previous commit is good.


I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is
the cause; the previous code didn't have this check?




Thanks,
Stefano

1. Disconnect the vhost-user-blk backend before the guest driver probes
the vblk device, then reconnect the backend after the guest driver probes
the device. QEMU won't send out any vhost messages to restore the backend.
This is because vhost->vdev is NULL before the guest driver probes the
vblk device, so vhost_user_blk_disconnect won't be called and
s->connected is still true. The next vhost_user_blk_connect will simply
return without doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend,
then reconnect the backend, then modprobe virtio-blk. QEMU won't send
messages in vhost_dev_init.
This is because rmmod makes QEMU call vhost_user_blk_stop, vhost->vdev
also becomes NULL (in vhost_dev_stop), and vhost_user_blk_disconnect
won't be called. Again s->connected is still true, even though the chr
connection is closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be
called when the chr connection closes?

Hope we can have a fix soon.


Thanks,
Yajun



Re: vhost-user-blk reconnect issue

2024-03-31 Thread Yajun Wu



On 3/27/2024 6:47 PM, Stefano Garzarella wrote:



Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote:

Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect
failure scenarios:

Do you know if it has ever worked (and so it's a regression), or have we
always had this problem?


I am afraid commit 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED
handling", 2022-12-01) caused both failures. The previous commit is good.


I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is
the cause; the previous code didn't have this check?




Thanks,
Stefano


1. Disconnect the vhost-user-blk backend before the guest driver probes the
vblk device, then reconnect the backend after the guest driver probes the
device. QEMU won't send out any vhost messages to restore the backend.
This is because vhost->vdev is NULL before the guest driver probes the vblk
device, so vhost_user_blk_disconnect won't be called and s->connected is
still true. The next vhost_user_blk_connect will simply return without
doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then
reconnect the backend, then modprobe virtio-blk. QEMU won't send messages
in vhost_dev_init.
This is because rmmod makes QEMU call vhost_user_blk_stop, vhost->vdev also
becomes NULL (in vhost_dev_stop), and vhost_user_blk_disconnect won't be
called. Again s->connected is still true, even though the chr connection is
closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be
called when the chr connection closes?
Hope we can have a fix soon.


Thanks,
Yajun





vhost-user-blk reconnect issue

2024-03-25 Thread Yajun Wu
Hi experts,

With the latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect
failure scenarios:
1. Disconnect the vhost-user-blk backend before the guest driver probes the
vblk device, then reconnect the backend after the guest driver probes the
device. QEMU won't send out any vhost messages to restore the backend.
This is because vhost->vdev is NULL before the guest driver probes the vblk
device, so vhost_user_blk_disconnect won't be called and s->connected is
still true. The next vhost_user_blk_connect will simply return without
doing anything.

2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then
reconnect the backend, then modprobe virtio-blk. QEMU won't send messages
in vhost_dev_init.
This is because rmmod makes QEMU call vhost_user_blk_stop, vhost->vdev also
becomes NULL (in vhost_dev_stop), and vhost_user_blk_disconnect won't be
called. Again s->connected is still true, even though the chr connection is
closed.

I think even if vhost->vdev is NULL, vhost_user_blk_disconnect should be
called when the chr connection closes?
Hope we can have a fix soon.


Thanks,
Yajun



Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load

2023-10-18 Thread Yajun Wu



On 10/18/2023 12:47 AM, Eugenio Perez Martin wrote:



On Mon, Sep 18, 2023 at 6:51 AM Yajun Wu  wrote:

This series of patches aims to minimize the downtime during live migration of a
virtio-net device with a vhost-user backend. In the case of hardware virtual
Data Path Acceleration (vDPA) implementation, the hardware configuration, which
includes tasks like VQ creation and RSS setting, may take above 200ms. This
significantly increases the downtime of the VM, particularly in terms of
networking.


Hi!

Sorry I totally missed this email. Please CC me in next versions.

Just for completion, there is an ongoing plan to reduce the downtime
in vhost-vdpa. You can find more details at [1].

Sending the state periodically is on the roadmap, but some
benchmarking detected that memory pinning and unpinning affects downtime
more. I'll send an RFC about this soon. The plan was to continue
with iterative state restoring, so I'm happy to know more people are
looking into it!

In the case of vhost-vdpa it already restores the state by not
enabling dataplane until migration completes. All the load is
performed using CVQ, as you can see in
net/vhost-vdpa.c:vhost_vdpa_net_load. After that, all dataplane is
started again.

My idea is to start vhost-vdpa (by calling vhost_vdpa_dev_start) at
the destination at the same moment the migration starts, as it will
not have dataplane enabled. After that, the source should send the
virtio-net vmstate every time it changes. vhost-vdpa net is able to
send and receive through CVQ, so it should be able to modify net
device configuration as many times as needed. I guess that could be
done by calling something along the lines of your
vhost_user_set_presetup_state.
This is a very good approach. How do you know when the virtio-net vmstate
changes? vhost-user and vhost-vdpa should share the same code for
virtio-net vmstate early sync.


This can be improved in vhost-vdpa by being able to send only the new state.

When all the migration is completed, vhost-vdpa net dataplane should
start as it does now.

If you are interested in saving changes to vhost-user protocol, maybe
qemu could just disable the dataplane too with
VHOST_USER_SET_VRING_ENABLE? If not, I think both approaches have a
lot in common, so I'm sure we can develop one backend on top of
another.

Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg00659.html


I'm afraid that, just like DRIVER_OK is a hint for vhost-user vDPA to apply
all the configuration to HW, vhost-user also needs the same kind of hint at
the end of each round of vmstate sync to apply the configuration to HW.
That's why I need to define a new protocol message.


Because MQ can also change, VQ enable is a valid parameter for HW. HW
will create only the enabled queues, and the number of enabled queues
affects the RSS setting.






To reduce the VM downtime, the proposed approach involves capturing the basic
device state/configuration during the VM's running stage and performing the
initial device configuration(presetup). During the normal configuration process
when the VM is in a stopped state, the second configuration is compared to the
first one, and only the differences are applied to reduce downtime. Ideally,
only the vring available index needs to be changed within VM stop.

This feature is disabled by default, because backends like DPDK also need
to add support for the new vhost message. The new device property
"x-early-migration" can enable this feature.

1. Register a new vmstate for virtio-net with an early_setup flag to send the
device state during migration setup.
2. After device state load on destination VM, need to send device status to
vhost backend in a new way. Introduce new vhost-user message:
VHOST_USER_PRESETUP, to notify backend of presetup.
3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
a. vhost-dev sending presetup start.
b. virtio-net setting mtu.
c. vhost-dev sending vring configuration and setting dummy call/kick fd.
d. vhost-net sending vring enable.
e. vhost-dev sending presetup end.


TODOs:
==
   - No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
 if there's same requirement for vhost-vdpa.

   - No vIOMMU support so far. If there is a need for vIOMMU support, it is
 planned to be addressed in a follow-up patchset.


Test:
=
   - Live migration VM with 2 virtio-net devices, ping can recover.
 Together with DPDK patch [1].
   - The time consumption of DPDK function dev_conf is reduced from 191.4 ms
 to 6.6 ms.


References:
===

[1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37

Any comments or feedback are highly appreciated.

Thanks,
Yajun


Yajun Wu (5):
   vhost-user: Add presetup protocol feature and op
   vhost: Add support for presetup
   vhost-net: Add support for presetup
   virtio: Add VMState for early load
   virtio-net: Introduce LM early load

  docs/interop/vhost-user.rst  

Re: [RFC PATCH 0/5] virtio-net: Introduce LM early load

2023-10-17 Thread Yajun Wu

Ping.

On 9/18/2023 12:49 PM, Yajun Wu wrote:

This series of patches aims to minimize the downtime during live migration of a
virtio-net device with a vhost-user backend. In the case of hardware virtual
Data Path Acceleration (vDPA) implementation, the hardware configuration, which
includes tasks like VQ creation and RSS setting, may take above 200ms. This
significantly increases the downtime of the VM, particularly in terms of
networking.

To reduce the VM downtime, the proposed approach involves capturing the basic
device state/configuration during the VM's running stage and performing the
initial device configuration(presetup). During the normal configuration process
when the VM is in a stopped state, the second configuration is compared to the
first one, and only the differences are applied to reduce downtime. Ideally,
only the vring available index needs to be changed within VM stop.

This feature is disabled by default, because backends like DPDK also need
to add support for the new vhost message. The new device property
"x-early-migration" can enable this feature.

1. Register a new vmstate for virtio-net with an early_setup flag to send the
device state during migration setup.
2. After device state load on destination VM, need to send device status to
vhost backend in a new way. Introduce new vhost-user message:
VHOST_USER_PRESETUP, to notify backend of presetup.
3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
a. vhost-dev sending presetup start.
b. virtio-net setting mtu.
c. vhost-dev sending vring configuration and setting dummy call/kick fd.
d. vhost-net sending vring enable.
e. vhost-dev sending presetup end.


TODOs:
==
   - No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
 if there's same requirement for vhost-vdpa.

   - No vIOMMU support so far. If there is a need for vIOMMU support, it is
 planned to be addressed in a follow-up patchset.


Test:
=
   - Live migration VM with 2 virtio-net devices, ping can recover.
 Together with DPDK patch [1].
   - The time consumption of DPDK function dev_conf is reduced from 191.4 ms
 to 6.6 ms.


References:
===

[1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37

Any comments or feedback are highly appreciated.

Thanks,
Yajun


Yajun Wu (5):
   vhost-user: Add presetup protocol feature and op
   vhost: Add support for presetup
   vhost-net: Add support for presetup
   virtio: Add VMState for early load
   virtio-net: Introduce LM early load

  docs/interop/vhost-user.rst   |  10 ++
  hw/net/trace-events   |   1 +
  hw/net/vhost_net.c|  40 +++
  hw/net/virtio-net.c   | 100 ++
  hw/virtio/vhost-user.c|  30 ++
  hw/virtio/vhost.c | 166 +-
  hw/virtio/virtio.c| 152 ---
  include/hw/virtio/vhost-backend.h |   3 +
  include/hw/virtio/vhost.h |  12 +++
  include/hw/virtio/virtio-net.h|   1 +
  include/hw/virtio/virtio.h|  10 +-
  include/net/vhost_net.h   |   3 +
  12 files changed, 443 insertions(+), 85 deletions(-)





Re: [Virtio-fs] (no subject)

2023-10-09 Thread Yajun Wu



On 10/9/2023 5:13 PM, Hanna Czenczek wrote:



On 09.10.23 11:07, Hanna Czenczek wrote:

On 09.10.23 10:21, Hanna Czenczek wrote:

On 07.10.23 04:22, Yajun Wu wrote:

[...]


The main motivation for adding VHOST_USER_SET_STATUS is to let the
backend (DPDK) know when the DRIVER_OK bit is valid. It's an indication
that all VQ configuration has been sent; otherwise DPDK has to rely on
the first queue pair being ready, then receive/apply VQ configuration
one by one.

During live migration, configuring VQs one by one is very time
consuming.

One question I have here is why it wasn’t then introduced in the live
migration code, but in the general VM stop/cont code instead. It does
seem time-consuming to do this every time the VM is paused and resumed.


Yes, VM stop/cont will call vhost_net_stop/vhost_net_start. Maybe
because there's no device-level stop/cont vhost message?





For VIRTIO net vDPA, HW needs to know how many VQs are enabled to set
RSS (Receive-Side Scaling).

If you don't want the SET_STATUS message, the backend can remove the
protocol feature bit VHOST_USER_PROTOCOL_F_STATUS.

The problem isn’t back-ends that don’t want the message, the problem
is that qemu uses the message wrongly, which prevents well-behaving
back-ends from implementing the message.


DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
close/reset.

So the right thing to do for back-ends is to announce STATUS support
and then not implement it correctly?

GET_VRING_BASE should not close or reset the device, by the
way.  It should stop that one vring, not more.  We have a
RESET_DEVICE command for resetting.
I believe DPDK used GET_VRING_BASE long before QEMU had RESET_DEVICE?
It's a compatibility issue. For new backend implementations, we can have a
better solution, right?

I was not involved in the discussion about adding SET_STATUS to the vhost
protocol. This feature is essential for vDPA (the same way vhost-vdpa
implements VHOST_VDPA_SET_STATUS).

So from what I gather from your response is that there is only a
single use for SET_STATUS, which is the DRIVER_OK bit.  If so,
documenting that all other bits are to be ignored by both back-end
and front-end would be fine by me.

I’m not fully serious about that suggestion, but I hear the strong
implication that nothing but DRIVER_OK was of any concern, and this
is really important to note when we talk about the status of the
STATUS feature in vhost today.  It seems to me now that it was not
intended to be the virtio-level status byte, but just a DRIVER_OK
signalling path from front-end to back-end.  That makes it a
vhost-level protocol feature to me.

On second thought, it just is a pure vhost-level protocol feature, and
has nothing to do with the virtio status byte as-is.  The only stated
purpose is for the front-end to send DRIVER_OK after migration, but
migration is transparent to the guest, so the guest would never change
the status byte during migration.  Therefore, if this feature is
essential, we will never be able to have a status byte that is
transparently shared between guest and back-end device, i.e. the
virtio status byte.

On third thought, scratch that.  The guest wouldn’t set it, but
naturally, after migration, the front-end will need to restore the
status byte from the source, so the front-end will always need to set
it, even if it were otherwise used controlled only by the guest and the
back-end device.  So technically, this doesn’t prevent such a use case.
(In practice, it isn’t controlled by the guest right now, but that could
be fixed.)
I only tested the feature with DPDK (the only backend using it today?). Max
defined the protocol and added the corresponding code in DPDK before I
added QEMU support. If other backends or different device types want to
use this, we can have further discussion?

Cc-ing Alex on this mail, because to me, this seems like an important
detail when he plans on using the byte in the future. If we need a
virtio status byte, I can’t see how we could use the existing F_STATUS
for it.

Hanna




Re: [Virtio-fs] (no subject)

2023-10-09 Thread Yajun Wu



On 10/9/2023 6:28 PM, German Maglione wrote:



On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu  wrote:


On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote:



On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 11:26, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:

On 06.10.23 10:45, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:

On 05.10.23 19:15, Michael S. Tsirkin wrote:

On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:

On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:

There is no clearly defined purpose for the virtio status byte in
vhost-user: For resetting, we already have RESET_DEVICE; and for virtio
feature negotiation, we have [GS]ET_FEATURES.  With the REPLY_ACK
protocol extension, it is possible for SET_FEATURES to return errors
(SET_PROTOCOL_FEATURES may be called before SET_FEATURES).

As for implementations, SET_STATUS is not widely implemented.  dpdk does
implement it, but only uses it to signal feature negotiation failure.
While it does log reset requests (SET_STATUS 0) as such, it effectively
ignores them, in contrast to RESET_OWNER (which is deprecated, and today
means the same thing as RESET_DEVICE).

While qemu superficially has support for [GS]ET_STATUS, it does not
forward the guest-set status byte, but instead just makes it up
internally, and actually completely ignores what the back-end returns,
only using it as the template for a subsequent SET_STATUS to add single
bits to it.  Notably, after setting FEATURES_OK, it never reads it back
to see whether the flag is still set, which is the only way in which
dpdk uses the status byte.

As-is, no front-end or back-end can rely on the other side handling this
field in a useful manner, and it also provides no practical use over
other mechanisms the vhost-user protocol has, which are more clearly
defined.  Deprecate it.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Hanna Czenczek 
---
  docs/interop/vhost-user.rst | 28 +---
  1 file changed, 21 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi 

SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK.
The fact current backends never check errors does not mean they never
will. So no, not applying this.

Can this not be done with REPLY_ACK?  I.e., with the following message
order:

1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is
present
2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK
3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK
4. SET_FEATURES with need_reply

If not, the problem is that qemu has sent SET_STATUS 0 for a while when the
vCPUs are stopped, which generally seems to request a device reset.  If we
don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will
implement SET_STATUS later may break with at least these qemu versions.  But
documenting that a particular use of the status byte is to be ignored would
be really strange.

Hanna

Hmm I guess. Though just following virtio spec seems cleaner to me...
vhost-user reconfigures the state fully on start.

Not the internal device state, though.  virtiofsd has internal state, and
other devices like vhost-gpu back-ends would probably, too.

Stefan has recently sent a series
(https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to
put the reset (RESET_DEVICE) into virtio_reset() (when we really need a
reset).

I really don’t like our current approach with the status byte. Following the
virtio specification to me would mean that the guest directly controls this
byte, which it does not.  qemu makes up values as it deems appropriate, and
this includes sending a SET_STATUS 0 when the guest is just paused, i.e.
when the guest really doesn’t want a device reset.

That means that qemu does not treat this as a virtio device field (because
that would mean exposing it to the guest driver), but instead treats it as
part of the vhost(-user) protocol.  It doesn’t feel right to me that we use
a virtio-defined feature for communication on the vhost level, i.e. between
front-end and back-end, and not between guest driver and device.  I think
all vhost-level protocol features should be fully defined in the vhost-user
specification, which REPLY_ACK is.

Hmm that makes sense. Maybe we should have done what stefan's patch
is doing.

Do look at the original commit that introduced it to understand why
it was added.

I don’t understand why this was added to the stop/cont code, though.  If it
is time consuming to make these changes, why are they done every time the VM
is paused
and resumed?  It makes sense that this would be done for the initial
configuration (where a reset also wouldn’t hurt), but here it seems wrong.

(To be clear, a reset in the stop

Re: [Virtio-fs] (no subject)

2023-10-06 Thread Yajun Wu
 was wrong even for non-stateful devices, because it
occurred before we fetched the state (vring indices) so we could restore it
later.  I don’t know how 923b8921d21 was tested, but if the back-end used
for testing implemented SET_STATUS 0 as a reset, it could not have survived
either migration or a stop/cont in general, because the vring indices would
have been reset to 0.

What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all
devices that would implement them as per virtio spec, and even today it’s
broken for stateful devices.  The mentioned performance issue is likely
real, but we can’t address it by making up SET_STATUS calls that are wrong.

I concede that I didn’t think about DRIVER_OK.  Personally, I would do all
final configuration that would happen upon a DRIVER_OK once the first vring
is started (i.e. receives a kick).  That has the added benefit of being
asynchronous because it doesn’t block any vhost-user messages (which are
synchronous, and thus block downtime).

Hanna


For better or worse kick is per ring. It's out of spec to start rings
that were not kicked but I guess you could do configuration ...
Seems somewhat asymmetrical though.

Let's wait until next week, hopefully Yajun Wu will answer.
The main motivation for adding VHOST_USER_SET_STATUS is to let the backend
(DPDK) know when the DRIVER_OK bit is valid. It's an indication that all
VQ configuration has been sent; otherwise DPDK has to rely on the first
queue pair being ready, then receive/apply VQ configuration one by one.

During live migration, configuring VQs one by one is very time consuming.
For VIRTIO net vDPA, HW needs to know how many VQs are enabled to set
RSS (Receive-Side Scaling).
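
For illustration, a self-contained sketch of what that buys a backend
(hypothetical struct and helper names, not DPDK's actual code): with
F_STATUS negotiated, the backend can buffer per-VQ configuration and
program the HW once, when SET_STATUS delivers DRIVER_OK, instead of
guessing readiness from the first queue pair.

    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTIO_CONFIG_S_DRIVER_OK 4

    struct backend_dev {            /* hypothetical backend state */
        bool hw_configured;
        int  enabled_vqs;           /* counted from SET_VRING_ENABLE messages */
    };

    static void handle_set_status(struct backend_dev *d, uint8_t status)
    {
        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !d->hw_configured) {
            /* All vring and feature messages have arrived by now, so the
             * HW can be programmed in one pass: create only the enabled
             * VQs and derive the RSS setting from enabled_vqs. */
            d->hw_configured = true;
        }
    }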


If you don't want the SET_STATUS message, the backend can remove the
protocol feature bit VHOST_USER_PROTOCOL_F_STATUS.

DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
close/reset.


I was not involved in the discussion about adding SET_STATUS to the vhost
protocol. This feature is essential for vDPA (the same way vhost-vdpa
implements VHOST_VDPA_SET_STATUS).

Thanks,
Yajun



Now, we could hand full control of the status byte to the guest, and that
would make me content.  But I feel like that doesn’t really work, because
qemu needs to intercept the status byte anyway (it needs to know when there
is a reset, probably wants to know when the device is configured, etc.), so
I don’t think having the status byte in vhost-user really gains us much when
qemu could translate status byte changes to/from other vhost-user commands.

Hanna

well it intercepts it but I think it could pass it on unchanged.



I guess symmetry was the
point. So I don't see why SET_STATUS 0 has to be ignored.


SET_STATUS was introduced by:

commit 923b8921d210763359e96246a58658ac0db6c645
Author: Yajun Wu 
Date:   Mon Oct 17 14:44:52 2022 +0800

   vhost-user: Support vhost_dev_start

CC the author.





[RFC PATCH 2/5] vhost: Add support for presetup

2023-09-17 Thread Yajun Wu
Add a new API vhost_dev_start_presetup to notify the backend of the start
and end of presetup.

API vhost_dev_presetup sends out the device configuration:
1. acked_features
2. memory table
3. vring information
4. disable host/guest notifiers.

Signed-off-by: Yajun Wu 
Reviewed-by: Avihai Horon 
Reviewed-by: Jiri Pirko 
---
 hw/virtio/vhost.c | 166 --
 include/hw/virtio/vhost.h |  12 +++
 2 files changed, 152 insertions(+), 26 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2f6ffb446..5b162590fb 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1138,24 +1138,71 @@ out:
 return ret;
 }
 
-int vhost_virtqueue_start(struct vhost_dev *dev,
-  struct VirtIODevice *vdev,
-  struct vhost_virtqueue *vq,
-  unsigned idx)
+static void vhost_virtqueue_memory_unmap(struct vhost_dev *dev,
+ struct VirtIODevice *vdev,
+ struct vhost_virtqueue *vq,
+ unsigned idx)
+{
+if (vq->used) {
+vhost_memory_unmap(dev, vq->used,
+   virtio_queue_get_used_size(vdev, idx),
+   1, virtio_queue_get_used_size(vdev, idx));
+vq->used = NULL;
+}
+
+if (vq->avail) {
+vhost_memory_unmap(dev, vq->avail,
+   virtio_queue_get_avail_size(vdev, idx),
+   0, virtio_queue_get_avail_size(vdev, idx));
+vq->avail = NULL;
+}
+
+if (vq->desc) {
+vhost_memory_unmap(dev, vq->desc,
+   virtio_queue_get_desc_size(vdev, idx),
+   0, virtio_queue_get_desc_size(vdev, idx));
+vq->desc = NULL;
+}
+}
+
+static int vhost_virtqueue_disable_notify(struct vhost_dev *dev,
+  struct VirtIODevice *vdev,
+  struct vhost_virtqueue *vq,
+  unsigned idx)
 {
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusState *vbus = VIRTIO_BUS(qbus);
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
-hwaddr s, l, a;
-int r;
 int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
 struct vhost_vring_file file = {
 .index = vhost_vq_index
 };
+int r;
+
+file.fd = -1;
+r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
+if (r) {
+VHOST_OPS_DEBUG(r, "vhost_set_vring_kick failed");
+return r;
+}
+
+r = dev->vhost_ops->vhost_set_vring_call(dev, &file);
+if (r) {
+VHOST_OPS_DEBUG(r, "vhost_set_vring_call failed");
+return r;
+}
+
+return 0;
+}
+
+static int vhost_virtqueue_vring_setup(struct vhost_dev *dev,
+   struct VirtIODevice *vdev,
+   struct vhost_virtqueue *vq,
+   unsigned idx)
+{
+hwaddr s, l, a;
+int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
 struct vhost_vring_state state = {
 .index = vhost_vq_index
 };
-struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
+int r;
 
 a = virtio_queue_get_desc_addr(vdev, idx);
 if (a == 0) {
@@ -1186,6 +1233,10 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
 }
 }
 
+if (vq->desc) {
+vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
+}
+
 vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx);
 vq->desc_phys = a;
 vq->desc = vhost_memory_map(dev, a, &l, false);
@@ -1212,6 +1263,36 @@ int vhost_virtqueue_start(struct vhost_dev *dev,
 if (r < 0) {
 goto fail_alloc;
 }
+return 0;
+
+fail_alloc:
+fail_alloc_used:
+fail_alloc_avail:
+vhost_virtqueue_memory_unmap(dev, vdev, vq, idx);
+fail_alloc_desc:
+return r;
+}
+
+int vhost_virtqueue_start(struct vhost_dev *dev,
+  struct VirtIODevice *vdev,
+  struct vhost_virtqueue *vq,
+  unsigned idx)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusState *vbus = VIRTIO_BUS(qbus);
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+int r;
+int vhost_vq_index = dev->vhost_ops->vhost_get_vq_index(dev, idx);
+struct vhost_vring_file file = {
+.index = vhost_vq_index
+};
+struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
+
+r = vhost_virtqueue_vring_setup(dev, vdev, vq, idx);
+if (r) {
+VHOST_OPS_DEBUG(r, "vhost_virtqueue_vring_setup failed");
+goto fail_vring_setup;
+}
 
 file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
 r = dev->vhost_ops->vhost_set_vring_kic
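
The archived diff is cut off above; going by the commit message, the new
vhost_dev_presetup() presumably chains the pieces roughly like this (an
illustrative sketch reusing the helpers defined in this patch, not the
posted code):

    static int vhost_dev_presetup_sketch(struct vhost_dev *hdev, VirtIODevice *vdev)
    {
        int i, r;

        /* 1. acked features */
        r = hdev->vhost_ops->vhost_set_features(hdev, hdev->acked_features);
        if (r < 0) {
            return r;
        }
        /* 2. memory table */
        r = hdev->vhost_ops->vhost_set_mem_table(hdev, hdev->mem);
        if (r < 0) {
            return r;
        }
        /* 3. + 4. vring information, with host/guest notifiers disabled */
        for (i = 0; i < hdev->nvqs; ++i) {
            r = vhost_virtqueue_vring_setup(hdev, vdev, hdev->vqs + i,
                                            hdev->vq_index + i);
            if (r < 0) {
                return r;
            }
            r = vhost_virtqueue_disable_notify(hdev, vdev, hdev->vqs + i,
                                               hdev->vq_index + i);
            if (r < 0) {
                return r;
            }
        }
        return 0;
    }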

[RFC PATCH 3/5] vhost-net: Add support for presetup

2023-09-17 Thread Yajun Wu
Introduce a new API vhost_net_presetup to send the virtio net device
configuration to the backend during LM setup.

It mainly calls vhost_dev_presetup, then sends out vring enable.

Signed-off-by: Yajun Wu 
Reviewed-by: Avihai Horon 
Reviewed-by: Jiri Pirko 
---
 hw/net/vhost_net.c  | 40 
 include/net/vhost_net.h |  3 +++
 2 files changed, 43 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b958d6363..dcb818ccf1 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -345,6 +345,46 @@ static void vhost_net_stop_one(struct vhost_net *net,
 vhost_dev_disable_notifiers(&net->dev, dev);
 }
 
+int vhost_net_presetup(VirtIODevice *dev, NetClientState *ncs,
+int data_queue_pairs, int cvq)
+{
+VirtIONet *n = VIRTIO_NET(dev);
+int nvhosts = data_queue_pairs + cvq;
+struct vhost_net *net;
+int r = 0, i, index_end = data_queue_pairs * 2;
+NetClientState *peer;
+
+if (cvq) {
+index_end += 1;
+}
+
+for (i = 0; i < nvhosts; i++) {
+if (i < data_queue_pairs) {
+peer = qemu_get_peer(ncs, i);
+} else { /* Control Virtqueue */
+peer = qemu_get_peer(ncs, n->max_queue_pairs);
+}
+
+net = get_vhost_net(peer);
+vhost_net_set_vq_index(net, i * 2, index_end);
+
+r = vhost_dev_presetup(&net->dev, dev);
+if (r < 0) {
+return r;
+}
+
+if (peer->vring_enable) {
+/* restore vring enable state */
+r = vhost_set_vring_enable(peer, peer->vring_enable);
+if (r < 0) {
+return r;
+}
+}
+}
+
+return r;
+}
+
 int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 int data_queue_pairs, int cvq)
 {
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index c37aba35e6..2c9020c5a2 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -26,6 +26,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 int data_queue_pairs, int cvq);
 
+int vhost_net_presetup(VirtIODevice *dev, NetClientState *ncs,
+   int data_queue_pairs, int cvq);
+
 void vhost_net_cleanup(VHostNetState *net);
 
 uint64_t vhost_net_get_features(VHostNetState *net, uint64_t features);
-- 
2.27.0




[RFC PATCH 0/5] virtio-net: Introduce LM early load

2023-09-17 Thread Yajun Wu
This series of patches aims to minimize the downtime during live migration of a
virtio-net device with a vhost-user backend. In the case of hardware virtual
Data Path Acceleration (vDPA) implementation, the hardware configuration, which
includes tasks like VQ creation and RSS setting, may take above 200ms. This
significantly increases the downtime of the VM, particularly in terms of
networking.

To reduce the VM downtime, the proposed approach involves capturing the basic
device state/configuration during the VM's running stage and performing the
initial device configuration(presetup). During the normal configuration process
when the VM is in a stopped state, the second configuration is compared to the
first one, and only the differences are applied to reduce downtime. Ideally,
only the vring available index needs to be changed within VM stop.

This feature is disabled by default, because backends like DPDK also need
to add support for the new vhost message. The new device property
"x-early-migration" can enable this feature.

1. Register a new vmstate for virtio-net with an early_setup flag to send the
   device state during migration setup.
2. After device state load on destination VM, need to send device status to
   vhost backend in a new way. Introduce new vhost-user message:
   VHOST_USER_PRESETUP, to notify backend of presetup.
3. Let virtio-net, vhost-net, vhost-dev support presetup. Main flow:
   a. vhost-dev sending presetup start.
   b. virtio-net setting mtu.
   c. vhost-dev sending vring configuration and setting dummy call/kick fd.
   d. vhost-net sending vring enable.
   e. vhost-dev sending presetup end.
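
Roughly, the front-end side call sequence during migration setup, using the
helpers this series introduces (a sketch with error handling omitted; the
actual flow is in patch 5/5's virtio_net_load_early_setup):

    vhost_dev_set_presetup_state(&net->dev, true);            /* a. presetup start */
    vhost_net_set_mtu(net, n->net_conf.mtu);                  /* b. set MTU */
    vhost_net_presetup(vdev, n->nic->ncs, queue_pairs, cvq);  /* c./d. vrings, dummy fds, vring enable */
    vhost_dev_set_presetup_state(&net->dev, false);           /* e. presetup end */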


TODOs:
==
  - No vhost-vdpa/kernel support. Need to discuss/design new kernel interface
if there's same requirement for vhost-vdpa.

  - No vIOMMU support so far. If there is a need for vIOMMU support, it is
planned to be addressed in a follow-up patchset.


Test:
=
  - Live migration VM with 2 virtio-net devices, ping can recover.
Together with DPDK patch [1].
  - The time consumption of DPDK function dev_conf is reduced from 191.4 ms
to 6.6 ms.


References:
===

[1] https://github.com/Mellanox/dpdk-vhost-vfe/pull/37

Any comments or feedback are highly appreciated.

Thanks,
Yajun


Yajun Wu (5):
  vhost-user: Add presetup protocol feature and op
  vhost: Add support for presetup
  vhost-net: Add support for presetup
  virtio: Add VMState for early load
  virtio-net: Introduce LM early load

 docs/interop/vhost-user.rst   |  10 ++
 hw/net/trace-events   |   1 +
 hw/net/vhost_net.c|  40 +++
 hw/net/virtio-net.c   | 100 ++
 hw/virtio/vhost-user.c|  30 ++
 hw/virtio/vhost.c | 166 +-
 hw/virtio/virtio.c| 152 ---
 include/hw/virtio/vhost-backend.h |   3 +
 include/hw/virtio/vhost.h |  12 +++
 include/hw/virtio/virtio-net.h|   1 +
 include/hw/virtio/virtio.h|  10 +-
 include/net/vhost_net.h   |   3 +
 12 files changed, 443 insertions(+), 85 deletions(-)

-- 
2.27.0




[RFC PATCH 1/5] vhost-user: Add presetup protocol feature and op

2023-09-17 Thread Yajun Wu
This patch implements VHOST_USER_PROTOCOL_F_PRESETUP protocol feature
and VHOST_USER_PRESETUP, so that the backend can know the beginning
and completion of the early setup phase for the virtio device.

Unlike the regular device state load, which occurs in the VM stop
phase, this pre-setup takes place in the live migration setup stage.

Signed-off-by: Yajun Wu 
Reviewed-by: Avihai Horon 
Reviewed-by: Jiri Pirko 
---
 docs/interop/vhost-user.rst   | 10 ++
 hw/virtio/vhost-user.c| 30 ++
 include/hw/virtio/vhost-backend.h |  3 +++
 3 files changed, 43 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 5a070adbc1..70b8e2694c 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -885,6 +885,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
   #define VHOST_USER_PROTOCOL_F_STATUS   16
   #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
+  #define VHOST_USER_PROTOCOL_F_PRESETUP 18
 
 Front-end message types
 ---
@@ -1440,6 +1441,15 @@ Front-end message types
   query the back-end for its device status as defined in the Virtio
   specification.
 
+``VHOST_USER_PRESETUP``
+  :id: 41
+  :equivalent ioctl: N/A
+  :request payload: ``u64``
+  :reply payload: N/A
+
+  When the ``VHOST_USER_PROTOCOL_F_PRESETUP`` protocol feature has been
+  successfully negotiated, this message is submitted by the front-end to
+  indicate start or end early setup. Value 1 means start, 2 means end.
 
 Back-end message types
 --
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..71018d06c1 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -74,6 +74,8 @@ enum VhostUserProtocolFeature {
 /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
 VHOST_USER_PROTOCOL_F_STATUS = 16,
+/* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+VHOST_USER_PROTOCOL_F_PRESETUP = 18,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -121,6 +123,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_REM_MEM_REG = 38,
 VHOST_USER_SET_STATUS = 39,
 VHOST_USER_GET_STATUS = 40,
+VHOST_USER_PRESETUP = 41,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -132,6 +135,11 @@ typedef enum VhostUserBackendRequest {
 VHOST_USER_BACKEND_MAX
 }  VhostUserBackendRequest;
 
+typedef enum VhostUserPresetupState {
+VHOST_USER_PRESETUP_START = 1,
+VHOST_USER_PRESETUP_END = 2,
+} VhostUserPresetupState;
+
 typedef struct VhostUserMemoryRegion {
 uint64_t guest_phys_addr;
 uint64_t memory_size;
@@ -2741,6 +2749,27 @@ static void vhost_user_reset_status(struct vhost_dev 
*dev)
 }
 }
 
+static int vhost_user_set_presetup_state(struct vhost_dev *dev, bool start)
+{
+if (start) {
+return vhost_user_set_u64(dev, VHOST_USER_PRESETUP,
+  VHOST_USER_PRESETUP_START, false);
+} else {
+return vhost_user_set_u64(dev, VHOST_USER_PRESETUP,
+  VHOST_USER_PRESETUP_END, false);
+}
+}
+
+static int vhost_user_presetup(struct vhost_dev *dev, bool start)
+{
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_PRESETUP)) {
+return -ENOTSUP;
+}
+
+return vhost_user_set_presetup_state(dev, start);
+}
+
 const VhostOps user_ops = {
 .backend_type = VHOST_BACKEND_TYPE_USER,
 .vhost_backend_init = vhost_user_backend_init,
@@ -2777,4 +2806,5 @@ const VhostOps user_ops = {
 .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
 .vhost_dev_start = vhost_user_dev_start,
 .vhost_reset_status = vhost_user_reset_status,
+.vhost_presetup = vhost_user_presetup,
 };
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index 31a251a9f5..00dd532df9 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -133,6 +133,8 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev 
*dev,
 
 typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
 
+typedef int (*vhost_presetup_op)(struct vhost_dev *dev, bool start);
+
 typedef struct VhostOps {
 VhostBackendType backend_type;
 vhost_backend_init vhost_backend_init;
@@ -181,6 +183,7 @@ typedef struct VhostOps {
 vhost_force_iommu_op vhost_force_iommu;
 vhost_set_config_call_op vhost_set_config_call;
 vhost_reset_status_op vhost_reset_status;
+vhost_presetup_op vhost_presetup;
 } VhostOps;
 
 int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
-- 
2.27.0




[RFC PATCH 5/5] virtio-net: Introduce LM early load

2023-09-17 Thread Yajun Wu
Register a new vmstate for virtio-net with an early_setup flag to send
the device state during migration setup.

This can reduce the migration downtime of a virtio-net device with a
vhost-user backend.

This feature is disabled by default and can be enabled by setting the
"x-early-migration" device property to on.

Signed-off-by: Yajun Wu 
Reviewed-by: Avihai Horon 
Reviewed-by: Jiri Pirko 
---
 hw/net/trace-events|   1 +
 hw/net/virtio-net.c| 100 +
 include/hw/virtio/virtio-net.h |   1 +
 3 files changed, 102 insertions(+)

diff --git a/hw/net/trace-events b/hw/net/trace-events
index 6b5ba669a2..ec89229044 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -399,6 +399,7 @@ virtio_net_post_load_device(void)
 virtio_net_rss_disable(void)
 virtio_net_rss_error(const char *msg, uint32_t value) "%s, value 0x%08x"
 virtio_net_rss_enable(uint32_t p1, uint16_t p2, uint8_t p3) "hashes 0x%x, 
table of %d, key of %d"
+virtio_net_load_early_setup(void) ""
 
 # tulip.c
 tulip_reg_write(uint64_t addr, const char *name, int size, uint64_t val) "addr 
0x%02"PRIx64" (%s) size %d value 0x%08"PRIx64
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7102ec4817..d0b0cc2ffe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -46,6 +46,7 @@
 #include "net_rx_pkt.h"
 #include "hw/virtio/vhost.h"
 #include "sysemu/qtest.h"
+#include "sysemu/runstate.h"
 
 #define VIRTIO_NET_VM_VERSION11
 
@@ -3568,6 +3569,95 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
 return qatomic_read(&n->failover_primary_hidden);
 }
 
+static int virtio_net_load_early_setup(void *opaque, int version_id)
+{
+VirtIONet *n = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(n);
+NetClientState *nc = qemu_get_queue(n->nic);
+int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+n->max_ncs - n->max_queue_pairs : 0;
+VHostNetState *net;
+int r;
+
+assert(nc->peer);
+assert(nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER);
+
+net = get_vhost_net(nc->peer);
+assert(net);
+assert(net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
+
+trace_virtio_net_load_early_setup();
+
+/* backend should support presetup */
+r = vhost_dev_set_presetup_state(&net->dev, true);
+if (r < 0) {
+error_report("Start presetup device fail: %d", r);
+return r;
+}
+
+if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_MTU)) {
+r = vhost_net_set_mtu(get_vhost_net(nc->peer), n->net_conf.mtu);
+if (r < 0) {
+error_report("%uBytes MTU not supported by the backend",
+ n->net_conf.mtu);
+goto error;
+}
+}
+
+r = vhost_net_presetup(vdev, n->nic->ncs, queue_pairs, cvq);
+if (r < 0) {
+error_report("Presetup device fail: %d", r);
+goto error;
+}
+
+r = vhost_dev_set_presetup_state(&net->dev, false);
+if (r < 0) {
+error_report("Finish presetup device fail: %d", r);
+return r;
+}
+return 0;
+
+error:
+vhost_dev_set_presetup_state(&net->dev, false);
+return r;
+}
+
+static bool virtio_net_early_setup_needed(void *opaque)
+{
+VirtIONet *n = opaque;
+NetClientState *nc = qemu_get_queue(n->nic);
+VHostNetState *net = get_vhost_net(nc->peer);
+
+/*
+ * Presetup aims to reduce live migration downtime by sync device
+ * status in setup stage. So only do presetup when source VM is in
+ * running state.
+ */
+if (runstate_is_running() &&
+nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER &&
+net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
+!vhost_dev_has_iommu(&net->dev) &&
+n->vhost_started &&
+n->status & VIRTIO_NET_S_LINK_UP) {
+return true;
+}
+return false;
+}
+
+static const VMStateDescription vmstate_virtio_net_early = {
+.name = "virtio-net-early",
+.minimum_version_id = VIRTIO_NET_VM_VERSION,
+.version_id = VIRTIO_NET_VM_VERSION,
+.fields = (VMStateField[]) {
+VMSTATE_EARLY_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+.early_setup = true,
+.post_load = virtio_net_load_early_setup,
+.needed = virtio_net_early_setup_needed,
+};
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -3743,6 +3833,11 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
 virtio_net_load_

[RFC PATCH 4/5] virtio: Add VMState for early load

2023-09-17 Thread Yajun Wu
Define a new virtio device vmstate for early save/load (which happens in
the LM setup stage). It is the same as the original vmstate, except:

In the LM setup phase of the destination VM, the guest memory has not
been synchronized yet. To address this, a flag has been added to the
virtio_load function to skip the index check.

Signed-off-by: Yajun Wu 
Reviewed-by: Avihai Horon 
Reviewed-by: Jiri Pirko 
---
 hw/virtio/virtio.c | 152 +++--
 include/hw/virtio/virtio.h |  10 ++-
 2 files changed, 103 insertions(+), 59 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 969c25f4cf..8c73c245dd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2832,7 +2832,17 @@ virtio_device_get(QEMUFile *f, void *opaque, size_t size,
 VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
 DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev));
 
-return virtio_load(vdev, f, dc->vmsd->version_id);
+return virtio_load(vdev, f, dc->vmsd->version_id, false);
+}
+
+/* A wrapper for use as a VMState .get function */
+static int virtio_early_device_get(QEMUFile *f, void *opaque, size_t size,
+  const VMStateField *field)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+DeviceClass *dc = DEVICE_CLASS(VIRTIO_DEVICE_GET_CLASS(vdev));
+
+return virtio_load(vdev, f, dc->vmsd->version_id, true);
 }
 
 const VMStateInfo  virtio_vmstate_info = {
@@ -2841,6 +2851,12 @@ const VMStateInfo  virtio_vmstate_info = {
 .put = virtio_device_put,
 };
 
+const VMStateInfo virtio_early_vmstate_info = {
+.name = "virtio-early",
+.get = virtio_early_device_get,
+.put = virtio_device_put,
+};
+
 static int virtio_set_features_nocheck(VirtIODevice *vdev, uint64_t val)
 {
 VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
@@ -2940,8 +2956,75 @@ size_t virtio_get_config_size(const 
VirtIOConfigSizeParams *params,
 return config_size;
 }
 
+static int virtio_load_check_index(VirtIODevice *vdev, int num)
+{
+int i;
+
+RCU_READ_LOCK_GUARD();
+
+for (i = 0; i < num; i++) {
+if (vdev->vq[i].vring.desc) {
+uint16_t nheads;
+
+/*
+ * VIRTIO-1 devices migrate desc, used, and avail ring addresses so
+ * only the region cache needs to be set up.  Legacy devices need
+ * to calculate used and avail ring addresses based on the desc
+ * address.
+ */
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+virtio_init_region_cache(vdev, i);
+} else {
+virtio_queue_update_rings(vdev, i);
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
+vdev->vq[i].shadow_avail_wrap_counter =
+vdev->vq[i].last_avail_wrap_counter;
+continue;
+}
+
+nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
+/* Check it isn't doing strange things with descriptor numbers. */
+if (nheads > vdev->vq[i].vring.num) {
+virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
+ "inconsistent with Host index 0x%x: delta 0x%x",
+ i, vdev->vq[i].vring.num,
+ vring_avail_idx(&vdev->vq[i]),
+ vdev->vq[i].last_avail_idx, nheads);
+vdev->vq[i].used_idx = 0;
+vdev->vq[i].shadow_avail_idx = 0;
+vdev->vq[i].inuse = 0;
+continue;
+}
+vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
+vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
+
+/*
+ * Some devices migrate VirtQueueElements that have been popped
+ * from the avail ring but not yet returned to the used ring.
+ * Since max ring size < UINT16_MAX it's safe to use modulo
+ * UINT16_MAX + 1 subtraction.
+ */
+vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
+vdev->vq[i].used_idx);
+if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
+error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
+ "used_idx 0x%x",
+ i, vdev->vq[i].vring.num,
+ vdev->vq[i].last_avail_idx,
+ vdev->vq[i].used_idx);
+return -1;
+}
+}
+}
+
+return 0;
+}
+
 int coroutine_mixed_fn
-virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
+virtio_load(VirtIODevice *vdev, QEMUFile 

[PATCH] docs: vhost-user: VHOST_USER_GET_STATUS require reply

2023-05-04 Thread Yajun Wu
Add VHOST_USER_GET_STATUS to the list of requests that require a reply.

Cc: Maxime Coquelin 
Cc: Michael S. Tsirkin 
Signed-off-by: Yajun Wu 
---
 docs/interop/vhost-user.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 8a5924ea75..2d13108284 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -299,6 +299,7 @@ replies. Here is a list of the ones that do:
 * ``VHOST_USER_GET_VRING_BASE``
 * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
 * ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
+* ``VHOST_USER_GET_STATUS`` (if ``VHOST_USER_PROTOCOL_F_STATUS``)
 
 .. seealso::
 
-- 
2.27.0




Re: [PATCH v2] vhost-user: send SET_STATUS 0 after GET_VRING_BASE

2023-05-03 Thread Yajun Wu



On 5/2/2023 7:04 AM, Stefan Hajnoczi wrote:

Setting the VIRTIO Device Status Field to 0 resets the device. The
device's state is lost, including the vring configuration.

vhost-user.c currently sends SET_STATUS 0 before GET_VRING_BASE. This
risks confusion about the lifetime of the vhost-user state (e.g. vring
last_avail_idx) across VIRTIO device reset.

Eugenio Pérez  adjusted the order for vhost-vdpa.c
in commit c3716f260bff ("vdpa: move vhost reset after get vring base")
and in that commit description suggested doing the same for vhost-user
in the future.

Go ahead and adjust vhost-user.c now. I ran various online code searches
to identify vhost-user backends implementing SET_STATUS. It seems only
DPDK implements SET_STATUS and Yajun Wu  has
confirmed that it is safe to make this change.

Fixes: commit 923b8921d210763359e96246a58658ac0db6c645 ("vhost-user: Support 
vhost_dev_start")
Cc: Michael S. Tsirkin 
Cc: Cindy Lu 
Cc: Yajun Wu 
Signed-off-by: Stefan Hajnoczi 
---
v2:
- Added VHOST_USER_PROTOCOL_F_STATUS check [Yajun Wu]
- Added "Fixes:" tag [Michael]
---
  hw/virtio/vhost-user.c | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e5285df4ba..40974afd06 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2677,7 +2677,20 @@ static int vhost_user_dev_start(struct vhost_dev *dev, 
bool started)
VIRTIO_CONFIG_S_DRIVER |
VIRTIO_CONFIG_S_DRIVER_OK);
  } else {
-return vhost_user_set_status(dev, 0);
+return 0;
+}
+}
+
+static void vhost_user_reset_status(struct vhost_dev *dev)
+{
+/* Set device status only for last queue pair */
+if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+return;
+}
+
+if (virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_STATUS)) {
+vhost_user_set_status(dev, 0);
  }
  }

@@ -2716,4 +2729,5 @@ const VhostOps user_ops = {
  .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
  .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
  .vhost_dev_start = vhost_user_dev_start,
+.vhost_reset_status = vhost_user_reset_status,
  };
--
2.40.1


Reviewed-by: Yajun Wu 



Re: [PATCH] vhost-user: send SET_STATUS 0 after GET_VRING_BASE

2023-04-20 Thread Yajun Wu



On 4/20/2023 9:07 PM, Stefan Hajnoczi wrote:


Setting the VIRTIO Device Status Field to 0 resets the device. The
device's state is lost, including the vring configuration.

vhost-user.c currently sends SET_STATUS 0 before GET_VRING_BASE. This
risks confusion about the lifetime of the vhost-user state (e.g. vring
last_avail_idx) across VIRTIO device reset.

Eugenio Pérez  adjusted the order for vhost-vdpa.c
in commit c3716f260bff ("vdpa: move vhost reset after get vring base")
and in that commit description suggested doing the same for vhost-user
in the future.

Go ahead and adjust vhost-user.c now. I ran various online code searches
to identify vhost-user backends implementing SET_STATUS. It seems only
DPDK implements SET_STATUS and Yajun Wu  has
confirmed that it is safe to make this change.

Cc: Michael S. Tsirkin 
Cc: Cindy Lu 
Signed-off-by: Stefan Hajnoczi 
---
  hw/virtio/vhost-user.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e5285df4ba..2d40b1b3e7 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2677,10 +2677,20 @@ static int vhost_user_dev_start(struct vhost_dev *dev, 
bool started)
VIRTIO_CONFIG_S_DRIVER |
VIRTIO_CONFIG_S_DRIVER_OK);
  } else {
-return vhost_user_set_status(dev, 0);
+return 0;
  }
  }

+static void vhost_user_reset_status(struct vhost_dev *dev)
+{
+/* Set device status only for last queue pair */
+if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+return;
+}
+
+vhost_user_set_status(dev, 0);
+}
+
  const VhostOps user_ops = {
  .backend_type = VHOST_BACKEND_TYPE_USER,
  .vhost_backend_init = vhost_user_backend_init,
@@ -2716,4 +2726,5 @@ const VhostOps user_ops = {
  .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
  .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
  .vhost_dev_start = vhost_user_dev_start,
+.vhost_reset_status = vhost_user_reset_status,
  };
--
2.39.2


Thank you for this fix.

Can you add protocol feature bit check, just like we do in 
vhost_user_dev_start?


    if (!virtio_has_feature(dev->protocol_features,
                            VHOST_USER_PROTOCOL_F_STATUS)) {
        return 0;
    }





Re: Move vhost-user SET_STATUS 0 after get vring base?

2023-04-18 Thread Yajun Wu



On 4/18/2023 11:34 PM, Michael S. Tsirkin wrote:

On Tue, Apr 18, 2023 at 11:18:11AM -0400, Stefan Hajnoczi wrote:

Hi,
Cindy's commit ca71db438bdc ("vhost: implement vhost_dev_start method")
added SET_STATUS calls to vhost_dev_start() and vhost_dev_stop() for all
vhost backends.

Eugenio's commit c3716f260bff ("vdpa: move vhost reset after get vring
base") deferred the SET_STATUS 0 call in vhost_dev_stop() until after
GET_VRING_BASE for vDPA only. In that commit Eugenio said, "A patch to
make vhost_user_dev_start more similar to vdpa is desirable, but it can
be added on top".

I agree and think it's a good idea to keep the vhost backends in sync
where possible.

vhost-user still has the old behavior where QEMU sends SET_STATUS 0
before GET_VRING_BASE. Most existing vhost-user backends don't implement
the SET_STATUS message, so I think no one has tripped over this yet.

Any thoughts on making vhost-user behave like vDPA here?

Stefan

Wow. Well  SET_STATUS 0 resets the device so yes, I think doing that
before GET_VRING_BASE will lose a state. Donnu how it does not trip
up people, indeed the only idea is if people ignore SET_STATUS.


--
MST


For DPDK vhost-user backend SET_STATUS 0 (reset) is ignored.

Yajun





[PATCH] chardev/char-socket: set s->listener = NULL in char_socket_finalize

2023-02-14 Thread Yajun Wu
After live migration with virtio block device, qemu crash at:

#0  0x55914f46f795 in object_dynamic_cast_assert 
(obj=0x559151b7b090, typename=0x55914f80fbc4 "qio-channel", file=0x55914f80fb90 
"/images/testvfe/sw/qemu.gerrit/include/io/channel.h", line=30, 
func=0x55914f80fcb8 <__func__.17257> "QIO_CHANNEL") at ../qom/object.c:872
#1  0x55914f480d68 in QIO_CHANNEL (obj=0x559151b7b090) at 
/images/testvfe/sw/qemu.gerrit/include/io/channel.h:29
#2  0x55914f4812f8 in qio_net_listener_set_client_func_full 
(listener=0x559151b7a720, func=0x55914f580b97 , 
data=0x5591519f4ea0, notify=0x0, context=0x0) at ../io/net-listener.c:166
#3  0x55914f580059 in tcp_chr_update_read_handler 
(chr=0x5591519f4ea0) at ../chardev/char-socket.c:637
#4  0x55914f583dca in qemu_chr_be_update_read_handlers 
(s=0x5591519f4ea0, context=0x0) at ../chardev/char.c:226
#5  0x55914f57b7c9 in qemu_chr_fe_set_handlers_full 
(b=0x559152bf23a0, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, 
opaque=0x0, context=0x0, set_open=false, sync_state=true) at 
../chardev/char-fe.c:279
#6  0x55914f57b86d in qemu_chr_fe_set_handlers (b=0x559152bf23a0, 
fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, 
context=0x0, set_open=false) at ../chardev/char-fe.c:304
#7  0x55914f378caf in vhost_user_async_close (d=0x559152bf21a0, 
chardev=0x559152bf23a0, vhost=0x559152bf2420, cb=0x55914f2fb8c1 
) at ../hw/virtio/vhost-user.c:2725
#8  0x55914f2fba40 in vhost_user_blk_event (opaque=0x559152bf21a0, 
event=CHR_EVENT_CLOSED) at ../hw/block/vhost-user-blk.c:395
#9  0x55914f58388c in chr_be_event (s=0x5591519f4ea0, 
event=CHR_EVENT_CLOSED) at ../chardev/char.c:61
#10 0x55914f583905 in qemu_chr_be_event (s=0x5591519f4ea0, 
event=CHR_EVENT_CLOSED) at ../chardev/char.c:81
#11 0x55914f581275 in char_socket_finalize (obj=0x5591519f4ea0) at 
../chardev/char-socket.c:1083
#12 0x55914f46f073 in object_deinit (obj=0x5591519f4ea0, 
type=0x5591519055c0) at ../qom/object.c:680
#13 0x55914f46f0e5 in object_finalize (data=0x5591519f4ea0) at 
../qom/object.c:694
#14 0x55914f46ff06 in object_unref (objptr=0x5591519f4ea0) at 
../qom/object.c:1202
#15 0x55914f4715a4 in object_finalize_child_property 
(obj=0x559151b76c50, name=0x559151b7b250 "char3", opaque=0x5591519f4ea0) at 
../qom/object.c:1747
#16 0x55914f46ee86 in object_property_del_all (obj=0x559151b76c50) 
at ../qom/object.c:632
#17 0x55914f46f0d2 in object_finalize (data=0x559151b76c50) at 
../qom/object.c:693
#18 0x55914f46ff06 in object_unref (objptr=0x559151b76c50) at 
../qom/object.c:1202
#19 0x55914f4715a4 in object_finalize_child_property 
(obj=0x559151b6b560, name=0x559151b76630 "chardevs", opaque=0x559151b76c50) at 
../qom/object.c:1747
#20 0x55914f46ef67 in object_property_del_child 
(obj=0x559151b6b560, child=0x559151b76c50) at ../qom/object.c:654
#21 0x55914f46f042 in object_unparent (obj=0x559151b76c50) at 
../qom/object.c:673
#22 0x55914f58632a in qemu_chr_cleanup () at ../chardev/char.c:1189
#23 0x55914f16c66c in qemu_cleanup () at ../softmmu/runstate.c:830
#24 0x55914eee7b9e in qemu_default_main () at ../softmmu/main.c:38
#25 0x55914eee7bcc in main (argc=86, argv=0x7ffc97cb8d88) at 
../softmmu/main.c:48

In char_socket_finalize, after s->listener has been freed, the event callback
vhost_user_blk_event is called to handle CHR_EVENT_CLOSED.
vhost_user_blk_event then calls qio_net_listener_set_client_func_full, which
still uses the freed s->listener.

Setting s->listener = NULL after object_unref(OBJECT(s->listener)) solves
this issue.

Signed-off-by: Yajun Wu 
Acked-by: Jiri Pirko 

---
 chardev/char-socket.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index c2265436ac..8c58532171 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -1065,6 +1065,7 @@ static void char_socket_finalize(Object *obj)
 qio_net_listener_set_client_func_full(s->listener, NULL, NULL,
   NULL, chr->gcontext);
 object_unref(OBJECT(s->listener));
+s->listener = NULL;
 }
 if (s->tls_creds) {
 object_unref(OBJECT(s->tls_creds));
-- 
2.27.0




RE: [PULL v4 76/83] vhost-user: Support vhost_dev_start

2023-01-15 Thread Yajun Wu
7 0x55b709ad in vhost_dev_start (hdev=0x584dd600, 
vdev=0x57b965d0, vrings=false) at ../hw/virtio/vhost.c:1988
#38 0x5584291c in vhost_net_start_one (net=0x584dd600, 
dev=0x57b965d0) at ../hw/net/vhost_net.c:271
#39 0x55842f1e in vhost_net_start (dev=0x57b965d0, 
ncs=0x57bc09e0, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:412
#40 0x55b1bf61 in virtio_net_vhost_status (n=0x57b965d0, status=15 
'\017') at ../hw/net/virtio-net.c:311
#41 0x55b1c20c in virtio_net_set_status (vdev=0x57b965d0, status=15 
'\017') at ../hw/net/virtio-net.c:392
#42 0x55b1ed04 in virtio_net_handle_mq (n=0x57b965d0, cmd=0 '\000', 
iov=0x56c7ef50, iov_cnt=1) at ../hw/net/virtio-net.c:1497
#43 0x55b1eef0 in virtio_net_handle_ctrl_iov (vdev=0x57b965d0, 
in_sg=0x56a09880, in_num=1, out_sg=0x56a09890, out_num=1) at 
../hw/net/virtio-net.c:1534
#44 0x55b1efe9 in virtio_net_handle_ctrl (vdev=0x57b965d0, 
vq=0x7fffc04ac140) at ../hw/net/virtio-net.c:1557
#45 0x55b63776 in virtio_queue_notify_vq (vq=0x7fffc04ac140) at 
../hw/virtio/virtio.c:2249
#46 0x55b669dc in virtio_queue_host_notifier_read (n=0x7fffc04ac1b4) at 
../hw/virtio/virtio.c:3529
#47 0x55e3f458 in aio_dispatch_handler (ctx=0x56a016c0, 
node=0x7ffd8800e430) at ../util/aio-posix.c:369
#48 0x55e3f613 in aio_dispatch_handlers (ctx=0x56a016c0) at 
../util/aio-posix.c:412
#49 0x55e3f669 in aio_dispatch (ctx=0x56a016c0) at 
../util/aio-posix.c:422
#50 0x55e585de in aio_ctx_dispatch (source=0x56a016c0, 
callback=0x0, user_data=0x0) at ../util/async.c:321
#51 0x7554895d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#52 0x55e5abea in glib_pollfds_poll () at ../util/main-loop.c:295
#53 0x55e5ac64 in os_host_main_loop_wait (timeout=0) at 
../util/main-loop.c:318
#54 0x55e5ad69 in main_loop_wait (nonblocking=0) at 
../util/main-loop.c:604
#55 0x559693de in qemu_main_loop () at ../softmmu/runstate.c:731
#56 0x556e7c06 in qemu_default_main () at ../softmmu/main.c:37
#57 0x556e7c3c in main (argc=71, argv=0x7fffcda8) at 
../softmmu/main.c:48



-Original Message-
From: Maxime Coquelin  
Sent: Thursday, January 12, 2023 5:26 PM
To: Laurent Vivier 
Cc: qemu-devel@nongnu.org; Peter Maydell ; Yajun Wu 
; Parav Pandit ; Michael S. Tsirkin 

Subject: Re: [PULL v4 76/83] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments


Hi Laurent,

On 1/11/23 10:50, Laurent Vivier wrote:
> On 1/9/23 11:55, Michael S. Tsirkin wrote:
>> On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:
>>> Hi,
>>>
>>> it seems this patch breaks vhost-user with DPDK.
>>>
>>> See https://bugzilla.redhat.com/show_bug.cgi?id=2155173
>>>
>>> it seems QEMU doesn't receive the expected commands sequence:
>>>
>>> Received unexpected msg type. Expected 22 received 40 Fail to update 
>>> device iotlb Received unexpected msg type. Expected 40 received 22 
>>> Received unexpected msg type. Expected 22 received 11 Fail to update 
>>> device iotlb Received unexpected msg type. Expected 11 received 22 
>>> vhost VQ 1 ring restore failed: -71: Protocol error (71) Received 
>>> unexpected msg type. Expected 22 received 11 Fail to update device 
>>> iotlb Received unexpected msg type. Expected 11 received 22 vhost VQ 
>>> 0 ring restore failed: -71: Protocol error (71) unable to start 
>>> vhost net: 71: falling back on userspace virtio
>>>
>>> It receives VHOST_USER_GET_STATUS (40) when it expects 
>>> VHOST_USER_IOTLB_MSG (22) and VHOST_USER_IOTLB_MSG when it expects 
>>> VHOST_USER_GET_STATUS.
>>> and VHOST_USER_GET_VRING_BASE (11) when it expect 
>>> VHOST_USER_GET_STATUS and so on.
>>>
>>> Any idea?

We only have a single thread on DPDK side to handle Vhost-user requests, it 
will read a request, handle it and reply to it. Then it reads the next one, 
etc... So I don't think it is possible to mix request replies order on DPDK 
side.

Maybe there are two threads concurrently sending requests on QEMU side?

Regards,
Maxime

>>> Thanks,
>>> Laurent
>>
>>
>> So I am guessing it's coming from:
>>
>>  if (msg.hdr.request != req

RE: [PULL v4 76/83] vhost-user: Support vhost_dev_start

2023-01-11 Thread Yajun Wu
Hi,

VHOST_USER_PROTOCOL_F_STATUS is enabled by default (dpdk):

lib/vhost/vhost_user.h

#define VHOST_USER_PROTOCOL_FEATURES    ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
                                         (1ULL << VHOST_USER_PROTOCOL_F_STATUS))

Removing VHOST_USER_PROTOCOL_F_STATUS disables the VHOST_USER_SET/GET_STATUS
messages and should work around this issue (sketched below).
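
For illustration, that workaround would be a change along these lines (a
sketch against the DPDK tree quoted above; exact hunk context and line
numbers may differ):

--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -24,4 +24,3 @@
                                          (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
                                          (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
-                                         (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
-                                         (1ULL << VHOST_USER_PROTOCOL_F_STATUS))
+                                         (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT))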

Thanks,
Yajun

-Original Message-
From: Laurent Vivier  
Sent: Wednesday, January 11, 2023 5:50 PM
To: Maxime Coquelin 
Cc: qemu-devel@nongnu.org; Peter Maydell ; Yajun Wu 
; Parav Pandit ; Michael S. Tsirkin 

Subject: Re: [PULL v4 76/83] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments


On 1/9/23 11:55, Michael S. Tsirkin wrote:
> On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:
>> Hi,
>>
>> it seems this patch breaks vhost-user with DPDK.
>>
>> See https://bugzilla.redhat.com/show_bug.cgi?id=2155173
>>
>> it seems QEMU doesn't receive the expected commands sequence:
>>
>> Received unexpected msg type. Expected 22 received 40 Fail to update 
>> device iotlb Received unexpected msg type. Expected 40 received 22 
>> Received unexpected msg type. Expected 22 received 11 Fail to update 
>> device iotlb Received unexpected msg type. Expected 11 received 22 
>> vhost VQ 1 ring restore failed: -71: Protocol error (71) Received 
>> unexpected msg type. Expected 22 received 11 Fail to update device 
>> iotlb Received unexpected msg type. Expected 11 received 22 vhost VQ 
>> 0 ring restore failed: -71: Protocol error (71) unable to start vhost 
>> net: 71: falling back on userspace virtio
>>
>> It receives VHOST_USER_GET_STATUS (40) when it expects 
>> VHOST_USER_IOTLB_MSG (22) and VHOST_USER_IOTLB_MSG when it expects 
>> VHOST_USER_GET_STATUS.
>> and VHOST_USER_GET_VRING_BASE (11) when it expect VHOST_USER_GET_STATUS and 
>> so on.
>>
>> Any idea?
>>
>> Thanks,
>> Laurent
>
>
> So I am guessing it's coming from:
>
>  if (msg.hdr.request != request) {
>  error_report("Received unexpected msg type. Expected %d received %d",
>   request, msg.hdr.request);
>  return -EPROTO;
>  }
>
> in process_message_reply and/or in vhost_user_get_u64.
>
>
>> On 11/7/22 23:53, Michael S. Tsirkin wrote:
>>> From: Yajun Wu 
>>>
>>> The motivation of adding vhost-user vhost_dev_start support is to 
>>> improve backend configuration speed and reduce live migration VM 
>>> downtime.
>>>
>>> Today VQ configuration is issued one by one. For virtio net with 
>>> multi-queue support, backend needs to update RSS (Receive side
>>> scaling) on every rx queue enable. Updating RSS is time-consuming 
>>> (typical time like 7ms).
>>>
>>> Implement already defined vhost status and message in the vhost 
>>> specification [1].
>>> (a) VHOST_USER_PROTOCOL_F_STATUS
>>> (b) VHOST_USER_SET_STATUS
>>> (c) VHOST_USER_GET_STATUS
>>>
>>> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK 
>>> for device start and reset(0) for device stop.
>>>
>>> On reception of the DRIVER_OK message, backend can apply the needed 
>>> setting only once (instead of incremental) and also utilize 
>>> parallelism on enabling queues.
>>>
>>> This improves QEMU's live migration downtime with vhost user backend 
>>> implementation by great margin, specially for the large number of 
>>> VQs of 64 from 800 msec to 250 msec.
>>>

[PATCH] vhost-user-blk: Fix live migration crash during event handling

2022-12-07 Thread Yajun Wu
After live migration with virtio block device, qemu crash at:

#0 0x7fe051e54269 in g_source_destroy () at 
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#1 0x55cebaa5f37d in qio_net_listener_set_client_func_full 
(listener=0x55cebceab340, func=0x55cebab4f5f2 , 
data=0x55cebcdfcc00, notify=0x0, context=0x0) at ../io/net-listener.c:157
#2 0x55cebab4ea99 in tcp_chr_update_read_handler 
(chr=0x55cebcdfcc00) at ../chardev/char-socket.c:639
#3 0x55cebab529fa in qemu_chr_be_update_read_handlers 
(s=0x55cebcdfcc00, context=0x0) at ../chardev/char.c:226
#4 0x55cebab4a04e in qemu_chr_fe_set_handlers_full 
(b=0x55cebdf52120, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, 
opaque=0x0, context=0x0, set_open=false, sync_state=true) at 
../chardev/char-fe.c:279
#5 0x55cebab4a0f6 in qemu_chr_fe_set_handlers(b=0x55cebdf52120, 
fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, 
context=0x0, set_open=false) at ../chardev/char-fe.c:304
#6 0x55ceba8ec3c8 in vhost_user_blk_event (opaque=0x55cebdf51f40, 
event=CHR_EVENT_CLOSED) at ../hw/block/vhost-user-blk.c:412
#7 0x55cebab524a1 in chr_be_event (s=0x55cebcdfcc00, 
event=CHR_EVENT_CLOSED) at ../chardev/char.c:61
#8 0x55cebab52519 in qemu_chr_be_event (s=0x55cebcdfcc00, 
event=CHR_EVENT_CLOSED) at ../chardev/char.c:81
#9 0x55cebab4fce4 in char_socket_finalize (obj=0x55cebcdfcc00) at 
../chardev/char-socket.c:1085
#10 0x55cebaa4cde5 in object_deinit (obj=0x55cebcdfcc00, 
type=0x55cebcc67160) at ../qom/object.c:675
#11 0x55cebaa4ce5b in object_finalize (data=0x55cebcdfcc00) at 
../qom/object.c:689
#12 0x55cebaa4dcec in object_unref (objptr=0x55cebcdfcc00) at 
../qom/object.c:1192
#13 0x55cebaa4f3ee in object_finalize_child_property 
(obj=0x55cebcc6df40, name=0x55cebcead490 "char0", opaque=0x55cebcdfcc00) at 
../qom/object.c:1735
#14 0x55cebaa4cbe4 in object_property_del_all (obj=0x55cebcc6df40) 
at ../qom/object.c:627
#15 0x55cebaa4ce48 in object_finalize (data=0x55cebcc6df40) at 
../qom/object.c:688
#16 0x55cebaa4dcec in object_unref (objptr=0x55cebcc6df40) at 
../qom/object.c:1192
#17 0x55cebaa4f3ee in object_finalize_child_property 
(obj=0x55cebce96e00, name=0x55cebceab300 "chardevs", opaque=0x55cebcc6df40) at 
../qom/object.c:1735
#18 0x55cebaa4ccd1 in object_property_del_child 
(obj=0x55cebce96e00, child=0x55cebcc6df40) at ../qom/object.c:649
#19 0x55cebaa4cdb0 in object_unparent (obj=0x55cebcc6df40) at 
../qom/object.c:668
#20 0x55cebab55124 in qemu_chr_cleanup () at ../chardev/char.c:1222
#21 0x55ceba79a561 in qemu_cleanup () at ../softmmu/runstate.c:823
#22 0x55ceba53d65f in qemu_main (argc=78, argv=0x7ffc9440bd98, 
envp=0x0) at ../softmmu/main.c:37
#23 0x55ceba53d68f in main (argc=78, argv=0x7ffc9440bd98) at 
../softmmu/main.c:45

qemu_chr_fe_set_handlers should not be called during qemu_chr_cleanup,
because the chardev has already been freed. The quick fix is to handle
RUN_STATE_POSTMIGRATE the same way as RUN_STATE_SHUTDOWN.

A better solution would be to add a block device cleanup function, similar
to net_cleanup, and call it from qemu_cleanup (sketched below).
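
A minimal sketch of that alternative, assuming a hypothetical device list and
cleanup entry point that do not exist in QEMU today:

/* Hypothetical: a list that vhost_user_blk_device_realize() would add each
 * device to, plus a cleanup hook called from qemu_cleanup() before
 * qemu_chr_cleanup(), so the chardev handlers are detached while the
 * chardev (and its listener) are still alive. */
static QLIST_HEAD(, VHostUserBlk) vub_devices =
    QLIST_HEAD_INITIALIZER(vub_devices);

void vhost_user_blk_cleanup(void)
{
    VHostUserBlk *s;

    QLIST_FOREACH(s, &vub_devices, next) {   /* 'next' is a hypothetical link */
        /* Drop the event handler so CHR_EVENT_CLOSED can no longer reach
         * vhost_user_blk_event() during chardev teardown. */
        qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
                                 NULL, NULL, false);
    }
}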

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/block/vhost-user-blk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 0d5190accf..b323d5820b 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -110,7 +110,7 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 }
 
 /* valid for resize only */
-if (blkcfg.capacity != s->blkcfg.capacity) {
+if (s && blkcfg.capacity != s->blkcfg.capacity) {
 s->blkcfg.capacity = blkcfg.capacity;
 memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len);
 virtio_notify_config(dev->vdev);
@@ -398,7 +398,8 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent 
event)
 }
 break;
 case CHR_EVENT_CLOSED:
-if (!runstate_check(RUN_STATE_SHUTDOWN)) {
+if (!runstate_check(RUN_STATE_SHUTDOWN) &&
+!runstate_check(RUN_STATE_POSTMIGRATE)) {
 /*
  * A close event may happen during a read/write, but vhost
  * code assumes the vhost_dev remains setup, so delay the
-- 
2.27.0




[PATCH] vhost-user: send set log base message only once

2022-11-21 Thread Yajun Wu
Vhost message VHOST_USER_SET_LOG_BASE is device wide. So only
send it once with the first queue pair.

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/virtio/vhost-user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index abe23d4ebe..b3bdf7921d 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -526,6 +526,11 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, 
uint64_t base,
 .hdr.size = sizeof(msg.payload.log),
 };
 
+/* Send only once with first queue pair */
+if (dev->vq_index != 0) {
+return 0;
+}
+
 if (shmfd && log->fd != -1) {
 fds[fd_num++] = log->fd;
 }
-- 
2.27.0




RE: [PATCH v2 1/4] hw/virtio: incorporate backend features in features

2022-11-10 Thread Yajun Wu
Hi Alex,

Sorry for the late response, I missed your mail.
You can test together with dpdk and have reproduce.

Steps:
1. DPDK 
git clone https://github.com/DPDK/dpdk.git
git checkout v22.07
meson build -Dexamples=vhost_blk
ninja -C build
cd /tmp/
sudo dpdk/build/examples/dpdk-vhost_blk # it's a daemon 

2. Add blk device to qemu, then bootup VM.
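
For reference, the device can be attached to QEMU roughly like this (a
sketch: IDs, memory size and queue count are illustrative assumptions; the
socket path matches the dpdk-vhost_blk daemon above, and vhost-user needs
the guest memory to be shared with the backend):

    -object memory-backend-memfd,id=mem0,size=4G,share=on \
    -numa node,memdev=mem0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket \
    -device vhost-user-blk-pci,chardev=char0,num-queues=1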
  




  

Without this commit, you can get device ready log from dpdk-vhost_blk:

VHOST_CONFIG: (/tmp/vhost.socket) negotiated Vhost-user protocol features: 
0x11ebf
VHOST_CONFIG: (/tmp/vhost.socket) negotiated Virtio features: 0x1
VHOST_CONFIG: (/tmp/vhost.socket) virtio is now ready for processing.
New Device /tmp/vhost.socket, Device ID 0
Ctrlr Worker Thread start

With this commit, the device won't become ready and the VM will hang.
You can see that the VHOST_USER_F_PROTOCOL_FEATURES bit is added:

VHOST_CONFIG: (/tmp/vhost.socket) negotiated Virtio features: 0x14000

Related DPDK code:
./lib/vhost/vhost_user.c

2044 /* 
 
2045  * When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,  
 
2046  * the ring starts already enabled. Otherwise, it is enabled via   
 
2047  * the SET_VRING_ENABLE message.   
 
2048  */
 
2049 if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) { 
 
2050 vq->enabled = true;
 
2051 }


Thanks,
Yajun 




-Original Message-
From: Alex Bennée  
Sent: Wednesday, October 26, 2022 3:42 PM
To: Yajun Wu 
Cc: qemu-devel@nongnu.org; m...@redhat.com; Parav Pandit 
Subject: Re: [PATCH v2 1/4] hw/virtio: incorporate backend features in features

External email: Use caution opening links or attachments


Yajun Wu  writes:

> Hi Alex,
>
> With this change, VHOST_USER_F_PROTOCOL_FEATURES bit will be set to 
> backend for virtio block device (previously not).
>
> From https://www.qemu.org/docs/master/interop/vhost-user.html spec:
> If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring starts 
> directly in the enabled state.
> If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is 
> initialized in a disabled state and is enabled by 
> VHOST_USER_SET_VRING_ENABLE with parameter 1.
>
> Vhost-user-blk won't send out VHOST_USER_SET_VRING_ENABLE today.
> Backend gets VHOST_USER_F_PROTOCOL_FEATURES negotiated and can't get 
> VHOST_USER_SET_VRING_ENABLE.
> VQs keep in disabled state.

If the backend advertises protocol features but the stub doesn't support it how 
does it get enabled?

The testing I did was mostly by hand with the gpio backend and using the 
qtests. I Think we need to add some acceptance testing into avocado with some 
real daemons because I don't think we have enough coverage with the current 
qtest approach.

>
> Can you check on this scenario?
>
> Thanks
>
> -Original Message-
> From: Qemu-devel  On 
> Behalf Of Alex Bennée
> Sent: Thursday, July 28, 2022 9:55 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; Alex Bennée 
> Subject: [PATCH v2 1/4] hw/virtio: incorporate backend features in 
> features
>
> External email: Use caution opening links or attachments
>
>
> There are some extra bits used over a vhost-user connection which are hidden 
> from the device itself. We need to set them here to ensure we enable things 
> like the protocol extensions.
>
> Currently net/vhost-user.c has it's own inscrutable way of persisting this 
> data but it really should live in the core vhost_user code.
>
> Signed-off-by: Alex Bennée 
> Message-Id: <20220726192150.2435175-7-alex.ben...@linaro.org>
> ---
>  hw/virtio/vhost-user.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 
> 75b8df21a4..1936a44e82 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1460,7 +1460,14 @@ static int vhost_user_set_features(struct vhost_dev 
> *dev,
>   */
>  bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
>
> -return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
> +/*
> + * We need to include any extra backend only feature bits that
> + * mig

[PATCH] vhost: set mem table before device start

2022-11-08 Thread Yajun Wu
Today the guest memory information (through VHOST_USER_SET_MEM_TABLE)
is sent out during vhost device start, so memory pinning is delayed
accordingly. For 4G of guest memory, a VFIO driver usually takes around
400+ msec to pin the memory.

This time is accounted towards the VM downtime: when live migrating a VM,
vhost device start occurs in the vmstate load stage.

Sending the mem table just after VM bootup, before device start, gives the
backend enough time to pin the guest memory before the device is started.
This improvement reduces VM downtime by 400+ msec.

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/virtio/vhost.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..73e473cd84 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -539,6 +539,14 @@ static void vhost_commit(MemoryListener *listener)
 }
 
 if (!dev->started) {
+/* Backend can pin memory before device start, reduce LM downtime */
+if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
+dev->n_mem_sections) {
+r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
+if (r < 0) {
+VHOST_OPS_DEBUG(r, "vhost_set_mem_table failed");
+}
+}
 goto out;
 }
 
-- 
2.27.0




[PATCH v4 1/2] vhost: Change the sequence of device start

2022-11-06 Thread Yajun Wu
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.

Move the device start routines to after all the necessary device and VQ
configuration has finished, further aligning with the virtio specification's
"device initialization sequence".

Following patch will add vhost-user vhost_dev_start support.

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 

---
 hw/block/vhost-user-blk.c | 18 +++---
 hw/net/vhost_net.c| 14 --
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 13bf5cc47a..28409c90f7 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -168,13 +168,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 goto err_guest_notifiers;
 }
 
-ret = vhost_dev_start(&s->dev, vdev);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Error starting vhost");
-goto err_guest_notifiers;
-}
-s->started_vu = true;
-
 /* guest_notifier_mask/pending not used yet, so just unmask
  * everything here. virtio-pci will do the right thing by
  * enabling/disabling irqfd.
@@ -183,9 +176,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 vhost_virtqueue_mask(&s->dev, vdev, i, false);
 }
 
+s->dev.vq_index_end = s->dev.nvqs;
+ret = vhost_dev_start(&s->dev, vdev);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Error starting vhost");
+goto err_guest_notifiers;
+}
+s->started_vu = true;
+
 return ret;
 
 err_guest_notifiers:
+for (i = 0; i < s->dev.nvqs; i++) {
+vhost_virtqueue_mask(&s->dev, vdev, i, true);
+}
 k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
 vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d28f8b974b..0fe71ed309 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -387,21 +387,23 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
*ncs,
 } else {
 peer = qemu_get_peer(ncs, n->max_queue_pairs);
 }
-r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-if (r < 0) {
-goto err_start;
-}
 
 if (peer->vring_enable) {
 /* restore vring enable state */
 r = vhost_set_vring_enable(peer, peer->vring_enable);
 
 if (r < 0) {
-vhost_net_stop_one(get_vhost_net(peer), dev);
 goto err_start;
 }
 }
+
+r = vhost_net_start_one(get_vhost_net(peer), dev);
+if (r < 0) {
+if (peer->vring_enable) {
+vhost_set_vring_enable(peer, false);
+}
+goto err_start;
+}
 }
 
 return 0;
-- 
2.27.0




[PATCH v4 0/2] vhost-user: Support vhost_dev_start

2022-11-06 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, backend needs to update RSS (Receive side
scaling) on every rx queue enable. Updating RSS is time-consuming
(typical time like 7ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues in
parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs:
with 64 VQs, downtime drops from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the
necessary device and VQ configuration, further aligning to the virtio
specification for "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

v4:
- fix vhost_net_start_one fallback code

v3:
- rebase

v2:
- add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
- avoid adding status bits already set

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 ++
 hw/net/vhost_net.c| 14 
 hw/virtio/vhost-user.c| 74 ++-
 3 files changed, 92 insertions(+), 14 deletions(-)

-- 
2.27.0




[PATCH v4 2/2] vhost-user: Support vhost_dev_start

2022-11-06 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, backend needs to update RSS (Receive side
scaling) on every rx queue enable. Updating RSS is time-consuming
(typical time like 7ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues in
parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs:
with 64 VQs, downtime drops from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/virtio/vhost-user.c | 74 +-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 03415b6c95..bb5164b753 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
 /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+VHOST_USER_PROTOCOL_F_STATUS = 16,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_MAX_MEM_SLOTS = 36,
 VHOST_USER_ADD_MEM_REG = 37,
 VHOST_USER_REM_MEM_REG = 38,
+VHOST_USER_SET_STATUS = 39,
+VHOST_USER_GET_STATUS = 40,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -1452,6 +1455,43 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int 
request, uint64_t u64,
 return 0;
 }
 
+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+uint64_t value;
+int ret;
+
+ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+if (ret < 0) {
+return ret;
+}
+*status = value;
+
+return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+uint8_t s;
+int ret;
+
+ret = vhost_user_get_status(dev, &s);
+if (ret < 0) {
+return ret;
+}
+
+if ((s & status) == status) {
+return 0;
+}
+s |= status;
+
+return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
uint64_t features)
 {
@@ -1460,6 +1500,7 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  * backend is actually logging changes
  */
 bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
+int ret;
 
 /*
  * We need to include any extra backend only feature bits that
@@ -1467,9 +1508,18 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
  * features.
  */
-return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+ret = vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
   features | dev->backend_features,
   log_enabled);
+
+if (virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_STATUS)) {
+if (!ret) {
+return vhost_user_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+}
+}
+
+return ret;
 }
 
 static int vhost_user_set_protocol_features(struct vhost_dev *dev,
@@ -2615,6 +2665,27 @@ void vhost_user_cleanup(VhostUserState *user)
 user->chr = NULL;
 }
 
+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_STATUS)) {
+return 0;
+}
+
+/* Set device status only for last queue pair */
+if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+return 0;
+}
+
+if (started) {
+return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+  VIRTIO_CONFIG_S_DRIVER |
+  VIRTIO_CONFIG_S_DRIVER_OK);
+} else {
+return vhost_user_set_status(dev, 0);
+}
+}
+
 const VhostOps user_ops = {
 .backend_type = VHOST_BACKEND_TYPE_USER,
 .vhost_backend_init = vhost_user_backend_init,
@@ -2649,4 +2720,5 @@ const VhostOps user_ops = {
 .vhost_backend_mem_secti

RE: [PATCH v2 1/4] hw/virtio: incorporate backend features in features

2022-10-26 Thread Yajun Wu
Hi Alex,

With this change, VHOST_USER_F_PROTOCOL_FEATURES bit will be set to backend for 
virtio block device (previously not).

From https://www.qemu.org/docs/master/interop/vhost-user.html spec:
If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring starts 
directly in the enabled state.
If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized 
in a disabled state and is enabled by VHOST_USER_SET_VRING_ENABLE with 
parameter 1.

vhost-user-blk won't send out VHOST_USER_SET_VRING_ENABLE today.
The backend now gets VHOST_USER_F_PROTOCOL_FEATURES negotiated but never
receives VHOST_USER_SET_VRING_ENABLE, so the VQs stay in the disabled state.

Can you check on this scenario?

Thanks

-Original Message-
From: Qemu-devel  On Behalf Of 
Alex Bennée
Sent: Thursday, July 28, 2022 9:55 PM
To: qemu-devel@nongnu.org
Cc: m...@redhat.com; Alex Bennée 
Subject: [PATCH v2 1/4] hw/virtio: incorporate backend features in features

External email: Use caution opening links or attachments


There are some extra bits used over a vhost-user connection which are hidden 
from the device itself. We need to set them here to ensure we enable things 
like the protocol extensions.

Currently net/vhost-user.c has it's own inscrutable way of persisting this data 
but it really should live in the core vhost_user code.

Signed-off-by: Alex Bennée 
Message-Id: <20220726192150.2435175-7-alex.ben...@linaro.org>
---
 hw/virtio/vhost-user.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 
75b8df21a4..1936a44e82 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1460,7 +1460,14 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  */
 bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);

-return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
+/*
+ * We need to include any extra backend only feature bits that
+ * might be needed by our device. Currently this includes the
+ * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
+ * features.
+ */
+return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+  features | dev->backend_features,
   log_enabled);  }

--
2.30.2




[PATCH] vhost-user: Fix out of order vring host notification handling

2022-10-17 Thread Yajun Wu
The vhost backend sends a host notification for every VQ. If the backend
creates VQs in parallel, the VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
messages may arrive at QEMU in a different order than incremental queue
index order.

For example, VQ 1's message arrives earlier than VQ 0's:
After allocating the VhostUserHostNotifier for VQ 1, the GPtrArray becomes

[ nil, VQ1 pointer ]

After allocating the VhostUserHostNotifier for VQ 0, the GPtrArray becomes

[ VQ0 pointer, nil, VQ1 pointer ]

This is wrong: fetch_notifier will return NULL for VQ 1 in
vhost_user_get_vring_base, so the host notifier removal is missed (a leak).

The fix is to remove the current (placeholder) element from the GPtrArray
first, so that the new element is inserted at the right position (a
standalone illustration follows).
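
For clarity, a minimal standalone GLib sketch of the misplacement
(illustration only, not QEMU code): growing the array for a higher index
first and then using a plain g_ptr_array_insert() for a lower index shifts
the earlier element away from its queue index.

#include <glib.h>
#include <stdio.h>

int main(void)
{
    GPtrArray *a = g_ptr_array_new();

    /* VQ 1's notification is handled first: the array is grown to idx + 1,
     * leaving a NULL placeholder at index 0; the VQ 1 entry ends up at
     * index 1 (direct assignment here for brevity). */
    g_ptr_array_set_size(a, 2);
    a->pdata[1] = "VQ1";

    /* VQ 0's notification arrives later: a plain insert at index 0 shifts
     * the VQ 1 element from index 1 to index 2. */
    g_ptr_array_insert(a, 0, "VQ0");

    for (guint i = 0; i < a->len; i++) {
        printf("[%u] %s\n", i, a->pdata[i] ? (char *)a->pdata[i] : "nil");
    }
    /* Prints: [0] VQ0  [1] nil  [2] VQ1 -- a later lookup of VQ 1 at
     * index 1 now returns NULL, which is the missed removal (leak)
     * described above. */

    g_ptr_array_free(a, TRUE);
    return 0;
}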

Fixes: 503e355465 ("virtio/vhost-user: dynamically assign 
VhostUserHostNotifiers")
Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 

---
 hw/virtio/vhost-user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 03415b6c95..d256ce589b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1543,6 +1543,11 @@ static VhostUserHostNotifier 
*fetch_or_create_notifier(VhostUserState *u,
 
 n = g_ptr_array_index(u->notifiers, idx);
 if (!n) {
+/*
+ * In case notification arrive out-of-order,
+ * make room for current index.
+ */
+g_ptr_array_remove_index(u->notifiers, idx);
 n = g_new0(VhostUserHostNotifier, 1);
 n->idx = idx;
 g_ptr_array_insert(u->notifiers, idx, n);
-- 
2.27.0




[PATCH v3 0/2] vhost-user: Support vhost_dev_start

2022-10-17 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, backend needs to update RSS (Receive side
scaling) on every rx queue enable. Updating RSS is time-consuming
(typical time like 7ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues in
parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs:
with 64 VQs, downtime drops from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the
necessary device and VQ configuration, further aligning to the virtio
specification for "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

v3:
- rebase

v2:
- add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
- avoid adding status bits already set

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 ++
 hw/net/vhost_net.c| 12 +++
 hw/virtio/vhost-user.c| 74 ++-
 3 files changed, 90 insertions(+), 14 deletions(-)

-- 
2.27.0




[PATCH v3 1/2] vhost: Change the sequence of device start

2022-10-17 Thread Yajun Wu
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.

Move the device start routines to after all the necessary device and VQ
configuration has finished, further aligning with the virtio specification's
"device initialization sequence".

Following patch will add vhost-user vhost_dev_start support.

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 

---
 hw/block/vhost-user-blk.c | 18 +++---
 hw/net/vhost_net.c| 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 84902dde17..f4deb8cd5d 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -164,13 +164,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 goto err_guest_notifiers;
 }
 
-ret = vhost_dev_start(&s->dev, vdev);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Error starting vhost");
-goto err_guest_notifiers;
-}
-s->started_vu = true;
-
 /* guest_notifier_mask/pending not used yet, so just unmask
  * everything here. virtio-pci will do the right thing by
  * enabling/disabling irqfd.
@@ -179,9 +172,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 vhost_virtqueue_mask(&s->dev, vdev, i, false);
 }
 
+s->dev.vq_index_end = s->dev.nvqs;
+ret = vhost_dev_start(&s->dev, vdev);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Error starting vhost");
+goto err_guest_notifiers;
+}
+s->started_vu = true;
+
 return ret;
 
 err_guest_notifiers:
+for (i = 0; i < s->dev.nvqs; i++) {
+vhost_virtqueue_mask(&s->dev, vdev, i, true);
+}
 k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
 vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d28f8b974b..d6924f5e57 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -387,21 +387,21 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
*ncs,
 } else {
 peer = qemu_get_peer(ncs, n->max_queue_pairs);
 }
-r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-if (r < 0) {
-goto err_start;
-}
 
 if (peer->vring_enable) {
 /* restore vring enable state */
 r = vhost_set_vring_enable(peer, peer->vring_enable);
 
 if (r < 0) {
-vhost_net_stop_one(get_vhost_net(peer), dev);
 goto err_start;
 }
 }
+
+r = vhost_net_start_one(get_vhost_net(peer), dev);
+if (r < 0) {
+vhost_net_stop_one(get_vhost_net(peer), dev);
+goto err_start;
+}
 }
 
 return 0;
-- 
2.27.0




[PATCH v3 2/2] vhost-user: Support vhost_dev_start

2022-10-17 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, backend needs to update RSS (Receive side
scaling) on every rx queue enable. Updating RSS is time-consuming
(typical time like 7ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues in
parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a great margin, especially for a large number of VQs:
with 64 VQs, downtime drops from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/virtio/vhost-user.c | 74 +-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 03415b6c95..bb5164b753 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
 /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+VHOST_USER_PROTOCOL_F_STATUS = 16,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_MAX_MEM_SLOTS = 36,
 VHOST_USER_ADD_MEM_REG = 37,
 VHOST_USER_REM_MEM_REG = 38,
+VHOST_USER_SET_STATUS = 39,
+VHOST_USER_GET_STATUS = 40,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -1452,6 +1455,43 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int 
request, uint64_t u64,
 return 0;
 }
 
+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+uint64_t value;
+int ret;
+
+ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+if (ret < 0) {
+return ret;
+}
+*status = value;
+
+return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+uint8_t s;
+int ret;
+
+ret = vhost_user_get_status(dev, &s);
+if (ret < 0) {
+return ret;
+}
+
+if ((s & status) == status) {
+return 0;
+}
+s |= status;
+
+return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
uint64_t features)
 {
@@ -1460,6 +1500,7 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  * backend is actually logging changes
  */
 bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
+int ret;
 
 /*
  * We need to include any extra backend only feature bits that
@@ -1467,9 +1508,18 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
  * features.
  */
-return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+ret = vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
   features | dev->backend_features,
   log_enabled);
+
+if (virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_STATUS)) {
+if (!ret) {
+return vhost_user_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+}
+}
+
+return ret;
 }
 
 static int vhost_user_set_protocol_features(struct vhost_dev *dev,
@@ -2615,6 +2665,27 @@ void vhost_user_cleanup(VhostUserState *user)
 user->chr = NULL;
 }
 
+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_STATUS)) {
+return 0;
+}
+
+/* Set device status only for last queue pair */
+if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+return 0;
+}
+
+if (started) {
+return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+  VIRTIO_CONFIG_S_DRIVER |
+  VIRTIO_CONFIG_S_DRIVER_OK);
+} else {
+return vhost_user_set_status(dev, 0);
+}
+}
+
 const VhostOps user_ops = {
 .backend_type = VHOST_BACKEND_TYPE_USER,
 .vhost_backend_init = vhost_user_backend_init,
@@ -2649,4 +2720,5 @@ const VhostOps user_ops = {
 .vhost_backend_mem_secti

RE: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

2022-10-12 Thread Yajun Wu
Michael,

Don't forget to merge. Thanks!

-Original Message-
From: Yajun Wu 
Sent: Monday, September 5, 2022 12:58 PM
To: Michael S. Tsirkin 
Cc: qemu-devel@nongnu.org; Parav Pandit 
Subject: RE: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

Michael,

7.1 released, please merge.

-Original Message-
From: Michael S. Tsirkin 
Sent: Tuesday, July 26, 2022 10:56 PM
To: Yajun Wu 
Cc: qemu-devel@nongnu.org; Parav Pandit 
Subject: Re: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments


On Tue, Jul 12, 2022 at 01:54:59PM +0300, Yajun Wu wrote:
> The motivation of adding vhost-user vhost_dev_start support is to 
> improve backend configuration speed and reduce live migration VM 
> downtime.
>
> Today VQ configuration is issued one by one. For virtio net with 
> multi-queue support, backend needs to update RSS (Receive side
> scaling) on every rx queue enable. Updating RSS is time-consuming 
> (typical time like 7ms).
>
> Implement already defined vhost status and message in the vhost 
> specification [1].
> (a) VHOST_USER_PROTOCOL_F_STATUS
> (b) VHOST_USER_SET_STATUS
> (c) VHOST_USER_GET_STATUS
>
> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for 
> device start and reset(0) for device stop.
>
> On reception of the DRIVER_OK message, backend can apply the needed 
> setting only once (instead of incremental) and also utilize 
> parallelism on enabling queues.
>
> This improves QEMU's live migration downtime with vhost user backend 
> implementation by great margin, specially for the large number of VQs 
> of 64 from 800 msec to 250 msec.
>
> Another change is to move the device start routines after finishing 
> all the necessary device and VQ configuration, further aligning to the 
> virtio specification for "device initialization sequence".

I think it's best to merge this after the 7.1 release.
I've tagged this but pls ping me after the release just to make sure it's not 
lost. Thanks!

> [1]
> https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction
>
> v2:
> - add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
> - avoid adding status bits already set
>
> Yajun Wu (2):
>   vhost: Change the sequence of device start
>   vhost-user: Support vhost_dev_start
>
>  hw/block/vhost-user-blk.c | 18 ++
>  hw/net/vhost_net.c| 12 +++
>  hw/virtio/vhost-user.c| 74 ++-
>  3 files changed, 90 insertions(+), 14 deletions(-)
>
> --
> 2.27.0




RE: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

2022-09-04 Thread Yajun Wu
Michael,

7.1 released, please merge.

-Original Message-
From: Michael S. Tsirkin  
Sent: Tuesday, July 26, 2022 10:56 PM
To: Yajun Wu 
Cc: qemu-devel@nongnu.org; Parav Pandit 
Subject: Re: [PATCH v2 0/2] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments


On Tue, Jul 12, 2022 at 01:54:59PM +0300, Yajun Wu wrote:
> The motivation of adding vhost-user vhost_dev_start support is to 
> improve backend configuration speed and reduce live migration VM 
> downtime.
>
> Today VQ configuration is issued one by one. For virtio net with 
> multi-queue support, backend needs to update RSS (Receive side
> scaling) on every rx queue enable. Updating RSS is time-consuming 
> (typical time like 7ms).
>
> Implement already defined vhost status and message in the vhost 
> specification [1].
> (a) VHOST_USER_PROTOCOL_F_STATUS
> (b) VHOST_USER_SET_STATUS
> (c) VHOST_USER_GET_STATUS
>
> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for 
> device start and reset(0) for device stop.
>
> On reception of the DRIVER_OK message, backend can apply the needed 
> setting only once (instead of incremental) and also utilize 
> parallelism on enabling queues.
>
> This improves QEMU's live migration downtime with vhost user backend 
> implementation by great margin, specially for the large number of VQs 
> of 64 from 800 msec to 250 msec.
>
> Another change is to move the device start routines after finishing 
> all the necessary device and VQ configuration, further aligning to the 
> virtio specification for "device initialization sequence".

I think it's best to merge this after the 7.1 release.
I've tagged this but pls ping me after the release just to make sure it's not 
lost. Thanks!

> [1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction
>
> v2:
> - add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
> - avoid adding status bits already set
>
> Yajun Wu (2):
>   vhost: Change the sequence of device start
>   vhost-user: Support vhost_dev_start
>
>  hw/block/vhost-user-blk.c | 18 ++
>  hw/net/vhost_net.c| 12 +++
>  hw/virtio/vhost-user.c| 74 ++-
>  3 files changed, 90 insertions(+), 14 deletions(-)
>
> --
> 2.27.0




[PATCH v2 1/2] vhost: Change the sequence of device start

2022-07-12 Thread Yajun Wu
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.

Move the device start routines to run after all the necessary device and
VQ configuration has finished, further aligning with the virtio
specification's "device initialization sequence".

The following patch will add vhost-user vhost_dev_start support.
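
For reference, the "device initialization sequence" mentioned above orders the
status bits roughly as sketched below. This is only a paraphrase of the virtio
specification using the standard status-bit names, added here for
illustration; it is not code from this series.

/*
 * Rough paraphrase of the virtio "device initialization sequence"
 * (driver side). The bit values match linux/virtio_config.h; the
 * function itself is purely illustrative.
 */
#include <stdint.h>

#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8

static uint8_t driver_init_sequence(void)
{
    uint8_t status = 0;                     /* reset the device */

    status |= VIRTIO_CONFIG_S_ACKNOWLEDGE;  /* guest noticed the device */
    status |= VIRTIO_CONFIG_S_DRIVER;       /* guest knows how to drive it */
    /* ... read device features, write back the accepted subset ... */
    status |= VIRTIO_CONFIG_S_FEATURES_OK;  /* feature negotiation done */
    /* ... set up virtqueues and device-specific configuration ... */
    status |= VIRTIO_CONFIG_S_DRIVER_OK;    /* device is live */

    return status;
}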

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/block/vhost-user-blk.c | 18 +++---
 hw/net/vhost_net.c| 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..972ef46365 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -163,13 +163,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 goto err_guest_notifiers;
 }
 
-ret = vhost_dev_start(&s->dev, vdev);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Error starting vhost");
-goto err_guest_notifiers;
-}
-s->started_vu = true;
-
 /* guest_notifier_mask/pending not used yet, so just unmask
  * everything here. virtio-pci will do the right thing by
  * enabling/disabling irqfd.
@@ -178,9 +171,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 vhost_virtqueue_mask(&s->dev, vdev, i, false);
 }
 
+s->dev.vq_index_end = s->dev.nvqs;
+ret = vhost_dev_start(&s->dev, vdev);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Error starting vhost");
+goto err_guest_notifiers;
+}
+s->started_vu = true;
+
 return ret;
 
 err_guest_notifiers:
+for (i = 0; i < s->dev.nvqs; i++) {
+vhost_virtqueue_mask(&s->dev, vdev, i, true);
+}
 k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
 vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..6a8a6082a2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -370,21 +370,21 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
*ncs,
 } else {
 peer = qemu_get_peer(ncs, n->max_queue_pairs);
 }
-r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-if (r < 0) {
-goto err_start;
-}
 
 if (peer->vring_enable) {
 /* restore vring enable state */
 r = vhost_set_vring_enable(peer, peer->vring_enable);
 
 if (r < 0) {
-vhost_net_stop_one(get_vhost_net(peer), dev);
 goto err_start;
 }
 }
+
+r = vhost_net_start_one(get_vhost_net(peer), dev);
+if (r < 0) {
+vhost_net_stop_one(get_vhost_net(peer), dev);
+goto err_start;
+}
 }
 
 return 0;
-- 
2.27.0




[PATCH v2 0/2] vhost-user: Support vhost_dev_start

2022-07-12 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, the backend needs to update RSS (Receive Side
Scaling) on every RX queue enable. Updating RSS is time-consuming
(typically around 7 ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues
in parallel.
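
As an aside, the sketch below shows how a backend might act on that DRIVER_OK
transition, deferring RSS programming and queue enablement until the status
arrives. It is a minimal illustration only; the structure and helper names
(apply_rss_config, enable_queue, reset_device) are hypothetical and are not
part of this series or of any particular backend.

/*
 * Hypothetical vhost-user backend sketch (not part of this series):
 * defer per-queue work until VHOST_USER_SET_STATUS carries DRIVER_OK,
 * then apply RSS once and enable all queues together.
 */
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4

struct backend_dev {
    uint8_t status;
    unsigned num_queues;
    bool configured;
};

static void apply_rss_config(struct backend_dev *dev) { /* program RSS once */ }
static void enable_queue(struct backend_dev *dev, unsigned q) { /* per-queue enable */ }
static void reset_device(struct backend_dev *dev) { dev->configured = false; }

static void handle_set_status(struct backend_dev *dev, uint8_t status)
{
    dev->status = status;

    if (status == 0) {
        /* VHOST_USER_SET_STATUS with 0 means device stop/reset. */
        reset_device(dev);
        return;
    }

    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
        if (!dev->configured) {
            apply_rss_config(dev);      /* once, not per RX queue */
            dev->configured = true;
        }
        /* Queues can now be enabled in parallel, e.g. one per worker. */
        for (unsigned q = 0; q < dev->num_queues; q++) {
            enable_queue(dev, q);
        }
    }
}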

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a large margin, especially for a large number of VQs;
with 64 VQs it drops from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the
necessary device and VQ configuration, further aligning to the virtio
specification for "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

v2:
- add setting status bit VIRTIO_CONFIG_S_FEATURES_OK
- avoid adding status bits already set

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 ++
 hw/net/vhost_net.c| 12 +++
 hw/virtio/vhost-user.c| 74 ++-
 3 files changed, 90 insertions(+), 14 deletions(-)

-- 
2.27.0




[PATCH v2 2/2] vhost-user: Support vhost_dev_start

2022-07-12 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, the backend needs to update RSS (Receive Side
Scaling) on every RX queue enable. Updating RSS is time-consuming
(typically around 7 ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues
in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a large margin, especially for a large number of VQs;
with 64 VQs it drops from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/virtio/vhost-user.c | 74 +-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 75b8df21a4..e6ad4c05b8 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
 /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+VHOST_USER_PROTOCOL_F_STATUS = 16,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_MAX_MEM_SLOTS = 36,
 VHOST_USER_ADD_MEM_REG = 37,
 VHOST_USER_REM_MEM_REG = 38,
+VHOST_USER_SET_STATUS = 39,
+VHOST_USER_GET_STATUS = 40,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -1451,6 +1454,43 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int 
request, uint64_t u64,
 return 0;
 }
 
+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+uint64_t value;
+int ret;
+
+ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+if (ret < 0) {
+return ret;
+}
+*status = value;
+
+return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+uint8_t s;
+int ret;
+
+ret = vhost_user_get_status(dev, &s);
+if (ret < 0) {
+return ret;
+}
+
+if ((s & status) == status) {
+return 0;
+}
+s |= status;
+
+return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
uint64_t features)
 {
@@ -1459,9 +1499,19 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  * backend is actually logging changes
  */
 bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
+int ret;
 
-return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
+ret = vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
   log_enabled);
+
+if (virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_STATUS)) {
+if (!ret) {
+return vhost_user_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+}
+}
+
+return ret;
 }
 
 static int vhost_user_set_protocol_features(struct vhost_dev *dev,
@@ -2607,6 +2657,27 @@ void vhost_user_cleanup(VhostUserState *user)
 user->chr = NULL;
 }
 
+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_STATUS)) {
+return 0;
+}
+
+/* Set device status only for last queue pair */
+if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+return 0;
+}
+
+if (started) {
+return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+  VIRTIO_CONFIG_S_DRIVER |
+  VIRTIO_CONFIG_S_DRIVER_OK);
+} else {
+return vhost_user_set_status(dev, 0);
+}
+}
+
 const VhostOps user_ops = {
 .backend_type = VHOST_BACKEND_TYPE_USER,
 .vhost_backend_init = vhost_user_backend_init,
@@ -2641,4 +2712,5 @@ const VhostOps user_ops = {
 .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
 .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
 .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
+.vhost_dev_start = vhost_user_dev_start,
 };
-- 
2.27.0




[PATCH 0/2] vhost-user: Support vhost_dev_start

2022-06-28 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, the backend needs to update RSS (Receive Side
Scaling) on every RX queue enable. Updating RSS is time-consuming
(typically around 7 ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues
in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a large margin, especially for a large number of VQs;
with 64 VQs it drops from 800 msec to 250 msec.

Another change is to move the device start routines after finishing all the
necessary device and VQ configuration, further aligning to the virtio
specification for "device initialization sequence".

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 +++-
 hw/net/vhost_net.c| 12 
 hw/virtio/vhost-user.c| 58 +++
 3 files changed, 75 insertions(+), 13 deletions(-)

-- 
2.30.2




[PATCH 1/2] vhost: Change the sequence of device start

2022-06-28 Thread Yajun Wu
This patch is part of adding vhost-user vhost_dev_start support. The
motivation is to improve backend configuration speed and reduce live
migration VM downtime.

Move the device start routines to run after all the necessary device and
VQ configuration has finished, further aligning with the virtio
specification's "device initialization sequence".

The following patch will add vhost-user vhost_dev_start support.

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 

---
 hw/block/vhost-user-blk.c | 18 +++---
 hw/net/vhost_net.c| 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..972ef46365 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -163,13 +163,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 goto err_guest_notifiers;
 }
 
-ret = vhost_dev_start(&s->dev, vdev);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Error starting vhost");
-goto err_guest_notifiers;
-}
-s->started_vu = true;
-
 /* guest_notifier_mask/pending not used yet, so just unmask
  * everything here. virtio-pci will do the right thing by
  * enabling/disabling irqfd.
@@ -178,9 +171,20 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error 
**errp)
 vhost_virtqueue_mask(&s->dev, vdev, i, false);
 }
 
+s->dev.vq_index_end = s->dev.nvqs;
+ret = vhost_dev_start(&s->dev, vdev);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Error starting vhost");
+goto err_guest_notifiers;
+}
+s->started_vu = true;
+
 return ret;
 
 err_guest_notifiers:
+for (i = 0; i < s->dev.nvqs; i++) {
+vhost_virtqueue_mask(&s->dev, vdev, i, true);
+}
 k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
 vhost_dev_disable_notifiers(&s->dev, vdev);
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..6a8a6082a2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -370,21 +370,21 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
*ncs,
 } else {
 peer = qemu_get_peer(ncs, n->max_queue_pairs);
 }
-r = vhost_net_start_one(get_vhost_net(peer), dev);
-
-if (r < 0) {
-goto err_start;
-}
 
 if (peer->vring_enable) {
 /* restore vring enable state */
 r = vhost_set_vring_enable(peer, peer->vring_enable);
 
 if (r < 0) {
-vhost_net_stop_one(get_vhost_net(peer), dev);
 goto err_start;
 }
 }
+
+r = vhost_net_start_one(get_vhost_net(peer), dev);
+if (r < 0) {
+vhost_net_stop_one(get_vhost_net(peer), dev);
+goto err_start;
+}
 }
 
 return 0;
-- 
2.27.0




[PATCH 2/2] vhost-user: Support vhost_dev_start

2022-06-28 Thread Yajun Wu
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, the backend needs to update RSS (Receive Side
Scaling) on every RX queue enable. Updating RSS is time-consuming
(typically around 7 ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, the backend can apply the needed
settings only once (instead of incrementally) and can also enable queues
in parallel.

This improves QEMU's live migration downtime with a vhost-user backend
implementation by a large margin, especially for a large number of VQs;
with 64 VQs it drops from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/virtio/vhost-user.c | 58 ++
 1 file changed, 58 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 4b9be26e84..9f75c51dc2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -81,6 +81,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
 /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+VHOST_USER_PROTOCOL_F_STATUS = 16,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -126,6 +127,8 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_MAX_MEM_SLOTS = 36,
 VHOST_USER_ADD_MEM_REG = 37,
 VHOST_USER_REM_MEM_REG = 38,
+VHOST_USER_SET_STATUS = 39,
+VHOST_USER_GET_STATUS = 40,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -1446,6 +1449,39 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int 
request, uint64_t u64,
 return 0;
 }
 
+static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)
+{
+return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false);
+}
+
+static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status)
+{
+uint64_t value;
+int ret;
+
+ret = vhost_user_get_u64(dev, VHOST_USER_GET_STATUS, &value);
+if (ret < 0) {
+return ret;
+}
+*status = value;
+
+return 0;
+}
+
+static int vhost_user_add_status(struct vhost_dev *dev, uint8_t status)
+{
+uint8_t s;
+int ret;
+
+ret = vhost_user_get_status(dev, &s);
+if (ret < 0) {
+return ret;
+}
+s |= status;
+
+return vhost_user_set_status(dev, s);
+}
+
 static int vhost_user_set_features(struct vhost_dev *dev,
uint64_t features)
 {
@@ -2602,6 +2638,27 @@ void vhost_user_cleanup(VhostUserState *user)
 user->chr = NULL;
 }
 
+static int vhost_user_dev_start(struct vhost_dev *dev, bool started)
+{
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_STATUS)) {
+return 0;
+}
+
+/* Set device status only for last queue pair */
+if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+return 0;
+}
+
+if (started) {
+return vhost_user_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+  VIRTIO_CONFIG_S_DRIVER |
+  VIRTIO_CONFIG_S_DRIVER_OK);
+} else {
+return vhost_user_set_status(dev, 0);
+}
+}
+
 const VhostOps user_ops = {
 .backend_type = VHOST_BACKEND_TYPE_USER,
 .vhost_backend_init = vhost_user_backend_init,
@@ -2635,4 +2692,5 @@ const VhostOps user_ops = {
 .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
 .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
 .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
+.vhost_dev_start = vhost_user_dev_start,
 };
-- 
2.27.0




RE: [PATCH] virtio/vhost-user: Fix wrong vhost notifier GPtrArray size

2022-06-13 Thread Yajun Wu
Hi Michael,

User-space vhost clients have been broken for a few weeks now without this fix.
With Alex's review, can you please merge it if there are no further comments?

Thanks.

-Original Message-
From: Alex Bennée  
Sent: Thursday, May 26, 2022 3:09 PM
To: Yajun Wu 
Cc: qemu-devel@nongnu.org; m...@redhat.com; Parav Pandit 
Subject: Re: [PATCH] virtio/vhost-user: Fix wrong vhost notifier GPtrArray size

External email: Use caution opening links or attachments


Yajun Wu  writes:

> In fetch_or_create_notifier, idx begins with 0. So the GPtrArray size 
> should be idx + 1 and g_ptr_array_set_size should be called with idx + 1.
>
> This wrong GPtrArray size causes fetch_or_create_notifier return an 
> invalid address. Passing this invalid pointer to 
> vhost_user_host_notifier_remove causes assert fail:
>
> qemu/include/qemu/int128.h:27: int128_get64: Assertion `r == a' failed.
>   shutting down, reason=crashed
>
> Backends like dpdk-vdpa which sends out vhost notifier requests almost 
> always hit qemu crash.

My bad. I was looking for ways to exercise this code but the internal tests 
didn't do it. I guess I should look at getting a test setup for dpdk. Anyway:

Reviewed-by: Alex Bennée 


>
> Fixes: 503e355465 ("virtio/vhost-user: dynamically assign 
> VhostUserHostNotifiers")
> Signed-off-by: Yajun Wu 
> Acked-by: Parav Pandit 
> Change-Id: I87e0f7591ca9a59d210879b260704a2d9e9d6bcd
> ---
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 
> b040c1ad2b..dbc690d16c 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1525,7 +1525,7 @@ static VhostUserHostNotifier 
> *fetch_or_create_notifier(VhostUserState *u,  {
>  VhostUserHostNotifier *n = NULL;
>  if (idx >= u->notifiers->len) {
> -g_ptr_array_set_size(u->notifiers, idx);
> +g_ptr_array_set_size(u->notifiers, idx + 1);
>  }
>
>  n = g_ptr_array_index(u->notifiers, idx);


--
Alex Bennée


[PATCH] virtio/vhost-user: Fix wrong vhost notifier GPtrArray size

2022-05-25 Thread Yajun Wu
In fetch_or_create_notifier, idx begins at 0, so the GPtrArray size
should be idx + 1 and g_ptr_array_set_size should be called with idx + 1.

This wrong GPtrArray size causes fetch_or_create_notifier to return an
invalid address. Passing this invalid pointer to
vhost_user_host_notifier_remove causes an assertion failure:

qemu/include/qemu/int128.h:27: int128_get64: Assertion `r == a' failed.
shutting down, reason=crashed

Backends like dpdk-vdpa, which send out vhost notifier requests, almost
always hit a QEMU crash.
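
For anyone unfamiliar with GLib's GPtrArray, the standalone snippet below
(not QEMU code) illustrates the off-by-one: to index element idx, the array
length must be at least idx + 1.

/*
 * Standalone GLib demonstration of the off-by-one (not QEMU code).
 * g_ptr_array_index(a, idx) is only valid when a->len > idx, so the
 * array has to be grown to idx + 1 elements before element idx is read.
 */
#include <glib.h>
#include <stdio.h>

int main(void)
{
    GPtrArray *notifiers = g_ptr_array_new();
    guint idx = 3;

    /* g_ptr_array_set_size(notifiers, idx) would leave len == 3,
     * making index 3 out of range. Growing to idx + 1 is correct;
     * new slots are initialised to NULL. */
    g_ptr_array_set_size(notifiers, idx + 1);

    gpointer n = g_ptr_array_index(notifiers, idx);
    printf("slot %u holds %p\n", idx, n);

    g_ptr_array_free(notifiers, TRUE);
    return 0;
}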

Fixes: 503e355465 ("virtio/vhost-user: dynamically assign 
VhostUserHostNotifiers")
Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
Change-Id: I87e0f7591ca9a59d210879b260704a2d9e9d6bcd
---
 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b040c1ad2b..dbc690d16c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1525,7 +1525,7 @@ static VhostUserHostNotifier 
*fetch_or_create_notifier(VhostUserState *u,
 {
 VhostUserHostNotifier *n = NULL;
 if (idx >= u->notifiers->len) {
-g_ptr_array_set_size(u->notifiers, idx);
+g_ptr_array_set_size(u->notifiers, idx + 1);
 }
 
 n = g_ptr_array_index(u->notifiers, idx);
-- 
2.36.0




[PATCH v2] hw/virtio: Fix leak of host-notifier memory-region

2021-08-15 Thread Yajun Wu via
If the call to virtio_queue_set_host_notifier_mr fails, the
host-notifier memory region should be freed.

Fixes: 44866521bd ("vhost-user: support registering external host notifiers")
Signed-off-by: Yajun Wu 
---
 hw/virtio/vhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index aec6cc1..3ae5297 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1474,6 +1474,7 @@ static int 
vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
 g_free(name);
 
 if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
+object_unparent(OBJECT(&n->mr));
 munmap(addr, page_size);
 return -1;
 }
-- 
1.8.3.1




[PATCH] hw/virtio: Fix leak of host-notifier memory-region

2021-08-11 Thread Yajun Wu via
If the call to virtio_queue_set_host_notifier_mr fails, the
host-notifier memory region should be freed.

Signed-off-by: Yajun Wu 
---
 hw/virtio/vhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index aec6cc1..3ae5297 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1474,6 +1474,7 @@ static int 
vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
 g_free(name);
 
 if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
+object_unparent(OBJECT(&n->mr));
 munmap(addr, page_size);
 return -1;
 }
-- 
1.8.3.1




Any reason VIRTQUEUE_MAX_SIZE is 1024? Can we increase this limit?

2020-07-30 Thread Yajun Wu
Hi all,

I'm running an iperf test on a VIRTIO net device through vhost-user (HW vDPA).
I find that the maximal acceptable tx_queue_size/rx_queue_size is 1024.
Basically, increasing the queue size gives a better RX rate in my case.

Can we increase the limit (VIRTQUEUE_MAX_SIZE) to 8192 to possibly gain
better performance?
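
For context, the cap comes from the VIRTQUEUE_MAX_SIZE define in
include/hw/virtio/virtio.h, and virtio-net validates the user-supplied queue
sizes against it when the device is realized. The snippet below is only an
illustrative sketch of that kind of check, with an assumed 256 lower bound and
a made-up function name; it is not the exact QEMU code, and raising the define
alone may not be enough since other QEMU-internal structures are sized by it.

/*
 * Illustrative sketch (not the exact QEMU code) of how a queue-size
 * property could be validated against the virtqueue cap. The 256 lower
 * bound and the function name are assumptions for the example.
 */
#include <stdbool.h>
#include <stdint.h>

#define VIRTQUEUE_MAX_SIZE 1024   /* current compile-time limit in QEMU */

static bool queue_size_is_valid(uint32_t size)
{
    /* power of two, within [256, VIRTQUEUE_MAX_SIZE] */
    return size >= 256 &&
           size <= VIRTQUEUE_MAX_SIZE &&
           (size & (size - 1)) == 0;
}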

Thanks,
Yajun