On 2024/2/21 17:02, Maxime Coquelin wrote:


On 2/20/24 12:43, Michael S. Tsirkin wrote:
On Tue, Feb 20, 2024 at 12:26:49PM +0100, Maxime Coquelin wrote:


On 2/13/24 11:05, Michael S. Tsirkin wrote:
On Fri, Jan 26, 2024 at 06:07:37PM +0800, Hao Chen wrote:
I run "dpdk-vdpa" and "qemu-L2" in "qemu-L1".

In a nested virtualization environment, the "qemu-L2" vhost-user socket
sends a "VHOST_USER_IOTLB_MSG" message to "dpdk-vdpa" and blocks waiting
for "dpdk-vdpa" to process the message.
If "dpdk-vdpa" has not finished processing the "VHOST_USER_IOTLB_MSG"
message and, from another thread, sends a message that itself needs a
reply, such as "VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG", then
"dpdk-vdpa" also blocks, waiting for "qemu-L2" to process that message.
However, the "qemu-L2" vhost-user socket is blocked waiting for the
reply to "VHOST_USER_IOTLB_MSG", so
"VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG" is never processed.
At that point both "dpdk-vdpa" and "qemu-L2" are blocked on a vhost
read, resulting in a deadlock.
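
To make the interleaving concrete, here is a minimal, self-contained
sketch of the same blocking pattern (not QEMU or DPDK code; the channel
names and one-byte "messages" are hypothetical stand-ins for the real
protocol):

    /* deadlock_demo.c - build: cc deadlock_demo.c -o deadlock_demo
     * "main_ch" models the vhost-user socket, "slave_ch" the backend
     * channel; single bytes stand in for the real messages. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int main_ch[2], slave_ch[2];
        char buf;

        socketpair(AF_UNIX, SOCK_STREAM, 0, main_ch);
        socketpair(AF_UNIX, SOCK_STREAM, 0, slave_ch);

        if (fork() == 0) {
            /* Backend ("dpdk-vdpa"): receives the IOTLB request, but
             * before replying it sends a host-notifier request on the
             * backend channel and blocks waiting for its reply-ack. */
            recv(main_ch[1], &buf, 1, 0);  /* VHOST_USER_IOTLB_MSG   */
            send(slave_ch[1], "N", 1, 0);  /* ..._HOST_NOTIFIER_MSG  */
            recv(slave_ch[1], &buf, 1, 0); /* blocks forever: no ack */
            send(main_ch[1], "A", 1, 0);   /* IOTLB reply, never sent */
            return 0;
        }

        /* Frontend ("qemu-L2"): sends the IOTLB request and blocks on
         * the main socket for its reply; the backend channel is not
         * serviced meanwhile, so the notifier request is never
         * answered and both processes block forever. */
        send(main_ch[0], "I", 1, 0);
        recv(main_ch[0], &buf, 1, 0);      /* blocks forever */
        printf("unreachable: deadlock\n");
        return 0;
    }

Running it hangs both processes, matching the two blocked vhost reads
described above.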

This issue can be fixed by changing either
"VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG" or "VHOST_USER_IOTLB_MSG" to
a "no reply needed" message. Since DPDK has too many messages similar
to "VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG", I would prefer the
latter.

Fixes: 24e34754eb78 ("vhost-user: factor out msg head and payload")

Signed-off-by: Hao Chen <ch...@yusur.tech>

I would be very worried that IOTLB becomes stale and
guest memory is corrupted if we just proceed without waiting.

Maxime what do you think? How would you address the issue?

I agree with you, this is not possible.
For example, in the case of an IOTLB invalidate, the frontend relies on
the backend's reply to ensure the backend is no longer accessing the
memory before proceeding.
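
As a hedged illustration of why that reply matters, here is a toy model
(not QEMU code; the "page", the mutex and the ack flag are all
hypothetical): the frontend may only tear down the mapping once the
backend has acknowledged the invalidate, otherwise the free() races
with the backend's last access.

    /* iotlb_ack_demo.c - build: cc iotlb_ack_demo.c -o demo -lpthread
     * Toy model: "page" stands in for a guest mapping, "acked" for
     * the backend's reply-ack to the IOTLB invalidate message. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static char *page;                 /* "guest memory" in use   */
    static int acked;                  /* invalidate acknowledged */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    static void *backend(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        /* Backend performs its last access to the mapping... */
        printf("backend reads: %c\n", page[0]);
        /* ...and only then acks the invalidate (the reply the
         * frontend waits for when NEED_REPLY is set). */
        acked = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        page = strdup("G");
        pthread_create(&t, NULL, backend, NULL);

        /* Frontend: the invalidate must not complete until the
         * backend acks; freeing the page before that would be a
         * use-after-free from the backend thread's point of view. */
        pthread_mutex_lock(&lock);
        while (!acked) {
            pthread_cond_wait(&cond, &lock);
        }
        pthread_mutex_unlock(&lock);

        free(page);                    /* safe: backend is done */
        pthread_join(t, NULL);
        return 0;
    }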

The reply-ack for the VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG
request is less important: if it fails, host notifications won't work,
but there is no risk of corruption. Maybe on the QEMU side we could
fail init if processing the request fails, as I think that once the
feature is negotiated, we can expect it to succeed.

What do you think about this proposal?

Regards,
Maxime

Fundamentally, I think it is OK if QEMU blocks the guest while waiting
for a reply, but it really has to keep processing incoming messages
meanwhile.
The same should apply to the backend, I think ...
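
A minimal sketch of that idea, under stated assumptions (a hypothetical
one-byte framing where lowercase bytes are requests and uppercase bytes
are replies; this is not QEMU's actual vhost_user_read()): while
waiting for its own reply, the waiter keeps reading and services any
request that arrives in the meantime.

    /* reply_wait_demo.c - build: cc reply_wait_demo.c -o demo */
    #include <ctype.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void handle_request(int fd, char req)
    {
        /* Answer the peer's request immediately, so the peer is
         * never left blocked while we wait for our own reply. */
        char reply = toupper((unsigned char)req);
        send(fd, &reply, 1, 0);
    }

    /* Wait for our reply, servicing interleaved requests. */
    static int wait_for_reply(int fd)
    {
        char msg;

        for (;;) {
            if (recv(fd, &msg, 1, 0) <= 0) {
                return -1;                 /* peer closed or error */
            }
            if (isupper((unsigned char)msg)) {
                return msg;                /* this is our reply    */
            }
            handle_request(fd, msg);       /* a request: serve now */
        }
    }

    int main(void)
    {
        int fds[2];
        char buf;

        socketpair(AF_UNIX, SOCK_STREAM, 0, fds);

        if (fork() == 0) {
            /* Peer: the pattern that deadlocked before -- it sends
             * its own request and waits for its ack before replying. */
            recv(fds[1], &buf, 1, 0);      /* our request 'i'    */
            send(fds[1], "n", 1, 0);       /* its own request    */
            recv(fds[1], &buf, 1, 0);      /* our ack 'N'        */
            send(fds[1], "I", 1, 0);       /* finally, our reply */
            return 0;
        }

        send(fds[0], "i", 1, 0);
        printf("got reply: %c\n", wait_for_reply(fds[0]));
        return 0;
    }

Here the exchange completes ("got reply: I") instead of deadlocking,
because the waiter answers the interleaved request itself.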

I understand your point.
For the DPDK vhost library, it will likely imply ABI breakage, as it
would require asynchronous handling of vhost-user requests. We would
only be able to do it in the next LTS release.

Hao, as your driver is not available upstream it will be difficult to
assist you more. But if you look at other DPDK vDPA drivers, like SFC
for instance, the way they implemented host notification control should
be safe against this kind of deadlock.

Okay, I can also avoid this issue by sending the
"VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG" message as late as possible,
to avoid conflicts with the "VHOST_USER_IOTLB_MSG" message. In summary,
thank you.
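
A rough sketch of that workaround (the queue and function names are
hypothetical, not the actual driver code): instead of sending the
notifier message from inside another request's handler, record it and
flush it from the message loop once the in-flight request has been
fully answered.

    /* defer_notifier.c - hypothetical sketch, not the real driver.
     * Reply-needing backend messages are queued instead of being
     * sent while another request may still be in flight. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define MAX_DEFERRED 16

    static int deferred_vrings[MAX_DEFERRED];
    static size_t n_deferred;

    /* Called where the driver used to send
     * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG directly; now it
     * only records the intent. */
    static bool defer_host_notifier(int vring_idx)
    {
        if (n_deferred == MAX_DEFERRED) {
            return false;
        }
        deferred_vrings[n_deferred++] = vring_idx;
        return true;
    }

    /* Called from the vhost-user message loop after the current
     * request (e.g. VHOST_USER_IOTLB_MSG) has been processed and
     * replied to, so the frontend is reading the socket again and
     * can answer the reply-ack. */
    static void flush_host_notifiers(void)
    {
        for (size_t i = 0; i < n_deferred; i++) {
            /* The real driver would send the notifier message and
             * wait for its reply-ack here. */
            printf("send notifier for vring %d\n", deferred_vrings[i]);
        }
        n_deferred = 0;
    }

    int main(void)
    {
        defer_host_notifier(0);    /* queued during IOTLB handling */
        defer_host_notifier(1);
        flush_host_notifiers();    /* sent once the socket is idle */
        return 0;
    }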

---
   hw/virtio/vhost-user.c | 10 ++--------
   1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index f214df804b..02caa94b6c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2371,20 +2371,14 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
   static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
                                               struct vhost_iotlb_msg *imsg)
   {
-    int ret;
       VhostUserMsg msg = {
           .hdr.request = VHOST_USER_IOTLB_MSG,
           .hdr.size = sizeof(msg.payload.iotlb),
-        .hdr.flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+        .hdr.flags = VHOST_USER_VERSION,
           .payload.iotlb = *imsg,
       };
-    ret = vhost_user_write(dev, &msg, NULL, 0);
-    if (ret < 0) {
-        return ret;
-    }
-
-    return process_message_reply(dev, &msg);
+    return vhost_user_write(dev, &msg, NULL, 0);
   }
--
2.27.0