Thanks,
Feng Li

On Thu, Oct 14, 2021 at 8:28 PM Maxime Coquelin
<maxime.coque...@redhat.com> wrote:
>
>
>
> On 10/14/21 14:12, Li Feng wrote:
> > On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin
> > <maxime.coque...@redhat.com> wrote:
> >>
> >> Hi Li,
> >>
> >> The commit message is not compliant with the contributors guidelines:
> >> https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-subject-line
> > OK, I got it.
> >>
> >> On 9/3/21 10:02, Li Feng wrote:
> >>> Vhost-user client must send the mem table, kick fd, call fd on all
> >>> virtqueues, then the device will be VIRTIO_DEV_RUNNING.
> >>>
> >>> If the vhost-user communication is initialized partly, e.g.
> >>> - When initializing the vhost-user, try to restart the vhost-user
> >>> backend;
> >>> - Seabios only initialized the vhost-scsi req vq.
> >>> The device is not with flags VIRTIO_DEV_RUNNING.
> >>>
> >>> Root Cause:
> >>> The vhost session has been created, and added the scsi/blk requestq
> >>> poller into reactor, but when destroying the device, the requestq is not
> >>> unregistered.
> >>>
> >>> Reproduce the crash on spdk vhost-user backend:
> >>> 1. Create a VM;
> >>> 2. Mount an ISO to a VM, start the VM, don't install the OS;
> >>> 3. Restart the spdk_tgt;
> >>>
> >>> Another discussion is in seabios:
> >>> https://patchew.org/Seabios/20210831122339.2591585-1-fen...@smartx.com/
> >>
> >> This is a fix, so you need to add the Fixes tag and cc stable.
> > Acked.
> >
> >>
> >>> Signed-off-by: Li Feng <fen...@smartx.com>
> >>> ---
> >>> v2:
> >>> Fix the commit msg typo: vas -> virtqueues.
> >>> --
> >>> lib/vhost/vhost.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> >>> index 355ff37651..191ba82c41 100644
> >>> --- a/lib/vhost/vhost.c
> >>> +++ b/lib/vhost/vhost.c
> >>> @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev)
> >>>                 if (vdpa_dev)
> >>>                         vdpa_dev->ops->dev_close(dev->vid);
> >>>                 dev->flags &= ~VIRTIO_DEV_RUNNING;
> >>> -               dev->notify_ops->destroy_device(dev->vid);
> >>>         }
> >>> +       dev->notify_ops->destroy_device(dev->vid);
> >>
> >> .destroy_device() is the counter-part of .new_device().
> >> VIRTIO_DEV_RUNNING is set only when .new_device() has been called with
> >> success, and cleared when .destroy_device() is called.
> >>
> >> So I disagree with the fix, we want to keep the correlation between
> >> VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing otherwise
> >> could lead to regressions with other applications than yours.
> >>
> >> What is not clear from the commit message or the discussion you link is
> >> where does it crash exactly. Is it in SPDK, in DPDK?
> >
> > The crash is in SPDK, the poller is still running in the reactor,
> > however, the device is freed.
> >
> > I really don't have a good method to handle this partly initialized virt
> > queues.
> > This is another patch I prepared to fix this issue:
> >
> > From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 00:00:00 2001
> > From: Li Feng <fen...@smartx.com>
> > Date: Wed, 1 Sep 2021 16:51:44 +0800
> > Subject: [PATCH] vhost: fix vhost session crash
> >
> > If any vq is initialized, treat the dev as being in RUNNING status.
> >
> > Root Cause:
> > The session has been created, and added the requestq poller into
> > reactor, but when destroying the device, the requestq is not
> > unregistered.
> > The seabios only initialized the req vq (idx = 2), ignoring the controlq
> > and eventq vq.
> >
> > Reproduce:
> > 1. Create a VM;
> > 2. Mount an ISO to a VM, start the VM, don't install the OS;
> > 3. Restart the zbs-chunkd;
> >
> > Signed-off-by: Li Feng <fen...@smartx.com>
> > Change-Id: I21292e58b0b08237b5d105359095ec6a31907752
> > ---
> > lib/librte_vhost/vhost_user.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index f211ec8..a80e9f4 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev)
> >                         "kickfd: %d callfd: %d enabled: %d\n",
> >                         dev->ifname, vq, i, vq->desc, vq->avail,
> >                         vq->used, vq->kickfd, vq->callfd, vq->enabled);
> > -               if (!vq_is_ready(dev, vq))
> > -                       return 0;
> > +               if (vq_is_ready(dev, vq))
> > +                       break;
> >         }
> > +       if (i == nr_vring)
> > +               return 0;
> >
> >         /* If supported, ensure the frontend is really done with config */
> >         if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_STATUS))
> >
>
> Above patch will also cause regression, as networking backends work
> with queue pairs.
>
> So your issue is that SPDK is processing vrings while DPDK considers the
> device as not running. Instead of working around that issue, maybe what
> you should do is to introduce a new API and mechanism to help DPDK
> determine whether it should consider the device ready based on the
> backend type. IIUC, in your Vhost-scsi case, it should be as soon as VQ
> 2 is ready?

Yes, once VQ 2 is ready, the vhost-scsi device should be considered ready.
I had this idea months ago; if it's acceptable, I will try to implement it.

Thanks.
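
For illustration only, a rough sketch of what a readiness check keyed on
the backend type might look like. Every name below (backend_type,
dev_state, required_vq_mask, dev_is_ready, MAX_VRING) is hypothetical and
is not an existing DPDK API; the structs are simplified stand-ins for
struct virtio_net and its virtqueues. The only assumptions taken from the
thread are that vhost-scsi can be treated as ready once VQ 2 (the first
request queue) is ready, and that other backends keep the current
"all vrings ready" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VRING 8 /* hypothetical limit, for this sketch only */

/* Hypothetical backend classification; not part of DPDK. */
enum backend_type { BACKEND_NET, BACKEND_SCSI };

/* Simplified stand-in for a virtqueue's readiness state. */
struct vq_state {
	bool ready;
};

/* Simplified stand-in for struct virtio_net. */
struct dev_state {
	enum backend_type type;
	uint32_t nr_vring;
	struct vq_state vq[MAX_VRING];
};

/* Return a bitmask of the vrings that must be ready for this backend. */
static uint64_t
required_vq_mask(const struct dev_state *dev)
{
	switch (dev->type) {
	case BACKEND_SCSI:
		/* controlq = 0, eventq = 1, first requestq = 2 */
		return 1ULL << 2;
	case BACKEND_NET:
	default:
		/* keep the existing rule: every vring must be ready */
		return (1ULL << dev->nr_vring) - 1;
	}
}

/* The device is ready when all required vrings are ready. */
static bool
dev_is_ready(const struct dev_state *dev)
{
	uint64_t ready = 0;
	uint32_t i;

	assert(dev->nr_vring <= MAX_VRING);
	if (dev->nr_vring == 0)
		return false;

	for (i = 0; i < dev->nr_vring; i++) {
		if (dev->vq[i].ready)
			ready |= 1ULL << i;
	}

	return (ready & required_vq_mask(dev)) == required_vq_mask(dev);
}
```

With this shape, a net device with only one of its two vrings ready is
still reported as not ready, so the queue-pair behaviour is unchanged,
while a scsi device becomes ready as soon as VQ 2 is. How the backend
type would actually be communicated to the vhost library is left open
here.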