Thanks,
Feng Li

On Thu, Oct 14, 2021 at 8:28 PM Maxime Coquelin
<maxime.coque...@redhat.com> wrote:
>
>
>
> On 10/14/21 14:12, Li Feng wrote:
> > On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin
> > <maxime.coque...@redhat.com> wrote:
> >>
> >> Hi Li,
> >>
> >> The commit message is not compliant with the contributors guidelines:
> >> https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-subject-line
> > OK, I got it.
> >>
> >> On 9/3/21 10:02, Li Feng wrote:
> >>> The vhost-user client must send the mem table, kick fd and call fd for all
> >>> virtqueues before the device is marked VIRTIO_DEV_RUNNING.
> >>>
> >>> If the vhost-user communication is only partially initialized, e.g.:
> >>> - the vhost-user backend is restarted while vhost-user is still being
> >>>       initialized;
> >>> - SeaBIOS has initialized only the vhost-scsi request vq;
> >>> then the device never gets the VIRTIO_DEV_RUNNING flag.
> >>>
> >>> Root Cause:
> >>> The vhost session has been created and the scsi/blk requestq poller
> >>> has been added to the reactor, but when the device is destroyed, the
> >>> requestq poller is not unregistered.
> >>>
> >>> Reproduce the crash on spdk vhost-user backend:
> >>> 1. Create a VM;
> >>> 2. Mount an ISO to the VM, start the VM, but don't install the OS;
> >>> 3. Restart the spdk_tgt;
> >>>
> >>> A related discussion on the SeaBIOS side:
> >>> https://patchew.org/Seabios/20210831122339.2591585-1-fen...@smartx.com/
> >>
> >> This is a fix, so you need to add the Fixes tag and cc stable.
> > Acked.
> >
> >>
> >>> Signed-off-by: Li Feng <fen...@smartx.com>
> >>> ---
> >>> v2:
> >>> Fix the commit msg typo: vas -> virtqueues.
> >>> --
> >>>    lib/vhost/vhost.c | 2 +-
> >>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> >>> index 355ff37651..191ba82c41 100644
> >>> --- a/lib/vhost/vhost.c
> >>> +++ b/lib/vhost/vhost.c
> >>> @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev)
> >>>                if (vdpa_dev)
> >>>                        vdpa_dev->ops->dev_close(dev->vid);
> >>>                dev->flags &= ~VIRTIO_DEV_RUNNING;
> >>> -             dev->notify_ops->destroy_device(dev->vid);
> >>>        }
> >>> +     dev->notify_ops->destroy_device(dev->vid);
> >>
> >> .destroy_device() is the counter-part of .new_device().
> >> VIRTIO_DEV_RUNNING is set only when .new_device() has been called with
> >> success, and cleared when .destroy_device() is called.
> >>
> >> So I disagree with the fix; we want to keep the correlation between
> >> VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing otherwise
> >> could lead to regressions in applications other than yours.
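To make the invariant described above concrete, here is a simplified, self-contained model of the start/teardown pairing. It is not DPDK code: `struct model_dev`, `MODEL_DEV_RUNNING`, `model_start()` and the other `model_*` names are hypothetical, and the real library tracks far more state. It assumes only what the review states: the RUNNING bit is set after a successful `.new_device()` and cleared when `.destroy_device()` is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not DPDK code: stands in for VIRTIO_DEV_RUNNING. */
#define MODEL_DEV_RUNNING (1u << 0)

struct model_dev {
    uint32_t flags;
    int live_sessions;   /* backend state created by new_device() */
};

/* Backend callback: allocates per-device state. */
static int model_new_device(struct model_dev *dev)
{
    dev->live_sessions++;
    return 0;
}

/* Backend callback: frees per-device state. Being called without a
 * matching new_device() is the unpaired case the review warns about. */
static void model_destroy_device(struct model_dev *dev)
{
    assert(dev->live_sessions > 0);
    dev->live_sessions--;
}

/* Library side (simplified): start only once every vring is ready,
 * and set RUNNING only if new_device() succeeded. */
static void model_start(struct model_dev *dev, bool all_vrings_ready)
{
    if (!all_vrings_ready || (dev->flags & MODEL_DEV_RUNNING))
        return;
    if (model_new_device(dev) == 0)
        dev->flags |= MODEL_DEV_RUNNING;
}

/* Library side (simplified): teardown guarded by RUNNING, like the
 * unpatched vhost_destroy_device_notify(). */
static void model_destroy_notify(struct model_dev *dev)
{
    if (dev->flags & MODEL_DEV_RUNNING) {
        dev->flags &= ~MODEL_DEV_RUNNING;
        model_destroy_device(dev);
    }
}
```

In this model, removing the `if` in `model_destroy_notify()` (which is what the v1 patch effectively does for the real notifier) makes the partially initialized case call `model_destroy_device()` with no live session and trips the assertion. Backends written against the existing pairing may not expect that call.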
> >>
> >> What is not clear from the commit message or the discussion you link is
> >> where exactly it crashes. Is it in SPDK or in DPDK?
> >
> > The crash is in SPDK: the poller is still running in the reactor,
> > but the device has already been freed.
> >
> > I really don't have a good way to handle these partially initialized
> > virtqueues.
> > Here is another patch I prepared to fix this issue:
> >
> >  From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 00:00:00 2001
> > From: Li Feng <fen...@smartx.com>
> > Date: Wed, 1 Sep 2021 16:51:44 +0800
> > Subject: [PATCH] vhost: fix vhost session crash
> >
> > If any vq is ready, treat the device as RUNNING.
> >
> > Root Cause:
> > The session has been created and the requestq poller has been added to
> > the reactor, but when the device is destroyed, the requestq poller is
> > not unregistered.
> > SeaBIOS only initializes the request vq (idx = 2) and ignores the
> > controlq and eventq.
> >
> > Reproduce:
> > 1. Create a VM;
> > 2. Mount an ISO to the VM, start the VM, but don't install the OS;
> > 3. Restart the zbs-chunkd;
> >
> > Signed-off-by: Li Feng <fen...@smartx.com>
> > Change-Id: I21292e58b0b08237b5d105359095ec6a31907752
> > ---
> >   lib/librte_vhost/vhost_user.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index f211ec8..a80e9f4 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev)
> >                          "kickfd: %d callfd: %d enabled: %d\n",
> >                          dev->ifname, vq, i, vq->desc, vq->avail,
> >                          vq->used, vq->kickfd, vq->callfd, vq->enabled);
> > -               if (!vq_is_ready(dev, vq))
> > -                       return 0;
> > +               if (vq_is_ready(dev, vq))
> > +                       break;
> >          }
> > +       if (i == nr_vring)
> > +               return 0;
> >
> >          /* If supported, ensure the frontend is really done with config */
> >          if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_STATUS))
> >
>
> The above patch would also cause regressions, as networking backends work
> with queue pairs.
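A minimal sketch contrasting the two readiness rules, assuming each vring's readiness has already been reduced to a boolean (the real `vq_is_ready()` checks descriptors, kick/call fds and enablement). `ready_all()` and `ready_any()` are hypothetical helpers, not DPDK functions. In virtio-net, vring 0 is the first RX queue and vring 1 the first TX queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the current rule, every vring must be ready. */
static bool ready_all(const bool *vq_ready, size_t nr_vring)
{
    assert(vq_ready != NULL || nr_vring == 0);
    for (size_t i = 0; i < nr_vring; i++)
        if (!vq_ready[i])
            return false;
    return nr_vring > 0;   /* no vrings at all: not ready */
}

/* Hypothetical helper: the rule from the alternative patch,
 * ready as soon as any single vring is ready. */
static bool ready_any(const bool *vq_ready, size_t nr_vring)
{
    assert(vq_ready != NULL || nr_vring == 0);
    for (size_t i = 0; i < nr_vring; i++)
        if (vq_ready[i])
            return true;
    return false;
}
```

With the "any" rule, a net device whose RX ring is set up but whose TX ring is not would be reported ready, which is the queue-pair regression described above; the same rule is what lets the SeaBIOS vhost-scsi case (only vring 2 ready) through.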
>
> So your issue is that SPDK is processing vrings while DPDK considers the
> device not running. Instead of working around that issue, maybe what
> you should do is introduce a new API and mechanism to help DPDK
> determine whether it should consider the device ready based on the
> backend type. IIUC, in your vhost-scsi case, it should be as soon as VQ
> 2 is ready?

Yes, once VQ 2 is ready, the vhost-scsi device should be considered ready.
I had this idea months ago; if it's acceptable, I will try to implement it.
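One possible shape for the per-backend readiness mechanism suggested above, written as a sketch under stated assumptions: the backend declares a bitmask of vrings that must be ready, and the library checks only those. `struct ready_policy`, `net_policy`, `scsi_policy` and `policy_is_ready()` are all hypothetical; no such API exists in DPDK, and a real design would also keep the existing `VHOST_USER_PROTOCOL_F_STATUS` check and the RUNNING/`.new_device()` pairing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: each backend declares which vrings must be ready. */
struct ready_policy {
    uint64_t required_vrings;   /* bit i set => vring i must be ready */
};

/* Hypothetical net policy: first queue pair (RX = vring 0, TX = vring 1). */
static const struct ready_policy net_policy = {
    .required_vrings = UINT64_C(0x3)
};

/* Hypothetical vhost-scsi policy: only the request queue (vring 2). */
static const struct ready_policy scsi_policy = {
    .required_vrings = UINT64_C(1) << 2
};

static bool policy_is_ready(const struct ready_policy *p,
                            const bool *vq_ready, size_t nr_vring)
{
    assert(p != NULL);
    if (p->required_vrings == 0)
        return false;   /* an empty policy never starts the device */
    for (size_t i = 0; i < 64; i++) {
        if (!(p->required_vrings & (UINT64_C(1) << i)))
            continue;
        /* A required vring the front-end never set up cannot be ready. */
        if (i >= nr_vring || !vq_ready[i])
            return false;
    }
    return true;
}
```

Because the net policy still requires both rings of the first queue pair, networking backends would see no change, while a vhost-scsi backend could opt in to being started as soon as the request queue is ready.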

Thanks.
