> On Aug 17, 2023, at 2:40 AM, Li Feng <fen...@smartx.com> wrote:
> 
> 
>> 2023年8月14日 下午8:11,Raphael Norwitz <raphael.norw...@nutanix.com> 写道:
>> 
>> Why can’t we rather fix this by adding a “event_cb” param to 
>> vhost_user_async_close and then call qemu_chr_fe_set_handlers in 
>> vhost_user_async_close_bh()?
>> 
>> Even if calling vhost_dev_cleanup() twice is safe today I worry future 
>> changes may easily stumble over the reconnect case and introduce crashes or 
>> double frees.
>> 
> I think add a new event_cb is not good enough. ‘qemu_chr_fe_set_handlers’ has 
> been called in vhost_user_async_close, and will be called in event->cb, so 
> why need add a
> new event_cb?
> 

I’m suggesting calling the data->event_cb instead of the data->cb if we hit the 
error case where vhost->vdev is NULL. Something like:

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..edf1dccd44 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2648,6 +2648,10 @@ typedef struct {
 static void vhost_user_async_close_bh(void *opaque)
 {
     VhostAsyncCallback *data = opaque;
+
+    VirtIODevice *vdev = VIRTIO_DEVICE(data->dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
     struct vhost_dev *vhost = data->vhost;
 
     /*
@@ -2657,6 +2661,9 @@ static void vhost_user_async_close_bh(void *opaque)
      */
     if (vhost->vdev) {
         data->cb(data->dev);
+    } else if (data->event_cb) {
+        qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, data->event_cb,
+                                 NULL, data->dev, NULL, true);
     }
 
     g_free(data);

data->event_cb would be vhost_user_blk_event().

I think that makes the error path a lot easier to reason about and more future 
proof.

> For avoiding to call the vhost_dev_cleanup() twice, add a ‘inited’ in struct 
> vhost-dev to mark if it’s inited like this:
> 

This is better than the original, but let me know what you think of my 
alternative.

> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index e2f6ffb446..edc80c0231 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>         goto fail_busyloop;
>     }
> 
> +    hdev->inited = true;
>     return 0;
> 
> fail_busyloop:
> @@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
> {
>     int i;
> 
> +    if (!hdev->inited) {
> +        return;
> +    }
> +    hdev->inited = false;
>     trace_vhost_dev_cleanup(hdev);
> 
>     for (i = 0; i < hdev->nvqs; ++i) {
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index ca3131b1af..74b1aec960 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -123,6 +123,7 @@ struct vhost_dev {
>     /* @started: is the vhost device started? */
>     bool started;
>     bool log_enabled;
> +    bool inited;
>     uint64_t log_size;
>     Error *migration_blocker;
>     const VhostOps *vhost_ops;
> 
> Thanks.
> 
>> 
>>> On Aug 4, 2023, at 1:29 AM, Li Feng <fen...@smartx.com> wrote:
>>> 
>>> When the vhost-user is reconnecting to the backend, and if the vhost-user 
>>> fails
>>> at the get_features in vhost_dev_init(), then the reconnect will fail
>>> and it will not be retriggered forever.
>>> 
>>> The reason is:
>>> When the vhost-user fail at get_features, the vhost_dev_cleanup will be 
>>> called
>>> immediately.
>>> 
>>> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
>>> 
>>> The reconnect path is:
>>> vhost_user_blk_event
>>> vhost_user_async_close(.. vhost_user_blk_disconnect ..)
>>>   qemu_chr_fe_set_handlers <----- clear the notifier callback
>>>     schedule vhost_user_async_close_bh
>>> 
>>> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
>>> called, then the event fd callback will not be reinstalled.
>>> 
>>> With this patch, the vhost_user_blk_disconnect will call the
>>> vhost_dev_cleanup() again, it's safe.
>>> 
>>> All vhost-user devices have this issue, including vhost-user-blk/scsi.
>>> 
>>> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
>>> 
>>> Signed-off-by: Li Feng <fen...@smartx.com>
>>> ---
>>> hw/virtio/vhost-user.c | 10 +---------
>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>> 
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index 8dcf049d42..697b403fe2 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -2648,16 +2648,8 @@ typedef struct {
>>> static void vhost_user_async_close_bh(void *opaque)
>>> {
>>>   VhostAsyncCallback *data = opaque;
>>> -    struct vhost_dev *vhost = data->vhost;
>>> 
>>> -    /*
>>> -     * If the vhost_dev has been cleared in the meantime there is
>>> -     * nothing left to do as some other path has completed the
>>> -     * cleanup.
>>> -     */
>>> -    if (vhost->vdev) {
>>> -        data->cb(data->dev);
>>> -    }
>>> +    data->cb(data->dev);
>>> 
>>>   g_free(data);
>>> }
>>> -- 
>>> 2.41.0
>>> 
>> 
> 

Reply via email to