> On Aug 17, 2023, at 2:40 AM, Li Feng <fen...@smartx.com> wrote: > > >> 2023年8月14日 下午8:11,Raphael Norwitz <raphael.norw...@nutanix.com> 写道: >> >> Why can’t we rather fix this by adding a “event_cb” param to >> vhost_user_async_close and then call qemu_chr_fe_set_handlers in >> vhost_user_async_close_bh()? >> >> Even if calling vhost_dev_cleanup() twice is safe today I worry future >> changes may easily stumble over the reconnect case and introduce crashes or >> double frees. >> > I think add a new event_cb is not good enough. ‘qemu_chr_fe_set_handlers’ has > been called in vhost_user_async_close, and will be called in event->cb, so > why need add a > new event_cb? >
I’m suggesting calling the data->event_cb instead of the data->cb if we hit the error case where vhost->vdev is NULL. Something like: diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 8dcf049d42..edf1dccd44 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2648,6 +2648,10 @@ typedef struct { static void vhost_user_async_close_bh(void *opaque) { VhostAsyncCallback *data = opaque; + + VirtIODevice *vdev = VIRTIO_DEVICE(data->dev); + VHostUserBlk *s = VHOST_USER_BLK(vdev); + struct vhost_dev *vhost = data->vhost; /* @@ -2657,6 +2661,9 @@ static void vhost_user_async_close_bh(void *opaque) */ if (vhost->vdev) { data->cb(data->dev); + } else if (data->event_cb) { + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, data->event_cb, + NULL, data->dev, NULL, true); } g_free(data); data->event_cb would be vhost_user_blk_event(). I think that makes the error path a lot easier to reason about and more future proof. > For avoiding to call the vhost_dev_cleanup() twice, add a ‘inited’ in struct > vhost-dev to mark if it’s inited like this: > This is better than the original, but let me know what you think of my alternative. > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index e2f6ffb446..edc80c0231 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, > goto fail_busyloop; > } > > + hdev->inited = true; > return 0; > > fail_busyloop: > @@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev) > { > int i; > > + if (!hdev->inited) { > + return; > + } > + hdev->inited = false; > trace_vhost_dev_cleanup(hdev); > > for (i = 0; i < hdev->nvqs; ++i) { > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > index ca3131b1af..74b1aec960 100644 > --- a/include/hw/virtio/vhost.h > +++ b/include/hw/virtio/vhost.h > @@ -123,6 +123,7 @@ struct vhost_dev { > /* @started: is the vhost device started? */ > bool started; > bool log_enabled; > + bool inited; > uint64_t log_size; > Error *migration_blocker; > const VhostOps *vhost_ops; > > Thanks. > >> >>> On Aug 4, 2023, at 1:29 AM, Li Feng <fen...@smartx.com> wrote: >>> >>> When the vhost-user is reconnecting to the backend, and if the vhost-user >>> fails >>> at the get_features in vhost_dev_init(), then the reconnect will fail >>> and it will not be retriggered forever. >>> >>> The reason is: >>> When the vhost-user fail at get_features, the vhost_dev_cleanup will be >>> called >>> immediately. >>> >>> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'. >>> >>> The reconnect path is: >>> vhost_user_blk_event >>> vhost_user_async_close(.. vhost_user_blk_disconnect ..) >>> qemu_chr_fe_set_handlers <----- clear the notifier callback >>> schedule vhost_user_async_close_bh >>> >>> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be >>> called, then the event fd callback will not be reinstalled. >>> >>> With this patch, the vhost_user_blk_disconnect will call the >>> vhost_dev_cleanup() again, it's safe. >>> >>> All vhost-user devices have this issue, including vhost-user-blk/scsi. >>> >>> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling") >>> >>> Signed-off-by: Li Feng <fen...@smartx.com> >>> --- >>> hw/virtio/vhost-user.c | 10 +--------- >>> 1 file changed, 1 insertion(+), 9 deletions(-) >>> >>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >>> index 8dcf049d42..697b403fe2 100644 >>> --- a/hw/virtio/vhost-user.c >>> +++ b/hw/virtio/vhost-user.c >>> @@ -2648,16 +2648,8 @@ typedef struct { >>> static void vhost_user_async_close_bh(void *opaque) >>> { >>> VhostAsyncCallback *data = opaque; >>> - struct vhost_dev *vhost = data->vhost; >>> >>> - /* >>> - * If the vhost_dev has been cleared in the meantime there is >>> - * nothing left to do as some other path has completed the >>> - * cleanup. >>> - */ >>> - if (vhost->vdev) { >>> - data->cb(data->dev); >>> - } >>> + data->cb(data->dev); >>> >>> g_free(data); >>> } >>> -- >>> 2.41.0 >>> >> >