> On Aug 22, 2023, at 12:49 AM, Li Feng <fen...@smartx.com> wrote: > > > >> On 22 Aug 2023, at 8:38 AM, Raphael Norwitz <raphael.norw...@nutanix.com> >> wrote: >> >>> >>> On Aug 17, 2023, at 2:40 AM, Li Feng <fen...@smartx.com> wrote: >>> >>> >>>> 2023年8月14日 下午8:11,Raphael Norwitz <raphael.norw...@nutanix.com> 写道: >>>> >>>> Why can’t we rather fix this by adding a “event_cb” param to >>>> vhost_user_async_close and then call qemu_chr_fe_set_handlers in >>>> vhost_user_async_close_bh()? >>>> >>>> Even if calling vhost_dev_cleanup() twice is safe today I worry future >>>> changes may easily stumble over the reconnect case and introduce crashes >>>> or double frees. >>>> >>> I think add a new event_cb is not good enough. ‘qemu_chr_fe_set_handlers’ >>> has been called in vhost_user_async_close, and will be called in event->cb, >>> so why need add a >>> new event_cb? >>> >> >> I’m suggesting calling the data->event_cb instead of the data->cb if we hit >> the error case where vhost->vdev is NULL. Something like: >> >> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >> index 8dcf049d42..edf1dccd44 100644 >> --- a/hw/virtio/vhost-user.c >> +++ b/hw/virtio/vhost-user.c >> @@ -2648,6 +2648,10 @@ typedef struct { >> static void vhost_user_async_close_bh(void *opaque) >> { >> VhostAsyncCallback *data = opaque; >> + >> + VirtIODevice *vdev = VIRTIO_DEVICE(data->dev); >> + VHostUserBlk *s = VHOST_USER_BLK(vdev); >> + >> struct vhost_dev *vhost = data->vhost; >> >> /* >> @@ -2657,6 +2661,9 @@ static void vhost_user_async_close_bh(void *opaque) >> */ >> if (vhost->vdev) { >> data->cb(data->dev); >> + } else if (data->event_cb) { >> + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, data->event_cb, >> + NULL, data->dev, NULL, true); >> } >> >> g_free(data); >> >> data->event_cb would be vhost_user_blk_event(). >> >> I think that makes the error path a lot easier to reason about and more >> future proof. >> >>> For avoiding to call the vhost_dev_cleanup() twice, add a ‘inited’ in >>> struct vhost-dev to mark if it’s inited like this: >>> >> >> This is better than the original, but let me know what you think of my >> alternative. > > The vhost_user_async_close_bh() is a common function in vhost-user.c, and > vhost_user_async_close() is used by vhost-user-scsi/blk/gpio, > However, in your patch it’s limited to VhostUserBlk, so I think my fix is > more reasonable.
I did not write out the full patch. vhost-user-scsi/blk/gpio would each provide their own ->event_cb, just like they each provide a different ->cb today. Looking at it again, data has the chardev, so no need to use VIRTO_DEVICE() or VHOST_USER_BLK(). The fix generalizes to all device types. > > Thanks, > LI >> >>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>> index e2f6ffb446..edc80c0231 100644 >>> --- a/hw/virtio/vhost.c >>> +++ b/hw/virtio/vhost.c >>> @@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void >>> *opaque, >>> goto fail_busyloop; >>> } >>> >>> + hdev->inited = true; >>> return 0; >>> >>> fail_busyloop: >>> @@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev) >>> { >>> int i; >>> >>> + if (!hdev->inited) { >>> + return; >>> + } >>> + hdev->inited = false; >>> trace_vhost_dev_cleanup(hdev); >>> >>> for (i = 0; i < hdev->nvqs; ++i) { >>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >>> index ca3131b1af..74b1aec960 100644 >>> --- a/include/hw/virtio/vhost.h >>> +++ b/include/hw/virtio/vhost.h >>> @@ -123,6 +123,7 @@ struct vhost_dev { >>> /* @started: is the vhost device started? */ >>> bool started; >>> bool log_enabled; >>> + bool inited; >>> uint64_t log_size; >>> Error *migration_blocker; >>> const VhostOps *vhost_ops; >>> >>> Thanks. >>> >>>> >>>>> On Aug 4, 2023, at 1:29 AM, Li Feng <fen...@smartx.com> wrote: >>>>> >>>>> When the vhost-user is reconnecting to the backend, and if the vhost-user >>>>> fails >>>>> at the get_features in vhost_dev_init(), then the reconnect will fail >>>>> and it will not be retriggered forever. >>>>> >>>>> The reason is: >>>>> When the vhost-user fail at get_features, the vhost_dev_cleanup will be >>>>> called >>>>> immediately. >>>>> >>>>> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'. >>>>> >>>>> The reconnect path is: >>>>> vhost_user_blk_event >>>>> vhost_user_async_close(.. vhost_user_blk_disconnect ..) >>>>> qemu_chr_fe_set_handlers <----- clear the notifier callback >>>>> schedule vhost_user_async_close_bh >>>>> >>>>> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be >>>>> called, then the event fd callback will not be reinstalled. >>>>> >>>>> With this patch, the vhost_user_blk_disconnect will call the >>>>> vhost_dev_cleanup() again, it's safe. >>>>> >>>>> All vhost-user devices have this issue, including vhost-user-blk/scsi. >>>>> >>>>> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling") >>>>> >>>>> Signed-off-by: Li Feng <fen...@smartx.com> >>>>> --- >>>>> hw/virtio/vhost-user.c | 10 +--------- >>>>> 1 file changed, 1 insertion(+), 9 deletions(-) >>>>> >>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >>>>> index 8dcf049d42..697b403fe2 100644 >>>>> --- a/hw/virtio/vhost-user.c >>>>> +++ b/hw/virtio/vhost-user.c >>>>> @@ -2648,16 +2648,8 @@ typedef struct { >>>>> static void vhost_user_async_close_bh(void *opaque) >>>>> { >>>>> VhostAsyncCallback *data = opaque; >>>>> - struct vhost_dev *vhost = data->vhost; >>>>> >>>>> - /* >>>>> - * If the vhost_dev has been cleared in the meantime there is >>>>> - * nothing left to do as some other path has completed the >>>>> - * cleanup. >>>>> - */ >>>>> - if (vhost->vdev) { >>>>> - data->cb(data->dev); >>>>> - } >>>>> + data->cb(data->dev); >>>>> >>>>> g_free(data); >>>>> } >>>>> -- >>>>> 2.41.0