> 2023年7月31日 06:09,Raphael Norwitz <raphael.norw...@nutanix.com> 写道: > > > >> On Jul 28, 2023, at 3:48 AM, Li Feng <fen...@smartx.com> wrote: >> >> Thanks for your reply. >> >>> 2023年7月28日 上午5:21,Raphael Norwitz <raphael.norw...@nutanix.com> 写道: >>> >>> >>> >>>> On Jul 25, 2023, at 6:19 AM, Li Feng <fen...@smartx.com> wrote: >>>> >>>> Thanks for your comments. >>>> >>>>> 2023年7月25日 上午1:21,Raphael Norwitz <raphael.norw...@nutanix.com> 写道: >>>>> >>>>> Very excited to see this. High level looks good modulo a few small things. >>>>> >>>>> My major concern is around existing vhost-user-scsi backends which don’t >>>>> support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the >>>>> reconnect behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. >>>>> We may want to do the same for vhost-user-blk. >>>>> >>>>> The question is then what happens if the check is false. IIUC without an >>>>> inflight FD, if a device processes requests out of order, it’s not safe >>>>> to continue execution on reconnect, as there’s no way for the backend to >>>>> know how to replay IO. Should we permanently wedge the device or have >>>>> QEMU fail out? May be nice to have a toggle for this. >>>> >>>> Based on what MST said, is there anything else I need to do? >>> >>> I don’t think so. >>> >>>>> >>>>>> On Jul 21, 2023, at 6:51 AM, Li Feng <fen...@smartx.com> wrote: >>>>>> >>>>>> If the backend crashes and restarts, the device is broken. >>>>>> This patch adds reconnect for vhost-user-scsi. >>>>>> >>>>>> Tested with spdk backend. >>>>>> >>>>>> Signed-off-by: Li Feng <fen...@smartx.com> >>>>>> --- >>>>>> hw/block/vhost-user-blk.c | 2 - >>>>>> hw/scsi/vhost-scsi-common.c | 27 ++--- >>>>>> hw/scsi/vhost-user-scsi.c | 163 +++++++++++++++++++++++++--- >>>>>> include/hw/virtio/vhost-user-scsi.h | 3 + >>>>>> include/hw/virtio/vhost.h | 2 + >>>>>> 5 files changed, 165 insertions(+), 32 deletions(-) >>>>>> >>>>>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c >>>>>> index eecf3f7a81..f250c740b5 100644 >>>>>> --- a/hw/block/vhost-user-blk.c >>>>>> +++ b/hw/block/vhost-user-blk.c >>>>>> @@ -32,8 +32,6 @@ >>>>>> #include "sysemu/sysemu.h" >>>>>> #include "sysemu/runstate.h" >>>>>> >>>>>> -#define REALIZE_CONNECTION_RETRIES 3 >>>>>> - >>>>>> static const int user_feature_bits[] = { >>>>>> VIRTIO_BLK_F_SIZE_MAX, >>>>>> VIRTIO_BLK_F_SEG_MAX, >>>>>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c >>>>> >>>>> Why can’t all the vhost-scsi-common stuff be moved to a separate change? >>>> >>>> I will move this code to separate patch. >>>>> >>>>> Especially the stuff introduced for vhost-user-blk in >>>>> 1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out. >>>> OK. >>>> >>>>> >>>>>> index a06f01af26..08801886b8 100644 >>>>>> --- a/hw/scsi/vhost-scsi-common.c >>>>>> +++ b/hw/scsi/vhost-scsi-common.c >>>>>> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) >>>>>> >>>>>> vsc->dev.acked_features = vdev->guest_features; >>>>>> >>>>>> - assert(vsc->inflight == NULL); >>>>>> - vsc->inflight = g_new0(struct vhost_inflight, 1); >>>>>> - ret = vhost_dev_get_inflight(&vsc->dev, >>>>>> - vs->conf.virtqueue_size, >>>>>> - vsc->inflight); >>>>>> + ret = vhost_dev_prepare_inflight(&vsc->dev, vdev); >>>>>> if (ret < 0) { >>>>>> - error_report("Error get inflight: %d", -ret); >>>>>> + error_report("Error setting inflight format: %d", -ret); >>>>>> goto err_guest_notifiers; >>>>>> } >>>>>> >>>>>> + if (!vsc->inflight->addr) { >>>>>> + ret = vhost_dev_get_inflight(&vsc->dev, >>>>>> + vs->conf.virtqueue_size, >>>>>> + vsc->inflight); >>>>>> + if (ret < 0) { >>>>>> + error_report("Error get inflight: %d", -ret); >>>>>> + goto err_guest_notifiers; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight); >>>>>> if (ret < 0) { >>>>>> error_report("Error set inflight: %d", -ret); >>>>>> @@ -85,9 +91,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) >>>>>> return ret; >>>>>> >>>>>> err_guest_notifiers: >>>>>> - g_free(vsc->inflight); >>>>>> - vsc->inflight = NULL; >>>>>> - >>>>>> k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false); >>>>>> err_host_notifiers: >>>>>> vhost_dev_disable_notifiers(&vsc->dev, vdev); >>>>>> @@ -111,12 +114,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc) >>>>>> } >>>>>> assert(ret >= 0); >>>>>> >>>>> >>>>> In the vhost-scsi (kernel backend) path, what will cleanup vsc->inflight >>>>> now? >>>> OK, we should check the vsc->inflight if it is null, the vhost-scsi >>>> doesn’t allocate the >>>> inflight object memory. >>> >>> Are you saying vhost-scsi never allocates inflight so we don’t need to >>> check for it? >> We have checked the vsc->inflight, and only if allocated, we send the >> get/set_inflight_fd. >> This works with vhost-user-scsi/vhost-scsi both. > > So then it sounds like this code introduces a resource leak. > g_free(vsc->inflight) should be added to the vhost-scsi code in > vhost_scsi_stop().
No, the vhost-scsi doesn’t need ‘inflight', it doesn’t allocate the inflight memory. The rule is ‘who allocates, who free it’. > >>> >>>> >>>>> >>>>>> - if (vsc->inflight) { >>>>>> - vhost_dev_free_inflight(vsc->inflight); >>>>>> - g_free(vsc->inflight); >>>>>> - vsc->inflight = NULL; >>>>>> - } >>>>>> - >>>>>> vhost_dev_disable_notifiers(&vsc->dev, vdev); >>>>>> } >>>>>> >>>>>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c >>>>>> index ee99b19e7a..e0e88b0c42 100644 >>>>>> --- a/hw/scsi/vhost-user-scsi.c >>>>>> +++ b/hw/scsi/vhost-user-scsi.c >>>>>> @@ -89,14 +89,126 @@ static void vhost_dummy_handle_output(VirtIODevice >>>>>> *vdev, VirtQueue *vq) >>>>>> { >>>>>> } >>>>>> >>>>>> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp) >>>>>> +{ >>>>>> + VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>>>>> + VHostUserSCSI *s = VHOST_USER_SCSI(vdev); >>>>>> + VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>>>>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>>>>> + int ret = 0; >>>>>> + >>>>>> + if (s->connected) { >>>>>> + return 0; >>>>>> + } >>>>>> + s->connected = true; >>>>>> + >>>>>> + vsc->dev.num_queues = vs->conf.num_queues; >>>>>> + vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues; >>>>>> + vsc->dev.vqs = s->vhost_vqs; >>>>>> + vsc->dev.vq_index = 0; >>>>>> + vsc->dev.backend_features = 0; >>>>>> + >>>>>> + ret = vhost_dev_init(&vsc->dev, &s->vhost_user, >>>>>> VHOST_BACKEND_TYPE_USER, 0, >>>>>> + errp); >>>>>> + if (ret < 0) { >>>>>> + return ret; >>>>>> + } >>>>>> + >>>>>> + /* restore vhost state */ >>>>> >>>>> Should this use virtio_device_should_start like vhost_user_blk? >>>> I will change this. >>>>> >>>>>> + if (virtio_device_started(vdev, vdev->status)) { >>>>>> + ret = vhost_scsi_common_start(vsc); >>>>>> + if (ret < 0) { >>>>>> + return ret; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event); >>>>>> + >>>>>> +static void vhost_user_scsi_disconnect(DeviceState *dev) >>>>>> +{ >>>>>> + VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>>>>> + VHostUserSCSI *s = VHOST_USER_SCSI(vdev); >>>>>> + VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>>>>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>>>>> + >>>>> >>>>> I don’t think we want to execute vhost_scsi_common_stop() if the device >>>>> hasn’t been started. I remember that caused a number of races with the >>>>> vhost_user_blk connecting/disconnecting on startup. >>>>> >>>>> Let’s add a similar started_vu check? >>>> I will add it. >>>>> >>>>>> + if (!s->connected) { >>>>>> + return; >>>>>> + } >>>>>> + s->connected = false; >>>>>> + >>>>>> + vhost_scsi_common_stop(vsc); >>>>>> + >>>>>> + vhost_dev_cleanup(&vsc->dev); >>>>>> + >>>>>> + /* Re-instate the event handler for new connections */ >>>>>> + qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL, >>>>>> + vhost_user_scsi_event, NULL, dev, NULL, >>>>>> true); >>>>>> +} >>>>>> + >>>>>> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event) >>>>>> +{ >>>>>> + DeviceState *dev = opaque; >>>>>> + VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>>>>> + VHostUserSCSI *s = VHOST_USER_SCSI(vdev); >>>>>> + VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>>>>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>>>>> + Error *local_err = NULL; >>>>>> + >>>>>> + switch (event) { >>>>>> + case CHR_EVENT_OPENED: >>>>>> + if (vhost_user_scsi_connect(dev, &local_err) < 0) { >>>>>> + error_report_err(local_err); >>>>>> + qemu_chr_fe_disconnect(&vs->conf.chardev); >>>>>> + return; >>>>>> + } >>>>>> + break; >>>>>> + case CHR_EVENT_CLOSED: >>>>>> + /* defer close until later to avoid circular close */ >>>>>> + vhost_user_async_close(dev, &vs->conf.chardev, &vsc->dev, >>>>>> + vhost_user_scsi_disconnect); >>>>>> + break; >>>>>> + case CHR_EVENT_BREAK: >>>>>> + case CHR_EVENT_MUX_IN: >>>>>> + case CHR_EVENT_MUX_OUT: >>>>>> + /* Ignore */ >>>>>> + break; >>>>>> + } >>>>>> +} >>>>>> + >>>>>> +static int vhost_user_scsi_realize_connect(VHostUserSCSI *s, Error >>>>>> **errp) >>>>>> +{ >>>>>> + DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj; >>>>>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>>>>> + int ret; >>>>>> + >>>>>> + s->connected = false; >>>>>> + >>>>>> + ret = qemu_chr_fe_wait_connected(&vs->conf.chardev, errp); >>>>>> + if (ret < 0) { >>>>>> + return ret; >>>>>> + } >>>>>> + >>>>>> + ret = vhost_user_scsi_connect(dev, errp); >>>>>> + if (ret < 0) { >>>>>> + qemu_chr_fe_disconnect(&vs->conf.chardev); >>>>>> + return ret; >>>>>> + } >>>>>> + assert(s->connected); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> static void vhost_user_scsi_realize(DeviceState *dev, Error **errp) >>>>>> { >>>>>> VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>>>>> VHostUserSCSI *s = VHOST_USER_SCSI(dev); >>>>>> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>>>>> - struct vhost_virtqueue *vqs = NULL; >>>>>> Error *err = NULL; >>>>>> int ret; >>>>>> + int retries = REALIZE_CONNECTION_RETRIES; >>>>>> >>>>>> if (!vs->conf.chardev.chr) { >>>>>> error_setg(errp, "vhost-user-scsi: missing chardev"); >>>>>> @@ -112,21 +224,31 @@ static void vhost_user_scsi_realize(DeviceState >>>>>> *dev, Error **errp) >>>>>> } >>>>>> >>>>>> if (!vhost_user_init(&s->vhost_user, &vs->conf.chardev, errp)) { >>>>> >>>>> Why execute vhost_user_cleanup() if vhost_user_init() fails? >>>> OK, move this line up in v2. >>>> >>>>> >>>>>> - goto free_virtio; >>>>>> + goto free_vhost; >>>>>> } >>>>>> >>>>>> - vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues; >>>>>> - vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs); >>>>>> - vsc->dev.vq_index = 0; >>>>>> - vsc->dev.backend_features = 0; >>>>>> - vqs = vsc->dev.vqs; >>>>>> + vsc->inflight = g_new0(struct vhost_inflight, 1); >>>>>> + s->vhost_vqs = g_new0(struct vhost_virtqueue, >>>>>> + VIRTIO_SCSI_VQ_NUM_FIXED + >>>>>> vs->conf.num_queues); >>>>>> + >>>>>> + assert(!*errp); >>>>>> + do { >>>>>> + if (*errp) { >>>>>> + error_prepend(errp, "Reconnecting after error: "); >>>>>> + error_report_err(*errp); >>>>>> + *errp = NULL; >>>>>> + } >>>>>> + ret = vhost_user_scsi_realize_connect(s, errp); >>>>>> + } while (ret < 0 && retries--); >>>>>> >>>>>> - ret = vhost_dev_init(&vsc->dev, &s->vhost_user, >>>>>> - VHOST_BACKEND_TYPE_USER, 0, errp); >>>>>> if (ret < 0) { >>>>>> - goto free_vhost; >>>>>> + goto free_vqs; >>>>>> } >>>>>> >>>>>> + /* we're fully initialized, now we can operate, so add the handler >>>>>> */ >>>>>> + qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL, >>>>>> + vhost_user_scsi_event, NULL, (void *)dev, >>>>>> + NULL, true); >>>>>> /* Channel and lun both are 0 for bootable vhost-user-scsi disk */ >>>>>> vsc->channel = 0; >>>>>> vsc->lun = 0; >>>>>> @@ -134,10 +256,15 @@ static void vhost_user_scsi_realize(DeviceState >>>>>> *dev, Error **errp) >>>>>> >>>>>> return; >>>>>> >>>>>> +free_vqs: >>>>>> + g_free(s->vhost_vqs); >>>>>> + s->vhost_vqs = NULL; >>>>>> + g_free(vsc->inflight); >>>>>> + vsc->inflight = NULL; >>>>>> + >>>>>> free_vhost: >>>>>> vhost_user_cleanup(&s->vhost_user); >>>>>> - g_free(vqs); >>>>>> -free_virtio: >>>>>> + >>>>>> virtio_scsi_common_unrealize(dev); >>>>>> } >>>>>> >>>>>> @@ -146,16 +273,22 @@ static void vhost_user_scsi_unrealize(DeviceState >>>>>> *dev) >>>>>> VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>>>>> VHostUserSCSI *s = VHOST_USER_SCSI(dev); >>>>>> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>>>>> - struct vhost_virtqueue *vqs = vsc->dev.vqs; >>>>>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>>>>> >>>>>> /* This will stop the vhost backend. */ >>>>>> vhost_user_scsi_set_status(vdev, 0); >>>>>> + qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL, NULL, NULL, >>>>>> NULL, >>>>>> + NULL, false); >>>>>> >>>>>> vhost_dev_cleanup(&vsc->dev); >>>>>> - g_free(vqs); >>>>> >>>>> Nit: Why not put vhost_dev_free_inflight next to the remaining inflight >>>>> cleanup? >>>> OK. >>>>> >>>>>> + vhost_dev_free_inflight(vsc->inflight); >>>>>> + g_free(s->vhost_vqs); >>>>>> + s->vhost_vqs = NULL; >>>>>> + g_free(vsc->inflight); >>>>>> + vsc->inflight = NULL; >>>>>> >>>>> >>>>> Curiosity - why reorder here? Is something in vhost_user_cleanup() >>>>> dependent on state freed in virtio_scsi_common_unrealize()? >>>>> >>>>> If so, should that go as a standalone fix? >>>> >>>> Because in vhost_user_scsi_realize, we initialize in order: >>>> virtio_scsi_common_realize >>>> vhost_user_init >>>> >>>> And in the error handler of vhost_user_scsi_realize, the uninitialize in >>>> order: >>>> vhost_user_cleanup >>>> virtio_scsi_common_unrealize >>>> >>>> I think in vhost_user_scsi_unrealize we should keep it the same order, >>>> right? >>> >>> I’m not saying it’s wrong. If there’s no dependency (i.e. this is not >>> fixing a bug, just a stylistic improvement) it can stay in the same change. >> OK. >>> >>>> >>>>> >>>>>> - virtio_scsi_common_unrealize(dev); >>>>>> vhost_user_cleanup(&s->vhost_user); >>>>>> + virtio_scsi_common_unrealize(dev); >>>>>> } >>>>>> >>>>>> static Property vhost_user_scsi_properties[] = { >>>>>> diff --git a/include/hw/virtio/vhost-user-scsi.h >>>>>> b/include/hw/virtio/vhost-user-scsi.h >>>>>> index 521b08e559..c66acc68b7 100644 >>>>>> --- a/include/hw/virtio/vhost-user-scsi.h >>>>>> +++ b/include/hw/virtio/vhost-user-scsi.h >>>>>> @@ -29,6 +29,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VHostUserSCSI, >>>>>> VHOST_USER_SCSI) >>>>>> struct VHostUserSCSI { >>>>>> VHostSCSICommon parent_obj; >>>>>> VhostUserState vhost_user; >>>>> >>>>> See above - we should probably have started_vu here/ >>>> I will add it. >>>>> >>>>> Maybe we should have some shared struct with vhost_user_blk for >>>>> connectivity params? >>>> >>>> In the future vhost-user-blk/scsi can be refactored to share the same code. >>> >>> Sure - this can be done at some point in the future. >>> >>>>> >>>>>> + bool connected; >>>>>> + >>>>>> + struct vhost_virtqueue *vhost_vqs; >>>>>> }; >>>>>> >>>>>> #endif /* VHOST_USER_SCSI_H */ >>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >>>>>> index 6a173cb9fa..b904346fe1 100644 >>>>>> --- a/include/hw/virtio/vhost.h >>>>>> +++ b/include/hw/virtio/vhost.h >>>>>> @@ -8,6 +8,8 @@ >>>>>> #define VHOST_F_DEVICE_IOTLB 63 >>>>>> #define VHOST_USER_F_PROTOCOL_FEATURES 30 >>>>>> >>>>> >>>>> Should the macro name indicate that this is for vhost-user? >>>>> >>>>> VU_REALIZE_CONN_RETRIES? >>>> I will rename it in v2. >>>> >>>>> >>>>>> +#define REALIZE_CONNECTION_RETRIES 3 >>>>>> + >>>>>> /* Generic structures common for any vhost based device. */ >>>>>> >>>>>> struct vhost_inflight { >>>>>> -- >>>>>> 2.41.0 >> >> Any comments about other patches? > > I’ll send shortly.