On 23.11.2023 at 20:49, Stefan Hajnoczi wrote:
> Stop depending on the AioContext lock and instead access
> SCSIDevice->requests from only one thread at a time:
> - When the VM is running only the BlockBackend's AioContext may access
>   the requests list.
> - When the VM is stopped only the main loop may access the requests
>   list.
>
> These constraints protect the requests list without the need for locking
> in the I/O code path.
>
> Note that multiple IOThreads are not supported yet because the code
> assumes all SCSIRequests are executed from a single AioContext. Leave
> that as future work.
>
> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> ---
>  include/hw/scsi/scsi.h |   7 +-
>  hw/scsi/scsi-bus.c     | 174 ++++++++++++++++++++++++++++-------------
>  2 files changed, 124 insertions(+), 57 deletions(-)
>
> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> index 3692ca82f3..10c4e8288d 100644
> --- a/include/hw/scsi/scsi.h
> +++ b/include/hw/scsi/scsi.h
> @@ -69,14 +69,19 @@ struct SCSIDevice
>  {
>      DeviceState qdev;
>      VMChangeStateEntry *vmsentry;
> -    QEMUBH *bh;
>      uint32_t id;
>      BlockConf conf;
>      SCSISense unit_attention;
>      bool sense_is_ua;
>      uint8_t sense[SCSI_SENSE_BUF_SIZE];
>      uint32_t sense_len;
> +
> +    /*
> +     * The requests list is only accessed from the AioContext that executes
> +     * requests or from the main loop when IOThread processing is stopped.
> +     */
>      QTAILQ_HEAD(, SCSIRequest) requests;
> +
>      uint32_t channel;
>      uint32_t lun;
>      int blocksize;
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index fc4b77fdb0..b8bfde9565 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -85,6 +85,82 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int id, int lun)
>      return d;
>  }
>
> +/*
> + * Invoke @fn() for each enqueued request in device @s. Must be called from the
> + * main loop thread while the guest is stopped. This is only suitable for
> + * vmstate ->put(), use scsi_device_for_each_req_async() for other cases.
> + */
> +static void scsi_device_for_each_req_sync(SCSIDevice *s,
> +                                          void (*fn)(SCSIRequest *, void *),
> +                                          void *opaque)
> +{
> +    SCSIRequest *req;
> +    SCSIRequest *next_req;
> +
> +    assert(!runstate_is_running());
> +    assert(qemu_in_main_thread());
> +
> +    QTAILQ_FOREACH_SAFE(req, &s->requests, next, next_req) {
> +        fn(req, opaque);
> +    }
> +}
> +
> +typedef struct {
> +    SCSIDevice *s;
> +    void (*fn)(SCSIRequest *, void *);
> +    void *fn_opaque;
> +} SCSIDeviceForEachReqAsyncData;
> +
> +static void scsi_device_for_each_req_async_bh(void *opaque)
> +{
> +    g_autofree SCSIDeviceForEachReqAsyncData *data = opaque;
> +    SCSIDevice *s = data->s;
> +    SCSIRequest *req;
> +    SCSIRequest *next;
> +
> +    /*
> +     * It is unlikely that the AioContext will change before this BH is called,
> +     * but if it happens then ->requests must not be accessed from this
> +     * AioContext.
> +     */

What is the scenario where this happens? I would have expected that
switching the AioContext of a node involves draining the node first,
which would execute this BH before the context changes. The other option
I see is an empty BlockBackend, which can change its AioContext without
polling BHs, but in that case there is no connection to other users, so
the only change could come from virtio-scsi itself. If there is such a
case, it would probably be helpful to be specific in the comment.

> +    if (blk_get_aio_context(s->conf.blk) == qemu_get_current_aio_context()) {
> +        QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
> +            data->fn(req, data->fn_opaque);
> +        }
> +    }

Of course, if the situation does happen, the question is why just doing
nothing is correct. Wouldn't that mean that the guest still sees stuck
requests? Would rescheduling the BH in the new context be better?

> +
> +    /* Drop the reference taken by scsi_device_for_each_req_async() */
> +    object_unref(OBJECT(s));
> +}
> +
> +/*
> + * Schedule @fn() to be invoked for each enqueued request in device @s. @fn()
> + * runs in the AioContext that is executing the request.
> + */
> +static void scsi_device_for_each_req_async(SCSIDevice *s,
> +                                           void (*fn)(SCSIRequest *, void *),
> +                                           void *opaque)

If we keep the behaviour above (doesn't do anything if the AioContext
changes), then I think it needs to be documented for this function and
callers should be explicit about why it's okay.

Kevin