requests from one thread

Hanna Czenczek Wed, 24 Jan 2024 04:14:08 -0800

On 23.01.24 18:10, Kevin Wolf wrote:

Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:

On 21.12.23 22:23, Kevin Wolf wrote:

From: Stefan Hajnoczi<stefa...@redhat.com>


Stop depending on the AioContext lock and instead access
SCSIDevice->requests from only one thread at a time:
- When the VM is running only the BlockBackend's AioContext may access
    the requests list.
- When the VM is stopped only the main loop may access the requests
    list.

These constraints protect the requests list without the need for locking
in the I/O code path.

Note that multiple IOThreads are not supported yet because the code
assumes all SCSIRequests are executed from a single AioContext. Leave
that as future work.

Signed-off-by: Stefan Hajnoczi<stefa...@redhat.com>
Reviewed-by: Eric Blake<ebl...@redhat.com>
Message-ID:<20231204164259.1515217-2-stefa...@redhat.com>
Signed-off-by: Kevin Wolf<kw...@redhat.com>
---
   include/hw/scsi/scsi.h |   7 +-
   hw/scsi/scsi-bus.c     | 181 ++++++++++++++++++++++++++++-------------
   2 files changed, 131 insertions(+), 57 deletions(-)

My reproducer forhttps://issues.redhat.com/browse/RHEL-3934  now breaks more
often because of this commit than because of the original bug, i.e. when
repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
this tends to happen when unplugging the scsi-hd:

{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}
qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
Assertion `ctx == blk->ctx' failed.


[...]

I don't know anything about the problem either, but since you already
speculated about the cause, let me speculate about the solution:
Can we hold the graph writer lock for the tran_commit() call in
bdrv_try_change_aio_context()? And of course take the reader lock for
blk_get_aio_context(), but that should be completely unproblematic.

I tried this, and it’s not easy taking the lock just for tran_commit(),because some callers of bdrv_try_change_aio_context() already hold thewrite lock (specifically bdrv_attach_child_common(),bdrv_attach_child_common_abort(), and bdrv_root_unref_child()[1]), andqmp_x_blockdev_set_iothread() holds the read lock. Other callers don’thold any lock[2].

So I’m not sure whether we should mark all ofbdrv_try_change_aio_context() as GRAPH_WRLOCK and then make all callerstake the lock, or really only take it for tran_commit(), and havecallers release the lock around bdrv_try_change_aio_context(). Formersounds better to naïve me.

(In any case, FWIW, having blk_set_aio_context() take the write lock,and scsi_device_for_each_req_async_bh() take the read lock[3], does fixthe assertion failure.)


Hanna

[1] bdrv_root_unref_child() is not marked as GRAPH_WRLOCK, but it’scallers generally seem to ensure that the lock is taken when calling it.

[2] blk_set_aio_context() (evidently), blk_exp_add(),external_snapshot_abort(), {blockdev,drive}_backup_action(),qmp_{blockdev,drive}_mirror()

[3] I’ve made the _bh a coroutine (for bdrv_graph_co_rdlock()) andreplaced the aio_bh_schedule_oneshot() by aio_co_enter() – hope that’sright.

Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread

Reply via email to