Am 02.02.2024 um 15:47 hat Hanna Czenczek geschrieben: > Hi, > > Without the AioContext lock, a BB's context may kind of change at any > time (unless it has a root node, and I/O requests are pending). That > also means that its own context (BlockBackend.ctx) and that of its root > node can differ sometimes (while the context is being changed). > > blk_get_aio_context() doesn't know this yet and asserts that both are > always equal (if there is a root node). Because it's no longer true, > and because callers don't seem to really care about the root node's > context, we can and should remove the assertion and just return the BB's > context. > > Beyond that, the question is whether the callers of > blk_get_aio_context() are OK with the context potentially changing > concurrently. Honestly, it isn't entirely clear to me; most look OK, > except for the virtio-scsi code, which operates under the general > assumption that the BB's context is always equal to that of the > virtio-scsi device. I doubt that this assumption always holds (it is > definitely not obvious to me that it would), but then again, this series > will not make matters worse in that regard, and that is what counts for > me now. > > One clear point of contention is scsi_device_for_each_req_async(), which > is addressed by patch 2. Right now, it schedules a BH in the BB > context, then the BH double-checks whether the context still fits, and > if not, re-schedules itself. Because virtio-scsi's context is fixed, > this seems to indicate to me that it wants to be able to deal with a > case where BB and virtio-scsi context differ, which seems to break that > aforementioned general virtio-scsi assumption. > > Unfortunately, I definitely have to touch that code, because accepting > concurrent changes of AioContexts breaks the double-check (just because > the BB has the right context in that place does not mean it stays in > that context); instead, we must prevent any concurrent change until the > BH is done. Because changing contexts generally involves a drained > section, we can prevent it by keeping the BB in-flight counter elevated. > > Question is, how to reason for that. I’d really rather not include the > need to follow the BB context in my argument, because I find that part a > bit fishy. > > Luckily, there’s a second, completely different reason for having > scsi_device_for_each_req_async() increment the in-flight counter: > Specifically, scsi_device_purge_requests() probably wants to await full > completion of scsi_device_for_each_req_async(), and we can do that most > easily in the very same way by incrementing the in-flight counter. This > way, the blk_drain() in scsi_device_purge_requests() will not only await > all (cancelled) I/O requests, but also the non-I/O requests. > > The fact that this prevents the BB AioContext from changing while the BH > is scheduled/running then is just a nice side effect. > > > Hanna Czenczek (2): > block-backend: Allow concurrent context changes > scsi: Await request purging > > block/block-backend.c | 22 +++++++++++----------- > hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++--------- > 2 files changed, 32 insertions(+), 20 deletions(-)
Thanks, applied to the block branch. Kevin