On Wed, 7 Feb 2024 at 04:36, Hanna Czenczek <hre...@redhat.com> wrote: > > On 06.02.24 17:53, Stefan Hajnoczi wrote: > > On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote: > > Hi, > > Without the AioContext lock, a BB's context may kind of change at any > time (unless it has a root node, and I/O requests are pending). That > also means that its own context (BlockBackend.ctx) and that of its root > node can differ sometimes (while the context is being changed). > > blk_get_aio_context() doesn't know this yet and asserts that both are > always equal (if there is a root node). Because it's no longer true, > and because callers don't seem to really care about the root node's > context, we can and should remove the assertion and just return the BB's > context. > > Beyond that, the question is whether the callers of > blk_get_aio_context() are OK with the context potentially changing > concurrently. Honestly, it isn't entirely clear to me; most look OK, > except for the virtio-scsi code, which operates under the general > assumption that the BB's context is always equal to that of the > virtio-scsi device. I doubt that this assumption always holds (it is > definitely not obvious to me that it would), but then again, this series > will not make matters worse in that regard, and that is what counts for > me now. > > One clear point of contention is scsi_device_for_each_req_async(), which > is addressed by patch 2. Right now, it schedules a BH in the BB > context, then the BH double-checks whether the context still fits, and > if not, re-schedules itself. Because virtio-scsi's context is fixed, > this seems to indicate to me that it wants to be able to deal with a > case where BB and virtio-scsi context differ, which seems to break that > aforementioned general virtio-scsi assumption. > > I don't agree with the last sentence: virtio-scsi's context isn't fixed. > > The AioContext changes when dataplane is started/stopped. virtio-scsi > switches AioContext between the IOThread's AioContext and the main > loop's qemu_aio_context. > > However, virtio-scsi virtqueue processing only happens in the IOThread's > AioContext. Maybe this is what you meant when you said the AioContext is > fixed? > > > Specifically, I meant VirtIOSCSI.ctx, which is set only once in > virtio_scsi_dataplane_setup(). That’s at least where the virtqueue notifiers > are registered, so yes, virtqueue processing should at least be fixed to that > context. It seems like it’s always possible some things are processed in the > main thread (not just setup/teardown, but also e.g. TMF_LOGICAL_UNIT_RESET), > so to me it seems like virtio-scsi kind of runs in two contexts > simultaneously. Yes, when virtqueue processing is paused, all processing > VirtIOSCSI.ctx is stopped, but I wouldn’t say it switches contexts there. It > just stops processing some requests. > > Either way, virtio-scsi request processing doesn’t stop just because a > scsi-hd device is hot-plugged or -unplugged. If the BB changes contexts in > the hot-unplug path (while vq request processing is continuing in the I/O > thread), its context will differ from that of virtio-scsi. > > So should I just replace the “the context is fixed” and say that in this > specific instance, virtio-scsi vq processing continues in the I/O thread? > > The BH function is aware that the current AioContext might not be the > same as the AioContext at the time the BH was scheduled. That doesn't > break assumptions in the code. > > (It may be possible to rewrite virtio-blk, virtio-scsi, and core > VirtIODevice ioeventfd code to use the simpler model where the > AioContext really is fixed because things have changed significantly > over the years, but I looked a few weeks ago and it's difficult work.) > > I'm just pointing out that I think this description is incomplete. I > *do* agree with what this patch series is doing :). > > > Well, this description won’t land in any commit log, so from my side, I’m not > too worried about its correctness. O:)
Okay, I think we're in agreement. What you described in your reply matches how I understand the code. No need to resend anything. Stefan