On Tue, May 24, 2022 at 09:55:39AM +0200, Paolo Bonzini wrote:
> On 5/22/22 17:06, Stefan Hajnoczi wrote:
> > However, I hit on a problem that I think Emanuele and Paolo have already pointed out: draining is GS & IO. This might have worked under the 1 IOThread model but it does not make sense for multi-queue. It is possible to submit I/O requests in drained sections. How can multiple threads be in drained sections simultaneously and possibly submit further I/O requests in their drained sections? Those sections wouldn't be "drained" in any useful sense of the word.
>
> Yeah, that works only if the drained sections are well-behaved.
>
> "External" sources of I/O are fine; they are disabled using is_external, and don't drain themselves I think.

I/O requests for a given BDS may be executing in multiple AioContexts, so how do you call aio_disable_external() on all relevant AioContexts? We covered this below but I wanted to reply here in case someone else reads this part without reading the rest.

> Mirror is the only I/O user of drain, and it's fine because it never submits I/O to the drained BDS.
>
> Drained sections in the main thread can be special cased to allow I/O (wrlock in this series would also allow I/O).
>
> So I think that the "cooperation from all relevant places" that Kevin mentioned is already there, except for coroutine commands in the monitor. Those are a bad idea in my opinion and I'd rather revert commit eb94b81a94 ("block: Convert 'block_resize' to coroutine") until we have a clearer idea of how to handle them.
>
> I agree that it's basically impossible to review the change. On the other hand, there's already a substantial amount of faith involved in the correctness of the current code.
>
> In particular the AioContext lock does absolutely nothing to protect coroutines in the main thread against graph changes---both from the monitor (including BHs as in "block: Fix BB.root changing across bdrv_next()") and from BDS coroutines. The former are unprotected; the latter are protected by drain only: using drain to protect against graph writes would be a matter of extending *existing* faith to the multi-iothread case.
>
> Once the deadlock is broken, we can proceed to remove the AioContext lock and then introduce actual coroutine-based locking.
>
> > Possible steps for AioContext removal
> > -------------------------------------
> > I also wanted to share my assumptions about multi-queue and AioContext removal. Please let me know if anything seems wrong or questionable:
> >
> > - IO code can execute in any thread that has an AioContext.
> > - Multiple threads may execute IO code at the same time.
> > - GS code only executes under the BQL.
> >
> > For AioContext removal this means:
> >
> > - bdrv_get_aio_context() becomes mostly meaningless since there is no need for a special "home" AioContext.
>
> Correct. bdrv_set_aio_context() remains useful as a way to set a home AioContext for sockets.
>
> > - bdrv_coroutine_enter() becomes mostly meaningless because there is no need to run a coroutine in the BDS's AioContext.
> >
> > - aio_disable_external(bdrv_get_aio_context(bs)) no longer works because many threads/AioContexts may submit new I/O requests. BlockDevOps.drained_begin() may be used instead (e.g. to temporarily disable ioeventfds on a multi-queue virtio-blk device).
>
> This is a change that can be done independent of this work.
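For what it's worth, here is a rough sketch of the BlockDevOps hookup I have in mind, assuming a virtio-blk-style device. MyDevState and the two ioeventfd helpers are placeholder names, not existing code; only the BlockDevOps .drained_begin/.drained_end callbacks and blk_set_dev_ops() exist today:

    /* Sketch: pause guest request submission while the BDS is drained.
     * my_dev_stop_ioeventfds()/my_dev_start_ioeventfds() stand in for
     * device code that detaches/reattaches the virtqueue ioeventfd
     * handlers on every queue, in every AioContext that services one.
     */
    static void my_dev_drained_begin(void *opaque)
    {
        MyDevState *s = opaque;

        my_dev_stop_ioeventfds(s);  /* no new requests from the guest */
    }

    static void my_dev_drained_end(void *opaque)
    {
        MyDevState *s = opaque;

        my_dev_start_ioeventfds(s); /* resume request processing */
    }

    static const BlockDevOps my_dev_block_ops = {
        .drained_begin = my_dev_drained_begin,
        .drained_end   = my_dev_drained_end,
    };

    /* At realize time: blk_set_dev_ops(s->blk, &my_dev_block_ops, s); */

That way drain doesn't need to know which AioContexts service the device's queues; the device itself stops all of them.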
> > - AIO_WAIT_WHILE() simplifies to
> >
> >       while ((cond)) {
> >           aio_poll(qemu_get_current_aio_context(), true);
> >           ...
> >       }
> >
> >   and the distinction between home AioContext and non-home context is eliminated. AioContext unlocking is dropped.
>
> (I'll reply on this from elsewhere in the thread).
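To spell out what I have in mind, the simplified form could look something like this (the name is illustrative, and the waiter accounting and kicking that the real macro does are omitted):

    /* Sketch of AIO_WAIT_WHILE() without a home-context argument:
     * every caller polls its own AioContext until the condition
     * clears, so no cross-context unlocking is needed.
     */
    #define AIO_WAIT_WHILE_SIMPLIFIED(cond)                         \
        do {                                                        \
            while ((cond)) {                                        \
                aio_poll(qemu_get_current_aio_context(), true);     \
            }                                                       \
        } while (0)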
> > Does this make sense? I haven't seen these things in recent patch series.
>
> I agree, and yeah all these are blocked on protecting graph modifications.
>
> In parallel to the block layer discussions, it's possible to work on introducing a request queue lock in virtio-blk and virtio-scsi. That's the only thing that relies on the AioContext lock outside the block layer.

I'm not sure what the request queue lock protects in virtio-blk?

In virtio-scsi I guess a lock is needed to protect SCSI target emulation state?

Stefan