On 5/22/22 17:06, Stefan Hajnoczi wrote:
> However, I hit on a problem that I think Emanuele and Paolo have already
> pointed out: draining is GS & IO. This might have worked under the 1 IOThread
> model but it does not make sense for multi-queue. It is possible to submit I/O
> requests in drained sections. How can multiple threads be in drained sections
> simultaneously and possibly submit further I/O requests in their drained
> sections? Those sections wouldn't be "drained" in any useful sense of the word.

Yeah, that works only if the drained sections are well-behaved.

"External" sources of I/O are fine; they are disabled using is_external, and I don't think they drain themselves.

Mirror is the only I/O user of drain, and it's fine because it never submits I/O to the drained BDS.

Drained sections in the main thread can be special cased to allow I/O (wrlock in this series would also allow I/O).
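To make "well-behaved" concrete, here is a toy model (not QEMU code; every name below is invented for illustration): a drained section bumps a quiesce counter that rejects new external I/O, and it may only begin once in-flight requests have reached zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of drained-section semantics. In real QEMU,
 * bdrv_drained_begin() polls the event loop until in-flight
 * requests complete; here completion is assumed up front. */
typedef struct {
    int in_flight;      /* requests submitted but not yet completed */
    int quiesce_count;  /* nesting level of active drained sections */
} ToyBDS;

static bool toy_io_allowed(ToyBDS *bs)
{
    /* External I/O is rejected while any drained section is active. */
    return bs->quiesce_count == 0;
}

static void toy_drained_begin(ToyBDS *bs)
{
    bs->quiesce_count++;
    /* A well-behaved drained section starts with nothing in flight. */
    assert(bs->in_flight == 0);
}

static void toy_drained_end(ToyBDS *bs)
{
    assert(bs->quiesce_count > 0);
    bs->quiesce_count--;
}
```

The problematic case from the quoted text is exactly a caller that submits new I/O between toy_drained_begin() and toy_drained_end(), which this model would flag via toy_io_allowed().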

So I think that the "cooperation from all relevant places" that Kevin mentioned is already there, except for coroutine commands in the monitor. Those are a bad idea in my opinion and I'd rather revert commit eb94b81a94 ("block: Convert 'block_resize' to coroutine") until we have a clearer idea of how to handle them.

I agree that it's basically impossible to review the change. On the other hand, there's already a substantial amount of faith involved in the correctness of the current code.

In particular, the AioContext lock does absolutely nothing to protect coroutines in the main thread against graph changes, whether from the monitor (including BHs, as in "block: Fix BB.root changing across bdrv_next()") or from BDS coroutines. The former are unprotected; the latter are protected only by drain. Using drain to protect against graph writes would be a matter of extending *existing* faith to the multi-iothread case.

Once the deadlock is broken, we can proceed to remove the AioContext lock and then introduce actual coroutine-based locking.

> Possible steps for AioContext removal
> -------------------------------------
> I also wanted to share my assumptions about multi-queue and AioContext removal.
> Please let me know if anything seems wrong or questionable:
>
> - IO code can execute in any thread that has an AioContext.
> - Multiple threads may execute IO code at the same time.
> - GS code only executes under the BQL.
>
> For AioContext removal this means:
>
> - bdrv_get_aio_context() becomes mostly meaningless since there is no need for
>    a special "home" AioContext.

Correct. bdrv_set_aio_context() remains useful as a way to set a home AioContext for sockets.

> - bdrv_coroutine_enter() becomes mostly meaningless because there is no need to
>    run a coroutine in the BDS's AioContext.
> - aio_disable_external(bdrv_get_aio_context(bs)) no longer works because many
>    threads/AioContexts may submit new I/O requests. BlockDevOps.drained_begin()
>    may be used instead (e.g. to temporarily disable ioeventfds on a multi-queue
>    virtio-blk device).

This is a change that can be done independent of this work.
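For illustration, a toy model of the callback-based alternative (names invented; this is not QEMU's actual BlockDevOps or virtio-blk code): the device registers drained_begin/drained_end hooks that stop and restart its own queues, rather than relying on a per-AioContext aio_disable_external().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: per-device callbacks that a drained section invokes
 * so the device itself stops submitting new I/O. */
typedef struct ToyDevOps {
    void (*drained_begin)(void *opaque);
    void (*drained_end)(void *opaque);
} ToyDevOps;

typedef struct {
    bool queues_active;  /* stands in for virtio-blk's ioeventfds */
} ToyVirtioBlk;

static void vblk_drained_begin(void *opaque)
{
    /* Stop processing the virtqueues: no new requests reach the BDS. */
    ((ToyVirtioBlk *)opaque)->queues_active = false;
}

static void vblk_drained_end(void *opaque)
{
    /* Resume queue processing once the drained section ends. */
    ((ToyVirtioBlk *)opaque)->queues_active = true;
}

static const ToyDevOps vblk_ops = {
    .drained_begin = vblk_drained_begin,
    .drained_end   = vblk_drained_end,
};
```

The point of the callback shape is that it works no matter how many threads service the device's queues, since the device quiesces itself instead of one AioContext being switched off.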

> - AIO_WAIT_WHILE() simplifies to
>
>      while ((cond)) {
>          aio_poll(qemu_get_current_aio_context(), true);
>          ...
>      }
>
>    and the distinction between home AioContext and non-home context is
>    eliminated. AioContext unlocking is dropped.

(I'll reply on this from elsewhere in the thread).

> Does this make sense? I haven't seen these things in recent patch series.

I agree, and yeah all these are blocked on protecting graph modifications.

In parallel to the block layer discussions, it's possible to work on introducing a request queue lock in virtio-blk and virtio-scsi. That's the only thing that relies on the AioContext lock outside the block layer.
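To illustrate the idea (a toy sketch, not virtio-blk or virtio-scsi code; all names are invented): the request queue gets its own mutex, so any iothread can enqueue and the device no longer needs the AioContext lock for this.

```c
#include <assert.h>
#include <pthread.h>

/* Toy model: a request queue guarded by its own lock instead of the
 * per-AioContext lock, so several iothreads can enqueue safely. */
enum { QUEUE_MAX = 128 };

typedef struct {
    pthread_mutex_t lock;
    int reqs[QUEUE_MAX];  /* stands in for queued request pointers */
    int len;
} ToyReqQueue;

static void queue_init(ToyReqQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->len = 0;
}

static void queue_push(ToyReqQueue *q, int req)
{
    /* The queue's own lock replaces aio_context_acquire() here. */
    pthread_mutex_lock(&q->lock);
    assert(q->len < QUEUE_MAX);
    q->reqs[q->len++] = req;
    pthread_mutex_unlock(&q->lock);
}

static int queue_len(ToyReqQueue *q)
{
    pthread_mutex_lock(&q->lock);
    int n = q->len;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```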

Paolo

