On Tue, Sep 10, 2013 at 02:42:10PM -0500, Michael Roth wrote:
> Quoting Stefan Hajnoczi (2013-08-29 02:43:02)
> > On Thu, Aug 29, 2013 at 09:09:45AM +0800, Wenchao Xia wrote:
> > > On 2013-8-28 16:49, Stefan Hajnoczi wrote:
> > > >On Wed, Aug 28, 2013 at 11:25:33AM +0800, Wenchao Xia wrote:
> > > >>>+void aio_context_release(AioContext *ctx)
> > > >>>+{
> > > >>>+    qemu_mutex_lock(&ctx->acquire_lock);
> > > >>>+    assert(ctx->owner && qemu_thread_is_self(ctx->owner));
> > > >>>+    ctx->owner = NULL;
> > > >>>+    qemu_cond_signal(&ctx->acquire_cond);
> > > >>>+    qemu_mutex_unlock(&ctx->acquire_lock);
> > > >>>+}
> > > >> If the main thread has called bdrv_aio_readv() with callback bdrv_cb,
> > > >> it is now possible that bdrv_cb will be executed in another thread,
> > > >> the one which did aio_context_acquire(). I think there are some ways
> > > >> to solve this, but would it be better to leave a comment here as a tip?
> > > >
> > > >Callbacks, BHs, and timers are executed in the thread that calls
> > > >aio_poll(). This is safe since other threads cannot run aio_poll() or
> > > >submit new block I/O requests at the same time.
> > > >
> > > >In other words: code should only care which AioContext it runs under,
> > > >not which thread ID it runs under. (I think we talked about this on IRC
> > > >a few weeks ago.)
> > > >
> > > >Are there any situations you are worried about?
> > > >
> > > >Stefan
> > > >
> > > Yes, we have discussed it before, and I think it may be safe for block
> > > driver callers. Still, I mean to add an in-code comment here as a tip
> > > on how to use it safely.
> > >
> > > For example:
> > >
> > > static int glob_test = 0;
> > >
> > > void aio_cb(void *opaque)
> > > {
> > >     glob_test++;
> > > }
> > >
> > > Thread A:
> > >     bdrv_aio_read(bs, aio_cb...);
> > >     .....
> > >     glob_test++;
> > >
> > > Normally glob_test has no race condition, since both increments are
> > > supposed to run in one thread, but it needs to be considered when
> > > aio_context_acquire() is involved. How about:
> > > /* Note that the callback can run in a different thread, the one which
> > >  * acquired the AioContext and does the poll() call. */
> >
> > I will add a comment to aio_context_acquire() to explain that callbacks,
> > timers, and BHs may run in another thread.
> >
> > Normally this is not a problem since the callbacks access the BDS or
> > AioContext, which are both protected by acquire/release. But people
> > should be aware of this.
>
> Do you imagine we'll ever have a case where one main loop thread attempts
> to register events (aio_set_fd_handler etc.) with another thread's
> AioContext? Currently virtio-blk (iothread) registers events with the
> dataplane's AioContext, but doesn't start up the dataplane thread until
> afterward, so there's no contention there. But looking at a scenario where
> the dataplane threads are created/started independently of virtio-blk (as
> in the QContext RFC, for example), that may not be the case (there we start
> up the main loop thread during machine init time, then attach events to it
> via the dataplane start routine).
>
> So, in that case, I take it we'd do something like this in, say,
> virtio_blk_data_plane_start?
>
>     aio_context_acquire(ctx)
>     aio_set_fd_handler(ctx, ...)
>     aio_context_release(ctx)
>
> Or maybe push the acquire/release down into aio_set_fd_handler, and allow
> for support for recursive acquisition?
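A rough sketch of that registration pattern, purely for illustration (the
handle_notify handler and the VirtIOBlockDataPlane fields below are
placeholders rather than the real dataplane code, and the exact
aio_set_fd_handler() argument list varies between QEMU versions):

    static void data_plane_start(VirtIOBlockDataPlane *s)
    {
        /* Wait until no other thread (e.g. the IOThread inside aio_poll())
         * owns the AioContext, then register the handler while holding it. */
        aio_context_acquire(s->ctx);
        aio_set_fd_handler(s->ctx, s->notifier_fd,   /* placeholder fd field */
                           handle_notify,            /* read handler          */
                           NULL,                     /* no write handler      */
                           s);                       /* opaque for callbacks  */
        aio_context_release(s->ctx);                 /* let the IOThread poll */
    }

The acquire blocks until no other thread, including an IOThread sitting in
aio_poll(), owns the context, so the handler table is never modified while
the event loop is using it.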
Or use a BH to schedule initialization code into the AioContext.

> As far as locking constraints go (for situations like virtio-net dataplane,
> or threaded virtio-blk, where we may have lots of shared state between
> threads, potentially protected by a 'block'/'network' lock or something
> along that line), I assume we need to ensure that any locks that may be
> acquired after acquisition of an AioContext (via, say, event callbacks)
> must be released prior to calling aio_context_acquire() from another
> thread, to avoid an ABBA deadlock/lock-order reversal?

We should avoid long-held locks where possible. For example, protect the
relevant subsystem structures (e.g. bdrv_states or net_clients) during
access (adding/removing a device) but don't have a block/net subsystem lock.
This way it is much easier to avoid lock ordering problems, since
fine-grained locks also don't nest as much.

> I ask because I'm looking at how to handle this same scenario for
> qemu_set_fd_handler, set_fd_handler(ctx, ...). My approach is a little
> different:
>
> https://github.com/mdroth/qemu/commit/9a749a2a1ae93ac1b7d6a1216edaf0ca96ff1edb#L1R110
>
> but I think the requirements end up being similar for how users need to
> structure their code and handle locking. I just wanted to double-check,
> though, because it seems like a potential pain to have to figure out what
> locks you need to drop prior to doing event registration. (One alternative
> to this requirement is making the event updates asynchronous, and pushing
> the logic for dealing with stable callbacks/events into the event callbacks
> and their opaque data.)
>
> For anything outside of event registration, I assume a 'light-weight'
> AioContext loop could be converted to driving a more general GMainContext
> loop via the corresponding g_main_context_acquire(ctx)/release(ctx)?

If there are any doubts about lock ordering, I think the simplest solution
is executing code in the context of the event loop thread (see the comment
I made above about BHs).

Stefan
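A minimal sketch of the BH approach described above, assuming hypothetical
names (DataPlaneInitData, dataplane_init_bh, handle_notify); aio_bh_new(),
qemu_bh_schedule() and qemu_bh_delete() are the existing AioContext
bottom-half API, and the aio_set_fd_handler() argument list is again
abbreviated since it differs between QEMU versions:

    typedef struct {
        AioContext *ctx;
        QEMUBH *bh;
        int fd;
        void *opaque;
    } DataPlaneInitData;

    /* Runs inside whichever thread calls aio_poll(ctx), so the event
     * registration happens in the AioContext's own thread and no
     * acquire/release is needed around it. */
    static void dataplane_init_bh(void *opaque)
    {
        DataPlaneInitData *d = opaque;

        aio_set_fd_handler(d->ctx, d->fd, handle_notify, NULL, d->opaque);
        qemu_bh_delete(d->bh);   /* one-shot BH, clean up after it has run */
        g_free(d);
    }

    /* Called from the main thread (or another thread); scheduling a BH is
     * designed to be safe across threads. */
    static void schedule_dataplane_init(AioContext *ctx, int fd, void *opaque)
    {
        DataPlaneInitData *d = g_new0(DataPlaneInitData, 1);

        d->ctx = ctx;
        d->fd = fd;
        d->opaque = opaque;
        d->bh = aio_bh_new(ctx, dataplane_init_bh, d);
        qemu_bh_schedule(d->bh);
    }

The trade-off is that registration becomes asynchronous, which is essentially
the alternative mentioned in the quoted mail: the logic for dealing with
stable callbacks/events moves into the event callbacks and their opaque data.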