On 14.06.2017 at 15:15, Pavel Butsykin wrote:
> On 14.06.2017 13:10, Pavel Butsykin wrote:
> >
> >On 22.05.2017 16:57, Stefan Hajnoczi wrote:
> >>AioContext was designed to allow nested acquire/release calls.  It uses
> >>a recursive mutex so callers don't need to worry about nesting...or so
> >>we thought.
> >>
> >>BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
> >>the AioContext temporarily around aio_poll().  This gives IOThreads a
> >>chance to acquire the AioContext to process I/O completions.
> >>
> >>It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
> >>BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
> >>will not be able to acquire the AioContext if it was acquired
> >>multiple times.
> >>
> >>Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
> >>this patch simply avoids nested locking in save_vmstate().  It's the
> >>simplest fix and we should step back to consider the big picture with
> >>all the recent changes to block layer threading.
> >>
> >>This patch is the final fix to solve 'savevm' hanging with -object
> >>iothread.
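A minimal sketch of the failure mode described above. It is not QEMU code: it uses a plain POSIX recursive mutex as a hypothetical stand-in for the AioContext lock, and `ctx_lock`, `poll_with_nesting()` and `probe_from_iothread()` are illustrative names. The caller takes the lock `depth` times and releases it only once around the "poll", the way BDRV_POLL_WHILE() releases the AioContext once around aio_poll(). The simulated IOThread uses pthread_mutex_trylock() so the demo reports EBUSY instead of actually hanging.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE */
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the AioContext lock: a recursive mutex. */
static pthread_mutex_t ctx_lock;
static pthread_once_t ctx_lock_once = PTHREAD_ONCE_INIT;

static void init_ctx_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical IOThread: one non-blocking attempt to take the lock.
 * A real IOThread would block here instead of returning EBUSY. */
static void *iothread_try_acquire(void *arg)
{
    int *result = arg;
    *result = pthread_mutex_trylock(&ctx_lock);
    if (*result == 0) {
        pthread_mutex_unlock(&ctx_lock);  /* "process completions", then release */
    }
    return NULL;
}

/* Runs the IOThread stand-in to completion and returns its trylock result. */
static int probe_from_iothread(void)
{
    pthread_t t;
    int result = -1;
    pthread_create(&t, NULL, iothread_try_acquire, &result);
    pthread_join(t, NULL);
    return result;
}

/* Caller holds the lock `depth` times, releases it once around the poll,
 * re-acquires it afterwards, then unwinds its own nesting.
 * Returns 0 if the IOThread got the lock, EBUSY if it was locked out. */
int poll_with_nesting(int depth)
{
    pthread_once(&ctx_lock_once, init_ctx_lock);

    for (int i = 0; i < depth; i++) {
        pthread_mutex_lock(&ctx_lock);   /* nested acquire by callers */
    }

    pthread_mutex_unlock(&ctx_lock);     /* released only once */
    int r = probe_from_iothread();       /* stand-in for aio_poll() */
    pthread_mutex_lock(&ctx_lock);       /* re-acquire after polling */

    for (int i = 0; i < depth; i++) {
        pthread_mutex_unlock(&ctx_lock); /* callers unwind their nesting */
    }
    return r;
}
```

With a single acquisition, `poll_with_nesting(1)` returns 0: the other thread gets the lock. With any nesting, `poll_with_nesting(2)` returns EBUSY because the recursion count only drops from 2 to 1, so the mutex is still owned by the caller during the poll. In the real code that is the point where the IOThread cannot make progress and the wait never finishes, which is why the patch avoids the nested acquisition rather than counting releases.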
> >
> >I see the same in external_snapshot_prepare():
> >[...]
> >and at the moment of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE),
> >we hold at least two locks. So here is another deadlock.
> 
> Sorry, this is a different kind of deadlock. In the external_snapshot case, the
> deadlock can happen only if state->old_bs->aio_context == my_iothread->ctx,
> because in this case aio_co_enter() always calls aio_co_schedule():

Can you please write a qemu-iotests case for each deadlock that we're
seeing? Stefan, we could also use one for the bug fixed in this series.

Kevin
