On 14.06.2017 at 15:15, Pavel Butsykin wrote:
> On 14.06.2017 13:10, Pavel Butsykin wrote:
> >
> >On 22.05.2017 16:57, Stefan Hajnoczi wrote:
> >>AioContext was designed to allow nested acquire/release calls. It uses
> >>a recursive mutex so callers don't need to worry about nesting...or so
> >>we thought.
> >>
> >>BDRV_POLL_WHILE() is used to wait for block I/O requests. It releases
> >>the AioContext temporarily around aio_poll(). This gives IOThreads a
> >>chance to acquire the AioContext to process I/O completions.
> >>
> >>It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
> >>BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
> >>will not be able to acquire the AioContext if it was acquired
> >>multiple times.
> >>
> >>Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
> >>this patch simply avoids nested locking in save_vmstate(). It's the
> >>simplest fix and we should step back to consider the big picture with
> >>all the recent changes to block layer threading.
> >>
> >>This patch is the final fix to solve 'savevm' hanging with -object
> >>iothread.
> >
> >I see the same in external_snapshot_prepare():
> >[...]
> >and at the moment of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE),
> >we hold at least two locks. So here is another deadlock.
>
> Sorry, this is a different kind of deadlock. In the external_snapshot case,
> the deadlock can happen only if
> state->old_bs->aio_context == my_iothread->ctx, because in this case
> aio_co_enter() always calls aio_co_schedule():

Can you please write a qemu-iotests case for each deadlock that we're
seeing? Stefan, we could also use one for the bug fixed in this series.

Kevin
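
For readers following the thread, the nested-locking problem described in
the quoted commit message can be illustrated outside QEMU. The sketch below
is only an analogy in Python, not QEMU code and not a qemu-iotests case: it
uses threading.RLock as a stand-in for the recursive AioContext mutex, and
the helper names (other_thread_can_acquire, poll_with_single_release) are
hypothetical. It shows that a single temporary release lets another thread
in only when the lock was taken exactly once.

```python
import threading


def other_thread_can_acquire(lock):
    """Return True if a different thread can take `lock` right now."""
    result = []

    def attempt():
        # Non-blocking attempt, standing in for the IOThread trying to
        # acquire the context while the main thread is polling.
        got = lock.acquire(blocking=False)
        if got:
            lock.release()
        result.append(got)

    t = threading.Thread(target=attempt)
    t.start()
    t.join()
    return result[0]


def poll_with_single_release(lock, depth):
    """Acquire `lock` `depth` times, release it once around a 'poll',
    and report whether another thread could acquire it meanwhile."""
    for _ in range(depth):
        lock.acquire()
    lock.release()  # the single temporary release around the poll
    try:
        return other_thread_can_acquire(lock)
    finally:
        lock.acquire()  # re-acquire after the poll
        for _ in range(depth):
            lock.release()  # unwind all nesting levels


if __name__ == "__main__":
    print("depth 1:", poll_with_single_release(threading.RLock(), 1))
    print("depth 2:", poll_with_single_release(threading.RLock(), 2))
```

With depth 1 the other thread gets the lock during the poll; with depth 2
the lock is still held once, so the other thread is shut out. In the real
code the polling thread would then keep waiting for a completion that the
IOThread can never deliver.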