On 03/28/2018 06:31 PM, Max Reitz wrote: > On 2018-03-27 15:30, Denis V. Lunev wrote: >> We have received the following assert on QEMU 2.9: >> >> (gdb) bt >> 0 0x00007f6f67d281f7 in __GI_raise () >> 1 0x00007f6f67d298e8 in __GI_abort () >> 2 0x00007f6f67d21266 in __assert_fail_base () >> 3 0x00007f6f67d21312 in __GI___assert_fail () >> 4 0x000055a8faf76f9f in bdrv_detach_aio_context () >> 5 0x000055a8faf76f68 in bdrv_detach_aio_context () >> 6 0x000055a8faf770c6 in bdrv_set_aio_context () >> 7 0x000055a8fafb780d in blk_set_aio_context () >> 8 0x000055a8faf7af08 in block_job_attached_aio_context () >> 9 0x000055a8faf77043 in bdrv_attach_aio_context () >> 10 0x000055a8faf770d9 in bdrv_set_aio_context () >> 11 0x000055a8fafb780d in blk_set_aio_context () >> 12 0x000055a8fad580e7 in virtio_blk_data_plane_stop () >> 13 0x000055a8faf11da5 in virtio_bus_stop_ioeventfd () >> 14 0x000055a8fad85604 in virtio_vmstate_change () >> 15 0x000055a8fae1ba52 in vm_state_notify () >> 16 0x000055a8fad273e5 in do_vm_stop () >> 17 vm_stop () >> 18 0x000055a8face8f28 in main_loop_should_exit () >> 19 main_loop () >> 20 main () >> (gdb) >> >> It does not look, that the code is fundumentally different in 2.12. >> >> block_job_attached_aio_context() calls backup_attached_aio_context(), >> which in turn calls bdrv_detach_aio_context() again. This results in >> assert(!bs->walking_aio_notifiers). >> >> The code in mirror is basically the same. The patch replaces boolean >> condition with incremental counter, which should solve the problem. >> >> Signed-off-by: Denis V. Lunev <d...@openvz.org> >> CC: Kevin Wolf <kw...@redhat.com> >> CC: Max Reitz <mre...@redhat.com> >> --- >> include/block/block_int.h | 2 +- >> block.c | 12 ++++++------ >> 2 files changed, 7 insertions(+), 7 deletions(-) > Changing this to a counter looks OK to me, but dealing with a recursive > bdrv_set_aio_context() might be a bit more complicated than that. It > calls for trouble if one of the aio_notifiers assigns a different > AioContext than was just assigned, if nothing else then because we have > to make sure not to call any other notifier with the old "new" context. > > In this case, that shouldn't be an issue because we will simply assign > the new context again, so it's actually a no-op. Maybe we could allow > that case alone, but even then we have to verify it. > > Another thing I'm wondering is how this prevents infinite recursion. > Won't we just call the notifier again which will try to again set its > BlockBackend's AioContext? good catch to think about. AIO context here was different that one one top. I'll come back after more evaluation.
> What is the full case anyway? The AioContext of the source changes, > which results in the call of backup's notifier, which will then change > the AioContext of the target. So source and target would need to be in > the same chain if setting target's context conflicts with a context > change on source. Unfortunately I do not know :( This has been happened on the customer site. The only thing we have is coredump, which gives above mentioned trace. Den > Max > >> diff --git a/include/block/block_int.h b/include/block/block_int.h >> index 29cafa4..a290711 100644 >> --- a/include/block/block_int.h >> +++ b/include/block/block_int.h >> @@ -613,7 +613,7 @@ struct BlockDriverState { >> * BDS may register themselves in this list to be notified of changes >> * regarding this BDS's context */ >> QLIST_HEAD(, BdrvAioNotifier) aio_notifiers; >> - bool walking_aio_notifiers; /* to make removal during iteration safe */ >> + int walking_aio_notifiers; /* to make removal during iteration safe */ >> >> char filename[PATH_MAX]; >> char backing_file[PATH_MAX]; /* if non zero, the image is a diff of >> diff --git a/block.c b/block.c >> index a8da4f2..82cc07b 100644 >> --- a/block.c >> +++ b/block.c >> @@ -4739,8 +4739,7 @@ void bdrv_detach_aio_context(BlockDriverState *bs) >> return; >> } >> >> - assert(!bs->walking_aio_notifiers); >> - bs->walking_aio_notifiers = true; >> + bs->walking_aio_notifiers++; >> QLIST_FOREACH_SAFE(baf, &bs->aio_notifiers, list, baf_tmp) { >> if (baf->deleted) { >> bdrv_do_remove_aio_context_notifier(baf); >> @@ -4751,7 +4750,8 @@ void bdrv_detach_aio_context(BlockDriverState *bs) >> /* Never mind iterating again to check for ->deleted. bdrv_close() will >> * remove remaining aio notifiers if we aren't called again. >> */ >> - bs->walking_aio_notifiers = false; >> + bs->walking_aio_notifiers--; >> + assert(bs->walking_aio_notifiers >= 0); >> >> if (bs->drv->bdrv_detach_aio_context) { >> bs->drv->bdrv_detach_aio_context(bs); >> @@ -4782,8 +4782,7 @@ void bdrv_attach_aio_context(BlockDriverState *bs, >> bs->drv->bdrv_attach_aio_context(bs, new_context); >> } >> >> - assert(!bs->walking_aio_notifiers); >> - bs->walking_aio_notifiers = true; >> + bs->walking_aio_notifiers++; >> QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_tmp) { >> if (ban->deleted) { >> bdrv_do_remove_aio_context_notifier(ban); >> @@ -4791,7 +4790,8 @@ void bdrv_attach_aio_context(BlockDriverState *bs, >> ban->attached_aio_context(new_context, ban->opaque); >> } >> } >> - bs->walking_aio_notifiers = false; >> + bs->walking_aio_notifiers--; >> + assert(bs->walking_aio_notifiers >= 0); >> } >> >> void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context) >> >