Am 13.09.2018 um 22:55 hat Max Reitz geschrieben: > On 13.09.18 14:52, Kevin Wolf wrote: > > When starting an active commit job, other callbacks can run before > > mirror_start_job() calls bdrv_ref() where needed and cause the nodes to > > go away. Add another pair of bdrv_ref/unref() around it to protect > > against this case. > > > > Signed-off-by: Kevin Wolf <kw...@redhat.com> > > --- > > block/mirror.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > Reviewed-by: Max Reitz <mre...@redhat.com> > > But... How?
Tried to reproduce the other bug Paolo was concerned about (good there is something like 'git reflog'!) and dug up this one instead. So the scenario seems to be test_stream_commit_1 in qemu-iotests 030. The backtrace when the BDS is deleted is the following: (rr) bt #0 0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6 #1 0x00007faeaf60414e in _int_free () at /lib64/libc.so.6 #2 0x00007faeaf6093be in free () at /lib64/libc.so.6 #3 0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0 #4 0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590 #5 0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638 #6 0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188 #7 0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200 #8 0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94 #9 0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368 #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641 #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667 #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735 #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151 #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822 #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872 #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892 #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96 #18 0x000055c0c950142b in job_defer_to_main_loop_bh (opaque=0x55c0cbe38a50) at job.c:981 #19 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbe19f20) at util/async.c:90 #20 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbd7b8d0) at util/async.c:118 #21 0x000055c0c961c150 in aio_poll (ctx=0x55c0cbd7b8d0, blocking=true) at util/aio-posix.c:690 #22 0x000055c0c957246c in bdrv_flush (bs=0x55c0cbe4b250) at block/io.c:2693 #23 0x000055c0c94f8e46 in bdrv_reopen_prepare (reopen_state=0x55c0cbef4d68, queue=0x55c0cbeab9f0, errp=0x7ffd65617050) at block.c:3206 #24 0x000055c0c94f883a in bdrv_reopen_multiple (ctx=0x55c0cbd7b8d0, bs_queue=0x55c0cbeab9f0, errp=0x7ffd656170c8) at block.c:3032 #25 0x000055c0c94f8a00 in bdrv_reopen (bs=0x55c0cbe2c250, bdrv_flags=8194, errp=0x7ffd65617220) at block.c:3075 #26 0x000055c0c956a23f in commit_active_start (job_id=0x0, bs=0x55c0cbd905b0, base=0x55c0cbe2c250, creation_flags=0, speed=0, on_error=BLOCKDEV_ON_ERROR_REPORT, filter_node_name=0x0, cb=0x0, opaque=0x0, auto_complete=false, errp=0x7ffd65617220) at block/mirror.c:1687 #27 0x000055c0c927437c in qmp_block_commit (has_job_id=false, job_id=0x0, device=0x55c0cbef4d20 "drive0", has_base_node=false, base_node=0x0, has_base=true, base=0x55c0cbe38a00 "/home/kwolf/source/qemu/tests/qemu-iotests/scratch/img-3.img", has_top_node=false, top_node=0x0, has_top=false, top=0x0, has_backing_file=false, backing_file=0x0, has_speed=false, speed=0, has_filter_node_name=false, filter_node_name=0x0, errp=0x7ffd656172d8) at blockdev.c:3325 #28 0x000055c0c928bb08 in qmp_marshal_block_commit (args=<optimized out>, ret=<optimized out>, errp=0x7ffd656173b8) at qapi/qapi-commands-block-core.c:409 #29 0x000055c0c960beef in do_qmp_dispatch (errp=0x7ffd656173b0, allow_oob=false, request=<optimized out>, cmds=0x55c0c9f56f80 <qmp_commands>) at qapi/qmp-dispatch.c:129 #30 0x000055c0c960beef in qmp_dispatch (cmds=0x55c0c9f56f80 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:171 #31 0x000055c0c9111623 in monitor_qmp_dispatch (mon=0x55c0cbdbb930, req=0x55c0ccf90e00, id=0x0) at /home/kwolf/source/qemu/monitor.c:4168 #32 0x000055c0c911191f in monitor_qmp_bh_dispatcher (data=0x0) at /home/kwolf/source/qemu/monitor.c:4237 #33 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbccc840) at util/async.c:90 #34 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbccc550) at util/async.c:118 #35 0x000055c0c961b723 in aio_dispatch (ctx=0x55c0cbccc550) at util/aio-posix.c:436 #36 0x000055c0c961762f in aio_ctx_dispatch (source=0x55c0cbccc550, callback=0x0, user_data=0x0) at util/async.c:261 #37 0x00007faecaea1257 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #38 0x000055c0c961a2aa in glib_pollfds_poll () at util/main-loop.c:215 #39 0x000055c0c961a2aa in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238 #40 0x000055c0c961a2aa in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497 #41 0x000055c0c92805e1 in main_loop () at vl.c:1866 #42 0x000055c0c9287eca in main (argc=18, argv=0x7ffd65617958, envp=0x7ffd656179f0) at vl.c:4647 (rr) p bs.node_name $17 = "node1", '\000' <repeats 26 times> Obviously, this exact backtrace shouldn't even be possible like this because job_defer_to_main_loop_bh() shouldn't be called inside bdrv_flush(), but when draining the subtree in bdrv_reopen(). This is only fixed in "blockjob: Lie better in child_job_drained_poll()". It probably doesn't make a difference, though, where exactly during bdrv_reopen() the node is deleted. We'll crash anyway if we don't hold the extra reference. Kevin
signature.asc
Description: PGP signature