Am 17.09.2018 um 00:05 hat Max Reitz geschrieben: > On 14.09.18 18:25, Kevin Wolf wrote: > > Am 13.09.2018 um 22:55 hat Max Reitz geschrieben: > >> On 13.09.18 14:52, Kevin Wolf wrote: > >>> When starting an active commit job, other callbacks can run before > >>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to > >>> go away. Add another pair of bdrv_ref/unref() around it to protect > >>> against this case. > >>> > >>> Signed-off-by: Kevin Wolf <kw...@redhat.com> > >>> --- > >>> block/mirror.c | 11 +++++++++++ > >>> 1 file changed, 11 insertions(+) > >> > >> Reviewed-by: Max Reitz <mre...@redhat.com> > >> > >> But... How? > > > > Tried to reproduce the other bug Paolo was concerned about (good there > > is something like 'git reflog'!) and dug up this one instead. > > > > So the scenario seems to be test_stream_commit_1 in qemu-iotests 030. > > > > The backtrace when the BDS is deleted is the following: > > > > (rr) bt > > #0 0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at > > /lib64/libc.so.6 > > #1 0x00007faeaf60414e in _int_free () at /lib64/libc.so.6 > > #2 0x00007faeaf6093be in free () at /lib64/libc.so.6 > > #3 0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0 > > #4 0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590 > > #5 0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638 > > #6 0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at > > block.c:2188 > > #7 0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at > > blockjob.c:200 > > #8 0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at > > blockjob.c:94 > > #9 0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368 > > #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641 > > #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667 > > #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at > > job.c:735 > > #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, > > fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151 > > #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822 > > #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at > > job.c:872 > > #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, > > error=0x0) at job.c:892 > > #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, > > opaque=0x55c0cc050bc0) at block/stream.c:96 > > But isn't this just deletion of node1, i.e. the intermediate node of the > stream job? > > The commit job happens above that (node3 and upwards), so all its BDSs > shouldn't be affected. So even with the bdrv_ref()s introduced in this > patch, I'd still expect node1 to be deleted exactly here.
Hm, I suppose that's true. So it crashed because the drain in bdrv_reopen() didn't actually drain the streaming job (which includes removing node1), so node1 ended up in the ReopenQueue and when it disappeared in the middle of reopen, we got a use after free. Then it would be bug between job draining and bdrv_reopen() and the active commit job would only be a victim of the bug without actually doing anything wrong, and we can drop this patch. Does this make sense? Kevin
signature.asc
Description: PGP signature