Am 17.09.2018 um 00:05 hat Max Reitz geschrieben:
> On 14.09.18 18:25, Kevin Wolf wrote:
> > Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
> >> On 13.09.18 14:52, Kevin Wolf wrote:
> >>> When starting an active commit job, other callbacks can run before
> >>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> >>> go away. Add another pair of bdrv_ref/unref() around it to protect
> >>> against this case.
> >>>
> >>> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> >>> ---
> >>>  block/mirror.c | 11 +++++++++++
> >>>  1 file changed, 11 insertions(+)
> >>
> >> Reviewed-by: Max Reitz <mre...@redhat.com>
> >>
> >> But...  How?
> > 
> > Tried to reproduce the other bug Paolo was concerned about (good there
> > is something like 'git reflog'!) and dug up this one instead.
> > 
> > So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
> > 
> > The backtrace when the BDS is deleted is the following:
> > 
> > (rr) bt
> > #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at 
> > /lib64/libc.so.6
> > #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
> > #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
> > #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
> > #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
> > #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
> > #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at 
> > block.c:2188
> > #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at 
> > blockjob.c:200
> > #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at 
> > blockjob.c:94
> > #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
> > #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
> > #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
> > #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at 
> > job.c:735
> > #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, 
> > fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
> > #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
> > #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at 
> > job.c:872
> > #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, 
> > error=0x0) at job.c:892
> > #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, 
> > opaque=0x55c0cc050bc0) at block/stream.c:96
> 
> But isn't this just deletion of node1, i.e. the intermediate node of the
> stream job?
> 
> The commit job happens above that (node3 and upwards), so all its BDSs
> shouldn't be affected.  So even with the bdrv_ref()s introduced in this
> patch, I'd still expect node1 to be deleted exactly here.

Hm, I suppose that's true.

So it crashed because the drain in bdrv_reopen() didn't actually drain
the streaming job (which includes removing node1), so node1 ended up in
the ReopenQueue and when it disappeared in the middle of reopen, we got
a use after free.

Then it would be bug between job draining and bdrv_reopen() and the
active commit job would only be a victim of the bug without actually
doing anything wrong, and we can drop this patch.

Does this make sense?

Kevin

Attachment: signature.asc
Description: PGP signature

Reply via email to