On Fri, Nov 03, 2023 at 11:32:39AM +0100, Kevin Wolf wrote: ... > > > - GLOBAL_STATE_CODE(); > > > - > > > - /* Make sure that @from doesn't go away until we have successfully > > > attached > > > - * all of its parents to @to. */ > > > > Useful comment that you just moved here in the previous patch... > > > > > - bdrv_ref(from); ... > > > @@ -5717,9 +5699,15 @@ BlockDriverState > > > *bdrv_insert_node(BlockDriverState *bs, QDict *options, > > > goto fail; > > > } > > > > > > + bdrv_ref(bs); > > > > ...but now it is gone. Intentional? > > I figured it was obvious enough that bdrv_ref() is always called to make > sure that the node doesn't go away too early, but I can add it back.
Your call. > > > @@ -94,8 +95,12 @@ static void commit_abort(Job *job) > > > * XXX Can (or should) we somehow keep 'consistent read' blocked even > > > * after the failed/cancelled commit job is gone? If we already wrote > > > * something to base, the intermediate images aren't valid any more. > > > */ > > > - bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs, > > > - &error_abort); > > > + commit_top_backing_bs = s->commit_top_bs->backing->bs; > > > + bdrv_drained_begin(commit_top_backing_bs); > > > + bdrv_graph_wrlock(commit_top_backing_bs); > > > > Here, and elsewhere in the patch, drained_begin/end is outside > > wr(un)lock... > > > > > + bdrv_replace_node(s->commit_top_bs, commit_top_backing_bs, > > > &error_abort); > > > + bdrv_graph_wrunlock(); > > > + bdrv_drained_end(commit_top_backing_bs); > > > > > > bdrv_unref(s->commit_top_bs); > > > bdrv_unref(top_bs); > > > @@ -425,7 +430,11 @@ fail: > > > /* commit_top_bs has to be replaced after deleting the block job, > > > * otherwise this would fail because of lack of permissions. */ > > > if (commit_top_bs) { > > > + bdrv_graph_wrlock(top); > > > + bdrv_drained_begin(top); > > > bdrv_replace_node(commit_top_bs, top, &error_abort); > > > + bdrv_drained_end(top); > > > + bdrv_graph_wrunlock(); > > > > ...but here you do it in the opposite order. Intentional? > > No, this is actually wrong. bdrv_drained_begin() has a nested event > loop, and running a nested event loop while holding the graph lock can > cause deadlocks, so it's forbidden. Thanks for catching this! That's what review is for! > > Since the two comments above are the only thing you found in the review, > I'll just directly fix them while applying the series. Sounds good to me. With the deadlock fixed by swapping order, Reviewed-by: Eric Blake <ebl...@redhat.com> -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org