Re: [PATCH 16/24] block: Mark bdrv_replace_node() GRAPH_WRLOCK

Eric Blake Fri, 03 Nov 2023 05:38:09 -0700

On Fri, Nov 03, 2023 at 11:32:39AM +0100, Kevin Wolf wrote:
...
> > > -    GLOBAL_STATE_CODE();
> > > -
> > > -    /* Make sure that @from doesn't go away until we have successfully 
> > > attached
> > > -     * all of its parents to @to. */
> > 
> > Useful comment that you just moved here in the previous patch...
> > 
> > > -    bdrv_ref(from);
...
> > > @@ -5717,9 +5699,15 @@ BlockDriverState 
> > > *bdrv_insert_node(BlockDriverState *bs, QDict *options,
> > >          goto fail;
> > >      }
> > >  
> > > +    bdrv_ref(bs);
> > 
> > ...but now it is gone.  Intentional?
> 
> I figured it was obvious enough that bdrv_ref() is always called to make
> sure that the node doesn't go away too early, but I can add it back.


Your call.

> > > @@ -94,8 +95,12 @@ static void commit_abort(Job *job)
> > >       * XXX Can (or should) we somehow keep 'consistent read' blocked even
> > >       * after the failed/cancelled commit job is gone? If we already wrote
> > >       * something to base, the intermediate images aren't valid any more. 
> > > */
> > > -    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
> > > -                      &error_abort);
> > > +    commit_top_backing_bs = s->commit_top_bs->backing->bs;
> > > +    bdrv_drained_begin(commit_top_backing_bs);
> > > +    bdrv_graph_wrlock(commit_top_backing_bs);
> > 
> > Here, and elsewhere in the patch, drained_begin/end is outside
> > wr(un)lock...
> > 
> > > +    bdrv_replace_node(s->commit_top_bs, commit_top_backing_bs, 
> > > &error_abort);
> > > +    bdrv_graph_wrunlock();
> > > +    bdrv_drained_end(commit_top_backing_bs);
> > >  
> > >      bdrv_unref(s->commit_top_bs);
> > >      bdrv_unref(top_bs);
> > > @@ -425,7 +430,11 @@ fail:
> > >      /* commit_top_bs has to be replaced after deleting the block job,
> > >       * otherwise this would fail because of lack of permissions. */
> > >      if (commit_top_bs) {
> > > +        bdrv_graph_wrlock(top);
> > > +        bdrv_drained_begin(top);
> > >          bdrv_replace_node(commit_top_bs, top, &error_abort);
> > > +        bdrv_drained_end(top);
> > > +        bdrv_graph_wrunlock();
> > 
> > ...but here you do it in the opposite order.  Intentional?
> 
> No, this is actually wrong. bdrv_drained_begin() has a nested event
> loop, and running a nested event loop while holding the graph lock can
> cause deadlocks, so it's forbidden. Thanks for catching this!

That's what review is for!

> 
> Since the two comments above are the only thing you found in the review,
> I'll just directly fix them while applying the series.

Sounds good to me. With the deadlock fixed by swapping order,

Reviewed-by: Eric Blake <ebl...@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH 16/24] block: Mark bdrv_replace_node() GRAPH_WRLOCK

Reply via email to