Am 09.12.2021 um 15:23 hat Stefan Hajnoczi geschrieben: > The BlockBackend root child can change during bdrv_drained_begin() when > aio_poll() is invoked. In fact the BlockDriverState can reach refcnt 0 > and blk_drain() is left with a dangling BDS pointer. > > One example is scsi_device_purge_requests(), which calls blk_drain() to > wait for in-flight requests to cancel. If the backup blockjob is active, > then the BlockBackend root child is a temporary filter BDS owned by the > blockjob. The blockjob can complete during bdrv_drained_begin() and the > last reference to the BDS is released when the temporary filter node is > removed. This results in a use-after-free when blk_drain() calls > bdrv_drained_end(bs) on the dangling pointer. > > The general problem is that a function and its callers must not assume > that bs is still valid across aio_poll(). Explicitly hold a reference to > bs in blk_drain() to avoid the dangling pointer. > > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> > --- > I found that BDS nodes are sometimes deleted with bs->quiesce_counter > > 0 (at least when running "make check"), so it is currently not possible > to put the bdrv_ref/unref() calls in bdrv_do_drained_begin() and > bdrv_do_drained_end() because they will be unbalanced. That would have > been a more general solution than only fixing blk_drain().
They are not supposed to end up unbalanced because detaching a child calls bdrv_unapply_subtree_drain(). In fact, I think test-bdrv-drain tests a few scenarios like this. Do have more details about the case that failed for you? Kevin