On Tue, Feb 15, 2022 at 02:57:25PM +0100, Hanna Reitz wrote: > bdrv_refresh_limits() recurses down to the node's children. That does > not seem necessary: When we refresh limits on some node, and then > recurse down and were to change one of its children's BlockLimits, then > that would mean we noticed the changed limits by pure chance. The fact > that we refresh the parent's limits has nothing to do with it, so the > reason for the change probably happened before this point in time, and > we should have refreshed the limits then. > > On the other hand, we do not have infrastructure for noticing that block > limits change after they have been initialized for the first time (this > would require propagating the change upwards to the respective node's > parents), and so evidently we consider this case impossible. > > If this case is impossible, then we will not need to recurse down in > bdrv_refresh_limits(). Every node's limits are initialized in > bdrv_open_driver(), and are refreshed whenever its children change. > We want to use the childrens' limits to get some initial default, but we > can just take them, we do not need to refresh them. > > The problem with recursing is that bdrv_refresh_limits() is not atomic. > It begins with zeroing BDS.bl, and only then sets proper, valid limits. > If we do not drain all nodes whose limits are refreshed, then concurrent > I/O requests can encounter invalid request_alignment values and crash > qemu. Therefore, a recursing bdrv_refresh_limits() requires the whole > subtree to be drained, which is currently not ensured by most callers. > > A non-recursive bdrv_refresh_limits() only requires the node in question > to not receive I/O requests, and this is done by most callers in some > way or another: > - bdrv_open_driver() deals with a new node with no parents yet > - bdrv_set_file_or_backing_noperm() acts on a drained node > - bdrv_reopen_commit() acts only on drained nodes > - bdrv_append() should in theory require the node to be drained; in > practice most callers just lock the AioContext, which should at least > be enough to prevent concurrent I/O requests from accessing invalid > limits > > So we can resolve the bug by making bdrv_refresh_limits() non-recursive.
Long explanation, but very helpful. > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437 > Signed-off-by: Hanna Reitz <hre...@redhat.com> > --- > block/io.c | 4 ---- > 1 file changed, 4 deletions(-) And deceptively simple fix! Reviewed-by: Eric Blake <ebl...@redhat.com> > > diff --git a/block/io.c b/block/io.c > index 4e4cb556c5..c3e7301613 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -189,10 +189,6 @@ void bdrv_refresh_limits(BlockDriverState *bs, > Transaction *tran, Error **errp) > QLIST_FOREACH(c, &bs->children, next) { > if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | > BDRV_CHILD_COW)) > { > - bdrv_refresh_limits(c->bs, tran, errp); > - if (*errp) { > - return; > - } > bdrv_merge_limits(&bs->bl, &c->bs->bl); > have_limits = true; > } > -- > 2.34.1 > > -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org