15.08.2019 20:01, Max Reitz wrote: > On 15.08.19 17:21, Vladimir Sementsov-Ogievskiy wrote: >> 09.08.2019 19:14, Max Reitz wrote: >>> Currently, check_to_replace_node() only allows mirror to replace a node >>> in the chain of the source node, and only if it is the first non-filter >>> node below the source. Well, technically, the idea is that you can >>> exactly replace a quorum child by mirroring from quorum. >>> >>> This has (probably) two reasons: >>> (1) We do not want to create loops. >>> (2) @replaces and @device should have exactly the same content so >>> replacing them does not cause visible data to change. >>> >>> This has two issues: >>> (1) It is overly restrictive. It is completely fine for @replaces to be >>> a filter. >>> (2) It is not restrictive enough. You can create loops with this as >>> follows: >>> >>> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M >>> $ qemu-system-x86_64 -qmp stdio >>> {"execute": "qmp_capabilities"} >>> {"execute": "object-add", >>> "arguments": {"qom-type": "throttle-group", "id": "tg0"}} >>> {"execute": "blockdev-add", >>> "arguments": { >>> "node-name": "source", >>> "driver": "throttle", >>> "throttle-group": "tg0", >>> "file": { >>> "node-name": "filtered", >>> "driver": "qcow2", >>> "file": { >>> "driver": "file", >>> "filename": "/tmp/source.qcow2" >>> } } } } >>> {"execute": "drive-mirror", >>> "arguments": { >>> "job-id": "mirror", >>> "device": "source", >>> "target": "/tmp/target.qcow2", >>> "format": "qcow2", >>> "node-name": "target", >>> "sync" :"none", >>> "replaces": "filtered" >>> } } >>> {"execute": "block-job-complete", "arguments": {"device": "mirror"}} >>> >>> And qemu crashes because of a stack overflow due to the loop being >>> created (target's backing file is source, so when it replaces filtered, >>> it points to itself through source). >>> >>> (blockdev-mirror can be broken similarly.) >>> >>> So let us make the checks for the two conditions above explicit, which >>> makes the whole function exactly as restrictive as it needs to be. >>> >>> Signed-off-by: Max Reitz <mre...@redhat.com> >>> --- >>> include/block/block.h | 1 + >>> block.c | 83 +++++++++++++++++++++++++++++++++++++++---- >>> blockdev.c | 34 ++++++++++++++++-- >>> 3 files changed, 110 insertions(+), 8 deletions(-) >>> >>> diff --git a/include/block/block.h b/include/block/block.h >>> index 6ba853fb90..8da706cd89 100644 >>> --- a/include/block/block.h >>> +++ b/include/block/block.h >>> @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState >>> *candidate); >>> >>> /* check if a named node can be replaced when doing drive-mirror */ >>> BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs, >>> + BlockDriverState *backing_bs, >>> const char *node_name, Error >>> **errp); >>> >>> /* async block I/O */ >>> diff --git a/block.c b/block.c >>> index 915b80153c..4858d3e718 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState >>> *candidate) >>> return false; >>> } >>> >>> +static bool is_child_of(BlockDriverState *child, BlockDriverState *parent) >>> +{ >>> + BdrvChild *c; >>> + >>> + if (!parent) { >>> + return false; >>> + } >>> + >>> + QLIST_FOREACH(c, &parent->children, next) { >>> + if (c->bs == child || is_child_of(child, c->bs)) { >>> + return true; >>> + } >>> + } >>> + >>> + return false; >>> +} >>> + >>> +/* >>> + * Return true if there are only filters in [@top, @base). Note that >>> + * this may include quorum (which bdrv_chain_contains() cannot >>> + * handle). >> >> More presizely: return true if exists chain of filters from top to base or if >> top == base. >> >> I keep in mind backup-top filter: >> >> [backup-top] >> | \target > > backup-top can’t be a filter if it has two children with different > contents, though.
Why? target is special child, unrelated to what is read/written over backup-top. It's an own business of backup-top. > > (commit-top and mirror-top aren’t filters either.) Ahm, I missed something. They have is_filter = true and their children considered to be filtered-rw children in your series? And than, who they are? Format nodes? And how they appears in backing chains than? > > That’s why there must be a unique chain [@top, @base). > > I should probably not that it will return true if top == base, though, yes. > >> |backing -------->[target] >> V / >> [source] <---------/backing >> >>> + */ >>> +static bool is_filtered_child(BlockDriverState *top, BlockDriverState >>> *base) >>> +{ >>> + BdrvChild *c; >>> + >>> + if (!top) { >>> + return false; >>> + } >>> + >>> + if (top == base) { >>> + return true; >>> + } >>> + >>> + if (!top->drv->is_filter) { >>> + return false; >>> + } >>> + >>> + QLIST_FOREACH(c, &top->children, next) { >>> + if (is_filtered_child(c->bs, base)) { >>> + return true; >>> + } >>> + } >> >> interesting, how much is it better to somehow reuse DFS search written in >> should_update_child().. >> [just note, don't do it in these series please] >> >>> + >>> + return false; >>> +} >>> + >>> +/* >>> + * @parent_bs is mirror's source BDS, @backing_bs is the BDS which >>> + * will be attached to the target when mirror completes. >>> + */ >>> BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs, >>> + BlockDriverState *backing_bs, >>> const char *node_name, Error >>> **errp) >>> { >>> BlockDriverState *to_replace_bs = bdrv_find_node(node_name); >>> @@ -6309,13 +6361,32 @@ BlockDriverState >>> *check_to_replace_node(BlockDriverState *parent_bs, >>> goto out; >>> } >>> >>> - /* We don't want arbitrary node of the BDS chain to be replaced only >>> the top >>> - * most non filter in order to prevent data corruption. >>> - * Another benefit is that this tests exclude backing files which are >>> - * blocked by the backing blockers. >>> + /* >>> + * If to_replace_bs is (recursively) a child of backing_bs, >>> + * replacing it may create a loop. We cannot allow that. >>> */ >>> - if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) { >>> - error_setg(errp, "Only top most non filter can be replaced"); >>> + if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, >>> backing_bs)) { >> >> first condition is covered by second, so first may be omitted. > > It is not. is_child_of() does not return true if child == parent. > >>> + error_setg(errp, "Replacing this node would result in a loop"); >>> + to_replace_bs = NULL; >>> + goto out; >>> + } >>> + >>> + /* >>> + * Mirror is designed in such a way that when it completes, the >>> + * source BDS is seamlessly replaced. >> >> Not source but to_replace_bs is replaced? > > It has originally been designed to replace the source. If it could > replace any arbitrary BDS, all of this would be moot. quorum child, you saying about in commit message? > >>> It is therefore not allowed >>> + * to replace a BDS where this condition would be violated, as that >>> + * would defeat the purpose of mirror and could lead to data >>> + * corruption. >>> + * Therefore, between parent_bs and to_replace_bs there may be >>> + * only filters (and the one on top must be a filter, too), so >>> + * their data always stays in sync and mirror can complete and >>> + * replace to_replace_bs without any possible corruptions. >>> + */ >>> + if (!is_filtered_child(parent_bs, to_replace_bs) && >>> + !is_filtered_child(to_replace_bs, parent_bs)) >>> + { >>> + error_setg(errp, "The node to be replaced must be connected to the >>> " >>> + "source through filter nodes only"); >> >> "and the one on top must be a filter, too" not mentioned in the error.. > > Well, unless the source node is the node to be replaced. Hm... This > gets very hard to express. I think I’d prefer to keep this as it is, > even though it is not quite correct, unless you have a better suggestion > of what to report. :-/ I can't imaging something better than just add "(and the one on top must be a filter, too)" > >>> to_replace_bs = NULL; >>> goto out; >>> } >>> diff --git a/blockdev.c b/blockdev.c >>> index 4e72f6f701..758e0b5431 100644 >>> --- a/blockdev.c >>> +++ b/blockdev.c >>> @@ -3887,7 +3887,7 @@ static void blockdev_mirror_common(const char >>> *job_id, BlockDriverState *bs, >>> } >>> >>> if (has_replaces) { >>> - BlockDriverState *to_replace_bs; >>> + BlockDriverState *to_replace_bs, *backing_bs; >>> AioContext *replace_aio_context; >>> int64_t bs_size, replace_size; >>> >>> @@ -3897,7 +3897,37 @@ static void blockdev_mirror_common(const char >>> *job_id, BlockDriverState *bs, >>> return; >>> } >>> >>> - to_replace_bs = check_to_replace_node(bs, replaces, errp); >>> + if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN || >>> + backing_mode == MIRROR_OPEN_BACKING_CHAIN) >>> + { >>> + /* >>> + * While we do not quite know what OPEN_BACKING_CHAIN >>> + * (used for mode=existing) will yield, it is probably >>> + * best to restrict it exactly like SOURCE_BACKING_CHAIN, >>> + * because that is our best guess. >>> + */ >>> + switch (sync) { >>> + case MIRROR_SYNC_MODE_FULL: >>> + backing_bs = NULL; >>> + break; >>> + >>> + case MIRROR_SYNC_MODE_TOP: >>> + backing_bs = >>> bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)); >> >> why not bdrv_backing_chain_next(bs) like in mirror_start? > > Good question. I suppose it should be > bdrv_filtered_cow_bs(bdrv_backing_chain_next(bs)) in mirror_start()? You mean bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)), I hope) > Because with sync=top, we just want to remove the topmost COW node (and > filters on top), but keep filters behind it. > Agreed. > >>> + break; >>> + >>> + case MIRROR_SYNC_MODE_NONE: >>> + backing_bs = bs; >>> + break; >>> + >>> + default: >>> + abort(); >>> + } >>> + } else { >>> + assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN); >>> + backing_bs = >>> bdrv_filtered_cow_bs(bdrv_skip_rw_filters(target)); >>> + } >>> + >>> + to_replace_bs = check_to_replace_node(bs, backing_bs, replaces, >>> errp); >>> if (!to_replace_bs) { >>> return; >>> } >>> >> >> > > -- Best regards, Vladimir