[PATCH v7 33/47] mirror: Deal with filters
This includes some permission limiting (for example, we only need to
take the RESIZE permission for active commits where the base is smaller
than the top).

Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
"target_backing_bs", because that is what it really refers to.

Signed-off-by: Max Reitz
---
 qapi/block-core.json |   6 ++-
 block/mirror.c       | 118 +--
 blockdev.c           |  36 +
 3 files changed, 121 insertions(+), 39 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index df87855429..0b8ccd30aa 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1943,7 +1943,8 @@
 #
 # @replaces: with sync=full graph node name to be replaced by the new
 #            image when a whole image copy is done. This can be used to repair
-#            broken Quorum files. (Since 2.1)
+#            broken Quorum files. By default, @device is replaced, although
+#            implicitly created filters on it are kept. (Since 2.1)
 #
 # @mode: whether and how QEMU should create a new image, default is
 #        'absolute-paths'.
@@ -2254,7 +2255,8 @@
 #
 # @replaces: with sync=full graph node name to be replaced by the new
 #            image when a whole image copy is done. This can be used to repair
-#            broken Quorum files.
+#            broken Quorum files. By default, @device is replaced, although
+#            implicitly created filters on it are kept.
 #
 # @speed: the maximum speed, in bytes per second
 #
diff --git a/block/mirror.c b/block/mirror.c
index 469acf4600..770de3b34e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
     BlockBackend *target;
     BlockDriverState *mirror_top_bs;
     BlockDriverState *base;
+    BlockDriverState *base_overlay;

     /* The name of the graph node to replace */
     char *replaces;
@@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
                             &error_abort);
     if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
         BlockDriverState *backing = s->is_none_mode ? src : s->base;
-        if (backing_bs(target_bs) != backing) {
-            bdrv_set_backing_hd(target_bs, backing, &local_err);
+        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
+
+        if (bdrv_cow_bs(unfiltered_target) != backing) {
+            bdrv_set_backing_hd(unfiltered_target, backing, &local_err);
             if (local_err) {
                 error_report_err(local_err);
                 local_err = NULL;
@@ -740,7 +743,7 @@ static int mirror_exit_common(Job *job)
      * valid.
      */
     block_job_remove_all_bdrv(bjob);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);

     /* We just changed the BDS the job BB refers to (with either or both of the
      * bdrv_replace_node() calls), so switch the BB back so the cleanup does
@@ -786,7 +789,6 @@ static void coroutine_fn mirror_throttle(MirrorBlockJob *s)
 static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
     int64_t offset;
-    BlockDriverState *base = s->base;
     BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     int ret;
@@ -837,7 +839,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             return 0;
         }

-        ret = bdrv_is_allocated_above(bs, base, false, offset, bytes, &count);
+        ret = bdrv_is_allocated_above(bs, s->base_overlay, true, offset, bytes,
+                                      &count);
         if (ret < 0) {
             return ret;
         }
@@ -936,7 +939,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
     } else {
         s->target_cluster_size = BDRV_SECTOR_SIZE;
     }
-    if (backing_filename[0] && !target_bs->backing &&
+    if (backing_filename[0] && !bdrv_backing_chain_next(target_bs) &&
         s->granularity < s->target_cluster_size) {
         s->buf_size = MAX(s->buf_size, s->target_cluster_size);
         s->cow_bitmap = bitmap_new(length);
@@ -1116,8 +1119,9 @@ static void mirror_complete(Job *job, Error **errp)
     if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
         int ret;

-        assert(!target->backing);
-        ret = bdrv_open_backing_file(target, NULL, "backing", errp);
+        assert(!bdrv_backing_chain_next(target));
+        ret = bdrv_open_backing_file(bdrv_skip_filters(target), NULL,
+                                     "backing", errp);
         if (ret < 0) {
             return;
         }
@@ -1565,8 +1569,8 @@ static BlockJob *mirror_start_job(
     MirrorBlockJob *s;
     MirrorBDSOpaque *bs_opaque;
     BlockDriverState *mirror_top_bs;
-    bool target_graph_mod;
     bool target_is_backing;
+    uint64_t target_perms, target_shared_perms;
     Error *local_err = NULL;
     int ret;

@@ -15
Re: [PATCH v7 33/47] mirror: Deal with filters
On 25.06.2020 at 17:22, Max Reitz wrote:
> This includes some permission limiting (for example, we only need to
> take the RESIZE permission for active commits where the base is smaller
> than the top).
>
> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
> "target_backing_bs", because that is what it really refers to.
>
> Signed-off-by: Max Reitz

> @@ -1682,6 +1721,7 @@ static BlockJob *mirror_start_job(
>      s->zero_target = zero_target;
>      s->copy_mode = copy_mode;
>      s->base = base;
> +    s->base_overlay = bdrv_find_overlay(bs, base);
>      s->granularity = granularity;
>      s->buf_size = ROUND_UP(buf_size, granularity);
>      s->unmap = unmap;

Is this valid without freezing the links between base_overlay and base?

Actually, I guess we should freeze everything between bs and base (for
base != NULL) and it's a preexisting problem that just happens to affect
this code, too.

Or maybe freezing everything is too much. We only want to make sure that
no non-filter is inserted between base and base_overlay and that base
(and now base_overlay) always stay in the backing chain of bs. But what
options apart from freezing do we have to achieve this?

Why is using base_overlay even better than using base? Assuming there is
a good reason, maybe the commit message could spell it out.

Kevin
Re: [PATCH v7 33/47] mirror: Deal with filters
On 19.08.20 18:50, Kevin Wolf wrote:
> On 25.06.2020 at 17:22, Max Reitz wrote:
>> This includes some permission limiting (for example, we only need to
>> take the RESIZE permission for active commits where the base is smaller
>> than the top).
>>
>> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
>> "target_backing_bs", because that is what it really refers to.
>>
>> Signed-off-by: Max Reitz
>
>> @@ -1682,6 +1721,7 @@ static BlockJob *mirror_start_job(
>>      s->zero_target = zero_target;
>>      s->copy_mode = copy_mode;
>>      s->base = base;
>> +    s->base_overlay = bdrv_find_overlay(bs, base);
>>      s->granularity = granularity;
>>      s->buf_size = ROUND_UP(buf_size, granularity);
>>      s->unmap = unmap;
>
> Is this valid without freezing the links between base_overlay and base?

Er...

> Actually, I guess we should freeze everything between bs and base (for
> base != NULL) and it's a preexisting problem that just happens to affect
> this code, too.

Yes, that’s how it looks to me, too. I don’t think that has anything to
do with this patch.

> Or maybe freezing everything is too much. We only want to make sure that
> no non-filter is inserted between base and base_overlay and that base
> (and now base_overlay) always stay in the backing chain of bs. But what
> options apart from freezing do we have to achieve this?

I don’t know of any, and I don’t know whether anyone would actually care
if we were to just freeze everything.

> Why is using base_overlay even better than using base? Assuming there is
> a good reason, maybe the commit message could spell it out.

The problem is that querying the block status for a filter node falls
through to the underlying data-carrying node. So if there’s a filter on
top of @base, and we query for is_allocated_above above @base, then
we’ll include @base, which we do not want.

Max
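The fall-through described above can be sketched with a toy model. This is
not QEMU code: ToyNode and all toy_* helpers are hypothetical names invented
for this illustration, and the "allocated" flag stands in for the block
status of one fixed range. The chain is top -> filter -> base, where only
base holds data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;         /* filters carry no data of their own */
    bool allocated;         /* does this node hold data for the range? */
    struct ToyNode *child;  /* filtered child or COW backing node */
} ToyNode;

/* Block status of a filter falls through to the node below it. */
static bool toy_is_allocated(const ToyNode *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n && n->allocated;
}

/* Walk from top towards stop; stop is only examined if include_stop. */
static bool toy_is_allocated_above(const ToyNode *top, const ToyNode *stop,
                                   bool include_stop)
{
    for (const ToyNode *n = top; n; n = n->child) {
        if (n == stop) {
            return include_stop && toy_is_allocated(n);
        }
        if (toy_is_allocated(n)) {
            return true;
        }
    }
    return false;
}

/* Lowest non-filter node strictly above base (toy notion of "overlay"). */
static ToyNode *toy_find_overlay(ToyNode *top, const ToyNode *base)
{
    ToyNode *overlay = NULL;
    for (ToyNode *n = top; n && n != base; n = n->child) {
        if (!n->is_filter) {
            overlay = n;
        }
    }
    return overlay;
}

/* Build top -> filter -> base and run both variants of the query. */
static void toy_demo(bool *with_base, bool *with_base_overlay)
{
    ToyNode base   = { false, true,  NULL };
    ToyNode filter = { true,  false, &base };
    ToyNode top    = { false, false, &filter };
    ToyNode *base_overlay = toy_find_overlay(&top, &base);

    assert(base_overlay == &top);
    /* Stopping at base (exclusive) still visits the filter on top of it. */
    *with_base = toy_is_allocated_above(&top, &base, false);
    /* Stopping at base_overlay (inclusive) never looks below the overlay. */
    *with_base_overlay = toy_is_allocated_above(&top, base_overlay, true);
}
```

In this model the query that stops at base reports the range as allocated
because the filter forwards the status of base, while the query that stops
at base_overlay (inclusive) does not.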
Re: [PATCH v7 33/47] mirror: Deal with filters
On 25.06.2020 18:22, Max Reitz wrote:
> This includes some permission limiting (for example, we only need to
> take the RESIZE permission for active commits where the base is smaller
> than the top).
>
> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
> "target_backing_bs", because that is what it really refers to.
>
> Signed-off-by: Max Reitz
> ---
>  qapi/block-core.json |   6 ++-
>  block/mirror.c       | 118 +--
>  blockdev.c           |  36 +
>  3 files changed, 121 insertions(+), 39 deletions(-)
>
...
> diff --git a/block/mirror.c b/block/mirror.c
> index 469acf4600..770de3b34e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
>      BlockBackend *target;
>      BlockDriverState *mirror_top_bs;
>      BlockDriverState *base;
> +    BlockDriverState *base_overlay;
>
>      /* The name of the graph node to replace */
>      char *replaces;
> @@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
>                              &error_abort);
>      if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>          BlockDriverState *backing = s->is_none_mode ? src : s->base;
> -        if (backing_bs(target_bs) != backing) {
> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
> +        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
> +
> +        if (bdrv_cow_bs(unfiltered_target) != backing) {

I just worry about a filter node of the concurrent job right below the
unfiltered_target. The filter has unfiltered_target in its parent list.
Will that filter node be replaced correctly then?

Andrey

...
> +        /*
> +         * The topmost node with
> +         * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
> +         */
> +        filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
> +
> +        assert(bdrv_skip_filters(filtered_target) ==
> +               bdrv_skip_filters(target));
> +
> +        /*
> +         * XXX BLK_PERM_WRITE needs to be allowed so we don't block
> +         * ourselves at s->base (if writes are blocked for a node, they are
> +         * also blocked for its backing file). The other options would be a
> +         * second filter driver above s->base (== target).
> +         */
> +        iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
> +
> +        for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
> +             iter = bdrv_filter_or_cow_bs(iter))
> +        {
> +            if (iter == filtered_target) {

For one filter node only?

> +                /*
> +                 * From here on, all nodes are filters on the base.
> +                 * This allows us to share BLK_PERM_CONSISTENT_READ.
> +                 */
> +                iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
> +            }
> +
>              ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> -                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
> -                                     errp);
> +                                     iter_shared_perms, errp);
>              if (ret < 0) {
>                  goto fail;
>              }

...
> @@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>                         " named node of the graph");
>              goto out;
>          }
> +        replaces_node_name = arg->replaces;

What is the idea behind the variables substitution? Probably, the patch
might be split out.

Andrey
Re: [PATCH v7 33/47] mirror: Deal with filters
On 22.07.20 20:31, Andrey Shinkevich wrote:
> On 25.06.2020 18:22, Max Reitz wrote:
>> This includes some permission limiting (for example, we only need to
>> take the RESIZE permission for active commits where the base is smaller
>> than the top).
>>
>> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
>> "target_backing_bs", because that is what it really refers to.
>>
>> Signed-off-by: Max Reitz
>> ---
>>  qapi/block-core.json |   6 ++-
>>  block/mirror.c       | 118 +--
>>  blockdev.c           |  36 +
>>  3 files changed, 121 insertions(+), 39 deletions(-)
>>
> ...
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 469acf4600..770de3b34e 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
>>      BlockBackend *target;
>>      BlockDriverState *mirror_top_bs;
>>      BlockDriverState *base;
>> +    BlockDriverState *base_overlay;
>>
>>      /* The name of the graph node to replace */
>>      char *replaces;
>> @@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
>>                              &error_abort);
>>      if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>>          BlockDriverState *backing = s->is_none_mode ? src : s->base;
>> -        if (backing_bs(target_bs) != backing) {
>> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
>> +        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
>> +
>> +        if (bdrv_cow_bs(unfiltered_target) != backing) {
>
>
> I just worry about a filter node of the concurrent job right below the
> unfiltered_target.

Having a concurrent job on the target sounds extremely problematic in
itself (because at least for most of the mirror job, the target isn’t in
a consistent state). Is that a real use case?

> The filter has unfiltered_target in its parent list.
> Will that filter node be replaced correctly then?

I’m also not quite sure what you mean. We need to attach the source’s
backing chain to the target here, so we go down to the first node that
might accept COW backing files (by invoking bdrv_skip_filters()). That
should be correct no matter what kind of filters are on it.

>> +        /*
>> +         * The topmost node with
>> +         * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
>> +         */
>> +        filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
>> +
>> +        assert(bdrv_skip_filters(filtered_target) ==
>> +               bdrv_skip_filters(target));
>> +
>> +        /*
>> +         * XXX BLK_PERM_WRITE needs to be allowed so we don't block
>> +         * ourselves at s->base (if writes are blocked for a node, they are
>> +         * also blocked for its backing file). The other options would be a
>> +         * second filter driver above s->base (== target).
>> +         */
>> +        iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
>> +
>> +        for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
>> +             iter = bdrv_filter_or_cow_bs(iter))
>> +        {
>> +            if (iter == filtered_target) {
>
>
> For one filter node only?

No, iter_shared_perms is never reset, so it retains the
BLK_PERM_CONSISTENT_READ flag until the end of the loop.

>> +                /*
>> +                 * From here on, all nodes are filters on the base.
>> +                 * This allows us to share BLK_PERM_CONSISTENT_READ.
>> +                 */
>> +                iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
>> +            }
>> +
>>              ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>> -                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
>> -                                     errp);
>> +                                     iter_shared_perms, errp);
>>              if (ret < 0) {
>>                  goto fail;
>>              }
> ...
>> @@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>>                         " named node of the graph");
>>              goto out;
>>          }
>> +        replaces_node_name = arg->replaces;
>
>
> What is the idea behind the variables substitution?

Looks like a remnant from v6, where there was an

    if (arg->has_replaces) {
        ...
        replaces_node_name = arg->replaces;
    } else if (unfiltered_bs != bs) {
        replaces_node_name = unfiltered_bs->node_name;
    }

But I moved that logic to blockdev_mirror_common() in this version.

So it’s just useless now and replaces_node_name shouldn’t exist.

Max
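The point about iter_shared_perms never being reset can be shown in
isolation. The following is only a sketch under stated assumptions: the
TOY_PERM_* values and toy_shared_perms() are hypothetical and do not match
QEMU's BLK_PERM_* constants or any real function; the chain is reduced to
indices in top-to-bottom order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; QEMU's BLK_PERM_* constants differ. */
enum {
    TOY_PERM_CONSISTENT_READ = 1 << 0,
    TOY_PERM_WRITE           = 1 << 1,
    TOY_PERM_WRITE_UNCHANGED = 1 << 2,
};

/*
 * Nodes 0..n-1 are the intermediate nodes in top-to-bottom order.
 * filtered_target is the index of the topmost filter on the target
 * (pass n if there is none). out[i] receives the permissions shared
 * on node i.
 */
static void toy_shared_perms(size_t n, size_t filtered_target, uint64_t *out)
{
    uint64_t shared = TOY_PERM_WRITE_UNCHANGED | TOY_PERM_WRITE;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i == filtered_target) {
            /* Set once; nothing in the loop clears it again. */
            shared |= TOY_PERM_CONSISTENT_READ;
        }
        out[i] = shared;
    }
}
```

With four intermediate nodes and filtered_target == 1, node 0 does not share
the consistent-read flag, while nodes 1, 2 and 3 all do, even though the
comparison only matches once.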
Re: [PATCH v7 33/47] mirror: Deal with filters
On 24.07.2020 12:49, Max Reitz wrote:
> On 22.07.20 20:31, Andrey Shinkevich wrote:
>> On 25.06.2020 18:22, Max Reitz wrote:
>>> This includes some permission limiting (for example, we only need to
>>> take the RESIZE permission for active commits where the base is smaller
>>> than the top).
>>>
>>> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
>>> "target_backing_bs", because that is what it really refers to.
>>>
>>> Signed-off-by: Max Reitz
>>> ---
>>>  qapi/block-core.json |   6 ++-
>>>  block/mirror.c       | 118 +--
>>>  blockdev.c           |  36 +
>>>  3 files changed, 121 insertions(+), 39 deletions(-)
>>>
>> ...
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index 469acf4600..770de3b34e 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
>>>      BlockBackend *target;
>>>      BlockDriverState *mirror_top_bs;
>>>      BlockDriverState *base;
>>> +    BlockDriverState *base_overlay;
>>>
>>>      /* The name of the graph node to replace */
>>>      char *replaces;
>>> @@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
>>>                              &error_abort);
>>>      if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>>>          BlockDriverState *backing = s->is_none_mode ? src : s->base;
>>> -        if (backing_bs(target_bs) != backing) {
>>> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
>>> +        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
>>> +
>>> +        if (bdrv_cow_bs(unfiltered_target) != backing) {
>>
>> I just worry about a filter node of the concurrent job right below the
>> unfiltered_target.
>
> Having a concurrent job on the target sounds extremely problematic in
> itself (because at least for most of the mirror job, the target isn’t in
> a consistent state). Is that a real use case?

It might be at the TestParallelOps of iotests #30 but I am not sure now.
I am going to apply my series with copy-on-read filter for the stream
job above this one and will see then.

Andrey

>> The filter has unfiltered_target in its parent list.
>> Will that filter node be replaced correctly then?
>
> I’m also not quite sure what you mean. We need to attach the source’s
> backing chain to the target here, so we go down to the first node that
> might accept COW backing files (by invoking bdrv_skip_filters()). That
> should be correct no matter what kind of filters are on it.

I meant when a filter is removed with the bdrv_replace_node() afterwards.
As I mentioned above, I am going to test the case later.

Andrey

>>> +        /*
>>> +         * The topmost node with
>>> +         * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
>>> +         */
>>> +        filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
>>> +
>>> +        assert(bdrv_skip_filters(filtered_target) ==
>>> +               bdrv_skip_filters(target));
>>> +
>>> +        /*
>>> +         * XXX BLK_PERM_WRITE needs to be allowed so we don't block
>>> +         * ourselves at s->base (if writes are blocked for a node, they are
>>> +         * also blocked for its backing file). The other options would be a
>>> +         * second filter driver above s->base (== target).
>>> +         */
>>> +        iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
>>> +
>>> +        for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
>>> +             iter = bdrv_filter_or_cow_bs(iter))
>>> +        {
>>> +            if (iter == filtered_target) {
>>
>> For one filter node only?
>
> No, iter_shared_perms is never reset, so it retains the
> BLK_PERM_CONSISTENT_READ flag until the end of the loop.

Yes, that's right. Clear.

Andrey

>>> +                /*
>>> +                 * From here on, all nodes are filters on the base.
>>> +                 * This allows us to share BLK_PERM_CONSISTENT_READ.
>>> +                 */
>>> +                iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
>>> +            }
>>> +
>>>              ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>> -                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
>>> -                                     errp);
>>> +                                     iter_shared_perms, errp);
>>>              if (ret < 0) {
>>>                  goto fail;
>>>              }
>> ...
>>> @@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>>>                         " named node of the graph");
>>>              goto out;
>>>          }
>>> +        replaces_node_name = arg->replaces;
>>
>> What is the idea behind the variables substitution?
>
> Looks like a remnant from v6, where there was an
>
>     if (arg->has_replaces) {
>         ...
>         replaces_node_name = arg->replaces;
>     } else if (unfiltered_bs != bs) {
>         replaces_node_name = unfiltered_bs->node_name;
>     }
>
> But I moved that logic to blockdev_mirror_common() in this version.
>
> So it’s just useless now and replaces_node_name shouldn’t exist.
>
> Max
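For reference, the idea of going down to the first node that can accept a
COW backing file before attaching the source's backing chain can be modelled
as follows. This is a toy sketch, not QEMU code: ToyNode, toy_skip_filters()
and toy_attach_backing() are hypothetical names, and a single child pointer
stands in for both the filtered child and the COW backing child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;
    struct ToyNode *child;  /* filtered child, or COW backing node */
} ToyNode;

/* Descend through filters to the first non-filter node. */
static ToyNode *toy_skip_filters(ToyNode *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n;
}

/*
 * Attach backing below the first non-filter node under target.
 * Returns the node that was modified, or NULL if it already had
 * the requested backing node.
 */
static ToyNode *toy_attach_backing(ToyNode *target, ToyNode *backing)
{
    ToyNode *unfiltered = toy_skip_filters(target);

    assert(unfiltered != NULL);
    if (unfiltered->child == backing) {
        return NULL;
    }
    unfiltered->child = backing;
    return unfiltered;
}
```

In this model, calling toy_attach_backing() on a filter that sits on top of
an image changes the image's child and leaves the filter's own link
untouched, whatever number of filters are stacked above the image.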