Re: [PATCH v9 4/7] qapi: add blockdev-replace command
On 18.07.24 15:00, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

Add a command that can replace bs in following BdrvChild structures:
- qdev blk root child
- block-export blk root child
- any child of BlockDriverState selected by child-name

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 blockdev.c             | 56 +++
 qapi/block-core.json   | 88 ++
 stubs/blk-by-qdev-id.c | 13 +++
 stubs/meson.build      |  1 +
 4 files changed, 158 insertions(+)
 create mode 100644 stubs/blk-by-qdev-id.c

diff --git a/blockdev.c b/blockdev.c
index ba7e90b06e..2190467022 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3559,6 +3559,62 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
     bdrv_try_change_aio_context(bs, new_context, NULL, errp);
 }

+void qmp_blockdev_replace(BlockdevReplace *repl, Error **errp)
+{
+    BdrvChild *child = NULL;
+    BlockDriverState *new_child_bs;
+
+    if (repl->parent_type == BLOCK_PARENT_TYPE_DRIVER) {
+        BlockDriverState *parent_bs;
+
+        parent_bs = bdrv_find_node(repl->u.driver.node_name);
+        if (!parent_bs) {
+            error_setg(errp, "Block driver node with node-name '%s' not "
+                       "found", repl->u.driver.node_name);
+            return;
+        }
+
+        child = bdrv_find_child(parent_bs, repl->u.driver.child);
+        if (!child) {
+            error_setg(errp, "Block driver node '%s' doesn't have child "
+                       "named '%s'", repl->u.driver.node_name,
+                       repl->u.driver.child);
+            return;
+        }
+    } else {
+        /* Other types are similar, they work through blk */
+        BlockBackend *blk;
+        bool is_qdev = repl->parent_type == BLOCK_PARENT_TYPE_QDEV;
+        const char *id =
+            is_qdev ? repl->u.qdev.qdev_id : repl->u.export.export_id;
+
+        assert(is_qdev || repl->parent_type == BLOCK_PARENT_TYPE_EXPORT);
+
+        blk = is_qdev ? blk_by_qdev_id(id, errp) : blk_by_export_id(id, errp);

blk_by_export_id() finds export @exp, and returns the associated block backend exp->blk. Fine. blk_by_qdev_id() finds the device, and then searches @block_backends for a blk with blk->dev == dev.
If a device has more than one block backend, you get the one that comes first in @block_backends. I figure that's the one created first. Interface issue: when a device has multiple block backends, only one of them can be replaced, and which one is kind of random. Do such devices exist?

Hmm, I expect they don't. If there is a problem, it's pre-existing: all callers of qmp_get_blk are affected. But honestly, I don't know. If no, could they exist? If yes, what should we do about it now? Maybe continue the loop in blk_by_dev() up to the end, and if two blks are found for the device, return an error? Or even abort? And add a check to blk_attach_dev() that we don't attach the same device to several blks.

+        if (!blk) {
+            return;
+        }
+
+        child = blk_root(blk);
+        if (!child) {
+            error_setg(errp, "%s '%s' is empty, nothing to replace",
+                       is_qdev ? "Device" : "Export", id);
+            return;
+        }
+    }
+
+    assert(child);
+    assert(child->bs);
+
+    new_child_bs = bdrv_find_node(repl->new_child);
+    if (!new_child_bs) {
+        error_setg(errp, "Node '%s' not found", repl->new_child);
+        return;
+    }
+
+    bdrv_replace_child_bs(child, new_child_bs, errp);
+}
+
 QemuOptsList qemu_common_drive_opts = {
     .name = "drive",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),

diff --git a/qapi/block-core.json b/qapi/block-core.json
index df5e07debd..0a6f08a6e0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -6148,3 +6148,91 @@
 ##
 { 'struct': 'DummyBlockCoreForceArrays',
   'data': { 'unused-block-graph-info': ['BlockGraphInfo'] } }
+
+##
+# @BlockParentType:
+#
+# @qdev: block device, such as created by device_add, and denoted by
+#     qdev-id
+#
+# @driver: block driver node, such as created by blockdev-add, and
+#     denoted by node-name

node-name and child? Hmm. I'd say that the parent is denoted only by node-name, as the parent is the node itself (the fact that we have a separate BdrvChild structure as a parent I'd keep as internal realization). But a parent may have several children, and the concrete child is denoted by @child.
+#
+# @export: block export, such as created by block-export-add, and
+#     denoted by export-id
+#
+# Since 9.1
+##

I'm kind of unhappy with this doc comment. I feel it makes sense only in the context of its use. Let me try to avoid that:

# @driver: the parent is a block node, the child is one of its
#     children.
#
# @export: the parent
Re: [PATCH v9 2/7] block/export: add blk_by_export_id()
On 18.07.24 14:48, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

We need it for further blockdev-replace functionality.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/export/export.c                       | 18 ++
 include/sysemu/block-backend-global-state.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index 6d51ae8ed7..57beae7982 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -355,3 +355,21 @@ BlockExportInfoList *qmp_query_block_exports(Error **errp)
     return head;
 }
+
+BlockBackend *blk_by_export_id(const char *id, Error **errp)
+{
+    BlockExport *exp;
+
+    exp = blk_exp_find(id);
+    if (exp == NULL) {
+        error_setg(errp, "Export '%s' not found", id);
+        return NULL;
+    }
+
+    if (!exp->blk) {
+        error_setg(errp, "Export '%s' is empty", id);

Can this happen? Hmm, looking at the code in blk_exp_add:

    assert(exp->blk != NULL);
    QLIST_INSERT_HEAD(&block_exports, exp, next);
    return exp;

seems not. And I can't find anything changing exp->blk except for blk_exp_add(). Will switch to assertion.

+        return NULL;
+    }
+
+    return exp->blk;
+}

diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index ccb35546a1..410d0cc5c7 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -74,6 +74,7 @@ void blk_detach_dev(BlockBackend *blk, DeviceState *dev);
 DeviceState *blk_get_attached_dev(BlockBackend *blk);
 BlockBackend *blk_by_dev(void *dev);
 BlockBackend *blk_by_qdev_id(const char *id, Error **errp);
+BlockBackend *blk_by_export_id(const char *id, Error **errp);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 void blk_activate(BlockBackend *blk, Error **errp);

-- Best regards, Vladimir
Re: [PATCH v2 1/3] block/commit: implement final flush
On 18.07.24 22:22, Kevin Wolf wrote: Am 26.06.2024 um 16:50 hat Vladimir Sementsov-Ogievskiy geschrieben: Actually block job is not completed without the final flush. It's rather unexpected to have broken target when job was successfully completed long ago and now we fail to flush or process just crashed/killed. Mirror job already has mirror_flush() for this. So, it's OK. Do this for commit job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/commit.c | 41 +++-- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/block/commit.c b/block/commit.c index 7c3fdcb0ca..81971692a2 100644 --- a/block/commit.c +++ b/block/commit.c @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp) int64_t n = 0; /* bytes */ QEMU_AUTO_VFREE void *buf = NULL; int64_t len, base_len; +bool need_final_flush = true; len = blk_co_getlength(s->top); if (len < 0) { @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp) buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE); -for (offset = 0; offset < len; offset += n) { -bool copy; +for (offset = 0; offset < len || need_final_flush; offset += n) { In general, the control flow would be nicer to read if the final flush weren't integrated into the loop, but just added after it. But I assume this is pretty much required for pausing the job during error handling in the final flush if you don't want to duplicate a lot of the logic into a second loop? Right, that's the reason. 
+        bool copy = false;
         bool error_in_source = true;

         /* Note that even when no rate limit is applied we need to yield
@@ -166,22 +167,34 @@
         if (job_is_cancelled(&s->common.job)) {
             break;
         }
-        /* Copy if allocated above the base */
-        ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
-                                        offset, COMMIT_BUFFER_SIZE, &n);
-        copy = (ret > 0);
-        trace_commit_one_iteration(s, offset, n, ret);
-        if (copy) {
-            assert(n < SIZE_MAX);
-            ret = blk_co_pread(s->top, offset, n, buf, 0);
-            if (ret >= 0) {
-                ret = blk_co_pwrite(s->base, offset, n, buf, 0);
-                if (ret < 0) {
-                    error_in_source = false;
+        if (offset < len) {
+            /* Copy if allocated above the base */
+            ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
+                                            offset, COMMIT_BUFFER_SIZE, &n);
+            copy = (ret > 0);
+            trace_commit_one_iteration(s, offset, n, ret);
+            if (copy) {
+                assert(n < SIZE_MAX);
+
+                ret = blk_co_pread(s->top, offset, n, buf, 0);
+                if (ret >= 0) {
+                    ret = blk_co_pwrite(s->base, offset, n, buf, 0);
+                    if (ret < 0) {
+                        error_in_source = false;
+                    }
                 }
             }
+        } else {
+            assert(need_final_flush);
+            ret = blk_co_flush(s->base);
+            if (ret < 0) {
+                error_in_source = false;
+            } else {
+                need_final_flush = false;
+            }

Should we set n = 0 in this block to avoid counting the last chunk twice for the progress?

Oops, right. Will fix, thanks.

         }

+        if (ret < 0) {
             BlockErrorAction action =
                 block_job_error_action(&s->common, s->on_error,

Kevin

-- Best regards, Vladimir
Re: [PATCH v5 3/3] qapi: introduce device-sync-config
On 18.07.24 11:27, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed, did not trigger an interrupt to the guest, or is just not available (not supported by the vhost-user server). Command result is racy if we allow it during migration. Let's allow the sync only in RUNNING state.

Is this still accurate? The runstate_is_running() check is gone in v4, the migration_is_running() check remains.

Right, better to fix the commit message like: "Command result is racy if we allow it during migration. Let's not allow it."

Signed-off-by: Vladimir Sementsov-Ogievskiy

QAPI schema and QMP part: Signed-off-by: Markus Armbruster

-- Best regards, Vladimir
Re: [PATCH v5 0/3] vhost-user-blk: live resize additional APIs
On 18.07.24 11:31, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

ping. Markus, Eric, could someone give an ACK for the QAPI part?

I apologize for the delay. It was pretty bad.

No problem, I myself make worse delays now when busy with other work. Thanks for reviewing!

-- Best regards, Vladimir
Re: [PATCH v2] block/reqlist: allow adding overlapping requests
On 12.07.24 17:07, Fiona Ebner wrote: Allow overlapping request by removing the assert that made it impossible. There are only two callers: 1. block_copy_task_create() It already asserts the very same condition before calling reqlist_init_req(). 2. cbw_snapshot_read_lock() There is no need to have read requests be non-overlapping in copy-before-write when used for snapshot-access. In fact, there was no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges and this could lead to an assertion failure [1]. In particular, with the reproducer script below [0], two cbw_co_snapshot_block_status() callers could race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request. [0]: #!/bin/bash -e dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 ./qemu-img create /tmp/fleecing.raw -f raw 1G ( ./qemu-system-x86_64 --qmp stdio \ --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ < [1]: #5 0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 #6 0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 #7 0x6152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 #8 0x6152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 #9 0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 #10 0x61528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 #11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 #12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 #13 0x6152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 #14 0x6152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 #15 0x61528538da6c in blockstatus_to_extents (...) 
at ../nbd/server.c:2374 #16 0x61528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 #17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 #18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121 #19 0x6152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 Cc: qemu-sta...@nongnu.org Suggested-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Fiona Ebner --- Changes in v2: * different approach, allowing overlapping requests for copy-before-write rather than waiting for them. block-copy already asserts there are no conflicts before adding a request. block/copy-before-write.c | 3 ++- block/reqlist.c | 2 -- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 853e01a1eb..28f6a096cd 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -66,7 +66,8 @@ typedef struct BDRVCopyBeforeWriteState { /* * @frozen_read_reqs: current read requests for fleecing user in bs->file - * node. These areas must not be rewritten by guest. + * node. These areas must not be rewritten by guest. There can be multiple + * overlapping read requests. */ BlockReqList frozen_read_reqs; diff --git a/block/reqlist.c b/block/reqlist.c index 08cb57cfa4..098e807378 100644 --- a/block/reqlist.c +++ b/block/reqlist.c @@ -20,8 +20,6 @@ void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset, int64_t bytes) { -assert(!reqlist_find_conflict(reqs, offset, bytes)); - *req = (BlockReq) { .offset = offset, .bytes = bytes, Reviewed-by: Vladimir Sementsov-Ogievskiy Thanks, applied to my block branch -- Best regards, Vladimir
Re: [PATCH v3 0/2] backup: allow specifying minimum cluster size
On 11.07.24 15:09, Fiona Ebner wrote: Discussion for v2: https://lore.kernel.org/qemu-devel/20240528120114.344416-1-f.eb...@proxmox.com/ Changes in v3: * Pass min_cluster_size option directly without checking has_min_cluster_size, because the default is 0 anyways. * Calculate maximum of passed-in argument and default once at the beginning of block_copy_calculate_cluster_size() * Update warning message to reflect actual value used * Do not leak qdict in error case * Use PRI{i,u}64 macros Discussion for v1: https://lore.kernel.org/qemu-devel/20240308155158.830258-1-f.eb...@proxmox.com/ - Changes in v2: * Use 'size' type in QAPI. * Remove option in cbw_parse_options(), i.e. before parsing generic blockdev options. * Reword commit messages hoping to describe the issue in a more straight-forward way. In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. Fiona Ebner (2): copy-before-write: allow specifying minimum cluster size backup: add minimum cluster size to performance options block/backup.c | 2 +- block/block-copy.c | 36 ++-- block/copy-before-write.c | 14 +- block/copy-before-write.h | 1 + blockdev.c | 3 +++ include/block/block-copy.h | 1 + qapi/block-core.json | 17 ++--- 7 files changed, 59 insertions(+), 15 deletions(-) Thanks, applied to my block branch. -- Best regards, Vladimir
Re: [PATCH v3 2/2] backup: add minimum cluster size to performance options
On 11.07.24 15:09, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. Suggested-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster (QAPI schema) Signed-off-by: Fiona Ebner Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH v3 1/2] copy-before-write: allow specifying minimum cluster size
On 11.07.24 15:09, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. The type 'size' (corresponding to uint64_t in C) is used in QAPI to rule out negative inputs and for consistency with already existing @cluster-size parameters. Since block_copy_calculate_cluster_size() uses int64_t for its result, a check that the input is not too large is added in block_copy_state_new() before calling it. The calculation in block_copy_calculate_cluster_size() is done in the target int64_t type. Suggested-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster (QAPI schema) Signed-off-by: Fiona Ebner Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH] block/copy-before-write: wait for conflicts when read locking to avoid assertion failure
On 11.07.24 16:36, Fiona Ebner wrote: There is no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges, and reqlist_init_req() asserts that there are no conflicting requests. In particular, two cbw_co_snapshot_block_status() callers can race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request, leading to an assertion failure. Reproducer script [0] and backtrace [1] are attached below. Understand. But seems in case of CBW read-lock, nothing bad in intersecting read requests? reqlist is shared with backup, where we care to avoid intersecting requests in the list. What about just move the assertion to block_copy_task_create() ? And add comment somewhere that we support intersecting reads in frozen_read_reqs. [0]: #!/bin/bash -e dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 ./qemu-img create /tmp/fleecing.raw -f raw 1G ( ./qemu-system-x86_64 --qmp stdio \ --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ < [1]: #5 0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 #6 0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 #7 0x6152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 #8 0x6152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 #9 0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 #10 0x61528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 #11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 #12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 #13 0x6152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 #14 0x6152853d9a86 in blk_co_block_status_above (...) 
at ../block/block-backend.c:1473
#15 0x61528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
#16 0x61528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
#17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
#18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121
#19 0x6152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175

Signed-off-by: Fiona Ebner
---
 block/copy-before-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 853e01a1eb..376ff3f3e1 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -234,6 +234,7 @@ cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset, int64_t bytes,
         *req = (BlockReq) {.offset = -1, .bytes = -1};
         *file = s->target;
     } else {
+        reqlist_wait_all(&s->frozen_read_reqs, offset, bytes, &s->lock);
         reqlist_init_req(&s->frozen_read_reqs, req, offset, bytes);
         *file = bs->file;
     }

-- Best regards, Vladimir
Re: [PATCH 1/2] block: zero data data corruption using prealloc-filter
On 11.07.24 16:32, Andrey Drobyshev wrote: From: "Denis V. Lunev" We have observed that some clusters in the QCOW2 files are zeroed while preallocation filter is used. We are able to trace down the following sequence when prealloc-filter is used: co=0x55e7cbed7680 qcow2_co_pwritev_task() co=0x55e7cbed7680 preallocate_co_pwritev_part() co=0x55e7cbed7680 handle_write() co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes() co=0x55e7cbed7680 raw_do_pwrite_zeroes() co=0x7f9edb7fe500 do_fallocate() Here coroutine 0x55e7cbed7680 is being blocked waiting while coroutine 0x7f9edb7fe500 will finish with fallocate of the file area. OK. It is time to handle next coroutine, which co=0x55e7cbee91b0 qcow2_co_pwritev_task() co=0x55e7cbee91b0 preallocate_co_pwritev_part() co=0x55e7cbee91b0 handle_write() co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes() co=0x55e7cbee91b0 raw_do_pwrite_zeroes() co=0x7f9edb7deb00 do_fallocate() The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced file_end yet and coroutine 0x55e7cbee91b0 will start fallocate() for the same area. This means that if (once fallocate is started inside 0x7f9edb7deb00) original fallocate could end and the real write will be executed. In that case write() request is handled at the same time as fallocate(). Normally we should protect s->file_end while it is detected that preallocation is need. The patch introduces file_end_lock for it to be protected when run in the coroutine context. Note: the lock is taken only once it is detected that the preallocation is really required. This is not a frequent case due to the preallocation nature thus the patch should not have performance impact. Originally-by: Denis V. Lunev Co-authored-by: Andrey Drobyshev Signed-off-by: Denis V. 
Lunev
Signed-off-by: Andrey Drobyshev
---
 block/preallocate.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/preallocate.c b/block/preallocate.c
index d215bc5d6d..9cb2c97635 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -78,6 +78,8 @@ typedef struct BDRVPreallocateState {
     /* Gives up the resize permission on children when parents don't need it */
     QEMUBH *drop_resize_bh;
+
+    CoMutex file_end_lock;
 } BDRVPreallocateState;

 static int preallocate_drop_resize(BlockDriverState *bs, Error **errp);
@@ -170,6 +172,8 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
          bs->file->bs->supported_zero_flags);

+    qemu_co_mutex_init(&s->file_end_lock);
+
     return 0;
 }
@@ -342,6 +346,7 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
         return false;
     }

+    QEMU_LOCK_GUARD(&s->file_end_lock);
     if (s->file_end < 0) {
         s->file_end = s->data_end;
     }
@@ -353,6 +358,8 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,

     /* We have valid s->data_end, and request writes beyond it. */

+    QEMU_LOCK_GUARD(&s->file_end_lock);
+
     s->data_end = end;
     if (s->zero_start < 0 || !want_merge_zero) {
         s->zero_start = end;
@@ -428,6 +435,8 @@ preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
     BDRVPreallocateState *s = bs->opaque;
     int ret;

+    QEMU_LOCK_GUARD(&s->file_end_lock);
+
     if (s->data_end >= 0 && offset > s->data_end) {
         if (s->file_end < 0) {
             s->file_end = bdrv_co_getlength(bs->file->bs);
@@ -501,6 +510,8 @@ preallocate_co_getlength(BlockDriverState *bs)
         return s->data_end;
     }

+    QEMU_LOCK_GUARD(&s->file_end_lock);
+
     ret = bdrv_co_getlength(bs->file->bs);
     if (has_prealloc_perms(bs)) {

Hmm, seems the preallocate driver is neither thread-safe nor coroutine-safe. I think a co-mutex is a good idea.
Still, protecting only s->file_end may not be enough - we want to keep the data_end / zero_start / file_end variables correct - so probably just make the whole preallocation code a critical section? Maybe with an atomic read of the variables as a fast path (for writes which don't need preallocation). -- Best regards, Vladimir
Re: [PATCH v2 1/2] block: zero data data corruption using prealloc-filter
On 12.07.24 12:46, Andrey Drobyshev wrote: From: "Denis V. Lunev"

We have observed that some clusters in the QCOW2 files are zeroed while the preallocation filter is used. We are able to trace down the following sequence when prealloc-filter is used: co=0x55e7cbed7680 qcow2_co_pwritev_task() co=0x55e7cbed7680 preallocate_co_pwritev_part() co=0x55e7cbed7680 handle_write() co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes() co=0x55e7cbed7680 raw_do_pwrite_zeroes() co=0x7f9edb7fe500 do_fallocate() Here coroutine 0x55e7cbed7680 is being blocked waiting while coroutine 0x7f9edb7fe500 will finish with fallocate of the file area. OK. It is time to handle the next coroutine, which co=0x55e7cbee91b0 qcow2_co_pwritev_task() co=0x55e7cbee91b0 preallocate_co_pwritev_part() co=0x55e7cbee91b0 handle_write() co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes() co=0x55e7cbee91b0 raw_do_pwrite_zeroes() co=0x7f9edb7deb00 do_fallocate() The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced file_end yet and coroutine 0x55e7cbee91b0 will start fallocate() for the same area. This means that if (once fallocate is started inside 0x7f9edb7deb00) the original fallocate could end and the real write will be executed. In that case the write() request is handled at the same time as fallocate().

The patch moves s->file_lock assignment before fallocate and that is crucial.

text needs to be updated: it is s->file_end that is assigned

The idea is that all subsequent requests into the area being preallocated will be issued as just writes without fallocate to this area and they will not proceed thanks to the overlapping requests mechanics. If preallocation fails, we will just switch to the normal expand-by-write behavior and that is not a problem except for performance.

Signed-off-by: Denis V.
Lunev Tested-by: Andrey Drobyshev --- block/preallocate.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/block/preallocate.c b/block/preallocate.c index d215bc5d6d..ecf0aa4baa 100644 --- a/block/preallocate.c +++ b/block/preallocate.c @@ -383,6 +383,13 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes, want_merge_zero = want_merge_zero && (prealloc_start <= offset); +/* + * Assign file_end before making actual preallocation. This will ensure + * that next request performed while preallocation is in progress will + * be passed without preallocation. + */ +s->file_end = prealloc_end; + ret = bdrv_co_pwrite_zeroes( bs->file, prealloc_start, prealloc_end - prealloc_start, BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT); @@ -391,7 +398,6 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes, return false; } -s->file_end = prealloc_end; return want_merge_zero; } Hmm. But this way we set both s->file_end and s->zero_start prior to actual write_zero operation. This means that next write-zero operation may go fast-path (see preallocate_co_pwrite_zeroes()) and return success, even before actual finish of preallocation write_zeroes operation (which may also fail). Seems we need to update logic around s->zero_start too. -- Best regards, Vladimir
Re: [PATCH v5 0/3] vhost-user-blk: live resize additional APIs
ping. Markus, Eric, could someone give an ACK for the QAPI part?

On 25.06.24 15:18, Vladimir Sementsov-Ogievskiy wrote:

v5: 03: drop extra check on whether runstate is running

Vladimir Sementsov-Ogievskiy (3):
  qdev-monitor: add option to report GenericError from find_device_state
  vhost-user-blk: split vhost_user_blk_sync_config()
  qapi: introduce device-sync-config

 hw/block/vhost-user-blk.c | 27 ++--
 hw/virtio/virtio-pci.c    |  9 +++
 include/hw/qdev-core.h    |  3 +++
 qapi/qdev.json            | 24 ++
 system/qdev-monitor.c     | 53 ---
 5 files changed, 105 insertions(+), 11 deletions(-)

-- Best regards, Vladimir
Re: [PATCH] block/curl: rewrite http header parsing function
On 01.07.24 09:55, Michael Tokarev wrote: 01.07.2024 09:54, Vladimir Sementsov-Ogievskiy wrote:

+ const char *t = "accept-ranges : bytes "; /* A lowercase template */

Note: you make the parser less strict: you allow "bytes" to be uppercase (case-insensitivity was allowed only for "accept-ranges"), and you allow whitespace before the colon.

Yes, exactly. I should add this to the description (wanted to do that but forgot). I'll update the patch (without re-sending) - hopefully it's okay to keep your S-o-b :)

Of course!

-- Best regards, Vladimir
Re: [PATCH] block/curl: rewrite http header parsing function
On 29.06.24 17:25, Michael Tokarev wrote:

Existing code was long, unclear and twisty.

Signed-off-by: Michael Tokarev
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 block/curl.c | 44 ++--
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 419f7c89ef..9802d0319d 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -210,37 +210,29 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 {
     BDRVCURLState *s = opaque;
     size_t realsize = size * nmemb;
-    const char *header = (char *)ptr;
-    const char *end = header + realsize;
-    const char *accept_ranges = "accept-ranges:";
-    const char *bytes = "bytes";
+    const char *p = ptr;
+    const char *end = p + realsize;
+    const char *t = "accept-ranges : bytes "; /* A lowercase template */

Note: you make the parser less strict: you allow "bytes" to be uppercase (case-insensitivity was allowed only for "accept-ranges"), and you allow whitespace before the colon.

-    if (realsize >= strlen(accept_ranges)
-        && g_ascii_strncasecmp(header, accept_ranges,
-                               strlen(accept_ranges)) == 0) {
-
-        char *p = strchr(header, ':') + 1;
-
-        /* Skip whitespace between the header name and value. */
-        while (p < end && *p && g_ascii_isspace(*p)) {
-            p++;
-        }
-
-        if (end - p >= strlen(bytes)
-            && strncmp(p, bytes, strlen(bytes)) == 0) {
-
-            /* Check that there is nothing but whitespace after the value. */
-            p += strlen(bytes);
-            while (p < end && *p && g_ascii_isspace(*p)) {
-                p++;
-            }
-
-            if (p == end || !*p) {
-                s->accept_range = true;
+    /* check if header matches the "t" template */
+    for (;;) {
+        if (*t == ' ') { /* space in t matches any amount of isspace in p */
+            if (p < end && g_ascii_isspace(*p)) {
+                ++p;
+            } else {
+                ++t;
+            }
+        } else if (*t && p < end && *t == g_ascii_tolower(*p)) {
+            ++p, ++t;
+        } else {
+            break;
         }
     }
+    if (!*t && p == end) { /* if we managed to reach ends of both strings */
+        s->accept_range = true;
+    }
+
     return realsize;
 }

-- Best regards, Vladimir
Re: [PATCH] block/curl: use strlen instead of strchr
On 01.07.24 09:34, Vladimir Sementsov-Ogievskiy wrote: On 29.06.24 09:20, Michael Tokarev wrote: On 6/28/24 08:49, Vladimir Sementsov-Ogievskiy wrote:

We already know where the colon is, so there is no reason to search for it. Also, avoid code which looks like we forgot to check the return value of strchr() for NULL.

Suggested-by: Kevin Wolf
Signed-off-by: Vladimir Sementsov-Ogievskiy
---
This replaces my patch "[PATCH] block/curl: explicitly assert that strchr returns non-NULL value"
Supersedes: <20240627153059.589070-1-vsement...@yandex-team.ru>

 block/curl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 419f7c89ef..d03bfe4817 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -219,7 +219,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
         && g_ascii_strncasecmp(header, accept_ranges,
                                strlen(accept_ranges)) == 0) {

-        char *p = strchr(header, ':') + 1;
+        char *p = header + strlen(accept_ranges);

         /* Skip whitespace between the header name and value. */
         while (p < end && *p && g_ascii_isspace(*p)) {

Heck. All these strlen()s look ugly, especially in the loop iterations..

I expect that strlen of a string constant is calculated at compile time. My aim was to fix a Coverity complaint (I don't see this complaint in the public qemu Coverity project, that's why I don't specify a CID in the commit message), not to rewrite the whole function. So I'd prefer Kevin's suggestion, which is both minimal and makes the code obviously correct. The only simpler thing is to mark the problem false-positive in Coverity.

Upd: I missed that you sent a patch, this changes things. Of course, your code looks nicer than the old one.

-- Best regards, Vladimir
Re: [PATCH] block/curl: use strlen instead of strchr
On 29.06.24 09:20, Michael Tokarev wrote: On 6/28/24 08:49, Vladimir Sementsov-Ogievskiy wrote: We already know where the colon is, so there is no reason to search for it. Also, avoid code which looks like we forgot to check the return value of strchr() for NULL. Suggested-by: Kevin Wolf Signed-off-by: Vladimir Sementsov-Ogievskiy --- This replaces my patch [PATCH] block/curl: explicitly assert that strchr returns non-NULL value Supersedes: <20240627153059.589070-1-vsement...@yandex-team.ru> block/curl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..d03bfe4817 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { - char *p = strchr(header, ':') + 1; + char *p = header + strlen(accept_ranges); /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { Heck. All these strlen()s look ugly, especially in the loop iterations... I expect that strlen() of a string constant is computed at compile time. My aim was to fix a Coverity complaint (I don't see this complaint in the public QEMU Coverity project, which is why I don't specify a CID in the commit message), not to rewrite the whole function. So I'd prefer Kevin's suggestion, which is both minimal and makes the code obviously correct. The only simpler thing is to mark the problem as a false positive in Coverity. -- Best regards, Vladimir
[PATCH] block/curl: use strlen instead of strchr
We already know where the colon is, so there is no reason to search for it. Also, avoid code which looks like we forgot to check the return value of strchr() for NULL. Suggested-by: Kevin Wolf Signed-off-by: Vladimir Sementsov-Ogievskiy --- This replaces my patch [PATCH] block/curl: explicitly assert that strchr returns non-NULL value Supersedes: <20240627153059.589070-1-vsement...@yandex-team.ru> block/curl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..d03bfe4817 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { -char *p = strchr(header, ':') + 1; +char *p = header + strlen(accept_ranges); /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { -- 2.34.1
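Kevin's suggested form works because a successful case-insensitive prefix match already fixes the position of the colon: the value starts exactly strlen(prefix) bytes into the header. A minimal standalone sketch of that equivalence (the names are illustrative, not QEMU's):

```c
#include <assert.h>
#include <string.h>

static const char demo_header[] = "Accept-Ranges: bytes";

/*
 * Precondition: the caller has already verified that header starts with
 * prefix (case-insensitively), and prefix ends with ':'.  Then the value
 * begins right after the prefix -- no strchr() and no NULL check needed.
 */
static const char *value_start(const char *header, const char *prefix)
{
    return header + strlen(prefix);
}
```

For a string constant, strlen() is typically folded at compile time by GCC/Clang, so this is also no slower than remembering the colon position.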
Re: [PATCH] block/curl: explicitly assert that strchr returns non-NULL value
On 27.06.24 21:05, Kevin Wolf wrote: Am 27.06.2024 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben: strchr may return NULL if colon is not found. It seems clearer to assert explicitly that we don't expect it here, than dereference 1 in the next line. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/curl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..ccfffd6c12 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,9 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { -char *p = strchr(header, ':') + 1; +char *p = strchr(header, ':'); +assert(p != NULL); +p += 1; I'm not sure if this is actually much clearer because it doesn't say why we don't expect NULL here. If you don't look at the context, it almost looks like an assert() where proper error handling is needed. If you do, then the original line is clear enough. My first thought was that maybe what we want is a comment, but we actually already know where the colon is. So how about this instead: char *p = header + strlen(accept_ranges); Oh, right. That's better. /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { -- 2.34.1 -- Best regards, Vladimir
[PATCH] hw/core/loader: gunzip(): fix memory leak on error path
We should call inflateEnd(), just as on the success path, to clean up the state in the s variable. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/core/loader.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/core/loader.c b/hw/core/loader.c index 2f8105d7de..a3bea1e718 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -610,6 +610,7 @@ ssize_t gunzip(void *dst, size_t dstlen, uint8_t *src, size_t srclen) r = inflate(&s, Z_FINISH); if (r != Z_OK && r != Z_STREAM_END) { printf ("Error: inflate() returned %d\n", r); +inflateEnd(&s); return -1; } dstbytes = s.next_out - (unsigned char *) dst; -- 2.34.1
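The bug class here is a cleanup call missing on one exit path: once inflateInit() has succeeded, every way out of the function must run inflateEnd(). A self-contained sketch with zlib modelled by stubs (so it compiles without -lz; the fake_* names mirror the real API shape but are not it):

```c
#include <assert.h>

typedef struct { int live; } fake_z_stream;

static int fake_inflateInit(fake_z_stream *s) { s->live = 1; return 0; }
static int fake_inflate(fake_z_stream *s, int fail) { (void)s; return fail ? -1 : 0; }
static void fake_inflateEnd(fake_z_stream *s) { s->live = 0; }

/* returns 0/-1 like gunzip(); *leaked reports whether decompressor
 * state survived the call -- the leak the patch fixes */
static int gunzip_sketch(int make_inflate_fail, int *leaked)
{
    fake_z_stream s;
    int ret = 0;

    fake_inflateInit(&s);
    if (fake_inflate(&s, make_inflate_fail) != 0) {
        fake_inflateEnd(&s);     /* the call added by the patch */
        ret = -1;
        goto out;
    }
    fake_inflateEnd(&s);         /* the success path already had this */
out:
    *leaked = s.live;
    return ret;
}

/* small wrappers for checking both outcomes */
static int result_of(int fail) { int l; return gunzip_sketch(fail, &l); }
static int leaked_after(int fail) { int l; gunzip_sketch(fail, &l); return l; }
```

An alternative fix with the same effect is a single exit point: `goto out` on error and one inflateEnd() before the return.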
[PATCH] block/curl: explicitly assert that strchr returns non-NULL value
strchr() may return NULL if the colon is not found. It seems clearer to assert explicitly that we don't expect it here than to add 1 to a possibly NULL pointer in the next line. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/curl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..ccfffd6c12 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,9 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { -char *p = strchr(header, ':') + 1; +char *p = strchr(header, ':'); +assert(p != NULL); +p += 1; /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { -- 2.34.1
[PATCH v2 3/3] block/backup: implement final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Do this for the backup job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/backup.c | 2 +- block/block-copy.c | 7 +++ include/block/block-copy.h | 1 + 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/block/backup.c b/block/backup.c index 3dd2e229d2..fee78ba5ad 100644 --- a/block/backup.c +++ b/block/backup.c @@ -156,7 +156,7 @@ static int coroutine_fn backup_loop(BackupBlockJob *job) job->bg_bcs_call = s = block_copy_async(job->bcs, 0, QEMU_ALIGN_UP(job->len, job->cluster_size), job->perf.max_workers, job->perf.max_chunk, -backup_block_copy_callback, job); +true, backup_block_copy_callback, job); while (!block_copy_call_finished(s) && !job_is_cancelled(&job->common.job)) diff --git a/block/block-copy.c b/block/block-copy.c index 7e3b378528..842b0383db 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -54,6 +54,7 @@ typedef struct BlockCopyCallState { int max_workers; int64_t max_chunk; bool ignore_ratelimit; +bool need_final_flush; BlockCopyAsyncCallbackFunc cb; void *cb_opaque; /* Coroutine where async block-copy is running */ @@ -899,6 +900,10 @@ block_copy_common(BlockCopyCallState *call_state) */ } while (ret > 0 && !qatomic_read(&call_state->cancelled)); +if (ret == 0 && call_state->need_final_flush) { +ret = bdrv_co_flush(s->target->bs); +} + qatomic_store_release(&call_state->finished, true); if (call_state->cb) { @@ -954,6 +959,7 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes, BlockCopyCallState *block_copy_async(BlockCopyState *s, int64_t offset, int64_t bytes, int max_workers, int64_t max_chunk, + bool need_final_flush, BlockCopyAsyncCallbackFunc cb, void *cb_opaque) { @@ -965,6 +971,7 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s, 
.bytes = bytes, .max_workers = max_workers, .max_chunk = max_chunk, +.need_final_flush = need_final_flush, .cb = cb, .cb_opaque = cb_opaque, diff --git a/include/block/block-copy.h b/include/block/block-copy.h index bdc703bacd..6588ebaf77 100644 --- a/include/block/block-copy.h +++ b/include/block/block-copy.h @@ -62,6 +62,7 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes, BlockCopyCallState *block_copy_async(BlockCopyState *s, int64_t offset, int64_t bytes, int max_workers, int64_t max_chunk, + bool need_final_flush, BlockCopyAsyncCallbackFunc cb, void *cb_opaque); -- 2.34.1
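The core of the change in block_copy_common() is small enough to state as a pure function: the final flush runs only when the copy itself succeeded and the caller asked for it, and a flush failure becomes the job's result. A stub model with no real I/O (names are mine, not QEMU's):

```c
#include <assert.h>

/*
 * copy_ret and flush_ret stand in for the results of the copy loop and
 * of bdrv_co_flush(); need_final_flush mirrors the new field in
 * BlockCopyCallState.
 */
static int block_copy_common_sketch(int copy_ret, int flush_ret,
                                    int need_final_flush)
{
    int ret = copy_ret;                 /* the existing copy loop */

    if (ret == 0 && need_final_flush) {
        ret = flush_ret;                /* the flush this patch adds */
    }
    return ret;
}
```

Keying the flush on `ret == 0` means a failed or cancelled copy is never "upgraded" to success, and a successful copy with a failing flush is correctly reported as a failed job.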
[PATCH v2 0/3] block-jobs: add final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Add similar behavior for the other jobs: backup, stream, commit. v2: rework stream and commit, also split into 3 commits. Vladimir Sementsov-Ogievskiy (3): block/commit: implement final flush block/stream: implement final flush block/backup: implement final flush block/backup.c | 2 +- block/block-copy.c | 7 block/commit.c | 41 +++ block/stream.c | 67 +++--- include/block/block-copy.h | 1 + 5 files changed, 77 insertions(+), 41 deletions(-) -- 2.34.1
[PATCH v2 2/3] block/stream: implement final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Do this for the stream job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/stream.c | 67 ++ 1 file changed, 41 insertions(+), 26 deletions(-) diff --git a/block/stream.c b/block/stream.c index 7031eef12b..893db258d4 100644 --- a/block/stream.c +++ b/block/stream.c @@ -160,6 +160,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp) int64_t offset = 0; int error = 0; int64_t n = 0; /* bytes */ +bool need_final_flush = true; WITH_GRAPH_RDLOCK_GUARD() { unfiltered_bs = bdrv_skip_filters(s->target_bs); @@ -175,10 +176,13 @@ static int coroutine_fn stream_run(Job *job, Error **errp) } job_progress_set_remaining(&s->common.job, len); -for ( ; offset < len; offset += n) { -bool copy; +for ( ; offset < len || need_final_flush; offset += n) { +bool copy = false; +bool error_is_read = true; int ret; +n = 0; + /* Note that even when no rate limit is applied we need to yield * with no pending I/O here so that bdrv_drain_all() returns. */ @@ -187,35 +191,46 @@ static int coroutine_fn stream_run(Job *job, Error **errp) break; } -copy = false; - -WITH_GRAPH_RDLOCK_GUARD() { -ret = bdrv_co_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, &n); -if (ret == 1) { -/* Allocated in the top, no need to copy. */ -} else if (ret >= 0) { -/* - * Copy if allocated in the intermediate images. Limit to the - * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE). 
- */ -ret = bdrv_co_is_allocated_above(bdrv_cow_bs(unfiltered_bs), - s->base_overlay, true, - offset, n, &n); -/* Finish early if end of backing file has been reached */ -if (ret == 0 && n == 0) { -n = len - offset; +if (offset < len) { +WITH_GRAPH_RDLOCK_GUARD() { +ret = bdrv_co_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, + &n); +if (ret == 1) { +/* Allocated in the top, no need to copy. */ +} else if (ret >= 0) { +/* + * Copy if allocated in the intermediate images. Limit to + * the known-unallocated area + * [offset, offset+n*BDRV_SECTOR_SIZE). + */ +ret = bdrv_co_is_allocated_above(bdrv_cow_bs(unfiltered_bs), + s->base_overlay, true, + offset, n, &n); +/* Finish early if end of backing file has been reached */ +if (ret == 0 && n == 0) { +n = len - offset; +} + +copy = (ret > 0); } - -copy = (ret > 0); } -} -trace_stream_one_iteration(s, offset, n, ret); -if (copy) { -ret = stream_populate(s->blk, offset, n); +trace_stream_one_iteration(s, offset, n, ret); +if (copy) { +ret = stream_populate(s->blk, offset, n); +} +} else { +assert(need_final_flush); +ret = blk_co_flush(s->blk); +if (ret < 0) { +error_is_read = false; +} else { +need_final_flush = false; +} } if (ret < 0) { BlockErrorAction action = -block_job_error_action(&s->common, s->on_error, true, -ret); +block_job_error_action(&s->common, s->on_error, + error_is_read, -ret); if (action == BLOCK_ERROR_ACTION_STOP) { n = 0; continue; -- 2.34.1
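The restructuring pattern in this patch (and in the commit one) is: instead of flushing after the loop, let the loop run extra iterations with offset == len and n == 0 that perform the final flush, so the job's existing on-error/retry logic also covers flush failures. A standalone model with all I/O stubbed out (names are mine, not QEMU's):

```c
#include <assert.h>

static int co_copy(long offset, long n) { (void)offset; (void)n; return 0; }

/* returns the number of flush attempts it took to finish, -1 on copy error */
static int stream_like_run(long len, long chunk, int flush_failures)
{
    long offset = 0, n;
    int need_final_flush = 1;
    int flush_attempts = 0;

    for (; offset < len || need_final_flush; offset += n) {
        n = 0;
        if (offset < len) {
            /* ordinary copy iteration */
            n = chunk < len - offset ? chunk : len - offset;
            if (co_copy(offset, n) < 0) {
                return -1;
            }
        } else {
            /* the extra "flush iteration" the patch adds */
            flush_attempts++;
            if (flush_failures > 0) {
                flush_failures--;   /* models BLOCK_ERROR_ACTION_STOP + retry */
                continue;           /* n == 0, so offset stays at len */
            }
            need_final_flush = 0;   /* flush done; loop condition goes false */
        }
    }
    return flush_attempts;
}
```

Setting n = 0 at the top of each iteration is what makes the retry `continue` safe: offset no longer advances by the previous chunk size on the flush path.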
[PATCH v2 1/3] block/commit: implement final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Do this for the commit job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/commit.c | 41 +++-- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/block/commit.c b/block/commit.c index 7c3fdcb0ca..81971692a2 100644 --- a/block/commit.c +++ b/block/commit.c @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp) int64_t n = 0; /* bytes */ QEMU_AUTO_VFREE void *buf = NULL; int64_t len, base_len; +bool need_final_flush = true; len = blk_co_getlength(s->top); if (len < 0) { @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp) buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE); -for (offset = 0; offset < len; offset += n) { -bool copy; +for (offset = 0; offset < len || need_final_flush; offset += n) { +bool copy = false; bool error_in_source = true; /* Note that even when no rate limit is applied we need to yield @@ -166,22 +167,34 @@ static int coroutine_fn commit_run(Job *job, Error **errp) if (job_is_cancelled(&s->common.job)) { break; } -/* Copy if allocated above the base */ -ret = blk_co_is_allocated_above(s->top, s->base_overlay, true, -offset, COMMIT_BUFFER_SIZE, &n); -copy = (ret > 0); -trace_commit_one_iteration(s, offset, n, ret); -if (copy) { -assert(n < SIZE_MAX); -ret = blk_co_pread(s->top, offset, n, buf, 0); -if (ret >= 0) { -ret = blk_co_pwrite(s->base, offset, n, buf, 0); -if (ret < 0) { -error_in_source = false; +if (offset < len) { +/* Copy if allocated above the base */ +ret = blk_co_is_allocated_above(s->top, s->base_overlay, true, +offset, COMMIT_BUFFER_SIZE, &n); +copy = (ret > 0); +trace_commit_one_iteration(s, offset, n, ret); +if (copy) { +assert(n < SIZE_MAX); + +ret = blk_co_pread(s->top, offset, n, buf, 0); +if (ret >= 
0) { +ret = blk_co_pwrite(s->base, offset, n, buf, 0); +if (ret < 0) { +error_in_source = false; +} } } +} else { +assert(need_final_flush); +ret = blk_co_flush(s->base); +if (ret < 0) { +error_in_source = false; +} else { +need_final_flush = false; +} } + if (ret < 0) { BlockErrorAction action = block_job_error_action(&s->common, s->on_error, -- 2.34.1
[PATCH 2/3] vl.c: select_machine(): use g_autoptr
Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/vl.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/system/vl.c b/system/vl.c index fa81037ce2..947b433905 100644 --- a/system/vl.c +++ b/system/vl.c @@ -1667,7 +1667,7 @@ static MachineClass *select_machine(QDict *qdict, Error **errp) { ERRP_GUARD(); const char *machine_type = qdict_get_try_str(qdict, "type"); -GSList *machines = object_class_get_list(TYPE_MACHINE, false); +g_autoptr(GSList) machines = object_class_get_list(TYPE_MACHINE, false); MachineClass *machine_class = NULL; if (machine_type) { @@ -1683,7 +1683,6 @@ static MachineClass *select_machine(QDict *qdict, Error **errp) } } -g_slist_free(machines); if (!machine_class) { error_append_hint(errp, "Use -machine help to list supported machines\n"); -- 2.34.1
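On GCC and Clang, g_autoptr() ultimately expands to a variable with __attribute__((cleanup(...))), which runs a function on every scope exit, including early returns; that is why the explicit g_slist_free() call can simply be dropped. A minimal stand-in sketch of the mechanism (not the real glib macro, and the names are mine):

```c
#include <assert.h>

static int cleanup_runs;

/* called automatically with a pointer to the annotated variable;
 * stands in for the g_slist_free() that g_autoptr(GSList) arranges */
static void cleanup_fn(void **p)
{
    if (*p) {
        cleanup_runs++;
    }
}

static int select_machine_sketch(int found)
{
    /* stands in for: g_autoptr(GSList) machines = object_class_get_list(...) */
    __attribute__((cleanup(cleanup_fn))) void *machines = &cleanup_runs;
    (void)machines;

    if (!found) {
        return -1;          /* early exit: cleanup_fn still runs */
    }
    return 0;               /* normal exit: cleanup_fn runs here too */
}

/* helper for the checks below: cleanups triggered by one call */
static int cleanups_for(int found)
{
    int before = cleanup_runs;
    select_machine_sketch(found);
    return cleanup_runs - before;
}
```

This is also why the patch is safe even though select_machine() has several return paths: each one now frees the list exactly once.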
[PATCH 3/3] vl.c: select_machine(): add selected machine type to error message
Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/vl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/system/vl.c b/system/vl.c index 947b433905..a6a4b470a7 100644 --- a/system/vl.c +++ b/system/vl.c @@ -1674,7 +1674,7 @@ static MachineClass *select_machine(QDict *qdict, Error **errp) machine_class = find_machine(machine_type, machines); qdict_del(qdict, "type"); if (!machine_class) { -error_setg(errp, "unsupported machine type"); +error_setg(errp, "unsupported machine type: \"%s\"", optarg); } } else { machine_class = find_default_machine(machines); -- 2.34.1
[PATCH 0/3] vl.c: select_machine(): improve error message
Hi all! Here are three simple patches, improving the select_machine() function a bit. Vladimir Sementsov-Ogievskiy (3): vl.c: select_machine(): use ERRP_GUARD instead of error propagation vl.c: select_machine(): use g_autoptr vl.c: select_machine(): add selected machine type to error message system/vl.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) -- 2.34.1
[PATCH 1/3] vl.c: select_machine(): use ERRP_GUARD instead of error propagation
Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/vl.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/system/vl.c b/system/vl.c index cfcb674425..fa81037ce2 100644 --- a/system/vl.c +++ b/system/vl.c @@ -1665,28 +1665,28 @@ static const QEMUOption *lookup_opt(int argc, char **argv, static MachineClass *select_machine(QDict *qdict, Error **errp) { +ERRP_GUARD(); const char *machine_type = qdict_get_try_str(qdict, "type"); GSList *machines = object_class_get_list(TYPE_MACHINE, false); -MachineClass *machine_class; -Error *local_err = NULL; +MachineClass *machine_class = NULL; if (machine_type) { machine_class = find_machine(machine_type, machines); qdict_del(qdict, "type"); if (!machine_class) { -error_setg(&local_err, "unsupported machine type"); +error_setg(errp, "unsupported machine type"); } } else { machine_class = find_default_machine(machines); if (!machine_class) { -error_setg(&local_err, "No machine specified, and there is no default"); +error_setg(errp, "No machine specified, and there is no default"); } } g_slist_free(machines); -if (local_err) { -error_append_hint(&local_err, "Use -machine help to list supported machines\n"); -error_propagate(errp, local_err); +if (!machine_class) { +error_append_hint(errp, + "Use -machine help to list supported machines\n"); } return machine_class; } -- 2.34.1
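A toy model of why the local_err dance can go away: once errp is guaranteed to point at a real Error* slot (which is what ERRP_GUARD() arranges, even when the caller passed NULL or error_fatal), the function can set the error and append the hint on errp directly, keying the hint on the failed lookup instead of on local_err. The Error type and helpers below are minimal stand-ins for QEMU's API, just enough to show the control flow:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Error { char msg[256]; } Error;

/* stand-in for error_setg(): real QEMU allocates; a static slot keeps
 * this sketch leak-free */
static void error_setg_toy(Error **errp, const char *msg)
{
    static Error storage;
    if (errp && !*errp) {
        *errp = &storage;
        snprintf(storage.msg, sizeof(storage.msg), "%s", msg);
    }
}

static void error_append_hint_toy(Error **errp, const char *hint)
{
    if (errp && *errp) {
        size_t len = strlen((*errp)->msg);
        snprintf((*errp)->msg + len, sizeof((*errp)->msg) - len, "%s", hint);
    }
}

/* mirrors the patched select_machine() flow; the lookup is pretended
 * to always fail so the error path is exercised */
static const char *select_machine_msg(int have_type)
{
    Error *err = NULL;
    void *machine_class = NULL;

    if (have_type) {
        error_setg_toy(&err, "unsupported machine type");
    } else {
        error_setg_toy(&err, "No machine specified, and there is no default");
    }
    if (!machine_class) {
        error_append_hint_toy(&err, "\nUse -machine help to list supported machines");
    }
    return err ? err->msg : "";
}
```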
[PATCH v9 1/7] block-backend: blk_root(): drop const specifier on return type
We'll need to get a non-const child pointer for graph modifications in further commits. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/block-backend.c | 2 +- include/sysemu/block-backend-global-state.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index db6f9b92a3..71d5002198 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2879,7 +2879,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, bytes, read_flags, write_flags); } -const BdrvChild *blk_root(BlockBackend *blk) +BdrvChild *blk_root(BlockBackend *blk) { GLOBAL_STATE_CODE(); return blk->root; } diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h index 49c12b0fa9..ccb35546a1 100644 --- a/include/sysemu/block-backend-global-state.h +++ b/include/sysemu/block-backend-global-state.h @@ -126,7 +126,7 @@ void blk_set_force_allow_inactivate(BlockBackend *blk); bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp); void blk_unregister_buf(BlockBackend *blk, void *host, size_t size); -const BdrvChild *blk_root(BlockBackend *blk); +BdrvChild *blk_root(BlockBackend *blk); int blk_make_empty(BlockBackend *blk, Error **errp); -- 2.34.1
[PATCH v9 2/7] block/export: add blk_by_export_id()
We need it for further blockdev-replace functionality. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/export/export.c | 18 ++ include/sysemu/block-backend-global-state.h | 1 + 2 files changed, 19 insertions(+) diff --git a/block/export/export.c b/block/export/export.c index 6d51ae8ed7..57beae7982 100644 --- a/block/export/export.c +++ b/block/export/export.c @@ -355,3 +355,21 @@ BlockExportInfoList *qmp_query_block_exports(Error **errp) return head; } + +BlockBackend *blk_by_export_id(const char *id, Error **errp) +{ +BlockExport *exp; + +exp = blk_exp_find(id); +if (exp == NULL) { +error_setg(errp, "Export '%s' not found", id); +return NULL; +} + +if (!exp->blk) { +error_setg(errp, "Export '%s' is empty", id); +return NULL; +} + +return exp->blk; +} diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h index ccb35546a1..410d0cc5c7 100644 --- a/include/sysemu/block-backend-global-state.h +++ b/include/sysemu/block-backend-global-state.h @@ -74,6 +74,7 @@ void blk_detach_dev(BlockBackend *blk, DeviceState *dev); DeviceState *blk_get_attached_dev(BlockBackend *blk); BlockBackend *blk_by_dev(void *dev); BlockBackend *blk_by_qdev_id(const char *id, Error **errp); +BlockBackend *blk_by_export_id(const char *id, Error **errp); void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque); void blk_activate(BlockBackend *blk, Error **errp); -- 2.34.1
[PATCH v9 0/7] blockdev-replace
Hi all! This series presents a new command blockdev-replace, which helps to insert/remove filters anywhere in the block graph. It can: - replace qdev block-node by qdev-id - replace export block-node by export-id - replace any child of parent block-node by node-name and child name So insertion is done in two steps: 1. blockdev_add (create filter node, unparented) [some parent] [new filter] | | V V [some child ] 2. blockdev-replace (replace child by the filter) [some parent] | V [new filter] | V [some child] And removal is done in reverse order: 1. blockdev-replace (go back to picture 1) 2. blockdev_del (remove filter node) Ideally, we would do both operations (add + replace or replace + del) in a transaction, but that would be another series. v9: rebase drop x- prefix and use unstable feature bump version to 9.1 in qapi spec update error message in blk_by_qdev_id stub v8: rebase. Also don't use "preallocate" filter in a test, as we don't support removal of this filter for now. The preallocate filter is really unusual, see discussion here: https://www.mail-archive.com/qemu-devel@nongnu.org/msg994945.html Vladimir Sementsov-Ogievskiy (7): block-backend: blk_root(): drop const specifier on return type block/export: add blk_by_export_id() block: make bdrv_find_child() function public qapi: add blockdev-replace command block: bdrv_get_xdbg_block_graph(): report export ids iotests.py: introduce VM.assert_edges_list() method iotests: add filter-insertion block.c | 17 ++ block/block-backend.c | 2 +- block/export/export.c | 31 +++ blockdev.c| 70 -- include/block/block_int-io.h | 2 + include/block/export.h| 1 + include/sysemu/block-backend-global-state.h | 3 +- qapi/block-core.json | 88 +++ stubs/blk-by-qdev-id.c| 13 + stubs/blk-exp-find-by-blk.c | 9 + stubs/meson.build | 2 + tests/qemu-iotests/iotests.py | 17 ++ tests/qemu-iotests/tests/filter-insertion | 236 ++ tests/qemu-iotests/tests/filter-insertion.out | 5 + 14 files changed, 480 insertions(+), 16 deletions(-) create mode 
100644 stubs/blk-by-qdev-id.c create mode 100644 stubs/blk-exp-find-by-blk.c create mode 100755 tests/qemu-iotests/tests/filter-insertion create mode 100644 tests/qemu-iotests/tests/filter-insertion.out -- 2.34.1
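The two-step insertion/removal described in the cover letter reduces to repointing a single parent-to-child edge while the filter holds its own reference to the child. A toy model of that graph manipulation (one child pointer per node; an illustration only, not QEMU's BdrvChild machinery):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    const char *name;
    struct node *child;     /* the single parent->child edge of this sketch */
};

/* blockdev-replace, reduced to its essence: repoint one existing edge */
static void replace_child(struct node *parent, struct node *new_child)
{
    parent->child = new_child;
}

/* render the chain starting at n as "a/b/c" for easy checking */
static const char *chain(struct node *n, char *buf, size_t size)
{
    buf[0] = '\0';
    for (; n; n = n->child) {
        strncat(buf, n->name, size - strlen(buf) - 1);
        if (n->child) {
            strncat(buf, "/", size - strlen(buf) - 1);
        }
    }
    return buf;
}

static struct node file0 = { "file0", NULL };
static struct node disk0 = { "disk0", &file0 };
static struct node filter = { "filter", NULL };

static const char *demo_insertion(void)
{
    static char buf[64];

    filter.child = &file0;          /* step 1: blockdev-add the filter */
    replace_child(&disk0, &filter); /* step 2: blockdev-replace */
    return chain(&disk0, buf, sizeof(buf));
}

static const char *demo_removal(void)
{
    static char buf[64];

    replace_child(&disk0, &file0);  /* step 1: point the parent back */
    filter.child = NULL;            /* step 2: blockdev-del the filter */
    return chain(&disk0, buf, sizeof(buf));
}
```

The point of the two-step split is that the first step of each direction is non-destructive: the filter exists but is unreferenced by the parent, so only the replace step changes what the guest sees.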
[PATCH v9 7/7] iotests: add filter-insertion
Demonstrate new blockdev-replace API for filter insertion and removal. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/tests/filter-insertion | 236 ++ tests/qemu-iotests/tests/filter-insertion.out | 5 + 2 files changed, 241 insertions(+) create mode 100755 tests/qemu-iotests/tests/filter-insertion create mode 100644 tests/qemu-iotests/tests/filter-insertion.out diff --git a/tests/qemu-iotests/tests/filter-insertion b/tests/qemu-iotests/tests/filter-insertion new file mode 100755 index 00..02a0978f96 --- /dev/null +++ b/tests/qemu-iotests/tests/filter-insertion @@ -0,0 +1,236 @@ +#!/usr/bin/env python3 +# +# Tests for inserting and removing filters in a block graph. +# +# Copyright (c) 2022 Virtuozzo International GmbH. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. 
+# + +import os + +import iotests +from iotests import qemu_img_create, try_remove + + +disk = os.path.join(iotests.test_dir, 'disk') +sock = os.path.join(iotests.sock_dir, 'sock') +size = 1024 * 1024 + + +class TestFilterInsertion(iotests.QMPTestCase): +def setUp(self): +qemu_img_create('-f', iotests.imgfmt, disk, str(size)) + +self.vm = iotests.VM() +self.vm.launch() + +self.vm.cmd('blockdev-add', { +'node-name': 'disk0', +'driver': 'qcow2', +'file': { +'node-name': 'file0', +'driver': 'file', +'filename': disk +} +}) + +def tearDown(self): +self.vm.shutdown() +os.remove(disk) +try_remove(sock) + +def test_simple_insertion(self): +vm = self.vm + +vm.cmd('blockdev-add', { +'node-name': 'filter', +'driver': 'blkdebug', +'image': 'file0' +}) + +vm.cmd('blockdev-replace', { +'parent-type': 'driver', +'node-name': 'disk0', +'child': 'file', +'new-child': 'filter' +}) + +# Filter inserted: +# disk0 -file-> filter -file-> file0 +vm.assert_edges_list([ +('disk0', 'file', 'filter'), +('filter', 'image', 'file0') +]) + +vm.cmd('blockdev-replace', { +'parent-type': 'driver', +'node-name': 'disk0', +'child': 'file', +'new-child': 'file0' +}) + +# Filter replaced, but still exists: +# dik0 -file-> file0 <-file- filter +vm.assert_edges_list([ +('disk0', 'file', 'file0'), +('filter', 'image', 'file0') +]) + +vm.cmd('blockdev-del', node_name='filter') + +# Filter removed +# dik0 -file-> file0 +vm.assert_edges_list([ +('disk0', 'file', 'file0') +]) + +def test_insert_under_qdev(self): +vm = self.vm + +vm.cmd('device_add', driver='virtio-scsi') +vm.cmd('device_add', id='sda', driver='scsi-hd', + drive='disk0') + +vm.cmd('blockdev-add', { +'node-name': 'filter', +'driver': 'compress', +'file': 'disk0' +}) + +vm.cmd('blockdev-replace', { +'parent-type': 'qdev', +'qdev-id': 'sda', +'new-child': 'filter' +}) + +# Filter inserted: +# sda -root-> filter -file-> disk0 -file-> file0 +vm.assert_edges_list([ +# parent_node_name, child_name, child_node_name +('sda', 'root', 'filter'), 
+('filter', 'file', 'disk0'), +('disk0', 'file', 'file0'), +]) + +vm.cmd('blockdev-replace', { +'parent-type': 'qdev', +'qdev-id': 'sda', +'new-child': 'disk0' +}) +vm.cmd('blockdev-del', node_name='filter') + +# Filter removed: +# sda -root-> disk0 -file-> file0 +vm.assert_edges_list([ +# parent_node_name, child_name, child_node_name +('sda', 'root', 'disk0'), +('disk0', 'file', 'file0'), +]) + +def test_insert_under_nbd_export(self): +vm = self.vm + +vm.cmd('nbd-server-start', + addr={'type': 'unix', 'data': {'path': sock}}) +vm.cmd('block-export-add', id='exp1', type='nbd', +
[PATCH v9 4/7] qapi: add blockdev-replace command
Add a command that can replace bs in the following BdrvChild structures: - qdev blk root child - block-export blk root child - any child of BlockDriverState selected by child-name Signed-off-by: Vladimir Sementsov-Ogievskiy --- blockdev.c | 56 +++ qapi/block-core.json | 88 ++ stubs/blk-by-qdev-id.c | 13 +++ stubs/meson.build | 1 + 4 files changed, 158 insertions(+) create mode 100644 stubs/blk-by-qdev-id.c diff --git a/blockdev.c b/blockdev.c index ba7e90b06e..2190467022 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3559,6 +3559,62 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread, bdrv_try_change_aio_context(bs, new_context, NULL, errp); } +void qmp_blockdev_replace(BlockdevReplace *repl, Error **errp) +{ +BdrvChild *child = NULL; +BlockDriverState *new_child_bs; + +if (repl->parent_type == BLOCK_PARENT_TYPE_DRIVER) { +BlockDriverState *parent_bs; + +parent_bs = bdrv_find_node(repl->u.driver.node_name); +if (!parent_bs) { +error_setg(errp, "Block driver node with node-name '%s' not " + "found", repl->u.driver.node_name); +return; +} + +child = bdrv_find_child(parent_bs, repl->u.driver.child); +if (!child) { +error_setg(errp, "Block driver node '%s' doesn't have child " + "named '%s'", repl->u.driver.node_name, + repl->u.driver.child); +return; +} +} else { +/* Other types are similar, they work through blk */ +BlockBackend *blk; +bool is_qdev = repl->parent_type == BLOCK_PARENT_TYPE_QDEV; +const char *id = +is_qdev ? repl->u.qdev.qdev_id : repl->u.export.export_id; + +assert(is_qdev || repl->parent_type == BLOCK_PARENT_TYPE_EXPORT); + +blk = is_qdev ? blk_by_qdev_id(id, errp) : blk_by_export_id(id, errp); +if (!blk) { +return; +} + +child = blk_root(blk); +if (!child) { +error_setg(errp, "%s '%s' is empty, nothing to replace", + is_qdev ? "Device" : "Export", id); +return; +} +} + +assert(child); +assert(child->bs); + +new_child_bs = bdrv_find_node(repl->new_child); +if (!new_child_bs) { +error_setg(errp, "Node '%s' not found", repl->new_child); +return; +} + +bdrv_replace_child_bs(child, new_child_bs, errp); +} + QemuOptsList qemu_common_drive_opts = { .name = "drive", .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head), diff --git a/qapi/block-core.json b/qapi/block-core.json index df5e07debd..0a6f08a6e0 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -6148,3 +6148,91 @@ ## { 'struct': 'DummyBlockCoreForceArrays', 'data': { 'unused-block-graph-info': ['BlockGraphInfo'] } } + +## +# @BlockParentType: +# +# @qdev: block device, such as created by device_add, and denoted by +# qdev-id +# +# @driver: block driver node, such as created by blockdev-add, and +# denoted by node-name +# +# @export: block export, such as created by block-export-add, and +# denoted by export-id +# +# Since 9.1 +## +{ 'enum': 'BlockParentType', + 'data': ['qdev', 'driver', 'export'] } + +## +# @BdrvChildRefQdev: +# +# @qdev-id: the device's ID or QOM path +# +# Since 9.1 +## +{ 'struct': 'BdrvChildRefQdev', + 'data': { 'qdev-id': 'str' } } + +## +# @BdrvChildRefExport: +# +# @export-id: block export identifier +# +# Since 9.1 +## +{ 'struct': 'BdrvChildRefExport', + 'data': { 'export-id': 'str' } } + +## +# @BdrvChildRefDriver: +# +# @node-name: the node name of the parent block node +# +# @child: name of the child to be replaced, like "file" or "backing" +# +# Since 9.1 +## +{ 'struct': 'BdrvChildRefDriver', + 'data': { 'node-name': 'str', 'child': 'str' } } + +## +# @BlockdevReplace: +# +# @parent-type: type of the parent whose child is to be replaced +# +# @new-child: new child for replacement +# +# Since 9.1 +## +{ 'union': 'BlockdevReplace', + 'base': { + 'parent-type': 'BlockParentType', + 'new-child': 'str' + }, + 'discriminator': 'parent-type', + 'data': { + 'qdev': 'BdrvChildRefQdev', + 'export': 'BdrvChildRefExport', + 'driver': 'BdrvChildRefDriver' + } } + +## +# @blockdev-replace: +# +# Replace a block node attached to a device (selected by +# @qdev-id), or to a block export (selected by @export-id), or +# any child of a block node (selected by @node-name and @child), +# with the @new-child block node. +# +# Features: +# +# @unstable: This command is experimental. +# +# Since 9.1 +## +{ 'command': 'blockdev-replace', 'boxed': true, + 'features': [ 'unstable' ], + 'data': 'BlockdevRe
[PATCH v9 6/7] iotests.py: introduce VM.assert_edges_list() method
Add an alternative method to check block graph, to be used in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 17 + 1 file changed, 17 insertions(+) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index ea48af4a7b..8a5fd01eac 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -1108,6 +1108,23 @@ def check_bitmap_status(self, node_name, bitmap_name, fields): return fields.items() <= ret.items() +def get_block_graph(self): +""" +Returns block graph in form of edges list, where each edge is a tuple: + (parent_node_name, child_name, child_node_name) +""" +graph = self.qmp('x-debug-query-block-graph')['return'] + +nodes = {n['id']: n['name'] for n in graph['nodes']} +# Check that all names are unique: +assert len(set(nodes.values())) == len(nodes) + +return [(nodes[e['parent']], e['name'], nodes[e['child']]) +for e in graph['edges']] + +def assert_edges_list(self, edges): +assert sorted(edges) == sorted(self.get_block_graph()) + def assert_block_path(self, root, path, expected_node, graph=None): """ Check whether the node under the given path in the block graph -- 2.34.1
[PATCH v9 3/7] block: make bdrv_find_child() function public
To be reused soon for blockdev-replace functionality. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block.c | 13 + blockdev.c | 14 -- include/block/block_int-io.h | 2 ++ 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/block.c b/block.c index 468cf5e67d..f6292f459a 100644 --- a/block.c +++ b/block.c @@ -8174,6 +8174,19 @@ int bdrv_make_empty(BdrvChild *c, Error **errp) return 0; } +BdrvChild *bdrv_find_child(BlockDriverState *parent_bs, const char *child_name) +{ +BdrvChild *child; + +QLIST_FOREACH(child, _bs->children, next) { +if (strcmp(child->name, child_name) == 0) { +return child; +} +} + +return NULL; +} + /* * Return the child that @bs acts as an overlay for, and from which data may be * copied in COW or COR operations. Usually this is the backing file. diff --git a/blockdev.c b/blockdev.c index 835064ed03..ba7e90b06e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3452,20 +3452,6 @@ void qmp_blockdev_del(const char *node_name, Error **errp) bdrv_unref(bs); } -static BdrvChild * GRAPH_RDLOCK -bdrv_find_child(BlockDriverState *parent_bs, const char *child_name) -{ -BdrvChild *child; - -QLIST_FOREACH(child, _bs->children, next) { -if (strcmp(child->name, child_name) == 0) { -return child; -} -} - -return NULL; -} - void qmp_x_blockdev_change(const char *parent, const char *child, const char *node, Error **errp) { diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h index 4a7cf2b4fd..11ed4aa927 100644 --- a/include/block/block_int-io.h +++ b/include/block/block_int-io.h @@ -130,6 +130,8 @@ bdrv_co_refresh_total_sectors(BlockDriverState *bs, int64_t hint); int co_wrapper_mixed_bdrv_rdlock bdrv_refresh_total_sectors(BlockDriverState *bs, int64_t hint); +BdrvChild * GRAPH_RDLOCK +bdrv_find_child(BlockDriverState *parent_bs, const char *child_name); BdrvChild * GRAPH_RDLOCK bdrv_cow_child(BlockDriverState *bs); BdrvChild * GRAPH_RDLOCK bdrv_filter_child(BlockDriverState *bs); BdrvChild * GRAPH_RDLOCK 
bdrv_filter_or_cow_child(BlockDriverState *bs); -- 2.34.1
[PATCH v9 5/7] block: bdrv_get_xdbg_block_graph(): report export ids
Currently for block exports we report empty blk names. That's not good. Let's try to find corresponding block export and report its id. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block.c | 4 block/export/export.c | 13 + include/block/export.h | 1 + stubs/blk-exp-find-by-blk.c | 9 + stubs/meson.build | 1 + 5 files changed, 28 insertions(+) create mode 100644 stubs/blk-exp-find-by-blk.c diff --git a/block.c b/block.c index f6292f459a..601475e835 100644 --- a/block.c +++ b/block.c @@ -6326,7 +6326,11 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp) for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) { char *allocated_name = NULL; const char *name = blk_name(blk); +BlockExport *exp = blk_exp_find_by_blk(blk); +if (!*name && exp) { +name = exp->id; +} if (!*name) { name = allocated_name = blk_get_attached_dev_id(blk); } diff --git a/block/export/export.c b/block/export/export.c index 57beae7982..8744a1171e 100644 --- a/block/export/export.c +++ b/block/export/export.c @@ -60,6 +60,19 @@ BlockExport *blk_exp_find(const char *id) return NULL; } +BlockExport *blk_exp_find_by_blk(BlockBackend *blk) +{ +BlockExport *exp; + +QLIST_FOREACH(exp, _exports, next) { +if (exp->blk == blk) { +return exp; +} +} + +return NULL; +} + static const BlockExportDriver *blk_exp_find_driver(BlockExportType type) { int i; diff --git a/include/block/export.h b/include/block/export.h index f2fe0f8078..16863d37cf 100644 --- a/include/block/export.h +++ b/include/block/export.h @@ -82,6 +82,7 @@ struct BlockExport { BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp); BlockExport *blk_exp_find(const char *id); +BlockExport *blk_exp_find_by_blk(BlockBackend *blk); void blk_exp_ref(BlockExport *exp); void blk_exp_unref(BlockExport *exp); void blk_exp_request_shutdown(BlockExport *exp); diff --git a/stubs/blk-exp-find-by-blk.c b/stubs/blk-exp-find-by-blk.c new file mode 100644 index 00..2fc1da953b --- /dev/null +++ b/stubs/blk-exp-find-by-blk.c @@ -0,0 +1,9 @@ 
+#include "qemu/osdep.h" +#include "sysemu/block-backend.h" +#include "block/export.h" + +BlockExport *blk_exp_find_by_blk(BlockBackend *blk) +{ +return NULL; +} + diff --git a/stubs/meson.build b/stubs/meson.build index 068998c1a5..cbe30f94e8 100644 --- a/stubs/meson.build +++ b/stubs/meson.build @@ -16,6 +16,7 @@ if have_block stub_ss.add(files('blk-commit-all.c')) stub_ss.add(files('blk-exp-close-all.c')) stub_ss.add(files('blk-by-qdev-id.c')) + stub_ss.add(files('blk-exp-find-by-blk.c')) stub_ss.add(files('blockdev-close-all-bdrv-states.c')) stub_ss.add(files('change-state-handler.c')) stub_ss.add(files('get-vm-name.c')) -- 2.34.1
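After this patch, the name reported for a BlockBackend in the debug graph follows a fallback chain. A minimal sketch of that chain (names and the helper itself are illustrative, not QEMU API):

```python
def graph_node_name(blk_name, export_id, dev_id):
    """Name fallback order after the patch: explicit blk name first,
    then the owning export's id, then the attached device's id."""
    if blk_name:
        return blk_name
    if export_id:
        return export_id
    return dev_id

# An anonymous blk owned by an NBD export is now named after the export:
assert graph_node_name('', 'nbd-exp0', 'dev0') == 'nbd-exp0'
# A named blk keeps its own name:
assert graph_node_name('drive0', 'nbd-exp0', 'dev0') == 'drive0'
# With neither a name nor an export, the device id is used, as before:
assert graph_node_name('', None, 'dev0') == 'dev0'
```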
Re: [PATCH v4] virtio: add VIRTQUEUE_ERROR QAPI event
ping4 On 17.10.23 15:39, Vladimir Sementsov-Ogievskiy wrote: For now we only log the vhost device error, when virtqueue is actually stopped. Let's add a QAPI event, which makes possible: - collect statistics of such errors - make immediate actions: take core dumps or do some other debugging - inform the user through a management API or UI, so that (s)he can react somehow, e.g. reset the device driver in the guest or even build up some automation to do so Note that basically every inconsistency discovered during virtqueue processing results in a silent virtqueue stop. The guest then just sees the requests getting stuck somewhere in the device for no visible reason. This event provides a means to inform the management layer of this situation in a timely fashion. The event could be reused for some other virtqueue problems (not only for vhost devices) in future. For this it gets a generic name and structure. We keep original VHOST_OPS_DEBUG(), to keep original debug output as is here, it's not the only call to VHOST_OPS_DEBUG in the file. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Denis Plotnikov Acked-by: Markus Armbruster -- Best regards, Vladimir
Re: [PATCH v2] block-backend: per-device throttling of BLOCK_IO_ERROR reports
ping2 On 09.01.24 16:13, Vladimir Sementsov-Ogievskiy wrote: From: Leonid Kaplan BLOCK_IO_ERROR events come from the guest, so we must throttle them. We still want per-device throttling, so let's use device id as a key. Signed-off-by: Leonid Kaplan Signed-off-by: Vladimir Sementsov-Ogievskiy --- v2: add Note: to QAPI doc monitor/monitor.c| 10 ++ qapi/block-core.json | 2 ++ 2 files changed, 12 insertions(+) diff --git a/monitor/monitor.c b/monitor/monitor.c index 01ede1babd..ad0243e9d7 100644 --- a/monitor/monitor.c +++ b/monitor/monitor.c @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { /* Limit guest-triggerable events to 1 per second */ [QAPI_EVENT_RTC_CHANGE]= { 1000 * SCALE_MS }, +[QAPI_EVENT_BLOCK_IO_ERROR]= { 1000 * SCALE_MS }, [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, [QAPI_EVENT_BALLOON_CHANGE]= { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); } +if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { +hash += g_str_hash(qdict_get_str(evstate->data, "device")); +} + return hash; } @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b) qdict_get_str(evb->data, "qom-path")); } +if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) { +return !strcmp(qdict_get_str(eva->data, "device"), + qdict_get_str(evb->data, "device")); +} + return TRUE; } diff --git a/qapi/block-core.json b/qapi/block-core.json index ca390c5700..32c2c2f030 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -5559,6 +5559,8 @@ # Note: If action is "stop", a STOP event will eventually follow the # BLOCK_IO_ERROR event # +# Note: This event is rate-limited. +# # Since: 0.13 # # Example: -- Best regards, Vladimir
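The idea of keying the rate limit on the event's `device` field can be sketched in a few lines of Python. Note this is a deliberate simplification: QEMU's monitor delays and coalesces throttled events rather than dropping them, and the class below is purely illustrative.

```python
import time

class EventThrottle:
    """Rate-limit events per key, e.g. the 'device' field of BLOCK_IO_ERROR.
    Simplified: throttled events are dropped, not delayed as in QEMU."""
    def __init__(self, interval_s=1.0, clock=time.monotonic):
        self.interval = interval_s
        self.clock = clock
        self.last = {}   # key -> timestamp of the last emitted event

    def allow(self, key):
        now = self.clock()
        if key in self.last and now - self.last[key] < self.interval:
            return False          # same device fired too recently: throttled
        self.last[key] = now
        return True

# Injected clock so the example is deterministic
t = 0.0
thr = EventThrottle(clock=lambda: t)
assert thr.allow('virtio0')
assert thr.allow('virtio1')      # different device, separate budget
assert not thr.allow('virtio0')  # same device within 1s, throttled
t = 1.5
assert thr.allow('virtio0')      # budget refreshed after the interval
```

The per-key dictionary plays the role of the hash/equal pair in `qapi_event_throttle_hash()` / `qapi_event_throttle_equal()`: two BLOCK_IO_ERROR events share a throttle slot only when their `device` fields match.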
[PATCH v2 3/7] qapi: block-job-change: make copy-mode parameter optional
We are going to add more parameters to change. We want to make it possible to change only one option, or any subset of the available options, so all options should be optional. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/mirror.c | 4 qapi/block-core.json | 3 ++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/block/mirror.c b/block/mirror.c index 2816bb1042..60e8d83e4f 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1272,6 +1272,10 @@ static void mirror_change(BlockJob *job, JobChangeOptions *opts, GLOBAL_STATE_CODE(); +if (!change_opts->has_copy_mode) { +return; +} + if (qatomic_read(>copy_mode) == change_opts->copy_mode) { return; } diff --git a/qapi/block-core.json b/qapi/block-core.json index 4ec5632596..660c7f4a48 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3071,11 +3071,12 @@ # # @copy-mode: Switch to this copy mode. Currently, only the switch # from 'background' to 'write-blocking' is implemented. +# If absent, copy mode remains the same. (optional since 9.1) # # Since: 8.2 ## { 'struct': 'JobChangeOptionsMirror', - 'data': { 'copy-mode' : 'MirrorCopyMode' } } + 'data': { '*copy-mode' : 'MirrorCopyMode' } } ## # @JobChangeOptions: -- 2.34.1
[PATCH v2 1/7] qapi: rename BlockJobChangeOptions to JobChangeOptions
We are going to move the change action from the block-job to the job implementation, and then move to the job-* external APIs, deprecating the block-job-* APIs. This commit simplifies the further transition. The commit was made with the command git grep -l BlockJobChangeOptions | \ xargs sed -i 's/BlockJobChangeOptions/JobChangeOptions/g' Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/mirror.c | 4 ++-- blockdev.c | 2 +- blockjob.c | 2 +- include/block/blockjob.h | 2 +- include/block/blockjob_int.h | 2 +- qapi/block-core.json | 12 ++-- 6 files changed, 12 insertions(+), 12 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 61f0a717b7..2816bb1042 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1258,11 +1258,11 @@ static bool commit_active_cancel(Job *job, bool force) return force || !job_is_ready(job); } -static void mirror_change(BlockJob *job, BlockJobChangeOptions *opts, +static void mirror_change(BlockJob *job, JobChangeOptions *opts, Error **errp) { MirrorBlockJob *s = container_of(job, MirrorBlockJob, common); -BlockJobChangeOptionsMirror *change_opts = >u.mirror; +JobChangeOptionsMirror *change_opts = >u.mirror; MirrorCopyMode current; /* diff --git a/blockdev.c b/blockdev.c index 835064ed03..3f4ed96ecc 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3248,7 +3248,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp) job_dismiss_locked(, errp); } -void qmp_block_job_change(BlockJobChangeOptions *opts, Error **errp) +void qmp_block_job_change(JobChangeOptions *opts, Error **errp) { BlockJob *job; diff --git a/blockjob.c b/blockjob.c index d5f29e14af..8cfbb15543 100644 --- a/blockjob.c +++ b/blockjob.c @@ -312,7 +312,7 @@ static bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp) return block_job_set_speed_locked(job, speed, errp); } -void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts, +void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, Error **errp) { const BlockJobDriver *drv = block_job_driver(job); diff
--git a/include/block/blockjob.h b/include/block/blockjob.h index 7061ab7201..5dd1b08909 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -181,7 +181,7 @@ bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp); * * Change the job according to opts. */ -void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts, +void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, Error **errp); /** diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h index 4c3d2e25a2..d9c3b911d0 100644 --- a/include/block/blockjob_int.h +++ b/include/block/blockjob_int.h @@ -73,7 +73,7 @@ struct BlockJobDriver { * * Note that this can already be called before the job coroutine is running. */ -void (*change)(BlockJob *job, BlockJobChangeOptions *opts, Error **errp); +void (*change)(BlockJob *job, JobChangeOptions *opts, Error **errp); /* * Query information specific to this kind of block job. diff --git a/qapi/block-core.json b/qapi/block-core.json index df5e07debd..4ec5632596 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3067,18 +3067,18 @@ 'allow-preconfig': true } ## -# @BlockJobChangeOptionsMirror: +# @JobChangeOptionsMirror: # # @copy-mode: Switch to this copy mode. Currently, only the switch # from 'background' to 'write-blocking' is implemented. # # Since: 8.2 ## -{ 'struct': 'BlockJobChangeOptionsMirror', +{ 'struct': 'JobChangeOptionsMirror', 'data': { 'copy-mode' : 'MirrorCopyMode' } } ## -# @BlockJobChangeOptions: +# @JobChangeOptions: # # Block job options that can be changed after job creation. 
# @@ -3088,10 +3088,10 @@ # # Since: 8.2 ## -{ 'union': 'BlockJobChangeOptions', +{ 'union': 'JobChangeOptions', 'base': { 'id': 'str', 'type': 'JobType' }, 'discriminator': 'type', - 'data': { 'mirror': 'BlockJobChangeOptionsMirror' } } + 'data': { 'mirror': 'JobChangeOptionsMirror' } } ## # @block-job-change: @@ -3101,7 +3101,7 @@ # Since: 8.2 ## { 'command': 'block-job-change', - 'data': 'BlockJobChangeOptions', 'boxed': true } + 'data': 'JobChangeOptions', 'boxed': true } ## # @BlockdevDiscardOptions: -- 2.34.1
[PATCH v2 7/7] iotests/mirror-change-copy-mode: switch to job-change command
block-job-change is deprecated, let's move test to job-change. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/tests/mirror-change-copy-mode | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/tests/mirror-change-copy-mode b/tests/qemu-iotests/tests/mirror-change-copy-mode index 51788b85c7..e972604ebf 100755 --- a/tests/qemu-iotests/tests/mirror-change-copy-mode +++ b/tests/qemu-iotests/tests/mirror-change-copy-mode @@ -150,7 +150,7 @@ class TestMirrorChangeCopyMode(iotests.QMPTestCase): len_before_change = result[0]['len'] # Change the copy mode while requests are happening. -self.vm.cmd('block-job-change', +self.vm.cmd('job-change', id='mirror', type='mirror', copy_mode='write-blocking') -- 2.34.1
[PATCH v2 0/7] introduce job-change qmp command
Hi all! This is an updated first part of my "[RFC 00/15] block job API" Supersedes: <20240313150907.623462-1-vsement...@yandex-team.ru> v2: - only job-change for now, as a first step - drop "type-based unions", and keep type parameter as is for now (I now doubt that this was a good idea, as it makes the QAPI protocol dependent on context) 03: improve documentation 06: deprecate only block-job-change for now 07: new Vladimir Sementsov-Ogievskiy (7): qapi: rename BlockJobChangeOptions to JobChangeOptions blockjob: block_job_change_locked(): check job type qapi: block-job-change: make copy-mode parameter optional blockjob: move change action implementation to job from block-job qapi: add job-change qapi/block-core: deprecate block-job-change iotests/mirror-change-copy-mode: switch to job-change command block/mirror.c| 13 +--- blockdev.c| 4 +-- blockjob.c| 20 docs/about/deprecated.rst | 5 +++ include/block/blockjob.h | 11 --- include/block/blockjob_int.h | 7 - include/qemu/job.h| 12 +++ job-qmp.c | 15 + job.c | 23 ++ qapi/block-core.json | 31 ++- .../tests/mirror-change-copy-mode | 2 +- 11 files changed, 90 insertions(+), 53 deletions(-) -- 2.34.1
[PATCH v2 2/7] blockjob: block_job_change_locked(): check job type
User may specify wrong type for the job id. Let's check it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- blockjob.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/blockjob.c b/blockjob.c index 8cfbb15543..788cb1e07d 100644 --- a/blockjob.c +++ b/blockjob.c @@ -319,6 +319,12 @@ void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, GLOBAL_STATE_CODE(); +if (job_type(>job) != opts->type) { +error_setg(errp, "Job '%s' is '%s' job, not '%s'", job->job.id, + job_type_str(>job), JobType_str(opts->type)); +return; +} + if (job_apply_verb_locked(>job, JOB_VERB_CHANGE, errp)) { return; } -- 2.34.1
[PATCH v2 6/7] qapi/block-core: deprecate block-job-change
That's a first step to move to the newer job-* APIs. The difference between block-job-change and job-change is in the find_block_job_locked() vs find_job_locked() functions. What's different? 1. find_block_job_locked() checks that the found job is a block job. This is OK when moving to the more generic API; no need to document this change. 2. find_block_job_locked() reports DeviceNotActive on failure, while find_job_locked() reports GenericError. Still, errors are not documented for block-job-change at all, so stay silent in deprecated.rst as well. Signed-off-by: Vladimir Sementsov-Ogievskiy --- docs/about/deprecated.rst | 5 + qapi/block-core.json | 6 ++ 2 files changed, 11 insertions(+) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index ff3da68208..0ddced0781 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -134,6 +134,11 @@ options are removed in favor of using explicit ``blockdev-create`` and ``blockdev-add`` calls. See :doc:`/interop/live-block-operations` for details. +``block-job-change`` (since 9.1) + + +Use ``job-change`` instead. + Incorrectly typed ``device_add`` arguments (since 6.2) '' diff --git a/qapi/block-core.json b/qapi/block-core.json index 9087ce300c..064cad0b64 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3099,9 +3099,15 @@ # # Change the block job's options. # +# Features: +# +# @deprecated: This command is deprecated. Use @job-change +# instead. +# # Since: 8.2 ## { 'command': 'block-job-change', + 'features': ['deprecated'], 'data': 'JobChangeOptions', 'boxed': true } ## -- 2.34.1
[PATCH v2 4/7] blockjob: move change action implementation to job from block-job
Like for other block-job-* APIs, we want to have the actual functionality in the job layer, and make block-job-change a deprecated duplicate of job-change in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/mirror.c | 7 +++ blockdev.c | 2 +- blockjob.c | 26 -- include/block/blockjob.h | 11 --- include/block/blockjob_int.h | 7 --- include/qemu/job.h | 12 job-qmp.c| 1 + job.c| 23 +++ 8 files changed, 40 insertions(+), 49 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 60e8d83e4f..63e35114f3 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1258,10 +1258,9 @@ static bool commit_active_cancel(Job *job, bool force) return force || !job_is_ready(job); } -static void mirror_change(BlockJob *job, JobChangeOptions *opts, - Error **errp) +static void mirror_change(Job *job, JobChangeOptions *opts, Error **errp) { -MirrorBlockJob *s = container_of(job, MirrorBlockJob, common); +MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job); JobChangeOptionsMirror *change_opts = >u.mirror; MirrorCopyMode current; @@ -1316,9 +1315,9 @@ static const BlockJobDriver mirror_job_driver = { .pause = mirror_pause, .complete = mirror_complete, .cancel = mirror_cancel, +.change = mirror_change, }, .drained_poll = mirror_drained_poll, -.change = mirror_change, .query = mirror_query, }; diff --git a/blockdev.c b/blockdev.c index 3f4ed96ecc..70b6aeaef0 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3259,7 +3259,7 @@ void qmp_block_job_change(JobChangeOptions *opts, Error **errp) return; } -block_job_change_locked(job, opts, errp); +job_change_locked(>job, opts, errp); } void qmp_change_backing_file(const char *device, diff --git a/blockjob.c b/blockjob.c index 788cb1e07d..2769722b37 100644 --- a/blockjob.c +++ b/blockjob.c @@ -312,32 +312,6 @@ static bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp) return block_job_set_speed_locked(job, speed, errp); } -void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, - 
Error **errp) -{ -const BlockJobDriver *drv = block_job_driver(job); - -GLOBAL_STATE_CODE(); - -if (job_type(>job) != opts->type) { -error_setg(errp, "Job '%s' is '%s' job, not '%s'", job->job.id, - job_type_str(>job), JobType_str(opts->type)); -return; -} - -if (job_apply_verb_locked(>job, JOB_VERB_CHANGE, errp)) { -return; -} - -if (drv->change) { -job_unlock(); -drv->change(job, opts, errp); -job_lock(); -} else { -error_setg(errp, "Job type does not support change"); -} -} - void block_job_ratelimit_processed_bytes(BlockJob *job, uint64_t n) { IO_CODE(); diff --git a/include/block/blockjob.h b/include/block/blockjob.h index 5dd1b08909..72e849a140 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -173,17 +173,6 @@ bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs); */ bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp); -/** - * block_job_change_locked: - * @job: The job to change. - * @opts: The new options. - * @errp: Error object. - * - * Change the job according to opts. - */ -void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, - Error **errp); - /** * block_job_query_locked: * @job: The job to get information about. diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h index d9c3b911d0..58bc7a5cea 100644 --- a/include/block/blockjob_int.h +++ b/include/block/blockjob_int.h @@ -68,13 +68,6 @@ struct BlockJobDriver { void (*set_speed)(BlockJob *job, int64_t speed); -/* - * Change the @job's options according to @opts. - * - * Note that this can already be called before the job coroutine is running. - */ -void (*change)(BlockJob *job, JobChangeOptions *opts, Error **errp); - /* * Query information specific to this kind of block job. 
*/ diff --git a/include/qemu/job.h b/include/qemu/job.h index 2b873f2576..6fa525dac3 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -27,6 +27,7 @@ #define JOB_H #include "qapi/qapi-types-job.h" +#include "qapi/qapi-types-block-core.h" #include "qemu/queue.h" #include "qemu/progress_meter.h" #include "qemu/coroutine.h" @@ -307,6 +308,12 @@ struct JobDriver { */ bool (*cancel)(Job *job, bool force); +/** + * Chang
[PATCH v2 5/7] qapi: add job-change
Add a new-style command job-change, doing the same thing as block-job-change. The aim is to finally deprecate the block-job-* APIs and move to the job-* APIs. We add the new command to qapi/block-core.json, not to qapi/job.json, to avoid resolving JSON file inclusion loops for now. This will all be a lot simpler to refactor when we finally drop the deprecated block-job-* APIs. The @type argument of the new command immediately becomes deprecated. Signed-off-by: Vladimir Sementsov-Ogievskiy --- job-qmp.c| 14 ++ qapi/block-core.json | 10 ++ 2 files changed, 24 insertions(+) diff --git a/job-qmp.c b/job-qmp.c index c764bd3801..248e68f554 100644 --- a/job-qmp.c +++ b/job-qmp.c @@ -139,6 +139,20 @@ void qmp_job_dismiss(const char *id, Error **errp) job_dismiss_locked(, errp); } +void qmp_job_change(JobChangeOptions *opts, Error **errp) +{ +Job *job; + +JOB_LOCK_GUARD(); +job = find_job_locked(opts->id, errp); + +if (!job) { +return; +} + +job_change_locked(job, opts, errp); +} + /* Called with job_mutex held. */ static JobInfo *job_query_single_locked(Job *job, Error **errp) { diff --git a/qapi/block-core.json b/qapi/block-core.json index 660c7f4a48..9087ce300c 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3104,6 +3104,16 @@ { 'command': 'block-job-change', 'data': 'JobChangeOptions', 'boxed': true } +## +# @job-change: +# +# Change the block job's options. +# +# Since: 9.1 +## +{ 'command': 'job-change', + 'data': 'JobChangeOptions', 'boxed': true } + ## # @BlockdevDiscardOptions: # -- 2.34.1
Re: [PATCH 1/2] block: allow commit to unmap zero blocks
On 26.05.24 22:29, Vincent Vanlaer wrote: Non-active block commits do not discard blocks only containing zeros, causing images to lose sparseness after the commit. This commit fixes that by writing zero blocks using blk_co_pwrite_zeroes rather than writing them out as any other arbitrary data. Hi Vincent! Sorry for the long delay. Could you please split the commit into simpler parts? Something like this: 1. refactor to use direct bdrv_co_common_block_status_above() 2. refactor to use CommitMethod (with two possible modes) 3. add new mode (The idea is to split parts of no-logic-change refactoring from real logic changes and keep the latter smaller and clearer) Signed-off-by: Vincent Vanlaer --- block/commit.c | 71 +- 1 file changed, 58 insertions(+), 13 deletions(-) diff --git a/block/commit.c b/block/commit.c index 7c3fdcb0ca..5bd97b5a74 100644 --- a/block/commit.c +++ b/block/commit.c @@ -12,9 +12,13 @@ * */ +#include "bits/time.h" #include "qemu/osdep.h" #include "qemu/cutils.h" +#include "time.h" #include "trace.h" +#include "block/block-common.h" +#include "block/coroutines.h" #include "block/block_int.h" #include "block/blockjob_int.h" #include "qapi/error.h" @@ -126,6 +130,12 @@ static void commit_clean(Job *job) blk_unref(s->top); } +typedef enum CommitMethod { +COMMIT_METHOD_COPY, +COMMIT_METHOD_ZERO, +COMMIT_METHOD_IGNORE, +} CommitMethod; + static int coroutine_fn commit_run(Job *job, Error **errp) { CommitBlockJob *s = container_of(job, CommitBlockJob, common.job); @@ -156,8 +166,9 @@ static int coroutine_fn commit_run(Job *job, Error **errp) buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE); for (offset = 0; offset < len; offset += n) { -bool copy; bool error_in_source = true; +CommitMethod commit_method = COMMIT_METHOD_COPY; + /* Note that even when no rate limit is applied we need to yield * with no pending I/O here so that bdrv_drain_all() returns. 
@@ -167,21 +178,54 @@ static int coroutine_fn commit_run(Job *job, Error **errp) break; } /* Copy if allocated above the base */ -ret = blk_co_is_allocated_above(s->top, s->base_overlay, true, -offset, COMMIT_BUFFER_SIZE, ); -copy = (ret > 0); +WITH_GRAPH_RDLOCK_GUARD() { +ret = bdrv_co_common_block_status_above(blk_bs(s->top), +s->base_overlay, true, true, offset, COMMIT_BUFFER_SIZE, +, NULL, NULL, NULL); +} + trace_commit_one_iteration(s, offset, n, ret); -if (copy) { -assert(n < SIZE_MAX); - -ret = blk_co_pread(s->top, offset, n, buf, 0); -if (ret >= 0) { -ret = blk_co_pwrite(s->base, offset, n, buf, 0); -if (ret < 0) { -error_in_source = false; + +if (ret >= 0 && !(ret & BDRV_BLOCK_ALLOCATED)) { +commit_method = COMMIT_METHOD_IGNORE; +} else if (ret >= 0 && ret & BDRV_BLOCK_ZERO) { +int64_t target_offset; +int64_t target_bytes; +WITH_GRAPH_RDLOCK_GUARD() { +bdrv_round_to_subclusters(s->base_bs, offset, n, + _offset, _bytes); +} + +if (target_offset == offset && +target_bytes == n) { +commit_method = COMMIT_METHOD_ZERO; +} +} + +if (ret >= 0) { +switch (commit_method) { +case COMMIT_METHOD_COPY: +assert(n < SIZE_MAX); +ret = blk_co_pread(s->top, offset, n, buf, 0); +if (ret >= 0) { +ret = blk_co_pwrite(s->base, offset, n, buf, 0); +if (ret < 0) { +error_in_source = false; +} } +break; +case COMMIT_METHOD_ZERO: +ret = blk_co_pwrite_zeroes(s->base, offset, n, +BDRV_REQ_MAY_UNMAP); +error_in_source = false; +break; +case COMMIT_METHOD_IGNORE: +break; +default: +abort(); } } + if (ret < 0) { BlockErrorAction action = block_job_error_action(>common, s->on_error, @@ -193,10 +237,11 @@ static int coroutine_fn commit_run(Job *job, Error **errp) continue; } } + /* Publish progress */ job_progress_update(>common.job, n); -if (copy) { +if (commit_method == COMMIT_METHOD_COPY) { block_job_ratelimit_processed_bytes(>common, n); } } -- Best regards, Vladimir
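The per-chunk decision the patch introduces can be sketched outside of QEMU. The flag values are copied from QEMU's block layer headers but should be treated as illustrative; `aligned` stands in for the `bdrv_round_to_subclusters()` check that the zero range covers whole subclusters.

```python
# Block-status flags, as in QEMU's include/block/block-common.h (illustrative)
BDRV_BLOCK_ZERO = 0x02
BDRV_BLOCK_ALLOCATED = 0x10

def commit_method(status, aligned):
    """Mirror the decision in commit_run(): skip ranges not allocated
    above the base, write zeroes for aligned zero ranges, copy the rest."""
    if not status & BDRV_BLOCK_ALLOCATED:
        return 'ignore'   # nothing above the base: leave the base as is
    if status & BDRV_BLOCK_ZERO and aligned:
        return 'zero'     # punch zeroes instead of copying zero data
    return 'copy'         # ordinary read-from-top / write-to-base path

assert commit_method(0, True) == 'ignore'
assert commit_method(BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO, True) == 'zero'
# A zero range that does not cover whole subclusters is still copied:
assert commit_method(BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO, False) == 'copy'
assert commit_method(BDRV_BLOCK_ALLOCATED, True) == 'copy'
```

The 'zero' path is what preserves sparseness: `blk_co_pwrite_zeroes(..., BDRV_REQ_MAY_UNMAP)` lets the base discard the clusters rather than store explicit zeros.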
Re: [PATCH 1/1] prealloc: add truncate mode for prealloc filter
On 30.04.24 20:05, Denis V. Lunev via wrote: Preallocate filter allows to implement really interesting setups. Assume that we have * shared block device, f.e. iSCSI LUN, implemented with some HW device * clustered LVM on top of it * QCOW2 image stored inside LVM volume This allows very cheap clustered setups with all QCOW2 features intact. Currently supported setups using QCOW2 with data_file option are not so cool as snapshots are not allowed, QCOW2 should be placed into some additional distributed storage and so on. Though QCOW2 inside LVM volume has a drawback. The image is growing and in order to accommodate that image the LVM volume is to be resized. This could be done externally using ENOSPACE event/condition but this is cumbersome. This patch introduces a native implementation for such a setup. We should just put the prealloc filter in between the QCOW2 format and file nodes. In that case LVM will be resized at the proper moment and that is done effectively as resizing is done in chunks. The patch adds an allocation mode for this purpose in order to distinguish 'fallocate' for an ordinary file system and 'truncate'. Signed-off-by: Denis V. Lunev CC: Alexander Ivanov CC: Kevin Wolf CC: Hanna Reitz CC: Vladimir Sementsov-Ogievskiy --- block/preallocate.c | 50 +++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/block/preallocate.c b/block/preallocate.c index 4d82125036..6d31627325 100644 --- a/block/preallocate.c +++ b/block/preallocate.c @@ -33,10 +33,24 @@ #include "block/block-io.h" #include "block/block_int.h" +typedef enum PreallocateMode { +PREALLOCATE_MODE_FALLOCATE = 0, +PREALLOCATE_MODE_TRUNCATE = 1, +PREALLOCATE_MODE__MAX = 2, +} PreallocateMode; + +static QEnumLookup prealloc_mode_lookup = { +.array = (const char *const[]) { +"falloc", +"truncate", +}, +.size = PREALLOCATE_MODE__MAX, +}; I think instead we should update BlockdevOptionsPreallocate options in qapi/block-core.json. Then the corresponding enum would be generated. 
Example of parsing options, reusing qapi mechanics is in block/copy-before-write.c - cbw_parse_options(). on-cbw-error field is enum. typedef struct PreallocateOpts { int64_t prealloc_size; int64_t prealloc_align; +PreallocateMode prealloc_mode; } PreallocateOpts; typedef struct BDRVPreallocateState { @@ -79,6 +93,7 @@ typedef struct BDRVPreallocateState { #define PREALLOCATE_OPT_PREALLOC_ALIGN "prealloc-align" #define PREALLOCATE_OPT_PREALLOC_SIZE "prealloc-size" +#define PREALLOCATE_OPT_MODE "mode" static QemuOptsList runtime_opts = { .name = "preallocate", .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head), @@ -94,7 +109,14 @@ static QemuOptsList runtime_opts = { .type = QEMU_OPT_SIZE, .help = "how much to preallocate, default 128M", }, -{ /* end of list */ } +{ +.name = PREALLOCATE_OPT_MODE, +.type = QEMU_OPT_STRING, +.help = "Preallocation mode on image expansion " +"(allowed values: falloc, truncate)", +.def_value_str = "falloc", +}, +{ /* end of list */ }, }, }; @@ -102,6 +124,8 @@ static bool preallocate_absorb_opts(PreallocateOpts *dest, QDict *options, BlockDriverState *child_bs, Error **errp) { QemuOpts *opts = qemu_opts_create(_opts, NULL, 0, _abort); +Error *local_err = NULL; +char *buf; if (!qemu_opts_absorb_qdict(opts, options, errp)) { return false; @@ -112,6 +136,17 @@ static bool preallocate_absorb_opts(PreallocateOpts *dest, QDict *options, dest->prealloc_size = qemu_opt_get_size(opts, PREALLOCATE_OPT_PREALLOC_SIZE, 128 * MiB); +buf = qemu_opt_get_del(opts, PREALLOCATE_OPT_MODE); +/* prealloc_mode can be downgraded later during allocate_clusters */ +dest->prealloc_mode = qapi_enum_parse(_mode_lookup, buf, + PREALLOCATE_MODE_FALLOCATE, + _err); +g_free(buf); +if (local_err != NULL) { +error_propagate(errp, local_err); +return false; +} + qemu_opts_del(opts); if (!QEMU_IS_ALIGNED(dest->prealloc_align, BDRV_SECTOR_SIZE)) { @@ -335,9 +370,20 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes, want_merge_zero = want_merge_zero && 
(prealloc_start <= offset); -ret = bdrv_co_pwrite_zeroes( +switch (s->opts.prealloc_mode) { +case PREALLOCATE_MODE_FALLOCATE: +ret = bdrv_co_pwrite_zeroes( bs->file, prealloc_start, prealloc_end - prealloc_start, BDRV_REQ_NO_FALLBACK | BDRV_REQ_S
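The option parsing under discussion (a string mode with a default, rejecting unknown values) can be sketched independently of the QEMU option machinery. Names follow the patch; the helper itself is hypothetical, and as the review notes, in QEMU this would preferably be a QAPI-generated enum rather than a hand-rolled lookup.

```python
# Allowed preallocation modes, mirroring prealloc_mode_lookup in the patch
PREALLOC_MODES = ('falloc', 'truncate')

def parse_prealloc_mode(value, default='falloc'):
    """Parse the 'mode' option: absent means the default, anything
    outside the known set is an error (like qapi_enum_parse)."""
    if value is None:
        return default
    if value not in PREALLOC_MODES:
        raise ValueError(
            f"invalid mode '{value}', expected one of {PREALLOC_MODES}")
    return value

assert parse_prealloc_mode(None) == 'falloc'      # default on a filesystem
assert parse_prealloc_mode('truncate') == 'truncate'  # e.g. QCOW2 on LVM
```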
Re: [PATCH] tests/avocado: add hotplug_blk test
ping2 On 09.04.24 09:58, Vladimir Sementsov-Ogievskiy wrote: Introduce a test that checks that plug/unplug of a virtio-blk device works. (The test was developed by copying hotplug_cpu.py, so the original copyright is kept.) Signed-off-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
[PATCH v5 1/3] qdev-monitor: add option to report GenericError from find_device_state
Here we just prepare for the following patch, making it possible to report GenericError as recommended. This patch doesn't aim to prevent further use of DeviceNotFound by future interfaces: - find_device_state() is used in the blk_by_qdev_id() and qmp_get_blk() functions, which may lead to the spread of DeviceNotFound anyway - also, nothing prevents simply copy-pasting find_device_state() calls with a false argument Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/qdev-monitor.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 6af6ef7d66..264978aa40 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -879,13 +879,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp) object_unref(OBJECT(dev)); } -static DeviceState *find_device_state(const char *id, Error **errp) +/* + * Note that creating new APIs using error classes other than GenericError is + * not recommended. Set use_generic_error=true for new interfaces. + */ +static DeviceState *find_device_state(const char *id, bool use_generic_error, + Error **errp) { Object *obj = object_resolve_path_at(qdev_get_peripheral(), id); DeviceState *dev; if (!obj) { -error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, +error_set(errp, + (use_generic_error ? + ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND), "Device '%s' not found", id); return NULL; } @@ -950,7 +957,7 @@ void qdev_unplug(DeviceState *dev, Error **errp) void qmp_device_del(const char *id, Error **errp) { -DeviceState *dev = find_device_state(id, errp); +DeviceState *dev = find_device_state(id, false, errp); if (dev != NULL) { if (dev->pending_deleted_event && (dev->pending_deleted_expires_ms == 0 || @@ -1070,7 +1077,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp) GLOBAL_STATE_CODE(); -dev = find_device_state(id, errp); +dev = find_device_state(id, false, errp); if (dev == NULL) { return NULL; } -- 2.34.1
[PATCH v5 3/3] qapi: introduce device-sync-config
Add a command to sync the config from the vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed, did not trigger an interrupt to the guest, or is just not available (not supported by the vhost-user server). The command result is racy if we allow it during migration, so let's allow the sync only in the RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 1 + hw/virtio/virtio-pci.c| 9 + include/hw/qdev-core.h| 3 +++ qapi/qdev.json| 24 system/qdev-monitor.c | 38 ++ 5 files changed, 75 insertions(+) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 091d2c6acf..2f301f380c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -588,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data) device_class_set_props(dc, vhost_user_blk_properties); dc->vmsd = &vmstate_vhost_user_blk; +dc->sync_config = vhost_user_blk_sync_config; set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); vdc->realize = vhost_user_blk_device_realize; vdc->unrealize = vhost_user_blk_device_unrealize; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b1d02f4b3d..0d91e8b5dc 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2351,6 +2351,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp) vpciklass->parent_dc_realize(qdev, errp); } +static int virtio_pci_sync_config(DeviceState *dev, Error **errp) +{ +VirtIOPCIProxy *proxy = VIRTIO_PCI(dev); +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); + +return qdev_sync_config(DEVICE(vdev), errp); +} + static void virtio_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -2367,6 +2375,7 @@ static void virtio_pci_class_init(ObjectClass *klass, void *data) device_class_set_parent_realize(dc, virtio_pci_dc_realize, &vpciklass->parent_dc_realize); rc->phases.hold = virtio_pci_bus_reset_hold; +dc->sync_config = virtio_pci_sync_config; } static const TypeInfo virtio_pci_info = { diff
--git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 5336728a23..f992061919 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev); typedef void (*DeviceReset)(DeviceState *dev); typedef void (*BusRealize)(BusState *bus, Error **errp); typedef void (*BusUnrealize)(BusState *bus); +typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp); /** * struct DeviceClass - The base class for all devices. @@ -162,6 +163,7 @@ struct DeviceClass { DeviceReset reset; DeviceRealize realize; DeviceUnrealize unrealize; +DeviceSyncConfig sync_config; /** * @vmsd: device state serialisation description for @@ -547,6 +549,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp); */ HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev); void qdev_unplug(DeviceState *dev, Error **errp); +int qdev_sync_config(DeviceState *dev, Error **errp); void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp); void qdev_machine_creation_done(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..72e434bc45 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,27 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize device configuration from host to guest part. First, +# copy the configuration from the host part (backend) to the guest +# part (frontend). Then notify guest software that device +# configuration changed. +# +# The command may be used to notify the guest about block device +# capacity change. Currently only vhost-user-blk device supports +# this. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 264978aa40..1c29312b53 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -971,6 +972,43 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceState *dev, Error **errp) +{ +DeviceClass *dc = DEVICE_GET_CLASS(dev); + +if (!dc->sync_config) { +error_setg(errp, "device-sync-config
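For reference, driving the new command from a management client is a single QMP request. This hedged Python sketch only builds and serializes the request (the device id "disk0" is a placeholder); it does not talk to a real QEMU process:

```python
import json

def device_sync_config_cmd(device_id):
    """Build the QMP request asking QEMU to re-read the backend
    config and notify the guest (per the schema above)."""
    return {"execute": "device-sync-config",
            "arguments": {"id": device_id}}

# Serialized form, as it would be sent over the QMP socket after
# the usual qmp_capabilities handshake:
wire = json.dumps(device_sync_config_cmd("disk0"))
```

Since the command is marked unstable, a real client would also need to probe for it (e.g. via query-qmp-schema) before use.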
[PATCH v5 0/3] vhost-user-blk: live resize additional APIs
v5: 03: drop the extra check on whether the runstate is running Vladimir Sementsov-Ogievskiy (3): qdev-monitor: add option to report GenericError from find_device_state vhost-user-blk: split vhost_user_blk_sync_config() qapi: introduce device-sync-config hw/block/vhost-user-blk.c | 27 ++-- hw/virtio/virtio-pci.c| 9 +++ include/hw/qdev-core.h| 3 +++ qapi/qdev.json| 24 ++ system/qdev-monitor.c | 53 --- 5 files changed, 105 insertions(+), 11 deletions(-) -- 2.34.1
[PATCH v5 2/3] vhost-user-blk: split vhost_user_blk_sync_config()
Split vhost_user_blk_sync_config() out from vhost_user_blk_handle_config_change(), to be reused in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9e6bbc6950..091d2c6acf 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t *config) s->blkcfg.wce = blkcfg->wce; } +static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp) +{ +int ret; +VirtIODevice *vdev = VIRTIO_DEVICE(dev); +VHostUserBlk *s = VHOST_USER_BLK(vdev); + +ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg, + vdev->config_len, errp); +if (ret < 0) { +return ret; +} + +memcpy(vdev->config, >blkcfg, vdev->config_len); +virtio_notify_config(vdev); + +return 0; +} + static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) { int ret; -VirtIODevice *vdev = dev->vdev; -VHostUserBlk *s = VHOST_USER_BLK(dev->vdev); Error *local_err = NULL; if (!dev->started) { return 0; } -ret = vhost_dev_get_config(dev, (uint8_t *)>blkcfg, - vdev->config_len, _err); +ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), _err); if (ret < 0) { error_report_err(local_err); return ret; } -memcpy(dev->vdev->config, >blkcfg, vdev->config_len); -virtio_notify_config(dev->vdev); - return 0; } -- 2.34.1
Re: [PATCH v2 1/2] copy-before-write: allow specifying minimum cluster size
On 28.05.24 15:01, Fiona Ebner wrote: +if (min_cluster_size > INT64_MAX) { +error_setg(errp, "min-cluster-size too large: %lu > %ld", + min_cluster_size, INT64_MAX); Better to use the PRIu64 and PRIi64 macros here instead of %lu and %ld. -- Best regards, Vladimir
Re: [PATCH v2 2/2] backup: add minimum cluster size to performance options
On 28.05.24 15:01, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. Suggested-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Fiona Ebner --- Changes in v2: * Use 'size' type in QAPI. block/backup.c| 2 +- block/copy-before-write.c | 8 block/copy-before-write.h | 1 + blockdev.c| 3 +++ qapi/block-core.json | 9 +++-- 5 files changed, 20 insertions(+), 3 deletions(-) diff --git a/block/backup.c b/block/backup.c index 3dd2e229d2..a1292c01ec 100644 --- a/block/backup.c +++ b/block/backup.c @@ -458,7 +458,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, } cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source, - , errp); + perf->min_cluster_size, , errp); if (!cbw) { goto error; } diff --git a/block/copy-before-write.c b/block/copy-before-write.c index ef0bc4dcfe..183eed42e5 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -553,6 +553,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source, BlockDriverState *target, const char *filter_node_name, bool discard_source, + uint64_t min_cluster_size, BlockCopyState **bcs, Error **errp) { @@ -572,6 +573,13 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source, qdict_put_str(opts, "file", bdrv_get_node_name(source)); qdict_put_str(opts, "target", bdrv_get_node_name(target)); +if (min_cluster_size > INT64_MAX) { +error_setg(errp, "min-cluster-size too large: %lu > %ld", + min_cluster_size, INT64_MAX); opts leaked here. 
with that fixed: Reviewed-by: Vladimir Sementsov-Ogievskiy +return NULL; +} +qdict_put_int(opts, "min-cluster-size", (int64_t)min_cluster_size); + top = bdrv_insert_node(source, opts, flags, errp); if (!top) { return NULL; diff --git a/block/copy-before-write.h b/block/copy-before-write.h index 01af0cd3c4..2a5d4ba693 100644 --- a/block/copy-before-write.h +++ b/block/copy-before-write.h @@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source, BlockDriverState *target, const char *filter_node_name, bool discard_source, + uint64_t min_cluster_size, BlockCopyState **bcs, Error **errp); void bdrv_cbw_drop(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 835064ed03..6740663fda 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2655,6 +2655,9 @@ static BlockJob *do_backup_common(BackupCommon *backup, if (backup->x_perf->has_max_chunk) { perf.max_chunk = backup->x_perf->max_chunk; } +if (backup->x_perf->has_min_cluster_size) { +perf.min_cluster_size = backup->x_perf->min_cluster_size; +} } if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) || diff --git a/qapi/block-core.json b/qapi/block-core.json index 8fc0a4b234..f1219a9dfb 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1551,11 +1551,16 @@ # it should not be less than job cluster size which is calculated # as maximum of target image cluster size and 64k. Default 0. # +# @min-cluster-size: Minimum size of blocks used by copy-before-write +# and background copy operations. Has to be a power of 2. No +# effect if smaller than the maximum of the target's cluster size +# and 64 KiB. Default 0. (Since 9.1) +# # Since: 6.0 ## { 'struct': 'BackupPerf', - 'data': { '*use-copy-range': 'bool', -'*max-workers': 'int', '*max-chunk': 'int64' } } + 'data': { '*use-copy-range': 'bool', '*max-workers': 'int', +'*max-chunk': 'int64', '*min-cluster-size': 'size' } } ## # @BackupCommon: -- Best regards, Vladimir
Re: [PATCH v2 1/2] copy-before-write: allow specifying minimum cluster size
On 28.05.24 15:01, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. The type 'size' (corresponding to uint64_t in C) is used in QAPI to rule out negative inputs and for consistency with already existing @cluster-size parameters. Since block_copy_calculate_cluster_size() uses int64_t for its result, a check that the input is not too large is added in block_copy_state_new() before calling it. The calculation in block_copy_calculate_cluster_size() is done in the target int64_t type. Suggested-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Fiona Ebner --- Changes in v2: * Use 'size' type in QAPI. * Remove option in cbw_parse_options(), i.e. before parsing generic blockdev options. block/block-copy.c | 22 ++ block/copy-before-write.c | 10 +- include/block/block-copy.h | 1 + qapi/block-core.json | 8 +++- 4 files changed, 35 insertions(+), 6 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 7e3b378528..36eaecaaf4 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range, } static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, + int64_t min_cluster_size, Error **errp) { int ret; @@ -335,7 +336,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, "used. 
If the actual block size of the target exceeds " "this default, the backup may be unusable", BLOCK_COPY_CLUSTER_SIZE_DEFAULT); -return BLOCK_COPY_CLUSTER_SIZE_DEFAULT; +return MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT); instead of triple use MAX(min_..., let's just do min_cluster_size = MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT); at start of function, and then use min_cluster_size anyway: Reviewed-by: Vladimir Sementsov-Ogievskiy } else if (ret < 0 && !target_does_cow) { error_setg_errno(errp, -ret, "Couldn't determine the cluster size of the target image, " @@ -345,16 +346,18 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, return ret; } else if (ret < 0 && target_does_cow) { /* Not fatal; just trudge on ahead. */ -return BLOCK_COPY_CLUSTER_SIZE_DEFAULT; +return MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT); } -return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size); +return MAX(min_cluster_size, + (int64_t)MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size)); } BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, bool discard_source, + uint64_t min_cluster_size, Error **errp) { ERRP_GUARD(); @@ -365,7 +368,18 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, GLOBAL_STATE_CODE(); -cluster_size = block_copy_calculate_cluster_size(target->bs, errp); +if (min_cluster_size > INT64_MAX) { +error_setg(errp, "min-cluster-size too large: %lu > %ld", + min_cluster_size, INT64_MAX); +return NULL; +} else if (min_cluster_size && !is_power_of_2(min_cluster_size)) { +error_setg(errp, "min-cluster-size needs to be a power of 2"); +return NULL; +} + +cluster_size = block_copy_calculate_cluster_size(target->bs, + (int64_t)min_cluster_size, + errp); if (cluster_size < 0) { return NULL; } diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 
cd65524e26..ef0bc4dcfe 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -417,6 +417,7 @@ static BlockdevOptions *cbw_parse_options(QDict *options, Error **errp) qdict_extract_subqdict(options, NULL, "bitmap"); qdict_del(options, "
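The simplification suggested in the review (raise min_cluster_size to the 64 KiB default once, at the start of block_copy_calculate_cluster_size(), then use it everywhere) leaves only simple arithmetic. A Python restatement of that logic, as an illustration rather than the actual QEMU code:

```python
BLOCK_COPY_CLUSTER_SIZE_DEFAULT = 64 * 1024

def calculate_cluster_size(min_cluster_size, target_cluster_size=None):
    """Sketch of block_copy_calculate_cluster_size() after the
    suggested refactor: validate min-cluster-size, raise it to the
    default once, then take the max with the target's cluster size
    (None stands for bdrv_get_info() being unsupported, e.g. NBD)."""
    if min_cluster_size and (min_cluster_size & (min_cluster_size - 1)):
        raise ValueError("min-cluster-size needs to be a power of 2")
    min_cluster_size = max(min_cluster_size,
                           BLOCK_COPY_CLUSTER_SIZE_DEFAULT)
    if target_cluster_size is None:
        return min_cluster_size
    return max(min_cluster_size, target_cluster_size)
```

All three return paths in the C function then collapse to returning min_cluster_size or its max with bdi.cluster_size.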
Re: [PATCH v4 5/5] blockdev: mirror: check for target's cluster size when using bitmap
On 21.05.24 15:20, Fiona Ebner wrote: When using mirror with a bitmap and the target does not do COW and is is a diff image, i.e. one that should only contain the delta and was not synced to previously, a too large cluster size for the target can be problematic. In particular, when the mirror sends data to the target aligned to the jobs granularity, but not aligned to the larger target image's cluster size, the target's cluster would be allocated but only be filled partially. When rebasing such a diff image later, the corresponding cluster of the base image would get "masked" and the part of the cluster not in the diff image is not accessible anymore. Unfortunately, it is not always possible to check for the target image's cluster size, e.g. when it's NBD. Because the limitation is already documented in the QAPI description for the @bitmap parameter and it's only required for special diff image use-case, simply skip the check then. Signed-off-by: Fiona Ebner --- blockdev.c | 57 ++ tests/qemu-iotests/tests/mirror-bitmap | 6 +++ tests/qemu-iotests/tests/mirror-bitmap.out | 7 +++ 3 files changed, 70 insertions(+) diff --git a/blockdev.c b/blockdev.c index 4f72a72dc7..468974108e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2769,6 +2769,59 @@ void qmp_blockdev_backup(BlockdevBackup *backup, Error **errp) blockdev_do_action(, errp); } +static int blockdev_mirror_check_bitmap_granularity(BlockDriverState *target, +BdrvDirtyBitmap *bitmap, +Error **errp) +{ +int ret; +BlockDriverInfo bdi; +uint32_t bitmap_granularity; + +GLOBAL_STATE_CODE(); +GRAPH_RDLOCK_GUARD_MAINLOOP(); + +if (bdrv_backing_chain_next(target)) { +/* + * No need to worry about creating clusters with partial data when the + * target does COW. + */ +return 0; +} + +/* + * If there is no backing file on the target, we cannot rely on COW if our + * backup cluster size is smaller than the target cluster size. Even for + * targets with a backing file, try to avoid COW if possible. 
"Even for targets with" - I don't follow. We do "return 0" already above for such targets? + */ +ret = bdrv_get_info(target, &bdi); +if (ret == -ENOTSUP) { +/* + * Ignore if unable to get the info, e.g. when target is NBD. It's only + * relevant for syncing to a diff image and the documentation already + * states that the target's cluster size needs to be small enough then. + */ +return 0; +} else if (ret < 0) { +error_setg_errno(errp, -ret, +"Couldn't determine the cluster size of the target image, " +"which has no backing file"); +return ret; +} + +bitmap_granularity = bdrv_dirty_bitmap_granularity(bitmap); +if (bitmap_granularity < bdi.cluster_size || +bitmap_granularity % bdi.cluster_size != 0) { +error_setg(errp, "Bitmap granularity %u is not a multiple of the " + "target image's cluster size %u and the target image has " + "no backing file", + bitmap_granularity, bdi.cluster_size); +return -EINVAL; +} + +return 0; +} + + /* Parameter check and block job starting for drive mirroring. * Caller should hold @device and @target's aio context (must be the same). 
**/ @@ -2863,6 +2916,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs, return; } +if (blockdev_mirror_check_bitmap_granularity(target, bitmap, errp)) { +return; +} + if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) { return; } diff --git a/tests/qemu-iotests/tests/mirror-bitmap b/tests/qemu-iotests/tests/mirror-bitmap index 37bbe0f241..e8cd482a19 100755 --- a/tests/qemu-iotests/tests/mirror-bitmap +++ b/tests/qemu-iotests/tests/mirror-bitmap @@ -584,6 +584,12 @@ def test_mirror_api(): bitmap=bitmap) log('') +log("-- Test bitmap with too small granularity to non-COW target --\n") +vm.qmp_log("block-dirty-bitmap-add", node=drive0.node, + name="bitmap-small", granularity=GRANULARITY) +blockdev_mirror(drive0.vm, drive0.node, "mirror_target", "full", +job_id='api_job', bitmap="bitmap-small") +log('') def main(): for bsync_mode in ("never", "on-success", "always"): diff --git a/tests/qemu-iotests/tests/mirror-bitmap.out b/tests/qemu-iotests/tests/mirror-bitmap.out index 5c8acc1d69..af605f3803 100644 --- a/tests/qemu-iotests/tests/mirror-bitmap.out +++ b/tests/qemu-iotests/tests/mirror-bitmap.out @@ -3189,3 +3189,10 @@ qemu_img compare "TEST_DIR/PID-img"
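The granularity check in the patch boils down to one arithmetic rule: a bitmap granularity is safe for a non-COW target exactly when it is at least the target's cluster size and a multiple of it, so every granularity-aligned write covers whole target clusters. As a small Python sketch of that predicate (illustrative, not QEMU code):

```python
def granularity_compatible(bitmap_granularity, target_cluster_size):
    """True iff every bitmap-granularity-aligned write fills whole
    clusters of a non-COW target image, leaving no partially
    allocated clusters behind."""
    return (bitmap_granularity >= target_cluster_size and
            bitmap_granularity % target_cluster_size == 0)
```

Both conditions are needed: a granularity smaller than the cluster size allocates partial clusters, and a larger but non-multiple granularity still misaligns at cluster boundaries.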
Re: [PATCH v4 4/5] iotests: add test for bitmap mirror
On 21.05.24 15:20, Fiona Ebner wrote: From: Fabian Grünbichler heavily based on/practically forked off iotest 257 for bitmap backups, but: really, heavily. Making a duplicate is always a bad idea. Could we instead just add test cases to 257? - no writes to filter node 'mirror-top' between completion and finalization, as those seem to deadlock? Could you give a bit more detail? If guest writes may lead to a deadlock, that's a bug, isn't it? -- Best regards, Vladimir
Re: [PATCH v4 3/5] mirror: allow specifying working bitmap
On 21.05.24 15:20, Fiona Ebner wrote: From: John Snow for the mirror job. The bitmap's granularity is used as the job's granularity. The new @bitmap parameter is marked unstable in the QAPI and can currently only be used for @sync=full mode. Clusters initially dirty in the bitmap as well as new writes are copied to the target. Using block-dirty-bitmap-clear and block-dirty-bitmap-merge API, callers can simulate the three kinds of @BitmapSyncMode (which is used by backup): 1. always: default, just pass bitmap as working bitmap. 2. never: copy bitmap and pass copy to the mirror job. 3. on-success: copy bitmap and pass copy to the mirror job and if successful, merge bitmap into original afterwards. When the target image is a non-COW "diff image", i.e. one that was not used as the target of a previous mirror and the target image's cluster size is larger than the bitmap's granularity, or when @copy-mode=write-blocking is used, there is a pitfall, because the cluster in the target image will be allocated, but not contain all the data corresponding to the same region in the source image. An idea to avoid the limitation would be to mark clusters which are affected by unaligned writes and are not allocated in the target image dirty, so they would be copied fully later. However, for migration, the invariant that an actively synced mirror stays actively synced (unless an error happens) is useful, because without that invariant, migration might inactivate block devices when mirror still got work to do and run into an assertion failure [0]. Another approach would be to read the missing data from the source upon unaligned writes to be able to write the full target cluster instead. But certain targets like NBD do not allow querying the cluster size. To avoid limiting/breaking the use case of syncing to an existing target, which is arguably more common than the diff image use case, document the limitation in QAPI. 
This patch was originally based on one by Ma Haocong, but it has since been modified pretty heavily, first by John and then again by Fiona. [0]: https://lore.kernel.org/qemu-devel/1db7f571-cb7f-c293-04cc-cd856e060...@proxmox.com/ Suggested-by: Ma Haocong Signed-off-by: Ma Haocong Signed-off-by: John Snow [FG: switch to bdrv_dirty_bitmap_merge_internal] Signed-off-by: Fabian Grünbichler Signed-off-by: Thomas Lamprecht [FE: rebase for 9.1 get rid of bitmap mode parameter use caller-provided bitmap as working bitmap turn bitmap parameter experimental] Signed-off-by: Fiona Ebner Acked-by: Markus Armbruster --- block/mirror.c | 80 +- blockdev.c | 44 +++--- include/block/block_int-global-state.h | 5 +- qapi/block-core.json | 35 ++- tests/unit/test-block-iothread.c | 2 +- 5 files changed, 141 insertions(+), 25 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index ca23d6ef65..d3d0698116 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -73,6 +73,11 @@ typedef struct MirrorBlockJob { size_t buf_size; int64_t bdev_length; unsigned long *cow_bitmap; +/* + * Whether the bitmap is created locally or provided by the caller (for + * incremental sync). + */ +bool dirty_bitmap_is_local; BdrvDirtyBitmap *dirty_bitmap; BdrvDirtyBitmapIter *dbi; uint8_t *buf; @@ -691,7 +696,11 @@ static int mirror_exit_common(Job *job) bdrv_unfreeze_backing_chain(mirror_top_bs, target_bs); } -bdrv_release_dirty_bitmap(s->dirty_bitmap); +if (s->dirty_bitmap_is_local) { +bdrv_release_dirty_bitmap(s->dirty_bitmap); +} else { +bdrv_enable_dirty_bitmap(s->dirty_bitmap); +} /* Make sure that the source BDS doesn't go away during bdrv_replace_node, * before we can call bdrv_drained_end */ @@ -820,6 +829,16 @@ static void mirror_abort(Job *job) assert(ret == 0); } +/* Always called after commit/abort. 
*/ +static void mirror_clean(Job *job) +{ +MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job); + +if (!s->dirty_bitmap_is_local && s->dirty_bitmap) { +bdrv_dirty_bitmap_set_busy(s->dirty_bitmap, false); +} why not do that in existing mirror_exit_common, where we already do release/enable? +} + -- Best regards, Vladimir
Re: [PATCH v4 3/5] mirror: allow specifying working bitmap
On 21.05.24 15:20, Fiona Ebner wrote: From: John Snow for the mirror job. The bitmap's granularity is used as the job's granularity. The new @bitmap parameter is marked unstable in the QAPI and can currently only be used for @sync=full mode. Clusters initially dirty in the bitmap as well as new writes are copied to the target. Using block-dirty-bitmap-clear and block-dirty-bitmap-merge API, callers can simulate the three kinds of @BitmapSyncMode (which is used [..] /* * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active - * mode. + * mode. For external/caller-provided bitmap, need to wait until + * bdrv_mirror_top_do_write() can actually access it before disabling. hmm, isn't that true at least for 'none' mode? For other modes we rely on mirror_dirty_init(), but with 'none' mode we risk missing some writes. That's preexisting, but it's probably better to move disabling the bitmap to the moment where we are sure that we dirty it by hand on any not-immediately-synced write, and then have the _disable_() call in the same place for all scenarios. */ -bdrv_disable_dirty_bitmap(s->dirty_bitmap); +if (s->dirty_bitmap_is_local) { +bdrv_disable_dirty_bitmap(s->dirty_bitmap); +} -- Best regards, Vladimir
Re: [PATCH v4 2/5] block/mirror: replace is_none_mode with sync_mode in MirrorBlockJob struct
On 21.05.24 15:20, Fiona Ebner wrote: It is more flexible and is done in preparation to support specifying a working bitmap for mirror jobs. In particular, this makes it possible to assert that @sync_mode=full when a bitmap is used. That assertion is just to be sure, of course the mirror QMP commands will be made to fail earlier with a clean error. Signed-off-by: Fiona Ebner Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
[PATCH 1/2] iotests/backup-discard-source: convert size variable to be int
Make the size variable reusable in code for checks. Don't bother changing the "512 * 1024" invocations, as they will be dropped in the next commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/tests/backup-discard-source | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tests/qemu-iotests/tests/backup-discard-source b/tests/qemu-iotests/tests/backup-discard-source index 2391b12acd..05fbe5d26b 100755 --- a/tests/qemu-iotests/tests/backup-discard-source +++ b/tests/qemu-iotests/tests/backup-discard-source @@ -28,7 +28,7 @@ from iotests import qemu_img_create, qemu_img_map, qemu_io temp_img = os.path.join(iotests.test_dir, 'temp') source_img = os.path.join(iotests.test_dir, 'source') target_img = os.path.join(iotests.test_dir, 'target') -size = '1M' +size = 1024 * 1024 def get_actual_size(vm, node_name): @@ -39,9 +39,9 @@ def get_actual_size(vm, node_name): class TestBackup(iotests.QMPTestCase): def setUp(self): -qemu_img_create('-f', iotests.imgfmt, source_img, size) -qemu_img_create('-f', iotests.imgfmt, temp_img, size) -qemu_img_create('-f', iotests.imgfmt, target_img, size) +qemu_img_create('-f', iotests.imgfmt, source_img, str(size)) +qemu_img_create('-f', iotests.imgfmt, temp_img, str(size)) +qemu_img_create('-f', iotests.imgfmt, target_img, str(size)) qemu_io('-c', 'write 0 1M', source_img) self.vm = iotests.VM() @@ -98,7 +98,7 @@ class TestBackup(iotests.QMPTestCase): mapping = qemu_img_map(temp_img) self.assertEqual(len(mapping), 1) self.assertEqual(mapping[0]['start'], 0) -self.assertEqual(mapping[0]['length'], 1024 * 1024) +self.assertEqual(mapping[0]['length'], size) self.assertEqual(mapping[0]['data'], False) os.remove(temp_img) @@ -125,7 +125,7 @@ class TestBackup(iotests.QMPTestCase): self.assert_qmp(result, 'return', '') # Check that data is written to temporary image -self.assertGreater(get_actual_size(self.vm, 'temp'), 1024 * 1024) +self.assertGreater(get_actual_size(self.vm, 'temp'), size) self.do_backup() -- 2.34.1
[PATCH 0/2] fix backup-discard-source test for XFS
Hi all! As Kevin reported, the test doesn't work on XFS, as it relies on disk usage. Fix it by switching to a dirty bitmap for guest write tracking. Vladimir Sementsov-Ogievskiy (2): iotests/backup-discard-source: convert size variable to be int iotests/backup-discard-source: don't use actual-size .../qemu-iotests/tests/backup-discard-source | 39 --- 1 file changed, 25 insertions(+), 14 deletions(-) -- 2.34.1
[PATCH 2/2] iotests/backup-discard-source: don't use actual-size
Relying on disk usage is a bad thing, and the test just doesn't work on XFS. Let's instead add a dirty bitmap to track writes to the test image. Signed-off-by: Vladimir Sementsov-Ogievskiy --- .../qemu-iotests/tests/backup-discard-source | 29 +-- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/tests/qemu-iotests/tests/backup-discard-source b/tests/qemu-iotests/tests/backup-discard-source index 05fbe5d26b..17fef9c6d3 100755 --- a/tests/qemu-iotests/tests/backup-discard-source +++ b/tests/qemu-iotests/tests/backup-discard-source @@ -31,12 +31,6 @@ target_img = os.path.join(iotests.test_dir, 'target') size = 1024 * 1024 -def get_actual_size(vm, node_name): -nodes = vm.cmd('query-named-block-nodes', flat=True) -node = next(n for n in nodes if n['node-name'] == node_name) -return node['image']['actual-size'] - - class TestBackup(iotests.QMPTestCase): def setUp(self): qemu_img_create('-f', iotests.imgfmt, source_img, str(size)) @@ -84,7 +78,12 @@ class TestBackup(iotests.QMPTestCase): } }) -self.assertLess(get_actual_size(self.vm, 'temp'), 512 * 1024) +self.bitmap = { +'node': 'temp', +'name': 'bitmap0' +} + +self.vm.cmd('block-dirty-bitmap-add', self.bitmap) def tearDown(self): # That should fail, because region is discarded @@ -113,6 +112,13 @@ class TestBackup(iotests.QMPTestCase): self.vm.event_wait(name='BLOCK_JOB_COMPLETED') +def get_bitmap_count(self): +nodes = self.vm.cmd('query-named-block-nodes', flat=True) +temp = next(n for n in nodes if n['node-name'] == 'temp') +bitmap = temp['dirty-bitmaps'][0] +assert bitmap['name'] == self.bitmap['name'] +return bitmap['count'] + def test_discard_written(self): """ 1. 
Guest writes @@ -125,7 +131,7 @@ class TestBackup(iotests.QMPTestCase): self.assert_qmp(result, 'return', '') # Check that data is written to temporary image -self.assertGreater(get_actual_size(self.vm, 'temp'), size) +self.assertEqual(self.get_bitmap_count(), size) self.do_backup() @@ -138,13 +144,18 @@ class TestBackup(iotests.QMPTestCase): """ self.do_backup() +# backup job did discard operation and pollute the bitmap, +# we have to clean the bitmap, to check next write +self.assertEqual(self.get_bitmap_count(), size) +self.vm.cmd('block-dirty-bitmap-clear', self.bitmap) + # Try trigger copy-before-write operation result = self.vm.hmp_qemu_io('cbw', 'write 0 1M') self.assert_qmp(result, 'return', '') # Check that data is not written to temporary image, as region # is discarded from copy-before-write process -self.assertLess(get_actual_size(self.vm, 'temp'), 512 * 1024) +self.assertEqual(self.get_bitmap_count(), 0) if __name__ == '__main__': -- 2.34.1
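The new get_bitmap_count() helper above is just a lookup into the `query-named-block-nodes` reply. In isolation, the extraction logic can be sketched like this (the reply data below is a hand-written stand-in shaped like the fields the test relies on, not real QMP output):

```python
def bitmap_count(nodes, node_name, bitmap_name):
    """Return the 'count' (dirty bytes) of the named dirty bitmap on the
    named node; 'nodes' is a flattened query-named-block-nodes result."""
    node = next(n for n in nodes if n['node-name'] == node_name)
    bitmap = next(b for b in node['dirty-bitmaps']
                  if b['name'] == bitmap_name)
    return bitmap['count']


# Hypothetical reply: 1 MiB written through the 'temp' node
sample_nodes = [
    {'node-name': 'source', 'dirty-bitmaps': []},
    {'node-name': 'temp',
     'dirty-bitmaps': [{'name': 'bitmap0', 'count': 1024 * 1024}]},
]

print(bitmap_count(sample_nodes, 'temp', 'bitmap0'))  # 1048576
```

Unlike `actual-size`, this count depends only on what the guest (or the copy-before-write filter) wrote, not on the host filesystem's allocation policy.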
Re: [PATCH v4 5/5] iotests: add backup-discard-source
On 13.06.24 11:02, Kevin Wolf wrote: Am 12.06.2024 um 21:21 hat Vladimir Sementsov-Ogievskiy geschrieben: On 11.06.24 20:49, Kevin Wolf wrote: Am 13.03.2024 um 16:28 hat Vladimir Sementsov-Ogievskiy geschrieben: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner This test fails for me, and it already does so after this commit that introduced it. I haven't checked what get_actual_size() does, but I'm running on XFS, so its preallocation could be causing this. We generally avoid checking the number of allocated blocks in image files for this reason. Hmm, right, I see that relying on allocated size is a bad thing. Better is to check the block status, to see how many qcow2 clusters are allocated. Do we have some QMP command to get such information? The simplest way I see is to add a dirty bitmap to the temp block-node, and then check its dirty count through query-named-block-nodes. Hm, does it have to be QMP in a running QEMU process? Hmm, yes, it seems in test_discard_written() we do want to examine the running process. I'll try to go with the bitmap. I'm not sure what the best way would be there. Otherwise, my approach would be 'qemu-img check' or even 'qemu-img map', depending on what you want to verify with it. Kevin -- Best regards, Vladimir
Re: [PATCH v4 5/5] iotests: add backup-discard-source
On 11.06.24 20:49, Kevin Wolf wrote: Am 13.03.2024 um 16:28 hat Vladimir Sementsov-Ogievskiy geschrieben: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner This test fails for me, and it already does so after this commit that introduced it. I haven't checked what get_actual_size() does, but I'm running on XFS, so its preallocation could be causing this. We generally avoid checking the number of allocated blocks in image files for this reason. Hmm, right, I see that relying on allocated size is a bad thing. Better is to check the block status, to see how many qcow2 clusters are allocated. Do we have some QMP command to get such information? The simplest way I see is to add a dirty bitmap to the temp block-node, and then check its dirty count through query-named-block-nodes -- Best regards, Vladimir
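For reference, Kevin's offline alternative, `qemu-img map --output=json`, reports extents with `data`/`zero` flags; counting allocated bytes from such a report is a small exercise. A sketch of that accounting (the extent list below is fabricated for illustration; the field names follow the `qemu-img map --output=json` format):

```python
import json

def allocated_bytes(map_json):
    """Sum the lengths of extents that carry data in a
    'qemu-img map --output=json' style report."""
    return sum(e['length'] for e in json.loads(map_json)
               if e.get('data'))

# Fabricated report: 64 KiB allocated, then a 192 KiB zero hole
sample = json.dumps([
    {'start': 0, 'length': 65536, 'data': True, 'zero': False},
    {'start': 65536, 'length': 196608, 'data': False, 'zero': True},
])

print(allocated_bytes(sample))  # 65536
```

This works on an offline image, which is exactly why it doesn't fit test_discard_written(): that test needs to inspect the temporary node while QEMU is still running, hence the dirty-bitmap approach.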
[PULL v2 0/7] Block jobs patches for 2024-04-29
The following changes since commit ad10b4badc1dd5b28305f9b9f1168cf0aa3ae946: Merge tag 'pull-error-2024-05-27' of https://repo.or.cz/qemu/armbru into staging (2024-05-27 06:40:42 -0700) are available in the Git repository at: https://gitlab.com/vsementsov/qemu.git tags/pull-block-jobs-2024-04-29-v2 for you to fetch changes up to a149401048481247bcbaf6035a7a1308974fb464: iotests/pylintrc: allow up to 10 similar lines (2024-05-28 15:52:15 +0300) Block jobs patches for 2024-04-29 v2: add "iotests/pylintrc: allow up to 10 similar lines" to fix check-python-minreqs - backup: discard-source parameter - blockcommit: Reopen base image as RO after abort Alexander Ivanov (1): blockcommit: Reopen base image as RO after abort Vladimir Sementsov-Ogievskiy (6): block/copy-before-write: fix permission block/copy-before-write: support unligned snapshot-discard block/copy-before-write: create block_copy bitmap in filter node qapi: blockdev-backup: add discard-source parameter iotests: add backup-discard-source iotests/pylintrc: allow up to 10 similar lines block/backup.c | 5 +- block/block-copy.c | 12 ++- block/copy-before-write.c | 39 +++-- block/copy-before-write.h | 1 + block/mirror.c | 11 ++- block/replication.c| 4 +- blockdev.c | 2 +- include/block/block-common.h | 2 + include/block/block-copy.h | 2 + include/block/block_int-global-state.h | 2 +- qapi/block-core.json | 4 + tests/qemu-iotests/257.out | 112 +- tests/qemu-iotests/pylintrc| 2 +- tests/qemu-iotests/tests/backup-discard-source | 152 tests/qemu-iotests/tests/backup-discard-source.out | 5 ++ 15 files changed, 282 insertions(+), 73 deletions(-) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out -- 2.34.1
[PULL v2 7/7] iotests/pylintrc: allow up to 10 similar lines
We want to have similar QMP objects in different tests. Reworking these objects to make common parts by calling some helper functions doesn't seem good. It's a lot more comfortable to see the whole QAPI request in one place. So, let's increase the limit, to unblock further commit "iotests: add backup-discard-source" Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/pylintrc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc index de2e0c2781..05b75ee59b 100644 --- a/tests/qemu-iotests/pylintrc +++ b/tests/qemu-iotests/pylintrc @@ -55,4 +55,4 @@ max-line-length=79 [SIMILARITIES] -min-similarity-lines=6 +min-similarity-lines=10 -- 2.34.1
Re: [PATCH] iotests/pylintrc: allow up to 10 similar lines
On 30.04.24 12:13, Vladimir Sementsov-Ogievskiy wrote: We want to have similar QMP objects in different tests. Reworking these objects to make common parts by calling some helper functions doesn't seem good. It's a lot more comfortable to see the whole QAPI request in one place. So, let's increase the limit, to unblock further commit "iotests: add backup-discard-source" Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi all! That's a patch to unblock my PR "[PULL 0/6] Block jobs patches for 2024-04-29" <20240429115157.2260885-1-vsement...@yandex-team.ru> https://patchew.org/QEMU/20240429115157.2260885-1-vsement...@yandex-team.ru/ tests/qemu-iotests/pylintrc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc index de2e0c2781..05b75ee59b 100644 --- a/tests/qemu-iotests/pylintrc +++ b/tests/qemu-iotests/pylintrc @@ -55,4 +55,4 @@ max-line-length=79 [SIMILARITIES] -min-similarity-lines=6 +min-similarity-lines=10 Hi! I hope it's OK, if I just apply this to my block branch, and resend "[PULL 0/6] Block jobs patches for 2024-04-29" which is blocked on this problem. Thanks for reviewing, applied to my block branch -- Best regards, Vladimir
Re: [PATCH v4 3/3] qapi: introduce device-sync-config
On 30.04.24 11:31, Vladimir Sementsov-Ogievskiy wrote: On 30.04.24 11:19, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 1 + hw/virtio/virtio-pci.c | 9 include/hw/qdev-core.h | 3 +++ qapi/qdev.json | 23 +++ system/qdev-monitor.c | 48 +++ 5 files changed, 84 insertions(+) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 091d2c6acf..2f301f380c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -588,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data) device_class_set_props(dc, vhost_user_blk_properties); dc->vmsd = _vhost_user_blk; + dc->sync_config = vhost_user_blk_sync_config; set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); vdc->realize = vhost_user_blk_device_realize; vdc->unrealize = vhost_user_blk_device_unrealize; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b1d02f4b3d..0d91e8b5dc 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2351,6 +2351,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp) vpciklass->parent_dc_realize(qdev, errp); } +static int virtio_pci_sync_config(DeviceState *dev, Error **errp) +{ + VirtIOPCIProxy *proxy = VIRTIO_PCI(dev); + VirtIODevice *vdev = virtio_bus_get_device(>bus); + + return qdev_sync_config(DEVICE(vdev), errp); +} + static void virtio_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -2367,6 +2375,7 @@ static void virtio_pci_class_init(ObjectClass *klass, void *data) device_class_set_parent_realize(dc, virtio_pci_dc_realize, 
>parent_dc_realize); rc->phases.hold = virtio_pci_bus_reset_hold; + dc->sync_config = virtio_pci_sync_config; } static const TypeInfo virtio_pci_info = { diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 9228e96c87..87135bdcdf 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev); typedef void (*DeviceReset)(DeviceState *dev); typedef void (*BusRealize)(BusState *bus, Error **errp); typedef void (*BusUnrealize)(BusState *bus); +typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp); /** * struct DeviceClass - The base class for all devices. @@ -162,6 +163,7 @@ struct DeviceClass { DeviceReset reset; DeviceRealize realize; DeviceUnrealize unrealize; + DeviceSyncConfig sync_config; /** * @vmsd: device state serialisation description for @@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp); */ HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev); void qdev_unplug(DeviceState *dev, Error **errp); +int qdev_sync_config(DeviceState *dev, Error **errp); void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp); void qdev_machine_creation_done(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..fc5e125a45 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,26 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize device configuration from host to guest part. First, +# copy the configuration from the host part (backend) to the guest +# part (frontend). Then notify guest software that device +# configuration changed. Blank line here, please. +# The command may be used to notify the guest about block device +# capcity change. Currently only vhost-user-blk device supports +# this. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 264978aa40..47bfc0506e 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -971,6 +972,53 @@ void qmp_device_del(const char *id, Error **errp) } }
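The constraint stated in the commit message — the sync result is racy during migration, so the command is only allowed in the RUNNING run state — amounts to a guard in front of the device-class handler. A toy Python model of that gate (names are invented for illustration; the real check is C code in system/qdev-monitor.c):

```python
def sync_config(run_state, handler):
    """Model of the qdev sync-config gate: refuse in any run state but
    'running' (the result would be racy during migration), and refuse
    when the device class provides no sync_config handler."""
    if run_state != 'running':
        raise RuntimeError('sync-config is only allowed in running state')
    if handler is None:
        raise RuntimeError('device does not support config synchronization')
    return handler()


# A vhost-user-blk-like device would supply a real handler:
print(sync_config('running', lambda: 0))  # 0
```

The virtio-pci wrapper in the patch follows the same delegation shape: the proxy's handler simply forwards the request to the inner virtio device's handler.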
Re: [PATCH v6 4/5] migration: process_incoming_migration_co(): rework error reporting
On 30.04.24 12:16, Philippe Mathieu-Daudé wrote: On 30/4/24 10:56, Vladimir Sementsov-Ogievskiy wrote: Unify error reporting in the function. This simplifies the following commit, which will not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 23 +-- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b307a4bc59..a9599838e6 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -735,14 +735,16 @@ static void process_incoming_migration_bh(void *opaque) static void coroutine_fn process_incoming_migration_co(void *opaque) { + MigrationState *s = migrate_get_current(); (see below) MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; + Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { - error_report("Failed to setup decompress threads"); + error_setg(_err, "Failed to setup decompress threads"); goto fail; } @@ -779,19 +781,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - MigrationState *s = migrate_get_current(); - - if (migrate_has_error(s)) { - WITH_QEMU_LOCK_GUARD(>error_mutex) { - error_report_err(s->error); - s->error = NULL; - } - } - error_report("load of migration failed: %s", strerror(-ret)); + error_setg(_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { + error_setg(_err, "colo incoming failed"); goto fail; } @@ -800,8 +795,16 @@ process_incoming_migration_co(void *opaque) fail: Maybe just assign @s in error path here? s = migrate_get_current(); I'd keep as is. If continue improving the function, I'd better split the logic to seperate function with classic "Error **errp" argument. And keep reporting error in caller. 
migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); + migrate_set_error(s, local_err); + error_free(local_err); + migration_incoming_state_destroy(); + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { + error_report_err(s->error); + s->error = NULL; + } + exit(EXIT_FAILURE); } Reviewed-by: Philippe Mathieu-Daudé -- Best regards, Vladimir
Re: [PULL 6/6] iotests: add backup-discard-source
On 30.04.24 12:13, Kevin Wolf wrote: Am 29.04.2024 um 20:39 hat Vladimir Sementsov-Ogievskiy geschrieben: [Add John] On 29.04.24 17:18, Richard Henderson wrote: On 4/29/24 04:51, Vladimir Sementsov-Ogievskiy wrote: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner Message-Id: <20240313152822.626493-6-vsement...@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy --- .../qemu-iotests/tests/backup-discard-source | 152 ++ .../tests/backup-discard-source.out | 5 + 2 files changed, 157 insertions(+) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out This fails check-python-minreqs https://gitlab.com/qemu-project/qemu/-/jobs/6739551782 It appears to be a pylint issue. tests/export-incoming-iothread:1:0: R0801: Similar lines in 2 files ==tests.backup-discard-source:[52:61] ==tests.copy-before-write:[54:63] 'file': { 'driver': iotests.imgfmt, 'file': { 'driver': 'file', 'filename': source_img, } }, 'target': { 'driver': iotests.imgfmt, (duplicate-code) Hmm. I don't think, that something should be changed in code. splitting out part of this json object to a function? That's a test for QMP command, and it's good that we see the command as is in one place. I can swap some lines or rename variables to hack the linter, but I'd prefer not doing so:) For me that looks like a false-positive: yes it's a duplication, but it should better be duplication, than complicating raw json objects by reusing common parts. What to do? As described in 22305c2a081b8b6 "python: Reduce strictness of pylint's duplicate-code check", this check could be either be disabled completely, or we can increase min-similarity-lines config value. I'd just disable it completely. Any thoughts? I think it would make sense to treat tests differently from real production code. 
In tests/, some duplication is bound to happen and trying to unify things across test cases (which would mean moving to iotests.py) often doesn't make sense. On the other hand, in python/ we would probably want to unify duplicated code. Agree. Happily, it turns out that tests already have a separate tests/qemu-iotests/pylintrc file, so that's not a problem. Still, I decided not to disable the check completely, but just to bump the limit, see "[PATCH] iotests/pylintrc: allow up to 10 similar lines" -- Best regards, Vladimir
[PATCH] iotests/pylintrc: allow up to 10 similar lines
We want to have similar QMP objects in different tests. Reworking these objects to make common parts by calling some helper functions doesn't seem good. It's a lot more comfortable to see the whole QAPI request in one place. So, let's increase the limit, to unblock further commit "iotests: add backup-discard-source" Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi all! That's a patch to unblock my PR "[PULL 0/6] Block jobs patches for 2024-04-29" <20240429115157.2260885-1-vsement...@yandex-team.ru> https://patchew.org/QEMU/20240429115157.2260885-1-vsement...@yandex-team.ru/ tests/qemu-iotests/pylintrc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc index de2e0c2781..05b75ee59b 100644 --- a/tests/qemu-iotests/pylintrc +++ b/tests/qemu-iotests/pylintrc @@ -55,4 +55,4 @@ max-line-length=79 [SIMILARITIES] -min-similarity-lines=6 +min-similarity-lines=10 -- 2.34.1
[PATCH v6 4/5] migration: process_incoming_migration_co(): rework error reporting
Unify error reporting in the function. This simplifies the following commit, which will add a not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 23 +-- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b307a4bc59..a9599838e6 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -735,14 +735,16 @@ static void process_incoming_migration_bh(void *opaque) static void coroutine_fn process_incoming_migration_co(void *opaque) { +MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; +Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { -error_report("Failed to setup decompress threads"); +error_setg(&local_err, "Failed to setup decompress threads"); goto fail; } @@ -779,19 +781,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -MigrationState *s = migrate_get_current(); - -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(s->error); -s->error = NULL; -} -} -error_report("load of migration failed: %s", strerror(-ret)); +error_setg(&local_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { +error_setg(&local_err, "colo incoming failed"); goto fail; } @@ -800,8 +795,16 @@ process_incoming_migration_co(void *opaque) fail: migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); +migrate_set_error(s, local_err); +error_free(local_err); + migration_incoming_state_destroy(); +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(s->error); +s->error = NULL; +} + exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v6 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error
Cover more cases by trace-point. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/migration.c | 4 +++- migration/trace-events | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b5af6b5105..2dc6a063e9 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque) void migrate_set_error(MigrationState *s, const Error *error) { QEMU_LOCK_GUARD(&s->error_mutex); + +trace_migrate_error(error_get_pretty(error)); + if (!s->error) { s->error = error_copy(error); } @@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s) static void migrate_fd_error(MigrationState *s, const Error *error) { -trace_migrate_fd_error(error_get_pretty(error)); assert(s->to_dst_file == NULL); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED); diff --git a/migration/trace-events b/migration/trace-events index f0e1cb80c7..d0c44c3853 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam # migration.c migrate_set_state(const char *new_state) "new state %s" migrate_fd_cleanup(void) "" -migrate_fd_error(const char *error_desc) "error=%s" +migrate_error(const char *error_desc) "error=%s" migrate_fd_cancel(void) "" migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx" migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")" -- 2.34.1
[PATCH v6 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
Now we do set MIGRATION_FAILED state, but don't give a chance to orchestrator to query migration state and get the error. Let's provide a possibility for QMP-based orchestrators to get an error like with outgoing migration. For hmp_migrate_incoming(), let's enable the new behavior: HMP is not and ABI, it's mostly intended to use by developer and it makes sense not to stop the process. For x-exit-preconfig, let's keep the old behavior: - it's called from init(), so here we want to keep current behavior by default - it does exit on error by itself as well So, if we want to change the behavior of x-exit-preconfig, it should be another patch. Signed-off-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster --- migration/migration-hmp-cmds.c | 2 +- migration/migration.c | 33 +++-- migration/migration.h | 3 +++ qapi/migration.json| 7 ++- system/vl.c| 3 ++- 5 files changed, 39 insertions(+), 9 deletions(-) diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 7e96ae6ffd..23181bbee1 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict) } QAPI_LIST_PREPEND(caps, g_steal_pointer()); -qmp_migrate_incoming(NULL, true, caps, ); +qmp_migrate_incoming(NULL, true, caps, true, false, ); qapi_free_MigrationChannelList(caps); end: diff --git a/migration/migration.c b/migration/migration.c index a9599838e6..289afa8d85 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -72,6 +72,8 @@ #define NOTIFIER_ELEM_INIT(array, elem)\ [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem]) +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true + static NotifierWithReturnList migration_state_notifiers[] = { NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), @@ -234,6 +236,8 @@ void migration_object_init(void) qemu_cond_init(_incoming->page_request_cond); 
current_incoming->page_requested = g_tree_new(page_request_addr_cmp); +current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR; + migration_object_check(current_migration, _fatal); blk_mig_init(); @@ -800,12 +804,14 @@ fail: migration_incoming_state_destroy(); -WITH_QEMU_LOCK_GUARD(>error_mutex) { -error_report_err(s->error); -s->error = NULL; -} +if (mis->exit_on_error) { +WITH_QEMU_LOCK_GUARD(>error_mutex) { +error_report_err(s->error); +s->error = NULL; +} -exit(EXIT_FAILURE); +exit(EXIT_FAILURE); +} } /** @@ -1314,6 +1320,15 @@ static void fill_destination_migration_info(MigrationInfo *info) break; } info->status = mis->state; + +if (!info->error_desc) { +MigrationState *s = migrate_get_current(); +QEMU_LOCK_GUARD(>error_mutex); + +if (s->error) { +info->error_desc = g_strdup(error_get_pretty(s->error)); +} +} } MigrationInfo *qmp_query_migrate(Error **errp) @@ -1797,10 +1812,13 @@ void migrate_del_blocker(Error **reasonp) } void qmp_migrate_incoming(const char *uri, bool has_channels, - MigrationChannelList *channels, Error **errp) + MigrationChannelList *channels, + bool has_exit_on_error, bool exit_on_error, + Error **errp) { Error *local_err = NULL; static bool once = true; +MigrationIncomingState *mis = migration_incoming_get_current(); if (!once) { error_setg(errp, "The incoming migration has already been started"); @@ -1815,6 +1833,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels, return; } +mis->exit_on_error = +has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR; + qemu_start_incoming_migration(uri, has_channels, channels, _err); if (local_err) { diff --git a/migration/migration.h b/migration/migration.h index 8045e39c26..95995a818e 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -227,6 +227,9 @@ struct MigrationIncomingState { * is needed as this field is updated serially. 
*/ unsigned int switchover_ack_pending_num; + +/* Do exit on incoming migration failure */ +bool exit_on_error; }; MigrationIncomingState *migration_incoming_get_current(void); diff --git a/qapi/migration.json b/qapi/migration.json index 8c65b90328..9feed413b5 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1837,6 +1837,10 @@ # @channels: list of migration stream channels with each stream in the # list connected to a destination interface endpoint. # +# @exit-on-error: Ex
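The `has_exit_on_error`/`exit_on_error` pair in the patch above is the QAPI C convention for an optional boolean, resolved against INMIGRATE_DEFAULT_EXIT_ON_ERROR. A minimal Python rendering of the same resolution (the function name is invented; `None` plays the role of `has_exit_on_error == false`):

```python
INMIGRATE_DEFAULT_EXIT_ON_ERROR = True  # mirrors the new #define

def resolve_exit_on_error(exit_on_error=None):
    """QAPI optional bool: an absent argument means 'use the default',
    which preserves the old exit-on-failure behavior for compatibility."""
    if exit_on_error is None:
        return INMIGRATE_DEFAULT_EXIT_ON_ERROR
    return exit_on_error

print(resolve_exit_on_error())       # True  (old behavior by default)
print(resolve_exit_on_error(False))  # False (orchestrator opts out)
```

Defaulting to the old behavior is what lets HMP and x-exit-preconfig keep their existing semantics while QMP orchestrators explicitly opt out and query the failure via query-migrate.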
[PATCH v6 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
Make a call to migration_incoming_state_destroy(), instead of doing only part of it. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/migration.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 2dc6a063e9..0d26db47f7 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque) fail: migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); -qemu_fclose(mis->from_src_file); - -multifd_recv_cleanup(); -compress_threads_load_cleanup(); +migration_incoming_state_destroy(); exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v6 0/5] migration: do not exit on incoming failure
Hi all! The series brings an option to not immediately exit on incoming migration failure, giving the orchestrator a possibility to get the error through QAPI and shut down QEMU by "quit". v6: 01,02: add r-b by Peter 03: only fix potential use-after-free 04: rework error handling, drop r-b v5: - add "migration: process_incoming_migration_co(): fix reporting s->error" v4: - add r-b and a-b by Fabiano and Markus - improve wording in 04 as Markus suggested v3: - don't refactor the whole code around setting migration error, it seems too much and unnecessary for the new feature itself - add constant - change behavior for HMP command - split some things to separate patches - and more, by Peter's suggestions New behavior can be demonstrated like this: bash: ( cat
[PATCH v6 3/5] migration: process_incoming_migration_co(): fix reporting s->error
It's a bad idea to leave the critical section with the error object freed but s->error still set; this theoretically may lead to a use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..b307a4bc59 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -784,6 +784,7 @@ process_incoming_migration_co(void *opaque) if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(s->error); +s->error = NULL; } } error_report("load of migration failed: %s", strerror(-ret)); -- 2.34.1
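The pattern behind this one-liner — consume and clear the error inside the same critical section, so no reader ever sees a pointer that is about to be freed — can be modeled in a few lines of Python (threading.Lock stands in for error_mutex; this is only an analogy, since Python itself has no use-after-free):

```python
import threading

class MigrationState:
    """Just the two fields the fix touches."""
    def __init__(self):
        self.error_mutex = threading.Lock()
        self.error = None

def report_and_clear(s, log):
    # Report and null the field under the same lock, so no concurrent
    # reader of s.error can observe a value that has been consumed.
    with s.error_mutex:
        if s.error is not None:
            log.append(str(s.error))
            s.error = None

s = MigrationState()
s.error = ValueError('load of migration failed')
log = []
report_and_clear(s, log)
print(s.error, log)  # None ['load of migration failed']
```

In the C code, error_report_err() frees the Error object, which is exactly why leaving s->error pointing at it past the lock guard would be dangerous.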
Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
On 30.04.24 11:09, Vladimir Sementsov-Ogievskiy wrote: On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote: On 29.04.24 22:32, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: It's bad idea to leave critical section with error object freed, but s->error still set, this theoretically may lead to use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ + QEMU_LOCK_GUARD(>error_mutex); + if (s->error) { + error_free(s->error); + s->error = NULL; + } +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { + MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(>error_mutex) { - error_report_err(s->error); + error_report_err(error_copy(s->error)); This looks like a bugfix, agreed. } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); + migrate_error_free(s); Would migration_incoming_state_destroy() be a better place? Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. Actually, I think we shouldn't care about freeing the error for exit() case. 
I think, we skip a lot of other cleanups which we normally do in main() in case of this exit(). Or, just do the simplest fix of potential use-after-free, and don't care: WITH_QEMU_LOCK_GUARD(>error_mutex) { error_report_err(s->error); s->error = NULL; } - I'll send v6 with it. One thing weird is we actually reuses MigrationState*'s error for incoming too, but so far it looks ok as long as QEMU can't be both src & dst. Then calling migrate_error_free even in incoming_state_destroy() looks ok. This patch still looks like containing two changes. Better split them (or just fix the bug only)? Thanks, exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(>error); } -static void migrate_error_free(MigrationState *s) -{ - QEMU_LOCK_GUARD(>error_mutex); - if (s->error) { - error_free(s->error); - s->error = NULL; - } -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1 -- Best regards, Vladimir
Re: [PATCH v4 3/3] qapi: introduce device-sync-config
On 30.04.24 11:19, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 1 + hw/virtio/virtio-pci.c| 9 include/hw/qdev-core.h| 3 +++ qapi/qdev.json| 23 +++ system/qdev-monitor.c | 48 +++ 5 files changed, 84 insertions(+) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 091d2c6acf..2f301f380c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -588,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data) device_class_set_props(dc, vhost_user_blk_properties); dc->vmsd = _vhost_user_blk; +dc->sync_config = vhost_user_blk_sync_config; set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); vdc->realize = vhost_user_blk_device_realize; vdc->unrealize = vhost_user_blk_device_unrealize; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b1d02f4b3d..0d91e8b5dc 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2351,6 +2351,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp) vpciklass->parent_dc_realize(qdev, errp); } +static int virtio_pci_sync_config(DeviceState *dev, Error **errp) +{ +VirtIOPCIProxy *proxy = VIRTIO_PCI(dev); +VirtIODevice *vdev = virtio_bus_get_device(>bus); + +return qdev_sync_config(DEVICE(vdev), errp); +} + static void virtio_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -2367,6 +2375,7 @@ static void virtio_pci_class_init(ObjectClass *klass, void *data) device_class_set_parent_realize(dc, virtio_pci_dc_realize, >parent_dc_realize); rc->phases.hold = virtio_pci_bus_reset_hold; 
+dc->sync_config = virtio_pci_sync_config; } static const TypeInfo virtio_pci_info = { diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 9228e96c87..87135bdcdf 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev); typedef void (*DeviceReset)(DeviceState *dev); typedef void (*BusRealize)(BusState *bus, Error **errp); typedef void (*BusUnrealize)(BusState *bus); +typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp); /** * struct DeviceClass - The base class for all devices. @@ -162,6 +163,7 @@ struct DeviceClass { DeviceReset reset; DeviceRealize realize; DeviceUnrealize unrealize; +DeviceSyncConfig sync_config; /** * @vmsd: device state serialisation description for @@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp); */ HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev); void qdev_unplug(DeviceState *dev, Error **errp); +int qdev_sync_config(DeviceState *dev, Error **errp); void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp); void qdev_machine_creation_done(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..fc5e125a45 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,26 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize device configuration from host to guest part. First, +# copy the configuration from the host part (backend) to the guest +# part (frontend). Then notify guest software that device +# configuration changed. Blank line here, please. +# The command may be used to notify the guest about block device +# capcity change. Currently only vhost-user-blk device supports +# this. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 264978aa40..47bfc0506e 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -971,6 +972,53 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceSta
Re: [PATCH v4 2/3] vhost-user-blk: split vhost_user_blk_sync_config()
On 30.04.24 11:15, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Split vhost_user_blk_sync_config() out from vhost_user_blk_handle_config_change(), to be reused in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9e6bbc6950..091d2c6acf 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t *config) s->blkcfg.wce = blkcfg->wce; } +static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp) +{ +int ret; +VirtIODevice *vdev = VIRTIO_DEVICE(dev); Note for later: all this function does with parameter @dev is cast it to VirtIODevice *. +VHostUserBlk *s = VHOST_USER_BLK(vdev); + +ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg, + vdev->config_len, errp); +if (ret < 0) { +return ret; +} + +memcpy(vdev->config, &s->blkcfg, vdev->config_len); +virtio_notify_config(vdev); + +return 0; +} + static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) { int ret; -VirtIODevice *vdev = dev->vdev; -VHostUserBlk *s = VHOST_USER_BLK(dev->vdev); Error *local_err = NULL; if (!dev->started) { return 0; } -ret = vhost_dev_get_config(dev, (uint8_t *)&s->blkcfg, - vdev->config_len, &local_err); +ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), &local_err); dev->vdev is a VirtIODevice *. You cast it to DeviceState * for vhost_user_blk_sync_config(), which casts it right back. Could you simply pass it as is instead? vhost_user_blk_sync_config() is a generic handler, which will be used as ->sync_config() in the following commit, so it's good and convenient for it to have a DeviceState * argument. if (ret < 0) { error_report_err(local_err); return ret; } -memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len); -virtio_notify_config(dev->vdev); - return 0; } -- Best regards, Vladimir
Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote: On 29.04.24 22:32, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: It's bad idea to leave critical section with error object freed, but s->error still set, this theoretically may lead to use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ + QEMU_LOCK_GUARD(>error_mutex); + if (s->error) { + error_free(s->error); + s->error = NULL; + } +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { + MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(>error_mutex) { - error_report_err(s->error); + error_report_err(error_copy(s->error)); This looks like a bugfix, agreed. } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); + migrate_error_free(s); Would migration_incoming_state_destroy() be a better place? Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. Actually, I think we shouldn't care about freeing the error for exit() case. I think, we skip a lot of other cleanups which we normally do in main() in case of this exit(). 
One thing weird is we actually reuses MigrationState*'s error for incoming too, but so far it looks ok as long as QEMU can't be both src & dst. Then calling migrate_error_free even in incoming_state_destroy() looks ok. This patch still looks like containing two changes. Better split them (or just fix the bug only)? Thanks, exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(>error); } -static void migrate_error_free(MigrationState *s) -{ - QEMU_LOCK_GUARD(>error_mutex); - if (s->error) { - error_free(s->error); - s->error = NULL; - } -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1 -- Best regards, Vladimir
Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
On 29.04.24 22:32, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: It's bad idea to leave critical section with error object freed, but s->error still set, this theoretically may lead to use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ +QEMU_LOCK_GUARD(>error_mutex); +if (s->error) { +error_free(s->error); +s->error = NULL; +} +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { +MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(>error_mutex) { -error_report_err(s->error); +error_report_err(error_copy(s->error)); This looks like a bugfix, agreed. } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +migrate_error_free(s); Would migration_incoming_state_destroy() be a better place? Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. One thing weird is we actually reuses MigrationState*'s error for incoming too, but so far it looks ok as long as QEMU can't be both src & dst. Then calling migrate_error_free even in incoming_state_destroy() looks ok. 
This patch still looks like containing two changes. Better split them (or just fix the bug only)? Thanks, exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(>error); } -static void migrate_error_free(MigrationState *s) -{ -QEMU_LOCK_GUARD(>error_mutex); -if (s->error) { -error_free(s->error); -s->error = NULL; -} -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1 -- Best regards, Vladimir
Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
On 29.04.24 22:39, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote: Unify error reporting in the function. This simplifies the following commit, which will not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 58fd5819bc..5489ff96df 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque) MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; +Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { -error_report("Failed to setup decompress threads"); +error_setg(_err, "Failed to setup decompress threads"); goto fail; } @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(>error_mutex) { -error_report_err(error_copy(s->error)); -} -} -error_report("load of migration failed: %s", strerror(-ret)); +error_setg(_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { +error_setg(_err, "colo incoming failed"); goto fail; } @@ -809,6 +806,12 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(>error_mutex) { +error_report_err(error_copy(s->error)); +} +} +error_report_err(local_err); Here migrate_has_error() will always return true? If so we can drop it. Meanwhile, IMHO it's easier we simply always keep the earliest error we see and report that only, local_err is just one of the errors and whoever reaches first will be reported. 
Something like: migrate_set_error(local_err); WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(error_copy(s->error)); } exit(EXIT_FAILURE); Then, with the exit-on-error thing: migrate_set_error(local_err); if (exit_on_error) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(error_copy(s->error)); } exit(EXIT_FAILURE); } Would this look slightly cleaner? Yes, looks better, will do so -- Best regards, Vladimir
[PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error
Cover more cases by trace-point. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 4 +++- migration/trace-events | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b5af6b5105..2dc6a063e9 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque) void migrate_set_error(MigrationState *s, const Error *error) { QEMU_LOCK_GUARD(&s->error_mutex); + +trace_migrate_error(error_get_pretty(error)); + if (!s->error) { s->error = error_copy(error); } @@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s) static void migrate_fd_error(MigrationState *s, const Error *error) { -trace_migrate_fd_error(error_get_pretty(error)); assert(s->to_dst_file == NULL); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED); diff --git a/migration/trace-events b/migration/trace-events index f0e1cb80c7..d0c44c3853 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam # migration.c migrate_set_state(const char *new_state) "new state %s" migrate_fd_cleanup(void) "" -migrate_fd_error(const char *error_desc) "error=%s" +migrate_error(const char *error_desc) "error=%s" migrate_fd_cancel(void) "" migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx" migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")" -- 2.34.1
[PATCH v5 0/5] migration: do not exit on incoming failure
Hi all! The series brings an option to not immediately exit on incoming migration failure, giving the orchestrator a possibility to get the error through QAPI and shut down QEMU by "quit". v5: - add "migration: process_incoming_migration_co(): fix reporting s->error" v4: - add r-b and a-b by Fabiano and Markus - improve wording in 04 as Markus suggested v3: - don't refactor the whole code around setting migration error, it seems too much and not necessary for the new feature itself - add constant - change behavior for HMP command - split some things to separate patches - and more, by Peter's suggestions New behavior can be demonstrated like this: bash: ( cat
[PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
Unify error reporting in the function. This simplifies the following commit, which will add a not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 58fd5819bc..5489ff96df 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque) MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; +Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { -error_report("Failed to setup decompress threads"); +error_setg(&local_err, "Failed to setup decompress threads"); goto fail; } @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(error_copy(s->error)); -} -} -error_report("load of migration failed: %s", strerror(-ret)); +error_setg(&local_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { +error_setg(&local_err, "colo incoming failed"); goto fail; } @@ -809,6 +806,12 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(error_copy(s->error)); +} +} +error_report_err(local_err); migrate_error_free(s); exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
It's a bad idea to leave the critical section with the error object freed but s->error still set; this theoretically may lead to a use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ +QEMU_LOCK_GUARD(&s->error_mutex); +if (s->error) { +error_free(s->error); +s->error = NULL; +} +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { +MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(s->error); +error_report_err(error_copy(s->error)); } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +migrate_error_free(s); exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(&s->error); } -static void migrate_error_free(MigrationState *s) -{ -QEMU_LOCK_GUARD(&s->error_mutex); -if (s->error) { -error_free(s->error); -s->error = NULL; -} -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1
[PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
Make a call to migration_incoming_state_destroy(), instead of doing only part of it by hand. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 2dc6a063e9..0d26db47f7 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque) fail: migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); -qemu_fclose(mis->from_src_file); - -multifd_recv_cleanup(); -compress_threads_load_cleanup(); +migration_incoming_state_destroy(); exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
Now we do set MIGRATION_FAILED state, but don't give the orchestrator a chance to query migration state and get the error. Let's provide a possibility for QMP-based orchestrators to get an error like with outgoing migration. For hmp_migrate_incoming(), let's enable the new behavior: HMP is not an ABI, it's mostly intended for use by developers, and it makes sense not to stop the process. For x-exit-preconfig, let's keep the old behavior: - it's called from init(), so here we want to keep current behavior by default - it does exit on error by itself as well So, if we want to change the behavior of x-exit-preconfig, it should be another patch. Signed-off-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster --- migration/migration-hmp-cmds.c | 2 +- migration/migration.c | 38 +++--- migration/migration.h | 3 +++ qapi/migration.json | 7 ++++++- system/vl.c | 3 ++- 5 files changed, 43 insertions(+), 10 deletions(-) diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 7e96ae6ffd..23181bbee1 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict) } QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel)); -qmp_migrate_incoming(NULL, true, caps, &err); +qmp_migrate_incoming(NULL, true, caps, true, false, &err); qapi_free_MigrationChannelList(caps); end: diff --git a/migration/migration.c b/migration/migration.c index 5489ff96df..09f762c99e 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -72,6 +72,8 @@ #define NOTIFIER_ELEM_INIT(array, elem)\ [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem]) +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true + static NotifierWithReturnList migration_state_notifiers[] = { NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), @@ -234,6 +236,8 @@ void migration_object_init(void) qemu_cond_init(&current_incoming->page_request_cond);
current_incoming->page_requested = g_tree_new(page_request_addr_cmp); +current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR; + migration_object_check(current_migration, &error_fatal); blk_mig_init(); @@ -806,14 +810,19 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(error_copy(s->error)); +if (mis->exit_on_error) { +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(error_copy(s->error)); +} } +error_report_err(local_err); +migrate_error_free(s); +exit(EXIT_FAILURE); +} else { +migrate_set_error(s, local_err); +error_free(local_err); } -error_report_err(local_err); -migrate_error_free(s); -exit(EXIT_FAILURE); } /** @@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info) break; } info->status = mis->state; + +if (!info->error_desc) { +MigrationState *s = migrate_get_current(); +QEMU_LOCK_GUARD(&s->error_mutex); + +if (s->error) { +info->error_desc = g_strdup(error_get_pretty(s->error)); +} +} } MigrationInfo *qmp_query_migrate(Error **errp) @@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp) } void qmp_migrate_incoming(const char *uri, bool has_channels, - MigrationChannelList *channels, Error **errp) + MigrationChannelList *channels, + bool has_exit_on_error, bool exit_on_error, + Error **errp) { Error *local_err = NULL; static bool once = true; +MigrationIncomingState *mis = migration_incoming_get_current(); if (!once) { error_setg(errp, "The incoming migration has already been started"); @@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels, return; } +mis->exit_on_error = +has_exit_on_error ?
exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR; + qemu_start_incoming_migration(uri, has_channels, channels, &local_err); if (local_err) { diff --git a/migration/migration.h b/migration/migration.h index 8045e39c26..95995a818e 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -227,6 +227,9 @@ struct MigrationIncomingState { * is needed as this field is updated serially. */ unsigned int switchover_ack_pending_num; + +/* Do exit on incoming migration failure */ +bool exit_on_error; }; MigrationIncomingState *migration_incoming_get_current(void); di
Re: [PULL 6/6] iotests: add backup-discard-source
[Add John] On 29.04.24 17:18, Richard Henderson wrote: On 4/29/24 04:51, Vladimir Sementsov-Ogievskiy wrote: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner Message-Id: <20240313152822.626493-6-vsement...@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy --- .../qemu-iotests/tests/backup-discard-source | 152 ++ .../tests/backup-discard-source.out | 5 + 2 files changed, 157 insertions(+) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out This fails check-python-minreqs https://gitlab.com/qemu-project/qemu/-/jobs/6739551782 It appears to be a pylint issue. tests/export-incoming-iothread:1:0: R0801: Similar lines in 2 files ==tests.backup-discard-source:[52:61] ==tests.copy-before-write:[54:63] 'file': { 'driver': iotests.imgfmt, 'file': { 'driver': 'file', 'filename': source_img, } }, 'target': { 'driver': iotests.imgfmt, (duplicate-code) Hmm. I don't think, that something should be changed in code. splitting out part of this json object to a function? That's a test for QMP command, and it's good that we see the command as is in one place. I can swap some lines or rename variables to hack the linter, but I'd prefer not doing so:) For me that looks like a false-positive: yes it's a duplication, but it should better be duplication, than complicating raw json objects by reusing common parts. What to do? As described in 22305c2a081b8b6 "python: Reduce strictness of pylint's duplicate-code check", this check could be either be disabled completely, or we can increase min-similarity-lines config value. I'd just disable it completely. Any thoughts? -- Best regards, Vladimir
Re: [PATCH v3 4/4] qapi: introduce exit-on-error parameter for migrate-incoming
On 29.04.24 16:06, Fabiano Rosas wrote: Vladimir Sementsov-Ogievskiy writes: On 25.04.24 23:30, Fabiano Rosas wrote: @@ -797,13 +801,18 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(s->error); +if (mis->exit_on_error) { +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(s->error); error_report_err(error_copy(s->error)) ...because later on you're reading from s->error at fill_destination_migration_info. No, we immediately do exit() instead. That's just a preexisting behavior, moved into "if (mis->exit_on_error)" I meant later in the patch, not later in the execution. Can't query-migrate be called during process_incoming_migration_co? Hmm.. On the one hand, seems no reason to care about it exactly before exit().. On the other hand, we do care about taking error_mutex. And we do release it, which may trigger another critical section. I'll try to touch up this thing. -- Best regards, Vladimir
Re: [PATCH] block/copy-before-write: use uint64_t for timeout in nanoseconds
On 29.04.24 17:46, Fiona Ebner wrote: Am 29.04.24 um 16:36 schrieb Philippe Mathieu-Daudé: Hi Fiona, On 29/4/24 16:19, Fiona Ebner wrote: Not everybody uses an email client that shows the patch content just after the subject (your first lines wasn't making sense at first). Simply duplicating the subject helps to understand: Use uint64_t for timeout in nanoseconds ... Oh, sorry. I'll try to remember that for the future. Should I re-send as a v2? Not necessary, I can touch up the message when applying to my block branch. Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH v3 4/5] qapi: introduce device-sync-config
On 29.04.24 16:04, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: On 29.04.24 13:51, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: On 24.04.24 14:48, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy [...] diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 0117d243c4..296af52322 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -5,6 +5,7 @@ #include "qemu/notify.h" bool runstate_check(RunState state); +const char *current_run_state_str(void); void runstate_set(RunState new_state); RunState runstate_get(void); bool runstate_is_running(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..e8be79c3d5 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,24 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize config from backend to the guest. The command notifies +# re-read the device config from the backend and notifies the guest +# to re-read the config. The command may be used to notify the guest +# about block device capcity change. Currently only vhost-user-blk +# device supports this. I'm not sure I understand this. To work towards an understanding, I rephrase it, and you point out the errors. Synchronize device configuration from host to guest part. First, copy the configuration from the host part (backend) to the guest part (frontend). Then notify guest software that device configuration changed. 
Correct, thanks Perhaps Synchronize guest-visible device configuration with the backend's configuration, and notify guest software that device configuration changed. This may be useful to notify the guest of a block device capacity change. Currenrly, only vhost-user-blk devices support this. Sounds good Except I fat-fingered "Currently". Next question: what happens when the device *doesn't* support this? An error "device-sync-config is not supported ..." Okay. I wonder how configuration can get out of sync. Can you explain? The example (and the original feature, which triggered developing this) is vhost disk resize. If vhost-server (backend) doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, neither QEMU nor guest will know that disk capacity changed. Sounds like we wouldn't need this command if we could make the vhost-server support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Is making it support it impractical? Or are there other uses for this command? Qemu's internal vhost-server do support it. But that's not the only vhost-user server) So the command is useful for those servers which doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Note, that this message requires setting up additional channel of server -> client communication. That was the reason, why the "change-msg" solution was rejected in our downstream: it's safer to reuse existing channel (QMP), than to add and support an additional channel. Also, the command may help to debug the system, when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG doesn't work for some reason. Suggest to work this into the commit message. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 7e075d91c1..cb35ea0b86 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -969,6 +970,52 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceState *dev, Error **errp) +{ +DeviceClass *dc = DEVICE_GET_CLASS(dev); + +if (!dc->sync_config) { +error_setg(errp, "device-sync-config is not supported for '%s'", + object_get_typename(OBJECT(dev))); +return -ENOTSUP; +} + +return dc->sync_config(dev, errp); +} + +void qmp_device_sync_config(const char *id, Error **errp) +{ +DeviceState *dev; + +/* + * During migration there is a race between syncing config and + * migr
Re: [PATCH v3 4/5] qapi: introduce device-sync-config
On 29.04.24 13:51, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: On 24.04.24 14:48, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy [...] diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 0117d243c4..296af52322 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -5,6 +5,7 @@ #include "qemu/notify.h" bool runstate_check(RunState state); +const char *current_run_state_str(void); void runstate_set(RunState new_state); RunState runstate_get(void); bool runstate_is_running(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..e8be79c3d5 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,24 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize config from backend to the guest. The command notifies +# re-read the device config from the backend and notifies the guest +# to re-read the config. The command may be used to notify the guest +# about block device capcity change. Currently only vhost-user-blk +# device supports this. I'm not sure I understand this. To work towards an understanding, I rephrase it, and you point out the errors. Synchronize device configuration from host to guest part. First, copy the configuration from the host part (backend) to the guest part (frontend). Then notify guest software that device configuration changed. Correct, thanks Perhaps Synchronize guest-visible device configuration with the backend's configuration, and notify guest software that device configuration changed. 
This may be useful to notify the guest of a block device capacity change. Currenrly, only vhost-user-blk devices support this. Sounds good Next question: what happens when the device *doesn't* support this? An error "device-sync-config is not supported ..." I wonder how configuration can get out of sync. Can you explain? The example (and the original feature, which triggered developing this) is vhost disk resize. If vhost-server (backend) doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, neither QEMU nor guest will know that disk capacity changed. Sounds like we wouldn't need this command if we could make the vhost-server support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Is making it support it impractical? Or are there other uses for this command? Qemu's internal vhost-server do support it. But that's not the only vhost-user server) So the command is useful for those servers which doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Note, that this message requires setting up additional channel of server -> client communication. That was the reason, why the "change-msg" solution was rejected in our downstream: it's safer to reuse existing channel (QMP), than to add and support an additional channel. Also, the command may help to debug the system, when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG doesn't work for some reason. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 7e075d91c1..cb35ea0b86 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -969,6 +970,52 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceState *dev, Error **errp) +{ +DeviceClass *dc = DEVICE_GET_CLASS(dev); + +if (!dc->sync_config) { +error_setg(errp, "device-sync-config is not supported for '%s'", + object_get_typename(OBJECT(dev))); +return -ENOTSUP; +} + +return dc->sync_config(dev, errp); +} + +void qmp_device_sync_config(const char *id, Error **errp) +{ +DeviceState *dev; + +/* + * During migration there is a race between syncing config and + * migrating it, so let's just not allow it. Can you briefly explain the race? If at the moment of qmp command, corresponding config already migrated to the target, we'll change only the config on source, but on th
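For readers following the schema above, here is a minimal sketch of what a management client would put on the QMP wire for this command. This is an illustrative helper, not QEMU code; the device id "vublk0" is an invented example:

```python
import json

def device_sync_config_cmd(dev_id):
    """Build the QMP 'device-sync-config' command as wire JSON.

    Mirrors the schema above:
    { 'command': 'device-sync-config', 'data': {'id': 'str'} }
    """
    if not dev_id:
        raise ValueError("'id' must be a device ID or QOM path")
    return json.dumps({"execute": "device-sync-config",
                       "arguments": {"id": dev_id}})

# A client would send this line over the QMP socket after capability
# negotiation, e.g. after resizing the disk on the vhost-user server
# side when the server cannot deliver a config-change notification:
print(device_sync_config_cmd("vublk0"))
```

On success the reply is an empty `return`; for a device whose class has no sync_config callback, the reply carries the "device-sync-config is not supported" error shown in the patch.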
Re: [PULL 0/6] Block jobs patches for 2024-04-29
Sorry for too much CC-ing, I've mistakenly added --cc-cmd=./scripts/get_maintainer.pl On 29.04.24 14:51, Vladimir Sementsov-Ogievskiy wrote: The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479: Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging (2024-04-26 15:28:13 -0700) are available in the Git repository at: https://gitlab.com/vsementsov/qemu.git tags/pull-block-jobs-2024-04-29 for you to fetch changes up to 2ca7608c6b8d57fd6347b11af12a0f035263efef: iotests: add backup-discard-source (2024-04-29 13:35:30 +0300) Block jobs patches for 2024-04-29 - backup: discard-source parameter - blockcommit: Reopen base image as RO after abort Alexander Ivanov (1): blockcommit: Reopen base image as RO after abort Vladimir Sementsov-Ogievskiy (5): block/copy-before-write: fix permission block/copy-before-write: support unligned snapshot-discard block/copy-before-write: create block_copy bitmap in filter node qapi: blockdev-backup: add discard-source parameter iotests: add backup-discard-source block/backup.c | 5 +- block/block-copy.c | 12 +- block/copy-before-write.c | 39 +-- block/copy-before-write.h | 1 + block/mirror.c | 11 +- block/replication.c| 4 +- blockdev.c | 2 +- include/block/block-common.h | 2 + include/block/block-copy.h | 2 + include/block/block_int-global-state.h | 2 +- qapi/block-core.json | 4 + tests/qemu-iotests/257.out | 112 +- tests/qemu-iotests/tests/backup-discard-source | 152 + tests/qemu-iotests/tests/backup-discard-source.out | 5 + 14 files changed, 281 insertions(+), 72 deletions(-) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out
-- Best regards, Vladimir
[PULL 4/6] block/copy-before-write: create block_copy bitmap in filter node
Currently block_copy creates copy_bitmap in source node. But that is in bad relation with .independent_close=true of copy-before-write filter: source node may be detached and removed before .bdrv_close() handler called, which should call block_copy_state_free(), which in turn should remove copy_bitmap. That's all not ideal: it would be better if internal bitmap of block-copy object is not attached to any node. But that is not possible now. The simplest solution is just create copy_bitmap in filter node, where anyway two other bitmaps are created. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner Message-Id: <20240313152822.626493-4-vsement...@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/block-copy.c | 3 +- block/copy-before-write.c | 2 +- include/block/block-copy.h | 1 + tests/qemu-iotests/257.out | 112 ++--- 4 files changed, 60 insertions(+), 58 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 9ee3dd7ef5..8fca2c3698 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -351,6 +351,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, } BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, + BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, Error **errp) { @@ -367,7 +368,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, return NULL; } -copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL, +copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL, errp); if (!copy_bitmap) { return NULL; diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 6d89af0b29..ed2c228da7 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -468,7 +468,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); -s->bcs = 
block_copy_state_new(bs->file, s->target, bitmap, errp); +s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp); if (!s->bcs) { error_prepend(errp, "Cannot create block-copy-state: "); return -EINVAL; diff --git a/include/block/block-copy.h b/include/block/block-copy.h index 0700953ab8..8b41643bfa 100644 --- a/include/block/block-copy.h +++ b/include/block/block-copy.h @@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState; typedef struct BlockCopyCallState BlockCopyCallState; BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, + BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, Error **errp); diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out index aa76131ca9..c33dd7f3a9 100644 --- a/tests/qemu-iotests/257.out +++ b/tests/qemu-iotests/257.out @@ -120,16 +120,16 @@ write -P0x67 0x3fe 0x2 "granularity": 65536, "persistent": false, "recording": false - } -], -"drive0": [ + }, { "busy": false, "count": 0, "granularity": 65536, "persistent": false, "recording": false - }, + } +], +"drive0": [ { "busy": false, "count": 458752, @@ -596,16 +596,16 @@ write -P0x67 0x3fe 0x2 "granularity": 65536, "persistent": false, "recording": false - } -], -"drive0": [ + }, { "busy": false, "count": 0, "granularity": 65536, "persistent": false, "recording": false - }, + } +], +"drive0": [ { "busy": false, "count": 458752, @@ -865,16 +865,16 @@ write -P0x67 0x3fe 0x2 "granularity": 65536, "persistent": false, "recording": false - } -], -"drive0": [ + }, { "busy": false, "count": 0, "granularity": 65536, "persistent": false, "recording": false - }, + } +], +"drive0": [ { "busy": false, "count": 458752, @@ -1341,16 +1341,1
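The lifetime problem this commit fixes can be sketched in a few lines of Python. This is a toy model, not QEMU API (class and bitmap names are invented): the point is that a bitmap created on the filter node is freed together with the copy state in the filter's close path, whereas a bitmap created on the source node depends on a node that may be detached first.

```python
class Node:
    """Toy stand-in for a BlockDriverState."""
    def __init__(self, name):
        self.name = name
        self.bitmaps = []  # dirty bitmaps attached to this node

class CopyState:
    """Toy stand-in for BlockCopyState: owns one dirty bitmap."""
    def __init__(self, bitmap_node):
        # Before the patch the bitmap node was the source; after it,
        # the copy-before-write filter node itself.
        self.bitmap_node = bitmap_node
        bitmap_node.bitmaps.append("copy_bitmap")

    def free(self):
        # Models block_copy_state_free() removing copy_bitmap; this
        # is only safe while the node it lives on is still around.
        self.bitmap_node.bitmaps.remove("copy_bitmap")

cbw = Node("copy-before-write")
state = CopyState(cbw)   # bitmap now lives on the filter node
state.free()             # filter close frees state and bitmap together
assert cbw.bitmaps == []
```

In the actual patch the whole fix amounts to passing the filter's own `bs` as the new `copy_bitmap_bs` argument of `block_copy_state_new()`.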
[PULL 5/6] qapi: blockdev-backup: add discard-source parameter
Add a parameter that enables discard-after-copy. That is mostly useful in "push backup with fleecing" scheme, when source is snapshot-access format driver node, based on copy-before-write filter snapshot-access API:

[guest]             [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   |                       |
   | root                  | file
   v                       v
[copy-before-write]
   |           |
   | file      | target
   v           v
[active disk] [temp.img]

In this case discard-after-copy does two things:

- discard data in temp.img to save disk space
- avoid further copy-before-write operation in discarded area

Note that we have to declare WRITE permission on source in copy-before-write filter, for discard to work. Still we can't take it unconditionally, as it will break normal backup from RO source. So, we have to add a parameter and pass it through bdrv_open flags.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Fiona Ebner
Tested-by: Fiona Ebner
Acked-by: Markus Armbruster
Message-Id: <20240313152822.626493-5-vsement...@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/backup.c | 5 +++--
 block/block-copy.c | 9 +
 block/copy-before-write.c | 15 +--
 block/copy-before-write.h | 1 +
 block/replication.c| 4 ++--
 blockdev.c | 2 +-
 include/block/block-common.h | 2 ++
 include/block/block-copy.h | 1 +
 include/block/block_int-global-state.h | 2 +-
 qapi/block-core.json | 4
 10 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/backup.c b/block/backup.c index ec29d6b810..3dd2e229d2 100644 --- a/block/backup.c +++ b/block/backup.c @@ -356,7 +356,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, BlockDriverState *target, int64_t speed, MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap, BitmapSyncMode bitmap_mode, - bool compress, + bool compress, bool discard_source, const char *filter_node_name, BackupPerf *perf, BlockdevOnError on_source_error, @@ -457,7 +457,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, goto error; } -cbw = bdrv_cbw_append(bs, target,
filter_node_name, , errp); +cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source, + , errp); if (!cbw) { goto error; } diff --git a/block/block-copy.c b/block/block-copy.c index 8fca2c3698..7e3b378528 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -137,6 +137,7 @@ typedef struct BlockCopyState { CoMutex lock; int64_t in_flight_bytes; BlockCopyMethod method; +bool discard_source; BlockReqList reqs; QLIST_HEAD(, BlockCopyCallState) calls; /* @@ -353,6 +354,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, + bool discard_source, Error **errp) { ERRP_GUARD(); @@ -418,6 +420,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, cluster_size), }; +s->discard_source = discard_source; block_copy_set_copy_opts(s, false, false); ratelimit_init(>rate_limit); @@ -589,6 +592,12 @@ static coroutine_fn int block_copy_task_entry(AioTask *task) co_put_to_shres(s->mem, t->req.bytes); block_copy_task_end(t, ret); +if (s->discard_source && ret == 0) { +int64_t nbytes = +MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset; +bdrv_co_pdiscard(s->source, t->req.offset, nbytes); +} + return ret; } diff --git a/block/copy-before-write.c b/block/copy-before-write.c index ed2c228da7..cd65524e26 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState { BdrvChild *target; OnCbwError on_cbw_error; uint32_t cbw_timeout_ns; +bool discard_source; /* * @lock: protects access to @access_bitmap, @done_bitmap and @@ -357,6 +358,8 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role, uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared) { +BDRVCopyBeforeWriteState *s = bs->opaque; + if (!(role & BDRV_CHILD_FILTERED)) { /* * Target child @@ -381,6 +384,10 @@ 
cbw_child_perm(BlockDriver
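One detail of the new hunk in block_copy_task_entry() deserves a worked example: the discard length is clamped so the discard never runs past the end of the source. The arithmetic transcribed into Python (parameter names mirror the C fields; the concrete numbers are invented):

```python
def discard_nbytes(req_offset, req_bytes, s_len):
    """nbytes = MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset"""
    return min(req_offset + req_bytes, s_len) - req_offset

# A request fully inside the source discards its full length:
assert discard_nbytes(0, 65536, 1 << 20) == 65536

# A tail request on an image whose length is not cluster-aligned is
# clamped: only the bytes up to s->len are discarded.
assert discard_nbytes(983040, 65536, 1000448) == 17408
```

Without the clamp, the final cluster-sized request of an unaligned image would ask bdrv_co_pdiscard() to discard beyond end-of-file.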