Re: [PATCH v9 4/7] qapi: add blockdev-replace command
On 18.07.24 15:00, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

Add a command that can replace bs in following BdrvChild structures:
- qdev blk root child
- block-export blk root child
- any child of BlockDriverState selected by child-name

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 blockdev.c             | 56 +++
 qapi/block-core.json   | 88 ++
 stubs/blk-by-qdev-id.c | 13 +++
 stubs/meson.build      |  1 +
 4 files changed, 158 insertions(+)
 create mode 100644 stubs/blk-by-qdev-id.c

diff --git a/blockdev.c b/blockdev.c
index ba7e90b06e..2190467022 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3559,6 +3559,62 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
     bdrv_try_change_aio_context(bs, new_context, NULL, errp);
 }

+void qmp_blockdev_replace(BlockdevReplace *repl, Error **errp)
+{
+    BdrvChild *child = NULL;
+    BlockDriverState *new_child_bs;
+
+    if (repl->parent_type == BLOCK_PARENT_TYPE_DRIVER) {
+        BlockDriverState *parent_bs;
+
+        parent_bs = bdrv_find_node(repl->u.driver.node_name);
+        if (!parent_bs) {
+            error_setg(errp, "Block driver node with node-name '%s' not "
+                       "found", repl->u.driver.node_name);
+            return;
+        }
+
+        child = bdrv_find_child(parent_bs, repl->u.driver.child);
+        if (!child) {
+            error_setg(errp, "Block driver node '%s' doesn't have child "
+                       "named '%s'", repl->u.driver.node_name,
+                       repl->u.driver.child);
+            return;
+        }
+    } else {
+        /* Other types are similar, they work through blk */
+        BlockBackend *blk;
+        bool is_qdev = repl->parent_type == BLOCK_PARENT_TYPE_QDEV;
+        const char *id =
+            is_qdev ? repl->u.qdev.qdev_id : repl->u.export.export_id;
+
+        assert(is_qdev || repl->parent_type == BLOCK_PARENT_TYPE_EXPORT);
+
+        blk = is_qdev ? blk_by_qdev_id(id, errp) : blk_by_export_id(id, errp);

blk_by_export_id() finds export @exp, and returns the associated block backend exp->blk. Fine. blk_by_qdev_id() finds the device, and then searches @block_backends for a blk with blk->dev == dev.
If a device has more than one block backend, you get the one that comes first in @block_backends. I figure that's the one created first. Interface issue: when a device has multiple block backends, only one of them can be replaced, and which one is kind of random. Do such devices exist?

Hmm, I expect they don't. If there is a problem, it's pre-existing: all callers of qmp_get_blk are affected. But honestly, I don't know. If no, could they exist? If yes, what should we do about it now? Maybe continue the loop in blk_by_dev() up to the end, and if two blks are found for the device, return an error? Or even abort? And add a check to blk_attach_dev() that we don't attach the same device to several blks.

+        if (!blk) {
+            return;
+        }
+
+        child = blk_root(blk);
+        if (!child) {
+            error_setg(errp, "%s '%s' is empty, nothing to replace",
+                       is_qdev ? "Device" : "Export", id);
+            return;
+        }
+    }
+
+    assert(child);
+    assert(child->bs);
+
+    new_child_bs = bdrv_find_node(repl->new_child);
+    if (!new_child_bs) {
+        error_setg(errp, "Node '%s' not found", repl->new_child);
+        return;
+    }
+
+    bdrv_replace_child_bs(child, new_child_bs, errp);
+}
+
 QemuOptsList qemu_common_drive_opts = {
     .name = "drive",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),

diff --git a/qapi/block-core.json b/qapi/block-core.json
index df5e07debd..0a6f08a6e0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -6148,3 +6148,91 @@
 ##
 { 'struct': 'DummyBlockCoreForceArrays',
   'data': { 'unused-block-graph-info': ['BlockGraphInfo'] } }
+
+##
+# @BlockParentType:
+#
+# @qdev: block device, such as created by device_add, and denoted by
+#     qdev-id
+#
+# @driver: block driver node, such as created by blockdev-add, and
+#     denoted by node-name

node-name and child? Hmm. I'd say that the parent is denoted only by node-name, as the parent is the node itself (the fact that we have a separate BdrvChild structure as a parent I'd keep as internal realization). But a parent may have several children, and the concrete child is denoted by @child.
+#
+# @export: block export, such as created by block-export-add, and
+#     denoted by export-id
+#
+# Since 9.1
+##

I'm kind of unhappy with this doc comment. I feel it makes sense only in the context of its use. Let me try to avoid that:

# @driver: the parent is a block node, the child is one of its
#     children.
#
# @export: the parent
Re: [PATCH v9 2/7] block/export: add blk_by_export_id()
On 18.07.24 14:48, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

We need it for further blockdev-replace functionality.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/export/export.c                       | 18 ++
 include/sysemu/block-backend-global-state.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index 6d51ae8ed7..57beae7982 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -355,3 +355,21 @@ BlockExportInfoList *qmp_query_block_exports(Error **errp)
     return head;
 }
+
+BlockBackend *blk_by_export_id(const char *id, Error **errp)
+{
+    BlockExport *exp;
+
+    exp = blk_exp_find(id);
+    if (exp == NULL) {
+        error_setg(errp, "Export '%s' not found", id);
+        return NULL;
+    }
+
+    if (!exp->blk) {
+        error_setg(errp, "Export '%s' is empty", id);

Can this happen? Hmm, looking at the code in blk_exp_add:

    assert(exp->blk != NULL);
    QLIST_INSERT_HEAD(&block_exports, exp, next);
    return exp;

seems not. And I can't find anything changing exp->blk except for blk_exp_add(). Will switch to assertion.

+        return NULL;
+    }
+
+    return exp->blk;
+}

diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index ccb35546a1..410d0cc5c7 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -74,6 +74,7 @@ void blk_detach_dev(BlockBackend *blk, DeviceState *dev);
 DeviceState *blk_get_attached_dev(BlockBackend *blk);
 BlockBackend *blk_by_dev(void *dev);
 BlockBackend *blk_by_qdev_id(const char *id, Error **errp);
+BlockBackend *blk_by_export_id(const char *id, Error **errp);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 void blk_activate(BlockBackend *blk, Error **errp);

-- Best regards, Vladimir
Re: [PATCH v2 1/3] block/commit: implement final flush
On 18.07.24 22:22, Kevin Wolf wrote: Am 26.06.2024 um 16:50 hat Vladimir Sementsov-Ogievskiy geschrieben: Actually block job is not completed without the final flush. It's rather unexpected to have broken target when job was successfully completed long ago and now we fail to flush or process just crashed/killed. Mirror job already has mirror_flush() for this. So, it's OK. Do this for commit job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/commit.c | 41 +++-- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/block/commit.c b/block/commit.c index 7c3fdcb0ca..81971692a2 100644 --- a/block/commit.c +++ b/block/commit.c @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp) int64_t n = 0; /* bytes */ QEMU_AUTO_VFREE void *buf = NULL; int64_t len, base_len; +bool need_final_flush = true; len = blk_co_getlength(s->top); if (len < 0) { @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp) buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE); -for (offset = 0; offset < len; offset += n) { -bool copy; +for (offset = 0; offset < len || need_final_flush; offset += n) { In general, the control flow would be nicer to read if the final flush weren't integrated into the loop, but just added after it. But I assume this is pretty much required for pausing the job during error handling in the final flush if you don't want to duplicate a lot of the logic into a second loop? Right, that's the reason. 
+        bool copy = false;
         bool error_in_source = true;

         /* Note that even when no rate limit is applied we need to yield
@@ -166,22 +167,34 @@
         if (job_is_cancelled(&s->common.job)) {
             break;
         }
-        /* Copy if allocated above the base */
-        ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
-                                        offset, COMMIT_BUFFER_SIZE, &n);
-        copy = (ret > 0);
-        trace_commit_one_iteration(s, offset, n, ret);
-        if (copy) {
-            assert(n < SIZE_MAX);
-            ret = blk_co_pread(s->top, offset, n, buf, 0);
-            if (ret >= 0) {
-                ret = blk_co_pwrite(s->base, offset, n, buf, 0);
-                if (ret < 0) {
-                    error_in_source = false;
+        if (offset < len) {
+            /* Copy if allocated above the base */
+            ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
+                                            offset, COMMIT_BUFFER_SIZE, &n);
+            copy = (ret > 0);
+            trace_commit_one_iteration(s, offset, n, ret);
+            if (copy) {
+                assert(n < SIZE_MAX);
+
+                ret = blk_co_pread(s->top, offset, n, buf, 0);
+                if (ret >= 0) {
+                    ret = blk_co_pwrite(s->base, offset, n, buf, 0);
+                    if (ret < 0) {
+                        error_in_source = false;
+                    }
                 }
             }
+        } else {
+            assert(need_final_flush);
+            ret = blk_co_flush(s->base);
+            if (ret < 0) {
+                error_in_source = false;
+            } else {
+                need_final_flush = false;
+            }

Should we set n = 0 in this block to avoid counting the last chunk twice for the progress?

Oops, right. Will fix, thanks.

         }

+        if (ret < 0) {
             BlockErrorAction action =
                 block_job_error_action(&s->common, s->on_error,

Kevin

-- Best regards, Vladimir
Re: [PATCH v5 3/3] qapi: introduce device-sync-config
On 18.07.24 11:27, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed, did not trigger an interrupt to the guest, or is just not available (not supported by the vhost-user server). Command result is racy if we allow it during migration. Let's allow the sync only in RUNNING state.

Is this still accurate? The runstate_is_running() check is gone in v4, the migration_is_running() check remains.

Right, better to fix the commit message like: "Command result is racy if we allow it during migration. Let's not allow it."

Signed-off-by: Vladimir Sementsov-Ogievskiy

QAPI schema and QMP part: Signed-off-by: Markus Armbruster

-- Best regards, Vladimir
Re: [PATCH v5 0/3] vhost-user-blk: live resize additional APIs
On 18.07.24 11:31, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes:

ping. Markus, Eric, could someone give an ACK for the QAPI part?

I apologize for the delay. It was pretty bad.

No problem, I myself make worse delays now when busy with other work. Thanks for reviewing!

-- Best regards, Vladimir
Re: [PATCH v2] block/reqlist: allow adding overlapping requests
On 12.07.24 17:07, Fiona Ebner wrote: Allow overlapping request by removing the assert that made it impossible. There are only two callers: 1. block_copy_task_create() It already asserts the very same condition before calling reqlist_init_req(). 2. cbw_snapshot_read_lock() There is no need to have read requests be non-overlapping in copy-before-write when used for snapshot-access. In fact, there was no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges and this could lead to an assertion failure [1]. In particular, with the reproducer script below [0], two cbw_co_snapshot_block_status() callers could race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request. [0]: #!/bin/bash -e dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 ./qemu-img create /tmp/fleecing.raw -f raw 1G ( ./qemu-system-x86_64 --qmp stdio \ --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ < [1]: #5 0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 #6 0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 #7 0x6152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 #8 0x6152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 #9 0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 #10 0x61528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 #11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 #12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 #13 0x6152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 #14 0x6152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 #15 0x61528538da6c in blockstatus_to_extents (...) 
at ../nbd/server.c:2374 #16 0x61528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 #17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 #18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121 #19 0x6152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 Cc: qemu-sta...@nongnu.org Suggested-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Fiona Ebner --- Changes in v2: * different approach, allowing overlapping requests for copy-before-write rather than waiting for them. block-copy already asserts there are no conflicts before adding a request. block/copy-before-write.c | 3 ++- block/reqlist.c | 2 -- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 853e01a1eb..28f6a096cd 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -66,7 +66,8 @@ typedef struct BDRVCopyBeforeWriteState { /* * @frozen_read_reqs: current read requests for fleecing user in bs->file - * node. These areas must not be rewritten by guest. + * node. These areas must not be rewritten by guest. There can be multiple + * overlapping read requests. */ BlockReqList frozen_read_reqs; diff --git a/block/reqlist.c b/block/reqlist.c index 08cb57cfa4..098e807378 100644 --- a/block/reqlist.c +++ b/block/reqlist.c @@ -20,8 +20,6 @@ void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset, int64_t bytes) { -assert(!reqlist_find_conflict(reqs, offset, bytes)); - *req = (BlockReq) { .offset = offset, .bytes = bytes, Reviewed-by: Vladimir Sementsov-Ogievskiy Thanks, applied to my block branch -- Best regards, Vladimir
Re: [PATCH v3 0/2] backup: allow specifying minimum cluster size
On 11.07.24 15:09, Fiona Ebner wrote: Discussion for v2: https://lore.kernel.org/qemu-devel/20240528120114.344416-1-f.eb...@proxmox.com/ Changes in v3: * Pass min_cluster_size option directly without checking has_min_cluster_size, because the default is 0 anyways. * Calculate maximum of passed-in argument and default once at the beginning of block_copy_calculate_cluster_size() * Update warning message to reflect actual value used * Do not leak qdict in error case * Use PRI{i,u}64 macros Discussion for v1: https://lore.kernel.org/qemu-devel/20240308155158.830258-1-f.eb...@proxmox.com/ - Changes in v2: * Use 'size' type in QAPI. * Remove option in cbw_parse_options(), i.e. before parsing generic blockdev options. * Reword commit messages hoping to describe the issue in a more straight-forward way. In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. Fiona Ebner (2): copy-before-write: allow specifying minimum cluster size backup: add minimum cluster size to performance options block/backup.c | 2 +- block/block-copy.c | 36 ++-- block/copy-before-write.c | 14 +- block/copy-before-write.h | 1 + blockdev.c | 3 +++ include/block/block-copy.h | 1 + qapi/block-core.json | 17 ++--- 7 files changed, 59 insertions(+), 15 deletions(-) Thanks, applied to my block branch. -- Best regards, Vladimir
Re: [PATCH v3 2/2] backup: add minimum cluster size to performance options
On 11.07.24 15:09, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. Suggested-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster (QAPI schema) Signed-off-by: Fiona Ebner Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH v3 1/2] copy-before-write: allow specifying minimum cluster size
On 11.07.24 15:09, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. The type 'size' (corresponding to uint64_t in C) is used in QAPI to rule out negative inputs and for consistency with already existing @cluster-size parameters. Since block_copy_calculate_cluster_size() uses int64_t for its result, a check that the input is not too large is added in block_copy_state_new() before calling it. The calculation in block_copy_calculate_cluster_size() is done in the target int64_t type. Suggested-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster (QAPI schema) Signed-off-by: Fiona Ebner Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH] block/copy-before-write: wait for conflicts when read locking to avoid assertion failure
On 11.07.24 16:36, Fiona Ebner wrote: There is no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges, and reqlist_init_req() asserts that there are no conflicting requests. In particular, two cbw_co_snapshot_block_status() callers can race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request, leading to an assertion failure. Reproducer script [0] and backtrace [1] are attached below. Understand. But seems in case of CBW read-lock, nothing bad in intersecting read requests? reqlist is shared with backup, where we care to avoid intersecting requests in the list. What about just move the assertion to block_copy_task_create() ? And add comment somewhere that we support intersecting reads in frozen_read_reqs. [0]: #!/bin/bash -e dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 ./qemu-img create /tmp/fleecing.raw -f raw 1G ( ./qemu-system-x86_64 --qmp stdio \ --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ < [1]: #5 0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 #6 0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 #7 0x6152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 #8 0x6152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 #9 0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 #10 0x61528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 #11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 #12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 #13 0x6152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 #14 0x6152853d9a86 in blk_co_block_status_above (...) 
at ../block/block-backend.c:1473
#15 0x61528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
#16 0x61528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
#17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
#18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121
#19 0x6152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175

Signed-off-by: Fiona Ebner
---
 block/copy-before-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 853e01a1eb..376ff3f3e1 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -234,6 +234,7 @@ cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset, int64_t bytes,
         *req = (BlockReq) {.offset = -1, .bytes = -1};
         *file = s->target;
     } else {
+        reqlist_wait_all(&s->frozen_read_reqs, offset, bytes, &s->lock);
         reqlist_init_req(&s->frozen_read_reqs, req, offset, bytes);
         *file = bs->file;
     }

-- Best regards, Vladimir
Re: [PATCH 1/2] block: zero data data corruption using prealloc-filter
On 11.07.24 16:32, Andrey Drobyshev wrote: From: "Denis V. Lunev" We have observed that some clusters in the QCOW2 files are zeroed while preallocation filter is used. We are able to trace down the following sequence when prealloc-filter is used: co=0x55e7cbed7680 qcow2_co_pwritev_task() co=0x55e7cbed7680 preallocate_co_pwritev_part() co=0x55e7cbed7680 handle_write() co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes() co=0x55e7cbed7680 raw_do_pwrite_zeroes() co=0x7f9edb7fe500 do_fallocate() Here coroutine 0x55e7cbed7680 is being blocked waiting while coroutine 0x7f9edb7fe500 will finish with fallocate of the file area. OK. It is time to handle next coroutine, which co=0x55e7cbee91b0 qcow2_co_pwritev_task() co=0x55e7cbee91b0 preallocate_co_pwritev_part() co=0x55e7cbee91b0 handle_write() co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes() co=0x55e7cbee91b0 raw_do_pwrite_zeroes() co=0x7f9edb7deb00 do_fallocate() The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced file_end yet and coroutine 0x55e7cbee91b0 will start fallocate() for the same area. This means that if (once fallocate is started inside 0x7f9edb7deb00) original fallocate could end and the real write will be executed. In that case write() request is handled at the same time as fallocate(). Normally we should protect s->file_end while it is detected that preallocation is need. The patch introduces file_end_lock for it to be protected when run in the coroutine context. Note: the lock is taken only once it is detected that the preallocation is really required. This is not a frequent case due to the preallocation nature thus the patch should not have performance impact. Originally-by: Denis V. Lunev Co-authored-by: Andrey Drobyshev Signed-off-by: Denis V. 
Lunev
Signed-off-by: Andrey Drobyshev
---
 block/preallocate.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/preallocate.c b/block/preallocate.c
index d215bc5d6d..9cb2c97635 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -78,6 +78,8 @@ typedef struct BDRVPreallocateState {
     /* Gives up the resize permission on children when parents don't need it */
     QEMUBH *drop_resize_bh;
+
+    CoMutex file_end_lock;
 } BDRVPreallocateState;

 static int preallocate_drop_resize(BlockDriverState *bs, Error **errp);
@@ -170,6 +172,8 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
          bs->file->bs->supported_zero_flags);

+    qemu_co_mutex_init(&s->file_end_lock);
+
     return 0;
 }
@@ -342,6 +346,7 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
         return false;
     }

+    QEMU_LOCK_GUARD(&s->file_end_lock);
     if (s->file_end < 0) {
         s->file_end = s->data_end;
     }
@@ -353,6 +358,8 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,

     /* We have valid s->data_end, and request writes beyond it. */

+    QEMU_LOCK_GUARD(&s->file_end_lock);
+
     s->data_end = end;
     if (s->zero_start < 0 || !want_merge_zero) {
         s->zero_start = end;
@@ -428,6 +435,8 @@ preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
     BDRVPreallocateState *s = bs->opaque;
     int ret;

+    QEMU_LOCK_GUARD(&s->file_end_lock);
+
     if (s->data_end >= 0 && offset > s->data_end) {
         if (s->file_end < 0) {
             s->file_end = bdrv_co_getlength(bs->file->bs);
@@ -501,6 +510,8 @@ preallocate_co_getlength(BlockDriverState *bs)
         return s->data_end;
     }

+    QEMU_LOCK_GUARD(&s->file_end_lock);
+
     ret = bdrv_co_getlength(bs->file->bs);
     if (has_prealloc_perms(bs)) {

Hmm, seems the preallocate driver is neither thread-safe nor coroutine-safe. I think a co-mutex is a good idea.
Still, protecting only s->file_end may not be enough - we want to keep the data_end / zero_start / file_end variables correct - so probably just make the whole preallocation code a critical section? Maybe with an atomic read of the variables as a fast path (for writes which don't need preallocation). -- Best regards, Vladimir
Re: [PATCH v2 1/2] block: zero data data corruption using prealloc-filter
On 12.07.24 12:46, Andrey Drobyshev wrote: From: "Denis V. Lunev"

We have observed that some clusters in the QCOW2 files are zeroed while the preallocation filter is used. We are able to trace down the following sequence when prealloc-filter is used: co=0x55e7cbed7680 qcow2_co_pwritev_task() co=0x55e7cbed7680 preallocate_co_pwritev_part() co=0x55e7cbed7680 handle_write() co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes() co=0x55e7cbed7680 raw_do_pwrite_zeroes() co=0x7f9edb7fe500 do_fallocate() Here coroutine 0x55e7cbed7680 is being blocked waiting while coroutine 0x7f9edb7fe500 will finish with fallocate of the file area. OK. It is time to handle the next coroutine, which co=0x55e7cbee91b0 qcow2_co_pwritev_task() co=0x55e7cbee91b0 preallocate_co_pwritev_part() co=0x55e7cbee91b0 handle_write() co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes() co=0x55e7cbee91b0 raw_do_pwrite_zeroes() co=0x7f9edb7deb00 do_fallocate() The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced file_end yet and coroutine 0x55e7cbee91b0 will start fallocate() for the same area. This means that if (once fallocate is started inside 0x7f9edb7deb00) the original fallocate could end and the real write will be executed. In that case the write() request is handled at the same time as fallocate().

The patch moves s->file_lock assignment before fallocate and that is crucial.

text needs to be updated: it is s->file_end that is assigned

The idea is that all subsequent requests into the area being preallocated will be issued as just writes without fallocate to this area and they will not proceed thanks to the overlapping requests mechanics. If preallocation fails, we will just switch to the normal expand-by-write behavior and that is not a problem except for performance.

Signed-off-by: Denis V.
Lunev Tested-by: Andrey Drobyshev --- block/preallocate.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/block/preallocate.c b/block/preallocate.c index d215bc5d6d..ecf0aa4baa 100644 --- a/block/preallocate.c +++ b/block/preallocate.c @@ -383,6 +383,13 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes, want_merge_zero = want_merge_zero && (prealloc_start <= offset); +/* + * Assign file_end before making actual preallocation. This will ensure + * that next request performed while preallocation is in progress will + * be passed without preallocation. + */ +s->file_end = prealloc_end; + ret = bdrv_co_pwrite_zeroes( bs->file, prealloc_start, prealloc_end - prealloc_start, BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT); @@ -391,7 +398,6 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes, return false; } -s->file_end = prealloc_end; return want_merge_zero; } Hmm. But this way we set both s->file_end and s->zero_start prior to actual write_zero operation. This means that next write-zero operation may go fast-path (see preallocate_co_pwrite_zeroes()) and return success, even before actual finish of preallocation write_zeroes operation (which may also fail). Seems we need to update logic around s->zero_start too. -- Best regards, Vladimir
Re: [PATCH v5 0/3] vhost-user-blk: live resize additional APIs
ping. Markus, Eric, could someone give an ACK for the QAPI part?

On 25.06.24 15:18, Vladimir Sementsov-Ogievskiy wrote:

v5: 03: drop extra check on whether runstate is running

Vladimir Sementsov-Ogievskiy (3):
  qdev-monitor: add option to report GenericError from find_device_state
  vhost-user-blk: split vhost_user_blk_sync_config()
  qapi: introduce device-sync-config

 hw/block/vhost-user-blk.c | 27 ++--
 hw/virtio/virtio-pci.c    |  9 +++
 include/hw/qdev-core.h    |  3 +++
 qapi/qdev.json            | 24 ++
 system/qdev-monitor.c     | 53 ---
 5 files changed, 105 insertions(+), 11 deletions(-)

-- Best regards, Vladimir
Re: [PATCH] block/curl: rewrite http header parsing function
On 01.07.24 09:55, Michael Tokarev wrote: 01.07.2024 09:54, Vladimir Sementsov-Ogievskiy wrote:

+ const char *t = "accept-ranges : bytes "; /* A lowercase template */

Note: you make the parser less strict: you allow "bytes" to be uppercase (case-insensitivity was allowed only for "accept-ranges"), and you allow whitespace before the colon.

Yes, exactly. I should add this to the description (wanted to do that but forgot). I'll update the patch (without re-sending) - hopefully it's okay to keep your S-o-b :)

Of course!

-- Best regards, Vladimir
Re: [PATCH] block/curl: rewrite http header parsing function
On 29.06.24 17:25, Michael Tokarev wrote:

Existing code was long, unclear and twisty.

Signed-off-by: Michael Tokarev
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 block/curl.c | 44 ++--
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 419f7c89ef..9802d0319d 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -210,37 +210,29 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 {
     BDRVCURLState *s = opaque;
     size_t realsize = size * nmemb;
-    const char *header = (char *)ptr;
-    const char *end = header + realsize;
-    const char *accept_ranges = "accept-ranges:";
-    const char *bytes = "bytes";
+    const char *p = ptr;
+    const char *end = p + realsize;
+    const char *t = "accept-ranges : bytes "; /* A lowercase template */

Note: you make the parser less strict: you allow "bytes" to be uppercase (case-insensitivity was allowed only for "accept-ranges"), and you allow whitespace before the colon.

-    if (realsize >= strlen(accept_ranges)
-        && g_ascii_strncasecmp(header, accept_ranges,
-                               strlen(accept_ranges)) == 0) {
-
-        char *p = strchr(header, ':') + 1;
-
-        /* Skip whitespace between the header name and value. */
-        while (p < end && *p && g_ascii_isspace(*p)) {
-            p++;
-        }
-
-        if (end - p >= strlen(bytes)
-            && strncmp(p, bytes, strlen(bytes)) == 0) {
-
-            /* Check that there is nothing but whitespace after the value. */
-            p += strlen(bytes);
-            while (p < end && *p && g_ascii_isspace(*p)) {
-                p++;
-            }
-
-            if (p == end || !*p) {
-                s->accept_range = true;
+    /* check if header matches the "t" template */
+    for (;;) {
+        if (*t == ' ') { /* space in t matches any amount of isspace in p */
+            if (p < end && g_ascii_isspace(*p)) {
+                ++p;
+            } else {
+                ++t;
+            }
+        } else if (*t && p < end && *t == g_ascii_tolower(*p)) {
+            ++p, ++t;
+        } else {
+            break;
         }
     }
+    if (!*t && p == end) { /* if we managed to reach ends of both strings */
+        s->accept_range = true;
+    }
+
     return realsize;
 }

-- Best regards, Vladimir
Re: [PATCH] block/curl: use strlen instead of strchr
On 01.07.24 09:34, Vladimir Sementsov-Ogievskiy wrote: On 29.06.24 09:20, Michael Tokarev wrote: On 6/28/24 08:49, Vladimir Sementsov-Ogievskiy wrote:

We already know where the colon is, so there is no reason to search for it. Also, avoid code which looks like we forgot to check the return value of strchr() for NULL.

Suggested-by: Kevin Wolf
Signed-off-by: Vladimir Sementsov-Ogievskiy
---
This replaces my patch "[PATCH] block/curl: explicitly assert that strchr returns non-NULL value"
Supersedes: <20240627153059.589070-1-vsement...@yandex-team.ru>

 block/curl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 419f7c89ef..d03bfe4817 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -219,7 +219,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
         && g_ascii_strncasecmp(header, accept_ranges,
                                strlen(accept_ranges)) == 0) {

-        char *p = strchr(header, ':') + 1;
+        char *p = header + strlen(accept_ranges);

         /* Skip whitespace between the header name and value. */
         while (p < end && *p && g_ascii_isspace(*p)) {

Heck. All these strlen()s look ugly, especially in the loop iterations..

I expect that strlen of a string constant is calculated at compile time. My aim was to fix a Coverity complaint (I don't see this complaint in the public qemu Coverity project, that's why I don't specify a CID in the commit message), not to rewrite the whole function. So I'd prefer Kevin's suggestion, which is both minimal and makes the code obviously correct. The only simpler thing is to mark the problem false-positive in Coverity.

Upd: I missed that you sent a patch, this changes things. Of course, your code looks nicer than the old one.

-- Best regards, Vladimir
Re: [PATCH] block/curl: use strlen instead of strchr
On 29.06.24 09:20, Michael Tokarev wrote: On 6/28/24 08:49, Vladimir Sementsov-Ogievskiy wrote: We already know where the colon is, so there is no reason to search for it. Also, avoid code which looks like we forgot to check the return value of strchr() for NULL. Suggested-by: Kevin Wolf Signed-off-by: Vladimir Sementsov-Ogievskiy --- This replaces my patch [PATCH] block/curl: explicitly assert that strchr returns non-NULL value Supersedes: <20240627153059.589070-1-vsement...@yandex-team.ru> block/curl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..d03bfe4817 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { - char *p = strchr(header, ':') + 1; + char *p = header + strlen(accept_ranges); /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { Heck. All these strlen()s look ugly, especially in the loop iterations... I expect that strlen() of a string constant is computed at compile time. My aim was to fix a Coverity complaint (I don't see this complaint in the public QEMU Coverity project, which is why I don't specify a CID in the commit message), not to rewrite the whole function. So I'd prefer Kevin's suggestion, which is both minimal and makes the code obviously correct. The only simpler thing is to mark the problem as a false positive in Coverity. -- Best regards, Vladimir
[PATCH] block/curl: use strlen instead of strchr
We already know where the colon is, so there is no reason to search for it. Also, avoid code which looks like we forgot to check the return value of strchr() for NULL. Suggested-by: Kevin Wolf Signed-off-by: Vladimir Sementsov-Ogievskiy --- This replaces my patch [PATCH] block/curl: explicitly assert that strchr returns non-NULL value Supersedes: <20240627153059.589070-1-vsement...@yandex-team.ru> block/curl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..d03bfe4817 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { -char *p = strchr(header, ':') + 1; +char *p = header + strlen(accept_ranges); /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { -- 2.34.1
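Kevin's suggested form works because a successful case-insensitive prefix match already fixes the position of the colon: the value starts exactly strlen(prefix) bytes into the header. A minimal standalone sketch of that equivalence (the names are illustrative, not QEMU's):

```c
#include <assert.h>
#include <string.h>

static const char demo_header[] = "Accept-Ranges: bytes";

/*
 * Precondition: the caller has already verified that header starts with
 * prefix (case-insensitively), and prefix ends with ':'.  Then the value
 * begins right after the prefix -- no strchr() and no NULL check needed.
 */
static const char *value_start(const char *header, const char *prefix)
{
    return header + strlen(prefix);
}
```

For a string constant, strlen() is typically folded at compile time by GCC/Clang, so this is also no slower than remembering the colon position.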
Re: [PATCH] block/curl: explicitly assert that strchr returns non-NULL value
On 27.06.24 21:05, Kevin Wolf wrote: Am 27.06.2024 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben: strchr may return NULL if colon is not found. It seems clearer to assert explicitly that we don't expect it here, than dereference 1 in the next line. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/curl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..ccfffd6c12 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,9 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { -char *p = strchr(header, ':') + 1; +char *p = strchr(header, ':'); +assert(p != NULL); +p += 1; I'm not sure if this is actually much clearer because it doesn't say why we don't expect NULL here. If you don't look at the context, it almost looks like an assert() where proper error handling is needed. If you do, then the original line is clear enough. My first thought was that maybe what we want is a comment, but we actually already know where the colon is. So how about this instead: char *p = header + strlen(accept_ranges); Oh, right. That's better. /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { -- 2.34.1 -- Best regards, Vladimir
[PATCH] hw/core/loader: gunzip(): fix memory leak on error path
We should call inflateEnd(), just as on the success path, to clean up the state in the s variable. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/core/loader.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/core/loader.c b/hw/core/loader.c index 2f8105d7de..a3bea1e718 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -610,6 +610,7 @@ ssize_t gunzip(void *dst, size_t dstlen, uint8_t *src, size_t srclen) r = inflate(&s, Z_FINISH); if (r != Z_OK && r != Z_STREAM_END) { printf ("Error: inflate() returned %d\n", r); +inflateEnd(&s); return -1; } dstbytes = s.next_out - (unsigned char *) dst; -- 2.34.1
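The bug class here is a cleanup call missing on one exit path: once inflateInit() has succeeded, every way out of the function must run inflateEnd(). A self-contained sketch with zlib modelled by stubs (so it compiles without -lz; the fake_* names mirror the real API shape but are not it):

```c
#include <assert.h>

typedef struct { int live; } fake_z_stream;

static int fake_inflateInit(fake_z_stream *s) { s->live = 1; return 0; }
static int fake_inflate(fake_z_stream *s, int fail) { (void)s; return fail ? -1 : 0; }
static void fake_inflateEnd(fake_z_stream *s) { s->live = 0; }

/* returns 0/-1 like gunzip(); *leaked reports whether decompressor
 * state survived the call -- the leak the patch fixes */
static int gunzip_sketch(int make_inflate_fail, int *leaked)
{
    fake_z_stream s;
    int ret = 0;

    fake_inflateInit(&s);
    if (fake_inflate(&s, make_inflate_fail) != 0) {
        fake_inflateEnd(&s);     /* the call added by the patch */
        ret = -1;
        goto out;
    }
    fake_inflateEnd(&s);         /* the success path already had this */
out:
    *leaked = s.live;
    return ret;
}

/* small wrappers for checking both outcomes */
static int result_of(int fail) { int l; return gunzip_sketch(fail, &l); }
static int leaked_after(int fail) { int l; gunzip_sketch(fail, &l); return l; }
```

An alternative fix with the same effect is a single exit point: `goto out` on error and one inflateEnd() before the return.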
[PATCH] block/curl: explicitly assert that strchr returns non-NULL value
strchr() may return NULL if the colon is not found. It seems clearer to assert explicitly that we don't expect it here than to add 1 to a possibly NULL pointer in the next line. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/curl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/curl.c b/block/curl.c index 419f7c89ef..ccfffd6c12 100644 --- a/block/curl.c +++ b/block/curl.c @@ -219,7 +219,9 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) && g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { -char *p = strchr(header, ':') + 1; +char *p = strchr(header, ':'); +assert(p != NULL); +p += 1; /* Skip whitespace between the header name and value. */ while (p < end && *p && g_ascii_isspace(*p)) { -- 2.34.1
[PATCH v2 3/3] block/backup: implement final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Do this for the backup job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/backup.c | 2 +- block/block-copy.c | 7 +++ include/block/block-copy.h | 1 + 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/block/backup.c b/block/backup.c index 3dd2e229d2..fee78ba5ad 100644 --- a/block/backup.c +++ b/block/backup.c @@ -156,7 +156,7 @@ static int coroutine_fn backup_loop(BackupBlockJob *job) job->bg_bcs_call = s = block_copy_async(job->bcs, 0, QEMU_ALIGN_UP(job->len, job->cluster_size), job->perf.max_workers, job->perf.max_chunk, -backup_block_copy_callback, job); +true, backup_block_copy_callback, job); while (!block_copy_call_finished(s) && !job_is_cancelled(&job->common.job)) diff --git a/block/block-copy.c b/block/block-copy.c index 7e3b378528..842b0383db 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -54,6 +54,7 @@ typedef struct BlockCopyCallState { int max_workers; int64_t max_chunk; bool ignore_ratelimit; +bool need_final_flush; BlockCopyAsyncCallbackFunc cb; void *cb_opaque; /* Coroutine where async block-copy is running */ @@ -899,6 +900,10 @@ block_copy_common(BlockCopyCallState *call_state) */ } while (ret > 0 && !qatomic_read(&call_state->cancelled)); +if (ret == 0 && call_state->need_final_flush) { +ret = bdrv_co_flush(s->target->bs); +} + qatomic_store_release(&call_state->finished, true); if (call_state->cb) { @@ -954,6 +959,7 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes, BlockCopyCallState *block_copy_async(BlockCopyState *s, int64_t offset, int64_t bytes, int max_workers, int64_t max_chunk, + bool need_final_flush, BlockCopyAsyncCallbackFunc cb, void *cb_opaque) { @@ -965,6 +971,7 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s, 
.bytes = bytes, .max_workers = max_workers, .max_chunk = max_chunk, +.need_final_flush = need_final_flush, .cb = cb, .cb_opaque = cb_opaque, diff --git a/include/block/block-copy.h b/include/block/block-copy.h index bdc703bacd..6588ebaf77 100644 --- a/include/block/block-copy.h +++ b/include/block/block-copy.h @@ -62,6 +62,7 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes, BlockCopyCallState *block_copy_async(BlockCopyState *s, int64_t offset, int64_t bytes, int max_workers, int64_t max_chunk, + bool need_final_flush, BlockCopyAsyncCallbackFunc cb, void *cb_opaque); -- 2.34.1
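The core of the change in block_copy_common() is small enough to state as a pure function: the final flush runs only when the copy itself succeeded and the caller asked for it, and a flush failure becomes the job's result. A stub model with no real I/O (names are mine, not QEMU's):

```c
#include <assert.h>

/*
 * copy_ret and flush_ret stand in for the results of the copy loop and
 * of bdrv_co_flush(); need_final_flush mirrors the new field in
 * BlockCopyCallState.
 */
static int block_copy_common_sketch(int copy_ret, int flush_ret,
                                    int need_final_flush)
{
    int ret = copy_ret;                 /* the existing copy loop */

    if (ret == 0 && need_final_flush) {
        ret = flush_ret;                /* the flush this patch adds */
    }
    return ret;
}
```

Keying the flush on `ret == 0` means a failed or cancelled copy is never "upgraded" to success, and a successful copy with a failing flush is correctly reported as a failed job.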
[PATCH v2 0/3] block-jobs: add final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Add similar behavior for the other jobs: backup, stream, commit. v2: rework stream and commit, also split into 3 commits. Vladimir Sementsov-Ogievskiy (3): block/commit: implement final flush block/stream: implement final flush block/backup: implement final flush block/backup.c | 2 +- block/block-copy.c | 7 block/commit.c | 41 +++ block/stream.c | 67 +++--- include/block/block-copy.h | 1 + 5 files changed, 77 insertions(+), 41 deletions(-) -- 2.34.1
[PATCH v2 2/3] block/stream: implement final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Do this for the stream job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/stream.c | 67 ++ 1 file changed, 41 insertions(+), 26 deletions(-) diff --git a/block/stream.c b/block/stream.c index 7031eef12b..893db258d4 100644 --- a/block/stream.c +++ b/block/stream.c @@ -160,6 +160,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp) int64_t offset = 0; int error = 0; int64_t n = 0; /* bytes */ +bool need_final_flush = true; WITH_GRAPH_RDLOCK_GUARD() { unfiltered_bs = bdrv_skip_filters(s->target_bs); @@ -175,10 +176,13 @@ static int coroutine_fn stream_run(Job *job, Error **errp) } job_progress_set_remaining(&s->common.job, len); -for ( ; offset < len; offset += n) { -bool copy; +for ( ; offset < len || need_final_flush; offset += n) { +bool copy = false; +bool error_is_read = true; int ret; +n = 0; + /* Note that even when no rate limit is applied we need to yield * with no pending I/O here so that bdrv_drain_all() returns. */ @@ -187,35 +191,46 @@ static int coroutine_fn stream_run(Job *job, Error **errp) break; } -copy = false; - -WITH_GRAPH_RDLOCK_GUARD() { -ret = bdrv_co_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, &n); -if (ret == 1) { -/* Allocated in the top, no need to copy. */ -} else if (ret >= 0) { -/* - * Copy if allocated in the intermediate images. Limit to the - * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE). 
- */ -ret = bdrv_co_is_allocated_above(bdrv_cow_bs(unfiltered_bs), - s->base_overlay, true, - offset, n, &n); -/* Finish early if end of backing file has been reached */ -if (ret == 0 && n == 0) { -n = len - offset; +if (offset < len) { +WITH_GRAPH_RDLOCK_GUARD() { +ret = bdrv_co_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, + &n); +if (ret == 1) { +/* Allocated in the top, no need to copy. */ +} else if (ret >= 0) { +/* + * Copy if allocated in the intermediate images. Limit to + * the known-unallocated area + * [offset, offset+n*BDRV_SECTOR_SIZE). + */ +ret = bdrv_co_is_allocated_above(bdrv_cow_bs(unfiltered_bs), + s->base_overlay, true, + offset, n, &n); +/* Finish early if end of backing file has been reached */ +if (ret == 0 && n == 0) { +n = len - offset; +} + +copy = (ret > 0); } - -copy = (ret > 0); } -} -trace_stream_one_iteration(s, offset, n, ret); -if (copy) { -ret = stream_populate(s->blk, offset, n); +trace_stream_one_iteration(s, offset, n, ret); +if (copy) { +ret = stream_populate(s->blk, offset, n); +} +} else { +assert(need_final_flush); +ret = blk_co_flush(s->blk); +if (ret < 0) { +error_is_read = false; +} else { +need_final_flush = false; +} } if (ret < 0) { BlockErrorAction action = -block_job_error_action(&s->common, s->on_error, true, -ret); +block_job_error_action(&s->common, s->on_error, + error_is_read, -ret); if (action == BLOCK_ERROR_ACTION_STOP) { n = 0; continue; -- 2.34.1
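The restructuring pattern in this patch (and in the commit one) is: instead of flushing after the loop, let the loop run extra iterations with offset == len and n == 0 that perform the final flush, so the job's existing on-error/retry logic also covers flush failures. A standalone model with all I/O stubbed out (names are mine, not QEMU's):

```c
#include <assert.h>

static int co_copy(long offset, long n) { (void)offset; (void)n; return 0; }

/* returns the number of flush attempts it took to finish, -1 on copy error */
static int stream_like_run(long len, long chunk, int flush_failures)
{
    long offset = 0, n;
    int need_final_flush = 1;
    int flush_attempts = 0;

    for (; offset < len || need_final_flush; offset += n) {
        n = 0;
        if (offset < len) {
            /* ordinary copy iteration */
            n = chunk < len - offset ? chunk : len - offset;
            if (co_copy(offset, n) < 0) {
                return -1;
            }
        } else {
            /* the extra "flush iteration" the patch adds */
            flush_attempts++;
            if (flush_failures > 0) {
                flush_failures--;   /* models BLOCK_ERROR_ACTION_STOP + retry */
                continue;           /* n == 0, so offset stays at len */
            }
            need_final_flush = 0;   /* flush done; loop condition goes false */
        }
    }
    return flush_attempts;
}
```

Setting n = 0 at the top of each iteration is what makes the retry `continue` safe: offset no longer advances by the previous chunk size on the flush path.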
[PATCH v2 1/3] block/commit: implement final flush
Actually, a block job is not complete without the final flush. It's rather unexpected to have a broken target when the job completed successfully long ago and we fail to flush only now, or the process just crashed/was killed. The mirror job already has mirror_flush() for this, so it's OK there. Do this for the commit job too. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/commit.c | 41 +++-- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/block/commit.c b/block/commit.c index 7c3fdcb0ca..81971692a2 100644 --- a/block/commit.c +++ b/block/commit.c @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp) int64_t n = 0; /* bytes */ QEMU_AUTO_VFREE void *buf = NULL; int64_t len, base_len; +bool need_final_flush = true; len = blk_co_getlength(s->top); if (len < 0) { @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp) buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE); -for (offset = 0; offset < len; offset += n) { -bool copy; +for (offset = 0; offset < len || need_final_flush; offset += n) { +bool copy = false; bool error_in_source = true; /* Note that even when no rate limit is applied we need to yield @@ -166,22 +167,34 @@ static int coroutine_fn commit_run(Job *job, Error **errp) if (job_is_cancelled(&s->common.job)) { break; } -/* Copy if allocated above the base */ -ret = blk_co_is_allocated_above(s->top, s->base_overlay, true, -offset, COMMIT_BUFFER_SIZE, &n); -copy = (ret > 0); -trace_commit_one_iteration(s, offset, n, ret); -if (copy) { -assert(n < SIZE_MAX); -ret = blk_co_pread(s->top, offset, n, buf, 0); -if (ret >= 0) { -ret = blk_co_pwrite(s->base, offset, n, buf, 0); -if (ret < 0) { -error_in_source = false; +if (offset < len) { +/* Copy if allocated above the base */ +ret = blk_co_is_allocated_above(s->top, s->base_overlay, true, +offset, COMMIT_BUFFER_SIZE, &n); +copy = (ret > 0); +trace_commit_one_iteration(s, offset, n, ret); +if (copy) { +assert(n < SIZE_MAX); + +ret = blk_co_pread(s->top, offset, n, buf, 0); +if (ret >= 
0) { +ret = blk_co_pwrite(s->base, offset, n, buf, 0); +if (ret < 0) { +error_in_source = false; +} } } +} else { +assert(need_final_flush); +ret = blk_co_flush(s->base); +if (ret < 0) { +error_in_source = false; +} else { +need_final_flush = false; +} } + if (ret < 0) { BlockErrorAction action = block_job_error_action(&s->common, s->on_error, -- 2.34.1
[PATCH 2/3] vl.c: select_machine(): use g_autoptr
Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/vl.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/system/vl.c b/system/vl.c index fa81037ce2..947b433905 100644 --- a/system/vl.c +++ b/system/vl.c @@ -1667,7 +1667,7 @@ static MachineClass *select_machine(QDict *qdict, Error **errp) { ERRP_GUARD(); const char *machine_type = qdict_get_try_str(qdict, "type"); -GSList *machines = object_class_get_list(TYPE_MACHINE, false); +g_autoptr(GSList) machines = object_class_get_list(TYPE_MACHINE, false); MachineClass *machine_class = NULL; if (machine_type) { @@ -1683,7 +1683,6 @@ static MachineClass *select_machine(QDict *qdict, Error **errp) } } -g_slist_free(machines); if (!machine_class) { error_append_hint(errp, "Use -machine help to list supported machines\n"); -- 2.34.1
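On GCC and Clang, g_autoptr() ultimately expands to a variable with __attribute__((cleanup(...))), which runs a function on every scope exit, including early returns; that is why the explicit g_slist_free() call can simply be dropped. A minimal stand-in sketch of the mechanism (not the real glib macro, and the names are mine):

```c
#include <assert.h>

static int cleanup_runs;

/* called automatically with a pointer to the annotated variable;
 * stands in for the g_slist_free() that g_autoptr(GSList) arranges */
static void cleanup_fn(void **p)
{
    if (*p) {
        cleanup_runs++;
    }
}

static int select_machine_sketch(int found)
{
    /* stands in for: g_autoptr(GSList) machines = object_class_get_list(...) */
    __attribute__((cleanup(cleanup_fn))) void *machines = &cleanup_runs;
    (void)machines;

    if (!found) {
        return -1;          /* early exit: cleanup_fn still runs */
    }
    return 0;               /* normal exit: cleanup_fn runs here too */
}

/* helper for the checks below: cleanups triggered by one call */
static int cleanups_for(int found)
{
    int before = cleanup_runs;
    select_machine_sketch(found);
    return cleanup_runs - before;
}
```

This is also why the patch is safe even though select_machine() has several return paths: each one now frees the list exactly once.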
[PATCH 3/3] vl.c: select_machine(): add selected machine type to error message
Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/vl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/system/vl.c b/system/vl.c index 947b433905..a6a4b470a7 100644 --- a/system/vl.c +++ b/system/vl.c @@ -1674,7 +1674,7 @@ static MachineClass *select_machine(QDict *qdict, Error **errp) machine_class = find_machine(machine_type, machines); qdict_del(qdict, "type"); if (!machine_class) { -error_setg(errp, "unsupported machine type"); +error_setg(errp, "unsupported machine type: \"%s\"", optarg); } } else { machine_class = find_default_machine(machines); -- 2.34.1
[PATCH 0/3] vl.c: select_machine(): improve error message
Hi all! Here are three simple patches, improving the select_machine() function a bit. Vladimir Sementsov-Ogievskiy (3): vl.c: select_machine(): use ERRP_GUARD instead of error propagation vl.c: select_machine(): use g_autoptr vl.c: select_machine(): add selected machine type to error message system/vl.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) -- 2.34.1
[PATCH 1/3] vl.c: select_machine(): use ERRP_GUARD instead of error propagation
Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/vl.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/system/vl.c b/system/vl.c index cfcb674425..fa81037ce2 100644 --- a/system/vl.c +++ b/system/vl.c @@ -1665,28 +1665,28 @@ static const QEMUOption *lookup_opt(int argc, char **argv, static MachineClass *select_machine(QDict *qdict, Error **errp) { +ERRP_GUARD(); const char *machine_type = qdict_get_try_str(qdict, "type"); GSList *machines = object_class_get_list(TYPE_MACHINE, false); -MachineClass *machine_class; -Error *local_err = NULL; +MachineClass *machine_class = NULL; if (machine_type) { machine_class = find_machine(machine_type, machines); qdict_del(qdict, "type"); if (!machine_class) { -error_setg(&local_err, "unsupported machine type"); +error_setg(errp, "unsupported machine type"); } } else { machine_class = find_default_machine(machines); if (!machine_class) { -error_setg(&local_err, "No machine specified, and there is no default"); +error_setg(errp, "No machine specified, and there is no default"); } } g_slist_free(machines); -if (local_err) { -error_append_hint(&local_err, "Use -machine help to list supported machines\n"); -error_propagate(errp, local_err); +if (!machine_class) { +error_append_hint(errp, + "Use -machine help to list supported machines\n"); } return machine_class; } -- 2.34.1
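A toy model of why the local_err dance can go away: once errp is guaranteed to point at a real Error* slot (which is what ERRP_GUARD() arranges, even when the caller passed NULL or error_fatal), the function can set the error and append the hint on errp directly, keying the hint on the failed lookup instead of on local_err. The Error type and helpers below are minimal stand-ins for QEMU's API, just enough to show the control flow:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Error { char msg[256]; } Error;

/* stand-in for error_setg(): real QEMU allocates; a static slot keeps
 * this sketch leak-free */
static void error_setg_toy(Error **errp, const char *msg)
{
    static Error storage;
    if (errp && !*errp) {
        *errp = &storage;
        snprintf(storage.msg, sizeof(storage.msg), "%s", msg);
    }
}

static void error_append_hint_toy(Error **errp, const char *hint)
{
    if (errp && *errp) {
        size_t len = strlen((*errp)->msg);
        snprintf((*errp)->msg + len, sizeof((*errp)->msg) - len, "%s", hint);
    }
}

/* mirrors the patched select_machine() flow; the lookup is pretended
 * to always fail so the error path is exercised */
static const char *select_machine_msg(int have_type)
{
    Error *err = NULL;
    void *machine_class = NULL;

    if (have_type) {
        error_setg_toy(&err, "unsupported machine type");
    } else {
        error_setg_toy(&err, "No machine specified, and there is no default");
    }
    if (!machine_class) {
        error_append_hint_toy(&err, "\nUse -machine help to list supported machines");
    }
    return err ? err->msg : "";
}
```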
[PATCH v9 1/7] block-backend: blk_root(): drop const specifier on return type
We'll need to get a non-const child pointer for graph modifications in further commits. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/block-backend.c | 2 +- include/sysemu/block-backend-global-state.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index db6f9b92a3..71d5002198 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2879,7 +2879,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, bytes, read_flags, write_flags); } -const BdrvChild *blk_root(BlockBackend *blk) +BdrvChild *blk_root(BlockBackend *blk) { GLOBAL_STATE_CODE(); return blk->root; } diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h index 49c12b0fa9..ccb35546a1 100644 --- a/include/sysemu/block-backend-global-state.h +++ b/include/sysemu/block-backend-global-state.h @@ -126,7 +126,7 @@ void blk_set_force_allow_inactivate(BlockBackend *blk); bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp); void blk_unregister_buf(BlockBackend *blk, void *host, size_t size); -const BdrvChild *blk_root(BlockBackend *blk); +BdrvChild *blk_root(BlockBackend *blk); int blk_make_empty(BlockBackend *blk, Error **errp); -- 2.34.1
[PATCH v9 2/7] block/export: add blk_by_export_id()
We need it for further blockdev-replace functionality. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/export/export.c | 18 ++ include/sysemu/block-backend-global-state.h | 1 + 2 files changed, 19 insertions(+) diff --git a/block/export/export.c b/block/export/export.c index 6d51ae8ed7..57beae7982 100644 --- a/block/export/export.c +++ b/block/export/export.c @@ -355,3 +355,21 @@ BlockExportInfoList *qmp_query_block_exports(Error **errp) return head; } + +BlockBackend *blk_by_export_id(const char *id, Error **errp) +{ +BlockExport *exp; + +exp = blk_exp_find(id); +if (exp == NULL) { +error_setg(errp, "Export '%s' not found", id); +return NULL; +} + +if (!exp->blk) { +error_setg(errp, "Export '%s' is empty", id); +return NULL; +} + +return exp->blk; +} diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h index ccb35546a1..410d0cc5c7 100644 --- a/include/sysemu/block-backend-global-state.h +++ b/include/sysemu/block-backend-global-state.h @@ -74,6 +74,7 @@ void blk_detach_dev(BlockBackend *blk, DeviceState *dev); DeviceState *blk_get_attached_dev(BlockBackend *blk); BlockBackend *blk_by_dev(void *dev); BlockBackend *blk_by_qdev_id(const char *id, Error **errp); +BlockBackend *blk_by_export_id(const char *id, Error **errp); void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque); void blk_activate(BlockBackend *blk, Error **errp); -- 2.34.1
[PATCH v9 0/7] blockdev-replace
Hi all! This series presents a new command blockdev-replace, which helps to insert/remove filters anywhere in the block graph. It can: - replace qdev block-node by qdev-id - replace export block-node by export-id - replace any child of parent block-node by node-name and child name So insertion is done in two steps: 1. blockdev_add (create filter node, unparented) [some parent] [new filter] | | V V [some child ] 2. blockdev-replace (replace child by the filter) [some parent] | V [new filter] | V [some child] And removal is done in reverse order: 1. blockdev-replace (go back to picture 1) 2. blockdev_del (remove filter node) Ideally, we would do both operations (add + replace or replace + del) in a transaction, but that would be another series. v9: rebase drop x- prefix and use unstable feature bump version to 9.1 in qapi spec update error message in blk_by_qdev_id stub v8: rebase. Also don't use "preallocate" filter in a test, as we don't support removal of this filter for now. The preallocate filter is really unusual, see discussion here: https://www.mail-archive.com/qemu-devel@nongnu.org/msg994945.html Vladimir Sementsov-Ogievskiy (7): block-backend: blk_root(): drop const specifier on return type block/export: add blk_by_export_id() block: make bdrv_find_child() function public qapi: add blockdev-replace command block: bdrv_get_xdbg_block_graph(): report export ids iotests.py: introduce VM.assert_edges_list() method iotests: add filter-insertion block.c | 17 ++ block/block-backend.c | 2 +- block/export/export.c | 31 +++ blockdev.c| 70 -- include/block/block_int-io.h | 2 + include/block/export.h| 1 + include/sysemu/block-backend-global-state.h | 3 +- qapi/block-core.json | 88 +++ stubs/blk-by-qdev-id.c| 13 + stubs/blk-exp-find-by-blk.c | 9 + stubs/meson.build | 2 + tests/qemu-iotests/iotests.py | 17 ++ tests/qemu-iotests/tests/filter-insertion | 236 ++ tests/qemu-iotests/tests/filter-insertion.out | 5 + 14 files changed, 480 insertions(+), 16 deletions(-) create mode 
100644 stubs/blk-by-qdev-id.c create mode 100644 stubs/blk-exp-find-by-blk.c create mode 100755 tests/qemu-iotests/tests/filter-insertion create mode 100644 tests/qemu-iotests/tests/filter-insertion.out -- 2.34.1
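The two-step insertion/removal described in the cover letter reduces to repointing a single parent-to-child edge while the filter holds its own reference to the child. A toy model of that graph manipulation (one child pointer per node; an illustration only, not QEMU's BdrvChild machinery):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    const char *name;
    struct node *child;     /* the single parent->child edge of this sketch */
};

/* blockdev-replace, reduced to its essence: repoint one existing edge */
static void replace_child(struct node *parent, struct node *new_child)
{
    parent->child = new_child;
}

/* render the chain starting at n as "a/b/c" for easy checking */
static const char *chain(struct node *n, char *buf, size_t size)
{
    buf[0] = '\0';
    for (; n; n = n->child) {
        strncat(buf, n->name, size - strlen(buf) - 1);
        if (n->child) {
            strncat(buf, "/", size - strlen(buf) - 1);
        }
    }
    return buf;
}

static struct node file0 = { "file0", NULL };
static struct node disk0 = { "disk0", &file0 };
static struct node filter = { "filter", NULL };

static const char *demo_insertion(void)
{
    static char buf[64];

    filter.child = &file0;          /* step 1: blockdev-add the filter */
    replace_child(&disk0, &filter); /* step 2: blockdev-replace */
    return chain(&disk0, buf, sizeof(buf));
}

static const char *demo_removal(void)
{
    static char buf[64];

    replace_child(&disk0, &file0);  /* step 1: point the parent back */
    filter.child = NULL;            /* step 2: blockdev-del the filter */
    return chain(&disk0, buf, sizeof(buf));
}
```

The point of the two-step split is that the first step of each direction is non-destructive: the filter exists but is unreferenced by the parent, so only the replace step changes what the guest sees.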
[PATCH v9 7/7] iotests: add filter-insertion
Demonstrate new blockdev-replace API for filter insertion and removal. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/tests/filter-insertion | 236 ++ tests/qemu-iotests/tests/filter-insertion.out | 5 + 2 files changed, 241 insertions(+) create mode 100755 tests/qemu-iotests/tests/filter-insertion create mode 100644 tests/qemu-iotests/tests/filter-insertion.out diff --git a/tests/qemu-iotests/tests/filter-insertion b/tests/qemu-iotests/tests/filter-insertion new file mode 100755 index 00..02a0978f96 --- /dev/null +++ b/tests/qemu-iotests/tests/filter-insertion @@ -0,0 +1,236 @@ +#!/usr/bin/env python3 +# +# Tests for inserting and removing filters in a block graph. +# +# Copyright (c) 2022 Virtuozzo International GmbH. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. 
+# + +import os + +import iotests +from iotests import qemu_img_create, try_remove + + +disk = os.path.join(iotests.test_dir, 'disk') +sock = os.path.join(iotests.sock_dir, 'sock') +size = 1024 * 1024 + + +class TestFilterInsertion(iotests.QMPTestCase): +def setUp(self): +qemu_img_create('-f', iotests.imgfmt, disk, str(size)) + +self.vm = iotests.VM() +self.vm.launch() + +self.vm.cmd('blockdev-add', { +'node-name': 'disk0', +'driver': 'qcow2', +'file': { +'node-name': 'file0', +'driver': 'file', +'filename': disk +} +}) + +def tearDown(self): +self.vm.shutdown() +os.remove(disk) +try_remove(sock) + +def test_simple_insertion(self): +vm = self.vm + +vm.cmd('blockdev-add', { +'node-name': 'filter', +'driver': 'blkdebug', +'image': 'file0' +}) + +vm.cmd('blockdev-replace', { +'parent-type': 'driver', +'node-name': 'disk0', +'child': 'file', +'new-child': 'filter' +}) + +# Filter inserted: +# disk0 -file-> filter -file-> file0 +vm.assert_edges_list([ +('disk0', 'file', 'filter'), +('filter', 'image', 'file0') +]) + +vm.cmd('blockdev-replace', { +'parent-type': 'driver', +'node-name': 'disk0', +'child': 'file', +'new-child': 'file0' +}) + +# Filter replaced, but still exists: +# dik0 -file-> file0 <-file- filter +vm.assert_edges_list([ +('disk0', 'file', 'file0'), +('filter', 'image', 'file0') +]) + +vm.cmd('blockdev-del', node_name='filter') + +# Filter removed +# dik0 -file-> file0 +vm.assert_edges_list([ +('disk0', 'file', 'file0') +]) + +def test_insert_under_qdev(self): +vm = self.vm + +vm.cmd('device_add', driver='virtio-scsi') +vm.cmd('device_add', id='sda', driver='scsi-hd', + drive='disk0') + +vm.cmd('blockdev-add', { +'node-name': 'filter', +'driver': 'compress', +'file': 'disk0' +}) + +vm.cmd('blockdev-replace', { +'parent-type': 'qdev', +'qdev-id': 'sda', +'new-child': 'filter' +}) + +# Filter inserted: +# sda -root-> filter -file-> disk0 -file-> file0 +vm.assert_edges_list([ +# parent_node_name, child_name, child_node_name +('sda', 'root', 'filter'), 
+('filter', 'file', 'disk0'), +('disk0', 'file', 'file0'), +]) + +vm.cmd('blockdev-replace', { +'parent-type': 'qdev', +'qdev-id': 'sda', +'new-child': 'disk0' +}) +vm.cmd('blockdev-del', node_name='filter') + +# Filter removed: +# sda -root-> disk0 -file-> file0 +vm.assert_edges_list([ +# parent_node_name, child_name, child_node_name +('sda', 'root', 'disk0'), +('disk0', 'file', 'file0'), +]) + +def test_insert_under_nbd_export(self): +vm = self.vm + +vm.cmd('nbd-server-start', + addr={'type': 'unix', 'data': {'path': sock}}) +vm.cmd('block-export-add', id='exp1', type='nbd', +
[PATCH v9 4/7] qapi: add blockdev-replace command
Add a command that can replace bs in the following BdrvChild structures: - qdev blk root child - block-export blk root child - any child of BlockDriverState selected by child-name Signed-off-by: Vladimir Sementsov-Ogievskiy --- blockdev.c | 56 +++ qapi/block-core.json | 88 ++ stubs/blk-by-qdev-id.c | 13 +++ stubs/meson.build | 1 + 4 files changed, 158 insertions(+) create mode 100644 stubs/blk-by-qdev-id.c diff --git a/blockdev.c b/blockdev.c index ba7e90b06e..2190467022 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3559,6 +3559,62 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread, bdrv_try_change_aio_context(bs, new_context, NULL, errp); } +void qmp_blockdev_replace(BlockdevReplace *repl, Error **errp) +{ +BdrvChild *child = NULL; +BlockDriverState *new_child_bs; + +if (repl->parent_type == BLOCK_PARENT_TYPE_DRIVER) { +BlockDriverState *parent_bs; + +parent_bs = bdrv_find_node(repl->u.driver.node_name); +if (!parent_bs) { +error_setg(errp, "Block driver node with node-name '%s' not " + "found", repl->u.driver.node_name); +return; +} + +child = bdrv_find_child(parent_bs, repl->u.driver.child); +if (!child) { +error_setg(errp, "Block driver node '%s' doesn't have child " + "named '%s'", repl->u.driver.node_name, + repl->u.driver.child); +return; +} +} else { +/* Other types are similar, they work through blk */ +BlockBackend *blk; +bool is_qdev = repl->parent_type == BLOCK_PARENT_TYPE_QDEV; +const char *id = +is_qdev ? repl->u.qdev.qdev_id : repl->u.export.export_id; + +assert(is_qdev || repl->parent_type == BLOCK_PARENT_TYPE_EXPORT); + +blk = is_qdev ? blk_by_qdev_id(id, errp) : blk_by_export_id(id, errp); +if (!blk) { +return; +} + +child = blk_root(blk); +if (!child) { +error_setg(errp, "%s '%s' is empty, nothing to replace", + is_qdev ? "Device" : "Export", id); +return; +} +} + +assert(child); +assert(child->bs); + +new_child_bs = bdrv_find_node(repl->new_child); +if (!new_child_bs) { +error_setg(errp, "Node '%s' not found", repl->new_child); +return; +} + +bdrv_replace_child_bs(child, new_child_bs, errp); +} + QemuOptsList qemu_common_drive_opts = { .name = "drive", .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head), diff --git a/qapi/block-core.json b/qapi/block-core.json index df5e07debd..0a6f08a6e0 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -6148,3 +6148,91 @@ ## { 'struct': 'DummyBlockCoreForceArrays', 'data': { 'unused-block-graph-info': ['BlockGraphInfo'] } } + +## +# @BlockParentType: +# +# @qdev: block device, such as created by device_add, and denoted by +# qdev-id +# +# @driver: block driver node, such as created by blockdev-add, and +# denoted by node-name +# +# @export: block export, such as created by block-export-add, and +# denoted by export-id +# +# Since 9.1 +## +{ 'enum': 'BlockParentType', + 'data': ['qdev', 'driver', 'export'] } + +## +# @BdrvChildRefQdev: +# +# @qdev-id: the device's ID or QOM path +# +# Since 9.1 +## +{ 'struct': 'BdrvChildRefQdev', + 'data': { 'qdev-id': 'str' } } + +## +# @BdrvChildRefExport: +# +# @export-id: block export identifier +# +# Since 9.1 +## +{ 'struct': 'BdrvChildRefExport', + 'data': { 'export-id': 'str' } } + +## +# @BdrvChildRefDriver: +# +# @node-name: the node name of the parent block node +# +# @child: name of the child to be replaced, like "file" or "backing" +# +# Since 9.1 +## +{ 'struct': 'BdrvChildRefDriver', + 'data': { 'node-name': 'str', 'child': 'str' } } + +## +# @BlockdevReplace: +# +# @parent-type: type of the parent whose child is to be replaced +# +# @new-child: new child for replacement +# +# Since 9.1 +## +{ 'union': 'BlockdevReplace', + 'base': { + 'parent-type': 'BlockParentType', + 'new-child': 'str' + }, + 'discriminator': 'parent-type', + 'data': { + 'qdev': 'BdrvChildRefQdev', + 'export': 'BdrvChildRefExport', + 'driver': 'BdrvChildRefDriver' + } } + +## +# @blockdev-replace: +# +# Replace a block node attached to a device (selected by +# @qdev-id), or to a block export (selected by @export-id), or +# any child of a block node (selected by @node-name and @child), +# with the @new-child block node. +# +# Features: +# +# @unstable: This command is experimental. +# +# Since 9.1 +## +{ 'command': 'blockdev-replace', 'boxed': true, + 'features': [ 'unstable' ], + 'data': 'BlockdevRe
[PATCH v9 6/7] iotests.py: introduce VM.assert_edges_list() method
Add an alternative method to check block graph, to be used in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 17 + 1 file changed, 17 insertions(+) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index ea48af4a7b..8a5fd01eac 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -1108,6 +1108,23 @@ def check_bitmap_status(self, node_name, bitmap_name, fields): return fields.items() <= ret.items() +def get_block_graph(self): +""" +Returns block graph in form of edges list, where each edge is a tuple: + (parent_node_name, child_name, child_node_name) +""" +graph = self.qmp('x-debug-query-block-graph')['return'] + +nodes = {n['id']: n['name'] for n in graph['nodes']} +# Check that all names are unique: +assert len(set(nodes.values())) == len(nodes) + +return [(nodes[e['parent']], e['name'], nodes[e['child']]) +for e in graph['edges']] + +def assert_edges_list(self, edges): +assert sorted(edges) == sorted(self.get_block_graph()) + def assert_block_path(self, root, path, expected_node, graph=None): """ Check whether the node under the given path in the block graph -- 2.34.1
[PATCH v9 3/7] block: make bdrv_find_child() function public
To be reused soon for blockdev-replace functionality. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block.c | 13 + blockdev.c | 14 -- include/block/block_int-io.h | 2 ++ 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/block.c b/block.c index 468cf5e67d..f6292f459a 100644 --- a/block.c +++ b/block.c @@ -8174,6 +8174,19 @@ int bdrv_make_empty(BdrvChild *c, Error **errp) return 0; } +BdrvChild *bdrv_find_child(BlockDriverState *parent_bs, const char *child_name) +{ +BdrvChild *child; + +QLIST_FOREACH(child, _bs->children, next) { +if (strcmp(child->name, child_name) == 0) { +return child; +} +} + +return NULL; +} + /* * Return the child that @bs acts as an overlay for, and from which data may be * copied in COW or COR operations. Usually this is the backing file. diff --git a/blockdev.c b/blockdev.c index 835064ed03..ba7e90b06e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3452,20 +3452,6 @@ void qmp_blockdev_del(const char *node_name, Error **errp) bdrv_unref(bs); } -static BdrvChild * GRAPH_RDLOCK -bdrv_find_child(BlockDriverState *parent_bs, const char *child_name) -{ -BdrvChild *child; - -QLIST_FOREACH(child, _bs->children, next) { -if (strcmp(child->name, child_name) == 0) { -return child; -} -} - -return NULL; -} - void qmp_x_blockdev_change(const char *parent, const char *child, const char *node, Error **errp) { diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h index 4a7cf2b4fd..11ed4aa927 100644 --- a/include/block/block_int-io.h +++ b/include/block/block_int-io.h @@ -130,6 +130,8 @@ bdrv_co_refresh_total_sectors(BlockDriverState *bs, int64_t hint); int co_wrapper_mixed_bdrv_rdlock bdrv_refresh_total_sectors(BlockDriverState *bs, int64_t hint); +BdrvChild * GRAPH_RDLOCK +bdrv_find_child(BlockDriverState *parent_bs, const char *child_name); BdrvChild * GRAPH_RDLOCK bdrv_cow_child(BlockDriverState *bs); BdrvChild * GRAPH_RDLOCK bdrv_filter_child(BlockDriverState *bs); BdrvChild * GRAPH_RDLOCK 
bdrv_filter_or_cow_child(BlockDriverState *bs); -- 2.34.1
[PATCH v9 5/7] block: bdrv_get_xdbg_block_graph(): report export ids
Currently for block exports we report empty blk names. That's not good. Let's try to find corresponding block export and report its id. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block.c | 4 block/export/export.c | 13 + include/block/export.h | 1 + stubs/blk-exp-find-by-blk.c | 9 + stubs/meson.build | 1 + 5 files changed, 28 insertions(+) create mode 100644 stubs/blk-exp-find-by-blk.c diff --git a/block.c b/block.c index f6292f459a..601475e835 100644 --- a/block.c +++ b/block.c @@ -6326,7 +6326,11 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp) for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) { char *allocated_name = NULL; const char *name = blk_name(blk); +BlockExport *exp = blk_exp_find_by_blk(blk); +if (!*name && exp) { +name = exp->id; +} if (!*name) { name = allocated_name = blk_get_attached_dev_id(blk); } diff --git a/block/export/export.c b/block/export/export.c index 57beae7982..8744a1171e 100644 --- a/block/export/export.c +++ b/block/export/export.c @@ -60,6 +60,19 @@ BlockExport *blk_exp_find(const char *id) return NULL; } +BlockExport *blk_exp_find_by_blk(BlockBackend *blk) +{ +BlockExport *exp; + +QLIST_FOREACH(exp, _exports, next) { +if (exp->blk == blk) { +return exp; +} +} + +return NULL; +} + static const BlockExportDriver *blk_exp_find_driver(BlockExportType type) { int i; diff --git a/include/block/export.h b/include/block/export.h index f2fe0f8078..16863d37cf 100644 --- a/include/block/export.h +++ b/include/block/export.h @@ -82,6 +82,7 @@ struct BlockExport { BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp); BlockExport *blk_exp_find(const char *id); +BlockExport *blk_exp_find_by_blk(BlockBackend *blk); void blk_exp_ref(BlockExport *exp); void blk_exp_unref(BlockExport *exp); void blk_exp_request_shutdown(BlockExport *exp); diff --git a/stubs/blk-exp-find-by-blk.c b/stubs/blk-exp-find-by-blk.c new file mode 100644 index 00..2fc1da953b --- /dev/null +++ b/stubs/blk-exp-find-by-blk.c @@ -0,0 +1,9 @@ 
+#include "qemu/osdep.h" +#include "sysemu/block-backend.h" +#include "block/export.h" + +BlockExport *blk_exp_find_by_blk(BlockBackend *blk) +{ +return NULL; +} + diff --git a/stubs/meson.build b/stubs/meson.build index 068998c1a5..cbe30f94e8 100644 --- a/stubs/meson.build +++ b/stubs/meson.build @@ -16,6 +16,7 @@ if have_block stub_ss.add(files('blk-commit-all.c')) stub_ss.add(files('blk-exp-close-all.c')) stub_ss.add(files('blk-by-qdev-id.c')) + stub_ss.add(files('blk-exp-find-by-blk.c')) stub_ss.add(files('blockdev-close-all-bdrv-states.c')) stub_ss.add(files('change-state-handler.c')) stub_ss.add(files('get-vm-name.c')) -- 2.34.1
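After this patch, the name reported for a BlockBackend in the debug graph follows a fallback chain. A minimal sketch of that chain (names and the helper itself are illustrative, not QEMU API):

```python
def graph_node_name(blk_name, export_id, dev_id):
    """Name fallback order after the patch: explicit blk name first,
    then the owning export's id, then the attached device's id."""
    if blk_name:
        return blk_name
    if export_id:
        return export_id
    return dev_id

# An anonymous blk owned by an NBD export is now named after the export:
assert graph_node_name('', 'nbd-exp0', 'dev0') == 'nbd-exp0'
# A named blk keeps its own name:
assert graph_node_name('drive0', 'nbd-exp0', 'dev0') == 'drive0'
# With neither a name nor an export, the device id is used, as before:
assert graph_node_name('', None, 'dev0') == 'dev0'
```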
Re: [PATCH v4] virtio: add VIRTQUEUE_ERROR QAPI event
ping4 On 17.10.23 15:39, Vladimir Sementsov-Ogievskiy wrote: For now we only log the vhost device error, when virtqueue is actually stopped. Let's add a QAPI event, which makes possible: - collect statistics of such errors - make immediate actions: take core dumps or do some other debugging - inform the user through a management API or UI, so that (s)he can react somehow, e.g. reset the device driver in the guest or even build up some automation to do so Note that basically every inconsistency discovered during virtqueue processing results in a silent virtqueue stop. The guest then just sees the requests getting stuck somewhere in the device for no visible reason. This event provides a means to inform the management layer of this situation in a timely fashion. The event could be reused for some other virtqueue problems (not only for vhost devices) in future. For this it gets a generic name and structure. We keep original VHOST_OPS_DEBUG(), to keep original debug output as is here, it's not the only call to VHOST_OPS_DEBUG in the file. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Denis Plotnikov Acked-by: Markus Armbruster -- Best regards, Vladimir
Re: [PATCH v2] block-backend: per-device throttling of BLOCK_IO_ERROR reports
ping2 On 09.01.24 16:13, Vladimir Sementsov-Ogievskiy wrote: From: Leonid Kaplan BLOCK_IO_ERROR events come from the guest, so we must throttle them. We still want per-device throttling, so let's use device id as a key. Signed-off-by: Leonid Kaplan Signed-off-by: Vladimir Sementsov-Ogievskiy --- v2: add Note: to QAPI doc monitor/monitor.c| 10 ++ qapi/block-core.json | 2 ++ 2 files changed, 12 insertions(+) diff --git a/monitor/monitor.c b/monitor/monitor.c index 01ede1babd..ad0243e9d7 100644 --- a/monitor/monitor.c +++ b/monitor/monitor.c @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { /* Limit guest-triggerable events to 1 per second */ [QAPI_EVENT_RTC_CHANGE]= { 1000 * SCALE_MS }, +[QAPI_EVENT_BLOCK_IO_ERROR]= { 1000 * SCALE_MS }, [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, [QAPI_EVENT_BALLOON_CHANGE]= { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); } +if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { +hash += g_str_hash(qdict_get_str(evstate->data, "device")); +} + return hash; } @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b) qdict_get_str(evb->data, "qom-path")); } +if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) { +return !strcmp(qdict_get_str(eva->data, "device"), + qdict_get_str(evb->data, "device")); +} + return TRUE; } diff --git a/qapi/block-core.json b/qapi/block-core.json index ca390c5700..32c2c2f030 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -5559,6 +5559,8 @@ # Note: If action is "stop", a STOP event will eventually follow the # BLOCK_IO_ERROR event # +# Note: This event is rate-limited. +# # Since: 0.13 # # Example: -- Best regards, Vladimir
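The idea of keying the rate limit on the event's `device` field can be sketched in a few lines of Python. Note this is a deliberate simplification: QEMU's monitor delays and coalesces throttled events rather than dropping them, and the class below is purely illustrative.

```python
import time

class EventThrottle:
    """Rate-limit events per key, e.g. the 'device' field of BLOCK_IO_ERROR.
    Simplified: throttled events are dropped, not delayed as in QEMU."""
    def __init__(self, interval_s=1.0, clock=time.monotonic):
        self.interval = interval_s
        self.clock = clock
        self.last = {}   # key -> timestamp of the last emitted event

    def allow(self, key):
        now = self.clock()
        if key in self.last and now - self.last[key] < self.interval:
            return False          # same device fired too recently: throttled
        self.last[key] = now
        return True

# Injected clock so the example is deterministic
t = 0.0
thr = EventThrottle(clock=lambda: t)
assert thr.allow('virtio0')
assert thr.allow('virtio1')      # different device, separate budget
assert not thr.allow('virtio0')  # same device within 1s, throttled
t = 1.5
assert thr.allow('virtio0')      # budget refreshed after the interval
```

The per-key dictionary plays the role of the hash/equal pair in `qapi_event_throttle_hash()` / `qapi_event_throttle_equal()`: two BLOCK_IO_ERROR events share a throttle slot only when their `device` fields match.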
[PATCH v2 3/7] qapi: block-job-change: make copy-mode parameter optional
We are going to add more parameters to change. We want to make it possible to change only one option, or any subset of the available options, so all options should be optional. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/mirror.c | 4 qapi/block-core.json | 3 ++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/block/mirror.c b/block/mirror.c index 2816bb1042..60e8d83e4f 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1272,6 +1272,10 @@ static void mirror_change(BlockJob *job, JobChangeOptions *opts, GLOBAL_STATE_CODE(); +if (!change_opts->has_copy_mode) { +return; +} + if (qatomic_read(>copy_mode) == change_opts->copy_mode) { return; } diff --git a/qapi/block-core.json b/qapi/block-core.json index 4ec5632596..660c7f4a48 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3071,11 +3071,12 @@ # # @copy-mode: Switch to this copy mode. Currently, only the switch # from 'background' to 'write-blocking' is implemented. +# If absent, copy mode remains the same. (optional since 9.1) # # Since: 8.2 ## { 'struct': 'JobChangeOptionsMirror', - 'data': { 'copy-mode' : 'MirrorCopyMode' } } + 'data': { '*copy-mode' : 'MirrorCopyMode' } } ## # @JobChangeOptions: -- 2.34.1
[PATCH v2 1/7] qapi: rename BlockJobChangeOptions to JobChangeOptions
We are going to move the change action from the block-job to the job implementation, and then move to the job-* external APIs, deprecating the block-job-* APIs. This commit simplifies the further transition. The commit was made with the command git grep -l BlockJobChangeOptions | \ xargs sed -i 's/BlockJobChangeOptions/JobChangeOptions/g' Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/mirror.c | 4 ++-- blockdev.c | 2 +- blockjob.c | 2 +- include/block/blockjob.h | 2 +- include/block/blockjob_int.h | 2 +- qapi/block-core.json | 12 ++-- 6 files changed, 12 insertions(+), 12 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 61f0a717b7..2816bb1042 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1258,11 +1258,11 @@ static bool commit_active_cancel(Job *job, bool force) return force || !job_is_ready(job); } -static void mirror_change(BlockJob *job, BlockJobChangeOptions *opts, +static void mirror_change(BlockJob *job, JobChangeOptions *opts, Error **errp) { MirrorBlockJob *s = container_of(job, MirrorBlockJob, common); -BlockJobChangeOptionsMirror *change_opts = >u.mirror; +JobChangeOptionsMirror *change_opts = >u.mirror; MirrorCopyMode current; /* diff --git a/blockdev.c b/blockdev.c index 835064ed03..3f4ed96ecc 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3248,7 +3248,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp) job_dismiss_locked(, errp); } -void qmp_block_job_change(BlockJobChangeOptions *opts, Error **errp) +void qmp_block_job_change(JobChangeOptions *opts, Error **errp) { BlockJob *job; diff --git a/blockjob.c b/blockjob.c index d5f29e14af..8cfbb15543 100644 --- a/blockjob.c +++ b/blockjob.c @@ -312,7 +312,7 @@ static bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp) return block_job_set_speed_locked(job, speed, errp); } -void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts, +void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, Error **errp) { const BlockJobDriver *drv = block_job_driver(job); diff
--git a/include/block/blockjob.h b/include/block/blockjob.h index 7061ab7201..5dd1b08909 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -181,7 +181,7 @@ bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp); * * Change the job according to opts. */ -void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts, +void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, Error **errp); /** diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h index 4c3d2e25a2..d9c3b911d0 100644 --- a/include/block/blockjob_int.h +++ b/include/block/blockjob_int.h @@ -73,7 +73,7 @@ struct BlockJobDriver { * * Note that this can already be called before the job coroutine is running. */ -void (*change)(BlockJob *job, BlockJobChangeOptions *opts, Error **errp); +void (*change)(BlockJob *job, JobChangeOptions *opts, Error **errp); /* * Query information specific to this kind of block job. diff --git a/qapi/block-core.json b/qapi/block-core.json index df5e07debd..4ec5632596 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3067,18 +3067,18 @@ 'allow-preconfig': true } ## -# @BlockJobChangeOptionsMirror: +# @JobChangeOptionsMirror: # # @copy-mode: Switch to this copy mode. Currently, only the switch # from 'background' to 'write-blocking' is implemented. # # Since: 8.2 ## -{ 'struct': 'BlockJobChangeOptionsMirror', +{ 'struct': 'JobChangeOptionsMirror', 'data': { 'copy-mode' : 'MirrorCopyMode' } } ## -# @BlockJobChangeOptions: +# @JobChangeOptions: # # Block job options that can be changed after job creation. 
# @@ -3088,10 +3088,10 @@ # # Since: 8.2 ## -{ 'union': 'BlockJobChangeOptions', +{ 'union': 'JobChangeOptions', 'base': { 'id': 'str', 'type': 'JobType' }, 'discriminator': 'type', - 'data': { 'mirror': 'BlockJobChangeOptionsMirror' } } + 'data': { 'mirror': 'JobChangeOptionsMirror' } } ## # @block-job-change: @@ -3101,7 +3101,7 @@ # Since: 8.2 ## { 'command': 'block-job-change', - 'data': 'BlockJobChangeOptions', 'boxed': true } + 'data': 'JobChangeOptions', 'boxed': true } ## # @BlockdevDiscardOptions: -- 2.34.1
[PATCH v2 7/7] iotests/mirror-change-copy-mode: switch to job-change command
block-job-change is deprecated, let's move test to job-change. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/tests/mirror-change-copy-mode | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/tests/mirror-change-copy-mode b/tests/qemu-iotests/tests/mirror-change-copy-mode index 51788b85c7..e972604ebf 100755 --- a/tests/qemu-iotests/tests/mirror-change-copy-mode +++ b/tests/qemu-iotests/tests/mirror-change-copy-mode @@ -150,7 +150,7 @@ class TestMirrorChangeCopyMode(iotests.QMPTestCase): len_before_change = result[0]['len'] # Change the copy mode while requests are happening. -self.vm.cmd('block-job-change', +self.vm.cmd('job-change', id='mirror', type='mirror', copy_mode='write-blocking') -- 2.34.1
[PATCH v2 0/7] introduce job-change qmp command
Hi all! This is an updated first part of my "[RFC 00/15] block job API" Supersedes: <20240313150907.623462-1-vsement...@yandex-team.ru> v2: - only job-change for now, as a first step - drop "type-based unions", and keep type parameter as is for now (I now doubt that this was a good idea, as it makes the QAPI protocol dependent on context) 03: improve documentation 06: deprecate only block-job-change for now 07: new Vladimir Sementsov-Ogievskiy (7): qapi: rename BlockJobChangeOptions to JobChangeOptions blockjob: block_job_change_locked(): check job type qapi: block-job-change: make copy-mode parameter optional blockjob: move change action implementation to job from block-job qapi: add job-change qapi/block-core: deprecate block-job-change iotests/mirror-change-copy-mode: switch to job-change command block/mirror.c| 13 +--- blockdev.c| 4 +-- blockjob.c| 20 docs/about/deprecated.rst | 5 +++ include/block/blockjob.h | 11 --- include/block/blockjob_int.h | 7 - include/qemu/job.h| 12 +++ job-qmp.c | 15 + job.c | 23 ++ qapi/block-core.json | 31 ++- .../tests/mirror-change-copy-mode | 2 +- 11 files changed, 90 insertions(+), 53 deletions(-) -- 2.34.1
[PATCH v2 2/7] blockjob: block_job_change_locked(): check job type
User may specify wrong type for the job id. Let's check it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- blockjob.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/blockjob.c b/blockjob.c index 8cfbb15543..788cb1e07d 100644 --- a/blockjob.c +++ b/blockjob.c @@ -319,6 +319,12 @@ void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, GLOBAL_STATE_CODE(); +if (job_type(>job) != opts->type) { +error_setg(errp, "Job '%s' is '%s' job, not '%s'", job->job.id, + job_type_str(>job), JobType_str(opts->type)); +return; +} + if (job_apply_verb_locked(>job, JOB_VERB_CHANGE, errp)) { return; } -- 2.34.1
[PATCH v2 6/7] qapi/block-core: deprecate block-job-change
That's a first step to move to the newer job-* APIs. The difference between block-job-change and job-change is in the find_block_job_locked() vs find_job_locked() functions. What's different? 1. find_block_job_locked() checks that the found job is a block job. This is OK when moving to the more generic API; no need to document this change. 2. find_block_job_locked() reports DeviceNotActive on failure, while find_job_locked() reports GenericError. Still, errors are not documented for block-job-change at all, so stay silent in deprecated.rst as well. Signed-off-by: Vladimir Sementsov-Ogievskiy --- docs/about/deprecated.rst | 5 + qapi/block-core.json | 6 ++ 2 files changed, 11 insertions(+) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index ff3da68208..0ddced0781 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -134,6 +134,11 @@ options are removed in favor of using explicit ``blockdev-create`` and ``blockdev-add`` calls. See :doc:`/interop/live-block-operations` for details. +``block-job-change`` (since 9.1) + + +Use ``job-change`` instead. + Incorrectly typed ``device_add`` arguments (since 6.2) '' diff --git a/qapi/block-core.json b/qapi/block-core.json index 9087ce300c..064cad0b64 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3099,9 +3099,15 @@ # # Change the block job's options. # +# Features: +# +# @deprecated: This command is deprecated. Use @job-change +# instead. +# # Since: 8.2 ## { 'command': 'block-job-change', + 'features': ['deprecated'], 'data': 'JobChangeOptions', 'boxed': true } ## -- 2.34.1
[PATCH v2 4/7] blockjob: move change action implementation to job from block-job
Like for other block-job-* APIs, we want to have the actual functionality in the job layer, and make block-job-change a deprecated duplicate of job-change in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/mirror.c | 7 +++ blockdev.c | 2 +- blockjob.c | 26 -- include/block/blockjob.h | 11 --- include/block/blockjob_int.h | 7 --- include/qemu/job.h | 12 job-qmp.c| 1 + job.c| 23 +++ 8 files changed, 40 insertions(+), 49 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 60e8d83e4f..63e35114f3 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1258,10 +1258,9 @@ static bool commit_active_cancel(Job *job, bool force) return force || !job_is_ready(job); } -static void mirror_change(BlockJob *job, JobChangeOptions *opts, - Error **errp) +static void mirror_change(Job *job, JobChangeOptions *opts, Error **errp) { -MirrorBlockJob *s = container_of(job, MirrorBlockJob, common); +MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job); JobChangeOptionsMirror *change_opts = >u.mirror; MirrorCopyMode current; @@ -1316,9 +1315,9 @@ static const BlockJobDriver mirror_job_driver = { .pause = mirror_pause, .complete = mirror_complete, .cancel = mirror_cancel, +.change = mirror_change, }, .drained_poll = mirror_drained_poll, -.change = mirror_change, .query = mirror_query, }; diff --git a/blockdev.c b/blockdev.c index 3f4ed96ecc..70b6aeaef0 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3259,7 +3259,7 @@ void qmp_block_job_change(JobChangeOptions *opts, Error **errp) return; } -block_job_change_locked(job, opts, errp); +job_change_locked(>job, opts, errp); } void qmp_change_backing_file(const char *device, diff --git a/blockjob.c b/blockjob.c index 788cb1e07d..2769722b37 100644 --- a/blockjob.c +++ b/blockjob.c @@ -312,32 +312,6 @@ static bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp) return block_job_set_speed_locked(job, speed, errp); } -void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, - 
Error **errp) -{ -const BlockJobDriver *drv = block_job_driver(job); - -GLOBAL_STATE_CODE(); - -if (job_type(>job) != opts->type) { -error_setg(errp, "Job '%s' is '%s' job, not '%s'", job->job.id, - job_type_str(>job), JobType_str(opts->type)); -return; -} - -if (job_apply_verb_locked(>job, JOB_VERB_CHANGE, errp)) { -return; -} - -if (drv->change) { -job_unlock(); -drv->change(job, opts, errp); -job_lock(); -} else { -error_setg(errp, "Job type does not support change"); -} -} - void block_job_ratelimit_processed_bytes(BlockJob *job, uint64_t n) { IO_CODE(); diff --git a/include/block/blockjob.h b/include/block/blockjob.h index 5dd1b08909..72e849a140 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -173,17 +173,6 @@ bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs); */ bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp); -/** - * block_job_change_locked: - * @job: The job to change. - * @opts: The new options. - * @errp: Error object. - * - * Change the job according to opts. - */ -void block_job_change_locked(BlockJob *job, JobChangeOptions *opts, - Error **errp); - /** * block_job_query_locked: * @job: The job to get information about. diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h index d9c3b911d0..58bc7a5cea 100644 --- a/include/block/blockjob_int.h +++ b/include/block/blockjob_int.h @@ -68,13 +68,6 @@ struct BlockJobDriver { void (*set_speed)(BlockJob *job, int64_t speed); -/* - * Change the @job's options according to @opts. - * - * Note that this can already be called before the job coroutine is running. - */ -void (*change)(BlockJob *job, JobChangeOptions *opts, Error **errp); - /* * Query information specific to this kind of block job. 
*/ diff --git a/include/qemu/job.h b/include/qemu/job.h index 2b873f2576..6fa525dac3 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -27,6 +27,7 @@ #define JOB_H #include "qapi/qapi-types-job.h" +#include "qapi/qapi-types-block-core.h" #include "qemu/queue.h" #include "qemu/progress_meter.h" #include "qemu/coroutine.h" @@ -307,6 +308,12 @@ struct JobDriver { */ bool (*cancel)(Job *job, bool force); +/** + * Chang
[PATCH v2 5/7] qapi: add job-change
Add a new-style command job-change, doing the same thing as block-job-change. The aim is to finally deprecate the block-job-* APIs and move to the job-* APIs. We add the new command to qapi/block-core.json, not to qapi/job.json, to avoid resolving JSON file inclusion loops for now. This will all be a lot simpler to refactor when we finally drop the deprecated block-job-* APIs. The @type argument of the new command immediately becomes deprecated. Signed-off-by: Vladimir Sementsov-Ogievskiy --- job-qmp.c| 14 ++ qapi/block-core.json | 10 ++ 2 files changed, 24 insertions(+) diff --git a/job-qmp.c b/job-qmp.c index c764bd3801..248e68f554 100644 --- a/job-qmp.c +++ b/job-qmp.c @@ -139,6 +139,20 @@ void qmp_job_dismiss(const char *id, Error **errp) job_dismiss_locked(, errp); } +void qmp_job_change(JobChangeOptions *opts, Error **errp) +{ +Job *job; + +JOB_LOCK_GUARD(); +job = find_job_locked(opts->id, errp); + +if (!job) { +return; +} + +job_change_locked(job, opts, errp); +} + /* Called with job_mutex held. */ static JobInfo *job_query_single_locked(Job *job, Error **errp) { diff --git a/qapi/block-core.json b/qapi/block-core.json index 660c7f4a48..9087ce300c 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -3104,6 +3104,16 @@ { 'command': 'block-job-change', 'data': 'JobChangeOptions', 'boxed': true } +## +# @job-change: +# +# Change the block job's options. +# +# Since: 9.1 +## +{ 'command': 'job-change', + 'data': 'JobChangeOptions', 'boxed': true } + ## # @BlockdevDiscardOptions: # -- 2.34.1
Re: [PATCH 1/2] block: allow commit to unmap zero blocks
On 26.05.24 22:29, Vincent Vanlaer wrote: Non-active block commits do not discard blocks only containing zeros, causing images to lose sparseness after the commit. This commit fixes that by writing zero blocks using blk_co_pwrite_zeroes rather than writing them out as any other arbitrary data. Hi Vincent! Sorry for the long delay. Could you please split the commit into simpler parts? Something like this: 1. refactor to use direct bdrv_co_common_block_status_above() 2. refactor to use CommitMethod (with two possible modes) 3. add new mode (The idea is to split parts of no-logic-change refactoring from real logic changes and keep the latter smaller and clearer) Signed-off-by: Vincent Vanlaer --- block/commit.c | 71 +- 1 file changed, 58 insertions(+), 13 deletions(-) diff --git a/block/commit.c b/block/commit.c index 7c3fdcb0ca..5bd97b5a74 100644 --- a/block/commit.c +++ b/block/commit.c @@ -12,9 +12,13 @@ * */ +#include "bits/time.h" #include "qemu/osdep.h" #include "qemu/cutils.h" +#include "time.h" #include "trace.h" +#include "block/block-common.h" +#include "block/coroutines.h" #include "block/block_int.h" #include "block/blockjob_int.h" #include "qapi/error.h" @@ -126,6 +130,12 @@ static void commit_clean(Job *job) blk_unref(s->top); } +typedef enum CommitMethod { +COMMIT_METHOD_COPY, +COMMIT_METHOD_ZERO, +COMMIT_METHOD_IGNORE, +} CommitMethod; + static int coroutine_fn commit_run(Job *job, Error **errp) { CommitBlockJob *s = container_of(job, CommitBlockJob, common.job); @@ -156,8 +166,9 @@ static int coroutine_fn commit_run(Job *job, Error **errp) buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE); for (offset = 0; offset < len; offset += n) { -bool copy; bool error_in_source = true; +CommitMethod commit_method = COMMIT_METHOD_COPY; + /* Note that even when no rate limit is applied we need to yield * with no pending I/O here so that bdrv_drain_all() returns. 
@@ -167,21 +178,54 @@ static int coroutine_fn commit_run(Job *job, Error **errp) break; } /* Copy if allocated above the base */ -ret = blk_co_is_allocated_above(s->top, s->base_overlay, true, -offset, COMMIT_BUFFER_SIZE, ); -copy = (ret > 0); +WITH_GRAPH_RDLOCK_GUARD() { +ret = bdrv_co_common_block_status_above(blk_bs(s->top), +s->base_overlay, true, true, offset, COMMIT_BUFFER_SIZE, +, NULL, NULL, NULL); +} + trace_commit_one_iteration(s, offset, n, ret); -if (copy) { -assert(n < SIZE_MAX); - -ret = blk_co_pread(s->top, offset, n, buf, 0); -if (ret >= 0) { -ret = blk_co_pwrite(s->base, offset, n, buf, 0); -if (ret < 0) { -error_in_source = false; + +if (ret >= 0 && !(ret & BDRV_BLOCK_ALLOCATED)) { +commit_method = COMMIT_METHOD_IGNORE; +} else if (ret >= 0 && ret & BDRV_BLOCK_ZERO) { +int64_t target_offset; +int64_t target_bytes; +WITH_GRAPH_RDLOCK_GUARD() { +bdrv_round_to_subclusters(s->base_bs, offset, n, + _offset, _bytes); +} + +if (target_offset == offset && +target_bytes == n) { +commit_method = COMMIT_METHOD_ZERO; +} +} + +if (ret >= 0) { +switch (commit_method) { +case COMMIT_METHOD_COPY: +assert(n < SIZE_MAX); +ret = blk_co_pread(s->top, offset, n, buf, 0); +if (ret >= 0) { +ret = blk_co_pwrite(s->base, offset, n, buf, 0); +if (ret < 0) { +error_in_source = false; +} } +break; +case COMMIT_METHOD_ZERO: +ret = blk_co_pwrite_zeroes(s->base, offset, n, +BDRV_REQ_MAY_UNMAP); +error_in_source = false; +break; +case COMMIT_METHOD_IGNORE: +break; +default: +abort(); } } + if (ret < 0) { BlockErrorAction action = block_job_error_action(>common, s->on_error, @@ -193,10 +237,11 @@ static int coroutine_fn commit_run(Job *job, Error **errp) continue; } } + /* Publish progress */ job_progress_update(>common.job, n); -if (copy) { +if (commit_method == COMMIT_METHOD_COPY) { block_job_ratelimit_processed_bytes(>common, n); } } -- Best regards, Vladimir
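The per-chunk decision the patch introduces can be sketched outside of QEMU. The flag values are copied from QEMU's block layer headers but should be treated as illustrative; `aligned` stands in for the `bdrv_round_to_subclusters()` check that the zero range covers whole subclusters.

```python
# Block-status flags, as in QEMU's include/block/block-common.h (illustrative)
BDRV_BLOCK_ZERO = 0x02
BDRV_BLOCK_ALLOCATED = 0x10

def commit_method(status, aligned):
    """Mirror the decision in commit_run(): skip ranges not allocated
    above the base, write zeroes for aligned zero ranges, copy the rest."""
    if not status & BDRV_BLOCK_ALLOCATED:
        return 'ignore'   # nothing above the base: leave the base as is
    if status & BDRV_BLOCK_ZERO and aligned:
        return 'zero'     # punch zeroes instead of copying zero data
    return 'copy'         # ordinary read-from-top / write-to-base path

assert commit_method(0, True) == 'ignore'
assert commit_method(BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO, True) == 'zero'
# A zero range that does not cover whole subclusters is still copied:
assert commit_method(BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO, False) == 'copy'
assert commit_method(BDRV_BLOCK_ALLOCATED, True) == 'copy'
```

The 'zero' path is what preserves sparseness: `blk_co_pwrite_zeroes(..., BDRV_REQ_MAY_UNMAP)` lets the base discard the clusters rather than store explicit zeros.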
Re: [PATCH 1/1] prealloc: add truncate mode for prealloc filter
On 30.04.24 20:05, Denis V. Lunev via wrote: Preallocate filter allows to implement really interesting setups. Assume that we have * shared block device, f.e. iSCSI LUN, implemented with some HW device * clustered LVM on top of it * QCOW2 image stored inside LVM volume This allows very cheap clustered setups with all QCOW2 features intact. Currently supported setups using QCOW2 with data_file option are not so cool as snapshots are not allowed, QCOW2 should be placed into some additional distributed storage and so on. Though QCOW2 inside LVM volume has a drawback. The image is growing and in order to accommodate that image the LVM volume is to be resized. This could be done externally using ENOSPACE event/condition but this is cumbersome. This patch introduces a native implementation for such a setup. We should just put the prealloc filter in between the QCOW2 format and file nodes. In that case LVM will be resized at the proper moment and that is done effectively as resizing is done in chunks. The patch adds an allocation mode for this purpose in order to distinguish 'fallocate' for an ordinary file system and 'truncate'. Signed-off-by: Denis V. Lunev CC: Alexander Ivanov CC: Kevin Wolf CC: Hanna Reitz CC: Vladimir Sementsov-Ogievskiy --- block/preallocate.c | 50 +++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/block/preallocate.c b/block/preallocate.c index 4d82125036..6d31627325 100644 --- a/block/preallocate.c +++ b/block/preallocate.c @@ -33,10 +33,24 @@ #include "block/block-io.h" #include "block/block_int.h" +typedef enum PreallocateMode { +PREALLOCATE_MODE_FALLOCATE = 0, +PREALLOCATE_MODE_TRUNCATE = 1, +PREALLOCATE_MODE__MAX = 2, +} PreallocateMode; + +static QEnumLookup prealloc_mode_lookup = { +.array = (const char *const[]) { +"falloc", +"truncate", +}, +.size = PREALLOCATE_MODE__MAX, +}; I think instead we should update BlockdevOptionsPreallocate options in qapi/block-core.json. Then the corresponding enum would be generated. 
Example of parsing options, reusing qapi mechanics is in block/copy-before-write.c - cbw_parse_options(). on-cbw-error field is enum. typedef struct PreallocateOpts { int64_t prealloc_size; int64_t prealloc_align; +PreallocateMode prealloc_mode; } PreallocateOpts; typedef struct BDRVPreallocateState { @@ -79,6 +93,7 @@ typedef struct BDRVPreallocateState { #define PREALLOCATE_OPT_PREALLOC_ALIGN "prealloc-align" #define PREALLOCATE_OPT_PREALLOC_SIZE "prealloc-size" +#define PREALLOCATE_OPT_MODE "mode" static QemuOptsList runtime_opts = { .name = "preallocate", .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head), @@ -94,7 +109,14 @@ static QemuOptsList runtime_opts = { .type = QEMU_OPT_SIZE, .help = "how much to preallocate, default 128M", }, -{ /* end of list */ } +{ +.name = PREALLOCATE_OPT_MODE, +.type = QEMU_OPT_STRING, +.help = "Preallocation mode on image expansion " +"(allowed values: falloc, truncate)", +.def_value_str = "falloc", +}, +{ /* end of list */ }, }, }; @@ -102,6 +124,8 @@ static bool preallocate_absorb_opts(PreallocateOpts *dest, QDict *options, BlockDriverState *child_bs, Error **errp) { QemuOpts *opts = qemu_opts_create(_opts, NULL, 0, _abort); +Error *local_err = NULL; +char *buf; if (!qemu_opts_absorb_qdict(opts, options, errp)) { return false; @@ -112,6 +136,17 @@ static bool preallocate_absorb_opts(PreallocateOpts *dest, QDict *options, dest->prealloc_size = qemu_opt_get_size(opts, PREALLOCATE_OPT_PREALLOC_SIZE, 128 * MiB); +buf = qemu_opt_get_del(opts, PREALLOCATE_OPT_MODE); +/* prealloc_mode can be downgraded later during allocate_clusters */ +dest->prealloc_mode = qapi_enum_parse(_mode_lookup, buf, + PREALLOCATE_MODE_FALLOCATE, + _err); +g_free(buf); +if (local_err != NULL) { +error_propagate(errp, local_err); +return false; +} + qemu_opts_del(opts); if (!QEMU_IS_ALIGNED(dest->prealloc_align, BDRV_SECTOR_SIZE)) { @@ -335,9 +370,20 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes, want_merge_zero = want_merge_zero && 
(prealloc_start <= offset); -ret = bdrv_co_pwrite_zeroes( +switch (s->opts.prealloc_mode) { +case PREALLOCATE_MODE_FALLOCATE: +ret = bdrv_co_pwrite_zeroes( bs->file, prealloc_start, prealloc_end - prealloc_start, BDRV_REQ_NO_FALLBACK | BDRV_REQ_S
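The option parsing under discussion (a string mode with a default, rejecting unknown values) can be sketched independently of the QEMU option machinery. Names follow the patch; the helper itself is hypothetical, and as the review notes, in QEMU this would preferably be a QAPI-generated enum rather than a hand-rolled lookup.

```python
# Allowed preallocation modes, mirroring prealloc_mode_lookup in the patch
PREALLOC_MODES = ('falloc', 'truncate')

def parse_prealloc_mode(value, default='falloc'):
    """Parse the 'mode' option: absent means the default, anything
    outside the known set is an error (like qapi_enum_parse)."""
    if value is None:
        return default
    if value not in PREALLOC_MODES:
        raise ValueError(
            f"invalid mode '{value}', expected one of {PREALLOC_MODES}")
    return value

assert parse_prealloc_mode(None) == 'falloc'      # default on a filesystem
assert parse_prealloc_mode('truncate') == 'truncate'  # e.g. QCOW2 on LVM
```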
Re: [PATCH] tests/avocado: add hotplug_blk test
ping2 On 09.04.24 09:58, Vladimir Sementsov-Ogievskiy wrote: Introduce a test that checks that plug/unplug of a virtio-blk device works. (The test was developed by copying hotplug_cpu.py, so the original copyright is kept.) Signed-off-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
[PATCH v5 1/3] qdev-monitor: add option to report GenericError from find_device_state
Here we just prepare for the following patch, making it possible to report GenericError as recommended. This patch doesn't aim to prevent further use of DeviceNotFound by future interfaces: - find_device_state() is used in the blk_by_qdev_id() and qmp_get_blk() functions, which may lead to the spread of DeviceNotFound anyway - also, nothing prevents simply copy-pasting find_device_state() calls with a false argument Signed-off-by: Vladimir Sementsov-Ogievskiy --- system/qdev-monitor.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 6af6ef7d66..264978aa40 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -879,13 +879,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp) object_unref(OBJECT(dev)); } -static DeviceState *find_device_state(const char *id, Error **errp) +/* + * Note that creating new APIs using error classes other than GenericError is + * not recommended. Set use_generic_error=true for new interfaces. + */ +static DeviceState *find_device_state(const char *id, bool use_generic_error, + Error **errp) { Object *obj = object_resolve_path_at(qdev_get_peripheral(), id); DeviceState *dev; if (!obj) { -error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, +error_set(errp, + (use_generic_error ? + ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND), "Device '%s' not found", id); return NULL; } @@ -950,7 +957,7 @@ void qdev_unplug(DeviceState *dev, Error **errp) void qmp_device_del(const char *id, Error **errp) { -DeviceState *dev = find_device_state(id, errp); +DeviceState *dev = find_device_state(id, false, errp); if (dev != NULL) { if (dev->pending_deleted_event && (dev->pending_deleted_expires_ms == 0 || @@ -1070,7 +1077,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp) GLOBAL_STATE_CODE(); -dev = find_device_state(id, errp); +dev = find_device_state(id, false, errp); if (dev == NULL) { return NULL; } -- 2.34.1
[PATCH v5 3/3] qapi: introduce device-sync-config
Add a command to sync the config from the vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed, did not trigger an interrupt to the guest, or is just not available (not supported by the vhost-user server). The command result is racy if we allow it during migration, so let's allow the sync only in the RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 1 + hw/virtio/virtio-pci.c| 9 + include/hw/qdev-core.h| 3 +++ qapi/qdev.json| 24 system/qdev-monitor.c | 38 ++ 5 files changed, 75 insertions(+) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 091d2c6acf..2f301f380c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -588,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data) device_class_set_props(dc, vhost_user_blk_properties); dc->vmsd = &vmstate_vhost_user_blk; +dc->sync_config = vhost_user_blk_sync_config; set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); vdc->realize = vhost_user_blk_device_realize; vdc->unrealize = vhost_user_blk_device_unrealize; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b1d02f4b3d..0d91e8b5dc 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2351,6 +2351,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp) vpciklass->parent_dc_realize(qdev, errp); } +static int virtio_pci_sync_config(DeviceState *dev, Error **errp) +{ +VirtIOPCIProxy *proxy = VIRTIO_PCI(dev); +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); + +return qdev_sync_config(DEVICE(vdev), errp); +} + static void virtio_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -2367,6 +2375,7 @@ static void virtio_pci_class_init(ObjectClass *klass, void *data) device_class_set_parent_realize(dc, virtio_pci_dc_realize, &vpciklass->parent_dc_realize); rc->phases.hold = virtio_pci_bus_reset_hold; +dc->sync_config = virtio_pci_sync_config; } static const TypeInfo virtio_pci_info = { diff
--git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 5336728a23..f992061919 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev); typedef void (*DeviceReset)(DeviceState *dev); typedef void (*BusRealize)(BusState *bus, Error **errp); typedef void (*BusUnrealize)(BusState *bus); +typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp); /** * struct DeviceClass - The base class for all devices. @@ -162,6 +163,7 @@ struct DeviceClass { DeviceReset reset; DeviceRealize realize; DeviceUnrealize unrealize; +DeviceSyncConfig sync_config; /** * @vmsd: device state serialisation description for @@ -547,6 +549,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp); */ HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev); void qdev_unplug(DeviceState *dev, Error **errp); +int qdev_sync_config(DeviceState *dev, Error **errp); void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp); void qdev_machine_creation_done(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..72e434bc45 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,27 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize device configuration from host to guest part. First, +# copy the configuration from the host part (backend) to the guest +# part (frontend). Then notify guest software that device +# configuration changed. +# +# The command may be used to notify the guest about block device +# capacity change. Currently only vhost-user-blk device supports +# this. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 264978aa40..1c29312b53 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -971,6 +972,43 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceState *dev, Error **errp) +{ +DeviceClass *dc = DEVICE_GET_CLASS(dev); + +if (!dc->sync_config) { +error_setg(errp, "device-sync-config
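For reference, driving the new command from a management client is a single QMP request. This hedged Python sketch only builds and serializes the request (the device id "disk0" is a placeholder); it does not talk to a real QEMU process:

```python
import json

def device_sync_config_cmd(device_id):
    """Build the QMP request asking QEMU to re-read the backend
    config and notify the guest (per the schema above)."""
    return {"execute": "device-sync-config",
            "arguments": {"id": device_id}}

# Serialized form, as it would be sent over the QMP socket after
# the usual qmp_capabilities handshake:
wire = json.dumps(device_sync_config_cmd("disk0"))
```

Since the command is marked unstable, a real client would also need to probe for it (e.g. via query-qmp-schema) before use.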
[PATCH v5 0/3] vhost-user-blk: live resize additional APIs
v5: 03: drop the extra check on whether the runstate is running Vladimir Sementsov-Ogievskiy (3): qdev-monitor: add option to report GenericError from find_device_state vhost-user-blk: split vhost_user_blk_sync_config() qapi: introduce device-sync-config hw/block/vhost-user-blk.c | 27 ++-- hw/virtio/virtio-pci.c| 9 +++ include/hw/qdev-core.h| 3 +++ qapi/qdev.json| 24 ++ system/qdev-monitor.c | 53 --- 5 files changed, 105 insertions(+), 11 deletions(-) -- 2.34.1
[PATCH v5 2/3] vhost-user-blk: split vhost_user_blk_sync_config()
Split vhost_user_blk_sync_config() out from vhost_user_blk_handle_config_change(), to be reused in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9e6bbc6950..091d2c6acf 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t *config) s->blkcfg.wce = blkcfg->wce; } +static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp) +{ +int ret; +VirtIODevice *vdev = VIRTIO_DEVICE(dev); +VHostUserBlk *s = VHOST_USER_BLK(vdev); + +ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg, + vdev->config_len, errp); +if (ret < 0) { +return ret; +} + +memcpy(vdev->config, >blkcfg, vdev->config_len); +virtio_notify_config(vdev); + +return 0; +} + static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) { int ret; -VirtIODevice *vdev = dev->vdev; -VHostUserBlk *s = VHOST_USER_BLK(dev->vdev); Error *local_err = NULL; if (!dev->started) { return 0; } -ret = vhost_dev_get_config(dev, (uint8_t *)>blkcfg, - vdev->config_len, _err); +ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), _err); if (ret < 0) { error_report_err(local_err); return ret; } -memcpy(dev->vdev->config, >blkcfg, vdev->config_len); -virtio_notify_config(dev->vdev); - return 0; } -- 2.34.1
Re: [PATCH v2 1/2] copy-before-write: allow specifying minimum cluster size
On 28.05.24 15:01, Fiona Ebner wrote: +if (min_cluster_size > INT64_MAX) { +error_setg(errp, "min-cluster-size too large: %lu > %ld", + min_cluster_size, INT64_MAX); Better to use the PRIu64 and PRIi64 macros here instead of %lu and %ld. -- Best regards, Vladimir
Re: [PATCH v2 2/2] backup: add minimum cluster size to performance options
On 28.05.24 15:01, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. Suggested-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Fiona Ebner --- Changes in v2: * Use 'size' type in QAPI. block/backup.c| 2 +- block/copy-before-write.c | 8 block/copy-before-write.h | 1 + blockdev.c| 3 +++ qapi/block-core.json | 9 +++-- 5 files changed, 20 insertions(+), 3 deletions(-) diff --git a/block/backup.c b/block/backup.c index 3dd2e229d2..a1292c01ec 100644 --- a/block/backup.c +++ b/block/backup.c @@ -458,7 +458,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, } cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source, - , errp); + perf->min_cluster_size, , errp); if (!cbw) { goto error; } diff --git a/block/copy-before-write.c b/block/copy-before-write.c index ef0bc4dcfe..183eed42e5 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -553,6 +553,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source, BlockDriverState *target, const char *filter_node_name, bool discard_source, + uint64_t min_cluster_size, BlockCopyState **bcs, Error **errp) { @@ -572,6 +573,13 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source, qdict_put_str(opts, "file", bdrv_get_node_name(source)); qdict_put_str(opts, "target", bdrv_get_node_name(target)); +if (min_cluster_size > INT64_MAX) { +error_setg(errp, "min-cluster-size too large: %lu > %ld", + min_cluster_size, INT64_MAX); opts leaked here. 
with that fixed: Reviewed-by: Vladimir Sementsov-Ogievskiy +return NULL; +} +qdict_put_int(opts, "min-cluster-size", (int64_t)min_cluster_size); + top = bdrv_insert_node(source, opts, flags, errp); if (!top) { return NULL; diff --git a/block/copy-before-write.h b/block/copy-before-write.h index 01af0cd3c4..2a5d4ba693 100644 --- a/block/copy-before-write.h +++ b/block/copy-before-write.h @@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source, BlockDriverState *target, const char *filter_node_name, bool discard_source, + uint64_t min_cluster_size, BlockCopyState **bcs, Error **errp); void bdrv_cbw_drop(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 835064ed03..6740663fda 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2655,6 +2655,9 @@ static BlockJob *do_backup_common(BackupCommon *backup, if (backup->x_perf->has_max_chunk) { perf.max_chunk = backup->x_perf->max_chunk; } +if (backup->x_perf->has_min_cluster_size) { +perf.min_cluster_size = backup->x_perf->min_cluster_size; +} } if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) || diff --git a/qapi/block-core.json b/qapi/block-core.json index 8fc0a4b234..f1219a9dfb 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1551,11 +1551,16 @@ # it should not be less than job cluster size which is calculated # as maximum of target image cluster size and 64k. Default 0. # +# @min-cluster-size: Minimum size of blocks used by copy-before-write +# and background copy operations. Has to be a power of 2. No +# effect if smaller than the maximum of the target's cluster size +# and 64 KiB. Default 0. (Since 9.1) +# # Since: 6.0 ## { 'struct': 'BackupPerf', - 'data': { '*use-copy-range': 'bool', -'*max-workers': 'int', '*max-chunk': 'int64' } } + 'data': { '*use-copy-range': 'bool', '*max-workers': 'int', +'*max-chunk': 'int64', '*min-cluster-size': 'size' } } ## # @BackupCommon: -- Best regards, Vladimir
Re: [PATCH v2 1/2] copy-before-write: allow specifying minimum cluster size
On 28.05.24 15:01, Fiona Ebner wrote: In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. The type 'size' (corresponding to uint64_t in C) is used in QAPI to rule out negative inputs and for consistency with already existing @cluster-size parameters. Since block_copy_calculate_cluster_size() uses int64_t for its result, a check that the input is not too large is added in block_copy_state_new() before calling it. The calculation in block_copy_calculate_cluster_size() is done in the target int64_t type. Suggested-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Fiona Ebner --- Changes in v2: * Use 'size' type in QAPI. * Remove option in cbw_parse_options(), i.e. before parsing generic blockdev options. block/block-copy.c | 22 ++ block/copy-before-write.c | 10 +- include/block/block-copy.h | 1 + qapi/block-core.json | 8 +++- 4 files changed, 35 insertions(+), 6 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 7e3b378528..36eaecaaf4 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range, } static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, + int64_t min_cluster_size, Error **errp) { int ret; @@ -335,7 +336,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, "used. 
If the actual block size of the target exceeds " "this default, the backup may be unusable", BLOCK_COPY_CLUSTER_SIZE_DEFAULT); -return BLOCK_COPY_CLUSTER_SIZE_DEFAULT; +return MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT); instead of triple use MAX(min_..., let's just do min_cluster_size = MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT); at start of function, and then use min_cluster_size anyway: Reviewed-by: Vladimir Sementsov-Ogievskiy } else if (ret < 0 && !target_does_cow) { error_setg_errno(errp, -ret, "Couldn't determine the cluster size of the target image, " @@ -345,16 +346,18 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, return ret; } else if (ret < 0 && target_does_cow) { /* Not fatal; just trudge on ahead. */ -return BLOCK_COPY_CLUSTER_SIZE_DEFAULT; +return MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT); } -return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size); +return MAX(min_cluster_size, + (int64_t)MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size)); } BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, bool discard_source, + uint64_t min_cluster_size, Error **errp) { ERRP_GUARD(); @@ -365,7 +368,18 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, GLOBAL_STATE_CODE(); -cluster_size = block_copy_calculate_cluster_size(target->bs, errp); +if (min_cluster_size > INT64_MAX) { +error_setg(errp, "min-cluster-size too large: %lu > %ld", + min_cluster_size, INT64_MAX); +return NULL; +} else if (min_cluster_size && !is_power_of_2(min_cluster_size)) { +error_setg(errp, "min-cluster-size needs to be a power of 2"); +return NULL; +} + +cluster_size = block_copy_calculate_cluster_size(target->bs, + (int64_t)min_cluster_size, + errp); if (cluster_size < 0) { return NULL; } diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 
cd65524e26..ef0bc4dcfe 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -417,6 +417,7 @@ static BlockdevOptions *cbw_parse_options(QDict *options, Error **errp) qdict_extract_subqdict(options, NULL, "bitmap"); qdict_del(options, "
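The simplification suggested in the review (raise min_cluster_size to the 64 KiB default once, at the start of block_copy_calculate_cluster_size(), then use it everywhere) leaves only simple arithmetic. A Python restatement of that logic, as an illustration rather than the actual QEMU code:

```python
BLOCK_COPY_CLUSTER_SIZE_DEFAULT = 64 * 1024

def calculate_cluster_size(min_cluster_size, target_cluster_size=None):
    """Sketch of block_copy_calculate_cluster_size() after the
    suggested refactor: validate min-cluster-size, raise it to the
    default once, then take the max with the target's cluster size
    (None stands for bdrv_get_info() being unsupported, e.g. NBD)."""
    if min_cluster_size and (min_cluster_size & (min_cluster_size - 1)):
        raise ValueError("min-cluster-size needs to be a power of 2")
    min_cluster_size = max(min_cluster_size,
                           BLOCK_COPY_CLUSTER_SIZE_DEFAULT)
    if target_cluster_size is None:
        return min_cluster_size
    return max(min_cluster_size, target_cluster_size)
```

All three return paths in the C function then collapse to returning min_cluster_size or its max with bdi.cluster_size.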
Re: [PATCH v4 5/5] blockdev: mirror: check for target's cluster size when using bitmap
On 21.05.24 15:20, Fiona Ebner wrote: When using mirror with a bitmap and the target does not do COW and is is a diff image, i.e. one that should only contain the delta and was not synced to previously, a too large cluster size for the target can be problematic. In particular, when the mirror sends data to the target aligned to the jobs granularity, but not aligned to the larger target image's cluster size, the target's cluster would be allocated but only be filled partially. When rebasing such a diff image later, the corresponding cluster of the base image would get "masked" and the part of the cluster not in the diff image is not accessible anymore. Unfortunately, it is not always possible to check for the target image's cluster size, e.g. when it's NBD. Because the limitation is already documented in the QAPI description for the @bitmap parameter and it's only required for special diff image use-case, simply skip the check then. Signed-off-by: Fiona Ebner --- blockdev.c | 57 ++ tests/qemu-iotests/tests/mirror-bitmap | 6 +++ tests/qemu-iotests/tests/mirror-bitmap.out | 7 +++ 3 files changed, 70 insertions(+) diff --git a/blockdev.c b/blockdev.c index 4f72a72dc7..468974108e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2769,6 +2769,59 @@ void qmp_blockdev_backup(BlockdevBackup *backup, Error **errp) blockdev_do_action(, errp); } +static int blockdev_mirror_check_bitmap_granularity(BlockDriverState *target, +BdrvDirtyBitmap *bitmap, +Error **errp) +{ +int ret; +BlockDriverInfo bdi; +uint32_t bitmap_granularity; + +GLOBAL_STATE_CODE(); +GRAPH_RDLOCK_GUARD_MAINLOOP(); + +if (bdrv_backing_chain_next(target)) { +/* + * No need to worry about creating clusters with partial data when the + * target does COW. + */ +return 0; +} + +/* + * If there is no backing file on the target, we cannot rely on COW if our + * backup cluster size is smaller than the target cluster size. Even for + * targets with a backing file, try to avoid COW if possible. 
"Even for targets with" - I don't follow. We do "return 0" already above for such targets? + */ +ret = bdrv_get_info(target, &bdi); +if (ret == -ENOTSUP) { +/* + * Ignore if unable to get the info, e.g. when target is NBD. It's only + * relevant for syncing to a diff image and the documentation already + * states that the target's cluster size needs to be small enough then. + */ +return 0; +} else if (ret < 0) { +error_setg_errno(errp, -ret, +"Couldn't determine the cluster size of the target image, " +"which has no backing file"); +return ret; +} + +bitmap_granularity = bdrv_dirty_bitmap_granularity(bitmap); +if (bitmap_granularity < bdi.cluster_size || +bitmap_granularity % bdi.cluster_size != 0) { +error_setg(errp, "Bitmap granularity %u is not a multiple of the " + "target image's cluster size %u and the target image has " + "no backing file", + bitmap_granularity, bdi.cluster_size); +return -EINVAL; +} + +return 0; +} + + /* Parameter check and block job starting for drive mirroring. * Caller should hold @device and @target's aio context (must be the same). 
**/ @@ -2863,6 +2916,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs, return; } +if (blockdev_mirror_check_bitmap_granularity(target, bitmap, errp)) { +return; +} + if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) { return; } diff --git a/tests/qemu-iotests/tests/mirror-bitmap b/tests/qemu-iotests/tests/mirror-bitmap index 37bbe0f241..e8cd482a19 100755 --- a/tests/qemu-iotests/tests/mirror-bitmap +++ b/tests/qemu-iotests/tests/mirror-bitmap @@ -584,6 +584,12 @@ def test_mirror_api(): bitmap=bitmap) log('') +log("-- Test bitmap with too small granularity to non-COW target --\n") +vm.qmp_log("block-dirty-bitmap-add", node=drive0.node, + name="bitmap-small", granularity=GRANULARITY) +blockdev_mirror(drive0.vm, drive0.node, "mirror_target", "full", +job_id='api_job', bitmap="bitmap-small") +log('') def main(): for bsync_mode in ("never", "on-success", "always"): diff --git a/tests/qemu-iotests/tests/mirror-bitmap.out b/tests/qemu-iotests/tests/mirror-bitmap.out index 5c8acc1d69..af605f3803 100644 --- a/tests/qemu-iotests/tests/mirror-bitmap.out +++ b/tests/qemu-iotests/tests/mirror-bitmap.out @@ -3189,3 +3189,10 @@ qemu_img compare "TEST_DIR/PID-img"
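The granularity check in the patch boils down to one arithmetic rule: a bitmap granularity is safe for a non-COW target exactly when it is at least the target's cluster size and a multiple of it, so every granularity-aligned write covers whole target clusters. As a small Python sketch of that predicate (illustrative, not QEMU code):

```python
def granularity_compatible(bitmap_granularity, target_cluster_size):
    """True iff every bitmap-granularity-aligned write fills whole
    clusters of a non-COW target image, leaving no partially
    allocated clusters behind."""
    return (bitmap_granularity >= target_cluster_size and
            bitmap_granularity % target_cluster_size == 0)
```

Both conditions are needed: a granularity smaller than the cluster size allocates partial clusters, and a larger but non-multiple granularity still misaligns at cluster boundaries.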
Re: [PATCH v4 4/5] iotests: add test for bitmap mirror
On 21.05.24 15:20, Fiona Ebner wrote: From: Fabian Grünbichler heavily based on/practically forked off iotest 257 for bitmap backups, but: really, heavily. Making a duplicate is always a bad idea. Could we instead just add test cases to 257? - no writes to filter node 'mirror-top' between completion and finalization, as those seem to deadlock? Could you give a bit more detail? If guest writes may lead to a deadlock, that's a bug, isn't it? -- Best regards, Vladimir
Re: [PATCH v4 3/5] mirror: allow specifying working bitmap
On 21.05.24 15:20, Fiona Ebner wrote: From: John Snow for the mirror job. The bitmap's granularity is used as the job's granularity. The new @bitmap parameter is marked unstable in the QAPI and can currently only be used for @sync=full mode. Clusters initially dirty in the bitmap as well as new writes are copied to the target. Using block-dirty-bitmap-clear and block-dirty-bitmap-merge API, callers can simulate the three kinds of @BitmapSyncMode (which is used by backup): 1. always: default, just pass bitmap as working bitmap. 2. never: copy bitmap and pass copy to the mirror job. 3. on-success: copy bitmap and pass copy to the mirror job and if successful, merge bitmap into original afterwards. When the target image is a non-COW "diff image", i.e. one that was not used as the target of a previous mirror and the target image's cluster size is larger than the bitmap's granularity, or when @copy-mode=write-blocking is used, there is a pitfall, because the cluster in the target image will be allocated, but not contain all the data corresponding to the same region in the source image. An idea to avoid the limitation would be to mark clusters which are affected by unaligned writes and are not allocated in the target image dirty, so they would be copied fully later. However, for migration, the invariant that an actively synced mirror stays actively synced (unless an error happens) is useful, because without that invariant, migration might inactivate block devices when mirror still got work to do and run into an assertion failure [0]. Another approach would be to read the missing data from the source upon unaligned writes to be able to write the full target cluster instead. But certain targets like NBD do not allow querying the cluster size. To avoid limiting/breaking the use case of syncing to an existing target, which is arguably more common than the diff image use case, document the limitation in QAPI. 
This patch was originally based on one by Ma Haocong, but it has since been modified pretty heavily, first by John and then again by Fiona. [0]: https://lore.kernel.org/qemu-devel/1db7f571-cb7f-c293-04cc-cd856e060...@proxmox.com/ Suggested-by: Ma Haocong Signed-off-by: Ma Haocong Signed-off-by: John Snow [FG: switch to bdrv_dirty_bitmap_merge_internal] Signed-off-by: Fabian Grünbichler Signed-off-by: Thomas Lamprecht [FE: rebase for 9.1 get rid of bitmap mode parameter use caller-provided bitmap as working bitmap turn bitmap parameter experimental] Signed-off-by: Fiona Ebner Acked-by: Markus Armbruster --- block/mirror.c | 80 +- blockdev.c | 44 +++--- include/block/block_int-global-state.h | 5 +- qapi/block-core.json | 35 ++- tests/unit/test-block-iothread.c | 2 +- 5 files changed, 141 insertions(+), 25 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index ca23d6ef65..d3d0698116 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -73,6 +73,11 @@ typedef struct MirrorBlockJob { size_t buf_size; int64_t bdev_length; unsigned long *cow_bitmap; +/* + * Whether the bitmap is created locally or provided by the caller (for + * incremental sync). + */ +bool dirty_bitmap_is_local; BdrvDirtyBitmap *dirty_bitmap; BdrvDirtyBitmapIter *dbi; uint8_t *buf; @@ -691,7 +696,11 @@ static int mirror_exit_common(Job *job) bdrv_unfreeze_backing_chain(mirror_top_bs, target_bs); } -bdrv_release_dirty_bitmap(s->dirty_bitmap); +if (s->dirty_bitmap_is_local) { +bdrv_release_dirty_bitmap(s->dirty_bitmap); +} else { +bdrv_enable_dirty_bitmap(s->dirty_bitmap); +} /* Make sure that the source BDS doesn't go away during bdrv_replace_node, * before we can call bdrv_drained_end */ @@ -820,6 +829,16 @@ static void mirror_abort(Job *job) assert(ret == 0); } +/* Always called after commit/abort. 
*/ +static void mirror_clean(Job *job) +{ +MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job); + +if (!s->dirty_bitmap_is_local && s->dirty_bitmap) { +bdrv_dirty_bitmap_set_busy(s->dirty_bitmap, false); +} why not do that in existing mirror_exit_common, where we already do release/enable? +} + -- Best regards, Vladimir
Re: [PATCH v4 3/5] mirror: allow specifying working bitmap
On 21.05.24 15:20, Fiona Ebner wrote: From: John Snow for the mirror job. The bitmap's granularity is used as the job's granularity. The new @bitmap parameter is marked unstable in the QAPI and can currently only be used for @sync=full mode. Clusters initially dirty in the bitmap as well as new writes are copied to the target. Using block-dirty-bitmap-clear and block-dirty-bitmap-merge API, callers can simulate the three kinds of @BitmapSyncMode (which is used [..] /* * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active - * mode. + * mode. For external/caller-provided bitmap, need to wait until + * bdrv_mirror_top_do_write() can actually access it before disabling. hmm, isn't that true at least for 'none' mode? For other modes we rely on mirror_dirty_init(), but with 'none' mode we risk missing some writes. That's preexisting, but it's probably better to move disabling the bitmap to the moment where we are sure that we dirty it by hand on any not-immediately-synced write, and then have the _disable_() call in the same place for all scenarios. */ -bdrv_disable_dirty_bitmap(s->dirty_bitmap); +if (s->dirty_bitmap_is_local) { +bdrv_disable_dirty_bitmap(s->dirty_bitmap); +} -- Best regards, Vladimir
Re: [PATCH v4 2/5] block/mirror: replace is_none_mode with sync_mode in MirrorBlockJob struct
On 21.05.24 15:20, Fiona Ebner wrote: It is more flexible and is done in preparation to support specifying a working bitmap for mirror jobs. In particular, this makes it possible to assert that @sync_mode=full when a bitmap is used. That assertion is just to be sure, of course the mirror QMP commands will be made to fail earlier with a clean error. Signed-off-by: Fiona Ebner Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
[PATCH 1/2] iotests/backup-discard-source: convert size variable to be int
Make the size variable reusable in code for checks. Don't bother changing the "512 * 1024" invocations, as they will be dropped in the next commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/tests/backup-discard-source | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tests/qemu-iotests/tests/backup-discard-source b/tests/qemu-iotests/tests/backup-discard-source index 2391b12acd..05fbe5d26b 100755 --- a/tests/qemu-iotests/tests/backup-discard-source +++ b/tests/qemu-iotests/tests/backup-discard-source @@ -28,7 +28,7 @@ from iotests import qemu_img_create, qemu_img_map, qemu_io temp_img = os.path.join(iotests.test_dir, 'temp') source_img = os.path.join(iotests.test_dir, 'source') target_img = os.path.join(iotests.test_dir, 'target') -size = '1M' +size = 1024 * 1024 def get_actual_size(vm, node_name): @@ -39,9 +39,9 @@ def get_actual_size(vm, node_name): class TestBackup(iotests.QMPTestCase): def setUp(self): -qemu_img_create('-f', iotests.imgfmt, source_img, size) -qemu_img_create('-f', iotests.imgfmt, temp_img, size) -qemu_img_create('-f', iotests.imgfmt, target_img, size) +qemu_img_create('-f', iotests.imgfmt, source_img, str(size)) +qemu_img_create('-f', iotests.imgfmt, temp_img, str(size)) +qemu_img_create('-f', iotests.imgfmt, target_img, str(size)) qemu_io('-c', 'write 0 1M', source_img) self.vm = iotests.VM() @@ -98,7 +98,7 @@ class TestBackup(iotests.QMPTestCase): mapping = qemu_img_map(temp_img) self.assertEqual(len(mapping), 1) self.assertEqual(mapping[0]['start'], 0) -self.assertEqual(mapping[0]['length'], 1024 * 1024) +self.assertEqual(mapping[0]['length'], size) self.assertEqual(mapping[0]['data'], False) os.remove(temp_img) @@ -125,7 +125,7 @@ class TestBackup(iotests.QMPTestCase): self.assert_qmp(result, 'return', '') # Check that data is written to temporary image -self.assertGreater(get_actual_size(self.vm, 'temp'), 1024 * 1024) +self.assertGreater(get_actual_size(self.vm, 'temp'), size) self.do_backup() -- 2.34.1
[PATCH 0/2] fix backup-discard-source test for XFS
Hi all! As Kevin reported, the test doesn't work on XFS, as it relies on disk usage. Fix it by switching to a dirty bitmap for guest write tracking. Vladimir Sementsov-Ogievskiy (2): iotests/backup-discard-source: convert size variable to be int iotests/backup-discard-source: don't use actual-size .../qemu-iotests/tests/backup-discard-source | 39 --- 1 file changed, 25 insertions(+), 14 deletions(-) -- 2.34.1
[PATCH 2/2] iotests/backup-discard-source: don't use actual-size
Relying on disk usage is a bad thing, and the test just doesn't work on XFS. Let's instead add a dirty bitmap to track writes to the test image. Signed-off-by: Vladimir Sementsov-Ogievskiy --- .../qemu-iotests/tests/backup-discard-source | 29 +-- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/tests/qemu-iotests/tests/backup-discard-source b/tests/qemu-iotests/tests/backup-discard-source index 05fbe5d26b..17fef9c6d3 100755 --- a/tests/qemu-iotests/tests/backup-discard-source +++ b/tests/qemu-iotests/tests/backup-discard-source @@ -31,12 +31,6 @@ target_img = os.path.join(iotests.test_dir, 'target') size = 1024 * 1024 -def get_actual_size(vm, node_name): -nodes = vm.cmd('query-named-block-nodes', flat=True) -node = next(n for n in nodes if n['node-name'] == node_name) -return node['image']['actual-size'] - - class TestBackup(iotests.QMPTestCase): def setUp(self): qemu_img_create('-f', iotests.imgfmt, source_img, str(size)) @@ -84,7 +78,12 @@ class TestBackup(iotests.QMPTestCase): } }) -self.assertLess(get_actual_size(self.vm, 'temp'), 512 * 1024) +self.bitmap = { +'node': 'temp', +'name': 'bitmap0' +} + +self.vm.cmd('block-dirty-bitmap-add', self.bitmap) def tearDown(self): # That should fail, because region is discarded @@ -113,6 +112,13 @@ class TestBackup(iotests.QMPTestCase): self.vm.event_wait(name='BLOCK_JOB_COMPLETED') +def get_bitmap_count(self): +nodes = self.vm.cmd('query-named-block-nodes', flat=True) +temp = next(n for n in nodes if n['node-name'] == 'temp') +bitmap = temp['dirty-bitmaps'][0] +assert bitmap['name'] == self.bitmap['name'] +return bitmap['count'] + def test_discard_written(self): """ 1. 
Guest writes @@ -125,7 +131,7 @@ class TestBackup(iotests.QMPTestCase): self.assert_qmp(result, 'return', '') # Check that data is written to temporary image -self.assertGreater(get_actual_size(self.vm, 'temp'), size) +self.assertEqual(self.get_bitmap_count(), size) self.do_backup() @@ -138,13 +144,18 @@ class TestBackup(iotests.QMPTestCase): """ self.do_backup() +# backup job did discard operation and pollute the bitmap, +# we have to clean the bitmap, to check next write +self.assertEqual(self.get_bitmap_count(), size) +self.vm.cmd('block-dirty-bitmap-clear', self.bitmap) + # Try trigger copy-before-write operation result = self.vm.hmp_qemu_io('cbw', 'write 0 1M') self.assert_qmp(result, 'return', '') # Check that data is not written to temporary image, as region # is discarded from copy-before-write process -self.assertLess(get_actual_size(self.vm, 'temp'), 512 * 1024) +self.assertEqual(self.get_bitmap_count(), 0) if __name__ == '__main__': -- 2.34.1
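The new get_bitmap_count() helper above is just a lookup into the `query-named-block-nodes` reply. In isolation, the extraction logic can be sketched like this (the reply data below is a hand-written stand-in shaped like the fields the test relies on, not real QMP output):

```python
def bitmap_count(nodes, node_name, bitmap_name):
    """Return the 'count' (dirty bytes) of the named dirty bitmap on the
    named node; 'nodes' is a flattened query-named-block-nodes result."""
    node = next(n for n in nodes if n['node-name'] == node_name)
    bitmap = next(b for b in node['dirty-bitmaps']
                  if b['name'] == bitmap_name)
    return bitmap['count']


# Hypothetical reply: 1 MiB written through the 'temp' node
sample_nodes = [
    {'node-name': 'source', 'dirty-bitmaps': []},
    {'node-name': 'temp',
     'dirty-bitmaps': [{'name': 'bitmap0', 'count': 1024 * 1024}]},
]

print(bitmap_count(sample_nodes, 'temp', 'bitmap0'))  # 1048576
```

Unlike `actual-size`, this count depends only on what the guest (or the copy-before-write filter) wrote, not on the host filesystem's allocation policy.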
Re: [PATCH v4 5/5] iotests: add backup-discard-source
On 13.06.24 11:02, Kevin Wolf wrote: Am 12.06.2024 um 21:21 hat Vladimir Sementsov-Ogievskiy geschrieben: On 11.06.24 20:49, Kevin Wolf wrote: Am 13.03.2024 um 16:28 hat Vladimir Sementsov-Ogievskiy geschrieben: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner This test fails for me, and it already does so after this commit that introduced it. I haven't checked what get_actual_size() does, but I'm running on XFS, so its preallocation could be causing this. We generally avoid checking the number of allocated blocks in image files for this reason. Hmm, right, I see that relying on allocated size is a bad thing. Better is to check the block status, to see how many qcow2 clusters are allocated. Do we have some QMP command to get such information? The simplest way I see is to add a dirty bitmap to the temp block-node, and then check its dirty count through query-named-block-nodes. Hm, does it have to be QMP in a running QEMU process? Hmm, yes, it seems in test_discard_written() we do want to examine the running process. I'll try to go with the bitmap. I'm not sure what the best way would be there. Otherwise, my approach would be 'qemu-img check' or even 'qemu-img map', depending on what you want to verify with it. Kevin -- Best regards, Vladimir
Re: [PATCH v4 5/5] iotests: add backup-discard-source
On 11.06.24 20:49, Kevin Wolf wrote: Am 13.03.2024 um 16:28 hat Vladimir Sementsov-Ogievskiy geschrieben: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner This test fails for me, and it already does so after this commit that introduced it. I haven't checked what get_actual_size() does, but I'm running on XFS, so its preallocation could be causing this. We generally avoid checking the number of allocated blocks in image files for this reason. Hmm, right, I see that relying on allocated size is a bad thing. Better is to check the block status, to see how many qcow2 clusters are allocated. Do we have some QMP command to get such information? The simplest way I see is to add a dirty bitmap to the temp block-node, and then check its dirty count through query-named-block-nodes -- Best regards, Vladimir
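For reference, Kevin's offline alternative, `qemu-img map --output=json`, reports extents with `data`/`zero` flags; counting allocated bytes from such a report is a small exercise. A sketch of that accounting (the extent list below is fabricated for illustration; the field names follow the `qemu-img map --output=json` format):

```python
import json

def allocated_bytes(map_json):
    """Sum the lengths of extents that carry data in a
    'qemu-img map --output=json' style report."""
    return sum(e['length'] for e in json.loads(map_json)
               if e.get('data'))

# Fabricated report: 64 KiB allocated, then a 192 KiB zero hole
sample = json.dumps([
    {'start': 0, 'length': 65536, 'data': True, 'zero': False},
    {'start': 65536, 'length': 196608, 'data': False, 'zero': True},
])

print(allocated_bytes(sample))  # 65536
```

This works on an offline image, which is exactly why it doesn't fit test_discard_written(): that test needs to inspect the temporary node while QEMU is still running, hence the dirty-bitmap approach.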
[PULL v2 0/7] Block jobs patches for 2024-04-29
The following changes since commit ad10b4badc1dd5b28305f9b9f1168cf0aa3ae946: Merge tag 'pull-error-2024-05-27' of https://repo.or.cz/qemu/armbru into staging (2024-05-27 06:40:42 -0700) are available in the Git repository at: https://gitlab.com/vsementsov/qemu.git tags/pull-block-jobs-2024-04-29-v2 for you to fetch changes up to a149401048481247bcbaf6035a7a1308974fb464: iotests/pylintrc: allow up to 10 similar lines (2024-05-28 15:52:15 +0300) Block jobs patches for 2024-04-29 v2: add "iotests/pylintrc: allow up to 10 similar lines" to fix check-python-minreqs - backup: discard-source parameter - blockcommit: Reopen base image as RO after abort Alexander Ivanov (1): blockcommit: Reopen base image as RO after abort Vladimir Sementsov-Ogievskiy (6): block/copy-before-write: fix permission block/copy-before-write: support unligned snapshot-discard block/copy-before-write: create block_copy bitmap in filter node qapi: blockdev-backup: add discard-source parameter iotests: add backup-discard-source iotests/pylintrc: allow up to 10 similar lines block/backup.c | 5 +- block/block-copy.c | 12 ++- block/copy-before-write.c | 39 +++-- block/copy-before-write.h | 1 + block/mirror.c | 11 ++- block/replication.c| 4 +- blockdev.c | 2 +- include/block/block-common.h | 2 + include/block/block-copy.h | 2 + include/block/block_int-global-state.h | 2 +- qapi/block-core.json | 4 + tests/qemu-iotests/257.out | 112 +- tests/qemu-iotests/pylintrc| 2 +- tests/qemu-iotests/tests/backup-discard-source | 152 tests/qemu-iotests/tests/backup-discard-source.out | 5 ++ 15 files changed, 282 insertions(+), 73 deletions(-) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out -- 2.34.1
[PULL v2 7/7] iotests/pylintrc: allow up to 10 similar lines
We want to have similar QMP objects in different tests. Reworking these objects to make common parts by calling some helper functions doesn't seem good. It's a lot more comfortable to see the whole QAPI request in one place. So, let's increase the limit, to unblock further commit "iotests: add backup-discard-source" Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/pylintrc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc index de2e0c2781..05b75ee59b 100644 --- a/tests/qemu-iotests/pylintrc +++ b/tests/qemu-iotests/pylintrc @@ -55,4 +55,4 @@ max-line-length=79 [SIMILARITIES] -min-similarity-lines=6 +min-similarity-lines=10 -- 2.34.1
Re: [PATCH] iotests/pylintrc: allow up to 10 similar lines
On 30.04.24 12:13, Vladimir Sementsov-Ogievskiy wrote: We want to have similar QMP objects in different tests. Reworking these objects to make common parts by calling some helper functions doesn't seem good. It's a lot more comfortable to see the whole QAPI request in one place. So, let's increase the limit, to unblock further commit "iotests: add backup-discard-source" Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi all! That's a patch to unblock my PR "[PULL 0/6] Block jobs patches for 2024-04-29" <20240429115157.2260885-1-vsement...@yandex-team.ru> https://patchew.org/QEMU/20240429115157.2260885-1-vsement...@yandex-team.ru/ tests/qemu-iotests/pylintrc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc index de2e0c2781..05b75ee59b 100644 --- a/tests/qemu-iotests/pylintrc +++ b/tests/qemu-iotests/pylintrc @@ -55,4 +55,4 @@ max-line-length=79 [SIMILARITIES] -min-similarity-lines=6 +min-similarity-lines=10 Hi! I hope it's OK, if I just apply this to my block branch, and resend "[PULL 0/6] Block jobs patches for 2024-04-29" which is blocked on this problem. Thanks for reviewing, applied to my block branch -- Best regards, Vladimir
Re: [PATCH v4 3/3] qapi: introduce device-sync-config
On 30.04.24 11:31, Vladimir Sementsov-Ogievskiy wrote: On 30.04.24 11:19, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 1 + hw/virtio/virtio-pci.c | 9 include/hw/qdev-core.h | 3 +++ qapi/qdev.json | 23 +++ system/qdev-monitor.c | 48 +++ 5 files changed, 84 insertions(+) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 091d2c6acf..2f301f380c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -588,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data) device_class_set_props(dc, vhost_user_blk_properties); dc->vmsd = _vhost_user_blk; + dc->sync_config = vhost_user_blk_sync_config; set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); vdc->realize = vhost_user_blk_device_realize; vdc->unrealize = vhost_user_blk_device_unrealize; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b1d02f4b3d..0d91e8b5dc 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2351,6 +2351,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp) vpciklass->parent_dc_realize(qdev, errp); } +static int virtio_pci_sync_config(DeviceState *dev, Error **errp) +{ + VirtIOPCIProxy *proxy = VIRTIO_PCI(dev); + VirtIODevice *vdev = virtio_bus_get_device(>bus); + + return qdev_sync_config(DEVICE(vdev), errp); +} + static void virtio_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -2367,6 +2375,7 @@ static void virtio_pci_class_init(ObjectClass *klass, void *data) device_class_set_parent_realize(dc, virtio_pci_dc_realize, 
>parent_dc_realize); rc->phases.hold = virtio_pci_bus_reset_hold; + dc->sync_config = virtio_pci_sync_config; } static const TypeInfo virtio_pci_info = { diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 9228e96c87..87135bdcdf 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev); typedef void (*DeviceReset)(DeviceState *dev); typedef void (*BusRealize)(BusState *bus, Error **errp); typedef void (*BusUnrealize)(BusState *bus); +typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp); /** * struct DeviceClass - The base class for all devices. @@ -162,6 +163,7 @@ struct DeviceClass { DeviceReset reset; DeviceRealize realize; DeviceUnrealize unrealize; + DeviceSyncConfig sync_config; /** * @vmsd: device state serialisation description for @@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp); */ HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev); void qdev_unplug(DeviceState *dev, Error **errp); +int qdev_sync_config(DeviceState *dev, Error **errp); void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp); void qdev_machine_creation_done(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..fc5e125a45 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,26 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize device configuration from host to guest part. First, +# copy the configuration from the host part (backend) to the guest +# part (frontend). Then notify guest software that device +# configuration changed. Blank line here, please. +# The command may be used to notify the guest about block device +# capcity change. Currently only vhost-user-blk device supports +# this. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 264978aa40..47bfc0506e 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -971,6 +972,53 @@ void qmp_device_del(const char *id, Error **errp) } }
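The constraint stated in the commit message — the sync result is racy during migration, so the command is only allowed in the RUNNING run state — amounts to a guard in front of the device-class handler. A toy Python model of that gate (names are invented for illustration; the real check is C code in system/qdev-monitor.c):

```python
def sync_config(run_state, handler):
    """Model of the qdev sync-config gate: refuse in any run state but
    'running' (the result would be racy during migration), and refuse
    when the device class provides no sync_config handler."""
    if run_state != 'running':
        raise RuntimeError('sync-config is only allowed in running state')
    if handler is None:
        raise RuntimeError('device does not support config synchronization')
    return handler()


# A vhost-user-blk-like device would supply a real handler:
print(sync_config('running', lambda: 0))  # 0
```

The virtio-pci wrapper in the patch follows the same delegation shape: the proxy's handler simply forwards the request to the inner virtio device's handler.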
Re: [PATCH v6 4/5] migration: process_incoming_migration_co(): rework error reporting
On 30.04.24 12:16, Philippe Mathieu-Daudé wrote: On 30/4/24 10:56, Vladimir Sementsov-Ogievskiy wrote: Unify error reporting in the function. This simplifies the following commit, which will not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 23 +-- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b307a4bc59..a9599838e6 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -735,14 +735,16 @@ static void process_incoming_migration_bh(void *opaque) static void coroutine_fn process_incoming_migration_co(void *opaque) { + MigrationState *s = migrate_get_current(); (see below) MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; + Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { - error_report("Failed to setup decompress threads"); + error_setg(_err, "Failed to setup decompress threads"); goto fail; } @@ -779,19 +781,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - MigrationState *s = migrate_get_current(); - - if (migrate_has_error(s)) { - WITH_QEMU_LOCK_GUARD(>error_mutex) { - error_report_err(s->error); - s->error = NULL; - } - } - error_report("load of migration failed: %s", strerror(-ret)); + error_setg(_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { + error_setg(_err, "colo incoming failed"); goto fail; } @@ -800,8 +795,16 @@ process_incoming_migration_co(void *opaque) fail: Maybe just assign @s in error path here? s = migrate_get_current(); I'd keep as is. If continue improving the function, I'd better split the logic to seperate function with classic "Error **errp" argument. And keep reporting error in caller. 
migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); + migrate_set_error(s, local_err); + error_free(local_err); + migration_incoming_state_destroy(); + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { + error_report_err(s->error); + s->error = NULL; + } + exit(EXIT_FAILURE); } Reviewed-by: Philippe Mathieu-Daudé -- Best regards, Vladimir
Re: [PULL 6/6] iotests: add backup-discard-source
On 30.04.24 12:13, Kevin Wolf wrote: Am 29.04.2024 um 20:39 hat Vladimir Sementsov-Ogievskiy geschrieben: [Add John] On 29.04.24 17:18, Richard Henderson wrote: On 4/29/24 04:51, Vladimir Sementsov-Ogievskiy wrote: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner Message-Id: <20240313152822.626493-6-vsement...@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy --- .../qemu-iotests/tests/backup-discard-source | 152 ++ .../tests/backup-discard-source.out | 5 + 2 files changed, 157 insertions(+) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out This fails check-python-minreqs https://gitlab.com/qemu-project/qemu/-/jobs/6739551782 It appears to be a pylint issue. tests/export-incoming-iothread:1:0: R0801: Similar lines in 2 files ==tests.backup-discard-source:[52:61] ==tests.copy-before-write:[54:63] 'file': { 'driver': iotests.imgfmt, 'file': { 'driver': 'file', 'filename': source_img, } }, 'target': { 'driver': iotests.imgfmt, (duplicate-code) Hmm. I don't think, that something should be changed in code. splitting out part of this json object to a function? That's a test for QMP command, and it's good that we see the command as is in one place. I can swap some lines or rename variables to hack the linter, but I'd prefer not doing so:) For me that looks like a false-positive: yes it's a duplication, but it should better be duplication, than complicating raw json objects by reusing common parts. What to do? As described in 22305c2a081b8b6 "python: Reduce strictness of pylint's duplicate-code check", this check could be either be disabled completely, or we can increase min-similarity-lines config value. I'd just disable it completely. Any thoughts? I think it would make sense to treat tests differently from real production code. 
In tests/, some duplication is bound to happen and trying to unify things across test cases (which would mean moving to iotests.py) often doesn't make sense. On the other hand, in python/ we would probably want to unify duplicated code. Agree. Happily, it turns out that tests already have a separate tests/qemu-iotests/pylintrc file, so that's not a problem. Still, I decided not to disable the check completely, but just to bump the limit, see "[PATCH] iotests/pylintrc: allow up to 10 similar lines" -- Best regards, Vladimir
[PATCH] iotests/pylintrc: allow up to 10 similar lines
We want to have similar QMP objects in different tests. Reworking these objects to make common parts by calling some helper functions doesn't seem good. It's a lot more comfortable to see the whole QAPI request in one place. So, let's increase the limit, to unblock further commit "iotests: add backup-discard-source" Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi all! That's a patch to unblock my PR "[PULL 0/6] Block jobs patches for 2024-04-29" <20240429115157.2260885-1-vsement...@yandex-team.ru> https://patchew.org/QEMU/20240429115157.2260885-1-vsement...@yandex-team.ru/ tests/qemu-iotests/pylintrc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc index de2e0c2781..05b75ee59b 100644 --- a/tests/qemu-iotests/pylintrc +++ b/tests/qemu-iotests/pylintrc @@ -55,4 +55,4 @@ max-line-length=79 [SIMILARITIES] -min-similarity-lines=6 +min-similarity-lines=10 -- 2.34.1
[PATCH v6 4/5] migration: process_incoming_migration_co(): rework error reporting
Unify error reporting in the function. This simplifies the following commit, which will add a not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 23 +-- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b307a4bc59..a9599838e6 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -735,14 +735,16 @@ static void process_incoming_migration_bh(void *opaque) static void coroutine_fn process_incoming_migration_co(void *opaque) { +MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; +Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { -error_report("Failed to setup decompress threads"); +error_setg(&local_err, "Failed to setup decompress threads"); goto fail; } @@ -779,19 +781,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -MigrationState *s = migrate_get_current(); - -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(s->error); -s->error = NULL; -} -} -error_report("load of migration failed: %s", strerror(-ret)); +error_setg(&local_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { +error_setg(&local_err, "colo incoming failed"); goto fail; } @@ -800,8 +795,16 @@ process_incoming_migration_co(void *opaque) fail: migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); +migrate_set_error(s, local_err); +error_free(local_err); + migration_incoming_state_destroy(); +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(s->error); +s->error = NULL; +} + exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v6 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error
Cover more cases by trace-point. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/migration.c | 4 +++- migration/trace-events | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b5af6b5105..2dc6a063e9 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque) void migrate_set_error(MigrationState *s, const Error *error) { QEMU_LOCK_GUARD(&s->error_mutex); + +trace_migrate_error(error_get_pretty(error)); + if (!s->error) { s->error = error_copy(error); } @@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s) static void migrate_fd_error(MigrationState *s, const Error *error) { -trace_migrate_fd_error(error_get_pretty(error)); assert(s->to_dst_file == NULL); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED); diff --git a/migration/trace-events b/migration/trace-events index f0e1cb80c7..d0c44c3853 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam # migration.c migrate_set_state(const char *new_state) "new state %s" migrate_fd_cleanup(void) "" -migrate_fd_error(const char *error_desc) "error=%s" +migrate_error(const char *error_desc) "error=%s" migrate_fd_cancel(void) "" migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx" migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")" -- 2.34.1
[PATCH v6 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
Now we do set MIGRATION_FAILED state, but don't give a chance to orchestrator to query migration state and get the error. Let's provide a possibility for QMP-based orchestrators to get an error like with outgoing migration. For hmp_migrate_incoming(), let's enable the new behavior: HMP is not and ABI, it's mostly intended to use by developer and it makes sense not to stop the process. For x-exit-preconfig, let's keep the old behavior: - it's called from init(), so here we want to keep current behavior by default - it does exit on error by itself as well So, if we want to change the behavior of x-exit-preconfig, it should be another patch. Signed-off-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster --- migration/migration-hmp-cmds.c | 2 +- migration/migration.c | 33 +++-- migration/migration.h | 3 +++ qapi/migration.json| 7 ++- system/vl.c| 3 ++- 5 files changed, 39 insertions(+), 9 deletions(-) diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 7e96ae6ffd..23181bbee1 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict) } QAPI_LIST_PREPEND(caps, g_steal_pointer()); -qmp_migrate_incoming(NULL, true, caps, ); +qmp_migrate_incoming(NULL, true, caps, true, false, ); qapi_free_MigrationChannelList(caps); end: diff --git a/migration/migration.c b/migration/migration.c index a9599838e6..289afa8d85 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -72,6 +72,8 @@ #define NOTIFIER_ELEM_INIT(array, elem)\ [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem]) +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true + static NotifierWithReturnList migration_state_notifiers[] = { NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), @@ -234,6 +236,8 @@ void migration_object_init(void) qemu_cond_init(_incoming->page_request_cond); 
current_incoming->page_requested = g_tree_new(page_request_addr_cmp); +current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR; + migration_object_check(current_migration, _fatal); blk_mig_init(); @@ -800,12 +804,14 @@ fail: migration_incoming_state_destroy(); -WITH_QEMU_LOCK_GUARD(>error_mutex) { -error_report_err(s->error); -s->error = NULL; -} +if (mis->exit_on_error) { +WITH_QEMU_LOCK_GUARD(>error_mutex) { +error_report_err(s->error); +s->error = NULL; +} -exit(EXIT_FAILURE); +exit(EXIT_FAILURE); +} } /** @@ -1314,6 +1320,15 @@ static void fill_destination_migration_info(MigrationInfo *info) break; } info->status = mis->state; + +if (!info->error_desc) { +MigrationState *s = migrate_get_current(); +QEMU_LOCK_GUARD(>error_mutex); + +if (s->error) { +info->error_desc = g_strdup(error_get_pretty(s->error)); +} +} } MigrationInfo *qmp_query_migrate(Error **errp) @@ -1797,10 +1812,13 @@ void migrate_del_blocker(Error **reasonp) } void qmp_migrate_incoming(const char *uri, bool has_channels, - MigrationChannelList *channels, Error **errp) + MigrationChannelList *channels, + bool has_exit_on_error, bool exit_on_error, + Error **errp) { Error *local_err = NULL; static bool once = true; +MigrationIncomingState *mis = migration_incoming_get_current(); if (!once) { error_setg(errp, "The incoming migration has already been started"); @@ -1815,6 +1833,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels, return; } +mis->exit_on_error = +has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR; + qemu_start_incoming_migration(uri, has_channels, channels, _err); if (local_err) { diff --git a/migration/migration.h b/migration/migration.h index 8045e39c26..95995a818e 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -227,6 +227,9 @@ struct MigrationIncomingState { * is needed as this field is updated serially. 
*/ unsigned int switchover_ack_pending_num; + +/* Do exit on incoming migration failure */ +bool exit_on_error; }; MigrationIncomingState *migration_incoming_get_current(void); diff --git a/qapi/migration.json b/qapi/migration.json index 8c65b90328..9feed413b5 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1837,6 +1837,10 @@ # @channels: list of migration stream channels with each stream in the # list connected to a destination interface endpoint. # +# @exit-on-error: Ex
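The `has_exit_on_error`/`exit_on_error` pair in the patch above is the QAPI C convention for an optional boolean, resolved against INMIGRATE_DEFAULT_EXIT_ON_ERROR. A minimal Python rendering of the same resolution (the function name is invented; `None` plays the role of `has_exit_on_error == false`):

```python
INMIGRATE_DEFAULT_EXIT_ON_ERROR = True  # mirrors the new #define

def resolve_exit_on_error(exit_on_error=None):
    """QAPI optional bool: an absent argument means 'use the default',
    which preserves the old exit-on-failure behavior for compatibility."""
    if exit_on_error is None:
        return INMIGRATE_DEFAULT_EXIT_ON_ERROR
    return exit_on_error

print(resolve_exit_on_error())       # True  (old behavior by default)
print(resolve_exit_on_error(False))  # False (orchestrator opts out)
```

Defaulting to the old behavior is what lets HMP and x-exit-preconfig keep their existing semantics while QMP orchestrators explicitly opt out and query the failure via query-migrate.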
[PATCH v6 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
Make a call to migration_incoming_state_destroy(), instead of doing only part of it. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/migration.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 2dc6a063e9..0d26db47f7 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque) fail: migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); -qemu_fclose(mis->from_src_file); - -multifd_recv_cleanup(); -compress_threads_load_cleanup(); +migration_incoming_state_destroy(); exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v6 0/5] migration: do not exit on incoming failure
Hi all! The series brings an option to not immediately exit on incoming migration failure, giving the orchestrator a possibility to get the error through QAPI and shut down QEMU by "quit". v6: 01,02: add r-b by Peter 03: only fix potential use-after-free 04: rework error handling, drop r-b v5: - add "migration: process_incoming_migration_co(): fix reporting s->error" v4: - add r-b and a-b by Fabiano and Markus - improve wording in 04 as Markus suggested v3: - don't refactor the whole code around setting migration error, it seems too much and unnecessary for the new feature itself - add constant - change behavior for HMP command - split some things to separate patches - and more, by Peter's suggestions New behavior can be demonstrated like this: bash: ( cat
[PATCH v6 3/5] migration: process_incoming_migration_co(): fix reporting s->error
It's a bad idea to leave the critical section with the error object freed but s->error still set; this theoretically may lead to a use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..b307a4bc59 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -784,6 +784,7 @@ process_incoming_migration_co(void *opaque) if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(s->error); +s->error = NULL; } } error_report("load of migration failed: %s", strerror(-ret)); -- 2.34.1
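The pattern behind this one-liner — consume and clear the error inside the same critical section, so no reader ever sees a pointer that is about to be freed — can be modeled in a few lines of Python (threading.Lock stands in for error_mutex; this is only an analogy, since Python itself has no use-after-free):

```python
import threading

class MigrationState:
    """Just the two fields the fix touches."""
    def __init__(self):
        self.error_mutex = threading.Lock()
        self.error = None

def report_and_clear(s, log):
    # Report and null the field under the same lock, so no concurrent
    # reader of s.error can observe a value that has been consumed.
    with s.error_mutex:
        if s.error is not None:
            log.append(str(s.error))
            s.error = None

s = MigrationState()
s.error = ValueError('load of migration failed')
log = []
report_and_clear(s, log)
print(s.error, log)  # None ['load of migration failed']
```

In the C code, error_report_err() frees the Error object, which is exactly why leaving s->error pointing at it past the lock guard would be dangerous.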
Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
On 30.04.24 11:09, Vladimir Sementsov-Ogievskiy wrote: On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote: On 29.04.24 22:32, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: It's bad idea to leave critical section with error object freed, but s->error still set, this theoretically may lead to use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ + QEMU_LOCK_GUARD(>error_mutex); + if (s->error) { + error_free(s->error); + s->error = NULL; + } +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { + MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(>error_mutex) { - error_report_err(s->error); + error_report_err(error_copy(s->error)); This looks like a bugfix, agreed. } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); + migrate_error_free(s); Would migration_incoming_state_destroy() be a better place? Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. Actually, I think we shouldn't care about freeing the error for exit() case. 
I think, we skip a lot of other cleanups which we normally do in main() in case of this exit(). Or, just do the simplest fix of potential use-after-free, and don't care: WITH_QEMU_LOCK_GUARD(>error_mutex) { error_report_err(s->error); s->error = NULL; } - I'll send v6 with it. One thing weird is we actually reuses MigrationState*'s error for incoming too, but so far it looks ok as long as QEMU can't be both src & dst. Then calling migrate_error_free even in incoming_state_destroy() looks ok. This patch still looks like containing two changes. Better split them (or just fix the bug only)? Thanks, exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(>error); } -static void migrate_error_free(MigrationState *s) -{ - QEMU_LOCK_GUARD(>error_mutex); - if (s->error) { - error_free(s->error); - s->error = NULL; - } -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1 -- Best regards, Vladimir
Re: [PATCH v4 3/3] qapi: introduce device-sync-config
On 30.04.24 11:19, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 1 + hw/virtio/virtio-pci.c| 9 include/hw/qdev-core.h| 3 +++ qapi/qdev.json| 23 +++ system/qdev-monitor.c | 48 +++ 5 files changed, 84 insertions(+) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 091d2c6acf..2f301f380c 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -588,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data) device_class_set_props(dc, vhost_user_blk_properties); dc->vmsd = _vhost_user_blk; +dc->sync_config = vhost_user_blk_sync_config; set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); vdc->realize = vhost_user_blk_device_realize; vdc->unrealize = vhost_user_blk_device_unrealize; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b1d02f4b3d..0d91e8b5dc 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -2351,6 +2351,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp) vpciklass->parent_dc_realize(qdev, errp); } +static int virtio_pci_sync_config(DeviceState *dev, Error **errp) +{ +VirtIOPCIProxy *proxy = VIRTIO_PCI(dev); +VirtIODevice *vdev = virtio_bus_get_device(>bus); + +return qdev_sync_config(DEVICE(vdev), errp); +} + static void virtio_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -2367,6 +2375,7 @@ static void virtio_pci_class_init(ObjectClass *klass, void *data) device_class_set_parent_realize(dc, virtio_pci_dc_realize, >parent_dc_realize); rc->phases.hold = virtio_pci_bus_reset_hold; 
+dc->sync_config = virtio_pci_sync_config; } static const TypeInfo virtio_pci_info = { diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 9228e96c87..87135bdcdf 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev); typedef void (*DeviceReset)(DeviceState *dev); typedef void (*BusRealize)(BusState *bus, Error **errp); typedef void (*BusUnrealize)(BusState *bus); +typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp); /** * struct DeviceClass - The base class for all devices. @@ -162,6 +163,7 @@ struct DeviceClass { DeviceReset reset; DeviceRealize realize; DeviceUnrealize unrealize; +DeviceSyncConfig sync_config; /** * @vmsd: device state serialisation description for @@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp); */ HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev); void qdev_unplug(DeviceState *dev, Error **errp); +int qdev_sync_config(DeviceState *dev, Error **errp); void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp); void qdev_machine_creation_done(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..fc5e125a45 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,26 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize device configuration from host to guest part. First, +# copy the configuration from the host part (backend) to the guest +# part (frontend). Then notify guest software that device +# configuration changed. Blank line here, please. +# The command may be used to notify the guest about block device +# capcity change. Currently only vhost-user-blk device supports +# this. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 264978aa40..47bfc0506e 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -971,6 +972,53 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceSta
Re: [PATCH v4 2/3] vhost-user-blk: split vhost_user_blk_sync_config()
On 30.04.24 11:15, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Split vhost_user_blk_sync_config() out from vhost_user_blk_handle_config_change(), to be reused in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/block/vhost-user-blk.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 9e6bbc6950..091d2c6acf 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t *config) s->blkcfg.wce = blkcfg->wce; } +static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp) +{ +int ret; +VirtIODevice *vdev = VIRTIO_DEVICE(dev); Note for later: all this function does with parameter @dev is cast it to VirtIODevice *. +VHostUserBlk *s = VHOST_USER_BLK(vdev); + +ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg, + vdev->config_len, errp); +if (ret < 0) { +return ret; +} + +memcpy(vdev->config, &s->blkcfg, vdev->config_len); +virtio_notify_config(vdev); + +return 0; +} + static int vhost_user_blk_handle_config_change(struct vhost_dev *dev) { int ret; -VirtIODevice *vdev = dev->vdev; -VHostUserBlk *s = VHOST_USER_BLK(dev->vdev); Error *local_err = NULL; if (!dev->started) { return 0; } -ret = vhost_dev_get_config(dev, (uint8_t *)&s->blkcfg, - vdev->config_len, &local_err); +ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), &local_err); dev->vdev is a VirtIODevice *. You cast it to DeviceState * for vhost_user_blk_sync_config(), which casts it right back. Could you simply pass it as is instead? vhost_user_blk_sync_config() is a generic handler, which will be used as ->sync_config() in the following commit, so it's good and convenient for it to have a DeviceState * argument. if (ret < 0) { error_report_err(local_err); return ret; } -memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len); -virtio_notify_config(dev->vdev); - return 0; } -- Best regards, Vladimir
Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote: On 29.04.24 22:32, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: It's bad idea to leave critical section with error object freed, but s->error still set, this theoretically may lead to use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ + QEMU_LOCK_GUARD(>error_mutex); + if (s->error) { + error_free(s->error); + s->error = NULL; + } +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { + MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { - MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(>error_mutex) { - error_report_err(s->error); + error_report_err(error_copy(s->error)); This looks like a bugfix, agreed. } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); + migrate_error_free(s); Would migration_incoming_state_destroy() be a better place? Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. Actually, I think we shouldn't care about freeing the error for exit() case. I think, we skip a lot of other cleanups which we normally do in main() in case of this exit(). 
One thing weird is we actually reuses MigrationState*'s error for incoming too, but so far it looks ok as long as QEMU can't be both src & dst. Then calling migrate_error_free even in incoming_state_destroy() looks ok. This patch still looks like containing two changes. Better split them (or just fix the bug only)? Thanks, exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(>error); } -static void migrate_error_free(MigrationState *s) -{ - QEMU_LOCK_GUARD(>error_mutex); - if (s->error) { - error_free(s->error); - s->error = NULL; - } -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1 -- Best regards, Vladimir
Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
On 29.04.24 22:32, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote: It's bad idea to leave critical section with error object freed, but s->error still set, this theoretically may lead to use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ +QEMU_LOCK_GUARD(>error_mutex); +if (s->error) { +error_free(s->error); +s->error = NULL; +} +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { +MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(>error_mutex) { -error_report_err(s->error); +error_report_err(error_copy(s->error)); This looks like a bugfix, agreed. } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +migrate_error_free(s); Would migration_incoming_state_destroy() be a better place? Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands. One thing weird is we actually reuses MigrationState*'s error for incoming too, but so far it looks ok as long as QEMU can't be both src & dst. Then calling migrate_error_free even in incoming_state_destroy() looks ok. 
This patch still looks like containing two changes. Better split them (or just fix the bug only)? Thanks, exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(>error); } -static void migrate_error_free(MigrationState *s) -{ -QEMU_LOCK_GUARD(>error_mutex); -if (s->error) { -error_free(s->error); -s->error = NULL; -} -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1 -- Best regards, Vladimir
Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
On 29.04.24 22:39, Peter Xu wrote: On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote: Unify error reporting in the function. This simplifies the following commit, which will not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 58fd5819bc..5489ff96df 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque) MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; +Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { -error_report("Failed to setup decompress threads"); +error_setg(_err, "Failed to setup decompress threads"); goto fail; } @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(>error_mutex) { -error_report_err(error_copy(s->error)); -} -} -error_report("load of migration failed: %s", strerror(-ret)); +error_setg(_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { +error_setg(_err, "colo incoming failed"); goto fail; } @@ -809,6 +806,12 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(>error_mutex) { +error_report_err(error_copy(s->error)); +} +} +error_report_err(local_err); Here migrate_has_error() will always return true? If so we can drop it. Meanwhile, IMHO it's easier we simply always keep the earliest error we see and report that only, local_err is just one of the errors and whoever reaches first will be reported. 
Something like: migrate_set_error(local_err); WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(error_copy(s->error)); } exit(EXIT_FAILURE); Then, with the exit-on-error thing: migrate_set_error(local_err); if (exit_on_error) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { error_report_err(error_copy(s->error)); } exit(EXIT_FAILURE); } Would this look slightly cleaner? Yes, looks better, will do so -- Best regards, Vladimir
[PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error
Cover more cases by trace-point. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 4 +++- migration/trace-events | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b5af6b5105..2dc6a063e9 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque) void migrate_set_error(MigrationState *s, const Error *error) { QEMU_LOCK_GUARD(&s->error_mutex); + +trace_migrate_error(error_get_pretty(error)); + if (!s->error) { s->error = error_copy(error); } @@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s) static void migrate_fd_error(MigrationState *s, const Error *error) { -trace_migrate_fd_error(error_get_pretty(error)); assert(s->to_dst_file == NULL); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED); diff --git a/migration/trace-events b/migration/trace-events index f0e1cb80c7..d0c44c3853 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam # migration.c migrate_set_state(const char *new_state) "new state %s" migrate_fd_cleanup(void) "" -migrate_fd_error(const char *error_desc) "error=%s" +migrate_error(const char *error_desc) "error=%s" migrate_fd_cancel(void) "" migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx" migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")" -- 2.34.1
[PATCH v5 0/5] migration: do not exit on incoming failure
Hi all! The series brings an option to not immediately exit on incoming migration failure, giving the orchestrator a possibility to get the error through QAPI and shut down QEMU by "quit". v5: - add "migration: process_incoming_migration_co(): fix reporting s->error" v4: - add r-b and a-b by Fabiano and Markus - improve wording in 04 as Markus suggested v3: - don't refactor the whole code around setting migration error, it seems too much and not necessary for the new feature itself - add constant - change behavior for HMP command - split some things to separate patches - and more, by Peter's suggestions New behavior can be demonstrated like this: bash: ( cat
[PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
Unify error reporting in the function. This simplifies the following commit, which will add a not-exit-on-error behavior variant to the function. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 58fd5819bc..5489ff96df 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque) MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; +Error *local_err = NULL; assert(mis->from_src_file); if (compress_threads_load_setup(mis->from_src_file)) { -error_report("Failed to setup decompress threads"); +error_setg(&local_err, "Failed to setup decompress threads"); goto fail; } @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(error_copy(s->error)); -} -} -error_report("load of migration failed: %s", strerror(-ret)); +error_setg(&local_err, "load of migration failed: %s", strerror(-ret)); goto fail; } if (colo_incoming_co() < 0) { +error_setg(&local_err, "colo incoming failed"); goto fail; } @@ -809,6 +806,12 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(error_copy(s->error)); +} +} +error_report_err(local_err); migrate_error_free(s); exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
It's a bad idea to leave the critical section with the error object freed but s->error still set; this theoretically may lead to a use-after-free crash. Let's avoid it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- migration/migration.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0d26db47f7..58fd5819bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque) migration_incoming_state_destroy(); } +static void migrate_error_free(MigrationState *s) +{ +QEMU_LOCK_GUARD(&s->error_mutex); +if (s->error) { +error_free(s->error); +s->error = NULL; +} +} + static void coroutine_fn process_incoming_migration_co(void *opaque) { +MigrationState *s = migrate_get_current(); MigrationIncomingState *mis = migration_incoming_get_current(); PostcopyState ps; int ret; @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque) } if (ret < 0) { -MigrationState *s = migrate_get_current(); - if (migrate_has_error(s)) { WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(s->error); +error_report_err(error_copy(s->error)); } } error_report("load of migration failed: %s", strerror(-ret)); @@ -801,6 +809,7 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); +migrate_error_free(s); exit(EXIT_FAILURE); } @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s) return qatomic_read(&s->error); } -static void migrate_error_free(MigrationState *s) -{ -QEMU_LOCK_GUARD(&s->error_mutex); -if (s->error) { -error_free(s->error); -s->error = NULL; -} -} - static void migrate_fd_error(MigrationState *s, const Error *error) { assert(s->to_dst_file == NULL); -- 2.34.1
[PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
Make a call to migration_incoming_state_destroy(), instead of doing only part of it by hand. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fabiano Rosas --- migration/migration.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 2dc6a063e9..0d26db47f7 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque) fail: migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); -qemu_fclose(mis->from_src_file); - -multifd_recv_cleanup(); -compress_threads_load_cleanup(); +migration_incoming_state_destroy(); exit(EXIT_FAILURE); } -- 2.34.1
[PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
Now we do set MIGRATION_FAILED state, but don't give the orchestrator a chance to query migration state and get the error. Let's provide a possibility for QMP-based orchestrators to get an error like with outgoing migration. For hmp_migrate_incoming(), let's enable the new behavior: HMP is not an ABI, it's mostly intended for use by developers, and it makes sense not to stop the process. For x-exit-preconfig, let's keep the old behavior: - it's called from init(), so here we want to keep current behavior by default - it does exit on error by itself as well So, if we want to change the behavior of x-exit-preconfig, it should be another patch. Signed-off-by: Vladimir Sementsov-Ogievskiy Acked-by: Markus Armbruster --- migration/migration-hmp-cmds.c | 2 +- migration/migration.c | 38 +++--- migration/migration.h | 3 +++ qapi/migration.json | 7 ++++++- system/vl.c | 3 ++- 5 files changed, 43 insertions(+), 10 deletions(-) diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 7e96ae6ffd..23181bbee1 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict) } QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel)); -qmp_migrate_incoming(NULL, true, caps, &err); +qmp_migrate_incoming(NULL, true, caps, true, false, &err); qapi_free_MigrationChannelList(caps); end: diff --git a/migration/migration.c b/migration/migration.c index 5489ff96df..09f762c99e 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -72,6 +72,8 @@ #define NOTIFIER_ELEM_INIT(array, elem)\ [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem]) +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true + static NotifierWithReturnList migration_state_notifiers[] = { NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL), NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT), @@ -234,6 +236,8 @@ void migration_object_init(void) qemu_cond_init(&current_incoming->page_request_cond);
current_incoming->page_requested = g_tree_new(page_request_addr_cmp); +current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR; + migration_object_check(current_migration, &error_fatal); blk_mig_init(); @@ -806,14 +810,19 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(error_copy(s->error)); +if (mis->exit_on_error) { +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(error_copy(s->error)); +} } +error_report_err(local_err); +migrate_error_free(s); +exit(EXIT_FAILURE); +} else { +migrate_set_error(s, local_err); +error_free(local_err); } -error_report_err(local_err); -migrate_error_free(s); -exit(EXIT_FAILURE); } /** @@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info) break; } info->status = mis->state; + +if (!info->error_desc) { +MigrationState *s = migrate_get_current(); +QEMU_LOCK_GUARD(&s->error_mutex); + +if (s->error) { +info->error_desc = g_strdup(error_get_pretty(s->error)); +} +} } MigrationInfo *qmp_query_migrate(Error **errp) @@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp) } void qmp_migrate_incoming(const char *uri, bool has_channels, - MigrationChannelList *channels, Error **errp) + MigrationChannelList *channels, + bool has_exit_on_error, bool exit_on_error, + Error **errp) { Error *local_err = NULL; static bool once = true; +MigrationIncomingState *mis = migration_incoming_get_current(); if (!once) { error_setg(errp, "The incoming migration has already been started"); @@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels, return; } +mis->exit_on_error = +has_exit_on_error ?
exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR; + qemu_start_incoming_migration(uri, has_channels, channels, &local_err); if (local_err) { diff --git a/migration/migration.h b/migration/migration.h index 8045e39c26..95995a818e 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -227,6 +227,9 @@ struct MigrationIncomingState { * is needed as this field is updated serially. */ unsigned int switchover_ack_pending_num; + +/* Do exit on incoming migration failure */ +bool exit_on_error; }; MigrationIncomingState *migration_incoming_get_current(void); di
Re: [PULL 6/6] iotests: add backup-discard-source
[Add John] On 29.04.24 17:18, Richard Henderson wrote: On 4/29/24 04:51, Vladimir Sementsov-Ogievskiy wrote: Add test for a new backup option: discard-source. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner Message-Id: <20240313152822.626493-6-vsement...@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy --- .../qemu-iotests/tests/backup-discard-source | 152 ++ .../tests/backup-discard-source.out | 5 + 2 files changed, 157 insertions(+) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out This fails check-python-minreqs https://gitlab.com/qemu-project/qemu/-/jobs/6739551782 It appears to be a pylint issue. tests/export-incoming-iothread:1:0: R0801: Similar lines in 2 files ==tests.backup-discard-source:[52:61] ==tests.copy-before-write:[54:63] 'file': { 'driver': iotests.imgfmt, 'file': { 'driver': 'file', 'filename': source_img, } }, 'target': { 'driver': iotests.imgfmt, (duplicate-code) Hmm. I don't think, that something should be changed in code. splitting out part of this json object to a function? That's a test for QMP command, and it's good that we see the command as is in one place. I can swap some lines or rename variables to hack the linter, but I'd prefer not doing so:) For me that looks like a false-positive: yes it's a duplication, but it should better be duplication, than complicating raw json objects by reusing common parts. What to do? As described in 22305c2a081b8b6 "python: Reduce strictness of pylint's duplicate-code check", this check could be either be disabled completely, or we can increase min-similarity-lines config value. I'd just disable it completely. Any thoughts? -- Best regards, Vladimir
Re: [PATCH v3 4/4] qapi: introduce exit-on-error parameter for migrate-incoming
On 29.04.24 16:06, Fabiano Rosas wrote: Vladimir Sementsov-Ogievskiy writes: On 25.04.24 23:30, Fabiano Rosas wrote: @@ -797,13 +801,18 @@ fail: MIGRATION_STATUS_FAILED); migration_incoming_state_destroy(); -if (migrate_has_error(s)) { -WITH_QEMU_LOCK_GUARD(&s->error_mutex) { -error_report_err(s->error); +if (mis->exit_on_error) { +if (migrate_has_error(s)) { +WITH_QEMU_LOCK_GUARD(&s->error_mutex) { +error_report_err(s->error); error_report_err(error_copy(s->error)) ...because later on you're reading from s->error at fill_destination_migration_info. No, we immediately do exit() instead. That's just a preexisting behavior, moved into "if (mis->exit_on_error)" I meant later in the patch, not later in the execution. Can't query-migrate be called during process_incoming_migration_co? Hmm.. On the one hand, seems no reason to care about it exactly before exit().. On the other hand, we do care about taking error_mutex. And we do release it, which may trigger another critical section. I'll try to touch up this thing. -- Best regards, Vladimir
Re: [PATCH] block/copy-before-write: use uint64_t for timeout in nanoseconds
On 29.04.24 17:46, Fiona Ebner wrote: Am 29.04.24 um 16:36 schrieb Philippe Mathieu-Daudé: Hi Fiona, On 29/4/24 16:19, Fiona Ebner wrote: Not everybody uses an email client that shows the patch content just after the subject (your first lines wasn't making sense at first). Simply duplicating the subject helps to understand: Use uint64_t for timeout in nanoseconds ... Oh, sorry. I'll try to remember that for the future. Should I re-send as a v2? Not necessary, I can touch up the message when applying to my block branch. Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH v3 4/5] qapi: introduce device-sync-config
On 29.04.24 16:04, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: On 29.04.24 13:51, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: On 24.04.24 14:48, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy [...] diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 0117d243c4..296af52322 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -5,6 +5,7 @@ #include "qemu/notify.h" bool runstate_check(RunState state); +const char *current_run_state_str(void); void runstate_set(RunState new_state); RunState runstate_get(void); bool runstate_is_running(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..e8be79c3d5 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,24 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize config from backend to the guest. The command notifies +# re-read the device config from the backend and notifies the guest +# to re-read the config. The command may be used to notify the guest +# about block device capcity change. Currently only vhost-user-blk +# device supports this. I'm not sure I understand this. To work towards an understanding, I rephrase it, and you point out the errors. Synchronize device configuration from host to guest part. First, copy the configuration from the host part (backend) to the guest part (frontend). Then notify guest software that device configuration changed. 
Correct, thanks Perhaps Synchronize guest-visible device configuration with the backend's configuration, and notify guest software that device configuration changed. This may be useful to notify the guest of a block device capacity change. Currenrly, only vhost-user-blk devices support this. Sounds good Except I fat-fingered "Currently". Next question: what happens when the device *doesn't* support this? An error "device-sync-config is not supported ..." Okay. I wonder how configuration can get out of sync. Can you explain? The example (and the original feature, which triggered developing this) is vhost disk resize. If vhost-server (backend) doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, neither QEMU nor guest will know that disk capacity changed. Sounds like we wouldn't need this command if we could make the vhost-server support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Is making it support it impractical? Or are there other uses for this command? Qemu's internal vhost-server do support it. But that's not the only vhost-user server) So the command is useful for those servers which doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Note, that this message requires setting up additional channel of server -> client communication. That was the reason, why the "change-msg" solution was rejected in our downstream: it's safer to reuse existing channel (QMP), than to add and support an additional channel. Also, the command may help to debug the system, when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG doesn't work for some reason. Suggest to work this into the commit message. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 7e075d91c1..cb35ea0b86 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -969,6 +970,52 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceState *dev, Error **errp) +{ +DeviceClass *dc = DEVICE_GET_CLASS(dev); + +if (!dc->sync_config) { +error_setg(errp, "device-sync-config is not supported for '%s'", + object_get_typename(OBJECT(dev))); +return -ENOTSUP; +} + +return dc->sync_config(dev, errp); +} + +void qmp_device_sync_config(const char *id, Error **errp) +{ +DeviceState *dev; + +/* + * During migration there is a race between syncing config and + * migr
Re: [PATCH v3 4/5] qapi: introduce device-sync-config
On 29.04.24 13:51, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: On 24.04.24 14:48, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add command to sync config from vhost-user backend to the device. It may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not triggered interrupt to the guest or just not available (not supported by vhost-user server). Command result is racy if allow it during migration. Let's allow the sync only in RUNNING state. Signed-off-by: Vladimir Sementsov-Ogievskiy [...] diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 0117d243c4..296af52322 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -5,6 +5,7 @@ #include "qemu/notify.h" bool runstate_check(RunState state); +const char *current_run_state_str(void); void runstate_set(RunState new_state); RunState runstate_get(void); bool runstate_is_running(void); diff --git a/qapi/qdev.json b/qapi/qdev.json index facaa0bc6a..e8be79c3d5 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,24 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @device-sync-config: +# +# Synchronize config from backend to the guest. The command notifies +# re-read the device config from the backend and notifies the guest +# to re-read the config. The command may be used to notify the guest +# about block device capcity change. Currently only vhost-user-blk +# device supports this. I'm not sure I understand this. To work towards an understanding, I rephrase it, and you point out the errors. Synchronize device configuration from host to guest part. First, copy the configuration from the host part (backend) to the guest part (frontend). Then notify guest software that device configuration changed. Correct, thanks Perhaps Synchronize guest-visible device configuration with the backend's configuration, and notify guest software that device configuration changed. 
This may be useful to notify the guest of a block device capacity change. Currenrly, only vhost-user-blk devices support this. Sounds good Next question: what happens when the device *doesn't* support this? An error "device-sync-config is not supported ..." I wonder how configuration can get out of sync. Can you explain? The example (and the original feature, which triggered developing this) is vhost disk resize. If vhost-server (backend) doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, neither QEMU nor guest will know that disk capacity changed. Sounds like we wouldn't need this command if we could make the vhost-server support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Is making it support it impractical? Or are there other uses for this command? Qemu's internal vhost-server do support it. But that's not the only vhost-user server) So the command is useful for those servers which doesn't support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Note, that this message requires setting up additional channel of server -> client communication. That was the reason, why the "change-msg" solution was rejected in our downstream: it's safer to reuse existing channel (QMP), than to add and support an additional channel. Also, the command may help to debug the system, when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG doesn't work for some reason. +# +# @id: the device's ID or QOM path +# +# Features: +# +# @unstable: The command is experimental. 
+# +# Since: 9.1 +## +{ 'command': 'device-sync-config', + 'features': [ 'unstable' ], + 'data': {'id': 'str'} } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index 7e075d91c1..cb35ea0b86 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -23,6 +23,7 @@ #include "monitor/monitor.h" #include "monitor/qdev.h" #include "sysemu/arch_init.h" +#include "sysemu/runstate.h" #include "qapi/error.h" #include "qapi/qapi-commands-qdev.h" #include "qapi/qmp/dispatch.h" @@ -969,6 +970,52 @@ void qmp_device_del(const char *id, Error **errp) } } +int qdev_sync_config(DeviceState *dev, Error **errp) +{ +DeviceClass *dc = DEVICE_GET_CLASS(dev); + +if (!dc->sync_config) { +error_setg(errp, "device-sync-config is not supported for '%s'", + object_get_typename(OBJECT(dev))); +return -ENOTSUP; +} + +return dc->sync_config(dev, errp); +} + +void qmp_device_sync_config(const char *id, Error **errp) +{ +DeviceState *dev; + +/* + * During migration there is a race between syncing config and + * migrating it, so let's just not allow it. Can you briefly explain the race? If at the moment of qmp command, corresponding config already migrated to the target, we'll change only the config on source, but on th
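For readers following the schema above, here is a minimal sketch of what a management client would put on the QMP wire for this command. This is an illustrative helper, not QEMU code; the device id "vublk0" is an invented example:

```python
import json

def device_sync_config_cmd(dev_id):
    """Build the QMP 'device-sync-config' command as wire JSON.

    Mirrors the schema above:
    { 'command': 'device-sync-config', 'data': {'id': 'str'} }
    """
    if not dev_id:
        raise ValueError("'id' must be a device ID or QOM path")
    return json.dumps({"execute": "device-sync-config",
                       "arguments": {"id": dev_id}})

# A client would send this line over the QMP socket after capability
# negotiation, e.g. after resizing the disk on the vhost-user server
# side when the server cannot deliver a config-change notification:
print(device_sync_config_cmd("vublk0"))
```

On success the reply is an empty `return`; for a device whose class has no sync_config callback, the reply carries the "device-sync-config is not supported" error shown in the patch.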
Re: [PULL 0/6] Block jobs patches for 2024-04-29
Sorry for too much CC-ing, I've mistakenly added --cc-cmd=./scripts/get_maintainer.pl On 29.04.24 14:51, Vladimir Sementsov-Ogievskiy wrote: The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479: Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging (2024-04-26 15:28:13 -0700) are available in the Git repository at: https://gitlab.com/vsementsov/qemu.git tags/pull-block-jobs-2024-04-29 for you to fetch changes up to 2ca7608c6b8d57fd6347b11af12a0f035263efef: iotests: add backup-discard-source (2024-04-29 13:35:30 +0300) Block jobs patches for 2024-04-29 - backup: discard-source parameter - blockcommit: Reopen base image as RO after abort Alexander Ivanov (1): blockcommit: Reopen base image as RO after abort Vladimir Sementsov-Ogievskiy (5): block/copy-before-write: fix permission block/copy-before-write: support unligned snapshot-discard block/copy-before-write: create block_copy bitmap in filter node qapi: blockdev-backup: add discard-source parameter iotests: add backup-discard-source block/backup.c | 5 +- block/block-copy.c | 12 +- block/copy-before-write.c | 39 +-- block/copy-before-write.h | 1 + block/mirror.c | 11 +- block/replication.c| 4 +- blockdev.c | 2 +- include/block/block-common.h | 2 + include/block/block-copy.h | 2 + include/block/block_int-global-state.h | 2 +- qapi/block-core.json | 4 + tests/qemu-iotests/257.out | 112 +- tests/qemu-iotests/tests/backup-discard-source | 152 + tests/qemu-iotests/tests/backup-discard-source.out | 5 + 14 files changed, 281 insertions(+), 72 deletions(-) create mode 100755 tests/qemu-iotests/tests/backup-discard-source create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out
-- Best regards, Vladimir
[PULL 4/6] block/copy-before-write: create block_copy bitmap in filter node
Currently block_copy creates copy_bitmap in source node. But that is in bad relation with .independent_close=true of copy-before-write filter: source node may be detached and removed before .bdrv_close() handler called, which should call block_copy_state_free(), which in turn should remove copy_bitmap. That's all not ideal: it would be better if internal bitmap of block-copy object is not attached to any node. But that is not possible now. The simplest solution is just create copy_bitmap in filter node, where anyway two other bitmaps are created. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner Message-Id: <20240313152822.626493-4-vsement...@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/block-copy.c | 3 +- block/copy-before-write.c | 2 +- include/block/block-copy.h | 1 + tests/qemu-iotests/257.out | 112 ++--- 4 files changed, 60 insertions(+), 58 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 9ee3dd7ef5..8fca2c3698 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -351,6 +351,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, } BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, + BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, Error **errp) { @@ -367,7 +368,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, return NULL; } -copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL, +copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL, errp); if (!copy_bitmap) { return NULL; diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 6d89af0b29..ed2c228da7 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -468,7 +468,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); -s->bcs = 
block_copy_state_new(bs->file, s->target, bitmap, errp); +s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp); if (!s->bcs) { error_prepend(errp, "Cannot create block-copy-state: "); return -EINVAL; diff --git a/include/block/block-copy.h b/include/block/block-copy.h index 0700953ab8..8b41643bfa 100644 --- a/include/block/block-copy.h +++ b/include/block/block-copy.h @@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState; typedef struct BlockCopyCallState BlockCopyCallState; BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, + BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, Error **errp); diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out index aa76131ca9..c33dd7f3a9 100644 --- a/tests/qemu-iotests/257.out +++ b/tests/qemu-iotests/257.out @@ -120,16 +120,16 @@ write -P0x67 0x3fe 0x2 "granularity": 65536, "persistent": false, "recording": false - } -], -"drive0": [ + }, { "busy": false, "count": 0, "granularity": 65536, "persistent": false, "recording": false - }, + } +], +"drive0": [ { "busy": false, "count": 458752, @@ -596,16 +596,16 @@ write -P0x67 0x3fe 0x2 "granularity": 65536, "persistent": false, "recording": false - } -], -"drive0": [ + }, { "busy": false, "count": 0, "granularity": 65536, "persistent": false, "recording": false - }, + } +], +"drive0": [ { "busy": false, "count": 458752, @@ -865,16 +865,16 @@ write -P0x67 0x3fe 0x2 "granularity": 65536, "persistent": false, "recording": false - } -], -"drive0": [ + }, { "busy": false, "count": 0, "granularity": 65536, "persistent": false, "recording": false - }, + } +], +"drive0": [ { "busy": false, "count": 458752, @@ -1341,16 +1341,1
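The lifetime problem this commit fixes can be sketched in a few lines of Python. This is a toy model, not QEMU API (class and bitmap names are invented): the point is that a bitmap created on the filter node is freed together with the copy state in the filter's close path, whereas a bitmap created on the source node depends on a node that may be detached first.

```python
class Node:
    """Toy stand-in for a BlockDriverState."""
    def __init__(self, name):
        self.name = name
        self.bitmaps = []  # dirty bitmaps attached to this node

class CopyState:
    """Toy stand-in for BlockCopyState: owns one dirty bitmap."""
    def __init__(self, bitmap_node):
        # Before the patch the bitmap node was the source; after it,
        # the copy-before-write filter node itself.
        self.bitmap_node = bitmap_node
        bitmap_node.bitmaps.append("copy_bitmap")

    def free(self):
        # Models block_copy_state_free() removing copy_bitmap; this
        # is only safe while the node it lives on is still around.
        self.bitmap_node.bitmaps.remove("copy_bitmap")

cbw = Node("copy-before-write")
state = CopyState(cbw)   # bitmap now lives on the filter node
state.free()             # filter close frees state and bitmap together
assert cbw.bitmaps == []
```

In the actual patch the whole fix amounts to passing the filter's own `bs` as the new `copy_bitmap_bs` argument of `block_copy_state_new()`.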
[PULL 5/6] qapi: blockdev-backup: add discard-source parameter
Add a parameter that enables discard-after-copy. That is mostly useful in "push backup with fleecing" scheme, when source is snapshot-access format driver node, based on copy-before-write filter snapshot-access API:

[guest]             [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   |                       |
   | root                  | file
   v                       v
[copy-before-write]
   |           |
   | file      | target
   v           v
[active disk] [temp.img]

In this case discard-after-copy does two things:

- discard data in temp.img to save disk space
- avoid further copy-before-write operation in discarded area

Note that we have to declare WRITE permission on source in copy-before-write filter, for discard to work. Still we can't take it unconditionally, as it will break normal backup from RO source. So, we have to add a parameter and pass it through bdrv_open flags.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Fiona Ebner
Tested-by: Fiona Ebner
Acked-by: Markus Armbruster
Message-Id: <20240313152822.626493-5-vsement...@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/backup.c | 5 +++--
 block/block-copy.c | 9 +
 block/copy-before-write.c | 15 +--
 block/copy-before-write.h | 1 +
 block/replication.c| 4 ++--
 blockdev.c | 2 +-
 include/block/block-common.h | 2 ++
 include/block/block-copy.h | 1 +
 include/block/block_int-global-state.h | 2 +-
 qapi/block-core.json | 4
 10 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/backup.c b/block/backup.c index ec29d6b810..3dd2e229d2 100644 --- a/block/backup.c +++ b/block/backup.c @@ -356,7 +356,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, BlockDriverState *target, int64_t speed, MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap, BitmapSyncMode bitmap_mode, - bool compress, + bool compress, bool discard_source, const char *filter_node_name, BackupPerf *perf, BlockdevOnError on_source_error, @@ -457,7 +457,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, goto error; } -cbw = bdrv_cbw_append(bs, target,
filter_node_name, , errp); +cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source, + , errp); if (!cbw) { goto error; } diff --git a/block/block-copy.c b/block/block-copy.c index 8fca2c3698..7e3b378528 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -137,6 +137,7 @@ typedef struct BlockCopyState { CoMutex lock; int64_t in_flight_bytes; BlockCopyMethod method; +bool discard_source; BlockReqList reqs; QLIST_HEAD(, BlockCopyCallState) calls; /* @@ -353,6 +354,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target, BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, BlockDriverState *copy_bitmap_bs, const BdrvDirtyBitmap *bitmap, + bool discard_source, Error **errp) { ERRP_GUARD(); @@ -418,6 +420,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target, cluster_size), }; +s->discard_source = discard_source; block_copy_set_copy_opts(s, false, false); ratelimit_init(>rate_limit); @@ -589,6 +592,12 @@ static coroutine_fn int block_copy_task_entry(AioTask *task) co_put_to_shres(s->mem, t->req.bytes); block_copy_task_end(t, ret); +if (s->discard_source && ret == 0) { +int64_t nbytes = +MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset; +bdrv_co_pdiscard(s->source, t->req.offset, nbytes); +} + return ret; } diff --git a/block/copy-before-write.c b/block/copy-before-write.c index ed2c228da7..cd65524e26 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState { BdrvChild *target; OnCbwError on_cbw_error; uint32_t cbw_timeout_ns; +bool discard_source; /* * @lock: protects access to @access_bitmap, @done_bitmap and @@ -357,6 +358,8 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role, uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared) { +BDRVCopyBeforeWriteState *s = bs->opaque; + if (!(role & BDRV_CHILD_FILTERED)) { /* * Target child @@ -381,6 +384,10 @@ 
cbw_child_perm(BlockDriver
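One detail of the new hunk in block_copy_task_entry() deserves a worked example: the discard length is clamped so the discard never runs past the end of the source. The arithmetic transcribed into Python (parameter names mirror the C fields; the concrete numbers are invented):

```python
def discard_nbytes(req_offset, req_bytes, s_len):
    """nbytes = MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset"""
    return min(req_offset + req_bytes, s_len) - req_offset

# A request fully inside the source discards its full length:
assert discard_nbytes(0, 65536, 1 << 20) == 65536

# A tail request on an image whose length is not cluster-aligned is
# clamped: only the bytes up to s->len are discarded.
assert discard_nbytes(983040, 65536, 1000448) == 17408
```

Without the clamp, the final cluster-sized request of an unaligned image would ask bdrv_co_pdiscard() to discard beyond end-of-file.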