Re: [PATCH for-6.1?] iotest: Further enhance qemu-img-bitmaps

2021-07-21 Thread Nir Soffer

On 7/21/21 11:46 PM, Eric Blake wrote:

Add a regression test to make sure we detect attempts to use 'qemu-img
bitmap' to modify an in-use local file.

Suggested-by: Nir Soffer 
Signed-off-by: Eric Blake 
---

Sadly, this missed my bitmaps pull request today.  If there's any
reason to respin that pull request, I'm inclined to add this in, as it
just touches the iotests; otherwise, if it slips to 6.2 it's not too
bad.

  tests/qemu-iotests/tests/qemu-img-bitmaps | 6 ++
  tests/qemu-iotests/tests/qemu-img-bitmaps.out | 5 +
  2 files changed, 11 insertions(+)

diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps 
b/tests/qemu-iotests/tests/qemu-img-bitmaps
index 7a3fe8c3d37a..3b6fade11735 100755
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps
@@ -129,6 +129,12 @@ $QEMU_IMG map --output=json --image-opts \
  $QEMU_IMG map --output=json --image-opts \
  "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b3" | _filter_qemu_img_map

+echo
+echo "=== bitmap command fails to modify image already in use ==="
+echo
+
+$QEMU_IMG bitmap --add "$TEST_IMG" b4 2>&1 | _filter_testdir
+
  nbd_server_stop

  echo
diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps.out 
b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
index e851f0320ecb..c6e12dd700aa 100644
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps.out
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
@@ -116,6 +116,11 @@ Format specific information:
  { "start": 2097152, "length": 1048576, "depth": 0, "present": false, "zero": false, 
"data": false},
  { "start": 3145728, "length": 7340032, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET}]

+=== bitmap command fails to modify image already in use ===
+
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get "write" lock
+Is another process using the image [TEST_DIR/t.qcow2]?
+
  === Check handling of inconsistent bitmap ===

  image: TEST_DIR/t.IMGFMT



It would be nice to test more than --add. I guess the implementation
is shared but if someone change it the test will protect us.

Reviewed-by: Nir Soffer 




[PATCH for-6.1?] iotest: Further enhance qemu-img-bitmaps

2021-07-21 Thread Eric Blake
Add a regression test to make sure we detect attempts to use 'qemu-img
bitmap' to modify an in-use local file.

Suggested-by: Nir Soffer 
Signed-off-by: Eric Blake 
---

Sadly, this missed my bitmaps pull request today.  If there's any
reason to respin that pull request, I'm inclined to add this in, as it
just touches the iotests; otherwise, if it slips to 6.2 it's not too
bad.

 tests/qemu-iotests/tests/qemu-img-bitmaps | 6 ++
 tests/qemu-iotests/tests/qemu-img-bitmaps.out | 5 +
 2 files changed, 11 insertions(+)

diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps 
b/tests/qemu-iotests/tests/qemu-img-bitmaps
index 7a3fe8c3d37a..3b6fade11735 100755
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps
@@ -129,6 +129,12 @@ $QEMU_IMG map --output=json --image-opts \
 $QEMU_IMG map --output=json --image-opts \
 "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b3" | _filter_qemu_img_map

+echo
+echo "=== bitmap command fails to modify image already in use ==="
+echo
+
+$QEMU_IMG bitmap --add "$TEST_IMG" b4 2>&1 | _filter_testdir
+
 nbd_server_stop

 echo
diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps.out 
b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
index e851f0320ecb..c6e12dd700aa 100644
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps.out
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
@@ -116,6 +116,11 @@ Format specific information:
 { "start": 2097152, "length": 1048576, "depth": 0, "present": false, "zero": 
false, "data": false},
 { "start": 3145728, "length": 7340032, "depth": 0, "present": true, "zero": 
false, "data": true, "offset": OFFSET}]

+=== bitmap command fails to modify image already in use ===
+
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get "write" lock
+Is another process using the image [TEST_DIR/t.qcow2]?
+
 === Check handling of inconsistent bitmap ===

 image: TEST_DIR/t.IMGFMT
-- 
2.31.1




[PULL 3/3] qemu-img: Add --skip-broken-bitmaps for 'convert --bitmaps'

2021-07-21 Thread Eric Blake
The point of 'qemu-img convert --bitmaps' is to be a convenience for
actions that are already possible through a string of smaller
'qemu-img bitmap' sub-commands.  One situation not accounted for
already is that if a source image contains an inconsistent bitmap (for
example, because a qemu process died abruptly before flushing bitmap
state), the user MUST delete those inconsistent bitmaps before
anything else useful can be done with the image.

We don't want to delete inconsistent bitmaps by default: although a
corrupt bitmap is only a loss of optimization rather than a corruption
of user-visible data, it is still nice to require the user to opt in
to the fact that they are aware of the loss of the bitmap.  Still,
requiring the user to check 'qemu-img info' to see whether bitmaps are
consistent, then use 'qemu-img bitmap --remove' to remove offenders,
all before using 'qemu-img convert', is a lot more work than just
adding a knob 'qemu-img convert --bitmaps --skip-broken-bitmaps' which
opts in to skipping the broken bitmaps.

After testing the new option, also demonstrate the way to manually fix
things (either deleting bad bitmaps, or re-creating them as empty) so
that it is possible to convert without the option.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1946084
Signed-off-by: Eric Blake 
Message-Id: <20210709153951.2801666-4-ebl...@redhat.com>
[eblake: warning message tweak, test enhancements]
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 docs/tools/qemu-img.rst   |  8 -
 qemu-img.c| 29 +++
 tests/qemu-iotests/tests/qemu-img-bitmaps | 16 -
 tests/qemu-iotests/tests/qemu-img-bitmaps.out | 35 ++-
 4 files changed, 79 insertions(+), 9 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 1d8470eada0e..b7d602a28804 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -414,7 +414,7 @@ Command description:
   4
 Error on reading data

-.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] 
[--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l 
SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-r RATE_LIMIT] [-m NUM_COROUTINES] [-W] 
FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
+.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] 
[--target-is-zero] [--bitmaps [--skip-broken-bitmaps]] [-U] [-C] [-c] [-p] [-q] 
[-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o 
OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-r RATE_LIMIT] [-m 
NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME

   Convert the disk image *FILENAME* or a snapshot *SNAPSHOT_PARAM*
   to disk image *OUTPUT_FILENAME* using format *OUTPUT_FMT*. It can
@@ -456,6 +456,12 @@ Command description:
   *NUM_COROUTINES* specifies how many coroutines work in parallel during
   the convert process (defaults to 8).

+  Use of ``--bitmaps`` requests that any persistent bitmaps present in
+  the original are also copied to the destination.  If any bitmap is
+  inconsistent in the source, the conversion will fail unless
+  ``--skip-broken-bitmaps`` is also specified to copy only the
+  consistent bitmaps.
+
 .. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE] [-F 
BACKING_FMT] [-u] [-o OPTIONS] FILENAME [SIZE]

   Create the new disk image *FILENAME* of size *SIZE* and format
diff --git a/qemu-img.c b/qemu-img.c
index c5496e82e025..908fd0cce5b4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -82,6 +82,7 @@ enum {
 OPTION_MERGE = 274,
 OPTION_BITMAPS = 275,
 OPTION_FORCE = 276,
+OPTION_SKIP_BROKEN = 277,
 };

 typedef enum OutputFormat {
@@ -2102,7 +2103,7 @@ static int convert_do_copy(ImgConvertState *s)
 }

 /* Check that bitmaps can be copied, or output an error */
-static int convert_check_bitmaps(BlockDriverState *src)
+static int convert_check_bitmaps(BlockDriverState *src, bool skip_broken)
 {
 BdrvDirtyBitmap *bm;

@@ -2114,17 +2115,19 @@ static int convert_check_bitmaps(BlockDriverState *src)
 if (!bdrv_dirty_bitmap_get_persistence(bm)) {
 continue;
 }
-if (bdrv_dirty_bitmap_inconsistent(bm)) {
+if (!skip_broken && bdrv_dirty_bitmap_inconsistent(bm)) {
 error_report("Cannot copy inconsistent bitmap '%s'",
  bdrv_dirty_bitmap_name(bm));
-error_printf("Try 'qemu-img bitmap --remove' to delete it\n");
+error_printf("Try --skip-broken-bitmaps, or "
+ "use 'qemu-img bitmap --remove' to delete it\n");
 return -1;
 }
 }
 return 0;
 }

-static int convert_copy_bitmaps(BlockDriverState *src, BlockDriverState *dst)
+static int convert_copy_bitmaps(BlockDriverState *src, BlockDriverState *dst,
+bool skip_broken)
 {
 BdrvD

[PULL 2/3] qemu-img: Fail fast on convert --bitmaps with inconsistent bitmap

2021-07-21 Thread Eric Blake
Waiting until the end of the convert operation (a potentially
time-consuming task) to finally detect that we can't copy a bitmap is
bad, comparing to failing fast up front.  Furthermore, this prevents
us from leaving a file behind with a bitmap that is not marked as
inconsistent even though it does not have sane contents.

This fixes the problems exposed in the previous patch to the iotest:
it adds a fast failure up front, and even if we don't fail early, it
ensures that any bitmap we add but do not properly populate is removed
again rather than left behind incomplete.

Signed-off-by: Eric Blake 
Message-Id: <20210709153951.2801666-3-ebl...@redhat.com>
[eblake: add a hint to the warning message, simplify name computation]
Reviewed-by: Nir Soffer 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qemu-img.c| 29 +--
 tests/qemu-iotests/tests/qemu-img-bitmaps |  3 +-
 tests/qemu-iotests/tests/qemu-img-bitmaps.out | 21 ++
 3 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 797742a44331..c5496e82e025 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2101,6 +2101,29 @@ static int convert_do_copy(ImgConvertState *s)
 return s->ret;
 }

+/* Check that bitmaps can be copied, or output an error */
+static int convert_check_bitmaps(BlockDriverState *src)
+{
+BdrvDirtyBitmap *bm;
+
+if (!bdrv_supports_persistent_dirty_bitmap(src)) {
+error_report("Source lacks bitmap support");
+return -1;
+}
+FOR_EACH_DIRTY_BITMAP(src, bm) {
+if (!bdrv_dirty_bitmap_get_persistence(bm)) {
+continue;
+}
+if (bdrv_dirty_bitmap_inconsistent(bm)) {
+error_report("Cannot copy inconsistent bitmap '%s'",
+ bdrv_dirty_bitmap_name(bm));
+error_printf("Try 'qemu-img bitmap --remove' to delete it\n");
+return -1;
+}
+}
+return 0;
+}
+
 static int convert_copy_bitmaps(BlockDriverState *src, BlockDriverState *dst)
 {
 BdrvDirtyBitmap *bm;
@@ -2127,6 +2150,7 @@ static int convert_copy_bitmaps(BlockDriverState *src, 
BlockDriverState *dst)
   &err);
 if (err) {
 error_reportf_err(err, "Failed to populate bitmap %s: ", name);
+qmp_block_dirty_bitmap_remove(dst->node_name, name, NULL);
 return -1;
 }
 }
@@ -2554,9 +2578,8 @@ static int img_convert(int argc, char **argv)
 ret = -1;
 goto out;
 }
-if (!bdrv_supports_persistent_dirty_bitmap(blk_bs(s.src[0]))) {
-error_report("Source lacks bitmap support");
-ret = -1;
+ret = convert_check_bitmaps(blk_bs(s.src[0]));
+if (ret < 0) {
 goto out;
 }
 }
diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps 
b/tests/qemu-iotests/tests/qemu-img-bitmaps
index 409c4497a303..09c3d395d1eb 100755
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps
@@ -140,11 +140,10 @@ $QEMU_IO -c abort "$TEST_IMG" 2>/dev/null
 $QEMU_IMG bitmap --add "$TEST_IMG" b4
 $QEMU_IMG bitmap --remove "$TEST_IMG" b1
 _img_info --format-specific | _filter_irrelevant_img_info
+# Proof that we fail fast if bitmaps can't be copied
 echo
 $QEMU_IMG convert --bitmaps -O qcow2 "$TEST_IMG" "$TEST_IMG.copy" &&
 echo "unexpected success"
-# Bug - even though we failed at conversion, we left a file around with
-# a bitmap marked as not corrupt
 TEST_IMG=$TEST_IMG.copy _img_info --format-specific \
 | _filter_irrelevant_img_info

diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps.out 
b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
index 6824d4112893..d99c279d0c63 100644
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps.out
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
@@ -144,22 +144,7 @@ Format specific information:
 granularity: 65536
 corrupt: false

-qemu-img: Failed to populate bitmap b0: Bitmap 'b0' is inconsistent and cannot 
be used
-Try block-dirty-bitmap-remove to delete this bitmap from disk
-image: TEST_DIR/t.IMGFMT.copy
-file format: IMGFMT
-virtual size: 10 MiB (10485760 bytes)
-cluster_size: 65536
-Format specific information:
-bitmaps:
-[0]:
-flags:
-name: b0
-granularity: 65536
-[1]:
-flags:
-[0]: auto
-name: b4
-granularity: 65536
-corrupt: false
+qemu-img: Cannot copy inconsistent bitmap 'b0'
+Try 'qemu-img bitmap --remove' to delete it
+qemu-img: Could not open 'TEST_DIR/t.IMGFMT.copy': Could not open 
'TEST_DIR/t.IMGFMT.copy': No such file or directory
 *** done
-- 
2.31.1




[PULL 1/3] iotests: Improve and rename test 291 to qemu-img-bitmap

2021-07-21 Thread Eric Blake
Enhance the test to demonstrate existing less-than-stellar behavior of
qemu-img with a qcow2 image containing an inconsistent bitmap: we
don't diagnose the problem until after copying the entire image (a
potentially long time), and when we do diagnose the failure, we still
end up leaving an empty bitmap in the destination.  This mess will be
cleaned up in the next patch.

While at it, rename the test now that we support useful iotest names,
and fix a missing newline in the error message thus exposed.

Signed-off-by: Eric Blake 
Message-Id: <20210709153951.2801666-2-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Nir Soffer 
---
 block/dirty-bitmap.c  |  2 +-
 .../{291 => tests/qemu-img-bitmaps}   | 21 +++-
 .../{291.out => tests/qemu-img-bitmaps.out}   | 49 ++-
 3 files changed, 69 insertions(+), 3 deletions(-)
 rename tests/qemu-iotests/{291 => tests/qemu-img-bitmaps} (87%)
 rename tests/qemu-iotests/{291.out => tests/qemu-img-bitmaps.out} (76%)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 68d295d6e3ed..0ef46163e3ea 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -193,7 +193,7 @@ int bdrv_dirty_bitmap_check(const BdrvDirtyBitmap *bitmap, 
uint32_t flags,
 error_setg(errp, "Bitmap '%s' is inconsistent and cannot be used",
bitmap->name);
 error_append_hint(errp, "Try block-dirty-bitmap-remove to delete"
-  " this bitmap from disk");
+  " this bitmap from disk\n");
 return -1;
 }

diff --git a/tests/qemu-iotests/291 b/tests/qemu-iotests/tests/qemu-img-bitmaps
similarity index 87%
rename from tests/qemu-iotests/291
rename to tests/qemu-iotests/tests/qemu-img-bitmaps
index 20efb080a6c0..409c4497a303 100755
--- a/tests/qemu-iotests/291
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps
@@ -3,7 +3,7 @@
 #
 # Test qemu-img bitmap handling
 #
-# Copyright (C) 2018-2020 Red Hat, Inc.
+# Copyright (C) 2018-2021 Red Hat, Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -27,11 +27,13 @@ status=1 # failure is the default!
 _cleanup()
 {
 _cleanup_test_img
+_rm_test_img "$TEST_IMG.copy"
 nbd_server_stop
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15

 # get standard environment, filters and checks
+cd ..
 . ./common.rc
 . ./common.filter
 . ./common.nbd
@@ -129,6 +131,23 @@ $QEMU_IMG map --output=json --image-opts \

 nbd_server_stop

+echo
+echo "=== Check handling of inconsistent bitmap ==="
+echo
+
+# Prepare image with corrupted bitmap
+$QEMU_IO -c abort "$TEST_IMG" 2>/dev/null
+$QEMU_IMG bitmap --add "$TEST_IMG" b4
+$QEMU_IMG bitmap --remove "$TEST_IMG" b1
+_img_info --format-specific | _filter_irrelevant_img_info
+echo
+$QEMU_IMG convert --bitmaps -O qcow2 "$TEST_IMG" "$TEST_IMG.copy" &&
+echo "unexpected success"
+# Bug - even though we failed at conversion, we left a file around with
+# a bitmap marked as not corrupt
+TEST_IMG=$TEST_IMG.copy _img_info --format-specific \
+| _filter_irrelevant_img_info
+
 # success, all done
 echo '*** done'
 rm -f $seq.full
diff --git a/tests/qemu-iotests/291.out 
b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
similarity index 76%
rename from tests/qemu-iotests/291.out
rename to tests/qemu-iotests/tests/qemu-img-bitmaps.out
index 018d6b103f87..6824d4112893 100644
--- a/tests/qemu-iotests/291.out
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
@@ -1,4 +1,4 @@
-QA output created by 291
+QA output created by qemu-img-bitmaps

 === Initial image setup ===

@@ -115,4 +115,51 @@ Format specific information:
 [{ "start": 0, "length": 2097152, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET},
 { "start": 2097152, "length": 1048576, "depth": 0, "present": false, "zero": 
false, "data": false},
 { "start": 3145728, "length": 7340032, "depth": 0, "present": true, "zero": 
false, "data": true, "offset": OFFSET}]
+
+=== Check handling of inconsistent bitmap ===
+
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 10 MiB (10485760 bytes)
+cluster_size: 65536
+backing file: TEST_DIR/t.IMGFMT.base
+backing file format: IMGFMT
+Format specific information:
+bitmaps:
+[0]:
+flags:
+[0]: in-use
+[1]: auto
+name: b2
+granularity: 65536
+[1]:
+flags:
+[0]: in-use
+name: b0
+granularity: 65536
+[2]:
+flags:
+[0]: auto
+name: b4
+granularity: 65536
+corrupt: false
+
+qemu-img: Failed to populate bitmap b0: Bitmap 'b0' is inconsistent and cannot 
be used
+Try block-dirty-bitmap-remove to delete this bitmap from disk
+image: TEST_DIR/t.IMGFMT.copy
+file format: IMGFMT
+virtual size: 10 MiB (10485760 bytes)
+cluster_size: 65536
+Format

Transient fail of iotests 215 and 197

2021-07-21 Thread Daniel P . Berrangé
Peter caught the following transient fail on the staging tree:

  https://gitlab.com/qemu-project/qemu/-/jobs/1438817749

--- /builds/qemu-project/qemu/tests/qemu-iotests/197.out
+++ 197.out.bad
@@ -12,13 +12,12 @@
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 0/0 bytes at offset 0
 0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 2147483136/2147483136 bytes at offset 1024
-2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+./common.rc: Killed  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" 
_qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
 read 1024/1024 bytes at offset 3221226496
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on 
read-only device
-2 GiB (0x8001) bytes allocated at offset 0 bytes (0x0)
-1023.938 MiB (0x3fff) bytes not allocated at offset 2 GiB (0x8001)
+2 GiB (0x8000) bytes allocated at offset 0 bytes (0x0)
+1 GiB (0x4000) bytes not allocated at offset 2 GiB (0x8000)
 64 KiB (0x1) bytes allocated at offset 3 GiB (0xc000)
 1023.938 MiB (0x3fff) bytes not allocated at offset 3 GiB (0xc001)
 No errors were found on the image.


--- /builds/qemu-project/qemu/tests/qemu-iotests/215.out
+++ 215.out.bad
@@ -12,13 +12,12 @@
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 0/0 bytes at offset 0
 0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 2147483136/2147483136 bytes at offset 1024
-2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+./common.rc: Killed  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" 
_qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
 read 1024/1024 bytes at offset 3221226496
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Block node is read-only
-2 GiB (0x8001) bytes allocated at offset 0 bytes (0x0)
-1023.938 MiB (0x3fff) bytes not allocated at offset 2 GiB (0x8001)
+2 GiB (0x8000) bytes allocated at offset 0 bytes (0x0)
+1 GiB (0x4000) bytes not allocated at offset 2 GiB (0x8000)
 64 KiB (0x1) bytes allocated at offset 3 GiB (0xc000)
 1023.938 MiB (0x3fff) bytes not allocated at offset 3 GiB (0xc001)
 No errors were found on the image.


Looks like the process might have been killed off by the OS part way
through.

Interestingly both test cases have a comment:

  #Since a 2G read may exhaust
  # memory on some machines (particularly 32-bit), we skip the test if
  # that fails due to memory pressure.


I'm wondering if the logic for handling this failure is flawed, as being
killed by the OS for exhuasting memory limits for the CI container looks
like a plausible scenario to explain the failure.

The CI shared runners supposedly have 3.75 GB of RAM for the VM as a whole.
If the tests are run in parallel this could still be an issue.

Maybe we need to skip these tests by default if they are known to require
a significant amount of memory to run ?

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PULL for-6.1 0/3] Block patches

2021-07-21 Thread Peter Maydell
On Wed, 21 Jul 2021 at 14:13, Stefan Hajnoczi  wrote:
>
> The following changes since commit 801f3db7564dcce8a37a70833c0abe40ec19f8ce:
>
>   Merge remote-tracking branch 'remotes/philmd/tags/kconfig-20210720' into 
> staging (2021-07-20 19:30:28 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to d7ddd0a1618a75b31dc308bb37365ce1da972154:
>
>   linux-aio: limit the batch size using `aio-max-batch` parameter (2021-07-21 
> 13:47:50 +0100)
>
> 
> Pull request
>
> Stefano's performance regression fix for commit 2558cb8dd4 ("linux-aio:
> increasing MAX_EVENTS to a larger hardcoded value").
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



[PATCH v2 6/6] iotests/image-fleecing: test push backup with fleecing

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Add test for push backup with fleecing:

 - start fleecing with copy-before-write filter
 - start a backup job from temporary fleecing node to actual backup
   target

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/tests/image-fleecing | 105 +++-
 tests/qemu-iotests/tests/image-fleecing.out |  61 
 2 files changed, 141 insertions(+), 25 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index f6318492c6..083cfacc6f 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -48,12 +48,20 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
-def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
+def do_test(vm, use_cbw, base_img_path, fleece_img_path, nbd_sock_path=None,
+target_img_path=None):
+push_backup = target_img_path is not None
+assert (nbd_sock_path is not None) != push_backup
+if push_backup:
+assert use_cbw
+
 log('--- Setting up images ---')
 log('')
 
 assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
 assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
+if push_backup:
+assert qemu_img('create', '-f', 'qcow2', target_img_path, '64M') == 0
 
 for p in patterns:
 qemu_io('-f', iotests.imgfmt,
@@ -108,25 +116,43 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
target=tmp_node,
sync='none'))
 
-log('')
-log('--- Setting up NBD Export ---')
-log('')
+if push_backup:
+log('')
+log('--- Starting actual backup ---')
+log('')
+
+log(vm.qmp('blockdev-add', **{
+'driver': iotests.imgfmt,
+'node-name': 'target',
+'file': {
+'driver': 'file',
+'filename': target_img_path
+}
+}))
+log(vm.qmp('blockdev-backup', device=tmp_node,
+   sync='full', target='target',
+   immutable_source=True,
+   job_id='push-backup', speed=1))
+else:
+log('')
+log('--- Setting up NBD Export ---')
+log('')
 
-nbd_uri = 'nbd+unix:///%s?socket=%s' % (tmp_node, nbd_sock_path)
-log(vm.qmp('nbd-server-start',
-   {'addr': { 'type': 'unix',
-  'data': { 'path': nbd_sock_path } } }))
+nbd_uri = 'nbd+unix:///%s?socket=%s' % (tmp_node, nbd_sock_path)
+log(vm.qmp('nbd-server-start',
+   {'addr': { 'type': 'unix',
+  'data': { 'path': nbd_sock_path } } }))
 
-log(vm.qmp('nbd-server-add', device=tmp_node))
+log(vm.qmp('nbd-server-add', device=tmp_node))
 
-log('')
-log('--- Sanity Check ---')
-log('')
+log('')
+log('--- Sanity Check ---')
+log('')
 
-for p in patterns + zeroes:
-cmd = 'read -P%s %s %s' % p
-log(cmd)
-assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+for p in patterns + zeroes:
+cmd = 'read -P%s %s %s' % p
+log(cmd)
+assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
 
 log('')
 log('--- Testing COW ---')
@@ -137,6 +163,20 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log(cmd)
 log(vm.hmp_qemu_io(qom_path, cmd, qdev=True))
 
+if push_backup:
+# Check that previous operations were done during backup, not after
+result = vm.qmp('query-block-jobs')
+if len(result['return']) != 1:
+log('Backup finished too fast, COW is not tested')
+
+result = vm.qmp('block-job-set-speed', device='push-backup', speed=0)
+assert result == {'return': {}}
+
+log(vm.event_wait(name='BLOCK_JOB_COMPLETED',
+  match={'data': {'device': 'push-backup'}}),
+  filters=[iotests.filter_qmp_event])
+log(vm.qmp('blockdev-del', node_name='target'))
+
 log('')
 log('--- Verifying Data ---')
 log('')
@@ -144,7 +184,10 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 for p in patterns + zeroes:
 cmd = 'read -P%s %s %s' % p
 log(cmd)
-assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+if push_backup:
+assert qemu_io_silent('-r', '-c', cmd, target_img_path) == 0
+else:
+assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
 
 log('')
 log('--- Cleanup ---')
@@ -159,7 +202,9 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 assert e is not N

[PATCH v2 5/6] qapi: backup: add immutable-source paramter

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are on the way to implement internal-backup with fleecing scheme,
which includes backup job copying from temporary node (which is target
of copy-before-write filter) to final target of backup. This job
doesn't need own filter, as temporary node is a kind of snapshot, it's
immutable.

Let's add a parameter for backup to not insert filter and handle
guest writes to source node but instead unshare writes on source. This
way backup job become a simple copying process.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json  | 12 ++-
 include/block/block_int.h |  1 +
 block/backup.c| 71 ---
 block/replication.c   |  2 +-
 blockdev.c|  1 +
 5 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 59d3e5e42d..d266eceabb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1391,6 +1391,15 @@
 #above node specified by @drive. If this option is not 
given,
 #a node name is autogenerated. (Since: 4.2)
 #
+# @immutable-source: If true, assume source is immutable and don't insert 
filter
+#as no copy-before-write operations are needed. It will
+#fail if there are existing writers on source node, as 
well,
+#any attempt to add writer to source node during backup 
will
+#fail. @filter-node-name must not be set.
+#If false, insert copy-before-write filter above source 
node
+#(see also @filter-node-name parameter).
+#Default is false. (Since 6.2)
+#
 # @x-perf: Performance options. (Since 6.0)
 #
 # Note: @on-source-error and @on-target-error only affect background
@@ -1407,7 +1416,8 @@
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError',
 '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
-'*filter-node-name': 'str', '*x-perf': 'BackupPerf'  } }
+'*filter-node-name': 'str', '*immutable-source': 'bool',
+'*x-perf': 'BackupPerf'  } }
 
 ##
 # @DriveBackup:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f1a54db0f8..6571dad061 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1284,6 +1284,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 BitmapSyncMode bitmap_mode,
 bool compress,
 const char *filter_node_name,
+bool immutable_source,
 BackupPerf *perf,
 BlockdevOnError on_source_error,
 BlockdevOnError on_target_error,
diff --git a/block/backup.c b/block/backup.c
index 687d2882bc..2bf0840bc2 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -34,6 +34,14 @@ typedef struct BackupBlockJob {
 BlockDriverState *cbw;
 BlockDriverState *source_bs;
 BlockDriverState *target_bs;
+BlockBackend *source_blk;
+BlockBackend *target_blk;
+/*
+ * Note that if backup runs with filter (immutable-source parameter is
+ * false), @cbw is set but @source_blk and @target_blk are NULL.
+ * Otherwise if backup runs without filter (immutable-source paramter is
+ * true), @cbw is NULL but @source_blk and @target_blk are set.
+ */
 
 BdrvDirtyBitmap *sync_bitmap;
 
@@ -102,7 +110,17 @@ static void backup_clean(Job *job)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
 block_job_remove_all_bdrv(&s->common);
-bdrv_cbw_drop(s->cbw);
+if (s->cbw) {
+assert(!s->source_blk && !s->target_blk);
+bdrv_cbw_drop(s->cbw);
+} else {
+block_copy_state_free(s->bcs);
+s->bcs = NULL;
+blk_unref(s->source_blk);
+s->source_blk = NULL;
+blk_unref(s->target_blk);
+s->target_blk = NULL;
+}
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -356,6 +374,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
   BitmapSyncMode bitmap_mode,
   bool compress,
   const char *filter_node_name,
+  bool immutable_source,
   BackupPerf *perf,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
@@ -368,6 +387,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 int64_t cluster_size;
 BlockDriverState *cbw = NULL;
 BlockCopyState *bcs = NULL;
+BlockBackend *source_blk = NULL, *target_blk = NULL;
 
 assert(bs);
 assert(target);
@@ -376,6 +396,12 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
 assert(sync_bitmap || sync_mode != MIR

[PATCH v2 4/6] block: blk_root(): return non-const pointer

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
In the following patch we'll want to pass blk children to block-copy.
Const pointers are not enough. So, return non const pointer from
blk_root().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/sysemu/block-backend.h | 2 +-
 block/block-backend.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 29d4fdbf63..5d4dd877b7 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -271,7 +271,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, 
int64_t off_in,
int bytes, BdrvRequestFlags read_flags,
BdrvRequestFlags write_flags);
 
-const BdrvChild *blk_root(BlockBackend *blk);
+BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 6140d133e2..b167c630d2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2463,7 +2463,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, 
int64_t off_in,
   bytes, read_flags, write_flags);
 }
 
-const BdrvChild *blk_root(BlockBackend *blk)
+BdrvChild *blk_root(BlockBackend *blk)
 {
 return blk->root;
 }
-- 
2.29.2




[PATCH v2 2/6] block/copy-before-write: require BLK_PERM_WRITE_UNCHANGED for fleecing

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Now block-copy detects fleecing scheme and do write-unchanged
operations if detected. So, let's require appropriate permissions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2cd68b480a..808e8707ed 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -113,6 +113,14 @@ static void cbw_refresh_filename(BlockDriverState *bs)
 bs->file->bs->filename);
 }
 
+static bool cbw_is_fleecing(BlockDriverState *bs)
+{
+BDRVCopyBeforeWriteState *s = bs->opaque;
+
+return bs->file && s->target &&
+bdrv_skip_filters(bs) == bdrv_backing_chain_next(s->target->bs);
+}
+
 static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
BdrvChildRole role,
BlockReopenQueue *reopen_queue,
@@ -129,7 +137,8 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
  * only upfront.
  */
 *nshared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
-*nperm = BLK_PERM_WRITE;
+*nperm =
+cbw_is_fleecing(bs) ? BLK_PERM_WRITE_UNCHANGED : BLK_PERM_WRITE;
 } else {
 /* Source child */
 bdrv_default_perms(bs, c, role, reopen_queue,
-- 
2.29.2




[PATCH v2 3/6] block: share writes on backing child of fleecing node

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
By default, we share writes on backing child only if our parents share
write permission on us.

Still, with fleecing scheme we want to be able to unshare writes on
fleecing node, which is a kind of immutable snapshot
(copy-before-write operations are write-unchanged). So, let's detect
fleecing node and share writes on its backing child. (we should share
them, otherwise copy-before-write filter can't write to its file
child).

With fleecing scheme we are sure, that writes to backing child goes
through copy-before-write filter, so we are safe to share them.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.h |  1 +
 block.c   |  3 ++-
 block/copy-before-write.c | 37 +
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 51847e711a..a15ae9366d 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -35,5 +35,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockCopyState **bcs,
   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
+bool bdrv_is_fleecing_node(BlockDriverState *bs);
 
 #endif /* COPY_BEFORE_WRITE_H */
diff --git a/block.c b/block.c
index 94a556e61d..d1046b7ff5 100644
--- a/block.c
+++ b/block.c
@@ -50,6 +50,7 @@
 #include "qemu/cutils.h"
 #include "qemu/id.h"
 #include "block/coroutines.h"
+#include "block/copy-before-write.h"
 
 #ifdef CONFIG_BSD
 #include 
@@ -2507,7 +2508,7 @@ static void bdrv_default_perms_for_cow(BlockDriverState 
*bs, BdrvChild *c,
  * writable and resizable backing file.
  * TODO Require !(perm & BLK_PERM_CONSISTENT_READ), too?
  */
-if (shared & BLK_PERM_WRITE) {
+if (shared & BLK_PERM_WRITE || bdrv_is_fleecing_node(bs)) {
 shared = BLK_PERM_WRITE | BLK_PERM_RESIZE;
 } else {
 shared = 0;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 808e8707ed..0a311e311a 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -257,6 +257,43 @@ void bdrv_cbw_drop(BlockDriverState *bs)
 bdrv_unref(bs);
 }
 
+/*
+ * Detect is bs a fleecing node in some fleecing sceheme like:
+ *
+ * copy-before-write -- target --> fleecing-node
+ *   |   |
+ *   | file  | backing
+ * active-node  <-
+ *
+ * In this case, fleecing-node can (and should) safely share writes on its
+ * backing child.
+ */
+bool bdrv_is_fleecing_node(BlockDriverState *bs)
+{
+BdrvChild *parent;
+BlockDriverState *parent_bs;
+BDRVCopyBeforeWriteState *s;
+
+QLIST_FOREACH(parent, &bs->parents, next_parent) {
+if (parent->klass != &child_of_bds) {
+continue;
+}
+
+parent_bs = parent->opaque;
+if (parent_bs->drv != &bdrv_cbw_filter) {
+continue;
+}
+
+s = parent_bs->opaque;
+
+if (s->target && s->target->bs == bs && cbw_is_fleecing(parent_bs)) {
+return true;
+}
+}
+
+return false;
+}
+
 static void cbw_init(void)
 {
 bdrv_register(&bdrv_cbw_filter);
-- 
2.29.2




[PATCH v2 for-6.2 0/6] push backup with fleecing

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Hi all!

Here is push-backup with fleecing. What is it:

1. Make fleecing scheme

guest blk
  |
  |root
  v
copy-before-write filter  ---> temp qcow2
  |  |
  |file  | backing
  V  |
active disk <-

2. Start backup job from temp qcow2 to final remote target (NBD or
   something)

Benefit in comparison with simple backup job: for remote final target
write operations are not very fast. And guest have to wait for
copy-before-write operations. With fleecing scheme target for
copy-before-write operations is local qcow2 file with faster access than
actual backup target. So, guest is less disturbed by copy-before-write
operations.

Based-on: <20210721100555.45594-1-vsement...@virtuozzo.com>
   ([PATCH v6 00/33] block: publish backup-top filter)

v2:
01: changed to simply check s->target->perm
06: make explicit immutable-source parameter instead of auto-detecting

Vladimir Sementsov-Ogievskiy (6):
  block/block-copy: use write-unchanged for fleecing scheme
  block/copy-before-write: require BLK_PERM_WRITE_UNCHANGED for fleecing
  block: share writes on backing child of fleecing node
  block: blk_root(): return non-const pointer
  qapi: backup: add immutable-source paramter
  iotests/image-fleecing: test push backup with fleecing

 qapi/block-core.json|  12 ++-
 block/copy-before-write.h   |   1 +
 include/block/block_int.h   |   1 +
 include/sysemu/block-backend.h  |   2 +-
 block.c |   3 +-
 block/backup.c  |  71 -
 block/block-backend.c   |   2 +-
 block/block-copy.c  |  18 +++-
 block/copy-before-write.c   |  48 -
 block/replication.c |   2 +-
 blockdev.c  |   1 +
 tests/qemu-iotests/tests/image-fleecing | 105 +++-
 tests/qemu-iotests/tests/image-fleecing.out |  61 
 13 files changed, 287 insertions(+), 40 deletions(-)

-- 
2.29.2




[PATCH v2 1/6] block/block-copy: use write-unchanged for fleecing scheme

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are going to use fleecing scheme for push-backup, so that
copy-before-write filter does copy before write operations to temporary
image and backup job copies data from (immutable from backup's point of
view) temporary image to actual backup target. For this to work
properly, backup job should unshare writes on immutable source node.
copy-before-write filter should do write-unchanged operations for this
(they are really unchanged, as source is a backing of temporary node).

So, we want to teach block-copy to do WRITE_UNCHANGED operations in
case of fleecing. We can detect fleecing on initialization but that's
not safe enough: block graph may change. We can freeze the backing
chain of target to avoid changes but that's not flexible. We can detect
fleecing before each write but that's an overhead.

Actually, we'll have to detect fleecing in copy-before-write filter
anyway, to chose correct permissions on target node in further patch.
So, we can just add a possibility to control adding
BDRV_REQ_WRITE_UNCHANGED flag in block_copy, like
block_copy_set_write_unchanged(). Block-copy doesn't own source and
target child and relies on correct permission handling by block-copy
user.

Finally, let's go the simplest way: just do WRITE_UNCHANGED if
s->target has only that permission currently.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/block-copy.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 020c8d7126..a850684ac6 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -114,7 +114,10 @@ typedef struct BlockCopyState {
 /*
  * BdrvChild objects are not owned or managed by block-copy. They are
  * provided by block-copy user and user is responsible for appropriate
- * permissions on these children.
+ * permissions on these children. Note also that block-copy will
+ * automatically add BDRV_REQ_WRITE_UNCHANGED flag to write operations if
+ * target child has BLK_PERM_WRITE_UNCHANGED permission but doesn't have
+ * BLK_PERM_WRITE permission.
  */
 BdrvChild *source;
 BdrvChild *target;
@@ -508,6 +511,7 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState 
*s,
 int ret;
 int64_t nbytes = MIN(offset + bytes, s->len) - offset;
 void *bounce_buffer = NULL;
+BdrvRequestFlags write_flags = s->write_flags;
 
 assert(offset >= 0 && bytes > 0 && INT64_MAX - offset >= bytes);
 assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
@@ -517,9 +521,15 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState 
*s,
offset + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
 assert(nbytes < INT_MAX);
 
+if ((s->target->perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) ==
+BLK_PERM_WRITE_UNCHANGED)
+{
+write_flags |= BDRV_REQ_WRITE_UNCHANGED;
+}
+
 switch (*method) {
 case COPY_WRITE_ZEROES:
-ret = bdrv_co_pwrite_zeroes(s->target, offset, nbytes, s->write_flags &
+ret = bdrv_co_pwrite_zeroes(s->target, offset, nbytes, write_flags &
 ~BDRV_REQ_WRITE_COMPRESSED);
 if (ret < 0) {
 trace_block_copy_write_zeroes_fail(s, offset, ret);
@@ -530,7 +540,7 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState 
*s,
 case COPY_RANGE_SMALL:
 case COPY_RANGE_FULL:
 ret = bdrv_co_copy_range(s->source, offset, s->target, offset, nbytes,
- 0, s->write_flags);
+ 0, write_flags);
 if (ret >= 0) {
 /* Successful copy-range, increase chunk size.  */
 *method = COPY_RANGE_FULL;
@@ -563,7 +573,7 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState 
*s,
 }
 
 ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
- s->write_flags);
+ write_flags);
 if (ret < 0) {
 trace_block_copy_write_fail(s, offset, ret);
 *error_is_read = false;
-- 
2.29.2




[PULL for-6.1 3/3] linux-aio: limit the batch size using `aio-max-batch` parameter

2021-07-21 Thread Stefan Hajnoczi
From: Stefano Garzarella 

When there are multiple queues attached to the same AIO context,
some requests may experience high latency, since in the worst case
the AIO engine queue is only flushed when it is full (MAX_EVENTS) or
there are no more queues plugged.

Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger
hardcoded value") changed MAX_EVENTS from 128 to 1024, to increase
the number of in-flight requests. But this change also increased
the potential maximum batch to 1024 elements.

When there is a single queue attached to the AIO context, the issue
is mitigated from laio_io_unplug() that will flush the queue every
time is invoked since there can't be others queue plugged.

Let's use the new `aio-max-batch` IOThread parameter to mitigate
this issue, limiting the number of requests in a batch.

We also define a default value (32): this value is obtained running
some benchmarks and it represents a good tradeoff between the latency
increase while a request is queued and the cost of the io_submit(2)
system call.

Signed-off-by: Stefano Garzarella 
Message-id: 20210721094211.69853-4-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/linux-aio.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 3c0527c2bf..0dab507b71 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -28,6 +28,9 @@
  */
 #define MAX_EVENTS 1024
 
+/* Maximum number of requests in a batch. (default value) */
+#define DEFAULT_MAX_BATCH 32
+
 struct qemu_laiocb {
 Coroutine *co;
 LinuxAioState *ctx;
@@ -351,6 +354,10 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 LinuxAioState *s = laiocb->ctx;
 struct iocb *iocbs = &laiocb->iocb;
 QEMUIOVector *qiov = laiocb->qiov;
+int64_t max_batch = s->aio_context->aio_max_batch ?: DEFAULT_MAX_BATCH;
+
+/* limit the batch with the number of available events */
+max_batch = MIN_NON_ZERO(MAX_EVENTS - s->io_q.in_flight, max_batch);
 
 switch (type) {
 case QEMU_AIO_WRITE:
@@ -371,7 +378,7 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 s->io_q.in_queue++;
 if (!s->io_q.blocked &&
 (!s->io_q.plugged ||
- s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
+ s->io_q.in_queue >= max_batch)) {
 ioq_submit(s);
 }
 
-- 
2.31.1



Re: [PATCH for-6.1? v2 0/3] linux-aio: limit the batch size to reduce queue latency

2021-07-21 Thread Stefan Hajnoczi
On Wed, Jul 21, 2021 at 11:42:08AM +0200, Stefano Garzarella wrote:
> Since it's a performance regression, if possible we could include it in 
> 6.1-rc1.
> There shouldn't be any particular criticism.
> 
> v1: https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg01526.html
> v2:
>   - s/bacth/batch/ [stefanha]
>   - limit the batch with the number of available events [stefanha]
>   - rebased on master
>   - re-run benchmarks
> 
> Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger hardcoded
> value") changed MAX_EVENTS from 128 to 1024, to increase the number of
> in-flight requests. But this change also increased the potential maximum batch
> to 1024 elements.
> 
> The problem is noticeable when we have a lot of requests in flight and 
> multiple
> queues attached to the same AIO context.
> In this case we potentially create very large batches. Instead, when we have
> a single queue, the batch is limited because when the queue is unplugged,
> there is a call to io_submit(2).
> In practice, io_submit(2) was called only when there are no more queues 
> plugged
> in or when we fill the AIO queue (MAX_EVENTS = 1024).
> 
> This series limit the batch size (number of request submitted to the kernel
> through a single io_submit(2) call) in the Linux AIO backend, and add a new
> `aio-max-batch` parameter to IOThread to allow tuning it.
> If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its
> default maximum batch size value.
> 
> I run some benchmarks to choose 32 as default batch value for Linux AIO.
> Below the kIOPS measured with fio running in the guest (average over 3 runs):
> 
>|   master  |   with this series applied   
>  |
>|143c2e04328| 
> maxbatch=8|maxbatch=16|maxbatch=32|maxbatch=64|
>   # queues | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 
> 4qs |
> -- randread tests 
> -|---|
> bs=4k iodepth=1| 200 | 195 | 181 | 208 | 200 | 203 | 206 | 212 | 200 | 
> 204 |
> bs=4k iodepth=8| 269 | 231 | 256 | 244 | 255 | 260 | 266 | 268 | 270 | 
> 250 |
> bs=4k iodepth=64   | 230 | 198 | 262 | 265 | 261 | 253 | 260 | 273 | 253 | 
> 263 |
> bs=4k iodepth=128  | 217 | 181 | 261 | 253 | 249 | 276 | 250 | 278 | 255 | 
> 278 |
> bs=16k iodepth=1   | 130 | 130 | 130 | 130 | 130 | 130 | 137 | 130 | 130 | 
> 130 |
> bs=16k iodepth=8   | 130 | 131 | 130 | 131 | 130 | 130 | 137 | 131 | 131 | 
> 130 |
> bs=16k iodepth=64  | 130 | 102 | 131 | 128 | 131 | 128 | 137 | 140 | 130 | 
> 128 |
> bs=16k iodepth=128 | 131 | 100 | 130 | 128 | 131 | 129 | 137 | 141 | 130 | 
> 129 |
> 
> 1q  = virtio-blk device with a single queue
> 4qs = virito-blk device with multi queues (one queue per vCPU - 4)
> 
> I reported only the most significant tests, but I also did other tests to
> make sure there were no regressions, here the full report:
> https://docs.google.com/spreadsheets/d/11X3_5FJu7pnMTlf4ZatRDvsnU9K3EPj6Mn3aJIsE4tI
> 
> Test environment:
> - Disk: Intel Corporation NVMe Datacenter SSD [Optane]
> - CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
> - QEMU: qemu-system-x86_64 -machine q35,accel=kvm -smp 4 -m 4096 \
>   ... \
>   -object iothread,id=iothread0,aio-max-batch=${MAX_BATCH} \
>   -device virtio-blk-pci,iothread=iothread0,num-queues=${NUM_QUEUES}
> 
> - benchmark: fio --ioengine=libaio --thread --group_reporting \
>  --number_ios=20 --direct=1 --filename=/dev/vdb \
>  --rw=${TEST} --bs=${BS} --iodepth=${IODEPTH} --numjobs=16
> 
> Next steps:
>  - benchmark io_uring and use `aio-max-batch` also there
>  - make MAX_EVENTS parametric adding a new `aio-max-events` parameter
> 
> Thanks,
> Stefano
> 
> Stefano Garzarella (3):
>   iothread: generalize iothread_set_param/iothread_get_param
>   iothread: add aio-max-batch parameter
>   linux-aio: limit the batch size using `aio-max-batch` parameter
> 
>  qapi/misc.json|  6 ++-
>  qapi/qom.json |  7 +++-
>  include/block/aio.h   | 12 ++
>  include/sysemu/iothread.h |  3 ++
>  block/linux-aio.c |  9 -
>  iothread.c| 82 ++-
>  monitor/hmp-cmds.c|  2 +
>  util/aio-posix.c  | 12 ++
>  util/aio-win32.c  |  5 +++
>  util/async.c  |  2 +
>  qemu-options.hx   |  8 +++-
>  11 files changed, 134 insertions(+), 14 deletions(-)
> 
> -- 
> 2.31.1
> 

Thanks, applied to my master tree:
https://gitlab.com/stefanha/qemu/commits/master

Stefan


signature.asc
Description: PGP signature


[PULL for-6.1 1/3] iothread: generalize iothread_set_param/iothread_get_param

2021-07-21 Thread Stefan Hajnoczi
From: Stefano Garzarella 

Changes in preparation for next patches where we add a new
parameter not related to the poll mechanism.

Let's add two new generic functions (iothread_set_param and
iothread_get_param) that we use to set and get IOThread
parameters.

Signed-off-by: Stefano Garzarella 
Message-id: 20210721094211.69853-2-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 iothread.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/iothread.c b/iothread.c
index 2c5ccd7367..103679a16b 100644
--- a/iothread.c
+++ b/iothread.c
@@ -213,7 +213,7 @@ static PollParamInfo poll_shrink_info = {
 "poll-shrink", offsetof(IOThread, poll_shrink),
 };
 
-static void iothread_get_poll_param(Object *obj, Visitor *v,
+static void iothread_get_param(Object *obj, Visitor *v,
 const char *name, void *opaque, Error **errp)
 {
 IOThread *iothread = IOTHREAD(obj);
@@ -223,7 +223,7 @@ static void iothread_get_poll_param(Object *obj, Visitor *v,
 visit_type_int64(v, name, field, errp);
 }
 
-static void iothread_set_poll_param(Object *obj, Visitor *v,
+static bool iothread_set_param(Object *obj, Visitor *v,
 const char *name, void *opaque, Error **errp)
 {
 IOThread *iothread = IOTHREAD(obj);
@@ -232,17 +232,36 @@ static void iothread_set_poll_param(Object *obj, Visitor 
*v,
 int64_t value;
 
 if (!visit_type_int64(v, name, &value, errp)) {
-return;
+return false;
 }
 
 if (value < 0) {
 error_setg(errp, "%s value must be in range [0, %" PRId64 "]",
info->name, INT64_MAX);
-return;
+return false;
 }
 
 *field = value;
 
+return true;
+}
+
+static void iothread_get_poll_param(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp)
+{
+
+iothread_get_param(obj, v, name, opaque, errp);
+}
+
+static void iothread_set_poll_param(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp)
+{
+IOThread *iothread = IOTHREAD(obj);
+
+if (!iothread_set_param(obj, v, name, opaque, errp)) {
+return;
+}
+
 if (iothread->ctx) {
 aio_context_set_poll_params(iothread->ctx,
 iothread->poll_max_ns,
-- 
2.31.1



[PULL for-6.1 2/3] iothread: add aio-max-batch parameter

2021-07-21 Thread Stefan Hajnoczi
From: Stefano Garzarella 

The `aio-max-batch` parameter will be propagated to AIO engines
and it will be used to control the maximum number of queued requests.

When there are in queue a number of requests equal to `aio-max-batch`,
the engine invokes the system call to forward the requests to the kernel.

This parameter allows us to control the maximum batch size to reduce
the latency that requests might accumulate while queued in the AIO
engine queue.

If `aio-max-batch` is equal to 0 (default value), the AIO engine will
use its default maximum batch size value.

Signed-off-by: Stefano Garzarella 
Message-id: 20210721094211.69853-3-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 qapi/misc.json|  6 -
 qapi/qom.json |  7 -
 include/block/aio.h   | 12 +
 include/sysemu/iothread.h |  3 +++
 iothread.c| 55 +++
 monitor/hmp-cmds.c|  2 ++
 util/aio-posix.c  | 12 +
 util/aio-win32.c  |  5 
 util/async.c  |  2 ++
 qemu-options.hx   |  8 --
 10 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index 156f98203e..5c2ca3b556 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -86,6 +86,9 @@
 # @poll-shrink: how many ns will be removed from polling time, 0 means that
 #   it's not configured (since 2.9)
 #
+# @aio-max-batch: maximum number of requests in a batch for the AIO engine,
+# 0 means that the engine will use its default (since 6.1)
+#
 # Since: 2.0
 ##
 { 'struct': 'IOThreadInfo',
@@ -93,7 +96,8 @@
'thread-id': 'int',
'poll-max-ns': 'int',
'poll-grow': 'int',
-   'poll-shrink': 'int' } }
+   'poll-shrink': 'int',
+   'aio-max-batch': 'int' } }
 
 ##
 # @query-iothreads:
diff --git a/qapi/qom.json b/qapi/qom.json
index 652be317b8..6d5f4a88e6 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -516,12 +516,17 @@
 #   algorithm detects it is spending too long polling without
 #   encountering events. 0 selects a default behaviour (default: 0)
 #
+# @aio-max-batch: maximum number of requests in a batch for the AIO engine,
+# 0 means that the engine will use its default
+# (default:0, since 6.1)
+#
 # Since: 2.0
 ##
 { 'struct': 'IothreadProperties',
   'data': { '*poll-max-ns': 'int',
 '*poll-grow': 'int',
-'*poll-shrink': 'int' } }
+'*poll-shrink': 'int',
+'*aio-max-batch': 'int' } }
 
 ##
 # @MemoryBackendProperties:
diff --git a/include/block/aio.h b/include/block/aio.h
index 807edce9b5..47fbe9d81f 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -232,6 +232,9 @@ struct AioContext {
 int64_t poll_grow;  /* polling time growth factor */
 int64_t poll_shrink;/* polling time shrink factor */
 
+/* AIO engine parameters */
+int64_t aio_max_batch;  /* maximum number of requests in a batch */
+
 /*
  * List of handlers participating in userspace polling.  Protected by
  * ctx->list_lock.  Iterated and modified mostly by the event loop thread
@@ -755,4 +758,13 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t 
max_ns,
  int64_t grow, int64_t shrink,
  Error **errp);
 
+/**
+ * aio_context_set_aio_params:
+ * @ctx: the aio context
+ * @max_batch: maximum number of requests in a batch, 0 means that the
+ * engine will use its default
+ */
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
+Error **errp);
+
 #endif
diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
index f177142f16..7f714bd136 100644
--- a/include/sysemu/iothread.h
+++ b/include/sysemu/iothread.h
@@ -37,6 +37,9 @@ struct IOThread {
 int64_t poll_max_ns;
 int64_t poll_grow;
 int64_t poll_shrink;
+
+/* AioContext AIO engine parameters */
+int64_t aio_max_batch;
 };
 typedef struct IOThread IOThread;
 
diff --git a/iothread.c b/iothread.c
index 103679a16b..ddbbde61f7 100644
--- a/iothread.c
+++ b/iothread.c
@@ -152,6 +152,24 @@ static void iothread_init_gcontext(IOThread *iothread)
 iothread->main_loop = g_main_loop_new(iothread->worker_context, TRUE);
 }
 
+static void iothread_set_aio_context_params(IOThread *iothread, Error **errp)
+{
+ERRP_GUARD();
+
+aio_context_set_poll_params(iothread->ctx,
+iothread->poll_max_ns,
+iothread->poll_grow,
+iothread->poll_shrink,
+errp);
+if (*errp) {
+return;
+}
+
+aio_context_set_aio_params(iothread->ctx,
+   iothread->aio_max_batch,
+   errp);
+}
+
 static void iothread_complete(UserCreatable *o

[PULL for-6.1 0/3] Block patches

2021-07-21 Thread Stefan Hajnoczi
The following changes since commit 801f3db7564dcce8a37a70833c0abe40ec19f8ce:

  Merge remote-tracking branch 'remotes/philmd/tags/kconfig-20210720' into 
staging (2021-07-20 19:30:28 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to d7ddd0a1618a75b31dc308bb37365ce1da972154:

  linux-aio: limit the batch size using `aio-max-batch` parameter (2021-07-21 
13:47:50 +0100)


Pull request

Stefano's performance regression fix for commit 2558cb8dd4 ("linux-aio:
increasing MAX_EVENTS to a larger hardcoded value").



Stefano Garzarella (3):
  iothread: generalize iothread_set_param/iothread_get_param
  iothread: add aio-max-batch parameter
  linux-aio: limit the batch size using `aio-max-batch` parameter

 qapi/misc.json|  6 ++-
 qapi/qom.json |  7 +++-
 include/block/aio.h   | 12 ++
 include/sysemu/iothread.h |  3 ++
 block/linux-aio.c |  9 -
 iothread.c| 82 ++-
 monitor/hmp-cmds.c|  2 +
 util/aio-posix.c  | 12 ++
 util/aio-win32.c  |  5 +++
 util/async.c  |  2 +
 qemu-options.hx   |  8 +++-
 11 files changed, 134 insertions(+), 14 deletions(-)

-- 
2.31.1



[PATCH v6 33/33] iotests/image-fleecing: add test-case for copy-before-write filter

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
New fleecing method becomes available: copy-before-write filter.

Actually we don't need backup job to setup image fleecing. Add test
for new recommended way of image fleecing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 50 +-
 tests/qemu-iotests/tests/image-fleecing.out | 72 +
 2 files changed, 107 insertions(+), 15 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index e210c00d28..f6318492c6 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -48,7 +48,7 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
-def do_test(base_img_path, fleece_img_path, nbd_sock_path, vm):
+def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
 log('--- Setting up images ---')
 log('')
 
@@ -67,6 +67,7 @@ def do_test(base_img_path, fleece_img_path, nbd_sock_path, 
vm):
 
 src_node = 'source'
 tmp_node = 'temp'
+qom_path = '/machine/peripheral/sda'
 vm.add_blockdev(f'driver={iotests.imgfmt},file.driver=file,'
 f'file.filename={base_img_path},node-name={src_node}')
 vm.add_device('virtio-scsi')
@@ -90,12 +91,22 @@ def do_test(base_img_path, fleece_img_path, nbd_sock_path, 
vm):
 'backing': src_node,
 }))
 
-# Establish COW from source to fleecing node
-log(vm.qmp('blockdev-backup',
-   job_id='fleecing',
-   device=src_node,
-   target=tmp_node,
-   sync='none'))
+# Establish CBW from source to fleecing node
+if use_cbw:
+log(vm.qmp('blockdev-add', {
+'driver': 'copy-before-write',
+'node-name': 'fl-cbw',
+'file': src_node,
+'target': tmp_node
+}))
+
+log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
+else:
+log(vm.qmp('blockdev-backup',
+   job_id='fleecing',
+   device=src_node,
+   target=tmp_node,
+   sync='none'))
 
 log('')
 log('--- Setting up NBD Export ---')
@@ -124,7 +135,7 @@ def do_test(base_img_path, fleece_img_path, nbd_sock_path, 
vm):
 for p in overwrite:
 cmd = 'write -P%s %s %s' % p
 log(cmd)
-log(vm.hmp_qemu_io('/machine/peripheral/sda', cmd, qdev=True))
+log(vm.hmp_qemu_io(qom_path, cmd, qdev=True))
 
 log('')
 log('--- Verifying Data ---')
@@ -139,10 +150,15 @@ def do_test(base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log('--- Cleanup ---')
 log('')
 
-log(vm.qmp('block-job-cancel', device='fleecing'))
-e = vm.event_wait('BLOCK_JOB_CANCELLED')
-assert e is not None
-log(e, filters=[iotests.filter_qmp_event])
+if use_cbw:
+log(vm.qmp('qom-set', path=qom_path, property='drive', value=src_node))
+log(vm.qmp('blockdev-del', node_name='fl-cbw'))
+else:
+log(vm.qmp('block-job-cancel', device='fleecing'))
+e = vm.event_wait('BLOCK_JOB_CANCELLED')
+assert e is not None
+log(e, filters=[iotests.filter_qmp_event])
+
 log(vm.qmp('nbd-server-stop'))
 log(vm.qmp('blockdev-del', node_name=tmp_node))
 vm.shutdown()
@@ -160,13 +176,17 @@ def do_test(base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log('Done')
 
 
-def test():
+def test(use_cbw):
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
  iotests.FilePath('nbd.sock',
   base_dir=iotests.sock_dir) as nbd_sock_path, \
  iotests.VM() as vm:
-do_test(base_img_path, fleece_img_path, nbd_sock_path, vm)
+do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm)
+
 
+log('=== Test backup(sync=none) based fleecing ===\n')
+test(False)
 
-test()
+log('=== Test filter based fleecing ===\n')
+test(True)
diff --git a/tests/qemu-iotests/tests/image-fleecing.out 
b/tests/qemu-iotests/tests/image-fleecing.out
index 314a1b54e9..e96d122a8b 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -1,3 +1,5 @@
+=== Test backup(sync=none) based fleecing ===
+
 --- Setting up images ---
 
 Done
@@ -65,3 +67,73 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff 64k
 
 Done
+=== Test filter based fleecing ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Setting up NBD Export ---
+
+{"return": {}}
+{"return": {}}
+
+--- Sanity Check ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff 64k
+read -P0 0x00f8000 32k
+r

[PATCH v6 29/33] iotests.py: hmp_qemu_io: support qdev

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 025e288ddd..9d0031a0e8 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -655,9 +655,10 @@ def resume_drive(self, drive: str) -> None:
 self.hmp(f'qemu-io {drive} "remove_break bp_{drive}"')
 
 def hmp_qemu_io(self, drive: str, cmd: str,
-use_log: bool = False) -> QMPMessage:
+use_log: bool = False, qdev: bool = False) -> QMPMessage:
 """Write to a given drive using an HMP command"""
-return self.hmp(f'qemu-io {drive} "{cmd}"', use_log=use_log)
+d = '-d ' if qdev else ''
+return self.hmp(f'qemu-io {d}{drive} "{cmd}"', use_log=use_log)
 
 def flatten_qmp_object(self, obj, output=None, basestr=''):
 if output is None:
-- 
2.29.2




[PATCH v6 30/33] iotests/image-fleecing: proper source device

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Define scsi device to operate with it by qom-set in further patch.

Give a new node-name to source block node, to not look like device
name.

Job now don't want to work without giving explicit id, so, let's call
it "fleecing".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 12 
 tests/qemu-iotests/tests/image-fleecing.out |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index 799369e290..961941bb27 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -70,7 +70,11 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Launching VM ---')
 log('')
 
-vm.add_drive(base_img_path)
+src_node = 'source'
+vm.add_blockdev(f'driver={iotests.imgfmt},file.driver=file,'
+f'file.filename={base_img_path},node-name={src_node}')
+vm.add_device('virtio-scsi')
+vm.add_device(f'scsi-hd,id=sda,drive={src_node}')
 vm.launch()
 log('Done')
 
@@ -78,7 +82,6 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Setting up Fleecing Graph ---')
 log('')
 
-src_node = 'drive0'
 tgt_node = 'fleeceNode'
 
 # create tgt_node backed by src_node
@@ -94,6 +97,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 
 # Establish COW from source to fleecing node
 log(vm.qmp('blockdev-backup',
+   job_id='fleecing',
device=src_node,
target=tgt_node,
sync='none'))
@@ -125,7 +129,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 for p in overwrite:
 cmd = 'write -P%s %s %s' % p
 log(cmd)
-log(vm.hmp_qemu_io(src_node, cmd))
+log(vm.hmp_qemu_io('/machine/peripheral/sda', cmd, qdev=True))
 
 log('')
 log('--- Verifying Data ---')
@@ -140,7 +144,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Cleanup ---')
 log('')
 
-log(vm.qmp('block-job-cancel', device=src_node))
+log(vm.qmp('block-job-cancel', device='fleecing'))
 e = vm.event_wait('BLOCK_JOB_CANCELLED')
 assert e is not None
 log(e, filters=[iotests.filter_qmp_event])
diff --git a/tests/qemu-iotests/tests/image-fleecing.out 
b/tests/qemu-iotests/tests/image-fleecing.out
index 16643dde30..314a1b54e9 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -50,7 +50,7 @@ read -P0 0x3fe 64k
 --- Cleanup ---
 
 {"return": {}}
-{"data": {"device": "drive0", "len": 67108864, "offset": 393216, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "fleecing", "len": 67108864, "offset": 393216, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {"return": {}}
 {"return": {}}
 
-- 
2.29.2




[PATCH v6 22/33] qapi: publish copy-before-write filter

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 675d8265eb..59d3e5e42d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2825,13 +2825,14 @@
 # @blklogwrites: Since 3.0
 # @blkreplay: Since 4.2
 # @compress: Since 5.0
+# @copy-before-write: Since 6.1
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
-'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
-'gluster',
+'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
+'file', 'ftp', 'ftps', 'gluster',
 {'name': 'host_cdrom', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
 {'name': 'host_device', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
 'http', 'https', 'iscsi',
@@ -4049,6 +4050,25 @@
   'base': 'BlockdevOptionsGenericFormat',
   'data': { '*bottom': 'str' } }
 
+##
+# @BlockdevOptionsCbw:
+#
+# Driver specific block device options for the copy-before-write driver,
+# which does so called copy-before-write operations: when data is
+# written to the filter, the filter firstly reads corresponding blocks
+# from its file child and copies them to @target child. After successful
+# copying the write request is propagated to file child. If copying
+# failed, the original write request is failed too and no data is written
+# to file child.
+#
+# @target: The target for copy-before-write operations.
+#
+# Since: 6.1
+##
+{ 'struct': 'BlockdevOptionsCbw',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'target': 'BlockdevRef' } }
+
 ##
 # @BlockdevOptions:
 #
@@ -4101,6 +4121,7 @@
   'bochs':  'BlockdevOptionsGenericFormat',
   'cloop':  'BlockdevOptionsGenericFormat',
   'compress':   'BlockdevOptionsGenericFormat',
+  'copy-before-write':'BlockdevOptionsCbw',
   'copy-on-read':'BlockdevOptionsCor',
   'dmg':'BlockdevOptionsGenericFormat',
   'file':   'BlockdevOptionsFile',
-- 
2.29.2




[PATCH v6 31/33] iotests/image-fleecing: rename tgt_node

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Actually target of backup(sync=None) is not a final backup target:
image fleecing is intended to be used with external tool, which will
copy data from fleecing node to some real backup target.

Also, we are going to add a test case for "push backup with fleecing",
where instead of exporting fleecing node by NBD, we'll start a backup
job from fleecing node to real backup target.

To avoid confusion, let's rename temporary fleecing node now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index 961941bb27..ec4ef5f3f6 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -71,6 +71,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 src_node = 'source'
+tmp_node = 'temp'
 vm.add_blockdev(f'driver={iotests.imgfmt},file.driver=file,'
 f'file.filename={base_img_path},node-name={src_node}')
 vm.add_device('virtio-scsi')
@@ -82,12 +83,11 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Setting up Fleecing Graph ---')
 log('')
 
-tgt_node = 'fleeceNode'
 
-# create tgt_node backed by src_node
+# create tmp_node backed by src_node
 log(vm.qmp('blockdev-add', {
 'driver': 'qcow2',
-'node-name': tgt_node,
+'node-name': tmp_node,
 'file': {
 'driver': 'file',
 'filename': fleece_img_path,
@@ -99,19 +99,19 @@ with iotests.FilePath('base.img') as base_img_path, \
 log(vm.qmp('blockdev-backup',
job_id='fleecing',
device=src_node,
-   target=tgt_node,
+   target=tmp_node,
sync='none'))
 
 log('')
 log('--- Setting up NBD Export ---')
 log('')
 
-nbd_uri = 'nbd+unix:///%s?socket=%s' % (tgt_node, nbd_sock_path)
+nbd_uri = 'nbd+unix:///%s?socket=%s' % (tmp_node, nbd_sock_path)
 log(vm.qmp('nbd-server-start',
{'addr': { 'type': 'unix',
   'data': { 'path': nbd_sock_path } } }))
 
-log(vm.qmp('nbd-server-add', device=tgt_node))
+log(vm.qmp('nbd-server-add', device=tmp_node))
 
 log('')
 log('--- Sanity Check ---')
@@ -149,7 +149,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 assert e is not None
 log(e, filters=[iotests.filter_qmp_event])
 log(vm.qmp('nbd-server-stop'))
-log(vm.qmp('blockdev-del', node_name=tgt_node))
+log(vm.qmp('blockdev-del', node_name=tmp_node))
 vm.shutdown()
 
 log('')
-- 
2.29.2




[PATCH v6 27/33] iotests/222: constantly use single quotes for strings

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
The file use both single and double quotes for strings. Let's be
consistent.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/222 | 68 +-
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222
index 5e2556f8df..799369e290 100755
--- a/tests/qemu-iotests/222
+++ b/tests/qemu-iotests/222
@@ -30,23 +30,23 @@ iotests.script_initialize(
 supported_platforms=['linux'],
 )
 
-patterns = [("0x5d", "0", "64k"),
-("0xd5", "1M","64k"),
-("0xdc", "32M",   "64k"),
-("0xcd", "0x3ff", "64k")]  # 64M - 64K
+patterns = [('0x5d', '0', '64k'),
+('0xd5', '1M','64k'),
+('0xdc', '32M',   '64k'),
+('0xcd', '0x3ff', '64k')]  # 64M - 64K
 
-overwrite = [("0xab", "0", "64k"), # Full overwrite
- ("0xad", "0x00f8000", "64k"), # Partial-left (1M-32K)
- ("0x1d", "0x2008000", "64k"), # Partial-right (32M+32K)
- ("0xea", "0x3fe", "64k")] # Adjacent-left (64M - 128K)
+overwrite = [('0xab', '0', '64k'), # Full overwrite
+ ('0xad', '0x00f8000', '64k'), # Partial-left (1M-32K)
+ ('0x1d', '0x2008000', '64k'), # Partial-right (32M+32K)
+ ('0xea', '0x3fe', '64k')] # Adjacent-left (64M - 128K)
 
-zeroes = [("0", "0x00f8000", "32k"), # Left-end of partial-left (1M-32K)
-  ("0", "0x201", "32k"), # Right-end of partial-right (32M+64K)
-  ("0", "0x3fe", "64k")] # overwrite[3]
+zeroes = [('0', '0x00f8000', '32k'), # Left-end of partial-left (1M-32K)
+  ('0', '0x201', '32k'), # Right-end of partial-right (32M+64K)
+  ('0', '0x3fe', '64k')] # overwrite[3]
 
-remainder = [("0xd5", "0x108000",  "32k"), # Right-end of partial-left [1]
- ("0xdc", "32M",   "32k"), # Left-end of partial-right [2]
- ("0xcd", "0x3ff", "64k")] # patterns[3]
+remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
+ ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
+ ('0xcd', '0x3ff', '64k')] # patterns[3]
 
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
@@ -58,7 +58,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
-assert qemu_img('create', '-f', "qcow2", fleece_img_path, '64M') == 0
+assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
 for p in patterns:
 qemu_io('-f', iotests.imgfmt,
@@ -78,43 +78,43 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Setting up Fleecing Graph ---')
 log('')
 
-src_node = "drive0"
-tgt_node = "fleeceNode"
+src_node = 'drive0'
+tgt_node = 'fleeceNode'
 
 # create tgt_node backed by src_node
-log(vm.qmp("blockdev-add", {
-"driver": "qcow2",
-"node-name": tgt_node,
-"file": {
-"driver": "file",
-"filename": fleece_img_path,
+log(vm.qmp('blockdev-add', {
+'driver': 'qcow2',
+'node-name': tgt_node,
+'file': {
+'driver': 'file',
+'filename': fleece_img_path,
 },
-"backing": src_node,
+'backing': src_node,
 }))
 
 # Establish COW from source to fleecing node
-log(vm.qmp("blockdev-backup",
+log(vm.qmp('blockdev-backup',
device=src_node,
target=tgt_node,
-   sync="none"))
+   sync='none'))
 
 log('')
 log('--- Setting up NBD Export ---')
 log('')
 
 nbd_uri = 'nbd+unix:///%s?socket=%s' % (tgt_node, nbd_sock_path)
-log(vm.qmp("nbd-server-start",
-   {"addr": { "type": "unix",
-  "data": { "path": nbd_sock_path } } }))
+log(vm.qmp('nbd-server-start',
+   {'addr': { 'type': 'unix',
+  'data': { 'path': nbd_sock_path } } }))
 
-log(vm.qmp("nbd-server-add", device=tgt_node))
+log(vm.qmp('nbd-server-add', device=tgt_node))
 
 log('')
 log('--- Sanity Check ---')
 log('')
 
 for p in patterns + zeroes:
-cmd = "read -P%s %s %s" % p
+cmd = 'read -P%s %s %s' % p
 log(cmd)
 assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
 
@@ -123,7 +123,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 for p in overwrite:
-cmd = "write -P%s %s %s" % p
+cmd = 'write -P%s %s %s' % p
 log(cmd)
 log(vm.hmp_qemu_io(src_node, cmd))
 
@@ -132,7 +132,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 for p in patterns + zeroes:
-cmd = "read -P%s %s %s" % p
+cmd = 'read -P%s %s 

[PATCH v6 24/33] python/qemu/machine: QEMUMachine: improve qmp() method

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We often call qmp() with unpacking dict, like qmp('foo', **{...}).
mypy don't really like it, it thinks that passed unpacked dict is a
positional argument and complains that it type should be bool (because
second argument of qmp() is conv_keys: bool).

Allow passing dict directly, simplifying interface, and giving a way to
satisfy mypy.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
---
 python/qemu/machine/machine.py | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 5eab31aeec..ce1e130c13 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -572,11 +572,21 @@ def _qmp_args(cls, conv_keys: bool,
 return args
 
 def qmp(self, cmd: str,
-conv_keys: bool = True,
+args_dict: Optional[Dict[str, object]] = None,
+conv_keys: Optional[bool] = None,
 **args: Any) -> QMPMessage:
 """
 Invoke a QMP command and return the response dict
 """
+if args_dict is not None:
+assert not args
+assert conv_keys is None
+args = args_dict
+conv_keys = False
+
+if conv_keys is None:
+conv_keys = True
+
 qmp_args = self._qmp_args(conv_keys, args)
 return self._qmp.cmd(cmd, args=qmp_args)
 
-- 
2.29.2




[PATCH v6 26/33] iotests/222: fix pylint and mypy complains

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Here:
 - long line
 - move to new interface of vm.qmp() (direct passing dict), to avoid
   mypy false-positive, as it thinks that unpacked dict is a positional
   argument.
 - extra parenthesis
 - handle event_wait possible None value

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/222 | 20 +++-
 tests/qemu-iotests/297 |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222
index b48afe623e..5e2556f8df 100755
--- a/tests/qemu-iotests/222
+++ b/tests/qemu-iotests/222
@@ -50,7 +50,8 @@ remainder = [("0xd5", "0x108000",  "32k"), # Right-end of 
partial-left [1]
 
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
- iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock_path, 
\
+ iotests.FilePath('nbd.sock',
+  base_dir=iotests.sock_dir) as nbd_sock_path, \
  iotests.VM() as vm:
 
 log('--- Setting up images ---')
@@ -81,7 +82,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 tgt_node = "fleeceNode"
 
 # create tgt_node backed by src_node
-log(vm.qmp("blockdev-add", **{
+log(vm.qmp("blockdev-add", {
 "driver": "qcow2",
 "node-name": tgt_node,
 "file": {
@@ -103,8 +104,8 @@ with iotests.FilePath('base.img') as base_img_path, \
 
 nbd_uri = 'nbd+unix:///%s?socket=%s' % (tgt_node, nbd_sock_path)
 log(vm.qmp("nbd-server-start",
-   **{"addr": { "type": "unix",
-"data": { "path": nbd_sock_path } } }))
+   {"addr": { "type": "unix",
+  "data": { "path": nbd_sock_path } } }))
 
 log(vm.qmp("nbd-server-add", device=tgt_node))
 
@@ -112,7 +113,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Sanity Check ---')
 log('')
 
-for p in (patterns + zeroes):
+for p in patterns + zeroes:
 cmd = "read -P%s %s %s" % p
 log(cmd)
 assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
@@ -130,7 +131,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Verifying Data ---')
 log('')
 
-for p in (patterns + zeroes):
+for p in patterns + zeroes:
 cmd = "read -P%s %s %s" % p
 log(cmd)
 assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
@@ -140,8 +141,9 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 log(vm.qmp('block-job-cancel', device=src_node))
-log(vm.event_wait('BLOCK_JOB_CANCELLED'),
-filters=[iotests.filter_qmp_event])
+e = vm.event_wait('BLOCK_JOB_CANCELLED')
+assert e is not None
+log(e, filters=[iotests.filter_qmp_event])
 log(vm.qmp('nbd-server-stop'))
 log(vm.qmp('blockdev-del', node_name=tgt_node))
 vm.shutdown()
@@ -150,7 +152,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Confirming writes ---')
 log('')
 
-for p in (overwrite + remainder):
+for p in overwrite + remainder:
 cmd = "read -P%s %s %s" % p
 log(cmd)
 assert qemu_io_silent(base_img_path, '-c', cmd) == 0
diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
index 433b732336..345b617b34 100755
--- a/tests/qemu-iotests/297
+++ b/tests/qemu-iotests/297
@@ -31,7 +31,7 @@ SKIP_FILES = (
 '096', '118', '124', '132', '136', '139', '147', '148', '149',
 '151', '152', '155', '163', '165', '169', '194', '196', '199', '202',
 '203', '205', '206', '207', '208', '210', '211', '212', '213', '216',
-'218', '219', '222', '224', '228', '234', '235', '236', '237', '238',
+'218', '219', '224', '228', '234', '235', '236', '237', '238',
 '240', '242', '245', '246', '248', '255', '256', '257', '258', '260',
 '262', '264', '266', '274', '277', '280', '281', '295', '296', '298',
 '299', '302', '303', '304', '307',
-- 
2.29.2




[PATCH v6 23/33] python/qemu/machine.py: refactor _qemu_args()

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
 - use shorter construction
 - don't create new dict if not needed
 - drop extra unpacking key-val arguments
 - drop extra default values

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
---
 python/qemu/machine/machine.py | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 971ed7e8c6..5eab31aeec 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -564,14 +564,12 @@ def _qmp(self) -> QEMUMonitorProtocol:
 return self._qmp_connection
 
 @classmethod
-def _qmp_args(cls, _conv_keys: bool = True, **args: Any) -> Dict[str, Any]:
-qmp_args = dict()
-for key, value in args.items():
-if _conv_keys:
-qmp_args[key.replace('_', '-')] = value
-else:
-qmp_args[key] = value
-return qmp_args
+def _qmp_args(cls, conv_keys: bool,
+  args: Dict[str, Any]) -> Dict[str, object]:
+if conv_keys:
+return {k.replace('_', '-'): v for k, v in args.items()}
+
+return args
 
 def qmp(self, cmd: str,
 conv_keys: bool = True,
@@ -579,7 +577,7 @@ def qmp(self, cmd: str,
 """
 Invoke a QMP command and return the response dict
 """
-qmp_args = self._qmp_args(conv_keys, **args)
+qmp_args = self._qmp_args(conv_keys, args)
 return self._qmp.cmd(cmd, args=qmp_args)
 
 def command(self, cmd: str,
@@ -590,7 +588,7 @@ def command(self, cmd: str,
 On success return the response dict.
 On failure raise an exception.
 """
-qmp_args = self._qmp_args(conv_keys, **args)
+qmp_args = self._qmp_args(conv_keys, args)
 return self._qmp.command(cmd, **qmp_args)
 
 def get_qmp_event(self, wait: bool = False) -> Optional[QMPMessage]:
-- 
2.29.2




[PATCH v6 21/33] block/copy-before-write: make public block driver

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Finally, copy-before-write gets own .bdrv_open and .bdrv_close
handlers, block_init() call and becomes available through bdrv_open().

To achieve this:

 - cbw_init gets unused flags argument and becomes cbw_open
 - block_copy_state_free() call moved to new cbw_close()
 - in bdrv_cbw_append:
   - options are completed with driver and node-name, and we can simply
 use bdrv_insert_node() to do both open and drained replacing
 - in bdrv_cbw_drop:
   - cbw_close() is now responsible for freeing s->bcs, so don't do it
 here

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c | 60 ++-
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2efe098aae..2cd68b480a 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,7 +144,8 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 }
 }
 
-static int cbw_init(BlockDriverState *bs, QDict *options, Error **errp)
+static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
+Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 BdrvDirtyBitmap *copy_bitmap;
@@ -181,10 +182,21 @@ static int cbw_init(BlockDriverState *bs, QDict *options, 
Error **errp)
 return 0;
 }
 
+static void cbw_close(BlockDriverState *bs)
+{
+BDRVCopyBeforeWriteState *s = bs->opaque;
+
+block_copy_state_free(s->bcs);
+s->bcs = NULL;
+}
+
 BlockDriver bdrv_cbw_filter = {
 .format_name = "copy-before-write",
 .instance_size = sizeof(BDRVCopyBeforeWriteState),
 
+.bdrv_open  = cbw_open,
+.bdrv_close = cbw_close,
+
 .bdrv_co_preadv = cbw_co_preadv,
 .bdrv_co_pwritev= cbw_co_pwritev,
 .bdrv_co_pwrite_zeroes  = cbw_co_pwrite_zeroes,
@@ -205,56 +217,40 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
   Error **errp)
 {
 ERRP_GUARD();
-int ret;
 BDRVCopyBeforeWriteState *state;
 BlockDriverState *top;
 QDict *opts;
 
 assert(source->total_sectors == target->total_sectors);
 
-top = bdrv_new_open_driver(&bdrv_cbw_filter, filter_node_name,
-   BDRV_O_RDWR, errp);
-if (!top) {
-error_prepend(errp, "Cannot open driver: ");
-return NULL;
-}
-state = top->opaque;
-
 opts = qdict_new();
+qdict_put_str(opts, "driver", "copy-before-write");
+if (filter_node_name) {
+qdict_put_str(opts, "node-name", filter_node_name);
+}
 qdict_put_str(opts, "file", bdrv_get_node_name(source));
 qdict_put_str(opts, "target", bdrv_get_node_name(target));
 
-ret = cbw_init(top, opts, errp);
-qobject_unref(opts);
-if (ret < 0) {
-goto fail;
-}
-
-bdrv_drained_begin(source);
-ret = bdrv_replace_node(source, top, errp);
-bdrv_drained_end(source);
-if (ret < 0) {
-error_prepend(errp, "Cannot append copy-before-write filter: ");
-goto fail;
+top = bdrv_insert_node(source, opts, BDRV_O_RDWR, errp);
+if (!top) {
+return NULL;
 }
 
+state = top->opaque;
 *bcs = state->bcs;
 
 return top;
-
-fail:
-block_copy_state_free(state->bcs);
-bdrv_unref(top);
-return NULL;
 }
 
 void bdrv_cbw_drop(BlockDriverState *bs)
 {
-BDRVCopyBeforeWriteState *s = bs->opaque;
-
 bdrv_drop_filter(bs, &error_abort);
-
-block_copy_state_free(s->bcs);
-
 bdrv_unref(bs);
 }
+
+static void cbw_init(void)
+{
+bdrv_register(&bdrv_cbw_filter);
+}
+
+block_init(cbw_init);
-- 
2.29.2




[PATCH v6 15/33] block/copy-before-write: cbw_init(): rename variables

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
One more step closer to real .bdrv_open() handler: use more usual names
for bs being initialized and its state.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index a4fee645fd..d7f1833efa 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,38 +144,37 @@ static void cbw_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 }
 }
 
-static int cbw_init(BlockDriverState *top, BlockDriverState *source,
+static int cbw_init(BlockDriverState *bs, BlockDriverState *source,
 BlockDriverState *target, bool compress, Error **errp)
 {
-BDRVCopyBeforeWriteState *state = top->opaque;
+BDRVCopyBeforeWriteState *s = bs->opaque;
 
-top->total_sectors = source->total_sectors;
-top->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+bs->total_sectors = source->total_sectors;
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
 (BDRV_REQ_FUA & source->supported_write_flags);
-top->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  source->supported_zero_flags);
 
 bdrv_ref(target);
-state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
-  BDRV_CHILD_DATA, errp);
-if (!state->target) {
+s->target = bdrv_attach_child(bs, target, "target", &child_of_bds,
+  BDRV_CHILD_DATA, errp);
+if (!s->target) {
 error_prepend(errp, "Cannot attach target child: ");
 return -EINVAL;
 }
 
 bdrv_ref(source);
-top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
-  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-  errp);
-if (!top->file) {
+bs->file = bdrv_attach_child(bs, source, "file", &child_of_bds,
+ BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+ errp);
+if (!bs->file) {
 error_prepend(errp, "Cannot attach file child: ");
 return -EINVAL;
 }
 
-state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
-  errp);
-if (!state->bcs) {
+s->bcs = block_copy_state_new(bs->file, s->target, false, compress, errp);
+if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
 }
-- 
2.29.2




[PATCH v6 20/33] block/block-copy: make setting progress optional

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Now block-copy will crash if user don't set progress meter by
block_copy_set_progress_meter(). copy-before-write filter will be used
in separate of backup job, and it doesn't want any progress meter (for
now). So, allow not setting it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/block-copy.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 40c4803033..020c8d7126 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -291,9 +291,11 @@ static void coroutine_fn block_copy_task_end(BlockCopyTask 
*task, int ret)
 bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
 }
 QLIST_REMOVE(task, list);
-progress_set_remaining(task->s->progress,
-   bdrv_get_dirty_count(task->s->copy_bitmap) +
-   task->s->in_flight_bytes);
+if (task->s->progress) {
+progress_set_remaining(task->s->progress,
+   bdrv_get_dirty_count(task->s->copy_bitmap) +
+   task->s->in_flight_bytes);
+}
 qemu_co_queue_restart_all(&task->wait_queue);
 }
 
@@ -599,7 +601,7 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
 t->call_state->ret = ret;
 t->call_state->error_is_read = error_is_read;
 }
-} else {
+} else if (s->progress) {
 progress_work_done(s->progress, t->bytes);
 }
 }
@@ -705,9 +707,11 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 if (!ret) {
 qemu_co_mutex_lock(&s->lock);
 bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
-progress_set_remaining(s->progress,
-   bdrv_get_dirty_count(s->copy_bitmap) +
-   s->in_flight_bytes);
+if (s->progress) {
+progress_set_remaining(s->progress,
+   bdrv_get_dirty_count(s->copy_bitmap) +
+   s->in_flight_bytes);
+}
 qemu_co_mutex_unlock(&s->lock);
 }
 
-- 
2.29.2




[PATCH v6 19/33] block/copy-before-write: initialize block-copy bitmap

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are going to publish copy-before-write filter to be used in separate
of backup. Future step would support bitmap for the filter. But let's
start from full set bitmap.

We have to modify backup, as bitmap is first initialized by
copy-before-write filter, and then backup modifies it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/backup.c| 16 +++-
 block/copy-before-write.c |  4 
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 4869f1e5da..687d2882bc 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -233,18 +233,16 @@ static void backup_init_bcs_bitmap(BackupBlockJob *job)
 BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(job->bcs);
 
 if (job->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
+bdrv_clear_dirty_bitmap(bcs_bitmap, NULL);
 ret = bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap,
NULL, true);
 assert(ret);
-} else {
-if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
-/*
- * We can't hog the coroutine to initialize this thoroughly.
- * Set a flag and resume work when we are able to yield safely.
- */
-block_copy_set_skip_unallocated(job->bcs, true);
-}
-bdrv_set_dirty_bitmap(bcs_bitmap, 0, job->len);
+} else if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
+/*
+ * We can't hog the coroutine to initialize this thoroughly.
+ * Set a flag and resume work when we are able to yield safely.
+ */
+block_copy_set_skip_unallocated(job->bcs, true);
 }
 
 estimate = bdrv_get_dirty_count(bcs_bitmap);
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 1cefcade78..2efe098aae 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -147,6 +147,7 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 static int cbw_init(BlockDriverState *bs, QDict *options, Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
+BdrvDirtyBitmap *copy_bitmap;
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -174,6 +175,9 @@ static int cbw_init(BlockDriverState *bs, QDict *options, 
Error **errp)
 return -EINVAL;
 }
 
+copy_bitmap = block_copy_dirty_bitmap(s->bcs);
+bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
+
 return 0;
 }
 
-- 
2.29.2




[PATCH v6 32/33] iotests/image-fleecing: prepare for adding new test-case

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are going to add a test-case with some behavior modifications. So,
let's prepare a function to be reused.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index ec4ef5f3f6..e210c00d28 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -48,12 +48,7 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
-with iotests.FilePath('base.img') as base_img_path, \
- iotests.FilePath('fleece.img') as fleece_img_path, \
- iotests.FilePath('nbd.sock',
-  base_dir=iotests.sock_dir) as nbd_sock_path, \
- iotests.VM() as vm:
-
+def do_test(base_img_path, fleece_img_path, nbd_sock_path, vm):
 log('--- Setting up images ---')
 log('')
 
@@ -163,3 +158,15 @@ with iotests.FilePath('base.img') as base_img_path, \
 
 log('')
 log('Done')
+
+
+def test():
+with iotests.FilePath('base.img') as base_img_path, \
+ iotests.FilePath('fleece.img') as fleece_img_path, \
+ iotests.FilePath('nbd.sock',
+  base_dir=iotests.sock_dir) as nbd_sock_path, \
+ iotests.VM() as vm:
+do_test(base_img_path, fleece_img_path, nbd_sock_path, vm)
+
+
+test()
-- 
2.29.2




[PATCH v6 28/33] iotests: move 222 to tests/image-fleecing

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Give a good name to test file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/{222 => tests/image-fleecing} | 0
 tests/qemu-iotests/{222.out => tests/image-fleecing.out} | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename tests/qemu-iotests/{222 => tests/image-fleecing} (100%)
 rename tests/qemu-iotests/{222.out => tests/image-fleecing.out} (100%)

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/tests/image-fleecing
similarity index 100%
rename from tests/qemu-iotests/222
rename to tests/qemu-iotests/tests/image-fleecing
diff --git a/tests/qemu-iotests/222.out 
b/tests/qemu-iotests/tests/image-fleecing.out
similarity index 100%
rename from tests/qemu-iotests/222.out
rename to tests/qemu-iotests/tests/image-fleecing.out
-- 
2.29.2




[PATCH v6 10/33] block/copy-before-write: relax permission requirements when no parents

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are going to publish copy-before-write filter. So, user should be
able to create it with blockdev-add first, specifying both filtered and
target children. And then do blockdev-reopen, to actually insert the
filter where needed.

Currently, filter unshares write permission unconditionally on source
node. It's good, but it will not allow to do blockdev-add. So, let's
relax restrictions when filter doesn't have any parent.

Test output is modified, as now permission conflict happens only when
job creates a blk parent for filter node.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c  | 8 +---
 tests/qemu-iotests/283.out | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index a7996d54db..2a51cc64e4 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -142,10 +142,12 @@ static void cbw_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 bdrv_default_perms(bs, c, role, reopen_queue,
perm, shared, nperm, nshared);
 
-if (perm & BLK_PERM_WRITE) {
-*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+if (!QLIST_EMPTY(&bs->parents)) {
+if (perm & BLK_PERM_WRITE) {
+*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+}
+*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
 }
-*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
 }
 }
 
diff --git a/tests/qemu-iotests/283.out b/tests/qemu-iotests/283.out
index f2b7219632..5bb75952ef 100644
--- a/tests/qemu-iotests/283.out
+++ b/tests/qemu-iotests/283.out
@@ -5,7 +5,7 @@
 {"execute": "blockdev-add", "arguments": {"driver": "blkdebug", "image": 
"base", "node-name": "other", "take-child-perms": ["write"]}}
 {"return": {}}
 {"execute": "blockdev-backup", "arguments": {"device": "source", "sync": 
"full", "target": "target"}}
-{"error": {"class": "GenericError", "desc": "Cannot append copy-before-write 
filter: Permission conflict on node 'base': permissions 'write' are both 
required by node 'other' (uses node 'base' as 'image' child) and unshared by 
node 'source' (uses node 'base' as 'image' child)."}}
+{"error": {"class": "GenericError", "desc": "Permission conflict on node 
'base': permissions 'write' are both required by node 'other' (uses node 'base' 
as 'image' child) and unshared by node 'source' (uses node 'base' as 'image' 
child)."}}
 
 === copy-before-write filter should be gone after job-finalize ===
 
-- 
2.29.2




[PATCH v6 14/33] block/copy-before-write: introduce cbw_init()

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Move part of bdrv_cbw_append() to new function cbw_open(). It's an
intermediate step for adding normal .bdrv_open() handler to the
filter. With this commit no logic is changed, but we have a function
which will be turned into .bdrv_open() handler in future commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c | 69 +++
 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index adbbc64aa9..a4fee645fd 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,6 +144,45 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 }
 }
 
+static int cbw_init(BlockDriverState *top, BlockDriverState *source,
+BlockDriverState *target, bool compress, Error **errp)
+{
+BDRVCopyBeforeWriteState *state = top->opaque;
+
+top->total_sectors = source->total_sectors;
+top->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+(BDRV_REQ_FUA & source->supported_write_flags);
+top->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
+ source->supported_zero_flags);
+
+bdrv_ref(target);
+state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
+  BDRV_CHILD_DATA, errp);
+if (!state->target) {
+error_prepend(errp, "Cannot attach target child: ");
+return -EINVAL;
+}
+
+bdrv_ref(source);
+top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+  errp);
+if (!top->file) {
+error_prepend(errp, "Cannot attach file child: ");
+return -EINVAL;
+}
+
+state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
+  errp);
+if (!state->bcs) {
+error_prepend(errp, "Cannot create block-copy-state: ");
+return -EINVAL;
+}
+
+return 0;
+}
+
 BlockDriver bdrv_cbw_filter = {
 .format_name = "copy-before-write",
 .instance_size = sizeof(BDRVCopyBeforeWriteState),
@@ -181,36 +220,10 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
 error_prepend(errp, "Cannot open driver: ");
 return NULL;
 }
-
 state = top->opaque;
-top->total_sectors = source->total_sectors;
-top->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
-(BDRV_REQ_FUA & source->supported_write_flags);
-top->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
-((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
- source->supported_zero_flags);
-
-bdrv_ref(target);
-state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
-  BDRV_CHILD_DATA, errp);
-if (!state->target) {
-error_prepend(errp, "Cannot attach target child: ");
-goto fail;
-}
 
-bdrv_ref(source);
-top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
-  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-  errp);
-if (!top->file) {
-error_prepend(errp, "Cannot attach file child: ");
-goto fail;
-}
-
-state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
-  errp);
-if (!state->bcs) {
-error_prepend(errp, "Cannot create block-copy-state: ");
+ret = cbw_init(top, source, target, compress, errp);
+if (ret < 0) {
 goto fail;
 }
 
-- 
2.29.2




[PATCH v6 13/33] block/copy-before-write: bdrv_cbw_append(): replace child at last

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Refactor the function to replace child at last. Thus we don't need to
revert it and code is simplified.

block-copy state initialization being done before replacing the child
doesn't need any drained section.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 7a6c15f141..adbbc64aa9 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -172,7 +172,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 int ret;
 BDRVCopyBeforeWriteState *state;
 BlockDriverState *top;
-bool appended = false;
 
 assert(source->total_sectors == target->total_sectors);
 
@@ -196,8 +195,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BDRV_CHILD_DATA, errp);
 if (!state->target) {
 error_prepend(errp, "Cannot attach target child: ");
-bdrv_unref(top);
-return NULL;
+goto fail;
 }
 
 bdrv_ref(source);
@@ -206,18 +204,8 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   errp);
 if (!top->file) {
 error_prepend(errp, "Cannot attach file child: ");
-bdrv_unref(top);
-return NULL;
-}
-
-bdrv_drained_begin(source);
-
-ret = bdrv_replace_node(source, top, errp);
-if (ret < 0) {
-error_prepend(errp, "Cannot append copy-before-write filter: ");
 goto fail;
 }
-appended = true;
 
 state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
   errp);
@@ -225,21 +213,22 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
 error_prepend(errp, "Cannot create block-copy-state: ");
 goto fail;
 }
-*bcs = state->bcs;
 
+bdrv_drained_begin(source);
+ret = bdrv_replace_node(source, top, errp);
 bdrv_drained_end(source);
+if (ret < 0) {
+error_prepend(errp, "Cannot append copy-before-write filter: ");
+goto fail;
+}
+
+*bcs = state->bcs;
 
 return top;
 
 fail:
-if (appended) {
-bdrv_cbw_drop(top);
-} else {
-bdrv_unref(top);
-}
-
-bdrv_drained_end(source);
-
+block_copy_state_free(state->bcs);
+bdrv_unref(top);
 return NULL;
 }
 
-- 
2.29.2




[PATCH v6 05/33] block: rename backup-top to copy-before-write

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are going to convert backup_top to full featured public filter,
which can be used in separate of backup job. Start from renaming from
"how it used" to "what it does".

While updating comments in 283 iotest, drop and rephrase also things
about ".active", as this field is now dropped, and filter doesn't have
"inactive" mode.

Note that this change may be considered as incompatible interface
change, as backup-top filter format name was visible through
query-block and query-named-block-nodes.

Still, consider the following reasoning:

1. backup-top was never documented, so if someone depends on format
   name (for driver that can't be used other than it is automatically
   inserted on backup job start), it's a kind of "undocumented feature
   use". So I think we are free to change it.

2. There is a hope, that there is no such users: it's a lot more native
   to give a good node-name to backup-top filter if need to operate
   with it somehow, and don't touch format name.

3. Another "incompatible" change in further commit would be moving
   copy-before-write filter from using backing child to file child. And
   this is even more reasonable than renaming: for now all public
   filters are file-child based.

So, it's a risky change, but risk seems small and good interface worth
it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/{backup-top.h => copy-before-write.h} |  28 +++---
 block/backup.c  |  22 ++---
 block/{backup-top.c => copy-before-write.c} | 100 ++--
 MAINTAINERS |   4 +-
 block/meson.build   |   2 +-
 tests/qemu-iotests/283  |  35 +++
 tests/qemu-iotests/283.out  |   4 +-
 7 files changed, 95 insertions(+), 100 deletions(-)
 rename block/{backup-top.h => copy-before-write.h} (56%)
 rename block/{backup-top.c => copy-before-write.c} (62%)

diff --git a/block/backup-top.h b/block/copy-before-write.h
similarity index 56%
rename from block/backup-top.h
rename to block/copy-before-write.h
index b28b0031c4..5977b7aa31 100644
--- a/block/backup-top.h
+++ b/block/copy-before-write.h
@@ -1,10 +1,10 @@
 /*
- * backup-top filter driver
+ * copy-before-write filter driver
  *
  * The driver performs Copy-Before-Write (CBW) operation: it is injected above
  * some node, and before each write it copies _old_ data to the target node.
  *
- * Copyright (c) 2018-2019 Virtuozzo International GmbH.
+ * Copyright (c) 2018-2021 Virtuozzo International GmbH.
  *
  * Author:
  *  Sementsov-Ogievskiy Vladimir 
@@ -23,20 +23,20 @@
  * along with this program. If not, see .
  */
 
-#ifndef BACKUP_TOP_H
-#define BACKUP_TOP_H
+#ifndef COPY_BEFORE_WRITE_H
+#define COPY_BEFORE_WRITE_H
 
 #include "block/block_int.h"
 #include "block/block-copy.h"
 
-BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
- BlockDriverState *target,
- const char *filter_node_name,
- uint64_t cluster_size,
- BackupPerf *perf,
- BdrvRequestFlags write_flags,
- BlockCopyState **bcs,
- Error **errp);
-void bdrv_backup_top_drop(BlockDriverState *bs);
+BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
+  BlockDriverState *target,
+  const char *filter_node_name,
+  uint64_t cluster_size,
+  BackupPerf *perf,
+  BdrvRequestFlags write_flags,
+  BlockCopyState **bcs,
+  Error **errp);
+void bdrv_cbw_drop(BlockDriverState *bs);
 
-#endif /* BACKUP_TOP_H */
+#endif /* COPY_BEFORE_WRITE_H */
diff --git a/block/backup.c b/block/backup.c
index bd3614ce70..ac91821b08 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -27,13 +27,13 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 
-#include "block/backup-top.h"
+#include "block/copy-before-write.h"
 
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 
 typedef struct BackupBlockJob {
 BlockJob common;
-BlockDriverState *backup_top;
+BlockDriverState *cbw;
 BlockDriverState *source_bs;
 BlockDriverState *target_bs;
 
@@ -104,7 +104,7 @@ static void backup_clean(Job *job)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
 block_job_remove_all_bdrv(&s->common);
-bdrv_backup_top_drop(s->backup_top);
+bdrv_cbw_drop(s->cbw);
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -408,7 +408,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 BackupBlockJob *job = NULL;
 int64_t cluster_size;

[PATCH v6 25/33] iotests.py: VM: add own __enter__ method

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
In superclass __enter__ method is defined with return value type hint
'QEMUMachine'. So, mypy thinks that return value of VM.__enter__ is
QEMUMachine. Let's redefine __enter__ in VM class, to give it correct
type hint.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 89663dac06..025e288ddd 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -568,6 +568,10 @@ def remote_filename(path):
 class VM(qtest.QEMUQtestMachine):
 '''A QEMU VM'''
 
+# Redefine __enter__ with proper type hint
+def __enter__(self) -> 'VM':
+return self
+
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
 super().__init__(qemu_prog, qemu_opts, name=name,
-- 
2.29.2




[PATCH v6 09/33] block/backup: move cluster size calculation to block-copy

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
The main consumer of cluster-size is block-copy. Let's calculate it
here instead of passing through backup-top.

We are going to publish copy-before-write filter soon, so it will be
created through options. But we don't want for now to make explicit
option for cluster-size, let's continue to calculate it automatically.
So, now is the time to get rid of cluster_size argument for
bdrv_cbw_append().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.h  |  1 -
 include/block/block-copy.h |  5 +--
 block/backup.c | 62 ++
 block/block-copy.c | 51 ++-
 block/copy-before-write.c  | 10 +++---
 5 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 538aab8bdb..b386fd8f01 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -32,7 +32,6 @@
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
-  uint64_t cluster_size,
   bool compress,
   BlockCopyState **bcs,
   Error **errp);
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index f0ba7bc828..9c24e8f38e 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,8 +25,8 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- int64_t cluster_size, bool use_copy_range,
- bool compress, Error **errp);
+ bool use_copy_range, bool compress,
+ Error **errp);
 
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
   bool compress);
@@ -90,6 +90,7 @@ void block_copy_kick(BlockCopyCallState *call_state);
 void block_copy_call_cancel(BlockCopyCallState *call_state);
 
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
+int64_t block_copy_cluster_size(BlockCopyState *s);
 void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
 
 #endif /* BLOCK_COPY_H */
diff --git a/block/backup.c b/block/backup.c
index b31fd99ab3..83516297cb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -29,8 +29,6 @@
 
 #include "block/copy-before-write.h"
 
-#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
-
 typedef struct BackupBlockJob {
 BlockJob common;
 BlockDriverState *cbw;
@@ -354,43 +352,6 @@ static const BlockJobDriver backup_job_driver = {
 .set_speed = backup_set_speed,
 };
 
-static int64_t backup_calculate_cluster_size(BlockDriverState *target,
- Error **errp)
-{
-int ret;
-BlockDriverInfo bdi;
-bool target_does_cow = bdrv_backing_chain_next(target);
-
-/*
- * If there is no backing file on the target, we cannot rely on COW if our
- * backup cluster size is smaller than the target cluster size. Even for
- * targets with a backing file, try to avoid COW if possible.
- */
-ret = bdrv_get_info(target, &bdi);
-if (ret == -ENOTSUP && !target_does_cow) {
-/* Cluster size is not defined */
-warn_report("The target block device doesn't provide "
-"information about the block size and it doesn't have a "
-"backing file. The default block size of %u bytes is "
-"used. If the actual block size of the target exceeds "
-"this default, the backup may be unusable",
-BACKUP_CLUSTER_SIZE_DEFAULT);
-return BACKUP_CLUSTER_SIZE_DEFAULT;
-} else if (ret < 0 && !target_does_cow) {
-error_setg_errno(errp, -ret,
-"Couldn't determine the cluster size of the target image, "
-"which has no backing file");
-error_append_hint(errp,
-"Aborting, since this may create an unusable destination image\n");
-return ret;
-} else if (ret < 0 && target_does_cow) {
-/* Not fatal; just trudge on ahead. */
-return BACKUP_CLUSTER_SIZE_DEFAULT;
-}
-
-return MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
-}
-
 BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
@@ -448,11 +409,6 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 return NULL;
 }
 
-cluster_size = backup_calculate_cluster_size(target, errp);
-if (cluster_size < 0) {
-goto error;
-}
-
 if (perf->max_wo

[PATCH v6 18/33] block/copy-before-write: cbw_init(): use options

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
One more step closer to .bdrv_open(): use options instead of plain
arguments. Move to bdrv_open_child() calls, native for drive open
handlers.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 1e7180760a..1cefcade78 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,25 +144,20 @@ static void cbw_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 }
 }
 
-static int cbw_init(BlockDriverState *bs, BlockDriverState *source,
-BlockDriverState *target, Error **errp)
+static int cbw_init(BlockDriverState *bs, QDict *options, Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 
-bdrv_ref(target);
-s->target = bdrv_attach_child(bs, target, "target", &child_of_bds,
-  BDRV_CHILD_DATA, errp);
-if (!s->target) {
-error_prepend(errp, "Cannot attach target child: ");
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+   false, errp);
+if (!bs->file) {
 return -EINVAL;
 }
 
-bdrv_ref(source);
-bs->file = bdrv_attach_child(bs, source, "file", &child_of_bds,
- BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
- errp);
-if (!bs->file) {
-error_prepend(errp, "Cannot attach file child: ");
+s->target = bdrv_open_child(NULL, options, "target", bs, &child_of_bds,
+BDRV_CHILD_DATA, false, errp);
+if (!s->target) {
 return -EINVAL;
 }
 
@@ -209,6 +204,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 int ret;
 BDRVCopyBeforeWriteState *state;
 BlockDriverState *top;
+QDict *opts;
 
 assert(source->total_sectors == target->total_sectors);
 
@@ -220,7 +216,12 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 }
 state = top->opaque;
 
-ret = cbw_init(top, source, target, errp);
+opts = qdict_new();
+qdict_put_str(opts, "file", bdrv_get_node_name(source));
+qdict_put_str(opts, "target", bdrv_get_node_name(target));
+
+ret = cbw_init(top, opts, errp);
+qobject_unref(opts);
 if (ret < 0) {
 goto fail;
 }
-- 
2.29.2




[PATCH v6 06/33] block-copy: always set BDRV_REQ_SERIALISING flag

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
It won't hurt in common case, so let's not bother with detecting image
fleecing.

Also, we want to simplify initialization interface of copy-before-write
filter as we are going to make it public.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.h  |  2 +-
 include/block/block-copy.h |  3 +--
 block/backup.c | 21 +
 block/block-copy.c | 29 ++---
 block/copy-before-write.c  |  4 ++--
 5 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 5977b7aa31..f37e2249ae 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -34,7 +34,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   const char *filter_node_name,
   uint64_t cluster_size,
   BackupPerf *perf,
-  BdrvRequestFlags write_flags,
+  bool compress,
   BlockCopyState **bcs,
   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 5c8278895c..734389d32a 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -26,8 +26,7 @@ typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
- BdrvRequestFlags write_flags,
- Error **errp);
+ bool compress, Error **errp);
 
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
diff --git a/block/backup.c b/block/backup.c
index ac91821b08..84f9a5f02c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -407,7 +407,6 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 int64_t len, target_len;
 BackupBlockJob *job = NULL;
 int64_t cluster_size;
-BdrvRequestFlags write_flags;
 BlockDriverState *cbw = NULL;
 BlockCopyState *bcs = NULL;
 
@@ -504,26 +503,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-/*
- * If source is in backing chain of target assume that target is going to 
be
- * used for "image fleecing", i.e. it should represent a kind of snapshot 
of
- * source at backup-start point in time. And target is going to be read by
- * somebody (for example, used as NBD export) during backup job.
- *
- * In this case, we need to add BDRV_REQ_SERIALISING write flag to avoid
- * intersection of backup writes and third party reads from target,
- * otherwise reading from target we may occasionally read already updated 
by
- * guest data.
- *
- * For more information see commit f8d59dfb40bb and test
- * tests/qemu-iotests/222
- */
-write_flags = (bdrv_chain_contains(target, bs) ? BDRV_REQ_SERIALISING : 0) 
|
-  (compress ? BDRV_REQ_WRITE_COMPRESSED : 0),
-
 cbw = bdrv_cbw_append(bs, target, filter_node_name,
-cluster_size, perf,
-write_flags, &bcs, errp);
+  cluster_size, perf, compress, &bcs, errp);
 if (!cbw) {
 goto error;
 }
diff --git a/block/block-copy.c b/block/block-copy.c
index 0becad52da..7ce5e3d657 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -317,7 +317,7 @@ static uint32_t block_copy_max_transfer(BdrvChild *source, 
BdrvChild *target)
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
- BdrvRequestFlags write_flags, Error 
**errp)
+ bool compress, Error **errp)
 {
 BlockCopyState *s;
 BdrvDirtyBitmap *copy_bitmap;
@@ -329,6 +329,28 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 }
 bdrv_disable_dirty_bitmap(copy_bitmap);
 
+/*
+ * Why we always set BDRV_REQ_SERIALISING write flag:
+ *
+ * Assume source is in backing chain of target assume that target is going
+ * to be used for "image fleecing", i.e. it should represent a kind of
+ * snapshot of source at backup-start point in time. And target is going to
+ * be read by somebody (for example, used as NBD export) during backup job.
+ *
+ * In this case, we need to add BDRV_REQ_SERIALISING write flag to avoid
+ * intersection of backup writes and third party reads from target,
+ * otherwise reading from target we may occasionally read already updated 
by
+ * guest data.
+

[PATCH v6 03/33] qdev-properties: PropertyInfo: add realized_set_allowed field

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Add field, so property can declare support for setting the property
when device is realized. To be used in the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/hw/qdev-properties.h | 1 +
 hw/core/qdev-properties.c| 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 0ef97d60ce..f7925f67d0 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -32,6 +32,7 @@ struct PropertyInfo {
 const char *name;
 const char *description;
 const QEnumLookup *enum_table;
+bool realized_set_allowed; /* allow setting property on realized device */
 int (*print)(Object *obj, Property *prop, char *dest, size_t len);
 void (*set_default_value)(ObjectProperty *op, const Property *prop);
 ObjectProperty *(*create)(ObjectClass *oc, const char *name,
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 50f40949f5..c34aac6ebc 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -26,11 +26,11 @@ void qdev_prop_set_after_realize(DeviceState *dev, const 
char *name,
 
 /* returns: true if property is allowed to be set, false otherwise */
 static bool qdev_prop_allow_set(Object *obj, const char *name,
-Error **errp)
+const PropertyInfo *info, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
 
-if (dev->realized) {
+if (dev->realized && !info->realized_set_allowed) {
 qdev_prop_set_after_realize(dev, name, errp);
 return false;
 }
@@ -79,7 +79,7 @@ static void field_prop_set(Object *obj, Visitor *v, const 
char *name,
 {
 Property *prop = opaque;
 
-if (!qdev_prop_allow_set(obj, name, errp)) {
+if (!qdev_prop_allow_set(obj, name, prop->info, errp)) {
 return;
 }
 
-- 
2.29.2




[PATCH v6 08/33] block/backup: set copy_range and compress after filter insertion

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are going to publish copy-before-write filter, so it would be
initialized through options. Still we don't want to publish compress
and copy-range options, as

1. Modern way to enable compression is to use compress filter.

2. For copy-range it's unclean how to make proper interface:
 - it's has experimental prefix for backup job anyway
 - the whole BackupPerf structure doesn't make sense for the filter
 So, let's just add copy-range possibility to the filter later if
 needed.

Still, we are going to continue support for compression and
experimental copy-range in backup job. So, set these options after
filter creation.

Note, that we can drop "compress" argument of bdrv_cbw_append() now, as
well as "perf". The only reason not doing so is that now, when I
prepare this patch the big series around it is already reviewed and I
want to avoid extra rebase conflicts to simplify review of the
following version.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.h | 1 -
 block/backup.c| 3 ++-
 block/copy-before-write.c | 4 +---
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index f37e2249ae..538aab8bdb 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -33,7 +33,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
   uint64_t cluster_size,
-  BackupPerf *perf,
   bool compress,
   BlockCopyState **bcs,
   Error **errp);
diff --git a/block/backup.c b/block/backup.c
index 84f9a5f02c..b31fd99ab3 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -504,7 +504,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 }
 
 cbw = bdrv_cbw_append(bs, target, filter_node_name,
-  cluster_size, perf, compress, &bcs, errp);
+  cluster_size, false, &bcs, errp);
 if (!cbw) {
 goto error;
 }
@@ -530,6 +530,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->len = len;
 job->perf = *perf;
 
+block_copy_set_copy_opts(bcs, perf->use_copy_range, compress);
 block_copy_set_progress_meter(bcs, &job->common.job.progress);
 block_copy_set_speed(bcs, speed);
 
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4337076c1c..235251a640 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -170,7 +170,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
   uint64_t cluster_size,
-  BackupPerf *perf,
   bool compress,
   BlockCopyState **bcs,
   Error **errp)
@@ -217,8 +216,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 
 state->cluster_size = cluster_size;
 state->bcs = block_copy_state_new(top->backing, state->target,
-  cluster_size, perf->use_copy_range,
-  compress, errp);
+  cluster_size, false, compress, errp);
 if (!state->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 goto fail;
-- 
2.29.2




[PATCH v6 17/33] block/copy-before-write: bdrv_cbw_append(): drop unused compress arg

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.h | 1 -
 block/backup.c| 2 +-
 block/copy-before-write.c | 7 +++
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index b386fd8f01..51847e711a 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -32,7 +32,6 @@
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
-  bool compress,
   BlockCopyState **bcs,
   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/block/backup.c b/block/backup.c
index 83516297cb..4869f1e5da 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -452,7 +452,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-cbw = bdrv_cbw_append(bs, target, filter_node_name, false, &bcs, errp);
+cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
 if (!cbw) {
 goto error;
 }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4858dcf8ff..1e7180760a 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -145,7 +145,7 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 }
 
 static int cbw_init(BlockDriverState *bs, BlockDriverState *source,
-BlockDriverState *target, bool compress, Error **errp)
+BlockDriverState *target, Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 
@@ -173,7 +173,7 @@ static int cbw_init(BlockDriverState *bs, BlockDriverState 
*source,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, false, compress, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, false, false, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
@@ -202,7 +202,6 @@ BlockDriver bdrv_cbw_filter = {
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
-  bool compress,
   BlockCopyState **bcs,
   Error **errp)
 {
@@ -221,7 +220,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 }
 state = top->opaque;
 
-ret = cbw_init(top, source, target, compress, errp);
+ret = cbw_init(top, source, target, errp);
 if (ret < 0) {
 goto fail;
 }
-- 
2.29.2




[PATCH v6 12/33] block/copy-before-write: use file child instead of backing

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We are going to publish copy-before-write filter, and there no public
backing-child-based filter in Qemu. No reason to create a precedent, so
let's refactor copy-before-write filter instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 945d9340f4..7a6c15f141 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -43,7 +43,7 @@ static coroutine_fn int cbw_co_preadv(
 BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 QEMUIOVector *qiov, int flags)
 {
-return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
+return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
 }
 
 static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
@@ -71,7 +71,7 @@ static int coroutine_fn cbw_co_pdiscard(BlockDriverState *bs,
 return ret;
 }
 
-return bdrv_co_pdiscard(bs->backing, offset, bytes);
+return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
 static int coroutine_fn cbw_co_pwrite_zeroes(BlockDriverState *bs,
@@ -82,7 +82,7 @@ static int coroutine_fn cbw_co_pwrite_zeroes(BlockDriverState 
*bs,
 return ret;
 }
 
-return bdrv_co_pwrite_zeroes(bs->backing, offset, bytes, flags);
+return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
 }
 
 static coroutine_fn int cbw_co_pwritev(BlockDriverState *bs,
@@ -95,29 +95,22 @@ static coroutine_fn int cbw_co_pwritev(BlockDriverState *bs,
 return ret;
 }
 
-return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
+return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
 }
 
 static int coroutine_fn cbw_co_flush(BlockDriverState *bs)
 {
-if (!bs->backing) {
+if (!bs->file) {
 return 0;
 }
 
-return bdrv_co_flush(bs->backing->bs);
+return bdrv_co_flush(bs->file->bs);
 }
 
 static void cbw_refresh_filename(BlockDriverState *bs)
 {
-if (bs->backing == NULL) {
-/*
- * we can be here after failed bdrv_attach_child in
- * bdrv_set_backing_hd
- */
-return;
-}
 pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
-bs->backing->bs->filename);
+bs->file->bs->filename);
 }
 
 static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
@@ -186,6 +179,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 top = bdrv_new_open_driver(&bdrv_cbw_filter, filter_node_name,
BDRV_O_RDWR, errp);
 if (!top) {
+error_prepend(errp, "Cannot open driver: ");
 return NULL;
 }
 
@@ -201,21 +195,32 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
 state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
   BDRV_CHILD_DATA, errp);
 if (!state->target) {
+error_prepend(errp, "Cannot attach target child: ");
+bdrv_unref(top);
+return NULL;
+}
+
+bdrv_ref(source);
+top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+  errp);
+if (!top->file) {
+error_prepend(errp, "Cannot attach file child: ");
 bdrv_unref(top);
 return NULL;
 }
 
 bdrv_drained_begin(source);
 
-ret = bdrv_append(top, source, errp);
+ret = bdrv_replace_node(source, top, errp);
 if (ret < 0) {
 error_prepend(errp, "Cannot append copy-before-write filter: ");
 goto fail;
 }
 appended = true;
 
-state->bcs = block_copy_state_new(top->backing, state->target,
-  false, compress, errp);
+state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
+  errp);
 if (!state->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 goto fail;
-- 
2.29.2




[PATCH v6 16/33] block/copy-before-write: cbw_init(): use file child after attaching

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
In the next commit we'll get rid of source argument of cbw_init().
Prepare to it now, to make next commit simpler: move the code block
that uses source below attaching the child and use bs->file->bs instead
of source variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index d7f1833efa..4858dcf8ff 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -149,13 +149,6 @@ static int cbw_init(BlockDriverState *bs, BlockDriverState 
*source,
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 
-bs->total_sectors = source->total_sectors;
-bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
-(BDRV_REQ_FUA & source->supported_write_flags);
-bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
-((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
- source->supported_zero_flags);
-
 bdrv_ref(target);
 s->target = bdrv_attach_child(bs, target, "target", &child_of_bds,
   BDRV_CHILD_DATA, errp);
@@ -173,6 +166,13 @@ static int cbw_init(BlockDriverState *bs, BlockDriverState 
*source,
 return -EINVAL;
 }
 
+bs->total_sectors = bs->file->bs->total_sectors;
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+(BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
+ bs->file->bs->supported_zero_flags);
+
 s->bcs = block_copy_state_new(bs->file, s->target, false, compress, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
-- 
2.29.2




[PATCH v6 01/33] block: introduce bdrv_replace_child_bs()

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Add function to transactionally replace bs inside BdrvChild.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/block/block.h |  2 ++
 block.c   | 31 +++
 2 files changed, 33 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 3477290f9a..740038a892 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -361,6 +361,8 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState 
*bs_top,
 Error **errp);
 int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
   Error **errp);
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp);
 BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options,
int flags, Error **errp);
 int bdrv_drop_filter(BlockDriverState *bs, Error **errp);
diff --git a/block.c b/block.c
index be083f389e..94a556e61d 100644
--- a/block.c
+++ b/block.c
@@ -5048,6 +5048,37 @@ out:
 return ret;
 }
 
+/* Not for empty child */
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp)
+{
+int ret;
+Transaction *tran = tran_new();
+g_autoptr(GHashTable) found = NULL;
+g_autoptr(GSList) refresh_list = NULL;
+BlockDriverState *old_bs = child->bs;
+
+bdrv_ref(old_bs);
+bdrv_drained_begin(old_bs);
+bdrv_drained_begin(new_bs);
+
+bdrv_replace_child_tran(child, new_bs, tran);
+
+found = g_hash_table_new(NULL, NULL);
+refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);
+refresh_list = bdrv_topological_dfs(refresh_list, found, new_bs);
+
+ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
+
+tran_finalize(tran, ret);
+
+bdrv_drained_end(old_bs);
+bdrv_drained_end(new_bs);
+bdrv_unref(old_bs);
+
+return ret;
+}
+
 static void bdrv_delete(BlockDriverState *bs)
 {
 assert(bdrv_op_blocker_is_empty(bs));
-- 
2.29.2




[PATCH v6 02/33] block: introduce blk_replace_bs

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Add function to change bs inside blk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/sysemu/block-backend.h | 1 +
 block/block-backend.c  | 8 
 2 files changed, 9 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9ac5f7bbd3..29d4fdbf63 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -102,6 +102,7 @@ BlockBackend *blk_by_public(BlockBackendPublic *public);
 BlockDriverState *blk_bs(BlockBackend *blk);
 void blk_remove_bs(BlockBackend *blk);
 int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp);
+int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp);
 bool bdrv_has_blk(BlockDriverState *bs);
 bool bdrv_is_root_node(BlockDriverState *bs);
 int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
diff --git a/block/block-backend.c b/block/block-backend.c
index deb55c272e..6140d133e2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -869,6 +869,14 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, 
Error **errp)
 return 0;
 }
 
+/*
+ * Change BlockDriverState associated with @blk.
+ */
+int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp)
+{
+return bdrv_replace_child_bs(blk->root, new_bs, errp);
+}
+
 /*
  * Sets the permission bitmasks that the user of the BlockBackend needs.
  */
-- 
2.29.2




[PATCH v6 11/33] block/copy-before-write: drop extra bdrv_unref on failure path

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
bdrv_attach_child() do bdrv_unref() on failure, so we shouldn't do it
by hand here.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/copy-before-write.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2a51cc64e4..945d9340f4 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -201,7 +201,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
   BDRV_CHILD_DATA, errp);
 if (!state->target) {
-bdrv_unref(target);
 bdrv_unref(top);
 return NULL;
 }
-- 
2.29.2




[PATCH v6 07/33] block/block-copy: introduce block_copy_set_copy_opts()

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We'll need a possibility to set compress and use_copy_range options
after initialization of the state. So make corresponding part of
block_copy_state_new() separate and public.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |  2 ++
 block/block-copy.c | 50 +++---
 2 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 734389d32a..f0ba7bc828 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,8 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
  bool compress, Error **errp);
 
+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress);
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
diff --git a/block/block-copy.c b/block/block-copy.c
index 7ce5e3d657..469f992c0e 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -315,6 +315,34 @@ static uint32_t block_copy_max_transfer(BdrvChild *source, 
BdrvChild *target)
  target->bs->bl.max_transfer));
 }
 
+/* Function should be called prior any actual copy request */
+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress)
+{
+if (compress) {
+s->write_flags |= BDRV_REQ_WRITE_COMPRESSED;
+}
+
+if (s->max_transfer < s->cluster_size) {
+/*
+ * copy_range does not respect max_transfer. We don't want to bother
+ * with requests smaller than block-copy cluster size, so fallback to
+ * buffered copying (read and write respect max_transfer on their
+ * behalf).
+ */
+s->method = COPY_READ_WRITE_CLUSTER;
+} else if (compress) {
+/* Compression supports only cluster-size writes and no copy-range. */
+s->method = COPY_READ_WRITE_CLUSTER;
+} else {
+/*
+ * If copy range enabled, start with COPY_RANGE_SMALL, until first
+ * successful copy_range (look at block_copy_do_copy).
+ */
+s->method = use_copy_range ? COPY_RANGE_SMALL : COPY_READ_WRITE;
+}
+}
+
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
  bool compress, Error **errp)
@@ -358,32 +386,14 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 .copy_bitmap = copy_bitmap,
 .cluster_size = cluster_size,
 .len = bdrv_dirty_bitmap_size(copy_bitmap),
-.write_flags = BDRV_REQ_SERIALISING |
-(compress ? BDRV_REQ_WRITE_COMPRESSED : 0),
+.write_flags = BDRV_REQ_SERIALISING,
 .mem = shres_create(BLOCK_COPY_MAX_MEM),
 .max_transfer = QEMU_ALIGN_DOWN(
 block_copy_max_transfer(source, target),
 cluster_size),
 };
 
-if (s->max_transfer < cluster_size) {
-/*
- * copy_range does not respect max_transfer. We don't want to bother
- * with requests smaller than block-copy cluster size, so fallback to
- * buffered copying (read and write respect max_transfer on their
- * behalf).
- */
-s->method = COPY_READ_WRITE_CLUSTER;
-} else if (compress) {
-/* Compression supports only cluster-size writes and no copy-range. */
-s->method = COPY_READ_WRITE_CLUSTER;
-} else {
-/*
- * If copy range enabled, start with COPY_RANGE_SMALL, until first
- * successful copy_range (look at block_copy_do_copy).
- */
-s->method = use_copy_range ? COPY_RANGE_SMALL : COPY_READ_WRITE;
-}
+block_copy_set_copy_opts(s, use_copy_range, compress);
 
 ratelimit_init(&s->rate_limit);
 qemu_co_mutex_init(&s->lock);
-- 
2.29.2




[PATCH v6 04/33] qdev: allow setting drive property for realized device

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
We need an ability to insert filters above top block node, attached to
block device. It can't be achieved with blockdev-reopen command. So, we
want do it with help of qom-set.

Intended usage:

Assume there is a node A that is attached to some guest device.

1. blockdev-add to create a filter node B that has A as its child.

2. qom-set to change the node attached to the guest device’s
   BlockBackend from A to B.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 hw/core/qdev-properties-system.c | 43 +++-
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 2760c21f11..e71f5d64d1 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -36,11 +36,11 @@
 
 static bool check_prop_still_unset(Object *obj, const char *name,
const void *old_val, const char *new_val,
-   Error **errp)
+   bool allow_override, Error **errp)
 {
 const GlobalProperty *prop = qdev_find_global_prop(obj, name);
 
-if (!old_val) {
+if (!old_val || (!prop && allow_override)) {
 return true;
 }
 
@@ -93,16 +93,34 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
 BlockBackend *blk;
 bool blk_created = false;
 int ret;
+BlockDriverState *bs;
+AioContext *ctx;
 
 if (!visit_type_str(v, name, &str, errp)) {
 return;
 }
 
-/*
- * TODO Should this really be an error?  If no, the old value
- * needs to be released before we store the new one.
- */
-if (!check_prop_still_unset(obj, name, *ptr, str, errp)) {
+if (!check_prop_still_unset(obj, name, *ptr, str, true, errp)) {
+return;
+}
+
+if (*ptr) {
+/* BlockBackend alread exists. So, we want to change attached node */
+blk = *ptr;
+ctx = blk_get_aio_context(blk);
+bs = bdrv_lookup_bs(NULL, str, errp);
+if (!bs) {
+return;
+}
+
+if (ctx != bdrv_get_aio_context(bs)) {
+error_setg(errp, "Different aio context is not supported for new "
+   "node");
+}
+
+aio_context_acquire(ctx);
+blk_replace_bs(blk, bs, errp);
+aio_context_release(ctx);
 return;
 }
 
@@ -114,7 +132,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
 
 blk = blk_by_name(str);
 if (!blk) {
-BlockDriverState *bs = bdrv_lookup_bs(NULL, str, NULL);
+bs = bdrv_lookup_bs(NULL, str, NULL);
 if (bs) {
 /*
  * If the device supports iothreads, it will make sure to move the
@@ -123,8 +141,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
  * aware of iothreads require their BlockBackends to be in the main
  * AioContext.
  */
-AioContext *ctx = iothread ? bdrv_get_aio_context(bs) :
- qemu_get_aio_context();
+ctx = iothread ? bdrv_get_aio_context(bs) : qemu_get_aio_context();
 blk = blk_new(ctx, 0, BLK_PERM_ALL);
 blk_created = true;
 
@@ -196,6 +213,7 @@ static void release_drive(Object *obj, const char *name, 
void *opaque)
 const PropertyInfo qdev_prop_drive = {
 .name  = "str",
 .description = "Node name or ID of a block device to use as a backend",
+.realized_set_allowed = true,
 .get   = get_drive,
 .set   = set_drive,
 .release = release_drive,
@@ -204,6 +222,7 @@ const PropertyInfo qdev_prop_drive = {
 const PropertyInfo qdev_prop_drive_iothread = {
 .name  = "str",
 .description = "Node name or ID of a block device to use as a backend",
+.realized_set_allowed = true,
 .get   = get_drive,
 .set   = set_drive_iothread,
 .release = release_drive,
@@ -238,7 +257,7 @@ static void set_chr(Object *obj, Visitor *v, const char 
*name, void *opaque,
  * TODO Should this really be an error?  If no, the old value
  * needs to be released before we store the new one.
  */
-if (!check_prop_still_unset(obj, name, be->chr, str, errp)) {
+if (!check_prop_still_unset(obj, name, be->chr, str, false, errp)) {
 return;
 }
 
@@ -408,7 +427,7 @@ static void set_netdev(Object *obj, Visitor *v, const char 
*name,
  * TODO Should this really be an error?  If no, the old value
  * needs to be released before we store the new one.
  */
-if (!check_prop_still_unset(obj, name, ncs[i], str, errp)) {
+if (!check_prop_still_unset(obj, name, ncs[i], str, false, errp)) {
 goto out;
 }
 
-- 
2.29.2




[PATCH v6 00/33] block: publish backup-top filter

2021-07-21 Thread Vladimir Sementsov-Ogievskiy
Hi all!

Of course, that's a material for 6.2, still no harm in keeping it up to
date.

v6: rebase on master

Patches without r-b: 9, 10, 19, 20

v5: rebase on master
25: fix pylint complain 'Unnecessary "else" after "return"' [John]
25-26: rebased on moved machine.py
   use Dict[str, object] typing instead of Dict[str, Any]  [John]
   add John's r-b, keep Max's r-b

Vladimir Sementsov-Ogievskiy (33):
  block: introduce bdrv_replace_child_bs()
  block: introduce blk_replace_bs
  qdev-properties: PropertyInfo: add realized_set_allowed field
  qdev: allow setting drive property for realized device
  block: rename backup-top to copy-before-write
  block-copy: always set BDRV_REQ_SERIALISING flag
  block/block-copy: introduce block_copy_set_copy_opts()
  block/backup: set copy_range and compress after filter insertion
  block/backup: move cluster size calculation to block-copy
  block/copy-before-write: relax permission requirements when no parents
  block/copy-before-write: drop extra bdrv_unref on failure path
  block/copy-before-write: use file child instead of backing
  block/copy-before-write: bdrv_cbw_append(): replace child at last
  block/copy-before-write: introduce cbw_init()
  block/copy-before-write: cbw_init(): rename variables
  block/copy-before-write: cbw_init(): use file child after attaching
  block/copy-before-write: bdrv_cbw_append(): drop unused compress arg
  block/copy-before-write: cbw_init(): use options
  block/copy-before-write: initialize block-copy bitmap
  block/block-copy: make setting progress optional
  block/copy-before-write: make public block driver
  qapi: publish copy-before-write filter
  python/qemu/machine.py: refactor _qemu_args()
  python/qemu/machine: QEMUMachine: improve qmp() method
  iotests.py: VM: add own __enter__ method
  iotests/222: fix pylint and mypy complains
  iotests/222: constantly use single quotes for strings
  iotests: move 222 to tests/image-fleecing
  iotests.py: hmp_qemu_io: support qdev
  iotests/image-fleecing: proper source device
  iotests/image-fleecing: rename tgt_node
  iotests/image-fleecing: prepare for adding new test-case
  iotests/image-fleecing: add test-case for copy-before-write filter

 qapi/block-core.json|  25 +-
 block/{backup-top.h => copy-before-write.h} |  25 +-
 include/block/block-copy.h  |   6 +-
 include/block/block.h   |   2 +
 include/hw/qdev-properties.h|   1 +
 include/sysemu/block-backend.h  |   1 +
 block.c |  31 +++
 block/backup-top.c  | 253 ---
 block/backup.c  | 116 ++---
 block/block-backend.c   |   8 +
 block/block-copy.c  | 142 ---
 block/copy-before-write.c   | 256 
 hw/core/qdev-properties-system.c|  43 +++-
 hw/core/qdev-properties.c   |   6 +-
 MAINTAINERS |   4 +-
 block/meson.build   |   2 +-
 python/qemu/machine/machine.py  |  30 ++-
 tests/qemu-iotests/222  | 159 
 tests/qemu-iotests/222.out  |  67 -
 tests/qemu-iotests/283  |  35 ++-
 tests/qemu-iotests/283.out  |   4 +-
 tests/qemu-iotests/297  |   2 +-
 tests/qemu-iotests/iotests.py   |   9 +-
 tests/qemu-iotests/tests/image-fleecing | 192 +++
 tests/qemu-iotests/tests/image-fleecing.out | 139 +++
 25 files changed, 889 insertions(+), 669 deletions(-)
 rename block/{backup-top.h => copy-before-write.h} (56%)
 delete mode 100644 block/backup-top.c
 create mode 100644 block/copy-before-write.c
 delete mode 100755 tests/qemu-iotests/222
 delete mode 100644 tests/qemu-iotests/222.out
 create mode 100755 tests/qemu-iotests/tests/image-fleecing
 create mode 100644 tests/qemu-iotests/tests/image-fleecing.out

-- 
2.29.2




[PATCH for-6.1? v2 2/3] iothread: add aio-max-batch parameter

2021-07-21 Thread Stefano Garzarella
The `aio-max-batch` parameter will be propagated to AIO engines
and it will be used to control the maximum number of queued requests.

When there are in queue a number of requests equal to `aio-max-batch`,
the engine invokes the system call to forward the requests to the kernel.

This parameter allows us to control the maximum batch size to reduce
the latency that requests might accumulate while queued in the AIO
engine queue.

If `aio-max-batch` is equal to 0 (default value), the AIO engine will
use its default maximum batch size value.

Signed-off-by: Stefano Garzarella 
---

Notes:
v2:
- s/bacth/batch/ [stefanha]

 qapi/misc.json|  6 -
 qapi/qom.json |  7 -
 include/block/aio.h   | 12 +
 include/sysemu/iothread.h |  3 +++
 iothread.c| 55 +++
 monitor/hmp-cmds.c|  2 ++
 util/aio-posix.c  | 12 +
 util/aio-win32.c  |  5 
 util/async.c  |  2 ++
 qemu-options.hx   |  8 --
 10 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index 156f98203e..5c2ca3b556 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -86,6 +86,9 @@
 # @poll-shrink: how many ns will be removed from polling time, 0 means that
 #   it's not configured (since 2.9)
 #
+# @aio-max-batch: maximum number of requests in a batch for the AIO engine,
+# 0 means that the engine will use its default (since 6.1)
+#
 # Since: 2.0
 ##
 { 'struct': 'IOThreadInfo',
@@ -93,7 +96,8 @@
'thread-id': 'int',
'poll-max-ns': 'int',
'poll-grow': 'int',
-   'poll-shrink': 'int' } }
+   'poll-shrink': 'int',
+   'aio-max-batch': 'int' } }
 
 ##
 # @query-iothreads:
diff --git a/qapi/qom.json b/qapi/qom.json
index 652be317b8..6d5f4a88e6 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -516,12 +516,17 @@
 #   algorithm detects it is spending too long polling without
 #   encountering events. 0 selects a default behaviour (default: 0)
 #
+# @aio-max-batch: maximum number of requests in a batch for the AIO engine,
+# 0 means that the engine will use its default
+# (default:0, since 6.1)
+#
 # Since: 2.0
 ##
 { 'struct': 'IothreadProperties',
   'data': { '*poll-max-ns': 'int',
 '*poll-grow': 'int',
-'*poll-shrink': 'int' } }
+'*poll-shrink': 'int',
+'*aio-max-batch': 'int' } }
 
 ##
 # @MemoryBackendProperties:
diff --git a/include/block/aio.h b/include/block/aio.h
index 807edce9b5..47fbe9d81f 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -232,6 +232,9 @@ struct AioContext {
 int64_t poll_grow;  /* polling time growth factor */
 int64_t poll_shrink;/* polling time shrink factor */
 
+/* AIO engine parameters */
+int64_t aio_max_batch;  /* maximum number of requests in a batch */
+
 /*
  * List of handlers participating in userspace polling.  Protected by
  * ctx->list_lock.  Iterated and modified mostly by the event loop thread
@@ -755,4 +758,13 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t 
max_ns,
  int64_t grow, int64_t shrink,
  Error **errp);
 
+/**
+ * aio_context_set_aio_params:
+ * @ctx: the aio context
+ * @max_batch: maximum number of requests in a batch, 0 means that the
+ * engine will use its default
+ */
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
+Error **errp);
+
 #endif
diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
index f177142f16..7f714bd136 100644
--- a/include/sysemu/iothread.h
+++ b/include/sysemu/iothread.h
@@ -37,6 +37,9 @@ struct IOThread {
 int64_t poll_max_ns;
 int64_t poll_grow;
 int64_t poll_shrink;
+
+/* AioContext AIO engine parameters */
+int64_t aio_max_batch;
 };
 typedef struct IOThread IOThread;
 
diff --git a/iothread.c b/iothread.c
index 103679a16b..ddbbde61f7 100644
--- a/iothread.c
+++ b/iothread.c
@@ -152,6 +152,24 @@ static void iothread_init_gcontext(IOThread *iothread)
 iothread->main_loop = g_main_loop_new(iothread->worker_context, TRUE);
 }
 
+static void iothread_set_aio_context_params(IOThread *iothread, Error **errp)
+{
+ERRP_GUARD();
+
+aio_context_set_poll_params(iothread->ctx,
+iothread->poll_max_ns,
+iothread->poll_grow,
+iothread->poll_shrink,
+errp);
+if (*errp) {
+return;
+}
+
+aio_context_set_aio_params(iothread->ctx,
+   iothread->aio_max_batch,
+   errp);
+}
+
 static void iothread_complete(UserCreatable *obj, Error **errp)
 {
 Error *local_error = NULL;
@@ -171,11 +

[PATCH for-6.1? v2 3/3] linux-aio: limit the batch size using `aio-max-batch` parameter

2021-07-21 Thread Stefano Garzarella
When there are multiple queues attached to the same AIO context,
some requests may experience high latency, since in the worst case
the AIO engine queue is only flushed when it is full (MAX_EVENTS) or
there are no more queues plugged.

Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger
hardcoded value") changed MAX_EVENTS from 128 to 1024, to increase
the number of in-flight requests. But this change also increased
the potential maximum batch to 1024 elements.

When there is a single queue attached to the AIO context, the issue
is mitigated from laio_io_unplug() that will flush the queue every
time is invoked since there can't be others queue plugged.

Let's use the new `aio-max-batch` IOThread parameter to mitigate
this issue, limiting the number of requests in a batch.

We also define a default value (32): this value is obtained running
some benchmarks and it represents a good tradeoff between the latency
increase while a request is queued and the cost of the io_submit(2)
system call.

Signed-off-by: Stefano Garzarella 
---

Notes:
v2:
- limit the batch with the number of available events [stefanha]

 block/linux-aio.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 3c0527c2bf..0dab507b71 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -28,6 +28,9 @@
  */
 #define MAX_EVENTS 1024
 
+/* Maximum number of requests in a batch. (default value) */
+#define DEFAULT_MAX_BATCH 32
+
 struct qemu_laiocb {
 Coroutine *co;
 LinuxAioState *ctx;
@@ -351,6 +354,10 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 LinuxAioState *s = laiocb->ctx;
 struct iocb *iocbs = &laiocb->iocb;
 QEMUIOVector *qiov = laiocb->qiov;
+int64_t max_batch = s->aio_context->aio_max_batch ?: DEFAULT_MAX_BATCH;
+
+/* limit the batch with the number of available events */
+max_batch = MIN_NON_ZERO(MAX_EVENTS - s->io_q.in_flight, max_batch);
 
 switch (type) {
 case QEMU_AIO_WRITE:
@@ -371,7 +378,7 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 s->io_q.in_queue++;
 if (!s->io_q.blocked &&
 (!s->io_q.plugged ||
- s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
+ s->io_q.in_queue >= max_batch)) {
 ioq_submit(s);
 }
 
-- 
2.31.1




[PATCH for-6.1? v2 1/3] iothread: generalize iothread_set_param/iothread_get_param

2021-07-21 Thread Stefano Garzarella
Changes in preparation for next patches where we add a new
parameter not related to the poll mechanism.

Let's add two new generic functions (iothread_set_param and
iothread_get_param) that we use to set and get IOThread
parameters.

Signed-off-by: Stefano Garzarella 
---
 iothread.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/iothread.c b/iothread.c
index 2c5ccd7367..103679a16b 100644
--- a/iothread.c
+++ b/iothread.c
@@ -213,7 +213,7 @@ static PollParamInfo poll_shrink_info = {
 "poll-shrink", offsetof(IOThread, poll_shrink),
 };
 
-static void iothread_get_poll_param(Object *obj, Visitor *v,
+static void iothread_get_param(Object *obj, Visitor *v,
 const char *name, void *opaque, Error **errp)
 {
 IOThread *iothread = IOTHREAD(obj);
@@ -223,7 +223,7 @@ static void iothread_get_poll_param(Object *obj, Visitor *v,
 visit_type_int64(v, name, field, errp);
 }
 
-static void iothread_set_poll_param(Object *obj, Visitor *v,
+static bool iothread_set_param(Object *obj, Visitor *v,
 const char *name, void *opaque, Error **errp)
 {
 IOThread *iothread = IOTHREAD(obj);
@@ -232,17 +232,36 @@ static void iothread_set_poll_param(Object *obj, Visitor 
*v,
 int64_t value;
 
 if (!visit_type_int64(v, name, &value, errp)) {
-return;
+return false;
 }
 
 if (value < 0) {
 error_setg(errp, "%s value must be in range [0, %" PRId64 "]",
info->name, INT64_MAX);
-return;
+return false;
 }
 
 *field = value;
 
+return true;
+}
+
+static void iothread_get_poll_param(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp)
+{
+
+iothread_get_param(obj, v, name, opaque, errp);
+}
+
+static void iothread_set_poll_param(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp)
+{
+IOThread *iothread = IOTHREAD(obj);
+
+if (!iothread_set_param(obj, v, name, opaque, errp)) {
+return;
+}
+
 if (iothread->ctx) {
 aio_context_set_poll_params(iothread->ctx,
 iothread->poll_max_ns,
-- 
2.31.1




[PATCH for-6.1? v2 0/3] linux-aio: limit the batch size to reduce queue latency

2021-07-21 Thread Stefano Garzarella
Since it's a performance regression, if possible we could include it in 6.1-rc1.
There shouldn't be any particular criticism.

v1: https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg01526.html
v2:
  - s/bacth/batch/ [stefanha]
  - limit the batch with the number of available events [stefanha]
  - rebased on master
  - re-run benchmarks

Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger hardcoded
value") changed MAX_EVENTS from 128 to 1024, to increase the number of
in-flight requests. But this change also increased the potential maximum batch
to 1024 elements.

The problem is noticeable when we have a lot of requests in flight and multiple
queues attached to the same AIO context.
In this case we potentially create very large batches. Instead, when we have
a single queue, the batch is limited because when the queue is unplugged,
there is a call to io_submit(2).
In practice, io_submit(2) was called only when there are no more queues plugged
in or when we fill the AIO queue (MAX_EVENTS = 1024).

This series limit the batch size (number of request submitted to the kernel
through a single io_submit(2) call) in the Linux AIO backend, and add a new
`aio-max-batch` parameter to IOThread to allow tuning it.
If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its
default maximum batch size value.

I run some benchmarks to choose 32 as default batch value for Linux AIO.
Below the kIOPS measured with fio running in the guest (average over 3 runs):

   |   master  |   with this series applied|
   |143c2e04328| maxbatch=8|maxbatch=16|maxbatch=32|maxbatch=64|
  # queues | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs |
-- randread tests -|---|
bs=4k iodepth=1| 200 | 195 | 181 | 208 | 200 | 203 | 206 | 212 | 200 | 204 |
bs=4k iodepth=8| 269 | 231 | 256 | 244 | 255 | 260 | 266 | 268 | 270 | 250 |
bs=4k iodepth=64   | 230 | 198 | 262 | 265 | 261 | 253 | 260 | 273 | 253 | 263 |
bs=4k iodepth=128  | 217 | 181 | 261 | 253 | 249 | 276 | 250 | 278 | 255 | 278 |
bs=16k iodepth=1   | 130 | 130 | 130 | 130 | 130 | 130 | 137 | 130 | 130 | 130 |
bs=16k iodepth=8   | 130 | 131 | 130 | 131 | 130 | 130 | 137 | 131 | 131 | 130 |
bs=16k iodepth=64  | 130 | 102 | 131 | 128 | 131 | 128 | 137 | 140 | 130 | 128 |
bs=16k iodepth=128 | 131 | 100 | 130 | 128 | 131 | 129 | 137 | 141 | 130 | 129 |

1q  = virtio-blk device with a single queue
4qs = virito-blk device with multi queues (one queue per vCPU - 4)

I reported only the most significant tests, but I also did other tests to
make sure there were no regressions, here the full report:
https://docs.google.com/spreadsheets/d/11X3_5FJu7pnMTlf4ZatRDvsnU9K3EPj6Mn3aJIsE4tI

Test environment:
- Disk: Intel Corporation NVMe Datacenter SSD [Optane]
- CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
- QEMU: qemu-system-x86_64 -machine q35,accel=kvm -smp 4 -m 4096 \
  ... \
  -object iothread,id=iothread0,aio-max-batch=${MAX_BATCH} \
  -device virtio-blk-pci,iothread=iothread0,num-queues=${NUM_QUEUES}

- benchmark: fio --ioengine=libaio --thread --group_reporting \
 --number_ios=20 --direct=1 --filename=/dev/vdb \
 --rw=${TEST} --bs=${BS} --iodepth=${IODEPTH} --numjobs=16

Next steps:
 - benchmark io_uring and use `aio-max-batch` also there
 - make MAX_EVENTS parametric adding a new `aio-max-events` parameter

Thanks,
Stefano

Stefano Garzarella (3):
  iothread: generalize iothread_set_param/iothread_get_param
  iothread: add aio-max-batch parameter
  linux-aio: limit the batch size using `aio-max-batch` parameter

 qapi/misc.json|  6 ++-
 qapi/qom.json |  7 +++-
 include/block/aio.h   | 12 ++
 include/sysemu/iothread.h |  3 ++
 block/linux-aio.c |  9 -
 iothread.c| 82 ++-
 monitor/hmp-cmds.c|  2 +
 util/aio-posix.c  | 12 ++
 util/aio-win32.c  |  5 +++
 util/async.c  |  2 +
 qemu-options.hx   |  8 +++-
 11 files changed, 134 insertions(+), 14 deletions(-)

-- 
2.31.1




Re: [PATCH v6 6/6] hmp: add virtio commands

2021-07-21 Thread Jonah Palmer


On 7/13/21 10:40 PM, Jason Wang wrote:


在 2021/7/12 下午6:35, Jonah Palmer 写道:

+void hmp_virtio_queue_status(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *path = qdict_get_try_str(qdict, "path");
+    int queue = qdict_get_int(qdict, "queue");
+    VirtQueueStatus *s = qmp_x_debug_virtio_queue_status(path, 
queue, &err);

+
+    if (err != NULL) {
+    hmp_handle_error(mon, err);
+    return;
+    }
+
+    monitor_printf(mon, "%s:\n", path);
+    monitor_printf(mon, "  device_type:  %s\n",
+   VirtioType_str(s->device_type));
+    monitor_printf(mon, "  index:    %d\n", 
s->queue_index);

+    monitor_printf(mon, "  inuse:    %d\n", s->inuse);
+    monitor_printf(mon, "  last_avail_idx:   %d (%"PRId64" %% 
%"PRId64")\n",

+   s->last_avail_idx, s->last_avail_idx % s->vring_num,
+   s->vring_num);
+    monitor_printf(mon, "  shadow_avail_idx: %d (%"PRId64" %% 
%"PRId64")\n",
+   s->shadow_avail_idx, s->shadow_avail_idx % 
s->vring_num,

+   s->vring_num);
+    monitor_printf(mon, "  used_idx: %d (%"PRId64" %% 
%"PRId64")\n",
+   s->used_idx, s->used_idx % s->vring_num, 
s->vring_num);



The modular information is not the case of packed ring where the queue 
size does not have to be a power of 2.


Doesn't modulo work for any integer, regardless if it's a power of 2 or not? 
Could you clarify this for me?

Thank you,

Jonah



Thanks


+    monitor_printf(mon, " signalled_used:   %d (%"PRId64" %% 
%"PRId64")\n",

+   s->signalled_used, s->signalled_used % s->vring_num,
+   s->vring_num);
+    monitor_printf(mon, "  signalled_used_valid: %d\n", 
s->signalled_used_valid);

+    monitor_printf(mon, "  VRing:\n");
+    monitor_printf(mon, "    num: %"PRId64"\n", s->vring_num);
+    monitor_printf(mon, "    num_default: %"PRId64"\n", 
s->vring_num_default);
+    monitor_printf(mon, "    align:   %"PRId64"\n", 
s->vring_align);
+    monitor_printf(mon, "    desc:    0x%016"PRIx64"\n", 
s->vring_desc);
+    monitor_printf(mon, "    avail:   0x%016"PRIx64"\n", 
s->vring_avail);
+    monitor_printf(mon, "    used:    0x%016"PRIx64"\n", 
s->vring_used);

+
+    qapi_free_VirtQueueStatus(s);




Re: [PATCH v6 4/6] qmp: add QMP command x-debug-virtio-queue-status

2021-07-21 Thread Jonah Palmer


On 7/13/21 10:37 PM, Jason Wang wrote:


在 2021/7/12 下午6:35, Jonah Palmer 写道:

From: Laurent Vivier 

This new command shows internal status of a VirtQueue.
(vrings and indexes).

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio-stub.c |   6 +++
  hw/virtio/virtio.c  |  37 ++
  qapi/virtio.json    | 102 


  3 files changed, 145 insertions(+)

  [Jonah: Added 'device-type' field to VirtQueueStatus and
  qmp command x-debug-virtio-queue-status.]

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index ddb592f..3c1bf17 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* 
path, Error **errp)

  {
  return qmp_virtio_unsupported(errp);
  }
+
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue, 
Error **errp)

+{
+    return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 81a0ee8..ccd4371 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3935,6 +3935,43 @@ static VirtIODevice *virtio_device_find(const 
char *path)

  return NULL;
  }
  +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue, 
Error **errp)

+{
+    VirtIODevice *vdev;
+    VirtQueueStatus *status;
+
+    vdev = virtio_device_find(path);
+    if (vdev == NULL) {
+    error_setg(errp, "Path %s is not a VirtIO device", path);
+    return NULL;
+    }
+
+    if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, 
queue)) {

+    error_setg(errp, "Invalid virtqueue number %d", queue);
+    return NULL;
+    }
+
+    status = g_new0(VirtQueueStatus, 1);
+    status->device_type = qapi_enum_parse(&VirtioType_lookup, 
vdev->name,

+  VIRTIO_TYPE_UNKNOWN, NULL);
+    status->queue_index = vdev->vq[queue].queue_index;
+    status->inuse = vdev->vq[queue].inuse;
+    status->vring_num = vdev->vq[queue].vring.num;
+    status->vring_num_default = vdev->vq[queue].vring.num_default;
+    status->vring_align = vdev->vq[queue].vring.align;
+    status->vring_desc = vdev->vq[queue].vring.desc;
+    status->vring_avail = vdev->vq[queue].vring.avail;
+    status->vring_used = vdev->vq[queue].vring.used;
+    status->last_avail_idx = vdev->vq[queue].last_avail_idx;



As mentioned in previous versions. We need add vhost support otherwise 
the value here is wrong.


Got it. I'll add a case to determine if vhost is active for a given device.
So, in the case that vhost is active, should I just not print out the value or 
would I substitute it with
another value (whatever that might be)? Same question for shadow_avail_idx 
below as well.

Jonah





+    status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx;



The shadow index is something that is implementation specific e.g in 
the case of vhost it's kind of meaningless.


Thanks



+    status->used_idx = vdev->vq[queue].used_idx;
+    status->signalled_used = vdev->vq[queue].signalled_used;
+    status->signalled_used_valid = 
vdev->vq[queue].signalled_used_valid;

+
+    return status;
+}
+
  #define CONVERT_FEATURES(type, map)    \
  ({   \
  type *list = NULL; \
diff --git a/qapi/virtio.json b/qapi/virtio.json
index 78873cd..7007e0c 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -406,3 +406,105 @@
    'data': { 'path': 'str' },
    'returns': 'VirtioStatus'
  }
+
+##
+# @VirtQueueStatus:
+#
+# Status of a VirtQueue
+#
+# @device-type: VirtIO device type
+#
+# @queue-index: VirtQueue queue_index
+#
+# @inuse: VirtQueue inuse
+#
+# @vring-num: VirtQueue vring.num
+#
+# @vring-num-default: VirtQueue vring.num_default
+#
+# @vring-align: VirtQueue vring.align
+#
+# @vring-desc: VirtQueue vring.desc
+#
+# @vring-avail: VirtQueue vring.avail
+#
+# @vring-used: VirtQueue vring.used
+#
+# @last-avail-idx: VirtQueue last_avail_idx
+#
+# @shadow-avail-idx: VirtQueue shadow_avail_idx
+#
+# @used-idx: VirtQueue used_idx
+#
+# @signalled-used: VirtQueue signalled_used
+#
+# @signalled-used-valid: VirtQueue signalled_used_valid
+#
+# Since: 6.1
+#
+##
+
+{ 'struct': 'VirtQueueStatus',
+  'data': {
+    'device-type': 'VirtioType',
+    'queue-index': 'uint16',
+    'inuse': 'uint32',
+    'vring-num': 'int',
+    'vring-num-default': 'int',
+    'vring-align': 'int',
+    'vring-desc': 'uint64',
+    'vring-avail': 'uint64',
+    'vring-used': 'uint64',
+    'last-avail-idx': 'uint16',
+    'shadow-avail-idx': 'uint16',
+    'used-idx': 'uint16',
+    'signalled-used': 'uint16',
+    'signalled-used-valid': 'uint16'
+  }
+}
+
+##
+# @x-debug-virtio-queue-status:
+#
+# Return the status of a given VirtQueue
+#
+# @path: QOBject path of the VirtI

Re: [PATCH v6 0/6] hmp, qmp: Add some commands to introspect virtio devices

2021-07-21 Thread Jonah Palmer
Hi Jason. My apologies for the delayed response, several work-related 
things came up recently, but they're slowing down now so I'm turning my 
attention these patches to get taken care of.


A few questions and comments below (and in other following patches):


On 7/13/21 10:42 PM, Jason Wang wrote:


在 2021/7/12 下午6:35, Jonah Palmer 写道:
 Dump the information of the head element of the third queue 
of virtio-scsi:


 (qemu) virtio queue-element 
/machine/peripheral-anon/device[3]/virtio-backend 3

 index: 122
 ndescs: 3
 descs: addr 0x7302d000 len 4096 (write), addr 0x3c951763 len 
108 (write, next),

    addr 0x3c951728 len 51 (next)



I think it would be nice if we can show driver area and device area as 
well here.


Sure thing. And I apologize if it's obvious (I'm relatively new to virtio), but 
how can I expose the driver area?

I understand that virtio devices are part of the Qemu process, but I also 
thought that virtio drivers are in the
guest's kernel, which I don't believe I can see into from Qemu (or, at least, 
it's not obvious to me).

Jonah



Thanks



[PATCH v6 5/5] tests/qtest/nvme-test: add mmio read test

2021-07-21 Thread Klaus Jensen
From: Klaus Jensen 

Add a regression test for mmio read on big-endian hosts.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 tests/qtest/nvme-test.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tests/qtest/nvme-test.c b/tests/qtest/nvme-test.c
index 47e757d7e2af..f8bafb5d70fb 100644
--- a/tests/qtest/nvme-test.c
+++ b/tests/qtest/nvme-test.c
@@ -67,6 +67,30 @@ static void nvmetest_oob_cmb_test(void *obj, void *data, 
QGuestAllocator *alloc)
 g_assert_cmpint(qpci_io_readl(pdev, bar, cmb_bar_size - 1), !=, 
0x44332211);
 }
 
+static void nvmetest_reg_read_test(void *obj, void *data, QGuestAllocator 
*alloc)
+{
+QNvme *nvme = obj;
+QPCIDevice *pdev = &nvme->dev;
+QPCIBar bar;
+uint32_t cap_lo, cap_hi;
+uint64_t cap;
+
+qpci_device_enable(pdev);
+bar = qpci_iomap(pdev, 0, NULL);
+
+cap_lo = qpci_io_readl(pdev, bar, 0x0);
+g_assert_cmpint(NVME_CAP_MQES(cap_lo), ==, 0x7ff);
+
+cap_hi = qpci_io_readl(pdev, bar, 0x4);
+g_assert_cmpint(NVME_CAP_MPSMAX((uint64_t)cap_hi << 32), ==, 0x4);
+
+cap = qpci_io_readq(pdev, bar, 0x0);
+g_assert_cmpint(NVME_CAP_MQES(cap), ==, 0x7ff);
+g_assert_cmpint(NVME_CAP_MPSMAX(cap), ==, 0x4);
+
+qpci_iounmap(pdev, bar);
+}
+
 static void nvmetest_pmr_reg_test(void *obj, void *data, QGuestAllocator 
*alloc)
 {
 QNvme *nvme = obj;
@@ -142,6 +166,8 @@ static void nvme_register_nodes(void)
  &(QOSGraphTestOptions) {
 .edge.extra_device_opts = "pmrdev=pmr0"
 });
+
+qos_add_test("reg-read", "nvme", nvmetest_reg_read_test, NULL);
 }
 
 libqos_init(nvme_register_nodes);
-- 
2.32.0




[PATCH v6 4/5] hw/nvme: fix mmio read

2021-07-21 Thread Klaus Jensen
From: Klaus Jensen 

The new PMR test unearthed a long-standing issue with MMIO reads on
big-endian hosts.

Fix this by unconditionally storing all controller registers in little
endian.

Cc: Gollu Appalanaidu 
Reported-by: Peter Maydell 
Signed-off-by: Klaus Jensen 
Reviewed-by: Peter Maydell 
---
 hw/nvme/ctrl.c | 291 +++--
 1 file changed, 162 insertions(+), 129 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 10c2363c1d4d..43dfaeac9f54 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -439,10 +439,12 @@ static uint8_t nvme_sq_empty(NvmeSQueue *sq)
 
 static void nvme_irq_check(NvmeCtrl *n)
 {
+uint32_t intms = ldl_le_p(&n->bar.intms);
+
 if (msix_enabled(&(n->parent_obj))) {
 return;
 }
-if (~n->bar.intms & n->irq_status) {
+if (~intms & n->irq_status) {
 pci_irq_assert(&n->parent_obj);
 } else {
 pci_irq_deassert(&n->parent_obj);
@@ -1289,7 +1291,7 @@ static void nvme_post_cqes(void *opaque)
 if (ret) {
 trace_pci_nvme_err_addr_write(addr);
 trace_pci_nvme_err_cfs();
-n->bar.csts = NVME_CSTS_FAILED;
+stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
 break;
 }
 QTAILQ_REMOVE(&cq->req_list, req, entry);
@@ -4022,7 +4024,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_sq_sqid(sqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
-if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
+if (unlikely(!qsize || qsize > NVME_CAP_MQES(ldq_le_p(&n->bar.cap {
 trace_pci_nvme_err_invalid_create_sq_size(qsize);
 return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
 }
@@ -4208,7 +4210,7 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t 
csi, uint32_t buf_len,
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-switch (NVME_CC_CSS(n->bar.cc)) {
+switch (NVME_CC_CSS(ldl_le_p(&n->bar.cc))) {
 case NVME_CC_CSS_NVM:
 src_iocs = nvme_cse_iocs_nvm;
 /* fall through */
@@ -4370,7 +4372,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_cq_cqid(cqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
-if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
+if (unlikely(!qsize || qsize > NVME_CAP_MQES(ldq_le_p(&n->bar.cap {
 trace_pci_nvme_err_invalid_create_cq_size(qsize);
 return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
 }
@@ -5163,17 +5165,19 @@ static void nvme_update_dmrsl(NvmeCtrl *n)
 
 static void nvme_select_iocs_ns(NvmeCtrl *n, NvmeNamespace *ns)
 {
+uint32_t cc = ldl_le_p(&n->bar.cc);
+
 ns->iocs = nvme_cse_iocs_none;
 switch (ns->csi) {
 case NVME_CSI_NVM:
-if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+if (NVME_CC_CSS(cc) != NVME_CC_CSS_ADMIN_ONLY) {
 ns->iocs = nvme_cse_iocs_nvm;
 }
 break;
 case NVME_CSI_ZONED:
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
+if (NVME_CC_CSS(cc) == NVME_CC_CSS_CSI) {
 ns->iocs = nvme_cse_iocs_zoned;
-} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
+} else if (NVME_CC_CSS(cc) == NVME_CC_CSS_NVM) {
 ns->iocs = nvme_cse_iocs_nvm;
 }
 break;
@@ -5510,7 +5514,7 @@ static void nvme_process_sq(void *opaque)
 if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
 trace_pci_nvme_err_addr_read(addr);
 trace_pci_nvme_err_cfs();
-n->bar.csts = NVME_CSTS_FAILED;
+stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
 break;
 }
 nvme_inc_sq_head(sq);
@@ -5565,8 +5569,6 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 n->aer_queued = 0;
 n->outstanding_aers = 0;
 n->qs_created = false;
-
-n->bar.cc = 0;
 }
 
 static void nvme_ctrl_shutdown(NvmeCtrl *n)
@@ -5605,7 +5607,12 @@ static void nvme_select_iocs(NvmeCtrl *n)
 
 static int nvme_start_ctrl(NvmeCtrl *n)
 {
-uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12;
+uint64_t cap = ldq_le_p(&n->bar.cap);
+uint32_t cc = ldl_le_p(&n->bar.cc);
+uint32_t aqa = ldl_le_p(&n->bar.aqa);
+uint64_t asq = ldq_le_p(&n->bar.asq);
+uint64_t acq = ldq_le_p(&n->bar.acq);
+uint32_t page_bits = NVME_CC_MPS(cc) + 12;
 uint32_t page_size = 1 << page_bits;
 
 if (unlikely(n->cq[0])) {
@@ -5616,73 +5623,72 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 trace_pci_nvme_err_startfail_sq();
 return -1;
 }
-if (unlikely(!n->bar.asq)) {
+if (unlikely(!asq)) {
 trace_pci_nvme_err_startfail_nbarasq();
 return -1;
 }
-if (unlikely(!n->bar.acq)) {
+if (unlikely(!acq)) {
 trace_pci_nvme_err_startfail_nbaracq();
 return -1;
 }
-if (unlikely(n->bar.asq & (page_size - 1))) {
-trace_pci_nvme_err_startfail_asq_misal

[PATCH v6 3/5] hw/nvme: fix out-of-bounds reads

2021-07-21 Thread Klaus Jensen
From: Klaus Jensen 

Peter noticed that mmio access may read into the NvmeParams member in
the NvmeCtrl struct.

Fix the bounds check.

Reported-by: Peter Maydell 
Signed-off-by: Klaus Jensen 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Peter Maydell 
---
 hw/nvme/ctrl.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 23ff71f65c0e..10c2363c1d4d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5969,23 +5969,26 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr 
addr, unsigned size)
 /* should RAZ, fall through for now */
 }
 
-if (addr < sizeof(n->bar)) {
-/*
- * When PMRWBM bit 1 is set then read from
- * from PMRSTS should ensure prior writes
- * made it to persistent media
- */
-if (addr == NVME_REG_PMRSTS &&
-(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02)) {
-memory_region_msync(&n->pmr.dev->mr, 0, n->pmr.dev->size);
-}
-memcpy(&val, ptr + addr, size);
-} else {
+if (addr > sizeof(n->bar) - size) {
 NVME_GUEST_ERR(pci_nvme_ub_mmiord_invalid_ofs,
"MMIO read beyond last register,"
" offset=0x%"PRIx64", returning 0", addr);
+
+return 0;
 }
 
+/*
+ * When PMRWBM bit 1 is set then read from
+ * from PMRSTS should ensure prior writes
+ * made it to persistent media
+ */
+if (addr == NVME_REG_PMRSTS &&
+(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02)) {
+memory_region_msync(&n->pmr.dev->mr, 0, n->pmr.dev->size);
+}
+
+memcpy(&val, ptr + addr, size);
+
 return val;
 }
 
-- 
2.32.0




[PATCH v6 2/5] hw/nvme: use symbolic names for registers

2021-07-21 Thread Klaus Jensen
From: Klaus Jensen 

Add the NvmeBarRegs enum and use these instead of explicit register
offsets.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 include/block/nvme.h | 29 -
 hw/nvme/ctrl.c   | 44 ++--
 2 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 84053b68b987..77aae0117494 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -9,7 +9,7 @@ typedef struct QEMU_PACKED NvmeBar {
 uint32_tcc;
 uint8_t rsvd24[4];
 uint32_tcsts;
-uint32_tnssrc;
+uint32_tnssr;
 uint32_taqa;
 uint64_tasq;
 uint64_tacq;
@@ -31,6 +31,33 @@ typedef struct QEMU_PACKED NvmeBar {
 uint8_t css[484];
 } NvmeBar;
 
+enum NvmeBarRegs {
+NVME_REG_CAP = offsetof(NvmeBar, cap),
+NVME_REG_VS  = offsetof(NvmeBar, vs),
+NVME_REG_INTMS   = offsetof(NvmeBar, intms),
+NVME_REG_INTMC   = offsetof(NvmeBar, intmc),
+NVME_REG_CC  = offsetof(NvmeBar, cc),
+NVME_REG_CSTS= offsetof(NvmeBar, csts),
+NVME_REG_NSSR= offsetof(NvmeBar, nssr),
+NVME_REG_AQA = offsetof(NvmeBar, aqa),
+NVME_REG_ASQ = offsetof(NvmeBar, asq),
+NVME_REG_ACQ = offsetof(NvmeBar, acq),
+NVME_REG_CMBLOC  = offsetof(NvmeBar, cmbloc),
+NVME_REG_CMBSZ   = offsetof(NvmeBar, cmbsz),
+NVME_REG_BPINFO  = offsetof(NvmeBar, bpinfo),
+NVME_REG_BPRSEL  = offsetof(NvmeBar, bprsel),
+NVME_REG_BPMBL   = offsetof(NvmeBar, bpmbl),
+NVME_REG_CMBMSC  = offsetof(NvmeBar, cmbmsc),
+NVME_REG_CMBSTS  = offsetof(NvmeBar, cmbsts),
+NVME_REG_PMRCAP  = offsetof(NvmeBar, pmrcap),
+NVME_REG_PMRCTL  = offsetof(NvmeBar, pmrctl),
+NVME_REG_PMRSTS  = offsetof(NvmeBar, pmrsts),
+NVME_REG_PMREBS  = offsetof(NvmeBar, pmrebs),
+NVME_REG_PMRSWTP = offsetof(NvmeBar, pmrswtp),
+NVME_REG_PMRMSCL = offsetof(NvmeBar, pmrmscl),
+NVME_REG_PMRMSCU = offsetof(NvmeBar, pmrmscu),
+};
+
 enum NvmeCapShift {
 CAP_MQES_SHIFT = 0,
 CAP_CQR_SHIFT  = 16,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 070d9f6a962d..23ff71f65c0e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5740,7 +5740,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 }
 
 switch (offset) {
-case 0xc:   /* INTMS */
+case NVME_REG_INTMS:
 if (unlikely(msix_enabled(&(n->parent_obj {
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_intmask_with_msix,
"undefined access to interrupt mask set"
@@ -5752,7 +5752,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 trace_pci_nvme_mmio_intm_set(data & 0x, n->bar.intmc);
 nvme_irq_check(n);
 break;
-case 0x10:  /* INTMC */
+case NVME_REG_INTMC:
 if (unlikely(msix_enabled(&(n->parent_obj {
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_intmask_with_msix,
"undefined access to interrupt mask clr"
@@ -5764,7 +5764,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 trace_pci_nvme_mmio_intm_clr(data & 0x, n->bar.intmc);
 nvme_irq_check(n);
 break;
-case 0x14:  /* CC */
+case NVME_REG_CC:
 trace_pci_nvme_mmio_cfg(data & 0x);
 /* Windows first sends data, then sends enable bit */
 if (!NVME_CC_EN(data) && !NVME_CC_EN(n->bar.cc) &&
@@ -5798,7 +5798,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 n->bar.cc = data;
 }
 break;
-case 0x1c:  /* CSTS */
+case NVME_REG_CSTS:
 if (data & (1 << 4)) {
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_ssreset_w1c_unsupported,
"attempted to W1C CSTS.NSSRO"
@@ -5809,7 +5809,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
" of controller status");
 }
 break;
-case 0x20:  /* NSSR */
+case NVME_REG_NSSR:
 if (data == 0x4e564d65) {
 trace_pci_nvme_ub_mmiowr_ssreset_unsupported();
 } else {
@@ -5817,38 +5817,38 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 return;
 }
 break;
-case 0x24:  /* AQA */
+case NVME_REG_AQA:
 n->bar.aqa = data & 0x;
 trace_pci_nvme_mmio_aqattr(data & 0x);
 break;
-case 0x28:  /* ASQ */
+case NVME_REG_ASQ:
 n->bar.asq = size == 8 ? data :
 (n->bar.asq & ~0xULL) | (data & 0x);
 trace_pci_nvme_mmio_asqaddr(data);
 break;
-case 0x2c:  /* ASQ hi */
+case NVME_REG_ASQ + 4:
 n->bar.asq = (n->bar.asq & 0x) | (data << 32);
 trace_pci_nvme_mmio_asqaddr_hi(data, n->bar.asq

[PATCH v6 1/5] hw/nvme: split pmrmsc register into upper and lower

2021-07-21 Thread Klaus Jensen
From: Klaus Jensen 

The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to
make up the 64 bit logical PMRMSC register.

Make it so.

Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 31 ---
 hw/nvme/ctrl.c   | 10 ++
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 527105fafc0b..84053b68b987 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -26,7 +26,8 @@ typedef struct QEMU_PACKED NvmeBar {
 uint32_tpmrsts;
 uint32_tpmrebs;
 uint32_tpmrswtp;
-uint64_tpmrmsc;
+uint32_tpmrmscl;
+uint32_tpmrmscu;
 uint8_t css[484];
 } NvmeBar;
 
@@ -475,25 +476,25 @@ enum NvmePmrswtpMask {
 #define NVME_PMRSWTP_SET_PMRSWTV(pmrswtp, val)   \
 (pmrswtp |= (uint64_t)(val & PMRSWTP_PMRSWTV_MASK) << 
PMRSWTP_PMRSWTV_SHIFT)
 
-enum NvmePmrmscShift {
-PMRMSC_CMSE_SHIFT   = 1,
-PMRMSC_CBA_SHIFT= 12,
+enum NvmePmrmsclShift {
+PMRMSCL_CMSE_SHIFT   = 1,
+PMRMSCL_CBA_SHIFT= 12,
 };
 
-enum NvmePmrmscMask {
-PMRMSC_CMSE_MASK   = 0x1,
-PMRMSC_CBA_MASK= 0xf,
+enum NvmePmrmsclMask {
+PMRMSCL_CMSE_MASK   = 0x1,
+PMRMSCL_CBA_MASK= 0xf,
 };
 
-#define NVME_PMRMSC_CMSE(pmrmsc)\
-((pmrmsc >> PMRMSC_CMSE_SHIFT)   & PMRMSC_CMSE_MASK)
-#define NVME_PMRMSC_CBA(pmrmsc) \
-((pmrmsc >> PMRMSC_CBA_SHIFT)   & PMRMSC_CBA_MASK)
+#define NVME_PMRMSCL_CMSE(pmrmscl)\
+((pmrmscl >> PMRMSCL_CMSE_SHIFT)   & PMRMSCL_CMSE_MASK)
+#define NVME_PMRMSCL_CBA(pmrmscl) \
+((pmrmscl >> PMRMSCL_CBA_SHIFT)   & PMRMSCL_CBA_MASK)
 
-#define NVME_PMRMSC_SET_CMSE(pmrmsc, val)   \
-(pmrmsc |= (uint64_t)(val & PMRMSC_CMSE_MASK) << PMRMSC_CMSE_SHIFT)
-#define NVME_PMRMSC_SET_CBA(pmrmsc, val)   \
-(pmrmsc |= (uint64_t)(val & PMRMSC_CBA_MASK) << PMRMSC_CBA_SHIFT)
+#define NVME_PMRMSCL_SET_CMSE(pmrmscl, val)   \
+(pmrmscl |= (uint32_t)(val & PMRMSCL_CMSE_MASK) << PMRMSCL_CMSE_SHIFT)
+#define NVME_PMRMSCL_SET_CBA(pmrmscl, val)   \
+(pmrmscl |= (uint32_t)(val & PMRMSCL_CBA_MASK) << PMRMSCL_CBA_SHIFT)
 
 enum NvmeSglDescriptorType {
 NVME_SGL_DESCR_TYPE_DATA_BLOCK  = 0x0,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2f0524e12a36..070d9f6a962d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5916,11 +5916,13 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 return;
 }
 
-n->bar.pmrmsc = (n->bar.pmrmsc & ~0x) | (data & 0x);
+n->bar.pmrmscl = data;
 n->pmr.cmse = false;
 
-if (NVME_PMRMSC_CMSE(n->bar.pmrmsc)) {
-hwaddr cba = NVME_PMRMSC_CBA(n->bar.pmrmsc) << PMRMSC_CBA_SHIFT;
+if (NVME_PMRMSCL_CMSE(n->bar.pmrmscl)) {
+uint64_t pmrmscu = n->bar.pmrmscu;
+hwaddr cba = (pmrmscu << 32) |
+(NVME_PMRMSCL_CBA(n->bar.pmrmscl) << PMRMSCL_CBA_SHIFT);
 if (cba + int128_get64(n->pmr.dev->mr.size) < cba) {
 NVME_PMRSTS_SET_CBAI(n->bar.pmrsts, 1);
 return;
@@ -5936,7 +5938,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 return;
 }
 
-n->bar.pmrmsc = (n->bar.pmrmsc & 0x) | (data << 32);
+n->bar.pmrmscu = data;
 return;
 default:
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_invalid,
-- 
2.32.0




[PATCH v6 0/5] hw/nvme: fix mmio read

2021-07-21 Thread Klaus Jensen
From: Klaus Jensen 

Fix mmio read issues on big-endian hosts. The core issue is that values
in the BAR is not stored in little endian as required.

Fix that and add a regression test for this. This required a bit of
cleanup, so it blew up into a series.

v6:
  * "hw/nvme: split pmrmsc register into upper and lower"
- Remove unnecessary masking (Peter)
- Missing shift (Peter)

v5:
  * "hw/nvme: fix mmio read"
- Hurried the changes a bit. Fixed.

v4:
  * "hw/nvme: split pmrmsc register into upper and lower"
- Fix missing left-shift (Peter)

  * "hw/nvme: fix mmio read"
- Remove unnecessary masking (Peter)
- Keep existing behaviour and do not zero the register fields doing
  initialization (Peter)

v3:

  * "hw/nvme: use symbolic names for registers"
Use offsetof(NvmeBar, reg) instead of explicit offsets (Philippe)

  * "hw/nvme: fix mmio read"
Use the st/ld API instead of cpu_to_X (Philippe)

Klaus Jensen (5):
  hw/nvme: split pmrmsc register into upper and lower
  hw/nvme: use symbolic names for registers
  hw/nvme: fix out-of-bounds reads
  hw/nvme: fix mmio read
  tests/qtest/nvme-test: add mmio read test

 include/block/nvme.h|  60 +--
 hw/nvme/ctrl.c  | 352 ++--
 tests/qtest/nvme-test.c |  26 +++
 3 files changed, 265 insertions(+), 173 deletions(-)

-- 
2.32.0