date:20200505

Re: backing chain & block status & filters

2020-05-05 Thread Vladimir Sementsov-Ogievskiy


01.05.2020 6:04, Andrey Shinkevich wrote:

Sounds good to me generally.
Also, we need to identify the filter by its node name when the file names of a 
node and of the filter above it are the same. And what about an automatically 
generated node name for the filter? We will want to pass it to the stream 
routine.



As I understand it, there should be no auto-generated node names in the blockdev 
era. libvirt names all nodes and has full control over them.

--
Best regards,
Vladimir

Re: [PATCH v4 0/5] block-copy: use aio-task-pool

2020-05-05 Thread Vladimir Sementsov-Ogievskiy


05.05.2020 14:06, Max Reitz wrote:

On 29.04.20 15:08, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

v4
01: add Max's r-b
04: move variable definition to the top of the block, add Max's r-b
05: - change error-codes in block_copy_task_run(), document them
   and be more accurate about error code in block_copy_dirty_clusters().
 - s/g_free(aio)/aio_task_pool_free(aio)/


Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block



Thank you!


--
Best regards,
Vladimir

Re: [PATCH v5 19/31] qcow2: Add subcluster support to calculate_l2_meta()

2020-05-05 Thread Eric Blake


On 5/5/20 12:38 PM, Alberto Garcia wrote:

If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

1) If we are writing to a newly allocated cluster then we need
copy-on-write. The previous contents of subclusters #0 to #3 must
be copied to the new cluster. We can optimize this process by
skipping all leading unallocated or zero subclusters (the status of
those skipped subclusters will be reflected in the new L2 bitmap).

2) If we are overwriting an existing cluster:

2.1) If subcluster #3 is unallocated or has the all-zeroes bit set
 then we need copy-on-write (on subcluster #3 only).

2.2) If subcluster #3 was already allocated then there is no need
 for any copy-on-write. However we still need to update the L2
 bitmap to reflect possible changes in the allocation status of
 subclusters #4 to #31. Because of this, this function checks
 if all the overwritten subclusters are already allocated and
 in this case it returns without creating a new QCowL2Meta
 structure.

After all these changes l2meta_cow_start() and l2meta_cow_end()
are not necessarily cluster-aligned anymore. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write on the same cluster.

Signed-off-by: Alberto Garcia 
---
  block/qcow2-cluster.c | 174 +++---
  1 file changed, 146 insertions(+), 28 deletions(-)




-/* Return if there's no COW (all clusters are normal and we keep them) */
+/* Return if there's no COW (all subclusters are normal and we are
+ * keeping the clusters) */
  if (keep_old) {
+unsigned first_sc = cow_start_to / s->subcluster_size;
+unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
  int i;
-for (i = 0; i < nb_clusters; i++) {
-l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+for (i = first_sc; i <= last_sc; i++) {
+unsigned c = i / s->subclusters_per_cluster;
+unsigned sc = i % s->subclusters_per_cluster;
+l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+if (type == QCOW2_SUBCLUSTER_INVALID) {
+l2_index += c; /* Point to the invalid entry */
+goto fail;
+}
+if (type != QCOW2_SUBCLUSTER_NORMAL) {
  break;
  }
  }


This loop is now 32 times slower (for extended L2 entries).  Do you 
really need to check for an invalid subcluster here, or can we just 
blindly check that all 32 subclusters are NORMAL, and leave handling of 
invalid clusters for the rest of the function after we failed the 
exit-early test?  For that matter, for all but the first and last 
cluster, checking if 32 subclusters are NORMAL is a simple 64-bit 
comparison rather than 32 iterations of a loop; and even for the first 
and last cluster, the _RANGE macros in 14/31 work well to mask out which 
bits must be set/cleared.  My guess is that optimizing this loop is 
worthwhile, since overwriting existing data is probably more common than 
allocating new data.
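
The masked comparison suggested above could look roughly like the sketch below. It reuses the _RANGE macros as posted in 14/31; the helper name sc_range_all_normal() is hypothetical and not part of the series. It assumes the L2 entry already describes a normal (uncompressed, allocated) cluster, and that a NORMAL subcluster has its allocation bit set and its "reads as zeroes" bit clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Macros as posted in patch 14/31 of this series. */
#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
#define QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) \
    (QCOW_OFLAG_SUB_ALLOC((Y) + 1) - QCOW_OFLAG_SUB_ALLOC(X))
#define QCOW_OFLAG_SUB_ZERO_RANGE(X, Y) \
    (QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) << 32)

/*
 * Hypothetical helper: true if every subcluster in [first, last] of one
 * cluster is NORMAL, i.e. allocation bit set and zero bit clear.
 * One masked 64-bit comparison replaces up to 32 loop iterations.
 */
bool sc_range_all_normal(uint64_t l2_bitmap, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);

    uint64_t alloc = QCOW_OFLAG_SUB_ALLOC_RANGE(first, last);
    uint64_t zero = QCOW_OFLAG_SUB_ZERO_RANGE(first, last);

    /* Within the range: all allocation bits set, no zero bits set. */
    return (l2_bitmap & (alloc | zero)) == alloc;
}
```

For clusters in the middle of a request the range is simply 0..31; only the first and last cluster need a narrower range. Detecting invalid entries would still be left to the code after the early-exit test.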



-if (i == nb_clusters) {
-return;
+if (i == last_sc + 1) {
+return 0;
  }
  }


If we get here, then i is now the address of the first subcluster that 
was not NORMAL, even if it is much smaller than the final subcluster 
learned by nb_clusters for the overall request.  [1]


  
  /* Get the L2 entry of the first cluster */

  l2_entry = get_l2_entry(s, l2_slice, l2_index);
-type = qcow2_get_cluster_type(bs, l2_entry);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+sc_index = offset_to_sc_index(s, guest_offset);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
  
-if (type == QCOW2_CLUSTER_NORMAL && keep_old) {

-cow_start_from = cow_start_to;
+if (type == QCOW2_SUBCLUSTER_INVALID) {
+goto fail;
+}
+
+if (!keep_old) {
+switch (type) {
+case QCOW2_SUBCLUSTER_COMPRESSED:
+cow_start_from = 0;
+break;
+case QCOW2_SUBCLUSTER_NORMAL:
+case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC: {
+int i;
+/* Skip all leading zero and unallocated subclusters */
+for (i = 0; i < sc_index; i++) {
+QCow2SubclusterType t;
+t = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, i);
+if (t == QCOW2_SUBCLUSTER_INVALID) {
+goto fail;
+

Re: [PATCH v2 6/6] iotests: Add test 291 to for qemu-img bitmap coverage

2020-05-05 Thread Eric Blake


On 5/4/20 8:05 AM, Max Reitz wrote:

On 21.04.20 23:20, Eric Blake wrote:

Add a new test covering the 'qemu-img bitmap' subcommand, as well as
'qemu-img convert --bitmaps', both added in recent patches.

Signed-off-by: Eric Blake 



+echo
+echo "=== Bitmap preservation not possible to non-qcow2 ==="
+echo
+
+mv "$TEST_IMG" "$TEST_IMG.orig"


“mv” doesn’t work for images with external data files.

(ORIG_IMG=$TEST_IMG; TEST_IMG="$TEST_IMG".orig should work)


Good idea.




+$QEMU_IMG convert --bitmaps -O raw "$TEST_IMG.orig" "$TEST_IMG"
+
+echo
+echo "=== Convert with bitmap preservation ==="
+echo
+
+# Only bitmaps from the active layer are copied


That’s kind of obvious when you think about it (whenever an image is
attached to a VM, only the active layer’s bitmaps are visible, not those
from the backing chain), but maybe this should be noted in the
documentation?


As part of integrating bitmaps with external snapshots, libvirt actually 
depends on being able to see bitmaps from the backing chain - but as 
bitmaps are always referenced as a 'node name, bitmap name' tuple, this 
is indeed doable.





+$QEMU_IMG convert --bitmaps -O qcow2 "$TEST_IMG.orig" "$TEST_IMG"
+$QEMU_IMG info "$TEST_IMG" | _filter_img_info --format-specific
+# But we can also merge in bitmaps from other layers
+$QEMU_IMG bitmap --add --disabled -f $IMGFMT "$TEST_IMG" b0
+$QEMU_IMG bitmap --add -f $IMGFMT "$TEST_IMG" tmp
+$QEMU_IMG bitmap --merge b0 -b "$TEST_IMG.base" -F $IMGFMT "$TEST_IMG" tmp
+$QEMU_IMG bitmap --merge tmp "$TEST_IMG" b0
+$QEMU_IMG bitmap --remove -f $IMGFMT "$TEST_IMG" tmp


Why do we need tmp here?  Can’t we just merge base’s b0 directly into
$TEST_IMG’s b0?


Yes, we could.  But then I wouldn't cover as many bitmap subcommands. 
Adding a comment about why the example is contrived (for maximal 
coverage) is a good idea.




+=== Check bitmap contents ===
+
+[{ "start": 0, "length": 3145728, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET},
+{ "start": 3145728, "length": 1048576, "depth": 0, "zero": false, "data": 
false},
+{ "start": 4194304, "length": 6291456, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
+[{ "start": 0, "length": 1048576, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET},
+{ "start": 1048576, "length": 1048576, "depth": 0, "zero": false, "data": 
false},
+{ "start": 2097152, "length": 8388608, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
+[{ "start": 0, "length": 2097152, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET},
+{ "start": 2097152, "length": 1048576, "depth": 0, "zero": false, "data": 
false},
+{ "start": 3145728, "length": 7340032, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]


Am I looking at this wrong or does the bitmap data seem to be inverted?
  Everywhere where I’d expect the bitmaps to be cleared, this map reports
data=true, whereas where I’d expect them to be set, it reports data=false.

I suppose that’s intentional, but can you explain this behavior to me?


This is an artifact of how x-dirty-bitmap works (it has the x- prefix 
because it is a hack, but we don't have anything better for reading out 
bitmap contents).  The NBD spec returns block status as a 32-bit value 
for a 'metadata context'; normally, we use the 'base:allocation' 
context, where bit 0 is set for holes or clear for allocated, and bit 1 
is set for reads-as-zero or clear for unknown contents (favoring all-0 
as the most-common case).  But with x-dirty-bitmap, we are instead 
abusing NBD to query the 'qemu:dirty-bitmap:FOO' context, where bit 0 is 
set for anywhere the bitmap has a 1, yet feed that information into the 
pre-existing qemu code for handling block status.  So qemu-img map is 
reporting "data":true for what it thinks is the normal 0-for-allocated, 
and "data":false for 1-for-sparse, and we just have to translate that 
back into an understanding of what the bitmap reported.  Yes, a comment 
would be helpful.
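
A minimal sketch of the inversion described above. The flag names and bit positions are the ones used by the NBD protocol and QEMU for these two contexts; the helper functions are hypothetical and only illustrate the translation, they are not qemu-img code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBD_STATE_HOLE   (1u << 0)  /* base:allocation: extent is a hole */
#define NBD_STATE_ZERO   (1u << 1)  /* base:allocation: reads as zero */
#define NBD_STATE_DIRTY  (1u << 0)  /* qemu:dirty-bitmap:FOO: bit is set */

/* How a consumer that expects base:allocation derives the "data" field. */
bool map_data_field(uint32_t flags)
{
    return !(flags & NBD_STATE_HOLE);
}

/*
 * Hypothetical translation step: with x-dirty-bitmap, bit 0 carries the
 * bitmap bit instead of the hole bit, so "data": false means dirty.
 */
bool dirty_from_map_data(bool data)
{
    return !data;
}

/* Walk through both cases of a dirty-bitmap reply. */
void x_dirty_bitmap_demo(void)
{
    assert(map_data_field(NBD_STATE_DIRTY) == false); /* dirty -> "data": false */
    assert(map_data_field(0) == true);                /* clean -> "data": true */
    assert(dirty_from_map_data(map_data_field(NBD_STATE_DIRTY)));
}
```

Because NBD_STATE_DIRTY and NBD_STATE_HOLE share bit 0, the existing block-status code cannot tell them apart, which is why the test output has to be read with the mapping reversed.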


I would _really_ love to enhance 'qemu-img map' to output image-specific 
metadata _in addition_ to the existing "zero" and "data" fields (by 
having qemu-img read two NBD contexts at once: both base:allocation and 
qemu:dirty-bitmap:FOO), at which point we can drop the x- prefix and 
avoid the abuse of qemu's internals by overwriting the block_status 
code.  But that's a bigger project.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v5 15/31] qcow2: Add qcow2_cluster_is_allocated()

2020-05-05 Thread Eric Blake


On 5/5/20 12:38 PM, Alberto Garcia wrote:

This helper function tells us if a cluster is allocated (that is,
there is an associated host offset for it).

Signed-off-by: Alberto Garcia 
---
  block/qcow2.h | 6 ++
  1 file changed, 6 insertions(+)


Reviewed-by: Eric Blake 



diff --git a/block/qcow2.h b/block/qcow2.h
index be7816a3b8..b5db8d2f36 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -763,6 +763,12 @@ QCow2SubclusterType 
qcow2_get_subcluster_type(BlockDriverState *bs,
  }
  }
  
+static inline bool qcow2_cluster_is_allocated(QCow2ClusterType type)

+{
+return (type == QCOW2_CLUSTER_COMPRESSED || type == QCOW2_CLUSTER_NORMAL ||
+type == QCOW2_CLUSTER_ZERO_ALLOC);
+}
+
  /* Check whether refcounts are eager or lazy */
  static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
  {



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v5 14/31] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()

2020-05-05 Thread Eric Blake


On 5/5/20 12:38 PM, Alberto Garcia wrote:

This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have
the same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

   a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
  compressed. We do not support compression at the subcluster
  level.

   b) There are two different values for unallocated subclusters:
  QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
  cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  which means that the cluster is allocated but the subcluster is
  not. The latter can only happen in images with extended L2
  entries.


Or put differently, extents of the qcow2 file are always allocated a 
contiguous cluster at a time (so using larger clusters reduces 
fragmentation), but because we can now defer to the backing image a 
sub-cluster at a time, we have less I/O to perform the first time the 
guest touches a subcluster.  The two different return values thus tell 
us when we need to do a cluster allocation vs. just an in-place 
overwrite or a sub-cluster COW.
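
As a rough illustration of that distinction, the sketch below maps a subcluster type to the work a partial write would need. The enum and function names are hypothetical simplifications that mirror a few QCOW2_SUBCLUSTER_* values; compressed, zero-plain and invalid entries are deliberately left out, and this is not how the series structures its code.

```c
#include <assert.h>

/* Illustrative mirror of a few QCOW2_SUBCLUSTER_* values; not QEMU's enum. */
enum sc_type {
    SC_UNALLOCATED_PLAIN,  /* whole cluster is unallocated */
    SC_UNALLOCATED_ALLOC,  /* cluster is allocated, this subcluster is not */
    SC_ZERO_ALLOC,         /* cluster is allocated, subcluster reads as zeroes */
    SC_NORMAL,             /* cluster and subcluster are both allocated */
};

enum write_action {
    ACT_ALLOCATE_CLUSTER,  /* a new cluster must be allocated first */
    ACT_SUBCLUSTER_COW,    /* cluster exists; fill in only this subcluster */
    ACT_IN_PLACE,          /* plain overwrite, no copy-on-write */
};

/* Assumed scenario: a write that covers only part of one subcluster. */
enum write_action write_action_for(enum sc_type type)
{
    switch (type) {
    case SC_UNALLOCATED_PLAIN:
        return ACT_ALLOCATE_CLUSTER;
    case SC_UNALLOCATED_ALLOC:
    case SC_ZERO_ALLOC:
        return ACT_SUBCLUSTER_COW;
    case SC_NORMAL:
        return ACT_IN_PLACE;
    }
    assert(!"unhandled subcluster type");
    return ACT_ALLOCATE_CLUSTER;
}
```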




   c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
  entry has a value that violates the specification. The caller is
  responsible for handling these situations.

  To prevent compatibility problems with images that have invalid
  values but are currently being read by QEMU without causing side
  effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
  with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.

Signed-off-by: Alberto Garcia 
---
  block/qcow2.h | 127 +-
  1 file changed, 126 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4ad93772b9..be7816a3b8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,21 @@
  
  #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
  
+/* The subcluster X [0..31] is allocated */

+#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)(QCOW_OFLAG_SUB_ALLOC(X) << 32)
+/* Subclusters X to Y (both included) are allocated */
+#define QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC((Y) + 1) - QCOW_OFLAG_SUB_ALLOC(X))


Nicer than my initial thoughts on getting rid of the bit-wise loop.  And 
uses 64-bit math to produce a 32-bit answer, so there are no edge cases 
where overflow could misbehave even though the intermediate steps may 
require 33 bits.  Works as long as X <= Y (should that be mentioned in 
the contract?)



+/* Subclusters X to Y (both included) read as zeroes */
+#define QCOW_OFLAG_SUB_ZERO_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) << 32)


Also works (you do the math in the low 33 bits before shifting), again 
if X <= Y.



+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  (QCOW_OFLAG_SUB_ALLOC_RANGE(0, 31))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES (QCOW_OFLAG_SUB_ZERO_RANGE(0, 31))


More complicated than merely writing 0xffffffffULL and 
(0xffffffffULL<<32), but the compiler will constant-fold it to the same 
value, and it elegantly expresses the intent.  I like it.
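
The arithmetic behind these remarks can be checked with a small standalone sketch. The macros are copied from the patch; the wrapper functions sub_alloc_range() and sub_zero_range() are hypothetical and exist only to state the X <= Y precondition explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Macros copied from the patch under review. */
#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
#define QCOW_OFLAG_SUB_ZERO(X)    (QCOW_OFLAG_SUB_ALLOC(X) << 32)
#define QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) \
    (QCOW_OFLAG_SUB_ALLOC((Y) + 1) - QCOW_OFLAG_SUB_ALLOC(X))
#define QCOW_OFLAG_SUB_ZERO_RANGE(X, Y) \
    (QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) << 32)
#define QCOW_L2_BITMAP_ALL_ALLOC  (QCOW_OFLAG_SUB_ALLOC_RANGE(0, 31))
#define QCOW_L2_BITMAP_ALL_ZEROES (QCOW_OFLAG_SUB_ZERO_RANGE(0, 31))

/* Both constants fold at compile time to the plain hex values. */
static_assert(QCOW_L2_BITMAP_ALL_ALLOC == 0xffffffffULL,
              "all 32 allocation bits");
static_assert(QCOW_L2_BITMAP_ALL_ZEROES == (0xffffffffULL << 32),
              "all 32 zero bits");

/* Hypothetical wrappers that make the X <= Y contract explicit. */
uint64_t sub_alloc_range(unsigned x, unsigned y)
{
    assert(x <= y && y <= 31);
    /* For y == 31 the intermediate 1ULL << 32 needs 33 bits: fine in 64. */
    return QCOW_OFLAG_SUB_ALLOC_RANGE(x, y);
}

uint64_t sub_zero_range(unsigned x, unsigned y)
{
    assert(x <= y && y <= 31);
    return QCOW_OFLAG_SUB_ZERO_RANGE(x, y);
}
```

With X > Y the subtraction wraps around in unsigned 64-bit arithmetic and sets unrelated high bits, which is why documenting the precondition seems worthwhile.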


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v4 00/14] LUKS: encryption slot management using amend interface

2020-05-05 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200505200819.5662-1-mlevi...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200505200819.5662-1-mlevi...@redhat.com
Subject: [PATCH v4 00/14] LUKS: encryption slot management using amend interface
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
a66c1e0 iotests: add tests for blockdev-amend
d8564bf block/qcow2: implement blockdev-amend
572ee9d block/crypto: implement blockdev-amend
5f988ac block/core: add generic infrastructure for x-blockdev-amend qmp command
92daa6a iotests: qemu-img tests for luks key management
621c9e3 iotests: filter few more luks specific create options
d465502 block/qcow2: extend qemu-img amend interface with crypto options
86e6207 block/crypto: implement the encryption key management
0f8bfce block/crypto: rename two functions
85542a6 block/amend: refactor qcow2 amend options
f0f6efb block/amend: separate amend and create options for qemu-img
771c331 block/amend: add 'force' option
29c7c9c qcrypto/luks: implement encryption key management
97f7f25 qcrypto/core: add generic infrastructure for crypto options amendment

=== OUTPUT BEGIN ===
1/14 Checking commit 97f7f254b229 (qcrypto/core: add generic infrastructure for 
crypto options amendment)
2/14 Checking commit 29c7c9cc9ace (qcrypto/luks: implement encryption key 
management)
3/14 Checking commit 771c331f6773 (block/amend: add 'force' option)
4/14 Checking commit f0f6efbb1ed3 (block/amend: separate amend and create 
options for qemu-img)
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#31: FILE: block/qcow2.c:5498:
+#define QCOW_COMMON_OPTIONS \
+{   \
+.name = BLOCK_OPT_SIZE, \
+.type = QEMU_OPT_SIZE,  \
+.help = "Virtual disk size" \
+},  \
+{   \
+.name = BLOCK_OPT_COMPAT_LEVEL, \
+.type = QEMU_OPT_STRING,\
+.help = "Compatibility level (v2 [0.10] or v3 [1.1])"   \
+},  \
+{   \
+.name = BLOCK_OPT_BACKING_FILE, \
+.type = QEMU_OPT_STRING,\
+.help = "File name of a base image" \
+},  \
+{   \
+.name = BLOCK_OPT_BACKING_FMT,  \
+.type = QEMU_OPT_STRING,\
+.help = "Image format of the base image"\
+},  \
+{   \
+.name = BLOCK_OPT_DATA_FILE,\
+.type = QEMU_OPT_STRING,\
+.help = "File name of an external data file"\
+},  \
+{   \
+.name = BLOCK_OPT_DATA_FILE_RAW,\
+.type = QEMU_OPT_BOOL,  \
+.help = "The external data file must stay valid "   \
+"as a raw image"\
+},  \
+{   \
+.name = BLOCK_OPT_ENCRYPT,  \
+.type = QEMU_OPT_BOOL,  \
+.help = "Encrypt the image with format 'aes'. (Deprecated " \
+"in favor of " BLOCK_OPT_ENCRYPT_FORMAT "=aes)",\
+},  \
+{   \
+.name = BLOCK_OPT_ENCRYPT_FORMAT,   \
+.type = QEMU_OPT_STRING,\
+.help = "Encrypt the image, format choices: 'aes', 'luks'", \
+},  \
+

Re: [PATCH v5 07/18] nvme: add max_ioqpairs device parameter

2020-05-05 Thread Klaus Jensen

On May  5 07:48, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The num_queues device parameter has a slightly confusing meaning because
> it accounts for the admin queue pair which is not really optional.
> Secondly, it is really a maximum value of queues allowed.
> 
> Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
> but keep num_queues for compatibility.
> 
> Signed-off-by: Klaus Jensen 
> Reviewed-by: Maxim Levitsky 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Keith Busch 
> ---
>  hw/block/nvme.c | 51 ++---
>  hw/block/nvme.h |  3 ++-
>  2 files changed, 33 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 623a88be93dc..3875a5f3dcbf 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1571,7 +1581,8 @@ static Property nvme_props[] = {
>   HostMemoryBackend *),
>  DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
>  DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
> -DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 64),
> +DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
> +DEFINE_PROP_UINT32("max_ioqpairs", NvmeCtrl, params.max_ioqpairs, 64),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  

I noticed that this default of 64 makes the default configuration
unsafe by allowing the cq->cqid < 64 assert in nvme_irq_{,de}assert() to
trigger if the pin-based interrupt logic is used (under SPDK for
instance). The assert protects against undefined behavior caused by
shifting by more than 63 into the 64 bit irq_status variable.

As far as I can tell, the assert, the shift and the size of the
irq_status variable are bogus, so I posted a patch for this in
"hw/block/nvme: fixes for interrupt behavior". Preferably that should go
in before this series.
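
To illustrate the shift problem: in C the literal 1 has type int, so `1 << n` is evaluated in int width no matter how wide the destination variable is. The sketch below shows one bounds-checked way to set and clear a bit in a 32-bit status word. The helper names are hypothetical and this is not the code from the posted fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: set bit `vector` in a 32-bit status word.
 * Returns false instead of shifting when the bit does not exist.
 */
bool irq_status_set(uint32_t *status, unsigned vector)
{
    assert(status);
    if (vector >= 32) {
        return false;  /* shifting a 32-bit value by >= 32 is undefined */
    }
    /* Unsigned constant, so bit 31 is well defined as well. */
    *status |= UINT32_C(1) << vector;
    return true;
}

/* Hypothetical counterpart that clears the bit again. */
bool irq_status_clear(uint32_t *status, unsigned vector)
{
    assert(status);
    if (vector >= 32) {
        return false;
    }
    *status &= ~(UINT32_C(1) << vector);
    return true;
}
```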

[PATCH 2/2] hw/block/nvme: allow use of any valid msix vector

2020-05-05 Thread Klaus Jensen

From: Klaus Jensen 

If the device uses MSI-X, any of the 2048 MSI-X interrupt vectors are
valid. If the device is not using MSI-X, vector will and can only be
zero at this point.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 2b072bbb21e7..755ced8b03fb 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -646,7 +646,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
 trace_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
-if (unlikely(vector > n->num_queues)) {
+if (unlikely(vector > PCI_MSIX_FLAGS_QSIZE)) {
 trace_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
-- 
2.26.2
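
Taken together with patch 1/2, the vector validation in nvme_create_cq() amounts to the rule sketched below. The function cq_vector_valid() is a hypothetical standalone restatement, not code from the patch; the constant is defined locally with the value it has in the PCI register headers (an 11-bit table-size mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the MSI-X table-size mask from the PCI register headers. */
#define PCI_MSIX_FLAGS_QSIZE 0x07FF

/* The mask covers vector numbers 0..2047, i.e. 2048 table entries. */
static_assert(PCI_MSIX_FLAGS_QSIZE + 1 == 2048, "2048 MSI-X vectors");

/* Hypothetical restatement of the checks after both patches. */
bool cq_vector_valid(bool msix_enabled, uint16_t vector)
{
    if (!msix_enabled) {
        /* Pin-based interrupts: only vector 0 makes sense. */
        return vector == 0;
    }
    /* MSI-X: any vector that fits in the table is acceptable. */
    return vector <= PCI_MSIX_FLAGS_QSIZE;
}
```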

[PATCH 1/2] hw/block/nvme: fix pin-based interrupt behavior

2020-05-05 Thread Klaus Jensen

From: Klaus Jensen 

First, since the device only supports MSI-X or pin-based interrupts, if
MSI-X is not enabled, it should not accept interrupt vectors different
from 0 when creating completion queues.

Secondly, the irq_status NvmeCtrl member is meant to be compared to the
INTMS register, so it should only be 32 bits wide. And it is really only
useful when used with multi-message MSI.

Third, since we do not force a 1-to-1 correspondence between cqid and
interrupt vector, the irq_status register should not have bits set
according to cqid, but according to the associated interrupt vector.

Fix these issues, but keep irq_status available so we can easily support
multi-message MSI down the line.

Fixes: 5e9aa92eb1a5 ("hw/block: Fix pin-based interrupt behaviour of NVMe")
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 12 
 hw/block/nvme.h |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9b453423cf2c..2b072bbb21e7 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -127,8 +127,8 @@ static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 msix_notify(&(n->parent_obj), cq->vector);
 } else {
 trace_nvme_irq_pin();
-assert(cq->cqid < 64);
-n->irq_status |= 1 << cq->cqid;
+assert(cq->vector < 32);
+n->irq_status |= 1 << cq->vector;
 nvme_irq_check(n);
 }
 } else {
@@ -142,8 +142,8 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
 if (msix_enabled(&(n->parent_obj))) {
 return;
 } else {
-assert(cq->cqid < 64);
-n->irq_status &= ~(1 << cq->cqid);
+assert(cq->vector < 32);
+n->irq_status &= ~(1 << cq->vector);
 nvme_irq_check(n);
 }
 }
@@ -642,6 +642,10 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
 trace_nvme_err_invalid_create_cq_addr(prp1);
 return NVME_INVALID_FIELD | NVME_DNR;
 }
+if (unlikely(!msix_enabled(&n->parent_obj) && vector)) {
+trace_nvme_err_invalid_create_cq_vector(vector);
+return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
+}
 if (unlikely(vector > n->num_queues)) {
 trace_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 6520a9f0bead..db62589247d7 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -78,7 +78,7 @@ typedef struct NvmeCtrl {
 uint32_t    cmbsz;
 uint32_t    cmbloc;
 uint8_t     *cmbuf;
-uint64_t    irq_status;
+uint32_t    irq_status;
 uint64_t    host_timestamp;                 /* Timestamp sent by the host */
 uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
 
-- 
2.26.2

[PATCH 0/2] hw/block/nvme: fixes for interrupt behavior

2020-05-05 Thread Klaus Jensen

From: Klaus Jensen 

Klaus Jensen (2):
  hw/block/nvme: fix pin-based interrupt behavior
  hw/block/nvme: allow use of any valid msix vector

 hw/block/nvme.c | 14 +-
 hw/block/nvme.h |  2 +-
 2 files changed, 10 insertions(+), 6 deletions(-)

-- 
2.26.2

[PATCH v4 12/14] block/crypto: implement blockdev-amend

2020-05-05 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/crypto.c   | 72 
 qapi/block-core.json | 14 -
 2 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index b71e57f777..d7725df79e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -775,32 +775,21 @@ block_crypto_get_specific_info_luks(BlockDriverState *bs, 
Error **errp)
 }
 
 static int
-block_crypto_amend_options_luks(BlockDriverState *bs,
-QemuOpts *opts,
-BlockDriverAmendStatusCB *status_cb,
-void *cb_opaque,
-bool force,
-Error **errp)
+block_crypto_amend_options_generic_luks(BlockDriverState *bs,
+QCryptoBlockAmendOptions 
*amend_options,
+bool force,
+Error **errp)
 {
 BlockCrypto *crypto = bs->opaque;
-QDict *cryptoopts = NULL;
-QCryptoBlockAmendOptions *amend_options = NULL;
 int ret;
 
 assert(crypto);
 assert(crypto->block);
-crypto->updating_keys = true;
 
+/* apply for exclusive read/write permissions to the underlying file*/
+crypto->updating_keys = true;
 ret = bdrv_child_refresh_perms(bs, bs->file, errp);
-if (ret < 0) {
-goto cleanup;
-}
-
-cryptoopts = qemu_opts_to_qdict(opts, NULL);
-qdict_put_str(cryptoopts, "format", "luks");
-amend_options = block_crypto_amend_opts_init(cryptoopts, errp);
-if (!amend_options) {
-ret = -EINVAL;
+if (ret) {
 goto cleanup;
 }
 
@@ -812,13 +801,57 @@ block_crypto_amend_options_luks(BlockDriverState *bs,
   force,
   errp);
 cleanup:
+/* release exclusive read/write permissions to the underlying file*/
 crypto->updating_keys = false;
 bdrv_child_refresh_perms(bs, bs->file, errp);
-qapi_free_QCryptoBlockAmendOptions(amend_options);
+return ret;
+}
+
+static int
+block_crypto_amend_options_luks(BlockDriverState *bs,
+QemuOpts *opts,
+BlockDriverAmendStatusCB *status_cb,
+void *cb_opaque,
+bool force,
+Error **errp)
+{
+BlockCrypto *crypto = bs->opaque;
+QDict *cryptoopts = NULL;
+QCryptoBlockAmendOptions *amend_options = NULL;
+int ret = -EINVAL;
+
+assert(crypto);
+assert(crypto->block);
+
+cryptoopts = qemu_opts_to_qdict(opts, NULL);
+qdict_put_str(cryptoopts, "format", "luks");
+amend_options = block_crypto_amend_opts_init(cryptoopts, errp);
 qobject_unref(cryptoopts);
+if (!amend_options) {
+goto cleanup;
+}
+ret = block_crypto_amend_options_generic_luks(bs, amend_options,
+  force, errp);
+cleanup:
+qapi_free_QCryptoBlockAmendOptions(amend_options);
 return ret;
 }
 
+static int
+coroutine_fn block_crypto_co_amend_luks(BlockDriverState *bs,
+BlockdevAmendOptions *opts,
+bool force,
+Error **errp)
+{
+QCryptoBlockAmendOptions amend_opts;
+
+amend_opts = (QCryptoBlockAmendOptions) {
+.format = Q_CRYPTO_BLOCK_FORMAT_LUKS,
+.u.luks = *qapi_BlockdevAmendOptionsLUKS_base(&opts->u.luks),
+};
+return block_crypto_amend_options_generic_luks(bs, &amend_opts,
+   force, errp);
+}
 
 static void
 block_crypto_child_perms(BlockDriverState *bs, BdrvChild *c,
@@ -891,6 +924,7 @@ static BlockDriver bdrv_crypto_luks = {
 .bdrv_get_info  = block_crypto_get_info_luks,
 .bdrv_get_specific_info = block_crypto_get_specific_info_luks,
 .bdrv_amend_options = block_crypto_amend_options_luks,
+.bdrv_co_amend  = block_crypto_co_amend_luks,
 
 .strong_runtime_opts = block_crypto_strong_runtime_opts,
 };
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5b9123c15f..a5f679ac17 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4649,6 +4649,18 @@
   'data': { 'job-id': 'str',
 'options': 'BlockdevCreateOptions' } }
 
+##
+# @BlockdevAmendOptionsLUKS:
+#
+# Driver specific image amend options for LUKS.
+#
+# Since: 5.0
+##
+{ 'struct': 'BlockdevAmendOptionsLUKS',
+  'base': 'QCryptoBlockAmendOptionsLUKS',
+  'data': { }
+}
+
 ##
 # @BlockdevAmendOptions:
 #
@@ -4663,7 +4675,7 @@
   'driver': 'BlockdevDriver' },
   'discriminator': 'driver',
   'data': {
-  } }
+  'luks':   'BlockdevAmendOptionsLUKS' } }
 
 ##
 # @x-blockdev-amend:
-- 
2.17.2

[PATCH v4 09/14] iotests: filter few more luks specific create options

2020-05-05 Thread Maxim Levitsky

This allows more tests to produce the same output on both qcow2 LUKS-encrypted
images and raw LUKS images.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/087.out   |  6 ++---
 tests/qemu-iotests/134.out   |  2 +-
 tests/qemu-iotests/158.out   |  4 +--
 tests/qemu-iotests/188.out   |  2 +-
 tests/qemu-iotests/189.out   |  4 +--
 tests/qemu-iotests/198.out   |  4 +--
 tests/qemu-iotests/263.out   |  4 +--
 tests/qemu-iotests/274.out   | 46 
 tests/qemu-iotests/284.out   |  6 ++---
 tests/qemu-iotests/common.filter |  6 +++--
 10 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/tests/qemu-iotests/087.out b/tests/qemu-iotests/087.out
index f23bffbbf1..d5ff53302e 100644
--- a/tests/qemu-iotests/087.out
+++ b/tests/qemu-iotests/087.out
@@ -34,7 +34,7 @@ QMP_VERSION
 
 === Encrypted image QCow ===
 
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on 
encrypt.key-secret=sec0 size=134217728
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on size=134217728
 Testing:
 QMP_VERSION
 {"return": {}}
@@ -46,7 +46,7 @@ QMP_VERSION
 
 === Encrypted image LUKS ===
 
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encrypt.format=luks 
encrypt.key-secret=sec0 size=134217728
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 Testing:
 QMP_VERSION
 {"return": {}}
@@ -58,7 +58,7 @@ QMP_VERSION
 
 === Missing driver ===
 
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on 
encrypt.key-secret=sec0 size=134217728
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on size=134217728
 Testing: -S
 QMP_VERSION
 {"return": {}}
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index f2878f5f3a..e4733c0b81 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -1,5 +1,5 @@
 QA output created by 134
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on 
encrypt.key-secret=sec0 size=134217728
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on size=134217728
 
 == reading whole image ==
 read 134217728/134217728 bytes at offset 0
diff --git a/tests/qemu-iotests/158.out b/tests/qemu-iotests/158.out
index fa2294bb85..52ea9a488f 100644
--- a/tests/qemu-iotests/158.out
+++ b/tests/qemu-iotests/158.out
@@ -1,6 +1,6 @@
 QA output created by 158
 == create base ==
-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT encryption=on 
encrypt.key-secret=sec0 size=134217728
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT encryption=on size=134217728
 
 == writing whole image ==
 wrote 134217728/134217728 bytes at offset 0
@@ -10,7 +10,7 @@ wrote 134217728/134217728 bytes at offset 0
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == create overlay ==
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on 
encrypt.key-secret=sec0 size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encryption=on size=134217728 
backing_file=TEST_DIR/t.IMGFMT.base
 
 == writing part of a cluster ==
 wrote 1024/1024 bytes at offset 0
diff --git a/tests/qemu-iotests/188.out b/tests/qemu-iotests/188.out
index 4b9aadd51c..5426861b18 100644
--- a/tests/qemu-iotests/188.out
+++ b/tests/qemu-iotests/188.out
@@ -1,5 +1,5 @@
 QA output created by 188
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encrypt.format=luks 
encrypt.key-secret=sec0 encrypt.iter-time=10 size=16777216
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=16777216
 
 == reading whole image ==
 read 16777216/16777216 bytes at offset 0
diff --git a/tests/qemu-iotests/189.out b/tests/qemu-iotests/189.out
index e536d95d53..bc213cbe14 100644
--- a/tests/qemu-iotests/189.out
+++ b/tests/qemu-iotests/189.out
@@ -1,6 +1,6 @@
 QA output created by 189
 == create base ==
-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT encrypt.format=luks encrypt.key-secret=sec0 encrypt.iter-time=10 size=16777216
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=16777216
 
 == writing whole image ==
 wrote 16777216/16777216 bytes at offset 0
@@ -10,7 +10,7 @@ wrote 16777216/16777216 bytes at offset 0
 read 16777216/16777216 bytes at offset 0
 16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == create overlay ==
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT encrypt.format=luks encrypt.key-secret=sec1 encrypt.iter-time=10 size=16777216 backing_file=TEST_DIR/t.IMGFMT.base
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=16777216 backing_file=TEST_DIR/t.IMGFMT.base
 
 == writing part of a cluster ==
 wrote 1024/1024 bytes at offset 0
diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out
index b0f2d417af..acfdf96b0c 100644
--- a/tests/qemu-iotests/198.out
+++ b/tests/qemu-iotests/198.out
@@ -1,12 +1,12 @@
 QA output created by 198
 == create base ==
-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT encrypt.format=luks encrypt.key-secret=sec0 encrypt.iter-time=10 size=16777216
+Formatting 'TEST_DIR/t.IMGFMT.base',

[PATCH v4 14/14] iotests: add tests for blockdev-amend

2020-05-05 Thread Maxim Levitsky

This commit adds two tests that cover the new blockdev-amend
functionality of the luks and qcow2 drivers.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/302 | 278 +
 tests/qemu-iotests/302.out |  40 ++
 tests/qemu-iotests/303 | 233 +++
 tests/qemu-iotests/303.out |  33 +
 tests/qemu-iotests/group   |   2 +
 5 files changed, 586 insertions(+)
 create mode 100755 tests/qemu-iotests/302
 create mode 100644 tests/qemu-iotests/302.out
 create mode 100755 tests/qemu-iotests/303
 create mode 100644 tests/qemu-iotests/303.out

diff --git a/tests/qemu-iotests/302 b/tests/qemu-iotests/302
new file mode 100755
index 0000000000..f7b4d13bd2
--- /dev/null
+++ b/tests/qemu-iotests/302
@@ -0,0 +1,278 @@
+#!/usr/bin/env python3
+#
+# Test case QMP's encrypted key management
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import iotests
+import os
+import time
+import json
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+
+class Secret:
+    def __init__(self, index):
+        self._id = "keysec" + str(index)
+        # you are not supposed to see the password...
+        self._secret = "hunter" + str(index)
+
+    def id(self):
+        return self._id
+
+    def secret(self):
+        return self._secret
+
+    def to_cmdline_object(self):
+        return  [ "secret,id=" + self._id + ",data=" + self._secret]
+
+    def to_qmp_object(self):
+        return { "qom_type" : "secret", "id": self.id(),
+                 "props": { "data": self.secret() } }
+
+
+class EncryptionSetupTestCase(iotests.QMPTestCase):
+
+    # test case startup
+    def setUp(self):
+        # start the VM
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        # create the secrets and load 'em into the VM
+        self.secrets = [ Secret(i) for i in range(0, 6) ]
+        for secret in self.secrets:
+            result = self.vm.qmp("object-add", **secret.to_qmp_object())
+            self.assert_qmp(result, 'return', {})
+
+        if iotests.imgfmt == "qcow2":
+            self.pfx = "encrypt."
+            self.img_opts = [ '-o', "encrypt.format=luks" ]
+        else:
+            self.pfx = ""
+            self.img_opts = []
+
+    # test case shutdown
+    def tearDown(self):
+        # stop the VM
+        self.vm.shutdown()
+
+    ###
+    # create the encrypted block device
+    def createImg(self, file, secret):
+
+        iotests.qemu_img(
+            'create',
+            '--object', *secret.to_cmdline_object(),
+            '-f', iotests.imgfmt,
+            '-o', self.pfx + 'key-secret=' + secret.id(),
+            '-o', self.pfx + 'iter-time=10',
+            *self.img_opts,
+            file,
+            '1M')
+
+    ###
+    # open an encrypted block device
+    def openImageQmp(self, id, file, secret, read_only = False):
+
+        encrypt_options = {
+            'key-secret' : secret.id()
+        }
+
+        if iotests.imgfmt == "qcow2":
+            encrypt_options = {
+                'encrypt': {
+                    'format':'luks',
+                    **encrypt_options
+                }
+            }
+
+        result = self.vm.qmp('blockdev-add', **
+            {
+                'driver': iotests.imgfmt,
+                'node-name': id,
+                'read-only': read_only,
+
+                **encrypt_options,
+
+                'file': {
+                    'driver': 'file',
+                    'filename': test_img,
+                }
+            }
+        )
+        self.assert_qmp(result, 'return', {})
+
+    # close the encrypted block device
+    def closeImageQmp(self, id):
+        result = self.vm.qmp('blockdev-del', **{ 'node-name': id })
+        self.assert_qmp(result, 'return', {})
+
+    ###
+    # add a key to an encrypted block device
+    def addKeyQmp(self, id, new_secret, secret = None,
+                  slot = None, force = False):
+
+        crypt_options = {
+            'state'  : 'active',
+            'new-secret' :

[PATCH v4 13/14] block/qcow2: implement blockdev-amend

2020-05-05 Thread Maxim Levitsky

Currently, the implementation only supports amending the encryption
options, unlike the qemu-img version.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/qcow2.c| 39 +++
 qapi/block-core.json | 16 +++-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ce1e25f341..a770b88a8f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5448,6 +5448,44 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 return 0;
 }
 
+static int coroutine_fn qcow2_co_amend(BlockDriverState *bs,
+                                       BlockdevAmendOptions *opts,
+                                       bool force,
+                                       Error **errp)
+{
+    BlockdevAmendOptionsQcow2 *qopts = &opts->u.qcow2;
+    BDRVQcow2State *s = bs->opaque;
+    int ret = 0;
+
+    if (qopts->has_encrypt) {
+        if (!s->crypto) {
+            error_setg(errp, "image is not encrypted, can't amend");
+            return -EOPNOTSUPP;
+        }
+
+        if (qopts->encrypt->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) {
+            error_setg(errp,
+                       "Amend can't be used to change the qcow2 encryption format");
+            return -EOPNOTSUPP;
+        }
+
+        if (s->crypt_method_header != QCOW_CRYPT_LUKS) {
+            error_setg(errp,
+                       "Only LUKS encryption options can be amended for qcow2 with blockdev-amend");
+            return -EOPNOTSUPP;
+        }
+
+        ret = qcrypto_block_amend_options(s->crypto,
+                                          qcow2_crypto_hdr_read_func,
+                                          qcow2_crypto_hdr_write_func,
+                                          bs,
+                                          qopts->encrypt,
+                                          force,
+                                          errp);
+    }
+    return ret;
+}
+
 /*
  * If offset or size are negative, respectively, they will not be included in
  * the BLOCK_IMAGE_CORRUPTED event emitted.
@@ -5658,6 +5696,7 @@ BlockDriver bdrv_qcow2 = {
 .mutable_opts= mutable_opts,
 .bdrv_co_check   = qcow2_co_check,
 .bdrv_amend_options  = qcow2_amend_options,
+.bdrv_co_amend   = qcow2_co_amend,
 
 .bdrv_detach_aio_context  = qcow2_detach_aio_context,
 .bdrv_attach_aio_context  = qcow2_attach_aio_context,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a5f679ac17..0ffdc1c3d4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4661,6 +4661,19 @@
   'data': { }
 }
 
+##
+# @BlockdevAmendOptionsQcow2:
+#
+# Driver specific image amend options for qcow2.
+# For now, only encryption options can be amended
+#
+# @encrypt  Encryption options to be amended
+#
+# Since: 5.0
+##
+{ 'struct': 'BlockdevAmendOptionsQcow2',
+  'data': { '*encrypt': 'QCryptoBlockAmendOptions' } }
+
 ##
 # @BlockdevAmendOptions:
 #
@@ -4675,7 +4688,8 @@
   'driver': 'BlockdevDriver' },
   'discriminator': 'driver',
   'data': {
-  'luks':   'BlockdevAmendOptionsLUKS' } }
+  'luks':   'BlockdevAmendOptionsLUKS',
+  'qcow2':  'BlockdevAmendOptionsQcow2' } }
 
 ##
 # @x-blockdev-amend:
-- 
2.17.2
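
As a rough illustration of how a client might assemble the x-blockdev-amend command once qcow2 support is wired in, the sketch below builds the QMP message as a plain dictionary. The argument names (job-id, node-name, options, force) are inferred from the parameters of qmp_x_blockdev_amend() and from BlockdevAmendOptionsQcow2 in this series; the helper name build_amend_command and the job, node and secret names are hypothetical, and the layout for raw luks nodes is an assumption.

```python
def build_amend_command(job_id, node_name, driver, crypt_options, force=False):
    """Build an x-blockdev-amend QMP command as a plain dict.

    Hypothetical helper, not part of QEMU. For qcow2 the LUKS options are
    nested under 'encrypt' (as in BlockdevAmendOptionsQcow2); for a raw
    luks node they are assumed to sit directly in 'options'.
    """
    if driver == "qcow2":
        options = {"driver": driver,
                   "encrypt": {"format": "luks", **crypt_options}}
    else:
        options = {"driver": driver, **crypt_options}
    return {
        "execute": "x-blockdev-amend",
        "arguments": {
            "job-id": job_id,
            "node-name": node_name,
            "options": options,
            "force": force,
        },
    }
```

For example, build_amend_command("job0", "node0", "qcow2", {"state": "active", "new-secret": "keysec1"}) yields a message whose options carry an encrypt member with format set to luks, which is the only format qcow2_co_amend() accepts.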

[PATCH v4 11/14] block/core: add generic infrastructure for x-blockdev-amend qmp command

2020-05-05 Thread Maxim Levitsky

blockdev-amend will be used similarly to blockdev-create
to allow on-the-fly changes to the structure of format-based block devices.

The current plan is to first support encryption keyslot management for
luks-based formats (raw and embedded in qcow2).

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/Makefile.objs   |   2 +-
 block/amend.c | 108 ++
 include/block/block_int.h |  21 +---
 qapi/block-core.json  |  42 +++
 qapi/job.json |   4 +-
 5 files changed, 169 insertions(+), 8 deletions(-)
 create mode 100644 block/amend.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 3635b6b4c1..a0988638d5 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -19,7 +19,7 @@ block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += file-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
-block-obj-y += null.o mirror.o commit.o io.o create.o
+block-obj-y += null.o mirror.o commit.o io.o create.o amend.o
 block-obj-y += throttle-groups.o
 block-obj-$(CONFIG_LINUX) += nvme.o
 
diff --git a/block/amend.c b/block/amend.c
new file mode 100644
index 0000000000..2db7b1eafc
--- /dev/null
+++ b/block/amend.c
@@ -0,0 +1,108 @@
+/*
+ * Block layer code related to image options amend
+ *
+ * Copyright (c) 2018 Kevin Wolf 
+ * Copyright (c) 2019 Maxim Levitsky 
+ *
+ * Heavily based on create.c
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+#include "qemu/job.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-block-core.h"
+#include "qapi/qapi-visit-block-core.h"
+#include "qapi/clone-visitor.h"
+#include "qapi/error.h"
+
+typedef struct BlockdevAmendJob {
+    Job common;
+    BlockdevAmendOptions *opts;
+    BlockDriverState *bs;
+    bool force;
+} BlockdevAmendJob;
+
+static int coroutine_fn blockdev_amend_run(Job *job, Error **errp)
+{
+    BlockdevAmendJob *s = container_of(job, BlockdevAmendJob, common);
+    int ret;
+
+    job_progress_set_remaining(&s->common, 1);
+    ret = s->bs->drv->bdrv_co_amend(s->bs, s->opts, s->force, errp);
+    job_progress_update(&s->common, 1);
+    qapi_free_BlockdevAmendOptions(s->opts);
+    return ret;
+}
+
+static const JobDriver blockdev_amend_job_driver = {
+    .instance_size = sizeof(BlockdevAmendJob),
+    .job_type      = JOB_TYPE_AMEND,
+    .run           = blockdev_amend_run,
+};
+
+void qmp_x_blockdev_amend(const char *job_id,
+                          const char *node_name,
+                          BlockdevAmendOptions *options,
+                          bool has_force,
+                          bool force,
+                          Error **errp)
+{
+    BlockdevAmendJob *s;
+    const char *fmt = BlockdevDriver_str(options->driver);
+    BlockDriver *drv = bdrv_find_format(fmt);
+    BlockDriverState *bs = bdrv_find_node(node_name);
+
+    /*
+     * If the driver is in the schema, we know that it exists. But it may not
+     * be whitelisted.
+     */
+    assert(drv);
+    if (bdrv_uses_whitelist() && !bdrv_is_whitelisted(drv, false)) {
+        error_setg(errp, "Driver is not whitelisted");
+        return;
+    }
+
+    if (bs->drv != drv) {
+        error_setg(errp,
+                   "x-blockdev-amend doesn't support changing the block driver");
+        return;
+    }
+
+    /* Error out if the driver doesn't support .bdrv_co_amend */
+    if (!drv->bdrv_co_amend) {
+        error_setg(errp, "Driver does not support x-blockdev-amend");
+        return;
+    }
+
+    /* Create the block job */
+    s = job_create(job_id, &blockdev_amend_job_driver, NULL,
+                   bdrv_get_aio_context(bs), JOB_DEFAULT | JOB_MANUAL_DISMISS,
+                   NULL, NULL, errp);
+    if (!s) {
+        return;
+    }
+
+    s->bs = bs,
+    s->opts =

[PATCH v4 10/14] iotests: qemu-img tests for luks key management

2020-05-05 Thread Maxim Levitsky

This commit adds two tests, which test the new amend interface
of both luks raw images and qcow2 luks encrypted images.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/300 | 207 +
 tests/qemu-iotests/300.out |  99 ++
 tests/qemu-iotests/301 |  90 
 tests/qemu-iotests/301.out |  30 ++
 tests/qemu-iotests/group   |   3 +
 5 files changed, 429 insertions(+)
 create mode 100755 tests/qemu-iotests/300
 create mode 100644 tests/qemu-iotests/300.out
 create mode 100755 tests/qemu-iotests/301
 create mode 100644 tests/qemu-iotests/301.out

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
new file mode 100755
index 0000000000..aa1a77690f
--- /dev/null
+++ b/tests/qemu-iotests/300
@@ -0,0 +1,207 @@
+#!/usr/bin/env bash
+#
+# Test encryption key management with luks
+# Based on 134
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mlevi...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2 luks
+_supported_proto file #TODO
+
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+if [ "$IMGFMT" = "qcow2" ] ; then
+   PR="encrypt."
+   EXTRA_IMG_ARGS="-o encrypt.format=luks"
+fi
+
+
+# secrets: you are supposed to see the password as ***, see :-)
+S0="--object secret,id=sec0,data=hunter0"
+S1="--object secret,id=sec1,data=hunter1"
+S2="--object secret,id=sec2,data=hunter2"
+S3="--object secret,id=sec3,data=hunter3"
+S4="--object secret,id=sec4,data=hunter4"
+SECRETS="$S0 $S1 $S2 $S3 $S4"
+
+# image with given secret
+IMGS0="--image-opts driver=$IMGFMT,file.filename=$TEST_IMG,${PR}key-secret=sec0"
+IMGS1="--image-opts driver=$IMGFMT,file.filename=$TEST_IMG,${PR}key-secret=sec1"
+IMGS2="--image-opts driver=$IMGFMT,file.filename=$TEST_IMG,${PR}key-secret=sec2"
+IMGS3="--image-opts driver=$IMGFMT,file.filename=$TEST_IMG,${PR}key-secret=sec3"
+IMGS4="--image-opts driver=$IMGFMT,file.filename=$TEST_IMG,${PR}key-secret=sec4"
+
+
+echo "== creating a test image =="
+_make_test_img $S0 $EXTRA_IMG_ARGS -o ${PR}key-secret=sec0,${PR}iter-time=10 32M
+
+echo
+echo "== test that key 0 opens the image =="
+$QEMU_IO $S0 -c "read 0 4096" $IMGS0 | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== adding a password to slot 4 =="
+$QEMU_IMG amend $SECRETS $IMGS0 -o ${PR}state=active,${PR}new-secret=sec4,${PR}iter-time=10,${PR}keyslot=4
+echo "== adding a password to slot 1 =="
+$QEMU_IMG amend $SECRETS $IMGS0 -o ${PR}state=active,${PR}new-secret=sec1,${PR}iter-time=10
+echo "== adding a password to slot 3 =="
+$QEMU_IMG amend $SECRETS $IMGS1 -o ${PR}state=active,${PR}new-secret=sec3,${PR}iter-time=10,${PR}keyslot=3
+
+echo "== adding a password to slot 2 =="
+$QEMU_IMG amend $SECRETS $IMGS3 -o ${PR}state=active,${PR}new-secret=sec2,${PR}iter-time=10
+
+
+echo "== erase slot 4 =="
+$QEMU_IMG amend $SECRETS $IMGS1 -o ${PR}state=inactive,${PR}keyslot=4 | _filter_img_create
+
+
+echo
+echo "== all secrets should work =="
+for IMG in "$IMGS0" "$IMGS1" "$IMGS2" "$IMGS3"; do
+   $QEMU_IO $SECRETS -c "read 0 4096" $IMG | _filter_qemu_io | _filter_testdir
+done
+
+echo
+echo "== erase slot 0 and try it =="
+$QEMU_IMG amend $SECRETS $IMGS1 -o ${PR}state=inactive,${PR}old-secret=sec0 | _filter_img_create
+$QEMU_IO $SECRETS -c "read 0 4096" $IMGS0 | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== erase slot 2 and try it =="
+$QEMU_IMG amend $SECRETS $IMGS1 -o ${PR}state=inactive,${PR}keyslot=2 | _filter_img_create
+$QEMU_IO $SECRETS -c "read 0 4096" $IMGS2 | _filter_qemu_io | _filter_testdir
+
+
+# at this point slots 1 and 3 should be active
+
+echo
+echo "== filling  4 slots with secret 2 =="
+for i in $(seq 0 3) ; do
+   $QEMU_IMG amend $SECRETS $IMGS3 -o ${PR}state=active,${PR}new-secret=sec2,${PR}iter-time=10
+done
+
+echo
+echo "== adding secret 0 =="
+   $QEMU_IMG amend $SECRETS $IMGS3 -o ${PR}state=active,${PR}new-secret=sec0,${PR}iter-time=10
+
+echo
+echo "== adding secret 3 (last slot) =="
+   $QEMU_IMG amend $SECRETS $IMGS3 -o

[PATCH v4 08/14] block/qcow2: extend qemu-img amend interface with crypto options

2020-05-05 Thread Maxim Levitsky

Now that we have all the infrastructure in place,
wire it into the qcow2 driver and expose it to the user.
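
To make the option handling concrete: the qcow2_extract_crypto_opts() helper in this patch pulls every option prefixed with "encrypt." into its own dictionary, strips the prefix, and records the encryption format. The following is only a Python model of that behaviour using plain dicts; the function name extract_crypto_opts and the sample option values are illustrative and are not QEMU code.

```python
def extract_crypto_opts(opts, fmt, prefix="encrypt."):
    """Model of extracting 'encrypt.'-prefixed options into a sub-dict.

    opts is a flat mapping of option name to value. Keys starting with
    the prefix are copied with the prefix removed; 'format' is then set
    to fmt, mirroring the qdict_put_str() call in the C helper.
    """
    crypto = {key[len(prefix):]: value
              for key, value in opts.items()
              if key.startswith(prefix)}
    crypto["format"] = fmt
    return crypto
```

Given {"encrypt.state": "active", "encrypt.new-secret": "sec1", "size": "32M"} and fmt "luks", the model keeps only state and new-secret and adds the format key; the unrelated size option is left out.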

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/qcow2.c  | 72 +-
 tests/qemu-iotests/082.out | 45 
 2 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e6c4d0b0b4..ce1e25f341 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -176,6 +176,19 @@ static ssize_t qcow2_crypto_hdr_write_func(QCryptoBlock *block, size_t offset,
 return ret;
 }
 
+static QDict*
+qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
+{
+    QDict *cryptoopts_qdict;
+    QDict *opts_qdict;
+
+    /* Extract "encrypt." options into a qdict */
+    opts_qdict = qemu_opts_to_qdict(opts, NULL);
+    qdict_extract_subqdict(opts_qdict, &cryptoopts_qdict, "encrypt.");
+    qobject_unref(opts_qdict);
+    qdict_put_str(cryptoopts_qdict, "format", fmt);
+    return cryptoopts_qdict;
+}
 
 /*
  * read qcow2 extension and fill bs
@@ -4733,17 +4746,11 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
 g_free(optstr);
 
 if (has_luks) {
+
 g_autoptr(QCryptoBlockCreateOptions) create_opts = NULL;
-QDict *opts_qdict;
-QDict *cryptoopts;
+QDict *cryptoopts = qcow2_extract_crypto_opts(opts, "luks", errp);
 size_t headerlen;
 
-opts_qdict = qemu_opts_to_qdict(opts, NULL);
-qdict_extract_subqdict(opts_qdict, &cryptoopts, "encrypt.");
-qobject_unref(opts_qdict);
-
-qdict_put_str(cryptoopts, "format", "luks");
-
 create_opts = block_crypto_create_opts_init(cryptoopts, errp);
 qobject_unref(cryptoopts);
 if (!create_opts) {
@@ -5122,6 +5129,7 @@ typedef enum Qcow2AmendOperation {
 QCOW2_NO_OPERATION = 0,
 
 QCOW2_UPGRADING,
+QCOW2_UPDATING_ENCRYPTION,
 QCOW2_CHANGING_REFCOUNT_ORDER,
 QCOW2_DOWNGRADING,
 } Qcow2AmendOperation;
@@ -5203,6 +5211,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 int ret;
 QemuOptDesc *desc = opts->list->desc;
 Qcow2AmendHelperCBInfo helper_cb_info;
+bool encryption_update = false;
 
 while (desc && desc->name) {
 if (!qemu_opt_find(opts, desc->name)) {
@@ -5229,6 +5238,18 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 backing_file = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
 } else if (!strcmp(desc->name, BLOCK_OPT_BACKING_FMT)) {
 backing_format = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
+} else if (g_str_has_prefix(desc->name, "encrypt.")) {
+if (!s->crypto) {
+error_setg(errp,
+   "Can't amend encryption options - encryption not present");
+return -EINVAL;
+}
+if (s->crypt_method_header != QCOW_CRYPT_LUKS) {
+error_setg(errp,
+   "Only LUKS encryption options can be amended");
+return -ENOTSUP;
+}
+encryption_update = true;
 } else if (!strcmp(desc->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
 lazy_refcounts = qemu_opt_get_bool(opts, BLOCK_OPT_LAZY_REFCOUNTS,
lazy_refcounts);
@@ -5271,7 +5292,8 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 .original_status_cb = status_cb,
 .original_cb_opaque = cb_opaque,
 .total_operations = (new_version != old_version)
-  + (s->refcount_bits != refcount_bits)
+  + (s->refcount_bits != refcount_bits) +
+(encryption_update == true)
 };
 
 /* Upgrade first (some features may require compat=1.1) */
@@ -5284,6 +5306,33 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 }
 }
 
+if (encryption_update) {
+QDict *amend_opts_dict;
+QCryptoBlockAmendOptions *amend_opts;
+
+helper_cb_info.current_operation = QCOW2_UPDATING_ENCRYPTION;
+amend_opts_dict = qcow2_extract_crypto_opts(opts, "luks", errp);
+if (!amend_opts_dict) {
+return -EINVAL;
+}
+amend_opts = block_crypto_amend_opts_init(amend_opts_dict, errp);
+qobject_unref(amend_opts_dict);
+if (!amend_opts) {
+return -EINVAL;
+}
+ret = qcrypto_block_amend_options(s->crypto,
+  qcow2_crypto_hdr_read_func,
+  qcow2_crypto_hdr_write_func,
+  bs,
+  amend_opts,
+  force,
+  errp);
+qapi_free_QCryptoBlockAmendOptions(amend_opts);
+if (ret < 0) {
+

[PATCH v4 07/14] block/crypto: implement the encryption key management

2020-05-05 Thread Maxim Levitsky

This implements the encryption key management using the generic code in
the qcrypto layer and exposes it to the user via qemu-img.

This code adds another 'write_func' because the initialization
write_func works directly on the underlying file, whereas amend
works on an instance of the luks device.

This commit also adds a 'hack/workaround' that Kevin Wolf (thanks) and I
made so that the driver both supports write sharing (to avoid breaking
existing users) and is safe against concurrent metadata updates (the
keyslots).

Eventually the write sharing for the luks driver will be deprecated
and removed together with this hack.

The hack is that we ask (as a format driver) for BLK_PERM_CONSISTENT_READ
and then, when we want to update the keys, we unshare that permission.
So if someone else has the image open, even read-only, the encryption
key update will fail gracefully.

Thanks also to Daniel Berrange for the idea of unsharing the read
permission rather than the write permission, which also covers the case
where the other user has opened the image read-only.
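
The effect of unsharing the read permission can be sketched with a small model. This is a deliberately simplified illustration, not QEMU's permission code: the bit values, the helper names luks_child_perms and conflicts, and the assumption that every other user of the image requests consistent-read are all choices made for this example.

```python
# Illustrative permission bits (a simplified model for this example).
PERM_CONSISTENT_READ = 0x01
PERM_WRITE = 0x02
PERM_ALL = PERM_CONSISTENT_READ | PERM_WRITE


def luks_child_perms(updating_keys):
    """Return (perm, shared) the modeled luks driver requests on its file."""
    perm = PERM_CONSISTENT_READ
    shared = PERM_ALL
    if updating_keys:
        # While keys are being updated, stop sharing consistent-read.
        shared &= ~PERM_CONSISTENT_READ
    return perm, shared


def conflicts(a, b):
    """Two users conflict if either needs a permission the other won't share."""
    a_perm, a_shared = a
    b_perm, b_shared = b
    return bool(a_perm & ~b_shared) or bool(b_perm & ~a_shared)
```

In this model a read-only user, (PERM_CONSISTENT_READ, PERM_ALL), coexists with the driver in normal operation but conflicts with it once updating_keys is set. Unsharing only PERM_WRITE would not have caught that user, since a read-only opener never asks for write.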

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/crypto.c | 127 +++--
 block/crypto.h |  34 +
 2 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 13ca1ad891..b71e57f777 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -37,6 +37,7 @@ typedef struct BlockCrypto BlockCrypto;
 
 struct BlockCrypto {
 QCryptoBlock *block;
+bool updating_keys;
 };
 
 
@@ -71,6 +72,24 @@ static ssize_t block_crypto_read_func(QCryptoBlock *block,
 return ret;
 }
 
+static ssize_t block_crypto_write_func(QCryptoBlock *block,
+   size_t offset,
+   const uint8_t *buf,
+   size_t buflen,
+   void *opaque,
+   Error **errp)
+{
+BlockDriverState *bs = opaque;
+ssize_t ret;
+
+ret = bdrv_pwrite(bs->file, offset, buf, buflen);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Could not write encryption header");
+return ret;
+}
+return ret;
+}
+
 
 struct BlockCryptoCreateData {
 BlockBackend *blk;
@@ -149,6 +168,19 @@ static QemuOptsList block_crypto_create_opts_luks = {
 };
 
 
+static QemuOptsList block_crypto_amend_opts_luks = {
+.name = "crypto",
+.head = QTAILQ_HEAD_INITIALIZER(block_crypto_create_opts_luks.head),
+.desc = {
+BLOCK_CRYPTO_OPT_DEF_LUKS_STATE(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_KEYSLOT(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_OLD_SECRET(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_NEW_SECRET(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""),
+{ /* end of list */ }
+},
+};
+
 QCryptoBlockOpenOptions *
 block_crypto_open_opts_init(QDict *opts, Error **errp)
 {
@@ -742,6 +774,95 @@ block_crypto_get_specific_info_luks(BlockDriverState *bs, Error **errp)
 return spec_info;
 }
 
+static int
+block_crypto_amend_options_luks(BlockDriverState *bs,
+QemuOpts *opts,
+BlockDriverAmendStatusCB *status_cb,
+void *cb_opaque,
+bool force,
+Error **errp)
+{
+BlockCrypto *crypto = bs->opaque;
+QDict *cryptoopts = NULL;
+QCryptoBlockAmendOptions *amend_options = NULL;
+int ret;
+
+assert(crypto);
+assert(crypto->block);
+crypto->updating_keys = true;
+
+ret = bdrv_child_refresh_perms(bs, bs->file, errp);
+if (ret < 0) {
+goto cleanup;
+}
+
+cryptoopts = qemu_opts_to_qdict(opts, NULL);
+qdict_put_str(cryptoopts, "format", "luks");
+amend_options = block_crypto_amend_opts_init(cryptoopts, errp);
+if (!amend_options) {
+ret = -EINVAL;
+goto cleanup;
+}
+
+ret = qcrypto_block_amend_options(crypto->block,
+  block_crypto_read_func,
+  block_crypto_write_func,
+  bs,
+  amend_options,
+  force,
+  errp);
+cleanup:
+crypto->updating_keys = false;
+bdrv_child_refresh_perms(bs, bs->file, errp);
+qapi_free_QCryptoBlockAmendOptions(amend_options);
+qobject_unref(cryptoopts);
+return ret;
+}
+
+
+static void
+block_crypto_child_perms(BlockDriverState *bs, BdrvChild *c,
+ const BdrvChildRole *role,
+ BlockReopenQueue *reopen_queue,
+ uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
+{
+
+BlockCrypto *crypto = bs->opaque;
+
+bdrv_filter_default_perms(bs, c, role, reopen_queue,
+perm, shared, nperm, nshared);
+/*
+ * Ask for

[PATCH v4 06/14] block/crypto: rename two functions

2020-05-05 Thread Maxim Levitsky

Rename write_func to create_write_func, and init_func to create_init_func.
This is preparation for another write_func that will be used to update the
encryption keys.

No functional changes.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/crypto.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index d379e39efb..13ca1ad891 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -79,12 +79,12 @@ struct BlockCryptoCreateData {
 };
 
 
-static ssize_t block_crypto_write_func(QCryptoBlock *block,
-   size_t offset,
-   const uint8_t *buf,
-   size_t buflen,
-   void *opaque,
-   Error **errp)
+static ssize_t block_crypto_create_write_func(QCryptoBlock *block,
+  size_t offset,
+  const uint8_t *buf,
+  size_t buflen,
+  void *opaque,
+  Error **errp)
 {
 struct BlockCryptoCreateData *data = opaque;
 ssize_t ret;
@@ -97,11 +97,10 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
 return ret;
 }
 
-
-static ssize_t block_crypto_init_func(QCryptoBlock *block,
-  size_t headerlen,
-  void *opaque,
-  Error **errp)
+static ssize_t block_crypto_create_init_func(QCryptoBlock *block,
+ size_t headerlen,
+ void *opaque,
+ Error **errp)
 {
 struct BlockCryptoCreateData *data = opaque;
 
@@ -297,8 +296,8 @@ static int block_crypto_co_create_generic(BlockDriverState *bs,
 };
 
 crypto = qcrypto_block_create(opts, NULL,
-  block_crypto_init_func,
-  block_crypto_write_func,
+  block_crypto_create_init_func,
+  block_crypto_create_write_func,
   &data,
   errp);
 
-- 
2.17.2

[PATCH v4 05/14] block/amend: refactor qcow2 amend options

2020-05-05 Thread Maxim Levitsky

Some qcow2 create options can't be used for amend.
Remove them from the qcow2 create options and add generic logic to detect
such options in qemu-img.
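
The generic detection can be pictured as a lookup against two option lists: if an option is not amendable but is known as a creation option, the tool can say so explicitly instead of reporting an unknown option. The sketch below is a model of that idea only. The function name classify_option, the return strings, and the contents of the two sets are illustrative assumptions (the create-only names are taken from the branches this patch deletes from qcow2_amend_options()); qemu-img's actual logic is not shown in this excerpt.

```python
# Illustrative option lists; the real lists live in the block drivers.
AMEND_OPTS = {"size", "backing_file", "backing_fmt", "lazy_refcounts"}
CREATE_ONLY_OPTS = {"preallocation", "cluster_size",
                    "encryption", "encrypt.format"}
CREATE_OPTS = AMEND_OPTS | CREATE_ONLY_OPTS


def classify_option(name, amend_opts=AMEND_OPTS, create_opts=CREATE_OPTS):
    """Classify an option passed to an amend operation.

    Returns "ok" if it can be amended, "create-only" if it is valid only
    at image creation, and "unknown" otherwise.
    """
    if name in amend_opts:
        return "ok"
    if name in create_opts:
        return "create-only"
    return "unknown"
```

With this split, the driver's amend callback no longer needs a hand-written error branch for each create-only option, which is what the deletions in qcow2_amend_options() reflect.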

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/qcow2.c  | 108 ++---
 qemu-img.c |  18 +++-
 tests/qemu-iotests/049.out | 102 ++--
 tests/qemu-iotests/061.out |  12 ++-
 tests/qemu-iotests/079.out |  18 ++--
 tests/qemu-iotests/082.out | 149 
 tests/qemu-iotests/085.out |  38 
 tests/qemu-iotests/087.out |   6 +-
 tests/qemu-iotests/115.out |   2 +-
 tests/qemu-iotests/121.out |   4 +-
 tests/qemu-iotests/125.out | 192 ++---
 tests/qemu-iotests/134.out |   2 +-
 tests/qemu-iotests/144.out |   4 +-
 tests/qemu-iotests/158.out |   4 +-
 tests/qemu-iotests/182.out |   2 +-
 tests/qemu-iotests/185.out |   8 +-
 tests/qemu-iotests/188.out |   2 +-
 tests/qemu-iotests/189.out |   4 +-
 tests/qemu-iotests/198.out |   4 +-
 tests/qemu-iotests/243.out |  16 ++--
 tests/qemu-iotests/250.out |   2 +-
 tests/qemu-iotests/255.out |   8 +-
 tests/qemu-iotests/259.out |   2 +-
 tests/qemu-iotests/263.out |   4 +-
 tests/qemu-iotests/280.out |   2 +-
 25 files changed, 284 insertions(+), 429 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 13780b0278..e6c4d0b0b4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2971,17 +2971,6 @@ static int qcow2_change_backing_file(BlockDriverState *bs,
 return qcow2_update_header(bs);
 }
 
-static int qcow2_crypt_method_from_format(const char *encryptfmt)
-{
-if (g_str_equal(encryptfmt, "luks")) {
-return QCOW_CRYPT_LUKS;
-} else if (g_str_equal(encryptfmt, "aes")) {
-return QCOW_CRYPT_AES;
-} else {
-return -EINVAL;
-}
-}
-
 static int qcow2_set_up_encryption(BlockDriverState *bs,
QCryptoBlockCreateOptions *cryptoopts,
Error **errp)
@@ -5210,9 +5199,6 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 bool lazy_refcounts = s->use_lazy_refcounts;
 bool data_file_raw = data_file_is_raw(bs);
 const char *compat = NULL;
-uint64_t cluster_size = s->cluster_size;
-bool encrypt;
-int encformat;
 int refcount_bits = s->refcount_bits;
 int ret;
 QemuOptDesc *desc = opts->list->desc;
@@ -5237,44 +5223,12 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 error_setg(errp, "Unknown compatibility level %s", compat);
 return -EINVAL;
 }
-} else if (!strcmp(desc->name, BLOCK_OPT_PREALLOC)) {
-error_setg(errp, "Cannot change preallocation mode");
-return -ENOTSUP;
 } else if (!strcmp(desc->name, BLOCK_OPT_SIZE)) {
 new_size = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
 } else if (!strcmp(desc->name, BLOCK_OPT_BACKING_FILE)) {
 backing_file = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
 } else if (!strcmp(desc->name, BLOCK_OPT_BACKING_FMT)) {
 backing_format = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
-} else if (!strcmp(desc->name, BLOCK_OPT_ENCRYPT)) {
-encrypt = qemu_opt_get_bool(opts, BLOCK_OPT_ENCRYPT,
-!!s->crypto);
-
-if (encrypt != !!s->crypto) {
-error_setg(errp,
-   "Changing the encryption flag is not supported");
-return -ENOTSUP;
-}
-} else if (!strcmp(desc->name, BLOCK_OPT_ENCRYPT_FORMAT)) {
-encformat = qcow2_crypt_method_from_format(
-qemu_opt_get(opts, BLOCK_OPT_ENCRYPT_FORMAT));
-
-if (encformat != s->crypt_method_header) {
-error_setg(errp,
-   "Changing the encryption format is not supported");
-return -ENOTSUP;
-}
-} else if (g_str_has_prefix(desc->name, "encrypt.")) {
-error_setg(errp,
-   "Changing the encryption parameters is not supported");
-return -ENOTSUP;
-} else if (!strcmp(desc->name, BLOCK_OPT_CLUSTER_SIZE)) {
-cluster_size = qemu_opt_get_size(opts, BLOCK_OPT_CLUSTER_SIZE,
- cluster_size);
-if (cluster_size != s->cluster_size) {
-error_setg(errp, "Changing the cluster size is not supported");
-return -ENOTSUP;
-}
 } else if (!strcmp(desc->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
 lazy_refcounts = qemu_opt_get_bool(opts, BLOCK_OPT_LAZY_REFCOUNTS,
lazy_refcounts);
@@ -5527,37 +5481,6 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t offset,
 .help = "The external data file must stay valid "   \

[PATCH v4 03/14] block/amend: add 'force' option

2020-05-05 Thread Maxim Levitsky

The 'force' option will be used for some unsafe amend operations.

This includes things like erasing the last keyslot in luks-based formats
(which destroys the data, unless the master key is backed up
by external means), but that _might_ be the desired result.
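
The kind of guard that 'force' is meant to override can be sketched as follows. This is a toy model with keyslots represented as a set of slot numbers; the names erase_keyslot and KeyslotError are hypothetical, and the real check lives in the crypto layer, which this patch does not show.

```python
class KeyslotError(Exception):
    """Raised when an unsafe keyslot change is attempted without force."""


def erase_keyslot(active_slots, slot, force=False):
    """Return the set of active keyslots after erasing `slot`.

    Erasing the only remaining active slot is refused unless force is
    set, because the data becomes unreachable without a key backup.
    """
    active = set(active_slots)
    if slot not in active:
        return active  # nothing to erase
    if len(active) == 1 and not force:
        raise KeyslotError("refusing to erase the last active keyslot")
    active.discard(slot)
    return active
```

Calling erase_keyslot({1}, 1) raises KeyslotError in this model, while erase_keyslot({1}, 1, force=True) returns an empty set, matching the intent that the caller must opt in explicitly to a destructive change.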

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block.c   | 4 +++-
 block/qcow2.c | 1 +
 docs/tools/qemu-img.rst   | 5 -
 include/block/block.h | 1 +
 include/block/block_int.h | 1 +
 qemu-img-cmds.hx  | 4 ++--
 qemu-img.c| 8 +++-
 7 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index cf5c19b1db..de2e41b361 100644
--- a/block.c
+++ b/block.c
@@ -6377,6 +6377,7 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
 
 int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
BlockDriverAmendStatusCB *status_cb, void *cb_opaque,
+   bool force,
Error **errp)
 {
 if (!bs->drv) {
@@ -6388,7 +6389,8 @@ int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
bs->drv->format_name);
 return -ENOTSUP;
 }
-return bs->drv->bdrv_amend_options(bs, opts, status_cb, cb_opaque, errp);
+return bs->drv->bdrv_amend_options(bs, opts, status_cb,
+   cb_opaque, force, errp);
 }
 
 /*
diff --git a/block/qcow2.c b/block/qcow2.c
index 2ba0b17c39..ffb6b22e2d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5200,6 +5200,7 @@ static void qcow2_amend_helper_cb(BlockDriverState *bs,
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
BlockDriverAmendStatusCB *status_cb,
void *cb_opaque,
+   bool force,
Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 0080f83a76..fc2dca6649 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -249,11 +249,14 @@ Command description:
 
 .. program:: qemu-img-commands
 
-.. option:: amend [--object OBJECTDEF] [--image-opts] [-p] [-q] [-f FMT] [-t CACHE] -o OPTIONS FILENAME
+.. option:: amend [--object OBJECTDEF] [--image-opts] [-p] [-q] [-f FMT] [-t CACHE] [--force] -o OPTIONS FILENAME
 
   Amends the image format specific *OPTIONS* for the image file
   *FILENAME*. Not all file formats support this operation.
 
+  --force allows some unsafe operations. Currently for -f luks, it allows to
+  erase last encryption key, and to overwrite an active encryption key.
+
 .. option:: bench [-c COUNT] [-d DEPTH] [-f FMT] 
[--flush-interval=FLUSH_INTERVAL] [-i AIO] [-n] [--no-drain] [-o OFFSET] 
[--pattern=PATTERN] [-q] [-s BUFFER_SIZE] [-S STEP_SIZE] [-t CACHE] [-w] [-U] 
FILENAME
 
   Run a simple sequential I/O benchmark on the specified image. If ``-w`` is
diff --git a/include/block/block.h b/include/block/block.h
index 8b62429aa4..0ca53b5598 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -392,6 +392,7 @@ typedef void BlockDriverAmendStatusCB(BlockDriverState *bs, 
int64_t offset,
   int64_t total_work_size, void *opaque);
 int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
BlockDriverAmendStatusCB *status_cb, void *cb_opaque,
+   bool force,
Error **errp);
 
 /* check if a named node can be replaced when doing drive-mirror */
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 92335f33c7..98671ecdf6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -432,6 +432,7 @@ struct BlockDriver {
 int (*bdrv_amend_options)(BlockDriverState *bs, QemuOpts *opts,
   BlockDriverAmendStatusCB *status_cb,
   void *cb_opaque,
+  bool force,
   Error **errp);
 
 void (*bdrv_debug_event)(BlockDriverState *bs, BlkdebugEvent event);
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index c9c54de1df..9920f1f9d4 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -10,9 +10,9 @@ HXCOMM When amending the rST sections, please remember to 
copy the usage
 HXCOMM over to the per-command sections in qemu-img.texi.
 
 DEF("amend", img_amend,
-"amend [--object objectdef] [--image-opts] [-p] [-q] [-f fmt] [-t cache] -o options filename")
+"amend [--object objectdef] [--image-opts] [-p] [-q] [-f fmt] [-t cache] [--force] -o options filename")
 SRST
-.. option:: amend [--object OBJECTDEF] [--image-opts] [-p] [-q] [-f FMT] [-t CACHE] -o OPTIONS FILENAME
+.. option:: amend [--object OBJECTDEF] [--image-opts] [-p] [-q] [-f FMT] [-t CACHE] [--force] -o OPTIONS FILENAME
 ERST
 
 DEF("bench", img_bench,
diff --git a/qemu-img.c b/qemu-img.c
index 6a4327aaba..ef422d5471 100644
---
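
The commit message above describes 'force' as a gate for unsafe amend operations such as erasing the last keyslot. What follows is a minimal sketch of that gating idea only, not the code from this series: the `KeySlotTable` type and the `count_active()` / `erase_slot()` helpers are hypothetical names, and the real driver works on the on-disk LUKS header rather than a bool array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_KEY_SLOTS 8 /* LUKS1 headers carry 8 key slots */

/* Hypothetical in-memory view of which key slots are active. */
typedef struct {
    bool active[NUM_KEY_SLOTS];
} KeySlotTable;

/* Count the slots that currently hold a copy of the master key. */
static unsigned count_active(const KeySlotTable *t)
{
    unsigned n = 0;
    for (size_t i = 0; i < NUM_KEY_SLOTS; i++) {
        if (t->active[i]) {
            n++;
        }
    }
    return n;
}

/*
 * Deactivate one slot. Returns 0 on success, -1 if the request is
 * refused because it would remove the last active slot and 'force'
 * was not given.
 */
static int erase_slot(KeySlotTable *t, unsigned idx, bool force)
{
    assert(idx < NUM_KEY_SLOTS);
    if (!t->active[idx]) {
        return 0; /* already inactive, nothing to do */
    }
    if (count_active(t) == 1 && !force) {
        return -1; /* would make the data unrecoverable */
    }
    t->active[idx] = false;
    return 0;
}
```

The check is deliberately done before any state is modified, so a refused request leaves the table untouched.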

[PATCH v4 02/14] qcrypto/luks: implement encryption key management

2020-05-05 Thread Maxim Levitsky

Next few patches will expose that functionality
to the user.

Signed-off-by: Maxim Levitsky 
---
 crypto/block-luks.c | 395 +++-
 qapi/crypto.json|  61 ++-
 2 files changed, 452 insertions(+), 4 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 4861db810c..c108518df1 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -32,6 +32,7 @@
 #include "qemu/uuid.h"
 
 #include "qemu/coroutine.h"
+#include "qemu/bitmap.h"
 
 /*
  * Reference for the LUKS format implemented here is
@@ -70,6 +71,9 @@ typedef struct QCryptoBlockLUKSKeySlot 
QCryptoBlockLUKSKeySlot;
 
 #define QCRYPTO_BLOCK_LUKS_SECTOR_SIZE 512LL
 
+#define QCRYPTO_BLOCK_LUKS_DEFAULT_ITER_TIME_MS 2000
+#define QCRYPTO_BLOCK_LUKS_ERASE_ITERATIONS 40
+
 static const char qcrypto_block_luks_magic[QCRYPTO_BLOCK_LUKS_MAGIC_LEN] = {
 'L', 'U', 'K', 'S', 0xBA, 0xBE
 };
@@ -219,6 +223,9 @@ struct QCryptoBlockLUKS {
 
 /* Hash algorithm used in pbkdf2 function */
 QCryptoHashAlgorithm hash_alg;
+
+/* Name of the secret that was used to open the image */
+char *secret;
 };
 
 
@@ -1069,6 +1076,119 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
 return -1;
 }
 
+/*
+ * Returns true if a slot i is marked as active
+ * (contains encrypted copy of the master key)
+ */
+static bool
+qcrypto_block_luks_slot_active(const QCryptoBlockLUKS *luks,
+   unsigned int slot_idx)
+{
+uint32_t val = luks->header.key_slots[slot_idx].active;
+return val ==  QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED;
+}
+
+/*
+ * Returns the number of slots that are marked as active
+ * (slots that contain encrypted copy of the master key)
+ */
+static unsigned int
+qcrypto_block_luks_count_active_slots(const QCryptoBlockLUKS *luks)
+{
+size_t i = 0;
+unsigned int ret = 0;
+
+for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
+if (qcrypto_block_luks_slot_active(luks, i)) {
+ret++;
+}
+}
+return ret;
+}
+
+/*
+ * Finds first key slot which is not active
+ * Returns the key slot index, or -1 if it doesn't exist
+ */
+static int
+qcrypto_block_luks_find_free_keyslot(const QCryptoBlockLUKS *luks)
+{
+size_t i;
+
+for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
+if (!qcrypto_block_luks_slot_active(luks, i)) {
+return i;
+}
+}
+return -1;
+}
+
+/*
+ * Erases an keyslot given its index
+ * Returns:
+ *0 if the keyslot was erased successfully
+ *   -1 if a error occurred while erasing the keyslot
+ *
+ */
+static int
+qcrypto_block_luks_erase_key(QCryptoBlock *block,
+ unsigned int slot_idx,
+ QCryptoBlockWriteFunc writefunc,
+ void *opaque,
+ Error **errp)
+{
+QCryptoBlockLUKS *luks = block->opaque;
+QCryptoBlockLUKSKeySlot *slot = &luks->header.key_slots[slot_idx];
+g_autofree uint8_t *garbagesplitkey = NULL;
+size_t splitkeylen = luks->header.master_key_len * slot->stripes;
+size_t i;
+Error *local_err = NULL;
+int ret;
+
+assert(slot_idx < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
+assert(splitkeylen > 0);
+garbagesplitkey = g_new0(uint8_t, splitkeylen);
+
+/* Reset the key slot header */
+memset(slot->salt, 0, QCRYPTO_BLOCK_LUKS_SALT_LEN);
+slot->iterations = 0;
+slot->active = QCRYPTO_BLOCK_LUKS_KEY_SLOT_DISABLED;
+
+ret = qcrypto_block_luks_store_header(block,  writefunc,
+  opaque, &local_err);
+
+if (ret) {
+error_propagate(errp, local_err);
+}
+/*
+ * Now try to erase the key material, even if the header
+ * update failed
+ */
+for (i = 0; i < QCRYPTO_BLOCK_LUKS_ERASE_ITERATIONS; i++) {
+if (qcrypto_random_bytes(garbagesplitkey,
+ splitkeylen, &local_err) < 0) {
+/*
+ * If we failed to get the random data, still write
+ * at least zeros to the key slot at least once
+ */
+error_propagate(errp, local_err);
+
+if (i > 0) {
+return -1;
+}
+}
+if (writefunc(block,
+  slot->key_offset_sector * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
+  garbagesplitkey,
+  splitkeylen,
+  opaque,
+  &local_err) != splitkeylen) {
+error_propagate(errp, local_err);
+return -1;
+}
+}
+return 0;
+}
 
 static int
 qcrypto_block_luks_open(QCryptoBlock *block,
@@ -1099,6 +1219,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 
 luks = g_new0(QCryptoBlockLUKS, 1);
 block->opaque = luks;
+luks->secret = g_strdup(options->u.luks.key_secret);
 
 if (qcrypto_block_luks_load_header(block, readfunc, opaque, errp) < 0) {
 goto fail;
@@ -1164,6 +1285,7 @@

[PATCH v4 04/14] block/amend: separate amend and create options for qemu-img

2020-05-05 Thread Maxim Levitsky

Some options are only useful for creation
(or hard to be amended, like cluster size for qcow2), while some other
options are only useful for amend, like upcoming keyslot management
options for luks

Since currently only qcow2 supports amend, move all its options
to a common macro and then include it in each action option list.

In future it might be useful to remove some options which are
not supported anyway from amend list, which currently
cause an error message if amended.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/qcow2.c | 160 +-
 include/block/block_int.h |   4 +
 qemu-img.c|  18 ++---
 3 files changed, 100 insertions(+), 82 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ffb6b22e2d..13780b0278 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5495,83 +5495,96 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool 
fatal, int64_t offset,
 s->signaled_corruption = true;
 }
 
+#define QCOW_COMMON_OPTIONS \
+{   \
+.name = BLOCK_OPT_SIZE, \
+.type = QEMU_OPT_SIZE,  \
+.help = "Virtual disk size" \
+},  \
+{   \
+.name = BLOCK_OPT_COMPAT_LEVEL, \
+.type = QEMU_OPT_STRING,\
+.help = "Compatibility level (v2 [0.10] or v3 [1.1])"   \
+},  \
+{   \
+.name = BLOCK_OPT_BACKING_FILE, \
+.type = QEMU_OPT_STRING,\
+.help = "File name of a base image" \
+},  \
+{   \
+.name = BLOCK_OPT_BACKING_FMT,  \
+.type = QEMU_OPT_STRING,\
+.help = "Image format of the base image"\
+},  \
+{   \
+.name = BLOCK_OPT_DATA_FILE,\
+.type = QEMU_OPT_STRING,\
+.help = "File name of an external data file"\
+},  \
+{   \
+.name = BLOCK_OPT_DATA_FILE_RAW,\
+.type = QEMU_OPT_BOOL,  \
+.help = "The external data file must stay valid "   \
+"as a raw image"\
+},  \
+{   \
+.name = BLOCK_OPT_ENCRYPT,  \
+.type = QEMU_OPT_BOOL,  \
+.help = "Encrypt the image with format 'aes'. (Deprecated " \
+"in favor of " BLOCK_OPT_ENCRYPT_FORMAT "=aes)",\
+},  \
+{   \
+.name = BLOCK_OPT_ENCRYPT_FORMAT,   \
+.type = QEMU_OPT_STRING,\
+.help = "Encrypt the image, format choices: 'aes', 'luks'", \
+},  \
+BLOCK_CRYPTO_OPT_DEF_KEY_SECRET("encrypt.", \
+"ID of secret providing qcow AES key or LUKS passphrase"),  \
+BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_ALG("encrypt."),   \
+BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_MODE("encrypt."),  \
+BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_ALG("encrypt."),\
+BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG("encrypt."),   \
+BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG("encrypt."), \
+BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME("encrypt."),\
+{   \
+.name = BLOCK_OPT_CLUSTER_SIZE, \
+.type = QEMU_OPT_SIZE,  \
+.help = "qcow2 cluster size",   \
+.def_value_str = stringify(DEFAULT_CLUSTER_SIZE)\
+},

[PATCH v4 01/14] qcrypto/core: add generic infrastructure for crypto options amendment

2020-05-05 Thread Maxim Levitsky

This will be used first to implement luks keyslot management.

block_crypto_amend_opts_init will be used to convert
qemu-img cmdline to QCryptoBlockAmendOptions

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/crypto.c | 17 +
 block/crypto.h |  3 +++
 crypto/block.c | 29 +
 crypto/blockpriv.h |  8 
 include/crypto/block.h | 22 ++
 qapi/crypto.json   | 16 
 6 files changed, 95 insertions(+)

diff --git a/block/crypto.c b/block/crypto.c
index e02f343590..d379e39efb 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -185,6 +185,23 @@ block_crypto_create_opts_init(QDict *opts, Error **errp)
 return ret;
 }
 
+QCryptoBlockAmendOptions *
+block_crypto_amend_opts_init(QDict *opts, Error **errp)
+{
+Visitor *v;
+QCryptoBlockAmendOptions *ret;
+
+v = qobject_input_visitor_new_flat_confused(opts, errp);
+if (!v) {
+return NULL;
+}
+
+visit_type_QCryptoBlockAmendOptions(v, NULL, &ret, errp);
+
+visit_free(v);
+return ret;
+}
+
 
 static int block_crypto_open_generic(QCryptoBlockFormat format,
  QemuOptsList *opts_spec,
diff --git a/block/crypto.h b/block/crypto.h
index b935695e79..06e044c9be 100644
--- a/block/crypto.h
+++ b/block/crypto.h
@@ -91,6 +91,9 @@
 QCryptoBlockCreateOptions *
 block_crypto_create_opts_init(QDict *opts, Error **errp);
 
+QCryptoBlockAmendOptions *
+block_crypto_amend_opts_init(QDict *opts, Error **errp);
+
 QCryptoBlockOpenOptions *
 block_crypto_open_opts_init(QDict *opts, Error **errp);
 
diff --git a/crypto/block.c b/crypto/block.c
index 6f42b32f1e..eb057948b5 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -150,6 +150,35 @@ 
qcrypto_block_calculate_payload_offset(QCryptoBlockCreateOptions *create_opts,
 return crypto != NULL;
 }
 
+int qcrypto_block_amend_options(QCryptoBlock *block,
+QCryptoBlockReadFunc readfunc,
+QCryptoBlockWriteFunc writefunc,
+void *opaque,
+QCryptoBlockAmendOptions *options,
+bool force,
+Error **errp)
+{
+if (options->format != block->format) {
+error_setg(errp,
+   "Cannot amend encryption format");
+return -1;
+}
+
+if (!block->driver->amend) {
+error_setg(errp,
+   "Crypto format %s doesn't support format options amendment",
+   QCryptoBlockFormat_str(block->format));
+return -1;
+}
+
+return block->driver->amend(block,
+readfunc,
+writefunc,
+opaque,
+options,
+force,
+errp);
+}
 
 QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block,
  Error **errp)
diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 71c59cb542..3c7ccea504 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -62,6 +62,14 @@ struct QCryptoBlockDriver {
   void *opaque,
   Error **errp);
 
+int (*amend)(QCryptoBlock *block,
+ QCryptoBlockReadFunc readfunc,
+ QCryptoBlockWriteFunc writefunc,
+ void *opaque,
+ QCryptoBlockAmendOptions *options,
+ bool force,
+ Error **errp);
+
 int (*get_info)(QCryptoBlock *block,
 QCryptoBlockInfo *info,
 Error **errp);
diff --git a/include/crypto/block.h b/include/crypto/block.h
index c77ccaf9c0..d274819791 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -144,6 +144,28 @@ QCryptoBlock 
*qcrypto_block_create(QCryptoBlockCreateOptions *options,
void *opaque,
Error **errp);
 
+/**
+ * qcrypto_block_amend_options:
+ * @block: the block encryption object
+ *
+ * @readfunc: callback for reading data from the volume header
+ * @writefunc: callback for writing data to the volume header
+ * @opaque: data to pass to @readfunc and @writefunc
+ * @options: the new/amended encryption options
+ * @force: hint for the driver to allow unsafe operation
+ * @errp: error pointer
+ *
+ * Changes the crypto options of the encryption format
+ *
+ */
+int qcrypto_block_amend_options(QCryptoBlock *block,
+QCryptoBlockReadFunc readfunc,
+QCryptoBlockWriteFunc writefunc,
+void *opaque,
+QCryptoBlockAmendOptions *options,
+bool force,
+Error **errp);
+
 
 /**
  *
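
The dispatch pattern in qcrypto_block_amend_options() above (reject a format change, reject drivers without an amend callback, otherwise delegate) can be sketched in isolation. This is an illustration under simplified assumptions: the `Format`, `Driver` and `Block` types, the `block_amend()` function and the plain string used for error reporting are all hypothetical stand-ins for the QCrypto types and the Error API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FMT_QCOW, FMT_LUKS } Format;

typedef struct Block Block;

/* Hypothetical driver vtable; 'amend' is optional and may be NULL. */
typedef struct {
    int (*amend)(Block *b, bool force);
} Driver;

struct Block {
    Format format;
    const Driver *driver;
    int amended; /* set by the example callback below */
};

/* Validate the request, then hand it to the format driver. */
static int block_amend(Block *b, Format requested, bool force,
                       const char **err)
{
    assert(b && b->driver && err);
    if (requested != b->format) {
        *err = "Cannot amend encryption format";
        return -1;
    }
    if (!b->driver->amend) {
        *err = "Format doesn't support options amendment";
        return -1;
    }
    return b->driver->amend(b, force);
}

/* Example callback: records that it ran and ignores 'force'. */
static int luks_amend(Block *b, bool force)
{
    (void)force;
    b->amended = 1;
    return 0;
}

static const Driver luks_driver = { luks_amend };
static const Driver qcow_driver = { NULL };
```

Keeping both checks in the generic layer means each driver callback can assume the options already match its own format.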

[PATCH v4 00/14] LUKS: encryption slot management using amend interface

2020-05-05 Thread Maxim Levitsky

Hi!
Here is the updated series of my patches, incorporating all the feedback I 
received.

This implements the API interface that we agreed upon except that I merged the
LUKSKeyslotActive/LUKSKeyslotInactive union into a struct because otherwise
I need nested unions which are not supported currently by QAPI parser.
This didn't change the API and thus once support for nested unions is there,
it can always be implemented in backward compatible way.

I hope that this series will finally be considered for merging, since I am
somewhat running out of time to finish this task.

Patches are strictly divided by topic into 3 groups, and each group depends
on the former groups.

* Patches 1,2 implement qcrypto generic amend interface, including definition
  of structs used in crypto.json and implement this in luks crypto driver.
  Nothing is exposed to the user at this stage.

* Patches 3-9 use the code from patches 1,2 to implement qemu-img amend based
  encryption slot management for luks and for qcow2, and add a bunch of
  iotests to cover that.

* Patches 10-13 add x-blockdev-amend (I'll drop the -x prefix if you like),
  and wire it to luks and qcow2 driver to implement qmp based encryption slot
  management also using the code from patches 1,2, and also add a bunch of
  iotests to cover this.

Tested with -raw,-qcow2,-nbd and -luks iotests and 'make check'

Changes from V3: reworked patch #2 to hopefully be more readable and
user friendly

Best regards,
Maxim Levitsky

clone of "luks-keymgmnt-v2"

Maxim Levitsky (14):
  qcrypto/core: add generic infrastructure for crypto options amendment
  qcrypto/luks: implement encryption key management
  block/amend: add 'force' option
  block/amend: separate amend and create options for qemu-img
  block/amend: refactor qcow2 amend options
  block/crypto: rename two functions
  block/crypto: implement the encryption key management
  block/qcow2: extend qemu-img amend interface with crypto options
  iotests: filter few more luks specific create options
  iotests: qemu-img tests for luks key management
  block/core: add generic infrastructure for x-blockdev-amend qmp
command
  block/crypto: implement blockdev-amend
  block/qcow2: implement blockdev-amend
  iotests: add tests for blockdev-amend

 block.c  |   4 +-
 block/Makefile.objs  |   2 +-
 block/amend.c| 108 +
 block/crypto.c   | 203 ++--
 block/crypto.h   |  37 +++
 block/qcow2.c| 306 ++--
 crypto/block-luks.c  | 395 ++-
 crypto/block.c   |  29 +++
 crypto/blockpriv.h   |   8 +
 docs/tools/qemu-img.rst  |   5 +-
 include/block/block.h|   1 +
 include/block/block_int.h|  24 +-
 include/crypto/block.h   |  22 ++
 qapi/block-core.json |  68 ++
 qapi/crypto.json |  75 +-
 qapi/job.json|   4 +-
 qemu-img-cmds.hx |   4 +-
 qemu-img.c   |  44 +++-
 tests/qemu-iotests/049.out   | 102 
 tests/qemu-iotests/061.out   |  12 +-
 tests/qemu-iotests/079.out   |  18 +-
 tests/qemu-iotests/082.out   | 176 --
 tests/qemu-iotests/085.out   |  38 +--
 tests/qemu-iotests/087.out   |   6 +-
 tests/qemu-iotests/115.out   |   2 +-
 tests/qemu-iotests/121.out   |   4 +-
 tests/qemu-iotests/125.out   | 192 +++
 tests/qemu-iotests/134.out   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/158.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +-
 tests/qemu-iotests/188.out   |   2 +-
 tests/qemu-iotests/189.out   |   4 +-
 tests/qemu-iotests/198.out   |   4 +-
 tests/qemu-iotests/243.out   |  16 +-
 tests/qemu-iotests/250.out   |   2 +-
 tests/qemu-iotests/255.out   |   8 +-
 tests/qemu-iotests/259.out   |   2 +-
 tests/qemu-iotests/263.out   |   4 +-
 tests/qemu-iotests/274.out   |  46 ++--
 tests/qemu-iotests/280.out   |   2 +-
 tests/qemu-iotests/284.out   |   6 +-
 tests/qemu-iotests/300   | 207 
 tests/qemu-iotests/300.out   |  99 
 tests/qemu-iotests/301   |  90 +++
 tests/qemu-iotests/301.out   |  30 +++
 tests/qemu-iotests/302   | 278 ++
 tests/qemu-iotests/302.out   |  40 
 tests/qemu-iotests/303   | 233 ++
 tests/qemu-iotests/303.out   |  33 +++
 tests/qemu-iotests/common.filter |   6 +-
 tests/qemu-iotests/group |   5 +
 53 files changed, 2493 insertions(+), 533 deletions(-)
 create mode 100644 block/amend.c
 create mode 100755 tests/qemu-iotests/300
 create mode 100644 tests/qemu-iotests/300.out
 create mode 100755 tests/qemu-iotests/301
 create mode 100644

Re: [PATCH v5 13/31] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2020-05-05 Thread Eric Blake


On 5/5/20 12:38 PM, Alberto Garcia wrote:

Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters. In this case the
returned value is always 0 and has no meaning.

Signed-off-by: Alberto Garcia 
---
  block/qcow2.h | 21 +
  1 file changed, 21 insertions(+)




+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx, uint64_t bitmap)
+{
+assert(has_subclusters(s));
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}


Unrelated to this patch, but I just thought of it:

What happens for an image whose size is not cluster-aligned?  Must the 
bits corresponding to subclusters not present in the final cluster 
always be zero, or are they instead ignored regardless of value?


But for this patch:

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v5 11/31] qcow2: Add offset_into_subcluster() and size_to_subclusters()

2020-05-05 Thread Eric Blake


On 5/5/20 12:38 PM, Alberto Garcia wrote:

Like offset_into_cluster() and size_to_clusters(), but for
subclusters.

Signed-off-by: Alberto Garcia 
---
  block/qcow2.h | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index e68febb15b..8b1ed1cbcf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -537,11 +537,21 @@ static inline int64_t offset_into_cluster(BDRVQcow2State 
*s, int64_t offset)
  return offset & (s->cluster_size - 1);
  }
  
+static inline int64_t offset_into_subcluster(BDRVQcow2State *s, int64_t offset)

+{
+return offset & (s->subcluster_size - 1);
+}
+
  static inline uint64_t size_to_clusters(BDRVQcow2State *s, uint64_t size)
  {
  return (size + (s->cluster_size - 1)) >> s->cluster_bits;
  }


Pre-existing, but this could use DIV_ROUND_UP.

  
+static inline uint64_t size_to_subclusters(BDRVQcow2State *s, uint64_t size)

+{
+return (size + (s->subcluster_size - 1)) >> s->subcluster_bits;
+}


at which point, your addition could be:

return DIV_ROUND_UP(size, s->subcluster_size);

Either way,

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
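
The equivalence behind the DIV_ROUND_UP suggestion above can be checked with a small standalone sketch. It assumes the usual definition of DIV_ROUND_UP and a hypothetical `SubclusterGeometry` struct holding only the two fields involved; the two forms agree because the subcluster size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Usual round-up division macro, reproduced so the sketch is self-contained. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the two BDRVQcow2State fields used here. */
typedef struct {
    uint64_t subcluster_size; /* always a power of two */
    int subcluster_bits;      /* log2(subcluster_size) */
} SubclusterGeometry;

/* Shift-based form, as in the patch. */
static uint64_t size_to_subclusters_shift(const SubclusterGeometry *g,
                                          uint64_t size)
{
    assert(g->subcluster_size == (UINT64_C(1) << g->subcluster_bits));
    return (size + (g->subcluster_size - 1)) >> g->subcluster_bits;
}

/* Division-based form, as suggested in the review. */
static uint64_t size_to_subclusters_div(const SubclusterGeometry *g,
                                        uint64_t size)
{
    return DIV_ROUND_UP(size, g->subcluster_size);
}
```

Both variants overflow in the same way for sizes within one subcluster of UINT64_MAX, so switching between them does not change behaviour at the extremes either.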

Re: [PATCH v5 07/31] qcow2: Document the Extended L2 Entries feature

2020-05-05 Thread Eric Blake


On 5/5/20 12:38 PM, Alberto Garcia wrote:

Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
  docs/interop/qcow2.txt | 68 --
  docs/qcow2-cache.txt   | 19 +++-
  2 files changed, 83 insertions(+), 4 deletions(-)




@@ -547,7 +557,8 @@ Standard Cluster Descriptor:
  nor is data read from the backing file if the cluster is
  unallocated.
  
-With version 2, this is always 0.

+With version 2 or with extended L2 entries (see the next
+section), this is always 0.


In your cover letter, you said you changed things to tolerate this bit 
being set even with extended L2 entries.  Does this sentence need a tweak?


  
   1 -  8:Reserved (set to 0)
  
@@ -584,6 +595,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is

  no backing file or the backing file is smaller than the image, they shall read
  zeros for all parts that are not covered by the backing file.
  
+== Extended L2 Entries ==

+
+An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated the same as in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.


Also this sentence.


+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+Bit  0 -  31:   Allocation status (one bit per subcluster)


Why two spaces after '-'?  I understand it in situations like '0 -  3' 
in the same list as '16 - 19', to make for a right-justified column, but 
here, everything in the second column is two digits, so the extra 
padding doesn't add anything useful.  Or did you mean to have '64 -  95' 
and '96 - 127', making it obvious that these are the second set of bits 
on top of the existing bits in the first 8 bytes?



+
+1: the subcluster is allocated. In this case the
+   host cluster offset field must contain a valid
+   offset.
+0: the subcluster is not allocated. In this case
+   read requests shall go to the backing file or
+   return zeros if there is no backing file data.
+
+Bits are assigned starting from the least significant
+one (i.e. bit x is used for subcluster x).
+
+32 -  63    Subcluster reads as zeros (one bit per subcluster)
+
+1: the subcluster reads as zeros. In this case the
+   allocation status bit must be unset. The host
+   cluster offset field may or may not be set.
+0: no effect.
+
+Bits are assigned starting from the least significant
+one (i.e. bit x is used for subcluster x - 32).


Of course, if you change to 64-95 and 96-127, the two sentences mapping 
bit x to subcluster y need adjusting by 64 as well.



+
+Subcluster Allocation Bitmap (for compressed clusters):
+
+Bit  0 -  63:   Reserved (set to 0)
+Compressed clusters don't have subclusters,
+so this field is not used.


I can live with the wording as-is (since you did call out the "second 64
bits") or with the adjusted bit numberings.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
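
The entry-count formula and the bitmap layout quoted above (bits 0-31 for allocation status, bits 32-63 for "reads as zeros") can be sketched as follows. The `ScStatus` enum and both function names are hypothetical, and reporting "both bits set" as invalid is an assumption drawn from the wording that the allocation bit "must be unset" when the zero bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-subcluster classification for this sketch. */
typedef enum {
    SC_UNALLOCATED, /* read from backing file, or zeros if none */
    SC_ALLOCATED,   /* data lives at the host cluster offset */
    SC_ZERO,        /* reads as zeros */
    SC_INVALID      /* zero bit and allocation bit both set */
} ScStatus;

/* l2_entries = cluster_size / (2 * sizeof(uint64_t)) */
static uint64_t l2_entries_per_table(uint64_t cluster_size)
{
    return cluster_size / (2 * sizeof(uint64_t));
}

/* Decode the status of subcluster sc (0..31) from the 64-bit bitmap. */
static ScStatus subcluster_status(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    int alloc = (int)((bitmap >> sc) & 1);        /* bit sc      */
    int zero = (int)((bitmap >> (sc + 32)) & 1);  /* bit sc + 32 */

    if (zero) {
        return alloc ? SC_INVALID : SC_ZERO;
    }
    return alloc ? SC_ALLOCATED : SC_UNALLOCATED;
}
```

With 64 KiB clusters this gives 4096 entries per L2 table, half of what a standard 64-bit entry layout would hold.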

Re: [PATCH v5 05/31] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2020-05-05 Thread Eric Blake


On 5/5/20 12:38 PM, Alberto Garcia wrote:

When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

- handle_copied() looks for normal data clusters that are already
  allocated and have a reference count of 1. In those clusters we
  can simply write the data and there is no need to perform any
  copy-on-write.

- handle_alloc() looks for clusters that do need copy-on-write,
  either because they haven't been allocated yet, because their
  reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

- handle_copied() looks for clusters that are already allocated and
  which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
  reference count of 1).

- handle_alloc() looks for clusters for which we need a new
  allocation (all other cases).

One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.

Signed-off-by: Alberto Garcia 
---
  block/qcow2-cluster.c | 256 +++---
  1 file changed, 141 insertions(+), 115 deletions(-)



@@ -1053,15 +1058,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
  static void calculate_l2_meta(BlockDriverState *bs,
uint64_t host_cluster_offset,
uint64_t guest_offset, unsigned bytes,
-  QCowL2Meta **m, bool keep_old)
+  uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
  {


Borderline long line, but it fits ;)


  BDRVQcow2State *s = bs->opaque;
-unsigned cow_start_from = 0;
+int l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry;
+unsigned cow_start_from, cow_end_to;
  unsigned cow_start_to = offset_into_cluster(s, guest_offset);
  unsigned cow_end_from = cow_start_to + bytes;
-unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
  unsigned nb_clusters = size_to_clusters(s, cow_end_from);
  QCowL2Meta *old_m = *m;
+QCow2ClusterType type;
+
+assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+/* Return if there's no COW (all clusters are normal and we keep them) */
+if (keep_old) {
+int i;
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+break;
+}
+}
+if (i == nb_clusters) {
+return;
+}
+}
+
+/* Get the L2 entry of the first cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index]);


This is the second time we're grabbing the first entry in this function. 
But I don't think it's worth trying to micro-optimize.




+static int count_single_write_clusters(BlockDriverState *bs, int nb_clusters,
+   uint64_t *l2_slice, int l2_index,
+   bool new_alloc)
  {
+BDRVQcow2State *s = bs->opaque;
+uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
  int i;
  
  for (i = 0; i < nb_clusters; i++) {

-uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-if (!cluster_needs_cow(bs, l2_entry)) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);


And another place where we compute l2_entry for the first cluster twice, 
and again not worth micro-optimizing.


I didn't find anything that needs a change.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
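
The new split between handle_copied() and handle_alloc() described in the commit message, together with the early return in calculate_l2_meta(), can be summarised in a small sketch. The `ClusterKind` and `WritePath` enums and both functions are hypothetical simplifications; the real code derives the type from the L2 entry and looks the reference count up in the refcount tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced cluster classification for this sketch. */
typedef enum {
    CL_UNALLOCATED,
    CL_ZERO_PLAIN,
    CL_ZERO_ALLOC,
    CL_NORMAL,
    CL_COMPRESSED
} ClusterKind;

typedef enum { USE_HANDLE_COPIED, USE_HANDLE_ALLOC } WritePath;

/*
 * After the patch: clusters that already have a host cluster we may
 * overwrite (NORMAL or ZERO_ALLOC with a reference count of 1) go to
 * handle_copied(); everything else needs a new allocation.
 */
static WritePath pick_write_path(ClusterKind kind, unsigned refcount)
{
    bool has_host_cluster = (kind == CL_NORMAL || kind == CL_ZERO_ALLOC);
    return (has_host_cluster && refcount == 1) ? USE_HANDLE_COPIED
                                               : USE_HANDLE_ALLOC;
}

/*
 * Mirrors the early return in calculate_l2_meta(): copy-on-write can be
 * skipped only if the old clusters are kept and all of them are NORMAL.
 */
static bool can_skip_cow(const ClusterKind *kinds, size_t n, bool keep_old)
{
    assert(kinds != NULL || n == 0);
    if (!keep_old) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (kinds[i] != CL_NORMAL) {
            return false;
        }
    }
    return true;
}
```

A ZERO_ALLOC cluster is therefore overwritten in place but still fails the can_skip_cow() test, which is the "may now require copy-on-write" case mentioned in the commit message.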

[PATCH v5 00/31] Add subcluster allocation to qcow2

2020-05-05 Thread Alberto Garcia

Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Important changes here:

- I fixed hopefully all of the issues mentioned in the previous
  review. Thanks to everyone who contributed.

- There's now support for partial zeroing of clusters (i.e. at the
  subcluster level).

- Many more tests.

- QCOW_OFLAG_ZERO is simply ignored now and not considered a sign of a
  corrupt image anymore. I hesitated about this, but we could still
  add that check later. I think there's a case for adding a new
  QCOW2_CLUSTER_INVALID type and include this and other scenarios that
  we already consider corrupt (for example: clusters with unaligned
  offsets). We would need to see if for 'qemu-img check' adding
  QCOW2_CLUSTER_INVALID complicates things or not. But I think that
  all is material for its own series.

And I think that's all. See below for the detailed list of changes,
and thanks again for the feedback.

Berto

v5:
- Patch 01: Fix indentation [Max], add trace event [Vladimir]
- Patch 02: Add host_cluster_offset variable [Vladimir]
- Patch 05: Have separate l2_entry and cluster_offset variables [Vladimir]
- Patch 06: Only context changes due to patch 05
- Patch 11: New patch
- Patch 13: Change documentation of get_l2_entry()
- Patch 14: Add QCOW_OFLAG_SUB_{ALLOC,ZERO}_RANGE [Eric] and rewrite
the other macros.
Ignore QCOW_OFLAG_ZERO on images with subclusters
(i.e. don't treat them as corrupted).
- Patch 15: New patch
- Patch 19: Optimize cow by skipping all leading and trailing zero and
unallocated subclusters [Vladimir]
Return 0 on success [Vladimir]
Squash patch that updated handle_dependencies() [Vladimir]
- Patch 20: Call count_contiguous_subclusters() after the main switch
in qcow2_get_host_offset() [Vladimir]
Add assertion and remove goto statement [Vladimir]
- Patch 21: Rewrite algorithm.
- Patch 22: Rewrite algorithm.
- Patch 24: Replace loop with the _RANGE macros from patch 14 [Eric]
- Patch 27: New patch
- Patch 28: Update version number and expected output from tests.
- Patch 31: Add many more new tests

v4: https://lists.gnu.org/archive/html/qemu-block/2020-03/msg00966.html
v3: https://lists.gnu.org/archive/html/qemu-block/2019-12/msg00587.html
v2: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01642.html
v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v4:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/31:[0005] [FC] 'qcow2: Make Qcow2AioTask store the full host offset'
002/31:[0018] [FC] 'qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()'
003/31:[----] [--] 'qcow2: Add calculate_l2_meta()'
004/31:[----] [--] 'qcow2: Split cluster_needs_cow() out of count_cow_clusters()'
005/31:[0038] [FC] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()'
006/31:[0004] [FC] 'qcow2: Add get_l2_entry() and set_l2_entry()'
007/31:[----] [--] 'qcow2: Document the Extended L2 Entries feature'
008/31:[----] [--] 'qcow2: Add dummy has_subclusters() function'
009/31:[----] [--] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
010/31:[----] [--] 'qcow2: Add offset_to_sc_index()'
011/31:[down] 'qcow2: Add offset_into_subcluster() and size_to_subclusters()'
012/31:[----] [--] 'qcow2: Add l2_entry_size()'
013/31:[0003] [FC] 'qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()'
014/31:[0023] [FC] 'qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()'
015/31:[down] 'qcow2: Add qcow2_cluster_is_allocated()'
016/31:[----] [--] 'qcow2: Add cluster type parameter to qcow2_get_host_offset()'
017/31:[----] [--] 'qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*'
018/31:[----] [--] 'qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC'
019/31:[0066] [FC] 'qcow2: Add subcluster support to calculate_l2_meta()'
020/31:[0022] [FC] 'qcow2: Add subcluster support to qcow2_get_host_offset()'
021/31:[0040] [FC] 'qcow2: Add subcluster support to zero_in_l2_slice()'
022/31:[0061] [FC] 'qcow2: Add subcluster support to discard_in_l2_slice()'
023/31:[----] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
024/31:[0019] [FC] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
025/31:[----] [--] 'qcow2: Clear the L2 bitmap when allocating a compressed cluster'
026/31:[----] [--] 'qcow2: Add subcluster support to handle_alloc_space()'
027/31:[down] 'qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()'
028/31:[0105] [FC] 'qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit'

[PATCH v5 23/31] qcow2: Add subcluster support to check_refcounts_l2()

2020-05-05 Thread Alberto Garcia

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.
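
As a rough illustration of the rule above (not the QEMU code itself): the
sketch below marks a cluster as "reads as zeroes" in both image flavours.
The demo_* names and the two-field struct are assumptions made up for this
example; only the bit positions (bit 0 of the L2 entry for the zero flag,
the upper 32 bits of the bitmap for the per-subcluster 'all zeroes' bits)
follow what the series describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of a standard L2 entry: the whole cluster reads as zeroes. */
#define DEMO_OFLAG_ZERO            (1ULL << 0)
/* Upper 32 bits of the extended bitmap: every subcluster reads as zeroes. */
#define DEMO_L2_BITMAP_ALL_ZEROES  (0xffffffffULL << 32)

/* Hypothetical in-memory L2 entry; 'bitmap' only matters with subclusters. */
typedef struct DemoL2Entry {
    uint64_t entry;
    uint64_t bitmap;
} DemoL2Entry;

/* Mark a whole cluster as "reads as zeroes" without allocating it. */
static void demo_make_cluster_zero(DemoL2Entry *e, bool has_subclusters)
{
    assert(e != NULL);
    if (has_subclusters) {
        /* The zero flag must stay clear; use the per-subcluster bits. */
        e->entry = 0;
        e->bitmap = DEMO_L2_BITMAP_ALL_ZEROES;
    } else {
        e->entry = DEMO_OFLAG_ZERO;
        e->bitmap = 0;
    }
}
```

This mirrors the two branches of the hunk below: with subclusters the
entry itself is cleared and only the bitmap carries the zero state.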

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index dfdcdd3c25..9bb161481e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1686,8 +1686,13 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
-l2_entry = QCOW_OFLAG_ZERO;
-set_l2_entry(s, l2_table, i, l2_entry);
+if (has_subclusters(s)) {
+set_l2_entry(s, l2_table, i, 0);
+set_l2_bitmap(s, l2_table, i,
+  QCOW_L2_BITMAP_ALL_ZEROES);
+} else {
+set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+}
 ret = qcow2_pre_write_overlap_check(bs, ign,
 l2e_offset, l2_entry_size(s), false);
 if (ret < 0) {
-- 
2.20.1

[PATCH v5 17/31] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*

2020-05-05 Thread Alberto Garcia

In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_host_offset() is modified to return the subcluster type
instead of the cluster type, and all callers are updated to replace
all values of QCow2ClusterType with their QCow2SubclusterType
equivalents.

This patch only changes the data types, there are no semantic changes.
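
For readers who want to see what "equivalents" means here, a minimal
sketch of a one-to-one mapping follows. The Demo* enums are stand-ins
written for this example (the real enums live in block/qcow2.h), and the
function only approximates what qcow2_cluster_to_subcluster_type() is
described as doing; UNALLOCATED maps to UNALLOCATED_PLAIN, as the
qcow2_co_block_status() hunk below suggests.

```c
#include <assert.h>

/* Stand-in for QCow2ClusterType. */
typedef enum DemoClusterType {
    DEMO_CLUSTER_UNALLOCATED,
    DEMO_CLUSTER_ZERO_PLAIN,
    DEMO_CLUSTER_ZERO_ALLOC,
    DEMO_CLUSTER_NORMAL,
    DEMO_CLUSTER_COMPRESSED,
} DemoClusterType;

/* Stand-in for QCow2SubclusterType. */
typedef enum DemoSubclusterType {
    DEMO_SUBCLUSTER_UNALLOCATED_PLAIN,
    DEMO_SUBCLUSTER_UNALLOCATED_ALLOC,
    DEMO_SUBCLUSTER_ZERO_PLAIN,
    DEMO_SUBCLUSTER_ZERO_ALLOC,
    DEMO_SUBCLUSTER_NORMAL,
    DEMO_SUBCLUSTER_COMPRESSED,
    DEMO_SUBCLUSTER_INVALID,
} DemoSubclusterType;

/* Map a cluster type to its subcluster equivalent (no semantic change). */
static DemoSubclusterType demo_cluster_to_subcluster_type(DemoClusterType t)
{
    switch (t) {
    case DEMO_CLUSTER_UNALLOCATED:
        return DEMO_SUBCLUSTER_UNALLOCATED_PLAIN;
    case DEMO_CLUSTER_ZERO_PLAIN:
        return DEMO_SUBCLUSTER_ZERO_PLAIN;
    case DEMO_CLUSTER_ZERO_ALLOC:
        return DEMO_SUBCLUSTER_ZERO_ALLOC;
    case DEMO_CLUSTER_NORMAL:
        return DEMO_SUBCLUSTER_NORMAL;
    case DEMO_CLUSTER_COMPRESSED:
        return DEMO_SUBCLUSTER_COMPRESSED;
    default:
        return DEMO_SUBCLUSTER_INVALID;
    }
}
```

Note that UNALLOCATED_ALLOC is never produced by this mapping, since it
can only occur in images with extended L2 entries.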

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  2 +-
 block/qcow2-cluster.c | 10 +++
 block/qcow2.c | 70 ++-
 3 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index cca650e934..1663b5359c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -877,7 +877,7 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type);
+  QCow2SubclusterType *subcluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 64481067ce..5595ce1404 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -513,15 +513,15 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  * offset that we are interested in.
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
- * cluster type and (if applicable) are stored contiguously in the image file.
- * The cluster type is stored in *cluster_type.
- * Compressed clusters are always returned one by one.
+ * subcluster type and (if applicable) are stored contiguously in the image
+ * file. The subcluster type is stored in *subcluster_type.
+ * Compressed clusters are always processed one by one.
  *
  * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type)
+  QCow2SubclusterType *subcluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -662,7 +662,7 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-*cluster_type = type;
+*subcluster_type = qcow2_cluster_to_subcluster_type(type);
 
 return 0;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index b9effb195b..13965f2e1d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1973,7 +1973,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
-QCow2ClusterType type;
+QCow2SubclusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(&s->lock);
@@ -1993,15 +1993,16 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
-!s->crypto) {
+if ((type == QCOW2_SUBCLUSTER_NORMAL ||
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2098,7 +2099,7 @@ typedef struct Qcow2AioTask {
 AioTask task;
 
 BlockDriverState *bs;
-QCow2ClusterType cluster_type; /* only for read */
+QCow2SubclusterType subcluster_type; /* only for read */
 uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
@@ -2111,7 +2112,7 @@ static coroutine_fn int 
qcow2_co_preadv_task_entry(AioTask *task);
 static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
AioTaskPool *pool,
AioTaskFunc func,
-   QCow2ClusterType cluster_type,
+   QCow2SubclusterType subcluster_type,
uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
@@ -2125,7

[PATCH v5 29/31] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters

2020-05-05 Thread Alberto Garcia

This function is only used by qcow2_expand_zero_clusters() to
downgrade a qcow2 image to a previous version. It is however not
possible to downgrade an image with extended L2 entries because older
versions of qcow2 do not have this feature.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c  | 8 +++-
 tests/qemu-iotests/061 | 6 ++
 tests/qemu-iotests/061.out | 5 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d0cf9d52e6..50da38800e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2113,6 +2113,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
+/* qcow2_downgrade() is not allowed in images with subclusters */
+assert(!has_subclusters(s));
+
 slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
@@ -2181,7 +2184,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
 if (!bs->backing) {
 /* not backed; therefore we can simply deallocate the
- * cluster */
+ * cluster. No need to call set_l2_bitmap(), this
+ * function doesn't support images with subclusters. */
 set_l2_entry(s, l2_slice, j, 0);
 l2_dirty = true;
 continue;
@@ -2252,6 +2256,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 } else {
 set_l2_entry(s, l2_slice, j, offset);
 }
+/* No need to call set_l2_bitmap() after set_l2_entry() because
+ * this function doesn't support images with subclusters. */
 l2_dirty = true;
 }
 
diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index ce285d3084..f746d7ece3 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -268,6 +268,12 @@ $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
 _img_info --format-specific
 _check_test_img
 
+echo
+echo "=== Testing version downgrade with extended L2 entries ==="
+echo
+_make_test_img -o "compat=1.1,extended_l2=on" 64M
+$QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
+
 echo
 echo "=== Try changing the external data file ==="
 echo
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index b7b2533e0a..4c87bb1a3f 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -499,6 +499,11 @@ Format specific information:
 extended l2: false
 No errors were found on the image.
 
+=== Testing version downgrade with extended L2 entries ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Cannot downgrade an image with incompatible features 0x10 set
+
 === Try changing the external data file ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-- 
2.20.1

[PATCH v5 31/31] iotests: Add tests for qcow2 images with extended L2 entries

2020-05-05 Thread Alberto Garcia

Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/271 | 664 +
 tests/qemu-iotests/271.out | 519 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 1184 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 0000000000..2df0dac00f
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,664 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019-2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=be...@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+rm -f "$TEST_IMG.raw"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+_unsupported_imgopts extended_l2 compat=0.10 cluster_size data_file
+
+l2_offset=262144 # 0x40000
+
+_verify_img()
+{
+$QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are identical'
+$QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+grep -v 'No errors were found on the image'
+}
+
+# Compare the bitmap of an extended L2 entry against an expected value
+_verify_l2_bitmap()
+{
+entry_no="$1"           # L2 entry number, starting from 0
+expected_alloc="$2"     # Space-separated list of allocated subcluster indexes
+expected_zero="$3"      # Space-separated list of zero subcluster indexes
+
+offset=$(($l2_offset + $entry_no * 16))
+entry=`peek_file_be "$TEST_IMG" $offset 8`
+offset=$(($offset + 8))
+bitmap=`peek_file_be "$TEST_IMG" $offset 8`
+
+expected_bitmap=0
+for bit in $expected_alloc; do
+expected_bitmap=$(($expected_bitmap | (1 << $bit)))
+done
+for bit in $expected_zero; do
+expected_bitmap=$(($expected_bitmap | (1 << (32 + $bit))))
+done
+expected_bitmap=`printf "%llu" $expected_bitmap`
+
+printf "L2 entry #%d: 0x%016lx %016lx\n" "$entry_no" "$entry" "$bitmap"
+if [ "$bitmap" != "$expected_bitmap" ]; then
+printf "ERROR: expecting bitmap   0x%016lx\n" "$expected_bitmap"
+fi
+}
+
+_test_write()
+{
+cmd="$1"
+alloc_bitmap="$2"
+zero_bitmap="$3"
+l2_entry_idx="$4"
+[ -n "$l2_entry_idx" ] || l2_entry_idx=0
+raw_cmd=`echo $cmd | sed s/-c//` # Raw images don't support -c
+echo "$cmd"
+$QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+_verify_img
+_verify_l2_bitmap "$l2_entry_idx" "$alloc_bitmap" "$zero_bitmap"
+}
+
+_reset_img()
+{
+size="$1"
+$QEMU_IMG create -f raw "$TEST_IMG.raw" "$size" | _filter_img_create
+if [ "$use_backing_file" = "yes" ]; then
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$size" | _filter_img_create
+$QEMU_IO -c "write -q -P 0xFF 0 $size" -f raw "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -q -P 0xFF 0 $size" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+_make_test_img -o extended_l2=on -b "$TEST_IMG.base" "$size"
+else
+_make_test_img -o extended_l2=on "$size"
+fi
+}
+
+# Test that writing to an image with subclusters produces the expected
+# results, in images with and without backing files
+for use_backing_file in yes no; do
+echo
+echo "### Standard write tests (backing file: $use_backing_file) ###"
+echo
+_reset_img 1M
+### Write subcluster #0 (beginning of subcluster) ###
+alloc="0"; zero=""
+_test_write 'write -q -P 1 0 1k' "$alloc" "$zero"
+
+### Write subcluster #1 (middle of subcluster) ###
+alloc="0 1"; zero=""
+_test_write 'write -q -P 2 3k 512' "$alloc" "$zero"
+
+### Write subcluster #2 (end of subcluster) ###
+alloc="0 1 2"; zero=""
+_test_write 'write -q -P 3 5k 1k' "$alloc" "$zero"
+
+### Write subcluster #3 (full subcluster) ###
+alloc="0 1 2 3"; zero=""
+_test_write 'write -q -P 4 6k 2k' "$alloc" "$zero"
+
+### Write subclusters #4-6 (full subclusters) ###
+alloc="`seq 0 6`"; zero=""
+_test_write 'write -q

[PATCH v5 06/31] qcow2: Add get_l2_entry() and set_l2_entry()

2020-05-05 Thread Alberto Garcia

The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().
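
To make the motivation concrete, here is a small sketch of index-based
accessors that work for both entry sizes. It is an assumption-laden
illustration, not the code from this patch: the demo_* names are invented,
the table is treated as raw bytes, and the 16-byte stride for extended
entries (entry word first, bitmap word second) is taken from how test 271
in this series peeks at the image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for be64_to_cpu(): read a big-endian 64-bit word. */
static uint64_t demo_load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Portable stand-in for cpu_to_be64(): store a big-endian 64-bit word. */
static void demo_store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* 8 bytes for a standard entry, 16 for an extended one. */
static size_t demo_l2_entry_size(bool extended)
{
    return extended ? 16 : 8;
}

/* Read the entry word of L2 entry number idx. */
static uint64_t demo_get_l2_entry(const uint8_t *table, bool extended, int idx)
{
    assert(idx >= 0);
    return demo_load_be64(table + (size_t)idx * demo_l2_entry_size(extended));
}

/* Write the entry word of L2 entry number idx. */
static void demo_set_l2_entry(uint8_t *table, bool extended, int idx,
                              uint64_t entry)
{
    assert(idx >= 0);
    demo_store_be64(table + (size_t)idx * demo_l2_entry_size(extended), entry);
}
```

Because callers pass an index rather than a pointer into the table, the
stride can change without touching any call site, which is presumably why
the hunks below switch from &l2_slice[l2_index] to a separate l2_index
argument.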

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h  | 12 
 block/qcow2-cluster.c  | 63 ++
 block/qcow2-refcount.c | 17 ++--
 3 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 37e4f79e39..97fbaba574 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -492,6 +492,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx)
+{
+return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx, uint64_t entry)
+{
+l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fce0be7a08..76fd0f3cdb 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -383,12 +383,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 QCow2ClusterType first_cluster_type;
 uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
 uint64_t offset = first_entry & mask;
 
 first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -401,7 +402,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, 
int nb_clusters,
first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
 if (offset + (uint64_t) i * cluster_size != l2_entry) {
 break;
 }
@@ -417,14 +418,16 @@ static int count_contiguous_clusters(BlockDriverState 
*bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
  int nb_clusters,
  uint64_t *l2_slice,
+ int l2_index,
  QCow2ClusterType wanted_type)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 
 assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = be64_to_cpu(l2_slice[i]);
+uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
 QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
 if (type != wanted_type) {
@@ -573,7 +576,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-l2_entry = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = get_l2_entry(s, l2_slice, l2_index);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -608,7 +611,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-  &l2_slice[l2_index], type);
+  l2_slice, l2_index, type);
 *host_offset = 0;
 break;
 case QCOW2_CLUSTER_ZERO_ALLOC:
@@ -617,7 +620,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 *host_offset = host_cluster_offset + offset_in_cluster;
 /* how many allocated clusters ? */
 c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-  &l2_slice[l2_index], QCOW_OFLAG_ZERO);
+  l2_slice, l2_index, QCOW_OFLAG_ZERO);
 if (offset_into_cluster(s, host_cluster_offset)) {
 qcow2_signal_corruption(bs, true, -1, -1,
 "Cluster allocation offset %#"

[PATCH v5 30/31] qcow2: Add subcluster support to qcow2_measure()

2020-05-05 Thread Alberto Garcia

Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.
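
A quick worked example of that impact, as a sketch only: the helper below
repeats the L2-table part of the calculation from the hunk that follows,
with invented demo_* names and hard-coded entry sizes of 8 and 16 bytes
(the values of L2E_SIZE_NORMAL and L2E_SIZE_EXTENDED).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round n up to the next multiple of d. */
static uint64_t demo_round_up(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return (n + d - 1) / d * d;
}

/* Bytes needed for all L2 tables of a fully allocated image. */
static uint64_t demo_l2_tables_size(uint64_t virtual_size,
                                    uint64_t cluster_size,
                                    bool extended_l2)
{
    uint64_t l2e_size = extended_l2 ? 16 : 8;
    uint64_t aligned = demo_round_up(virtual_size, cluster_size);
    uint64_t nl2e = aligned / cluster_size;   /* one entry per cluster */

    assert(cluster_size >= l2e_size);
    /* L2 tables are whole clusters, so round up to a full table. */
    nl2e = demo_round_up(nl2e, cluster_size / l2e_size);
    return nl2e * l2e_size;
}
```

For a 1 GiB image with 64 KiB clusters this gives 128 KiB of L2 tables
with standard entries and 256 KiB with extended ones. For a 1 MiB image
both cases round up to a single 64 KiB table.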

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 18c8e3f52a..31d72f1297 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3173,28 +3173,31 @@ int64_t qcow2_refcount_metadata_size(int64_t clusters, 
size_t cluster_size,
  * @total_size: virtual disk size in bytes
  * @cluster_size: cluster size in bytes
  * @refcount_order: refcount bits power-of-2 exponent
+ * @extended_l2: true if the image has extended L2 entries
  *
  * Returns: Total number of bytes required for the fully allocated image
  * (including metadata).
  */
 static int64_t qcow2_calc_prealloc_size(int64_t total_size,
 size_t cluster_size,
-int refcount_order)
+int refcount_order,
+bool extended_l2)
 {
 int64_t meta_size = 0;
 uint64_t nl1e, nl2e;
 int64_t aligned_total_size = ROUND_UP(total_size, cluster_size);
+size_t l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 
 /* header: 1 cluster */
 meta_size += cluster_size;
 
 /* total size of L2 tables */
 nl2e = aligned_total_size / cluster_size;
-nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
-meta_size += nl2e * sizeof(uint64_t);
+nl2e = ROUND_UP(nl2e, cluster_size / l2e_size);
+meta_size += nl2e * l2e_size;
 
 /* total size of L1 tables */
-nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+nl1e = nl2e * l2e_size / cluster_size;
 nl1e = ROUND_UP(nl1e, cluster_size / sizeof(uint64_t));
 meta_size += nl1e * sizeof(uint64_t);
 
@@ -4765,6 +4768,7 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 bool has_backing_file;
 bool has_luks;
 bool extended_l2;
+size_t l2e_size;
 
 /* Parse image creation options */
 extended_l2 = qemu_opt_get_bool_del(opts, BLOCK_OPT_EXTL2, false);
@@ -4833,8 +4837,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 virtual_size = ROUND_UP(virtual_size, cluster_size);
 
 /* Check that virtual disk size is valid */
+l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 l2_tables = DIV_ROUND_UP(virtual_size / cluster_size,
- cluster_size / sizeof(uint64_t));
+ cluster_size / l2e_size);
 if (l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE) {
 error_setg(&local_err, "The image size is too large "
"(try using a larger cluster size)");
@@ -4897,9 +4902,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 }
 
 info = g_new(BlockMeasureInfo, 1);
-info->fully_allocated =
+info->fully_allocated = luks_payload_size +
 qcow2_calc_prealloc_size(virtual_size, cluster_size,
- ctz32(refcount_bits)) + luks_payload_size;
+ ctz32(refcount_bits), extended_l2);
 
 /* Remove data clusters that are not required.  This overestimates the
  * required size because metadata needed for the fully allocated file is
-- 
2.20.1

[PATCH v5 14/31] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()

2020-05-05 Thread Alberto Garcia

This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
 compressed. We do not support compression at the subcluster
 level.

  b) There are two different values for unallocated subclusters:
 QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
 cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
 which means that the cluster is allocated but the subcluster is
 not. The latter can only happen in images with extended L2
 entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
 entry has a value that violates the specification. The caller is
 responsible for handling these situations.

 To prevent compatibility problems with images that have invalid
 values but are currently being read by QEMU without causing side
 effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
 with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.
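
The classification described in (a) to (c) can be sketched roughly as
follows. This is an illustration under assumptions, not the function added
by this patch: the Demo* names are invented, the cluster type is passed in
already decoded, and the two conditions treated as invalid here (a
subcluster with both its allocation and zero bits set, and allocation bits
set in an unallocated cluster) are my reading of "violates the
specification".

```c
#include <assert.h>
#include <stdint.h>

/* Bit helpers, modelled on QCOW_OFLAG_SUB_ALLOC() and QCOW_OFLAG_SUB_ZERO(). */
#define DEMO_SUB_ALLOC(x)  (1ULL << (x))
#define DEMO_SUB_ZERO(x)   (DEMO_SUB_ALLOC(x) << 32)
#define DEMO_ALL_ALLOC     0xffffffffULL

typedef enum DemoClusterType {
    DEMO_CLUSTER_UNALLOCATED,
    DEMO_CLUSTER_NORMAL,
    DEMO_CLUSTER_COMPRESSED,
} DemoClusterType;

typedef enum DemoSubclusterType {
    DEMO_SUBCLUSTER_UNALLOCATED_PLAIN,
    DEMO_SUBCLUSTER_UNALLOCATED_ALLOC,
    DEMO_SUBCLUSTER_ZERO_PLAIN,
    DEMO_SUBCLUSTER_ZERO_ALLOC,
    DEMO_SUBCLUSTER_NORMAL,
    DEMO_SUBCLUSTER_COMPRESSED,
    DEMO_SUBCLUSTER_INVALID,
} DemoSubclusterType;

/* Classify subcluster sc_index of a cluster with the given type and bitmap. */
static DemoSubclusterType demo_subcluster_type(DemoClusterType cluster,
                                               uint64_t bitmap,
                                               unsigned sc_index)
{
    assert(sc_index < 32);

    switch (cluster) {
    case DEMO_CLUSTER_COMPRESSED:
        /* Compression always covers the whole cluster. */
        return DEMO_SUBCLUSTER_COMPRESSED;
    case DEMO_CLUSTER_NORMAL:
        if ((bitmap >> 32) & bitmap) {
            /* Assumed invalid: some subcluster is both allocated and zero. */
            return DEMO_SUBCLUSTER_INVALID;
        } else if (bitmap & DEMO_SUB_ZERO(sc_index)) {
            return DEMO_SUBCLUSTER_ZERO_ALLOC;
        } else if (bitmap & DEMO_SUB_ALLOC(sc_index)) {
            return DEMO_SUBCLUSTER_NORMAL;
        }
        return DEMO_SUBCLUSTER_UNALLOCATED_ALLOC;
    case DEMO_CLUSTER_UNALLOCATED:
        if (bitmap & DEMO_ALL_ALLOC) {
            /* Assumed invalid: allocation bits without a host cluster. */
            return DEMO_SUBCLUSTER_INVALID;
        } else if (bitmap & DEMO_SUB_ZERO(sc_index)) {
            return DEMO_SUBCLUSTER_ZERO_PLAIN;
        }
        return DEMO_SUBCLUSTER_UNALLOCATED_PLAIN;
    }
    return DEMO_SUBCLUSTER_INVALID;
}
```

The results line up with the cluster type / subcluster type table in the
comment added further down in this patch.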

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 127 +-
 1 file changed, 126 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4ad93772b9..be7816a3b8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,21 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)(QCOW_OFLAG_SUB_ALLOC(X) << 32)
+/* Subclusters X to Y (both included) are allocated */
+#define QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC((Y) + 1) - QCOW_OFLAG_SUB_ALLOC(X))
+/* Subclusters X to Y (both included) read as zeroes */
+#define QCOW_OFLAG_SUB_ZERO_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) << 32)
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  (QCOW_OFLAG_SUB_ALLOC_RANGE(0, 31))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES (QCOW_OFLAG_SUB_ZERO_RANGE(0, 31))
+
 /* Size of normal and extended L2 entries */
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
@@ -444,6 +459,33 @@ typedef struct QCowL2Meta
 QLIST_ENTRY(QCowL2Meta) next_in_flight;
 } QCowL2Meta;
 
+/*
+ * In images with standard L2 entries all clusters are treated as if
+ * they had one subcluster so QCow2ClusterType and QCow2SubclusterType
+ * can be mapped to each other and have the exact same meaning
+ * (QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC cannot happen in these images).
+ *
+ * In images with extended L2 entries QCow2ClusterType refers to the
+ * complete cluster and QCow2SubclusterType to each of the individual
+ * subclusters, so there are several possible combinations:
+ *
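+ * |--------------+---------------------------|
+ * | Cluster type | Possible subcluster types |
+ * |--------------+---------------------------|
+ * | UNALLOCATED  |         UNALLOCATED_PLAIN |
+ * |              |                ZERO_PLAIN |
+ * |--------------+---------------------------|
+ * | NORMAL       |         UNALLOCATED_ALLOC |
+ * |              |                ZERO_ALLOC |
+ * |              |                    NORMAL |
+ * |--------------+---------------------------|
+ * | COMPRESSED   |                COMPRESSED |
+ * |--------------+---------------------------|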
+ *
+ * QCOW2_SUBCLUSTER_INVALID means that the L2 entry is incorrect and
+ * the image should be marked corrupt.
+ */
+
 typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_UNALLOCATED,
 QCOW2_CLUSTER_ZERO_PLAIN,
@@ -452,6 +494,16 @@ typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_COMPRESSED,
 } QCow2ClusterType;
 
+typedef enum QCow2SubclusterType {
+QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
+QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
+QCOW2_SUBCLUSTER_ZERO_PLAIN,
+QCOW2_SUBCLUSTER_ZERO_ALLOC,
+QCOW2_SUBCLUSTER_NORMAL,
+QCOW2_SUBCLUSTER_COMPRESSED,
+QCOW2_SUBCLUSTER_INVALID,
+} QCow2SubclusterType;
+
 typedef enum QCow2MetadataOverlap {

[PATCH v5 28/31] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

2020-05-05 Thread Alberto Garcia

Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json |   7 +++
 block/qcow2.h|   8 ++-
 include/block/block_int.h|   1 +
 block/qcow2.c|  65 ++--
 tests/qemu-iotests/031.out   |   8 +--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  20 +++---
 tests/qemu-iotests/065   |  18 --
 tests/qemu-iotests/082.out   |  48 ---
 tests/qemu-iotests/085.out   |  38 ++--
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +--
 tests/qemu-iotests/198.out   |   2 +
 tests/qemu-iotests/206.out   |   4 ++
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/274.out   |  49 ---
 tests/qemu-iotests/280.out   |   2 +-
 tests/qemu-iotests/common.filter |   1 +
 22 files changed, 266 insertions(+), 139 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..f5d1656001 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -66,6 +66,9 @@
 # standalone (read-only) raw image without looking at qcow2
 # metadata (since: 4.0)
 #
+# @extended-l2: true if the image has extended L2 entries; only valid for
+#   compat >= 1.1 (since 5.1)
+#
 # @lazy-refcounts: on or off; only valid for compat >= 1.1
 #
 # @corrupt: true if the image has been marked corrupt; only valid for
@@ -85,6 +88,7 @@
   'compat': 'str',
   '*data-file': 'str',
   '*data-file-raw': 'bool',
+  '*extended-l2': 'bool',
   '*lazy-refcounts': 'bool',
   '*corrupt': 'bool',
   'refcount-bits': 'int',
@@ -4296,6 +4300,8 @@
 # @data-file-raw: True if the external data file must stay valid as a
 # standalone (read-only) raw image without looking at qcow2
 # metadata (default: false; since: 4.0)
+# @extended-l2  True to make the image have extended L2 entries
+#   (default: false; since 5.1)
 # @size: Size of the virtual disk in bytes
 # @version: Compatibility level (default: v3)
 # @backing-file: File name of the backing file if a backing file
@@ -4314,6 +4320,7 @@
   'data': { 'file': 'BlockdevRef',
 '*data-file':   'BlockdevRef',
 '*data-file-raw':   'bool',
+'*extended-l2': 'bool',
 'size': 'size',
 '*version': 'BlockdevQcow2Version',
 '*backing-file':'str',
diff --git a/block/qcow2.h b/block/qcow2.h
index 7349c6ce40..09bf561305 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -237,13 +237,16 @@ enum {
 QCOW2_INCOMPAT_DIRTY_BITNR  = 0,
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+QCOW2_INCOMPAT_EXTL2_BITNR  = 4,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
+QCOW2_INCOMPAT_EXTL2= 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
 QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
 | QCOW2_INCOMPAT_CORRUPT
-| QCOW2_INCOMPAT_DATA_FILE,
+| QCOW2_INCOMPAT_DATA_FILE
+| QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -555,8 +558,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-/* FIXME: Return false until this feature is complete */
-return false;
+return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 92335f33c7..27f17bab78 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,7 @@
 #define BLOCK_OPT_REFCOUNT_BITS "refcount_bits"
 #define BLOCK_OPT_DATA_FILE "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
+#define BLOCK_OPT_EXTL2 "extended_l2"
 
 #define BLOCK_PROBE_BUF_SIZE512
 
diff --git a/block/qcow2.c b/block/qcow2.c
index ec4d1405f0..18c8e3f52a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1385,6 +1385,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
 s->subcluster_bits = ctz32(s->subcluster_size);

[PATCH v5 25/31] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-05-05 Thread Alberto Garcia

Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4544a40aa0..0a295076a3 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -800,6 +800,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index, 0);
+}
 qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
 *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1

[PATCH v5 16/31] qcow2: Add cluster type parameter to qcow2_get_host_offset()

2020-05-05 Thread Alberto Garcia

This function returns an integer that can be either an error code or a
cluster type (a value from the QCow2ClusterType enum).

We are going to start using subcluster types instead of cluster types
in some functions so it's better to use the exact data types instead
of integers for clarity and in order to detect errors more easily.

This patch makes qcow2_get_host_offset() return 0 on success and
puts the returned cluster type in a separate parameter. There are no
semantic changes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  3 ++-
 block/qcow2-cluster.c | 11 +++
 block/qcow2.c | 37 ++---
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index b5db8d2f36..cca650e934 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -876,7 +876,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error **errp);
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset);
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8b2fc550b7..64481067ce 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -514,13 +514,14 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
  * cluster type and (if applicable) are stored contiguously in the image file.
+ * The cluster type is stored in *cluster_type.
  * Compressed clusters are always returned one by one.
  *
- * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
- * cases.
+ * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset)
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -661,7 +662,9 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-return type;
+*cluster_type = type;
+
+return 0;
 
 fail:
 qcow2_cache_put(s->l2_table_cache, (void **)&l2_slice);
diff --git a/block/qcow2.c b/block/qcow2.c
index 9a8fb5a7bf..b9effb195b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1973,6 +1973,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
+QCow2ClusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(&s->lock);
@@ -1984,7 +1985,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 }
 
 bytes = MIN(INT_MAX, count);
-ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset);
+ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset, &type);
 qemu_co_mutex_unlock(&s->lock);
 if (ret < 0) {
 return ret;
@@ -1992,15 +1993,15 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
+if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
 !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2209,6 +2210,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 int ret = 0;
 unsigned int cur_bytes; /* number of bytes in current iteration */
 uint64_t host_offset = 0;
+QCow2ClusterType type;
 AioTaskPool *aio = NULL;
 
 while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2220,22 +,23 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 }
 
 qemu_co_mutex_lock(&s->lock);
-ret = qcow2_get_host_offset(bs, offset, &cur_bytes, &host_offset);
+ret = qcow2_get_host_offset(bs, offset, &cur_bytes,
+&host_offset, &type);

[PATCH v5 26/31] qcow2: Add subcluster support to handle_alloc_space()

2020-05-05 Thread Alberto Garcia

The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also zero-fill the other
subclusters when we can guarantee that we are not overwriting existing
data. However, this would use more disk space, so we should first
evaluate whether it is really worth doing.
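
To make the range calculation concrete, here is a minimal sketch of how
the zeroed region follows from the COW regions. DemoCowRegion,
DemoL2Meta and demo_zero_range() are hypothetical, reduced versions of
the structures used in the patch, kept only as far as this arithmetic
needs them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of a COW region descriptor. */
typedef struct {
    uint64_t offset;    /* offset from the start of the allocation, in bytes */
    unsigned nb_bytes;  /* length of the COW region, in bytes */
} DemoCowRegion;

/* Hypothetical, reduced version of the QCowL2Meta fields used here. */
typedef struct {
    uint64_t alloc_offset;    /* host offset of the first cluster */
    DemoCowRegion cow_start;  /* COW region before the guest data */
    DemoCowRegion cow_end;    /* COW region after the guest data */
} DemoL2Meta;

/*
 * Compute the host range [*start, *start + *len) to fill with zeroes:
 * from the beginning of the head COW region to the end of the tail COW
 * region, rather than the whole cluster.
 */
static inline void demo_zero_range(const DemoL2Meta *m,
                                   uint64_t *start, uint64_t *len)
{
    assert(m->cow_end.offset + m->cow_end.nb_bytes >= m->cow_start.offset);
    *start = m->alloc_offset + m->cow_start.offset;
    *len = m->cow_end.offset + m->cow_end.nb_bytes - m->cow_start.offset;
}
```

With 2 KiB subclusters, a write from the middle of subcluster #3 to the
middle of subcluster #5 yields a range covering exactly subclusters #3
to #5 (6144 bytes) instead of the full cluster.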

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 63e952b89a..cc5591fe8f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2351,6 +2351,9 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 
 for (m = l2meta; m != NULL; m = m->next) {
 int ret;
+uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+unsigned nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+m->cow_start.offset;
 
 if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
 continue;
@@ -2365,16 +2368,14 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
  * efficiently zero out the whole clusters
  */
 
-ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
 true);
 if (ret < 0) {
 return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
 BDRV_REQ_NO_FALLBACK);
 if (ret < 0) {
 if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1

[PATCH v5 05/31] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2020-05-05 Thread Alberto Garcia

When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
 allocated and have a reference count of 1. In those clusters we
 can simply write the data and there is no need to perform any
 copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
 either because they haven't been allocated yet, because their
 reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those clusters are
already allocated, so they could just as well be handled in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
 which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
 reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
 allocation (all other cases).

One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.
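
The new split between the two functions can be summarized with the
sketch below. It is an illustration under simplifying assumptions, not
the code in this patch: the enum and both helpers are hypothetical, and
the reference count is passed in directly rather than being derived from
the L2 entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QCow2ClusterType. */
typedef enum {
    DEMO_UNALLOCATED,
    DEMO_ZERO_PLAIN,
    DEMO_ZERO_ALLOC,
    DEMO_NORMAL,
    DEMO_COMPRESSED,
} DemoClusterType;

/*
 * True if the cluster is already allocated and can be overwritten in
 * place (the handle_copied() case after this patch). Everything else
 * needs a new allocation (the handle_alloc() case).
 */
static inline bool demo_can_overwrite(DemoClusterType type, unsigned refcount)
{
    return (type == DEMO_NORMAL || type == DEMO_ZERO_ALLOC) && refcount == 1;
}

/*
 * True if an in-place overwrite still needs copy-on-write for the part
 * of the cluster that the request does not touch. Only ZERO_ALLOC
 * clusters do: their L2 entry says "zeroes", so the untouched part has
 * to be written explicitly once the entry becomes a normal one.
 */
static inline bool demo_overwrite_needs_cow(DemoClusterType type)
{
    return type == DEMO_ZERO_ALLOC;
}
```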

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 256 +++---
 1 file changed, 141 insertions(+), 115 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 80f9787461..fce0be7a08 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1039,13 +1039,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure, add
- * it to @m and the BDRVQcow2State.cluster_allocs list.
+ * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
+ * request does not need copy-on-write or changes to the L2 metadata
+ * then this function does nothing.
  *
  * @host_cluster_offset points to the beginning of the first cluster.
  *
  * @guest_offset and @bytes indicate the offset and length of the
  * request.
  *
+ * @l2_slice contains the L2 entries of all clusters involved in this
+ * write request.
+ *
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
@@ -1053,15 +1058,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 static void calculate_l2_meta(BlockDriverState *bs,
   uint64_t host_cluster_offset,
   uint64_t guest_offset, unsigned bytes,
-  QCowL2Meta **m, bool keep_old)
+  uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-unsigned cow_start_from = 0;
+int l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry;
+unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
-unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
+QCow2ClusterType type;
+
+assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+/* Return if there's no COW (all clusters are normal and we keep them) */
+if (keep_old) {
+int i;
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+break;
+}
+}
+if (i == nb_clusters) {
+return;
+}
+}
+
+/* Get the L2 entry of the first cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_start_from = cow_start_to;
+} else {
+cow_start_from = 0;
+}
+
+/* Get the L2 entry of the last cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_end_to = cow_end_from;
+} else {
+cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+}
 
 *m = g_malloc0(sizeof(**m));
 **m = (QCowL2Meta) {
@@ -1087,18 +1130,22 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
-/* Returns true if writing to a cluster requires COW

[PATCH v5 19/31] qcow2: Add subcluster support to calculate_l2_meta()

2020-05-05 Thread Alberto Garcia

If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

1) If we are writing to a newly allocated cluster then we need
   copy-on-write. The previous contents of subclusters #0 to #3 must
   be copied to the new cluster. We can optimize this process by
   skipping all leading unallocated or zero subclusters (the status of
   those skipped subclusters will be reflected in the new L2 bitmap).

2) If we are overwriting an existing cluster:

   2.1) If subcluster #3 is unallocated or has the all-zeroes bit set
then we need copy-on-write (on subcluster #3 only).

   2.2) If subcluster #3 was already allocated then there is no need
for any copy-on-write. However we still need to update the L2
bitmap to reflect possible changes in the allocation status of
subclusters #4 to #31. Because of this, this function checks
if all the overwritten subclusters are already allocated and
in this case it returns without creating a new QCowL2Meta
structure.

After all these changes l2meta_cow_start() and l2meta_cow_end()
are not necessarily cluster-aligned anymore. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write on the same cluster.
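
For the example above (a write from the middle of subcluster #3 to the
end of the cluster), the range of subclusters to inspect can be worked
out as in this sketch. demo_sc_range() is a hypothetical helper that
mirrors the first_sc/last_sc arithmetic in the diff below; the 64 KiB
cluster and 2 KiB subcluster sizes used in the note after it are
assumptions for illustration.

```c
#include <assert.h>

/*
 * Compute the indexes of the first and last subclusters touched by a
 * write of 'bytes' bytes starting 'offset_in_cluster' bytes into the
 * first cluster of the request. Indexes keep counting across cluster
 * boundaries, so index / subclusters_per_cluster selects the cluster.
 */
static inline void demo_sc_range(unsigned offset_in_cluster, unsigned bytes,
                                 unsigned subcluster_size,
                                 unsigned *first_sc, unsigned *last_sc)
{
    assert(bytes > 0 && subcluster_size > 0);
    *first_sc = offset_in_cluster / subcluster_size;
    *last_sc = (offset_in_cluster + bytes - 1) / subcluster_size;
}
```

With 2 KiB subclusters in a 64 KiB cluster, a write starting at byte
7168 and running to the end of the cluster gives first_sc = 3 and
last_sc = 31, which matches cases 2.1 and 2.2 above.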

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 174 +++---
 1 file changed, 146 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5595ce1404..ffcb11edda 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1059,56 +1059,156 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 0 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-  uint64_t host_cluster_offset,
-  uint64_t guest_offset, unsigned bytes,
-  uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_cluster_offset,
+ uint64_t guest_offset, unsigned bytes,
+ uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-int l2_index = offset_to_l2_slice_index(s, guest_offset);
-uint64_t l2_entry;
+int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry, l2_bitmap;
 unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
-QCow2ClusterType type;
+QCow2SubclusterType type;
 
 assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-/* Return if there's no COW (all clusters are normal and we keep them) */
+/* Return if there's no COW (all subclusters are normal and we are
+ * keeping the clusters) */
 if (keep_old) {
+unsigned first_sc = cow_start_to / s->subcluster_size;
+unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
 int i;
-for (i = 0; i < nb_clusters; i++) {
-l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+for (i = first_sc; i <= last_sc; i++) {
+unsigned c = i / s->subclusters_per_cluster;
+unsigned sc = i % s->subclusters_per_cluster;
+l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+if (type == QCOW2_SUBCLUSTER_INVALID) {
+l2_index += c; /* Point to the invalid entry */
+goto fail;
+}
+if (type != QCOW2_SUBCLUSTER_NORMAL) {
 break;
 }
 }
-if (i == nb_clusters) {
-return;
+if (i == last_sc + 1) {
+return 0;
 }
 }
 
 /* Get the L2 entry of the first cluster */
 l2_entry = get_l2_entry(s, l2_slice, l2_index);
-type = qcow2_get_cluster_type(bs, l2_entry);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+sc_index = offset_to_sc_index(s, guest_offset);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-cow_start_from = cow_start_to;
+if (type == QCOW2_SUBCLUSTER_INVALID) {
+goto fail;
+}
+
+

[PATCH v5 22/31] qcow2: Add subcluster support to discard_in_l2_slice()

2020-05-05 Thread Alberto Garcia

Two things need to be taken into account here:

1) With full_discard == true the L2 entry must be cleared completely.
   This also includes the L2 bitmap if the image has extended L2
   entries.

2) With full_discard == false we have to make the discarded cluster
   read back as zeroes. With normal L2 entries this is done with the
   QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done
   with the individual 'all zeroes' bits for each subcluster.

   Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2
   images so, if there is a backing file, discard cannot guarantee
   that the image will read back as zeroes. If this matters to the
   caller, the caller should forbid the discard, as qcow2_co_pdiscard()
   does (see 80f5c01183 for more details).
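
The two cases can be condensed into one function that computes the new
L2 entry and bitmap, roughly as sketched here. This is a simplified
model of the logic in the diff below, not a replacement for it. DemoL2
and demo_discard() are hypothetical names, and the value of
DEMO_L2_BITMAP_ALL_ZEROES assumes that the 'all zeroes' bits occupy the
upper 32 bits of the bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OFLAG_ZERO            (1ULL << 0)
/* Assumption: one 'all zeroes' bit per subcluster in bits 32..63. */
#define DEMO_L2_BITMAP_ALL_ZEROES  0xFFFFFFFF00000000ULL

/* Hypothetical pair of values making up one (extended) L2 entry. */
typedef struct {
    uint64_t entry;
    uint64_t bitmap;
} DemoL2;

/* Compute the L2 entry and bitmap that a discard should leave behind. */
static inline DemoL2 demo_discard(DemoL2 old, bool full_discard,
                                  bool has_backing, bool is_allocated,
                                  bool has_subclusters, int qcow_version)
{
    DemoL2 res = old;

    if (full_discard) {
        /* Case 1: clear everything, including the bitmap. */
        res.entry = 0;
        res.bitmap = 0;
    } else if (has_backing || is_allocated) {
        /* Case 2: make the cluster read back as zeroes where possible. */
        if (has_subclusters) {
            res.entry = 0;
            res.bitmap = DEMO_L2_BITMAP_ALL_ZEROES;
        } else {
            res.entry = qcow_version >= 3 ? DEMO_OFLAG_ZERO : 0;
        }
    }
    return res;
}
```

If the result equals the old values, there is nothing to write and the
cluster can be skipped.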

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 51 +++
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index bf32ed0825..2283a308d0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1804,11 +1804,16 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_l2_entry;
-
-old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+uint64_t new_l2_entry = old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
+QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
 
 /*
+ * If full_discard is true, the cluster should not read back as zeroes,
+ * but rather fall through to the backing file.
+ *
  * If full_discard is false, make sure that a discarded area reads back
  * as zeroes for v3 images (we cannot do it for v2 without actually
  * writing a zero-filled buffer). We can skip the operation if the
@@ -1817,40 +1822,28 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
  *
  * TODO We might want to use bdrv_block_status(bs) here, but we're
  * holding s->lock, so that doesn't work today.
- *
- * If full_discard is true, the sector should not read back as zeroes,
- * but rather fall through to the backing file.
  */
-switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
-case QCOW2_CLUSTER_UNALLOCATED:
-if (full_discard || !bs->backing) {
-continue;
+if (full_discard) {
+new_l2_entry = new_l2_bitmap = 0;
+} else if (bs->backing || qcow2_cluster_is_allocated(type)) {
+if (has_subclusters(s)) {
+new_l2_entry = 0;
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry = s->qcow_version >= 3 ? QCOW_OFLAG_ZERO : 0;
 }
-break;
+}
 
-case QCOW2_CLUSTER_ZERO_PLAIN:
-if (!full_discard) {
-continue;
-}
-break;
-
-case QCOW2_CLUSTER_ZERO_ALLOC:
-case QCOW2_CLUSTER_NORMAL:
-case QCOW2_CLUSTER_COMPRESSED:
-break;
-
-default:
-abort();
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
+continue;
 }
 
 /* First remove L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (!full_discard && s->qcow_version >= 3) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-} else {
-set_l2_entry(s, l2_slice, l2_index + i, 0);
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
-
 /* Then decrease the refcount */
 qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
 }
-- 
2.20.1

[PATCH v5 09/31] qcow2: Add subcluster-related fields to BDRVQcow2State

2020-05-05 Thread Alberto Garcia

This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: Number of bits such that
  1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one
subcluster per cluster (i.e. subcluster_size = cluster_size).
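
As a worked example of how the three fields relate, the sketch below
derives them from the cluster size. DemoGeometry, demo_ctz32() and
demo_geometry() are hypothetical names; demo_ctz32() is a simple
stand-in for a count-trailing-zeros helper and, unlike such helpers
usually do, it rejects 0 outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SUBCLUSTERS_PER_CLUSTER 32

typedef struct {
    unsigned subclusters_per_cluster;
    unsigned subcluster_size;  /* in bytes */
    unsigned subcluster_bits;  /* 1 << subcluster_bits == subcluster_size */
} DemoGeometry;

/* Count trailing zero bits of a non-zero 32-bit value. */
static inline unsigned demo_ctz32(uint32_t v)
{
    unsigned n = 0;
    assert(v != 0);
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Derive the subcluster fields from the cluster size (a power of two). */
static inline DemoGeometry demo_geometry(uint32_t cluster_size,
                                         bool has_subclusters)
{
    DemoGeometry g;
    g.subclusters_per_cluster =
        has_subclusters ? DEMO_SUBCLUSTERS_PER_CLUSTER : 1;
    g.subcluster_size = cluster_size / g.subclusters_per_cluster;
    g.subcluster_bits = demo_ctz32(g.subcluster_size);
    return g;
}
```

For a 64 KiB cluster this gives 32 subclusters of 2 KiB each
(subcluster_bits = 11) with extended L2 entries, and one 64 KiB
subcluster (subcluster_bits = 16) without them.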

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 5 +
 block/qcow2.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 5e8036c3dd..9d8ca8068c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -78,6 +78,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -284,6 +286,9 @@ typedef struct BDRVQcow2State {
 int cluster_bits;
 int cluster_size;
 int l2_slice_size;
+int subcluster_bits;
+int subcluster_size;
+int subclusters_per_cluster;
 int l2_bits;
 int l2_size;
 int l1_size;
diff --git a/block/qcow2.c b/block/qcow2.c
index 951fc19dd2..e5ae8aaff7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1380,6 +1380,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 }
 }
 
+s->subclusters_per_cluster =
+has_subclusters(s) ? QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER : 1;
+s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+s->subcluster_bits = ctz32(s->subcluster_size);
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
-- 
2.20.1

[PATCH v5 13/31] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2020-05-05 Thread Alberto Garcia

Extended L2 entries are 128 bits wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters. In this case the
returned value is always 0 and has no meaning.
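
The offset calculation amounts to treating the slice as an array of
64-bit words in which each extended entry takes two consecutive words.
The sketch below shows only that indexing; the demo_* helpers are
hypothetical and deliberately leave out the big-endian conversion that
the real accessors perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return the entry word of L2 entry number idx. */
static inline uint64_t demo_get_entry(const uint64_t *slice, bool extended,
                                      int idx)
{
    /* Each extended entry occupies two 64-bit words instead of one. */
    return slice[idx * (extended ? 2 : 1)];
}

/* Return the bitmap word of L2 entry number idx, or 0 if there is none. */
static inline uint64_t demo_get_bitmap(const uint64_t *slice, bool extended,
                                       int idx)
{
    if (!extended) {
        return 0; /* For convenience only; the value has no meaning. */
    }
    return slice[idx * 2 + 1];
}
```

The same four words are therefore read as four standard entries or as
two (entry, bitmap) pairs, depending on the image.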

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 80ceb352c9..4ad93772b9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -515,15 +515,36 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx)
+{
+if (has_subclusters(s)) {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+return be64_to_cpu(l2_slice[idx + 1]);
+} else {
+return 0; /* For convenience only; this value has no meaning. */
+}
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx, uint64_t entry)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx, uint64_t bitmap)
+{
+assert(has_subclusters(s));
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
-- 
2.20.1

[PATCH v5 04/31] qcow2: Split cluster_needs_cow() out of count_cow_clusters()

2020-05-05 Thread Alberto Garcia

We are going to need it in other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 61ad638bdc..80f9787461 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1087,6 +1087,24 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+switch (qcow2_get_cluster_type(bs, l2_entry)) {
+case QCOW2_CLUSTER_NORMAL:
+if (l2_entry & QCOW_OFLAG_COPIED) {
+return false;
+}
+case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_COMPRESSED:
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return true;
+default:
+abort();
+}
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1099,25 +1117,11 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
 
 for (i = 0; i < nb_clusters; i++) {
 uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-switch(cluster_type) {
-case QCOW2_CLUSTER_NORMAL:
-if (l2_entry & QCOW_OFLAG_COPIED) {
-goto out;
-}
+if (!cluster_needs_cow(bs, l2_entry)) {
 break;
-case QCOW2_CLUSTER_UNALLOCATED:
-case QCOW2_CLUSTER_COMPRESSED:
-case QCOW2_CLUSTER_ZERO_PLAIN:
-case QCOW2_CLUSTER_ZERO_ALLOC:
-break;
-default:
-abort();
 }
 }
 
-out:
 assert(i <= nb_clusters);
 return i;
 }
-- 
2.20.1

[PATCH v5 12/31] qcow2: Add l2_entry_size()

2020-05-05 Thread Alberto Garcia

qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.
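
One practical consequence, sketched below, is that a table of a given
size holds half as many extended entries as standard ones. The helpers
are hypothetical, and the example assumes that an L2 table occupies
exactly one cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of one L2 entry: 64 bits, or 128 bits when extended. */
static inline size_t demo_l2_entry_size(bool has_subclusters)
{
    return has_subclusters ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Number of L2 entries that fit in a table of one cluster. */
static inline size_t demo_entries_per_table(size_t cluster_size,
                                            bool has_subclusters)
{
    return cluster_size / demo_l2_entry_size(has_subclusters);
}
```

With 64 KiB clusters that is 8192 entries per table for a traditional
image and 4096 for an image with subclusters, which is why every
"count * sizeof(uint64_t)" in the hunks below becomes
"count * l2_entry_size(s)".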

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.h  |  9 +
 block/qcow2-cluster.c  | 12 ++--
 block/qcow2-refcount.c | 14 --
 block/qcow2.c  |  8 
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 8b1ed1cbcf..80ceb352c9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,10 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* Size of normal and extended L2 entries */
+#define L2E_SIZE_NORMAL   (sizeof(uint64_t))
+#define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -503,6 +507,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
 return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+return has_subclusters(s) ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 76fd0f3cdb..8b2fc550b7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -208,7 +208,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
uint64_t l2_offset, uint64_t **l2_slice)
 {
 BDRVQcow2State *s = bs->opaque;
-int start_of_slice = sizeof(uint64_t) *
+int start_of_slice = l2_entry_size(s) *
 (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
 return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -281,7 +281,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new l2 entry */
 
-l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
 if (l2_offset < 0) {
 ret = l2_offset;
 goto fail;
@@ -305,7 +305,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new entry in the l2 cache */
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -369,7 +369,7 @@ fail:
 }
 s->l1_table[l1_index] = old_l2_offset;
 if (l2_offset > 0) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_ALWAYS);
 }
 return ret;
@@ -716,7 +716,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
 
 /* Then decrease the refcount of the old table */
 if (l2_offset) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_OTHER);
 }
 
@@ -1913,7 +1913,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b299b9bda4..dfdcdd3c25 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1254,7 +1254,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 l2_slice = NULL;
 l1_table = NULL;
 l1_size2 = l1_size * sizeof(uint64_t);
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 s->cache_discards = true;
@@ -1605,7 +1605,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 int i, l2_size, nb_csectors, ret;
 
 /* Read L2 table from disk */
-l2_size = s->l2_size * sizeof(uint64_t);
+l2_size = s->l2_size * l2_entry_size(s);
 l2_table = g_malloc(l2_size);
 
 ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1680,15 +1680,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
 offset);
 if (fix & BDRV_FIX_ERRORS) {
+int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
 uint64_t l2e_offset =
-l2_offset +

[PATCH v5 15/31] qcow2: Add qcow2_cluster_is_allocated()

2020-05-05 Thread Alberto Garcia

This helper function tells us if a cluster is allocated (that is,
there is an associated host offset for it).

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index be7816a3b8..b5db8d2f36 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -763,6 +763,12 @@ QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
 }
 }
 
+static inline bool qcow2_cluster_is_allocated(QCow2ClusterType type)
+{
+return (type == QCOW2_CLUSTER_COMPRESSED || type == QCOW2_CLUSTER_NORMAL ||
+type == QCOW2_CLUSTER_ZERO_ALLOC);
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1

[PATCH v5 21/31] qcow2: Add subcluster support to zero_in_l2_slice()

2020-05-05 Thread Alberto Garcia

The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as
zeroes is only used in standard L2 entries. Extended L2 entries use
individual 'all zeroes' bits for each subcluster.

This must be taken into account when updating the L2 entry and also
when deciding that an existing entry does not need to be updated.
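
A condensed model of the update rule is sketched below. It follows the
logic of the diff but is not the patch itself: DemoL2, demo_zero() and
demo_zero_is_noop() are hypothetical names, and the value of
DEMO_L2_BITMAP_ALL_ZEROES assumes that the 'all zeroes' bits occupy the
upper 32 bits of the bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OFLAG_ZERO            (1ULL << 0)
/* Assumption: one 'all zeroes' bit per subcluster in bits 32..63. */
#define DEMO_L2_BITMAP_ALL_ZEROES  0xFFFFFFFF00000000ULL

/* Hypothetical pair of values making up one (extended) L2 entry. */
typedef struct {
    uint64_t entry;
    uint64_t bitmap;
} DemoL2;

/* Compute the L2 entry and bitmap after zeroing a whole cluster. */
static inline DemoL2 demo_zero(DemoL2 old, bool unmap, bool has_subclusters)
{
    DemoL2 res;

    /* Unmapping drops the host offset; otherwise it is preserved. */
    res.entry = unmap ? 0 : old.entry;
    res.bitmap = old.bitmap;

    if (has_subclusters) {
        res.bitmap = DEMO_L2_BITMAP_ALL_ZEROES;
    } else {
        res.entry |= DEMO_OFLAG_ZERO;
    }
    return res;
}

/* True if zeroing would leave the L2 entry and bitmap unchanged. */
static inline bool demo_zero_is_noop(DemoL2 old, bool unmap,
                                     bool has_subclusters)
{
    DemoL2 res = demo_zero(old, unmap, has_subclusters);
    return res.entry == old.entry && res.bitmap == old.bitmap;
}
```

Comparing old and new values replaces the earlier per-cluster-type
checks and works the same way for both entry formats.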

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f500cbfb8e..bf32ed0825 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1913,7 +1913,6 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 int l2_index;
 int ret;
 int i;
-bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
 
 ret = get_cluster_table(bs, offset, &l2_slice, &l2_index);
 if (ret < 0) {
@@ -1925,28 +1924,31 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_offset;
-QCow2ClusterType cluster_type;
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
+bool unmap = (type == QCOW2_CLUSTER_COMPRESSED) ||
+((flags & BDRV_REQ_MAY_UNMAP) && qcow2_cluster_is_allocated(type));
+uint64_t new_l2_entry = unmap ? 0 : old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
 
-old_offset = get_l2_entry(s, l2_slice, l2_index + i);
+if (has_subclusters(s)) {
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry |= QCOW_OFLAG_ZERO;
+}
 
-/*
- * Minimize L2 changes if the cluster already reads back as
- * zeroes with correct allocation.
- */
-cluster_type = qcow2_get_cluster_type(bs, old_offset);
-if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-(cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
 continue;
 }
 
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
-} else {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+if (unmap) {
+qcow2_free_any_clusters(bs, old_l2_entry, 1, QCOW2_DISCARD_REQUEST);
+}
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
 }
 
-- 
2.20.1

[PATCH v5 27/31] qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()

2020-05-05 Thread Alberto Garcia

This works now at the subcluster level and pwrite_zeroes_alignment is
updated accordingly.

qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with
the following changes:

   - The request can now be subcluster-aligned.

   - The cluster-aligned body of the request is still zeroized using
 zero_in_l2_slice() as before.

   - The subcluster-aligned head and tail of the request are zeroized
 with the new zero_l2_subclusters() function.

There is just one thing to take into account for a possible future
improvement: compressed clusters cannot be partially zeroized so
zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
This makes the caller repeat the *complete* request and write actual
zeroes to disk. This is sub-optimal because

   1) if the head area was compressed we would still be able to use
  the fast path for the body and possibly the tail.

   2) if the tail area was compressed we are writing zeroes to the
  head and the body areas, which are already zeroized.
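
The head/body/tail split described above can be sketched as follows. This
is an illustration only, not the code from the patch: SketchSplit and
sketch_split_request() are hypothetical names, and the special case for
requests that end at the end of the image is deliberately left out.

```c
#include <stdint.h>

typedef struct {
    uint64_t head;   /* bytes before the first cluster boundary */
    uint64_t body;   /* bytes covered by whole clusters */
    uint64_t tail;   /* bytes after the last cluster boundary */
} SketchSplit;

/* Split [offset, offset + bytes) into head, cluster-aligned body and tail. */
static SketchSplit sketch_split_request(uint64_t offset, uint64_t bytes,
                                        uint64_t cluster_size)
{
    uint64_t end = offset + bytes;
    uint64_t body_start =
        (offset + cluster_size - 1) / cluster_size * cluster_size;
    uint64_t body_end = end / cluster_size * cluster_size;
    SketchSplit r;

    if (body_start > end) {
        body_start = end;        /* request ends inside its first cluster */
    }
    if (body_end < body_start) {
        body_end = body_start;   /* no whole cluster in the request */
    }
    r.head = body_start - offset;
    r.body = body_end - body_start;
    r.tail = end - body_end;
    return r;
}
```

With 64 KiB clusters, a request starting at 62 KiB with a length of 68 KiB
splits into a 2 KiB head, a 64 KiB body and a 2 KiB tail. Only the body
would go through zero_in_l2_slice(); the head and tail are the parts that
can hit the -ENOTSUP case on compressed clusters.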

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h |  4 +--
 block/qcow2-cluster.c | 80 +++
 block/qcow2.c | 27 ---
 3 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 05e3ef0ece..7349c6ce40 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -881,8 +881,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m);
 int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, enum qcow2_discard_type type,
   bool full_discard);
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags);
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags);
 
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0a295076a3..d0cf9d52e6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1971,12 +1971,58 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 return nb_clusters;
 }
 
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags)
+static int zero_l2_subclusters(BlockDriverState *bs, uint64_t offset,
+   unsigned nb_subclusters)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t *l2_slice;
+uint64_t old_l2_bitmap, l2_bitmap;
+int l2_index, ret, sc = offset_to_sc_index(s, offset);
+
+/* For full clusters use zero_in_l2_slice() instead */
+assert(nb_subclusters > 0 && nb_subclusters < s->subclusters_per_cluster);
+assert(sc + nb_subclusters <= s->subclusters_per_cluster);
+
+ret = get_cluster_table(bs, offset, &l2_slice, &l2_index);
+if (ret < 0) {
+return ret;
+}
+
+switch (qcow2_get_cluster_type(bs, get_l2_entry(s, l2_slice, l2_index))) {
+case QCOW2_CLUSTER_COMPRESSED:
+ret = -ENOTSUP; /* We cannot partially zeroize compressed clusters */
+goto out;
+case QCOW2_CLUSTER_NORMAL:
+case QCOW2_CLUSTER_UNALLOCATED:
+break;
+default:
+g_assert_not_reached();
+}
+
+old_l2_bitmap = l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+
+l2_bitmap |=  QCOW_OFLAG_SUB_ZERO_RANGE(sc, sc + nb_subclusters - 1);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ALLOC_RANGE(sc, sc + nb_subclusters - 1);
+
+if (old_l2_bitmap != l2_bitmap) {
+set_l2_bitmap(s, l2_slice, l2_index, l2_bitmap);
+qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
+}
+
+ret = 0;
+out:
+qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
+
+return ret;
+}
+
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t end_offset = offset + bytes;
 uint64_t nb_clusters;
+unsigned head, tail;
 int64_t cleared;
 int ret;
 
@@ -1991,8 +2037,8 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
 }
 
 /* Caller must pass aligned values, except at image end */
-assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+assert(offset_into_subcluster(s, offset) == 0);
+assert(offset_into_subcluster(s, end_offset) == 0 ||
end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
 /* The zero flag is only supported by version 3 and newer */
@@ -2000,11 +2046,26 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
 return -ENOTSUP;
 }
 
-/* Each L2 slice is handled by its own loop iteration */
-nb_clusters = size_to_clusters(s, bytes);
+head = MIN(end_offset, ROUND_UP(offset,

[PATCH v5 07/31] qcow2: Document the Extended L2 Entries feature

2020-05-05 Thread Alberto Garcia

Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.
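
As a rough illustration of why the cache size changes (a sketch, not text
from the documentation): an extended entry is twice the size of a standard
one, so a table of the same size holds half as many entries. The helper
names below are hypothetical, and sketch_l2_bytes_for_disk() is only a
derivation from "one L2 entry per cluster", not a formula quoted from
docs/qcow2-cache.txt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of one L2 entry: 64 bits, or 128 bits when extended. */
static uint64_t sketch_l2_entry_size(bool extended)
{
    return extended ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Number of entries in one L2 table (an L2 table fills one cluster). */
static uint64_t sketch_l2_entries(uint64_t cluster_size, bool extended)
{
    return cluster_size / sketch_l2_entry_size(extended);
}

/* Bytes of L2 metadata needed to map @disk_size bytes of guest data. */
static uint64_t sketch_l2_bytes_for_disk(uint64_t disk_size,
                                         uint64_t cluster_size, bool extended)
{
    uint64_t clusters = (disk_size + cluster_size - 1) / cluster_size;

    return clusters * sketch_l2_entry_size(extended);
}
```

For 64 KiB clusters this gives 8192 entries per table with standard entries
and 4096 with extended ones, so mapping the same amount of guest data needs
twice as much L2 metadata.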

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 docs/interop/qcow2.txt | 68 --
 docs/qcow2-cache.txt   | 19 +++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 298a031310..f08e43f228 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -42,6 +42,9 @@ The first cluster of a qcow2 image contains the file header:
 as the maximum cluster size and won't be able to open images
 with larger cluster sizes.
 
+Note: if the image has Extended L2 Entries then cluster_bits
+must be at least 14 (i.e. 16384 byte clusters).
+
  24 - 31:   size
 Virtual disk size in bytes.
 
@@ -117,7 +120,12 @@ the next fields through header_length.
 clusters. The compression_type field must be
 present and not zero.
 
-Bits 4-63:  Reserved (set to 0)
+Bit 4:  Extended L2 Entries.  If this bit is set then
+L2 table entries use an extended format that
+allows subcluster-based allocation. See the
+Extended L2 Entries section for more details.
+
+Bits 5-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -497,7 +505,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-l2_entries = (cluster_size / sizeof(uint64_t))
+l2_entries = (cluster_size / sizeof(uint64_t))[*]
 
 l2_index = (offset / cluster_size) % l2_entries
 l1_index = (offset / cluster_size) / l2_entries
@@ -507,6 +515,8 @@ obtained as follows:
 
 return cluster_offset + (offset % cluster_size)
 
+[*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
 Bit  0 -  8:Reserved (set to 0)
@@ -547,7 +557,8 @@ Standard Cluster Descriptor:
 nor is data read from the backing file if the cluster is
 unallocated.
 
-With version 2, this is always 0.
+With version 2 or with extended L2 entries (see the next
+section), this is always 0.
 
  1 -  8:Reserved (set to 0)
 
@@ -584,6 +595,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated the same as in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+1: the subcluster is allocated. In this case the
+   host cluster offset field must contain a valid
+   offset.
+0: the subcluster is not allocated. In this case
+   read requests shall go to the backing file or
+   return zeros if there is no backing file data.
+
+Bits are assigned starting from the least significant
+one (i.e. bit x is used for subcluster x).
+
+32 -  63    Subcluster reads as zeros (one bit per subcluster)
+
+1: the subcluster reads as zeros. In this case the
+   allocation status bit must be unset. The host
+   cluster offset field may or

[PATCH v5 20/31] qcow2: Add subcluster support to qcow2_get_host_offset()

2020-05-05 Thread Alberto Garcia

The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h |  38 +---
 block/qcow2-cluster.c | 141 --
 2 files changed, 82 insertions(+), 97 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 1663b5359c..05e3ef0ece 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -692,29 +692,6 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
 }
 }
 
-/*
- * For an image without extended L2 entries, return the
- * QCow2SubclusterType equivalent of a given QCow2ClusterType.
- */
-static inline
-QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
-{
-switch (type) {
-case QCOW2_CLUSTER_COMPRESSED:
-return QCOW2_SUBCLUSTER_COMPRESSED;
-case QCOW2_CLUSTER_ZERO_PLAIN:
-return QCOW2_SUBCLUSTER_ZERO_PLAIN;
-case QCOW2_CLUSTER_ZERO_ALLOC:
-return QCOW2_SUBCLUSTER_ZERO_ALLOC;
-case QCOW2_CLUSTER_NORMAL:
-return QCOW2_SUBCLUSTER_NORMAL;
-case QCOW2_CLUSTER_UNALLOCATED:
-return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
-default:
-g_assert_not_reached();
-}
-}
-
 /*
  * In an image without subsclusters @l2_bitmap is ignored and
  * @sc_index must be 0.
@@ -759,7 +736,20 @@ QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
 g_assert_not_reached();
 }
 } else {
-return qcow2_cluster_to_subcluster_type(type);
+switch (type) {
+case QCOW2_CLUSTER_COMPRESSED:
+return QCOW2_SUBCLUSTER_COMPRESSED;
+case QCOW2_CLUSTER_ZERO_PLAIN:
+return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+case QCOW2_CLUSTER_NORMAL:
+return QCOW2_SUBCLUSTER_NORMAL;
+case QCOW2_CLUSTER_UNALLOCATED:
+return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+default:
+g_assert_not_reached();
+}
 }
 }
 
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ffcb11edda..f500cbfb8e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -376,66 +376,58 @@ fail:
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. Allocated subclusters are required to be contiguous in
+ * the image file.
+ * At most @nb_clusters are checked (note that this means clusters,
+ * not subclusters).
+ * Compressed clusters are always processed one by one but for the
+ * purpose of this count they are treated as if they were divided into
+ * subclusters of size s->subcluster_size.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+unsigned sc_index, uint64_t *l2_slice,
+int l2_index)
 {
 BDRVQcow2State *s = bs->opaque;
-int i;
-QCow2ClusterType first_cluster_type;
-uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-uint64_t offset = first_entry & mask;
-
-first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-return 0;
+int i, j, count = 0;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
+bool check_offset = true;
+QCow2SubclusterType type =
+qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+
+assert(type != QCOW2_SUBCLUSTER_INVALID); /* The caller should check this */
+assert(l2_index + nb_clusters <= s->l2_size);
+
+if (type == QCOW2_SUBCLUSTER_COMPRESSED) {
+/* Compressed clusters are always processed one by one */
+return s->subclusters_per_cluster - sc_index;
 }
 
-/* must be allocated */
-assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-   first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
-
-for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = get_l2_entry(s,

[PATCH v5 02/31] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()

2020-05-05 Thread Alberto Garcia

qcow2_get_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

There is a pre-existing exception with compressed clusters: in this
case the function returns the complete cluster descriptor (containing
the offset and size of the compressed data). This does not change with
this patch but it is now documented.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  4 ++--
 block/qcow2-cluster.c | 42 +++---
 block/qcow2.c | 24 +++-
 3 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index f4de0a27d5..37e4f79e39 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -676,8 +676,8 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error **errp);
 
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset);
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4b5fc8c4a7..9ab41cb728 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -496,10 +496,15 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
 
 
 /*
- * get_cluster_offset
+ * get_host_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk find the equivalent host
+ * offset in the qcow2 file and store it in *host_offset. Neither
+ * offset needs to be aligned to a cluster boundary.
+ *
+ * If the cluster is unallocated then *host_offset will be 0.
+ * If the cluster is compressed then *host_offset will contain the
+ * complete compressed cluster descriptor.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
@@ -511,12 +516,12 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
  * cases.
  */
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset)
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
-uint64_t l1_index, l2_offset, *l2_slice;
+uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
 int c;
 unsigned int offset_in_cluster;
 uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -537,8 +542,6 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 bytes_needed = bytes_available;
 }
 
-*cluster_offset = 0;
-
 /* seek to the l2 offset in the l1 table */
 
 l1_index = offset_to_l1_index(s, offset);
@@ -570,7 +573,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -578,7 +581,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
  * true */
 assert(nb_clusters <= INT_MAX);
 
-type = qcow2_get_cluster_type(bs, *cluster_offset);
+type = qcow2_get_cluster_type(bs, l2_entry);
 if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
 qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -599,42 +602,43 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 }
 /* Compressed clusters can only be processed one by one */
 c = 1;
-*cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
+*host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
 break;
 case QCOW2_CLUSTER_ZERO_PLAIN:
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */

[PATCH v5 01/31] qcow2: Make Qcow2AioTask store the full host offset

2020-05-05 Thread Alberto Garcia

The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
host offset. In practice this is not very useful because all users(*)
of this structure need the final host offset into the cluster, which
they calculate using

   host_offset = file_cluster_offset + offset_into_cluster(s, offset)

There is no reason why Qcow2AioTask cannot store host_offset directly
and that is what this patch does.

(*) compressed clusters are the exception: in this case what
file_cluster_offset was storing was the full compressed cluster
descriptor (offset + size). This does not change with this patch
but it is documented now.
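
For reference, the calculation that each caller used to repeat can be
sketched like this. The function names are hypothetical, and the masking
assumes a power-of-two cluster size, as in offset_into_cluster().

```c
#include <stdint.h>

/* Offset of @offset within its cluster; @cluster_size is a power of two. */
static uint64_t sketch_offset_into_cluster(uint64_t cluster_size,
                                           uint64_t offset)
{
    return offset & (cluster_size - 1);
}

/*
 * Final host offset for a guest offset, given the cluster-aligned host
 * offset of the cluster that contains it. This does not apply to
 * compressed clusters, where the stored value is a cluster descriptor.
 */
static uint64_t sketch_host_offset(uint64_t file_cluster_offset,
                                   uint64_t cluster_size,
                                   uint64_t guest_offset)
{
    return file_cluster_offset +
           sketch_offset_into_cluster(cluster_size, guest_offset);
}
```

Storing the result once in the task avoids recomputing it in every read
and write path.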

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c  | 69 ++
 block/trace-events |  2 +-
 2 files changed, 34 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 8c97b06783..a387809aa9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -74,7 +74,7 @@ typedef struct {
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t cluster_descriptor,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2043,7 +2043,7 @@ out:
 
 static coroutine_fn int
 qcow2_co_preadv_encrypted(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2070,16 +2070,12 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-ret = bdrv_co_pread(s->data_file,
-file_cluster_offset + offset_into_cluster(s, offset),
-bytes, buf, 0);
+ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);
 if (ret < 0) {
 goto fail;
 }
 
-if (qcow2_co_decrypt(bs,
- file_cluster_offset + offset_into_cluster(s, offset),
- offset, buf, bytes) < 0)
+if (qcow2_co_decrypt(bs, host_offset, offset, buf, bytes) < 0)
 {
 ret = -EIO;
 goto fail;
@@ -2097,7 +2093,7 @@ typedef struct Qcow2AioTask {
 
 BlockDriverState *bs;
 QCow2ClusterType cluster_type; /* only for read */
-uint64_t file_cluster_offset;
+uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
 QEMUIOVector *qiov;
@@ -2110,7 +2106,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
AioTaskPool *pool,
AioTaskFunc func,
QCow2ClusterType cluster_type,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2125,7 +2121,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 .bs = bs,
 .cluster_type = cluster_type,
 .qiov = qiov,
-.file_cluster_offset = file_cluster_offset,
+.host_offset = host_offset,
 .offset = offset,
 .bytes = bytes,
 .qiov_offset = qiov_offset,
@@ -2134,7 +2130,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
 trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
  func == qcow2_co_preadv_task_entry ? "read" : "write",
- cluster_type, file_cluster_offset, offset, bytes,
+ cluster_type, host_offset, offset, bytes,
  qiov, qiov_offset);
 
 if (!pool) {
@@ -2148,13 +2144,12 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
  QCow2ClusterType cluster_type,
- uint64_t file_cluster_offset,
+ uint64_t host_offset,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov,
  size_t qiov_offset)
 {
 BDRVQcow2State *s = bs->opaque;
-int offset_in_cluster = offset_into_cluster(s, offset);
 
 switch (cluster_type) {
 case QCOW2_CLUSTER_ZERO_PLAIN:
@@ -2170,19 +2165,17 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
qiov,

[PATCH v5 11/31] qcow2: Add offset_into_subcluster() and size_to_subclusters()

2020-05-05 Thread Alberto Garcia

Like offset_into_cluster() and size_to_clusters(), but for
subclusters.
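
A standalone sketch of the two helpers, using a hypothetical SketchState
struct in place of BDRVQcow2State. The values in the usage note assume
64 KiB clusters split into 32 subclusters, i.e. 2 KiB subclusters.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant BDRVQcow2State fields. */
typedef struct {
    uint64_t subcluster_size;   /* power of two */
    unsigned subcluster_bits;   /* log2(subcluster_size) */
} SketchState;

/* Offset of @offset within its subcluster. */
static int64_t sketch_offset_into_subcluster(const SketchState *s,
                                             int64_t offset)
{
    return offset & (s->subcluster_size - 1);
}

/* Number of subclusters needed to hold @size bytes (rounded up). */
static uint64_t sketch_size_to_subclusters(const SketchState *s,
                                           uint64_t size)
{
    return (size + (s->subcluster_size - 1)) >> s->subcluster_bits;
}
```

With subcluster_size = 2048 and subcluster_bits = 11, an offset of 2053
lies 5 bytes into its subcluster, and a size of 2049 bytes needs two
subclusters.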

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index e68febb15b..8b1ed1cbcf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -537,11 +537,21 @@ static inline int64_t offset_into_cluster(BDRVQcow2State *s, int64_t offset)
 return offset & (s->cluster_size - 1);
 }
 
+static inline int64_t offset_into_subcluster(BDRVQcow2State *s, int64_t offset)
+{
+return offset & (s->subcluster_size - 1);
+}
+
 static inline uint64_t size_to_clusters(BDRVQcow2State *s, uint64_t size)
 {
 return (size + (s->cluster_size - 1)) >> s->cluster_bits;
 }
 
+static inline uint64_t size_to_subclusters(BDRVQcow2State *s, uint64_t size)
+{
+return (size + (s->subcluster_size - 1)) >> s->subcluster_bits;
+}
+
 static inline int64_t size_to_l1(BDRVQcow2State *s, int64_t size)
 {
 int shift = s->cluster_bits + s->l2_bits;
-- 
2.20.1

[PATCH v5 24/31] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2020-05-05 Thread Alberto Garcia

The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated. This needs to happen even if the
cluster was already allocated and the L2 entry was otherwise valid.

In some cases, however, a write operation doesn't need to change the L2
bitmap (because all affected subclusters were already allocated). This
is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2()
is never called in those cases.
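
The bitmap update can be sketched in isolation as follows. The helper
names are hypothetical; the layout assumed here is the one from the format
documentation in this series (allocation bits in 0-31, 'reads as zeros'
bits in 32-63), and both range bounds are inclusive.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the allocation bits of subclusters @from..@to (inclusive). */
static uint64_t sketch_alloc_range(unsigned from, unsigned to)
{
    assert(from <= to && to < 32);
    return ((1ULL << (to - from + 1)) - 1) << from;
}

/* Mask with the 'reads as zeros' bits of subclusters @from..@to. */
static uint64_t sketch_zero_range(unsigned from, unsigned to)
{
    return sketch_alloc_range(from, to) << 32;
}

/* Bitmap after data has been written to subclusters @first..@last. */
static uint64_t sketch_mark_written(uint64_t bitmap, unsigned first,
                                    unsigned last)
{
    bitmap |= sketch_alloc_range(first, last);   /* now allocated */
    bitmap &= ~sketch_zero_range(first, last);   /* no longer zero */
    return bitmap;
}
```

Subclusters outside the written range keep their previous state, which is
why the update starts from the existing bitmap rather than from zero.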

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 2283a308d0..4544a40aa0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1000,6 +1000,24 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 assert((offset & L2E_OFFSET_MASK) == offset);
 
 set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
+
+/* Update bitmap with the subclusters that were just written */
+if (has_subclusters(s)) {
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+unsigned written_from = m->cow_start.offset;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+m->nb_clusters << s->cluster_bits;
+int first_sc, last_sc;
+/* Narrow written_from and written_to down to the current cluster */
+written_from = MAX(written_from, i << s->cluster_bits);
+written_to   = MIN(written_to, (i + 1) << s->cluster_bits);
+assert(written_from < written_to);
+first_sc = offset_to_sc_index(s, written_from);
+last_sc  = offset_to_sc_index(s, written_to - 1);
+l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(first_sc, last_sc);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(first_sc, last_sc);
+set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+}
  }
 
 
-- 
2.20.1

[PATCH v5 08/31] qcow2: Add dummy has_subclusters() function

2020-05-05 Thread Alberto Garcia

This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 97fbaba574..5e8036c3dd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -492,6 +492,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+/* FIXME: Return false until this feature is complete */
+return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
-- 
2.20.1

[PATCH v5 03/31] qcow2: Add calculate_l2_meta()

2020-05-05 Thread Alberto Garcia

handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.
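
To make the copy-on-write regions concrete, here is a standalone sketch of
the arithmetic that the new function performs. SketchCow and
sketch_cow_regions() are hypothetical names; all offsets are relative to
the start of the first cluster touched by the request.

```c
#include <stdint.h>

typedef struct {
    unsigned cow_start_offset;   /* always 0: start of the first cluster */
    unsigned cow_start_bytes;    /* bytes to copy before the request */
    unsigned cow_end_offset;     /* first byte after the request */
    unsigned cow_end_bytes;      /* bytes to copy up to the cluster end */
    unsigned nb_clusters;        /* clusters covered by the request */
} SketchCow;

static SketchCow sketch_cow_regions(uint64_t guest_offset, unsigned bytes,
                                    unsigned cluster_size)
{
    unsigned start_to = (unsigned)(guest_offset % cluster_size);
    unsigned end_from = start_to + bytes;
    /* Round the end of the request up to the next cluster boundary */
    unsigned end_to =
        (end_from + cluster_size - 1) / cluster_size * cluster_size;
    SketchCow r = {
        .cow_start_offset = 0,
        .cow_start_bytes  = start_to,
        .cow_end_offset   = end_from,
        .cow_end_bytes    = end_to - end_from,
        .nb_clusters      = end_to / cluster_size,
    };

    return r;
}
```

For example, with 64 KiB clusters a 0x2000-byte write at guest offset
0x11000 yields a 0x1000-byte region before the request and a 0xD000-byte
region after it, both within a single cluster.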

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 77 +--
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9ab41cb728..61ad638bdc 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1037,6 +1037,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 }
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure, add
+ * it to @m and the BDRVQcow2State.cluster_allocs list.
+ *
+ * @host_cluster_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs,
+  uint64_t host_cluster_offset,
+  uint64_t guest_offset, unsigned bytes,
+  QCowL2Meta **m, bool keep_old)
+{
+BDRVQcow2State *s = bs->opaque;
+unsigned cow_start_from = 0;
+unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+unsigned cow_end_from = cow_start_to + bytes;
+unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+QCowL2Meta *old_m = *m;
+
+*m = g_malloc0(sizeof(**m));
+**m = (QCowL2Meta) {
+.next   = old_m,
+
+.alloc_offset   = host_cluster_offset,
+.offset = start_of_cluster(s, guest_offset),
+.nb_clusters= nb_clusters,
+
+.keep_old_clusters = keep_old,
+
+.cow_start = {
+.offset = cow_start_from,
+.nb_bytes   = cow_start_to - cow_start_from,
+},
+.cow_end = {
+.offset = cow_end_from,
+.nb_bytes   = cow_end_to - cow_end_from,
+},
+};
+
+qemu_co_queue_init(&(*m)->dependent_requests);
+QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1435,35 +1485,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
 uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
 int avail_bytes = nb_clusters << s->cluster_bits;
 int nb_bytes = MIN(requested_bytes, avail_bytes);
-QCowL2Meta *old_m = *m;
-
-*m = g_malloc0(sizeof(**m));
-
-**m = (QCowL2Meta) {
-.next   = old_m,
-
-.alloc_offset   = alloc_cluster_offset,
-.offset = start_of_cluster(s, guest_offset),
-.nb_clusters= nb_clusters,
-
-.keep_old_clusters  = keep_old_clusters,
-
-.cow_start = {
-.offset = 0,
-.nb_bytes   = offset_into_cluster(s, guest_offset),
-},
-.cow_end = {
-.offset = nb_bytes,
-.nb_bytes   = avail_bytes - nb_bytes,
-},
-};
-qemu_co_queue_init(&(*m)->dependent_requests);
-QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 
 *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
 *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
 assert(*bytes != 0);
 
+calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+  m, keep_old_clusters);
+
 return 1;
 
 fail:
-- 
2.20.1

[PATCH v5 18/31] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC

2020-05-05 Thread Alberto Garcia

When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 13965f2e1d..63e952b89a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1994,7 +1994,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 *pnum = bytes;
 
 if ((type == QCOW2_SUBCLUSTER_NORMAL ||
- type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+ type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
@@ -2002,7 +2003,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+   type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2165,6 +2167,7 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
 g_assert_not_reached();
 
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2233,7 +2236,8 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
-(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing) ||
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC && !bs->backing))
 {
 qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
 } else {
@@ -3761,6 +3765,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 ret = qcow2_get_host_offset(bs, offset, , , );
 if (ret < 0 ||
 (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+ type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
  type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
  type != QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
 qemu_co_mutex_unlock(&s->lock);
@@ -3839,6 +3844,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
 switch (type) {
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 if (bs->backing && bs->backing->bs) {
 int64_t backing_length = bdrv_getlength(bs->backing->bs);
 if (src_offset >= backing_length) {
-- 
2.20.1
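
The block-status part of this patch can be read as a small mapping from subcluster type to status flags. Below is a minimal sketch of that mapping, assuming made-up flag values (ST_DATA, ST_ZERO, ST_OFFSET_VALID stand in for the BDRV_BLOCK_* constants) and a reduced enum that leaves out the compressed and invalid types; it is an illustration of the two hunks in qcow2_co_block_status(), not the actual function.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, illustrative version of the subcluster types in the patch. */
typedef enum {
    SC_UNALLOCATED_PLAIN,
    SC_UNALLOCATED_ALLOC,
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
} SubclusterType;

/* Hypothetical flag bits standing in for BDRV_BLOCK_*. */
enum {
    ST_DATA         = 1 << 0,
    ST_ZERO         = 1 << 1,
    ST_OFFSET_VALID = 1 << 2,
};

static unsigned subcluster_status(SubclusterType type, bool encrypted)
{
    unsigned status = 0;

    assert(type <= SC_NORMAL);

    /* Any type backed by an allocated host cluster has a usable offset. */
    if ((type == SC_NORMAL ||
         type == SC_ZERO_ALLOC ||
         type == SC_UNALLOCATED_ALLOC) && !encrypted) {
        status |= ST_OFFSET_VALID;
    }

    if (type == SC_ZERO_PLAIN || type == SC_ZERO_ALLOC) {
        status |= ST_ZERO;
    } else if (type != SC_UNALLOCATED_PLAIN &&
               type != SC_UNALLOCATED_ALLOC) {
        /* Both unallocated variants defer to the backing file. */
        status |= ST_DATA;
    }
    return status;
}
```

The point the sketch tries to make visible is that the new type behaves like UNALLOCATED_PLAIN for data purposes but like the other *_ALLOC types when it comes to reporting a host offset.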

[PATCH v5 10/31] qcow2: Add offset_to_sc_index()

2020-05-05 Thread Alberto Garcia

For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9d8ca8068c..e68febb15b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -563,6 +563,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State *s, int64_t offset)
 return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
 return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1
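
To see what the helper returns for concrete offsets, here is a standalone sketch. It assumes a hypothetical ScGeometry struct holding only the fields the helper reads (the real code uses BDRVQcow2State), and a hypothetical sc_geometry() constructor that derives subcluster_bits from the cluster size and subcluster count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of BDRVQcow2State with only the fields used here. */
typedef struct {
    int cluster_bits;
    int subcluster_bits;
    int subclusters_per_cluster;
} ScGeometry;

/* Derive subcluster_bits for a power-of-two number of subclusters. */
static ScGeometry sc_geometry(int cluster_bits, int subclusters_per_cluster)
{
    ScGeometry g;
    int n = subclusters_per_cluster;

    /* The mask in offset_to_sc_index() only works for powers of two. */
    assert(n > 0 && (n & (n - 1)) == 0);

    g.cluster_bits = cluster_bits;
    g.subclusters_per_cluster = subclusters_per_cluster;
    g.subcluster_bits = cluster_bits;
    while (n > 1) {
        g.subcluster_bits--;
        n >>= 1;
    }
    return g;
}

/* Same expression as the helper added by the patch. */
static int offset_to_sc_index(const ScGeometry *s, int64_t offset)
{
    return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
}
```

With 64 KiB clusters and 32 subclusters, each subcluster is 2 KiB, so an offset 5 bytes into the fourth subcluster of any cluster maps to index 3, and the index wraps back to 0 at every cluster boundary.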

Re: [PATCH 5/6] block/nvme: Align block pages queue to host page size

2020-05-05 Thread Philippe Mathieu-Daudé


On 5/4/20 11:46 AM, Philippe Mathieu-Daudé wrote:

In nvme_create_queue_pair() we create a page list using
qemu_blockalign(), then map it with qemu_vfio_dma_map():

   q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
   r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
 s->page_size * NVME_QUEUE_SIZE, ...);

With:

   s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));

The qemu_vfio_dma_map() documentation says "The caller need
to make sure the area is aligned to page size". While we use
a multiple of s->page_size as alignment, it might not be
sufficient on some hosts. Use the qemu_real_host_page_size value
to be sure the host alignment is respected.

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Cédric Le Goater 
Cc: David Gibson 
Cc: Laurent Vivier 
---
  block/nvme.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 7b7c0cc5d6..bde0d28b39 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
  
  s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));

  s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
-bs->bl.opt_mem_alignment = s->page_size;
+bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);
  timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 3);
  
  /* Reset device to get a clean state. */




Please disregard this patch as it is incomplete.
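
For readers following the arithmetic, this is a small sketch of the two values the withdrawn patch touches: the device page size derived from the CAP register and the alignment obtained by taking the larger of that and the host page size. The function names are hypothetical and the host page size is passed in as a parameter rather than read from qemu_real_host_page_size; since the patch was retracted as incomplete, this only illustrates its stated intent.

```c
#include <assert.h>
#include <stdint.h>

/* Device page size, using the expression from the quoted nvme_init() hunk. */
static uint64_t nvme_page_size(uint64_t cap)
{
    uint64_t size = UINT64_C(1) << (12 + ((cap >> 48) & 0xF));

    return size > 4096 ? size : 4096;
}

/* Alignment proposed by the patch: MAX(host page size, device page size). */
static uint64_t nvme_opt_mem_alignment(uint64_t cap, uint64_t host_page_size)
{
    uint64_t dev_page_size = nvme_page_size(cap);

    /* Page sizes are expected to be non-zero powers of two. */
    assert(host_page_size != 0 &&
           (host_page_size & (host_page_size - 1)) == 0);

    return host_page_size > dev_page_size ? host_page_size : dev_page_size;
}
```

On a host with 64 KiB pages and a device reporting the minimum 4 KiB page size, the device value alone would give 4 KiB alignment, while the combined value gives 64 KiB.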

Re: [PATCH v5 3/7] qcow: Tolerate backing_fmt=, but warn on backing_fmt=raw

2020-05-05 Thread Eric Blake


On 5/5/20 2:35 AM, Kevin Wolf wrote:

Am 03.04.2020 um 19:58 hat Eric Blake geschrieben:

qcow has no space in the metadata to store a backing format, and there
are existing qcow images backed both by raw or by other formats
(usually qcow) images, reliant on probing to tell the difference.
While we don't recommend the creation of new qcow images (as qcow2 is
hands-down better), we can at least insist that if the user does
request a specific format without using -u, then it must be non-raw
(as a raw backing file that gets inadvertently edited into some other
format can form a security hole); if the user does not request a
specific format or lies when using -u, then the status quo of probing
for the backing format remains intact (although an upcoming patch will
warn when omitting a format request).  Thus, when this series is
complete, the only way to use a backing file for qcow without
triggering a warning is when using -F if the backing file is non-raw
to begin with.  Note that this is only for QemuOpts usage; there is no
change to the QAPI to allow a format through -blockdev.

Add a new iotest 290 just for qcow, to demonstrate the new warning.

Signed-off-by: Eric Blake 


Somehow this feels backwards. Not specifying the backing file format at
all isn't any safer than explicitly specifying raw.

If there is a difference at all, I would say that explicitly specifying
raw means that the user is aware what they are doing. So we would have
more reason to warn against raw images if the backing format isn't
specified at all because then the user might not be aware that they are
using a backing file that probes as raw.


Prior to this patch, -F does not work with qcow.  And even with this 
patch, we still cannot store the explicit value of -F in the qcow file. 
Anything that does not use -F must continue to work for now (although it 
may now warn, and in fact must warn if we deprecate it), while anything 
explicit is free to fail (since it failed already), but could also be 
made to work (if letting it work is nicer than making it fail, and where 
"work" may still include a warning, although it's pointless to have 
something brand new that works but is deprecated out of the box).  So 
the following is my summary of the two options we can choose between:


Option 1, qcow backed by raw is more common than qcow backed by other, 
so we want:
raw <- qcow, no -F: work without warning (but if backing file is edited, 
a future probe seeing non-raw would break image)
raw <- qcow, with -F: work without warning (but if backing file is 
edited, a future probe seeing non-raw would break image)
other <- qcow, no -F: works but issues a warning (but backing file will 
always probe correctly)
other <- qcow, with -F: fails (we cannot honor the user's explicit 
request, because we would still have to probe)


Option 2, qcow backed by other is more common than qcow backed by raw, 
so we want:
raw <- qcow, no -F: works but issues a warning (using a raw backing file 
without explicit buy-in is risky)
raw <- qcow, with -F: works but issues a warning (explicit buy-in will 
still require subsequent probing, and a backing file could change which 
would break image)

other <- qcow, no -F: works without warning
other <- qcow, with -F: works without warning (later probing will still 
see non-raw)


It looks like you are leaning more towards option 1, while my patch 
leaned more towards option 2.  Anyone else want to chime in with an 
opinion on which is safer vs. easier?
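
The two options can be compared side by side as a small decision function. This is only a sketch of the tables above with hypothetical names (Outcome, qcow_backing_policy); it does not correspond to any code in the series.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OUTCOME_OK, OUTCOME_WARN, OUTCOME_FAIL } Outcome;

/*
 * Outcome of creating a qcow image with a backing file under option 1 or
 * option 2. `has_F` says whether the user passed an explicit -F.
 */
static Outcome qcow_backing_policy(int option, bool backing_is_raw, bool has_F)
{
    assert(option == 1 || option == 2);

    if (option == 1) {
        if (backing_is_raw) {
            return OUTCOME_OK;          /* with or without -F */
        }
        /* Non-raw: an explicit -F cannot be honored, so it fails. */
        return has_F ? OUTCOME_FAIL : OUTCOME_WARN;
    }

    /* Option 2: raw always warns, non-raw never does. */
    return backing_is_raw ? OUTCOME_WARN : OUTCOME_OK;
}
```

Written this way, the asymmetry is easy to spot: -F only changes the outcome in option 1 for non-raw backing files, and never changes it in option 2.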




@@ -953,6 +954,13 @@ static int coroutine_fn qcow_co_create_opts(BlockDriver *drv,
  };

  /* Parse options and convert legacy syntax */
+backing_fmt = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FMT);
+if (backing_fmt && !strcmp(backing_fmt, "raw")) {
+error_setg(errp, "qcow cannot store backing format; an explicit "
+   "backing format of raw is unsafe");


Does this message tell that an implicit backing format of raw is safe?


If we go with option 2, are we trying to deprecate ALL use of raw as a 
backing file to qcow, regardless of whether the user was explicit about 
it?  If we go with option 1, then we are instead deprecating any use of 
non-raw as a backing file to qcow.


At the end of the day, we are trying to discourage users from creating 
new qcow files in the first place; qcow2 is much better.  We still have 
to read existing qcow images with backing files, but maybe we want:


Option 3:
completely deprecate qcow images with backing files, as there is no safe 
way to do things favoring either raw (option 1) or non-raw (option 2), 
and therefore accept -F solely for convenience with the rest of the 
series, but always issue a warning regardless of whether -F was present.





+ret = -EINVAL;
+goto fail;
+}


The commit message promises a warning. This is not a warning, but a hard
error.


Once we decide which behavior we want, I'll make sure the commit message 
matches the behavior.  Remember, for

Re: Backup of vm disk images

2020-05-05 Thread Anders Östling

Thanks Peter and Stefan for enlightening me!

On Mon, May 4, 2020 at 9:58 AM Peter Krempa  wrote:
>
>  One thing to note though is that the backup integration is not entirely
>  finished in libvirt and thus in a 'tech-preview' state. Some
>  interactions corrupt the state for incremental backups.
>
>  If you are interested, I can give you specific info how to enable
>  support for backups as well as the specifics of the current state of
>  implementation.
>

I would very much appreciate if you can tell me more on this! It's for
a client, and I want to be as sure as possible that the solution is
robust.

Also, the wiki page referred to by Kashyap is something that I will
experiment with!

Thanks again folks!

Anders

-- 
---
This signature contains 100% recyclable electrons as prescribed by Mother Nature

Anders Östling
+46 768 716 165 (Mobil)
+46 431 45 56 01  (Hem)

[PATCH] qcow2: Fix preallocation on block devices

2020-05-05 Thread Max Reitz

Calling bdrv_getlength() to get the pre-truncate file size will not
really work on block devices, because they always have the same length,
and trying to write beyond it will fail with a rather cryptic error
message.

Instead, we should use qcow2_get_last_cluster() and bdrv_getlength()
only as a fallback.

Before this patch:
$ truncate -s 1G test.img
$ sudo losetup -f --show test.img
/dev/loop0
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount
structures: No space left on device

With this patch:
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize
underlying file: Preallocation mode 'full' unsupported for this
non-regular file

So as you can see, it still fails, but now the problem is missing
support on the block device level, so we at least get a better error
message.

Note that we cannot preallocate block devices on truncate by design,
because we do not know what area to preallocate.  Their length is always
the same; the truncate operation does not change it.

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ad934109a8..c1a9edd6dc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4107,7 +4107,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 {
 int64_t allocation_start, host_offset, guest_offset;
 int64_t clusters_allocated;
-int64_t old_file_size, new_file_size;
+int64_t old_file_size, last_cluster, new_file_size;
 uint64_t nb_new_data_clusters, nb_new_l2_tables;
 
 /* With a data file, preallocation means just allocating the metadata
@@ -4127,7 +4127,13 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 ret = old_file_size;
 goto fail;
 }
-old_file_size = ROUND_UP(old_file_size, s->cluster_size);
+
+last_cluster = qcow2_get_last_cluster(bs, old_file_size);
+if (last_cluster >= 0) {
+old_file_size = (last_cluster + 1) * s->cluster_size;
+} else {
+old_file_size = ROUND_UP(old_file_size, s->cluster_size);
+}
 
 nb_new_data_clusters = DIV_ROUND_UP(offset - old_length,
 s->cluster_size);
-- 
2.26.2
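
The new choice of starting point can be sketched as a pure function. The name prealloc_old_file_size() is hypothetical, and the last-cluster index is passed in directly rather than obtained from qcow2_get_last_cluster(); a negative value stands for "lookup failed", which is how the patch treats it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the offset from which preallocation starts: the end of the last
 * cluster in use if known, otherwise the file length rounded up to a
 * cluster boundary (the previous behavior).
 */
static int64_t prealloc_old_file_size(int64_t file_length,
                                      int64_t last_cluster,
                                      int64_t cluster_size)
{
    assert(file_length >= 0 && cluster_size > 0);

    if (last_cluster >= 0) {
        return (last_cluster + 1) * cluster_size;
    }
    /* Fallback: ROUND_UP(file_length, cluster_size) */
    return (file_length + cluster_size - 1) / cluster_size * cluster_size;
}
```

On a 1 GiB block device whose image only uses clusters 0 to 4, the result is 320 KiB instead of 1 GiB, so subsequent writes no longer land beyond the end of the device.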

Re: [PATCH] iotests/055: Use cache.no-flush for vmdk target

2020-05-05 Thread Eric Blake


On 5/5/20 1:46 AM, Kevin Wolf wrote:

055 uses the backup block job to create a compressed backup of an
$IMGFMT image with both qcow2 and vmdk targets. However, cluster
allocation in vmdk is very slow because it flushes the image file after
each L2 update.

There is no reason why we need this level of safety in this test, so
let's disable flushes for vmdk. For the blockdev-backup tests this is
achieved by simply adding the cache.no-flush=on to the drive_add() for
the target. For drive-backup, the caching flags are copied from the
source node, so we'll also add the flag to the source node, even though
it is not vmdk.

This can make the test run significantly faster (though it doesn't make
a difference on tmpfs). In my usual setup it goes from ~45s to ~15s.

Signed-off-by: Kevin Wolf 
---
  tests/qemu-iotests/055 | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v4 05/13] acpi: move aml builder code for serial device

2020-05-05 Thread Philippe Mathieu-Daudé


On 5/5/20 1:38 PM, Gerd Hoffmann wrote:

The code uses the isa_serial_io array to figure out what the device uid is.
Side effect is that acpi entries are not limited to port 1+2 any more,
we'll also get entries for ports 3+4.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Igor Mammedov 
---
  hw/char/serial-isa.c | 32 
  hw/i386/acpi-build.c | 32 
  2 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/hw/char/serial-isa.c b/hw/char/serial-isa.c
index f9b6eed7833d..f7c19a398ced 100644
--- a/hw/char/serial-isa.c
+++ b/hw/char/serial-isa.c
@@ -27,6 +27,7 @@
  #include "qapi/error.h"
  #include "qemu/module.h"
  #include "sysemu/sysemu.h"
+#include "hw/acpi/aml-build.h"
  #include "hw/char/serial.h"
  #include "hw/isa/isa.h"
  #include "hw/qdev-properties.h"
@@ -81,6 +82,35 @@ static void serial_isa_realizefn(DeviceState *dev, Error **errp)
  isa_register_ioport(isadev, &s->io, isa->iobase);
  }
  
+static void serial_isa_build_aml(ISADevice *isadev, Aml *scope)

+{
+ISASerialState *isa = ISA_SERIAL(isadev);
+int i, uid = 0;
+Aml *dev;
+Aml *crs;
+
+for (i = 0; i < ARRAY_SIZE(isa_serial_io); i++) {
+if (isa->iobase == isa_serial_io[i]) {
+uid = i + 1;


Similarly to the parallel device patch, I'd use "uid = isa->index + 1" 
instead.



+}
+}
+if (!uid) {
+return;
+}
+
+crs = aml_resource_template();
+aml_append(crs, aml_io(AML_DECODE16, isa->iobase, isa->iobase, 0x00, 0x08));
+aml_append(crs, aml_irq_no_flags(isa->isairq));
+
+dev = aml_device("COM%d", uid);
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0501")));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+aml_append(dev, aml_name_decl("_STA", aml_int(0xf)));
+aml_append(dev, aml_name_decl("_CRS", crs));
+
+aml_append(scope, dev);
+}
+
  static const VMStateDescription vmstate_isa_serial = {
  .name = "serial",
  .version_id = 3,
@@ -103,9 +133,11 @@ static Property serial_isa_properties[] = {
  static void serial_isa_class_initfn(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
+ISADeviceClass *isa = ISA_DEVICE_CLASS(klass);
  
  dc->realize = serial_isa_realizefn;

  dc->vmsd = &vmstate_isa_serial;
+isa->build_aml = serial_isa_build_aml;
  device_class_set_props(dc, serial_isa_properties);
  set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
  }
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 3a82730a0d19..0e6a5151f4c3 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1208,36 +1208,6 @@ static Aml *build_lpt_device_aml(void)
  return dev;
  }
  
-static void build_com_device_aml(Aml *scope, uint8_t uid)

-{
-Aml *dev;
-Aml *crs;
-uint8_t irq = 4;
-uint16_t io_port = 0x03F8;
-
-assert(uid == 1 || uid == 2);
-if (uid == 2) {
-irq = 3;
-io_port = 0x02F8;
-}
-if (!memory_region_present(get_system_io(), io_port)) {
-return;
-}
-
-dev = aml_device("COM%d", uid);
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0501")));
-aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
-
-aml_append(dev, aml_name_decl("_STA", aml_int(0xf)));
-
-crs = aml_resource_template();
-aml_append(crs, aml_io(AML_DECODE16, io_port, io_port, 0x00, 0x08));
-aml_append(crs, aml_irq_no_flags(irq));
-aml_append(dev, aml_name_decl("_CRS", crs));
-
-aml_append(scope, dev);
-}
-
  static void build_isa_devices_aml(Aml *table)
  {
  ISADevice *fdc = pc_find_fdc0();
@@ -1252,8 +1222,6 @@ static void build_isa_devices_aml(Aml *table)
  aml_append(scope, build_fdc_device_aml(fdc));
  }
  aml_append(scope, build_lpt_device_aml());
-build_com_device_aml(scope, 1);
-build_com_device_aml(scope, 2);
  
  if (ambiguous) {

  error_report("Multiple ISA busses, unable to define IPMI ACPI data");

Re: [PATCH v4 13/13] floppy: make isa_fdc_get_drive_max_chs static

2020-05-05 Thread Igor Mammedov

On Tue,  5 May 2020 13:38:43 +0200
Gerd Hoffmann  wrote:

> acpi aml generator needs this, but it is in floppy code now
> so we can make the function static.
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Igor Mammedov 

> ---
>  include/hw/block/fdc.h | 2 --
>  hw/block/fdc.c | 4 ++--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
> index c15ff4c62315..5d71cf972268 100644
> --- a/include/hw/block/fdc.h
> +++ b/include/hw/block/fdc.h
> @@ -16,7 +16,5 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
> DriveInfo **fds, qemu_irq *fdc_tc);
>  
>  FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
> -void isa_fdc_get_drive_max_chs(FloppyDriveType type,
> -   uint8_t *maxc, uint8_t *maxh, uint8_t *maxs);
>  
>  #endif
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 40faa088b5f7..499a580b993c 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2744,8 +2744,8 @@ FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, 
> int i)
>  return isa->state.drives[i].drive;
>  }
>  
> -void isa_fdc_get_drive_max_chs(FloppyDriveType type,
> -   uint8_t *maxc, uint8_t *maxh, uint8_t *maxs)
> +static void isa_fdc_get_drive_max_chs(FloppyDriveType type, uint8_t *maxc,
> +  uint8_t *maxh, uint8_t *maxs)
>  {
>  const FDFormat *fdf;
>

Re: [PATCH v4 07/13] acpi: move aml builder code for parallel device

2020-05-05 Thread Philippe Mathieu-Daudé


Hi Gerd,

On 5/5/20 1:38 PM, Gerd Hoffmann wrote:

Also adds support for multiple LPT devices.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Igor Mammedov 
---
  hw/char/parallel.c   | 32 
  hw/i386/acpi-build.c | 23 ---
  2 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/hw/char/parallel.c b/hw/char/parallel.c
index 8dd67d13759b..bc6b55b3b910 100644
--- a/hw/char/parallel.c
+++ b/hw/char/parallel.c
@@ -28,6 +28,7 @@
  #include "qemu/module.h"
  #include "chardev/char-parallel.h"
  #include "chardev/char-fe.h"
+#include "hw/acpi/aml-build.h"
  #include "hw/irq.h"
  #include "hw/isa/isa.h"
  #include "hw/qdev-properties.h"
@@ -568,6 +569,35 @@ static void parallel_isa_realizefn(DeviceState *dev, Error **errp)
   s, "parallel");
  }
  
+static void parallel_isa_build_aml(ISADevice *isadev, Aml *scope)

+{
+ISAParallelState *isa = ISA_PARALLEL(isadev);
+int i, uid = 0;
+Aml *dev;
+Aml *crs;
+
+for (i = 0; i < ARRAY_SIZE(isa_parallel_io); i++) {
+if (isa->iobase == isa_parallel_io[i]) {
+uid = i + 1;


I'm not sure about this check, as we can create an ISA device while 
manually setting index & iobase. What about simply using 
"uid = isa->index + 1" instead?



+}
+}
+if (!uid) {
+return;
+}
+
+crs = aml_resource_template();
+aml_append(crs, aml_io(AML_DECODE16, isa->iobase, isa->iobase, 0x08, 0x08));
+aml_append(crs, aml_irq_no_flags(isa->isairq));
+
+dev = aml_device("LPT%d", uid);
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0400")));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+aml_append(dev, aml_name_decl("_STA", aml_int(0xf)));
+aml_append(dev, aml_name_decl("_CRS", crs));
+
+aml_append(scope, dev);
+}
+
  /* Memory mapped interface */
  static uint64_t parallel_mm_readfn(void *opaque, hwaddr addr, unsigned size)
  {
@@ -624,9 +654,11 @@ static Property parallel_isa_properties[] = {
  static void parallel_isa_class_initfn(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
+ISADeviceClass *isa = ISA_DEVICE_CLASS(klass);
  
  dc->realize = parallel_isa_realizefn;

  dc->vmsd = &vmstate_parallel_isa;
+isa->build_aml = parallel_isa_build_aml;
  device_class_set_props(dc, parallel_isa_properties);
  set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
  }
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2188a2b99d18..443db94deb5b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1173,28 +1173,6 @@ static Aml *build_mouse_device_aml(void)
  return dev;
  }
  
-static void build_lpt_device_aml(Aml *scope)

-{
-Aml *dev;
-Aml *crs;
-
-if (!memory_region_present(get_system_io(), 0x0378)) {
-return;
-}
-
-dev = aml_device("LPT");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0400")));
-
-aml_append(dev, aml_name_decl("_STA", aml_int(0xf)));
-
-crs = aml_resource_template();
-aml_append(crs, aml_io(AML_DECODE16, 0x0378, 0x0378, 0x08, 0x08));
-aml_append(crs, aml_irq_no_flags(7));
-aml_append(dev, aml_name_decl("_CRS", crs));
-
-aml_append(scope, dev);
-}
-
  static void build_isa_devices_aml(Aml *table)
  {
  ISADevice *fdc = pc_find_fdc0();
@@ -1208,7 +1186,6 @@ static void build_isa_devices_aml(Aml *table)
  if (fdc) {
  aml_append(scope, build_fdc_device_aml(fdc));
  }
-build_lpt_device_aml(scope);
  
  if (ambiguous) {

  error_report("Multiple ISA busses, unable to define IPMI ACPI data");
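
The difference between the two ways of deriving the uid discussed in this review can be shown with a few sample devices. This is a sketch under assumptions: ParallelDev is a hypothetical struct, and the port table is assumed to match QEMU's default isa_parallel_io values (0x378, 0x278, 0x3bc).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default parallel port addresses. */
static const uint32_t parallel_io[] = { 0x378, 0x278, 0x3bc };

/* Hypothetical reduction of ISAParallelState to the two relevant fields. */
typedef struct {
    uint32_t index;
    uint32_t iobase;
} ParallelDev;

/* Approach in the patch: look the iobase up in the table, 0 if not found. */
static int uid_from_iobase(const ParallelDev *dev)
{
    size_t i;

    assert(dev != NULL);
    for (i = 0; i < sizeof(parallel_io) / sizeof(parallel_io[0]); i++) {
        if (dev->iobase == parallel_io[i]) {
            return (int)i + 1;
        }
    }
    return 0; /* no AML entry would be generated */
}

/* Approach suggested in the review: use the device index directly. */
static int uid_from_index(const ParallelDev *dev)
{
    assert(dev != NULL);
    return (int)dev->index + 1;
}
```

The two agree for default configurations and diverge when index and iobase are set independently: a device with index 0 at 0x278 gets uid 2 from the table lookup but uid 1 from the index, and a device at a non-standard address gets no entry at all from the lookup.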

Re: [PATCH v4 12/13] acpi: drop serial/parallel enable bits from dsdt

2020-05-05 Thread Igor Mammedov

On Tue,  5 May 2020 13:38:42 +0200
Gerd Hoffmann  wrote:

> The _STA methods for COM+LPT used to reference them,
> but that isn't the case any more.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/i386/acpi-build.c | 23 ---
>  1 file changed, 23 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 1922868f3401..765409a90eb6 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1437,15 +1437,6 @@ static void build_q35_isa_bridge(Aml *table)
>  aml_append(field, aml_named_field("LPTD", 2));
  
Not related to this patch, but it seems that the fields above are also unused.
It was this way in SeaBIOS and probably even earlier, wherever it was copied 
from.
>  aml_append(dev, field);
>  
> -aml_append(dev, aml_operation_region("LPCE", AML_PCI_CONFIG,
> - aml_int(0x82), 0x02));
> -/* enable bits */
> -field = aml_field("LPCE", AML_ANY_ACC, AML_NOLOCK, AML_PRESERVE);
> -aml_append(field, aml_named_field("CAEN", 1));
> -aml_append(field, aml_named_field("CBEN", 1));
> -aml_append(field, aml_named_field("LPEN", 1));
> -aml_append(dev, field);
> -
>  aml_append(scope, dev);
>  aml_append(table, scope);
>  }
> @@ -1469,7 +1460,6 @@ static void build_piix4_isa_bridge(Aml *table)
>  {
>  Aml *dev;
>  Aml *scope;
> -Aml *field;
>  
>  scope =  aml_scope("_SB.PCI0");
>  dev = aml_device("ISA");
> @@ -1478,19 +1468,6 @@ static void build_piix4_isa_bridge(Aml *table)
>  /* PIIX PCI to ISA irq remapping */
>  aml_append(dev, aml_operation_region("P40C", AML_PCI_CONFIG,
>   aml_int(0x60), 0x04));
> -/* enable bits */
> -field = aml_field("^PX13.P13C", AML_ANY_ACC, AML_NOLOCK, AML_PRESERVE);
^^^ should we drop this as well as it becomes 
unused?

> -/* Offset(0x5f),, 7, */
> -aml_append(field, aml_reserved_field(0x2f8));
> -aml_append(field, aml_reserved_field(7));
> -aml_append(field, aml_named_field("LPEN", 1));
> -/* Offset(0x67),, 3, */
> -aml_append(field, aml_reserved_field(0x38));
> -aml_append(field, aml_reserved_field(3));
> -aml_append(field, aml_named_field("CAEN", 1));
> -aml_append(field, aml_reserved_field(3));
> -aml_append(field, aml_named_field("CBEN", 1));
> -aml_append(dev, field);
>  
>  aml_append(scope, dev);
>  aml_append(table, scope);

Re: [PATCH v3 03/33] block: Add BdrvChildRole and BdrvChildRoleBits

2020-05-05 Thread Kevin Wolf

Am 05.05.2020 um 15:20 hat Max Reitz geschrieben:
> On 05.05.20 14:54, Kevin Wolf wrote:
> > When you're the author, the meaning of everything is clear to you. :-)
> > 
> > In case of doubt, I would be more explicit so that the comment gives a
> > clear guideline for which role to use in which scenario.
> 
> OK, so you mean just noting everywhere explicitly how many children can
> get a specific flag, and not just in some cases?  That is, make a note
> for DATA and METADATA that they can be given to an arbitrary number of
> children, and COW only to at most one.

Sounds good to me.

> >>> blkverify:
> >>>
> >>> * x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> >>> * x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> >>>
> >>> Hm, according to the documentation, this doesn't work, FILTERED can be
> >>> set only for one node. But the condition ("the parent forwards all reads
> >>> and writes") applies to both children. I think the documentation should
> >>> mention what needs to be done in such cases.
> >>
> >> I don’t know.  blkverify is a rare exception by design, because it can
> >> abort when both children don’t match.  (I suppose we could theoretically
> >> have a quorum mode where a child gets ejected once a mismatch is
> >> detected, but that isn’t the case now.)
> > 
> > Well, yes, this is exceptional. I would ignore that property for
> > assigning roles because when it comes to play, roles don't matter any
> > more because the whole process is gone. So...
> > 
> >> Furthermore, I would argue that blkverify actually expects the formatted
> >> image to sometimes differ from the raw image, if anything, because the
> >> format driver is to be tested.  This is the reason why I chose x-raw to
> >> be the filtered child.
> > 
> > ...I don't think this case is relevant. If blkverify returns something,
> > both children have the same data.
> 
> Another argument is that right now, bs->file points to x-raw, and
> .is_filter is set.  So x-raw is already treated as the filtered child.

I admit defeat. :-)

> >> So there is no general instruction on what to do in such cases that I
> >> followed here, I specifically chose one child based on what blkverify is
> >> and what it’s supposed to do.  Therefore, I can’t really give a general
> >> instruction on “what needs to be done in such cases”.
> > 
> > Maybe the missing part for me is what FILTERED is even used for. I
> > assume it's for skipping over filters in certain functions in the
> > generic block layer?
> 
> Yes.
> 
> > In this case, maybe the right answer is that...
> > 
> >>> For blkverify, both
> >>> children are not equal in intention, so I guess the "real" filtered
> >>> child is x-image. But for quorum, you can't make any such distinction. I
> >>> assume the recommendation should be not to set FILTERED for any child
> >>> then.
> >>
> >> Quorum just isn’t a filter driver.
> > 
> > ...blkverify isn't one either because performing an operation on only
> > one child usually won't be correct.
> 
> Good point.  It would work if filters are just skipped for functions
> that read/query stuff, which I think is the case.  I don’t think we ever
> skip filters when it comes to modifying data.
> 
> In any case, I wouldn’t lose too much sleep over blkverify whatever we
> do.  It’s a driver used purely for debugging purposes.

Yeah, I'm not really worried about blkverify per se. It just seems like
an interesting corner case to discuss to make sure that the design of
the role system is right and doesn't miss anything important.

> > Maybe the more relevant question would be if a FILTERED child must be
> > the only child to avoid the problems we're discussing for blkverify. But
> > I think I already answered that question for myself with "no", so
> > probably not much use asking it.
> 
> blkverify is just a bit weird, and I personally don’t mind just treating
> it as something “special”, considering it’s just a debugging aid.
> 
> Regardless of blkverify, I don’t think FILTERED children must be the
> only children, though, because I can well imagine filter drivers having
> metadata children on the side, e.g. config data or bitmaps (not just
> dirty bitmaps, but also e.g. what to cache for a hypothetical cache driver).

The example of a caching driver that uses a child node for the cached
data (probably on a fast, but small disk) was what made me answer the
question with "no".

But as you write, having a pure metadata child could make sense, too.

Kevin
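
The cardinality rules discussed in this thread (DATA and METADATA on any number of children, COW and FILTERED on at most one) can be written down as a small checker. This is a sketch only: the ROLE_* bit values and both function names are hypothetical, and it encodes just the constraints mentioned above, not the final documentation of the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; only the role names come from the thread. */
enum {
    ROLE_DATA     = 1 << 0,
    ROLE_METADATA = 1 << 1,
    ROLE_FILTERED = 1 << 2,
    ROLE_COW      = 1 << 3,
    ROLE_PRIMARY  = 1 << 4,
};

/* Count how many children carry the given role flag. */
static size_t count_role(const unsigned *children, size_t n, unsigned flag)
{
    size_t i, count = 0;

    assert(children != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (children[i] & flag) {
            count++;
        }
    }
    return count;
}

/* Returns 1 if at most one child is FILTERED and at most one is COW. */
static int roles_are_consistent(const unsigned *children, size_t n)
{
    return count_role(children, n, ROLE_FILTERED) <= 1 &&
           count_role(children, n, ROLE_COW) <= 1;
}
```

Under these rules, marking both blkverify children as FILTERED is rejected, while a filter with one FILTERED child plus a pure METADATA child on the side passes.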



Re: [PATCH v4 11/13] acpi: simplify build_isa_devices_aml()

2020-05-05 Thread Philippe Mathieu-Daudé


On 5/5/20 1:38 PM, Gerd Hoffmann wrote:

x86 machines can have a single ISA bus only.


Reviewed-by: Philippe Mathieu-Daudé 



Signed-off-by: Gerd Hoffmann 
---
  hw/i386/acpi-build.c | 15 +--
  1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index cb3913d2ee76..1922868f3401 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1061,19 +1061,14 @@ static void build_hpet_aml(Aml *table)
  static void build_isa_devices_aml(Aml *table)
  {
  bool ambiguous;
-
-Aml *scope = aml_scope("_SB.PCI0.ISA");
  Object *obj = object_resolve_path_type("", TYPE_ISA_BUS, &ambiguous);
+Aml *scope;
  
-if (ambiguous) {

-error_report("Multiple ISA busses, unable to define IPMI ACPI data");
-} else if (!obj) {
-error_report("No ISA bus, unable to define IPMI ACPI data");
-} else {
-build_acpi_ipmi_devices(scope, BUS(obj), "\\_SB.PCI0.ISA");
-isa_build_aml(ISA_BUS(obj), scope);
-}
+assert(obj && !ambiguous);
  
+scope = aml_scope("_SB.PCI0.ISA");

+build_acpi_ipmi_devices(scope, BUS(obj), "\\_SB.PCI0.ISA");
+isa_build_aml(ISA_BUS(obj), scope);
  aml_append(table, scope);
  }

Re: [PATCH v4 02/13] acpi: move aml builder code for rtc device

2020-05-05 Thread Philippe Mathieu-Daudé


On 5/5/20 1:38 PM, Gerd Hoffmann wrote:

Signed-off-by: Gerd Hoffmann 
---
  hw/i386/acpi-build.c | 17 -
  hw/rtc/mc146818rtc.c | 22 ++
  2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2e15f6848e7e..0bfa2dd23fcc 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1137,22 +1137,6 @@ static Aml *build_fdc_device_aml(ISADevice *fdc)
  return dev;
  }
  
-static Aml *build_rtc_device_aml(void)
-{
-Aml *dev;
-Aml *crs;
-
-dev = aml_device("RTC");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0B00")));
-crs = aml_resource_template();
-aml_append(crs, aml_io(AML_DECODE16, 0x0070, 0x0070, 0x10, 0x02));
-aml_append(crs, aml_irq_no_flags(8));
-aml_append(crs, aml_io(AML_DECODE16, 0x0072, 0x0072, 0x02, 0x06));
-aml_append(dev, aml_name_decl("_CRS", crs));
-
-return dev;
-}
-
  static Aml *build_kbd_device_aml(void)
  {
  Aml *dev;
@@ -1278,7 +1262,6 @@ static void build_isa_devices_aml(Aml *table)
  Aml *scope = aml_scope("_SB.PCI0.ISA");
  Object *obj = object_resolve_path_type("", TYPE_ISA_BUS, &ambiguous);
  
-aml_append(scope, build_rtc_device_aml());
  aml_append(scope, build_kbd_device_aml());
  aml_append(scope, build_mouse_device_aml());
  if (fdc) {
diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index d18c09911be2..2104e0aa3b14 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -27,6 +27,7 @@
  #include "qemu/cutils.h"
  #include "qemu/module.h"
  #include "qemu/bcd.h"
+#include "hw/acpi/aml-build.h"
  #include "hw/irq.h"
  #include "hw/qdev-properties.h"
  #include "qemu/timer.h"
@@ -1007,13 +1008,34 @@ static void rtc_resetdev(DeviceState *d)
  }
  }
  
+static void rtc_build_aml(ISADevice *isadev, Aml *scope)
+{
+Aml *dev;
+Aml *crs;
+
+crs = aml_resource_template();
+aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE, RTC_ISA_BASE,
+   0x10, 0x02));
+aml_append(crs, aml_irq_no_flags(RTC_ISA_IRQ));
+aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE + 2, RTC_ISA_BASE + 2,
+   0x02, 0x06));
+
+dev = aml_device("RTC");
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0B00")));
+aml_append(dev, aml_name_decl("_CRS", crs));
+
+aml_append(scope, dev);
+}
+
  static void rtc_class_initfn(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
+ISADeviceClass *isa = ISA_DEVICE_CLASS(klass);
  
  dc->realize = rtc_realizefn;
  dc->reset = rtc_resetdev;
  dc->vmsd = &vmstate_rtc;
+isa->build_aml = rtc_build_aml;
  device_class_set_props(dc, mc146818rtc_properties);
  }
  



Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v4 11/13] acpi: simplify build_isa_devices_aml()

2020-05-05 Thread Igor Mammedov

On Tue,  5 May 2020 13:38:41 +0200
Gerd Hoffmann  wrote:

> x86 machines can have a single ISA bus only.
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Igor Mammedov 

> ---
>  hw/i386/acpi-build.c | 15 +--
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index cb3913d2ee76..1922868f3401 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1061,19 +1061,14 @@ static void build_hpet_aml(Aml *table)
>  static void build_isa_devices_aml(Aml *table)
>  {
>  bool ambiguous;
> -
> -Aml *scope = aml_scope("_SB.PCI0.ISA");
>  Object *obj = object_resolve_path_type("", TYPE_ISA_BUS, &ambiguous);
> +Aml *scope;
>  
> -if (ambiguous) {
> -error_report("Multiple ISA busses, unable to define IPMI ACPI data");
> -} else if (!obj) {
> -error_report("No ISA bus, unable to define IPMI ACPI data");
> -} else {
> -build_acpi_ipmi_devices(scope, BUS(obj), "\\_SB.PCI0.ISA");
> -isa_build_aml(ISA_BUS(obj), scope);
> -}
> +assert(obj && !ambiguous);
>  
> +scope = aml_scope("_SB.PCI0.ISA");
> +build_acpi_ipmi_devices(scope, BUS(obj), "\\_SB.PCI0.ISA");
> +isa_build_aml(ISA_BUS(obj), scope);
>  aml_append(table, scope);
>  }
>

Re: [PATCH v4 03/13] acpi: rtc: use a single crs range

2020-05-05 Thread Igor Mammedov

On Tue,  5 May 2020 13:38:33 +0200
Gerd Hoffmann  wrote:

> Use a single io range for _CRS instead of two,
> following what real hardware does.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/rtc/mc146818rtc.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
> index 2104e0aa3b14..47fafcfb7c1d 100644
> --- a/hw/rtc/mc146818rtc.c
> +++ b/hw/rtc/mc146818rtc.c
> @@ -1015,10 +1015,8 @@ static void rtc_build_aml(ISADevice *isadev, Aml *scope)
>  
>  crs = aml_resource_template();
>  aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE, RTC_ISA_BASE,
> -   0x10, 0x02));
> +   0x10, 0x08));
>  aml_append(crs, aml_irq_no_flags(RTC_ISA_IRQ));
> -aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE + 2, RTC_ISA_BASE + 2,
> -   0x02, 0x06));
Can we just drop the latter range as unused? (I don't see where it's actually
initialized.)

>  
>  dev = aml_device("RTC");
>  aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0B00")));

Re: [PATCH v4 02/13] acpi: move aml builder code for rtc device

2020-05-05 Thread Igor Mammedov

On Tue,  5 May 2020 13:38:32 +0200
Gerd Hoffmann  wrote:

> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Igor Mammedov 

> ---
>  hw/i386/acpi-build.c | 17 -
>  hw/rtc/mc146818rtc.c | 22 ++
>  2 files changed, 22 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 2e15f6848e7e..0bfa2dd23fcc 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1137,22 +1137,6 @@ static Aml *build_fdc_device_aml(ISADevice *fdc)
>  return dev;
>  }
>  
> -static Aml *build_rtc_device_aml(void)
> -{
> -Aml *dev;
> -Aml *crs;
> -
> -dev = aml_device("RTC");
> -aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0B00")));
> -crs = aml_resource_template();
> -aml_append(crs, aml_io(AML_DECODE16, 0x0070, 0x0070, 0x10, 0x02));
> -aml_append(crs, aml_irq_no_flags(8));
> -aml_append(crs, aml_io(AML_DECODE16, 0x0072, 0x0072, 0x02, 0x06));
> -aml_append(dev, aml_name_decl("_CRS", crs));
> -
> -return dev;
> -}
> -
>  static Aml *build_kbd_device_aml(void)
>  {
>  Aml *dev;
> @@ -1278,7 +1262,6 @@ static void build_isa_devices_aml(Aml *table)
>  Aml *scope = aml_scope("_SB.PCI0.ISA");
>  Object *obj = object_resolve_path_type("", TYPE_ISA_BUS, &ambiguous);
>  
> -aml_append(scope, build_rtc_device_aml());
>  aml_append(scope, build_kbd_device_aml());
>  aml_append(scope, build_mouse_device_aml());
>  if (fdc) {
> diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
> index d18c09911be2..2104e0aa3b14 100644
> --- a/hw/rtc/mc146818rtc.c
> +++ b/hw/rtc/mc146818rtc.c
> @@ -27,6 +27,7 @@
>  #include "qemu/cutils.h"
>  #include "qemu/module.h"
>  #include "qemu/bcd.h"
> +#include "hw/acpi/aml-build.h"
>  #include "hw/irq.h"
>  #include "hw/qdev-properties.h"
>  #include "qemu/timer.h"
> @@ -1007,13 +1008,34 @@ static void rtc_resetdev(DeviceState *d)
>  }
>  }
>  
> +static void rtc_build_aml(ISADevice *isadev, Aml *scope)
> +{
> +Aml *dev;
> +Aml *crs;
> +
> +crs = aml_resource_template();
> +aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE, RTC_ISA_BASE,
> +   0x10, 0x02));
> +aml_append(crs, aml_irq_no_flags(RTC_ISA_IRQ));
> +aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE + 2, RTC_ISA_BASE + 2,
> +   0x02, 0x06));
> +
> +dev = aml_device("RTC");
> +aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0B00")));
> +aml_append(dev, aml_name_decl("_CRS", crs));
> +
> +aml_append(scope, dev);
> +}
> +
>  static void rtc_class_initfn(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> +ISADeviceClass *isa = ISA_DEVICE_CLASS(klass);
>  
>  dc->realize = rtc_realizefn;
>  dc->reset = rtc_resetdev;
>  dc->vmsd = &vmstate_rtc;
> +isa->build_aml = rtc_build_aml;
>  device_class_set_props(dc, mc146818rtc_properties);
>  }
>

Re: [PATCH v3 03/33] block: Add BdrvChildRole and BdrvChildRoleBits

2020-05-05 Thread Max Reitz

On 05.05.20 14:54, Kevin Wolf wrote:
> Am 05.05.2020 um 13:59 hat Max Reitz geschrieben:
>> On 05.05.20 13:19, Kevin Wolf wrote:
>>> Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:

[...]

 +/* Useful combination of flags */
 +BDRV_CHILD_IMAGE= BDRV_CHILD_DATA
 +  | BDRV_CHILD_METADATA
 +  | BDRV_CHILD_PRIMARY,
 +};
 +
 +/* Mask of BdrvChildRoleBits values */
 +typedef unsigned int BdrvChildRole;
 +
  char *bdrv_perm_names(uint64_t perm);
  uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm);
>>>
>>> The list intuitively makes sense to me. Let me try to think of some
>>> interesting cases to see whether the documentation is complete or
>>> whether it could be improved.
>>>
>>>
>>> qcow2 is what everyone has in mind, so it should be obvious:
>>>
>>> * Without a data file:
>>>   * file: BDRV_CHILD_IMAGE
>>>   * backing: BDRV_CHILD_COW
>>>
>>> * With a data file:
>>>   * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
>>>   * data-file: BDRV_CHILD_DATA
>>>   * backing: BDRV_CHILD_COW
>>>
>>>
>>> We can use VMDK to make things a bit more interesting:
>>>
>>> * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
>>> * extents.*: BDRV_CHILD_METADATA | BDRV_CHILD_DATA
>>> * backing: BDRV_CHILD_COW
>>>
>>> In other words, we can have multiple data and metadata children. Is this
>>> correct or should extents not be marked as metadata? (Checked the final
>>> code: yes we do have multiple of them in vmdk.) Should this be mentioned
>>> in the documentation?
>>
>> If the extents contain metadata (I thought not, but I think I was just
>> wrong; sparse extents must contain their respective mapping structures),
>> then yes, they should all be marked as metadata children.
>>
>> I’m not sure whether that needs to be mentioned explicitly in the doc,
>> because “Child stores metadata” seems sufficient to me.
> 
> When you're the author, the meaning of everything is clear to you. :-)
> 
> In case of doubt, I would be more explicit so that the comment gives a
> clear guideline for which role to use in which scenario.

OK, so you mean just noting everywhere explicitly how many children can
get a specific flag, and not just in some cases?  That is, make a note
for DATA and METADATA that they can be given to an arbitrary number of
children, and COW only to at most one.
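
To make the cardinality rule above concrete, here is a minimal sketch, not code from the series. The `EX_` names and the bit values are assumptions chosen for illustration; only the rule itself (DATA and METADATA on any number of children, COW on at most one) comes from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role bits; the numeric values are illustrative assumptions. */
enum ExChildRoleBits {
    EX_CHILD_DATA     = 1 << 0,
    EX_CHILD_METADATA = 1 << 1,
    EX_CHILD_FILTERED = 1 << 2,
    EX_CHILD_COW      = 1 << 3,
    EX_CHILD_PRIMARY  = 1 << 4,
};

/* Mask of ExChildRoleBits values */
typedef unsigned int ExChildRole;

/* Count the children whose role mask contains every bit of @flag. */
static size_t ex_count_role(const ExChildRole *roles, size_t n,
                            ExChildRole flag)
{
    size_t count = 0;

    assert(roles != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((roles[i] & flag) == flag) {
            count++;
        }
    }
    return count;
}

/*
 * DATA and METADATA may be given to any number of children,
 * COW to at most one.
 */
static bool ex_roles_valid(const ExChildRole *roles, size_t n)
{
    return ex_count_role(roles, n, EX_CHILD_COW) <= 1;
}
```

With the VMDK layout listed earlier (one file child, several extents, one backing child), `ex_roles_valid()` accepts the set even though several children carry METADATA, and rejects a set that has two COW children.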

>>> Do we then also want to allow multiple BDRV_CHILD_COW children? We don't
>>> currently have a driver that needs it, but maybe it would be consistent
>>> with DATA and METADATA then. However, it would contradict the
>>> documentation that it's the "Child from which to read all data".
>>
>> Yes.  I would revisit that problem when the need arises.
>>
>> It seems to me like this would open a whole can of worms, just like
>> allowing multiple filtered children does.
> 
> Okay. Shall we document it explicitly like we do for the filter role?

Yep.

>>> blkverify:
>>>
>>> * x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
>>> * x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
>>>
>>> Hm, according to the documentation, this doesn't work, FILTERED can be
>>> set only for one node. But the condition ("the parent forwards all reads
>>> and writes") applies to both children. I think the documentation should
>>> mention what needs to be done in such cases.
>>
>> I don’t know.  blkverify is a rare exception by design, because it can
>> abort when both children don’t match.  (I suppose we could theoretically
>> have a quorum mode where a child gets ejected once a mismatch is
>> detected, but that isn’t the case now.)
> 
> Well, yes, this is exceptional. I would ignore that property for
> assigning roles because when it comes to play, roles don't matter any
> more because the whole process is gone. So...
> 
>> Furthermore, I would argue that blkverify actually expects the formatted
>> image to sometimes differ from the raw image, if anything, because the
>> format driver is to be tested.  This is the reason why I chose x-raw to
>> be the filtered child.
> 
> ...I don't think this case is relevant. If blkverify returns something,
> both children have the same data.

Another argument is that right now, bs->file points to x-raw, and
.is_filter is set.  So x-raw is already treated as the filtered child.

>> So there is no general instruction on what to do in such cases that I
>> followed here, I specifically chose one child based on what blkverify is
>> and what it’s supposed to do.  Therefore, I can’t really give a general
>> instruction on “what needs to be done in such cases”.
> 
> Maybe the missing part for me is what FILTERED is even used for. I
> assume it's for skipping over filters in certain functions in the
> generic block layer?

Yes.

> In this case, maybe the right answer is that...
> 
>>> For blkverify, both
>>> children are not equal in intention, so I guess the "real" filtered
>>> child is x-image. But for quorum, you can't make

[PULL 24/24] block/block-copy: use aio-task-pool API

2020-05-05 Thread Max Reitz

From: Vladimir Sementsov-Ogievskiy 

Run block_copy iterations in parallel in aio tasks.

Changes:
  - BlockCopyTask becomes the aio task structure. Add a zeroes field to
    pass it to block_copy_do_copy
  - add call state: the state of one call of block_copy(), shared
    between parallel tasks. For now it is used only to record the first
    error and whether it was a read error.
  - convert block_copy_dirty_clusters to an aio-task loop.
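
The "call state" idea can be sketched in isolation as follows. This is only an illustration under assumptions: the `Ex` types and `ex_task_done()` are hypothetical names, and the tasks here finish one after another rather than in an aio task pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State of one copy call, shared by all tasks started for that call. */
typedef struct ExCallState {
    bool failed;
    bool error_is_read;
} ExCallState;

/* Outcome of a single task (hypothetical helper type). */
typedef struct ExTaskResult {
    int ret;            /* 0 on success, negative on failure */
    bool error_is_read; /* meaningful only when ret < 0 */
} ExTaskResult;

/*
 * Record the outcome of one finished task. Only the first failure is
 * kept; later failures do not overwrite it.
 */
static void ex_task_done(ExCallState *cs, const ExTaskResult *res)
{
    assert(cs != NULL && res != NULL);
    if (res->ret < 0 && !cs->failed) {
        cs->failed = true;
        cs->error_is_read = res->error_is_read;
    }
}
```

Because every task of one call points at the same `ExCallState`, the caller can report whether the first error was a read or a write error no matter which task hit it.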

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20200429130847.28124-6-vsement...@virtuozzo.com>
Signed-off-by: Max Reitz 
---
 block/block-copy.c | 119 -
 1 file changed, 106 insertions(+), 13 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index f560338647..03500680f7 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -19,15 +19,29 @@
 #include "block/block-copy.h"
 #include "sysemu/block-backend.h"
 #include "qemu/units.h"
+#include "qemu/coroutine.h"
+#include "block/aio_task.h"
 
 #define BLOCK_COPY_MAX_COPY_RANGE (16 * MiB)
 #define BLOCK_COPY_MAX_BUFFER (1 * MiB)
 #define BLOCK_COPY_MAX_MEM (128 * MiB)
+#define BLOCK_COPY_MAX_WORKERS 64
+
+static coroutine_fn int block_copy_task_entry(AioTask *task);
+
+typedef struct BlockCopyCallState {
+bool failed;
+bool error_is_read;
+} BlockCopyCallState;
 
 typedef struct BlockCopyTask {
+AioTask task;
+
 BlockCopyState *s;
+BlockCopyCallState *call_state;
 int64_t offset;
 int64_t bytes;
+bool zeroes;
 QLIST_ENTRY(BlockCopyTask) list;
 CoQueue wait_queue; /* coroutines blocked on this task */
 } BlockCopyTask;
@@ -116,6 +130,7 @@ static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
  * the beginning of it.
  */
 static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
+ BlockCopyCallState *call_state,
  int64_t offset, int64_t bytes)
 {
 BlockCopyTask *task;
@@ -135,7 +150,9 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
 
 task = g_new(BlockCopyTask, 1);
 *task = (BlockCopyTask) {
+.task.func = block_copy_task_entry,
 .s = s,
+.call_state = call_state,
 .offset = offset,
 .bytes = bytes,
 };
@@ -263,6 +280,38 @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
 s->progress = pm;
 }
 
+/*
+ * Takes ownership of @task
+ *
+ * If pool is NULL directly run the task, otherwise schedule it into the pool.
+ *
+ * Returns: task.func return code if pool is NULL
+ *  otherwise -ECANCELED if pool status is bad
+ *  otherwise 0 (successfully scheduled)
+ */
+static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
+BlockCopyTask *task)
+{
+if (!pool) {
+int ret = task->task.func(&task->task);
+
+g_free(task);
+return ret;
+}
+
+aio_task_pool_wait_slot(pool);
+if (aio_task_pool_status(pool) < 0) {
+co_put_to_shres(task->s->mem, task->bytes);
+block_copy_task_end(task, -ECANCELED);
+g_free(task);
+return -ECANCELED;
+}
+
+aio_task_pool_start_task(pool, &task->task);
+
+return 0;
+}
+
 /*
  * block_copy_do_copy
  *
@@ -366,6 +415,27 @@ out:
 return ret;
 }
 
+static coroutine_fn int block_copy_task_entry(AioTask *task)
+{
+BlockCopyTask *t = container_of(task, BlockCopyTask, task);
+bool error_is_read;
+int ret;
+
+ret = block_copy_do_copy(t->s, t->offset, t->bytes, t->zeroes,
+ &error_is_read);
+if (ret < 0 && !t->call_state->failed) {
+t->call_state->failed = true;
+t->call_state->error_is_read = error_is_read;
+} else {
+progress_work_done(t->s->progress, t->bytes);
+t->s->progress_bytes_callback(t->bytes, t->s->progress_opaque);
+}
+co_put_to_shres(t->s->mem, t->bytes);
+block_copy_task_end(t, ret);
+
+return ret;
+}
+
 static int block_copy_block_status(BlockCopyState *s, int64_t offset,
int64_t bytes, int64_t *pnum)
 {
@@ -484,6 +554,8 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 int ret = 0;
 bool found_dirty = false;
 int64_t end = offset + bytes;
+AioTaskPool *aio = NULL;
+BlockCopyCallState call_state = {false, false};
 
 /*
  * block_copy() user is responsible for keeping source and target in same
@@ -495,11 +567,11 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
 assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 
-while (bytes) {
-g_autofree BlockCopyTask *task = NULL;
+while (bytes && aio_task_pool_status(aio) == 0) {
+BlockCopyTask *task;
 int64_t status_bytes;
 
-task = block_copy_task_create(s, offset, bytes);
+task =

[PULL 23/24] block/block-copy: refactor task creation

2020-05-05 Thread Max Reitz

From: Vladimir Sementsov-Ogievskiy 

Instead of just relying on the comment "Called only on full-dirty
region" in block_copy_task_create() let's move initial dirty area
search directly to block_copy_task_create(). Let's also use the efficient
bdrv_dirty_bitmap_next_dirty_area instead of looping through all
non-dirty clusters.
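
As an illustration of what "search for the first dirty area" means, here is a minimal sketch. It assumes a plain array of per-cluster flags and works in cluster indexes; `ex_next_dirty_area()` is a hypothetical stand-in, and the real dirty bitmap operates on byte offsets and can search faster than one bit at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Find the first run of dirty clusters in [start, end), limited to
 * @max_clusters. On success store the run in @area_start / @area_count
 * and return true; return false if nothing in the range is dirty.
 */
static bool ex_next_dirty_area(const bool *dirty, int64_t start, int64_t end,
                               int64_t max_clusters,
                               int64_t *area_start, int64_t *area_count)
{
    int64_t first = start;
    int64_t last;

    assert(start <= end && max_clusters > 0);

    /* Skip the clean clusters at the beginning of the range. */
    while (first < end && !dirty[first]) {
        first++;
    }
    if (first == end) {
        return false;
    }

    /* Extend the area while clusters stay dirty and the limit allows. */
    last = first;
    while (last < end && dirty[last] && last - first < max_clusters) {
        last++;
    }

    *area_start = first;
    *area_count = last - first;
    return true;
}
```

A caller loops by restarting the search at `area_start + area_count`. When the returned start lies beyond the requested one, the gap is the already-copied range that the patch reports through trace_block_copy_skip_range().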

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20200429130847.28124-5-vsement...@virtuozzo.com>
Signed-off-by: Max Reitz 
---
 block/block-copy.c | 80 ++
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 35ff9cc3ef..f560338647 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -32,6 +32,11 @@ typedef struct BlockCopyTask {
 CoQueue wait_queue; /* coroutines blocked on this task */
 } BlockCopyTask;
 
+static int64_t task_end(BlockCopyTask *task)
+{
+return task->offset + task->bytes;
+}
+
 typedef struct BlockCopyState {
 /*
  * BdrvChild objects are not owned or managed by block-copy. They are
@@ -106,17 +111,29 @@ static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
 return true;
 }
 
-/* Called only on full-dirty region */
+/*
+ * Search for the first dirty area in offset/bytes range and create task at
+ * the beginning of it.
+ */
 static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
  int64_t offset, int64_t bytes)
 {
-BlockCopyTask *task = g_new(BlockCopyTask, 1);
+BlockCopyTask *task;
 
+if (!bdrv_dirty_bitmap_next_dirty_area(s->copy_bitmap,
+   offset, offset + bytes,
+   s->copy_size, &offset, &bytes))
+{
+return NULL;
+}
+
+/* region is dirty, so no existent tasks possible in it */
 assert(!find_conflicting_task(s, offset, bytes));
 
 bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
 s->in_flight_bytes += bytes;
 
+task = g_new(BlockCopyTask, 1);
 *task = (BlockCopyTask) {
 .s = s,
 .offset = offset,
@@ -466,6 +483,7 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 {
 int ret = 0;
 bool found_dirty = false;
+int64_t end = offset + bytes;
 
 /*
  * block_copy() user is responsible for keeping source and target in same
@@ -479,58 +497,52 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 
 while (bytes) {
 g_autofree BlockCopyTask *task = NULL;
-int64_t next_zero, cur_bytes, status_bytes;
+int64_t status_bytes;
 
-if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
-trace_block_copy_skip(s, offset);
-offset += s->cluster_size;
-bytes -= s->cluster_size;
-continue; /* already copied */
+task = block_copy_task_create(s, offset, bytes);
+if (!task) {
+/* No more dirty bits in the bitmap */
+trace_block_copy_skip_range(s, offset, bytes);
+break;
+}
+if (task->offset > offset) {
+trace_block_copy_skip_range(s, offset, task->offset - offset);
 }
 
 found_dirty = true;
 
-cur_bytes = MIN(bytes, s->copy_size);
-
-next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, offset,
-cur_bytes);
-if (next_zero >= 0) {
-assert(next_zero > offset); /* offset is dirty */
-assert(next_zero < offset + cur_bytes); /* no need to do MIN() */
-cur_bytes = next_zero - offset;
-}
-task = block_copy_task_create(s, offset, cur_bytes);
-
-ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
+ret = block_copy_block_status(s, task->offset, task->bytes,
+  &status_bytes);
 assert(ret >= 0); /* never fail */
-cur_bytes = MIN(cur_bytes, status_bytes);
-block_copy_task_shrink(task, cur_bytes);
+if (status_bytes < task->bytes) {
+block_copy_task_shrink(task, status_bytes);
+}
 if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
 block_copy_task_end(task, 0);
 progress_set_remaining(s->progress,
bdrv_get_dirty_count(s->copy_bitmap) +
s->in_flight_bytes);
-trace_block_copy_skip_range(s, offset, status_bytes);
-offset += status_bytes;
-bytes -= status_bytes;
+trace_block_copy_skip_range(s, task->offset, task->bytes);
+offset = task_end(task);
+bytes = end - offset;
 continue;
 }
 
-trace_block_copy_process(s, offset);
+trace_block_copy_process(s, task->offset);
 
-co_get_from_shres(s->mem, cur_bytes);
-ret = block_copy_do_copy(s,

[PULL 21/24] block/block-copy: alloc task on each iteration

2020-05-05 Thread Max Reitz

From: Vladimir Sementsov-Ogievskiy 

We are going to use the aio-task-pool API, so tasks will be handled in
parallel. We therefore need a separately allocated task on each
iteration. Introduce this logic now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20200429130847.28124-3-vsement...@virtuozzo.com>
Signed-off-by: Max Reitz 
---
 block/block-copy.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index bbb29366dc..8d1b9ab9f0 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -106,9 +106,11 @@ static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
 }
 
 /* Called only on full-dirty region */
-static void block_copy_task_begin(BlockCopyState *s, BlockCopyTask *task,
-  int64_t offset, int64_t bytes)
+static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
+ int64_t offset, int64_t bytes)
 {
+BlockCopyTask *task = g_new(BlockCopyTask, 1);
+
 assert(!find_conflicting_task(s, offset, bytes));
 
 bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
@@ -118,6 +120,8 @@ static void block_copy_task_begin(BlockCopyState *s, BlockCopyTask *task,
 task->bytes = bytes;
 qemu_co_queue_init(&task->wait_queue);
 QLIST_INSERT_HEAD(&s->tasks, task, list);
+
+return task;
 }
 
 /*
@@ -472,7 +476,7 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 
 while (bytes) {
-BlockCopyTask task;
+g_autofree BlockCopyTask *task = NULL;
 int64_t next_zero, cur_bytes, status_bytes;
 
 if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
@@ -493,14 +497,14 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 assert(next_zero < offset + cur_bytes); /* no need to do MIN() */
 cur_bytes = next_zero - offset;
 }
-block_copy_task_begin(s, &task, offset, cur_bytes);
+task = block_copy_task_create(s, offset, cur_bytes);
 
 ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
 assert(ret >= 0); /* never fail */
 cur_bytes = MIN(cur_bytes, status_bytes);
-block_copy_task_shrink(s, &task, cur_bytes);
+block_copy_task_shrink(s, task, cur_bytes);
 if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
-block_copy_task_end(s, &task, 0);
+block_copy_task_end(s, task, 0);
 progress_set_remaining(s->progress,
bdrv_get_dirty_count(s->copy_bitmap) +
s->in_flight_bytes);
@@ -516,7 +520,7 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 ret = block_copy_do_copy(s, offset, cur_bytes, ret & BDRV_BLOCK_ZERO,
  error_is_read);
 co_put_to_shres(s->mem, cur_bytes);
-block_copy_task_end(s, &task, ret);
+block_copy_task_end(s, task, ret);
 if (ret < 0) {
 return ret;
 }
-- 
2.26.2

[PULL 16/24] qcow2: Allow resize of images with internal snapshots

2020-05-05 Thread Max Reitz

From: Eric Blake 

We originally refused to allow resize of images with internal
snapshots because the v2 image format did not require the tracking of
snapshot size, making it impossible to safely revert to a snapshot
with a different size than the current view of the image.  But the
snapshot size tracking was rectified in v3, and our recent fixes to
qemu-img amend (see 0a85af35) guarantee that we always have a valid
snapshot size.  Thus, we no longer need to artificially limit image
resizes, but it does become one more thing that would prevent a
downgrade back to v2.  And now that we support different-sized
snapshots, it's also easy to fix reverting to a snapshot to apply the
new size.

Upgrade iotest 61 to cover this (we previously had NO coverage of
refusal to resize while snapshots exist).  Note that the amend process
can fail but still have effects: in particular, since we break things
into upgrade, resize, downgrade, a failure during resize does not roll
back changes made during upgrade, nor does failure in downgrade roll
back a resize.  But this situation is pre-existing even without this
patch; and without journaling, the best we could do is minimize the
chance of partial failure by collecting all changes prior to doing any
writes - which adds a lot of complexity but could still fail with EIO.
On the other hand, we are careful that even if we have partial
modification but then fail, the image is left viable (that is, we are
careful to sequence things so that after each successful cluster
write, there may be transient leaked clusters but no corrupt
metadata).  And complicating the code to make it more transaction-like
is not worth the effort: a user can always request multiple 'qemu-img
amend' changing one thing each, if they need finer-grained control
over detecting the first failure than what they get by letting qemu
decide how to sequence multiple changes.
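
The two version-dependent rules described above can be summarised in a small sketch. The function names and parameters are hypothetical and deliberately detached from the real qcow2 driver structures, and the additional check on VM state size is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Resizing is refused only for a v2 image that has internal snapshots,
 * because v2 does not guarantee that the snapshot size is recorded.
 */
static int ex_check_resize(int nb_snapshots, int qcow_version)
{
    assert(nb_snapshots >= 0);
    if (nb_snapshots > 0 && qcow_version < 3) {
        return -ENOTSUP;
    }
    return 0;
}

/*
 * Downgrading to v2 is treated as unsafe as soon as one snapshot has a
 * size that differs from the current image size.
 */
static bool ex_can_downgrade(const uint64_t *snapshot_sizes, size_t n,
                             uint64_t image_size)
{
    for (size_t i = 0; i < n; i++) {
        if (snapshot_sizes[i] != image_size) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions a v3 image with snapshots can be resized, but once it has been resized its older snapshots no longer match the image size and `ex_can_downgrade()` returns false.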

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: <20200428192648.749066-3-ebl...@redhat.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-snapshot.c | 20 
 block/qcow2.c  | 25 ++---
 tests/qemu-iotests/061 | 35 +++
 tests/qemu-iotests/061.out | 28 
 4 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 82c32d4c9b..2756b37d24 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qcow2.h"
 #include "qemu/bswap.h"
@@ -775,10 +776,21 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
 }
 
 if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) {
-error_report("qcow2: Loading snapshots with different disk "
-"size is not implemented");
-ret = -ENOTSUP;
-goto fail;
+BlockBackend *blk = blk_new_with_bs(bs, BLK_PERM_RESIZE, BLK_PERM_ALL,
+&local_err);
+if (!blk) {
+error_report_err(local_err);
+ret = -ENOTSUP;
+goto fail;
+}
+
+ret = blk_truncate(blk, sn->disk_size, true, PREALLOC_MODE_OFF, 0,
+   &local_err);
+blk_unref(blk);
+if (ret < 0) {
+error_report_err(local_err);
+goto fail;
+}
 }
 
 /*
diff --git a/block/qcow2.c b/block/qcow2.c
index 0edc7f4643..3e8b3d022b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3989,9 +3989,12 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 
 qemu_co_mutex_lock(&s->lock);
 
-/* cannot proceed if image has snapshots */
-if (s->nb_snapshots) {
-error_setg(errp, "Can't resize an image which has snapshots");
+/*
+ * Even though we store snapshot size for all images, it was not
+ * required until v3, so it is not safe to proceed for v2.
+ */
+if (s->nb_snapshots && s->qcow_version < 3) {
+error_setg(errp, "Can't resize a v2 image which has snapshots");
 ret = -ENOTSUP;
 goto fail;
 }
@@ -5005,6 +5008,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
 BDRVQcow2State *s = bs->opaque;
 int current_version = s->qcow_version;
 int ret;
+int i;
 
 /* This is qcow2_downgrade(), not qcow2_upgrade() */
 assert(target_version < current_version);
@@ -5022,6 +5026,21 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
 return -ENOTSUP;
 }
 
+/*
+ * If any internal snapshot has a different size than the current
+ * image size, or VM state size that exceeds 32 bits, downgrading
+ * is unsafe.  Even though we would still use v3-compliant output
+ * to preserve that data, other v2 programs might not realize
+ * those optional fields are important.
+ */
+

[PULL 22/24] block/block-copy: add state pointer to BlockCopyTask

2020-05-05 Thread Max Reitz

From: Vladimir Sementsov-Ogievskiy 

We are going to use aio-task-pool API, so we'll need state pointer in
BlockCopyTask anyway. Add it now and use where possible.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20200429130847.28124-4-vsement...@virtuozzo.com>
Signed-off-by: Max Reitz 
---
 block/block-copy.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 8d1b9ab9f0..35ff9cc3ef 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -25,6 +25,7 @@
 #define BLOCK_COPY_MAX_MEM (128 * MiB)
 
 typedef struct BlockCopyTask {
+BlockCopyState *s;
 int64_t offset;
 int64_t bytes;
 QLIST_ENTRY(BlockCopyTask) list;
@@ -116,8 +117,11 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
 bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
 s->in_flight_bytes += bytes;
 
-task->offset = offset;
-task->bytes = bytes;
+*task = (BlockCopyTask) {
+.s = s,
+.offset = offset,
+.bytes = bytes,
+};
 qemu_co_queue_init(&task->wait_queue);
 QLIST_INSERT_HEAD(&s->tasks, task, list);
 
@@ -131,8 +135,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
  * wake up all tasks waiting for us (may be some of them are not intersecting
  * with shrunk task)
  */
-static void coroutine_fn block_copy_task_shrink(BlockCopyState *s,
-BlockCopyTask *task,
+static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
 int64_t new_bytes)
 {
 if (new_bytes == task->bytes) {
@@ -141,20 +144,19 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyState *s,
 
 assert(new_bytes > 0 && new_bytes < task->bytes);
 
-s->in_flight_bytes -= task->bytes - new_bytes;
-bdrv_set_dirty_bitmap(s->copy_bitmap,
+task->s->in_flight_bytes -= task->bytes - new_bytes;
+bdrv_set_dirty_bitmap(task->s->copy_bitmap,
   task->offset + new_bytes, task->bytes - new_bytes);
 
 task->bytes = new_bytes;
 qemu_co_queue_restart_all(&task->wait_queue);
 }
 
-static void coroutine_fn block_copy_task_end(BlockCopyState *s,
- BlockCopyTask *task, int ret)
+static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
 {
-s->in_flight_bytes -= task->bytes;
+task->s->in_flight_bytes -= task->bytes;
 if (ret < 0) {
-bdrv_set_dirty_bitmap(s->copy_bitmap, task->offset, task->bytes);
+bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
 }
 QLIST_REMOVE(task, list);
 qemu_co_queue_restart_all(&task->wait_queue);
@@ -502,9 +504,9 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
 assert(ret >= 0); /* never fail */
 cur_bytes = MIN(cur_bytes, status_bytes);
-block_copy_task_shrink(s, task, cur_bytes);
+block_copy_task_shrink(task, cur_bytes);
 if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
-block_copy_task_end(s, task, 0);
+block_copy_task_end(task, 0);
 progress_set_remaining(s->progress,
bdrv_get_dirty_count(s->copy_bitmap) +
s->in_flight_bytes);
@@ -520,7 +522,7 @@ static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 ret = block_copy_do_copy(s, offset, cur_bytes, ret & BDRV_BLOCK_ZERO,
  error_is_read);
 co_put_to_shres(s->mem, cur_bytes);
-block_copy_task_end(s, task, ret);
+block_copy_task_end(task, ret);
 if (ret < 0) {
 return ret;
 }
-- 
2.26.2

[PULL 20/24] block/block-copy: rename in-flight requests to tasks

2020-05-05 Thread Max Reitz

From: Vladimir Sementsov-Ogievskiy 

We are going to use aio-task-pool API and extend in-flight request
structure to be a successor of AioTask, so rename things appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20200429130847.28124-2-vsement...@virtuozzo.com>
Signed-off-by: Max Reitz 
---
 block/block-copy.c | 98 +++---
 1 file changed, 48 insertions(+), 50 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 05227e18bf..bbb29366dc 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -24,12 +24,12 @@
 #define BLOCK_COPY_MAX_BUFFER (1 * MiB)
 #define BLOCK_COPY_MAX_MEM (128 * MiB)
 
-typedef struct BlockCopyInFlightReq {
+typedef struct BlockCopyTask {
 int64_t offset;
 int64_t bytes;
-QLIST_ENTRY(BlockCopyInFlightReq) list;
-CoQueue wait_queue; /* coroutines blocked on this request */
-} BlockCopyInFlightReq;
+QLIST_ENTRY(BlockCopyTask) list;
+CoQueue wait_queue; /* coroutines blocked on this task */
+} BlockCopyTask;
 
 typedef struct BlockCopyState {
 /*
@@ -45,7 +45,7 @@ typedef struct BlockCopyState {
 bool use_copy_range;
 int64_t copy_size;
 uint64_t len;
-QLIST_HEAD(, BlockCopyInFlightReq) inflight_reqs;
+QLIST_HEAD(, BlockCopyTask) tasks;
 
 BdrvRequestFlags write_flags;
 
@@ -73,15 +73,14 @@ typedef struct BlockCopyState {
 SharedResource *mem;
 } BlockCopyState;
 
-static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
-   int64_t offset,
-   int64_t bytes)
+static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
+int64_t offset, int64_t bytes)
 {
-BlockCopyInFlightReq *req;
+BlockCopyTask *t;
 
-QLIST_FOREACH(req, &s->inflight_reqs, list) {
-if (offset + bytes > req->offset && offset < req->offset + req->bytes) {
-return req;
+QLIST_FOREACH(t, &s->tasks, list) {
+if (offset + bytes > t->offset && offset < t->offset + t->bytes) {
+return t;
 }
 }
 
@@ -89,73 +88,72 @@ static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
 }
 
 /*
- * If there are no intersecting requests return false. Otherwise, wait for the
- * first found intersecting request to finish and return true.
+ * If there are no intersecting tasks return false. Otherwise, wait for the
+ * first found intersecting tasks to finish and return true.
  */
 static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
  int64_t bytes)
 {
-BlockCopyInFlightReq *req = find_conflicting_inflight_req(s, offset, bytes);
+BlockCopyTask *task = find_conflicting_task(s, offset, bytes);
 
-if (!req) {
+if (!task) {
 return false;
 }
 
-qemu_co_queue_wait(&req->wait_queue, NULL);
+qemu_co_queue_wait(&task->wait_queue, NULL);
 
 return true;
 }
 
 /* Called only on full-dirty region */
-static void block_copy_inflight_req_begin(BlockCopyState *s,
-  BlockCopyInFlightReq *req,
-  int64_t offset, int64_t bytes)
+static void block_copy_task_begin(BlockCopyState *s, BlockCopyTask *task,
+  int64_t offset, int64_t bytes)
 {
-assert(!find_conflicting_inflight_req(s, offset, bytes));
+assert(!find_conflicting_task(s, offset, bytes));
 
 bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
 s->in_flight_bytes += bytes;
 
-req->offset = offset;
-req->bytes = bytes;
-qemu_co_queue_init(&req->wait_queue);
-QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
+task->offset = offset;
+task->bytes = bytes;
+qemu_co_queue_init(&task->wait_queue);
+QLIST_INSERT_HEAD(&s->tasks, task, list);
 }
 
 /*
- * block_copy_inflight_req_shrink
+ * block_copy_task_shrink
  *
- * Drop the tail of the request to be handled later. Set dirty bits back and
- * wake up all requests waiting for us (may be some of them are not intersecting
- * with shrunk request)
+ * Drop the tail of the task to be handled later. Set dirty bits back and
+ * wake up all tasks waiting for us (may be some of them are not intersecting
+ * with shrunk task)
  */
-static void coroutine_fn block_copy_inflight_req_shrink(BlockCopyState *s,
-BlockCopyInFlightReq *req, int64_t new_bytes)
+static void coroutine_fn block_copy_task_shrink(BlockCopyState *s,
+BlockCopyTask *task,
+int64_t new_bytes)
 {
-if (new_bytes == req->bytes) {
+if (new_bytes == task->bytes) {
 return;
 }
 
-assert(new_bytes > 0 && new_bytes < req->bytes);
+assert(new_bytes > 0 && new_bytes < task->bytes);
 
-s->in_flight_bytes -= req->bytes -

[PULL 14/24] iotests: use python logging for iotests.log()

2020-05-05 Thread Max Reitz

From: John Snow 

We can turn logging on/off globally instead of per-function.

Remove use_log from run_job, and use python logging to turn on
diffable output when we run through a script entry point.

iotest 245 changes output order due to buffering reasons.

An extended note on python logging:

A NullHandler is added to `qemu.iotests` to stop output from being
generated if this code is used as a library without configuring logging.
A NullHandler is only needed at the root, so a duplicate handler is not
needed for `qemu.iotests.diff_io`.

When logging is not configured, messages at the 'WARNING' levels or
above are printed with default settings. The NullHandler stops this from
occurring, which is considered good hygiene for code used as a library.

See https://docs.python.org/3/howto/logging.html#library-config

When logging is actually enabled (always at the behest of an explicit
call by a client script), a root logger is implicitly created at the
root, which allows messages to propagate upwards and be handled/emitted
from the root logger with default settings.

When we want iotest logging, we attach a handler to the
qemu.iotests.diff_io logger and disable propagation to avoid possible
double-printing.

For more information on python logging infrastructure, I highly
recommend downloading the pip package `logging_tree`, which provides
convenient visualizations of the hierarchical logging configuration
under different circumstances.

See https://pypi.org/project/logging_tree/ for more information.
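
The arrangement described above can be sketched with the standard
logging module alone. This is an illustration under assumptions, not the
code from the patch: activate_logging_sketch is a hypothetical name, and
the handler format is a guess at what "diffable output" needs (bare
messages only).

```python
import io
import logging

# Library side: a NullHandler on the package logger keeps the module
# silent when a client has not configured logging.
lib_logger = logging.getLogger('qemu.iotests')
lib_logger.addHandler(logging.NullHandler())

diff_logger = logging.getLogger('qemu.iotests.diff_io')


def activate_logging_sketch(stream):
    """Attach a bare-message handler to the diff_io logger only."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(message)s'))
    diff_logger.addHandler(handler)
    diff_logger.setLevel(logging.INFO)
    # Stop records from also reaching ancestor handlers, which would
    # print each message twice if the root logger is configured.
    diff_logger.propagate = False
    return handler


buf = io.StringIO()
diff_logger.info('not captured')      # logged before activation
activate_logging_sketch(buf)
diff_logger.info('{"return": {}}')    # logged after activation
```

Only the second message reaches buf; the first is logged before any
handler for diffable output exists.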

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-15-js...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/030|  4 +--
 tests/qemu-iotests/155|  2 +-
 tests/qemu-iotests/245|  1 +
 tests/qemu-iotests/245.out| 10 +++
 tests/qemu-iotests/iotests.py | 53 ---
 5 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index aa911d266a..104e3cee1b 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -411,8 +411,8 @@ class TestParallelOps(iotests.QMPTestCase):
 result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
 self.assert_qmp(result, 'return', {})
 
-self.vm.run_job(job='drive0', auto_dismiss=True, use_log=False)
-self.vm.run_job(job='node4', auto_dismiss=True, use_log=False)
+self.vm.run_job(job='drive0', auto_dismiss=True)
+self.vm.run_job(job='node4', auto_dismiss=True)
 self.assert_no_active_block_jobs()
 
 # Test a block-stream and a block-commit job in parallel
diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155
index 571bce9de4..cb371d4649 100755
--- a/tests/qemu-iotests/155
+++ b/tests/qemu-iotests/155
@@ -188,7 +188,7 @@ class MirrorBaseClass(BaseClass):
 
 self.assert_qmp(result, 'return', {})
 
-self.vm.run_job('mirror-job', use_log=False, auto_finalize=False,
+self.vm.run_job('mirror-job', auto_finalize=False,
 pre_finalize=self.openBacking, auto_dismiss=True)
 
 def testFull(self):
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 1001275a44..4f5f0bb901 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -1027,5 +1027,6 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 self.run_test_iothreads(None, 'iothread0')
 
 if __name__ == '__main__':
+iotests.activate_logging()
 iotests.main(supported_fmts=["qcow2"],
  supported_protocols=["file"])
diff --git a/tests/qemu-iotests/245.out b/tests/qemu-iotests/245.out
index 682b93394d..4b33dcaf5c 100644
--- a/tests/qemu-iotests/245.out
+++ b/tests/qemu-iotests/245.out
@@ -1,8 +1,3 @@
-.
---
-Ran 21 tests
-
-OK
 {"execute": "job-finalize", "arguments": {"id": "commit0"}}
 {"return": {}}
 {"data": {"id": "commit0", "type": "commit"}, "event": "BLOCK_JOB_PENDING", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
@@ -15,3 +10,8 @@ OK
 {"return": {}}
 {"data": {"id": "stream0", "type": "stream"}, "event": "BLOCK_JOB_PENDING", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 {"data": {"device": "stream0", "len": 3145728, "offset": 3145728, "speed": 0, "type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+.
+--
+Ran 21 tests
+
+OK
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 35d8cae997..6c0e781af7 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -42,6 +42,14 @@ assert sys.version_info >= (3, 6)
 QMPResponse = Dict[str, Any]
 
 
+# Use this logger for logging messages directly from the iotests module
+logger = logging.getLogger('qemu.iotests')

[PULL 19/24] Fix iotest 153

2020-05-05 Thread Max Reitz

From: Maxim Levitsky 

Commit f62514b3def5fb2acbef64d0e053c0c31fa45aff made qemu-img reject -o "",
but this test uses it.
Since this test only tries to do a dry run of qemu-img amend,
replace the -o "" with a dummy -o "size=$size".

Fixes: f62514b3def5fb2acbef64d0e053c0c31fa45aff

Signed-off-by: Maxim Levitsky 
Message-Id: <20200504131959.9533-1-mlevi...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/153 |  2 +-
 tests/qemu-iotests/153.out | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/153 b/tests/qemu-iotests/153
index 2b13111768..cf961d3609 100755
--- a/tests/qemu-iotests/153
+++ b/tests/qemu-iotests/153
@@ -122,7 +122,7 @@ for opts1 in "" "read-only=on" "read-only=on,force-share=on"; do
 _run_cmd $QEMU_IMG check   $L "${TEST_IMG}"
 _run_cmd $QEMU_IMG compare $L "${TEST_IMG}" "${TEST_IMG}"
 _run_cmd $QEMU_IMG map $L "${TEST_IMG}"
-_run_cmd $QEMU_IMG amend -o "" $L "${TEST_IMG}"
+_run_cmd $QEMU_IMG amend -o "size=$size" $L "${TEST_IMG}"
 _run_cmd $QEMU_IMG commit  $L "${TEST_IMG}"
 _run_cmd $QEMU_IMG resize  $L "${TEST_IMG}" $size
 _run_cmd $QEMU_IMG rebase  $L "${TEST_IMG}" -b "${TEST_IMG}.base"
diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
index f7464dd8d3..b2a90caa6b 100644
--- a/tests/qemu-iotests/153.out
+++ b/tests/qemu-iotests/153.out
@@ -56,7 +56,7 @@ _qemu_img_wrapper map TEST_DIR/t.qcow2
 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
 Is another process using the image [TEST_DIR/t.qcow2]?
 
-_qemu_img_wrapper amend -o  TEST_DIR/t.qcow2
+_qemu_img_wrapper amend -o size=32M TEST_DIR/t.qcow2
 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get "write" lock
 Is another process using the image [TEST_DIR/t.qcow2]?
 
@@ -118,7 +118,7 @@ _qemu_img_wrapper compare -U TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
 
 _qemu_img_wrapper map -U TEST_DIR/t.qcow2
 
-_qemu_img_wrapper amend -o  -U TEST_DIR/t.qcow2
+_qemu_img_wrapper amend -o size=32M -U TEST_DIR/t.qcow2
 qemu-img: unrecognized option '-U'
 Try 'qemu-img --help' for more information
 
@@ -187,7 +187,7 @@ _qemu_img_wrapper compare TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
 
 _qemu_img_wrapper map TEST_DIR/t.qcow2
 
-_qemu_img_wrapper amend -o  TEST_DIR/t.qcow2
+_qemu_img_wrapper amend -o size=32M TEST_DIR/t.qcow2
 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get "write" lock
 Is another process using the image [TEST_DIR/t.qcow2]?
 
@@ -241,7 +241,7 @@ _qemu_img_wrapper compare -U TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
 
 _qemu_img_wrapper map -U TEST_DIR/t.qcow2
 
-_qemu_img_wrapper amend -o  -U TEST_DIR/t.qcow2
+_qemu_img_wrapper amend -o size=32M -U TEST_DIR/t.qcow2
 qemu-img: unrecognized option '-U'
 Try 'qemu-img --help' for more information
 
@@ -303,7 +303,7 @@ _qemu_img_wrapper compare TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
 
 _qemu_img_wrapper map TEST_DIR/t.qcow2
 
-_qemu_img_wrapper amend -o  TEST_DIR/t.qcow2
+_qemu_img_wrapper amend -o size=32M TEST_DIR/t.qcow2
 
 _qemu_img_wrapper commit TEST_DIR/t.qcow2
 
@@ -345,7 +345,7 @@ _qemu_img_wrapper compare -U TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
 
 _qemu_img_wrapper map -U TEST_DIR/t.qcow2
 
-_qemu_img_wrapper amend -o  -U TEST_DIR/t.qcow2
+_qemu_img_wrapper amend -o size=32M -U TEST_DIR/t.qcow2
 qemu-img: unrecognized option '-U'
 Try 'qemu-img --help' for more information
 
-- 
2.26.2

[PULL 11/24] iotests: add script_initialize

2020-05-05 Thread Max Reitz

From: John Snow 

Like script_main, but doesn't require a single point of entry.
Replace all existing initialization sections with this drop-in replacement.

This brings debug support to all existing script-style iotests.
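
The hunk that adds script_initialize to iotests.py is cut off in this
archive, so the following is only a sketch of the idea: one call that
performs the format, platform and protocol checks the individual
verify_* calls used to do. Everything here is an assumption made for
illustration, including the function name, the explicit imgfmt and
imgproto arguments, and the exception used to signal an unsupported
environment.

```python
import sys
from typing import Optional, Sequence


class UnsupportedEnvironment(Exception):
    """Hypothetical: stands in for however the harness skips a test."""


def script_initialize_sketch(imgfmt: str, imgproto: str,
                             supported_fmts: Sequence[str] = (),
                             supported_platforms: Optional[Sequence[str]] = None,
                             supported_protocols: Sequence[str] = (),
                             platform: str = sys.platform) -> None:
    """Run every environment check from one call; empty means 'any'."""
    if supported_fmts and imgfmt not in supported_fmts:
        raise UnsupportedEnvironment(f'image format {imgfmt} not supported')
    if supported_platforms is not None and not any(
            platform.startswith(p) for p in supported_platforms):
        raise UnsupportedEnvironment(f'platform {platform} not supported')
    if supported_protocols and imgproto not in supported_protocols:
        raise UnsupportedEnvironment(f'protocol {imgproto} not supported')


# Same keyword arguments as the converted call in test 207.
script_initialize_sketch('raw', 'ssh',
                         supported_fmts=['raw'],
                         supported_protocols=['ssh'])
```

Collecting the checks behind one entry point is what gives a single
place to hook common setup, such as the debug support mentioned above.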

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-12-js...@redhat.com>
Reviewed-by: Kevin Wolf 
[mreitz: Give 274 the same treatment]
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/149|  3 +-
 tests/qemu-iotests/194|  4 +-
 tests/qemu-iotests/202|  4 +-
 tests/qemu-iotests/203|  4 +-
 tests/qemu-iotests/206|  2 +-
 tests/qemu-iotests/207|  6 ++-
 tests/qemu-iotests/208|  2 +-
 tests/qemu-iotests/209|  2 +-
 tests/qemu-iotests/210|  6 ++-
 tests/qemu-iotests/211|  6 ++-
 tests/qemu-iotests/212|  6 ++-
 tests/qemu-iotests/213|  6 ++-
 tests/qemu-iotests/216|  4 +-
 tests/qemu-iotests/218|  2 +-
 tests/qemu-iotests/219|  2 +-
 tests/qemu-iotests/222|  7 ++--
 tests/qemu-iotests/224|  4 +-
 tests/qemu-iotests/228|  6 ++-
 tests/qemu-iotests/234|  4 +-
 tests/qemu-iotests/235|  4 +-
 tests/qemu-iotests/236|  2 +-
 tests/qemu-iotests/237|  2 +-
 tests/qemu-iotests/238|  2 +
 tests/qemu-iotests/242|  2 +-
 tests/qemu-iotests/246|  2 +-
 tests/qemu-iotests/248|  2 +-
 tests/qemu-iotests/254|  2 +-
 tests/qemu-iotests/255|  2 +-
 tests/qemu-iotests/256|  2 +-
 tests/qemu-iotests/258|  7 ++--
 tests/qemu-iotests/260|  4 +-
 tests/qemu-iotests/262|  4 +-
 tests/qemu-iotests/264|  4 +-
 tests/qemu-iotests/274|  4 +-
 tests/qemu-iotests/277|  2 +
 tests/qemu-iotests/280|  8 ++--
 tests/qemu-iotests/283|  4 +-
 tests/qemu-iotests/iotests.py | 76 +++
 38 files changed, 132 insertions(+), 83 deletions(-)

diff --git a/tests/qemu-iotests/149 b/tests/qemu-iotests/149
index b4a21bf7b7..852768f80a 100755
--- a/tests/qemu-iotests/149
+++ b/tests/qemu-iotests/149
@@ -382,8 +382,7 @@ def test_once(config, qemu_img=False):
 
 
 # Obviously we only work with the luks image format
-iotests.verify_image_format(supported_fmts=['luks'])
-iotests.verify_platform()
+iotests.script_initialize(supported_fmts=['luks'])
 
 # We need sudo in order to run cryptsetup to create
 # dm-crypt devices. This is safe to use on any
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index 9dc1bd3510..8b1f720af4 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -21,8 +21,8 @@
 
 import iotests
 
-iotests.verify_image_format(supported_fmts=['qcow2', 'qed', 'raw'])
-iotests.verify_platform(['linux'])
+iotests.script_initialize(supported_fmts=['qcow2', 'qed', 'raw'],
+  supported_platforms=['linux'])
 
 with iotests.FilePath('source.img') as source_img_path, \
  iotests.FilePath('dest.img') as dest_img_path, \
diff --git a/tests/qemu-iotests/202 b/tests/qemu-iotests/202
index 920a8683ef..e3900a44d1 100755
--- a/tests/qemu-iotests/202
+++ b/tests/qemu-iotests/202
@@ -24,8 +24,8 @@
 
 import iotests
 
-iotests.verify_image_format(supported_fmts=['qcow2'])
-iotests.verify_platform(['linux'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_platforms=['linux'])
 
 with iotests.FilePath('disk0.img') as disk0_img_path, \
  iotests.FilePath('disk1.img') as disk1_img_path, \
diff --git a/tests/qemu-iotests/203 b/tests/qemu-iotests/203
index 49eff5d405..4b4bd3307d 100755
--- a/tests/qemu-iotests/203
+++ b/tests/qemu-iotests/203
@@ -24,8 +24,8 @@
 
 import iotests
 
-iotests.verify_image_format(supported_fmts=['qcow2'])
-iotests.verify_platform(['linux'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_platforms=['linux'])
 
 with iotests.FilePath('disk0.img') as disk0_img_path, \
  iotests.FilePath('disk1.img') as disk1_img_path, \
diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
index e2b50ae24d..f42432a838 100755
--- a/tests/qemu-iotests/206
+++ b/tests/qemu-iotests/206
@@ -23,7 +23,7 @@
 import iotests
 from iotests import imgfmt
 
-iotests.verify_image_format(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'])
 
 with iotests.FilePath('t.qcow2') as disk_path, \
  iotests.FilePath('t.qcow2.base') as backing_path, \
diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
index 3d9c1208ca..a6621410da 100755
--- a/tests/qemu-iotests/207
+++ b/tests/qemu-iotests/207
@@ -24,8 +24,10 @@ import iotests
 import subprocess
 import re
 
-iotests.verify_image_format(supported_fmts=['raw'])
-iotests.verify_protocol(supported=['ssh'])
+iotests.script_initialize(
+supported_fmts=['raw'],
+supported_protocols=['ssh'],
+)
 
 def filter_hash(qmsg):
 def _filter(key, value):
diff --git

[PULL 15/24] block: Add blk_new_with_bs() helper

2020-05-05 Thread Max Reitz

From: Eric Blake 

There are several callers that need to create a new block backend from
an existing BDS; make the task slightly easier with a common helper
routine.

Suggested-by: Max Reitz 
Signed-off-by: Eric Blake 
Message-Id: <20200424190903.522087-2-ebl...@redhat.com>
[mreitz: Set @ret only in error paths, see
 https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html]
Signed-off-by: Max Reitz 
Message-Id: <20200428192648.749066-2-ebl...@redhat.com>
Signed-off-by: Max Reitz 
---
 include/sysemu/block-backend.h |  2 ++
 block/block-backend.c  | 23 +++
 block/crypto.c |  9 -
 block/parallels.c  |  8 
 block/qcow.c   |  8 
 block/qcow2.c  | 18 --
 block/qed.c|  8 
 block/sheepdog.c   | 10 +-
 block/vdi.c|  8 
 block/vhdx.c   |  8 
 block/vmdk.c   |  9 -
 block/vpc.c|  8 
 blockdev.c |  8 +++-
 blockjob.c |  7 ++-
 14 files changed, 75 insertions(+), 59 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 34de7faa81..0917663d89 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -77,6 +77,8 @@ typedef struct BlockBackendPublic {
 } BlockBackendPublic;
 
 BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm);
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp);
 BlockBackend *blk_new_open(const char *filename, const char *reference,
QDict *options, int flags, Error **errp);
 int blk_get_refcnt(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 17ed6d8c5b..f4944861fa 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -355,6 +355,29 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
 return blk;
 }
 
+/*
+ * Create a new BlockBackend connected to an existing BlockDriverState.
+ *
+ * @perm is a bitmasks of BLK_PERM_* constants which describes the
+ * permissions to request for @bs that is attached to this
+ * BlockBackend.  @shared_perm is a bitmask which describes which
+ * permissions may be granted to other users of the attached node.
+ * Both sets of permissions can be changed later using blk_set_perm().
+ *
+ * Return the new BlockBackend on success, null on failure.
+ */
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp)
+{
+BlockBackend *blk = blk_new(bdrv_get_aio_context(bs), perm, shared_perm);
+
+if (blk_insert_bs(blk, bs, errp) < 0) {
+blk_unref(blk);
+return NULL;
+}
+return blk;
+}
+
 /*
  * Creates a new BlockBackend, opens a new BlockDriverState, and connects both.
  * The new BlockBackend is in the main AioContext.
diff --git a/block/crypto.c b/block/crypto.c
index e02f343590..ca44dae4f7 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -261,11 +261,10 @@ static int block_crypto_co_create_generic(BlockDriverState *bs,
 QCryptoBlock *crypto = NULL;
 struct BlockCryptoCreateData data;
 
-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
+ret = -EPERM;
 goto cleanup;
 }
 
diff --git a/block/parallels.c b/block/parallels.c
index 2be92cf417..8db64a55e3 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -559,10 +559,10 @@ static int coroutine_fn parallels_co_create(BlockdevCreateOptions* opts,
 return -EIO;
 }
 
-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
+ret = -EPERM;
 goto out;
 }
 blk_set_allow_write_beyond_eof(blk, true);
diff --git a/block/qcow.c b/block/qcow.c
index 6b5f2269f0..b0475b73a5 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -849,10 +849,10 @@ static int coroutine_fn qcow_co_create(BlockdevCreateOptions *opts,
 return -EIO;
 }
 
-qcow_blk = blk_new(bdrv_get_aio_context(bs),
-   BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(qcow_blk, bs, errp);
-if (ret < 0) {
+qcow_blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE,
+   BLK_PERM_ALL, errp);
+if

Re: [PULL v2 0/4] Block patches

2020-05-05 Thread Peter Maydell

On Mon, 4 May 2020 at 16:15, Stefan Hajnoczi  wrote:
>
> The following changes since commit 9af638cc1f665712522608c5d6b8c03d8fa67666:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20200504' into staging (2020-05-04 
> 13:37:17 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 08b689aa6b521964b8275dd7a2564aefa5d68129:
>
>   lockable: Replace locks with lock guard macros (2020-05-04 16:07:43 +0100)
>
> 
> Pull request
>
> v2:
>  * Fixed stray slirp submodule change [Peter]
>
> Fixes for the lock guard macros, code conversions to the lock guard macros, 
> and
> support for selecting fuzzer targets with argv[0].
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.1
for any user-visible changes.

-- PMM

[PULL 10/24] iotests: add hmp helper with logging

2020-05-05 Thread Max Reitz

From: John Snow 

Minor cleanup for HMP functions; helps with line length and consolidates
HMP helpers through one implementation function.

Although we are adding a universal toggle to turn QMP logging on or off,
many existing callers to hmp functions don't expect that output to be
logged, which causes quite a few changes in the test output.

For now, offer a use_log parameter.

Typing notes:

QMPResponse is just an alias for Dict[str, Any]. It carries no special
meaning and it is not a formal subtype of Dict[str, Any]. It is best
thought of as a lexical synonym.

We may well wish to add stricter subtypes in the future for certain
shapes of data that are not formalized as Python objects, at which point
we can simply retire the alias and allow mypy to more strictly check
usages of the name.
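
A small self-contained sketch of both points, the alias and the use_log
switch. FakeVM is a hypothetical stand-in for the real VM class: its qmp
and qmp_log methods are stubs that return a canned reply, so only the
dispatch logic of hmp mirrors the patch.

```python
from typing import Any, Dict, List, Tuple

# Type alias: interchangeable with Dict[str, Any] for type checkers.
QMPResponse = Dict[str, Any]


class FakeVM:
    """Hypothetical stand-in for iotests.VM that records logged calls."""

    def __init__(self) -> None:
        self.logged: List[Tuple[str, Dict[str, str]]] = []

    def qmp(self, cmd: str, **kwargs: str) -> QMPResponse:
        return {'return': ''}           # canned reply, no QEMU involved

    def qmp_log(self, cmd: str, **kwargs: str) -> QMPResponse:
        self.logged.append((cmd, kwargs))
        return self.qmp(cmd, **kwargs)

    def hmp(self, command_line: str, use_log: bool = False) -> QMPResponse:
        cmd = 'human-monitor-command'
        kwargs = {'command-line': command_line}
        if use_log:
            return self.qmp_log(cmd, **kwargs)
        return self.qmp(cmd, **kwargs)


vm = FakeVM()
vm.hmp('qemu-io drive0 "break read_aio bp_drive0"')        # not logged
vm.hmp('qemu-io drive0 "remove_break bp_drive0"', use_log=True)
```

Defaulting use_log to False keeps existing callers' reference output
unchanged, which is the compatibility concern raised above.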

Signed-off-by: John Snow 
Message-Id: <2020033114.11581-11-js...@redhat.com>
Reviewed-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 39 +--
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 9f5da32dae..cf10c428b5 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -37,6 +37,10 @@ from qemu import qtest
 
 assert sys.version_info >= (3, 6)
 
+# Type Aliases
+QMPResponse = Dict[str, Any]
+
+
 faulthandler.enable()
 
 # This will not work if arguments contain spaces but is necessary if we
@@ -541,25 +545,30 @@ class VM(qtest.QEMUQtestMachine):
 self._args.append(addr)
 return self
 
-def pause_drive(self, drive, event=None):
-'''Pause drive r/w operations'''
+def hmp(self, command_line: str, use_log: bool = False) -> QMPResponse:
+cmd = 'human-monitor-command'
+kwargs = {'command-line': command_line}
+if use_log:
+return self.qmp_log(cmd, **kwargs)
+else:
+return self.qmp(cmd, **kwargs)
+
+def pause_drive(self, drive: str, event: Optional[str] = None) -> None:
+"""Pause drive r/w operations"""
 if not event:
 self.pause_drive(drive, "read_aio")
 self.pause_drive(drive, "write_aio")
 return
-self.qmp('human-monitor-command',
- command_line='qemu-io %s "break %s bp_%s"'
- % (drive, event, drive))
-
-def resume_drive(self, drive):
-self.qmp('human-monitor-command',
- command_line='qemu-io %s "remove_break bp_%s"'
- % (drive, drive))
-
-def hmp_qemu_io(self, drive, cmd):
-'''Write to a given drive using an HMP command'''
-return self.qmp('human-monitor-command',
-command_line='qemu-io %s "%s"' % (drive, cmd))
+self.hmp(f'qemu-io {drive} "break {event} bp_{drive}"')
+
+def resume_drive(self, drive: str) -> None:
+"""Resume drive r/w operations"""
+self.hmp(f'qemu-io {drive} "remove_break bp_{drive}"')
+
+def hmp_qemu_io(self, drive: str, cmd: str,
+use_log: bool = False) -> QMPResponse:
+"""Write to a given drive using an HMP command"""
+return self.hmp(f'qemu-io {drive} "{cmd}"', use_log=use_log)
 
 def flatten_qmp_object(self, obj, output=None, basestr=''):
 if output is None:
-- 
2.26.2

[PULL 00/24] Block patches

2020-05-05 Thread Max Reitz

The following changes since commit 5375af3cd7b8adcc10c18d8083b7be63976c9645:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2020-05-04 15:51:09 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2020-05-05

for you to fetch changes up to 4ce5dd3e9b5ee0fac18625860eb3727399ee965e:

  block/block-copy: use aio-task-pool API (2020-05-05 14:03:28 +0200)


Block patches:
- Asynchronous copying for block-copy (i.e., the backup job)
- Allow resizing of qcow2 images when they have internal snapshots
- iotests: Logging improvements for Python tests
- iotest 153 fix, and block comment cleanups


Eric Blake (4):
  block: Add blk_new_with_bs() helper
  qcow2: Allow resize of images with internal snapshots
  qcow2: Tweak comment about bitmaps vs. resize
  block: Comment cleanups

John Snow (14):
  iotests: do a light delinting
  iotests: don't use 'format' for drive_add
  iotests: ignore import warnings from pylint
  iotests: replace mutable list default args
  iotests: add pylintrc file
  iotests: alphabetize standard imports
  iotests: drop pre-Python 3.4 compatibility code
  iotests: touch up log function signature
  iotests: limit line length to 79 chars
  iotests: add hmp helper with logging
  iotests: add script_initialize
  iotest 258: use script_main
  iotests: Mark verify functions as private
  iotests: use python logging for iotests.log()

Maxim Levitsky (1):
  Fix iotest 153

Vladimir Sementsov-Ogievskiy (5):
  block/block-copy: rename in-flight requests to tasks
  block/block-copy: alloc task on each iteration
  block/block-copy: add state pointer to BlockCopyTask
  block/block-copy: refactor task creation
  block/block-copy: use aio-task-pool API

 include/sysemu/block-backend.h |   2 +
 block/block-backend.c  |  23 +++
 block/block-copy.c | 279 +
 block/crypto.c |   9 +-
 block/io.c |   3 +-
 block/parallels.c  |   8 +-
 block/qcow.c   |   8 +-
 block/qcow2-refcount.c |   2 +-
 block/qcow2-snapshot.c |  20 +-
 block/qcow2.c  |  45 ++--
 block/qed.c|   8 +-
 block/sheepdog.c   |  10 +-
 block/vdi.c|   8 +-
 block/vhdx.c   |   8 +-
 block/vmdk.c   |   9 +-
 block/vpc.c|   8 +-
 block/vvfat.c  |  10 +-
 blockdev.c |   8 +-
 blockjob.c |   7 +-
 tests/qemu-iotests/001 |   2 +-
 tests/qemu-iotests/030 |   4 +-
 tests/qemu-iotests/052 |   2 +-
 tests/qemu-iotests/055 |   3 +-
 tests/qemu-iotests/061 |  35 
 tests/qemu-iotests/061.out |  28 +++
 tests/qemu-iotests/134 |   2 +-
 tests/qemu-iotests/149 |   3 +-
 tests/qemu-iotests/153 |   2 +-
 tests/qemu-iotests/153.out |  12 +-
 tests/qemu-iotests/155 |   2 +-
 tests/qemu-iotests/188 |   2 +-
 tests/qemu-iotests/194 |   4 +-
 tests/qemu-iotests/202 |   4 +-
 tests/qemu-iotests/203 |   4 +-
 tests/qemu-iotests/206 |   2 +-
 tests/qemu-iotests/207 |   6 +-
 tests/qemu-iotests/208 |   2 +-
 tests/qemu-iotests/209 |   2 +-
 tests/qemu-iotests/210 |   6 +-
 tests/qemu-iotests/211 |   6 +-
 tests/qemu-iotests/212 |   6 +-
 tests/qemu-iotests/213 |   6 +-
 tests/qemu-iotests/216 |   4 +-
 tests/qemu-iotests/218 |   2 +-
 tests/qemu-iotests/219 |   2 +-
 tests/qemu-iotests/222 |   7 +-
 tests/qemu-iotests/224 |   4 +-
 tests/qemu-iotests/228 |   6 +-
 tests/qemu-iotests/234 |   4 +-
 tests/qemu-iotests/235 |   4 +-
 tests/qemu-iotests/236 |   2 +-
 tests/qemu-iotests/237 |   2 +-
 tests/qemu-iotests/238 |   2 +
 tests/qemu-iotests/242 |   2 +-
 tests/qemu-iotests/245 |   1 +
 tests/qemu-iotests/245.out |  10 +-
 tests/qemu-iotests/246 |   2 +-
 tests/qemu-iotests/248 |   2 +-
 tests/qemu-iotests/254 |   2 +-
 tests/qemu-iotests/255 |   2 +-
 tests/qemu-iotests/256 |   2 +-
 tests/qemu-iotests/258 |  10 +-
 tests/qemu-iotests/260 |   4 +-
 tests/qemu-iotests/262 |   4 +-
 tests/qemu-iotests/264 |   4 +-
 tests/qemu-iotests/274 |   4 +-
 tests/qemu-iotests/277 |   2 +
 tests/qemu-iotests/280 |   8 +-
 tests/qemu-iotests/283 |   4 +-
 tests/qemu-iotests/iotests.py  | 366 -
 tests/qemu-iotests/pylintrc|  26 +++
 71 files changed, 728 insertions(+), 386 deletions(-)
 create mode 100644 tests/qemu-iotests/pylintrc

-- 
2.26.2

[PULL 05/24] iotests: add pylintrc file

2020-05-05 Thread Max Reitz

From: John Snow 

This allows others to get repeatable results with pylint. If you run
`pylint iotests.py`, you should see a 100% pass.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-6-js...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/pylintrc | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100644 tests/qemu-iotests/pylintrc

diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc
new file mode 100644
index 00..daec2c4942
--- /dev/null
+++ b/tests/qemu-iotests/pylintrc
@@ -0,0 +1,22 @@
+[MESSAGES CONTROL]
+
+# Disable the message, report, category or checker with the given id(s). You
+# can either give multiple identifiers separated by comma (,) or put this
+# option multiple times (only on the command line, not in the configuration
+# file where it should appear only once). You can also use "--disable=all" to
+# disable everything first and then reenable specific checks. For example, if
+# you want to run only the similarities checker, you can use "--disable=all
+# --enable=similarities". If you want to run only the classes checker, but have
+# no Warning level messages displayed, use "--disable=all --enable=classes
+# --disable=W".
+disable=invalid-name,
+no-else-return,
+too-few-public-methods,
+too-many-arguments,
+too-many-branches,
+too-many-lines,
+too-many-locals,
+too-many-public-methods,
+# These are temporary, and should be removed:
+line-too-long,
+missing-docstring,
-- 
2.26.2

[PULL 01/24] iotests: do a light delinting

2020-05-05 Thread Max Reitz

From: John Snow 

This doesn't fix everything in here, but it does help clean up the
pylint report considerably.

This should be 100% style changes only; the intent is to make pylint
more useful by working on establishing a baseline for iotests that we
can gate against in the future.
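
Three of the rewrites in this patch are easy to check in isolation. The
sketch below restates them as small functions (the function names are
illustrative, not from iotests.py) to show that behaviour is unchanged,
including the operator precedence in the condensed return statement.

```python
def early_pipe_result(exitcode: int, output: str):
    # Parsed as (exitcode, (output if exitcode else '')): the
    # conditional expression binds tighter than the tuple comma.
    return exitcode, output if exitcode else ''


def is_container(value) -> bool:
    # Replaces: isinstance(v, list) or isinstance(v, dict)
    return isinstance(value, (dict, list))


def is_quit_command(cmd: str) -> bool:
    # Replaces: cmd == 'q' or cmd == 'quit' (the patch asserts the
    # negation, cmd not in ('q', 'quit')).
    return cmd in ('q', 'quit')
```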

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-2-js...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 83 ++-
 1 file changed, 43 insertions(+), 40 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 5f8c263d59..6f6363f3ec 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -16,11 +16,9 @@
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 #
 
-import errno
 import os
 import re
 import subprocess
-import string
 import unittest
 import sys
 import struct
@@ -35,7 +33,7 @@ import faulthandler
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu import qtest
 
-assert sys.version_info >= (3,6)
+assert sys.version_info >= (3, 6)
 
 faulthandler.enable()
 
@@ -141,11 +139,11 @@ def qemu_img_log(*args):
     return result
 
 def img_info_log(filename, filter_path=None, imgopts=False, extra_args=[]):
-    args = [ 'info' ]
+    args = ['info']
     if imgopts:
         args.append('--image-opts')
     else:
-        args += [ '-f', imgfmt ]
+        args += ['-f', imgfmt]
     args += extra_args
     args.append(filename)
 
@@ -224,7 +222,7 @@ class QemuIoInteractive:
         # quit command is in close(), '\n' is added automatically
         assert '\n' not in cmd
         cmd = cmd.strip()
-        assert cmd != 'q' and cmd != 'quit'
+        assert cmd not in ('q', 'quit')
         self._p.stdin.write(cmd + '\n')
         self._p.stdin.flush()
         return self._read_output()
@@ -246,10 +244,8 @@ def qemu_nbd_early_pipe(*args):
         sys.stderr.write('qemu-nbd received signal %i: %s\n' %
                          (-exitcode,
                           ' '.join(qemu_nbd_args + ['--fork'] + list(args))))
-    if exitcode == 0:
-        return exitcode, ''
-    else:
-        return exitcode, subp.communicate()[0]
+
+    return exitcode, subp.communicate()[0] if exitcode else ''
 
 def qemu_nbd_popen(*args):
     '''Run qemu-nbd in daemon mode and return the parent's exit code'''
@@ -313,7 +309,7 @@ def filter_qmp(qmsg, filter_fn):
         items = qmsg.items()
 
     for k, v in items:
-        if isinstance(v, list) or isinstance(v, dict):
+        if isinstance(v, (dict, list)):
             qmsg[k] = filter_qmp(v, filter_fn)
         else:
             qmsg[k] = filter_fn(k, v)
@@ -324,7 +320,7 @@ def filter_testfiles(msg):
     return msg.replace(prefix, 'TEST_DIR/PID-')
 
 def filter_qmp_testfiles(qmsg):
-    def _filter(key, value):
+    def _filter(_key, value):
         if is_str(value):
             return filter_testfiles(value)
         return value
@@ -351,7 +347,7 @@ def filter_imgfmt(msg):
     return msg.replace(imgfmt, 'IMGFMT')
 
 def filter_qmp_imgfmt(qmsg):
-    def _filter(key, value):
+    def _filter(_key, value):
         if is_str(value):
             return filter_imgfmt(value)
         return value
@@ -362,7 +358,7 @@ def log(msg, filters=[], indent=None):
     If indent is provided, JSON serializable messages are pretty-printed.'''
     for flt in filters:
         msg = flt(msg)
-    if isinstance(msg, dict) or isinstance(msg, list):
+    if isinstance(msg, (dict, list)):
         # Python < 3.4 needs to know not to add whitespace when pretty-printing:
         separators = (', ', ': ') if indent is None else (',', ': ')
         # Don't sort if it's already sorted
@@ -373,14 +369,14 @@ def log(msg, filters=[], indent=None):
         print(msg)
 
 class Timeout:
-    def __init__(self, seconds, errmsg = "Timeout"):
+    def __init__(self, seconds, errmsg="Timeout"):
         self.seconds = seconds
         self.errmsg = errmsg
     def __enter__(self):
         signal.signal(signal.SIGALRM, self.timeout)
         signal.setitimer(signal.ITIMER_REAL, self.seconds)
         return self
-    def __exit__(self, type, value, traceback):
+    def __exit__(self, exc_type, value, traceback):
         signal.setitimer(signal.ITIMER_REAL, 0)
         return False
     def timeout(self, signum, frame):
@@ -389,7 +385,7 @@ class Timeout:
 def file_pattern(name):
     return "{0}-{1}".format(os.getpid(), name)
 
-class FilePaths(object):
+class FilePaths:
     """
     FilePaths is an auto-generated filename that cleans itself up.
 
@@ -536,11 +532,11 @@ class VM(qtest.QEMUQtestMachine):
             self.pause_drive(drive, "write_aio")
             return
         self.qmp('human-monitor-command',
-                    command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
+
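
The idioms this delinting patch switches to can be tried on their own. The following is a standalone sketch, not code from iotests.py; the helper names (`is_container`, `is_quit_command`, `early_pipe_result`) are made up for illustration.

```python
def is_container(value):
    # One isinstance() call with a tuple of types replaces
    # "isinstance(v, list) or isinstance(v, dict)".
    return isinstance(value, (dict, list))


def is_quit_command(cmd):
    # A membership test replaces "cmd == 'q' or cmd == 'quit'".
    return cmd.strip() in ('q', 'quit')


def early_pipe_result(exitcode, output):
    # A conditional expression replaces an if/else pair of returns
    # (pylint's no-else-return). Precedence matters here: this builds
    # the tuple (exitcode, <conditional>), not a choice of two tuples.
    return exitcode, output if exitcode else ''
```

The last helper mirrors the shape of the rewritten return in qemu_nbd_early_pipe(): the exit code is always returned, and the output is only kept when the exit code is non-zero.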

[PULL 07/24] iotests: drop pre-Python 3.4 compatibility code

2020-05-05 Thread Max Reitz

From: John Snow 

We no longer need to accommodate <3.4, drop this code.
(The lines were > 79 chars and it stood out.)

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-8-js...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index dae124872e..374a8f6077 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -360,12 +360,9 @@ def log(msg, filters=(), indent=None):
     for flt in filters:
         msg = flt(msg)
     if isinstance(msg, (dict, list)):
-        # Python < 3.4 needs to know not to add whitespace when pretty-printing:
-        separators = (', ', ': ') if indent is None else (',', ': ')
         # Don't sort if it's already sorted
         do_sort = not isinstance(msg, OrderedDict)
-        print(json.dumps(msg, sort_keys=do_sort,
-                         indent=indent, separators=separators))
+        print(json.dumps(msg, sort_keys=do_sort, indent=indent))
     else:
         print(msg)
 
-- 
2.26.2
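
The removed compatibility code picked `separators` by hand. Since Python 3.4, `json.dumps()` chooses the same values by default, which is why the argument can go. A small check of that claim, using a made-up QMP-like message:

```python
import json

msg = {'return': {'running': True, 'ids': [1, 2]}}

# With indent set, the default item separator is ',' with no trailing
# space, which is what the removed code selected explicitly.
implicit = json.dumps(msg, sort_keys=True, indent=2)
explicit = json.dumps(msg, sort_keys=True, indent=2, separators=(',', ': '))

# Without indent, the default separators are (', ', ': ').
compact_implicit = json.dumps(msg, sort_keys=True)
compact_explicit = json.dumps(msg, sort_keys=True, separators=(', ', ': '))

same_pretty = implicit == explicit
same_compact = compact_implicit == compact_explicit
no_trailing_space = all(line == line.rstrip()
                        for line in implicit.splitlines())
```

All three flags are expected to be true on any interpreter that meets the `(3, 6)` minimum asserted in iotests.py.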

[PULL 08/24] iotests: touch up log function signature

2020-05-05 Thread Max Reitz

From: John Snow 

Representing nested, recursive data structures in mypy is notoriously
difficult; the best we can reliably do right now is denote the leaf
types as "Any" while describing the general shape of the data.

Regardless, this fully annotates the log() function.

Typing notes:

TypeVar is a Type variable that can optionally be constrained by a
sequence of possible types. This variable is bound to a specific type
per-invocation, like a Generic.

log() behaves as log<Msg>() now, where the incoming type informs the
signature it expects for any filter arguments passed in. If Msg is a
str, then filter should take and return a str.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-9-js...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 374a8f6077..69f24223d2 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -28,6 +28,7 @@ import signal
 import struct
 import subprocess
 import sys
+from typing import (Any, Callable, Dict, Iterable, List, Optional, TypeVar)
 import unittest
 
 # pylint: disable=import-error, wrong-import-position
@@ -354,9 +355,16 @@ def filter_qmp_imgfmt(qmsg):
         return value
     return filter_qmp(qmsg, _filter)
 
-def log(msg, filters=(), indent=None):
-    '''Logs either a string message or a JSON serializable message (like QMP).
-    If indent is provided, JSON serializable messages are pretty-printed.'''
+
+Msg = TypeVar('Msg', Dict[str, Any], List[Any], str)
+
+def log(msg: Msg,
+        filters: Iterable[Callable[[Msg], Msg]] = (),
+        indent: Optional[int] = None) -> None:
+    """
+    Logs either a string message or a JSON serializable message (like QMP).
+    If indent is provided, JSON serializable messages are pretty-printed.
+    """
     for flt in filters:
         msg = flt(msg)
     if isinstance(msg, (dict, list)):
-- 
2.26.2
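
To make the typing notes in this patch concrete, here is a minimal sketch of a constrained TypeVar tying the message type to the filter signature. `apply_filters`, `hide_pid` and `drop_timestamp` are hypothetical names for illustration; only the `Msg` definition is taken from the patch.

```python
from typing import Any, Callable, Dict, Iterable, List, TypeVar

Msg = TypeVar('Msg', Dict[str, Any], List[Any], str)


def apply_filters(msg: Msg,
                  filters: Iterable[Callable[[Msg], Msg]] = ()) -> Msg:
    """Run each filter over msg in order and return the result."""
    for flt in filters:
        msg = flt(msg)
    return msg


def hide_pid(text: str) -> str:
    # str in, str out: fits apply_filters() when Msg is bound to str.
    return text.replace('1234-', 'PID-')


def drop_timestamp(event: Dict[str, Any]) -> Dict[str, Any]:
    # dict in, dict out: fits apply_filters() when Msg is bound to a dict.
    return {k: v for k, v in event.items() if k != 'timestamp'}


filtered_text = apply_filters('TEST_DIR/1234-disk.img', [hide_pid])
filtered_event = apply_filters({'event': 'STOP', 'timestamp': 1},
                               [drop_timestamp])
```

A type checker such as mypy is expected to reject a mixed call like `apply_filters('some text', [drop_timestamp])`, because `Msg` is bound to `str` for that invocation. Nothing enforces this at runtime.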

[PULL 17/24] qcow2: Tweak comment about bitmaps vs. resize

2020-05-05 Thread Max Reitz

From: Eric Blake 

Our comment did not actually match the code.  Rewrite the comment to
be less sensitive to any future changes to qcow2-bitmap.c that might
implement scenarios that we currently reject.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
Message-Id: <20200428192648.749066-4-ebl...@redhat.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3e8b3d022b..ad934109a8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3999,7 +3999,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         goto fail;
     }
 
-    /* cannot proceed if image has bitmaps */
+    /* See qcow2-bitmap.c for which bitmap scenarios prevent a resize. */
     if (qcow2_truncate_bitmaps_check(bs, errp)) {
         ret = -ENOTSUP;
         goto fail;
-- 
2.26.2

[PULL 18/24] block: Comment cleanups

2020-05-05 Thread Max Reitz

From: Eric Blake 

It's been a while since we got rid of the sector-based bdrv_read and
bdrv_write (commit 2e11d756); let's finish the job on a few remaining
comments.

Signed-off-by: Eric Blake 
Message-Id: <20200428213807.776655-1-ebl...@redhat.com>
Reviewed-by: Alberto Garcia 
Signed-off-by: Max Reitz 
---
 block/io.c             |  3 ++-
 block/qcow2-refcount.c |  2 +-
 block/vvfat.c          | 10 +++++-----
 tests/qemu-iotests/001 |  2 +-
 tests/qemu-iotests/052 |  2 +-
 tests/qemu-iotests/134 |  2 +-
 tests/qemu-iotests/188 |  2 +-
 7 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index a4f9714230..7d30e61edc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -960,7 +960,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
  * flags are passed through to bdrv_pwrite_zeroes (e.g. BDRV_REQ_MAY_UNMAP,
  * BDRV_REQ_FUA).
  *
- * Returns < 0 on error, 0 on success. For error codes see bdrv_write().
+ * Returns < 0 on error, 0 on success. For error codes see bdrv_pwrite().
  */
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
@@ -994,6 +994,7 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
     }
 }
 
+/* return < 0 if error. See bdrv_pwrite() for the return codes */
 int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
     int ret;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d9650b9b6c..0457a6060d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2660,7 +2660,7 @@ fail:
  * - 0 if writing to this offset will not affect the mentioned metadata
  * - a positive QCow2MetadataOverlap value indicating one overlapping section
  * - a negative value (-errno) indicating an error while performing a check,
- *   e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
+ *   e.g. when bdrv_pread failed on QCOW2_OL_INACTIVE_L2
  */
 int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
                                  int64_t size)
diff --git a/block/vvfat.c b/block/vvfat.c
index ab800c4887..6d5c090dec 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2148,7 +2148,7 @@ DLOG(checkpoint());
  * - get modified FAT
  * - compare the two FATs (TODO)
  * - get buffer for marking used clusters
- * - recurse direntries from root (using bs->bdrv_read to make
+ * - recurse direntries from root (using bs->bdrv_pread to make
  *    sure to get the new data)
  *   - check that the FAT agrees with the size
  *   - count the number of clusters occupied by this directory and
@@ -2913,9 +2913,9 @@ static int handle_deletes(BDRVVVFATState* s)
 /*
  * synchronize mapping with new state:
  *
- * - copy FAT (with bdrv_read)
+ * - copy FAT (with bdrv_pread)
  * - mark all filenames corresponding to mappings as deleted
- * - recurse direntries from root (using bs->bdrv_read)
+ * - recurse direntries from root (using bs->bdrv_pread)
  * - delete files corresponding to mappings marked as deleted
  */
 static int do_commit(BDRVVVFATState* s)
@@ -2935,10 +2935,10 @@ static int do_commit(BDRVVVFATState* s)
         return ret;
     }
 
-    /* copy FAT (with bdrv_read) */
+    /* copy FAT (with bdrv_pread) */
     memcpy(s->fat.pointer, s->fat2, 0x200 * s->sectors_per_fat);
 
-    /* recurse direntries from root (using bs->bdrv_read) */
+    /* recurse direntries from root (using bs->bdrv_pread) */
     ret = commit_direntries(s, 0, -1);
     if (ret) {
         fprintf(stderr, "Fatal: error while committing (%d)\n", ret);
diff --git a/tests/qemu-iotests/001 b/tests/qemu-iotests/001
index d87a535c33..696726e45f 100755
--- a/tests/qemu-iotests/001
+++ b/tests/qemu-iotests/001
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test simple read/write using plain bdrv_read/bdrv_write
+# Test simple read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2009 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/052 b/tests/qemu-iotests/052
index 45a140910d..8d5c10601f 100755
--- a/tests/qemu-iotests/052
+++ b/tests/qemu-iotests/052
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test bdrv_read/bdrv_write using BDRV_O_SNAPSHOT
+# Test bdrv_pread/bdrv_pwrite using BDRV_O_SNAPSHOT
 #
 # Copyright (C) 2013 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 5f0fb86211..5162d21662 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test encrypted read/write using plain bdrv_read/bdrv_write
+# Test encrypted read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2015 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/188 b/tests/qemu-iotests/188
index afca44df54..09b9b6083a 100755
--- a/tests/qemu-iotests/188
+++ b/tests/qemu-iotests/188
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test encrypted read/write using plain bdrv_read/bdrv_write
+# Test encrypted read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2017 Red Hat, Inc.
 #
-- 
2.26.2

[PULL 13/24] iotests: Mark verify functions as private

2020-05-05 Thread Max Reitz

From: John Snow 

Mark the verify functions as "private" with a leading underscore, to
discourage their use. Update type signatures while we're here.

(Also, make pending patches not yet using the new entry points fail in a
very obvious way.)

Signed-off-by: John Snow 
Message-Id: <2020033114.11581-14-js...@redhat.com>
Reviewed-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 9f85e1fba3..35d8cae997 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -1006,7 +1006,8 @@ def case_notrun(reason):
     open('%s/%s.casenotrun' % (output_dir, seq), 'a').write(
         '[case not run] ' + reason + '\n')
 
-def verify_image_format(supported_fmts=(), unsupported_fmts=()):
+def _verify_image_format(supported_fmts: Sequence[str] = (),
+                         unsupported_fmts: Sequence[str] = ()) -> None:
     assert not (supported_fmts and unsupported_fmts)
 
     if 'generic' in supported_fmts and \
@@ -1020,7 +1021,8 @@ def verify_image_format(supported_fmts=(), unsupported_fmts=()):
     if not_sup or (imgfmt in unsupported_fmts):
         notrun('not suitable for this image format: %s' % imgfmt)
 
-def verify_protocol(supported=(), unsupported=()):
+def _verify_protocol(supported: Sequence[str] = (),
+                     unsupported: Sequence[str] = ()) -> None:
     assert not (supported and unsupported)
 
     if 'generic' in supported:
@@ -1030,7 +1032,8 @@ def verify_protocol(supported=(), unsupported=()):
     if not_sup or (imgproto in unsupported):
         notrun('not suitable for this protocol: %s' % imgproto)
 
-def verify_platform(supported=(), unsupported=()):
+def _verify_platform(supported: Sequence[str] = (),
+                     unsupported: Sequence[str] = ()) -> None:
     if any((sys.platform.startswith(x) for x in unsupported)):
         notrun('not suitable for this OS: %s' % sys.platform)
 
@@ -1038,11 +1041,11 @@ def verify_platform(supported=(), unsupported=()):
         if not any((sys.platform.startswith(x) for x in supported)):
             notrun('not suitable for this OS: %s' % sys.platform)
 
-def verify_cache_mode(supported_cache_modes=()):
+def _verify_cache_mode(supported_cache_modes: Sequence[str] = ()) -> None:
     if supported_cache_modes and (cachemode not in supported_cache_modes):
         notrun('not suitable for this cache mode: %s' % cachemode)
 
-def verify_aio_mode(supported_aio_modes=()):
+def _verify_aio_mode(supported_aio_modes: Sequence[str] = ()):
     if supported_aio_modes and (aiomode not in supported_aio_modes):
         notrun('not suitable for this aio mode: %s' % aiomode)
 
@@ -1170,11 +1173,11 @@ def execute_setup_common(supported_fmts: Sequence[str] = (),
         sys.stderr.write('Please run this test via the "check" script\n')
         sys.exit(os.EX_USAGE)
 
-    verify_image_format(supported_fmts, unsupported_fmts)
-    verify_protocol(supported_protocols, unsupported_protocols)
-    verify_platform(supported=supported_platforms)
-    verify_cache_mode(supported_cache_modes)
-    verify_aio_mode(supported_aio_modes)
+    _verify_image_format(supported_fmts, unsupported_fmts)
+    _verify_protocol(supported_protocols, unsupported_protocols)
+    _verify_platform(supported=supported_platforms)
+    _verify_cache_mode(supported_cache_modes)
+    _verify_aio_mode(supported_aio_modes)
 
     debug = '-d' in sys.argv
     if debug:
-- 
2.26.2

[PULL 09/24] iotests: limit line length to 79 chars

2020-05-05 Thread Max Reitz

From: John Snow 

79 is the PEP8 recommendation. This recommendation works well for
reading patch diffs in TUI email clients.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-10-js...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 64 +++
 tests/qemu-iotests/pylintrc   |  6 +++-
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 69f24223d2..9f5da32dae 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -80,9 +80,11 @@ luks_default_key_secret_opt = 'key-secret=keysec0'
 def qemu_img(*args):
     '''Run qemu-img and return the exit code'''
     devnull = open('/dev/null', 'r+')
-    exitcode = subprocess.call(qemu_img_args + list(args), stdin=devnull, stdout=devnull)
+    exitcode = subprocess.call(qemu_img_args + list(args),
+                               stdin=devnull, stdout=devnull)
     if exitcode < 0:
-        sys.stderr.write('qemu-img received signal %i: %s\n' % (-exitcode, ' '.join(qemu_img_args + list(args))))
+        sys.stderr.write('qemu-img received signal %i: %s\n'
+                         % (-exitcode, ' '.join(qemu_img_args + list(args))))
     return exitcode
 
 def ordered_qmp(qmsg, conv_keys=True):
@@ -121,7 +123,8 @@ def qemu_img_verbose(*args):
     '''Run qemu-img without suppressing its output and return the exit code'''
     exitcode = subprocess.call(qemu_img_args + list(args))
     if exitcode < 0:
-        sys.stderr.write('qemu-img received signal %i: %s\n' % (-exitcode, ' '.join(qemu_img_args + list(args))))
+        sys.stderr.write('qemu-img received signal %i: %s\n'
+                         % (-exitcode, ' '.join(qemu_img_args + list(args))))
     return exitcode
 
 def qemu_img_pipe(*args):
@@ -132,7 +135,8 @@ def qemu_img_pipe(*args):
                             universal_newlines=True)
     exitcode = subp.wait()
     if exitcode < 0:
-        sys.stderr.write('qemu-img received signal %i: %s\n' % (-exitcode, ' '.join(qemu_img_args + list(args))))
+        sys.stderr.write('qemu-img received signal %i: %s\n'
+                         % (-exitcode, ' '.join(qemu_img_args + list(args))))
     return subp.communicate()[0]
 
 def qemu_img_log(*args):
@@ -162,7 +166,8 @@ def qemu_io(*args):
                             universal_newlines=True)
     exitcode = subp.wait()
     if exitcode < 0:
-        sys.stderr.write('qemu-io received signal %i: %s\n' % (-exitcode, ' '.join(args)))
+        sys.stderr.write('qemu-io received signal %i: %s\n'
+                         % (-exitcode, ' '.join(args)))
     return subp.communicate()[0]
 
 def qemu_io_log(*args):
@@ -284,10 +289,13 @@ win32_re = re.compile(r"\r")
 def filter_win32(msg):
     return win32_re.sub("", msg)
 
-qemu_io_re = re.compile(r"[0-9]* ops; [0-9\/:. sec]* \([0-9\/.inf]* [EPTGMKiBbytes]*\/sec and [0-9\/.inf]* ops\/sec\)")
+qemu_io_re = re.compile(r"[0-9]* ops; [0-9\/:. sec]* "
+                        r"\([0-9\/.inf]* [EPTGMKiBbytes]*\/sec "
+                        r"and [0-9\/.inf]* ops\/sec\)")
 def filter_qemu_io(msg):
     msg = filter_win32(msg)
-    return qemu_io_re.sub("X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)", msg)
+    return qemu_io_re.sub("X ops; XX:XX:XX.X "
+                          "(XXX YYY/sec and XXX ops/sec)", msg)
 
 chown_re = re.compile(r"chown [0-9]+:[0-9]+")
 def filter_chown(msg):
@@ -340,7 +348,9 @@ def filter_img_info(output, filename):
         line = filter_testfiles(line)
         line = line.replace(imgfmt, 'IMGFMT')
         line = re.sub('iters: [0-9]+', 'iters: XXX', line)
-        line = re.sub('uuid: [-a-f0-9]+', 'uuid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX', line)
+        line = re.sub('uuid: [-a-f0-9]+',
+                      'uuid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX',
+                      line)
         line = re.sub('cid: [0-9]+', 'cid: XXXXXXXXXX', line)
         lines.append(line)
     return '\n'.join(lines)
@@ -538,11 +548,13 @@ class VM(qtest.QEMUQtestMachine):
             self.pause_drive(drive, "write_aio")
             return
         self.qmp('human-monitor-command',
-                 command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
+                 command_line='qemu-io %s "break %s bp_%s"'
+                 % (drive, event, drive))
 
     def resume_drive(self, drive):
         self.qmp('human-monitor-command',
-                 command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
+                 command_line='qemu-io %s "remove_break bp_%s"'
+                 % (drive, drive))
 
     def hmp_qemu_io(self, drive, cmd):
         '''Write to a given drive using an HMP command'''
@@ -802,16 +814,18 @@ class QMPTestCase(unittest.TestCase):
                 idx = int(idx)
 
             if not isinstance(d, dict) or component not in d:
-                self.fail('failed path
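
The line-length patch above relies on Python joining adjacent string literals, so a long regular expression can be split without changing it. A small self-contained check, using the qemu_io_re pattern from the diff and a made-up sample line (an assumption about the general shape of qemu-io statistics output, not captured output):

```python
import re

one_line = r"[0-9]* ops; [0-9\/:. sec]* \([0-9\/.inf]* [EPTGMKiBbytes]*\/sec and [0-9\/.inf]* ops\/sec\)"

# Adjacent literals are concatenated when the source is compiled.
# Each piece needs its own r prefix to stay a raw string.
split = (r"[0-9]* ops; [0-9\/:. sec]* "
         r"\([0-9\/.inf]* [EPTGMKiBbytes]*\/sec "
         r"and [0-9\/.inf]* ops\/sec\)")

qemu_io_re = re.compile(split)


def filter_qemu_io(msg):
    # Replace the variable statistics with fixed placeholders.
    return qemu_io_re.sub("X ops; XX:XX:XX.X "
                          "(XXX YYY/sec and XXX ops/sec)", msg)


# Made-up sample line for illustration only.
sample = "1 ops; 00:00:00.01 (1.234 MiB/sec and 123.4567 ops/sec)"
```

Here `one_line == split` holds, and `filter_qemu_io(sample)` yields the placeholder string, so the reflowed pattern behaves like the original one.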

[PULL 12/24] iotest 258: use script_main

2020-05-05 Thread Max Reitz

From: John Snow 

Since this one is nicely factored to use a single entry point,
use script_main to run the tests.

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Max Reitz 
Message-Id: <2020033114.11581-13-js...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/258 | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/258 b/tests/qemu-iotests/258
index a65151dda6..e305a1502f 100755
--- a/tests/qemu-iotests/258
+++ b/tests/qemu-iotests/258
@@ -23,12 +23,6 @@ import iotests
 from iotests import log, qemu_img, qemu_io_silent, \
         filter_qmp_testfiles, filter_qmp_imgfmt
 
-# Need backing file and change-backing-file support
-iotests.script_initialize(
-    supported_fmts=['qcow2', 'qed'],
-    supported_platforms=['linux'],
-)
-
 # Returns a node for blockdev-add
 def node(node_name, path, backing=None, fmt=None, throttle=None):
     if fmt is None:
@@ -161,4 +155,7 @@ def main():
     test_concurrent_finish(False)
 
 if __name__ == '__main__':
-    main()
+    # Need backing file and change-backing-file support
+    iotests.script_main(main,
+                        supported_fmts=['qcow2', 'qed'],
+                        supported_platforms=['linux'])
-- 
2.26.2
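
As a rough illustration of the pattern this patch moves to, the sketch below passes the test body to a wrapper that checks requirements first. It is not the iotests implementation. `script_main_sketch` and its return values are hypothetical, and the real script_main() takes more parameters than the platform check shown here.

```python
import sys
from typing import Callable, Sequence


def script_main_sketch(test_function: Callable[[], None],
                       supported_platforms: Sequence[str] = (),
                       platform: str = sys.platform) -> str:
    """
    Hypothetical stand-in for an entry-point wrapper: check the
    requirements first, then run the test body only if they are met.
    """
    if supported_platforms and not any(
            platform.startswith(x) for x in supported_platforms):
        return 'not run'
    test_function()
    return 'ran'


calls = []


def main() -> None:
    # Stand-in for the real test body.
    calls.append('main')


if __name__ == '__main__':
    print(script_main_sketch(main, supported_platforms=['linux']))
```

Keeping the requirement check and the call to main() in one place means nothing runs at import time, which is the property the `if __name__ == '__main__':` block in test 258 now relies on.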
