Re: [PATCH 1/2] iotests: Drop readarray from _do_filter_img_create

2020-07-10 Thread Alex Bennée


Max Reitz  writes:

> Some systems where we run tests on do not have a 4.x bash, so they do
> not have readarray.  While it looked a bit nicer than messing with
> `head` and `tail`, we do not really need it, so we might as well not use
> it.

I've fixed the cirrus build failure by brew installing a more recent
bash. However if we prefer we could do this.

>
> Reported-by: Claudio Fontana 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/common.filter | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qemu-iotests/common.filter 
> b/tests/qemu-iotests/common.filter
> index 3833206327..345c3ca03e 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -138,13 +138,13 @@ _do_filter_img_create()
>  # Split the line into the pre-options part ($filename_part, which
>  # precedes ", fmt=") and the options part ($options, which starts
>  # with "fmt=")
> -# (And just echo everything before the first "^Formatting")
> -readarray formatting_line < <($SED -e 's/, fmt=/\n/')
> +read formatting_line
>  
> -filename_part=${formatting_line[0]}
> -unset formatting_line[0]
> +# Split line at the first ", fmt="
> +formatting_line=$(echo "$formatting_line" | $SED -e 's/, fmt=/\nfmt=/')
>  
> -options="fmt=${formatting_line[@]}"
> +filename_part=$(echo "$formatting_line" | head -n 1)
> +options=$(echo "$formatting_line" | tail -n +2)
>  
>  # Set grep_data_file to '\|data_file' to keep it; make it empty
>  # to drop it.


-- 
Alex Bennée



Re: [PATCH v11 00/34] Add subcluster allocation to qcow2

2020-07-10 Thread Alberto Garcia
On Fri 10 Jul 2020 06:43:59 PM CEST, no-re...@patchew.org wrote:
> /tmp/qemu-test/src/block/qcow2-cluster.c:473:19: error: 'expected_type' may 
> be used uninitialized in this function [-Werror=maybe-uninitialized]
>  } else if (type != expected_type) {
>^
> /tmp/qemu-test/src/block/qcow2-cluster.c:449:25: note: 'expected_type' was 
> declared here
>  QCow2SubclusterType expected_type, type;
>  ^

Meh, this is a false positive but I forgot to fix it. I'll do it if
there is a new version, otherwise please someone just initialize
expected_type to 0 when committing.

Berto



Re: [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen

2020-07-10 Thread Andrey Shinkevich

On 25.06.2020 18:21, Max Reitz wrote:

Reopening a node's backing child needs a bit of special handling because
the "backing" child has different defaults than all other children
(among other things).  Adding filter support here is a bit more
difficult than just using the child access functions.  In fact, we often
have to directly use bs->backing because these functions are about the
"backing" child (which may or may not be the COW backing file).

Signed-off-by: Max Reitz 
---
  block.c | 46 ++
  1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 712230ef5c..8131d0b5eb 100644
--- a/block.c
+++ b/block.c
@@ -4026,26 +4026,56 @@ static int bdrv_reopen_parse_backing(BDRVReopenState 
*reopen_state,
  }
  }
  
+/*

+ * Ensure that @bs can really handle backing files, because we are
+ * about to give it one (or swap the existing one)
+ */
+if (bs->drv->is_filter) {
+/* Filters always have a file or a backing child */
+if (!bs->backing) {
+error_setg(errp, "'%s' is a %s filter node that does not support a 
"
+   "backing child", bs->node_name, bs->drv->format_name);
+return -EINVAL;
+}
+} else if (!bs->drv->supports_backing) {
+error_setg(errp, "Driver '%s' of node '%s' does not support backing "
+   "files", bs->drv->format_name, bs->node_name);
+return -EINVAL;
+}
+
  /*
   * Find the "actual" backing file by skipping all links that point
   * to an implicit node, if any (e.g. a commit filter node).
+ * We cannot use any of the bdrv_skip_*() functions here because
+ * those return the first explicit node, while we are looking for
+ * its overlay here.
   */
  overlay_bs = bs;
-while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
-overlay_bs = backing_bs(overlay_bs);
+while (bdrv_filter_or_cow_bs(overlay_bs) &&
+   bdrv_filter_or_cow_bs(overlay_bs)->implicit)
+{
+overlay_bs = bdrv_filter_or_cow_bs(overlay_bs);
  }


I believe that little optimization would work properly:


for (BlockDriverState *below_bs = bdrv_filter_or_cow_bs(overlay_bs);
   below_bs && below_bs->implicit;
   below_bs = bdrv_filter_or_cow_bs(overlay_bs)) {
 overlay_bs = below_bs;
}
  
  /* If we want to replace the backing file we need some extra checks */

-if (new_backing_bs != backing_bs(overlay_bs)) {
+if (new_backing_bs != bdrv_filter_or_cow_bs(overlay_bs)) {
  /* Check for implicit nodes between bs and its backing file */
  if (bs != overlay_bs) {
  error_setg(errp, "Cannot change backing link if '%s' has "
 "an implicit backing file", bs->node_name);
  return -EPERM;
  }
-/* Check if the backing link that we want to replace is frozen */
-if (bdrv_is_backing_chain_frozen(overlay_bs, backing_bs(overlay_bs),
- errp)) {
+/*
+ * Check if the backing link that we want to replace is frozen.
+ * Note that
+ * bdrv_filter_or_cow_child(overlay_bs) == overlay_bs->backing,
+ * because we know that overlay_bs == bs, and that @bs
+ * either is a filter that uses ->backing or a COW format BDS
+ * with bs->drv->supports_backing == true.
+ */
+if (bdrv_is_backing_chain_frozen(overlay_bs,
+ child_bs(overlay_bs->backing), errp))

What would be wrong with bdrv_filter_or_cow_bs(overlay_bs) here?

+{
  return -EPERM;
  }
  reopen_state->replace_backing_bs = true;
@@ -4196,7 +4226,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
   * its metadata. Otherwise the 'backing' option can be omitted.
   */
  if (drv->supports_backing && reopen_state->backing_missing &&
-(backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {

= BlockDriverState*

+(reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {


= BdrvChild*

Are we OK with that?


  error_setg(errp, "backing is missing for '%s'",
 reopen_state->bs->node_name);
  ret = -EINVAL;
@@ -4337,7 +4367,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
   * from bdrv_set_backing_hd()) has the new values.
   */
  if (reopen_state->replace_backing_bs) {
-BlockDriverState *old_backing_bs = backing_bs(bs);
+BlockDriverState *old_backing_bs = child_bs(bs->backing);
  assert(!old_backing_bs || !old_backing_bs->implicit);
  /* Abort the permission update on the backing bs we're detaching */
  if (old_backing_bs) {




[PATCH] vvfat: set status to odd fixes

2020-07-10 Thread P J P
From: Prasad J Pandit 

Virtual VFAT driver is quite old and rarely used. Set its status
to Odd Fixes.

Signed-off-by: Prasad J Pandit 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6aa54f7f8f..2a15a3987e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2930,7 +2930,7 @@ F: block/vpc.c
 vvfat
 M: Kevin Wolf 
 L: qemu-block@nongnu.org
-S: Supported
+S: Odd Fixes
 F: block/vvfat.c
 
 Image format fuzzer
-- 
2.26.2




Re: [PATCH v5 5/5] vhost-user-blk: default num_queues to -smp N

2020-07-10 Thread Raphael Norwitz
On Fri, Jul 10, 2020 at 5:53 AM Stefan Hajnoczi  wrote:
>
> On Thu, Jul 09, 2020 at 11:02:24AM -0700, Raphael Norwitz wrote:
> > On Mon, Jul 6, 2020 at 7:00 AM Stefan Hajnoczi  wrote:
> > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > > index a00b854736..39aec42dae 100644
> > > --- a/hw/block/vhost-user-blk.c
> > > +++ b/hw/block/vhost-user-blk.c
> > > @@ -420,6 +420,9 @@ static void vhost_user_blk_device_realize(DeviceState 
> > > *dev, Error **errp)
> > >  return;
> > >  }
> > >
> > > +if (s->num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
> > > +s->num_queues = 1;
> > > +}
> >
> > What is this check for? Is it just a backstop to ensure that
> > num_queues is set to 1 if vhost-user-blk-pci doesn't update it?
>
> For the non-PCI VIRTIO transports that do not handle num_queues ==
> VHOST_USER_BLK_AUTO_NUM_QUEUES themselves.
>

Got it. Looks good then.

Reviewed-by: Raphael Norwitz 

> Stefan



Re: [PATCH v7 14/47] stream: Deal with filters

2020-07-10 Thread Andrey Shinkevich

On 10.07.2020 18:24, Max Reitz wrote:

On 09.07.20 16:52, Andrey Shinkevich wrote:

On 25.06.2020 18:21, Max Reitz wrote:

Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: The bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).

Signed-off-by: Max Reitz 
---
   qapi/block-core.json |  4 +++
   block/stream.c   | 63 
   blockdev.c   |  4 ++-
   3 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b20332e592..df87855429 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2486,6 +2486,10 @@
   # On successful completion the image file is updated to drop the
backing file
   # and the BLOCK_JOB_COMPLETED event is emitted.
   #
+# In case @device is a filter node, block-stream modifies the first
non-filter
+# overlay node below it to point to base's backing node (or NULL if
@base was
+# not specified) instead of modifying @device itself.
+#
   # @job-id: identifier for the newly-created block job. If
   #  omitted, the device name will be used. (Since 2.7)
   #
diff --git a/block/stream.c b/block/stream.c
index aa2e7af98e..b9c1141656 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,7 +31,8 @@ enum {
     typedef struct StreamBlockJob {
   BlockJob common;
-    BlockDriverState *bottom;
+    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
+    BlockDriverState *above_base;   /* Node directly above the base */

Keeping the base_overlay is enough to complete the stream job.

Depends on the definition.  If we decide it isn’t enough, then it isn’t
enough.


The above_base may disappear during the job and we can't rely on it.

In this version of this series, it may not, because the chain is frozen.
  So the above_base cannot disappear.


Once we insert a filter above the top bs of the stream job, the parallel 
jobs in


the iotests #030 will fail with 'frozen link error'. It is because of the

independent parallel stream or commit jobs that insert/remove their filters

asynchroniously.



We can discuss whether we should allow it to disappear, but I think not.

The problem is, we need something to set as the backing file after
streaming.  How do we figure out what that should be?  My proposal is we
keep above_base and use its immediate child.


We can do the same with the base_overlay.

If the backing node turns out to be a filter, the proper backing child will

be set after the filter is removed. So, we shouldn't care.



If we don’t keep above_base, then we’re basically left guessing as to
what should be the backing file after the stream job.


   BlockdevOnError on_error;
   char *backing_file_str;
   bool bs_read_only;
@@ -53,7 +54,7 @@ static void stream_abort(Job *job)
     if (s->chain_frozen) {
   BlockJob *bjob = >common;
-    bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
+    bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
   }
   }
   @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
   StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
   BlockJob *bjob = >common;
   BlockDriverState *bs = blk_bs(bjob->blk);
-    BlockDriverState *base = backing_bs(s->bottom);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);

The initial base node may be a top node for a concurrent commit job and

may disappear.

Then it would just be replaced by another node, though, so above_base
keeps a child.  The @base here is not necessarily the initial @base, and
that’s intentional.


Not really. In my example, above_base becomes a dangling

pointer because after the commit job finishes, its filter that should 
belong to the


commit job frozen chain will be deleted. If we freeze the link to the 
above_base


for this job, the iotests #30 will not pass.


base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.

But also wrong.  The point of keeping above_base around is to get its
child here to use that child as the new backing child of the top node.


   Error *local_err = NULL;
   int ret = 0;
   -    bdrv_unfreeze_backing_chain(bs, s->bottom);
+    bdrv_unfreeze_backing_chain(bs, s->above_base);
   s->chain_frozen = false;
   -    if (bs->backing) {
+    if (bdrv_cow_child(unfiltered_bs)) {
   const char *base_id = NULL, *base_fmt = NULL;
   if (base) {
   base_id = s->backing_file_str;
@@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
   base_fmt = base->drv->format_name;
   }
   }
-    bdrv_set_backing_hd(bs, base, _err);
-    ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+    

Re: [PATCH v11 00/34] Add subcluster allocation to qcow2

2020-07-10 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1594396418.git.be...@igalia.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/blkverify.o
  CC  block/blkreplay.o
/tmp/qemu-test/src/block/qcow2-cluster.c: In function 'qcow2_get_host_offset':
/tmp/qemu-test/src/block/qcow2-cluster.c:473:19: error: 'expected_type' may be 
used uninitialized in this function [-Werror=maybe-uninitialized]
 } else if (type != expected_type) {
   ^
/tmp/qemu-test/src/block/qcow2-cluster.c:449:25: note: 'expected_type' was 
declared here
 QCow2SubclusterType expected_type, type;
 ^
cc1: all warnings being treated as errors
make: *** [block/qcow2-cluster.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=21da7e0fdd084eea8b00971953a3ff2a', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-yoa6xyt4/src/docker-src.2020-07-10-13.26.25.25621:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=21da7e0fdd084eea8b00971953a3ff2a
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-yoa6xyt4/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m27.189s
user0m8.691s


The full log is available at
http://patchew.org/logs/cover.1594396418.git.be...@igalia.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH 1/2] block/block-backend: add converter from BlockAcctStats to BlockBackend

2020-07-10 Thread Denis V. Lunev
Right now BlockAcctStats is always reside on BlockBackend. This structure
is not used in any other place. Thus we are able to create a converter
from one pointer to another.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/block-backend.c  | 5 +
 include/sysemu/block-backend.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 6936b25c83..e77a7e468e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2143,6 +2143,11 @@ BlockAcctStats *blk_get_stats(BlockBackend *blk)
 return >stats;
 }
 
+BlockBackend *blk_stats2blk(BlockAcctStats *s)
+{
+return container_of(s, BlockBackend, stats);
+}
+
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque)
 {
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8203d7f6f9..bd4694e7bc 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -227,6 +227,7 @@ void blk_add_insert_bs_notifier(BlockBackend *blk, Notifier 
*notify);
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
+BlockBackend *blk_stats2blk(BlockAcctStats *stats);
 BlockBackendRootState *blk_get_root_state(BlockBackend *blk);
 void blk_update_root_state(BlockBackend *blk);
 bool blk_get_detect_zeroes_from_root_state(BlockBackend *blk);
-- 
2.17.1




[PATCH 2/2] block: add logging facility for long standing IO requests

2020-07-10 Thread Denis V. Lunev
There are severe delays with IO requests processing if QEMU is running in
virtual machine or over software defined storage. Such delays potentially
results in unpredictable guest behavior. For example, guests over IDE or
SATA drive could remount filesystem read-only if write is performed
longer than 10 seconds.

Such reports are very complex to process. Some good starting point for this
seems quite reasonable. This patch provides one. It adds logging of such
potentially dangerous long IO operations.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/accounting.c | 59 +-
 blockdev.c |  7 -
 include/block/accounting.h |  5 +++-
 3 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 8d41c8a83a..3002444fa5 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -27,7 +27,9 @@
 #include "block/accounting.h"
 #include "block/block_int.h"
 #include "qemu/timer.h"
+#include "qemu/log.h"
 #include "sysemu/qtest.h"
+#include "sysemu/block-backend.h"
 
 static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
 static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 1000;
@@ -41,10 +43,11 @@ void block_acct_init(BlockAcctStats *stats)
 }
 
 void block_acct_setup(BlockAcctStats *stats, bool account_invalid,
-  bool account_failed)
+  bool account_failed, unsigned latency_log_threshold_ms)
 {
 stats->account_invalid = account_invalid;
 stats->account_failed = account_failed;
+stats->latency_log_threshold_ms = latency_log_threshold_ms;
 }
 
 void block_acct_cleanup(BlockAcctStats *stats)
@@ -182,6 +185,58 @@ void block_latency_histograms_clear(BlockAcctStats *stats)
 }
 }
 
+static const char *block_account_type(enum BlockAcctType type)
+{
+switch (type) {
+case BLOCK_ACCT_READ:
+return "READ";
+case BLOCK_ACCT_WRITE:
+return "WRITE";
+case BLOCK_ACCT_FLUSH:
+return "DISCARD";
+case BLOCK_ACCT_UNMAP:
+return "TRUNCATE";
+case BLOCK_ACCT_NONE:
+return "NONE";
+case BLOCK_MAX_IOTYPE:
+break;
+}
+return "UNKNOWN";
+}
+
+static void block_acct_report_long(BlockAcctStats *stats,
+   BlockAcctCookie *cookie, int64_t latency_ns)
+{
+unsigned latency_ms = latency_ns / 100;
+int64_t start_time_host_s;
+char buf[64];
+struct tm t;
+BlockDriverState *bs;
+
+if (cookie->type == BLOCK_ACCT_NONE) {
+return;
+}
+if (stats->latency_log_threshold_ms == 0) {
+return;
+}
+if (latency_ms < stats->latency_log_threshold_ms) {
+return;
+}
+
+start_time_host_s = cookie->start_time_ns / 10ull;
+strftime(buf, sizeof(buf), "%m-%d %H:%M:%S",
+ localtime_r(_time_host_s, ));
+
+bs = blk_bs(blk_stats2blk(stats));
+qemu_log("long %s[%ld] IO request: %d.03%d since %s.%03d bs: %s(%s, %s)\n",
+ block_account_type(cookie->type), cookie->bytes,
+ (int)(latency_ms / 1000), (int)(latency_ms % 1000), buf,
+ (int)((cookie->start_time_ns / 100) % 1000),
+ bs == NULL ? "unknown" : bdrv_get_node_name(bs),
+ bs == NULL ? "unknown" : bdrv_get_format_name(bs),
+ bs == NULL ? "unknown" : bs->filename);
+}
+
 static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie 
*cookie,
  bool failed)
 {
@@ -222,6 +277,8 @@ static void block_account_one_io(BlockAcctStats *stats, 
BlockAcctCookie *cookie,
 
 qemu_mutex_unlock(>lock);
 
+block_acct_report_long(stats, cookie, latency_ns);
+
 cookie->type = BLOCK_ACCT_NONE;
 }
 
diff --git a/blockdev.c b/blockdev.c
index 31d5eaf6bf..d87260958c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -625,7 +625,8 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 
 bs->detect_zeroes = detect_zeroes;
 
-block_acct_setup(blk_get_stats(blk), account_invalid, account_failed);
+block_acct_setup(blk_get_stats(blk), account_invalid, account_failed,
+qemu_opt_get_number(opts, "latency-log-threshold", 0));
 
 if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) {
 blk_unref(blk);
@@ -3740,6 +3741,10 @@ QemuOptsList qemu_common_drive_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "whether to account for failed I/O operations "
 "in the statistics",
+},{
+.name = "latency-log-threshold",
+.type = QEMU_OPT_STRING,
+.help = "threshold for long I/O report (disabled if <=0), in ms",
 },
 { /* end of list */ }
 },
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 878b4c3581..c3ea25f9aa 100644
--- a/include/block/accounting.h
+++ 

[PATCH 0/2] block: add logging facility for long standing IO requests

2020-07-10 Thread Denis V. Lunev
There are severe delays with IO requests processing if QEMU is running in
virtual machine or over software defined storage. Such delays potentially
results in unpredictable guest behavior. For example, guests over IDE or
SATA drive could remount filesystem read-only if write is performed
longer than 10 seconds.

Such reports are very complex to process. Some good starting point for this
seems quite reasonable. This patch provides one. It adds logging of such
potentially dangerous long IO operations.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy 
CC: Kevin Wolf 
CC: Max Reitz 





Re: [PATCH 0/2] iotests: More _filter_img_create fixes

2020-07-10 Thread John Snow



On 7/10/20 12:32 PM, Max Reitz wrote:
> Hi,
> 
> I’m sorry.
> 
> John, could I ask you to test whether this series fixes the problems
> you’re seeing?
> 

This is based on kwolf/block, I see.

By the time you return to reading work email, this link will have
information for you:

https://travis-ci.org/github/jnsnow/qemu/jobs/706960907

> 
> Max Reitz (2):
>   iotests: Drop readarray from _do_filter_img_create
>   iotests: Set LC_ALL=C for sort
> 
>  tests/qemu-iotests/common.filter | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 

-- 
—js




Re: [PATCH v11 00/34] Add subcluster allocation to qcow2

2020-07-10 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1594396418.git.be...@igalia.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/snapshot.o
  CC  block/qapi.o
/tmp/qemu-test/src/block/qcow2-cluster.c: In function 'qcow2_get_host_offset':
/tmp/qemu-test/src/block/qcow2-cluster.c:473:19: error: 'expected_type' may be 
used uninitialized in this function [-Werror=maybe-uninitialized]
 } else if (type != expected_type) {
   ^
/tmp/qemu-test/src/block/qcow2-cluster.c:449:25: note: 'expected_type' was 
declared here
 QCow2SubclusterType expected_type, type;
 ^
cc1: all warnings being treated as errors
make: *** [block/qcow2-cluster.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=10623143eef0416b9c3b987efa6621a1', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-zd6p1e8g/src/docker-src.2020-07-10-12.40.45.27056:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=10623143eef0416b9c3b987efa6621a1
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-zd6p1e8g/src'
make: *** [docker-run-test-quick@centos7] Error 2

real3m13.395s
user0m8.778s


The full log is available at
http://patchew.org/logs/cover.1594396418.git.be...@igalia.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 2/2] iotests: Set LC_ALL=C for sort

2020-07-10 Thread Eric Blake

On 7/10/20 11:32 AM, Max Reitz wrote:

Otherwise the result is basically unpredictable.

(Note that the precise environment variable to control sorting order is
LC_COLLATE, but LC_ALL overrides LC_COLLATE, and we do not want the
sorting order to be messed up if LC_ALL is set in the environment.)


Yep, that logic is correct.



Reported-by: John Snow 
Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/common.filter | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Eric Blake 



diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 345c3ca03e..4fd5c29b2a 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -177,7 +177,7 @@ _do_filter_img_create()
  -e 's/^\(data_file\)/3-\1/' \
  -e 's/^\(encryption\)/4-\1/' \
  -e 's/^\(preallocation\)/8-\1/' \
-| sort \
+| LC_ALL=C sort \
  | $SED -e 's/^[0-9]-//' \
  | tr '\n\0' ' \n' \
  | $SED -e 's/^ *$//' -e 's/ *$//'



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH 1/2] iotests: Drop readarray from _do_filter_img_create

2020-07-10 Thread Max Reitz
Some systems where we run tests on do not have a 4.x bash, so they do
not have readarray.  While it looked a bit nicer than messing with
`head` and `tail`, we do not really need it, so we might as well not use
it.

Reported-by: Claudio Fontana 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common.filter | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 3833206327..345c3ca03e 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -138,13 +138,13 @@ _do_filter_img_create()
 # Split the line into the pre-options part ($filename_part, which
 # precedes ", fmt=") and the options part ($options, which starts
 # with "fmt=")
-# (And just echo everything before the first "^Formatting")
-readarray formatting_line < <($SED -e 's/, fmt=/\n/')
+read formatting_line
 
-filename_part=${formatting_line[0]}
-unset formatting_line[0]
+# Split line at the first ", fmt="
+formatting_line=$(echo "$formatting_line" | $SED -e 's/, fmt=/\nfmt=/')
 
-options="fmt=${formatting_line[@]}"
+filename_part=$(echo "$formatting_line" | head -n 1)
+options=$(echo "$formatting_line" | tail -n +2)
 
 # Set grep_data_file to '\|data_file' to keep it; make it empty
 # to drop it.
-- 
2.26.2




Re: [PATCH 1/2] iotests: Drop readarray from _do_filter_img_create

2020-07-10 Thread Eric Blake

On 7/10/20 11:32 AM, Max Reitz wrote:

Some systems where we run tests on do not have a 4.x bash, so they do
not have readarray.  While it looked a bit nicer than messing with
`head` and `tail`, we do not really need it, so we might as well not use
it.

Reported-by: Claudio Fontana 
Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/common.filter | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)



Reviewed-by: Eric Blake 
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH] fixup! qemu-img: Deprecate use of -b without -F

2020-07-10 Thread Eric Blake
Port to BSD truncate.

Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/114 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/114 b/tests/qemu-iotests/114
index 143683381334..d0609c499388 100755
--- a/tests/qemu-iotests/114
+++ b/tests/qemu-iotests/114
@@ -45,7 +45,7 @@ _unsupported_imgopts data_file
 # Intentionally specify backing file without backing format; demonstrate
 # the difference in warning messages when backing file could be probed.
 # Note that only a non-raw probe result will affect the resulting image.
-truncate --size=64M "$TEST_IMG.orig"
+truncate -s $((64 * 1024 * 1024)) "$TEST_IMG.orig"
 _make_test_img -b "$TEST_IMG.orig" 64M

 TEST_IMG="$TEST_IMG.base" _make_test_img 64M
-- 
2.27.0




[PATCH 2/2] iotests: Set LC_ALL=C for sort

2020-07-10 Thread Max Reitz
Otherwise the result is basically unpredictable.

(Note that the precise environment variable to control sorting order is
LC_COLLATE, but LC_ALL overrides LC_COLLATE, and we do not want the
sorting order to be messed up if LC_ALL is set in the environment.)

Reported-by: John Snow 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common.filter | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 345c3ca03e..4fd5c29b2a 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -177,7 +177,7 @@ _do_filter_img_create()
 -e 's/^\(data_file\)/3-\1/' \
 -e 's/^\(encryption\)/4-\1/' \
 -e 's/^\(preallocation\)/8-\1/' \
-| sort \
+| LC_ALL=C sort \
 | $SED -e 's/^[0-9]-//' \
 | tr '\n\0' ' \n' \
 | $SED -e 's/^ *$//' -e 's/ *$//'
-- 
2.26.2




[PATCH v11 31/34] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

2020-07-10 Thread Alberto Garcia
Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json |   7 +++
 block/qcow2.h|   8 ++-
 include/block/block_int.h|   1 +
 block/qcow2.c|  66 ++--
 tests/qemu-iotests/031.out   |   8 +--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  20 +++---
 tests/qemu-iotests/065   |  12 ++--
 tests/qemu-iotests/082.out   |  39 +---
 tests/qemu-iotests/085.out   |  38 ++--
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +--
 tests/qemu-iotests/198   |   2 +
 tests/qemu-iotests/206.out   |   4 ++
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/274.out   |  49 ---
 tests/qemu-iotests/280.out   |   2 +-
 tests/qemu-iotests/291.out   |   2 +
 tests/qemu-iotests/common.filter |   1 +
 23 files changed, 253 insertions(+), 140 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b20332e592..0d4231ee98 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -66,6 +66,9 @@
 # standalone (read-only) raw image without looking at qcow2
 # metadata (since: 4.0)
 #
+# @extended-l2: true if the image has extended L2 entries; only valid for
+#   compat >= 1.1 (since 5.1)
+#
 # @lazy-refcounts: on or off; only valid for compat >= 1.1
 #
 # @corrupt: true if the image has been marked corrupt; only valid for
@@ -87,6 +90,7 @@
   'compat': 'str',
   '*data-file': 'str',
   '*data-file-raw': 'bool',
+  '*extended-l2': 'bool',
   '*lazy-refcounts': 'bool',
   '*corrupt': 'bool',
   'refcount-bits': 'int',
@@ -4318,6 +4322,8 @@
 # @data-file-raw: True if the external data file must stay valid as a
 # standalone (read-only) raw image without looking at qcow2
 # metadata (default: false; since: 4.0)
+# @extended-l2  True to make the image have extended L2 entries
+#   (default: false; since 5.1)
 # @size: Size of the virtual disk in bytes
 # @version: Compatibility level (default: v3)
 # @backing-file: File name of the backing file if a backing file
@@ -4338,6 +4344,7 @@
   'data': { 'file': 'BlockdevRef',
 '*data-file':   'BlockdevRef',
 '*data-file-raw':   'bool',
+'*extended-l2': 'bool',
 'size': 'size',
 '*version': 'BlockdevQcow2Version',
 '*backing-file':'str',
diff --git a/block/qcow2.h b/block/qcow2.h
index f3499e53bf..065ec3df0b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -246,15 +246,18 @@ enum {
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
 QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
+QCOW2_INCOMPAT_EXTL2_BITNR  = 4,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
 QCOW2_INCOMPAT_COMPRESSION  = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
+QCOW2_INCOMPAT_EXTL2= 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
 QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
 | QCOW2_INCOMPAT_CORRUPT
 | QCOW2_INCOMPAT_DATA_FILE
-| QCOW2_INCOMPAT_COMPRESSION,
+| QCOW2_INCOMPAT_COMPRESSION
+| QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -581,8 +584,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-/* FIXME: Return false until this feature is complete */
-return false;
+return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3d6cf88592..4c6545b950 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -58,6 +58,7 @@
 #define BLOCK_OPT_DATA_FILE "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
 #define BLOCK_OPT_COMPRESSION_TYPE  "compression_type"
+#define BLOCK_OPT_EXTL2 "extended_l2"
 
 #define BLOCK_PROBE_BUF_SIZE512
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 8ca0f22069..b7a1cae39c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1449,6 

[PATCH 0/2] iotests: More _filter_img_create fixes

2020-07-10 Thread Max Reitz
Hi,

I’m sorry.

John, could I ask you to test whether this series fixes the problems
you’re seeing?


Max Reitz (2):
  iotests: Drop readarray from _do_filter_img_create
  iotests: Set LC_ALL=C for sort

 tests/qemu-iotests/common.filter | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

-- 
2.26.2




[PATCH v11 21/34] qcow2: Add subcluster support to qcow2_get_host_offset()

2020-07-10 Thread Alberto Garcia
The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

qcow2_cluster_to_subcluster_type() is not necessary as a separate
function anymore so it's inlined into its caller.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.h |  38 ---
 block/qcow2-cluster.c | 150 ++
 2 files changed, 92 insertions(+), 96 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 5df761edc3..4fad40b96b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -710,29 +710,6 @@ static inline QCow2ClusterType 
qcow2_get_cluster_type(BlockDriverState *bs,
 }
 }
 
-/*
- * For an image without extended L2 entries, return the
- * QCow2SubclusterType equivalent of a given QCow2ClusterType.
- */
-static inline
-QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
-{
-switch (type) {
-case QCOW2_CLUSTER_COMPRESSED:
-return QCOW2_SUBCLUSTER_COMPRESSED;
-case QCOW2_CLUSTER_ZERO_PLAIN:
-return QCOW2_SUBCLUSTER_ZERO_PLAIN;
-case QCOW2_CLUSTER_ZERO_ALLOC:
-return QCOW2_SUBCLUSTER_ZERO_ALLOC;
-case QCOW2_CLUSTER_NORMAL:
-return QCOW2_SUBCLUSTER_NORMAL;
-case QCOW2_CLUSTER_UNALLOCATED:
-return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
-default:
-g_assert_not_reached();
-}
-}
-
 /*
  * In an image without subsclusters @l2_bitmap is ignored and
  * @sc_index must be 0.
@@ -776,7 +753,20 @@ QCow2SubclusterType 
qcow2_get_subcluster_type(BlockDriverState *bs,
 g_assert_not_reached();
 }
 } else {
-return qcow2_cluster_to_subcluster_type(type);
+switch (type) {
+case QCOW2_CLUSTER_COMPRESSED:
+return QCOW2_SUBCLUSTER_COMPRESSED;
+case QCOW2_CLUSTER_ZERO_PLAIN:
+return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+case QCOW2_CLUSTER_NORMAL:
+return QCOW2_SUBCLUSTER_NORMAL;
+case QCOW2_CLUSTER_UNALLOCATED:
+return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+default:
+g_assert_not_reached();
+}
 }
 }
 
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 01495cb1c0..492296a90f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -426,66 +426,66 @@ static int 
qcow2_get_subcluster_range_type(BlockDriverState *bs,
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. Allocated subclusters are required to be contiguous in
+ * the image file.
+ * At most @nb_clusters are checked (note that this means clusters,
+ * not subclusters).
+ * Compressed clusters are always processed one by one but for the
+ * purpose of this count they are treated as if they were divided into
+ * subclusters of size s->subcluster_size.
+ * On failure return -errno and update @l2_index to point to the
+ * invalid entry.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+unsigned sc_index, uint64_t *l2_slice,
+unsigned *l2_index)
 {
 BDRVQcow2State *s = bs->opaque;
-int i;
-QCow2ClusterType first_cluster_type;
-uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-uint64_t offset = first_entry & mask;
+int i, count = 0;
+bool check_offset = false;
+uint64_t expected_offset = 0;
+QCow2SubclusterType expected_type, type;
 
-first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-return 0;
-}
-
-/* must be allocated */
-assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-   first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
+assert(*l2_index + nb_clusters <= s->l2_slice_size);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-if (offset + (uint64_t) i * cluster_size != l2_entry) {
+unsigned first_sc = (i == 0) ? sc_index : 0;
+uint64_t 

[PATCH v11 34/34] iotests: Add tests for qcow2 images with extended L2 entries

2020-07-10 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/271 | 901 +
 tests/qemu-iotests/271.out | 726 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 1628 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 00..39ff462328
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,901 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019-2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=be...@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+rm -f "$TEST_IMG.raw"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+_unsupported_imgopts extended_l2 compat=0.10 cluster_size data_file 
refcount_bits=1[^0-9]
+
+l2_offset=$((0x4))
+
+_verify_img()
+{
+$QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are 
identical'
+$QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+grep -v 'No errors were found on the image'
+}
+
+# Compare the bitmap of an extended L2 entry against an expected value
+_verify_l2_bitmap()
+{
+entry_no="$1"# L2 entry number, starting from 0
+expected_alloc="$alloc"  # Space-separated list of allocated subcluster 
indexes
+expected_zero="$zero"# Space-separated list of zero subcluster indexes
+
+offset=$(($l2_offset + $entry_no * 16))
+entry=$(peek_file_be "$TEST_IMG" $offset 8)
+offset=$(($offset + 8))
+bitmap=$(peek_file_be "$TEST_IMG" $offset 8)
+
+expected_bitmap=0
+for bit in $expected_alloc; do
+expected_bitmap=$(($expected_bitmap | (1 << $bit)))
+done
+for bit in $expected_zero; do
+expected_bitmap=$(($expected_bitmap | (1 << (32 + $bit
+done
+printf -v expected_bitmap "%u" $expected_bitmap # Convert to unsigned
+
+printf "L2 entry #%d: 0x%016x %016x\n" "$entry_no" "$entry" "$bitmap"
+if [ "$bitmap" != "$expected_bitmap" ]; then
+printf "ERROR: expecting bitmap   0x%016x\n" "$expected_bitmap"
+fi
+}
+
+# This should be called as _run_test c=XXX sc=XXX off=XXX len=XXX cmd=XXX
+# c:   cluster number (0 if unset)
+# sc:  subcluster number inside cluster @c (0 if unset)
+# off: offset inside subcluster @sc, in kilobytes (0 if unset)
+# len: request length, passed directly to qemu-io (e.g: 256, 4k, 1M, ...)
+# cmd: the command to pass to qemu-io, must be one of
+#  write-> write
+#  zero -> write -z
+#  unmap-> write -z -u
+#  compress -> write -c
+#  discard  -> discard
+_run_test()
+{
+unset c sc off len cmd
+for var in "$@"; do eval "$var"; done
+case "${cmd:-write}" in
+zero)
+cmd="write -q -z";;
+unmap)
+cmd="write -q -z -u";;
+compress)
+pat=$((${pat:-0} + 1))
+cmd="write -q -c -P ${pat}";;
+write)
+pat=$((${pat:-0} + 1))
+cmd="write -q -P ${pat}";;
+discard)
+cmd="discard -q";;
+*)
+echo "Unknown option $cmd"
+exit 1;;
+esac
+c="${c:-0}"
+sc="${sc:-0}"
+off="${off:-0}"
+offset="$(($c * 64 + $sc * 2 + $off))"
+[ "$offset" != 0 ] && offset="${offset}k"
+cmd="$cmd ${offset} ${len}"
+raw_cmd=$(echo $cmd | sed s/-c//) # Raw images don't support -c
+echo $cmd | sed 's/-P [0-9][0-9]\?/-P PATTERN/'
+$QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+_verify_img
+_verify_l2_bitmap "$c"
+}
+
+_reset_img()
+{
+size="$1"
+$QEMU_IMG create -f raw "$TEST_IMG.raw" "$size" | _filter_img_create
+if [ "$use_backing_file" = "yes" ]; then
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$size" | _filter_img_create
+$QEMU_IO -c "write -q -P 0xFF 0 $size" -f raw "$TEST_IMG.base" | 
_filter_qemu_io
+$QEMU_IO -c "write 

[PATCH v11 28/34] qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()

2020-07-10 Thread Alberto Garcia
This works now at the subcluster level and pwrite_zeroes_alignment is
updated accordingly.

qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with
the following changes:

   - The request can now be subcluster-aligned.

   - The cluster-aligned body of the request is still zeroized using
 zero_in_l2_slice() as before.

   - The subcluster-aligned head and tail of the request are zeroized
 with the new zero_l2_subclusters() function.

There is just one thing to take into account for a possible future
improvement: compressed clusters cannot be partially zeroized so
zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
This makes the caller repeat the *complete* request and write actual
zeroes to disk. This is sub-optimal because

   1) if the head area was compressed we would still be able to use
  the fast path for the body and possibly the tail.

   2) if the tail area was compressed we are writing zeroes to the
  head and the body areas, which are already zeroized.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.h |  4 +--
 block/qcow2-cluster.c | 81 +++
 block/qcow2.c | 33 +-
 3 files changed, 94 insertions(+), 24 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4fad40b96b..4ef4ae4ab0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -898,8 +898,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m);
 int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, enum qcow2_discard_type type,
   bool full_discard);
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags);
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags);
 
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 6b5d0698d0..e023bc4b01 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2016,12 +2016,59 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 return nb_clusters;
 }
 
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags)
+static int zero_l2_subclusters(BlockDriverState *bs, uint64_t offset,
+   unsigned nb_subclusters)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t *l2_slice;
+uint64_t old_l2_bitmap, l2_bitmap;
+int l2_index, ret, sc = offset_to_sc_index(s, offset);
+
+/* For full clusters use zero_in_l2_slice() instead */
+assert(nb_subclusters > 0 && nb_subclusters < s->subclusters_per_cluster);
+assert(sc + nb_subclusters <= s->subclusters_per_cluster);
+assert(offset_into_subcluster(s, offset) == 0);
+
+ret = get_cluster_table(bs, offset, _slice, _index);
+if (ret < 0) {
+return ret;
+}
+
+switch (qcow2_get_cluster_type(bs, get_l2_entry(s, l2_slice, l2_index))) {
+case QCOW2_CLUSTER_COMPRESSED:
+ret = -ENOTSUP; /* We cannot partially zeroize compressed clusters */
+goto out;
+case QCOW2_CLUSTER_NORMAL:
+case QCOW2_CLUSTER_UNALLOCATED:
+break;
+default:
+g_assert_not_reached();
+}
+
+old_l2_bitmap = l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+
+l2_bitmap |=  QCOW_OFLAG_SUB_ZERO_RANGE(sc, sc + nb_subclusters);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ALLOC_RANGE(sc, sc + nb_subclusters);
+
+if (old_l2_bitmap != l2_bitmap) {
+set_l2_bitmap(s, l2_slice, l2_index, l2_bitmap);
+qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
+}
+
+ret = 0;
+out:
+qcow2_cache_put(s->l2_table_cache, (void **) _slice);
+
+return ret;
+}
+
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t end_offset = offset + bytes;
 uint64_t nb_clusters;
+unsigned head, tail;
 int64_t cleared;
 int ret;
 
@@ -2036,8 +2083,8 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t 
offset,
 }
 
 /* Caller must pass aligned values, except at image end */
-assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+assert(offset_into_subcluster(s, offset) == 0);
+assert(offset_into_subcluster(s, end_offset) == 0 ||
end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
 /* The zero flag is only supported by version 3 and newer */
@@ -2045,11 +2092,26 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, 
uint64_t offset,
 return -ENOTSUP;
 }
 
-/* Each L2 slice is handled by its own loop iteration */
-nb_clusters = 

[PATCH v11 26/34] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-07-10 Thread Alberto Garcia
Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5bdbb65e7b..6b5d0698d0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -862,6 +862,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index, 0);
+}
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
 
 *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1




[PATCH v11 30/34] qcow2: Add prealloc field to QCowL2Meta

2020-07-10 Thread Alberto Garcia
This field allows us to indicate that the L2 metadata update does not
come from a write request with actual data but from a preallocation
request.

For traditional images this does not make any difference, but for
images with extended L2 entries this means that the clusters are
allocated normally in the L2 table but individual subclusters are
marked as unallocated.

This will allow preallocating images that have a backing file.

There is one special case: when we resize an existing image we can
also request that the new clusters are preallocated. If the image
already had a backing file then we have to hide any possible stale
data and zero out the new clusters (see commit 955c7d6687 for more
details).

In this case the subclusters cannot be left as unallocated so the L2
bitmap must be updated.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.h | 8 
 block/qcow2-cluster.c | 2 +-
 block/qcow2.c | 6 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4ef4ae4ab0..f3499e53bf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -463,6 +463,14 @@ typedef struct QCowL2Meta
  */
 bool skip_cow;
 
+/**
+ * Indicates that this is not a normal write request but a preallocation.
+ * If the image has extended L2 entries this means that no new individual
+ * subclusters will be marked as allocated in the L2 bitmap (but any
+ * existing contents of that bitmap will be kept).
+ */
+bool prealloc;
+
 /**
  * The I/O vector with the data from the actual guest write request.
  * If non-NULL, this is meant to be merged together with the data
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e023bc4b01..bbf373ad21 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1067,7 +1067,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
 
 /* Update bitmap with the subclusters that were just written */
-if (has_subclusters(s)) {
+if (has_subclusters(s) && !m->prealloc) {
 uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
 unsigned written_from = m->cow_start.offset;
 unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
diff --git a/block/qcow2.c b/block/qcow2.c
index 9ab2a91c8b..8ca0f22069 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2096,6 +2096,7 @@ static coroutine_fn int 
qcow2_handle_l2meta(BlockDriverState *bs,
 QCowL2Meta *next;
 
 if (link_l2) {
+assert(!l2meta->prealloc);
 ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
 if (ret) {
 goto out;
@@ -3130,6 +3131,7 @@ static int coroutine_fn preallocate_co(BlockDriverState 
*bs, uint64_t offset,
 
 while (meta) {
 QCowL2Meta *next = meta->next;
+meta->prealloc = true;
 
 ret = qcow2_alloc_cluster_link_l2(bs, meta);
 if (ret < 0) {
@@ -4217,6 +4219,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 int64_t clusters_allocated;
 int64_t old_file_size, last_cluster, new_file_size;
 uint64_t nb_new_data_clusters, nb_new_l2_tables;
+bool subclusters_need_allocation = false;
 
 /* With a data file, preallocation means just allocating the metadata
  * and forwarding the truncate request to the data file */
@@ -4298,6 +4301,8 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
BDRV_REQ_ZERO_WRITE, NULL);
 if (ret >= 0) {
 flags &= ~BDRV_REQ_ZERO_WRITE;
+/* Ensure that we read zeroes and not backing file data */
+subclusters_need_allocation = true;
 }
 } else {
 ret = -1;
@@ -4336,6 +4341,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 .offset   = nb_clusters << s->cluster_bits,
 .nb_bytes = 0,
 },
+.prealloc = !subclusters_need_allocation,
 };
 qemu_co_queue_init(_requests);
 
-- 
2.20.1




[PATCH v11 33/34] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters

2020-07-10 Thread Alberto Garcia
This function is only used by qcow2_expand_zero_clusters() to
downgrade a qcow2 image to a previous version. This would require
transforming all extended L2 entries into normal L2 entries but this
is not a simple task and there are no plans to implement this at the
moment.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c  | 8 +++-
 tests/qemu-iotests/061 | 6 ++
 tests/qemu-iotests/061.out | 5 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index bbf373ad21..81c49ba934 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2159,6 +2159,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
+/* qcow2_downgrade() is not allowed in images with subclusters */
+assert(!has_subclusters(s));
+
 slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
@@ -2227,7 +2230,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
 if (!bs->backing) {
 /* not backed; therefore we can simply deallocate the
- * cluster */
+ * cluster. No need to call set_l2_bitmap(), this
+ * function doesn't support images with subclusters. */
 set_l2_entry(s, l2_slice, j, 0);
 l2_dirty = true;
 continue;
@@ -2298,6 +2302,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 } else {
 set_l2_entry(s, l2_slice, j, offset);
 }
+/* No need to call set_l2_bitmap() after set_l2_entry() because
+ * this function doesn't support images with subclusters. */
 l2_dirty = true;
 }
 
diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index 10eb243164..23add2dfe3 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -303,6 +303,12 @@ $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
 _img_info --format-specific
 _check_test_img
 
+echo
+echo "=== Testing version downgrade with extended L2 entries ==="
+echo
+_make_test_img -o "compat=1.1,extended_l2=on" 64M
+$QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
+
 echo
 echo "=== Try changing the external data file ==="
 echo
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index caa53e87cc..57d8c29f0b 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -532,6 +532,11 @@ Format specific information:
 extended l2: false
 No errors were found on the image.
 
+=== Testing version downgrade with extended L2 entries ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Cannot downgrade an image with incompatible features 0x10 set
+
 === Try changing the external data file ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-- 
2.20.1




[PATCH v11 17/34] qcow2: Add cluster type parameter to qcow2_get_host_offset()

2020-07-10 Thread Alberto Garcia
This function returns an integer that can be either an error code or a
cluster type (a value from the QCow2ClusterType enum).

We are going to start using subcluster types instead of cluster types
in some functions so it's better to use the exact data types instead
of integers for clarity and in order to detect errors more easily.

This patch makes qcow2_get_host_offset() return 0 on success and
puts the returned cluster type in a separate parameter. There are no
semantic changes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  3 ++-
 block/qcow2-cluster.c | 11 +++
 block/qcow2.c | 37 ++---
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index ea647c8bb5..74f65793bd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -893,7 +893,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error 
**errp);
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset);
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 751906c330..8e1e928852 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -565,13 +565,14 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
  * cluster type and (if applicable) are stored contiguously in the image file.
+ * The cluster type is stored in *cluster_type.
  * Compressed clusters are always returned one by one.
  *
- * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
- * cases.
+ * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset)
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -713,7 +714,9 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-return type;
+*cluster_type = type;
+
+return 0;
 
 fail:
 qcow2_cache_put(s->l2_table_cache, (void **)_slice);
diff --git a/block/qcow2.c b/block/qcow2.c
index 49772143b3..5122b71cf2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2043,6 +2043,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
+QCow2ClusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
@@ -2054,7 +2055,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 }
 
 bytes = MIN(INT_MAX, count);
-ret = qcow2_get_host_offset(bs, offset, , _offset);
+ret = qcow2_get_host_offset(bs, offset, , _offset, );
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
@@ -2062,15 +2063,15 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
+if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
 !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2279,6 +2280,7 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 int ret = 0;
 unsigned int cur_bytes; /* number of bytes in current iteration */
 uint64_t host_offset = 0;
+QCow2ClusterType type;
 AioTaskPool *aio = NULL;
 
 while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2290,22 +2292,23 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 }
 
 qemu_co_mutex_lock(>lock);
-ret = qcow2_get_host_offset(bs, offset, _bytes, _offset);
+ret = qcow2_get_host_offset(bs, offset, _bytes,
+_offset, );
 

[PATCH v11 16/34] qcow2: Add qcow2_cluster_is_allocated()

2020-07-10 Thread Alberto Garcia
This helper function tells us if a cluster is allocated (that is,
there is an associated host offset for it).

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 3aec6f452a..ea647c8bb5 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -780,6 +780,12 @@ QCow2SubclusterType 
qcow2_get_subcluster_type(BlockDriverState *bs,
 }
 }
 
+static inline bool qcow2_cluster_is_allocated(QCow2ClusterType type)
+{
+return (type == QCOW2_CLUSTER_COMPRESSED || type == QCOW2_CLUSTER_NORMAL ||
+type == QCOW2_CLUSTER_ZERO_ALLOC);
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1




Re: [PULL v2] Block layer patches

2020-07-10 Thread Eric Blake

On 7/9/20 7:17 AM, Kevin Wolf wrote:

The following changes since commit 8796c64ecdfd34be394ea277a53df0c76996:

   Merge remote-tracking branch 
'remotes/kraxel/tags/audio-20200706-pull-request' into staging (2020-07-08 
16:33:59 +0100)

are available in the Git repository at:

   git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to d1c824e681e423a6224c6831a493e9175fa02dc1:

   qemu-img: Deprecate use of -b without -F (2020-07-09 14:14:55 +0200)


Block layer patches:

- file-posix: Mitigate file fragmentation with extent size hints
- Tighten qemu-img rules on missing backing format
- qemu-img map: Don't limit block status request size


Eric Blake (10):
   qemu-img: Flush stdout before before potential stderr messages
   block: Finish deprecation of 'qemu-img convert -n -o'
   sheepdog: Add trivial backing_fmt support
   vmdk: Add trivial backing_fmt support
   qcow: Tolerate backing_fmt=


I see you renumbered my test 293 to 301 for conflict resolution, but did 
not update the commit message to call out the updated number ;)


Since we may have to do a v3 because of the 'truncate -s' issue in 114, 
you could touch that up too.




  tests/qemu-iotests/301   | 88 
  tests/qemu-iotests/301.out   | 59 +++
  tests/qemu-iotests/common.filter | 62 
  tests/qemu-iotests/group |  1 +
  146 files changed, 906 insertions(+), 472 deletions(-)
  create mode 100755 tests/qemu-iotests/301
  create mode 100644 tests/qemu-iotests/301.out




--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v11 29/34] qcow2: Add subcluster support to qcow2_measure()

2020-07-10 Thread Alberto Garcia
Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 358e60a261..9ab2a91c8b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3232,28 +3232,31 @@ int64_t qcow2_refcount_metadata_size(int64_t clusters, 
size_t cluster_size,
  * @total_size: virtual disk size in bytes
  * @cluster_size: cluster size in bytes
  * @refcount_order: refcount bits power-of-2 exponent
+ * @extended_l2: true if the image has extended L2 entries
  *
  * Returns: Total number of bytes required for the fully allocated image
  * (including metadata).
  */
 static int64_t qcow2_calc_prealloc_size(int64_t total_size,
 size_t cluster_size,
-int refcount_order)
+int refcount_order,
+bool extended_l2)
 {
 int64_t meta_size = 0;
 uint64_t nl1e, nl2e;
 int64_t aligned_total_size = ROUND_UP(total_size, cluster_size);
+size_t l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 
 /* header: 1 cluster */
 meta_size += cluster_size;
 
 /* total size of L2 tables */
 nl2e = aligned_total_size / cluster_size;
-nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
-meta_size += nl2e * sizeof(uint64_t);
+nl2e = ROUND_UP(nl2e, cluster_size / l2e_size);
+meta_size += nl2e * l2e_size;
 
 /* total size of L1 tables */
-nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+nl1e = nl2e * l2e_size / cluster_size;
 nl1e = ROUND_UP(nl1e, cluster_size / sizeof(uint64_t));
 meta_size += nl1e * sizeof(uint64_t);
 
@@ -4838,6 +4841,8 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 PreallocMode prealloc;
 bool has_backing_file;
 bool has_luks;
+bool extended_l2 = false; /* Set to false until the option is added */
+size_t l2e_size;
 
 /* Parse image creation options */
 cluster_size = qcow2_opt_get_cluster_size_del(opts, _err);
@@ -4896,8 +4901,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 virtual_size = ROUND_UP(virtual_size, cluster_size);
 
 /* Check that virtual disk size is valid */
+l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 l2_tables = DIV_ROUND_UP(virtual_size / cluster_size,
- cluster_size / sizeof(uint64_t));
+ cluster_size / l2e_size);
 if (l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE) {
 error_setg(_err, "The image size is too large "
"(try using a larger cluster size)");
@@ -4960,9 +4966,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 }
 
 info = g_new0(BlockMeasureInfo, 1);
-info->fully_allocated =
+info->fully_allocated = luks_payload_size +
 qcow2_calc_prealloc_size(virtual_size, cluster_size,
- ctz32(refcount_bits)) + luks_payload_size;
+ ctz32(refcount_bits), extended_l2);
 
 /*
  * Remove data clusters that are not required.  This overestimates the
-- 
2.20.1




[PATCH v11 27/34] qcow2: Add subcluster support to handle_alloc_space()

2020-07-10 Thread Alberto Garcia
The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 854a2fbd9b..b58c421c28 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2421,6 +2421,9 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
 
 for (m = l2meta; m != NULL; m = m->next) {
 int ret;
+uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+unsigned nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+m->cow_start.offset;
 
 if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
 continue;
@@ -2435,16 +2438,14 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
  * efficiently zero out the whole clusters
  */
 
-ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
 true);
 if (ret < 0) {
 return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
 BDRV_REQ_NO_FALLBACK);
 if (ret < 0) {
 if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1




[PATCH v11 23/34] qcow2: Add subcluster support to discard_in_l2_slice()

2020-07-10 Thread Alberto Garcia
Two things need to be taken into account here:

1) With full_discard == true the L2 entry must be cleared completely.
   This also includes the L2 bitmap if the image has extended L2
   entries.

2) With full_discard == false we have to make the discarded cluster
   read back as zeroes. With normal L2 entries this is done with the
   QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done
   with the individual 'all zeroes' bits for each subcluster.

   Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2
   images so, if there is a backing file, discard cannot guarantee
   that the image will read back as zeroes. If this is important for
   the caller it should forbid it as qcow2_co_pdiscard() does (see
   80f5c01183 for more details).

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 52 +++
 1 file changed, 23 insertions(+), 29 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index cb07fd00a1..ea025dc531 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1848,11 +1848,17 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_l2_entry;
-
-old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+uint64_t new_l2_entry = old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
+QCow2ClusterType cluster_type =
+qcow2_get_cluster_type(bs, old_l2_entry);
 
 /*
+ * If full_discard is true, the cluster should not read back as zeroes,
+ * but rather fall through to the backing file.
+ *
  * If full_discard is false, make sure that a discarded area reads back
  * as zeroes for v3 images (we cannot do it for v2 without actually
  * writing a zero-filled buffer). We can skip the operation if the
@@ -1861,40 +1867,28 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
  *
  * TODO We might want to use bdrv_block_status(bs) here, but we're
  * holding s->lock, so that doesn't work today.
- *
- * If full_discard is true, the sector should not read back as zeroes,
- * but rather fall through to the backing file.
  */
-switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
-case QCOW2_CLUSTER_UNALLOCATED:
-if (full_discard || !bs->backing) {
-continue;
+if (full_discard) {
+new_l2_entry = new_l2_bitmap = 0;
+} else if (bs->backing || qcow2_cluster_is_allocated(cluster_type)) {
+if (has_subclusters(s)) {
+new_l2_entry = 0;
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry = s->qcow_version >= 3 ? QCOW_OFLAG_ZERO : 0;
 }
-break;
+}
 
-case QCOW2_CLUSTER_ZERO_PLAIN:
-if (!full_discard) {
-continue;
-}
-break;
-
-case QCOW2_CLUSTER_ZERO_ALLOC:
-case QCOW2_CLUSTER_NORMAL:
-case QCOW2_CLUSTER_COMPRESSED:
-break;
-
-default:
-abort();
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
+continue;
 }
 
 /* First remove L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (!full_discard && s->qcow_version >= 3) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-} else {
-set_l2_entry(s, l2_slice, l2_index + i, 0);
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
-
 /* Then decrease the refcount */
 qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
 }
-- 
2.20.1




[PATCH v11 13/34] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2020-07-10 Thread Alberto Garcia
Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters. In this case the
returned value is always 0 and has no meaning.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 46b351229a..82b86f6cec 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -533,15 +533,36 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx)
+{
+if (has_subclusters(s)) {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+return be64_to_cpu(l2_slice[idx + 1]);
+} else {
+return 0; /* For convenience only; this value has no meaning. */
+}
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx, uint64_t entry)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx, uint64_t bitmap)
+{
+assert(has_subclusters(s));
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
-- 
2.20.1




[PATCH v11 25/34] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2020-07-10 Thread Alberto Garcia
The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated. This needs to happen even if the
cluster was already allocated and the L2 entry was otherwise valid.

In some cases however a write operation doesn't need change the L2
bitmap (because all affected subclusters were already allocated). This
is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2()
is never called in those cases.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ea025dc531..5bdbb65e7b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1062,6 +1062,24 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 assert((offset & L2E_OFFSET_MASK) == offset);
 
 set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
+
+/* Update bitmap with the subclusters that were just written */
+if (has_subclusters(s)) {
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+unsigned written_from = m->cow_start.offset;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+m->nb_clusters << s->cluster_bits;
+int first_sc, last_sc;
+/* Narrow written_from and written_to down to the current cluster 
*/
+written_from = MAX(written_from, i << s->cluster_bits);
+written_to   = MIN(written_to, (i + 1) << s->cluster_bits);
+assert(written_from < written_to);
+first_sc = offset_to_sc_index(s, written_from);
+last_sc  = offset_to_sc_index(s, written_to - 1);
+l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(first_sc, last_sc + 1);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(first_sc, last_sc + 1);
+set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+}
  }
 
 
-- 
2.20.1




[PATCH v11 19/34] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC

2020-07-10 Thread Alberto Garcia
When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6fbc73ff19..854a2fbd9b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2064,7 +2064,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 *pnum = bytes;
 
 if ((type == QCOW2_SUBCLUSTER_NORMAL ||
- type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+ type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
@@ -2072,7 +2073,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+   type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2235,6 +2237,7 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
 g_assert_not_reached();
 
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2303,7 +2306,8 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
-(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing) ||
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC && !bs->backing))
 {
 qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
 } else {
@@ -3858,6 +3862,7 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 ret = qcow2_get_host_offset(bs, offset, , , );
 if (ret < 0 ||
 (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+ type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
  type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
  type != QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
 qemu_co_mutex_unlock(>lock);
@@ -3936,6 +3941,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
 switch (type) {
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 if (bs->backing && bs->backing->bs) {
 int64_t backing_length = bdrv_getlength(bs->backing->bs);
 if (src_offset >= backing_length) {
-- 
2.20.1




[PATCH v11 04/34] qcow2: Split cluster_needs_cow() out of count_cow_clusters()

2020-07-10 Thread Alberto Garcia
We are going to need it in other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 360da3f6b2..543f515c81 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1088,6 +1088,24 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+switch (qcow2_get_cluster_type(bs, l2_entry)) {
+case QCOW2_CLUSTER_NORMAL:
+if (l2_entry & QCOW_OFLAG_COPIED) {
+return false;
+}
+case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_COMPRESSED:
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return true;
+default:
+abort();
+}
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1100,25 +1118,11 @@ static int count_cow_clusters(BlockDriverState *bs, int 
nb_clusters,
 
 for (i = 0; i < nb_clusters; i++) {
 uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-switch(cluster_type) {
-case QCOW2_CLUSTER_NORMAL:
-if (l2_entry & QCOW_OFLAG_COPIED) {
-goto out;
-}
+if (!cluster_needs_cow(bs, l2_entry)) {
 break;
-case QCOW2_CLUSTER_UNALLOCATED:
-case QCOW2_CLUSTER_COMPRESSED:
-case QCOW2_CLUSTER_ZERO_PLAIN:
-case QCOW2_CLUSTER_ZERO_ALLOC:
-break;
-default:
-abort();
 }
 }
 
-out:
 assert(i <= nb_clusters);
 return i;
 }
-- 
2.20.1




[PATCH v11 12/34] qcow2: Add l2_entry_size()

2020-07-10 Thread Alberto Garcia
qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.h  |  9 +
 block/qcow2-cluster.c  | 12 ++--
 block/qcow2-refcount.c | 14 --
 block/qcow2.c  |  8 
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4fe31adfd3..46b351229a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,10 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* Size of normal and extended L2 entries */
+#define L2E_SIZE_NORMAL   (sizeof(uint64_t))
+#define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -521,6 +525,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
 return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+return has_subclusters(s) ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b501642c56..9ba60f3fde 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -208,7 +208,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
uint64_t l2_offset, uint64_t **l2_slice)
 {
 BDRVQcow2State *s = bs->opaque;
-int start_of_slice = sizeof(uint64_t) *
+int start_of_slice = l2_entry_size(s) *
 (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
 return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -281,7 +281,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new l2 entry */
 
-l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
 if (l2_offset < 0) {
 ret = l2_offset;
 goto fail;
@@ -305,7 +305,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new entry in the l2 cache */
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -369,7 +369,7 @@ fail:
 }
 s->l1_table[l1_index] = old_l2_offset;
 if (l2_offset > 0) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_ALWAYS);
 }
 return ret;
@@ -717,7 +717,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t 
offset,
 
 /* Then decrease the refcount of the old table */
 if (l2_offset) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_OTHER);
 }
 
@@ -1914,7 +1914,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 04546838e8..770c5dbc83 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1254,7 +1254,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 l2_slice = NULL;
 l1_table = NULL;
 l1_size2 = l1_size * sizeof(uint64_t);
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 s->cache_discards = true;
@@ -1605,7 +1605,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int i, l2_size, nb_csectors, ret;
 
 /* Read L2 table from disk */
-l2_size = s->l2_size * sizeof(uint64_t);
+l2_size = s->l2_size * l2_entry_size(s);
 l2_table = g_malloc(l2_size);
 
 ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1680,15 +1680,16 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
 offset);
 if (fix & BDRV_FIX_ERRORS) {
+int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
 uint64_t l2e_offset =
-l2_offset + 

[PATCH v11 24/34] qcow2: Add subcluster support to check_refcounts_l2()

2020-07-10 Thread Alberto Garcia
The offset field of an uncompressed cluster's L2 entry must be aligned
to the cluster size, otherwise it is invalid. If the cluster has no
data then it means that the offset points to a preallocation, so we
can clear the offset field without affecting the guest-visible data.
This is what 'qemu-img check' does when run in repair mode.

On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO
is set, and repairing such entries turns the clusters from ZERO_ALLOC
into ZERO_PLAIN.

Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO
but the idea is the same: if none of the subclusters are allocated
then we can clear the offset field and leave the bitmap untouched.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-refcount.c | 16 +++-
 tests/qemu-iotests/060.out |  2 +-
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 770c5dbc83..aae52607eb 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1669,12 +1669,18 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 
 /* Correct offsets are cluster aligned */
 if (offset_into_cluster(s, offset)) {
+bool contains_data;
 res->corruptions++;
 
-if (qcow2_get_cluster_type(bs, l2_entry) ==
-QCOW2_CLUSTER_ZERO_ALLOC)
-{
-fprintf(stderr, "%s offset=%" PRIx64 ": Preallocated zero "
+if (has_subclusters(s)) {
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, i);
+contains_data = (l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC);
+} else {
+contains_data = !(l2_entry & QCOW_OFLAG_ZERO);
+}
+
+if (!contains_data) {
+fprintf(stderr, "%s offset=%" PRIx64 ": Preallocated "
 "cluster is not properly aligned; L2 entry "
 "corrupted.\n",
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
@@ -1686,7 +1692,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
-l2_entry = QCOW_OFLAG_ZERO;
+l2_entry = has_subclusters(s) ? 0 : QCOW_OFLAG_ZERO;
 set_l2_entry(s, l2_table, i, l2_entry);
 ret = qcow2_pre_write_overlap_check(bs, ign,
 l2e_offset, l2_entry_size(s), false);
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index be5f8707a3..fa3d68f0df 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -320,7 +320,7 @@ discard 65536/65536 bytes at offset 0
 qcow2: Marking image as corrupt: Preallocated zero cluster offset 0x2a00 
unaligned (guest offset: 0); further corruption events will be suppressed
 write failed: Input/output error
 --- Repairing ---
-Repairing offset=2a00: Preallocated zero cluster is not properly aligned; L2 
entry corrupted.
+Repairing offset=2a00: Preallocated cluster is not properly aligned; L2 entry 
corrupted.
 The following inconsistencies were found and repaired:
 
 0 leaked clusters
-- 
2.20.1




[PATCH v11 22/34] qcow2: Add subcluster support to zero_in_l2_slice()

2020-07-10 Thread Alberto Garcia
The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as
zeroes is only used in standard L2 entries. Extended L2 entries use
individual 'all zeroes' bits for each subcluster.

This must be taken into account when updating the L2 entry and also
when deciding that an existing entry does not need to be updated.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 492296a90f..cb07fd00a1 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1957,7 +1957,6 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 int l2_index;
 int ret;
 int i;
-bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
 
 ret = get_cluster_table(bs, offset, _slice, _index);
 if (ret < 0) {
@@ -1969,28 +1968,31 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_offset;
-QCow2ClusterType cluster_type;
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
+bool unmap = (type == QCOW2_CLUSTER_COMPRESSED) ||
+((flags & BDRV_REQ_MAY_UNMAP) && qcow2_cluster_is_allocated(type));
+uint64_t new_l2_entry = unmap ? 0 : old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
 
-old_offset = get_l2_entry(s, l2_slice, l2_index + i);
+if (has_subclusters(s)) {
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry |= QCOW_OFLAG_ZERO;
+}
 
-/*
- * Minimize L2 changes if the cluster already reads back as
- * zeroes with correct allocation.
- */
-cluster_type = qcow2_get_cluster_type(bs, old_offset);
-if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-(cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
 continue;
 }
 
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
-} else {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+if (unmap) {
+qcow2_free_any_clusters(bs, old_l2_entry, 1, 
QCOW2_DISCARD_REQUEST);
+}
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
 }
 
-- 
2.20.1




[PATCH v11 32/34] qcow2: Allow preallocation and backing files if extended_l2 is set

2020-07-10 Thread Alberto Garcia
Traditional qcow2 images don't allow preallocation if a backing file
is set. This is because once a cluster is allocated there is no way to
tell that its data should be read from the backing file.

Extended L2 entries have individual allocation bits for each
subcluster, and therefore it is perfectly possible to have an
allocated cluster with all its subclusters unallocated.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c  | 7 ---
 tests/qemu-iotests/206.out | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b7a1cae39c..806b194761 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3449,10 +3449,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 qcow2_opts->preallocation = PREALLOC_MODE_OFF;
 }
 if (qcow2_opts->has_backing_file &&
-qcow2_opts->preallocation != PREALLOC_MODE_OFF)
+qcow2_opts->preallocation != PREALLOC_MODE_OFF &&
+!qcow2_opts->extended_l2)
 {
-error_setg(errp, "Backing file and preallocation cannot be used at "
-   "the same time");
+error_setg(errp, "Backing file and preallocation can only be used at "
+   "the same time if extended_l2 is on");
 ret = -EINVAL;
 goto out;
 }
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index 363c5abe35..a100849fcb 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -203,7 +203,7 @@ Job failed: Different refcount widths than 16 bits require 
compatibility level 1
 === Invalid backing file options ===
 {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": 
{"backing-file": "/dev/null", "driver": "qcow2", "file": "node0", 
"preallocation": "full", "size": 67108864}}}
 {"return": {}}
-Job failed: Backing file and preallocation cannot be used at the same time
+Job failed: Backing file and preallocation can only be used at the same time 
if extended_l2 is on
 {"execute": "job-dismiss", "arguments": {"id": "job0"}}
 {"return": {}}
 
-- 
2.20.1




[PATCH v11 09/34] qcow2: Add subcluster-related fields to BDRVQcow2State

2020-07-10 Thread Alberto Garcia
This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one
subcluster per cluster (i.e. subcluster_size = cluster_size).

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 5 +
 block/qcow2.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 2064dd3d85..eee4c8de9c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -78,6 +78,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -295,6 +297,9 @@ typedef struct BDRVQcow2State {
 int cluster_bits;
 int cluster_size;
 int l2_slice_size;
+int subcluster_bits;
+int subcluster_size;
+int subclusters_per_cluster;
 int l2_bits;
 int l2_size;
 int l1_size;
diff --git a/block/qcow2.c b/block/qcow2.c
index 47f8032f89..e0349e6800 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1444,6 +1444,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 }
 }
 
+s->subclusters_per_cluster =
+has_subclusters(s) ? QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER : 1;
+s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+s->subcluster_bits = ctz32(s->subcluster_size);
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
-- 
2.20.1




[PATCH v11 06/34] qcow2: Add get_l2_entry() and set_l2_entry()

2020-07-10 Thread Alberto Garcia
The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h  | 12 
 block/qcow2-cluster.c  | 63 ++
 block/qcow2-refcount.c | 17 ++--
 3 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 06475e0849..eecbadc4cb 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -510,6 +510,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx)
+{
+return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx, uint64_t entry)
+{
+l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 723a122977..b501642c56 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -383,12 +383,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 QCow2ClusterType first_cluster_type;
 uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
 uint64_t offset = first_entry & mask;
 
 first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -401,7 +402,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, 
int nb_clusters,
first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
 if (offset + (uint64_t) i * cluster_size != l2_entry) {
 break;
 }
@@ -417,14 +418,16 @@ static int count_contiguous_clusters(BlockDriverState 
*bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
  int nb_clusters,
  uint64_t *l2_slice,
+ int l2_index,
  QCow2ClusterType wanted_type)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 
 assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = be64_to_cpu(l2_slice[i]);
+uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
 QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
 if (type != wanted_type) {
@@ -575,7 +578,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-l2_entry = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = get_l2_entry(s, l2_slice, l2_index);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -610,7 +613,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-  _slice[l2_index], type);
+  l2_slice, l2_index, type);
 break;
 case QCOW2_CLUSTER_ZERO_ALLOC:
 case QCOW2_CLUSTER_NORMAL: {
@@ -618,7 +621,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 *host_offset = host_cluster_offset + offset_in_cluster;
 /* how many allocated clusters ? */
 c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-  _slice[l2_index], QCOW_OFLAG_ZERO);
+  l2_slice, l2_index, QCOW_OFLAG_ZERO);
 if (offset_into_cluster(s, host_cluster_offset)) {
 qcow2_signal_corruption(bs, true, -1, -1,
 "Cluster allocation 

[PATCH v11 05/34] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2020-07-10 Thread Alberto Garcia
When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
 allocated and have a reference count of 1. In those clusters we
 can simply write the data and there is no need to perform any
 copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
 either because they haven't been allocated yet, because their
 reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
 which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
 reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
 allocation (all other cases).

One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 256 +++---
 1 file changed, 141 insertions(+), 115 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 543f515c81..723a122977 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1040,13 +1040,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure, add
- * it to @m and the BDRVQcow2State.cluster_allocs list.
+ * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
+ * request does not need copy-on-write or changes to the L2 metadata
+ * then this function does nothing.
  *
  * @host_cluster_offset points to the beginning of the first cluster.
  *
  * @guest_offset and @bytes indicate the offset and length of the
  * request.
  *
+ * @l2_slice contains the L2 entries of all clusters involved in this
+ * write request.
+ *
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
@@ -1054,15 +1059,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 static void calculate_l2_meta(BlockDriverState *bs,
   uint64_t host_cluster_offset,
   uint64_t guest_offset, unsigned bytes,
-  QCowL2Meta **m, bool keep_old)
+  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-unsigned cow_start_from = 0;
+int l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry;
+unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
-unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
+QCow2ClusterType type;
+
+assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+/* Return if there's no COW (all clusters are normal and we keep them) */
+if (keep_old) {
+int i;
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+break;
+}
+}
+if (i == nb_clusters) {
+return;
+}
+}
+
+/* Get the L2 entry of the first cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_start_from = cow_start_to;
+} else {
+cow_start_from = 0;
+}
+
+/* Get the L2 entry of the last cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_end_to = cow_end_from;
+} else {
+cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+}
 
 *m = g_malloc0(sizeof(**m));
 **m = (QCowL2Meta) {
@@ -1088,18 +1131,22 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
-/* 

[PATCH v11 07/34] qcow2: Document the Extended L2 Entries feature

2020-07-10 Thread Alberto Garcia
Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 docs/interop/qcow2.txt | 68 --
 docs/qcow2-cache.txt   | 19 +++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index cb723463f2..64e9345fb4 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -42,6 +42,9 @@ The first cluster of a qcow2 image contains the file header:
 as the maximum cluster size and won't be able to open 
images
 with larger cluster sizes.
 
+Note: if the image has Extended L2 Entries then 
cluster_bits
+must be at least 14 (i.e. 16384 byte clusters).
+
  24 - 31:   size
 Virtual disk size in bytes.
 
@@ -117,7 +120,12 @@ the next fields through header_length.
 clusters. The compression_type field must be
 present and not zero.
 
-Bits 4-63:  Reserved (set to 0)
+Bit 4:  Extended L2 Entries.  If this bit is set then
+L2 table entries use an extended format that
+allows subcluster-based allocation. See the
+Extended L2 Entries section for more details.
+
+Bits 5-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -498,7 +506,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-l2_entries = (cluster_size / sizeof(uint64_t))
+l2_entries = (cluster_size / sizeof(uint64_t))[*]
 
 l2_index = (offset / cluster_size) % l2_entries
 l1_index = (offset / cluster_size) / l2_entries
@@ -508,6 +516,8 @@ obtained as follows:
 
 return cluster_offset + (offset % cluster_size)
 
+[*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
 Bit  0 -  8:Reserved (set to 0)
@@ -548,7 +558,8 @@ Standard Cluster Descriptor:
 nor is data read from the backing file if the cluster is
 unallocated.
 
-With version 2, this is always 0.
+With version 2 or with extended L2 entries (see the next
+section), this is always 0.
 
  1 -  8:Reserved (set to 0)
 
@@ -585,6 +596,57 @@ file (except if bit 0 in the Standard Cluster Descriptor 
is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains 
information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated the same as in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+Bit  0 - 31:Allocation status (one bit per subcluster)
+
+1: the subcluster is allocated. In this case the
+   host cluster offset field must contain a valid
+   offset.
+0: the subcluster is not allocated. In this case
+   read requests shall go to the backing file or
+   return zeros if there is no backing file data.
+
+Bits are assigned starting from the least significant
+one (i.e. bit x is used for subcluster x).
+
+32 - 63 Subcluster reads as zeros (one bit per subcluster)
+
+1: the subcluster reads as zeros. In this case the
+   allocation status bit must be unset. The host
+   

[PATCH v11 18/34] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*

2020-07-10 Thread Alberto Garcia
In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_host_offset() is modified to return the subcluster type
instead of the cluster type, and all callers are updated to replace
all values of QCow2ClusterType with their QCow2SubclusterType
equivalents.

This patch only changes the data types, there are no semantic changes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  2 +-
 block/qcow2-cluster.c | 10 +++
 block/qcow2.c | 70 ++-
 3 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 74f65793bd..5df761edc3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -894,7 +894,7 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type);
+  QCow2SubclusterType *subcluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8e1e928852..dd8a72381b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -564,15 +564,15 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  * offset that we are interested in.
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
- * cluster type and (if applicable) are stored contiguously in the image file.
- * The cluster type is stored in *cluster_type.
- * Compressed clusters are always returned one by one.
+ * subcluster type and (if applicable) are stored contiguously in the image
+ * file. The subcluster type is stored in *subcluster_type.
+ * Compressed clusters are always processed one by one.
  *
  * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type)
+  QCow2SubclusterType *subcluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -714,7 +714,7 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-*cluster_type = type;
+*subcluster_type = qcow2_cluster_to_subcluster_type(type);
 
 return 0;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 5122b71cf2..6fbc73ff19 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2043,7 +2043,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
-QCow2ClusterType type;
+QCow2SubclusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
@@ -2063,15 +2063,16 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
-!s->crypto) {
+if ((type == QCOW2_SUBCLUSTER_NORMAL ||
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2168,7 +2169,7 @@ typedef struct Qcow2AioTask {
 AioTask task;
 
 BlockDriverState *bs;
-QCow2ClusterType cluster_type; /* only for read */
+QCow2SubclusterType subcluster_type; /* only for read */
 uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
@@ -2181,7 +2182,7 @@ static coroutine_fn int 
qcow2_co_preadv_task_entry(AioTask *task);
 static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
AioTaskPool *pool,
AioTaskFunc func,
-   QCow2ClusterType cluster_type,
+   QCow2SubclusterType subcluster_type,
uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
@@ -2195,7 

[PATCH v11 11/34] qcow2: Add offset_into_subcluster() and size_to_subclusters()

2020-07-10 Thread Alberto Garcia
Like offset_into_cluster() and size_to_clusters(), but for
subclusters.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 2503374677..4fe31adfd3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -555,11 +555,21 @@ static inline int64_t offset_into_cluster(BDRVQcow2State 
*s, int64_t offset)
 return offset & (s->cluster_size - 1);
 }
 
+static inline int64_t offset_into_subcluster(BDRVQcow2State *s, int64_t offset)
+{
+return offset & (s->subcluster_size - 1);
+}
+
 static inline uint64_t size_to_clusters(BDRVQcow2State *s, uint64_t size)
 {
 return (size + (s->cluster_size - 1)) >> s->cluster_bits;
 }
 
+static inline uint64_t size_to_subclusters(BDRVQcow2State *s, uint64_t size)
+{
+return (size + (s->subcluster_size - 1)) >> s->subcluster_bits;
+}
+
 static inline int64_t size_to_l1(BDRVQcow2State *s, int64_t size)
 {
 int shift = s->cluster_bits + s->l2_bits;
-- 
2.20.1




[PATCH v11 00/34] Add subcluster allocation to qcow2

2020-07-10 Thread Alberto Garcia
Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

This version is rebased on top of the latest master (f2a1cf9180),
fixes the relevant conflicts (particularly after df373fb0a3) and
updates the test expectations.

Berto

v11:
- Patch 31: Fix rebase conflicts after df373fb0a3, update test
expectations.
- Patch 34: Update test expectations.

v10: https://lists.gnu.org/archive/html/qemu-block/2020-07/msg00328.html
v9: https://lists.gnu.org/archive/html/qemu-block/2020-06/msg01526.html
v8: https://lists.gnu.org/archive/html/qemu-block/2020-06/msg00546.html
v7: https://lists.gnu.org/archive/html/qemu-block/2020-05/msg01683.html
v6: https://lists.gnu.org/archive/html/qemu-block/2020-05/msg01583.html
v5: https://lists.gnu.org/archive/html/qemu-block/2020-05/msg00251.html
v4: https://lists.gnu.org/archive/html/qemu-block/2020-03/msg00966.html
v3: https://lists.gnu.org/archive/html/qemu-block/2019-12/msg00587.html
v2: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01642.html
v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v10:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/34:[] [--] 'qcow2: Make Qcow2AioTask store the full host offset'
002/34:[] [--] 'qcow2: Convert qcow2_get_cluster_offset() into 
qcow2_get_host_offset()'
003/34:[] [--] 'qcow2: Add calculate_l2_meta()'
004/34:[] [--] 'qcow2: Split cluster_needs_cow() out of 
count_cow_clusters()'
005/34:[] [--] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in 
handle_copied()'
006/34:[] [--] 'qcow2: Add get_l2_entry() and set_l2_entry()'
007/34:[] [--] 'qcow2: Document the Extended L2 Entries feature'
008/34:[] [--] 'qcow2: Add dummy has_subclusters() function'
009/34:[] [--] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
010/34:[] [--] 'qcow2: Add offset_to_sc_index()'
011/34:[] [--] 'qcow2: Add offset_into_subcluster() and 
size_to_subclusters()'
012/34:[] [--] 'qcow2: Add l2_entry_size()'
013/34:[] [--] 'qcow2: Update get/set_l2_entry() and add 
get/set_l2_bitmap()'
014/34:[] [--] 'qcow2: Add QCow2SubclusterType and 
qcow2_get_subcluster_type()'
015/34:[] [--] 'qcow2: Add qcow2_get_subcluster_range_type()'
016/34:[] [--] 'qcow2: Add qcow2_cluster_is_allocated()'
017/34:[] [--] 'qcow2: Add cluster type parameter to 
qcow2_get_host_offset()'
018/34:[] [--] 'qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*'
019/34:[] [--] 'qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC'
020/34:[] [--] 'qcow2: Add subcluster support to calculate_l2_meta()'
021/34:[] [--] 'qcow2: Add subcluster support to qcow2_get_host_offset()'
022/34:[] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
023/34:[] [--] 'qcow2: Add subcluster support to discard_in_l2_slice()'
024/34:[] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
025/34:[] [--] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
026/34:[] [--] 'qcow2: Clear the L2 bitmap when allocating a compressed 
cluster'
027/34:[] [--] 'qcow2: Add subcluster support to handle_alloc_space()'
028/34:[] [--] 'qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()'
029/34:[] [--] 'qcow2: Add subcluster support to qcow2_measure()'
030/34:[] [--] 'qcow2: Add prealloc field to QCowL2Meta'
031/34:[0493] [FC] 'qcow2: Add the 'extended_l2' option and the 
QCOW2_INCOMPAT_EXTL2 bit'
032/34:[] [--] 'qcow2: Allow preallocation and backing files if extended_l2 
is set'
033/34:[] [--] 'qcow2: Assert that expand_zero_clusters_in_l1() does not 
support subclusters'
034/34:[0006] [FC] 'iotests: Add tests for qcow2 images with extended L2 
entries'

Alberto Garcia (34):
  qcow2: Make Qcow2AioTask store the full host offset
  qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: Add offset_to_sc_index()
  qcow2: Add offset_into_subcluster() and size_to_subclusters()
  qcow2: Add l2_entry_size()
  qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  qcow2: Add qcow2_get_subcluster_range_type()
  qcow2: Add qcow2_cluster_is_allocated()
  qcow2: Add cluster type 

[PATCH v11 10/34] qcow2: Add offset_to_sc_index()

2020-07-10 Thread Alberto Garcia
For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index eee4c8de9c..2503374677 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -581,6 +581,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State 
*s, int64_t offset)
 return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
 return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1




[PATCH v11 14/34] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()

2020-07-10 Thread Alberto Garcia
This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
the same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
 compressed. We do not support compression at the subcluster
 level.

  b) There are two different values for unallocated subclusters:
 QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
 cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
 which means that the cluster is allocated but the subcluster is
 not. The latter can only happen in images with extended L2
 entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
 entry has a value that violates the specification. The caller is
 responsible for handling these situations.

 To prevent compatibility problems with images that have invalid
 values but are currently being read by QEMU without causing side
 effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
 with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.h | 126 +-
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 82b86f6cec..3aec6f452a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,21 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)(QCOW_OFLAG_SUB_ALLOC(X) << 32)
+/* Subclusters [X, Y) (0 <= X <= Y <= 32) are allocated */
+#define QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC(Y) - QCOW_OFLAG_SUB_ALLOC(X))
+/* Subclusters [X, Y) (0 <= X <= Y <= 32) read as zeroes */
+#define QCOW_OFLAG_SUB_ZERO_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) << 32)
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  (QCOW_OFLAG_SUB_ALLOC_RANGE(0, 32))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES (QCOW_OFLAG_SUB_ZERO_RANGE(0, 32))
+
 /* Size of normal and extended L2 entries */
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
@@ -462,6 +477,33 @@ typedef struct QCowL2Meta
 QLIST_ENTRY(QCowL2Meta) next_in_flight;
 } QCowL2Meta;
 
+/*
+ * In images with standard L2 entries all clusters are treated as if
+ * they had one subcluster so QCow2ClusterType and QCow2SubclusterType
+ * can be mapped to each other and have the exact same meaning
+ * (QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC cannot happen in these images).
+ *
+ * In images with extended L2 entries QCow2ClusterType refers to the
+ * complete cluster and QCow2SubclusterType to each of the individual
+ * subclusters, so there are several possible combinations:
+ *
+ * |--+---|
+ * | Cluster type | Possible subcluster types |
+ * |--+---|
+ * | UNALLOCATED  | UNALLOCATED_PLAIN |
+ * |  |ZERO_PLAIN |
+ * |--+---|
+ * | NORMAL   | UNALLOCATED_ALLOC |
+ * |  |ZERO_ALLOC |
+ * |  |NORMAL |
+ * |--+---|
+ * | COMPRESSED   |COMPRESSED |
+ * |--+---|
+ *
+ * QCOW2_SUBCLUSTER_INVALID means that the L2 entry is incorrect and
+ * the image should be marked corrupt.
+ */
+
 typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_UNALLOCATED,
 QCOW2_CLUSTER_ZERO_PLAIN,
@@ -470,6 +512,16 @@ typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_COMPRESSED,
 } QCow2ClusterType;
 
+typedef enum QCow2SubclusterType {
+QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
+QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
+QCOW2_SUBCLUSTER_ZERO_PLAIN,
+QCOW2_SUBCLUSTER_ZERO_ALLOC,
+QCOW2_SUBCLUSTER_NORMAL,
+QCOW2_SUBCLUSTER_COMPRESSED,
+QCOW2_SUBCLUSTER_INVALID,
+} QCow2SubclusterType;
+
 typedef enum 

[PATCH v11 20/34] qcow2: Add subcluster support to calculate_l2_meta()

2020-07-10 Thread Alberto Garcia
If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

1) If we are writing to a newly allocated cluster then we need
   copy-on-write. The previous contents of subclusters #0 to #3 must
   be copied to the new cluster. We can optimize this process by
   skipping all leading unallocated or zero subclusters (the status of
   those skipped subclusters will be reflected in the new L2 bitmap).

2) If we are overwriting an existing cluster:

   2.1) If subcluster #3 is unallocated or has the all-zeroes bit set
then we need copy-on-write (on subcluster #3 only).

   2.2) If subcluster #3 was already allocated then there is no need
for any copy-on-write. However we still need to update the L2
bitmap to reflect possible changes in the allocation status of
subclusters #4 to #31. Because of this, this function checks
if all the overwritten subclusters are already allocated and
in this case it returns without creating a new QCowL2Meta
structure.

After all these changes l2meta_cow_start() and l2meta_cow_end()
are not necessarily cluster-aligned anymore. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write on the same cluster.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 163 +-
 1 file changed, 131 insertions(+), 32 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dd8a72381b..01495cb1c0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -387,7 +387,6 @@ fail:
  * If the L2 entry is invalid return -errno and set @type to
  * QCOW2_SUBCLUSTER_INVALID.
  */
-G_GNUC_UNUSED
 static int qcow2_get_subcluster_range_type(BlockDriverState *bs,
uint64_t l2_entry,
uint64_t l2_bitmap,
@@ -,56 +1110,148 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 0 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-  uint64_t host_cluster_offset,
-  uint64_t guest_offset, unsigned bytes,
-  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t 
host_cluster_offset,
+ uint64_t guest_offset, unsigned bytes,
+ uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-int l2_index = offset_to_l2_slice_index(s, guest_offset);
-uint64_t l2_entry;
+int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry, l2_bitmap;
 unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
-QCow2ClusterType type;
+QCow2SubclusterType type;
+int i;
+bool skip_cow = keep_old;
 
 assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-/* Return if there's no COW (all clusters are normal and we keep them) */
-if (keep_old) {
-int i;
-for (i = 0; i < nb_clusters; i++) {
-l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
-break;
+/* Check the type of all affected subclusters */
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+if (skip_cow) {
+unsigned write_from = MAX(cow_start_to, i << s->cluster_bits);
+unsigned write_to = MIN(cow_end_from, (i + 1) << s->cluster_bits);
+int first_sc = offset_to_sc_index(s, write_from);
+int last_sc = offset_to_sc_index(s, write_to - 1);
+int cnt = qcow2_get_subcluster_range_type(bs, l2_entry, l2_bitmap,
+  first_sc, );
+/* Is any of the subclusters of type != QCOW2_SUBCLUSTER_NORMAL ? 
*/
+if (type != QCOW2_SUBCLUSTER_NORMAL || first_sc + cnt <= last_sc) {
+skip_cow = false;
 }
+} else {
+/* If we can't skip the cow we can still look for invalid entries 
*/
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, 0);
 

[PATCH v11 03/34] qcow2: Add calculate_l2_meta()

2020-07-10 Thread Alberto Garcia
handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 77 +--
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 2d9ae6481c..360da3f6b2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1038,6 +1038,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 }
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure, add
+ * it to @m and the BDRVQcow2State.cluster_allocs list.
+ *
+ * @host_cluster_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs,
+  uint64_t host_cluster_offset,
+  uint64_t guest_offset, unsigned bytes,
+  QCowL2Meta **m, bool keep_old)
+{
+BDRVQcow2State *s = bs->opaque;
+unsigned cow_start_from = 0;
+unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+unsigned cow_end_from = cow_start_to + bytes;
+unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+QCowL2Meta *old_m = *m;
+
+*m = g_malloc0(sizeof(**m));
+**m = (QCowL2Meta) {
+.next   = old_m,
+
+.alloc_offset   = host_cluster_offset,
+.offset = start_of_cluster(s, guest_offset),
+.nb_clusters= nb_clusters,
+
+.keep_old_clusters = keep_old,
+
+.cow_start = {
+.offset = cow_start_from,
+.nb_bytes   = cow_start_to - cow_start_from,
+},
+.cow_end = {
+.offset = cow_end_from,
+.nb_bytes   = cow_end_to - cow_end_from,
+},
+};
+
+qemu_co_queue_init(&(*m)->dependent_requests);
+QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1436,35 +1486,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
 int avail_bytes = nb_clusters << s->cluster_bits;
 int nb_bytes = MIN(requested_bytes, avail_bytes);
-QCowL2Meta *old_m = *m;
-
-*m = g_malloc0(sizeof(**m));
-
-**m = (QCowL2Meta) {
-.next   = old_m,
-
-.alloc_offset   = alloc_cluster_offset,
-.offset = start_of_cluster(s, guest_offset),
-.nb_clusters= nb_clusters,
-
-.keep_old_clusters  = keep_old_clusters,
-
-.cow_start = {
-.offset = 0,
-.nb_bytes   = offset_into_cluster(s, guest_offset),
-},
-.cow_end = {
-.offset = nb_bytes,
-.nb_bytes   = avail_bytes - nb_bytes,
-},
-};
-qemu_co_queue_init(&(*m)->dependent_requests);
-QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 
 *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
 *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
 assert(*bytes != 0);
 
+calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+  m, keep_old_clusters);
+
 return 1;
 
 fail:
-- 
2.20.1




[PATCH v11 15/34] qcow2: Add qcow2_get_subcluster_range_type()

2020-07-10 Thread Alberto Garcia
There are situations in which we want to know how many contiguous
subclusters of the same type there are in a given cluster. This can be
done by simply iterating over the subclusters and repeatedly calling
qcow2_get_subcluster_type() for each one of them.

However once we determined the type of a subcluster we can check the
rest efficiently by counting the number of adjacent ones (or zeroes)
in the bitmap. This is what this function does.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9ba60f3fde..751906c330 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -375,6 +375,57 @@ fail:
 return ret;
 }
 
+/*
+ * For a given L2 entry, count the number of contiguous subclusters of
+ * the same type starting from @sc_from. Compressed clusters are
+ * treated as if they were divided into subclusters of size
+ * s->subcluster_size.
+ *
+ * Return the number of contiguous subclusters and set @type to the
+ * subcluster type.
+ *
+ * If the L2 entry is invalid return -errno and set @type to
+ * QCOW2_SUBCLUSTER_INVALID.
+ */
+G_GNUC_UNUSED
+static int qcow2_get_subcluster_range_type(BlockDriverState *bs,
+   uint64_t l2_entry,
+   uint64_t l2_bitmap,
+   unsigned sc_from,
+   QCow2SubclusterType *type)
+{
+BDRVQcow2State *s = bs->opaque;
+uint32_t val;
+
+*type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_from);
+
+if (*type == QCOW2_SUBCLUSTER_INVALID) {
+return -EINVAL;
+} else if (!has_subclusters(s) || *type == QCOW2_SUBCLUSTER_COMPRESSED) {
+return s->subclusters_per_cluster - sc_from;
+}
+
+switch (*type) {
+case QCOW2_SUBCLUSTER_NORMAL:
+val = l2_bitmap | QCOW_OFLAG_SUB_ALLOC_RANGE(0, sc_from);
+return cto32(val) - sc_from;
+
+case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+val = (l2_bitmap | QCOW_OFLAG_SUB_ZERO_RANGE(0, sc_from)) >> 32;
+return cto32(val) - sc_from;
+
+case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+val = ((l2_bitmap >> 32) | l2_bitmap)
+& ~QCOW_OFLAG_SUB_ALLOC_RANGE(0, sc_from);
+return ctz32(val) - sc_from;
+
+default:
+g_assert_not_reached();
+}
+}
+
 /*
  * Checks how many clusters in a given L2 slice are contiguous in the image
  * file. As soon as one of the flags in the bitmask stop_flags changes compared
-- 
2.20.1




[PATCH v11 01/34] qcow2: Make Qcow2AioTask store the full host offset

2020-07-10 Thread Alberto Garcia
The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
host offset. In practice this is not very useful because all users(*)
of this structure need the final host offset into the cluster, which
they calculate using

   host_offset = file_cluster_offset + offset_into_cluster(s, offset)

There is no reason why Qcow2AioTask cannot store host_offset directly
and that is what this patch does.

(*) compressed clusters are the exception: in this case what
file_cluster_offset was storing was the full compressed cluster
descriptor (offset + size). This does not change with this patch
but it is documented now.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c  | 69 ++
 block/trace-events |  2 +-
 2 files changed, 34 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ea33673c55..d19f5927ac 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -74,7 +74,7 @@ typedef struct {
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t cluster_descriptor,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2113,7 +2113,7 @@ out:
 
 static coroutine_fn int
 qcow2_co_preadv_encrypted(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2140,16 +2140,12 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-ret = bdrv_co_pread(s->data_file,
-file_cluster_offset + offset_into_cluster(s, offset),
-bytes, buf, 0);
+ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);
 if (ret < 0) {
 goto fail;
 }
 
-if (qcow2_co_decrypt(bs,
- file_cluster_offset + offset_into_cluster(s, offset),
- offset, buf, bytes) < 0)
+if (qcow2_co_decrypt(bs, host_offset, offset, buf, bytes) < 0)
 {
 ret = -EIO;
 goto fail;
@@ -2167,7 +2163,7 @@ typedef struct Qcow2AioTask {
 
 BlockDriverState *bs;
 QCow2ClusterType cluster_type; /* only for read */
-uint64_t file_cluster_offset;
+uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
 QEMUIOVector *qiov;
@@ -2180,7 +2176,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
AioTaskPool *pool,
AioTaskFunc func,
QCow2ClusterType cluster_type,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2195,7 +2191,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 .bs = bs,
 .cluster_type = cluster_type,
 .qiov = qiov,
-.file_cluster_offset = file_cluster_offset,
+.host_offset = host_offset,
 .offset = offset,
 .bytes = bytes,
 .qiov_offset = qiov_offset,
@@ -2204,7 +2200,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 
 trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
  func == qcow2_co_preadv_task_entry ? "read" : "write",
- cluster_type, file_cluster_offset, offset, bytes,
+ cluster_type, host_offset, offset, bytes,
  qiov, qiov_offset);
 
 if (!pool) {
@@ -2218,13 +2214,12 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
  QCow2ClusterType cluster_type,
- uint64_t file_cluster_offset,
+ uint64_t host_offset,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov,
  size_t qiov_offset)
 {
 BDRVQcow2State *s = bs->opaque;
-int offset_in_cluster = offset_into_cluster(s, offset);
 
 switch (cluster_type) {
 case QCOW2_CLUSTER_ZERO_PLAIN:
@@ -2240,19 +2235,17 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
qiov, 

[PATCH v11 08/34] qcow2: Add dummy has_subclusters() function

2020-07-10 Thread Alberto Garcia
This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index eecbadc4cb..2064dd3d85 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -510,6 +510,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+/* FIXME: Return false until this feature is complete */
+return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
-- 
2.20.1




[PATCH v11 02/34] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()

2020-07-10 Thread Alberto Garcia
qcow2_get_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

There is a pre-existing exception with compressed clusters: in this
case the function returns the complete cluster descriptor (containing
the offset and size of the compressed data). This does not change with
this patch but it is now documented.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.h |  4 ++--
 block/qcow2-cluster.c | 41 +++--
 block/qcow2.c | 24 +++-
 3 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 7ce2c23bdb..06475e0849 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -694,8 +694,8 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int 
l1_index);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error 
**errp);
 
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset);
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4b5fc8c4a7..2d9ae6481c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -496,10 +496,15 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
 
 
 /*
- * get_cluster_offset
+ * get_host_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk find the equivalent host
+ * offset in the qcow2 file and store it in *host_offset. Neither
+ * offset needs to be aligned to a cluster boundary.
+ *
+ * If the cluster is unallocated then *host_offset will be 0.
+ * If the cluster is compressed then *host_offset will contain the
+ * complete compressed cluster descriptor.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
@@ -511,12 +516,12 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
  * cases.
  */
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset)
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
-uint64_t l1_index, l2_offset, *l2_slice;
+uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
 int c;
 unsigned int offset_in_cluster;
 uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -537,7 +542,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 bytes_needed = bytes_available;
 }
 
-*cluster_offset = 0;
+*host_offset = 0;
 
 /* seek to the l2 offset in the l1 table */
 
@@ -570,7 +575,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -578,7 +583,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
  * true */
 assert(nb_clusters <= INT_MAX);
 
-type = qcow2_get_cluster_type(bs, *cluster_offset);
+type = qcow2_get_cluster_type(bs, l2_entry);
 if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
 qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -599,42 +604,42 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 }
 /* Compressed clusters can only be processed one by one */
 c = 1;
-*cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
+*host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
 break;
 case QCOW2_CLUSTER_ZERO_PLAIN:
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = 

Re: [PATCH for-5.1] file-posix: Mitigate file fragmentation with extent size hints

2020-07-10 Thread Max Reitz
On 07.07.20 18:17, Kevin Wolf wrote:
> Am 07.07.2020 um 16:23 hat Kevin Wolf geschrieben:
>> Espeically when O_DIRECT is used with image files so that the page cache
>> indirection can't cause a merge of allocating requests, the file will
>> fragment on the file system layer, with a potentially very small
>> fragment size (this depends on the requests the guest sent).
>>
>> On Linux, fragmentation can be reduced by setting an extent size hint
>> when creating the file (at least on XFS, it can't be set any more after
>> the first extent has been allocated), basically giving raw files a
>> "cluster size" for allocation.
>>
>> This adds an create option to set the extent size hint, and changes the
>> default from not setting a hint to setting it to 1 MB. The main reason
>> why qcow2 defaults to smaller cluster sizes is that COW becomes more
>> expensive, which is not an issue with raw files, so we can choose a
>> larger file. The tradeoff here is only potentially wasted disk space.
>>
>> For qcow2 (or other image formats) over file-posix, the advantage should
>> even be greater because they grow sequentially without leaving holes, so
>> there won't be wasted space. Setting even larger extent size hints for
>> such images may make sense. This can be done with the new option, but
>> let's keep the default conservative for now.
>>
>> The effect is very visible with a test that intentionally creates a
>> badly fragmented file with qemu-img bench (the time difference while
>> creating the file is already remarkable) and then looks at the number of
>> extents and the take a simple "qemu-img map" takes.
>>
>> Without an extent size hint:
>>
>> $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
>> Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 
>> extent_size_hint=0
>> $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 100 -S 
>> 8192 -o 0
>> Sending 100 write requests, 4096 bytes each, 64 in parallel 
>> (starting at offset 0, step size 8192)
>> Run completed in 25.848 seconds.
>> $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 100 -S 
>> 8192 -o 4096
>> Sending 100 write requests, 4096 bytes each, 64 in parallel 
>> (starting at offset 4096, step size 8192)
>> Run completed in 19.616 seconds.
>> $ filefrag ~/tmp/test.raw
>> /home/kwolf/tmp/test.raw: 200 extents found
>> $ time ./qemu-img map ~/tmp/test.raw
>> Offset  Length  Mapped to   File
>> 0   0x1e848 0   /home/kwolf/tmp/test.raw
>>
>> real0m1,279s
>> user0m0,043s
>> sys 0m1,226s
>>
>> With the new default extent size hint of 1 MB:
>>
>> $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
>> Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 
>> extent_size_hint=1048576
>> $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 100 -S 
>> 8192 -o 0
>> Sending 100 write requests, 4096 bytes each, 64 in parallel 
>> (starting at offset 0, step size 8192)
>> Run completed in 11.833 seconds.
>> $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 100 -S 
>> 8192 -o 4096
>> Sending 100 write requests, 4096 bytes each, 64 in parallel 
>> (starting at offset 4096, step size 8192)
>> Run completed in 10.155 seconds.
>> $ filefrag ~/tmp/test.raw
>> /home/kwolf/tmp/test.raw: 178 extents found
>> $ time ./qemu-img map ~/tmp/test.raw
>> Offset  Length  Mapped to   File
>> 0   0x1e848 0   /home/kwolf/tmp/test.raw
>>
>> real0m0,061s
>> user0m0,040s
>> sys 0m0,014s
>>
>> Signed-off-by: Kevin Wolf 
> 
> I also need to squash in a few trivial qemu-iotests updates, for which I
> won't send a v2:

The additional specifications in 243 make it print a warning on tmpfs
(because the option doesn’t work there).  I suppose the same may be true
on other filesystems as well.  Should it be filtered out?

Max



signature.asc
Description: OpenPGP digital signature


Re: [PULL v2] Block layer patches

2020-07-10 Thread Eric Blake

On 7/10/20 10:41 AM, Peter Maydell wrote:


   qemu-img: Deprecate use of -b without -F (2020-07-09 14:14:55 +0200)


Block layer patches:

- file-posix: Mitigate file fragmentation with extent size hints
- Tighten qemu-img rules on missing backing format
- qemu-img map: Don't limit block status request size



iotest 114 fails on FreeBSD and OpenBSD (and probably NetBSD
but that build hasn't reported back yet): looks like a
non-portable use of 'truncate' ?




@@ -1,8 +1,11 @@
  QA output created by 114
-qemu-img: warning: Deprecated use of backing file without explicit
backing format (detected format of raw)
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
backing_file=TEST_DIR/t.IMGFMT.orig
+truncate: illegal option -- -
+usage: truncate [-c] -s [+|-|%|/]size[K|k|M|m|G|g|T|t] file ...
+   truncate [-c] -r rfile file ...
+qemu-img: TEST_DIR/t.IMGFMT: Could not open 'TEST_DIR/t.IMGFMT.orig':


Yep, 'truncate' is not POSIX, but we've been lucky that most of our uses 
have been under '_supported_os Linux'; test 272 runs on BSD with 
'truncate -s' instead of 'truncate --size'.  The fix is obvious; I can 
post the followup to squash in if that will help.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] block: Raise an error when backing file parameter is an empty string

2020-07-10 Thread Connor Kuehl

Ping

On 7/1/20 3:42 PM, Connor Kuehl wrote:

Hi Kevin & Max,

Just pinging this patch for your consideration.

Thank you,

Connor

On 6/17/20 11:27 AM, Connor Kuehl wrote:

Providing an empty string for the backing file parameter like so:

qemu-img create -f qcow2 -b '' /tmp/foo

allows the flow of control to reach and subsequently fail an assert
statement because passing an empty string to

bdrv_get_full_backing_filename_from_filename()

simply results in NULL being returned without an error being raised.

To fix this, let's check for an empty string when getting the value from
the opts list.

Reported-by: Attila Fazekas 
Fixes: https://bugzilla.redhat.com/1809553
Signed-off-by: Connor Kuehl 
---
  block.c    |  4 
  tests/qemu-iotests/298 | 47 ++
  tests/qemu-iotests/298.out |  5 
  tests/qemu-iotests/group   |  1 +
  4 files changed, 57 insertions(+)
  create mode 100755 tests/qemu-iotests/298
  create mode 100644 tests/qemu-iotests/298.out

diff --git a/block.c b/block.c
index 6dbcb7e083..b335d6bcb2 100644
--- a/block.c
+++ b/block.c
@@ -6116,6 +6116,10 @@ void bdrv_img_create(const char *filename, 
const char *fmt,

   "same filename as the backing file");
  goto out;
  }
+    if (backing_file[0] == '\0') {
+    error_setg(errp, "Expected backing file name, got empty 
string");

+    goto out;
+    }
  }
  backing_fmt = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
new file mode 100755
index 00..1e30caebec
--- /dev/null
+++ b/tests/qemu-iotests/298
@@ -0,0 +1,47 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2020 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+
+
+# Regression test for avoiding an assertion when the 'backing file'
+# parameter (-b) is set to an empty string for qemu-img create
+#
+#   qemu-img create -f qcow2 -b '' /tmp/foo
+#
+# This ensures the invalid parameter is handled with a user-
+# friendly message instead of a failed assertion.
+
+import iotests
+
+class TestEmptyBackingFilename(iotests.QMPTestCase):
+
+
+    def test_empty_backing_file_name(self):
+    actual = iotests.qemu_img_pipe(
+    'create',
+    '-f', 'qcow2',
+    '-b', '',
+    '/tmp/foo'
+    )
+    expected = 'qemu-img: /tmp/foo: Expected backing file name,' \
+   ' got empty string'
+
+    self.assertEqual(actual.strip(), expected.strip())
+
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['raw', 'qcow2'])
diff --git a/tests/qemu-iotests/298.out b/tests/qemu-iotests/298.out
new file mode 100644
index 00..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/298.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index d886fa0cb3..4bca2d9e05 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -302,3 +302,4 @@
  291 rw quick
  292 rw auto quick
  297 meta
+298 img auto quick








Re: [PULL 03/31] iotests: Make _filter_img_create more active

2020-07-10 Thread John Snow



On 7/6/20 6:04 AM, Max Reitz wrote:
> Right now, _filter_img_create just filters out everything that looks
> format-dependent, and applies some filename filters.  That means that we
> have to add another filter line every time some format gets a new
> creation option.  This can be avoided by instead discarding everything
> and just keeping what we know is format-independent (format, size,
> backing file, encryption information[1], preallocation) or just
> interesting to have in the reference output (external data file path).
> 
> Furthermore, we probably want to sort these options.  Format drivers are
> not required to define them in any specific order, so the output is
> effectively random (although this has never bothered us until now).  We
> need a specific order for our reference outputs, though.  Unfortunately,
> just using a plain "sort" would change a lot of existing reference
> outputs, so we have to pre-filter the option keys to keep our existing
> order (fmt, size, backing*, data, encryption info, preallocation).
> 
> Finally, this makes it difficult for _filter_img_create to automagically
> work for QMP output.  Thus, this patch adds a separate
> _filter_img_create_for_qmp function that echos every line verbatim that
> does not start with "Formatting", and pipes those "Formatting" lines to
> _filter_img_create.
> 
> [1] Actually, the only thing that is really important is whether
> encryption is enabled or not.  A patch by Maxim thus removes all
> other "encrypt.*" options from the output:
> https://lists.nongnu.org/archive/html/qemu-block/2020-06/msg00339.html
> But that patch needs to come later so we can get away with changing
> as few reference outputs in this patch here as possible.
> 
> Signed-off-by: Max Reitz 
> Message-Id: <20200625125548.870061-2-mre...@redhat.com>
> Reviewed-by: Maxim Levitsky 


Hi, I am seeing breakage on travis-ci;

https://travis-ci.org/github/jnsnow/qemu/jobs/706915953#L2236

(Disclaimer: Has my python patches applied)

There's some ambiguity perhaps over the order required by data_file /
data_file_raw.

Also in the IRC channel, pm215 notes that there is a breakage on OSX as
'readarray' is a bash 4.0 feature, and OSX has only bash 3.2:

11:34 < stsquad> MacOSX iotest failures
11:34 < stsquad> Not run: 005 013 018 019 024 034 038 039 041 048 060
061 074 079 080 086 097 099 108 137 138 140 141 150 154 161 172 176 179 181
 184 203 220 226 229 244 249 251 252 259 265 267 287 290
292z
11:34 < stsquad> Failures: 001 002 003 004 007 008 009 010 011 012 017
020 021 022 025 027 029 031 032 033 035 036 037 042 043 046 047 050 052
 053 054 062 063 066 069 071 072 073 089 090 098 103 104
105 107 110 111 114 117 120 126 127 133 134 156 158 159 170 174 177 186
 187 190 191 192 195 214 217 268
11:34 < stsquad> Failed 69 of 75 iotests
11:34 < stsquad> that sounds like a bash-ism has snuck in?
11:36 < pm215> stsquad: readarray is in bash 4, osx system bash is 3.2
11:36 < pm215> dunno why that didn't fail for my own osx box
11:38 < th_huth> pm215: does "make check" run the iotests on your osx
box at all? Maybe it is missing gnu-sed ?
11:39 < pm215> th_huth: ah, it doesn't: "GNU sed not available ==> Not
running the qemu-iotests"
11:39 < th_huth> too bad :-(


> ---
>  tests/qemu-iotests/112.out   |   2 +-
>  tests/qemu-iotests/141   |   2 +-
>  tests/qemu-iotests/153   |   9 ++-
>  tests/qemu-iotests/common.filter | 109 ---
>  4 files changed, 91 insertions(+), 31 deletions(-)
> 
> diff --git a/tests/qemu-iotests/112.out b/tests/qemu-iotests/112.out
> index ae0318cabe..182655dbf6 100644
> --- a/tests/qemu-iotests/112.out
> +++ b/tests/qemu-iotests/112.out
> @@ -5,7 +5,7 @@ QA output created by 112
>  qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may 
> not exceed 64 bits
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>  qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may 
> not exceed 64 bits
> -Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 refcount_bits=-1
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>  qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may 
> not exceed 64 bits
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>  qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may 
> not exceed 64 bits
> diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
> index 5192d256e3..6d1b7b0d4c 100755
> --- a/tests/qemu-iotests/141
> +++ b/tests/qemu-iotests/141
> @@ -68,7 +68,7 @@ test_blockjob()
>  _send_qemu_cmd $QEMU_HANDLE \
>  "$1" \
>  "$2" \
> -| _filter_img_create | _filter_qmp_empty_return
> +| _filter_img_create_in_qmp | _filter_qmp_empty_return
>  
>  # We want this to return an error because the block job is still running
>  _send_qemu_cmd $QEMU_HANDLE \
> diff --git 

Re: [PATCH v7 16/47] block: Use bdrv_cow_child() in bdrv_co_truncate()

2020-07-10 Thread Andrey Shinkevich

On 25.06.2020 18:21, Max Reitz wrote:

The condition modified here is not about potentially filtered children,
but only about COW sources (i.e. traditional backing files).

Signed-off-by: Max Reitz 
---
  block/io.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index dc9891d6ce..097a3861d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3308,7 +3308,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset, bool exact,
Error **errp)
  {
  BlockDriverState *bs = child->bs;
-BdrvChild *filtered;
+BdrvChild *filtered, *backing;
  BlockDriver *drv = bs->drv;
  BdrvTrackedRequest req;
  int64_t old_size, new_bytes;
@@ -3361,6 +3361,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset, bool exact,
  }
  
  filtered = bdrv_filter_child(bs);

+backing = bdrv_cow_child(bs);
  
  /*

   * If the image has a backing file that is large enough that it would
@@ -3372,10 +3373,10 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset, bool exact,
   * backing file, taking care of keeping things consistent with that 
backing
   * file is the user's responsibility.
   */
-if (new_bytes && bs->backing) {
+if (new_bytes && backing) {
  int64_t backing_len;
  
-backing_len = bdrv_getlength(backing_bs(bs));

+backing_len = bdrv_getlength(backing->bs);
  if (backing_len < 0) {
  ret = backing_len;
  error_setg_errno(errp, -ret, "Could not get backing file size");

Reviewed-by: Andrey Shinkevich 



Re: [PULL v2] Block layer patches

2020-07-10 Thread Peter Maydell
On Thu, 9 Jul 2020 at 13:17, Kevin Wolf  wrote:
>
> The following changes since commit 8796c64ecdfd34be394ea277a53df0c76996:
>
>   Merge remote-tracking branch 
> 'remotes/kraxel/tags/audio-20200706-pull-request' into staging (2020-07-08 
> 16:33:59 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to d1c824e681e423a6224c6831a493e9175fa02dc1:
>
>   qemu-img: Deprecate use of -b without -F (2020-07-09 14:14:55 +0200)
>
> 
> Block layer patches:
>
> - file-posix: Mitigate file fragmentation with extent size hints
> - Tighten qemu-img rules on missing backing format
> - qemu-img map: Don't limit block status request size
>

iotest 114 fails on FreeBSD and OpenBSD (and probably NetBSD
but that build hasn't reported back yet): looks like a
non-portable use of 'truncate' ?

  TESTiotest-qcow2: 114 [fail]
QEMU  --
"/home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64"
-nodefaults -disp
lay none -machine virt -accel qtest
QEMU_IMG  --
"/home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO   --
"/home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD  --
"/home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT-- qcow2 (compat=1.1)
IMGPROTO  -- file
PLATFORM  -- FreeBSD/amd64 freebsd 12.1-RELEASE
TEST_DIR  -- /home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/scratch
SOCK_DIR  -- /tmp/tmp.RQQ2LyuG
SOCKET_SCM_HELPER --

--- /home/qemu/qemu-test.FOdDM2/src/tests/qemu-iotests/114.out
2020-07-10 14:51:21.0 +
+++ /home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/114.out.bad
 2020-07-10 15:15:14.446278000 +
@@ -1,8 +1,11 @@
 QA output created by 114
-qemu-img: warning: Deprecated use of backing file without explicit
backing format (detected format of raw)
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
backing_file=TEST_DIR/t.IMGFMT.orig
+truncate: illegal option -- -
+usage: truncate [-c] -s [+|-|%|/]size[K|k|M|m|G|g|T|t] file ...
+   truncate [-c] -r rfile file ...
+qemu-img: TEST_DIR/t.IMGFMT: Could not open 'TEST_DIR/t.IMGFMT.orig':
No such file or directory
+Could not open backing image.
 Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
-qemu-img: warning: Deprecated use of backing file without explicit
backing format
+qemu-img: Could not open
'/home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/scratch/t.qcow2.orig':
Could not open 
'/home/qemu/qemu-test.FOdDM2/build/tests/qemu-iotests/scratch/t.qcow2.orig':
No such file or directory
 qemu-img: warning: Deprecated use of backing file without explicit
backing format (detected format of IMGFMT)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
 qemu-img: warning: Deprecated use of unopened backing file without
explicit backing format, use of this image requires potentially unsafe
format probing

thanks
-- PMM



Re: [PATCH v7 15/47] block: Use CAFs when working with backing chains

2020-07-10 Thread Andrey Shinkevich

On 25.06.2020 18:21, Max Reitz wrote:

Use child access functions when iterating through backing chains so
filters do not break the chain.

In addition, bdrv_find_overlay() will now always return the actual
overlay; that is, it will never return a filter node but only one with a
COW backing file (there may be filter nodes between that node and @bs).

Signed-off-by: Max Reitz 
---
  block.c | 41 +
  1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index a44af9c3c1..712230ef5c 100644
--- a/block.c
+++ b/block.c
@@ -4724,7 +4724,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
  }
  
  /*

- * Finds the image layer in the chain that has 'bs' as its backing file.
+ * Finds the image layer in the chain that has 'bs' (or a filter on
+ * top of it) as its backing file.
One can optionally say "Finds the first non-filter parent of bs in the 
chain".
   


...

Reviewed-by: Andrey Shinkevich 




Re: [PATCH v7 14/47] stream: Deal with filters

2020-07-10 Thread Max Reitz
On 09.07.20 17:13, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Because of the (not so recent anymore) changes that make the stream job
>> independent of the base node and instead track the node above it, we
>> have to split that "bottom" node into two cases: The bottom COW node,
>> and the node directly above the base node (which may be an R/W filter
>> or the bottom COW node).
>>
>> Signed-off-by: Max Reitz 
>> ---
>>   qapi/block-core.json |  4 +++
>>   block/stream.c   | 63 
>>   blockdev.c   |  4 ++-
>>   3 files changed, 53 insertions(+), 18 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index b20332e592..df87855429 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2486,6 +2486,10 @@
>>   # On successful completion the image file is updated to drop the
>> backing file
>>   # and the BLOCK_JOB_COMPLETED event is emitted.
>>   #
>> +# In case @device is a filter node, block-stream modifies the first
>> non-filter
>> +# overlay node below it to point to base's backing node (or NULL if
>> @base was
> 
> Forgot one thing. To me, it would be more understandable to read
> 
> "...to point to the base as backing node..." because it may be thought
> as a backing
> 
> node of the base.

This doesn’t sound like it’s about understandability; “point to the base
as backing node” and “point to base’s backing node [as backing node]”
are semantically different.

Was my phrasing just wrong?  @base should be the backing node, so yours
seems correct.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 14/47] stream: Deal with filters

2020-07-10 Thread Max Reitz
On 09.07.20 16:52, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Because of the (not so recent anymore) changes that make the stream job
>> independent of the base node and instead track the node above it, we
>> have to split that "bottom" node into two cases: The bottom COW node,
>> and the node directly above the base node (which may be an R/W filter
>> or the bottom COW node).
>>
>> Signed-off-by: Max Reitz 
>> ---
>>   qapi/block-core.json |  4 +++
>>   block/stream.c   | 63 
>>   blockdev.c   |  4 ++-
>>   3 files changed, 53 insertions(+), 18 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index b20332e592..df87855429 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2486,6 +2486,10 @@
>>   # On successful completion the image file is updated to drop the
>> backing file
>>   # and the BLOCK_JOB_COMPLETED event is emitted.
>>   #
>> +# In case @device is a filter node, block-stream modifies the first
>> non-filter
>> +# overlay node below it to point to base's backing node (or NULL if
>> @base was
>> +# not specified) instead of modifying @device itself.
>> +#
>>   # @job-id: identifier for the newly-created block job. If
>>   #  omitted, the device name will be used. (Since 2.7)
>>   #
>> diff --git a/block/stream.c b/block/stream.c
>> index aa2e7af98e..b9c1141656 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
>> @@ -31,7 +31,8 @@ enum {
>>     typedef struct StreamBlockJob {
>>   BlockJob common;
>> -    BlockDriverState *bottom;
>> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
>> +    BlockDriverState *above_base;   /* Node directly above the base */
> 
> Keeping the base_overlay is enough to complete the stream job.

Depends on the definition.  If we decide it isn’t enough, then it isn’t
enough.

> The above_base may disappear during the job and we can't rely on it.

In this version of this series, it may not, because the chain is frozen.
 So the above_base cannot disappear.

We can discuss whether we should allow it to disappear, but I think not.

The problem is, we need something to set as the backing file after
streaming.  How do we figure out what that should be?  My proposal is we
keep above_base and use its immediate child.

If we don’t keep above_base, then we’re basically left guessing as to
what should be the backing file after the stream job.

>>   BlockdevOnError on_error;
>>   char *backing_file_str;
>>   bool bs_read_only;
>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>     if (s->chain_frozen) {
>>   BlockJob *bjob = >common;
>> -    bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>> +    bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>   }
>>   }
>>   @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>   StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>   BlockJob *bjob = >common;
>>   BlockDriverState *bs = blk_bs(bjob->blk);
>> -    BlockDriverState *base = backing_bs(s->bottom);
>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
> 
> The initial base node may be a top node for a concurrent commit job and
> 
> may disappear.
Then it would just be replaced by another node, though, so above_base
keeps a child.  The @base here is not necessarily the initial @base, and
that’s intentional.

> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.

But also wrong.  The point of keeping above_base around is to get its
child here to use that child as the new backing child of the top node.

>>   Error *local_err = NULL;
>>   int ret = 0;
>>   -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>   s->chain_frozen = false;
>>   -    if (bs->backing) {
>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>   const char *base_id = NULL, *base_fmt = NULL;
>>   if (base) {
>>   base_id = s->backing_file_str;
>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>   base_fmt = base->drv->format_name;
>>   }
>>   }
>> -    bdrv_set_backing_hd(bs, base, _err);
>> -    ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>> +    bdrv_set_backing_hd(unfiltered_bs, base, _err);
>> +    ret = bdrv_change_backing_file(unfiltered_bs, base_id,
>> base_fmt);
>>   if (local_err) {
>>   error_report_err(local_err);
>>   return -EPERM;
>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job,
>> Error **errp)
>>   StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>   BlockBackend *blk = s->common.blk;
>>   BlockDriverState *bs = blk_bs(blk);
>> -    bool enable_cor = !backing_bs(s->bottom);
>> +    

Re: [PATCH for-5.1 0/2] qemu-img convert: Fix abort with unaligned image size

2020-07-10 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200710142149.40962-1-kw...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Not run: 259
Failures: 268
Failed 1 of 119 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=7dd2224f6d7b4c79a604d33021e80c19', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-vz0tbkd6/src/docker-src.2020-07-10-10.27.42.8932:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=7dd2224f6d7b4c79a604d33021e80c19
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-vz0tbkd6/src'
make: *** [docker-run-test-quick@centos7] Error 2

real16m16.326s
user0m9.052s


The full log is available at
http://patchew.org/logs/20200710142149.40962-1-kw...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 6/6] block/io: improve loadvm performance

2020-07-10 Thread Vladimir Sementsov-Ogievskiy

09.07.2020 16:26, Denis V. Lunev wrote:

This patch creates intermediate buffer for reading from block driver
state and performs read-ahead to this buffer. Snapshot code performs
reads sequentially and thus we know what offsets will be required
and when they will become not needed.

Results are fantastic. Switch to snapshot times of 2GB Fedora 31 VM
over NVME storage are the following:
 original fixed
cached:  1.84s   1.16s
non-cached: 12.74s   1.27s

The difference over HDD would be more significant:)

Signed-off-by: Denis V. Lunev
CC: Vladimir Sementsov-Ogievskiy
CC: Kevin Wolf
CC: Max Reitz
CC: Stefan Hajnoczi
CC: Fam Zheng
CC: Juan Quintela
CC: Denis Plotnikov


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-5.1 2/2] file-posix: Allow byte-aligned O_DIRECT with NFS

2020-07-10 Thread Eric Blake

On 7/10/20 9:21 AM, Kevin Wolf wrote:

Since commit a6b257a08e3 ('file-posix: Handle undetectable alignment'),
we assume that if we open a file with O_DIRECT and alignment probing
returns 1, we just couldn't find out the real alignment requirement
because some filesystems make the requirement only for allocated blocks.
In this case, a safe default of 4k is used.

This is too strict NFS, which does actually allow byte-aligned requests


strict for


even with O_DIRECT. Because we can't distinguish both cases with generic
code, let's just look at the file system magic and disable
s->needs_alignment for NFS. This way, O_DIRECT can still be used on NFS
for images that are not aligned to 4k.

Signed-off-by: Kevin Wolf 
---
  block/file-posix.c | 26 +-
  1 file changed, 25 insertions(+), 1 deletion(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH for-5.1 1/2] block: Require aligned image size to avoid assertion failure

2020-07-10 Thread Eric Blake

On 7/10/20 9:21 AM, Kevin Wolf wrote:

Unaligned requests will automatically be aligned to bl.request_alignment
and we don't want to extend requests to access space beyond the end of
the image, so it's required that the image size is aligned.


Yep, that's what I've already done on nbd images.

nbdkit has '--filter=truncate' which rounds an image size up to 
alignment by reading the absent tail as zeros, and permitting writes 
that rewrite zero but failing with EIO any write that would attempt to 
change the tail.  We may eventually want that complexity in qemu's block 
layer for ALL drivers (as part of switching the block layer to 
byte-accurate sizing everywhere), but that's a LOT more effort.  The 
short term of just mandating alignment is much easier and still defensible.




With write requests, this could cause assertion failures like this if
RESIZE permissions weren't requested:

qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= 
bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.

This was e.g. triggered by qemu-img converting to a target image with 4k
request alignment when the image was only aligned to 512 bytes, but not
to 4k.

Signed-off-by: Kevin Wolf 
---
  block.c | 10 ++
  1 file changed, 10 insertions(+)


Reviewed-by: Eric Blake 



diff --git a/block.c b/block.c
index cc377d7ef3..c635777911 100644
--- a/block.c
+++ b/block.c
@@ -1489,6 +1489,16 @@ static int bdrv_open_driver(BlockDriverState *bs, 
BlockDriver *drv,
  return -EINVAL;
  }
  
+/*

+ * Unaligned requests will automatically be aligned to bl.request_alignment
+ * and we don't want to extend requests to access space beyond the end of
+ * the image, so it's required that the image size is aligned.
+ */
+if ((bs->total_sectors * BDRV_SECTOR_SIZE) % bs->bl.request_alignment) {
+error_setg(errp, "Image size is not a multiple of request alignment");
+return -EINVAL;
+}
+


Do we have any iotest coverage of this new message?  (If none of our 
existing tests broke, then you should add one...)



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH for-5.1 2/2] file-posix: Allow byte-aligned O_DIRECT with NFS

2020-07-10 Thread Kevin Wolf
Since commit a6b257a08e3 ('file-posix: Handle undetectable alignment'),
we assume that if we open a file with O_DIRECT and alignment probing
returns 1, we just couldn't find out the real alignment requirement
because some filesystems make the requirement only for allocated blocks.
In this case, a safe default of 4k is used.

This is too strict NFS, which does actually allow byte-aligned requests
even with O_DIRECT. Because we can't distinguish both cases with generic
code, let's just look at the file system magic and disable
s->needs_alignment for NFS. This way, O_DIRECT can still be used on NFS
for images that are not aligned to 4k.

Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 0c4e07c415..4e9dac461b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -62,10 +62,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #ifdef __s390__
 #include 
@@ -300,6 +302,28 @@ static int probe_physical_blocksize(int fd, unsigned int 
*blk_size)
 #endif
 }
 
+/*
+ * Returns true if no alignment restrictions are necessary even for files
+ * opened with O_DIRECT.
+ *
+ * raw_probe_alignment() probes the required alignment and assume that 1 means
+ * the probing failed, so it falls back to a safe default of 4k. This can be
+ * avoided if we know that byte alignment is okay for the file.
+ */
+static bool dio_byte_aligned(int fd)
+{
+#ifdef __linux__
+struct statfs buf;
+int ret;
+
+ret = fstatfs(fd, );
+if (ret == 0 && buf.f_type == NFS_SUPER_MAGIC) {
+return true;
+}
+#endif
+return false;
+}
+
 /* Check if read is allowed with given memory buffer and length.
  *
  * This function is used to check O_DIRECT memory buffer and request alignment.
@@ -631,7 +655,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 
 s->has_discard = true;
 s->has_write_zeroes = true;
-if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
+if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd)) {
 s->needs_alignment = true;
 }
 
-- 
2.25.4




[PATCH for-5.1 0/2] qemu-img convert: Fix abort with unaligned image size

2020-07-10 Thread Kevin Wolf
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1834646

Patch 1 fixes the assertion failure by failing gracefully when opening
an image whose size isn't aligned to the required request alignment.

Patch 2 relaxes the restrictions for NFS, which actually supports byte
alignment, but incorrectly gets a 4k request alignment in the file-posix
block driver.

Kevin Wolf (2):
  block: Require aligned image size to avoid assertion failure
  file-posix: Allow byte-aligned O_DIRECT with NFS

 block.c| 10 ++
 block/file-posix.c | 26 +-
 2 files changed, 35 insertions(+), 1 deletion(-)

-- 
2.25.4




[PATCH for-5.1 1/2] block: Require aligned image size to avoid assertion failure

2020-07-10 Thread Kevin Wolf
Unaligned requests will automatically be aligned to bl.request_alignment
and we don't want to extend requests to access space beyond the end of
the image, so it's required that the image size is aligned.

With write requests, this could cause assertion failures like this if
RESIZE permissions weren't requested:

qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= 
bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.

This was e.g. triggered by qemu-img converting to a target image with 4k
request alignment when the image was only aligned to 512 bytes, but not
to 4k.

Signed-off-by: Kevin Wolf 
---
 block.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block.c b/block.c
index cc377d7ef3..c635777911 100644
--- a/block.c
+++ b/block.c
@@ -1489,6 +1489,16 @@ static int bdrv_open_driver(BlockDriverState *bs, 
BlockDriver *drv,
 return -EINVAL;
 }
 
+/*
+ * Unaligned requests will automatically be aligned to bl.request_alignment
+ * and we don't want to extend requests to access space beyond the end of
+ * the image, so it's required that the image size is aligned.
+ */
+if ((bs->total_sectors * BDRV_SECTOR_SIZE) % bs->bl.request_alignment) {
+error_setg(errp, "Image size is not a multiple of request alignment");
+return -EINVAL;
+}
+
 assert(bdrv_opt_mem_align(bs) != 0);
 assert(bdrv_min_mem_align(bs) != 0);
 assert(is_power_of_2(bs->bl.request_alignment));
-- 
2.25.4




[PATCH v1] qmp: don't hold ctx lock while querying blockstats

2020-07-10 Thread Zhenyu Ye
Because the QMP command runs in the main thread, and changes
to the aio context of iothread will only be executed in the
main thread (they will not be in parallel), so there is no
need a lock protection while querying blockstats.

If we hold the lock here, while the I/O pressure is high in
vm and the I/O returns slowly, the main thread will be stuck
until the lock is released, which will affect the vcpu operation
and finall cause the vm to be stuck.

Signed-off-by: Zhenyu Ye 
---
 block/qapi.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index afd9f3b4a7..fa56bc145d 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -609,11 +609,8 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 if (has_query_nodes && query_nodes) {
 for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
 BlockStatsList *info = g_malloc0(sizeof(*info));
-AioContext *ctx = bdrv_get_aio_context(bs);
 
-aio_context_acquire(ctx);
 info->value = bdrv_query_bds_stats(bs, false);
-aio_context_release(ctx);
 
 *p_next = info;
 p_next = >next;
@@ -621,7 +618,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 } else {
 for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
 BlockStatsList *info;
-AioContext *ctx = blk_get_aio_context(blk);
 BlockStats *s;
 char *qdev;
 
@@ -629,7 +625,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 continue;
 }
 
-aio_context_acquire(ctx);
 s = bdrv_query_bds_stats(blk_bs(blk), true);
 s->has_device = true;
 s->device = g_strdup(blk_name(blk));
@@ -643,7 +638,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 }
 
 bdrv_query_blk_stats(s->stats, blk);
-aio_context_release(ctx);
 
 info = g_malloc0(sizeof(*info));
 info->value = s;
-- 
2.19.1





Re: [PATCH v5 5/5] vhost-user-blk: default num_queues to -smp N

2020-07-10 Thread Stefan Hajnoczi
On Thu, Jul 09, 2020 at 11:02:24AM -0700, Raphael Norwitz wrote:
> On Mon, Jul 6, 2020 at 7:00 AM Stefan Hajnoczi  wrote:
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index a00b854736..39aec42dae 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -420,6 +420,9 @@ static void vhost_user_blk_device_realize(DeviceState 
> > *dev, Error **errp)
> >  return;
> >  }
> >
> > +if (s->num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
> > +s->num_queues = 1;
> > +}
> 
> What is this check for? Is it just a backstop to ensure that
> num_queues is set to 1 if vhost-user-blk-pci doesn't update it?

For the non-PCI VIRTIO transports that do not handle num_queues ==
VHOST_USER_BLK_AUTO_NUM_QUEUES themselves.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH for-5.1] qemu-img resize: Require --shrink for shrinking all image formats

2020-07-10 Thread Kevin Wolf
Am 10.07.2020 um 14:33 hat Peter Maydell geschrieben:
> On Fri, 10 Jul 2020 at 13:17, Kevin Wolf  wrote:
> >
> > QEMU 2.11 introduced the --shrink option for qemu-img resize to avoid
> > accidentally shrinking images (commit 4ffca8904a3). However, for
> > compatibility reasons, it was not enforced for raw images yet, but only
> > a deprecation warning was printed. This warning has existed for long
> > enough that we can now finally require --shrink for raw images, too, and
> > error out if it's not given.
> >
> > Documentation already describes the state as it is after this patch.
> >
> > Signed-off-by: Kevin Wolf 
> > ---
> >  qemu-img.c | 17 +++--
> >  1 file changed, 3 insertions(+), 14 deletions(-)
> >
> > diff --git a/qemu-img.c b/qemu-img.c
> > index e3b2ec3e78..f6a2703039 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -4011,20 +4011,9 @@ static int img_resize(int argc, char **argv)
> >  }
> >
> >  if (total_size < current_size && !shrink) {
> > -warn_report("Shrinking an image will delete all data beyond the "
> > -"shrunken image's end. Before performing such an "
> > -"operation, make sure there is no important data 
> > there.");
> > -
> > -if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
> > -error_report(
> > -  "Use the --shrink option to perform a shrink operation.");
> > -ret = -1;
> > -goto out;
> > -} else {
> > -warn_report("Using the --shrink option will suppress this 
> > message. "
> > -"Note that future versions of qemu-img may refuse 
> > to "
> > -"shrink images without this option.");
> > -}
> > +error_report("Use the --shrink option to perform a shrink 
> > operation.");
> 
> I think it would be nice to retain this bit of text:
> 
> > -warn_report("Shrinking an image will delete all data beyond the "
> > -"shrunken image's end. Before performing such an "
> > -"operation, make sure there is no important data 
> > there.");
> 
> ie, make the raw-shrink case be the same as the non-raw-shrink
> case currently does.

I had this at first, but then the whole thing looked like a warning and
I wasn't sure that it would still be understood as an error. (Which is
of course a preexisting problem for non-raw.)

Maybe it becomes clearer if I just swap the order and print the error
first and only then the warning?

Kevin




Re: [PATCH for-5.1] qemu-img resize: Require --shrink for shrinking all image formats

2020-07-10 Thread Peter Maydell
On Fri, 10 Jul 2020 at 13:17, Kevin Wolf  wrote:
>
> QEMU 2.11 introduced the --shrink option for qemu-img resize to avoid
> accidentally shrinking images (commit 4ffca8904a3). However, for
> compatibility reasons, it was not enforced for raw images yet, but only
> a deprecation warning was printed. This warning has existed for long
> enough that we can now finally require --shrink for raw images, too, and
> error out if it's not given.
>
> Documentation already describes the state as it is after this patch.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img.c | 17 +++--
>  1 file changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index e3b2ec3e78..f6a2703039 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -4011,20 +4011,9 @@ static int img_resize(int argc, char **argv)
>  }
>
>  if (total_size < current_size && !shrink) {
> -warn_report("Shrinking an image will delete all data beyond the "
> -"shrunken image's end. Before performing such an "
> -"operation, make sure there is no important data 
> there.");
> -
> -if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
> -error_report(
> -  "Use the --shrink option to perform a shrink operation.");
> -ret = -1;
> -goto out;
> -} else {
> -warn_report("Using the --shrink option will suppress this 
> message. "
> -"Note that future versions of qemu-img may refuse to 
> "
> -"shrink images without this option.");
> -}
> +error_report("Use the --shrink option to perform a shrink 
> operation.");

I think it would be nice to retain this bit of text:

> -warn_report("Shrinking an image will delete all data beyond the "
> -"shrunken image's end. Before performing such an "
> -"operation, make sure there is no important data 
> there.");

ie, make the raw-shrink case be the same as the non-raw-shrink
case currently does.

thanks
-- PMM



Re: [PATCH 2/2] hw/sd/sdcard: Do not allow invalid SD card sizes

2020-07-10 Thread Peter Maydell
On Fri, 10 Jul 2020 at 13:07, Kevin Wolf  wrote:
>
> Am 10.07.2020 um 11:59 hat Peter Maydell geschrieben:
> > On Fri, 10 Jul 2020 at 10:58, Kevin Wolf  wrote:
> > >
> > > Am 09.07.2020 um 16:15 hat Peter Maydell geschrieben:
> > > > dd/truncate etc won't work if the image file is not raw (eg if
> > > > it's qcow2). The only chance you have of something that's actually
> > > > generic would probably involve "qemu-img resize". But I'm a bit
> > > > wary of having an error message that recommends that, because
> > > > what if we got it wrong?
> > >
> > > What is your concern that we might get wrong? The suggestion is always
> > > extending the size rather than shrinking, so it should be harmless and
> > > easy to undo. (Hm, we should finally make --shrink mandatory for
> > > shrinking. We've printed a deprecation warning for almost three years.)
> >
> > If there's a qemu-img command line that will always only
> > extend the image size and never let the user accidentally
> > shrink it and throw away data, then great. I'd happily
> > recommend that.
>
> I think removing deprecated behaviour is a change that we can still make
> in the early freeze. So if you agree, I'll send a patch that makes
> shrinking an image without --shrink a hard error in 5.1.

Happy to defer to your judgement on that; I agree that removal
of deprecated behaviour is ok at this point in freeze.

-- PMM



Re: [PATCH for-5.1] qemu-img resize: Require --shrink for shrinking all image formats

2020-07-10 Thread Daniel P . Berrangé
On Fri, Jul 10, 2020 at 02:17:17PM +0200, Kevin Wolf wrote:
> QEMU 2.11 introduced the --shrink option for qemu-img resize to avoid
> accidentally shrinking images (commit 4ffca8904a3). However, for
> compatibility reasons, it was not enforced for raw images yet, but only
> a deprecation warning was printed. This warning has existed for long
> enough that we can now finally require --shrink for raw images, too, and
> error out if it's not given.

Libvirt has used the --shrink flag since Aug 2018, so this is safe
from our POV.

> Documentation already describes the state as it is after this patch.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img.c | 17 +++--
>  1 file changed, 3 insertions(+), 14 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH for-5.1] qemu-img resize: Require --shrink for shrinking all image formats

2020-07-10 Thread Kevin Wolf
QEMU 2.11 introduced the --shrink option for qemu-img resize to avoid
accidentally shrinking images (commit 4ffca8904a3). However, for
compatibility reasons, it was not enforced for raw images yet, but only
a deprecation warning was printed. This warning has existed for long
enough that we can now finally require --shrink for raw images, too, and
error out if it's not given.

Documentation already describes the state as it is after this patch.

Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e3b2ec3e78..f6a2703039 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4011,20 +4011,9 @@ static int img_resize(int argc, char **argv)
 }
 
 if (total_size < current_size && !shrink) {
-warn_report("Shrinking an image will delete all data beyond the "
-"shrunken image's end. Before performing such an "
-"operation, make sure there is no important data there.");
-
-if (g_strcmp0(bdrv_get_format_name(blk_bs(blk)), "raw") != 0) {
-error_report(
-  "Use the --shrink option to perform a shrink operation.");
-ret = -1;
-goto out;
-} else {
-warn_report("Using the --shrink option will suppress this message. 
"
-"Note that future versions of qemu-img may refuse to "
-"shrink images without this option.");
-}
+error_report("Use the --shrink option to perform a shrink operation.");
+ret = -1;
+goto out;
 }
 
 /*
-- 
2.25.4




Re: [PATCH 2/2] hw/sd/sdcard: Do not allow invalid SD card sizes

2020-07-10 Thread Kevin Wolf
Am 10.07.2020 um 11:59 hat Peter Maydell geschrieben:
> On Fri, 10 Jul 2020 at 10:58, Kevin Wolf  wrote:
> >
> > Am 09.07.2020 um 16:15 hat Peter Maydell geschrieben:
> > > dd/truncate etc won't work if the image file is not raw (eg if
> > > it's qcow2). The only chance you have of something that's actually
> > > generic would probably involve "qemu-img resize". But I'm a bit
> > > wary of having an error message that recommends that, because
> > > what if we got it wrong?
> >
> > What is your concern that we might get wrong? The suggestion is always
> > extending the size rather than shrinking, so it should be harmless and
> > easy to undo. (Hm, we should finally make --shrink mandatory for
> > shrinking. We've printed a deprecation warning for almost three years.)
> 
> If there's a qemu-img command line that will always only
> extend the image size and never let the user accidentally
> shrink it and throw away data, then great. I'd happily
> recommend that.

I think removing deprecated behaviour is a change that we can still make
in the early freeze. So if you agree, I'll send a patch that makes
shrinking an image without --shrink a hard error in 5.1.

Kevin




Re: [PATCH 2/2] hw/sd/sdcard: Do not allow invalid SD card sizes

2020-07-10 Thread Peter Maydell
On Fri, 10 Jul 2020 at 10:58, Kevin Wolf  wrote:
>
> Am 09.07.2020 um 16:15 hat Peter Maydell geschrieben:
> > dd/truncate etc won't work if the image file is not raw (eg if
> > it's qcow2). The only chance you have of something that's actually
> > generic would probably involve "qemu-img resize". But I'm a bit
> > wary of having an error message that recommends that, because
> > what if we got it wrong?
>
> What is your concern that we might get wrong? The suggestion is always
> extending the size rather than shrinking, so it should be harmless and
> easy to undo. (Hm, we should finally make --shrink mandatory for
> shrinking. We've printed a deprecation warning for almost three years.)

If there's a qemu-img command line that will always only
extend the image size and never let the user accidentally
shrink it and throw away data, then great. I'd happily
recommend that.

thanks
-- PMM



Re: [PATCH 2/2] hw/sd/sdcard: Do not allow invalid SD card sizes

2020-07-10 Thread Kevin Wolf
Am 09.07.2020 um 16:15 hat Peter Maydell geschrieben:
> On Thu, 9 Jul 2020 at 14:56, Philippe Mathieu-Daudé  wrote:
> >
> > On 7/7/20 10:29 PM, Niek Linnenbank wrote:
> > > So I manually copy & pasted the change into hw/sd/sd.c to test it.
> > > It looks like the check works, but my concern is that with this change,
> > > we will be getting this error on 'off-the-shelf' images as well.
> > > For example, the latest Raspbian image size also isn't a power of two:
> > >
> > > $ ./arm-softmmu/qemu-system-arm -M raspi2 -sd
> > > ~/Downloads/2020-05-27-raspios-buster-lite-armhf.img -nographic
> > > WARNING: Image format was not specified for
> > > '/home/me/Downloads/2020-05-27-raspios-buster-lite-armhf.img' and
> > > probing guessed raw.
> > >  Automatically detecting the format is dangerous for raw images,
> > > write operations on block 0 will be restricted.
> > >  Specify the 'raw' format explicitly to remove the restrictions.
> > > qemu-system-arm: Invalid SD card size: 1.73 GiB (expecting at least 2 GiB)
> > >
> > > If we do decide that the change is needed, I would like to propose that
> > > we also give the user some instructions
> > > on how to fix it, maybe some 'dd' command?
> >
> > On POSIX we can suggest to use 'truncate -s 2G' from coreutils.
> > This is not in the default Darwin packages.
> > On Windows I have no clue.
> 
> dd/truncate etc won't work if the image file is not raw (eg if
> it's qcow2). The only chance you have of something that's actually
> generic would probably involve "qemu-img resize". But I'm a bit
> wary of having an error message that recommends that, because
> what if we got it wrong?

What is your concern that we might get wrong? The suggestion is always
extending the size rather than shrinking, so it should be harmless and
easy to undo. (Hm, we should finally make --shrink mandatory for
shrinking. We've printed a deprecation warning for almost three years.)

Kevin




Re: [PULL 18/31] block/core: add generic infrastructure for x-blockdev-amend qmp command

2020-07-10 Thread Max Reitz
On 09.07.20 17:09, Peter Maydell wrote:
> On Mon, 6 Jul 2020 at 11:05, Max Reitz  wrote:
>>
>> From: Maxim Levitsky 
>>
>> blockdev-amend will be used similiar to blockdev-create
>> to allow on the fly changes of the structure of the format based block 
>> devices.
>>
>> Current plan is to first support encryption keyslot management for luks
>> based formats (raw and embedded in qcow2)
>>
>> Signed-off-by: Maxim Levitsky 
>> Reviewed-by: Daniel P. Berrangé 
>> Message-Id: <20200608094030.670121-12-mlevi...@redhat.com>
>> Signed-off-by: Max Reitz 
> 
> Hi; Coverity reports a possible issue with this function
> (CID 1430268):

Thanks for the heads-up, I’ve sent a patch (“block/amend: Check whether
the node exists”).

>> +void qmp_x_blockdev_amend(const char *job_id,
>> +  const char *node_name,
>> +  BlockdevAmendOptions *options,
>> +  bool has_force,
>> +  bool force,
>> +  Error **errp)
>> +{
>> +BlockdevAmendJob *s;
>> +const char *fmt = BlockdevDriver_str(options->driver);
>> +BlockDriver *drv = bdrv_find_format(fmt);
>> +BlockDriverState *bs = bdrv_find_node(node_name);
> 
> bdrv_find_node() can return NULL (we check for this
> in almost all callsites)...
> 
>> +if (bs->drv != drv) {
> 
> ...but here we dereference it unconditionally.
> 
>> +error_setg(errp,
>> +   "x-blockdev-amend doesn't support changing the block 
>> driver");
>> +return;
>> +}
> 
> thanks
> -- PMM
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/2] hw/sd/sdcard: Do not allow invalid SD card sizes

2020-07-10 Thread Peter Maydell
On Thu, 9 Jul 2020 at 17:27, Alistair Francis  wrote:
>
> On Thu, Jul 9, 2020 at 7:35 AM Philippe Mathieu-Daudé  wrote:
> >
> > On 7/9/20 4:15 PM, Peter Maydell wrote:
> > > The only chance you have of something that's actually
> > > generic would probably involve "qemu-img resize". But I'm a bit
> > > wary of having an error message that recommends that, because
> > > what if we got it wrong?
> >
> > I am not sure what to recommend then.
> >
> > Would that work as hint?
> >
> >   qemu-system-arm -M raspi2 -sd ./buster-lite-armhf.img
> >   qemu-system-arm: Invalid SD card size: 1.73 GiB
> >   SD card size has to be a power of 2, e.g. 2GiB.
>
> That sounds good to me. That's enough for a user to figure out the next step.
>
> If you want you could also add: "qemu-img might be able to help." or
> something like that.

Thinking about it a bit and talking to Philippe on IRC, I think
we can usefully have the message recommend "qemu-img resize" to the
user; I think we should avoid printing out an exact-cut-n-paste
command line just in case it's wrong, but something like:
 qemu-system-arm: Invalid SD card size: 1.73 GiB
 SD card size has to be a power of 2, e.g. 2GiB.
 You can resize disk images with 'qemu-img resize  '
 (note that this will lose data if you make the image smaller than it
currently is)

I think is a reasonable balance between pointing the user in
the right direction and cautioning them to check what they're
doing before they blithely perform the resize...

thanks
-- PMM



[PATCH] block/amend: Check whether the node exists

2020-07-10 Thread Max Reitz
We should check whether the user-specified node-name actually refers to
a node.  The simplest way to do that is to use bdrv_lookup_bs() instead
of bdrv_find_node() (the former wraps the latter, and produces an error
message if necessary).

Reported-by: Coverity (CID 1430268)
Fixes: ced914d0ab9fb2c900f873f6349a0b8eecd1fdbe
Signed-off-by: Max Reitz 
---
 block/amend.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/amend.c b/block/amend.c
index f4612dcf08..392df9ef83 100644
--- a/block/amend.c
+++ b/block/amend.c
@@ -69,8 +69,12 @@ void qmp_x_blockdev_amend(const char *job_id,
 BlockdevAmendJob *s;
 const char *fmt = BlockdevDriver_str(options->driver);
 BlockDriver *drv = bdrv_find_format(fmt);
-BlockDriverState *bs = bdrv_find_node(node_name);
+BlockDriverState *bs;
 
+bs = bdrv_lookup_bs(NULL, node_name, errp);
+if (!bs) {
+return;
+}
 
 if (!drv) {
 error_setg(errp, "Block driver '%s' not found or not supported", fmt);
-- 
2.26.2




Re: [PATCH v1] qmp: don't hold ctx lock while querying blockstats

2020-07-10 Thread Kevin Wolf
Am 10.07.2020 um 10:54 hat Zhenyu Ye geschrieben:
> Because the QMP command runs in the main thread, and changes
> to the aio context of iothread will only be executed in the
> main thread (they will not be in parallel), so there is no
> need a lock protection while querying blockstats.
> 
> If we hold the lock here, while the I/O pressure is high in
> vm and the I/O returns slowly, the main thread will be stuck
> until the lock is released, which will affect the vcpu operation
> and finall cause the vm to be stuck.
> 
> Signed-off-by: Zhenyu Ye 

Are you saying that you think qmp_query_blockstats() runs in the
iothread? This is not true, it runs in the main thread.

This makes dropping the lock wrong, and even dropping it shouldn't make
a difference for other things that should run in the main thread because
the main thread is still busy executing the QMP command.

Am I missing something?

(Also, a change like this is most definitely *not* trivial.)

Kevin




Re: [PATCH v1] qmp: don't hold ctx lock while querying blockstats

2020-07-10 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200710085400.343-1-yezhen...@huawei.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Not run: 259
Failures: 192
Failed 1 of 119 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=c086654c8b434de29ab30387058535ba', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-0xv6b4b9/src/docker-src.2020-07-10-05.11.36.18650:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=c086654c8b434de29ab30387058535ba
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-0xv6b4b9/src'
make: *** [docker-run-test-quick@centos7] Error 2

real15m29.715s
user0m9.321s


The full log is available at
http://patchew.org/logs/20200710085400.343-1-yezhen...@huawei.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] block: Avoid stale pointer dereference in blk_get_aio_context()

2020-07-10 Thread Kevin Wolf
Am 09.07.2020 um 15:50 hat Greg Kurz geschrieben:
> It is possible for blk_remove_bs() to race with blk_drain_all(), causing
> the latter to dereference a stale blk->root pointer:
> 
> 
>   blk_remove_bs(blk)
>bdrv_root_unref_child(blk->root)
> child_bs = blk->root->bs
> bdrv_detach_child(blk->root)
>  ...
>  g_free(blk->root) <== blk->root becomes stale
> bdrv_unref(child_bs) < yield at some point
> 
> A blk_drain_all() can be triggered by some guest action in the
> meantime, eg. on POWER, SLOF might disable bus mastering on
> a virtio-scsi-pci device:
> 
>   virtio_write_config()
>virtio_pci_stop_ioeventfd()
> virtio_bus_stop_ioeventfd()
>  virtio_scsi_dataplane_stop()
>   blk_drain_all()
>blk_get_aio_context()
>bs = blk->root ? blk->root->bs : NULL
> ^
>   stale
> 
> Then, depending on one's luck, QEMU either crashes with SEGV or
> hits the assertion in blk_get_aio_context().
> 
> blk->root is set by blk_insert_bs() which calls bdrv_root_attach_child()
> first. The blk_remove_bs() function should rollback the changes made
> by blk_insert_bs() in the opposite order (or it should be documented
> somewhere why this isn't the case). Clear blk->root before calling
> bdrv_root_unref_child() in blk_remove_bs().
> 
> Signed-off-by: Greg Kurz 

Thanks, applied to the block branch.

Kevin




Re: [PULL 00/12] Block patches

2020-07-10 Thread Kevin Wolf
Am 09.07.2020 um 20:41 hat Eduardo Habkost geschrieben:
> On Thu, Jul 09, 2020 at 05:02:06PM +0200, Kevin Wolf wrote:
> > Am 08.07.2020 um 00:05 hat Eduardo Habkost geschrieben:
> > > On Tue, Jul 07, 2020 at 05:28:21PM +0200, Philippe Mathieu-Daudé wrote:
> > > > On 6/26/20 12:25 PM, Stefan Hajnoczi wrote:
> > > > > On Thu, Jun 25, 2020 at 02:31:14PM +0100, Peter Maydell wrote:
> > > > >> On Wed, 24 Jun 2020 at 11:02, Stefan Hajnoczi  
> > > > >> wrote:
> > > > >>>
> > > > >>> The following changes since commit 
> > > > >>> 171199f56f5f9bdf1e5d670d09ef1351d8f01bae:
> > > > >>>
> > > > >>>   Merge remote-tracking branch 
> > > > >>> 'remotes/alistair/tags/pull-riscv-to-apply-20200619-3' into staging 
> > > > >>> (2020-06-22 14:45:25 +0100)
> > > > >>>
> > > > >>> are available in the Git repository at:
> > > > >>>
> > > > >>>   https://github.com/stefanha/qemu.git tags/block-pull-request
> > > > >>>
> > > > >>> for you to fetch changes up to 
> > > > >>> 7838c67f22a81fcf669785cd6c0876438422071a:
> > > > >>>
> > > > >>>   block/nvme: support nested aio_poll() (2020-06-23 15:46:08 +0100)
> > > > >>>
> > > > >>> 
> > > > >>> Pull request
> > > > >>>
> > > > >>> 
> > > > >>
> > > > >> Failure on iotest 030, x86-64 Linux:
> > > > >>
> > > > >>   TESTiotest-qcow2: 030 [fail]
> > > > >> QEMU  --
> > > > >> "/home/petmay01/linaro/qemu-for-merges/build/alldbg/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64"
> > > > >> -nodefaults -display none -accel qtest
> > > > >> QEMU_IMG  --
> > > > >> "/home/petmay01/linaro/qemu-for-merges/build/alldbg/tests/qemu-iotests/../../qemu-img"
> > > > >> QEMU_IO   --
> > > > >> "/home/petmay01/linaro/qemu-for-merges/build/alldbg/tests/qemu-iotests/../../qemu-io"
> > > > >>  --cache writeback --aio threads -f qcow2
> > > > >> QEMU_NBD  --
> > > > >> "/home/petmay01/linaro/qemu-for-merges/build/alldbg/tests/qemu-iotests/../../qemu-nbd"
> > > > >> IMGFMT-- qcow2 (compat=1.1)
> > > > >> IMGPROTO  -- file
> > > > >> PLATFORM  -- Linux/x86_64 e104462 4.15.0-76-generic
> > > > >> TEST_DIR  --
> > > > >> /home/petmay01/linaro/qemu-for-merges/build/alldbg/tests/qemu-iotests/scratch
> > > > >> SOCK_DIR  -- /tmp/tmp.8tgdDjoZcO
> > > > >> SOCKET_SCM_HELPER --
> > > > >> /home/petmay01/linaro/qemu-for-merges/build/alldbg/tests/qemu-iotest/socket_scm_helper
> > > > >>
> > > > >> --- /home/petmay01/linaro/qemu-for-merges/tests/qemu-iotests/030.out
> > > > >>  2019-07-15 17:18:35.251364738 +0100
> > > > >> +++ 
> > > > >> /home/petmay01/linaro/qemu-for-merges/build/alldbg/tests/qemu-iotests/030.out.bad
> > > > >>   2020-06-25 14:04:28.500534007 +0100
> > > > >> @@ -1,5 +1,17 @@
> > > > >> -...
> > > > >> +.F.
> > > > >> +==
> > > > >> +FAIL: test_stream_parallel (__main__.TestParallelOps)
> > > > >> +--
> > > > >> +Traceback (most recent call last):
> > > > >> +  File "030", line 246, in test_stream_parallel
> > > > >> +self.assert_qmp(result, 'return', {})
> > > > >> +  File 
> > > > >> "/home/petmay01/linaro/qemu-for-merges/tests/qemu-iotests/iotests.py",
> > > > >> line 848, in assert_qmp
> > > > >> +result = self.dictpath(d, path)
> > > > >> +  File 
> > > > >> "/home/petmay01/linaro/qemu-for-merges/tests/qemu-iotests/iotests.py",
> > > > >> line 822, in dictpath
> > > > >> +self.fail(f'failed path traversal for "{path}" in "{d}"')
> > > > >> +AssertionError: failed path traversal for "return" in "{'error':
> > > > >> {'class': 'DeviceNotActive', 'desc': "Block job 'stream-node8' not
> > > > >> found"}}"
> > > > >> +
> > > > >>  
> > > > >> --
> > > > >>  Ran 27 tests
> > > > >>
> > > > >> -OK
> > > > >> +FAILED (failures=1)
> > > > > 
> > > > > Strange, I can't reproduce this failure on my pull request branch or 
> > > > > on
> > > > > qemu.git/master.
> > > > > 
> > > > > Is this failure deterministic? Are you sure it is introduced by this
> > > > > pull request?
> > > > 
> > > > Probably not introduced by this pullreq, but I also hit it on FreeBSD:
> > > > https://cirrus-ci.com/task/4620718312783872?command=main#L5803
> > > > 
> > > >   TESTiotest-qcow2: 030 [fail]
> > > > QEMU  --
> > > > "/tmp/cirrus-ci-build/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64"
> > > > -nodefaults -display none -machine virt -accel qtest
> > > > QEMU_IMG  --
> > > > "/tmp/cirrus-ci-build/build/tests/qemu-iotests/../../qemu-img"
> > > > QEMU_IO   --
> > > > "/tmp/cirrus-ci-build/build/tests/qemu-iotests/../../qemu-io"  --cache
> > > > writeback --aio threads -f qcow2
> > > > QEMU_NBD  --
> > > > 

Re: [PATCH 09/13] nbd: add GUri-based URI parsing version

2020-07-10 Thread Daniel P . Berrangé
On Thu, Jul 09, 2020 at 11:42:30PM +0400, Marc-André Lureau wrote:
> Signed-off-by: Marc-André Lureau 
> ---
>  block/nbd.c| 86 +++---
>  util/Makefile.objs |  2 +-
>  2 files changed, 66 insertions(+), 22 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index faadcab442b..fdc4a53a98f 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -31,7 +31,10 @@
>  #include "qemu/osdep.h"
>  
>  #include "trace.h"
> +#ifndef HAVE_GLIB_GURI
>  #include "qemu/uri.h"
> +#endif
> +#include "qemu/error-report.h"
>  #include "qemu/option.h"
>  #include "qemu/cutils.h"
>  #include "qemu/main-loop.h"
> @@ -1513,71 +1516,112 @@ static int nbd_client_connect(BlockDriverState *bs, 
> Error **errp)
>  /*
>   * Parse nbd_open options
>   */
> -
>  static int nbd_parse_uri(const char *filename, QDict *options)
>  {
> +const char *p, *scheme, *server, *socket = NULL;
> +int port;
> +bool is_unix;
> +
> +#ifdef HAVE_GLIB_GURI
> +g_autoptr(GUri) uri = NULL;
> +g_autoptr(GHashTable) params = NULL;
> +g_autoptr(GError) err = NULL;
> +
> +uri = g_uri_parse(filename, G_URI_FLAGS_ENCODED_QUERY, );
> +if (!uri) {
> +error_report("Failed to parse NBD URI: %s", err->message);
> +return -EINVAL;
> +}
> +
> +p = g_uri_get_path(uri);
> +scheme = g_uri_get_scheme(uri);
> +server = g_uri_get_host(uri);
> +port = g_uri_get_port(uri);

I would have expected this code to fail to compile as we're setting
GLIB_VERSION_MAX_ALLOWED == GLIB_VERSION_2_48  and GUri is tagged
as newer than that.

In any case, having this conditonal code in all callers is definitely
not a desirable approach. If we want to use it, then I think we need to
pull a copy of GUri into QEMU and expose it via glib-compat.h, so that
callers can use it unconditionally.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|