Re: [PATCH 3/5] vhost-user-blk: add VIRTIO_F_RING_PACKED feature bit

2020-05-24 Thread Raphael Norwitz
On Fri, May 22, 2020 at 1:20 PM Stefan Hajnoczi  wrote:
>
> Vhost devices have a list of feature bits that the device backend is
> allowed to control. The VIRTIO_F_RING_PACKED feature is a feature that
> must be negotiated through all the way to the device backend. Add it so
> the device backend can declare whether or not it supports the packed
> ring layout.
>
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9d8c0b3909..10e114a19a 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -44,6 +44,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_BLK_F_DISCARD,
>  VIRTIO_BLK_F_WRITE_ZEROES,
>  VIRTIO_F_VERSION_1,
> +VIRTIO_F_RING_PACKED,
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_F_NOTIFY_ON_EMPTY,
> --
> 2.25.3
>



Re: [PATCH 4/5] vhost-scsi: add VIRTIO_F_VERSION_1 and VIRTIO_F_RING_PACKED

2020-05-24 Thread Raphael Norwitz
On Fri, May 22, 2020 at 1:19 PM Stefan Hajnoczi  wrote:
>
> Let vhost-scsi and vhost-user-scsi device backends determine whether
> VIRTIO 1.0 and packed virtqueues are supported. It doesn't make sense to
> handle these feature bits in QEMU since the device backend needs to
> support them if we want to use them.
>
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/scsi/vhost-scsi.c  | 2 ++
>  hw/scsi/vhost-user-scsi.c | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index c1b012aea4..a7fb788af5 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -34,6 +34,8 @@
>
>  /* Features supported by host kernel. */
>  static const int kernel_feature_bits[] = {
> +VIRTIO_F_VERSION_1,
> +VIRTIO_F_RING_PACKED,
>  VIRTIO_F_NOTIFY_ON_EMPTY,
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index cbb5d97599..6aa0d5ded2 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -32,6 +32,8 @@
>
>  /* Features supported by the host application */
>  static const int user_feature_bits[] = {
> +VIRTIO_F_VERSION_1,
> +VIRTIO_F_RING_PACKED,
>  VIRTIO_F_NOTIFY_ON_EMPTY,
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
> --
> 2.25.3
>



Re: [PATCH v3 2/2] vhost-user-blk: delay vhost_user_blk_disconnect

2020-05-24 Thread Raphael Norwitz
I'm mostly happy with this. A couple comments.

On Wed, May 20, 2020 at 11:54 AM Dima Stepanov  wrote:
>
> A socket write during vhost-user communication may trigger a disconnect
> event, calling vhost_user_blk_disconnect() and clearing all the
> vhost_dev structures holding data that vhost-user functions expect to
> remain valid to roll back initialization correctly. Delay the cleanup to
> keep vhost_dev structure valid.
> There are two possible states to handle:
> 1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
> the caller routine.
> 2. RUN_STATE_RUNNING: delay by using bh
>
> BH changes are based on the similar changes for the vhost-user-net
> device:
>   commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
>   "vhost-user: delay vhost_user_stop"
>
I'd also give credit to Li Feng here - he sent a similar patch:

https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg02255.html

> Signed-off-by: Dima Stepanov 
> ---
>  hw/block/vhost-user-blk.c | 49 
> +--
>  1 file changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9d8c0b3..447fc9c 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -337,11 +337,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>
> -if (!s->connected) {
> -return;
> -}
> -s->connected = false;
> -
>  if (s->dev.started) {
>  vhost_user_blk_stop(vdev);
>  }
> @@ -349,6 +344,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>  vhost_dev_cleanup(>dev);
>  }
>
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> +
> +static void vhost_user_blk_chr_closed_bh(void *opaque)
> +{
> +DeviceState *dev = opaque;
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> +vhost_user_blk_disconnect(dev);
> +qemu_chr_fe_set_handlers(>chardev, NULL, NULL, vhost_user_blk_event,
> +NULL, opaque, NULL, true);
> +}
> +
>  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>  {
>  DeviceState *dev = opaque;
> @@ -363,7 +371,28 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
>  }
>  break;
>  case CHR_EVENT_CLOSED:
> -vhost_user_blk_disconnect(dev);
> +/*
> + * A close event may happen during a read/write, but vhost
> + * code assumes the vhost_dev remains setup, so delay the
> + * stop & clear. There are two possible paths to hit this
> + * disconnect event:
> + * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> + * vhost_user_blk_device_realize() is a caller.
> + * 2. In tha main loop phase after VM start.
> + *
> + * For p2 the disconnect event will be delayed. We can't
> + * do the same for p1, because we are not running the loop
> + * at this moment. So just skip this step and perform
> + * disconnect in the caller function.
> + */
> +if (s->connected && runstate_is_running()) {
> +AioContext *ctx = qemu_get_current_aio_context();
> +
> +qemu_chr_fe_set_handlers(>chardev, NULL, NULL, NULL, NULL,
> +NULL, NULL, false);
> +aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, 
> opaque);
> +}
> +s->connected = false;
>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
> @@ -428,6 +457,14 @@ reconnect:
>
>  ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg,
> sizeof(struct virtio_blk_config));

I find checking s->connected before ret a little confusing. I think we
should also enforce a reconnect if s->connected is false. AFIK if
s->connected is false, ret must also be less than 0, but to be safe
I’d prefer something like:

if (ret < 0 || !s->connected) {
if (!s->connected) {
/*
 * Perform disconnect before making reconnect. More detailed
 * comment why it was delayed is in the vhost_user_blk_event()
 * routine.
 */
  vhost_user_blk_disconnect(dev);
}
if (ret < 0) {
   error_report(“vhost-user-blk: get block config failed”);
}
goto reconnect;
}

> +if (!s->connected) {
> +/*
> + * Perform disconnect before making reconnect. More detailed
> + * comment why it was delayed is in the vhost_user_blk_event()
> + * routine.
> + */
> +vhost_user_blk_disconnect(dev);
> +}
>  if (ret < 0) {
>  error_report("vhost-user-blk: get block config failed");
>  goto reconnect;
> --
> 2.7.4
>
>



Re: [PATCH v3 2/2] vhost-user-blk: delay vhost_user_blk_disconnect

2020-05-24 Thread Jason Wang



On 2020/5/20 下午11:53, Dima Stepanov wrote:

A socket write during vhost-user communication may trigger a disconnect
event, calling vhost_user_blk_disconnect() and clearing all the
vhost_dev structures holding data that vhost-user functions expect to
remain valid to roll back initialization correctly. Delay the cleanup to
keep vhost_dev structure valid.
There are two possible states to handle:
1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
the caller routine.
2. RUN_STATE_RUNNING: delay by using bh

BH changes are based on the similar changes for the vhost-user-net
device:
   commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
   "vhost-user: delay vhost_user_stop"



It's better to explain why we don't need to deal with case 1 in the 
vhost-user-net case.


And do we still need patch 1 if we had this patch??

Thanks




Signed-off-by: Dima Stepanov 
---
  hw/block/vhost-user-blk.c | 49 +--
  1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9d8c0b3..447fc9c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -337,11 +337,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
  VHostUserBlk *s = VHOST_USER_BLK(vdev);
  
-if (!s->connected) {

-return;
-}
-s->connected = false;
-
  if (s->dev.started) {
  vhost_user_blk_stop(vdev);
  }
@@ -349,6 +344,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
  vhost_dev_cleanup(>dev);
  }
  
+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);

+
+static void vhost_user_blk_chr_closed_bh(void *opaque)
+{
+DeviceState *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+vhost_user_blk_disconnect(dev);
+qemu_chr_fe_set_handlers(>chardev, NULL, NULL, vhost_user_blk_event,
+NULL, opaque, NULL, true);
+}
+
  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
  {
  DeviceState *dev = opaque;
@@ -363,7 +371,28 @@ static void vhost_user_blk_event(void *opaque, 
QEMUChrEvent event)
  }
  break;
  case CHR_EVENT_CLOSED:
-vhost_user_blk_disconnect(dev);
+/*
+ * A close event may happen during a read/write, but vhost
+ * code assumes the vhost_dev remains setup, so delay the
+ * stop & clear. There are two possible paths to hit this
+ * disconnect event:
+ * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
+ * vhost_user_blk_device_realize() is a caller.
+ * 2. In tha main loop phase after VM start.
+ *
+ * For p2 the disconnect event will be delayed. We can't
+ * do the same for p1, because we are not running the loop
+ * at this moment. So just skip this step and perform
+ * disconnect in the caller function.
+ */
+if (s->connected && runstate_is_running()) {
+AioContext *ctx = qemu_get_current_aio_context();
+
+qemu_chr_fe_set_handlers(>chardev, NULL, NULL, NULL, NULL,
+NULL, NULL, false);
+aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
+}
+s->connected = false;
  break;
  case CHR_EVENT_BREAK:
  case CHR_EVENT_MUX_IN:
@@ -428,6 +457,14 @@ reconnect:
  
  ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg,

 sizeof(struct virtio_blk_config));
+if (!s->connected) {
+/*
+ * Perform disconnect before making reconnect. More detailed
+ * comment why it was delayed is in the vhost_user_blk_event()
+ * routine.
+ */
+vhost_user_blk_disconnect(dev);
+}
  if (ret < 0) {
  error_report("vhost-user-blk: get block config failed");
  goto reconnect;





Re: [PATCH v3 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu

2020-05-24 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1590344541.git.lukasstra...@web.de/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

migration/qemu-file-channel.o: in function `channel_close':
/tmp/qemu-test/src/migration/qemu-file-channel.c:111: undefined reference to 
`yank_generic_iochannel'
/usr/bin/ld: /tmp/qemu-test/src/migration/qemu-file-channel.c:110: undefined 
reference to `yank_unregister_function'
clang-8: error: linker command failed with exit code 1 (use -v to see 
invocation)
make: *** [/tmp/qemu-test/src/rules.mak:124: tests/test-vmstate] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=18e540c0fae14ac68e571023cc096e31', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-92q_wrsw/src/docker-src.2020-05-24-17.15.58.27154:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=18e540c0fae14ac68e571023cc096e31
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-92q_wrsw/src'
make: *** [docker-run-test-debug@fedora] Error 2

real5m14.699s
user0m8.970s


The full log is available at
http://patchew.org/logs/cover.1590344541.git.lukasstra...@web.de/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v3 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu

2020-05-24 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1590344541.git.lukasstra...@web.de/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

migration/qemu-file-channel.o: In function `channel_close':
/tmp/qemu-test/src/migration/qemu-file-channel.c:110: undefined reference to 
`yank_generic_iochannel'
/tmp/qemu-test/src/migration/qemu-file-channel.c:110: undefined reference to 
`yank_unregister_function'
collect2: error: ld returned 1 exit status
  LINKtests/test-cutils
make: *** [tests/test-vmstate] Error 1
make: *** Waiting for unfinished jobs
  TESTiotest-qcow2: 001
  TESTiotest-qcow2: 002
---
Not run: 259
Failures: 267
Failed 1 of 119 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** wait: No child processes.  Stop.
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=db483e1a7b374ed8a9cf3528ad2357b2', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-4gmncabb/src/docker-src.2020-05-24-17.04.07.24804:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=db483e1a7b374ed8a9cf3528ad2357b2
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-4gmncabb/src'
make: *** [docker-run-test-quick@centos7] Error 2

real12m5.130s
user0m8.082s


The full log is available at
http://patchew.org/logs/cover.1590344541.git.lukasstra...@web.de/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH v3 4/4] migration: Add yank feature

2020-05-24 Thread Lukas Straub
Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub 
---
 Makefile.objs |  1 +
 migration/channel.c   | 12 
 migration/migration.c | 18 +-
 migration/multifd.c   | 10 ++
 migration/qemu-file-channel.c |  6 ++
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/Makefile.objs b/Makefile.objs
index 5582f4eda9..d2a49d3834 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -58,6 +58,7 @@ common-obj-$(CONFIG_LINUX) += fsdev/

 common-obj-y += accel/
 common-obj-y += migration/
+common-obj-y += yank.o

 common-obj-y += audio/
 common-obj-m += audio/
diff --git a/migration/channel.c b/migration/channel.c
index 20e4c8e2dc..5cb48d403a 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));

+if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+yank_register_function((char *) "migration", yank_generic_iochannel,
+   QIO_CHANNEL(ioc));
+}
+
 if (s->parameters.tls_creds &&
 *s->parameters.tls_creds &&
 !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,11 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname, error);

 if (!error) {
+if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+yank_register_function((char *) "migration", 
yank_generic_iochannel,
+   QIO_CHANNEL(ioc));
+}
+
 if (s->parameters.tls_creds &&
 *s->parameters.tls_creds &&
 !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index 187ac0410c..c6d7119c08 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -54,6 +54,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "yank.h"

 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */

@@ -231,6 +232,8 @@ void migration_incoming_state_destroy(void)
 qapi_free_SocketAddressList(mis->socket_address_list);
 mis->socket_address_list = NULL;
 }
+
+yank_unregister_instance((char *) "migration");
 }

 static void migrate_generate_event(int new_state)
@@ -362,7 +365,9 @@ void qemu_start_incoming_migration(const char *uri, Error 
**errp)
 const char *p;

 qapi_event_send_migration(MIGRATION_STATUS_SETUP);
+yank_register_instance((char *) "migration");
 if (!strcmp(uri, "defer")) {
+yank_unregister_instance((char *) "migration");
 deferred_incoming_migration(errp);
 } else if (strstart(uri, "tcp:", )) {
 tcp_start_incoming_migration(p, errp);
@@ -377,6 +382,7 @@ void qemu_start_incoming_migration(const char *uri, Error 
**errp)
 } else if (strstart(uri, "fd:", )) {
 fd_start_incoming_migration(p, errp);
 } else {
+yank_unregister_instance((char *) "migration");
 error_setg(errp, "unknown migration protocol: %s", uri);
 }
 }
@@ -1632,6 +1638,7 @@ static void migrate_fd_cleanup(MigrationState *s)
 }
 notifier_list_notify(_state_notifiers, s);
 block_cleanup_parameters(s);
+yank_unregister_instance((char *) "migration");
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -1899,6 +1906,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
  * only re-setup the migration stream and poke existing migration
  * to continue using that newly established channel.
  */
+yank_unregister_instance((char *) "migration");
 qemu_start_incoming_migration(uri, errp);
 }

@@ -2035,7 +2043,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 /* Error detected, put into errp */
 return;
 }
-
+if (!(has_resume && resume)) {
+yank_register_instance((char *) "migration");
+}
 if (strstart(uri, "tcp:", )) {
 tcp_start_outgoing_migration(s, p, _err);
 #ifdef CONFIG_RDMA
@@ -2049,6 +2059,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 } else if (strstart(uri, "fd:", )) {
 fd_start_outgoing_migration(s, p, _err);
 } else {
+if (!(has_resume && resume)) {
+yank_unregister_instance((char *) "migration");
+}
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
"a valid migration protocol");
 migrate_set_state(>state, MIGRATION_STATUS_SETUP,
@@ -2058,6 +2071,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 }

 if (local_err) {
+if 

[PATCH v3 3/4] chardev/char-socket.c: Add yank feature

2020-05-24 Thread Lukas Straub
Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub 
---
 Makefile.objs |  1 +
 chardev/char-socket.c | 24 
 2 files changed, 25 insertions(+)

diff --git a/Makefile.objs b/Makefile.objs
index 8e403b81f3..5582f4eda9 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -10,6 +10,7 @@ qom-obj-y = qom/
 ifeq ($(call lor,$(CONFIG_SOFTMMU),$(CONFIG_TOOLS)),y)

 chardev-obj-y = chardev/
+chardev-obj-y += yank.o

 authz-obj-y = authz/

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 185fe38dda..d5c6cd2153 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "yank.h"

 #include "chardev/char-io.h"

@@ -69,6 +70,7 @@ typedef struct {
 size_t read_msgfds_num;
 int *write_msgfds;
 size_t write_msgfds_num;
+char *yank_name;

 SocketAddress *addr;
 bool is_listen;
@@ -409,6 +411,11 @@ static void tcp_chr_free_connection(Chardev *chr)

 tcp_set_msgfds(chr, NULL, 0);
 remove_fd_in_watch(chr);
+if (s->state == TCP_CHARDEV_STATE_CONNECTING
+|| s->state == TCP_CHARDEV_STATE_CONNECTED) {
+yank_unregister_function(s->yank_name, yank_generic_iochannel,
+ QIO_CHANNEL(s->sioc));
+}
 object_unref(OBJECT(s->sioc));
 s->sioc = NULL;
 object_unref(OBJECT(s->ioc));
@@ -912,6 +919,8 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
 }
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 tcp_chr_set_client_ioc_name(chr, sioc);
+yank_register_function(s->yank_name, yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 ret = tcp_chr_new_client(chr, sioc);
 object_unref(OBJECT(sioc));
 return ret;
@@ -926,6 +935,8 @@ static void tcp_chr_accept(QIONetListener *listener,

 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 tcp_chr_set_client_ioc_name(chr, cioc);
+yank_register_function(s->yank_name, yank_generic_iochannel,
+   QIO_CHANNEL(cioc));
 tcp_chr_new_client(chr, cioc);
 }

@@ -941,6 +952,8 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error 
**errp)
 object_unref(OBJECT(sioc));
 return -1;
 }
+yank_register_function(s->yank_name, yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 tcp_chr_new_client(chr, sioc);
 object_unref(OBJECT(sioc));
 return 0;
@@ -956,6 +969,8 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 sioc = qio_net_listener_wait_client(s->listener);
 tcp_chr_set_client_ioc_name(chr, sioc);
+yank_register_function(s->yank_name, yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 tcp_chr_new_client(chr, sioc);
 object_unref(OBJECT(sioc));
 }
@@ -1066,6 +1081,8 @@ static void char_socket_finalize(Object *obj)
 object_unref(OBJECT(s->tls_creds));
 }
 g_free(s->tls_authz);
+yank_unregister_instance(s->yank_name);
+g_free(s->yank_name);

 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1081,6 +1098,8 @@ static void qemu_chr_socket_connected(QIOTask *task, void 
*opaque)

 if (qio_task_propagate_error(task, )) {
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+yank_unregister_function(s->yank_name, yank_generic_iochannel,
+ QIO_CHANNEL(sioc));
 check_report_connect_error(chr, err);
 error_free(err);
 goto cleanup;
@@ -1115,6 +1134,8 @@ static void tcp_chr_connect_client_async(Chardev *chr)
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 sioc = qio_channel_socket_new();
 tcp_chr_set_client_ioc_name(chr, sioc);
+yank_register_function(s->yank_name, yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 /*
  * Normally code would use the qio_channel_socket_connect_async
  * method which uses a QIOTask + qio_task_set_error internally
@@ -1356,6 +1377,9 @@ static void qmp_chardev_open_socket(Chardev *chr,
 qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
 }

+s->yank_name = g_strconcat("chardev:", chr->label, NULL);
+yank_register_instance(s->yank_name);
+
 /* be isn't opened until we get a connection */
 *be_opened = false;

--
2.20.1



pgp2XInlfhN9m.pgp
Description: OpenPGP digital signature


[PATCH v3 1/4] Introduce yank feature

2020-05-24 Thread Lukas Straub
The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub 
---
 qapi/misc.json |  45 +
 yank.c | 174 +
 yank.h |  67 +++
 3 files changed, 286 insertions(+)
 create mode 100644 yank.c
 create mode 100644 yank.h

diff --git a/qapi/misc.json b/qapi/misc.json
index 99b90ac80b..f5228b2502 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1550,3 +1550,48 @@
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }

+##
+# @YankInstances:
+#
+# @instances: List of yank instances.
+#
+# Yank instances are named after the following schema:
+# "blockdev:", "chardev:" and "migration"
+#
+# Since: 5.1
+##
+{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
+
+##
+# @yank:
+#
+# Recover from hanging qemu by yanking the specified instances.
+#
+# Takes @YankInstances as argument.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
+# <- { "return": {} }
+#
+# Since: 5.1
+##
+{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances.
+#
+# Returns: @YankInstances
+#
+# Example:
+#
+# -> { "execute": "query-yank" }
+# <- { "return": { "instances": ["blockdev:nbd0"] } }
+#
+# Since: 5.1
+##
+{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
diff --git a/yank.c b/yank.c
new file mode 100644
index 00..36d8139d4d
--- /dev/null
+++ b/yank.c
@@ -0,0 +1,174 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
+#include "qapi/qapi-commands-misc.h"
+#include "io/channel.h"
+#include "yank.h"
+
+struct YankFuncAndParam {
+YankFn *func;
+void *opaque;
+QLIST_ENTRY(YankFuncAndParam) next;
+};
+
+struct YankInstance {
+char *name;
+QLIST_HEAD(, YankFuncAndParam) yankfns;
+QLIST_ENTRY(YankInstance) next;
+};
+
+static QemuMutex lock;
+static QLIST_HEAD(yankinst_list, YankInstance) head
+= QLIST_HEAD_INITIALIZER(head);
+
+static struct YankInstance *yank_find_instance(char *name)
+{
+struct YankInstance *tmp, *instance;
+instance = NULL;
+QLIST_FOREACH(tmp, , next) {
+if (!strcmp(tmp->name, name)) {
+instance = tmp;
+}
+}
+return instance;
+}
+
+void yank_register_instance(char *instance_name)
+{
+struct YankInstance *instance;
+
+qemu_mutex_lock();
+assert(!yank_find_instance(instance_name));
+
+instance = g_slice_new(struct YankInstance);
+instance->name = g_strdup(instance_name);
+QLIST_INIT(>yankfns);
+QLIST_INSERT_HEAD(, instance, next);
+
+qemu_mutex_unlock();
+}
+
+void yank_unregister_instance(char *instance_name)
+{
+struct YankInstance *instance;
+
+qemu_mutex_lock();
+instance = yank_find_instance(instance_name);
+assert(instance);
+
+assert(QLIST_EMPTY(>yankfns));
+QLIST_REMOVE(instance, next);
+g_free(instance->name);
+g_slice_free(struct YankInstance, instance);
+
+qemu_mutex_unlock();
+}
+
+void yank_register_function(char *instance_name, YankFn *func, void *opaque)
+{
+struct YankInstance *instance;
+struct YankFuncAndParam *entry;
+
+qemu_mutex_lock();
+instance = yank_find_instance(instance_name);
+assert(instance);
+
+entry = g_slice_new(struct YankFuncAndParam);
+entry->func = func;
+entry->opaque = opaque;
+
+QLIST_INSERT_HEAD(>yankfns, entry, next);
+qemu_mutex_unlock();
+}
+
+void yank_unregister_function(char *instance_name, YankFn *func, void *opaque)
+{
+struct YankInstance *instance;
+struct YankFuncAndParam *entry;
+
+qemu_mutex_lock();
+instance = yank_find_instance(instance_name);
+assert(instance);
+
+QLIST_FOREACH(entry, >yankfns, next) {
+if (entry->func == func && entry->opaque == opaque) {
+QLIST_REMOVE(entry, next);
+g_slice_free(struct YankFuncAndParam, entry);
+qemu_mutex_unlock();
+return;
+}
+}
+
+abort();
+}
+
+void yank_generic_iochannel(void *opaque)
+{
+QIOChannel *ioc = QIO_CHANNEL(opaque);
+
+qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
+void qmp_yank(strList *instances, Error **errp)
+{
+strList *tmp;
+struct YankInstance *instance;
+struct YankFuncAndParam *entry;
+
+qemu_mutex_lock();
+tmp = instances;
+for (; tmp; tmp = tmp->next) {
+instance = 

[PATCH v3 2/4] block/nbd.c: Add yank feature

2020-05-24 Thread Lukas Straub
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub 
---
 Makefile.objs |   1 +
 block/nbd.c   | 101 --
 2 files changed, 65 insertions(+), 37 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index a7c967633a..8e403b81f3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-y += block.o blockjob.o job.o
 block-obj-y += block/ scsi/
 block-obj-y += qemu-io-cmds.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
+block-obj-y += yank.o

 block-obj-m = block/

diff --git a/block/nbd.c b/block/nbd.c
index 2160859f64..3a41749f1b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -43,6 +44,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS16

@@ -84,6 +87,8 @@ typedef struct BDRVNBDState {
 NBDReply reply;
 BlockDriverState *bs;

+char *yank_name;
+
 /* Connection parameters */
 uint32_t reconnect_delay;
 SocketAddress *saddr;
@@ -94,6 +99,7 @@ typedef struct BDRVNBDState {
 } BDRVNBDState;

 static int nbd_client_connect(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -106,17 +112,19 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
 s->tlscredsid = NULL;
 g_free(s->x_dirty_bitmap);
 s->x_dirty_bitmap = NULL;
+g_free(s->yank_name);
+s->yank_name = NULL;
 }

 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
 if (ret == -EIO) {
-if (s->state == NBD_CLIENT_CONNECTED) {
+if (atomic_read(>state) == NBD_CLIENT_CONNECTED) {
 s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
 NBD_CLIENT_CONNECTING_NOWAIT;
 }
 } else {
-if (s->state == NBD_CLIENT_CONNECTED) {
+if (atomic_read(>state) == NBD_CLIENT_CONNECTED) {
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 }
 s->state = NBD_CLIENT_QUIT;
@@ -167,7 +175,7 @@ static void nbd_client_attach_aio_context(BlockDriverState 
*bs,
  * s->connection_co is either yielded from nbd_receive_reply or from
  * nbd_co_reconnect_loop()
  */
-if (s->state == NBD_CLIENT_CONNECTED) {
+if (atomic_read(>state) == NBD_CLIENT_CONNECTED) {
 qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
 }

@@ -206,7 +214,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

-if (s->state == NBD_CLIENT_CONNECTED) {
+if (atomic_read(>state) == NBD_CLIENT_CONNECTED) {
 /* finish any pending coroutines */
 assert(s->ioc);
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
@@ -230,13 +238,14 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+NBDClientState state = atomic_read(>state);
+return state == NBD_CLIENT_CONNECTING_WAIT ||
+state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-return s->state == NBD_CLIENT_CONNECTING_WAIT;
+return atomic_read(>state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
@@ -274,6 +283,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 /* Finalize previous connection if any */
 if (s->ioc) {
 nbd_client_detach_aio_context(s->bs);
+yank_unregister_function(s->yank_name, nbd_yank, s->bs);
 object_unref(OBJECT(s->sioc));
 s->sioc = NULL;
 object_unref(OBJECT(s->ioc));
@@ -305,7 +315,7 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState 
*s)
 nbd_reconnect_attempt(s);

 while (nbd_client_connecting(s)) {
-if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
+if (atomic_read(>state) == NBD_CLIENT_CONNECTING_WAIT &&
 qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
 {
 s->state = NBD_CLIENT_CONNECTING_NOWAIT;
@@ -341,7 +351,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 int ret = 0;
 Error *local_err = NULL;

-while (s->state != NBD_CLIENT_QUIT) {
+while (atomic_read(>state) != NBD_CLIENT_QUIT) {
 /*
  * The NBD client can only really be considered idle when it has
  * yielded from qio_channel_readv_all_eof(), waiting for data. This is
@@ -356,7 +366,7 @@ static coroutine_fn 

[PATCH v3 0/4] Introduce 'yank' oob qmp command to recover from hanging qemu

2020-05-24 Thread Lukas Straub
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

v3:
 -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Lukas Straub (4):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature

 Makefile.objs |   3 +
 block/nbd.c   | 101 
 chardev/char-socket.c |  24 +
 migration/channel.c   |  12 +++
 migration/migration.c |  18 +++-
 migration/multifd.c   |  10 ++
 migration/qemu-file-channel.c |   6 ++
 qapi/misc.json|  45 +
 yank.c| 174 ++
 yank.h|  67 +
 10 files changed, 422 insertions(+), 38 deletions(-)
 create mode 100644 yank.c
 create mode 100644 yank.h

--
2.20.1


pgpG5snNlErX2.pgp
Description: OpenPGP digital signature


Re: [PATCH v6 00/32] Add subcluster allocation to qcow2

2020-05-24 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1590331741.git.be...@igalia.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/parallels.o
  CC  block/blklogwrites.o
  CC  block/block-backend.o
/tmp/qemu-test/src/block/qcow2-cluster.c:1912:54: error: implicit conversion 
from enumeration type 'QCow2ClusterType' (aka 'enum QCow2ClusterType') to 
different enumeration type 'enum qcow2_discard_type' [-Werror,-Wenum-conversion]
qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
~~~  ^~~~
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: block/qcow2-cluster.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=7888e00ead4743f2936bfa5f2bc1cd85', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-a3y14gp6/src/docker-src.2020-05-24-12.43.26.11536:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=7888e00ead4743f2936bfa5f2bc1cd85
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-a3y14gp6/src'
make: *** [docker-run-test-debug@fedora] Error 2

real3m28.097s
user0m7.204s


The full log is available at
http://patchew.org/logs/cover.1590331741.git.be...@igalia.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v6 00/32] Add subcluster allocation to qcow2

2020-05-24 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1590331741.git.be...@igalia.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/snapshot.o
  CC  block/qapi.o
/tmp/qemu-test/src/block/qcow2-cluster.c: In function 'qcow2_get_host_offset':
/tmp/qemu-test/src/block/qcow2-cluster.c:473:19: error: 'expected_type' may be 
used uninitialized in this function [-Werror=maybe-uninitialized]
 } else if (type != expected_type) {
   ^
/tmp/qemu-test/src/block/qcow2-cluster.c:449:25: note: 'expected_type' was 
declared here
 QCow2SubclusterType expected_type, type;
 ^
/tmp/qemu-test/src/block/qcow2-cluster.c:475:19: error: 'check_offset' may be 
used uninitialized in this function [-Werror=maybe-uninitialized]
 } else if (check_offset) {
   ^
/tmp/qemu-test/src/block/qcow2-cluster.c:447:10: note: 'check_offset' was 
declared here
 bool check_offset;
  ^
/tmp/qemu-test/src/block/qcow2-cluster.c:476:29: error: 'expected_offset' may 
be used uninitialized in this function [-Werror=maybe-uninitialized]
 expected_offset += s->cluster_size;
 ^
/tmp/qemu-test/src/block/qcow2-cluster.c:448:14: note: 'expected_offset' was 
declared here
 uint64_t expected_offset;
  ^
cc1: all warnings being treated as errors
make: *** [block/qcow2-cluster.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=2b00ed76657046888b159ce8f43ddafe', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-8sa3xlcn/src/docker-src.2020-05-24-12.39.44.4969:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=2b00ed76657046888b159ce8f43ddafe
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-8sa3xlcn/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m27.867s
user0m4.586s


The full log is available at
http://patchew.org/logs/cover.1590331741.git.be...@igalia.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v6 00/32] Add subcluster allocation to qcow2

2020-05-24 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1590331741.git.be...@igalia.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/qed-table.o
  CC  block/qed-cluster.o
  CC  block/qed-check.o
/tmp/qemu-test/src/block/qcow2-cluster.c:1912:54: error: implicit conversion 
from enumeration type 'QCow2ClusterType' (aka 'enum QCow2ClusterType') to 
different enumeration type 'enum qcow2_discard_type' [-Werror,-Wenum-conversion]
qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
~~~  ^~~~
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: block/qcow2-cluster.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=247935bf1140449991b4816970528c03', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-59iulxz0/src/docker-src.2020-05-24-12.30.45.31580:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=247935bf1140449991b4816970528c03
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-59iulxz0/src'
make: *** [docker-run-test-debug@fedora] Error 2

real3m31.674s
user0m8.521s


The full log is available at
http://patchew.org/logs/cover.1590331741.git.be...@igalia.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v6 00/32] Add subcluster allocation to qcow2

2020-05-24 Thread no-reply
Patchew URL: https://patchew.org/QEMU/cover.1590331741.git.be...@igalia.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/qed-table.o
  CC  block/qed-cluster.o
/tmp/qemu-test/src/block/qcow2-cluster.c: In function 'qcow2_get_host_offset':
/tmp/qemu-test/src/block/qcow2-cluster.c:473:19: error: 'expected_type' may be 
used uninitialized in this function [-Werror=maybe-uninitialized]
 } else if (type != expected_type) {
   ^
/tmp/qemu-test/src/block/qcow2-cluster.c:449:25: note: 'expected_type' was 
declared here
 QCow2SubclusterType expected_type, type;
 ^
/tmp/qemu-test/src/block/qcow2-cluster.c:475:19: error: 'check_offset' may be 
used uninitialized in this function [-Werror=maybe-uninitialized]
 } else if (check_offset) {
   ^
/tmp/qemu-test/src/block/qcow2-cluster.c:447:10: note: 'check_offset' was 
declared here
 bool check_offset;
  ^
/tmp/qemu-test/src/block/qcow2-cluster.c:476:29: error: 'expected_offset' may 
be used uninitialized in this function [-Werror=maybe-uninitialized]
 expected_offset += s->cluster_size;
 ^
/tmp/qemu-test/src/block/qcow2-cluster.c:448:14: note: 'expected_offset' was 
declared here
 uint64_t expected_offset;
  ^
cc1: all warnings being treated as errors
make: *** [block/qcow2-cluster.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=5d5c1a3ec20c4733b2b49cfa8c25190e', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-ke8h2yeo/src/docker-src.2020-05-24-12.27.22.23564:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=5d5c1a3ec20c4733b2b49cfa8c25190e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ke8h2yeo/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m28.041s
user0m8.588s


The full log is available at
http://patchew.org/logs/cover.1590331741.git.be...@igalia.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH v6 04/32] qcow2: Split cluster_needs_cow() out of count_cow_clusters()

2020-05-24 Thread Alberto Garcia
We are going to need it in other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 61ad638bdc..80f9787461 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1087,6 +1087,24 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+switch (qcow2_get_cluster_type(bs, l2_entry)) {
+case QCOW2_CLUSTER_NORMAL:
+if (l2_entry & QCOW_OFLAG_COPIED) {
+return false;
+}
+case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_COMPRESSED:
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return true;
+default:
+abort();
+}
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1099,25 +1117,11 @@ static int count_cow_clusters(BlockDriverState *bs, int 
nb_clusters,
 
 for (i = 0; i < nb_clusters; i++) {
 uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-switch(cluster_type) {
-case QCOW2_CLUSTER_NORMAL:
-if (l2_entry & QCOW_OFLAG_COPIED) {
-goto out;
-}
+if (!cluster_needs_cow(bs, l2_entry)) {
 break;
-case QCOW2_CLUSTER_UNALLOCATED:
-case QCOW2_CLUSTER_COMPRESSED:
-case QCOW2_CLUSTER_ZERO_PLAIN:
-case QCOW2_CLUSTER_ZERO_ALLOC:
-break;
-default:
-abort();
 }
 }
 
-out:
 assert(i <= nb_clusters);
 return i;
 }
-- 
2.20.1




[PATCH v6 30/32] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

2020-05-24 Thread Alberto Garcia
Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia 
---
 qapi/block-core.json |   7 +++
 block/qcow2.h|   8 ++-
 include/block/block_int.h|   1 +
 block/qcow2.c|  74 --
 tests/qemu-iotests/031.out   |   8 +--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  20 +++---
 tests/qemu-iotests/065   |  12 ++--
 tests/qemu-iotests/082.out   |  48 ---
 tests/qemu-iotests/085.out   |  38 ++--
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +--
 tests/qemu-iotests/198.out   |   2 +
 tests/qemu-iotests/206.out   |   4 ++
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/274.out   |  49 ---
 tests/qemu-iotests/280.out   |   2 +-
 tests/qemu-iotests/common.filter |   1 +
 22 files changed, 268 insertions(+), 140 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6fbacddab2..0cb80bda9b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -66,6 +66,9 @@
 # standalone (read-only) raw image without looking at qcow2
 # metadata (since: 4.0)
 #
+# @extended-l2: true if the image has extended L2 entries; only valid for
+#   compat >= 1.1 (since 5.1)
+#
 # @lazy-refcounts: on or off; only valid for compat >= 1.1
 #
 # @corrupt: true if the image has been marked corrupt; only valid for
@@ -87,6 +90,7 @@
   'compat': 'str',
   '*data-file': 'str',
   '*data-file-raw': 'bool',
+  '*extended-l2': 'bool',
   '*lazy-refcounts': 'bool',
   '*corrupt': 'bool',
   'refcount-bits': 'int',
@@ -4312,6 +4316,8 @@
 # @data-file-raw: True if the external data file must stay valid as a
 # standalone (read-only) raw image without looking at qcow2
 # metadata (default: false; since: 4.0)
+# @extended-l2  True to make the image have extended L2 entries
+#   (default: false; since 5.1)
 # @size: Size of the virtual disk in bytes
 # @version: Compatibility level (default: v3)
 # @backing-file: File name of the backing file if a backing file
@@ -4332,6 +4338,7 @@
   'data': { 'file': 'BlockdevRef',
 '*data-file':   'BlockdevRef',
 '*data-file-raw':   'bool',
+'*extended-l2': 'bool',
 'size': 'size',
 '*version': 'BlockdevQcow2Version',
 '*backing-file':'str',
diff --git a/block/qcow2.h b/block/qcow2.h
index ece5f1cb5a..0f26e39d12 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -246,15 +246,18 @@ enum {
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
 QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
+QCOW2_INCOMPAT_EXTL2_BITNR  = 4,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
 QCOW2_INCOMPAT_COMPRESSION  = 1 << QCOW2_INCOMPAT_COMPRESSION_BITNR,
+QCOW2_INCOMPAT_EXTL2= 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
 QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
 | QCOW2_INCOMPAT_CORRUPT
 | QCOW2_INCOMPAT_DATA_FILE
-| QCOW2_INCOMPAT_COMPRESSION,
+| QCOW2_INCOMPAT_COMPRESSION
+| QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -573,8 +576,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-/* FIXME: Return false until this feature is complete */
-return false;
+return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 791de6a59c..36e1993788 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -58,6 +58,7 @@
 #define BLOCK_OPT_DATA_FILE "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
 #define BLOCK_OPT_COMPRESSION_TYPE  "compression_type"
+#define BLOCK_OPT_EXTL2 "extended_l2"
 
 #define BLOCK_PROBE_BUF_SIZE512
 
diff --git a/block/qcow2.c b/block/qcow2.c
index bc7a4a1729..6690b39510 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1438,6 +1438,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,

[PATCH v6 18/32] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*

2020-05-24 Thread Alberto Garcia
In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_host_offset() is modified to return the subcluster type
instead of the cluster type, and all callers are updated to replace
all values of QCow2ClusterType with their QCow2SubclusterType
equivalents.

This patch only changes the data types, there are no semantic changes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  2 +-
 block/qcow2-cluster.c | 10 +++
 block/qcow2.c | 70 ++-
 3 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9cd7cbd895..84c06d9bba 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -894,7 +894,7 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type);
+  QCow2SubclusterType *subcluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 1e5681b0c6..ed7b92dbb2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -564,15 +564,15 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  * offset that we are interested in.
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
- * cluster type and (if applicable) are stored contiguously in the image file.
- * The cluster type is stored in *cluster_type.
- * Compressed clusters are always returned one by one.
+ * subcluster type and (if applicable) are stored contiguously in the image
+ * file. The subcluster type is stored in *subcluster_type.
+ * Compressed clusters are always processed one by one.
  *
  * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
-  QCow2ClusterType *cluster_type)
+  QCow2SubclusterType *subcluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -713,7 +713,7 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-*cluster_type = type;
+*subcluster_type = qcow2_cluster_to_subcluster_type(type);
 
 return 0;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 2806642f68..5303b0070b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2033,7 +2033,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
-QCow2ClusterType type;
+QCow2SubclusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
@@ -2053,15 +2053,16 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
-!s->crypto) {
+if ((type == QCOW2_SUBCLUSTER_NORMAL ||
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2158,7 +2159,7 @@ typedef struct Qcow2AioTask {
 AioTask task;
 
 BlockDriverState *bs;
-QCow2ClusterType cluster_type; /* only for read */
+QCow2SubclusterType subcluster_type; /* only for read */
 uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
@@ -2171,7 +2172,7 @@ static coroutine_fn int 
qcow2_co_preadv_task_entry(AioTask *task);
 static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
AioTaskPool *pool,
AioTaskFunc func,
-   QCow2ClusterType cluster_type,
+   QCow2SubclusterType subcluster_type,
uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
@@ -2185,7 

[PATCH v6 20/32] qcow2: Add subcluster support to calculate_l2_meta()

2020-05-24 Thread Alberto Garcia
If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

1) If we are writing to a newly allocated cluster then we need
   copy-on-write. The previous contents of subclusters #0 to #3 must
   be copied to the new cluster. We can optimize this process by
   skipping all leading unallocated or zero subclusters (the status of
   those skipped subclusters will be reflected in the new L2 bitmap).

2) If we are overwriting an existing cluster:

   2.1) If subcluster #3 is unallocated or has the all-zeroes bit set
then we need copy-on-write (on subcluster #3 only).

   2.2) If subcluster #3 was already allocated then there is no need
for any copy-on-write. However we still need to update the L2
bitmap to reflect possible changes in the allocation status of
subclusters #4 to #31. Because of this, this function checks
if all the overwritten subclusters are already allocated and
in this case it returns without creating a new QCowL2Meta
structure.

After all these changes l2meta_cow_start() and l2meta_cow_end()
are not necessarily cluster-aligned anymore. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write on the same cluster.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 163 +-
 1 file changed, 131 insertions(+), 32 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ed7b92dbb2..59dd9bda29 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -387,7 +387,6 @@ fail:
  * If the L2 entry is invalid return -errno and set @type to
  * QCOW2_SUBCLUSTER_INVALID.
  */
-G_GNUC_UNUSED
 static int qcow2_get_subcluster_range_type(BlockDriverState *bs,
uint64_t l2_entry,
uint64_t l2_bitmap,
@@ -1110,56 +1109,148 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 0 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-  uint64_t host_cluster_offset,
-  uint64_t guest_offset, unsigned bytes,
-  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t 
host_cluster_offset,
+ uint64_t guest_offset, unsigned bytes,
+ uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-int l2_index = offset_to_l2_slice_index(s, guest_offset);
-uint64_t l2_entry;
+int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry, l2_bitmap;
 unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
-QCow2ClusterType type;
+QCow2SubclusterType type;
+int i;
+bool skip_cow = keep_old;
 
 assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-/* Return if there's no COW (all clusters are normal and we keep them) */
-if (keep_old) {
-int i;
-for (i = 0; i < nb_clusters; i++) {
-l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
-break;
+/* Check the type of all affected subclusters */
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+if (skip_cow) {
+unsigned write_from = MAX(cow_start_to, i << s->cluster_bits);
+unsigned write_to = MIN(cow_end_from, (i + 1) << s->cluster_bits);
+int first_sc = offset_to_sc_index(s, write_from);
+int last_sc = offset_to_sc_index(s, write_to - 1);
+int cnt = qcow2_get_subcluster_range_type(bs, l2_entry, l2_bitmap,
+  first_sc, );
+/* Is any of the subclusters of type != QCOW2_SUBCLUSTER_NORMAL ? 
*/
+if (type != QCOW2_SUBCLUSTER_NORMAL || first_sc + cnt <= last_sc) {
+skip_cow = false;
 }
+} else {
+/* If we can't skip the cow we can still look for invalid entries 
*/
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, 0);
 }
-if (i == nb_clusters) {
- 

[PATCH v6 15/32] qcow2: Add qcow2_get_subcluster_range_type()

2020-05-24 Thread Alberto Garcia
There are situations in which we want to know how many contiguous
subclusters of the same type there are in a given cluster. This can be
done by simply iterating over the subclusters and repeatedly calling
qcow2_get_subcluster_type() for each one of them.

However once we determined the type of a subcluster we can check the
rest efficiently by counting the number of adjacent ones (or zeroes)
in the bitmap. This is what this function does.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8b2fc550b7..32dc6e75e3 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -375,6 +375,57 @@ fail:
 return ret;
 }
 
+/*
+ * For a given L2 entry, count the number of contiguous subclusters of
+ * the same type starting from @sc_from. Compressed clusters are
+ * treated as if they were divided into subclusters of size
+ * s->subcluster_size.
+ *
+ * Return the number of contiguous subclusters and set @type to the
+ * subcluster type.
+ *
+ * If the L2 entry is invalid return -errno and set @type to
+ * QCOW2_SUBCLUSTER_INVALID.
+ */
+G_GNUC_UNUSED
+static int qcow2_get_subcluster_range_type(BlockDriverState *bs,
+   uint64_t l2_entry,
+   uint64_t l2_bitmap,
+   unsigned sc_from,
+   QCow2SubclusterType *type)
+{
+BDRVQcow2State *s = bs->opaque;
+uint32_t val;
+
+*type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_from);
+
+if (*type == QCOW2_SUBCLUSTER_INVALID) {
+return -EINVAL;
+} else if (!has_subclusters(s) || *type == QCOW2_SUBCLUSTER_COMPRESSED) {
+return s->subclusters_per_cluster - sc_from;
+}
+
+switch (*type) {
+case QCOW2_SUBCLUSTER_NORMAL:
+val = l2_bitmap | QCOW_OFLAG_SUB_ALLOC_RANGE(0, sc_from);
+return cto32(val) - sc_from;
+
+case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+val = (l2_bitmap | QCOW_OFLAG_SUB_ZERO_RANGE(0, sc_from)) >> 32;
+return cto32(val) - sc_from;
+
+case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+val = ((l2_bitmap >> 32) | l2_bitmap)
+& ~QCOW_OFLAG_SUB_ALLOC_RANGE(0, sc_from);
+return ctz32(val) - sc_from;
+
+default:
+g_assert_not_reached();
+}
+}
+
 /*
  * Checks how many clusters in a given L2 slice are contiguous in the image
  * file. As soon as one of the flags in the bitmask stop_flags changes compared
-- 
2.20.1




[PATCH v6 25/32] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2020-05-24 Thread Alberto Garcia
The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated. This needs to happen even if the
cluster was already allocated and the L2 entry was otherwise valid.

In some cases however a write operation doesn't need change the L2
bitmap (because all affected subclusters were already allocated). This
is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2()
is never called in those cases.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5e70b76c21..c8d6e16237 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1061,6 +1061,24 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 assert((offset & L2E_OFFSET_MASK) == offset);
 
 set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
+
+/* Update bitmap with the subclusters that were just written */
+if (has_subclusters(s)) {
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+unsigned written_from = m->cow_start.offset;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+m->nb_clusters << s->cluster_bits;
+int first_sc, last_sc;
+/* Narrow written_from and written_to down to the current cluster 
*/
+written_from = MAX(written_from, i << s->cluster_bits);
+written_to   = MIN(written_to, (i + 1) << s->cluster_bits);
+assert(written_from < written_to);
+first_sc = offset_to_sc_index(s, written_from);
+last_sc  = offset_to_sc_index(s, written_to - 1);
+l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(first_sc, last_sc + 1);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(first_sc, last_sc + 1);
+set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+}
  }
 
 
-- 
2.20.1




[PATCH v6 21/32] qcow2: Add subcluster support to qcow2_get_host_offset()

2020-05-24 Thread Alberto Garcia
The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

qcow2_cluster_to_subcluster_type() is not necessary as a separate
function anymore so it's inlined into its caller.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h |  38 ---
 block/qcow2-cluster.c | 150 ++
 2 files changed, 92 insertions(+), 96 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 84c06d9bba..32c68ead9a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -710,29 +710,6 @@ static inline QCow2ClusterType 
qcow2_get_cluster_type(BlockDriverState *bs,
 }
 }
 
-/*
- * For an image without extended L2 entries, return the
- * QCow2SubclusterType equivalent of a given QCow2ClusterType.
- */
-static inline
-QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
-{
-switch (type) {
-case QCOW2_CLUSTER_COMPRESSED:
-return QCOW2_SUBCLUSTER_COMPRESSED;
-case QCOW2_CLUSTER_ZERO_PLAIN:
-return QCOW2_SUBCLUSTER_ZERO_PLAIN;
-case QCOW2_CLUSTER_ZERO_ALLOC:
-return QCOW2_SUBCLUSTER_ZERO_ALLOC;
-case QCOW2_CLUSTER_NORMAL:
-return QCOW2_SUBCLUSTER_NORMAL;
-case QCOW2_CLUSTER_UNALLOCATED:
-return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
-default:
-g_assert_not_reached();
-}
-}
-
 /*
  * In an image without subsclusters @l2_bitmap is ignored and
  * @sc_index must be 0.
@@ -776,7 +753,20 @@ QCow2SubclusterType 
qcow2_get_subcluster_type(BlockDriverState *bs,
 g_assert_not_reached();
 }
 } else {
-return qcow2_cluster_to_subcluster_type(type);
+switch (type) {
+case QCOW2_CLUSTER_COMPRESSED:
+return QCOW2_SUBCLUSTER_COMPRESSED;
+case QCOW2_CLUSTER_ZERO_PLAIN:
+return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+case QCOW2_CLUSTER_NORMAL:
+return QCOW2_SUBCLUSTER_NORMAL;
+case QCOW2_CLUSTER_UNALLOCATED:
+return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+default:
+g_assert_not_reached();
+}
 }
 }
 
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 59dd9bda29..2f3bd3a882 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -426,66 +426,66 @@ static int 
qcow2_get_subcluster_range_type(BlockDriverState *bs,
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. Allocated subclusters are required to be contiguous in
+ * the image file.
+ * At most @nb_clusters are checked (note that this means clusters,
+ * not subclusters).
+ * Compressed clusters are always processed one by one but for the
+ * purpose of this count they are treated as if they were divided into
+ * subclusters of size s->subcluster_size.
+ * On failure return -errno and update @l2_index to point to the
+ * invalid entry.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+unsigned sc_index, uint64_t *l2_slice,
+unsigned *l2_index)
 {
 BDRVQcow2State *s = bs->opaque;
-int i;
-QCow2ClusterType first_cluster_type;
-uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-uint64_t offset = first_entry & mask;
+int i, count = 0;
+bool check_offset;
+uint64_t expected_offset;
+QCow2SubclusterType expected_type, type;
 
-first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-return 0;
-}
-
-/* must be allocated */
-assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-   first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
+assert(*l2_index + nb_clusters <= s->l2_size);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-if (offset + (uint64_t) i * cluster_size != l2_entry) {
+unsigned first_sc = (i == 0) ? sc_index : 0;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, 

[PATCH v6 11/32] qcow2: Add offset_into_subcluster() and size_to_subclusters()

2020-05-24 Thread Alberto Garcia
Like offset_into_cluster() and size_to_clusters(), but for
subclusters.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
---
 block/qcow2.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index ca73ac9b67..79c4f82383 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -555,11 +555,21 @@ static inline int64_t offset_into_cluster(BDRVQcow2State 
*s, int64_t offset)
 return offset & (s->cluster_size - 1);
 }
 
+static inline int64_t offset_into_subcluster(BDRVQcow2State *s, int64_t offset)
+{
+return offset & (s->subcluster_size - 1);
+}
+
 static inline uint64_t size_to_clusters(BDRVQcow2State *s, uint64_t size)
 {
 return (size + (s->cluster_size - 1)) >> s->cluster_bits;
 }
 
+static inline uint64_t size_to_subclusters(BDRVQcow2State *s, uint64_t size)
+{
+return (size + (s->subcluster_size - 1)) >> s->subcluster_bits;
+}
+
 static inline int64_t size_to_l1(BDRVQcow2State *s, int64_t size)
 {
 int shift = s->cluster_bits + s->l2_bits;
-- 
2.20.1




[PATCH v6 27/32] qcow2: Add subcluster support to handle_alloc_space()

2020-05-24 Thread Alberto Garcia
The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 8126ad8fa0..430b4e423a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2411,6 +2411,9 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
 
 for (m = l2meta; m != NULL; m = m->next) {
 int ret;
+uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+unsigned nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+m->cow_start.offset;
 
 if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
 continue;
@@ -2425,16 +2428,14 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
  * efficiently zero out the whole clusters
  */
 
-ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
 true);
 if (ret < 0) {
 return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
 BDRV_REQ_NO_FALLBACK);
 if (ret < 0) {
 if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1




[PATCH v6 17/32] qcow2: Add cluster type parameter to qcow2_get_host_offset()

2020-05-24 Thread Alberto Garcia
This function returns an integer that can be either an error code or a
cluster type (a value from the QCow2ClusterType enum).

We are going to start using subcluster types instead of cluster types
in some functions so it's better to use the exact data types instead
of integers for clarity and in order to detect errors more easily.

This patch makes qcow2_get_host_offset() return 0 on success and
puts the returned cluster type in a separate parameter. There are no
semantic changes.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  3 ++-
 block/qcow2-cluster.c | 11 +++
 block/qcow2.c | 37 ++---
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index ba7614e406..9cd7cbd895 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -893,7 +893,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error 
**errp);
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset);
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 32dc6e75e3..1e5681b0c6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -565,13 +565,14 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
  * cluster type and (if applicable) are stored contiguously in the image file.
+ * The cluster type is stored in *cluster_type.
  * Compressed clusters are always returned one by one.
  *
- * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
- * cases.
+ * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-  unsigned int *bytes, uint64_t *host_offset)
+  unsigned int *bytes, uint64_t *host_offset,
+  QCow2ClusterType *cluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -712,7 +713,9 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-return type;
+*cluster_type = type;
+
+return 0;
 
 fail:
 qcow2_cache_put(s->l2_table_cache, (void **)_slice);
diff --git a/block/qcow2.c b/block/qcow2.c
index d4851e0e18..2806642f68 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2033,6 +2033,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t host_offset;
 unsigned int bytes;
+QCow2ClusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
@@ -2044,7 +2045,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 }
 
 bytes = MIN(INT_MAX, count);
-ret = qcow2_get_host_offset(bs, offset, , _offset);
+ret = qcow2_get_host_offset(bs, offset, , _offset, );
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
@@ -2052,15 +2053,15 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
+if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
 !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_CLUSTER_UNALLOCATED) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2269,6 +2270,7 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 int ret = 0;
 unsigned int cur_bytes; /* number of bytes in current iteration */
 uint64_t host_offset = 0;
+QCow2ClusterType type;
 AioTaskPool *aio = NULL;
 
 while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2280,22 +2282,23 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 }
 
 qemu_co_mutex_lock(>lock);
-ret = qcow2_get_host_offset(bs, offset, _bytes, _offset);
+ret = qcow2_get_host_offset(bs, offset, _bytes,
+_offset, );
 

[PATCH v6 24/32] qcow2: Add subcluster support to check_refcounts_l2()

2020-05-24 Thread Alberto Garcia
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 770c5dbc83..696e4dad07 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1686,8 +1686,13 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
-l2_entry = QCOW_OFLAG_ZERO;
-set_l2_entry(s, l2_table, i, l2_entry);
+if (has_subclusters(s)) {
+set_l2_entry(s, l2_table, i, 0);
+set_l2_bitmap(s, l2_table, i,
+  QCOW_L2_BITMAP_ALL_ZEROES);
+} else {
+set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+}
 ret = qcow2_pre_write_overlap_check(bs, ign,
 l2e_offset, l2_entry_size(s), false);
 if (ret < 0) {
-- 
2.20.1




[PATCH v6 26/32] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-05-24 Thread Alberto Garcia
Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c8d6e16237..3639dc8057 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -861,6 +861,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index, 0);
+}
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
 
 *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1




[PATCH v6 28/32] qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()

2020-05-24 Thread Alberto Garcia
This works now at the subcluster level and pwrite_zeroes_alignment is
updated accordingly.

qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with
the following changes:

   - The request can now be subcluster-aligned.

   - The cluster-aligned body of the request is still zeroized using
 zero_in_l2_slice() as before.

   - The subcluster-aligned head and tail of the request are zeroized
 with the new zero_l2_subclusters() function.

There is just one thing to take into account for a possible future
improvement: compressed clusters cannot be partially zeroized so
zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
This makes the caller repeat the *complete* request and write actual
zeroes to disk. This is sub-optimal because

   1) if the head area was compressed we would still be able to use
  the fast path for the body and possibly the tail.

   2) if the tail area was compressed we are writing zeroes to the
  head and the body areas, which are already zeroized.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h |  4 +--
 block/qcow2-cluster.c | 80 +++
 block/qcow2.c | 27 ---
 3 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 32c68ead9a..ece5f1cb5a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -898,8 +898,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m);
 int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, enum qcow2_discard_type type,
   bool full_discard);
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags);
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags);
 
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3639dc8057..b808d6ad95 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2014,12 +2014,58 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 return nb_clusters;
 }
 
-int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
-  uint64_t bytes, int flags)
+static int zero_l2_subclusters(BlockDriverState *bs, uint64_t offset,
+   unsigned nb_subclusters)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t *l2_slice;
+uint64_t old_l2_bitmap, l2_bitmap;
+int l2_index, ret, sc = offset_to_sc_index(s, offset);
+
+/* For full clusters use zero_in_l2_slice() instead */
+assert(nb_subclusters > 0 && nb_subclusters < s->subclusters_per_cluster);
+assert(sc + nb_subclusters <= s->subclusters_per_cluster);
+
+ret = get_cluster_table(bs, offset, _slice, _index);
+if (ret < 0) {
+return ret;
+}
+
+switch (qcow2_get_cluster_type(bs, get_l2_entry(s, l2_slice, l2_index))) {
+case QCOW2_CLUSTER_COMPRESSED:
+ret = -ENOTSUP; /* We cannot partially zeroize compressed clusters */
+goto out;
+case QCOW2_CLUSTER_NORMAL:
+case QCOW2_CLUSTER_UNALLOCATED:
+break;
+default:
+g_assert_not_reached();
+}
+
+old_l2_bitmap = l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+
+l2_bitmap |=  QCOW_OFLAG_SUB_ZERO_RANGE(sc, sc + nb_subclusters);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ALLOC_RANGE(sc, sc + nb_subclusters);
+
+if (old_l2_bitmap != l2_bitmap) {
+set_l2_bitmap(s, l2_slice, l2_index, l2_bitmap);
+qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
+}
+
+ret = 0;
+out:
+qcow2_cache_put(s->l2_table_cache, (void **) _slice);
+
+return ret;
+}
+
+int qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t end_offset = offset + bytes;
 uint64_t nb_clusters;
+unsigned head, tail;
 int64_t cleared;
 int ret;
 
@@ -2034,8 +2080,8 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t 
offset,
 }
 
 /* Caller must pass aligned values, except at image end */
-assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+assert(offset_into_subcluster(s, offset) == 0);
+assert(offset_into_subcluster(s, end_offset) == 0 ||
end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
 /* The zero flag is only supported by version 3 and newer */
@@ -2043,11 +2089,26 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, 
uint64_t offset,
 return -ENOTSUP;
 }
 
-/* Each L2 slice is handled by its own loop iteration */
-nb_clusters = size_to_clusters(s, bytes);
+head = MIN(end_offset, ROUND_UP(offset, 

[PATCH v6 31/32] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters

2020-05-24 Thread Alberto Garcia
This function is only used by qcow2_expand_zero_clusters() to
downgrade a qcow2 image to a previous version. It is however not
possible to downgrade an image with extended L2 entries because older
versions of qcow2 do not have this feature.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c  | 8 +++-
 tests/qemu-iotests/061 | 6 ++
 tests/qemu-iotests/061.out | 5 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b808d6ad95..608e785dd6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2156,6 +2156,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
+/* qcow2_downgrade() is not allowed in images with subclusters */
+assert(!has_subclusters(s));
+
 slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
@@ -2224,7 +2227,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
 if (!bs->backing) {
 /* not backed; therefore we can simply deallocate the
- * cluster */
+ * cluster. No need to call set_l2_bitmap(), this
+ * function doesn't support images with subclusters. */
 set_l2_entry(s, l2_slice, j, 0);
 l2_dirty = true;
 continue;
@@ -2295,6 +2299,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 } else {
 set_l2_entry(s, l2_slice, j, offset);
 }
+/* No need to call set_l2_bitmap() after set_l2_entry() because
+ * this function doesn't support images with subclusters. */
 l2_dirty = true;
 }
 
diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index 10eb243164..23add2dfe3 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -303,6 +303,12 @@ $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
 _img_info --format-specific
 _check_test_img
 
+echo
+echo "=== Testing version downgrade with extended L2 entries ==="
+echo
+_make_test_img -o "compat=1.1,extended_l2=on" 64M
+$QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
+
 echo
 echo "=== Try changing the external data file ==="
 echo
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 39812d8cf8..c1acdbd751 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -528,6 +528,11 @@ Format specific information:
 extended l2: false
 No errors were found on the image.
 
+=== Testing version downgrade with extended L2 entries ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Cannot downgrade an image with incompatible features 0x10 set
+
 === Try changing the external data file ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-- 
2.20.1




[PATCH v6 22/32] qcow2: Add subcluster support to zero_in_l2_slice()

2020-05-24 Thread Alberto Garcia
The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as
zeroes is only used in standard L2 entries. Extended L2 entries use
individual 'all zeroes' bits for each subcluster.

This must be taken into account when updating the L2 entry and also
when deciding that an existing entry does not need to be updated.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 2f3bd3a882..4e59bbd545 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1956,7 +1956,6 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 int l2_index;
 int ret;
 int i;
-bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
 
 ret = get_cluster_table(bs, offset, _slice, _index);
 if (ret < 0) {
@@ -1968,28 +1967,31 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_offset;
-QCow2ClusterType cluster_type;
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
+bool unmap = (type == QCOW2_CLUSTER_COMPRESSED) ||
+((flags & BDRV_REQ_MAY_UNMAP) && qcow2_cluster_is_allocated(type));
+uint64_t new_l2_entry = unmap ? 0 : old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
 
-old_offset = get_l2_entry(s, l2_slice, l2_index + i);
+if (has_subclusters(s)) {
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry |= QCOW_OFLAG_ZERO;
+}
 
-/*
- * Minimize L2 changes if the cluster already reads back as
- * zeroes with correct allocation.
- */
-cluster_type = qcow2_get_cluster_type(bs, old_offset);
-if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-(cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
 continue;
 }
 
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
-} else {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+if (unmap) {
+qcow2_free_any_clusters(bs, old_l2_entry, 1, 
QCOW2_DISCARD_REQUEST);
+}
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
 }
 
-- 
2.20.1




[PATCH v6 19/32] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC

2020-05-24 Thread Alberto Garcia
When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5303b0070b..8126ad8fa0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2054,7 +2054,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 *pnum = bytes;
 
 if ((type == QCOW2_SUBCLUSTER_NORMAL ||
- type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+ type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) && !s->crypto) {
 *map = host_offset;
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
@@ -2062,7 +2063,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+   type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2225,6 +2227,7 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
 g_assert_not_reached();
 
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2293,7 +2296,8 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
-(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing) ||
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC && !bs->backing))
 {
 qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
 } else {
@@ -3865,6 +3869,7 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 ret = qcow2_get_host_offset(bs, offset, , , );
 if (ret < 0 ||
 (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+ type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
  type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
  type != QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
 qemu_co_mutex_unlock(>lock);
@@ -3943,6 +3948,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
 switch (type) {
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 if (bs->backing && bs->backing->bs) {
 int64_t backing_length = bdrv_getlength(bs->backing->bs);
 if (src_offset >= backing_length) {
-- 
2.20.1




[PATCH v6 09/32] qcow2: Add subcluster-related fields to BDRVQcow2State

2020-05-24 Thread Alberto Garcia
This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one
subcluster per cluster (i.e. subcluster_size = cluster_size).

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 5 +
 block/qcow2.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index f41bfd743f..4e9c732831 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -78,6 +78,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -295,6 +297,9 @@ typedef struct BDRVQcow2State {
 int cluster_bits;
 int cluster_size;
 int l2_slice_size;
+int subcluster_bits;
+int subcluster_size;
+int subclusters_per_cluster;
 int l2_bits;
 int l2_size;
 int l1_size;
diff --git a/block/qcow2.c b/block/qcow2.c
index 6562eb1590..916b6c3f92 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1433,6 +1433,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 }
 }
 
+s->subclusters_per_cluster =
+has_subclusters(s) ? QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER : 1;
+s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+s->subcluster_bits = ctz32(s->subcluster_size);
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
-- 
2.20.1




[PATCH v6 02/32] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()

2020-05-24 Thread Alberto Garcia
qcow2_get_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

There is a pre-existing exception with compressed clusters: in this
case the function returns the complete cluster descriptor (containing
the offset and size of the compressed data). This does not change with
this patch but it is now documented.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  4 ++--
 block/qcow2-cluster.c | 42 +++---
 block/qcow2.c | 24 +++-
 3 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 402e8acb1c..9d8530f1e8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -694,8 +694,8 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int 
l1_index);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
   uint8_t *buf, int nb_sectors, bool enc, Error 
**errp);
 
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset);
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4b5fc8c4a7..9ab41cb728 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -496,10 +496,15 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
 
 
 /*
- * get_cluster_offset
+ * get_host_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk find the equivalent host
+ * offset in the qcow2 file and store it in *host_offset. Neither
+ * offset needs to be aligned to a cluster boundary.
+ *
+ * If the cluster is unallocated then *host_offset will be 0.
+ * If the cluster is compressed then *host_offset will contain the
+ * complete compressed cluster descriptor.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
@@ -511,12 +516,12 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
  * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
  * cases.
  */
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset)
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+  unsigned int *bytes, uint64_t *host_offset)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
-uint64_t l1_index, l2_offset, *l2_slice;
+uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
 int c;
 unsigned int offset_in_cluster;
 uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -537,8 +542,6 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 bytes_needed = bytes_available;
 }
 
-*cluster_offset = 0;
-
 /* seek to the l2 offset in the l1 table */
 
 l1_index = offset_to_l1_index(s, offset);
@@ -570,7 +573,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -578,7 +581,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
  * true */
 assert(nb_clusters <= INT_MAX);
 
-type = qcow2_get_cluster_type(bs, *cluster_offset);
+type = qcow2_get_cluster_type(bs, l2_entry);
 if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
 qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -599,42 +602,43 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 }
 /* Compressed clusters can only be processed one by one */
 c = 1;
-*cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
+*host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
 break;
 case QCOW2_CLUSTER_ZERO_PLAIN:
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 

[PATCH v6 32/32] iotests: Add tests for qcow2 images with extended L2 entries

2020-05-24 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/271 | 705 +
 tests/qemu-iotests/271.out | 603 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 1309 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 00..4dd11772cf
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,705 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019-2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=be...@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+rm -f "$TEST_IMG.raw"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+_unsupported_imgopts extended_l2 compat=0.10 cluster_size data_file
+
+l2_offset=262144 # 0x4
+
+_verify_img()
+{
+$QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are 
identical'
+$QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+grep -v 'No errors were found on the image'
+}
+
+# Compare the bitmap of an extended L2 entry against an expected value
+_verify_l2_bitmap()
+{
+entry_no="$1"# L2 entry number, starting from 0
+expected_alloc="$alloc"  # Space-separated list of allocated subcluster 
indexes
+expected_zero="$zero"# Space-separated list of zero subcluster indexes
+
+offset=$(($l2_offset + $entry_no * 16))
+entry=`peek_file_be "$TEST_IMG" $offset 8`
+offset=$(($offset + 8))
+bitmap=`peek_file_be "$TEST_IMG" $offset 8`
+
+expected_bitmap=0
+for bit in $expected_alloc; do
+expected_bitmap=$(($expected_bitmap | (1 << $bit)))
+done
+for bit in $expected_zero; do
+expected_bitmap=$(($expected_bitmap | (1 << (32 + $bit
+done
+expected_bitmap=`printf "%llu" $expected_bitmap`
+
+printf "L2 entry #%d: 0x%016lx %016lx\n" "$entry_no" "$entry" "$bitmap"
+if [ "$bitmap" != "$expected_bitmap" ]; then
+printf "ERROR: expecting bitmap   0x%016lx\n" "$expected_bitmap"
+fi
+}
+
+_test_write()
+{
+cmd="$1"
+l2_entry_idx="$2"
+[ -n "$l2_entry_idx" ] || l2_entry_idx=0
+raw_cmd=`echo $cmd | sed s/-c//` # Raw images don't support -c
+echo "$cmd"
+$QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+_verify_img
+_verify_l2_bitmap "$l2_entry_idx"
+}
+
+_reset_img()
+{
+size="$1"
+$QEMU_IMG create -f raw "$TEST_IMG.raw" "$size" | _filter_img_create
+if [ "$use_backing_file" = "yes" ]; then
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$size" | _filter_img_create
+$QEMU_IO -c "write -q -P 0xFF 0 $size" -f raw "$TEST_IMG.base" | 
_filter_qemu_io
+$QEMU_IO -c "write -q -P 0xFF 0 $size" -f raw "$TEST_IMG.raw" | 
_filter_qemu_io
+_make_test_img -o extended_l2=on -b "$TEST_IMG.base" "$size"
+else
+_make_test_img -o extended_l2=on "$size"
+fi
+}
+
+# Test that writing to an image with subclusters produces the expected
+# results, in images with and without backing files
+for use_backing_file in yes no; do
+echo
+echo "### Standard write tests (backing file: $use_backing_file) ###"
+echo
+_reset_img 1M
+### Write subcluster #0 (beginning of subcluster) ###
+alloc="0"; zero=""
+_test_write 'write -q -P 1 0 1k'
+
+### Write subcluster #1 (middle of subcluster) ###
+alloc="0 1"; zero=""
+_test_write 'write -q -P 2 3k 512'
+
+### Write subcluster #2 (end of subcluster) ###
+alloc="0 1 2"; zero=""
+_test_write 'write -q -P 3 5k 1k'
+
+### Write subcluster #3 (full subcluster) ###
+alloc="0 1 2 3"; zero=""
+_test_write 'write -q -P 4 6k 2k'
+
+### Write subclusters #4-6 (full subclusters) ###
+alloc="`seq 0 6`"; zero=""
+_test_write 'write -q -P 5 8k 6k'
+
+### Write subclusters #7-9 (partial subclusters) ###
+alloc="`seq 0 9`"; zero=""
+_test_write 'write 

[PATCH v6 08/32] qcow2: Add dummy has_subclusters() function

2020-05-24 Thread Alberto Garcia
This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index afee84f41f..f41bfd743f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -510,6 +510,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+/* FIXME: Return false until this feature is complete */
+return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
-- 
2.20.1




[PATCH v6 13/32] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2020-05-24 Thread Alberto Garcia
Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters. In this case the
returned value is always 0 and has no meaning.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
---
 block/qcow2.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 95cc16a089..5c6bf48c7a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -533,15 +533,36 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx)
+{
+if (has_subclusters(s)) {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+return be64_to_cpu(l2_slice[idx + 1]);
+} else {
+return 0; /* For convenience only; this value has no meaning. */
+}
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx, uint64_t entry)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx, uint64_t bitmap)
+{
+assert(has_subclusters(s));
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
-- 
2.20.1




[PATCH v6 16/32] qcow2: Add qcow2_cluster_is_allocated()

2020-05-24 Thread Alberto Garcia
This helper function tells us if a cluster is allocated (that is,
there is an associated host offset for it).

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 27dbcbc502..ba7614e406 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -780,6 +780,12 @@ QCow2SubclusterType 
qcow2_get_subcluster_type(BlockDriverState *bs,
 }
 }
 
+static inline bool qcow2_cluster_is_allocated(QCow2ClusterType type)
+{
+return (type == QCOW2_CLUSTER_COMPRESSED || type == QCOW2_CLUSTER_NORMAL ||
+type == QCOW2_CLUSTER_ZERO_ALLOC);
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1




[PATCH v6 07/32] qcow2: Document the Extended L2 Entries feature

2020-05-24 Thread Alberto Garcia
Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 docs/interop/qcow2.txt | 68 --
 docs/qcow2-cache.txt   | 19 +++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index cb723463f2..64e9345fb4 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -42,6 +42,9 @@ The first cluster of a qcow2 image contains the file header:
 as the maximum cluster size and won't be able to open 
images
 with larger cluster sizes.
 
+Note: if the image has Extended L2 Entries then 
cluster_bits
+must be at least 14 (i.e. 16384 byte clusters).
+
  24 - 31:   size
 Virtual disk size in bytes.
 
@@ -117,7 +120,12 @@ the next fields through header_length.
 clusters. The compression_type field must be
 present and not zero.
 
-Bits 4-63:  Reserved (set to 0)
+Bit 4:  Extended L2 Entries.  If this bit is set then
+L2 table entries use an extended format that
+allows subcluster-based allocation. See the
+Extended L2 Entries section for more details.
+
+Bits 5-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -498,7 +506,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-l2_entries = (cluster_size / sizeof(uint64_t))
+l2_entries = (cluster_size / sizeof(uint64_t))[*]
 
 l2_index = (offset / cluster_size) % l2_entries
 l1_index = (offset / cluster_size) / l2_entries
@@ -508,6 +516,8 @@ obtained as follows:
 
 return cluster_offset + (offset % cluster_size)
 
+[*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
 Bit  0 -  8:Reserved (set to 0)
@@ -548,7 +558,8 @@ Standard Cluster Descriptor:
 nor is data read from the backing file if the cluster is
 unallocated.
 
-With version 2, this is always 0.
+With version 2 or with extended L2 entries (see the next
+section), this is always 0.
 
  1 -  8:Reserved (set to 0)
 
@@ -585,6 +596,57 @@ file (except if bit 0 in the Standard Cluster Descriptor 
is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains 
information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated the same as in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+Bit  0 - 31:Allocation status (one bit per subcluster)
+
+1: the subcluster is allocated. In this case the
+   host cluster offset field must contain a valid
+   offset.
+0: the subcluster is not allocated. In this case
+   read requests shall go to the backing file or
+   return zeros if there is no backing file data.
+
+Bits are assigned starting from the least significant
+one (i.e. bit x is used for subcluster x).
+
+32 - 63 Subcluster reads as zeros (one bit per subcluster)
+
+1: the subcluster reads as zeros. In this case the
+   allocation status bit must be unset. The host
+   cluster offset field may or 

[PATCH v6 01/32] qcow2: Make Qcow2AioTask store the full host offset

2020-05-24 Thread Alberto Garcia
The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
host offset. In practice this is not very useful because all users(*)
of this structure need the final host offset into the cluster, which
they calculate using

   host_offset = file_cluster_offset + offset_into_cluster(s, offset)

There is no reason why Qcow2AioTask cannot store host_offset directly
and that is what this patch does.

(*) compressed clusters are the exception: in this case what
file_cluster_offset was storing was the full compressed cluster
descriptor (offset + size). This does not change with this patch
but it is documented now.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c  | 69 ++
 block/trace-events |  2 +-
 2 files changed, 34 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dfab8d2f6c..4815dc0931 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -74,7 +74,7 @@ typedef struct {
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t cluster_descriptor,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2103,7 +2103,7 @@ out:
 
 static coroutine_fn int
 qcow2_co_preadv_encrypted(BlockDriverState *bs,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2130,16 +2130,12 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-ret = bdrv_co_pread(s->data_file,
-file_cluster_offset + offset_into_cluster(s, offset),
-bytes, buf, 0);
+ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);
 if (ret < 0) {
 goto fail;
 }
 
-if (qcow2_co_decrypt(bs,
- file_cluster_offset + offset_into_cluster(s, offset),
- offset, buf, bytes) < 0)
+if (qcow2_co_decrypt(bs, host_offset, offset, buf, bytes) < 0)
 {
 ret = -EIO;
 goto fail;
@@ -2157,7 +2153,7 @@ typedef struct Qcow2AioTask {
 
 BlockDriverState *bs;
 QCow2ClusterType cluster_type; /* only for read */
-uint64_t file_cluster_offset;
+uint64_t host_offset; /* or full descriptor in compressed clusters */
 uint64_t offset;
 uint64_t bytes;
 QEMUIOVector *qiov;
@@ -2170,7 +2166,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
AioTaskPool *pool,
AioTaskFunc func,
QCow2ClusterType cluster_type,
-   uint64_t file_cluster_offset,
+   uint64_t host_offset,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov,
@@ -2185,7 +2181,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 .bs = bs,
 .cluster_type = cluster_type,
 .qiov = qiov,
-.file_cluster_offset = file_cluster_offset,
+.host_offset = host_offset,
 .offset = offset,
 .bytes = bytes,
 .qiov_offset = qiov_offset,
@@ -2194,7 +2190,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 
 trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
  func == qcow2_co_preadv_task_entry ? "read" : "write",
- cluster_type, file_cluster_offset, offset, bytes,
+ cluster_type, host_offset, offset, bytes,
  qiov, qiov_offset);
 
 if (!pool) {
@@ -2208,13 +2204,12 @@ static coroutine_fn int qcow2_add_task(BlockDriverState 
*bs,
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
  QCow2ClusterType cluster_type,
- uint64_t file_cluster_offset,
+ uint64_t host_offset,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov,
  size_t qiov_offset)
 {
 BDRVQcow2State *s = bs->opaque;
-int offset_in_cluster = offset_into_cluster(s, offset);
 
 switch (cluster_type) {
 case QCOW2_CLUSTER_ZERO_PLAIN:
@@ -2230,19 +2225,17 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
qiov, 

[PATCH v6 00/32] Add subcluster allocation to qcow2

2020-05-24 Thread Alberto Garcia
Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

The big change here is that the code does not need to iterate over
every subcluster anymore. There's now a new function called
qcow2_get_subcluster_range_type() that checks a group of subclusters
in one go using cto and ctz operations.

Berto

v6:
- Patch 07: Fix indentation issues in the documentation [Eric]
- Patch 14: Use half-open intervals for the subcluster macros.
New method to detected invalid subclusters.
- Patch 15: New patch
- Patch 20: Use a faster method to check the status of a groups of
subclusters without having to iterate over each one of
them.
- Patch 21: Optimize count_contiguous_subclusters() by removing the
iteration over every subcluster.
Earlier detection of invalid subclusters.
Minor documentation updates [Eric]
- Patch 25: Update after the macro changes in patch 14
- Patch 28: Update after the macro changes in patch 14
- Patch 29: Change patch order so this goes before patch 30
- Patch 30: Fix rebase conflicts now that compression_type has been
merged.
Disable toggling extended_l2 using amend [Eric]
- Patch 32: Many new tests and refactoring

v5: https://lists.gnu.org/archive/html/qemu-block/2020-05/msg00251.html
v4: https://lists.gnu.org/archive/html/qemu-block/2020-03/msg00966.html
v3: https://lists.gnu.org/archive/html/qemu-block/2019-12/msg00587.html
v2: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01642.html
v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v5:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/32:[] [--] 'qcow2: Make Qcow2AioTask store the full host offset'
002/32:[] [--] 'qcow2: Convert qcow2_get_cluster_offset() into 
qcow2_get_host_offset()'
003/32:[] [--] 'qcow2: Add calculate_l2_meta()'
004/32:[] [--] 'qcow2: Split cluster_needs_cow() out of 
count_cow_clusters()'
005/32:[] [--] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in 
handle_copied()'
006/32:[] [--] 'qcow2: Add get_l2_entry() and set_l2_entry()'
007/32:[0006] [FC] 'qcow2: Document the Extended L2 Entries feature'
008/32:[] [--] 'qcow2: Add dummy has_subclusters() function'
009/32:[] [--] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
010/32:[] [--] 'qcow2: Add offset_to_sc_index()'
011/32:[] [--] 'qcow2: Add offset_into_subcluster() and 
size_to_subclusters()'
012/32:[] [--] 'qcow2: Add l2_entry_size()'
013/32:[] [--] 'qcow2: Update get/set_l2_entry() and add 
get/set_l2_bitmap()'
014/32:[0039] [FC] 'qcow2: Add QCow2SubclusterType and 
qcow2_get_subcluster_type()'
015/32:[down] 'qcow2: Add qcow2_get_subcluster_range_type()'
016/32:[] [--] 'qcow2: Add qcow2_cluster_is_allocated()'
017/32:[] [--] 'qcow2: Add cluster type parameter to 
qcow2_get_host_offset()'
018/32:[] [--] 'qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*'
019/32:[] [--] 'qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC'
020/32:[0111] [FC] 'qcow2: Add subcluster support to calculate_l2_meta()'
021/32:[0091] [FC] 'qcow2: Add subcluster support to qcow2_get_host_offset()'
022/32:[] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
023/32:[] [--] 'qcow2: Add subcluster support to discard_in_l2_slice()'
024/32:[] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
025/32:[0004] [FC] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
026/32:[] [--] 'qcow2: Clear the L2 bitmap when allocating a compressed 
cluster'
027/32:[] [--] 'qcow2: Add subcluster support to handle_alloc_space()'
028/32:[0004] [FC] 'qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()'
029/32:[0001] [FC] 'qcow2: Add subcluster support to qcow2_measure()'
030/32:[0549] [FC] 'qcow2: Add the 'extended_l2' option and the 
QCOW2_INCOMPAT_EXTL2 bit'
031/32:[] [--] 'qcow2: Assert that expand_zero_clusters_in_l1() does not 
support subclusters'
032/32:[0775] [FC] 'iotests: Add tests for qcow2 images with extended L2 
entries'

Alberto Garcia (32):
  qcow2: Make Qcow2AioTask store the full host offset
  qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: 

[PATCH v6 05/32] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2020-05-24 Thread Alberto Garcia
When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
 allocated and have a reference count of 1. In those clusters we
 can simply write the data and there is no need to perform any
 copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
 either because they haven't been allocated yet, because their
 reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
 which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
 reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
 allocation (all other cases).

One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
---
 block/qcow2-cluster.c | 256 +++---
 1 file changed, 141 insertions(+), 115 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 80f9787461..fce0be7a08 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1039,13 +1039,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure, add
- * it to @m and the BDRVQcow2State.cluster_allocs list.
+ * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
+ * request does not need copy-on-write or changes to the L2 metadata
+ * then this function does nothing.
  *
  * @host_cluster_offset points to the beginning of the first cluster.
  *
  * @guest_offset and @bytes indicate the offset and length of the
  * request.
  *
+ * @l2_slice contains the L2 entries of all clusters involved in this
+ * write request.
+ *
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
@@ -1053,15 +1058,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 static void calculate_l2_meta(BlockDriverState *bs,
   uint64_t host_cluster_offset,
   uint64_t guest_offset, unsigned bytes,
-  QCowL2Meta **m, bool keep_old)
+  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-unsigned cow_start_from = 0;
+int l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry;
+unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
-unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
+QCow2ClusterType type;
+
+assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+/* Return if there's no COW (all clusters are normal and we keep them) */
+if (keep_old) {
+int i;
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+break;
+}
+}
+if (i == nb_clusters) {
+return;
+}
+}
+
+/* Get the L2 entry of the first cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_start_from = cow_start_to;
+} else {
+cow_start_from = 0;
+}
+
+/* Get the L2 entry of the last cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_end_to = cow_end_from;
+} else {
+cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+}
 
 *m = g_malloc0(sizeof(**m));
 **m = (QCowL2Meta) {
@@ -1087,18 +1130,22 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
-/* Returns true if writing 

[PATCH v6 12/32] qcow2: Add l2_entry_size()

2020-05-24 Thread Alberto Garcia
qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.h  |  9 +
 block/qcow2-cluster.c  | 12 ++--
 block/qcow2-refcount.c | 14 --
 block/qcow2.c  |  8 
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 79c4f82383..95cc16a089 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,10 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* Size of normal and extended L2 entries */
+#define L2E_SIZE_NORMAL   (sizeof(uint64_t))
+#define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -521,6 +525,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
 return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+return has_subclusters(s) ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 76fd0f3cdb..8b2fc550b7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -208,7 +208,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
uint64_t l2_offset, uint64_t **l2_slice)
 {
 BDRVQcow2State *s = bs->opaque;
-int start_of_slice = sizeof(uint64_t) *
+int start_of_slice = l2_entry_size(s) *
 (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
 return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -281,7 +281,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new l2 entry */
 
-l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
 if (l2_offset < 0) {
 ret = l2_offset;
 goto fail;
@@ -305,7 +305,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new entry in the l2 cache */
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -369,7 +369,7 @@ fail:
 }
 s->l1_table[l1_index] = old_l2_offset;
 if (l2_offset > 0) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_ALWAYS);
 }
 return ret;
@@ -716,7 +716,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t 
offset,
 
 /* Then decrease the refcount of the old table */
 if (l2_offset) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_OTHER);
 }
 
@@ -1913,7 +1913,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 04546838e8..770c5dbc83 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1254,7 +1254,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 l2_slice = NULL;
 l1_table = NULL;
 l1_size2 = l1_size * sizeof(uint64_t);
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 s->cache_discards = true;
@@ -1605,7 +1605,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int i, l2_size, nb_csectors, ret;
 
 /* Read L2 table from disk */
-l2_size = s->l2_size * sizeof(uint64_t);
+l2_size = s->l2_size * l2_entry_size(s);
 l2_table = g_malloc(l2_size);
 
 ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1680,15 +1680,16 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
 offset);
 if (fix & BDRV_FIX_ERRORS) {
+int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
 uint64_t l2e_offset =
-l2_offset + 

[PATCH v6 06/32] qcow2: Add get_l2_entry() and set_l2_entry()

2020-05-24 Thread Alberto Garcia
The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h  | 12 
 block/qcow2-cluster.c  | 63 ++
 block/qcow2-refcount.c | 17 ++--
 3 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9d8530f1e8..afee84f41f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -510,6 +510,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx)
+{
+return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+int idx, uint64_t entry)
+{
+l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fce0be7a08..76fd0f3cdb 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -383,12 +383,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 QCow2ClusterType first_cluster_type;
 uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
 uint64_t offset = first_entry & mask;
 
 first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -401,7 +402,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, 
int nb_clusters,
first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
 if (offset + (uint64_t) i * cluster_size != l2_entry) {
 break;
 }
@@ -417,14 +418,16 @@ static int count_contiguous_clusters(BlockDriverState 
*bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
  int nb_clusters,
  uint64_t *l2_slice,
+ int l2_index,
  QCow2ClusterType wanted_type)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 
 assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = be64_to_cpu(l2_slice[i]);
+uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
 QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
 if (type != wanted_type) {
@@ -573,7 +576,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-l2_entry = be64_to_cpu(l2_slice[l2_index]);
+l2_entry = get_l2_entry(s, l2_slice, l2_index);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -608,7 +611,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-  _slice[l2_index], type);
+  l2_slice, l2_index, type);
 *host_offset = 0;
 break;
 case QCOW2_CLUSTER_ZERO_ALLOC:
@@ -617,7 +620,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t 
offset,
 *host_offset = host_cluster_offset + offset_in_cluster;
 /* how many allocated clusters ? */
 c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-  _slice[l2_index], QCOW_OFLAG_ZERO);
+  l2_slice, l2_index, QCOW_OFLAG_ZERO);
 if (offset_into_cluster(s, host_cluster_offset)) {
 qcow2_signal_corruption(bs, true, -1, -1,
 "Cluster allocation offset %#"

[PATCH v6 10/32] qcow2: Add offset_to_sc_index()

2020-05-24 Thread Alberto Garcia
For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 4e9c732831..ca73ac9b67 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -581,6 +581,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State 
*s, int64_t offset)
 return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
 return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1




[PATCH v6 14/32] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()

2020-05-24 Thread Alberto Garcia
This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
the same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
 compressed. We do not support compression at the subcluster
 level.

  b) There are two different values for unallocated subclusters:
 QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
 cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
 which means that the cluster is allocated but the subcluster is
 not. The latter can only happen in images with extended L2
 entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
 entry has a value that violates the specification. The caller is
 responsible for handling these situations.

 To prevent compatibility problems with images that have invalid
 values but are currently being read by QEMU without causing side
 effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
 with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 126 +-
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 5c6bf48c7a..27dbcbc502 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,21 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)(QCOW_OFLAG_SUB_ALLOC(X) << 32)
+/* Subclusters [X, Y) (0 <= X <= Y <= 32) are allocated */
+#define QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC(Y) - QCOW_OFLAG_SUB_ALLOC(X))
+/* Subclusters [X, Y) (0 <= X <= Y <= 32) read as zeroes */
+#define QCOW_OFLAG_SUB_ZERO_RANGE(X, Y) \
+(QCOW_OFLAG_SUB_ALLOC_RANGE(X, Y) << 32)
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  (QCOW_OFLAG_SUB_ALLOC_RANGE(0, 32))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES (QCOW_OFLAG_SUB_ZERO_RANGE(0, 32))
+
 /* Size of normal and extended L2 entries */
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
@@ -462,6 +477,33 @@ typedef struct QCowL2Meta
 QLIST_ENTRY(QCowL2Meta) next_in_flight;
 } QCowL2Meta;
 
+/*
+ * In images with standard L2 entries all clusters are treated as if
+ * they had one subcluster so QCow2ClusterType and QCow2SubclusterType
+ * can be mapped to each other and have the exact same meaning
+ * (QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC cannot happen in these images).
+ *
+ * In images with extended L2 entries QCow2ClusterType refers to the
+ * complete cluster and QCow2SubclusterType to each of the individual
+ * subclusters, so there are several possible combinations:
+ *
+ * |--+---|
+ * | Cluster type | Possible subcluster types |
+ * |--+---|
+ * | UNALLOCATED  | UNALLOCATED_PLAIN |
+ * |  |ZERO_PLAIN |
+ * |--+---|
+ * | NORMAL   | UNALLOCATED_ALLOC |
+ * |  |ZERO_ALLOC |
+ * |  |NORMAL |
+ * |--+---|
+ * | COMPRESSED   |COMPRESSED |
+ * |--+---|
+ *
+ * QCOW2_SUBCLUSTER_INVALID means that the L2 entry is incorrect and
+ * the image should be marked corrupt.
+ */
+
 typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_UNALLOCATED,
 QCOW2_CLUSTER_ZERO_PLAIN,
@@ -470,6 +512,16 @@ typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_COMPRESSED,
 } QCow2ClusterType;
 
+typedef enum QCow2SubclusterType {
+QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
+QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
+QCOW2_SUBCLUSTER_ZERO_PLAIN,
+QCOW2_SUBCLUSTER_ZERO_ALLOC,
+QCOW2_SUBCLUSTER_NORMAL,
+QCOW2_SUBCLUSTER_COMPRESSED,
+QCOW2_SUBCLUSTER_INVALID,
+} QCow2SubclusterType;
+
 typedef enum QCow2MetadataOverlap {
 

[PATCH v6 23/32] qcow2: Add subcluster support to discard_in_l2_slice()

2020-05-24 Thread Alberto Garcia
Two things need to be taken into account here:

1) With full_discard == true the L2 entry must be cleared completely.
   This also includes the L2 bitmap if the image has extended L2
   entries.

2) With full_discard == false we have to make the discarded cluster
   read back as zeroes. With normal L2 entries this is done with the
   QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done
   with the individual 'all zeroes' bits for each subcluster.

   Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2
   images so, if there is a backing file, discard cannot guarantee
   that the image will read back as zeroes. If this is important for
   the caller it should forbid it as qcow2_co_pdiscard() does (see
   80f5c01183 for more details).

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 51 +++
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4e59bbd545..5e70b76c21 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1847,11 +1847,16 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_l2_entry;
-
-old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+uint64_t old_l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+uint64_t new_l2_entry = old_l2_entry;
+uint64_t new_l2_bitmap = old_l2_bitmap;
+QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
 
 /*
+ * If full_discard is true, the cluster should not read back as zeroes,
+ * but rather fall through to the backing file.
+ *
  * If full_discard is false, make sure that a discarded area reads back
  * as zeroes for v3 images (we cannot do it for v2 without actually
  * writing a zero-filled buffer). We can skip the operation if the
@@ -1860,40 +1865,28 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
  *
  * TODO We might want to use bdrv_block_status(bs) here, but we're
  * holding s->lock, so that doesn't work today.
- *
- * If full_discard is true, the sector should not read back as zeroes,
- * but rather fall through to the backing file.
  */
-switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
-case QCOW2_CLUSTER_UNALLOCATED:
-if (full_discard || !bs->backing) {
-continue;
+if (full_discard) {
+new_l2_entry = new_l2_bitmap = 0;
+} else if (bs->backing || qcow2_cluster_is_allocated(type)) {
+if (has_subclusters(s)) {
+new_l2_entry = 0;
+new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
+} else {
+new_l2_entry = s->qcow_version >= 3 ? QCOW_OFLAG_ZERO : 0;
 }
-break;
+}
 
-case QCOW2_CLUSTER_ZERO_PLAIN:
-if (!full_discard) {
-continue;
-}
-break;
-
-case QCOW2_CLUSTER_ZERO_ALLOC:
-case QCOW2_CLUSTER_NORMAL:
-case QCOW2_CLUSTER_COMPRESSED:
-break;
-
-default:
-abort();
+if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
+continue;
 }
 
 /* First remove L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (!full_discard && s->qcow_version >= 3) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
-} else {
-set_l2_entry(s, l2_slice, l2_index + i, 0);
+set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
-
 /* Then decrease the refcount */
 qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
 }
-- 
2.20.1




[PATCH v6 29/32] qcow2: Add subcluster support to qcow2_measure()

2020-05-24 Thread Alberto Garcia
Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 40988fff55..bc7a4a1729 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3233,28 +3233,31 @@ int64_t qcow2_refcount_metadata_size(int64_t clusters, 
size_t cluster_size,
  * @total_size: virtual disk size in bytes
  * @cluster_size: cluster size in bytes
  * @refcount_order: refcount bits power-of-2 exponent
+ * @extended_l2: true if the image has extended L2 entries
  *
  * Returns: Total number of bytes required for the fully allocated image
  * (including metadata).
  */
 static int64_t qcow2_calc_prealloc_size(int64_t total_size,
 size_t cluster_size,
-int refcount_order)
+int refcount_order,
+bool extended_l2)
 {
 int64_t meta_size = 0;
 uint64_t nl1e, nl2e;
 int64_t aligned_total_size = ROUND_UP(total_size, cluster_size);
+size_t l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 
 /* header: 1 cluster */
 meta_size += cluster_size;
 
 /* total size of L2 tables */
 nl2e = aligned_total_size / cluster_size;
-nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
-meta_size += nl2e * sizeof(uint64_t);
+nl2e = ROUND_UP(nl2e, cluster_size / l2e_size);
+meta_size += nl2e * l2e_size;
 
 /* total size of L1 tables */
-nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+nl1e = nl2e * l2e_size / cluster_size;
 nl1e = ROUND_UP(nl1e, cluster_size / sizeof(uint64_t));
 meta_size += nl1e * sizeof(uint64_t);
 
@@ -4834,6 +4837,8 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 PreallocMode prealloc;
 bool has_backing_file;
 bool has_luks;
+bool extended_l2 = false; /* Set to false until the option is added */
+size_t l2e_size;
 
 /* Parse image creation options */
 cluster_size = qcow2_opt_get_cluster_size_del(opts, _err);
@@ -4899,8 +4904,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 virtual_size = ROUND_UP(virtual_size, cluster_size);
 
 /* Check that virtual disk size is valid */
+l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 l2_tables = DIV_ROUND_UP(virtual_size / cluster_size,
- cluster_size / sizeof(uint64_t));
+ cluster_size / l2e_size);
 if (l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE) {
 error_setg(_err, "The image size is too large "
"(try using a larger cluster size)");
@@ -4963,9 +4969,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 }
 
 info = g_new(BlockMeasureInfo, 1);
-info->fully_allocated =
+info->fully_allocated = luks_payload_size +
 qcow2_calc_prealloc_size(virtual_size, cluster_size,
- ctz32(refcount_bits)) + luks_payload_size;
+ ctz32(refcount_bits), extended_l2);
 
 /* Remove data clusters that are not required.  This overestimates the
  * required size because metadata needed for the fully allocated file is
-- 
2.20.1




[PATCH v6 03/32] qcow2: Add calculate_l2_meta()

2020-05-24 Thread Alberto Garcia
handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
 block/qcow2-cluster.c | 77 +--
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9ab41cb728..61ad638bdc 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1037,6 +1037,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 }
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure, add
+ * it to @m and the BDRVQcow2State.cluster_allocs list.
+ *
+ * @host_cluster_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs,
+  uint64_t host_cluster_offset,
+  uint64_t guest_offset, unsigned bytes,
+  QCowL2Meta **m, bool keep_old)
+{
+BDRVQcow2State *s = bs->opaque;
+unsigned cow_start_from = 0;
+unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+unsigned cow_end_from = cow_start_to + bytes;
+unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+QCowL2Meta *old_m = *m;
+
+*m = g_malloc0(sizeof(**m));
+**m = (QCowL2Meta) {
+.next   = old_m,
+
+.alloc_offset   = host_cluster_offset,
+.offset = start_of_cluster(s, guest_offset),
+.nb_clusters= nb_clusters,
+
+.keep_old_clusters = keep_old,
+
+.cow_start = {
+.offset = cow_start_from,
+.nb_bytes   = cow_start_to - cow_start_from,
+},
+.cow_end = {
+.offset = cow_end_from,
+.nb_bytes   = cow_end_to - cow_end_from,
+},
+};
+
+qemu_co_queue_init(&(*m)->dependent_requests);
+QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1435,35 +1485,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
 int avail_bytes = nb_clusters << s->cluster_bits;
 int nb_bytes = MIN(requested_bytes, avail_bytes);
-QCowL2Meta *old_m = *m;
-
-*m = g_malloc0(sizeof(**m));
-
-**m = (QCowL2Meta) {
-.next   = old_m,
-
-.alloc_offset   = alloc_cluster_offset,
-.offset = start_of_cluster(s, guest_offset),
-.nb_clusters= nb_clusters,
-
-.keep_old_clusters  = keep_old_clusters,
-
-.cow_start = {
-.offset = 0,
-.nb_bytes   = offset_into_cluster(s, guest_offset),
-},
-.cow_end = {
-.offset = nb_bytes,
-.nb_bytes   = avail_bytes - nb_bytes,
-},
-};
-qemu_co_queue_init(&(*m)->dependent_requests);
-QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 
 *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
 *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
 assert(*bytes != 0);
 
+calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+  m, keep_old_clusters);
+
 return 1;
 
 fail:
-- 
2.20.1




Re: [PULL 0/4] pflash-next patches for 2020-05-22

2020-05-24 Thread Peter Maydell
On Fri, 22 May 2020 at 18:47, Philippe Mathieu-Daudé  wrote:
>
> The following changes since commit d19f1ab0de8b763159513e3eaa12c5bc68122361:
>
>   Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-2020052=
> 1-1' into staging (2020-05-21 22:06:56 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/philmd/qemu.git tags/pflash-next-20200522
>
> for you to fetch changes up to 1857b9db49770590483be44eb90993c42b2a5a99:
>
>   hw/block/pflash: Check return value of blk_pwrite() (2020-05-22 19:38:14 +0=
> 200)
>
> 
>
> - Remove unused timer in CFI01 flash,
> - Clean up code documentation,
> - Silent a long-standing Coverity warning (2016-07-15).


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.1
for any user-visible changes.

-- PMM