[PATCH v2 2/4] qcow2: Don't round the L1 table allocation up to the sector size

2020-01-09 Thread Alberto Garcia
The L1 table is read from disk using the byte-based bdrv_pread() and
is never accessed beyond its last element, so there's no need to
allocate more memory than that.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c  | 5 ++---
 block/qcow2-refcount.c | 2 +-
 block/qcow2-snapshot.c | 3 +--
 block/qcow2.c  | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8982b7b762..932fc48919 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -124,12 +124,11 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 #endif
 
 new_l1_size2 = sizeof(uint64_t) * new_l1_size;
-new_l1_table = qemu_try_blockalign(bs->file->bs,
-   ROUND_UP(new_l1_size2, 512));
+new_l1_table = qemu_try_blockalign(bs->file->bs, new_l1_size2);
 if (new_l1_table == NULL) {
 return -ENOMEM;
 }
-memset(new_l1_table, 0, ROUND_UP(new_l1_size2, 512));
+memset(new_l1_table, 0, new_l1_size2);
 
 if (s->l1_size) {
 memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f67ac6b2d8..c963bc8de1 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1262,7 +1262,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
  * l1_table_offset when it is the current s->l1_table_offset! Be careful
  * when changing this! */
 if (l1_table_offset != s->l1_table_offset) {
-l1_table = g_try_malloc0(ROUND_UP(l1_size2, 512));
+l1_table = g_try_malloc0(l1_size2);
 if (l1_size2 && l1_table == NULL) {
 ret = -ENOMEM;
 goto fail;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 5ab64da1ec..82c32d4c9b 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -1024,8 +1024,7 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
 return ret;
 }
 new_l1_bytes = sn->l1_size * sizeof(uint64_t);
-new_l1_table = qemu_try_blockalign(bs->file->bs,
-   ROUND_UP(new_l1_bytes, 512));
+new_l1_table = qemu_try_blockalign(bs->file->bs, new_l1_bytes);
 if (new_l1_table == NULL) {
 return -ENOMEM;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index 87ca2832f0..848a6c5182 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1499,7 +1499,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 
 if (s->l1_size > 0) {
 s->l1_table = qemu_try_blockalign(bs->file->bs,
-ROUND_UP(s->l1_size * sizeof(uint64_t), 512));
+  s->l1_size * sizeof(uint64_t));
 if (s->l1_table == NULL) {
 error_setg(errp, "Could not allocate L1 table");
 ret = -ENOMEM;
-- 
2.20.1




[PATCH v2 3/4] qcow2: Tighten cluster_offset alignment assertions

2020-01-09 Thread Alberto Garcia
qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() always
return offsets that are cluster-aligned so don't just check that they
are sector-aligned.

The check in qcow2_co_preadv_task() is also replaced by an assertion
for the same reason.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 848a6c5182..783d2b9578 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2175,10 +2175,7 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
   offset, bytes, qiov, qiov_offset);
 
 case QCOW2_CLUSTER_NORMAL:
-if ((file_cluster_offset & 511) != 0) {
-return -EIO;
-}
-
+assert(QEMU_IS_ALIGNED(file_cluster_offset, s->cluster_size));
 if (bs->encrypted) {
 return qcow2_co_preadv_encrypted(bs, file_cluster_offset,
  offset, bytes, qiov, qiov_offset);
@@ -2514,7 +2511,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
 goto out_locked;
 }
 
-assert((cluster_offset & 511) == 0);
+assert(QEMU_IS_ALIGNED(cluster_offset, s->cluster_size));
 
 ret = qcow2_pre_write_overlap_check(bs, 0,
 cluster_offset + offset_in_cluster,
@@ -3904,7 +3901,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
 goto fail;
 }
 
-assert((cluster_offset & 511) == 0);
+assert(QEMU_IS_ALIGNED(cluster_offset, s->cluster_size));
 
 ret = qcow2_pre_write_overlap_check(bs, 0,
 cluster_offset + offset_in_cluster, cur_bytes, true);
-- 
2.20.1




Re: [PATCH 1/3] qcow2: Require that the virtual size is a multiple of the sector size

2020-01-09 Thread Alberto Garcia
On Wed 08 Jan 2020 08:46:11 PM CET, Nir Soffer wrote:
>> However when an image is opened the virtual size is rounded down,
>> which means that trying to access the last few advertised bytes will
>> result in an error. As seen above QEMU cannot create such images and
>> there's no good use case that would require us to try to handle them
>> so let's just treat them as unsupported.
>
> Making error impossible is best.
>
> Can we require multiple of 4k to avoid unaligned read/write at the end
> of an image aligned to 512 bytes on storage with 4k sector size?

I wouldn't force that on the user. The only reason why I'm not allowing
the non-sector-aligned case is because it's currently broken, but I
wouldn't have a problem with it if it was working fine.

>> +echo
>> +echo "== Image size not a multiple of the sector size =="
>> +_make_test_img 64k
>
> Logging the change here would make the test and the output more clear:
>
> echo "modifying virtual size to 65535"

Ok

>> +== Image size not a multiple of the sector size ==
>> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
>> +qemu-io: can't open device TEST_DIR/t.qcow2: Virtual size is not a multiple 
>> of 512
>
> The output is confusing, looks like we created image with aligned
> size, and on the next line it complains about the size.

Ok, I can also replace the "qemu-io write" with a "qemu-img info" or
something like that to make it a bit less confusing (i.e. the error
happens already when you open the image, not when you write to it).

Berto



Re: [PATCH 3/3] qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value

2020-01-09 Thread Alberto Garcia
On Thu 09 Jan 2020 01:19:00 PM CET, Kevin Wolf wrote:
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index e8ce966f7f..6427c75409 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2175,7 +2175,7 @@ static coroutine_fn int 
>> qcow2_co_preadv_task(BlockDriverState *bs,
>>offset, bytes, qiov, qiov_offset);
>>  
>>  case QCOW2_CLUSTER_NORMAL:
>> -if ((file_cluster_offset & 511) != 0) {
>> +if ((file_cluster_offset % BDRV_SECTOR_SIZE) != 0) {
>>  return -EIO;
>>  }
>
> Hm, unrelated to your change, but why do we test for 512 byte
> alignment here? file_cluster_offset should certainly be cluster
> aligned for normal clusters. And if the check fails, that's actually
> an image corruption and not just an I/O error. Am I missing something?

I actually suspect that this is just an old, obsolete check that we have
kept during these years. file_cluster_offset should be not just sector
aligned but also cluster aligned if I'm not wrong, and if not then
qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() should
return an error.

I can simply remove that check, or replace it with an assertion.

Berto



[PATCH 2/3] qcow2: Don't round the L1 table allocation up to the sector size

2020-01-08 Thread Alberto Garcia
The L1 table is read from disk using the byte-based bdrv_pread() and
is never accessed beyond its last element, so there's no need to
allocate more memory than that.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c  | 5 ++---
 block/qcow2-refcount.c | 2 +-
 block/qcow2-snapshot.c | 3 +--
 block/qcow2.c  | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8982b7b762..932fc48919 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -124,12 +124,11 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 #endif
 
 new_l1_size2 = sizeof(uint64_t) * new_l1_size;
-new_l1_table = qemu_try_blockalign(bs->file->bs,
-   ROUND_UP(new_l1_size2, 512));
+new_l1_table = qemu_try_blockalign(bs->file->bs, new_l1_size2);
 if (new_l1_table == NULL) {
 return -ENOMEM;
 }
-memset(new_l1_table, 0, ROUND_UP(new_l1_size2, 512));
+memset(new_l1_table, 0, new_l1_size2);
 
 if (s->l1_size) {
 memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f67ac6b2d8..c963bc8de1 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1262,7 +1262,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
  * l1_table_offset when it is the current s->l1_table_offset! Be careful
  * when changing this! */
 if (l1_table_offset != s->l1_table_offset) {
-l1_table = g_try_malloc0(ROUND_UP(l1_size2, 512));
+l1_table = g_try_malloc0(l1_size2);
 if (l1_size2 && l1_table == NULL) {
 ret = -ENOMEM;
 goto fail;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 5ab64da1ec..82c32d4c9b 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -1024,8 +1024,7 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
 return ret;
 }
 new_l1_bytes = sn->l1_size * sizeof(uint64_t);
-new_l1_table = qemu_try_blockalign(bs->file->bs,
-   ROUND_UP(new_l1_bytes, 512));
+new_l1_table = qemu_try_blockalign(bs->file->bs, new_l1_bytes);
 if (new_l1_table == NULL) {
 return -ENOMEM;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index 92474849db..e8ce966f7f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1499,7 +1499,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 
 if (s->l1_size > 0) {
 s->l1_table = qemu_try_blockalign(bs->file->bs,
-ROUND_UP(s->l1_size * sizeof(uint64_t), 512));
+  s->l1_size * sizeof(uint64_t));
 if (s->l1_table == NULL) {
 error_setg(errp, "Could not allocate L1 table");
 ret = -ENOMEM;
-- 
2.20.1




[PATCH 1/3] qcow2: Require that the virtual size is a multiple of the sector size

2020-01-08 Thread Alberto Garcia
The qcow2 header specifies the virtual size of the image in bytes, but
BlockDriverState stores it as a number of 512-byte sectors.

If the user tries to create an image with a size that is not a
multiple of the sector size then this is fixed on creation by
silently rounding the image size up (see commit c2eb918e32).
qcow2_co_truncate() is more strict and returns an error instead.

However when an image is opened the virtual size is rounded down,
which means that trying to access the last few advertised bytes will
result in an error. As seen above QEMU cannot create such images and
there's no good use case that would require us to try to handle them
so let's just treat them as unsupported.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c  | 7 +++
 docs/interop/qcow2.txt | 3 ++-
 tests/qemu-iotests/080 | 7 +++
 tests/qemu-iotests/080.out | 4 
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7fbaac8457..92474849db 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1326,6 +1326,13 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 goto fail;
 }
 
+if (header.size % BDRV_SECTOR_SIZE) {
+error_setg(errp, "Virtual size is not a multiple of %u",
+   (unsigned) BDRV_SECTOR_SIZE);
+ret = -EINVAL;
+goto fail;
+}
+
 if (header.header_length > sizeof(header)) {
 s->unknown_header_fields_size = header.header_length - sizeof(header);
 s->unknown_header_fields = g_malloc(s->unknown_header_fields_size);
diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..891f5662d8 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -40,7 +40,8 @@ The first cluster of a qcow2 image contains the file header:
 with larger cluster sizes.
 
  24 - 31:   size
-Virtual disk size in bytes.
+Virtual disk size in bytes. qemu can only handle
+sizes that are a multiple of 512 bytes.
 
 Note: qemu has an implementation limit of 32 MB as
 the maximum L1 table size.  With a 2 MB cluster
diff --git a/tests/qemu-iotests/080 b/tests/qemu-iotests/080
index 4bcb5021e8..2563b2c052 100755
--- a/tests/qemu-iotests/080
+++ b/tests/qemu-iotests/080
@@ -48,6 +48,7 @@ header_size=104
 
 offset_backing_file_offset=8
 offset_backing_file_size=16
+offset_virtual_size=24
 offset_l1_size=36
 offset_l1_table_offset=40
 offset_refcount_table_offset=48
@@ -197,6 +198,12 @@ poke_file "$TEST_IMG" "$offset_snap1_l1_size" 
"\x10\x00\x00\x00"
 { $QEMU_IMG snapshot -d test $TEST_IMG; } 2>&1 | _filter_testdir
 _check_test_img
 
+echo
+echo "== Image size not a multiple of the sector size =="
+_make_test_img 64k
+poke_file "$TEST_IMG" "$offset_virtual_size" "\x00\x00\x00\x00\x00\x00\xff\xff"
+{ $QEMU_IO -c "write 65530 1" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/080.out b/tests/qemu-iotests/080.out
index 45ab01db8e..e1c969e2ba 100644
--- a/tests/qemu-iotests/080.out
+++ b/tests/qemu-iotests/080.out
@@ -104,4 +104,8 @@ Data may be corrupted, or further writes to the image may 
corrupt it.
 
 3 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
+
+== Image size not a multiple of the sector size ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
+qemu-io: can't open device TEST_DIR/t.qcow2: Virtual size is not a multiple of 
512
 *** done
-- 
2.20.1




[PATCH 0/3] qcow2: Misc BDRV_SECTOR_SIZE updates

2020-01-08 Thread Alberto Garcia
This small series gets rid of all the remaining instances of hardcoded
sector sizes in the qcow2 code and adds a check for images whose
virtual size is not a multiple of the sector size.

See the individual patches for details.

Berto

Alberto Garcia (3):
  qcow2: Require that the virtual size is a multiple of the sector size
  qcow2: Don't round the L1 table allocation up to the sector size
  qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value

 block/qcow2-cluster.c  |  7 +++
 block/qcow2-refcount.c |  2 +-
 block/qcow2-snapshot.c |  3 +--
 block/qcow2.c  | 25 +
 docs/interop/qcow2.txt |  3 ++-
 tests/qemu-iotests/080 |  7 +++
 tests/qemu-iotests/080.out |  4 
 7 files changed, 35 insertions(+), 16 deletions(-)

-- 
2.20.1




[PATCH 3/3] qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value

2020-01-08 Thread Alberto Garcia
This replaces all remaining instances in the qcow2 code.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c |  2 +-
 block/qcow2.c | 16 +---
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 932fc48919..777ca2d409 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -219,7 +219,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
  * Writes one sector of the L1 table to the disk (can't update single entries
  * and we really don't want bdrv_pread to perform a read-modify-write)
  */
-#define L1_ENTRIES_PER_SECTOR (512 / 8)
+#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
 int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
 {
 BDRVQcow2State *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index e8ce966f7f..6427c75409 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2175,7 +2175,7 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
   offset, bytes, qiov, qiov_offset);
 
 case QCOW2_CLUSTER_NORMAL:
-if ((file_cluster_offset & 511) != 0) {
+if ((file_cluster_offset % BDRV_SECTOR_SIZE) != 0) {
 return -EIO;
 }
 
@@ -2514,7 +2514,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
 goto out_locked;
 }
 
-assert((cluster_offset & 511) == 0);
+assert((cluster_offset % BDRV_SECTOR_SIZE) == 0);
 
 ret = qcow2_pre_write_overlap_check(bs, 0,
 cluster_offset + offset_in_cluster,
@@ -3283,7 +3283,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 
 /* Validate options and set default values */
 if (!QEMU_IS_ALIGNED(qcow2_opts->size, BDRV_SECTOR_SIZE)) {
-error_setg(errp, "Image size must be a multiple of 512 bytes");
+error_setg(errp, "Image size must be a multiple of %u bytes",
+   (unsigned) BDRV_SECTOR_SIZE);
 ret = -EINVAL;
 goto out;
 }
@@ -3839,7 +3840,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 case QCOW2_CLUSTER_NORMAL:
 child = s->data_file;
 copy_offset += offset_into_cluster(s, src_offset);
-if ((copy_offset & 511) != 0) {
+if ((copy_offset % BDRV_SECTOR_SIZE) != 0) {
 ret = -EIO;
 goto out;
 }
@@ -3904,7 +3905,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
 goto fail;
 }
 
-assert((cluster_offset & 511) == 0);
+assert((cluster_offset % BDRV_SECTOR_SIZE) == 0);
 
 ret = qcow2_pre_write_overlap_check(bs, 0,
 cluster_offset + offset_in_cluster, cur_bytes, true);
@@ -3961,8 +3962,9 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 return -ENOTSUP;
 }
 
-if (offset & 511) {
-error_setg(errp, "The new size must be a multiple of 512");
+if (offset % BDRV_SECTOR_SIZE) {
+error_setg(errp, "The new size must be a multiple of %u",
+   (unsigned) BDRV_SECTOR_SIZE);
 return -EINVAL;
 }
 
-- 
2.20.1




[RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*

2019-12-22 Thread Alberto Garcia
In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_cluster_offset() is modified to return the subcluster
type instead of the cluster type, and all callers are updated to
replace all values of QCow2ClusterType with their QCow2SubclusterType
equivalents (as returned by qcow2_cluster_to_subcluster_type()).

This patch only changes the data types, there are no semantic changes.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 19 +-
 block/qcow2.c | 82 +--
 block/qcow2.h |  3 +-
 3 files changed, 60 insertions(+), 44 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 851c7e6165..40c2e34a2a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -497,21 +497,22 @@ static int coroutine_fn 
do_perform_cow_write(BlockDriverState *bs,
 /*
  * get_cluster_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk, find the cluster offset in
+ * the qcow2 file and store it in *cluster_offset.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
- * cluster type and (if applicable) are stored contiguously in the image file.
- * Compressed clusters are always returned one by one.
+ * subcluster type and (if applicable) are stored contiguously in the image
+ * file. The subcluster type is stored in *subcluster_type. Compressed clusters
+ * are always processed one by one.
  *
- * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
- * cases.
+ * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
- unsigned int *bytes, uint64_t *cluster_offset)
+ unsigned int *bytes, uint64_t *cluster_offset,
+ QCow2SubclusterType *subcluster_type)
 {
 BDRVQcow2State *s = bs->opaque;
 unsigned int l2_index;
@@ -653,7 +654,9 @@ out:
 assert(bytes_available - offset_in_cluster <= UINT_MAX);
 *bytes = bytes_available - offset_in_cluster;
 
-return type;
+*subcluster_type = qcow2_cluster_to_subcluster_type(type);
+
+return 0;
 
 fail:
 qcow2_cache_put(s->l2_table_cache, (void **)_slice);
diff --git a/block/qcow2.c b/block/qcow2.c
index e7607d90d4..9277d680ef 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1964,6 +1964,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 uint64_t cluster_offset;
 unsigned int bytes;
+QCow2SubclusterType type;
 int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
@@ -1975,7 +1976,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 }
 
 bytes = MIN(INT_MAX, count);
-ret = qcow2_get_cluster_offset(bs, offset, , _offset);
+ret = qcow2_get_cluster_offset(bs, offset, , _offset, );
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
@@ -1983,15 +1984,16 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
-!s->crypto) {
+if ((type == QCOW2_SUBCLUSTER_NORMAL ||
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
 *map = cluster_offset | offset_into_cluster(s, offset);
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2094,7 +2096,7 @@ typedef struct Qcow2AioTask {
 AioTask task;
 
 BlockDriverState *bs;
-QCow2ClusterType cluster_type; /* only for read */
+QCow2SubclusterType subcluster_type; /* only for read */
 uint64_t file_cluster_offset;
 uint64_t offset;
 uint64_t bytes;
@@ -2107,7 +2109,7 @@ static coroutine_fn int 
qcow2_co_preadv_task_entry(AioTask *task);
 static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
AioTaskPool *pool,
AioTaskFunc func,
-   QCow2ClusterType cluster_type,
+   QCow2Subclust

[RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure()

2019-12-22 Thread Alberto Garcia
Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4f26953b1e..62093de1c6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3162,28 +3162,31 @@ int64_t qcow2_refcount_metadata_size(int64_t clusters, 
size_t cluster_size,
  * @total_size: virtual disk size in bytes
  * @cluster_size: cluster size in bytes
  * @refcount_order: refcount bits power-of-2 exponent
+ * @extended_l2: true if the image has extended L2 entries
  *
  * Returns: Total number of bytes required for the fully allocated image
  * (including metadata).
  */
 static int64_t qcow2_calc_prealloc_size(int64_t total_size,
 size_t cluster_size,
-int refcount_order)
+int refcount_order,
+bool extended_l2)
 {
 int64_t meta_size = 0;
 uint64_t nl1e, nl2e;
 int64_t aligned_total_size = ROUND_UP(total_size, cluster_size);
+size_t l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 
 /* header: 1 cluster */
 meta_size += cluster_size;
 
 /* total size of L2 tables */
 nl2e = aligned_total_size / cluster_size;
-nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
-meta_size += nl2e * sizeof(uint64_t);
+nl2e = ROUND_UP(nl2e, cluster_size / l2e_size);
+meta_size += nl2e * l2e_size;
 
 /* total size of L1 tables */
-nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+nl1e = nl2e * l2e_size / cluster_size;
 nl1e = ROUND_UP(nl1e, cluster_size / sizeof(uint64_t));
 meta_size += nl1e * sizeof(uint64_t);
 
@@ -4704,6 +4707,7 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 bool has_backing_file;
 bool has_luks;
 bool extended_l2;
+size_t l2e_size;
 
 /* Parse image creation options */
 extended_l2 = qemu_opt_get_bool_del(opts, BLOCK_OPT_EXTL2, false);
@@ -4754,8 +4758,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 virtual_size = ROUND_UP(virtual_size, cluster_size);
 
 /* Check that virtual disk size is valid */
+l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 l2_tables = DIV_ROUND_UP(virtual_size / cluster_size,
- cluster_size / sizeof(uint64_t));
+ cluster_size / l2e_size);
 if (l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE) {
 error_setg(_err, "The image size is too large "
"(try using a larger cluster size)");
@@ -4818,9 +4823,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 }
 
 info = g_new(BlockMeasureInfo, 1);
-info->fully_allocated =
+info->fully_allocated = luks_payload_size +
 qcow2_calc_prealloc_size(virtual_size, cluster_size,
- ctz32(refcount_bits)) + luks_payload_size;
+ ctz32(refcount_bits), extended_l2);
 
 /* Remove data clusters that are not required.  This overestimates the
  * required size because metadata needed for the fully allocated file is
-- 
2.20.1




[RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

2019-12-22 Thread Alberto Garcia
Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c|  65 ++--
 block/qcow2.h|   8 ++-
 include/block/block_int.h|   1 +
 qapi/block-core.json |   7 +++
 tests/qemu-iotests/031.out   |   8 +--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  20 +++---
 tests/qemu-iotests/065   |  18 --
 tests/qemu-iotests/082.out   |  48 ---
 tests/qemu-iotests/085.out   |  38 ++--
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +--
 tests/qemu-iotests/198.out   |   2 +
 tests/qemu-iotests/206.out   |   4 ++
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/273.out   |   9 ++-
 tests/qemu-iotests/common.filter |   1 +
 21 files changed, 245 insertions(+), 118 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0267722065..4f26953b1e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1383,6 +1383,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
 s->subcluster_bits = ctz32(s->subcluster_size);
 
+if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
+error_setg(errp, "Unsupported subcluster size: %d", 
s->subcluster_size);
+ret = -EINVAL;
+goto fail;
+}
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
@@ -2856,6 +2862,11 @@ int qcow2_update_header(BlockDriverState *bs)
 .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
 .name = "lazy refcounts",
 },
+{
+.type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+.bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
+.name = "extended L2 entries",
+},
 };
 
 ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
@@ -3184,7 +3195,8 @@ static int64_t qcow2_calc_prealloc_size(int64_t 
total_size,
 return meta_size + aligned_total_size;
 }
 
-static bool validate_cluster_size(size_t cluster_size, Error **errp)
+static bool validate_cluster_size(size_t cluster_size, bool extended_l2,
+  Error **errp)
 {
 int cluster_bits = ctz32(cluster_size);
 if (cluster_bits < MIN_CLUSTER_BITS || cluster_bits > MAX_CLUSTER_BITS ||
@@ -3194,16 +3206,28 @@ static bool validate_cluster_size(size_t cluster_size, 
Error **errp)
"%dk", 1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
 return false;
 }
+
+if (extended_l2) {
+unsigned min_cluster_size =
+(1 << MIN_CLUSTER_BITS) * QCOW_MAX_SUBCLUSTERS_PER_CLUSTER;
+if (cluster_size < min_cluster_size) {
+error_setg(errp, "Extended L2 entries are only supported with "
+   "cluster sizes of at least %u bytes", min_cluster_size);
+return false;
+}
+}
+
 return true;
 }
 
-static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
+static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, bool extended_l2,
+ Error **errp)
 {
 size_t cluster_size;
 
 cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
  DEFAULT_CLUSTER_SIZE);
-if (!validate_cluster_size(cluster_size, errp)) {
+if (!validate_cluster_size(cluster_size, extended_l2, errp)) {
 return 0;
 }
 return cluster_size;
@@ -3316,7 +3340,20 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 cluster_size = DEFAULT_CLUSTER_SIZE;
 }
 
-if (!validate_cluster_size(cluster_size, errp)) {
+if (!qcow2_opts->has_extended_l2) {
+qcow2_opts->extended_l2 = false;
+}
+if (qcow2_opts->extended_l2) {
+if (version < 3) {
+error_setg(errp, "Extended L2 entries are only supported with "
+   "compatibility level 1.1 and above (use version=v3 or "
+   "greater)");
+ret = -EINVAL;
+goto out;
+}
+}
+
+if (!validate_cluster_size(cluster_size, qcow2_opts->extended_l2, errp)) {
 ret = -EINVAL;
 goto out;
 }
@@ -34

[RFC PATCH v3 23/27] qcow2: Add subcluster support to handle_alloc_space()

2019-12-22 Thread Alberto Garcia
The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 1d3da0ccf6..242001afa2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2354,6 +2354,9 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
 
 for (m = l2meta; m != NULL; m = m->next) {
 int ret;
+uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+unsigned nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+m->cow_start.offset;
 
 if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
 continue;
@@ -2368,16 +2371,14 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
  * efficiently zero out the whole clusters
  */
 
-ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
 true);
 if (ret < 0) {
 return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
 BDRV_REQ_NO_FALLBACK);
 if (ret < 0) {
 if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1




[RFC PATCH v3 21/27] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2019-12-22 Thread Alberto Garcia
The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated.

This needs to happen even if the cluster was already allocated and the
L2 entry was otherwise valid.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0a40944667..ed291a4042 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -986,6 +986,23 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 
 set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
  (cluster_offset + (i << s->cluster_bits)));
+
+/* Update bitmap with the subclusters that were just written */
+if (has_subclusters(s)) {
+unsigned written_from = m->cow_start.offset;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+m->nb_clusters << s->cluster_bits;
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+int sc;
+for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
+int sc_off = i * s->cluster_size + sc * s->subcluster_size;
+if (sc_off >= written_from && sc_off < written_to) {
+l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
+}
+}
+set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+}
  }
 
 
-- 
2.20.1




[RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function

2019-12-22 Thread Alberto Garcia
This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 6823d3f68f..1db3fc5dbc 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -495,6 +495,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+/* FIXME: Return false until this feature is complete */
+return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
-- 
2.20.1




[RFC PATCH v3 16/27] qcow2: Add subcluster support to zero_in_l2_slice()

2019-12-22 Thread Alberto Garcia
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c10601a828..70b0aaa00a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1870,7 +1870,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_offset;
+uint64_t old_offset, l2_entry = 0;
 QCow2ClusterType cluster_type;
 
 old_offset = get_l2_entry(s, l2_slice, l2_index + i);
@@ -1887,12 +1887,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
 qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
 } else {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
 }
+
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, 
QCOW_L2_BITMAP_ALL_ZEROES);
+} else {
+l2_entry |= QCOW_OFLAG_ZERO;
+}
+
+set_l2_entry(s, l2_slice, l2_index + i, l2_entry);
 }
 
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
-- 
2.20.1




[RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2019-12-22 Thread Alberto Garcia
Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters, although the caller
does not need and should ignore the returned value.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 8be020bb76..64b0a814f4 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -518,15 +518,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx)
+{
+if (has_subclusters(s)) {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+return be64_to_cpu(l2_slice[idx + 1]);
+} else {
+/* For convenience only; the caller should ignore this value. */
+return 0;
+}
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx, uint64_t entry)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx, uint64_t bitmap)
+{
+assert(has_subclusters(s));
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
-- 
2.20.1




[RFC PATCH v3 18/27] qcow2: Add subcluster support to check_refcounts_l2()

2019-12-22 Thread Alberto Garcia
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-refcount.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index de85ed29a4..65f4fc27c3 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1685,8 +1685,13 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
-l2_entry = QCOW_OFLAG_ZERO;
-set_l2_entry(s, l2_table, i, l2_entry);
+if (has_subclusters(s)) {
+set_l2_entry(s, l2_table, i, 0);
+set_l2_bitmap(s, l2_table, i,
+  QCOW_L2_BITMAP_ALL_ZEROES);
+} else {
+set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+}
 ret = qcow2_pre_write_overlap_check(bs, ign,
 l2e_offset, l2_entry_size(s), false);
 if (ret < 0) {
-- 
2.20.1




[RFC PATCH v3 09/27] qcow2: Add l2_entry_size()

2019-12-22 Thread Alberto Garcia
qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c  | 12 ++--
 block/qcow2-refcount.c | 14 --
 block/qcow2.c  |  8 
 block/qcow2.h  |  9 +
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 683c9569ad..851c7e6165 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -209,7 +209,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
uint64_t l2_offset, uint64_t **l2_slice)
 {
 BDRVQcow2State *s = bs->opaque;
-int start_of_slice = sizeof(uint64_t) *
+int start_of_slice = l2_entry_size(s) *
 (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
 return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -277,7 +277,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new l2 entry */
 
-l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
 if (l2_offset < 0) {
 ret = l2_offset;
 goto fail;
@@ -301,7 +301,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new entry in the l2 cache */
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -365,7 +365,7 @@ fail:
 }
 s->l1_table[l1_index] = old_l2_offset;
 if (l2_offset > 0) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_ALWAYS);
 }
 return ret;
@@ -708,7 +708,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t 
offset,
 
 /* Then decrease the refcount of the old table */
 if (l2_offset) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_OTHER);
 }
 
@@ -1895,7 +1895,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 223048569e..de85ed29a4 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1253,7 +1253,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 l2_slice = NULL;
 l1_table = NULL;
 l1_size2 = l1_size * sizeof(uint64_t);
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 s->cache_discards = true;
@@ -1604,7 +1604,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int i, l2_size, nb_csectors, ret;
 
 /* Read L2 table from disk */
-l2_size = s->l2_size * sizeof(uint64_t);
+l2_size = s->l2_size * l2_entry_size(s);
 l2_table = g_malloc(l2_size);
 
 ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1679,15 +1679,16 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
 offset);
 if (fix & BDRV_FIX_ERRORS) {
+int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
 uint64_t l2e_offset =
-l2_offset + (uint64_t)i * sizeof(uint64_t);
+l2_offset + (uint64_t)i * l2_entry_size(s);
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
 l2_entry = QCOW_OFLAG_ZERO;
 set_l2_entry(s, l2_table, i, l2_entry);
 ret = qcow2_pre_write_overlap_check(bs, ign,
-l2e_offset, sizeof(uint64_t), false);
+l2e_offset, l2_entry_size(s), false);
 if (ret < 0) {
 fprintf(stderr, "ERROR: 

[RFC PATCH v3 14/27] qcow2: Add subcluster support to calculate_l2_meta()

2019-12-22 Thread Alberto Garcia
If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

   - If the cluster is new, then subclusters #0 to #3 from the old
 cluster must be copied into the new one.

   - If the cluster is new but the old cluster was unallocated, then
 only subcluster #3 needs copy-on-write. #0 to #2 are marked as
 unallocated in the bitmap of the new L2 entry.

   - If we are overwriting an old cluster and subcluster #3 is
 unallocated or has the all-zeroes bit set then we need
 copy-on-write on subcluster #3.

   - If we are overwriting an old cluster and subcluster #3 was
 allocated then there is no need to copy-on-write.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 140 +-
 1 file changed, 110 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 40c2e34a2a..c6eb480ee8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1041,56 +1041,128 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 1 on success, -errno on failure (in order to match the
+ * return value of handle_copied() and handle_alloc()).
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-  uint64_t host_cluster_offset,
-  uint64_t guest_offset, unsigned bytes,
-  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t 
host_cluster_offset,
+ uint64_t guest_offset, unsigned bytes,
+ uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-int l2_index = offset_to_l2_slice_index(s, guest_offset);
-uint64_t l2_entry;
+int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry, l2_bitmap;
 unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
-QCow2ClusterType type;
+QCow2SubclusterType type;
 
 assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-/* Return if there's no COW (all clusters are normal and we keep them) */
+/* Return if there's no COW (all subclusters are normal and we are
+ * keeping the clusters) */
 if (keep_old) {
+unsigned first_sc = cow_start_to / s->subcluster_size;
+unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
 int i;
-for (i = 0; i < nb_clusters; i++) {
-l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+for (i = first_sc; i <= last_sc; i++) {
+unsigned c = i / s->subclusters_per_cluster;
+unsigned sc = i % s->subclusters_per_cluster;
+l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+if (type == QCOW2_SUBCLUSTER_INVALID) {
+l2_index += c; /* Point to the invalid entry */
+goto fail;
+}
+if (type != QCOW2_SUBCLUSTER_NORMAL) {
 break;
 }
 }
-if (i == nb_clusters) {
-return;
+if (i == last_sc + 1) {
+return 1;
 }
 }
 
 /* Get the L2 entry from the first cluster */
 l2_entry = get_l2_entry(s, l2_slice, l2_index);
-type = qcow2_get_cluster_type(bs, l2_entry);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+sc_index = offset_to_sc_index(s, guest_offset);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-cow_start_from = cow_start_to;
+if (type == QCOW2_SUBCLUSTER_INVALID) {
+goto fail;
+}
+
+if (!keep_old) {
+switch (type) {
+case QCOW2_SUBCLUSTER_NORMAL:
+case QCOW2_SUBCLUSTER_COMPRESSED:
+case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+cow_start_from = 0;
+break;
+case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+cow_start_from = sc_index << s->subcluster_bits;
+break;
+default:
+g_

[RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature

2019-12-22 Thread Alberto Garcia
Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia 
---
 docs/interop/qcow2.txt | 68 --
 docs/qcow2-cache.txt   | 19 +++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..d34261f955 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
 as the maximum cluster size and won't be able to open 
images
 with larger cluster sizes.
 
+Note: if the image has Extended L2 Entries then 
cluster_bits
+must be at least 14 (i.e. 16384 byte clusters).
+
  24 - 31:   size
 Virtual disk size in bytes.
 
@@ -109,7 +112,12 @@ in the description of a field.
 An External Data File Name header extension may
 be present if this bit is set.
 
-Bits 3-63:  Reserved (set to 0)
+Bit 3:  Extended L2 Entries.  If this bit is set then
+L2 table entries use an extended format that
+allows subcluster-based allocation. See the
+Extended L2 Entries section for more details.
+
+Bits 4-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-l2_entries = (cluster_size / sizeof(uint64_t))
+l2_entries = (cluster_size / sizeof(uint64_t))[*]
 
 l2_index = (offset / cluster_size) % l2_entries
 l1_index = (offset / cluster_size) / l2_entries
@@ -447,6 +455,8 @@ obtained as follows:
 
 return cluster_offset + (offset % cluster_size)
 
+[*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
 Bit  0 -  8:Reserved (set to 0)
@@ -487,7 +497,8 @@ Standard Cluster Descriptor:
 nor is data read from the backing file if the cluster is
 unallocated.
 
-With version 2, this is always 0.
+With version 2 or with extended L2 entries (see the next
+section), this is always 0.
 
  1 -  8:Reserved (set to 0)
 
@@ -524,6 +535,57 @@ file (except if bit 0 in the Standard Cluster Descriptor 
is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains 
information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated like in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+1: the subcluster is allocated. In this case the
+   host cluster offset field must contain a valid
+   offset.
+0: the subcluster is not allocated. In this case
+   read requests shall go to the backing file or
+   return zeros if there is no backing file data.
+
+Bits are assigned starting from the most significant one.
+(i.e. bit x is used for subcluster 31 - x)
+
+32 -  63Subcluster reads as zeros (one bit per subcluster)
+
+1: the subcluster reads as zeros. In this case the
+   allocation status bit must be unset. The host
+   cluster offset field may or may not be set

[RFC PATCH v3 20/27] qcow2: Fix offset calculation in handle_dependencies()

2019-12-22 Thread Alberto Garcia
l2meta_cow_start() and l2meta_cow_end() are not necessarily
cluster-aligned if the image has subclusters, so update the
calculation of old_start and old_end to guarantee that no two requests
try to write on the same cluster.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ede75138d2..0a40944667 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1279,8 +1279,8 @@ static int handle_dependencies(BlockDriverState *bs, 
uint64_t guest_offset,
 
 uint64_t start = guest_offset;
 uint64_t end = start + bytes;
-uint64_t old_start = l2meta_cow_start(old_alloc);
-uint64_t old_end = l2meta_cow_end(old_alloc);
+uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
+uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), 
s->cluster_size);
 
 if (end <= old_start || start >= old_end) {
 /* No intersection */
-- 
2.20.1




[RFC PATCH v3 15/27] qcow2: Add subcluster support to qcow2_get_cluster_offset()

2019-12-22 Thread Alberto Garcia
The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 136 --
 block/qcow2.h |  36 +--
 2 files changed, 80 insertions(+), 92 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c6eb480ee8..c10601a828 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -372,66 +372,55 @@ fail:
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. Allocated subclusters are required to be contiguous in
+ * the image file.
+ * At most @nb_clusters are checked (note that this means clusters,
+ * not subclusters).
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+unsigned sc_index, uint64_t *l2_slice,
+int l2_index)
 {
 BDRVQcow2State *s = bs->opaque;
-int i;
-QCow2ClusterType first_cluster_type;
-uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-uint64_t offset = first_entry & mask;
-
-first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-return 0;
+int i, j, count = 0;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
+bool check_offset = true;
+QCow2SubclusterType type =
+qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+
+assert(type != QCOW2_SUBCLUSTER_INVALID); /* The caller should check this 
*/
+
+if (type == QCOW2_SUBCLUSTER_COMPRESSED) {
+/* Compressed clusters are always processed one by one */
+return s->subclusters_per_cluster - sc_index;
 }
 
-/* must be allocated */
-assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-   first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
-
-for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-if (offset + (uint64_t) i * cluster_size != l2_entry) {
-break;
-}
+if (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN ||
+type == QCOW2_SUBCLUSTER_ZERO_PLAIN) {
+check_offset = false;
 }
 
-return i;
-}
-
-/*
- * Checks how many consecutive unallocated clusters in a given L2
- * slice have the same cluster type.
- */
-static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
- int nb_clusters,
- uint64_t *l2_slice,
- int l2_index,
- QCow2ClusterType wanted_type)
-{
-BDRVQcow2State *s = bs->opaque;
-int i;
-
-assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-   wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
-
-if (type != wanted_type) {
-break;
+l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+if (check_offset && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+goto out;
+}
+for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) 
{
+if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) 
{
+goto out;
+}
+count++;
 }
+expected_offset += s->cluster_size;
 }
 
-return i;
+out:
+return count;
 }
 
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
@@ -515,12 +504,12 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
  QCow2SubclusterType *subcluster_type)
 {
 B

[RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2019-12-22 Thread Alberto Garcia
When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
 allocated and have a reference count of 1. In those clusters we
 can simply write the data and there is no need to perform any
 copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
 either because they haven't been allocated yet, because their
 reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
 which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
 reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
 allocation (all other cases).

One importante difference after this change is that clusters found in
handle_copied() may now require copy-on-write, but this will be anyway
necessary once we add support for subclusters.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 226 +++---
 1 file changed, 126 insertions(+), 100 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e078bddcc2..9387f15866 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1021,13 +1021,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure, add
- * it to @m and the BDRVQcow2State.cluster_allocs list.
+ * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
+ * request does not need copy-on-write or changes to the L2 metadata
+ * then this function does nothing.
  *
  * @host_cluster_offset points to the beginning of the first cluster.
  *
  * @guest_offset and @bytes indicate the offset and length of the
  * request.
  *
+ * @l2_slice contains the L2 entries of all clusters involved in this
+ * write request.
+ *
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
@@ -1035,15 +1040,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 static void calculate_l2_meta(BlockDriverState *bs,
   uint64_t host_cluster_offset,
   uint64_t guest_offset, unsigned bytes,
-  QCowL2Meta **m, bool keep_old)
+  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-unsigned cow_start_from = 0;
+int l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry;
+unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
-unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
+QCow2ClusterType type;
+
+assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+/* Return if there's no COW (all clusters are normal and we keep them) */
+if (keep_old) {
+int i;
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+break;
+}
+}
+if (i == nb_clusters) {
+return;
+}
+}
+
+/* Get the L2 entry from the first cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_start_from = cow_start_to;
+} else {
+cow_start_from = 0;
+}
+
+/* Get the L2 entry from the last cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_end_to = cow_end_from;
+} else {
+cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+}
 
 *m = g_malloc0(sizeof(**m));
 **m = (QCowL2Meta) {
@@ -1069,18 +1112,20 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
-/*

[RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries

2019-12-22 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/271 | 256 +
 tests/qemu-iotests/271.out | 208 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 465 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 00..73cdc37bf0
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,256 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=be...@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+rm -f "$TEST_IMG.raw"
+rm -f "$TEST_IMG.backing"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+
+IMGOPTS="extended_l2=on"
+l2_offset=262144 # 0x4
+
+_verify_img()
+{
+$QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are 
identical'
+$QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+grep -v 'No errors were found on the image'
+}
+
+_read_l2_entry()
+{
+entry_no=$1
+nentries=$2
+offset=$(($l2_offset + $entry_no * 16))
+length=$((nentries * 16))
+$QEMU_IO -f raw -c "read -v $offset $length" "$TEST_IMG" | _filter_qemu_io 
| head -n -2
+}
+
+_test_write()
+{
+cmd="$1"
+l2_entry_idx="$2"
+l2_entry_num="$3"
+raw_cmd=`echo $1 | sed s/-c//` # Raw images don't support -c
+echo "$cmd"
+$QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+_verify_img
+if [ -n "$l2_entry_idx" ]; then
+_read_l2_entry "$l2_entry_idx" "$l2_entry_num"
+fi
+}
+
+_reset_img()
+{
+$QEMU_IMG create -f raw "$TEST_IMG.raw" 1M | _filter_img_create
+if [ "$use_backing_file" = "yes" ]; then
+$QEMU_IMG create -f raw "$TEST_IMG.backing" 1M | _filter_img_create
+$QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.backing" | 
_filter_qemu_io
+$QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.raw" | 
_filter_qemu_io
+_make_test_img -b "$TEST_IMG.backing" 1M
+else
+_make_test_img 1M
+fi
+}
+
+# Test that writing to an image with subclusters produces the expected
+# results, in images with and without backing files
+for use_backing_file in yes no; do
+echo
+echo "### Standard write tests (backing file: $use_backing_file) ###"
+echo
+_reset_img
+### Write subcluster #0 (beginning of subcluster) ###
+_test_write 'write -q -P 1 0 1k' 0 1
+
+### Write subcluster #1 (middle of subcluster) ###
+_test_write 'write -q -P 2 3k 512' 0 1
+
+### Write subcluster #2 (end of subcluster) ###
+_test_write 'write -q -P 3 5k 1k' 0 1
+
+### Write subcluster #3 (full subcluster) ###
+_test_write 'write -q -P 4 6k 2k' 0 1
+
+### Write subclusters #4-6 (full subclusters) ###
+_test_write 'write -q -P 5 8k 6k' 0 1
+
+### Write subclusters #7-9 (partial subclusters) ###
+_test_write 'write -q -P 6 15k 4k' 0 1
+
+### Write subcluster #16 (partial subcluster) ###
+_test_write 'write -q -P 7 32k 1k' 0 2
+
+### Write subcluster #31-#34 (cluster overlap) ###
+_test_write 'write -q -P 8 63k 4k' 0 2
+
+### Zero subcluster #1 (TODO: use the "all zeros" bit)
+_test_write 'write -q -z 2k 2k' 0 1
+
+### Zero cluster #0
+_test_write 'write -q -z 0 64k' 0 1
+
+### Fill cluster #0 with data
+_test_write 'write -q -P 9 0 64k' 0 1
+
+### Zero and unmap half of cluster #0 (this won't unmap it)
+_test_write 'write -q -z -u 0 32k' 0 1
+
+### Zero and unmap cluster #0
+_test_write 'write -q -z -u 0 64k' 0

[RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry()

2019-12-22 Thread Alberto Garcia
The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c  | 65 ++
 block/qcow2-refcount.c | 17 +--
 block/qcow2.h  | 12 
 3 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9387f15866..683c9569ad 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -379,12 +379,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 QCow2ClusterType first_cluster_type;
 uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
 uint64_t offset = first_entry & mask;
 
 first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -397,7 +398,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, 
int nb_clusters,
first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
 if (offset + (uint64_t) i * cluster_size != l2_entry) {
 break;
 }
@@ -413,14 +414,16 @@ static int count_contiguous_clusters(BlockDriverState 
*bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
  int nb_clusters,
  uint64_t *l2_slice,
+ int l2_index,
  QCow2ClusterType wanted_type)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 
 assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = be64_to_cpu(l2_slice[i]);
+uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
 QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
 if (type != wanted_type) {
@@ -566,7 +569,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+*cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -601,14 +604,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-  _slice[l2_index], type);
+  l2_slice, l2_index, type);
 *cluster_offset = 0;
 break;
 case QCOW2_CLUSTER_ZERO_ALLOC:
 case QCOW2_CLUSTER_NORMAL:
 /* how many allocated clusters ? */
 c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-  _slice[l2_index], QCOW_OFLAG_ZERO);
+  l2_slice, l2_index, QCOW_OFLAG_ZERO);
 *cluster_offset &= L2E_OFFSET_MASK;
 if (offset_into_cluster(s, *cluster_offset)) {
 qcow2_signal_corruption(bs, true, -1, -1,
@@ -761,7 +764,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 
 /* Compression can't overwrite anything. Fail if the cluster was already
  * allocated. */
-cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 if (cluster_offset & L2E_OFFSET_MASK) {
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
 return -EIO;
@@ -786,7 +789,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-l2_slice[l2_index] = cpu_to_be64(cluster_offset);
+set_l2_entry(s, l2_slice, l2_index, cluster_offset);
 qcow2_cache_put(s->l2_table_cache, (void **)

[RFC PATCH v3 00/27] Add subcluster allocation to qcow2

2019-12-22 Thread Alberto Garcia
Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

This version fixes many of the problems highlighted by Max. I decided
not to replace completely the cluster logic with subcluster logic in
all cases because I felt that sometimes it only complicated the code.
Let's see what you think :-)

Berto

v3:
- Patch 01: Rename host_offset to host_cluster_offset and make 'bytes'
an unsigned int [Max]
- Patch 03: Rename cluster_needs_cow to cluster_needs_new_alloc and
count_cow_clusters to count_single_write_clusters. Update
documentation and add more assertions and checks [Max]
- Patch 09: Update qcow2_co_truncate() to properly support extended L2
entries [Max]
- Patch 10: Forbid calling set_l2_bitmap() if the image does not have
extended L2 entries [Max]
- Patch 11 (new): Add QCow2SubclusterType [Max]
- Patch 12 (new): Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
- Patch 13 (new): Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
- Patch 14: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
- Patch 15: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
- Patch 19: Don't call set_l2_bitmap() if the image does not have
extended L2 entries [Max]
- Patch 21: Use smaller data types.
- Patch 22: Don't call set_l2_bitmap() if the image does not have
extended L2 entries [Max]
- Patch 23: Use smaller data types.
- Patch 25: Update test results and documentation. Move the check for
the minimum subcluster size to validate_cluster_size().
- Patch 26 (new): Add subcluster support to qcow2_measure()
- Patch 27: Add more tests

v2: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01642.html
- Patch 12: Update after the changes in 88f468e546.
- Patch 21 (new): Clear the L2 bitmap when allocating a compressed
  cluster. Compressed clusters should have the bitmap all set to 0.
- Patch 24: Document the new fields in the QAPI documentation [Eric].
- Patch 25: Allow qcow2 preallocation with backing files.
- Patch 26: Add some tests for qcow2 images with extended L2 entries.

v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v2:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/27:[0013] [FC] 'qcow2: Add calculate_l2_meta()'
002/27:[] [-C] 'qcow2: Split cluster_needs_cow() out of 
count_cow_clusters()'
003/27:[0083] [FC] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in 
handle_copied()'
004/27:[] [-C] 'qcow2: Add get_l2_entry() and set_l2_entry()'
005/27:[] [--] 'qcow2: Document the Extended L2 Entries feature'
006/27:[] [--] 'qcow2: Add dummy has_subclusters() function'
007/27:[] [--] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
008/27:[] [--] 'qcow2: Add offset_to_sc_index()'
009/27:[0008] [FC] 'qcow2: Add l2_entry_size()'
010/27:[0008] [FC] 'qcow2: Update get/set_l2_entry() and add 
get/set_l2_bitmap()'
011/27:[down] 'qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()'
012/27:[down] 'qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*'
013/27:[down] 'qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC'
014/27:[0060] [FC] 'qcow2: Add subcluster support to calculate_l2_meta()'
015/27:[0091] [FC] 'qcow2: Add subcluster support to qcow2_get_cluster_offset()'
016/27:[] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
017/27:[] [--] 'qcow2: Add subcluster support to discard_in_l2_slice()'
018/27:[] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
019/27:[0008] [FC] 'qcow2: Add subcluster support to 
expand_zero_clusters_in_l1()'
020/27:[] [--] 'qcow2: Fix offset calculation in handle_dependencies()'
021/27:[0007] [FC] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
022/27:[0004] [FC] 'qcow2: Clear the L2 bitmap when allocating a compressed 
cluster'
023/27:[0002] [FC] 'qcow2: Add subcluster support to handle_alloc_space()'
024/27:[] [-C] 'qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters 
only'
025/27:[0049] [FC] 'qcow2: Add the 'extended_l2' option and the 
QCOW2_INCOMPAT_EXTL2 bit'
026/27:[down] 'qcow2: Add subcluster support to qcow2_measure()'
027/27:[0046] [FC] 'iotests: Add tests for qcow2 images with extended L2 
entries'

Alberto Garcia (27):
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster

[RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1()

2019-12-22 Thread Alberto Garcia
Two changes are needed in order to add subcluster support to this
function: deallocated clusters must have their bitmaps cleared, and
expanded clusters must have all the "subcluster allocated" bits set.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 207f670c94..ede75138d2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2054,6 +2054,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 /* not backed; therefore we can simply deallocate the
  * cluster */
 set_l2_entry(s, l2_slice, j, 0);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, j, 0);
+}
 l2_dirty = true;
 continue;
 }
@@ -2120,6 +2123,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 } else {
 set_l2_entry(s, l2_slice, j, offset);
 }
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
+}
 l2_dirty = true;
 }
 
-- 
2.20.1




[RFC PATCH v3 02/27] qcow2: Split cluster_needs_cow() out of count_cow_clusters()

2019-12-22 Thread Alberto Garcia
We are going to need it in other places.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-cluster.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 617618dc54..e078bddcc2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1069,6 +1069,24 @@ static void calculate_l2_meta(BlockDriverState *bs,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+switch (qcow2_get_cluster_type(bs, l2_entry)) {
+case QCOW2_CLUSTER_NORMAL:
+if (l2_entry & QCOW_OFLAG_COPIED) {
+return false;
+}
+case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_COMPRESSED:
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return true;
+default:
+abort();
+}
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1081,25 +1099,11 @@ static int count_cow_clusters(BlockDriverState *bs, int 
nb_clusters,
 
 for (i = 0; i < nb_clusters; i++) {
 uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-switch(cluster_type) {
-case QCOW2_CLUSTER_NORMAL:
-if (l2_entry & QCOW_OFLAG_COPIED) {
-goto out;
-}
+if (!cluster_needs_cow(bs, l2_entry)) {
 break;
-case QCOW2_CLUSTER_UNALLOCATED:
-case QCOW2_CLUSTER_COMPRESSED:
-case QCOW2_CLUSTER_ZERO_PLAIN:
-case QCOW2_CLUSTER_ZERO_ALLOC:
-break;
-default:
-abort();
 }
 }
 
-out:
 assert(i <= nb_clusters);
 return i;
 }
-- 
2.20.1




[RFC PATCH v3 08/27] qcow2: Add offset_to_sc_index()

2019-12-22 Thread Alberto Garcia
For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 941330cfc9..523bc489a5 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -566,6 +566,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State 
*s, int64_t offset)
 return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
 return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1




[RFC PATCH v3 22/27] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2019-12-22 Thread Alberto Garcia
Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ed291a4042..af0f01621c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -789,6 +789,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index, 0);
+}
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
 
 *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1




[RFC PATCH v3 13/27] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC

2019-12-22 Thread Alberto Garcia
When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 9277d680ef..1d3da0ccf6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1985,7 +1985,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 *pnum = bytes;
 
 if ((type == QCOW2_SUBCLUSTER_NORMAL ||
- type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
+ type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+ type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) && !s->crypto) {
 *map = cluster_offset | offset_into_cluster(s, offset);
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
@@ -1993,7 +1994,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
+} else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+   type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2163,6 +2165,7 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
 g_assert_not_reached();
 
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2236,7 +2239,8 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 
 if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
 type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
-(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing) ||
+(type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC && !bs->backing))
 {
 qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
 } else {
@@ -3750,6 +3754,7 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 return -ENOTSUP;
 }
 if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
 type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
 type != QCOW2_SUBCLUSTER_ZERO_ALLOC) {
 qemu_co_mutex_unlock(>lock);
@@ -3822,6 +3827,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
 switch (type) {
 case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
 if (bs->backing && bs->backing->bs) {
 int64_t backing_length = bdrv_getlength(bs->backing->bs);
 if (src_offset >= backing_length) {
-- 
2.20.1




[RFC PATCH v3 11/27] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()

2019-12-22 Thread Alberto Garcia
This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
the same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
 compressed. We do not support compression at the subcluster
 level.

  b) There are two different values for unallocated subclusters:
 QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
 cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
 which means that the cluster is allocated but the subcluster is
 not. The latter can only happen in images with extended L2
 entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
 entry has a value that violates the specification. The caller is
 responsible for handling these situations.

 To prevent compatibility problems with images that have invalid
 values but are currently being read by QEMU without causing side
 effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
 with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 92 +++
 1 file changed, 92 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 64b0a814f4..321ba9550f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,15 @@
 
 #define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)((1ULL << 63) >> (X))
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   ((1ULL << 31) >> (X))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES 0xULL
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  0xULL
+
 /* Size of normal and extended L2 entries */
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
@@ -455,6 +464,16 @@ typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_COMPRESSED,
 } QCow2ClusterType;
 
+typedef enum QCow2SubclusterType {
+QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
+QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
+QCOW2_SUBCLUSTER_ZERO_PLAIN,
+QCOW2_SUBCLUSTER_ZERO_ALLOC,
+QCOW2_SUBCLUSTER_NORMAL,
+QCOW2_SUBCLUSTER_COMPRESSED,
+QCOW2_SUBCLUSTER_INVALID,
+} QCow2SubclusterType;
+
 typedef enum QCow2MetadataOverlap {
 QCOW2_OL_MAIN_HEADER_BITNR  = 0,
 QCOW2_OL_ACTIVE_L1_BITNR= 1,
@@ -632,6 +651,79 @@ static inline QCow2ClusterType 
qcow2_get_cluster_type(BlockDriverState *bs,
 }
 }
 
+/* For an image without extended L2 entries, return the
+ * QCow2SubclusterType equivalent of a given QCow2ClusterType */
+static inline
+QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
+{
+switch (type) {
+case QCOW2_CLUSTER_COMPRESSED:
+return QCOW2_SUBCLUSTER_COMPRESSED;
+case QCOW2_CLUSTER_ZERO_PLAIN:
+return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+case QCOW2_CLUSTER_NORMAL:
+return QCOW2_SUBCLUSTER_NORMAL;
+case QCOW2_CLUSTER_UNALLOCATED:
+return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+default:
+g_assert_not_reached();
+}
+}
+
+/* In an image without subsclusters @l2_bitmap is ignored and
+ * @sc_index must be 0. */
+static inline
+QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
+  uint64_t l2_entry,
+  uint64_t l2_bitmap,
+  unsigned sc_index)
+{
+BDRVQcow2State *s = bs->opaque;
+QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
+assert(sc_index < s->subclusters_per_cluster);
+
+if (has_subclusters(s)) {
+bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
+bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
+switch (type) {
+case QCOW2_CLUSTER_COMPRESSED:
+if (l2_bitmap != 0) {
+return QCOW2_SUBCLUSTER_INVALID;
+}
+  

[RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State

2019-12-22 Thread Alberto Garcia
This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one,
with subcluster_size = cluster_size.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 5 +
 block/qcow2.h | 5 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3866b47946..cbd857e9c7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1378,6 +1378,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 }
 }
 
+s->subclusters_per_cluster =
+has_subclusters(s) ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
+s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+s->subcluster_bits = ctz32(s->subcluster_size);
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
diff --git a/block/qcow2.h b/block/qcow2.h
index 1db3fc5dbc..941330cfc9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -78,6 +78,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -284,6 +286,9 @@ typedef struct BDRVQcow2State {
 int cluster_bits;
 int cluster_size;
 int l2_slice_size;
+int subcluster_bits;
+int subcluster_size;
+int subclusters_per_cluster;
 int l2_bits;
 int l2_size;
 int l1_size;
-- 
2.20.1




[RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta()

2019-12-22 Thread Alberto Garcia
handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 77 +--
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8982b7b762..617618dc54 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1019,6 +1019,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 QCOW2_DISCARD_NEVER);
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure, add
+ * it to @m and the BDRVQcow2State.cluster_allocs list.
+ *
+ * @host_cluster_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs,
+  uint64_t host_cluster_offset,
+  uint64_t guest_offset, unsigned bytes,
+  QCowL2Meta **m, bool keep_old)
+{
+BDRVQcow2State *s = bs->opaque;
+unsigned cow_start_from = 0;
+unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+unsigned cow_end_from = cow_start_to + bytes;
+unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+QCowL2Meta *old_m = *m;
+
+*m = g_malloc0(sizeof(**m));
+**m = (QCowL2Meta) {
+.next   = old_m,
+
+.alloc_offset   = host_cluster_offset,
+.offset = start_of_cluster(s, guest_offset),
+.nb_clusters= nb_clusters,
+
+.keep_old_clusters = keep_old,
+
+.cow_start = {
+.offset = cow_start_from,
+.nb_bytes   = cow_start_to - cow_start_from,
+},
+.cow_end = {
+.offset = cow_end_from,
+.nb_bytes   = cow_end_to - cow_end_from,
+},
+};
+
+qemu_co_queue_init(&(*m)->dependent_requests);
+QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1417,35 +1467,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
 int avail_bytes = nb_clusters << s->cluster_bits;
 int nb_bytes = MIN(requested_bytes, avail_bytes);
-QCowL2Meta *old_m = *m;
-
-*m = g_malloc0(sizeof(**m));
-
-**m = (QCowL2Meta) {
-.next   = old_m,
-
-.alloc_offset   = alloc_cluster_offset,
-.offset = start_of_cluster(s, guest_offset),
-.nb_clusters= nb_clusters,
-
-.keep_old_clusters  = keep_old_clusters,
-
-.cow_start = {
-.offset = 0,
-.nb_bytes   = offset_into_cluster(s, guest_offset),
-},
-.cow_end = {
-.offset = nb_bytes,
-.nb_bytes   = avail_bytes - nb_bytes,
-},
-};
-qemu_co_queue_init(&(*m)->dependent_requests);
-QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 
 *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
 *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
 assert(*bytes != 0);
 
+calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+  m, keep_old_clusters);
+
 return 1;
 
 fail:
-- 
2.20.1




[RFC PATCH v3 24/27] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only

2019-12-22 Thread Alberto Garcia
Ideally it should be possible to zero individual subclusters using
this function, but this is currently not implemented.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 242001afa2..0267722065 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3754,6 +3754,12 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 qemu_co_mutex_unlock(>lock);
 return -ENOTSUP;
 }
+/* TODO: allow zeroing separate subclusters, we only allow
+ * zeroing full clusters at the moment. */
+if (nr != bytes) {
+qemu_co_mutex_unlock(>lock);
+return -ENOTSUP;
+}
 if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
 type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
 type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
-- 
2.20.1




[RFC PATCH v3 17/27] qcow2: Add subcluster support to discard_in_l2_slice()

2019-12-22 Thread Alberto Garcia
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 70b0aaa00a..207f670c94 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1790,7 +1790,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 
 /* First remove L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (!full_discard && s->qcow_version >= 3) {
+if (has_subclusters(s)) {
+set_l2_entry(s, l2_slice, l2_index + i, 0);
+set_l2_bitmap(s, l2_slice, l2_index + i,
+  full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
+} else if (!full_discard && s->qcow_version >= 3) {
 set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
 } else {
 set_l2_entry(s, l2_slice, l2_index + i, 0);
-- 
2.20.1




[PATCH] qcow2: Use offset_into_cluster()

2019-12-12 Thread Alberto Garcia
There's a couple of places left in the qcow2 code that still do the
calculation manually, so let's replace them.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7c18721741..3866b47946 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -367,7 +367,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, 
uint64_t start_offset,
 return -EINVAL;
 }
 
-if (bitmaps_ext.bitmap_directory_offset & (s->cluster_size - 1)) {
+if (offset_into_cluster(s, bitmaps_ext.bitmap_directory_offset)) {
 error_setg(errp, "bitmaps_ext: "
  "invalid bitmap directory offset");
 return -EINVAL;
@@ -1958,9 +1958,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t cluster_offset;
-int index_in_cluster, ret;
 unsigned int bytes;
-int status = 0;
+int ret, status = 0;
 
 qemu_co_mutex_lock(>lock);
 
@@ -1981,8 +1980,7 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
 !s->crypto) {
-index_in_cluster = offset & (s->cluster_size - 1);
-*map = cluster_offset | index_in_cluster;
+*map = cluster_offset | offset_into_cluster(s, offset);
 *file = s->data_file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID;
 }
-- 
2.20.1




Re: [PATCH] qemu-img: fix info --backing-chain --image-opts

2019-12-06 Thread Alberto Garcia
On Thu 05 Dec 2019 02:46:46 PM CET, Stefan Hajnoczi wrote:
> Only apply --image-opts to the topmost image when listing an entire
> backing chain.  It is incorrect to treat backing filenames as image
> options.  Assuming we have the backing chain t.IMGFMT.base <-
> t.IMGFMT.mid <- t.IMGFMT, qemu-img info fails as follows:
>
>   $ qemu-img info --backing-chain --image-opts \
>   driver=qcow2,file.driver=file,file.filename=t.IMGFMT
>   qemu-img: Could not open 'TEST_DIR/t.IMGFMT.mid': Cannot find 
> device=TEST_DIR/t.IMGFMT.mid nor node_name=TEST_DIR/t.IMGFMT.mid
>
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere

2019-12-06 Thread Alberto Garcia
On Mon 11 Nov 2019 05:01:55 PM CET, Max Reitz wrote:
> @@ -3177,11 +3177,6 @@ void qmp_block_resize(bool has_device, const char 
> *device,
>  aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(aio_context);
>  
> -if (!bdrv_is_first_non_filter(bs)) {
> -error_setg(errp, QERR_FEATURE_DISABLED, "resize");
> -goto out;
> -}
> -

What happens with this case now?

https://lists.gnu.org/archive/html/qemu-block/2019-11/msg00793.html

Berto



Re: [PATCH] blockjob: Fix error message for negative speed

2019-11-26 Thread Alberto Garcia
On Tue 26 Nov 2019 02:42:22 PM CET, Kevin Wolf wrote:
> The error message for a negative speed uses QERR_INVALID_PARAMETER,
> which implies that the 'speed' option doesn't even exist:
>
> {"error": {"class": "GenericError", "desc": "Invalid parameter 'speed'"}}
>
> Make it use QERR_INVALID_PARAMETER_VALUE instead:
>
> {"error": {"class": "GenericError", "desc": "Parameter 'speed' expects a 
> non-negative value"}}
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH] throttle-groups: fix memory leak in throttle_group_set_limits

2019-11-26 Thread Alberto Garcia
On Tue 26 Nov 2019 09:17:02 AM CET, pannengy...@huawei.com wrote:
> --- a/block/throttle-groups.c
> +++ b/block/throttle-groups.c
> @@ -912,6 +912,7 @@ static void throttle_group_set_limits(Object *obj, 
> Visitor *v,
>  unlock:
>  qemu_mutex_unlock(>lock);
>  ret:
> +qapi_free_ThrottleLimits(argp);
>  error_propagate(errp, local_err);
>  return;

Thanks, but I also think that 'arg' is not used so it can be removed?

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 77014c741b..37695b0cd7 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -893,8 +893,7 @@ static void throttle_group_set_limits(Object *obj, Visitor 
*v,
 {
 ThrottleGroup *tg = THROTTLE_GROUP(obj);
 ThrottleConfig cfg;
-ThrottleLimits arg = { 0 };
-ThrottleLimits *argp = 
+ThrottleLimits *argp;
 Error *local_err = NULL;
 
 visit_type_ThrottleLimits(v, name, , _err);
@@ -912,6 +911,7 @@ static void throttle_group_set_limits(Object *obj, Visitor 
*v,
 unlock:
 qemu_mutex_unlock(>lock);
 ret:
+qapi_free_ThrottleLimits(argp);
 error_propagate(errp, local_err);
 return;
 }

Berto



Re: [PATCH v3 2/8] block: Add no_fallback parameter to bdrv_co_truncate()

2019-11-25 Thread Alberto Garcia
On Fri 22 Nov 2019 05:05:05 PM CET, Kevin Wolf wrote:

> @@ -3405,6 +3412,7 @@ typedef struct TruncateCo {
>  int64_t offset;
>  bool exact;
>  PreallocMode prealloc;
> +bool no_fallback;
>  Error **errp;
>  int ret;
>  } TruncateCo;

You add the 'no_fallback' field here...

>  int bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
> -  PreallocMode prealloc, Error **errp)
> +  PreallocMode prealloc, bool no_fallback, Error **errp)
>  {
>  Coroutine *co;
>  TruncateCo tco = {

...but then you don't use it when the structure is initialized.

Berto



Re: [PATCH v3 3/8] qcow2: Declare BDRV_REQ_NO_FALLBACK supported

2019-11-25 Thread Alberto Garcia
On Fri 22 Nov 2019 05:05:06 PM CET, Kevin Wolf wrote:
> In the common case, qcow2_co_pwrite_zeroes() already only modifies
> metadata case, so we're fine with or without BDRV_REQ_NO_FALLBACK set.
>
> The only exception is when using an external data file, where the
> request is passed down to the block driver of the external data file. We
> are forwarding the BDRV_REQ_NO_FALLBACK flag there, though, so this is
> fine, too.
>
> Declare the flag supported therefore.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v2 2/6] block: truncate: Don't make backing file data visible

2019-11-22 Thread Alberto Garcia
On Wed 20 Nov 2019 07:44:57 PM CET, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
>
> Consider the following scenario where the overlay is shorter than its
> backing file:
>
> base.qcow2: 
> overlay.qcow2:  
>
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
>
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
>
> base.qcow2: A-A-
> mid.qcow2:  BB-B
> top.qcow2:  C--C--C-
>
> After commit top.qcow2 to mid.qcow2, the following happens:
>
> mid.qcow2:  CB-C00C0 (correct result)
> mid.qcow2:  CB-C--C- (before this fix)
>
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v2 2/6] block: truncate: Don't make backing file data visible

2019-11-22 Thread Alberto Garcia
On Thu 21 Nov 2019 03:33:31 PM CET, Kevin Wolf wrote:
> For commit it's an image corruptor. For resize, I'll admit that it's
> just wrong behaviour that is probably harmless in most cases, but it's
> still wrong behaviour. It would be a corruptor in the same way as
> commit if it allowed resizing intermediate nodes, but I _think_ the
> old-style op blockers still forbid this. We'd have to double-check
> this if we leave things broken for block_resize.

I tried but cannot make block_resize touch an intermediate image:

if (!bdrv_is_first_non_filter(bs)) {
error_setg(errp, QERR_FEATURE_DISABLED, "resize");
goto out;
}

Berto



Re: [PATCH v2 4/6] iotests: Fix timeout in run_job()

2019-11-21 Thread Alberto Garcia
On Wed 20 Nov 2019 07:44:59 PM CET, Kevin Wolf wrote:
> run_job() accepts a wait parameter for a timeout, but it doesn't
> actually use it. The only thing that is missing is passing it to
> events_wait(), so do that now.
>
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v2 3/6] iotests: Add qemu_io_log()

2019-11-21 Thread Alberto Garcia
On Wed 20 Nov 2019 07:44:58 PM CET, Kevin Wolf wrote:
> Add a function that runs qemu-io and logs the output with the
> appropriate filters applied.
>
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v2 5/6] iotests: Support job-complete in run_job()

2019-11-21 Thread Alberto Garcia
On Wed 20 Nov 2019 07:45:00 PM CET, Kevin Wolf wrote:
> Automatically complete jobs that have a 'ready' state and need an
> explicit job-complete. Without this, run_job() would hang for such
> jobs.
>
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: qcow2 preallocation and backing files

2019-11-20 Thread Alberto Garcia
On Wed 20 Nov 2019 04:58:38 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>> But then PREALLOC_MODE_OFF implies that the L2 metadata should be
>> preallocated (all clusters should be QCOW2_CLUSTER_ZERO_PLAIN), at
>> least when there is a backing file.
>
> Kevin proposed a fix that alters PREALLOC_MODE_OFF behavior if there
> is a backing file, to allocate L2 metadata with ZERO clusters..

Right, I just saw :)

Berto



Re: [PATCH 1/6] block: bdrv_co_do_pwrite_zeroes: 64 bit 'bytes' parameter

2019-11-20 Thread Alberto Garcia
On Wed 20 Nov 2019 03:03:14 PM CET, Kevin Wolf wrote:
> bdrv_co_do_pwrite_zeroes() can already cope with maximum request sizes
> by calling the driver in a loop until everything is done. Make the small
> remaining change that is necessary to let it accept a 64 bit byte count.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: qcow2 preallocation and backing files

2019-11-20 Thread Alberto Garcia
On Wed 20 Nov 2019 01:27:53 PM CET, Vladimir Semeeausntsov-Ogievskiy wrote:

> 3. Also, the latter way is inconsistent with discard. Discarded
> regions returns zeroes, not clusters from backing. I think discard and
> truncate should behave in the same safe zero way.

But then PREALLOC_MODE_OFF implies that the L2 metadata should be
preallocated (all clusters should be QCOW2_CLUSTER_ZERO_PLAIN), at least
when there is a backing file.

Or maybe we just forbid PREALLOC_MODE_OFF during resize if there is a
backing file ?

Berto



qcow2 preallocation and backing files

2019-11-20 Thread Alberto Garcia
Hi,

as we discussed yesterday on IRC there's an inconsistency in the way
qcow2 preallocation works.

Let's create an image and fill it with data:

   $ qemu-img create -f raw base.img 1M
   $ qemu-io -f raw -c 'write -P 0xFF 0 1M' base.img

Now QEMU won't let us create a new image backed by base.img using
preallocation:

   $ qemu-img create -f qcow2 -b base.img -o preallocation=metadata active.img
   qemu-img: active.img: Backing file and preallocation cannot be used at the 
same time

The reason is that once a cluster is preallocated (i.e. it has a valid
L2 entry pointing to a host offset) the guest won't see the contents
of the backing file, so those options conflict with each other.

It is possible however to create an image that is smaller than
the backing file and then resize it using preallocation. In this
case qemu-img will happily accept any --preallocation option, with
different results from the guest's point of view:

   # This reads as 0xFF (the data comes from base.img)
   $ qemu-img create -f qcow2 -b base.img active.img 512K

   # The second half of the image also reads as 0xFF
   $ qemu-img resize --preallocation=off active.img 1M

   # Here the second half reads as zeroes
   $ qemu-img resize --preallocation=metadata active.img 1M

Apart from "qemu-img resize", the QMP block-resize command can also
extend an image like this, although it always uses PREALLOC_MODE_OFF
and the user cannot change that.

It does not seem right that the guest-visible data changes depending
on the preallocation mode. This could be solved by returning an error
when (backing_bs(blk_bs(blk)) && prealloc != PREALLOC_MODE_OFF) on
img_resize().

The important question is however: what behavior is the right one?
Should growing an image that was smaller than the backing file return
zeroes, or data from the backing file? I would opt for the latter, for
simplicity and consistency with the current behavior of block-resize,
although it was pointed out that this could be a security problem (I'm
not sure that I agree with that, but we can discuss it).

This also has a consequence on how preallocation should be implemented
for images with subclusters. Extended L2 entries allow us to allocate
a cluster but leave each one of its subclusters unallocated. That
would allow us to have a cluster that is simultaneously allocated but
whose data is read from the backing file. But it's up to us to decide
if that's what we should do when resizing an image.

Berto



Re: [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

2019-11-15 Thread Alberto Garcia
On Tue 05 Nov 2019 01:47:58 PM CET, Max Reitz wrote:
>> @@ -1347,6 +1347,12 @@ static int coroutine_fn 
>> qcow2_do_open(BlockDriverState *bs, QDict *options,
>>  s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
>>  s->subcluster_bits = ctz32(s->subcluster_size);
>>  
>> +if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
>> +error_setg(errp, "Unsupported subcluster size: %d", 
>> s->subcluster_size);
>> +ret = -EINVAL;
>> +goto fail;
>> +}
>> +
>>  /* Check support for various header values */
>>  if (header.refcount_order > 6) {
>>  error_setg(errp, "Reference count entry width too large; may not "
>
> It feels like this hunk should be part of the patch that added the
> s->subcluster_size assignment.

I don't have a strong opinion, but that patch simply defines the basic
attributes that are used elsewhere in the code, and this is the patch
where the values are checked (for creation and opening).

Berto



Re: [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2019-11-14 Thread Alberto Garcia
On Tue 05 Nov 2019 12:43:16 PM CET, Max Reitz wrote:

> Speaking of handle_copied(); both elements of Qcow2COWRegion are of
> type unsigned.  handle_copied() doesn’t look like it takes any
> precautions to limit the range to even UINT_MAX (and it should
> probably limit it to INT_MAX).

Or rather, both handle_copied() and handle_alloc() should probably limit
it to BDRV_REQUEST_MAX_BYTES.

Berto



Re: [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1()

2019-11-14 Thread Alberto Garcia
On Tue 05 Nov 2019 12:05:02 PM CET, Max Reitz wrote:
>> @@ -2102,6 +2103,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
>> *bs, uint64_t *l1_table,
>>  } else {
>>  set_l2_entry(s, l2_slice, j, offset);
>>  }
>> +set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
>>  l2_dirty = true;
>>  }
>
> Technically this isn’t needed because this function is only called
> when downgrading v3 to v2 images (which can’t have subclusters), but
> of course it won’t hurt.

Right, but we need to change the function anyway to either do this or
assert that there are no subclusters. I prefer to do this because it's
trivial but I won't oppose if someone prefers the alternative.

Berto



Re: [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()

2019-11-14 Thread Alberto Garcia
On Mon 04 Nov 2019 04:07:35 PM CET, Max Reitz wrote:
>>  /* First remove L2 entries */
>>  qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>> -if (!full_discard && s->qcow_version >= 3) {
>> +if (has_subclusters(s)) {
>> +set_l2_entry(s, l2_slice, l2_index + i, 0);
>> +set_l2_bitmap(s, l2_slice, l2_index + i,
>> +  full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
>
> But only for !full_discard, right?

Hence the conditional operator in my patch, maybe you didn't see it?

Berto



Re: [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice()

2019-11-14 Thread Alberto Garcia
On Mon 04 Nov 2019 04:10:58 PM CET, Max Reitz wrote:
>>>  qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>>  if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
>>> -set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>>>  qcow2_free_any_clusters(bs, old_offset, 1, 
>>> QCOW2_DISCARD_REQUEST);
>> 
>> It feels wrong to me to free the cluster before updating the L2
>> entry.
>
> (Although it’s pre-existing, as set_l2_entry() is just an in-cache
> operation anyway :-/)

Yes, I think that if you want to do it afterwards you need to add
another loop after the qcow2_cache_put() call.

Berto



Re: [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2019-11-14 Thread Alberto Garcia
On Wed 30 Oct 2019 05:55:04 PM CET, Max Reitz wrote:
>> This patch also adds the get/set_l2_bitmap() functions that are used
>> to access the bitmaps. For convenience, these functions are no-ops
>> when used in traditional qcow2 images.
>
> Granted, I haven’t seen the following patches yet, but if these
> functions are indeed called for images that don’t have subclusters,
> shouldn’t they return 0x0*0f*f then? (i.e. everything allocated)
>
> If they aren’t, they should probably just abort().  Well,
> set_l2_bitmap() should probably always abort() if there aren’t any
> subclusters.

Yeah, set_l2_bitmap() should abort (I had this changed already).

About get_l2_bitmap() ... I decided not to abort for convenience, for
cases like this one:

uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);

Here the value of l2_bitmap is going to be ignored anyway so it doesn't
matter what we return, but perhaps for consistency we should return
QCOW_OFLAG_SUB_ALLOC(0), which means that the first (and only, in this
case) subcluster is allocated.

Berto



Re: [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2019-11-13 Thread Alberto Garcia
On Wed 30 Oct 2019 03:24:08 PM CET, Max Reitz wrote:
>>  static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>>uint64_t guest_offset, uint64_t bytes,
>> -  QCowL2Meta **m, bool keep_old)
>> +  uint64_t *l2_slice, QCowL2Meta **m, bool 
>> keep_old)
>>  {
>>  BDRVQcow2State *s = bs->opaque;
>> -unsigned cow_start_from = 0;
>> +int l2_index = offset_to_l2_slice_index(s, guest_offset);
>> +uint64_t l2_entry;
>> +unsigned cow_start_from, cow_end_to;
>>  unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>>  unsigned cow_end_from = cow_start_to + bytes;
>> -unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>>  unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>>  QCowL2Meta *old_m = *m;
>> +QCow2ClusterType type;
>> +
>> +/* Return if there's no COW (all clusters are normal and we keep them) 
>> */
>> +if (keep_old) {
>> +int i;
>> +for (i = 0; i < nb_clusters; i++) {
>> +l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
>
> I’d assert somewhere that l2_index + nb_clusters - 1 won’t overflow.
>
>> +if (qcow2_get_cluster_type(bs, l2_entry) != 
>> QCOW2_CLUSTER_NORMAL) {
>
> Wouldn’t cluster_needs_cow() be better?

The semantics of cluster_needs_cow() change in this patch (which also
updates the documentation). But I should maybe change the name instead.

>> +break;
>> +}
>> +}
>> +if (i == nb_clusters) {
>> +return;
>> +}
>> +}
>
> So I understand we always need to create an L2Meta structure in all
> other cases because we at least need to turn those clusters into
> normal clusters?  (Even if they’re already allocated, as in the case
> of allocated zero clusters.)

That's correct.

>> -/* Returns true if writing to a cluster requires COW */
>> +/* Returns true if the cluster is unallocated or has refcount > 1 */
>>  static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
>>  {
>>  switch (qcow2_get_cluster_type(bs, l2_entry)) {
>>  case QCOW2_CLUSTER_NORMAL:
>> +case QCOW2_CLUSTER_ZERO_ALLOC:
>>  if (l2_entry & QCOW_OFLAG_COPIED) {
>>  return false;
>
> Don’t zero-allocated clusters need COW always?  (Because the at the
> given host offset isn’t guaranteed to be zero.)

Yeah, hence the semantics change I described earlier. I should probably
call it cluster_needs_new_allocation() or something like that, which is
what this means now ("true if unallocated or refcount > 1").

>> - * Returns the number of contiguous clusters that can be used for an 
>> allocating
>> - * write, but require COW to be performed (this includes yet unallocated 
>> space,
>> - * which must copy from the backing file)
>> + * Returns the number of contiguous clusters that can be written to
>> + * using one single write request, starting from @l2_index.
>> + * At most @nb_clusters are checked.
>> + *
>> + * If @want_cow is true this counts clusters that are either
>> + * unallocated, or allocated but with refcount > 1.
>
> +(So they need to be newly allocated and COWed)?

Yes. Which in this context is the same as "newly allocated" I guess,
because every newly allocated cluster requires COW.

> (Or is the past participle of COW COWn?  Or maybe CedOW?)

:-))

>> + * If @want_cow is false this counts clusters that are already
>> + * allocated and can be written to using their current locations
>
> s/using their current locations/in-place/?

Ok.

>> @@ -1475,13 +1489,14 @@ static int handle_alloc(BlockDriverState *bs, 
>> uint64_t guest_offset,
>>  *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>>  assert(*bytes != 0);
>>  
>> -calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
>> -  m, keep_old_clusters);
>> +calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, 
>> l2_slice,
>> +  m, false);
>>  
>> -return 1;
>> +ret = 1;
>>  
>> -fail:
>> -if (*m && (*m)->nb_clusters > 0) {
>> +out:
>> +qcow2_cache_put(s->l2_table_cache, (void **) _slice);
>
> Is this a bug fix?

No, we call qcow2_cache_put() later because calculate_l2_meta() needs
l2_slice.

Berto



Re: [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset()

2019-11-08 Thread Alberto Garcia
On Mon 04 Nov 2019 03:58:57 PM CET, Max Reitz wrote:
> OTOH, what I don’t like so far about this series is that the “cluster
> logic” is still everywhere when I think it should just be about
> subclusters now.  (Except in few places where it must be about
> clusters as in something that can have a distinct host offset and/or
> has an own L2 entry.)  So maybe the parameter should really be
> @nb_subclusters.

> But I’m not sure.  For how this function is written right now, it
> makes sense for it to be @nb_clusters, but I think it could be changed
> so it would work with @nb_subclusters, too.

I'm still reviewing your (much appreciated) feedback, but one thing I
can tell you is that my initial versions were doing everything with
subclusters because of the reasons you mention (i.e. there was
@nb_subclusters and all that).

Later when I started to tidy things up I realized that most of those
places only needed the number of clusters after all, and in some cases
the necessary changes were really minimal (like in handle_copied()).

>> +static int count_contiguous_subclusters(BlockDriverState *bs, int 
>> nb_clusters,
>> +unsigned sc_index, uint64_t 
>> *l2_slice,
>> +int l2_index)
>>  {
   /* ... */
>> +if (type == QCOW2_CLUSTER_COMPRESSED) {
>> +return 1; /* Compressed clusters are always counted one by one */
>
> Hm, yes, but cluster by cluster, not subcluster by subcluster, so this
> should be s->subclusters_per_cluster, and perhaps sc_index should be
> asserted to be 0.  (Or it should be s->subclusters_per_cluster -
> sc_index.)

Right, that's a bug, it forces the caller to decompress the cluster 32
times in order to read it completely! Thanks!

(in reality this is not used because this function doesn't get called
for compressed clusters but the same problem happens in the calling
function, as you correctly point out. Maybe I should assert here
instead)

>> @@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
>> uint64_t offset,
>
> I suppose this is get_subcluster_offset now.

Hmmm no, this returns the actual host cluster offset, then the caller
uses offset_into_cluster() to get the final value (which doesn't need to
be subcluster-aligned anyway).

Berto



Re: [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta()

2019-11-08 Thread Alberto Garcia
On Mon 04 Nov 2019 03:21:41 PM CET, Max Reitz wrote:
>> If an image has subclusters then there are more copy-on-write
>> scenarios that we need to consider. Let's say we have a write request
>> from the middle of subcluster #3 until the end of the cluster:
>> 
>>- If the cluster is new, then subclusters #0 to #3 from the old
>>  cluster must be copied into the new one.
>
> You mean for snapshots?
>
> (That isn’t quite clear, and I only guess this based on the next
> bullet point which differentiates based on “the old cluster was
> unallocated”.  That’s weird, too, because what does that mean, old
> cluster and new cluster?

Yes, perhaps the terminology is a bit unclear.

When I say "new cluster" is this context I mean that a write request
requires that a new cluster is allocated in the qcow2 file.

Then the "old cluster" would be what was there before the write (i.e. a
cluster with refcount > 1 or an unallocated cluster). Where we are doing
the copy-on-write from.

>>   * If @keep_old is true it means that the clusters were already
>>   * allocated and will be overwritten. If false then the clusters are
>>   * new and we have to decrease the reference count of the old ones.
>> + *
>> + * Returns 1 on success, -errno on failure.
>
> I think there should be a note here on why this doesn’t follow the
> general 0/-errno schema (i.e., “, because that is what callers generally
> expect”).

Good idea.

>> +if (!keep_old) {
>> +switch (type) {
>> +case QCOW2_CLUSTER_NORMAL:
>> +case QCOW2_CLUSTER_COMPRESSED:
>> +case QCOW2_CLUSTER_ZERO_ALLOC:
>> +case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
>> +cow_start_from = 0;
>
> Somehow (I don’t know why) I find this a bit tough to understand.
>
> Wouldn’t it work to let cow_start start from the first subcluster for
> ZERO_ALLOC and UNALLOCATED_SUBCLUSTER?  We don’t need to COW those, it
> should be sufficient to just make the subclusters before that zero or
> unallocated, respectively.

Here's one good example why I should probably add a QCow2SubclusterType
different from the existing QCow2ClusterType.

In this context, 'type' is the type of the subcluster, and because of
that _ZERO_ALLOC means that the subcluster reads as zeros but the
cluster itself is allocated. Other subcluster may contain data and
that's why we have to copy all of them.

Berto



Re: [PATCH 1/2] block: Remove 'backing': null from bs->{explicit_, }options

2019-11-08 Thread Alberto Garcia
On Fri 08 Nov 2019 09:53:11 AM CET, Kevin Wolf wrote:
> bs->options and bs->explicit_options shouldn't contain any options for
> child nodes. bdrv_open_inherited() takes care to remove any options that
> match a child name after opening the image and the same is done when
> reopening.
>
> However, we miss the case of 'backing': null, which is a child option,
> but results in no child being created. This means that a 'backing': null
> remains in bs->options and bs->explicit_options.
>
> A typical use for 'backing': null is in live snapshots: blockdev-add for
> the qcow2 overlay makes sure not to open the backing file (because it is
> already opened and blockdev-snapshot will attach it). After doing a
> blockdev-snapshot, bs->options and bs->explicit_options become
> inconsistent with the actual state (bs has a backing file now, but the
> options still say null). On the next occasion that the image is
> reopened, e.g. switching it from read-write to read-only when another
> snapshot is taken, the option will take effect again and the node
> incorrectly loses its backing file.
>
> Fix bdrv_open_inherited() to remove the 'backing' option from
> bs->options and bs->explicit_options even for the case where it
> specifies that no backing file is wanted.
>
> Reported-by: Peter Krempa 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH 2/2] iotests: Test multiple blockdev-snapshot calls

2019-11-08 Thread Alberto Garcia
On Fri 08 Nov 2019 09:53:12 AM CET, Kevin Wolf wrote:
> +# Test large write to a qcow2 image

This doesn't belong here I guess :)

I wonder if this test could go in 245 instead.

Berto



Re: [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER

2019-11-07 Thread Alberto Garcia
On Mon 04 Nov 2019 02:10:37 PM CET, Max Reitz wrote:

   [QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER]
> I still don’t know what you’re doing in the later patches, but to me
> it looks a bit like you don’t dare breaking up the existing structure
> that just deals with clusters.

Yeah, I decided to extend the existing type to make the changes less
invasive, but I'm also leaning towards creating a different type
now. I'll give it a try.

Berto



Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-11-04 Thread Alberto Garcia
On Mon 04 Nov 2019 04:14:56 PM CET, Max Reitz wrote:

>>> No, what I meant was that the original problem that led to
>>> c8bb23cbdbe would go away.
>> 
>> Ah, right. Not quite, according to my numbers:
>> 
>> |--++-+-|
>> | Cluster size | subclusters=on | subclusters=off | fallocate() |
>> |--++-+-|
>> |   256 KB | 10182 IOPS |966 IOPS |  14007 IOPS |
>> |   512 KB |  7919 IOPS |563 IOPS |  13442 IOPS |
>> |  1024 KB |  5050 IOPS |465 IOPS |  13887 IOPS |
>> |  2048 KB |  2465 IOPS |271 IOPS |  13885 IOPS |
>> |--++-+-|
>> 
>> There's obviously no backing image, and only the last column uses
>> handle_alloc_space() / fallocate().
>
> Thanks for providing some numbers!
>
> It was my impression, too, that subclusters wouldn’t solve it.  But it
> didn’t seem like that big of a difference to me.  Did you run this
> with aio=native?  (Because that’s where we have the XFS problem)

Here's with aio=native

|--++-+-|
| Cluster size | subclusters=on | subclusters=off | fallocate() |
|--++-+-|
|   256 KB | 19897 IOPS |891 IOPS |  32194 IOPS |
|   512 KB | 17881 IOPS |436 IOPS |  33092 IOPS |
|  1024 KB | 17050 IOPS |341 IOPS |  32768 IOPS |
|  2048 KB |  7854 IOPS |207 IOPS |  30944 IOPS |
|--++-+-|

Berto



Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-11-04 Thread Alberto Garcia
On Mon 04 Nov 2019 03:25:12 PM CET, Max Reitz wrote:
> So, it's obvious that c8bb23cbdbe32f5c326 is significant for 1M
> cluster-size, even on rotational disk, which means that previous
> assumption about calling handle_alloc_space() only for ssd is wrong,
> we need smarter heuristics..
>
> So, I'd prefer (1) or (2).
>>>
>>> OK.  I wonder whether that problem would go away with Berto’s subcluster
>>> series, though.
>> 
>> Catching up with this now. I was told about this last week at the KVM
>> Forum, but if the problems comes with the use of fallocate() and XFS,
>> the I don't think subclusters will solve it.
>> 
>> handle_alloc_space() is used to fill a cluster with zeroes when there's
>> COW, and that happens the same with subclusters, just at the subcluster
>> level instead of course.
>> 
>> What can happen, if the subcluster size matches the filesystem block
>> size, is that there's no need for any COW and therefore the bug is never
>> triggered. But that's not quite the same as a fix :-)
>
> No, what I meant was that the original problem that led to c8bb23cbdbe
> would go away.

Ah, right. Not quite, according to my numbers:

|--++-+-|
| Cluster size | subclusters=on | subclusters=off | fallocate() |
|--++-+-|
|   256 KB | 10182 IOPS |966 IOPS |  14007 IOPS |
|   512 KB |  7919 IOPS |563 IOPS |  13442 IOPS |
|  1024 KB |  5050 IOPS |465 IOPS |  13887 IOPS |
|  2048 KB |  2465 IOPS |271 IOPS |  13885 IOPS |
|--++-+-|

There's obviously no backing image, and only the last column uses
handle_alloc_space() / fallocate().

Berto



Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-11-04 Thread Alberto Garcia
On Fri 25 Oct 2019 04:19:30 PM CEST, Max Reitz wrote:
>>> So, it's obvious that c8bb23cbdbe32f5c326 is significant for 1M
>>> cluster-size, even on rotational disk, which means that previous
>>> assumption about calling handle_alloc_space() only for ssd is wrong,
>>> we need smarter heuristics..
>>>
>>> So, I'd prefer (1) or (2).
>
> OK.  I wonder whether that problem would go away with Berto’s subcluster
> series, though.

Catching up with this now. I was told about this last week at the KVM
Forum, but if the problems comes with the use of fallocate() and XFS,
the I don't think subclusters will solve it.

handle_alloc_space() is used to fill a cluster with zeroes when there's
COW, and that happens the same with subclusters, just at the subcluster
level instead of course.

What can happen, if the subcluster size matches the filesystem block
size, is that there's no need for any COW and therefore the bug is never
triggered. But that's not quite the same as a fix :-)

> Maybe make a decision based both on the ratio of data size to COW area
> length (only invoke handle_alloc_space() under a certain threshold),
> and the absolute COW area length (always invoke it above a certain
> threshold, unless the ratio doesn’t allow it)?

Maybe combining that with the smaller clusters/subclusters can work
around the problem. The maximum subcluster size is 64KB (for a 2MB
cluster).

Berto



Re: [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER

2019-11-04 Thread Alberto Garcia
On Mon 04 Nov 2019 01:57:42 PM CET, Max Reitz wrote:
> On 26.10.19 23:25, Alberto Garcia wrote:
>> In the previous patch we added a new QCow2ClusterType named
>> QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There is a couple of places
>> where this new value needs to be handled, and that is what this patch
>> does.
>> 
>> Signed-off-by: Alberto Garcia 
>> ---
>>  block/qcow2.c | 13 +
>>  1 file changed, 9 insertions(+), 4 deletions(-)
> This patch deals with everything in qcow2.c.  There are more places that
> reference QCOW2_CLUSTER_* constants elsewhere, and I suppose most of
> them are handled by the following patches.
>
> But I wonder what the criterion is on where it needs to be handled and
> where it’s OK not to.  Right now it looks to me like it’s a bit
> arbitrary maybe?  But I suppose I’ll just have to wait until after the
> next patches.

This is the part of the series that I'm the least happy about, because
the existing qcow2_get_cluster_type() can never return this new value, I
only updated the cases where this can actually happen.

I'm still considering a different approach for this.

Berto



Re: [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature

2019-10-30 Thread Alberto Garcia
On Wed 30 Oct 2019 05:23:30 PM CET, Max Reitz wrote:
>> +Subcluster Allocation Bitmap (for standard clusters):
>> +
>> +Bit  0 -  31:   Allocation status (one bit per subcluster)
>> +
>> +1: the subcluster is allocated. In this case the
>> +   host cluster offset field must contain a valid
>> +   offset.
>> +0: the subcluster is not allocated. In this case
>> +   read requests shall go to the backing file or
>> +   return zeros if there is no backing file data.
>> +
>> +Bits are assigned starting from the most significant 
>> one.
>> +(i.e. bit x is used for subcluster 31 - x)
>
> I seem to remember that someone proposed this bit ordering to you, but
> I wonder why.  So far everything in qcow2 starts from the least
> significant bit, for example refcounts (“If refcount_bits implies a
> sub-byte width, note that bit 0 means the least significant bit in
> this context”), feature bits, and sub-byte structure descriptions in
> general (which you reference directly with “bit x”).
>
> Soo...  What’s the reason for doing it the other way around here?

The reason is that I thought that it would be better for debugging
purposes. If I do an hexdump of the L2 table to see what's going on then
starting from the most significant bit gives me a better visual image of
what subclusters are allocated.

In other words, if the first two subclusters are allocated I think this
representation

   1100     (c0 00 00 00)

is more natural than this one

      0011  (00 00 00 03)

But I don't have a very strong opinion so I'm open to changing it.

Berto



Re: [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()

2019-10-30 Thread Alberto Garcia
On Wed 30 Oct 2019 01:04:02 PM CET, Max Reitz wrote:
> (I intended not to comment on such things on an RFC, but here I am...)

No problem with that :-)

> I’d call it host_cluster_offset to make clear that it points to a
> cluster and isn’t the host offset for @guest_offset.

Sure, why not. We can also accept the exact offset within the cluster
and then use start_of_cluster(), but I prefer this one.

> And now that I’ve gone this far already, I might as well say that I’d
> like if it the comment noted that this function not only creates the
> L2Meta structure but also adds it to the cluster_allocs list.

I'll update the comment.

>> + * @guest_offset and @bytes indicate the offset and length of the
>> + * request.
>> + *
>> + * If @keep_old is true it means that the clusters were already
>> + * allocated and will be overwritten. If false then the clusters are
>> + * new and we have to decrease the reference count of the old ones.
>> + */
>> +static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>> +  uint64_t guest_offset, uint64_t bytes,
>
> And now I’m so deep into non-RFC-level review territory, I might as
> well say I’d prefer @bytes to be an unsigned (or maybe even a plain
> int), because anything more wouldn’t work.  (Not least because
> cow_end_to is an unsigned).

True, I'll correct that too.

Thanks!

Berto



Re: [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()

2019-10-30 Thread Alberto Garcia
On Mon 28 Oct 2019 01:50:54 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>> -.cow_end = {
>> -.offset = nb_bytes,
>
> hmm this logic is changed.. after the patch, it would be not nb_bytes, but
>
> offset_into_cluster(s, guest_offset) + MIN(*bytes, nb_bytes - 
> offset_into_cluster(s, guest_offset))

It hasn't changed. The value of .cow_end.offset before the patch is
nb_bytes (these two lines are equivalent):

  nb_bytes = MIN(requested_bytes, avail_bytes);
  nb_bytes = MIN(*bytes + offset_into_cluster, avail_bytes);

After the patch we update *bytes to pass it to calculate_l2_meta(). Here
is the value (again, these are all equivalent):

  *bytes = MIN(*bytes, nb_bytes - offset_into_cluster)
  *bytes = MIN(*bytes, MIN(*bytes + offset_into_cluster, avail_bytes) -
   offset_into_cluster)
  *bytes = MIN(*bytes, MIN(*bytes, avail_bytes - offset_into_cluster))
  *bytes = MIN(*bytes, avail_bytes - offset_into_cluster)

And here's the value of .cow_end.offset set in calculate_l2_meta():

  cow_end_from = *bytes + offset_into_cluster
  cow_end_from = MIN(*bytes, avail_bytes - offset_into_cluster) + 
 offset_into_cluster
  cow_end_from = MIN(*bytes + offset_into_cluster, avail_bytes)

This last definition of cow_end_from is the same as nb_bytes as used
before the patch.

Berto



Re: [PATCH for-4.2 0/2] qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK

2019-10-28 Thread Alberto Garcia
On Mon 28 Oct 2019 05:18:39 PM CET, Max Reitz  wrote:
> This fixes a bug reported on
> https://bugs.launchpad.net/qemu/+bug/185.  The problem is that
> QCOW2_COMPRESSED_SECTOR_MASK is a 32-bit mask when it really needs to be
> a 64-bit mask.

Ouch!

Reviewed-by: Alberto Garcia 

Berto



[RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2019-10-26 Thread Alberto Garcia
The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated.

This needs to happen even if the cluster was already allocated and the
L2 entry was otherwise valid.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fb6cf8df17..acb7226e03 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -980,6 +980,22 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 
 set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
  (cluster_offset + (i << s->cluster_bits)));
+
+/* Update bitmap with the subclusters that were just written */
+if (has_subclusters(s)) {
+uint64_t written_from = m->cow_start.offset;
+uint64_t written_to = m->cow_end.offset + m->cow_end.nb_bytes;
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+int sc;
+for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
+uint64_t sc_off = i * s->cluster_size + sc * 
s->subcluster_size;
+if (sc_off >= written_from && sc_off < written_to) {
+l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
+}
+}
+set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+}
  }
 
 
-- 
2.20.1




[RFC PATCH v2 04/26] qcow2: Add get_l2_entry() and set_l2_entry()

2019-10-26 Thread Alberto Garcia
The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c  | 65 ++
 block/qcow2-refcount.c | 17 +--
 block/qcow2.h  | 12 
 3 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ee6b46f917..581fa90ab1 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -379,12 +379,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 QCow2ClusterType first_cluster_type;
 uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
 uint64_t offset = first_entry & mask;
 
 first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -397,7 +398,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, 
int nb_clusters,
first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
 if (offset + (uint64_t) i * cluster_size != l2_entry) {
 break;
 }
@@ -413,14 +414,16 @@ static int count_contiguous_clusters(BlockDriverState 
*bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
  int nb_clusters,
  uint64_t *l2_slice,
+ int l2_index,
  QCow2ClusterType wanted_type)
 {
+BDRVQcow2State *s = bs->opaque;
 int i;
 
 assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = be64_to_cpu(l2_slice[i]);
+uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
 QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
 if (type != wanted_type) {
@@ -566,7 +569,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 /* find the cluster offset for the given disk offset */
 
 l2_index = offset_to_l2_slice_index(s, offset);
-*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+*cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
 /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -601,14 +604,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 case QCOW2_CLUSTER_UNALLOCATED:
 /* how many empty clusters ? */
 c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-  _slice[l2_index], type);
+  l2_slice, l2_index, type);
 *cluster_offset = 0;
 break;
 case QCOW2_CLUSTER_ZERO_ALLOC:
 case QCOW2_CLUSTER_NORMAL:
 /* how many allocated clusters ? */
 c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-  _slice[l2_index], QCOW_OFLAG_ZERO);
+  l2_slice, l2_index, QCOW_OFLAG_ZERO);
 *cluster_offset &= L2E_OFFSET_MASK;
 if (offset_into_cluster(s, *cluster_offset)) {
 qcow2_signal_corruption(bs, true, -1, -1,
@@ -761,7 +764,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 
 /* Compression can't overwrite anything. Fail if the cluster was already
  * allocated. */
-cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 if (cluster_offset & L2E_OFFSET_MASK) {
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
 return -EIO;
@@ -786,7 +789,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-l2_slice[l2_index] = cpu_to_be64(cluster_offset);
+set_l2_entry(s, l2_slice, l2_index, cluster_offset);
 qcow2_cache_put(s->l2_table_cache, (void **)

[RFC PATCH v2 23/26] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only

2019-10-26 Thread Alberto Garcia
Ideally it should be possible to zero individual subclusters using
this function, but this is currently not implemented.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 01322ca449..537569ce88 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3704,6 +3704,12 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 bytes = s->cluster_size;
 nr = s->cluster_size;
 ret = qcow2_get_cluster_offset(bs, offset, , );
+/* TODO: allow zeroing separate subclusters, we only allow
+ * zeroing full clusters at the moment. */
+if (nr != bytes) {
+qemu_co_mutex_unlock(>lock);
+return -ENOTSUP;
+}
 if (ret != QCOW2_CLUSTER_UNALLOCATED &&
 ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER &&
 ret != QCOW2_CLUSTER_ZERO_PLAIN &&
-- 
2.20.1




[RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()

2019-10-26 Thread Alberto Garcia
handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 76 +--
 1 file changed, 52 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8982b7b762..6c1dcdc781 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1019,6 +1019,55 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 QCOW2_DISCARD_NEVER);
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure and
+ * add it to @m.
+ *
+ * @host_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
+  uint64_t guest_offset, uint64_t bytes,
+  QCowL2Meta **m, bool keep_old)
+{
+BDRVQcow2State *s = bs->opaque;
+unsigned cow_start_from = 0;
+unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+unsigned cow_end_from = cow_start_to + bytes;
+unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+QCowL2Meta *old_m = *m;
+
+*m = g_malloc0(sizeof(**m));
+**m = (QCowL2Meta) {
+.next   = old_m,
+
+.alloc_offset   = host_offset,
+.offset = start_of_cluster(s, guest_offset),
+.nb_clusters= nb_clusters,
+
+.keep_old_clusters = keep_old,
+
+.cow_start = {
+.offset = cow_start_from,
+.nb_bytes   = cow_start_to - cow_start_from,
+},
+.cow_end = {
+.offset = cow_end_from,
+.nb_bytes   = cow_end_to - cow_end_from,
+},
+};
+
+qemu_co_queue_init(&(*m)->dependent_requests);
+QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1417,35 +1466,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
 int avail_bytes = nb_clusters << s->cluster_bits;
 int nb_bytes = MIN(requested_bytes, avail_bytes);
-QCowL2Meta *old_m = *m;
-
-*m = g_malloc0(sizeof(**m));
-
-**m = (QCowL2Meta) {
-.next   = old_m,
-
-.alloc_offset   = alloc_cluster_offset,
-.offset = start_of_cluster(s, guest_offset),
-.nb_clusters= nb_clusters,
-
-.keep_old_clusters  = keep_old_clusters,
-
-.cow_start = {
-.offset = 0,
-.nb_bytes   = offset_into_cluster(s, guest_offset),
-},
-.cow_end = {
-.offset = nb_bytes,
-.nb_bytes   = avail_bytes - nb_bytes,
-},
-};
-qemu_co_queue_init(&(*m)->dependent_requests);
-QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 
 *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
 *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
 assert(*bytes != 0);
 
+calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+  m, keep_old_clusters);
+
 return 1;
 
 fail:
-- 
2.20.1




[RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()

2019-10-26 Thread Alberto Garcia
Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are used
to access the bitmaps. For convenience, these functions are no-ops
when used in traditional qcow2 images.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 29a253bfb9..741c41c80f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -507,15 +507,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx)
+{
+if (has_subclusters(s)) {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+return be64_to_cpu(l2_slice[idx + 1]);
+} else {
+return 0;
+}
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx, uint64_t entry)
 {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
 l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+ int idx, uint64_t bitmap)
+{
+if (has_subclusters(s)) {
+idx *= l2_entry_size(s) / sizeof(uint64_t);
+l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
-- 
2.20.1




[RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()

2019-10-26 Thread Alberto Garcia
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3e4ba8d448..aa3eb727a5 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1772,7 +1772,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 
 /* First remove L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-if (!full_discard && s->qcow_version >= 3) {
+if (has_subclusters(s)) {
+set_l2_entry(s, l2_slice, l2_index + i, 0);
+set_l2_bitmap(s, l2_slice, l2_index + i,
+  full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
+} else if (!full_discard && s->qcow_version >= 3) {
 set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
 } else {
 set_l2_entry(s, l2_slice, l2_index + i, 0);
-- 
2.20.1




[RFC PATCH v2 17/26] qcow2: Add subcluster support to check_refcounts_l2()

2019-10-26 Thread Alberto Garcia
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-refcount.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 621318447d..bc73125f70 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1685,8 +1685,13 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
-l2_entry = QCOW_OFLAG_ZERO;
-set_l2_entry(s, l2_table, i, l2_entry);
+if (has_subclusters(s)) {
+set_l2_entry(s, l2_table, i, 0);
+set_l2_bitmap(s, l2_table, i,
+  QCOW_L2_BITMAP_ALL_ZEROES);
+} else {
+set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+}
 ret = qcow2_pre_write_overlap_check(bs, ign,
 l2e_offset, l2_entry_size(s), false);
 if (ret < 0) {
-- 
2.20.1




[RFC PATCH v2 19/26] qcow2: Fix offset calculation in handle_dependencies()

2019-10-26 Thread Alberto Garcia
l2meta_cow_start() and l2meta_cow_end() are not necessarily
cluster-aligned if the image has subclusters, so update the
calculation of old_start and old_end to guarantee that no two requests
try to write on the same cluster.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 62f2a9fcc0..fb6cf8df17 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1262,8 +1262,8 @@ static int handle_dependencies(BlockDriverState *bs, 
uint64_t guest_offset,
 
 uint64_t start = guest_offset;
 uint64_t end = start + bytes;
-uint64_t old_start = l2meta_cow_start(old_alloc);
-uint64_t old_end = l2meta_cow_end(old_alloc);
+uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
+uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), 
s->cluster_size);
 
 if (end <= old_start || start >= old_end) {
 /* No intersection */
-- 
2.20.1




[RFC PATCH v2 09/26] qcow2: Add l2_entry_size()

2019-10-26 Thread Alberto Garcia
qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c  | 12 ++--
 block/qcow2-refcount.c | 14 --
 block/qcow2.c  |  6 +++---
 block/qcow2.h  |  5 +
 4 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 581fa90ab1..1f509bda15 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -209,7 +209,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
uint64_t l2_offset, uint64_t **l2_slice)
 {
 BDRVQcow2State *s = bs->opaque;
-int start_of_slice = sizeof(uint64_t) *
+int start_of_slice = l2_entry_size(s) *
 (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
 return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -277,7 +277,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new l2 entry */
 
-l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
 if (l2_offset < 0) {
 ret = l2_offset;
 goto fail;
@@ -301,7 +301,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
 /* allocate a new entry in the l2 cache */
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -365,7 +365,7 @@ fail:
 }
 s->l1_table[l1_index] = old_l2_offset;
 if (l2_offset > 0) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_ALWAYS);
 }
 return ret;
@@ -708,7 +708,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t 
offset,
 
 /* Then decrease the refcount of the old table */
 if (l2_offset) {
-qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
 QCOW2_DISCARD_OTHER);
 }
 
@@ -1883,7 +1883,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 int ret;
 int i, j;
 
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 84fe02d388..621318447d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1253,7 +1253,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 l2_slice = NULL;
 l1_table = NULL;
 l1_size2 = l1_size * sizeof(uint64_t);
-slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+slice_size2 = s->l2_slice_size * l2_entry_size(s);
 n_slices = s->cluster_size / slice_size2;
 
 s->cache_discards = true;
@@ -1604,7 +1604,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 int i, l2_size, nb_csectors, ret;
 
 /* Read L2 table from disk */
-l2_size = s->l2_size * sizeof(uint64_t);
+l2_size = s->l2_size * l2_entry_size(s);
 l2_table = g_malloc(l2_size);
 
 ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1679,15 +1679,16 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
 fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
 offset);
 if (fix & BDRV_FIX_ERRORS) {
+int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
 uint64_t l2e_offset =
-l2_offset + (uint64_t)i * sizeof(uint64_t);
+l2_offset + (uint64_t)i * l2_entry_size(s);
 int ign = active ? QCOW2_OL_ACTIVE_L2 :
QCOW2_OL_INACTIVE_L2;
 
 l2_entry = QCOW_OFLAG_ZERO;
 set_l2_entry(s, l2_table, i, l2_entry);
 ret = qcow2_pre_write_overlap_check(bs, ign,
-l2e_offset, sizeof(uint64_t), false);
+l2e_offset, l2_entry_size(s), false);
 if (ret < 0) {
 fprintf(stderr, "ERROR: 

[RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1()

2019-10-26 Thread Alberto Garcia
Two changes are needed in order to add subcluster support to this
function: deallocated clusters must have their bitmaps cleared, and
expanded clusters must have all the "subcluster allocated" bits set.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index aa3eb727a5..62f2a9fcc0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2036,6 +2036,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 /* not backed; therefore we can simply deallocate the
  * cluster */
 set_l2_entry(s, l2_slice, j, 0);
+set_l2_bitmap(s, l2_slice, j, 0);
 l2_dirty = true;
 continue;
 }
@@ -2102,6 +2103,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState 
*bs, uint64_t *l1_table,
 } else {
 set_l2_entry(s, l2_slice, j, offset);
 }
+set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
 l2_dirty = true;
 }
 
-- 
2.20.1




[RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset()

2019-10-26 Thread Alberto Garcia
The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 111 --
 1 file changed, 52 insertions(+), 59 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 990bc070af..e67559152f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -372,66 +372,51 @@ fail:
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. At most @nb_clusters are checked. Allocated clusters are
+ * also required to be contiguous in the image file.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t 
stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+unsigned sc_index, uint64_t *l2_slice,
+int l2_index)
 {
 BDRVQcow2State *s = bs->opaque;
-int i;
-QCow2ClusterType first_cluster_type;
-uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-uint64_t offset = first_entry & mask;
-
-first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-return 0;
+int i, j, count = 0;
+uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
+bool check_offset = true;
+QCow2ClusterType type =
+qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+
+assert(type != QCOW2_CLUSTER_INVALID); /* The caller should check this */
+
+if (type == QCOW2_CLUSTER_COMPRESSED) {
+return 1; /* Compressed clusters are always counted one by one */
 }
 
-/* must be allocated */
-assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-   first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
-
-for (i = 0; i < nb_clusters; i++) {
-uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-if (offset + (uint64_t) i * cluster_size != l2_entry) {
-break;
-}
+if (type == QCOW2_CLUSTER_UNALLOCATED || type == QCOW2_CLUSTER_ZERO_PLAIN) 
{
+check_offset = false;
 }
 
-return i;
-}
-
-/*
- * Checks how many consecutive unallocated clusters in a given L2
- * slice have the same cluster type.
- */
-static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
- int nb_clusters,
- uint64_t *l2_slice,
- int l2_index,
- QCow2ClusterType wanted_type)
-{
-BDRVQcow2State *s = bs->opaque;
-int i;
-
-assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-   wanted_type == QCOW2_CLUSTER_UNALLOCATED);
 for (i = 0; i < nb_clusters; i++) {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
-
-if (type != wanted_type) {
-break;
+l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+if (check_offset && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+goto out;
+}
+for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) 
{
+if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) 
{
+goto out;
+}
+count++;
 }
+expected_offset += s->cluster_size;
 }
 
-return i;
+out:
+return count;
 }
 
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
@@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
  unsigned int *bytes, uint64_t *cluster_offset)
 {
 BDRVQcow2State *s = bs->opaque;
-unsigned int l2_index;
-uint64_t l1_index, l2_offset, *l2_slice;
+unsigned int l2_index, sc_index;
+uint64_t l1_in

[RFC PATCH v2 22/26] qcow2: Add subcluster support to handle_alloc_space()

2019-10-26 Thread Alberto Garcia
The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0261e87709..01322ca449 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2304,6 +2304,9 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
 
 for (m = l2meta; m != NULL; m = m->next) {
 int ret;
+uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+uint64_t nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+m->cow_start.offset;
 
 if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
 continue;
@@ -2318,16 +2321,14 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
  * efficiently zero out the whole clusters
  */
 
-ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
 true);
 if (ret < 0) {
 return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-m->nb_clusters * s->cluster_size,
+ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
 BDRV_REQ_NO_FALLBACK);
 if (ret < 0) {
 if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1




[RFC PATCH v2 06/26] qcow2: Add dummy has_subclusters() function

2019-10-26 Thread Alberto Garcia
This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 940cd4c236..b3826b37c1 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -488,6 +488,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+/* FIXME: Return false until this feature is complete */
+return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
 int idx)
 {
-- 
2.20.1




[RFC PATCH v2 21/26] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2019-10-26 Thread Alberto Garcia
Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index acb7226e03..3ba8a98073 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -783,6 +783,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState 
*bs,
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+set_l2_bitmap(s, l2_slice, l2_index, 0);
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
 
 *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1




[RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta()

2019-10-26 Thread Alberto Garcia
If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

   - If the cluster is new, then subclusters #0 to #3 from the old
 cluster must be copied into the new one.

   - If the cluster is new but the old cluster was unallocated, then
 only subcluster #3 needs copy-on-write. #0 to #2 are marked as
 unallocated in the bitmap of the new L2 entry.

   - If we are overwriting an old cluster and subcluster #3 is
 unallocated or has the all-zeroes bit set then we need
 copy-on-write on subcluster #3.

   - If we are overwriting an old cluster and subcluster #3 was
 allocated then there is no need to copy-on-write.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 136 +-
 1 file changed, 108 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 1f509bda15..990bc070af 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1034,14 +1034,16 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 1 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
-  uint64_t guest_offset, uint64_t bytes,
-  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
+ uint64_t guest_offset, uint64_t bytes,
+ uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-int l2_index = offset_to_l2_slice_index(s, guest_offset);
-uint64_t l2_entry;
+int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry, l2_bitmap;
 unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
@@ -1049,38 +1051,108 @@ static void calculate_l2_meta(BlockDriverState *bs, 
uint64_t host_offset,
 QCowL2Meta *old_m = *m;
 QCow2ClusterType type;
 
-/* Return if there's no COW (all clusters are normal and we keep them) */
+/* Return if there's no COW (all subclusters are normal and we are
+ * keeping the clusters) */
 if (keep_old) {
+unsigned first_sc = cow_start_to / s->subcluster_size;
+unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
 int i;
-for (i = 0; i < nb_clusters; i++) {
-l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+for (i = first_sc; i <= last_sc; i++) {
+unsigned c = i / s->subclusters_per_cluster;
+unsigned sc = i % s->subclusters_per_cluster;
+l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+if (type == QCOW2_CLUSTER_INVALID) {
+l2_index += c; /* Point to the invalid entry */
+goto fail;
+}
+if (type != QCOW2_CLUSTER_NORMAL) {
 break;
 }
 }
-if (i == nb_clusters) {
-return;
+if (i == last_sc + 1) {
+return 1;
 }
 }
 
 /* Get the L2 entry from the first cluster */
 l2_entry = get_l2_entry(s, l2_slice, l2_index);
-type = qcow2_get_cluster_type(bs, l2_entry);
+l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+sc_index = offset_to_sc_index(s, guest_offset);
+type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-cow_start_from = cow_start_to;
+if (type == QCOW2_CLUSTER_INVALID) {
+goto fail;
+}
+
+if (!keep_old) {
+switch (type) {
+case QCOW2_CLUSTER_NORMAL:
+case QCOW2_CLUSTER_COMPRESSED:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+cow_start_from = 0;
+break;
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_UNALLOCATED:
+cow_start_from = sc_index << s->subcluster_bits;
+break;
+default:
+g_assert_not_reached();
+}
 } else {
-cow_start_from = 0;
+switch (type) {
+case QCOW2_CLUSTER_NORMAL:
+cow_start_from = cow_start_to;
+break;
+case Q

[RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

2019-10-26 Thread Alberto Garcia
Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c|  46 ++
 block/qcow2.h|   8 ++-
 include/block/block_int.h|   1 +
 qapi/block-core.json |   6 ++
 tests/qemu-iotests/031.out   |   8 +--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  20 +++---
 tests/qemu-iotests/065   |  18 --
 tests/qemu-iotests/082.out   |  48 ---
 tests/qemu-iotests/085.out   |  38 ++--
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/185.out   |   8 +--
 tests/qemu-iotests/198.out   |   2 +
 tests/qemu-iotests/206.out   |   4 ++
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/common.filter |   1 +
 20 files changed, 224 insertions(+), 110 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 537569ce88..b1fa7ab5da 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1347,6 +1347,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
 s->subcluster_bits = ctz32(s->subcluster_size);
 
+if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
+error_setg(errp, "Unsupported subcluster size: %d", 
s->subcluster_size);
+ret = -EINVAL;
+goto fail;
+}
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
@@ -2806,6 +2812,11 @@ int qcow2_update_header(BlockDriverState *bs)
 .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
 .name = "lazy refcounts",
 },
+{
+.type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+.bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
+.name = "extended L2 entries",
+},
 };
 
 ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
@@ -3271,6 +3282,27 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 goto out;
 }
 
+if (!qcow2_opts->has_extended_l2) {
+qcow2_opts->extended_l2 = false;
+}
+if (qcow2_opts->extended_l2) {
+unsigned min_cluster_size =
+(1 << MIN_CLUSTER_BITS) * QCOW_MAX_SUBCLUSTERS_PER_CLUSTER;
+if (version < 3) {
+error_setg(errp, "Extended L2 entries are only supported with "
+   "compatibility level 1.1 and above (use version=v3 or "
+   "greater)");
+ret = -EINVAL;
+goto out;
+}
+if (cluster_size < min_cluster_size) {
+error_setg(errp, "Extended L2 entries are only supported with "
+   "cluster sizes of at least %u bytes", min_cluster_size);
+ret = -EINVAL;
+goto out;
+}
+}
+
 if (!qcow2_opts->has_preallocation) {
 qcow2_opts->preallocation = PREALLOC_MODE_OFF;
 }
@@ -3392,6 +3424,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
 }
 
+if (qcow2_opts->extended_l2) {
+header->incompatible_features |=
+cpu_to_be64(QCOW2_INCOMPAT_EXTL2);
+}
+
 ret = blk_pwrite(blk, 0, header, cluster_size, 0);
 g_free(header);
 if (ret < 0) {
@@ -3569,6 +3606,7 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 { BLOCK_OPT_BACKING_FMT,"backing-fmt" },
 { BLOCK_OPT_CLUSTER_SIZE,   "cluster-size" },
 { BLOCK_OPT_LAZY_REFCOUNTS, "lazy-refcounts" },
+{ BLOCK_OPT_EXTL2,  "extended-l2" },
 { BLOCK_OPT_REFCOUNT_BITS,  "refcount-bits" },
 { BLOCK_OPT_ENCRYPT,BLOCK_OPT_ENCRYPT_FORMAT },
 { BLOCK_OPT_COMPAT_LEVEL,   "version" },
@@ -4772,6 +4810,8 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs,
 .corrupt= s->incompatible_features &
   QCOW2_INCOMPAT_CORRUPT,
 .has_corrupt= true,
+.has_extended_l2= true,
+.extended_l2= has_subclusters(s),
 .refcount_bits  = s->refcount_bits,
 .has_bitmaps= !!bitmap

[RFC PATCH v2 26/26] iotests: Add tests for qcow2 images with extended L2 entries

2019-10-26 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/271 | 235 +
 tests/qemu-iotests/271.out | 183 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 419 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 00..c49433cdc9
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,235 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=be...@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+rm -f "$TEST_IMG.raw"
+rm -f "$TEST_IMG.backing"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+
+IMGOPTS="extended_l2=on"
+l2_offset=262144 # 0x4
+
+_verify_img()
+{
+$QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are 
identical'
+$QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+grep -v 'No errors were found on the image'
+}
+
+_read_l2_entry()
+{
+entry_no=$1
+nentries=$2
+offset=$(($l2_offset + $entry_no * 16))
+length=$((nentries * 16))
+$QEMU_IO -f raw -c "read -v $offset $length" "$TEST_IMG" | _filter_qemu_io 
| head -n -2
+}
+
+_test_write()
+{
+cmd="$1"
+l2_entry_idx="$2"
+l2_entry_num="$3"
+raw_cmd=`echo $1 | sed s/-c//` # Raw images don't support -c
+echo "$cmd"
+$QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+_verify_img
+if [ -n "$l2_entry_idx" ]; then
+_read_l2_entry "$l2_entry_idx" "$l2_entry_num"
+fi
+}
+
+_reset_img()
+{
+$QEMU_IMG create -f raw "$TEST_IMG.raw" 1M | _filter_img_create
+if [ "$use_backing_file" = "yes" ]; then
+$QEMU_IMG create -f raw "$TEST_IMG.backing" 1M | _filter_img_create
+$QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.backing" | 
_filter_qemu_io
+$QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.raw" | 
_filter_qemu_io
+_make_test_img -b "$TEST_IMG.backing" 1M
+else
+_make_test_img 1M
+fi
+}
+
+# Test that writing to an image with subclusters produces the expected
+# results, in images with and without backing files
+for use_backing_file in yes no; do
+echo
+echo "### Standard write tests (backing file: $use_backing_file) ###"
+echo
+_reset_img
+### Write subcluster #0 (beginning of subcluster) ###
+_test_write 'write -q -P 1 0 1k' 0 1
+
+### Write subcluster #1 (middle of subcluster) ###
+_test_write 'write -q -P 2 3k 512' 0 1
+
+### Write subcluster #2 (end of subcluster) ###
+_test_write 'write -q -P 3 5k 1k' 0 1
+
+### Write subcluster #3 (full subcluster) ###
+_test_write 'write -q -P 4 6k 2k' 0 1
+
+### Write subclusters #4-6 (full subclusters) ###
+_test_write 'write -q -P 5 8k 6k' 0 1
+
+### Write subclusters #7-9 (partial subclusters) ###
+_test_write 'write -q -P 6 15k 4k' 0 1
+
+### Write subcluster #16 (partial subcluster) ###
+_test_write 'write -q -P 7 32k 1k' 0 2
+
+### Write subcluster #31-#34 (cluster overlap) ###
+_test_write 'write -q -P 8 63k 4k' 0 2
+
+### Zero subcluster #1 (TODO: use the "all zeros" bit)
+_test_write 'write -q -z 2k 2k' 0 1
+
+### Zero cluster #0
+_test_write 'write -q -z 0 64k' 0 1
+
+### Fill cluster #0 with data
+_test_write 'write -q -P 9 0 64k' 0 1
+
+### Zero and unmap half of cluster #0 (this won't unmap it)
+_test_write 'write -q -z -u 0 32k' 0 1
+
+### Zero and unmap cluster #0
+_test_write 'write -q -z -u 0 64k' 0 1

Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2

2019-10-26 Thread Alberto Garcia
On Wed 23 Oct 2019 12:39:14 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
>
> This is very interesting! Could you please export a branch to look at,
> as patches can't be applied on master now :(

I just sent a new version with some updates and rebased on top of the
current master.

Berto



[RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2019-10-26 Thread Alberto Garcia
When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
 allocated and have a reference count of 1. In those clusters we
 can simply write the data and there is no need to perform any
 copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
 either because they haven't been allocated yet, because their
 reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
 which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
 reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
 allocation (all other cases).

One importante difference after this change is that clusters found in
handle_copied() may now require copy-on-write, but this will be anyway
necessary once we add support for subclusters.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 177 +++---
 1 file changed, 96 insertions(+), 81 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index aa1010a515..ee6b46f917 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1021,7 +1021,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure and
- * add it to @m.
+ * add it to @m. If the write request does not need copy-on-write or
+ * changes to the L2 metadata then this function does nothing.
  *
  * @host_offset points to the beginning of the first cluster.
  *
@@ -1034,15 +1035,51 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, 
QCowL2Meta *m)
  */
 static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
   uint64_t guest_offset, uint64_t bytes,
-  QCowL2Meta **m, bool keep_old)
+  uint64_t *l2_slice, QCowL2Meta **m, bool 
keep_old)
 {
 BDRVQcow2State *s = bs->opaque;
-unsigned cow_start_from = 0;
+int l2_index = offset_to_l2_slice_index(s, guest_offset);
+uint64_t l2_entry;
+unsigned cow_start_from, cow_end_to;
 unsigned cow_start_to = offset_into_cluster(s, guest_offset);
 unsigned cow_end_from = cow_start_to + bytes;
-unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
 unsigned nb_clusters = size_to_clusters(s, cow_end_from);
 QCowL2Meta *old_m = *m;
+QCow2ClusterType type;
+
+/* Return if there's no COW (all clusters are normal and we keep them) */
+if (keep_old) {
+int i;
+for (i = 0; i < nb_clusters; i++) {
+l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+break;
+}
+}
+if (i == nb_clusters) {
+return;
+}
+}
+
+/* Get the L2 entry from the first cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_start_from = cow_start_to;
+} else {
+cow_start_from = 0;
+}
+
+/* Get the L2 entry from the last cluster */
+l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+type = qcow2_get_cluster_type(bs, l2_entry);
+
+if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+cow_end_to = cow_end_from;
+} else {
+cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+}
 
 *m = g_malloc0(sizeof(**m));
 **m = (QCowL2Meta) {
@@ -1068,18 +1105,18 @@ static void calculate_l2_meta(BlockDriverState *bs, 
uint64_t host_offset,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
-/* Returns true if writing to a cluster requires COW */
+/* Returns true if the cluster is unallocated or has refcount > 1 */
 static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
 {
 switch (qcow2_get_cluster_type(bs, l2_entry)) {
 case QCOW2_CLUSTER_NORMAL:
+case QCOW2_CLUSTER_ZERO_ALLOC:
 if (l2_entry & QCOW_OFLAG_COPIED) {
 return false;
 }
 case QCOW2_CLUSTER_UNALLOCATED:
 case QCOW2_CLUSTER_COMPRESSED:
 case QCOW2_CLUSTER_ZERO_PLAIN:
-case QCOW2_CLUSTER_ZERO_ALL

[RFC PATCH v2 02/26] qcow2: Split cluster_needs_cow() out of count_cow_clusters()

2019-10-26 Thread Alberto Garcia
We are going to need it in other places.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 6c1dcdc781..aa1010a515 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1068,6 +1068,24 @@ static void calculate_l2_meta(BlockDriverState *bs, 
uint64_t host_offset,
 QLIST_INSERT_HEAD(>cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+switch (qcow2_get_cluster_type(bs, l2_entry)) {
+case QCOW2_CLUSTER_NORMAL:
+if (l2_entry & QCOW_OFLAG_COPIED) {
+return false;
+}
+case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_COMPRESSED:
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return true;
+default:
+abort();
+}
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1080,25 +1098,11 @@ static int count_cow_clusters(BlockDriverState *bs, int 
nb_clusters,
 
 for (i = 0; i < nb_clusters; i++) {
 uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-switch(cluster_type) {
-case QCOW2_CLUSTER_NORMAL:
-if (l2_entry & QCOW_OFLAG_COPIED) {
-goto out;
-}
+if (!cluster_needs_cow(bs, l2_entry)) {
 break;
-case QCOW2_CLUSTER_UNALLOCATED:
-case QCOW2_CLUSTER_COMPRESSED:
-case QCOW2_CLUSTER_ZERO_PLAIN:
-case QCOW2_CLUSTER_ZERO_ALLOC:
-break;
-default:
-abort();
 }
 }
 
-out:
 assert(i <= nb_clusters);
 return i;
 }
-- 
2.20.1




[RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type()

2019-10-26 Thread Alberto Garcia
This function returns the type of an individual subcluster. If an
image does not have subclusters then this returns the exact same value
as qcow2_get_cluster_type().

The information in standard and extended L2 entries is encoded in a
slightly different way, but all existing QCow2ClusterType values are
also valid for subclusters and have the same meanings (although they
typically only apply to the requested subcluster).

There are two important exceptions to this:

  a) QCOW2_CLUSTER_COMPRESSED means that the whole cluster is
 compressed. We do not support compression at the subcluster
 level.

  b) QCOW2_CLUSTER_UNALLOCATED means that the cluster is unallocated,
 that is, the offset field of the L2 entry does not point to a
 host cluster. All subclusters are obviously unallocated too but
 any of them could be of type QCOW2_CLUSTER_ZERO_PLAIN.

In addition to that, extended L2 entries allow one new scenario where
the cluster is normally allocated but an individual subcluster is not.
This is very different from (b) and because of that this patch adds a
new value called QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER.

As a last thing, this patch adds QCOW2_CLUSTER_INVALID to detect the
cases where an L2 entry has a value that violates the spec. The caller
is responsible for handling these situations.

To prevent compatibility problems with images that have invalid values
but are currently being read by QEMU without causing side effects,
QCOW2_CLUSTER_INVALID is only returned for images with extended L2
entries.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 62 +++
 1 file changed, 62 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 741c41c80f..23a2532ff2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -77,6 +77,15 @@
 
 #define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)((1ULL << 63) >> (X))
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   ((1ULL << 31) >> (X))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES 0xULL
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  0xULL
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -438,10 +447,12 @@ typedef struct QCowL2Meta
 
 typedef enum QCow2ClusterType {
 QCOW2_CLUSTER_UNALLOCATED,
+QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER,
 QCOW2_CLUSTER_ZERO_PLAIN,
 QCOW2_CLUSTER_ZERO_ALLOC,
 QCOW2_CLUSTER_NORMAL,
 QCOW2_CLUSTER_COMPRESSED,
+QCOW2_CLUSTER_INVALID,
 } QCow2ClusterType;
 
 typedef enum QCow2MetadataOverlap {
@@ -621,6 +632,57 @@ static inline QCow2ClusterType 
qcow2_get_cluster_type(BlockDriverState *bs,
 }
 }
 
+/* In an image without subsclusters this returns the same value as
+ * qcow2_get_cluster_type() */
+static inline int qcow2_get_subcluster_type(BlockDriverState *bs,
+uint64_t l2_entry,
+uint64_t l2_bitmap,
+unsigned sc_index)
+{
+BDRVQcow2State *s = bs->opaque;
+QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
+assert(sc_index < s->subclusters_per_cluster);
+
+if (has_subclusters(s)) {
+bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
+bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
+switch (type) {
+case QCOW2_CLUSTER_COMPRESSED:
+if (l2_bitmap != 0) {
+return QCOW2_CLUSTER_INVALID;
+}
+break;
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+return QCOW2_CLUSTER_INVALID;
+case QCOW2_CLUSTER_NORMAL:
+if (!sc_zero && !sc_alloc) {
+return QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER;
+} else if (!sc_zero && sc_alloc) {
+return QCOW2_CLUSTER_NORMAL;
+} else if (sc_zero && !sc_alloc) {
+return QCOW2_CLUSTER_ZERO_ALLOC;
+} else { /* sc_zero && sc_alloc */
+return QCOW2_CLUSTER_INVALID;
+}
+case QCOW2_CLUSTER_UNALLOCATED:
+if (!sc_zero && !sc_alloc) {
+return QCOW2_CLUSTER_UNALLOCATED;
+} else if (!sc_zero && sc_alloc) {
+return QCOW2_CLUSTER_INVALID;
+} else if (sc_zero && !sc_alloc) {
+return QCOW2_CLUSTER_ZERO_PLAIN;
+} else { /* sc_zero && sc_alloc */
+return QCOW2_CLUSTER_INVALID;
+}
+default:
+g_assert_not_reached();
+}
+}
+
+return type;
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1




[RFC PATCH v2 00/26] Add subcluster allocation to qcow2

2019-10-26 Thread Alberto Garcia
Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

This version includes a few tests, but I'm planning to add more for
the next revision.

Berto

v2:
- Patch 12: Update after the changes in 88f468e546.
- Patch 21 (new): Clear the L2 bitmap when allocating a compressed
  cluster. Compressed clusters should have the bitmap all set to 0.
- Patch 24: Document the new fields in the QAPI documentation [Eric].
- Patch 25: Allow qcow2 preallocation with backing files.
- Patch 26: Add some tests for qcow2 images with extended L2 entries.

v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/26:[] [-C] 'qcow2: Add calculate_l2_meta()'
002/26:[] [--] 'qcow2: Split cluster_needs_cow() out of 
count_cow_clusters()'
003/26:[] [--] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in 
handle_copied()'
004/26:[] [--] 'qcow2: Add get_l2_entry() and set_l2_entry()'
005/26:[] [--] 'qcow2: Document the Extended L2 Entries feature'
006/26:[] [--] 'qcow2: Add dummy has_subclusters() function'
007/26:[] [--] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
008/26:[] [--] 'qcow2: Add offset_to_sc_index()'
009/26:[] [--] 'qcow2: Add l2_entry_size()'
010/26:[] [--] 'qcow2: Update get/set_l2_entry() and add 
get/set_l2_bitmap()'
011/26:[] [--] 'qcow2: Add qcow2_get_subcluster_type()'
012/26:[0005] [FC] 'qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER'
013/26:[] [--] 'qcow2: Add subcluster support to calculate_l2_meta()'
014/26:[] [--] 'qcow2: Add subcluster support to qcow2_get_cluster_offset()'
015/26:[] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
016/26:[] [--] 'qcow2: Add subcluster support to discard_in_l2_slice()'
017/26:[] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
018/26:[] [--] 'qcow2: Add subcluster support to 
expand_zero_clusters_in_l1()'
019/26:[] [--] 'qcow2: Fix offset calculation in handle_dependencies()'
020/26:[] [--] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
021/26:[down] 'qcow2: Clear the L2 bitmap when allocating a compressed cluster'
022/26:[] [--] 'qcow2: Add subcluster support to handle_alloc_space()'
023/26:[] [--] 'qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters 
only'
024/26:[0007] [FC] 'qcow2: Add the 'extended_l2' option and the 
QCOW2_INCOMPAT_EXTL2 bit'
025/26:[down] 'qcow2: Allow preallocation and backing files if extended_l2 is 
set'
026/26:[down] 'iotests: Add tests for qcow2 images with extended L2 entries'

Alberto Garcia (26):
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: Add offset_to_sc_index()
  qcow2: Add l2_entry_size()
  qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  qcow2: Add qcow2_get_subcluster_type()
  qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  qcow2: Add subcluster support to calculate_l2_meta()
  qcow2: Add subcluster support to qcow2_get_cluster_offset()
  qcow2: Add subcluster support to zero_in_l2_slice()
  qcow2: Add subcluster support to discard_in_l2_slice()
  qcow2: Add subcluster support to check_refcounts_l2()
  qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  qcow2: Fix offset calculation in handle_dependencies()
  qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  qcow2: Clear the L2 bitmap when allocating a compressed cluster
  qcow2: Add subcluster support to handle_alloc_space()
  qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  qcow2: Allow preallocation and backing files if extended_l2 is set
  iotests: Add tests for qcow2 images with extended L2 entries

 block/qcow2-cluster.c| 548 ---
 block/qcow2-refcount.c   |  38 ++-
 block/qcow2.c|  92 +-
 block/qcow2.h| 121 ++-
 docs/interop/qcow2.txt   |  68 +++-
 docs/qcow2-cache.txt |  19 +-
 include/block/block_int.h|   1 +
 qapi/block-core.json |   6 +
 tests/qemu-iotests/031.out   |   8 +-
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++---
 tests

[RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice()

2019-10-26 Thread Alberto Garcia
Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e67559152f..3e4ba8d448 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1852,7 +1852,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 assert(nb_clusters <= INT_MAX);
 
 for (i = 0; i < nb_clusters; i++) {
-uint64_t old_offset;
+uint64_t old_offset, l2_entry = 0;
 QCow2ClusterType cluster_type;
 
 old_offset = get_l2_entry(s, l2_slice, l2_index + i);
@@ -1869,12 +1869,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
 qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
 } else {
-uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
 }
+
+if (has_subclusters(s)) {
+set_l2_bitmap(s, l2_slice, l2_index + i, 
QCOW_L2_BITMAP_ALL_ZEROES);
+} else {
+l2_entry |= QCOW_OFLAG_ZERO;
+}
+
+set_l2_entry(s, l2_slice, l2_index + i, l2_entry);
 }
 
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
-- 
2.20.1




[RFC PATCH v2 07/26] qcow2: Add subcluster-related fields to BDRVQcow2State

2019-10-26 Thread Alberto Garcia
This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one,
with subcluster_size = cluster_size.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 5 +
 block/qcow2.h | 5 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0bc69e6996..ed8b81c7b7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1342,6 +1342,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 }
 }
 
+s->subclusters_per_cluster =
+has_subclusters(s) ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
+s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+s->subcluster_bits = ctz32(s->subcluster_size);
+
 /* Check support for various header values */
 if (header.refcount_order > 6) {
 error_setg(errp, "Reference count entry width too large; may not "
diff --git a/block/qcow2.h b/block/qcow2.h
index b3826b37c1..278ca41314 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -75,6 +75,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -277,6 +279,9 @@ typedef struct BDRVQcow2State {
 int cluster_bits;
 int cluster_size;
 int l2_slice_size;
+int subcluster_bits;
+int subcluster_size;
+int subclusters_per_cluster;
 int l2_bits;
 int l2_size;
 int l1_size;
-- 
2.20.1




[RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature

2019-10-26 Thread Alberto Garcia
Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia 
---
 docs/interop/qcow2.txt | 68 --
 docs/qcow2-cache.txt   | 19 +++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..d34261f955 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
 as the maximum cluster size and won't be able to open 
images
 with larger cluster sizes.
 
+Note: if the image has Extended L2 Entries then 
cluster_bits
+must be at least 14 (i.e. 16384 byte clusters).
+
  24 - 31:   size
 Virtual disk size in bytes.
 
@@ -109,7 +112,12 @@ in the description of a field.
 An External Data File Name header extension may
 be present if this bit is set.
 
-Bits 3-63:  Reserved (set to 0)
+Bit 3:  Extended L2 Entries.  If this bit is set then
+L2 table entries use an extended format that
+allows subcluster-based allocation. See the
+Extended L2 Entries section for more details.
+
+Bits 4-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-l2_entries = (cluster_size / sizeof(uint64_t))
+l2_entries = (cluster_size / sizeof(uint64_t))[*]
 
 l2_index = (offset / cluster_size) % l2_entries
 l1_index = (offset / cluster_size) / l2_entries
@@ -447,6 +455,8 @@ obtained as follows:
 
 return cluster_offset + (offset % cluster_size)
 
+[*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
 Bit  0 -  8:Reserved (set to 0)
@@ -487,7 +497,8 @@ Standard Cluster Descriptor:
 nor is data read from the backing file if the cluster is
 unallocated.
 
-With version 2, this is always 0.
+With version 2 or with extended L2 entries (see the next
+section), this is always 0.
 
  1 -  8:Reserved (set to 0)
 
@@ -524,6 +535,57 @@ file (except if bit 0 in the Standard Cluster Descriptor 
is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains 
information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated like in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+1: the subcluster is allocated. In this case the
+   host cluster offset field must contain a valid
+   offset.
+0: the subcluster is not allocated. In this case
+   read requests shall go to the backing file or
+   return zeros if there is no backing file data.
+
+Bits are assigned starting from the most significant one.
+(i.e. bit x is used for subcluster 31 - x)
+
+32 -  63Subcluster reads as zeros (one bit per subcluster)
+
+1: the subcluster reads as zeros. In this case the
+   allocation status bit must be unset. The host
+   cluster offset field may or may not be set

[RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER

2019-10-26 Thread Alberto Garcia
In the previous patch we added a new QCow2ClusterType named
QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There is a couple of places
where this new value needs to be handled, and that is what this patch
does.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ab40ae36ea..0261e87709 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1938,8 +1938,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 
 *pnum = bytes;
 
-if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
-!s->crypto) {
+if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC ||
+ ret == QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) && !s->crypto) {
 index_in_cluster = offset & (s->cluster_size - 1);
 *map = cluster_offset | index_in_cluster;
 *file = s->data_file->bs;
@@ -1947,7 +1947,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 }
 if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
-} else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+} else if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+   ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) {
 status |= BDRV_BLOCK_DATA;
 }
 if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2117,6 +2118,7 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
 g_assert_not_reached();
 
 case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
 assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2187,7 +2189,8 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 
 if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
 ret == QCOW2_CLUSTER_ZERO_ALLOC ||
-(ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
+(ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing) ||
+(ret == QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER && !bs->backing))
 {
 qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
 } else {
@@ -3701,6 +3704,7 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 nr = s->cluster_size;
 ret = qcow2_get_cluster_offset(bs, offset, , );
 if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER &&
 ret != QCOW2_CLUSTER_ZERO_PLAIN &&
 ret != QCOW2_CLUSTER_ZERO_ALLOC) {
 qemu_co_mutex_unlock(>lock);
@@ -3771,6 +3775,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
 switch (ret) {
 case QCOW2_CLUSTER_UNALLOCATED:
+case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
 if (bs->backing && bs->backing->bs) {
 int64_t backing_length = bdrv_getlength(bs->backing->bs);
 if (src_offset >= backing_length) {
-- 
2.20.1




[RFC PATCH v2 25/26] qcow2: Allow preallocation and backing files if extended_l2 is set

2019-10-26 Thread Alberto Garcia
Traditional qcow2 images don't allow preallocation if a backing file
is set. This is because once a cluster is allocated there is no way to
tell that its data should be read from the backing file.

Extended L2 entries have individual allocation bits for each
subcluster, and therefore it is perfectly possible to have an
allocated cluster with all its subclusters unallocated.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c  | 7 ---
 tests/qemu-iotests/206.out | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b1fa7ab5da..8cf51c5d64 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3307,10 +3307,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 qcow2_opts->preallocation = PREALLOC_MODE_OFF;
 }
 if (qcow2_opts->has_backing_file &&
-qcow2_opts->preallocation != PREALLOC_MODE_OFF)
+qcow2_opts->preallocation != PREALLOC_MODE_OFF &&
+!qcow2_opts->extended_l2)
 {
-error_setg(errp, "Backing file and preallocation cannot be used at "
-   "the same time");
+error_setg(errp, "Backing file and preallocation can only be used at "
+   "the same time if extended_l2 is on");
 ret = -EINVAL;
 goto out;
 }
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index d2efc0394a..cfddfbfaa4 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -198,7 +198,7 @@ Job failed: Different refcount widths than 16 bits require 
compatibility level 1
 === Invalid backing file options ===
 {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": 
{"backing-file": "/dev/null", "driver": "qcow2", "file": "node0", 
"preallocation": "full", "size": 67108864}}}
 {"return": {}}
-Job failed: Backing file and preallocation cannot be used at the same time
+Job failed: Backing file and preallocation can only be used at the same time 
if extended_l2 is on
 {"execute": "job-dismiss", "arguments": {"id": "job0"}}
 {"return": {}}
 
-- 
2.20.1




[RFC PATCH v2 08/26] qcow2: Add offset_to_sc_index()

2019-10-26 Thread Alberto Garcia
For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 278ca41314..e25758079c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -559,6 +559,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State 
*s, int64_t offset)
 return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
 return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1




<    3   4   5   6   7   8   9   10   11   12   >