[PATCH 0/4] fix & merge block_status_above and is_allocated_above

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
Hi all!

I wanted to understand, what is the real difference between 
bdrv_block_status_above
and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work through
bdrv_block_status_above..

And I found the problem: bdrv_is_allocated_above considers space after EOF as
UNALLOCATED for intermediate nodes..

UNALLOCATED is not about allocation at fs level, but about should we go to 
backing or
not.. And it seems incorrect for me, as in case of short backing file, we'll 
read
zeroes after EOF, instead of going further by backing chain.

This leads to the following effect:

./qemu-img create -f qcow2 base.qcow2 2M
./qemu-io -c "write -P 0x1 0 2M" base.qcow2

./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M

Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
./qemu-io -c "read -P 0 1M 1M" top.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)

But after commit guest visible state is changed, which seems wrong for me:
./qemu-img commit top.qcow2 -b mid.qcow2

./qemu-io -c "read -P 0 1M 1M" mid.qcow2
Pattern verification failed at offset 1048576, 1048576 bytes
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)

./qemu-io -c "read -P 1 1M 1M" mid.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)


I don't know, is it a real bug, as I don't know, do we support backing file 
larger than
its parent. Still, I'm not sure that this behavior of bdrv_is_allocated_above 
don't lead
to other problems.

=

Hmm, bdrv_block_allocated_above behaves strange too:

with want_zero=true, it may report unallocated zeroes because of short backing 
files, which
are actually "allocated" in POV of backing chains. But I see this may influence 
only
qemu-img compare, and I don't see can it trigger some bug..

with want_zero=false, it may do no progress because of short backing file. 
Moreover it may
report EOF in the middle!! But want_zero=false used only in bdrv_is_allocated, 
which considers
onlyt top layer, so it seems OK. 

=

So, I propose these series, still I'm not sure is there a real bug.

Vladimir Sementsov-Ogievskiy (4):
  block/io: fix bdrv_co_block_status_above
  block/io: bdrv_common_block_status_above: support include_base
  block/io: bdrv_common_block_status_above: support bs == base
  block/io: fix bdrv_is_allocated_above

 block/io.c | 104 ++---
 tests/qemu-iotests/154.out |   4 +-
 2 files changed, 53 insertions(+), 55 deletions(-)

-- 
2.21.0




[PATCH 4/4] block/io: fix bdrv_is_allocated_above

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
bdrv_is_allocated_above wrongly handles short backing files: it reports
after-EOF space as UNALLOCATED which is wrong, as on read the data is
generated on the level of short backing file (if all overlays has
unallocated area at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies code
path.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 43 +--
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/block/io.c b/block/io.c
index f05b2e8ecc..6946120587 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2581,52 +2581,19 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
*bs, int64_t offset,
  * at 'offset + *pnum' may return the same allocation status (in other
  * words, the result is not necessarily the maximum possible range);
  * but 'pnum' will only be 0 when end of file is reached.
- *
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
 BlockDriverState *base,
 bool include_base, int64_t offset,
 int64_t bytes, int64_t *pnum)
 {
-BlockDriverState *intermediate;
-int ret;
-int64_t n = bytes;
-
-assert(base || !include_base);
-
-intermediate = top;
-while (include_base || intermediate != base) {
-int64_t pnum_inter;
-int64_t size_inter;
-
-assert(intermediate);
-ret = bdrv_is_allocated(intermediate, offset, bytes, _inter);
-if (ret < 0) {
-return ret;
-}
-if (ret) {
-*pnum = pnum_inter;
-return 1;
-}
-
-size_inter = bdrv_getlength(intermediate);
-if (size_inter < 0) {
-return size_inter;
-}
-if (n > pnum_inter &&
-(intermediate == top || offset + pnum_inter < size_inter)) {
-n = pnum_inter;
-}
-
-if (intermediate == base) {
-break;
-}
-
-intermediate = backing_bs(intermediate);
+int ret = bdrv_common_block_status_above(top, base, include_base, false,
+ offset, bytes, pnum, NULL, NULL);
+if (ret < 0) {
+return ret;
 }
 
-*pnum = n;
-return 0;
+return !!(ret & BDRV_BLOCK_ALLOCATED);
 }
 
 typedef struct BdrvVmstateCo {
-- 
2.21.0




[PATCH 2/4] block/io: bdrv_common_block_status_above: support include_base

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's support include_base parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4d7fa99bd2..df3ecf2430 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2196,6 +2196,7 @@ int bdrv_flush_all(void)
 typedef struct BdrvCoBlockStatusData {
 BlockDriverState *bs;
 BlockDriverState *base;
+bool include_base;
 bool want_zero;
 int64_t offset;
 int64_t bytes;
@@ -2418,6 +2419,7 @@ early_out:
 
 static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
+   bool include_base,
bool want_zero,
int64_t offset,
int64_t bytes,
@@ -2429,8 +2431,8 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 int ret = 0;
 bool first = true;
 
-assert(bs != base);
-for (p = bs; p != base; p = backing_bs(p)) {
+assert(include_base || bs != base);
+for (p = bs; include_base || p != base; p = backing_bs(p)) {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
 if (ret < 0) {
@@ -2466,6 +2468,10 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 return ret;
 }
 
+if (p == base) {
+break;
+}
+
 /* Proceed to backing */
 assert(*pnum <= bytes);
 bytes = *pnum;
@@ -2481,7 +2487,7 @@ static void coroutine_fn 
bdrv_block_status_above_co_entry(void *opaque)
 BdrvCoBlockStatusData *data = opaque;
 
 data->ret = bdrv_co_block_status_above(data->bs, data->base,
-   data->want_zero,
+   data->include_base, data->want_zero,
data->offset, data->bytes,
data->pnum, data->map, data->file);
 data->done = true;
@@ -2495,6 +2501,7 @@ static void coroutine_fn 
bdrv_block_status_above_co_entry(void *opaque)
  */
 static int bdrv_common_block_status_above(BlockDriverState *bs,
   BlockDriverState *base,
+  bool include_base,
   bool want_zero, int64_t offset,
   int64_t bytes, int64_t *pnum,
   int64_t *map,
@@ -2504,6 +2511,7 @@ static int 
bdrv_common_block_status_above(BlockDriverState *bs,
 BdrvCoBlockStatusData data = {
 .bs = bs,
 .base = base,
+.include_base = include_base,
 .want_zero = want_zero,
 .offset = offset,
 .bytes = bytes,
@@ -2528,7 +2536,7 @@ int bdrv_block_status_above(BlockDriverState *bs, 
BlockDriverState *base,
 int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file)
 {
-return bdrv_common_block_status_above(bs, base, true, offset, bytes,
+return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
   pnum, map, file);
 }
 
@@ -2545,7 +2553,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, 
int64_t offset,
 int ret;
 int64_t dummy;
 
-ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
+ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
  bytes, pnum ? pnum : , NULL,
  NULL);
 if (ret < 0) {
-- 
2.21.0




[PATCH 1/4] block/io: fix bdrv_co_block_status_above

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
bdrv_co_block_status_above has several problems with handling short
backing files:

1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
which produces these after-EOF zeros is inside requested backing
sequesnce.

2. With want_zeros=false, it will just stop inside requested region, if
we have unallocated region in top node when underlying backing is
short.

Fix these things, making logic about short backing files clearer.

Note that 154 output changed, because now bdrv_block_status_above don't
merge unallocated zeros with zeros after EOF (which are actually
"allocated" in POV of read from backing-chain top) and is_zero() just
don't understand that the whole head or tail is zero. We may update
is_zero to call bdrv_block_status_above several times, or add flag to
bdrv_block_status_above that we are not interested in ALLOCATED flag,
so ranges with different ALLOCATED status may be merged, but actually,
it seems that we'd better don't care about this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 41 --
 tests/qemu-iotests/154.out |  4 ++--
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index f75777f5ea..4d7fa99bd2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2434,25 +2434,44 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
 if (ret < 0) {
-break;
+return ret;
 }
-if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+if (*pnum == 0) {
+if (first) {
+return ret;
+}
+
 /*
- * Reading beyond the end of the file continues to read
- * zeroes, but we can only widen the result to the
- * unallocated length we learned from an earlier
- * iteration.
+ * Reads from bs for selected region will return zeroes, produced
+ * because current level is short. We should consider it as
+ * allocated.
+ *
+ * TODO: Should we report p as file here?
  */
+assert(ret & BDRV_BLOCK_EOF);
 *pnum = bytes;
+return BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
 }
-if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
-break;
+if (ret & BDRV_BLOCK_ALLOCATED) {
+/* We've found the node and the status, we must return. */
+
+if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+/*
+ * This level also responsible for reads after EOF inside
+ * unallocated region in previous level.
+ */
+*pnum = bytes;
+}
+
+return ret;
 }
-/* [offset, pnum] unallocated on this layer, which could be only
- * the first part of [offset, bytes].  */
-bytes = MIN(bytes, *pnum);
+
+/* Proceed to backing */
+assert(*pnum <= bytes);
+bytes = *pnum;
 first = false;
 }
+
 return ret;
 }
 
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index fa3673317f..a203dfcadd 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -310,13 +310,13 @@ wrote 512/512 bytes at offset 134217728
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
backing_file=TEST_DIR/t.IMGFMT.base
 wrote 512/512 bytes at offset 134219264
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
backing_file=TEST_DIR/t.IMGFMT.base
 wrote 1024/1024 bytes at offset 134218240
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.21.0




[PATCH 3/4] block/io: bdrv_common_block_status_above: support bs == base

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (for ex. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index df3ecf2430..f05b2e8ecc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2431,7 +2431,11 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 int ret = 0;
 bool first = true;
 
-assert(include_base || bs != base);
+if (!include_base && bs == base) {
+*pnum = bytes;
+return 0;
+}
+
 for (p = bs; include_base || p != base; p = backing_bs(p)) {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
-- 
2.21.0




DROP Re: bdrv_is_allocated_above bug or feature

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
Please don't answer, I'll resend it as a cover letter for proposed series to 
fix these things.

16.11.2019 17:18, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> I wanted to understand, what is the real difference between 
> bdrv_block_status_above
> and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work through
> bdrv_block_status_above..
> 
> And I found the difference: bdrv_is_allocated_above considers space after EOF 
> as
> UNALLOCATED for intermediate nodes..
> 
> UNALLOCATED is not about allocation at fs level, but about should we go to 
> backing or
> not.. And it seems incorrect for me, as in case of short backing file, we'll 
> read
> zeroes after EOF, instead of going further by backing chain.
> 
> This leads to the following effect:
> 
> ./qemu-img create -f qcow2 base.qcow2 2M
> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
> 
> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
> 
> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
> 
> But after commit guest visible state is changed, which seems wrong for me:
> ./qemu-img commit top.qcow2 -b mid.qcow2
> 
> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
> Pattern verification failed at offset 1048576, 1048576 bytes
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
> 
> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
> 
> 
> I don't know, is it a real bug, as I don't know, do we support backing file 
> larger than
> its parent. Still, I'm not sure that this behavior of bdrv_is_allocated_above 
> don't lead
> to other problems.
> 
> =
> 
> Hmm, bdrv_block_allocated_above behaves strange too:
> 
> with want_zero=true, it may report unallocated zeroes because of short 
> backing files, which
> are actually "allocated" in POV of backing chains. But I see this may 
> influence only
> qemu-img compare, and I don't see can it trigger some bug..
> 
> with want_zero=false, it may do no progress because of short backing file. 
> Moreover it may
> report EOF in the middle!! But want_zero=false used only in 
> bdrv_is_allocated, which considers
> onlyt top layer, so it seems OK.
> 


-- 
Best regards,
Vladimir


bdrv_is_allocated_above bug or feature

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
Hi all!

I wanted to understand, what is the real difference between 
bdrv_block_status_above
and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work through
bdrv_block_status_above..

And I found the difference: bdrv_is_allocated_above considers space after EOF as
UNALLOCATED for intermediate nodes..

UNALLOCATED is not about allocation at fs level, but about should we go to 
backing or
not.. And it seems incorrect for me, as in case of short backing file, we'll 
read
zeroes after EOF, instead of going further by backing chain.

This leads to the following effect:

./qemu-img create -f qcow2 base.qcow2 2M
./qemu-io -c "write -P 0x1 0 2M" base.qcow2

./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M

Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
./qemu-io -c "read -P 0 1M 1M" top.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)

But after commit guest visible state is changed, which seems wrong for me:
./qemu-img commit top.qcow2 -b mid.qcow2

./qemu-io -c "read -P 0 1M 1M" mid.qcow2
Pattern verification failed at offset 1048576, 1048576 bytes
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)

./qemu-io -c "read -P 1 1M 1M" mid.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)


I don't know, is it a real bug, as I don't know, do we support backing file 
larger than
its parent. Still, I'm not sure that this behavior of bdrv_is_allocated_above 
don't lead
to other problems.

=

Hmm, bdrv_block_allocated_above behaves strange too:

with want_zero=true, it may report unallocated zeroes because of short backing 
files, which
are actually "allocated" in POV of backing chains. But I see this may influence 
only
qemu-img compare, and I don't see can it trigger some bug..

with want_zero=false, it may do no progress because of short backing file. 
Moreover it may
report EOF in the middle!! But want_zero=false used only in bdrv_is_allocated, 
which considers
onlyt top layer, so it seems OK.

-- 
Best regards,
Vladimir


Re: [PATCH v2 2/2] iotests: add 269 to check maximum of bitmaps in qcow2

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
25.10.2019 16:12, Max Reitz wrote:
> On 14.10.19 13:51, Vladimir Sementsov-Ogievskiy wrote:
>> Check that it's impossible to create more persistent bitmaps than qcow2
>> supports.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   tests/qemu-iotests/269 | 47 ++
>>   tests/qemu-iotests/269.out |  3 +++
>>   tests/qemu-iotests/group   |  1 +
>>   3 files changed, 51 insertions(+)
>>   create mode 100755 tests/qemu-iotests/269
>>   create mode 100644 tests/qemu-iotests/269.out
> 
> Is there no way to make this test any faster, e.g. by creating like
> 65534 bitmaps with dd and a binary blob?  (Similarly to what I do in
> “iotests: Test qcow2's snapshot table handling”)

Seems, that's not simple.. Each bitmap should have personal name and
bitmap table..

Let's merge only patch 01 and forget about this one.

> 
> This is such an edge case, but running the test took 3:46 min before
> patch 1 (which I already find much too long), and 8:13 min afterwards
> (on my machine).
> 
> (To be honest, if we take this test as-is, I’m probably just never going
> to run it.)
> 
> Max
> 


-- 
Best regards,
Vladimir


Re: [RFC 21/24] backup: move to block-copy

2019-11-16 Thread Vladimir Sementsov-Ogievskiy
15.11.2019 20:58, Eric Blake wrote:
> On 11/15/19 8:14 AM, Vladimir Sementsov-Ogievskiy wrote:
>> This brings async request handling and block-status driven chunk sizes
>> to backup out of the box, which improves backup performance.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
> 
>> +++ b/qapi/block-core.json
>> @@ -1455,6 +1455,12 @@
>>   #    above node specified by @drive. If this option is not 
>> given,
>>   #    a node name is autogenerated. (Since: 4.2)
>>   #
>> +# @x-max-workers: maximum of parallel requests for static data backup. This
>> +# doesn't influence copy-before-write operations. (Since: 
>> 4.3)
>> +#
>> +# @x-max-chunk: maximum chunk length for static data backup. This doesn't
>> +#   influence copy-before-write operations. (Since: 4.3)
> 
> The next release is 5.0, not 4.3.
> 
> Is there a reason to keep these experimental for a while?  For example, are 
> there potential changes to the interface that might affect how it gets used?  
> Or should we drop the x- prefix and add this outright in 5.0?
> 

Hmm, I added these options to satisfy some very conservative iotests. Still, 
they may be meaningful for user too.

The only real doubt is that actually they are options of block-copy, not backup 
job. And to not finish up with adding such
parameters to each block-job, it may be better to combine them into a 
structure.. Or may be not.

-- 
Best regards,
Vladimir