19.11.2019 15:17, Vladimir Sementsov-Ogievskiy wrote: > 19.11.2019 15:05, Kevin Wolf wrote: >> Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben: >>> Hi all! >>> >>> I wanted to understand, what is the real difference between >>> bdrv_block_status_above and bdrv_is_allocated_above, IMHO >>> bdrv_is_allocated_above should work through bdrv_block_status_above.. >>> >>> And I found the problem: bdrv_is_allocated_above considers space after >>> EOF as UNALLOCATED for intermediate nodes.. >>> >>> UNALLOCATED is not about allocation at fs level, but about should we >>> go to backing or not.. And it seems incorrect for me, as in case of >>> short backing file, we'll read zeroes after EOF, instead of going >>> further by backing chain. >> >> We actually have documentation what it means: >> >> * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this >> * layer rather than any backing, set by block layer >> >> Say we have a short overlay, like this: >> >> base.qcow2: AAAAAAAA >> overlay.qcow2: BBBB >> >> Then of course the content of block 5 (the one after EOF of >> overlay.qcow2) is still determined by overlay.qcow2, which can be easily >> verified by reading it from overlay.qcow2 (produces zeros) and from >> base.qcow2 (produces As). >> >> So the correct result when querying the block status of block 5 on >> overlay.qcow2 is BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO. >> >> Interestingly, we already fixed the opposite case (large overlay over >> short backing file) in commit e88ae2264d9 from May 2014 according to the >> same logic. >> >>> This leads to the following effect: >>> >>> ./qemu-img create -f qcow2 base.qcow2 2M >>> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2 >>> >>> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M >>> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M >>> >>> Region 1M..2M is shadowed by short middle image, so guest sees zeroes: >>> ./qemu-io -c "read -P 0 1M 1M" top.qcow2 >>> read 1048576/1048576 bytes at offset 1048576 >>> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec) >>> >>> But after commit guest visible state is changed, which seems wrong for me: >>> ./qemu-img commit top.qcow2 -b mid.qcow2 >>> >>> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2 >>> Pattern verification failed at offset 1048576, 1048576 bytes >>> read 1048576/1048576 bytes at offset 1048576 >>> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec) >>> >>> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2 >>> read 1048576/1048576 bytes at offset 1048576 >>> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec) >>> >>> >>> I don't know, is it a real bug, as I don't know, do we support backing >>> file larger than its parent. Still, I'm not sure that this behavior of >>> bdrv_is_allocated_above don't lead to other problems. >> >> I agree it's a bug. >> >> Your fix doesn't look right to me, though. You leave the buggy behaviour >> of bdrv_co_block_status() as it is and then add four patches to work >> around it in some (but not all) callers of it. >> >> All that it should take to fix this is making the bs->backing check >> independent from want_zero and let it set ALLOCATED. What I expected >> would be something like the below patch. >> >> But it doesn't seem to fully fix the problem (though 'alloc 1M 1M' in >> qemu-io shows that the range is now considered allocated), so probably >> there is still a separate bug in bdrv_is_allocated_above(). >> >> And I think we'll want an iotests case for both cases (short overlay, >> short backing file). >> >> Kevin >> >> >> diff --git a/block/io.c b/block/io.c >> index f75777f5ea..5eafcff01a 100644 >> --- a/block/io.c >> +++ b/block/io.c >> @@ -2359,16 +2359,17 @@ static int coroutine_fn >> bdrv_co_block_status(BlockDriverState *bs, >> if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) { >> ret |= BDRV_BLOCK_ALLOCATED; >> - } else if (want_zero) { >> - if (bdrv_unallocated_blocks_are_zero(bs)) { >> - ret |= BDRV_BLOCK_ZERO; >> - } else if (bs->backing) { >> - BlockDriverState *bs2 = bs->backing->bs; >> - int64_t size2 = bdrv_getlength(bs2); >> - >> - if (size2 >= 0 && offset >= size2) { >> + } else if (want_zero && bdrv_unallocated_blocks_are_zero(bs)) { >> + ret |= BDRV_BLOCK_ZERO; >> + } else if (bs->backing) { >> + BlockDriverState *bs2 = bs->backing->bs; >> + int64_t size2 = bdrv_getlength(bs2); >> + >> + if (size2 >= 0 && offset >= size2) { >> + if (want_zero) { >> ret |= BDRV_BLOCK_ZERO; >> } >> + ret |= BDRV_BLOCK_ALLOCATED; > > No, I think it's wrong, as it's not allocated at bs, but at bs->backing->bs. > > > So, bdrv_co_block_status works correct, what we can change about it, is not > to return pnum=0 if requested range after eof, but return pnum=bytes, together > with BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO | BDRV_BLOCK_EOF.
But this will break users which rely on pnum=0. > > And actual problem is in bdrv_block_status_above and bdrv_is_allocated_above, > which > I'm fixing. > > > -- Best regards, Vladimir