Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above

Vladimir Sementsov-Ogievskiy Tue, 19 Nov 2019 04:33:23 -0800

19.11.2019 15:17, Vladimir Sementsov-Ogievskiy wrote:
> 19.11.2019 15:05, Kevin Wolf wrote:
>> Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> Hi all!
>>>
>>> I wanted to understand, what is the real difference between
>>> bdrv_block_status_above and bdrv_is_allocated_above, IMHO
>>> bdrv_is_allocated_above should work through bdrv_block_status_above..
>>>
>>> And I found the problem: bdrv_is_allocated_above considers space after
>>> EOF as UNALLOCATED for intermediate nodes..
>>>
>>> UNALLOCATED is not about allocation at fs level, but about should we
>>> go to backing or not.. And it seems incorrect for me, as in case of
>>> short backing file, we'll read zeroes after EOF, instead of going
>>> further by backing chain.
>>
>> We actually have documentation what it means:
>>
>>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>>   *                       layer rather than any backing, set by block layer
>>
>> Say we have a short overlay, like this:
>>
>> base.qcow2:     AAAAAAAA
>> overlay.qcow2:  BBBB
>>
>> Then of course the content of block 5 (the one after EOF of
>> overlay.qcow2) is still determined by overlay.qcow2, which can be easily
>> verified by reading it from overlay.qcow2 (produces zeros) and from
>> base.qcow2 (produces As).
>>
>> So the correct result when querying the block status of block 5 on
>> overlay.qcow2 is BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
>>
>> Interestingly, we already fixed the opposite case (large overlay over
>> short backing file) in commit e88ae2264d9 from May 2014 according to the
>> same logic.
>>
>>> This leads to the following effect:
>>>
>>> ./qemu-img create -f qcow2 base.qcow2 2M
>>> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
>>>
>>> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
>>> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
>>>
>>> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
>>> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
>>> read 1048576/1048576 bytes at offset 1048576
>>> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
>>>
>>> But after commit guest visible state is changed, which seems wrong for me:
>>> ./qemu-img commit top.qcow2 -b mid.qcow2
>>>
>>> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
>>> Pattern verification failed at offset 1048576, 1048576 bytes
>>> read 1048576/1048576 bytes at offset 1048576
>>> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
>>>
>>> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
>>> read 1048576/1048576 bytes at offset 1048576
>>> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
>>>
>>>
>>> I don't know, is it a real bug, as I don't know, do we support backing
>>> file larger than its parent. Still, I'm not sure that this behavior of
>>> bdrv_is_allocated_above don't lead to other problems.
>>
>> I agree it's a bug.
>>
>> Your fix doesn't look right to me, though. You leave the buggy behaviour
>> of bdrv_co_block_status() as it is and then add four patches to work
>> around it in some (but not all) callers of it.
>>
>> All that it should take to fix this is making the bs->backing check
>> independent from want_zero and let it set ALLOCATED. What I expected
>> would be something like the below patch.
>>
>> But it doesn't seem to fully fix the problem (though 'alloc 1M 1M' in
>> qemu-io shows that the range is now considered allocated), so probably
>> there is still a separate bug in bdrv_is_allocated_above().
>>
>> And I think we'll want an iotests case for both cases (short overlay,
>> short backing file).
>>
>> Kevin
>>
>>
>> diff --git a/block/io.c b/block/io.c
>> index f75777f5ea..5eafcff01a 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2359,16 +2359,17 @@ static int coroutine_fn 
>> bdrv_co_block_status(BlockDriverState *bs,
>>       if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>>           ret |= BDRV_BLOCK_ALLOCATED;
>> -    } else if (want_zero) {
>> -        if (bdrv_unallocated_blocks_are_zero(bs)) {
>> -            ret |= BDRV_BLOCK_ZERO;
>> -        } else if (bs->backing) {
>> -            BlockDriverState *bs2 = bs->backing->bs;
>> -            int64_t size2 = bdrv_getlength(bs2);
>> -
>> -            if (size2 >= 0 && offset >= size2) {
>> +    } else if (want_zero && bdrv_unallocated_blocks_are_zero(bs)) {
>> +        ret |= BDRV_BLOCK_ZERO;
>> +    } else if (bs->backing) {
>> +        BlockDriverState *bs2 = bs->backing->bs;
>> +        int64_t size2 = bdrv_getlength(bs2);
>> +
>> +        if (size2 >= 0 && offset >= size2) {
>> +            if (want_zero) {
>>                   ret |= BDRV_BLOCK_ZERO;
>>               }
>> +            ret |= BDRV_BLOCK_ALLOCATED;
> 
> No, I think it's wrong, as it's not allocated at bs, but at bs->backing->bs.
> 
> 
> So, bdrv_co_block_status works correct, what we can change about it, is not
> to return pnum=0 if requested range after eof, but return pnum=bytes, together
> with BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO | BDRV_BLOCK_EOF.


But this will break users which rely on pnum=0.

> 
> And actual problem is in bdrv_block_status_above and bdrv_is_allocated_above, 
> which
> I'm fixing.
> 
> 
> 


-- 
Best regards,
Vladimir

Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above

Reply via email to