Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above

Vladimir Sementsov-Ogievskiy Tue, 19 Nov 2019 04:35:21 -0800

19.11.2019 15:32, Vladimir Sementsov-Ogievskiy wrote:
> 19.11.2019 15:17, Vladimir Sementsov-Ogievskiy wrote:
>> 19.11.2019 15:05, Kevin Wolf wrote:
>>> Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> Hi all!
>>>>
>>>> I wanted to understand, what is the real difference between
>>>> bdrv_block_status_above and bdrv_is_allocated_above, IMHO
>>>> bdrv_is_allocated_above should work through bdrv_block_status_above..
>>>>
>>>> And I found the problem: bdrv_is_allocated_above considers space after
>>>> EOF as UNALLOCATED for intermediate nodes..
>>>>
>>>> UNALLOCATED is not about allocation at fs level, but about should we
>>>> go to backing or not.. And it seems incorrect for me, as in case of
>>>> short backing file, we'll read zeroes after EOF, instead of going
>>>> further by backing chain.
>>>
>>> We actually have documentation what it means:
>>>
>>>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>>>   *                       layer rather than any backing, set by block layer
>>>
>>> Say we have a short overlay, like this:
>>>
>>> base.qcow2:     AAAAAAAA
>>> overlay.qcow2:  BBBB
>>>
>>> Then of course the content of block 5 (the one after EOF of
>>> overlay.qcow2) is still determined by overlay.qcow2, which can be easily
>>> verified by reading it from overlay.qcow2 (produces zeros) and from
>>> base.qcow2 (produces As).
>>>
>>> So the correct result when querying the block status of block 5 on
>>> overlay.qcow2 is BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
>>>
>>> Interestingly, we already fixed the opposite case (large overlay over
>>> short backing file) in commit e88ae2264d9 from May 2014 according to the
>>> same logic.
>>>
>>>> This leads to the following effect:
>>>>
>>>> ./qemu-img create -f qcow2 base.qcow2 2M
>>>> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
>>>>
>>>> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
>>>> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
>>>>
>>>> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
>>>> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
>>>> read 1048576/1048576 bytes at offset 1048576
>>>> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
>>>>
>>>> But after commit guest visible state is changed, which seems wrong for me:
>>>> ./qemu-img commit top.qcow2 -b mid.qcow2
>>>>
>>>> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
>>>> Pattern verification failed at offset 1048576, 1048576 bytes
>>>> read 1048576/1048576 bytes at offset 1048576
>>>> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
>>>>
>>>> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
>>>> read 1048576/1048576 bytes at offset 1048576
>>>> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
>>>>
>>>>
>>>> I don't know, is it a real bug, as I don't know, do we support backing
>>>> file larger than its parent. Still, I'm not sure that this behavior of
>>>> bdrv_is_allocated_above don't lead to other problems.
>>>
>>> I agree it's a bug.
>>>
>>> Your fix doesn't look right to me, though. You leave the buggy behaviour
>>> of bdrv_co_block_status() as it is and then add four patches to work
>>> around it in some (but not all) callers of it.
>>>
>>> All that it should take to fix this is making the bs->backing check
>>> independent from want_zero and let it set ALLOCATED. What I expected
>>> would be something like the below patch.
>>>
>>> But it doesn't seem to fully fix the problem (though 'alloc 1M 1M' in
>>> qemu-io shows that the range is now considered allocated), so probably
>>> there is still a separate bug in bdrv_is_allocated_above().
>>>
>>> And I think we'll want an iotests case for both cases (short overlay,
>>> short backing file).
>>>
>>> Kevin
>>>
>>>
>>> diff --git a/block/io.c b/block/io.c
>>> index f75777f5ea..5eafcff01a 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -2359,16 +2359,17 @@ static int coroutine_fn 
>>> bdrv_co_block_status(BlockDriverState *bs,
>>>       if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>>>           ret |= BDRV_BLOCK_ALLOCATED;
>>> -    } else if (want_zero) {
>>> -        if (bdrv_unallocated_blocks_are_zero(bs)) {
>>> -            ret |= BDRV_BLOCK_ZERO;
>>> -        } else if (bs->backing) {
>>> -            BlockDriverState *bs2 = bs->backing->bs;
>>> -            int64_t size2 = bdrv_getlength(bs2);
>>> -
>>> -            if (size2 >= 0 && offset >= size2) {
>>> +    } else if (want_zero && bdrv_unallocated_blocks_are_zero(bs)) {
>>> +        ret |= BDRV_BLOCK_ZERO;
>>> +    } else if (bs->backing) {
>>> +        BlockDriverState *bs2 = bs->backing->bs;
>>> +        int64_t size2 = bdrv_getlength(bs2);
>>> +
>>> +        if (size2 >= 0 && offset >= size2) {
>>> +            if (want_zero) {
>>>                   ret |= BDRV_BLOCK_ZERO;
>>>               }
>>> +            ret |= BDRV_BLOCK_ALLOCATED;
>>
>> No, I think it's wrong, as it's not allocated at bs, but at bs->backing->bs.
>>
>>
>> So, bdrv_co_block_status works correct, what we can change about it, is not
>> to return pnum=0 if requested range after eof, but return pnum=bytes, 
>> together
>> with BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO | BDRV_BLOCK_EOF.
> 
> But this will break users which rely on pnum=0.


Possibly, we may add flag to bdrv_co_block_status, to behave in such manner, and
set it to true when call from bdrv_block_status_above/bdrv_is_allocated_above.

> 
>>
>> And actual problem is in bdrv_block_status_above and 
>> bdrv_is_allocated_above, which
>> I'm fixing.
>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir

Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above

Reply via email to