Re: backing chain & block status & filters

Vladimir Sementsov-Ogievskiy Wed, 29 Apr 2020 02:17:23 -0700

28.04.2020 22:44, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 19:18, Eric Blake wrote:

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:

Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.


Here's some previous mails discussing the same question about what block_status 
should actually mean.  At the time, I was so scared of the prospect of 
something breaking if I changed things that I ended up keeping status quo, so 
here we are revisiting the topic several years later, still asking the same 
questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html


Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for such 
drivers

   ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, zero-status is
   unknown (most probably non-zero)


That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.


Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure 
that it should be separated from backing-chain-allocated concept.


As a first step, I've don brief analysis of .bdrv_co_block_status of drivers 
(attached)


As a second step, here is brief analysis of all block_status usage

--
Best regards,
Vladimir

Public interface of block-status is:

    bdrv_block_status
    bdrv_block_status_above
    bdrv_is_allocated
    bdrv_is_allocated_above


= bdrv_block_status =

bdrv_make_zero: works on current level of backing-chain, want's to skip zeroes, 
not interested in @map and @file

img convert: convert_iteration_sectors: wants to distinguish ZERO, DATA and 
go-to-backing. It also tries to not write zeroes, if have short backing file, 
but does it a bit wrong. Treats unallocated as DATA if no backing.

img-map: get_block_status: distinguish ZERO, DATA and go-to-backing. Count 
depth of the backing. Just reports final ZERO and DATA. So, fs-unallocated 
thing is reported to user

= bdrv_block_status_above =

block-copy: block_copy_block_status: wants two things:
  
  1. skip go-to-backing holes in top layer for top mode
  2. do write_zero for ZERO areas

mirror: call on the whole backing chain
   - for DATA (and for DATA|ZERO which is bad) do just copy
   - for ZERO do just ZERO
   - for 0 (which means that bottom layer doesn't report that unallocated are 
zero) does DISCARD (which is most-probably zeroing) - absolutely wrong thing

qcow2: is_zero: call on the whole backing chain, want's just to check is 
reads-as-zero or not.

qcow2: qcow2_measure: call on the whole backing chain:
   - skip ZERO
   - count clusters with both DATA and ALLOCATED set. Hmm. ALLOCATED is always 
set for DATA. Seems the function actually tries to calculate disk occupation, 
assuming that BDRV_BLOCK_ALLOCATED helps in it, but it actually doesn't..

   I think, correct solution is to support offset and bytes in bdrv_measure, 
and split it from block_status. Then qcow2_measure will just recursively call 
bdrv_measure on its children. This would be clean.

nbd: nbd_co_send_sparse_read: call on the whole backing chain:
   - wants to distinguish zeroes

nbd: blockstatus_to_extents: call on the whole backing chain:
   !ALLOCATED -> NBD_HOLE
   ZERO -> NBD_ZERO

   So, we report HOLE only if it's not BDRV_BLOCK_ALLOCATED on any layer.. 
That's wrong. I think, we should report HOLE in a lot more cases. Actually, 
when not occupy real space on disk.

img-compare: call on the whole backing chain:
  - do not compare zeroes
  - do not compare if both report unallocated.. it's actually not correct for 
protocols which reports fs-unallocated-non-zeroes. As reads may differ 
actually. Still, read from fs-unallocated area is not guaranteed to return same 
thing each time, yes? At least, null-co doesn't guarantee it :).. So, it may be 
correct to skip these areas. Or may be better to always report them different??
  - consider data-zeroes equal to unallocated.. it's definitely not correct for 
protocols which reports fs-unallocated-non-zeroes

  I think, img-compare must only consider zero/non-zero, and don't touch other 
block-status features. Otherwise it's a mess

img-convert: convert_iteration_sectors: call on the whole backing chain: 
already described in bdrv_block_status section

= bdrv_is_allocated =

Obvious thing for backing-chain related operation (still wrong that some 
protocol drivers may return fs-unallocated and it is treated as go-to-backing 
areas):
    block-copy, commit, copy-on-read, stream, img-rebase

Others:
vvfat: o_O it has qcow child.. and operates like self is a backing of this 
child. But yes, it just uses bdrv_is_allocated to understand is chunk is 
rewritten in qcow.

migration/block: skip unallocated for top mode (shared_base, as it called here)

io-alloc: just report number of allocated in top layer

io-map: map_is_allocated: same thing as io-alloc, but report chunks

test_sync_op_block_status: just check what it returns

= bdrv_is_allocated_above =

Obvious usage for backing-chain related: commit, mirror, stream, img-rebase. 
Wrong for fs-unallocated-non-zero reporting drivers

Others:
qcow2: is_unallocated: call for the whole backing chain. Used to check 
is-zero.. Wrong for fs-unallocated-non-zero reporting drivers, and may be more 
efficient if consider also ZERO status.. but in some smart-fast way.

replication: allocated or not in backing-chain: common case

Re: backing chain & block status & filters

Reply via email to