Re: backing chain & block status & filters

2020-05-07 Thread Vladimir Sementsov-Ogievskiy

07.05.2020 15:58, Max Reitz wrote:

On 28.04.20 16:51, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 14:08, Max Reitz wrote:

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
is_allocated_above", and returned to all the inconsistencies about
block-status. I keep in mind Max's series about child-access functions,
and Andrey's work about using COR filter in block-stream, which depends
on Max's series (because, without them COR fitler with file child breaks
backing chains).. And, it seems that it's better to discuss some
questions before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

     * BDRV_BLOCK_DATA: allocation for data at offset is tied to this
layer
     * BDRV_BLOCK_ZERO: offset reads as zero
     * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
raw data
     * BDRV_BLOCK_ALLOCATED: the content of the block is determined by
this
     *   layer rather than any backing, set by block
layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
formats which doesn't support backing. So, all such format drivers must
return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
example, iscsi - doesn't.


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or
not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


2. ZERO. The meaning differs a bit for generic block_status and for
drivers.. I think, we at least should document it like this:

BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at
this layer and read as ZERO. If generic block_status returns ZERO, it
only mean that it reads as zero, but the region may be allocated on
underlying level.


Hm.  What does that mean?

One of the problems is that “allocated” has two meanings:
(1) reading data returns data defined at this backing layer,
(2) actually allocated, i.e. takes up space on the file represented by
this BDS.

As far as I understand, we actually don’t care about (2) in the context
of block_status, but just about (1).

So if a layer returns ZERO, it is by definition (1)-allocated.  (It
isn’t necessarily (2)-allocated.)


3. bdi.unallocated_blocks_are_zero

I think it's very bad, that we have formats, that supports backing, but
doesn't report bdi.unallocated_blocks_are_zero as true. It means that
UNALLOCATED region reads as zero if we have short backing file, and not
as zero if we remove this short backing file.


What do you mean by “remove this short backing file”?  Because generally
one can’t just drop a backing file.

So maybe a case like block-stream?  Wouldn’t that be a bug in
block-stream them, i.e. shouldn’t it stream zeros after the end of the
backing file?


I can live with it but
this is weird logic. These bad drivers are qcow (not qcow2), parallels
and vmdk. I hope, they actually just forget to set
unallocated_blocks_are_zero to true.


qcow definitely sounds like it.


Next. But what about drivers which doesn't support backing? As we
considered above, they should always return ZERO or DATA, as everything
is allocated in this backing-chain level (last level, of course).. So
again unallocated_blocks_are_zero is meaningless. So, I think, that
driver which doesn't support backings, should be fixed to return always
ZERO or DATA, than we don't need this unallocated_blocks_are_zero at
all.


Agreed.


3.


The second 3.? :)


Short backing files in allocated_above: we must consider space after
EOF as ALLOCATED, if short backing file is inside requested
backing-chain part, as it produced exactly because of this short file
(and we never go to backing).


Sounds correct.


(current realization of allocated_above is
buggy, see my outdated series "[PATCH 0/4] fix & merge
block_status_above and is_allocated_above")

4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have
a backing chain of non-backing child.. I just remember that we didn't
reach the consensus.


Possible? :)


5. Filters.. OK we have two functions for them:
bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
think both are wrong:

bdrv_co_block_status_from_file leads to problem [4], when we can report
UNALLOCATED, which refers not to the current backing chain, but to sub
backing chain of file child, which is inconsistent with
block_status_above and is_allocated_above iteration.

bdrv_co_block_status_from_backing is also is not consistent with

Re: backing chain & block status & filters

2020-05-07 Thread Max Reitz
On 28.04.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
> 28.04.2020 14:08, Max Reitz wrote:
>> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi!
>>>
>>> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
>>> is_allocated_above", and returned to all the inconsistencies about
>>> block-status. I keep in mind Max's series about child-access functions,
>>> and Andrey's work about using COR filter in block-stream, which depends
>>> on Max's series (because, without them COR fitler with file child breaks
>>> backing chains).. And, it seems that it's better to discuss some
>>> questions before resending.
>>>
>>> First, problems about block-status:
>>>
>>> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows:
>>>
>>>     * BDRV_BLOCK_DATA: allocation for data at offset is tied to this
>>> layer
>>>     * BDRV_BLOCK_ZERO: offset reads as zero
>>>     * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
>>> raw data
>>>     * BDRV_BLOCK_ALLOCATED: the content of the block is determined by
>>> this
>>>     *   layer rather than any backing, set by block
>>> layer
>>>
>>> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
>>> formats which doesn't support backing. So, all such format drivers must
>>> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
>>> example, iscsi - doesn't.
>>
>> Hm.  I could imagine that there are formats that have non-zero holes
>> (e.g. 0xff or just garbage).  It would be a bit wrong for them to return
>> ZERO or DATA then.
>>
>> But OTOH we don’t care about such cases, do we?  We need to know whether
>> ranges are zero, data, or unallocated.  If they aren’t zero, we only
>> care about whether reading from it will return data from this layer or
>> not.
>>
>> So I suppose that anything that doesn’t support backing files (or
>> filtered children) should always return ZERO and/or DATA.
>>
>>> 2. ZERO. The meaning differs a bit for generic block_status and for
>>> drivers.. I think, we at least should document it like this:
>>>
>>> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
>>> BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at
>>> this layer and read as ZERO. If generic block_status returns ZERO, it
>>> only mean that it reads as zero, but the region may be allocated on
>>> underlying level.
>>
>> Hm.  What does that mean?
>>
>> One of the problems is that “allocated” has two meanings:
>> (1) reading data returns data defined at this backing layer,
>> (2) actually allocated, i.e. takes up space on the file represented by
>> this BDS.
>>
>> As far as I understand, we actually don’t care about (2) in the context
>> of block_status, but just about (1).
>>
>> So if a layer returns ZERO, it is by definition (1)-allocated.  (It
>> isn’t necessarily (2)-allocated.)
>>
>>> 3. bdi.unallocated_blocks_are_zero
>>>
>>> I think it's very bad, that we have formats, that supports backing, but
>>> doesn't report bdi.unallocated_blocks_are_zero as true. It means that
>>> UNALLOCATED region reads as zero if we have short backing file, and not
>>> as zero if we remove this short backing file.
>>
>> What do you mean by “remove this short backing file”?  Because generally
>> one can’t just drop a backing file.
>>
>> So maybe a case like block-stream?  Wouldn’t that be a bug in
>> block-stream them, i.e. shouldn’t it stream zeros after the end of the
>> backing file?
>>
>>> I can live with it but
>>> this is weird logic. These bad drivers are qcow (not qcow2), parallels
>>> and vmdk. I hope, they actually just forget to set
>>> unallocated_blocks_are_zero to true.
>>
>> qcow definitely sounds like it.
>>
>>> Next. But what about drivers which doesn't support backing? As we
>>> considered above, they should always return ZERO or DATA, as everything
>>> is allocated in this backing-chain level (last level, of course).. So
>>> again unallocated_blocks_are_zero is meaningless. So, I think, that
>>> driver which doesn't support backings, should be fixed to return always
>>> ZERO or DATA, than we don't need this unallocated_blocks_are_zero at
>>> all.
>>
>> Agreed.
>>
>>> 3.
>>
>> The second 3.? :)
>>
>>> Short backing files in allocated_above: we must consider space after
>>> EOF as ALLOCATED, if short backing file is inside requested
>>> backing-chain part, as it produced exactly because of this short file
>>> (and we never go to backing).
>>
>> Sounds correct.
>>
>>> (current realization of allocated_above is
>>> buggy, see my outdated series "[PATCH 0/4] fix & merge
>>> block_status_above and is_allocated_above")
>>>
>>> 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have
>>> a backing chain of non-backing child.. I just remember that we didn't
>>> reach the consensus.
>>
>> Possible? :)
>>
>>> 5. Filters.. OK we have two functions for them:
>>> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
>>> think 

Re: backing chain & block status & filters

2020-05-05 Thread Vladimir Sementsov-Ogievskiy

01.05.2020 6:04, Andrey Shinkevich wrote:

Sounds good to me generally.
Also, we need to identify the filter by its node name when the file names of a 
node and of the filter above it are the same. And what about automatically 
generated node name for the filter? We will want to pass it to the stream 
routine.



As I understand, there should not be auto-generated node-names in blockdev era. 
All nodes are named by libvirt with full-control on them.

--
Best regards,
Vladimir



Re: backing chain & block status & filters

2020-04-30 Thread Andrey Shinkevich
Sounds good to me generally.
Also, we need to identify the filter by its node name when the file names of a 
node and of the filter above it are the same. And what about automatically 
generated node name for the filter? We will want to pass it to the stream 
routine.

Andrey


From: Vladimir Sementsov-Ogievskiy 
Sent: Thursday, April 30, 2020 10:12 PM
To: Max Reitz ; qemu block 
Cc: Kevin Wolf ; qemu-devel ; Andrey 
Shinkevich 
Subject: Re: backing chain & block status & filters


>
>  The only difference is that if you use file-child-based filters, you 
> can't do stream/commit around them. I just think, that binding the concept to 
> the "backing" link of the node is simpler and more generic. In blockdev era, 
> when all nodes will be named and libvirt will take care of all nodes 
> including filters, we will not need any filter-skipping magic, libvirt will 
> directly point to exact nodes it means. And we can deprecate "file" children 
> of existing filters, to finally reach simple architecture with simple backing 
> chain of nodes (any nodes). And after deprecating old filename-based 
> interfaces, we'll drop all filter-skipping magic..

What do you think?

--
Best regards,
Vladimir


Re: backing chain & block status & filters

2020-04-30 Thread Vladimir Sementsov-Ogievskiy

28.04.2020 17:51, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 14:08, Max Reitz wrote:

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
is_allocated_above", and returned to all the inconsistencies about
block-status. I keep in mind Max's series about child-access functions,
and Andrey's work about using COR filter in block-stream, which depends
on Max's series (because, without them COR fitler with file child breaks
backing chains).. And, it seems that it's better to discuss some
questions before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
    * BDRV_BLOCK_ZERO: offset reads as zero
    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
raw data
    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    *   layer rather than any backing, set by block
layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
formats which doesn't support backing. So, all such format drivers must
return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
example, iscsi - doesn't.


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


2. ZERO. The meaning differs a bit for generic block_status and for
drivers.. I think, we at least should document it like this:

BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at
this layer and read as ZERO. If generic block_status returns ZERO, it
only mean that it reads as zero, but the region may be allocated on
underlying level.


Hm.  What does that mean?

One of the problems is that “allocated” has two meanings:
(1) reading data returns data defined at this backing layer,
(2) actually allocated, i.e. takes up space on the file represented by
this BDS.

As far as I understand, we actually don’t care about (2) in the context
of block_status, but just about (1).

So if a layer returns ZERO, it is by definition (1)-allocated.  (It
isn’t necessarily (2)-allocated.)


3. bdi.unallocated_blocks_are_zero

I think it's very bad, that we have formats, that supports backing, but
doesn't report bdi.unallocated_blocks_are_zero as true. It means that
UNALLOCATED region reads as zero if we have short backing file, and not
as zero if we remove this short backing file.


What do you mean by “remove this short backing file”?  Because generally
one can’t just drop a backing file.

So maybe a case like block-stream?  Wouldn’t that be a bug in
block-stream them, i.e. shouldn’t it stream zeros after the end of the
backing file?


I can live with it but
this is weird logic. These bad drivers are qcow (not qcow2), parallels
and vmdk. I hope, they actually just forget to set
unallocated_blocks_are_zero to true.


qcow definitely sounds like it.


Next. But what about drivers which doesn't support backing? As we
considered above, they should always return ZERO or DATA, as everything
is allocated in this backing-chain level (last level, of course).. So
again unallocated_blocks_are_zero is meaningless. So, I think, that
driver which doesn't support backings, should be fixed to return always
ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all.


Agreed.


3.


The second 3.? :)


Short backing files in allocated_above: we must consider space after
EOF as ALLOCATED, if short backing file is inside requested
backing-chain part, as it produced exactly because of this short file
(and we never go to backing).


Sounds correct.


(current realization of allocated_above is
buggy, see my outdated series "[PATCH 0/4] fix & merge
block_status_above and is_allocated_above")

4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have
a backing chain of non-backing child.. I just remember that we didn't
reach the consensus.


Possible? :)


5. Filters.. OK we have two functions for them:
bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
think both are wrong:

bdrv_co_block_status_from_file leads to problem [4], when we can report
UNALLOCATED, which refers not to the current backing chain, but to sub
backing chain of file child, which is inconsistent with
block_status_above and is_allocated_above iteration.

bdrv_co_block_status_from_backing is also is not consistent with
block_status_above iteration.. At least at 

Re: backing chain & block status & filters

2020-04-29 Thread Vladimir Sementsov-Ogievskiy

Full summary

What drivers returns?

Filters and raw are just broken together with their BDRV_BLOCK_RAW.


Format drivers behaves as follows (except for backing-not-supporting, which 
should be fixed):

0 - go to backing (not-backing supporting will never return it, 
backing-supporting will return it even if there is not backing file, but read 
is guaranteed to return zeroes, if there is no backing file at the moment)
ZERO - zero at this level
DATA - data at this level
DATA | ZERO actually never returned.

Protocol drivers

0 - fs-unallocated, read may return any garbage (null-co, iscsi)

from SCSI:
Logical Block Provisioning Read Zeros (LBPRZ) bit
1 If the logical block provisioning read zeros (LBPRZ) bit is set to one, 
then, for an unmapped LBA specified by a read operation, the deviceserver shall 
send user data with all bits set to zero to the data-in buffer.
0 If the TPRZ bit is set to zero, then, for an unmapped LBA specified by a 
read operation, the device server may send user data with all bitsset to any 
value to the data-in buffer.

so, yes, this 0 actually have significant meaning. And null-co matches it too. 
And nbd should be fixed to match it, I think.

ZERO - fs-unallocated, reads as zero
DATA - fs-allocated data or safe default
ZERO | DATA - only nbd may return it, and it seems wrong. let's fix it.

So, seems, we may can with it as is. The only thing to be documented is meaning 
of zero status returned by driver:

for backing-supporting it means go-to-backing, with a guarantee to read zeroes 
if there is no backing file, for backing-not-supporting it means most probably 
not occupy disk space, read returns garbage. And format drivers without backing 
support should never return 0.

===

How it is used? And here we definitely fail. As there are a lot of places where 
0 return of drivers is misused: we consider it fs-unallocated, but it may be 
just go-to-backing, we consider it go-to-backing, but it may be fs-unallocated.

let's go again through our public API

= bdrv_block_status =

img-convert and bdrv_make_zero wants only zero and go-to-backing information
img-map is better to distinguish fs-unallocated and go-to-backing and report 
them in different way.

= bdrv_block_status_above =

most callers needs only zero and go-to-backing information, others are doing 
wrong things and should be rewritten anyway

= bdrv_is_allocated =

most callers need only go-to-backing information.

io-alloc io-map - are reporting utilities, and they probably want to show 
fs-unallocated as well... but we never documented, how actually img-map, io-map 
and io-alloc should work and what they are trying to say :). Anyway, they may 
use same interface as img-map

= bdrv_is_allocated_above =

callers only need go-to-backing information

OK, sounds good. So, for most things we only need zero/data/go-to-backing 
information. fs-unallocated should be treated as DATA, it's garbage, but on 
read we will directly read this garbage, so we should consider it DATA.

And only for reporting through img-map, io-alloc and io-map we may want 
fs-unallocated information.
Do we actually need it?

Basing on this, I think that there should be four block-status types:

ZERO: unrelated to backing, reads as zero from this layer
DATA: unrelated to backing, reads from this layer, may be non-zero
BACKING: go to backing for information, reads from backing as well
RAW: I'm a filter or 'raw' driver, I don't know what to do. I give you my child 
and offset in it, take care of it. Be careful: it may be backing child or not, 
don't break your backing-chain loop on me!

So we may require that at most one of DATA, ZERO, RAW is set. And if nothing is 
set it means BACKING. And if we want to report fs-unallocated things, we just 
need additional flag for it, like BDRV_FS_UNALLOCATED_GARBAGE, which may be 
combined with DATA type chunks.

Or, may be even better, to split type from flags: block_status will return 
type, which is one of these four types, and other things (RECURSE, EOF, 
OFFSET_VALID, FS_UNALLOCATED_GARBAGE) goes to additional flags out-argument.

Note also, that the only user of @map and @file parameters is img-map. And all 
other callers have to pass NULL, NULL. I think, this definitely should be 
refactored.


--
Best regards,
Vladimir



Re: backing chain & block status & filters

2020-04-29 Thread Vladimir Sementsov-Ogievskiy

28.04.2020 22:44, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 19:18, Eric Blake wrote:

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.


Here's some previous mails discussing the same question about what block_status 
should actually mean.  At the time, I was so scared of the prospect of 
something breaking if I changed things that I ended up keeping status quo, so 
here we are revisiting the topic several years later, still asking the same 
questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for such 
drivers

   ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, zero-status is
   unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.



Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure 
that it should be separated from backing-chain-allocated concept.



As a first step, I've don brief analysis of .bdrv_co_block_status of drivers 
(attached)



As a second step, here is brief analysis of all block_status usage

--
Best regards,
Vladimir
Public interface of block-status is:

bdrv_block_status
bdrv_block_status_above
bdrv_is_allocated
bdrv_is_allocated_above


= bdrv_block_status =

bdrv_make_zero: works on current level of backing-chain, want's to skip zeroes, 
not interested in @map and @file

img convert: convert_iteration_sectors: wants to distinguish ZERO, DATA and 
go-to-backing. It also tries to not write zeroes, if have short backing file, 
but does it a bit wrong. Treats unallocated as DATA if no backing.

img-map: get_block_status: distinguish ZERO, DATA and go-to-backing. Count 
depth of the backing. Just reports final ZERO and DATA. So, fs-unallocated 
thing is reported to user

= bdrv_block_status_above =

block-copy: block_copy_block_status: wants two things:
  
  1. skip go-to-backing holes in top layer for top mode
  2. do write_zero for ZERO areas

mirror: call on the whole backing chain
   - for DATA (and for DATA|ZERO which is bad) do just copy
   - for ZERO do just ZERO
   - for 0 (which means that bottom layer doesn't report that unallocated are 
zero) does DISCARD (which is most-probably zeroing) - absolutely wrong thing

qcow2: is_zero: call on the whole backing chain, want's just to check is 
reads-as-zero or not.

qcow2: qcow2_measure: call on the whole backing chain:
   - skip ZERO
   - count clusters with both DATA and ALLOCATED set. Hmm. ALLOCATED is always 
set for DATA. Seems the function actually tries to calculate disk 

Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 19:18, Eric Blake wrote:

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.


Here's some previous mails discussing the same question about what block_status 
should actually mean.  At the time, I was so scared of the prospect of 
something breaking if I changed things that I ended up keeping status quo, so 
here we are revisiting the topic several years later, still asking the same 
questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for such 
drivers

   ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, zero-status is
   unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.



Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure 
that it should be separated from backing-chain-allocated concept.



As a first step, I've don brief analysis of .bdrv_co_block_status of drivers 
(attached)

--
Best regards,
Vladimir
= Filter functions =
mirror,commit,backup filters: bdrv_co_block_status_from_backing
blklogwrites,compress,COR,throttle: bdrv_co_block_status_from_file

return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW is unclean, behavior is broken

blkdebug: blkdebug_co_block_status
- actually, uses bdrv_co_block_status_from_file,
after additional blkdebug-related things not influincing the result

= raw =
raw: raw_co_block_status
return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW is unclean, behavior is broken

Summary: we need to fix BDRV_BLOCK_RAW-recursion semantics to not interfere 
with block_status_above/is_allocated_above loops.

= Format drivers with supports_backing = true =

qed: bdrv_qed_co_block_status
bdi->unallocated_blocks_are_zero = true;

0 - go to backing
ZERO - metadata-zero
DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data
 
parallels: parallels_co_block_status
unallocated_blocks_are_zero is unset, but they are actually read as zero if 
no backing

0 - go to backing
DATA | OFFSET_VALID with @map set and @file = file-child : 
metadat-allocated-data

qcow2: qcow2_co_block_status
bdi->unallocated_blocks_are_zero = true;

ZERO | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-zero
DATA | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-data
RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data (hint, that 

Re: backing chain & block status & filters

2020-04-28 Thread Kevin Wolf
Am 28.04.2020 um 18:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 28.04.2020 19:18, Eric Blake wrote:
> > On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:
> > 
> > > > > Hm.  I could imagine that there are formats that have non-zero holes
> > > > > (e.g. 0xff or just garbage).  It would be a bit wrong for them to 
> > > > > return
> > > > > ZERO or DATA then.
> > > > > 
> > > > > But OTOH we don’t care about such cases, do we?  We need to know 
> > > > > whether
> > > > > ranges are zero, data, or unallocated.  If they aren’t zero, we only
> > > > > care about whether reading from it will return data from this layer 
> > > > > or not.
> > > > > 
> > > > > So I suppose that anything that doesn’t support backing files (or
> > > > > filtered children) should always return ZERO and/or DATA.
> > > > 
> > > > I'm not sure I agree with the notion that everything should be
> > > > BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
> > > > at least. If we want to change this, we will have to check all callers
> > > > of bdrv_is_allocated() and friends who might use this to find holes in
> > > > the file.
> > > 
> > > Yes. Because they are doing incorrect (or at least undocumented and 
> > > unreliable) thing.
> > 
> > Here's some previous mails discussing the same question about what 
> > block_status should actually mean.  At the time, I was so scared of the 
> > prospect of something breaking if I changed things that I ended up keeping 
> > status quo, so here we are revisiting the topic several years later, still 
> > asking the same questions.
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
> > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html
> > 
> > > 
> > > > 
> > > > Basically, the way bdrv_is_allocated() works today is that we assume an
> > > > implicit zeroed backing layer even for block drivers that don't support
> > > > backing files.
> > > 
> > > But read doesn't work so: it will read data from the bottom layer, not 
> > > from
> > > this implicit zeroed backing layer. And it is inconsistent. On read data
> > > comes exactly from this layer, not from its implicit backing. So it should
> > > return BDRV_BLOCK_ALLOCATED, accordingly to its definition..
> > > 
> > > Or, we should at least document current behavior:
> > > 
> > >    BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> > >    layer rather than any backing, set by block. Attention: it may not be 
> > > set
> > >    for drivers without backing support, still data is of course read from
> > >    this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
> > >    allocation on fs level, which occupies real space on disk.. So, for 
> > > such drivers
> > > 
> > >    ZERO | ALLOCATED means that, read as zero, data may be allocated on 
> > > fs, or
> > >    (most probably) not,
> > >    don't look at ALLOCATED flag, as it is added by generic layer for 
> > > another logic,
> > >    not related to fs-allocation.
> > > 
> > >    0 means that, most probably, data doesn't occupy space on fs, 
> > > zero-status is
> > >    unknown (most probably non-zero)
> > > 
> > 
> > That may be right in describing the current situation, but again,
> > needs a GOOD audit of what we are actually using it for, and whether
> > it is what we really WANT to be using it for.  If we're going to
> > audit/refactor the code, we might as well get semantics that are
> > actually useful, rather than painfully contorted to documentation
> > that happens to match our current contorted code.
> > 
> 
> Honest enough:) I'll try to make a table.
> 
> I don't think that reporting fs-allocation status is a bad thing. But
> I'm sure that it should be separated from backing-chain-allocated
> concept.

I think we could easily agree on what would be a good concept.

My concern is just that existing code probably uses existing semantics
and not what we consider more logical now. So if we change it, we must
make sure that we change all places that expect the old semantics.

Kevin




Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

28.04.2020 19:18, Eric Blake wrote:

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.


Here's some previous mails discussing the same question about what block_status 
should actually mean.  At the time, I was so scared of the prospect of 
something breaking if I changed things that I ended up keeping status quo, so 
here we are revisiting the topic several years later, still asking the same 
questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for such 
drivers

   ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, zero-status is
   unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.



Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure 
that it should be separated from backing-chain-allocated concept.

--
Best regards,
Vladimir



Re: backing chain & block status & filters

2020-04-28 Thread Eric Blake

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer 
or not.


So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and 
unreliable) thing.


Here's some previous mails discussing the same question about what 
block_status should actually mean.  At the time, I was so scared of the 
prospect of something breaking if I changed things that I ended up 
keeping status quo, so here we are revisiting the topic several years 
later, still asking the same questions.


https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be 
set

   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for 
such drivers


   ZERO | ALLOCATED means that, read as zero, data may be allocated on 
fs, or

   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for 
another logic,

   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, 
zero-status is

   unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs 
a GOOD audit of what we are actually using it for, and whether it is 
what we really WANT to be using it for.  If we're going to 
audit/refactor the code, we might as well get semantics that are 
actually useful, rather than painfully contorted to documentation that 
happens to match our current contorted code.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

28.04.2020 14:28, Kevin Wolf wrote:

Am 28.04.2020 um 13:08 hat Max Reitz geschrieben:

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
is_allocated_above", and returned to all the inconsistencies about
block-status. I keep in mind Max's series about child-access functions,
and Andrey's work about using COR filter in block-stream, which depends
on Max's series (because, without them COR fitler with file child breaks
backing chains).. And, it seems that it's better to discuss some
questions before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
    * BDRV_BLOCK_ZERO: offset reads as zero
    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
raw data
    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    *   layer rather than any backing, set by block
layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
formats which doesn't support backing. So, all such format drivers must
return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
example, iscsi - doesn't.


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.



Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

  BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
  layer rather than any backing, set by block. Attention: it may not be set
  for drivers without backing support, still data is of course read from
  this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
  allocation on fs level, which occupies real space on disk.. So, for such 
drivers

  ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
  (most probably) not,
  don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
  not related to fs-allocation.

  0 means that, most probably, data doesn't occupy space on fs, zero-status is
  unknown (most probably non-zero)

 


--
Best regards,
Vladimir



Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

28.04.2020 14:08, Max Reitz wrote:

On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
is_allocated_above", and returned to all the inconsistencies about
block-status. I keep in mind Max's series about child-access functions,
and Andrey's work about using COR filter in block-stream, which depends
on Max's series (because, without them COR fitler with file child breaks
backing chains).. And, it seems that it's better to discuss some
questions before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
    * BDRV_BLOCK_ZERO: offset reads as zero
    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
raw data
    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    *   layer rather than any backing, set by block
layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
formats which doesn't support backing. So, all such format drivers must
return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
example, iscsi - doesn't.


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


2. ZERO. The meaning differs a bit for generic block_status and for
drivers.. I think, we at least should document it like this:

BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at
this layer and read as ZERO. If generic block_status returns ZERO, it
only mean that it reads as zero, but the region may be allocated on
underlying level.


Hm.  What does that mean?

One of the problems is that “allocated” has two meanings:
(1) reading data returns data defined at this backing layer,
(2) actually allocated, i.e. takes up space on the file represented by
this BDS.

As far as I understand, we actually don’t care about (2) in the context
of block_status, but just about (1).

So if a layer returns ZERO, it is by definition (1)-allocated.  (It
isn’t necessarily (2)-allocated.)


3. bdi.unallocated_blocks_are_zero

I think it's very bad, that we have formats, that supports backing, but
doesn't report bdi.unallocated_blocks_are_zero as true. It means that
UNALLOCATED region reads as zero if we have short backing file, and not
as zero if we remove this short backing file.


What do you mean by “remove this short backing file”?  Because generally
one can’t just drop a backing file.

So maybe a case like block-stream?  Wouldn’t that be a bug in
block-stream them, i.e. shouldn’t it stream zeros after the end of the
backing file?


I can live with it but
this is weird logic. These bad drivers are qcow (not qcow2), parallels
and vmdk. I hope, they actually just forget to set
unallocated_blocks_are_zero to true.


qcow definitely sounds like it.


Next. But what about drivers which doesn't support backing? As we
considered above, they should always return ZERO or DATA, as everything
is allocated in this backing-chain level (last level, of course).. So
again unallocated_blocks_are_zero is meaningless. So, I think, that
driver which doesn't support backings, should be fixed to return always
ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all.


Agreed.


3.


The second 3.? :)


Short backing files in allocated_above: we must consider space after
EOF as ALLOCATED, if short backing file is inside requested
backing-chain part, as it produced exactly because of this short file
(and we never go to backing).


Sounds correct.


(current realization of allocated_above is
buggy, see my outdated series "[PATCH 0/4] fix & merge
block_status_above and is_allocated_above")

4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have
a backing chain of non-backing child.. I just remember that we didn't
reach the consensus.


Possible? :)


5. Filters.. OK we have two functions for them:
bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
think both are wrong:

bdrv_co_block_status_from_file leads to problem [4], when we can report
UNALLOCATED, which refers not to the current backing chain, but to sub
backing chain of file child, which is inconsistent with
block_status_above and is_allocated_above iteration.

bdrv_co_block_status_from_backing is also is not consistent with
block_status_above iteration.. At least at leads to querying the same
node twice.


Well, yes.


So, 

Re: backing chain & block status & filters

2020-04-28 Thread Kevin Wolf
Am 28.04.2020 um 13:08 hat Max Reitz geschrieben:
> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:
> > Hi!
> > 
> > I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
> > is_allocated_above", and returned to all the inconsistencies about
> > block-status. I keep in mind Max's series about child-access functions,
> > and Andrey's work about using COR filter in block-stream, which depends
> > on Max's series (because, without them COR fitler with file child breaks
> > backing chains).. And, it seems that it's better to discuss some
> > questions before resending.
> > 
> > First, problems about block-status:
> > 
> > 1. We consider ALLOCATED = ZERO | DATA, and documented as follows:
> > 
> >    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> >    * BDRV_BLOCK_ZERO: offset reads as zero
> >    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
> > raw data
> >    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> >    *   layer rather than any backing, set by block
> > layer
> > 
> > This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
> > formats which doesn't support backing. So, all such format drivers must
> > return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
> > example, iscsi - doesn't.
> 
> Hm.  I could imagine that there are formats that have non-zero holes
> (e.g. 0xff or just garbage).  It would be a bit wrong for them to return
> ZERO or DATA then.
> 
> But OTOH we don’t care about such cases, do we?  We need to know whether
> ranges are zero, data, or unallocated.  If they aren’t zero, we only
> care about whether reading from it will return data from this layer or not.
> 
> So I suppose that anything that doesn’t support backing files (or
> filtered children) should always return ZERO and/or DATA.

I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.

Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.

Kevin


signature.asc
Description: PGP signature


Re: backing chain & block status & filters

2020-04-28 Thread Max Reitz
On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and
> is_allocated_above", and returned to all the inconsistencies about
> block-status. I keep in mind Max's series about child-access functions,
> and Andrey's work about using COR filter in block-stream, which depends
> on Max's series (because, without them COR fitler with file child breaks
> backing chains).. And, it seems that it's better to discuss some
> questions before resending.
> 
> First, problems about block-status:
> 
> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows:
> 
>    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
>    * BDRV_BLOCK_ZERO: offset reads as zero
>    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing
> raw data
>    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>    *   layer rather than any backing, set by block
> layer
> 
> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for
> formats which doesn't support backing. So, all such format drivers must
> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for
> example, iscsi - doesn't.

Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.

> 2. ZERO. The meaning differs a bit for generic block_status and for
> drivers.. I think, we at least should document it like this:
> 
> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at
> this layer and read as ZERO. If generic block_status returns ZERO, it
> only mean that it reads as zero, but the region may be allocated on
> underlying level.

Hm.  What does that mean?

One of the problems is that “allocated” has two meanings:
(1) reading data returns data defined at this backing layer,
(2) actually allocated, i.e. takes up space on the file represented by
this BDS.

As far as I understand, we actually don’t care about (2) in the context
of block_status, but just about (1).

So if a layer returns ZERO, it is by definition (1)-allocated.  (It
isn’t necessarily (2)-allocated.)

> 3. bdi.unallocated_blocks_are_zero
> 
> I think it's very bad, that we have formats, that supports backing, but
> doesn't report bdi.unallocated_blocks_are_zero as true. It means that
> UNALLOCATED region reads as zero if we have short backing file, and not
> as zero if we remove this short backing file.

What do you mean by “remove this short backing file”?  Because generally
one can’t just drop a backing file.

So maybe a case like block-stream?  Wouldn’t that be a bug in
block-stream them, i.e. shouldn’t it stream zeros after the end of the
backing file?

> I can live with it but
> this is weird logic. These bad drivers are qcow (not qcow2), parallels
> and vmdk. I hope, they actually just forget to set
> unallocated_blocks_are_zero to true.

qcow definitely sounds like it.

> Next. But what about drivers which doesn't support backing? As we
> considered above, they should always return ZERO or DATA, as everything
> is allocated in this backing-chain level (last level, of course).. So
> again unallocated_blocks_are_zero is meaningless. So, I think, that
> driver which doesn't support backings, should be fixed to return always
> ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all.

Agreed.

> 3.

The second 3.? :)

> Short backing files in allocated_above: we must consider space after
> EOF as ALLOCATED, if short backing file is inside requested
> backing-chain part, as it produced exactly because of this short file
> (and we never go to backing).

Sounds correct.

> (current realization of allocated_above is
> buggy, see my outdated series "[PATCH 0/4] fix & merge
> block_status_above and is_allocated_above")
> 
> 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have
> a backing chain of non-backing child.. I just remember that we didn't
> reach the consensus.

Possible? :)

> 5. Filters.. OK we have two functions for them:
> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I
> think both are wrong:
> 
> bdrv_co_block_status_from_file leads to problem [4], when we can report
> UNALLOCATED, which refers not to the current backing chain, but to sub
> backing chain of file child, which is inconsistent with
> block_status_above and is_allocated_above iteration.
> 
> bdrv_co_block_status_from_backing is also is not consistent with
> 

backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and 
is_allocated_above", and returned to all the inconsistencies about block-status. I keep 
in mind Max's series about child-access functions, and Andrey's work about using COR filter 
in block-stream, which depends on Max's series (because, without them COR fitler with file 
child breaks backing chains).. And, it seems that it's better to discuss some questions 
before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

   * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
   * BDRV_BLOCK_ZERO: offset reads as zero
   * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   *   layer rather than any backing, set by block layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for 
formats which doesn't support backing. So, all such format drivers must return 
ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for example, 
iscsi - doesn't.

2. ZERO. The meaning differs a bit for generic block_status and for drivers.. I 
think, we at least should document it like this:

BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at this 
layer and read as ZERO. If generic block_status returns ZERO, it only mean that 
it reads as zero, but the region may be allocated on underlying level.

3. bdi.unallocated_blocks_are_zero

I think it's very bad, that we have formats, that supports backing, but doesn't 
report bdi.unallocated_blocks_are_zero as true. It means that UNALLOCATED 
region reads as zero if we have short backing file, and not as zero if we 
remove this short backing file. I can live with it but this is weird logic. 
These bad drivers are qcow (not qcow2), parallels and vmdk. I hope, they 
actually just forget to set unallocated_blocks_are_zero to true.

Next. But what about drivers which doesn't support backing? As we considered 
above, they should always return ZERO or DATA, as everything is allocated in 
this backing-chain level (last level, of course).. So again 
unallocated_blocks_are_zero is meaningless. So, I think, that driver which 
doesn't support backings, should be fixed to return always ZERO or DATA, than 
we don't need this unallocated_blocks_are_zero at all.

3. Short backing files in allocated_above: we must consider space after EOF as ALLOCATED, if 
short backing file is inside requested backing-chain part, as it produced exactly because of 
this short file (and we never go to backing). (current realization of allocated_above is 
buggy, see my outdated series "[PATCH 0/4] fix & merge block_status_above and 
is_allocated_above")

4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have a 
backing chain of non-backing child.. I just remember that we didn't reach the 
consensus.

5. Filters.. OK we have two functions for them: bdrv_co_block_status_from_file 
and bdrv_co_block_status_from_backing. I think both are wrong:

bdrv_co_block_status_from_file leads to problem [4], when we can report 
UNALLOCATED, which refers not to the current backing chain, but to sub backing 
chain of file child, which is inconsistent with block_status_above and 
is_allocated_above iteration.

bdrv_co_block_status_from_backing is also is not consistent with 
block_status_above iteration.. At least at leads to querying the same node 
twice.

=

So, about filters and backing chains. Keeping (OK, just, trying to keep) all these things in mind, 
I think that it's better to keep backing chains exactly *backing* chains, so that 
"backing" child is the only "not own" child of the node. So, its close to 
current behavior and we don't need child-access functions. Then how filters should work:

Filter with backing child, should always return UNALLOCATED (i.e. no DATA, no 
ZERO), it is honest: everything is on the other level of backing chain.

Filter with file child should always return BDRV_BLOCK_DATA | 
BDRV_BLOCK_RECURSE, to show that:
1. everything is allocated in *this* level of backing chain
2. filter is too lazy to dig in it's file child (and, maybe the whole sub-tree 
of it) and asks generic layer to do it by itself, if it wants zeroes.

Then, of course, if we want some filter to be inside backing chain, it should have not 
"file" child but "backing". For this, we may support in current public filter 
both variants: backing or file, as user prefer. I.e., filter is opened either with file option or 
with backing and operate correspondingly. And newer filters (like backup-top) may support only 
backing variants.

=

So, I propose to complicate a bit code of old file-based filters, to support 
backing child (which may be done on demand, when needed. For example, Virtuozzo 
now need