Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-19 Thread Max Reitz
On 18.02.20 21:55, Eric Blake wrote:
> On 2/17/20 9:13 AM, Max Reitz wrote:
>> Hi,
>>
>> It’s my understanding that without some is_zero infrastructure for QEMU,
>> it’s impossible to implement this flag in qemu’s NBD server.
> 
> You're right that we may need some more infrastructure before being able
> to decide when to report this bit in all cases.  But for raw files, that
> infrastructure already exists: ask block_status at offset 0, with the
> entire image as length, whether it reports that the entire file is a hole.

Hm, except that only works if the file is just completely unallocated.
Calling that existing infrastructure is a bit of a stretch, I think.

Or are you saying that bdrv_co_block_status(..., 0, bdrv_getlength(),
...) is our existing infrastructure?  Actually, why not.  Can we make
block drivers catch that special case?  (Or might the generic block code
somehow truncate such requests?)
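
For raw files, the probe being discussed essentially reduces to an
lseek(SEEK_DATA) check at offset 0.  A minimal standalone sketch of that
idea (not qemu code; it assumes a filesystem with SEEK_DATA/SEEK_HOLE
support):

/* Minimal standalone sketch (not qemu code): if lseek(SEEK_DATA) finds no
 * data extent at or after offset 0, the whole file is a hole and therefore
 * reads as zeroes. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool raw_file_is_all_hole(int fd)
{
    off_t data = lseek(fd, 0, SEEK_DATA);
    if (data == (off_t)-1 && errno == ENXIO) {
        return true;    /* no data anywhere: the file is entirely a hole */
    }
    return false;       /* found data, or SEEK_DATA is unsupported */
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    printf("%s\n", raw_file_is_all_hole(fd)
                   ? "all hole (reads as zero)"
                   : "has data (or unknown)");
    close(fd);
    return 0;
}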

> And
> for qcow2 files, it would not be that hard to teach a similar
> block_status request to report the entire image as a hole based on my
> proposed qcow2 autoclear bit tracking that the image still reads as zero.
> 
>>
>> At the same time, I still haven’t understood what we need the flag for.
>>
>> As far as I understood in our discussion on your qemu series, there is
>> no case where anyone would need to know whether an image is zero.  All
>> practical cases involve someone having to ensure that some image is
>> zero.  Knowing whether an image is zero can help with that, but that can
>> be an implementation detail.
>>
>> For qcow2, the idea would be that there is some flag that remains true
>> as long as the image is guaranteed to be zero.  Then we’d have some
>> bdrv_make_zero function, and qcow2’s implementation would use this
>> information to gauge whether there’s something to do at all.
>>
>> For NBD, we cannot use this idea directly because to implement such a
>> flag (as you’re describing in this mail), we’d need separate is_zero
>> infrastructure, and that kind of makes the point of “drivers’
>> bdrv_make_zero() implementations do the right thing by themselves” moot.
> 
> We don't necessarily need a separate is_zero infrastructure if we can
> instead teach the existing block_status infrastructure to report that
> the entire image reads as zero.  You're right that clients that need to
> force an entire image to be zero won't need to directly call
> block_status (they can just call bdrv_make_zero, and let that worry
> about whether a block status call makes sense among its list of steps to
> try).  But since block_status can report all-zero status for some cases,
> it's not hard to use that for feeding the NBD bit.

OK.  I’m not 100% sure there’s nothing that would bite us in the butt
here, but I seem to remember you made all block_status things 64-bit, so
I suppose you know. :)

> However, there's a difference between qemu's block status (which is
> already typed correctly to return a 64-bit answer, even if it may need a
> few tweaks for clients that currently don't expect it to request more
> than 32 bits) and NBD's block status (which can only report 32 bits
> barring a new extension to the protocol), and where a single all-zero
> bit at NBD_OPT_GO is just as easy of an extension as a way to report a
> 64-bit all-zero response to NBD_CMD_BLOCK_STATUS.

Agreed.

>> OTOH, we wouldn’t need such a flag for the implementation, because we
>> could just send a 64-bit discard/make_zero over the whole block device
>> length to the NBD server, and then the server internally does the right
>> thing(TM).  AFAIU discard and write_zeroes currently have only 32 bit
>> length fields, but there were plans for adding support for 64 bit
>> versions anyway.  From my naïve outsider perspective, doing that doesn’t
>> seem a more complicated protocol addition than adding some way to tell
>> whether an NBD export is zero.
> 
> Adding 64-bit commands to NBD is more invasive than adding a single
> startup status bit.

True.  But if we/you want 64-bit commands anyway, then it doesn’t really
matter what’s more invasive.

> Both ideas can be done - doing one does not
> preclude the other.

Absolutely.  It’s just that if you do one anyway and it supersedes the
other, then we don’t have to do both.  Hence me wondering whether one
does supersede the other.

> But at the same time, not all servers will
> implement both ideas - if one is easy to implement while the other is
> hard, it is not unlikely that qemu will still encounter NBD servers that
> advertise startup state but not support 64-bit make_zero (even if qemu
> as NBD server starts supporting 64-bit make zero) or even 64-bit block
> status results.

Hm.  You know better than me whether that’s a good argument, because it
mostly depends on how many NBD server implementations there are;
specifically whether there are any that are decidedly not feature-complete.

> Another thing to think about here is timing.  With the proposed NBD
> addition, it is the server telling the 

Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-18 Thread Eric Blake

On 2/17/20 9:13 AM, Max Reitz wrote:

Hi,

It’s my understanding that without some is_zero infrastructure for QEMU,
it’s impossible to implement this flag in qemu’s NBD server.


You're right that we may need some more infrastructure before being able 
to decide when to report this bit in all cases.  But for raw files, that 
infrastructure already exists: ask block_status at offset 0, with the 
entire image as length, whether it reports that the entire file is a hole. 
And for qcow2 files, it would not be that hard to teach a similar 
block_status request to report the entire image as a hole based on my 
proposed qcow2 autoclear bit tracking that the image still reads as zero.




At the same time, I still haven’t understood what we need the flag for.

As far as I understood in our discussion on your qemu series, there is
no case where anyone would need to know whether an image is zero.  All
practical cases involve someone having to ensure that some image is
zero.  Knowing whether an image is zero can help with that, but that can
be an implementation detail.

For qcow2, the idea would be that there is some flag that remains true
as long as the image is guaranteed to be zero.  Then we’d have some
bdrv_make_zero function, and qcow2’s implementation would use this
information to gauge whether there’s something to do at all.

For NBD, we cannot use this idea directly because to implement such a
flag (as you’re describing in this mail), we’d need separate is_zero
infrastructure, and that kind of makes the point of “drivers’
bdrv_make_zero() implementations do the right thing by themselves” moot.


We don't necessarily need a separate is_zero infrastructure if we can 
instead teach the existing block_status infrastructure to report that 
the entire image reads as zero.  You're right that clients that need to 
force an entire image to be zero won't need to directly call 
block_status (they can just call bdrv_make_zero, and let that worry 
about whether a block status call makes sense among its list of steps to 
try).  But since block_status can report all-zero status for some cases, 
it's not hard to use that for feeding the NBD bit.


However, there's a difference between qemu's block status (which is 
already typed correctly to return a 64-bit answer, even if it may need a 
few tweaks for clients that currently don't expect it to request more 
than 32 bits) and NBD's block status (which can only report 32 bits 
barring a new extension to the protocol), and where a single all-zero 
bit at NBD_OPT_GO is just as easy of an extension as a way to report a 
64-bit all-zero response to NBD_CMD_BLOCK_STATUS.




OTOH, we wouldn’t need such a flag for the implementation, because we
could just send a 64-bit discard/make_zero over the whole block device
length to the NBD server, and then the server internally does the right
thing(TM).  AFAIU discard and write_zeroes currently have only 32 bit
length fields, but there were plans for adding support for 64 bit
versions anyway.  From my naïve outsider perspective, doing that doesn’t
seem a more complicated protocol addition than adding some way to tell
whether an NBD export is zero.


Adding 64-bit commands to NBD is more invasive than adding a single 
startup status bit.  Both ideas can be done - doing one does not 
preclude the other.  But at the same time, not all servers will 
implement both ideas - if one is easy to implement while the other is 
hard, it is not unlikely that qemu will still encounter NBD servers that 
advertise startup state but not support 64-bit make_zero (even if qemu 
as NBD server starts supporting 64-bit make zero) or even 64-bit block 
status results.


Another thing to think about here is timing.  With the proposed NBD 
addition, it is the server telling the client that "the image you are 
connecting to started zero", prior to the point that the client even has 
a chance to request "can you make the image all zero in a quick manner 
(and if not, I'll fall back to writing zeroes as I go)".  And even if 
NBD gains a 64-bit block status and/or make zero command, it is still 
less network traffic for the server to advertise up-front that the image 
is all zero than it is for the client to have to issue command requests 
of the server (network traffic is not always the bottleneck, but it can 
be a consideration).




So I’m still wondering whether there are actually cases where we need to
tell whether some image or NBD export is zero that do not involve making
it zero if it isn’t.


Just because we don't think that qemu-img has such a case does not mean 
that other NBD clients will not be able to come up with some use for 
knowing if an image starts all zero.




(I keep asking because it seems to me that if all we ever really want to
do is to ensure that some images/exports are zero, we should implement
that.)


The problem is WHERE do you implement it.  Is it more efficient to 
implement make_zero in the NBD server (the client merely requests to 
make 

Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-17 Thread Max Reitz
Hi,

It’s my understanding that without some is_zero infrastructure for QEMU,
it’s impossible to implement this flag in qemu’s NBD server.

At the same time, I still haven’t understood what we need the flag for.

As far as I understood in our discussion on your qemu series, there is
no case where anyone would need to know whether an image is zero.  All
practical cases involve someone having to ensure that some image is
zero.  Knowing whether an image is zero can help with that, but that can
be an implementation detail.

For qcow2, the idea would be that there is some flag that remains true
as long as the image is guaranteed to be zero.  Then we’d have some
bdrv_make_zero function, and qcow2’s implementation would use this
information to gauge whether there’s something to do at all.
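
For illustration, here is a toy model of that idea with hypothetical
names (deliberately not the qemu API): a sticky known-all-zero flag lets
make_zero become a no-op in the common case.

/* Toy model, hypothetical names only. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct Image {
    bool all_zero;            /* analogue of the proposed qcow2 autoclear bit */
    unsigned char data[16];   /* stand-in for the guest-visible contents */
} Image;

/* Any write that might store non-zero data drops the flag. */
static void image_write(Image *img, size_t off, const void *buf, size_t len)
{
    memcpy(img->data + off, buf, len);
    img->all_zero = false;
}

/* The make_zero idea: if the image is already known to read as zero,
 * there is nothing to do; otherwise fall back to actually zeroing. */
static void image_make_zero(Image *img)
{
    if (img->all_zero) {
        return;                              /* fast path: nothing to do */
    }
    memset(img->data, 0, sizeof img->data);  /* slow path: real work */
    img->all_zero = true;
}

int main(void)
{
    Image img = { .all_zero = true };
    image_make_zero(&img);           /* no-op, flag still set */
    image_write(&img, 0, "x", 1);    /* flag dropped */
    image_make_zero(&img);           /* real zeroing happens here */
    printf("all_zero=%d byte0=%d\n", img.all_zero, img.data[0]);
    return 0;
}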

For NBD, we cannot use this idea directly because to implement such a
flag (as you’re describing in this mail), we’d need separate is_zero
infrastructure, and that kind of makes the point of “drivers’
bdrv_make_zero() implementations do the right thing by themselves” moot.

OTOH, we wouldn’t need such a flag for the implementation, because we
could just send a 64-bit discard/make_zero over the whole block device
length to the NBD server, and then the server internally does the right
thing(TM).  AFAIU discard and write_zeroes currently have only 32 bit
length fields, but there were plans for adding support for 64 bit
versions anyway.  From my naïve outsider perspective, doing that doesn’t
seem a more complicated protocol addition than adding some way to tell
whether an NBD export is zero.

So I’m still wondering whether there are actually cases where we need to
tell whether some image or NBD export is zero that do not involve making
it zero if it isn’t.

(I keep asking because it seems to me that if all we ever really want to
do is to ensure that some images/exports are zero, we should implement
that.)

Max





Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-12 Thread Eric Blake

On 2/12/20 6:36 AM, Richard W.M. Jones wrote:


Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE
(entire image is sparse, nothing is allocated) and NBD_INIT_ZERO
(entire image reads as zero), and save any future bits for later
additions.  Do we think that 16 bits is sufficient for the amount of
initial information likely to be exposed?


So as I understand the proposal, the 16 bit limit comes about because
we want a round 4 byte reply, 16 bits are used by NBD_INFO_INIT_STATE
and that leaves 16 bits for feature bits.  Therefore the only way to go
from there is to have 32 feature bits but an awkward unaligned 6 byte
structure, or 48 feature bits (8 byte structure).


In general, the NBD protocol has NOT focused on alignment issues (for 
good or for bad).  For example, NBD_INFO_BLOCK_SIZE is 18 bytes; all 
NBD_CMD_* 32-bit requests are 28 bytes except for NBD_CMD_WRITE which 
can send unaligned payload with no further padding, and so forth.
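
For reference, the payload under discussion would look roughly like the
sketch below; the info-type value and bit assignments are illustrative
only and not taken from any released spec:

/* Sketch of the proposed NBD_REP_INFO payload for NBD_OPT_GO: the standard
 * 16-bit information type followed by 16 bits of init-state flags, i.e. a
 * round 4-byte structure.  All values here are placeholders. */
#include <stdint.h>

#define NBD_INFO_INIT_STATE  4         /* illustrative; no value is assigned yet */
#define NBD_INIT_SPARSE      (1 << 0)  /* proposed: image has at least one hole */
#define NBD_INIT_ZERO        (1 << 1)  /* proposed: entire image reads as zero */

struct nbd_info_init_state {
    uint16_t info;    /* NBD_INFO_INIT_STATE, big-endian on the wire */
    uint16_t flags;   /* NBD_INIT_* bits, big-endian on the wire */
} __attribute__((packed));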




I guess given those constraints we can stick with 16 feature bits, and
if we ever needed more then we'd have to introduce NBD_INFO_INIT_STATE2.

The only thing I can think of which might be useful is a "fully
preallocated" bit which might be used as an indication that writes are
fast and are unlikely to fail with ENOSPC.


and which would be mutually exclusive with NBD_INIT_SPARSE (except for 
an image of size 0).  That bit would ALSO be an indication that the user 
may not want to punch holes into the image, but rather preserve the 
fully-allocated state (and thus avoid NBD_CMD_TRIM, and pass 
NBD_CMD_FLAG_NO_HOLE with any WRITE_ZEROES request).
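
As an illustration from the client side (using libnbd purely as an
example client library; the thread does not assume one), a client that
wants to keep the destination fully allocated would write zeroes with
the NO_HOLE flag and never issue a trim:

/* Sketch only: zero an export without punching holes.  The URI and chunk
 * size are made up; a real client would also check that the server
 * supports write-zeroes before relying on it. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <libnbd.h>

int main(void)
{
    struct nbd_handle *nbd = nbd_create();
    if (!nbd || nbd_connect_uri(nbd, "nbd://localhost/export") == -1) {
        fprintf(stderr, "%s\n", nbd_get_error());
        exit(EXIT_FAILURE);
    }

    int64_t size = nbd_get_size(nbd);
    const uint32_t chunk = 32 * 1024 * 1024;   /* stay under 32-bit limits */

    for (int64_t off = 0; off < size; off += chunk) {
        uint32_t n = (size - off < chunk) ? (uint32_t)(size - off) : chunk;
        /* NO_HOLE asks the server to allocate the zeroes rather than punch
         * holes, preserving a fully-preallocated image. */
        if (nbd_zero(nbd, n, off, LIBNBD_CMD_FLAG_NO_HOLE) == -1) {
            fprintf(stderr, "%s\n", nbd_get_error());
            exit(EXIT_FAILURE);
        }
    }

    nbd_close(nbd);
    return 0;
}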





Are we in agreement that
my addition of an NBD_INFO_ response to NBD_OPT_GO is the best way
to expose initial state bits?


Seems reasonable to me.

Rich.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-12 Thread Richard W.M. Jones


On Wed, Feb 12, 2020 at 06:09:11AM -0600, Eric Blake wrote:
> On 2/12/20 1:27 AM, Wouter Verhelst wrote:
> >Hi,
> >
> >On Mon, Feb 10, 2020 at 10:52:55PM +, Richard W.M. Jones wrote:
> >>But anyway ... could a flag indicating that the whole image is sparse
> >>be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
> >>could use it to avoid an initial disk trim, which is something that
> >>mke2fs does:
> >
> >Yeah, I think that could definitely be useful. I honestly can't see a
> >use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
> >it's generally useful to have a feature if we can't think of a use case
> >for it (that creates added complexity for no benefit).
> >
> >If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
> >this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
> >something) that says "the whole image is sparse". Otherwise, I think we
> >should redefine NBD_INIT_SPARSE to say that.
> 
> Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE
> (entire image is sparse, nothing is allocated) and NBD_INIT_ZERO
> (entire image reads as zero), and save any future bits for later
> additions.  Do we think that 16 bits is sufficient for the amount of
> initial information likely to be exposed?

So as I understand the proposal, the 16 bit limit comes about because
we want a round 4 byte reply, 16 bits are used by NBD_INFO_INIT_STATE
and that leaves 16 bits for feature bits.  Therefore the only way to go
from there is to have 32 feature bits but an awkward unaligned 6 byte
structure, or 48 feature bits (8 byte structure).

I guess given those constraints we can stick with 16 feature bits, and
if we ever needed more then we'd have to introduce NBD_INFO_INIT_STATE2.

The only thing I can think of which might be useful is a "fully
preallocated" bit which might be used as an indication that writes are
fast and are unlikely to fail with ENOSPC.

> Are we in agreement that
> my addition of an NBD_INFO_ response to NBD_OPT_GO is the best way
> to expose initial state bits?

Seems reasonable to me.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-12 Thread Eric Blake

On 2/12/20 1:27 AM, Wouter Verhelst wrote:

Hi,

On Mon, Feb 10, 2020 at 10:52:55PM +, Richard W.M. Jones wrote:

But anyway ... could a flag indicating that the whole image is sparse
be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
could use it to avoid an initial disk trim, which is something that
mke2fs does:


Yeah, I think that could definitely be useful. I honestly can't see a
use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
it's generally useful to have a feature if we can't think of a use case
for it (that creates added complexity for no benefit).

If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
something) that says "the whole image is sparse". Otherwise, I think we
should redefine NBD_INIT_SPARSE to say that.


Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE (entire 
image is sparse, nothing is allocated) and NBD_INIT_ZERO (entire image 
reads as zero), and save any future bits for later additions.  Do we 
think that 16 bits is sufficient for the amount of initial information 
likely to be exposed?  Are we in agreement that my addition of an 
NBD_INFO_ response to NBD_OPT_GO is the best way to expose initial state 
bits?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-11 Thread Wouter Verhelst
Hi,

On Mon, Feb 10, 2020 at 10:52:55PM +, Richard W.M. Jones wrote:
> But anyway ... could a flag indicating that the whole image is sparse
> be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
> could use it to avoid an initial disk trim, which is something that
> mke2fs does:

Yeah, I think that could definitely be useful. I honestly can't see a
use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
it's generally useful to have a feature if we can't think of a use case
for it (that creates added complexity for no benefit).

If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
something) that says "the whole image is sparse". Otherwise, I think we
should redefine NBD_INIT_SPARSE to say that.

-- 
To the thief who stole my anti-depressants: I hope you're happy

  -- seen somewhere on the Internet on a photo of a billboard



Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-11 Thread Eric Blake

On 2/10/20 4:52 PM, Richard W.M. Jones wrote:

On Mon, Feb 10, 2020 at 04:29:53PM -0600, Eric Blake wrote:

On 2/10/20 4:12 PM, Richard W.M. Jones wrote:

On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:

For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
image has at least one hole) and NBD_INIT_ZERO (the image reads
completely as zero); the two bits are orthogonal and can be set
independently, although it is easy enough to see completely sparse
files with both bits set.


I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
you really mean the whole image is sparse; or (as you seem to have
said above) that there exists a hole somewhere in the image but we're
not saying where it is and there can be non-sparse parts of the image?


As implemented:

NBD_INIT_SPARSE - there is at least one hole somewhere (allocation
would be required to write to that part of the file), but there may
be allocated data elsewhere in the image.  Most disk images will fit
this definition (for example, it is very common to have a hole
between the MBR or GPT and the first partition containing a file
system, or for file systems themselves to be sparse within the
larger block device).


I think I'm still confused about why this particular flag would be
useful for clients (I can completely understand why clients need
NBD_INIT_ZERO).

But anyway ... could a flag indicating that the whole image is sparse
be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
could use it to avoid an initial disk trim, which is something that
mke2fs does:

   
https://github.com/tytso/e2fsprogs/blob/0670fc20df4a4bbbeb0edb30d82628ea30a80598/misc/mke2fs.c#L2768


I'm open to suggestions on how many initial bits should be provided.  In 
fact, if we wanted, we could have a pair of mutually-exclusive bits, 
advertising:

00 - no information known
01 - image is completely sparse
10 - image is completely allocated
11 - error

The goal of providing a 16-bit answer (or we could mandate 32 or 64 
bits, if we think we will ever want to extend that far) was to make it 
easier to add whatever other initial-state extensions someone could 
find useful.  Until we're happy with the design, the size and any given 
bit assignment are not locked down; once we do start committing any of 
this series, we've locked in what interoperability demands, but we 
still have spare bits for future extensions.
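
A client-side decode of that mutually-exclusive pair might look like the
following sketch; the bit positions are made up for illustration:

/* Decode the hypothetical "completely sparse" / "completely allocated"
 * pair.  Bit values are placeholders, not assigned by any spec. */
#include <stdint.h>
#include <stdio.h>

#define NBD_INIT_ALL_SPARSE     (1 << 2)   /* illustrative */
#define NBD_INIT_ALL_ALLOCATED  (1 << 3)   /* illustrative */

static const char *decode_alloc_state(uint16_t flags)
{
    int sparse = !!(flags & NBD_INIT_ALL_SPARSE);
    int alloc  = !!(flags & NBD_INIT_ALL_ALLOCATED);

    if (sparse && alloc) return "error: server sent both bits";
    if (sparse)          return "image is completely sparse";
    if (alloc)           return "image is completely allocated";
    return "no information known";
}

int main(void)
{
    printf("%s\n", decode_alloc_state(NBD_INIT_ALL_SPARSE));
    return 0;
}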


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-10 Thread Richard W.M. Jones
On Mon, Feb 10, 2020 at 04:29:53PM -0600, Eric Blake wrote:
> On 2/10/20 4:12 PM, Richard W.M. Jones wrote:
> >On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:
> >>For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
> >>image has at least one hole) and NBD_INIT_ZERO (the image reads
> >>completely as zero); the two bits are orthogonal and can be set
> >>independently, although it is easy enough to see completely sparse
> >>files with both bits set.
> >
> >I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
> >you really mean the whole image is sparse; or (as you seem to have
> >said above) that there exists a hole somewhere in the image but we're
> >not saying where it is and there can be non-sparse parts of the image?
> 
> As implemented:
> 
> NBD_INIT_SPARSE - there is at least one hole somewhere (allocation
> would be required to write to that part of the file), but there may
> be allocated data elsewhere in the image.  Most disk images will fit
> this definition (for example, it is very common to have a hole
> between the MBR or GPT and the first partition containing a file
> system, or for file systems themselves to be sparse within the
> larger block device).

I think I'm still confused about why this particular flag would be
useful for clients (I can completely understand why clients need
NBD_INIT_ZERO).

But anyway ... could a flag indicating that the whole image is sparse
be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
could use it to avoid an initial disk trim, which is something that
mke2fs does:

  
https://github.com/tytso/e2fsprogs/blob/0670fc20df4a4bbbeb0edb30d82628ea30a80598/misc/mke2fs.c#L2768

and which is painfully slow over NBD for very large devices because of
the 32 bit limit on request sizes - try doing mke2fs on a 1E nbdkit
memory disk some time.

> NBD_INIT_ZERO - all bytes read as zero.
> 
> The combination NBD_INIT_SPARSE|NBD_INIT_ZERO is common (generally,
> if you use lseek(SEEK_DATA) to prove the entire image reads as
> zeroes, you also know the entire image is sparse), but NBD_INIT_ZERO
> in isolation is also possible (especially with the qcow2 proposal of
> a persistent autoclear bit, where even with a fully preallocated
> qcow2 image you still know it reads as zeroes but there are no
> holes).  But you are also right that for servers that can advertise
> both bits efficiently, NBD_INIT_SPARSE in isolation may be more
> common than NBD_INIT_SPARSE|NBD_INIT_ZERO (the former for most disk
> images, the latter only for an image that was freshly created with
> zero initialization).
> 
> What's more, in my patches, I did NOT patch qemu to set or consume
> INIT_SPARSE; so far, it only sets/consumes INIT_ZERO.  Of course, if
> we can find a reason WHY qemu should track whether a qcow2 image is
> fully-allocated, by demonstrating a qemu-img algorithm that becomes
> easier for knowing if an image is sparse (even if our justification
> is: "when copying an image, I want to know if the _source_ is
> sparse, to know whether I have to bend over backwards to preallocate
> the destination"), then using that in qemu makes sense for my v2
> patches. But for v1, my only justification was "when copying an
> image, I can skip holes in the source if I know the _destination_
> already reads as zeroes", which only needed INIT_ZERO.
> 
> Some of the nbdkit patches demonstrate the some-vs.-all nature of
> the two bits; for example, in the split plugin, I initialize
> h->init_sparse = false; h->init_zero = true; then in a loop over
> each file change h->init_sparse to true if at least one file was
> sparse, and change h->init_zero to false if at least one file had
> non-zero contents.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-10 Thread Eric Blake

On 2/10/20 4:12 PM, Richard W.M. Jones wrote:

On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:

For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
image has at least one hole) and NBD_INIT_ZERO (the image reads
completely as zero); the two bits are orthogonal and can be set
independently, although it is easy enough to see completely sparse
files with both bits set.


I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
you really mean the whole image is sparse; or (as you seem to have
said above) that there exists a hole somewhere in the image but we're
not saying where it is and there can be non-sparse parts of the image?


As implemented:

NBD_INIT_SPARSE - there is at least one hole somewhere (allocation would 
be required to write to that part of the file), but there may be 
allocated data elsewhere in the image.  Most disk images will fit this 
definition (for example, it is very common to have a hole between the 
MBR or GPT and the first partition containing a file system, or for file 
systems themselves to be sparse within the larger block device).


NBD_INIT_ZERO - all bytes read as zero.

The combination NBD_INIT_SPARSE|NBD_INIT_ZERO is common (generally, if 
you use lseek(SEEK_DATA) to prove the entire image reads as zeroes, you 
also know the entire image is sparse), but NBD_INIT_ZERO in isolation is 
also possible (especially with the qcow2 proposal of a persistent 
autoclear bit, where even with a fully preallocated qcow2 image you 
still know it reads as zeroes but there are no holes).  But you are also 
right that for servers that can advertise both bits efficiently, 
NBD_INIT_SPARSE in isolation may be more common than 
NBD_INIT_SPARSE|NBD_INIT_ZERO (the former for most disk images, the 
latter only for an image that was freshly created with zero 
initialization).


What's more, in my patches, I did NOT patch qemu to set or consume 
INIT_SPARSE; so far, it only sets/consumes INIT_ZERO.  Of course, if we 
can find a reason WHY qemu should track whether a qcow2 image is 
fully-allocated, by demonstrating a qemu-img algorithm that becomes 
easier for knowing if an image is sparse (even if our justification is: 
"when copying an image, I want to know if the _source_ is sparse, to 
know whether I have to bend over backwards to preallocate the 
destination"), then using that in qemu makes sense for my v2 patches. 
But for v1, my only justification was "when copying an image, I can skip 
holes in the source if I know the _destination_ already reads as 
zeroes", which only needed INIT_ZERO.


Some of the nbdkit patches demonstrate the some-vs.-all nature of the 
two bits; for example, in the split plugin, I initialize h->init_sparse 
= false; h->init_zero = true; then in a loop over each file change 
h->init_sparse to true if at least one file was sparse, and change 
h->init_zero to false if at least one file had non-zero contents.
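
A standalone approximation of that some-vs.-all logic (not the actual
split plugin code, and deliberately conservative in how it probes each
file) might look like:

/* Start pessimistic for "sparse" and optimistic for "zero", then let each
 * file flip the bits: any hole sets init_sparse, any data extent clears
 * init_zero (conservative, since allocated data might still be zeroes). */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    bool init_sparse = false;   /* set if at least one file has a hole */
    bool init_zero = true;      /* cleared if any file may hold data */

    for (int i = 1; i < argc; i++) {
        int fd = open(argv[i], O_RDONLY);
        if (fd == -1) {
            perror(argv[i]);
            return 1;
        }
        off_t end = lseek(fd, 0, SEEK_END);
        off_t hole = lseek(fd, 0, SEEK_HOLE);
        off_t data = lseek(fd, 0, SEEK_DATA);

        if (hole != (off_t)-1 && hole < end) {
            init_sparse = true;              /* a hole exists before EOF */
        }
        if (!(data == (off_t)-1 && errno == ENXIO)) {
            init_zero = false;               /* some data extent exists */
        }
        close(fd);
    }
    printf("NBD_INIT_SPARSE=%d NBD_INIT_ZERO=%d\n", init_sparse, init_zero);
    return 0;
}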


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-10 Thread Richard W.M. Jones
On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:
> For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the
> image has at least one hole) and NBD_INIT_ZERO (the image reads
> completely as zero); the two bits are orthogonal and can be set
> independently, although it is easy enough to see completely sparse
> files with both bits set.

I think I'm confused about the exact meaning of NBD_INIT_SPARSE.  Do
you really mean the whole image is sparse; or (as you seem to have
said above) that there exists a hole somewhere in the image but we're
not saying where it is and there can be non-sparse parts of the image?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v