Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Wouter Verhelst
On Sat, Sep 24, 2016 at 11:19:53PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 24.09.2016 21:24, Alex Bligh wrote:
> > > On 24 Sep 2016, at 18:47, Vladimir Sementsov-Ogievskiy 
> > >  wrote:
> > > 
> > > I just wanted to say, that if we want a possibility of clearing the whole 
> > > disk in one request for qcow2 we have to take 512 as granularity for such 
> > > requests (with X = 9). An this is too small. 1tb will be the upper bound 
> > > for the request.
> > Sure. But I do not see the value in optimising these huge commands to run 
> > as single requests. If you want to do that, do it properly and have a 
> > negotiation-phase flag that supports 64 bit request lengths.
> 
> And add additional request type with another magic in first field and 64bit
> length field? If such solution is appropriate for nbd it is ok for me of
> course. I've proposed something like this in first letter - "Increase length
> field of the request to 64bit". Changing existing request message type is
> wrong of course, but creating an additional one should be ok.

I'd be opposed to that.

Currently, the request and reply headers are strictly the same, for all
messages. That's a feature; it allows a server to read the header
(determine whether it needs to read data and if so, read that too), and
hand it off to something else.

By changing the header's format, you make the protocol more complicated.
I'd like to avoid that.

[...]
> or a separate command/flag for clearing the whole disk, and separate
> block-based solution in future if needed.

I'd prefer to avoid one-usecase commands; adding too many of those would
make the protocol specification too complicated. Currently, NBD is a
simple to understand protocol; I'd like to keep it that way.

I think it's not a huge problem if we introduce something that has
rounding errors; we could easily have a spec saying that

if NBD_CMD_WRITE_ZEROES with the NBD_CMD_FLAG_SHIFT flag is used,
and the last byte of the device falls between N-1 and N blocks, then
it is not an error to specify a length of N blocks.

or something along those lines.

> or new request type with 64bit length

Not going to happen as far as I'm concerned, as per above.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-block] [Nbd] write_zeroes/trim on the whole disk

2016-09-24 Thread Wouter Verhelst
On Sat, Sep 24, 2016 at 11:31:25AM +0100, Alex Bligh wrote:
> 
> > On 23 Sep 2016, at 22:21, Wouter Verhelst  wrote:
> > 
> > On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:
> >> My preference would be a new flag to the existing commands, with
> >> explicit documentation that 0 offset and 0 length must be used with that
> >> flag, when requesting a full-device wipe.
> > 
> > Alternatively, what about a flag that says "if you use this flag, the
> > size should be left-shifted by X bits before processing"? That allows
> > you to do TRIM or WRITE_ZEROES on much larger chunks, without being
> > limited to "whole disk" commands. We should probably make it an illegal
> > flag for any command that actually sends data over the wire, though.
> 
> I'd prefer an approach like this. Perhaps X could be negotiated
> with the block size extension (or simply be defined as the
> 'preferred block size'.
> 
> This could then be defined to work with all commands.

The downside of making the shift size too small is that it wouldn't help
as much with the use case that was presented here (that is, blanking a
whole disk with one command). I think if the shift size is to be
negotiated, it should be something that the client prefers, rather than
something the server prefers.

I really don't think allowing a shift for things that requires data
transfer makes much sense. Yes, a shift for things like "clear all data"
can be sensible, because (depending on backend) this can be a fairly
simple operation. However, "write more than 2G of data" or "read more
than 2G of data" isn't a simple operation and takes a lot of data to be
transferred, so allowing shift for those operations doesn't make sense,
at all; and suggesting that we allow it could actually make people
believe it's a good idea to do so when it really really (really) isn't.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-block] [Nbd] write_zeroes/trim on the whole disk

2016-09-24 Thread Carl-Daniel Hailfinger
On 24.09.2016 19:33, Vladimir Sementsov-Ogievskiy wrote:
> On 24.09.2016 20:13, Vladimir Sementsov-Ogievskiy wrote:
>> I agree that requests larger than disk size are ugly.. But splitting
>> request brings me again to idea of having separate command or flag
>> for clearing the whole disk without that dance. Server may report
>> availability of this/flag command only if target driver supports fast
>> write_zeroes (qcow2 in our case). 
> Also, such flag may be used to satisfy all needs:
>
> flag BIG_REQUEST is set and length = 0->request on the whole 
> disk, offset must be 0
> flag BIG_REQUEST is set and length > 0->request on 
> (offset*block_size, length*block_size), length*block_size must be <= 
> disk_size

What happens if length*block_size<=disk_size, but
offset*block_size+length*block_size>disk_size? Wraparound?

Regards,
Carl-Daniel



Re: [Qemu-block] [Nbd] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 23:14, Carl-Daniel Hailfinger wrote:

On 24.09.2016 19:33, Vladimir Sementsov-Ogievskiy wrote:

On 24.09.2016 20:13, Vladimir Sementsov-Ogievskiy wrote:

I agree that requests larger than disk size are ugly.. But splitting
request brings me again to idea of having separate command or flag
for clearing the whole disk without that dance. Server may report
availability of this/flag command only if target driver supports fast
write_zeroes (qcow2 in our case).

Also, such flag may be used to satisfy all needs:

flag BIG_REQUEST is set and length = 0->request on the whole
disk, offset must be 0
flag BIG_REQUEST is set and length > 0->request on
(offset*block_size, length*block_size), length*block_size must be <=
disk_size

What happens if length*block_size<=disk_size, but
offset*block_size+length*block_size>disk_size? Wraparound?


Personally, for me main scenario for is length=0, to clear the whole 
disk. In general allowing requests larger than disk size may not be very 
good idea.. Also:


On 24.09.2016 19:49, Alex Bligh wrote:

>May be, additional option, specifying the shift would be better. With 
convention that if offset+length exceeds disk size, length should be recalculated 
as disk_size-offset.

I don't think we should do that. We already have clear semantics that prevent 
operations beyond the end of the disk. Again, just break the command up into 
multipl commands. No great hardship.


So, most possible answer on your question: it should be an error.



Regards,
Carl-Daniel



--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 21:24, Alex Bligh wrote:

On 24 Sep 2016, at 18:47, Vladimir Sementsov-Ogievskiy 
 wrote:

I just wanted to say, that if we want a possibility of clearing the whole disk 
in one request for qcow2 we have to take 512 as granularity for such requests 
(with X = 9). An this is too small. 1tb will be the upper bound for the request.

Sure. But I do not see the value in optimising these huge commands to run as 
single requests. If you want to do that, do it properly and have a 
negotiation-phase flag that supports 64 bit request lengths.


And add additional request type with another magic in first field and 
64bit length field? If such solution is appropriate for nbd it is ok for 
me of course. I've proposed something like this in first letter - 
"Increase length field of the request to 64bit". Changing existing 
request message type is wrong of course, but creating an additional one 
should be ok.





Full backup, for example:

1. target can do fast write_zeroes: clear the whole disk (great if we can do it 
in one request, without splitting, etc), then backup all data except zero or 
unallocated (save a lot of time on this skipping).
2. target can not do fast write_zeroes: just backup all data. We need not clear 
the disk, as we will not save time by this.

So here, we need not splitting as a general. Just clear all or not clearing at 
all.

As I said, within the current protocol you cannot tell whether a target 
supports 'fast write zeroes', and indeed the support may be partial - for 
instance with a QCOW2 backend, a write that is not cluster aligned would likely 
only partially satisfy the command by deallocating bytes. There is no current 
flag for 'supports fast write zeroes' and (given the foregoing) it isn't 
evident to me exactly what it would mean.


I suggest to add this flag - which is a negotiation-phase flag, exposing 
support of the whole feature (separate command or flag for clearing the 
whole disk). Fast here means that we can do this in one request. 
write_zeroes(of any size, up to the whole disk) is fast if it will not 
take more time than usual write (restricted to 2G).




It seems however you could support your use case by simply iterating through 
the backup disk, using NBD_CMD_WRITE for the areas that are allocated and 
non-zero, and using NBD_CMD_WRITE_ZEROES for the areas that are not allocated 
or zeroed. This technique would not require a protocol change (beyond the 
existing NBD_CMD_WRITE_ZEROES extension), works irrespective of whether the 
target supports write zeroes or not, works irrespective of difference in 
cluster allocation size between source and target, is far simpler, and has the 
added advantage of making the existing zeroes-but-not-holes area into holes 
(that is optional if you can tell the difference between zeroes and holes on 
the source media). It also works on a single pass. Yes, you need to split 
requests up, but you need to split requests up ANYWAY to cope with 
NBD_CMD_WRITE's 2^32-1 length limit (I strongly advise you not to use more than 
2^31). And in any case, you probably want to parallelise reads and writes and 
have more than one write in flight in any case, all of which suggests you are 
going to be breaking up requests anyway.


This is slow, see my first letter. Iterative zeroing of qcow2 is slow.

Why separate command/flag for clearing the whole disk is better for me 
than block-based solution with splitting requests? I want to clear the 
whole disk and I don't want to introduce new functionality, which I 
don't need for now. I need to clearing the whole disk, but with 
block-based solution I have a lot of code, which solves another task. 
And it only indirectly solves my task. I.e. instead of 
simple_realisation+simple_usage+nice_solution_for_my_task I have 
harder_realisation+harder_usage+ugly_solution_for_my_task.


I understand, that we must take into account that such functionality 
(large requests) will likely be needed in future, so more generic 
solution is better for a protocol. And I suggest a compromise:


negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST : command flag 
NBD_CMD_FLAG_BIG_REQUEST is supported for WRITE_ZEROES and TRIM
negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST_REGION : non-zero 
length is supported for big request


flag NBD_CMD_FLAG_BIG_REQUEST is set and length = 0-> request on the 
whole disk, offset must be 0
flag NBD_CMD_FLAG_BIG_REQUEST is set and length > 0-> request on 
(offset*block_size, length*block_size), length*block_size must be <= 
disk_size (only if NBD_FLAG_SEND_BIG_REQUEST_REGION is negotiated)
flag NBD_CMD_FLAG_BIG_REQUEST is unset ->usual request on 
(offset, length)




or a separate command/flag for clearing the whole disk, and separate 
block-based solution in future if needed.




or new request type with 64bit length


--
Best regards,
Vladimir





Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 18:47, Vladimir Sementsov-Ogievskiy 
>  wrote:
> 
> I just wanted to say, that if we want a possibility of clearing the whole 
> disk in one request for qcow2 we have to take 512 as granularity for such 
> requests (with X = 9). An this is too small. 1tb will be the upper bound for 
> the request.

Sure. But I do not see the value in optimising these huge commands to run as 
single requests. If you want to do that, do it properly and have a 
negotiation-phase flag that supports 64 bit request lengths.

> Full backup, for example:
> 
> 1. target can do fast write_zeroes: clear the whole disk (great if we can do 
> it in one request, without splitting, etc), then backup all data except zero 
> or unallocated (save a lot of time on this skipping).
> 2. target can not do fast write_zeroes: just backup all data. We need not 
> clear the disk, as we will not save time by this.
> 
> So here, we need not splitting as a general. Just clear all or not clearing 
> at all.

As I said, within the current protocol you cannot tell whether a target 
supports 'fast write zeroes', and indeed the support may be partial - for 
instance with a QCOW2 backend, a write that is not cluster aligned would likely 
only partially satisfy the command by deallocating bytes. There is no current 
flag for 'supports fast write zeroes' and (given the foregoing) it isn't 
evident to me exactly what it would mean.

It seems however you could support your use case by simply iterating through 
the backup disk, using NBD_CMD_WRITE for the areas that are allocated and 
non-zero, and using NBD_CMD_WRITE_ZEROES for the areas that are not allocated 
or zeroed. This technique would not require a protocol change (beyond the 
existing NBD_CMD_WRITE_ZEROES extension), works irrespective of whether the 
target supports write zeroes or not, works irrespective of difference in 
cluster allocation size between source and target, is far simpler, and has the 
added advantage of making the existing zeroes-but-not-holes area into holes 
(that is optional if you can tell the difference between zeroes and holes on 
the source media). It also works on a single pass. Yes, you need to split 
requests up, but you need to split requests up ANYWAY to cope with 
NBD_CMD_WRITE's 2^32-1 length limit (I strongly advise you not to use more than 
2^31). And in any case, you probably want to parallelise reads and writes and 
have more than one write in flight in any case, all of which suggests you are 
going to be breaking up requests anyway.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 20:32, Alex Bligh wrote:

On 24 Sep 2016, at 18:13, Vladimir Sementsov-Ogievskiy 
 wrote:

On 24.09.2016 19:49, Alex Bligh wrote:

On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
 wrote:

On 24.09.2016 19:31, Alex Bligh wrote:

On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
 wrote:

Note: if disk size is not aligned to X we will have to send request larger than 
the disk size to clear the whole disk.

If you look at the block size extension, the size of the disk must be an exact 
multiple of the minimum block size. So that would work.

This means that this extension could not be used with any qcow2 disk, as qcow2 
may have size not aligned to its cluster size.

# qemu-img create -f qcow2 mega 1K
Formatting 'mega', fmt=qcow2 size=1024 encryption=off cluster_size=65536 
lazy_refcounts=off refcount_bits=16
# qemu-img info mega
image: mega
file format: qcow2
virtual size: 1.0K (1024 bytes)
disk size: 196K
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

And there is no such restriction in documentation. Or we have to consider 
sector-size (512b) as block size for qcow2, which is too small for our needs.

If by "this extension" you mean the INFO extension (which reports block sizes) 
that's incorrect.

An nbd server using a QCOW2 file as the backend would report the sector size as 
the minimum block size. It might report the cluster size or the sector size as 
the preferred block size, or anything in between.

QCOW2 cluster size essentially determines the allocation unit. NBD is not 
bothered as to the underlying allocation unit. It does not (currently) support 
the concept of making holes visible to the client. If you use 
NBD_CMD_WRITE_ZEREOS you get zeroes, which might or might not be implemented as 
one or more holes or 'real' zeroes (save if you specify NBD_CMD_FLAG_NO_HOLE in 
which case you are guaranteed to get 'real' zeroes'). If you use NBD_CMD_TRIM 
then the area trimmed might nor might not be written with one or more whole. 
There is (currently) no way to detect the presence of holes separately from 
zeroes (though a bitmap extension was discussed).


I just wanted to say, that if we want a possibility of clearing the 
whole disk in one request for qcow2 we have to take 512 as granularity 
for such requests (with X = 9). An this is too small. 1tb will be the 
upper bound for the request.





But there is no guarantee that disk_size/block_size < INT_MAX..

I think you mean 2^32-1, but yes there is no guarantee of that. In that case 
you would need to break the call up into multiple calls.

However, being able to break the call up into multiple calls seems pretty 
sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
time, and a REALLY long time if the server doesn't support trim.


May be, additional option, specifying the shift would be better. With 
convention that if offset+length exceeds disk size, length should be 
recalculated as disk_size-offset.

I don't think we should do that. We already have clear semantics that prevent 
operations beyond the end of the disk. Again, just break the command up into 
multipl commands. No great hardship.


I agree that requests larger than disk size are ugly.. But splitting request 
brings me again to idea of having separate command or flag for clearing the 
whole disk without that dance. Server may report availability of this/flag 
command only if target driver supports fast write_zeroes (qcow2 in our case).

Why? In the general case you need to break up requests anyway (particularly 
with the INFO extension where there is a maximum command size), and issuing a 
command over a TCP connection that might take hours or days to complete with no 
hint of progress, and no TCP traffic to keep NAT etc. alive, sounds like bad 
practice. The overhead is tiny.

I would be against this change.



Full backup, for example:

1. target can do fast write_zeroes: clear the whole disk (great if we 
can do it in one request, without splitting, etc), then backup all data 
except zero or unallocated (save a lot of time on this skipping).
2. target can not do fast write_zeroes: just backup all data. We need 
not clear the disk, as we will not save time by this.


So here, we need not splitting as a general. Just clear all or not 
clearing at all.


--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 18:13, Vladimir Sementsov-Ogievskiy 
>  wrote:
> 
> On 24.09.2016 19:49, Alex Bligh wrote:
>>> On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
>>>  wrote:
>>> 
>>> On 24.09.2016 19:31, Alex Bligh wrote:
> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
>  wrote:
> 
> Note: if disk size is not aligned to X we will have to send request 
> larger than the disk size to clear the whole disk.
 If you look at the block size extension, the size of the disk must be an 
 exact multiple of the minimum block size. So that would work.
> 
> This means that this extension could not be used with any qcow2 disk, as 
> qcow2 may have size not aligned to its cluster size.
> 
> # qemu-img create -f qcow2 mega 1K
> Formatting 'mega', fmt=qcow2 size=1024 encryption=off cluster_size=65536 
> lazy_refcounts=off refcount_bits=16
> # qemu-img info mega
> image: mega
> file format: qcow2
> virtual size: 1.0K (1024 bytes)
> disk size: 196K
> cluster_size: 65536
> Format specific information:
>compat: 1.1
>lazy refcounts: false
>refcount bits: 16
>corrupt: false
> 
> And there is no such restriction in documentation. Or we have to consider 
> sector-size (512b) as block size for qcow2, which is too small for our needs.

If by "this extension" you mean the INFO extension (which reports block sizes) 
that's incorrect.

An nbd server using a QCOW2 file as the backend would report the sector size as 
the minimum block size. It might report the cluster size or the sector size as 
the preferred block size, or anything in between.

QCOW2 cluster size essentially determines the allocation unit. NBD is not 
bothered as to the underlying allocation unit. It does not (currently) support 
the concept of making holes visible to the client. If you use 
NBD_CMD_WRITE_ZEREOS you get zeroes, which might or might not be implemented as 
one or more holes or 'real' zeroes (save if you specify NBD_CMD_FLAG_NO_HOLE in 
which case you are guaranteed to get 'real' zeroes'). If you use NBD_CMD_TRIM 
then the area trimmed might nor might not be written with one or more whole. 
There is (currently) no way to detect the presence of holes separately from 
zeroes (though a bitmap extension was discussed).

>>> But there is no guarantee that disk_size/block_size < INT_MAX..
>> I think you mean 2^32-1, but yes there is no guarantee of that. In that case 
>> you would need to break the call up into multiple calls.
>> 
>> However, being able to break the call up into multiple calls seems pretty 
>> sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
>> time, and a REALLY long time if the server doesn't support trim.
>> 
>>> May be, additional option, specifying the shift would be better. With 
>>> convention that if offset+length exceeds disk size, length should be 
>>> recalculated as disk_size-offset.
>> I don't think we should do that. We already have clear semantics that 
>> prevent operations beyond the end of the disk. Again, just break the command 
>> up into multipl commands. No great hardship.
>> 
> 
> I agree that requests larger than disk size are ugly.. But splitting request 
> brings me again to idea of having separate command or flag for clearing the 
> whole disk without that dance. Server may report availability of this/flag 
> command only if target driver supports fast write_zeroes (qcow2 in our case).

Why? In the general case you need to break up requests anyway (particularly 
with the INFO extension where there is a maximum command size), and issuing a 
command over a TCP connection that might take hours or days to complete with no 
hint of progress, and no TCP traffic to keep NAT etc. alive, sounds like bad 
practice. The overhead is tiny.

I would be against this change.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 20:13, Vladimir Sementsov-Ogievskiy wrote:

On 24.09.2016 19:49, Alex Bligh wrote:
On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
 wrote:


On 24.09.2016 19:31, Alex Bligh wrote:
On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
 wrote:


Note: if disk size is not aligned to X we will have to send 
request larger than the disk size to clear the whole disk.
If you look at the block size extension, the size of the disk must 
be an exact multiple of the minimum block size. So that would work.


This means that this extension could not be used with any qcow2 disk, 
as qcow2 may have size not aligned to its cluster size.


# qemu-img create -f qcow2 mega 1K
Formatting 'mega', fmt=qcow2 size=1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16

# qemu-img info mega
image: mega
file format: qcow2
virtual size: 1.0K (1024 bytes)
disk size: 196K
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

And there is no such restriction in documentation. Or we have to 
consider sector-size (512b) as block size for qcow2, which is too 
small for our needs.





But there is no guarantee that disk_size/block_size < INT_MAX..
I think you mean 2^32-1, but yes there is no guarantee of that. In 
that case you would need to break the call up into multiple calls.


However, being able to break the call up into multiple calls seems 
pretty sensible given that NBD_CMD_WRITE_ZEROES may take a large 
amount of

time, and a REALLY long time if the server doesn't support trim.

May be, additional option, specifying the shift would be better. 
With convention that if offset+length exceeds disk size, length 
should be recalculated as disk_size-offset.
I don't think we should do that. We already have clear semantics that 
prevent operations beyond the end of the disk. Again, just break the 
command up into multipl commands. No great hardship.




I agree that requests larger than disk size are ugly.. But splitting 
request brings me again to idea of having separate command or flag for 
clearing the whole disk without that dance. Server may report 
availability of this/flag command only if target driver supports fast 
write_zeroes (qcow2 in our case).




Also, such flag may be used to satisfy all needs:

flag BIG_REQUEST is set and length = 0->request on the whole 
disk, offset must be 0
flag BIG_REQUEST is set and length > 0->request on 
(offset*block_size, length*block_size), length*block_size must be <= 
disk_size

flag BIG_REQUEST is unset ->usual request on (offset, length)

--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 19:49, Alex Bligh wrote:

On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
 wrote:

On 24.09.2016 19:31, Alex Bligh wrote:

On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
 wrote:

Note: if disk size is not aligned to X we will have to send request larger than 
the disk size to clear the whole disk.

If you look at the block size extension, the size of the disk must be an exact 
multiple of the minimum block size. So that would work.


This means that this extension could not be used with any qcow2 disk, as 
qcow2 may have size not aligned to its cluster size.


# qemu-img create -f qcow2 mega 1K
Formatting 'mega', fmt=qcow2 size=1024 encryption=off cluster_size=65536 
lazy_refcounts=off refcount_bits=16

# qemu-img info mega
image: mega
file format: qcow2
virtual size: 1.0K (1024 bytes)
disk size: 196K
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

And there is no such restriction in documentation. Or we have to 
consider sector-size (512b) as block size for qcow2, which is too small 
for our needs.





But there is no guarantee that disk_size/block_size < INT_MAX..

I think you mean 2^32-1, but yes there is no guarantee of that. In that case 
you would need to break the call up into multiple calls.

However, being able to break the call up into multiple calls seems pretty 
sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
time, and a REALLY long time if the server doesn't support trim.


May be, additional option, specifying the shift would be better. With 
convention that if offset+length exceeds disk size, length should be 
recalculated as disk_size-offset.

I don't think we should do that. We already have clear semantics that prevent 
operations beyond the end of the disk. Again, just break the command up into 
multipl commands. No great hardship.



I agree that requests larger than disk size are ugly.. But splitting 
request brings me again to idea of having separate command or flag for 
clearing the whole disk without that dance. Server may report 
availability of this/flag command only if target driver supports fast 
write_zeroes (qcow2 in our case).


--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:52, Alex Bligh  wrote:
> 
> In *your* use-case holes may be desirable. However in the general case, you 
> cannot assume a server supports holes. Optional support for holes isn't even 
> in the mainline spec yet (AFAIR).

You should also be aware that the minimum granularity for writes (currently 1 
byte in the spec, but the blocksize extension allows for larger) may be smaller 
than the minimum granularity for holes, which may depend on the underlying 
filing system.

A hole is in essence merely an optimised way of storing zeroes. And the trim 
operation says 'I'm not worried about this data any more - so zero it if you 
like'.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:48, Vladimir Sementsov-Ogievskiy 
>  wrote:
> 
>>> Use NBD_CMD_WRITE_ZEROES without NBD_CMD_FLAG_NO_HOLE and you can pretty 
>>> much assume that a server that supports holes will write holes. A server 
>>> that does not support holes will write zeroes. If you don't care whether 
>>> the resultant data is zero, just use NBD_CMD_TRIM. But as you do care (see 
>>> above) you must be prepared for a 'thick' write of zeroes on servers that 
>>> don't support it.
>>> 
>> 
>> No, holes are critical. Concreate case: incremental backup to delta file. If 
>> we write zeroes instead of holes, we will lose underlying data (from 
>> previous incremental).
>> 
> hmm, no, sorry, that is not needed.

In *your* use-case holes may be desirable. However in the general case, you 
cannot assume a server supports holes. Optional support for holes isn't even in 
the mainline spec yet (AFAIR).

It's up to you if you choose not to connect to servers that don't support 
NBD_CMD_WRITE_ZEREOS or NBD_CMD_TRIM, but even if you take that strategy, you 
cannot guarantee that the server will not implement them by ignoring 
NBD_CMD_TRIM (this is perfectly acceptable as NBD_CMD_TRIM is advisory only) 
and making NBD_CMD_WRITE_ZEREOS do a long write of zeroes.

IE there is no way to detect whether the server actually supports holes.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
>  wrote:
> 
> On 24.09.2016 19:31, Alex Bligh wrote:
>>> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
>>>  wrote:
>>> 
>>> Note: if disk size is not aligned to X we will have to send request larger 
>>> than the disk size to clear the whole disk.
>> If you look at the block size extension, the size of the disk must be an 
>> exact multiple of the minimum block size. So that would work.
>> 
> 
> But there is no guarantee that disk_size/block_size < INT_MAX..

I think you mean 2^32-1, but yes there is no guarantee of that. In that case 
you would need to break the call up into multiple calls.

However, being able to break the call up into multiple calls seems pretty 
sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
time, and a REALLY long time if the server doesn't support trim.

> May be, additional option, specifying the shift would be better. With 
> convention that if offset+length exceeds disk size, length should be 
> recalculated as disk_size-offset.

I don't think we should do that. We already have clear semantics that prevent 
operations beyond the end of the disk. Again, just break the command up into 
multipl commands. No great hardship.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 19:44, Vladimir Sementsov-Ogievskiy wrote:

On 24.09.2016 19:35, Alex Bligh wrote:
On 24 Sep 2016, at 17:20, Vladimir Sementsov-Ogievskiy 
 wrote:


Also, accordingly to documentation, NBD_CMD_TRIM is not appropriate 
for disk clearing:


  * `NBD_CMD_TRIM` (4)

  A hint to the server that the data defined by len and offset 
is no
  longer needed. A server MAY discard len bytes starting at 
offset, but

  is not required to.

  After issuing this command, a client MUST NOT make any 
assumptions

  about the contents of the export affected by this command, until
  overwriting it again with `NBD_CMD_WRITE`.

- it may do nothing.. So, what to do with this? add flag FORCE_TRIM 
for this command? Or add FORCE_HOLES flag to WRITE_ZEROES?
You cannot force a hole, because NBD the is not guaranteed to support 
holes.


Use NBD_CMD_WRITE_ZEROES without NBD_CMD_FLAG_NO_HOLE and you can 
pretty much assume that a server that supports holes will write 
holes. A server that does not support holes will write zeroes. If you 
don't care whether the resultant data is zero, just use NBD_CMD_TRIM. 
But as you do care (see above) you must be prepared for a 'thick' 
write of zeroes on servers that don't support it.




No, holes are critical. Concreate case: incremental backup to delta 
file. If we write zeroes instead of holes, we will lose underlying 
data (from previous incremental).



hmm, no, sorry, that is not needed.

--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 19:35, Alex Bligh wrote:

On 24 Sep 2016, at 17:20, Vladimir Sementsov-Ogievskiy 
 wrote:

Also, accordingly to documentation, NBD_CMD_TRIM is not appropriate for disk 
clearing:

  * `NBD_CMD_TRIM` (4)

  A hint to the server that the data defined by len and offset is no
  longer needed. A server MAY discard len bytes starting at offset, but
  is not required to.

  After issuing this command, a client MUST NOT make any assumptions
  about the contents of the export affected by this command, until
  overwriting it again with `NBD_CMD_WRITE`.

- it may do nothing.. So, what to do with this? add flag FORCE_TRIM for this 
command? Or add FORCE_HOLES flag to WRITE_ZEROES?

You cannot force a hole, because NBD the is not guaranteed to support holes.

Use NBD_CMD_WRITE_ZEROES without NBD_CMD_FLAG_NO_HOLE and you can pretty much 
assume that a server that supports holes will write holes. A server that does 
not support holes will write zeroes. If you don't care whether the resultant 
data is zero, just use NBD_CMD_TRIM. But as you do care (see above) you must be 
prepared for a 'thick' write of zeroes on servers that don't support it.



No, holes are critical. Concreate case: incremental backup to delta 
file. If we write zeroes instead of holes, we will lose underlying data 
(from previous incremental).


--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 19:31, Alex Bligh wrote:

On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
 wrote:

Note: if disk size is not aligned to X we will have to send request larger than 
the disk size to clear the whole disk.

If you look at the block size extension, the size of the disk must be an exact 
multiple of the minimum block size. So that would work.



But there is no guarantee that disk_size/block_size < INT_MAX.. May be, 
additional option, specifying the shift would be better. With convention 
that if offset+length exceeds disk size, length should be recalculated 
as disk_size-offset.


--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:20, Vladimir Sementsov-Ogievskiy 
>  wrote:
> 
> Also, accordingly to documentation, NBD_CMD_TRIM is not appropriate for disk 
> clearing:
> 
>  * `NBD_CMD_TRIM` (4)
> 
>  A hint to the server that the data defined by len and offset is no
>  longer needed. A server MAY discard len bytes starting at offset, but
>  is not required to.
> 
>  After issuing this command, a client MUST NOT make any assumptions
>  about the contents of the export affected by this command, until
>  overwriting it again with `NBD_CMD_WRITE`.
> 
> - it may do nothing.. So, what to do with this? add flag FORCE_TRIM for this 
> command? Or add FORCE_HOLES flag to WRITE_ZEROES?

You cannot force a hole, because NBD the is not guaranteed to support holes.

Use NBD_CMD_WRITE_ZEROES without NBD_CMD_FLAG_NO_HOLE and you can pretty much 
assume that a server that supports holes will write holes. A server that does 
not support holes will write zeroes. If you don't care whether the resultant 
data is zero, just use NBD_CMD_TRIM. But as you do care (see above) you must be 
prepared for a 'thick' write of zeroes on servers that don't support it.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
>  wrote:
> 
> Note: if disk size is not aligned to X we will have to send request larger 
> than the disk size to clear the whole disk.

If you look at the block size extension, the size of the disk must be an exact 
multiple of the minimum block size. So that would work.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Denis V. Lunev
On 09/24/2016 12:21 AM, Wouter Verhelst wrote:
> On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:
>> My preference would be a new flag to the existing commands, with
>> explicit documentation that 0 offset and 0 length must be used with that
>> flag, when requesting a full-device wipe.
> Alternatively, what about a flag that says "if you use this flag, the
> size should be left-shifted by X bits before processing"? That allows
> you to do TRIM or WRITE_ZEROES on much larger chunks, without being
> limited to "whole disk" commands. We should probably make it an illegal
> flag for any command that actually sends data over the wire, though.
>
interesting. Actually I like this ;)



Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 16:42, Vladimir Sementsov-Ogievskiy wrote:

On 24.09.2016 15:06, Vladimir Sementsov-Ogievskiy wrote:

On 24.09.2016 00:21, Wouter Verhelst wrote:

On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:

My preference would be a new flag to the existing commands, with
explicit documentation that 0 offset and 0 length must be used with 
that

flag, when requesting a full-device wipe.

Alternatively, what about a flag that says "if you use this flag, the
size should be left-shifted by X bits before processing"? That allows
you to do TRIM or WRITE_ZEROES on much larger chunks, without being
limited to "whole disk" commands. We should probably make it an illegal
flag for any command that actually sends data over the wire, though.




Note: if disk size is not aligned to X we will have to send request 
larger than the disk size to clear the whole disk.


which is not beautiful.. And splitting the request into two: 
(disk_size / X * X) and (disk_size % X) is not beautiful too. May be 
flag, that says 'whole disk request' is more appropriate? Nothing 
prevents adding then flag about X-shift.





Also, accordingly to documentation, NBD_CMD_TRIM is not appropriate for 
disk clearing:


  * `NBD_CMD_TRIM` (4)

  A hint to the server that the data defined by len and offset is no
  longer needed. A server MAY discard len bytes starting at offset, but
  is not required to.

  After issuing this command, a client MUST NOT make any assumptions
  about the contents of the export affected by this command, until
  overwriting it again with `NBD_CMD_WRITE`.

- it may do nothing.. So, what to do with this? add flag FORCE_TRIM for 
this command? Or add FORCE_HOLES flag to WRITE_ZEROES?



--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 15:06, Vladimir Sementsov-Ogievskiy wrote:

On 24.09.2016 00:21, Wouter Verhelst wrote:

On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:

My preference would be a new flag to the existing commands, with
explicit documentation that 0 offset and 0 length must be used with 
that

flag, when requesting a full-device wipe.

Alternatively, what about a flag that says "if you use this flag, the
size should be left-shifted by X bits before processing"? That allows
you to do TRIM or WRITE_ZEROES on much larger chunks, without being
limited to "whole disk" commands. We should probably make it an illegal
flag for any command that actually sends data over the wire, though.




Note: if disk size is not aligned to X we will have to send request 
larger than the disk size to clear the whole disk.


which is not beautiful.. And splitting the request into two: (disk_size 
/ X * X) and (disk_size % X) is not beautiful too. May be flag, that 
says 'whole disk request' is more appropriate? Nothing prevents adding 
then flag about X-shift.



--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 15:06, Vladimir Sementsov-Ogievskiy wrote:

On 24.09.2016 00:21, Wouter Verhelst wrote:

On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:

My preference would be a new flag to the existing commands, with
explicit documentation that 0 offset and 0 length must be used with 
that

flag, when requesting a full-device wipe.

Alternatively, what about a flag that says "if you use this flag, the
size should be left-shifted by X bits before processing"? That allows
you to do TRIM or WRITE_ZEROES on much larger chunks, without being
limited to "whole disk" commands. We should probably make it an illegal
flag for any command that actually sends data over the wire, though.




Note: if disk size is not aligned to X we will have to send request 
larger than the disk size to clear the whole disk.




Also, in this case, which realization of bdrv interface in qemu would be 
most appropriate? Similar flag (in this case X must be defined in some 
very transparent way, as a constant of 64k for example), or flag 
BDRV_REQ_WHOLE_DISK, or separate .bdrv_zero_all and .bdrv_discard_all ?


--
Best regards,
Vladimir




Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Vladimir Sementsov-Ogievskiy

On 24.09.2016 00:21, Wouter Verhelst wrote:

On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:

My preference would be a new flag to the existing commands, with
explicit documentation that 0 offset and 0 length must be used with that
flag, when requesting a full-device wipe.

Alternatively, what about a flag that says "if you use this flag, the
size should be left-shifted by X bits before processing"? That allows
you to do TRIM or WRITE_ZEROES on much larger chunks, without being
limited to "whole disk" commands. We should probably make it an illegal
flag for any command that actually sends data over the wire, though.




Note: if disk size is not aligned to X we will have to send request 
larger than the disk size to clear the whole disk.


--
Best regards,
Vladimir




Re: [Qemu-block] [Nbd] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 23 Sep 2016, at 22:21, Wouter Verhelst  wrote:
> 
> On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:
>> My preference would be a new flag to the existing commands, with
>> explicit documentation that 0 offset and 0 length must be used with that
>> flag, when requesting a full-device wipe.
> 
> Alternatively, what about a flag that says "if you use this flag, the
> size should be left-shifted by X bits before processing"? That allows
> you to do TRIM or WRITE_ZEROES on much larger chunks, without being
> limited to "whole disk" commands. We should probably make it an illegal
> flag for any command that actually sends data over the wire, though.

I'd prefer an approach like this. Perhaps X could be negotiated
with the block size extension (or simply be defined as the
'preferred block size'.

This could then be defined to work with all commands.

-- 
Alex Bligh