Re: [Qemu-block] [Qemu-devel] [PATCH 1/1] nbd: increase maximum size of the PWRITE_ZERO request

2018-02-10 Thread Alex Bligh

> On 10 Feb 2018, at 18:43, Alex Bligh <a...@alex.org.uk> wrote:
> 
> So I think a reasonable logic for Qemu would be to try NBD_CMD_INFO and find 
> the maximum write size, and if that's unsupported use 0x (capping at 
> export size, or export size minus write offset).

Ur actually capping it at (2^16 - blocksize) would be the right thing to do 
(writes should be multiples of the block size).

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH 1/1] nbd: increase maximum size of the PWRITE_ZERO request

2018-02-10 Thread Alex Bligh

> On 8 Feb 2018, at 16:28, Eric Blake <ebl...@redhat.com> wrote:
> 
> On 02/08/2018 07:23 AM, Edgar Kaziakhmedov wrote:
>> Upstream NBD protocol implementation supports an efficient zero out
>> mechanism over the wire, along with the ability to check whether a
>> client allows using a hole.
>> Accordingly, since PWRITE_ZERO doesn't involve any payload on the wire,
>> increase a maximum size of the PWRITE_ZERO request up to 1Gb (aligned).
>> Moreover, such change will decrease the number of PWRITE_ZERO NBD commands
>> in comparison with the current 32M limit. The benefits of
>> the larger constraint can be examined in a block mirroring over NBD.
> 
> We've got a potential problem.  Unless you have out-of-band communication of 
> the maximum NBD_CMD_WRITE_ZEROES sizing (or if the NBD protocol is enhanced 
> to advertise that as an additional piece of block size information during 
> NBD_OPT_GO), then a client CANNOT assume that the server will accept a 
> request this large.  We MIGHT get lucky if all existing servers that accept 
> WRITE_ZEROES requests either act on large requests or reply with EINVAL but 
> do not outright drop the connection (which is different from servers that DO 
> outright drop the connection for an NBD_CMD_WRITE larger than 32M).  But I 
> don't know if that's how all servers behave, so sending a too-large 
> WRITE_ZEROES request may have the unintended consequence of killing the 
> connection.
> 
> I'm adding the NBD list; perhaps before accepting this into qemu, I should 
> revive my earlier attempt at codifying an NBD_OPT_GO info advertisement for 
> maximum trim/zero sizing, which would let clients have a guarantee that their 
> choice of sizing won't cause unexpected failures.

A couple of comments:

1. The length field is only 32 bits, so no writes more than 0x in 
length are going to work anyway :-)

2. I'm not sure the situation is as bad as you make out Eric. I think you've 
forgotten the NBD_OPT_INFO work and the conversation around that we had where 
we determined that servers not supporting NBD_OPT_INFO were already meant to 
support 'unlimited' size writes. Per the spec:

"If block size constraints have not been advertised or agreed on externally, 
then a client SHOULD assume a default minimum block size of 1, a preferred 
block size of 2^12 (4,096), and a maximum block size of the smaller of the 
export size or 0x (effectively unlimited)."

I read these to apply to all uses of 'length', but even if one argues it 
doesn't apply to NBD_CMD_WRITE_ZEROES because it doesn't have a payload, I 
think the rebuttal is that a server which supports NBD_CMD_WRITE of a given 
length must also support NBD_CMD_WRITE_ZEROES of that length.

So I think a reasonable logic for Qemu would be to try NBD_CMD_INFO and find 
the maximum write size, and if that's unsupported use 0x (capping at 
export size, or export size minus write offset).

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [Qemu-devel] How to online resize qemu disk with nbd protocol?

2017-01-14 Thread Alex Bligh

> On 14 Jan 2017, at 14:48, Wouter Verhelst <w...@uter.be> wrote:
> 
> On Thu, Jan 12, 2017 at 06:56:42PM +0000, Alex Bligh wrote:
>> My preferred way to do this would be essentially to allow NBD_OPT_INFO
>> to be sent (wrapped appropriately) during the main transmission phase.
>> That would allow *any* INFO stuff to be reread.
> 
> Can you go into a bit more detail how you'd see that? It feels a bit
> awkward to me, at best; but I could be missing something.

Well, the idea would be something like: NBD_CMD_INFO, use the offset
specified as the the information type, and put the INFO reply
within the reply block. I guess the client doesn't know the
length of the reply so we'd have to use structured replies
or similar. Although looking through the current info types,
I can't see why today you'd want to reread anything other than
the length, so maybe this is useless futureproofing.

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [Nbd] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Alex Bligh

> On 12 Jan 2017, at 18:45, Eric Blake <ebl...@redhat.com> wrote:
> 
> On 01/12/2017 11:57 AM, Bob Chen wrote:
>> There might be a time window between the NBD server's resize and the
>> client's `re-read size` request. Is it safe?
> 
> For resize larger, it seems that it would be safe for the server to just
> remember the last size it has advertised to the client.  As long as the
> client doesn't request read/write to any offset beyond the
> last-advertised size (either the size the server gave at handshake, or
> the size that the server reported when the client last used the new
> 're-read size' command), there's no difference in behavior (and
> well-behaved clients fall in this group); if the client DOES try to
> access out-of-bounds with respect to the last known size, the server
> SHOULD reject the access, but MAY serve it instead.  A client that is
> unaware of the 're-read size' command will never learn that the server
> is now offering a larger image.
> 
> For resize smaller, things are a lot trickier - how do you block access
> to a portion of a file that the client still thinks exist, but which the
> server has truncated away?  Maybe the easy way out is to state that the
> server MUST NOT resize smaller.

I'm not sure why this is any harder than Qemu writing to a physical partition
that changes size. You need to change the size of things in order. To
resize smaller, you tell Qemu it's smaller, then resize the partition. To
resize larger, you resize the partition then tell Qemu it's larger.

This is incidentally exactly the same thing you do if you have a filing
system on a partition (or a filing system on a real nbd device - we could
support the same reread of geometry in kernel).

>> 
>> What about an active `resize` request from the client? Considering some NBD
>> servers might have the capability to do instant resizing, not applying to
>> LVM or host block device, of course...
> 
> And perhaps the 're-read size' command can serve a dual purpose.  Since
> all NBD commands already include space for size and offset parameters,
> we could define that if the client passes 0/0 for size and offset, it
> wants to read the server's current size; if it passes 0/non-zero, it is
> asking the server to resize to the new non-zero size (and the server's
> success or error response tells whether the resize happened).
> 
> Should I spend time turning this idea into a more formal spec, along the
> lines of other NBD extension proposals?

I'd support the reread bit. I'm less sure we ever want to push a new
size from the client. It seems a bit like a layering violation. Perhaps
I can be convinced.

My preferred way to do this would be essentially to allow NBD_OPT_INFO
to be sent (wrapped appropriately) during the main transmission phase.
That would allow *any* INFO stuff to be reread.

If it's just this (rather than a resize command) I guess it could
go into the NBD_OPT_INFO extension branch. If it's going to do
more than _INFO_, then I guess it needs its own extension.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Nbd] [Qemu-devel] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Alex Bligh

> On 12 Jan 2017, at 14:43, Eric Blake <ebl...@redhat.com> wrote:
> 
> That's because the NBD protocol lacks a resize command.  You'd have to
> first get that proposed as an NBD extension before qemu could support it.

Actually the NBD protocol lacks a 'make a disk with size X' command,
let alone a resize command. The size of an NBD disk is (currently)
entirely in the hands of the server. What I think we'd really need
would be a 'reread size' command, and have the server capable of
supporting resizing. That would then work for readonly images too.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Nbd] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-13 Thread Alex Bligh

> On 13 Dec 2016, at 12:18, Wouter Verhelst <w...@uter.be> wrote:
> 
> I'm not opposed either, but I do agree with you that we shouldn't add
> such a feature if it doesn't end up getting used. Especially so if it
> burns a flag in the (16-bit) "transmission flags" field, where space is
> at a premium.

I did suggest a few non-Qemu uses (see below). I think it might be
an idea if the reference implementation supported it before
merging (which per below should be trivial).

-- 
Alex Bligh


> Begin forwarded message:
> 
> From: Alex Bligh <a...@alex.org.uk>
> Subject: Re: [Nbd] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES 
> extension
> Date: 6 December 2016 at 10:29:41 United Kingdom Time
> To: Kevin Wolf <kw...@redhat.com>
> Cc: Alex Bligh <a...@alex.org.uk>, Eric Blake <ebl...@redhat.com>, 
> "nbd-gene...@lists.sourceforge.net" <nbd-gene...@lists.sourceforge.net>, 
> xieying...@huawei.com, su...@huawei.com, qemu block <qemu-block@nongnu.org>, 
> "qemu-de...@nongnu.org" <qemu-de...@nongnu.org>, Paolo Bonzini 
> <pbonz...@redhat.com>
> 
> 
>> On 6 Dec 2016, at 09:25, Kevin Wolf <kw...@redhat.com> wrote:
>> 
>> Am 06.12.2016 um 00:42 hat Eric Blake geschrieben:
>>> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu
>>> team discovered that it is useful if a server can advertise
>>> whether an export is in a known-all-zeroes state at the time
>>> the client connects.
>> 
>> Does a server usually have the information to set this flag, other than
>> querying the block status of all blocks at startup? 
> 
> The server may have other ways of knowing this, for instance
> that it has just created the file (*), or that it stat'd the file
> before opening it (not unlikely) and noticed it had 0 allocated
> size. The latter I suspect would be trivial to implement in nbd-server
> 
> (*) = e.g. I had one application where nbd use the export path
> to signify it wanted to open a temporary file, the path consisting
> of a UUID and an encoded length. If the file was not present already
> it created it with ftruncate(). That could trivially have used this.
> 
>> If so, the client could just query this by itself.
> 
> Well there's no currently mainlined extension to do that, but yes
> it could. On the other hand I see no issue passing complete
> zero status back to the client if it's so obvious from a stat().
> 
> -- 
> Alex Bligh
> 
> 
> 
> 





Re: [Qemu-block] [Nbd] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-06 Thread Alex Bligh

> On 6 Dec 2016, at 08:46, Alex Bligh <a...@alex.org.uk> wrote:
> 
> I would support this.
> 
> In fact the patch is sufficiently simple I think I'd merge this
> into extension-write-zeroes then merge that into master.

Hence:

Reviewed-By: Alex Bligh <a...@alex.org.uk>

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-06 Thread Alex Bligh

> On 6 Dec 2016, at 09:25, Kevin Wolf <kw...@redhat.com> wrote:
> 
> Am 06.12.2016 um 00:42 hat Eric Blake geschrieben:
>> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu
>> team discovered that it is useful if a server can advertise
>> whether an export is in a known-all-zeroes state at the time
>> the client connects.
> 
> Does a server usually have the information to set this flag, other than
> querying the block status of all blocks at startup? 

The server may have other ways of knowing this, for instance
that it has just created the file (*), or that it stat'd the file
before opening it (not unlikely) and noticed it had 0 allocated
size. The latter I suspect would be trivial to implement in nbd-server

(*) = e.g. I had one application where nbd use the export path
to signify it wanted to open a temporary file, the path consisting
of a UUID and an encoded length. If the file was not present already
it created it with ftruncate(). That could trivially have used this.

> If so, the client could just query this by itself.

Well there's no currently mainlined extension to do that, but yes
it could. On the other hand I see no issue passing complete
zero status back to the client if it's so obvious from a stat().

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-06 Thread Alex Bligh

> On 5 Dec 2016, at 23:42, Eric Blake <ebl...@redhat.com> wrote:
> 
> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu
> team discovered that it is useful if a server can advertise
> whether an export is in a known-all-zeroes state at the time
> the client connects.

I think this is good to go, and ...

> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
> doc/proto.md | 5 +
> 1 file changed, 5 insertions(+)
> 
> This replaces the following qemu patch attempt:
> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00357.html
> which tried to add NBD_CMD_HAS_ZERO_INIT with poor semantics. The
> semantics in this proposal should be much better.
> 
> Patch is to the merge of the master branch and the
> extension-write-zeroes branch.  By the way, qemu 2.8 is due
> to be released "real soon now", and implements NBD_CMD_WRITE_ZEROES,
> so maybe it is time to consider promoting the extension-write-zeroes
> branch into master.

I would support this.

In fact the patch is sufficiently simple I think I'd merge this
into extension-write-zeroes then merge that into master.

Wouter?

Alex

> diff --git a/doc/proto.md b/doc/proto.md
> index afe71fc..7e4ec7f 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -697,6 +697,11 @@ The field has the following format:
>   the export.
> - bit 9, `NBD_FLAG_SEND_BLOCK_STATUS`: defined by the experimental
>   `BLOCK_STATUS` 
> [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md).
> +- bit 10, `NBD_FLAG_INIT_ZEROES`: Indicates that the server guarantees
> +  that at the time transmission phase begins, all offsets within the
> +  export read as zero bytes.  Clients MAY use this information to
> +  avoid writing to sections of the export that should still read as
> +  zero after the client is done writing.
> 
> Clients SHOULD ignore unknown flags.
> 
> -- 
> 2.9.3
> 
> 
> --
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today.http://sdm.link/xeonphi
> ___
> Nbd-general mailing list
> nbd-gene...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nbd-general
> 

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [PATCH] proto: add 'shift' extension.

2016-09-27 Thread Alex Bligh

> On 27 Sep 2016, at 00:41, Wouter Verhelst <w...@uter.be> wrote:
> 
> Thoughts?

Honestly? This whole thing seems like complication for little gain. One command 
per 2GB wiped doesn't seem like a bad thing.

-- 
Alex Bligh







Re: [Qemu-block] [PATCH] proto: add 'shift' extension.

2016-09-26 Thread Alex Bligh

> On 26 Sep 2016, at 14:54, Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
> 
> Considering that NBD supports multiple outstanding requests, is it a big
> deal to require one request per terabyte of storage?

+1

Also I don't think we particularly want to make clearing the entire
disk easy to do by mistake!

This whole 'clear the whole disk in one command' seems like a
dire case of premature optimisation.

-- 
Alex Bligh







Re: [Qemu-block] [PATCH] proto: add 'shift' extension.

2016-09-26 Thread Alex Bligh

> On 26 Sep 2016, at 13:46, Vladimir Sementsov-Ogievskiy 
> <vsement...@virtuozzo.com> wrote:
> 
>  Option reply types
> 
> These values are used in the "reply type" field, sent by the server
> @@ -872,7 +883,13 @@ valid may depend on negotiation during the handshake 
> phase.
>   
> [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
> - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
>   
> [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
> -
> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
> +  *length* and *offset* left by N bits, where N is defined by `NBD_OPT_SHIFT`
> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
> +  not specified. If after shift `(offset + length)` exceeds disk size, length
> +  should be reduced to `( - offset)`. However, `(offset + length)`
> +  must not exceed disk size by more than `(1 << N) - 1`.

Is there a good reason the shift size can't be either the minimum block size
or otherwise negotiated using NBD_OPT_INFO?

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 18:47, Vladimir Sementsov-Ogievskiy 
> <vsement...@virtuozzo.com> wrote:
> 
> I just wanted to say, that if we want a possibility of clearing the whole 
> disk in one request for qcow2 we have to take 512 as granularity for such 
> requests (with X = 9). An this is too small. 1tb will be the upper bound for 
> the request.

Sure. But I do not see the value in optimising these huge commands to run as 
single requests. If you want to do that, do it properly and have a 
negotiation-phase flag that supports 64 bit request lengths.

> Full backup, for example:
> 
> 1. target can do fast write_zeroes: clear the whole disk (great if we can do 
> it in one request, without splitting, etc), then backup all data except zero 
> or unallocated (save a lot of time on this skipping).
> 2. target can not do fast write_zeroes: just backup all data. We need not 
> clear the disk, as we will not save time by this.
> 
> So here, we need not splitting as a general. Just clear all or not clearing 
> at all.

As I said, within the current protocol you cannot tell whether a target 
supports 'fast write zeroes', and indeed the support may be partial - for 
instance with a QCOW2 backend, a write that is not cluster aligned would likely 
only partially satisfy the command by deallocating bytes. There is no current 
flag for 'supports fast write zeroes' and (given the foregoing) it isn't 
evident to me exactly what it would mean.

It seems however you could support your use case by simply iterating through 
the backup disk, using NBD_CMD_WRITE for the areas that are allocated and 
non-zero, and using NBD_CMD_WRITE_ZEROES for the areas that are not allocated 
or zeroed. This technique would not require a protocol change (beyond the 
existing NBD_CMD_WRITE_ZEROES extension), works irrespective of whether the 
target supports write zeroes or not, works irrespective of difference in 
cluster allocation size between source and target, is far simpler, and has the 
added advantage of making the existing zeroes-but-not-holes area into holes 
(that is optional if you can tell the difference between zeroes and holes on 
the source media). It also works on a single pass. Yes, you need to split 
requests up, but you need to split requests up ANYWAY to cope with 
NBD_CMD_WRITE's 2^32-1 length limit (I strongly advise you not to use more than 
2^31). And in any case, you probably want to parallelise reads and writes and 
have more than one write in flight in any case, all of which suggests you are 
going to be breaking up requests anyway.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 18:13, Vladimir Sementsov-Ogievskiy 
> <vsement...@virtuozzo.com> wrote:
> 
> On 24.09.2016 19:49, Alex Bligh wrote:
>>> On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
>>> <vsement...@virtuozzo.com> wrote:
>>> 
>>> On 24.09.2016 19:31, Alex Bligh wrote:
>>>>> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
>>>>> <vsement...@virtuozzo.com> wrote:
>>>>> 
>>>>> Note: if disk size is not aligned to X we will have to send request 
>>>>> larger than the disk size to clear the whole disk.
>>>> If you look at the block size extension, the size of the disk must be an 
>>>> exact multiple of the minimum block size. So that would work.
> 
> This means that this extension could not be used with any qcow2 disk, as 
> qcow2 may have size not aligned to its cluster size.
> 
> # qemu-img create -f qcow2 mega 1K
> Formatting 'mega', fmt=qcow2 size=1024 encryption=off cluster_size=65536 
> lazy_refcounts=off refcount_bits=16
> # qemu-img info mega
> image: mega
> file format: qcow2
> virtual size: 1.0K (1024 bytes)
> disk size: 196K
> cluster_size: 65536
> Format specific information:
>compat: 1.1
>lazy refcounts: false
>refcount bits: 16
>corrupt: false
> 
> And there is no such restriction in documentation. Or we have to consider 
> sector-size (512b) as block size for qcow2, which is too small for our needs.

If by "this extension" you mean the INFO extension (which reports block sizes) 
that's incorrect.

An nbd server using a QCOW2 file as the backend would report the sector size as 
the minimum block size. It might report the cluster size or the sector size as 
the preferred block size, or anything in between.

QCOW2 cluster size essentially determines the allocation unit. NBD is not 
bothered as to the underlying allocation unit. It does not (currently) support 
the concept of making holes visible to the client. If you use 
NBD_CMD_WRITE_ZEREOS you get zeroes, which might or might not be implemented as 
one or more holes or 'real' zeroes (save if you specify NBD_CMD_FLAG_NO_HOLE in 
which case you are guaranteed to get 'real' zeroes'). If you use NBD_CMD_TRIM 
then the area trimmed might nor might not be written with one or more whole. 
There is (currently) no way to detect the presence of holes separately from 
zeroes (though a bitmap extension was discussed).

>>> But there is no guarantee that disk_size/block_size < INT_MAX..
>> I think you mean 2^32-1, but yes there is no guarantee of that. In that case 
>> you would need to break the call up into multiple calls.
>> 
>> However, being able to break the call up into multiple calls seems pretty 
>> sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
>> time, and a REALLY long time if the server doesn't support trim.
>> 
>>> May be, additional option, specifying the shift would be better. With 
>>> convention that if offset+length exceeds disk size, length should be 
>>> recalculated as disk_size-offset.
>> I don't think we should do that. We already have clear semantics that 
>> prevent operations beyond the end of the disk. Again, just break the command 
>> up into multipl commands. No great hardship.
>> 
> 
> I agree that requests larger than disk size are ugly.. But splitting request 
> brings me again to idea of having separate command or flag for clearing the 
> whole disk without that dance. Server may report availability of this/flag 
> command only if target driver supports fast write_zeroes (qcow2 in our case).

Why? In the general case you need to break up requests anyway (particularly 
with the INFO extension where there is a maximum command size), and issuing a 
command over a TCP connection that might take hours or days to complete with no 
hint of progress, and no TCP traffic to keep NAT etc. alive, sounds like bad 
practice. The overhead is tiny.

I would be against this change.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:52, Alex Bligh <a...@alex.org.uk> wrote:
> 
> In *your* use-case holes may be desirable. However in the general case, you 
> cannot assume a server supports holes. Optional support for holes isn't even 
> in the mainline spec yet (AFAIR).

You should also be aware that the minimum granularity for writes (currently 1 
byte in the spec, but the blocksize extension allows for larger) may be smaller 
than the minimum granularity for holes, which may depend on the underlying 
filing system.

A hole is in essence merely an optimised way of storing zeroes. And the trim 
operation says 'I'm not worried about this data any more - so zero it if you 
like'.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:48, Vladimir Sementsov-Ogievskiy 
> <vsement...@virtuozzo.com> wrote:
> 
>>> Use NBD_CMD_WRITE_ZEROES without NBD_CMD_FLAG_NO_HOLE and you can pretty 
>>> much assume that a server that supports holes will write holes. A server 
>>> that does not support holes will write zeroes. If you don't care whether 
>>> the resultant data is zero, just use NBD_CMD_TRIM. But as you do care (see 
>>> above) you must be prepared for a 'thick' write of zeroes on servers that 
>>> don't support it.
>>> 
>> 
>> No, holes are critical. Concreate case: incremental backup to delta file. If 
>> we write zeroes instead of holes, we will lose underlying data (from 
>> previous incremental).
>> 
> hmm, no, sorry, that is not needed.

In *your* use-case holes may be desirable. However in the general case, you 
cannot assume a server supports holes. Optional support for holes isn't even in 
the mainline spec yet (AFAIR).

It's up to you if you choose not to connect to servers that don't support 
NBD_CMD_WRITE_ZEREOS or NBD_CMD_TRIM, but even if you take that strategy, you 
cannot guarantee that the server will not implement them by ignoring 
NBD_CMD_TRIM (this is perfectly acceptable as NBD_CMD_TRIM is advisory only) 
and making NBD_CMD_WRITE_ZEREOS do a long write of zeroes.

IE there is no way to detect whether the server actually supports holes.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
> <vsement...@virtuozzo.com> wrote:
> 
> On 24.09.2016 19:31, Alex Bligh wrote:
>>> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
>>> <vsement...@virtuozzo.com> wrote:
>>> 
>>> Note: if disk size is not aligned to X we will have to send request larger 
>>> than the disk size to clear the whole disk.
>> If you look at the block size extension, the size of the disk must be an 
>> exact multiple of the minimum block size. So that would work.
>> 
> 
> But there is no guarantee that disk_size/block_size < INT_MAX..

I think you mean 2^32-1, but yes there is no guarantee of that. In that case 
you would need to break the call up into multiple calls.

However, being able to break the call up into multiple calls seems pretty 
sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
time, and a REALLY long time if the server doesn't support trim.

> May be, additional option, specifying the shift would be better. With 
> convention that if offset+length exceeds disk size, length should be 
> recalculated as disk_size-offset.

I don't think we should do that. We already have clear semantics that prevent 
operations beyond the end of the disk. Again, just break the command up into 
multipl commands. No great hardship.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 17:20, Vladimir Sementsov-Ogievskiy 
> <vsement...@virtuozzo.com> wrote:
> 
> Also, accordingly to documentation, NBD_CMD_TRIM is not appropriate for disk 
> clearing:
> 
>  * `NBD_CMD_TRIM` (4)
> 
>  A hint to the server that the data defined by len and offset is no
>  longer needed. A server MAY discard len bytes starting at offset, but
>  is not required to.
> 
>  After issuing this command, a client MUST NOT make any assumptions
>  about the contents of the export affected by this command, until
>  overwriting it again with `NBD_CMD_WRITE`.
> 
> - it may do nothing.. So, what to do with this? add flag FORCE_TRIM for this 
> command? Or add FORCE_HOLES flag to WRITE_ZEROES?

You cannot force a hole, because NBD the is not guaranteed to support holes.

Use NBD_CMD_WRITE_ZEROES without NBD_CMD_FLAG_NO_HOLE and you can pretty much 
assume that a server that supports holes will write holes. A server that does 
not support holes will write zeroes. If you don't care whether the resultant 
data is zero, just use NBD_CMD_TRIM. But as you do care (see above) you must be 
prepared for a 'thick' write of zeroes on servers that don't support it.

-- 
Alex Bligh







Re: [Qemu-block] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
> <vsement...@virtuozzo.com> wrote:
> 
> Note: if disk size is not aligned to X we will have to send request larger 
> than the disk size to clear the whole disk.

If you look at the block size extension, the size of the disk must be an exact 
multiple of the minimum block size. So that would work.

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] write_zeroes/trim on the whole disk

2016-09-24 Thread Alex Bligh

> On 23 Sep 2016, at 22:21, Wouter Verhelst <w...@uter.be> wrote:
> 
> On Fri, Sep 23, 2016 at 02:00:06PM -0500, Eric Blake wrote:
>> My preference would be a new flag to the existing commands, with
>> explicit documentation that 0 offset and 0 length must be used with that
>> flag, when requesting a full-device wipe.
> 
> Alternatively, what about a flag that says "if you use this flag, the
> size should be left-shifted by X bits before processing"? That allows
> you to do TRIM or WRITE_ZEROES on much larger chunks, without being
> limited to "whole disk" commands. We should probably make it an illegal
> flag for any command that actually sends data over the wire, though.

I'd prefer an approach like this. Perhaps X could be negotiated
with the block size extension (or simply be defined as the
'preferred block size'.

This could then be defined to work with all commands.

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Alex Bligh

On 15 Jun 2016, at 10:18, Paolo Bonzini <pbonz...@redhat.com> wrote:

>> So what should those servers do (like 2 of mine) which don't buffer
>> the entire read, if they get an error having already sent some data?
> 
> They have sent an error code of zero, and it turned out to be wrong.  So
> the only thing they can do safely is disconnect.

Right, but that is not what Wouter's change says:

+If an error occurs, the server SHOULD set the appropriate error code
+in the error field. The server MAY then initiate a hard disconnect.
+If it chooses not to, it MUST NOT send any payload for this request.

I read this as either

a) the server can issue a hard disconnect without sending any reply; or

b) it must send the reply header with no payload

It also seems to permit not setting the error code (it's only a 'SHOULD'),
not disconnecting (it's a MAY), then not sending any payload, which is a
nonsense.

Perhaps this should read "If an error occurs, the server MUST either initiate
a hard disconnect before the entire payload has been sent or
set the appropriate code in the error field and send the response header
without any payload." if we want to go down this route.


-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-15 Thread Alex Bligh

> On 15 Jun 2016, at 09:03, Wouter Verhelst <w...@uter.be> wrote:
> 
> On Wed, Jun 15, 2016 at 09:05:22AM +0200, Wouter Verhelst wrote:
>> There are more clients than the Linux and qemu ones, but I think it's
>> fair to say that those two are the most important ones. If they agree
>> that a read reply which errors should come without payload, then I think
>> we should update the standard to say that, too.
> 
> I've just pushed a commit that changes the spec (and the implementation)
> so that if a server encounters a read error, it does not send a payload.
> 
> In other words, the current behaviour of qemu is correct, is now
> documented to be correct, and should not be changed.

So what should those servers do (like 2 of mine) which don't buffer
the entire read, if they get an error having already sent some data?

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Alex Bligh

> On 14 Jun 2016, at 16:11, Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
>> To illustrate the problem, look consider what qemu itself would do as
>> a server if it can't buffer the entire read issued to it.
> 
> Return ENOMEM?

Well OK, qemu then 'works' on the basis it breaks another
part of the spec, which is coping with long reads.

> However, it looks like the
> de facto status prior to structured replies is that the error is in the
> spec, and this patch introduces a regression.

Well, I guess the patch makes it work the same as the
reference server implementation and the spec, which I'd
consider a fix. My view is that the error is in the
kernel client. I think Erik CC'd in nbd-general
re the comment that the spec was broken; I don't think
it is, and don't propose to change it. Wouter might or
might not feel differently.

It's been reasonably well known (I wrote about it
at least 3 years ago), that the current implementation
(reference + kernel) does not cope well with errors
on reads, so I'm guessing one is just trading one
set of brokenness for another. So I'm pretty relaxed
about what goes in qemu.

-- 
Alex Bligh







Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Alex Bligh

On 14 Jun 2016, at 14:32, Paolo Bonzini <pbonz...@redhat.com> wrote:

> 
> On 13/06/2016 23:41, Alex Bligh wrote:
>> That's one of the reasons that there is a proposal to add
>> STRUCTURED_READ to the spec (although I still haven't had time to
>> implement that for qemu), so that we have a newer approach that allows
>> for proper error handling without ambiguity on whether bogus bytes must
>> be sent on a failed read.  But you'd have to convince me that ALL
>> existing NBD server and client implementations expect to handle a read
>> error without read payload, otherwise, I will stick with the notion that
>> the current spec wording is correct, and that read errors CANNOT be
>> gracefully recovered from unless BOTH sides transfer (possibly bogus)
>> bytes along with the error message, and which is why BOTH sides of the
>> protocol are warned that read errors usually result in a disconnection
>> rather than clean continuation, without the addition of STRUCTURED_READ.
> 
> I suspect that there are exactly two client implementations,

My understanding is that there are more than 2 client implementations.
A quick google found at least one BSD client. I bet read error handling
is a mess in all of them.

> namely
> Linux and QEMU's, and both do the right thing.

This depends what you mean by 'right'. Both appear to be non-compliant
with the standard.

Note the standard is not defined by the client implementation, but
by the protocol document.

IMHO the 'right thing' is what is in the spec. Servers can't send an
error in any other way if they don't buffer the entire read first, as the
read may error towards the end.

To illustrate the problem, look consider what qemu itself would do as
a server if it can't buffer the entire read issued to it.

> What servers do doesn't matter, if all the clients agree.

The spec originally was not clear on how errors on reads should be
handled, leading to any read causing a protocol drop. The spec is
now clear. Unfortunately it is not possible to make a back compatible
fix. Hence the real fix here is to implement structured replies,
which is what Eric and I have been working on.

-- 
Alex Bligh







Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-13 Thread Alex Bligh

On 13 Jun 2016, at 13:25, Eric Blake <ebl...@redhat.com> wrote:

> On 06/13/2016 06:10 AM, Paolo Bonzini wrote:
>> 
>> 
>> On 12/05/2016 00:39, Eric Blake wrote:
>>> - If we report an error to NBD_CMD_READ, we are not writing out
>>> any data payload; but the protocol says that a client can expect
>>> to read the payload no matter what (and must instead ignore it),
>>> which means the client will start reading our next replies as
>>> its data payload. Fix by disconnecting (an alternative fix of
>>> sending bogus payload would be trickier to implement).
>> 
>> This is an error in the spec.  The Linux driver doesn't expect to read
>> the payload here, and neither does block/nbd-client.c.
> 
> That's one of the reasons that there is a proposal to add
> STRUCTURED_READ to the spec (although I still haven't had time to
> implement that for qemu), so that we have a newer approach that allows
> for proper error handling without ambiguity on whether bogus bytes must
> be sent on a failed read.  But you'd have to convince me that ALL
> existing NBD server and client implementations expect to handle a read
> error without read payload, otherwise, I will stick with the notion that
> the current spec wording is correct, and that read errors CANNOT be
> gracefully recovered from unless BOTH sides transfer (possibly bogus)
> bytes along with the error message, and which is why BOTH sides of the
> protocol are warned that read errors usually result in a disconnection
> rather than clean continuation, without the addition of STRUCTURED_READ.

To back up what Eric said:

Unfortunately the design is pretty much broken for reporting errors
on reads (at least in part as there is no way to signal errors that
occur after some of the reply has been written).

The spec specifies that on a read, no matter whether or not there
is an error, the data is all sent. This was after some mailing
list conversations on the subject which indicated this was the
least broken way to do things (IIRC).

This is actually what nbd-server.c does in the threaded handler:
 https://github.com/yoe/nbd/blob/master/nbd-server.c#L1468

For amusement value, the non-threaded handler (which is not used
any more) does not send any payload on an error:
 https://github.com/yoe/nbd/blob/master/nbd-server.c#L1734

In essence read error handling is a horrible mess in NBD, and
I would not expect it to work in general :-(

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Qemu-devel] [PATCH v4 00/11] nbd: tighter protocol compliance

2016-05-12 Thread Alex Bligh

On 11 May 2016, at 23:39, Eric Blake <ebl...@redhat.com> wrote:

> Fix several corner-case bugs in our implementation of the NBD
> protocol, both as client and as server.

I thought I'd added a Reviewed-By: line to more of these before.
On a very very quick look, they all look good to me.

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-12 Thread Alex Bligh

On 11 May 2016, at 22:12, Wouter Verhelst <w...@uter.be> wrote:

> On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
>> On 10 May 2016, at 16:29, Eric Blake <ebl...@redhat.com> wrote:
>>> So the kernel is currently one of the clients that does NOT honor block
>>> sizes, and as such, servers should be prepared for ANY size up to
>>> UINT_MAX (other than DoS handling).
>> 
>> Or not to permit a connection.
> 
> Right -- and this is why I was recommending against making this a MUST in the
> first place.

Indeed, and it currently is a 'MAY':

> except that if a server believes a client's behaviour constitutes
> a denial of service, it MAY initiate a hard disconnect.

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh

On 10 May 2016, at 16:45, Quentin Casasnovas <quentin.casasno...@oracle.com> 
wrote:

> I'm by no mean an expert in this, but why would the kernel break up those
> TRIM commands?  After all, breaking things up makes sense when the length
> of the request is big, not that much when it only contains the request
> header, which is the case for TRIM commands.

1. You are assuming that the only reason for limiting the size of operations is
   limiting the data transferred within one request. That is not necessarily
   the case. There are good reasons (if only orthogonality) to have limits
   in place even where no data is transferred.

2. As and when the blocksize extension is implemented in the kernel (it isn't
   now), the protocol requires it.

3. The maximum length of an NBD trim operation is 2^32. The maximum length
   of a trim operation is larger. Therefore the kernel needs to do at least
   some breaking up.

-- 
Alex Bligh







Re: [Qemu-block] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh

On 10 May 2016, at 16:46, Eric Blake <ebl...@redhat.com> wrote:

> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export?  I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?

Not tried it, but fallocate(1) with -p ?

http://man7.org/linux/man-pages/man1/fallocate.1.html

As it takes length and offset in TB, PB, EB, ZB and YB, it
seems to be 64 bit aware :-)

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh

On 10 May 2016, at 16:29, Eric Blake <ebl...@redhat.com> wrote:

> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).

Interesting followup question:

If the kernel does not fragment TRIM requests at all (in the
same way it fragments read and write requests), I suspect
something bad may happen with TRIM requests over 2^31
in size (particularly over 2^32 in size), as the length
field in nbd only has 32 bits.

Whether it supports block size constraints or not, it is
going to need to do *some* breaking up of requests.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh
Eric,

On 10 May 2016, at 16:29, Eric Blake <ebl...@redhat.com> wrote:
>>> Maybe we should revisit that in the spec, and/or advertise yet another
>>> block size (since the maximum size for a trim and/or write_zeroes
>>> request may indeed be different than the maximum size for a read/write).
>> 
>> I think it's up to the server to either handle large requests, or
>> for the client to break these up.
> 
> But the question at hand here is whether we should permit servers to
> advertise multiple maximum block sizes (one for read/write, another one
> for trim/write_zero, or even two [at least qemu tracks a separate
> maximum trim vs. write_zero sizing in its generic block layer]), or
> merely stick with the current wording that requires clients that honor
> maximum block size to obey the same maximum for ALL commands, regardless
> of amount of data sent over the wire.

In my view, we should not change this. Block sizes maxima are not there
to support DoS prevention (that's a separate phrase). They are there
to impose maximum block sizes. Adding a different maximum block size
for different commands is way too overengineered. There are after
all reasons (especially without structured replies) why you'd want
different maximum block sizes for writes and reads. If clients support
block sizes, they will necessarily have to have the infrastructure
to break requests up.

IE maximum block size should continue to mean maximum block size.

>> 
>> The core problem here is that the kernel (and, ahem, most servers) are
>> ignorant of the block size extension, and need to guess how to break
>> things up. In my view the client (kernel in this case) should
>> be breaking the trim requests up into whatever size it uses as the
>> maximum size write requests. But then it would have to know about block
>> sizes which are in (another) experimental extension.
> 
> Correct - no one has yet patched the kernel to honor block sizes
> advertised through what is currently an experimental extension.

Unsurprising at it's still experimental, and only settled down a couple
of weeks ago :-)

>  (We
> have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> minimum block size,

Technically that is 'out of band transmission of block size
constraints' :-)

> but I haven't audited whether the kernel actually
> guarantees that all client requests are sent aligned to the value passed
> that way - but we have nothing to set the maximum size,

indeed

> and are at the
> mercy of however the kernel currently decides to split large requests).

I am surprised TRIM doesn't get broken up the same way READ and WRITE
do.

> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).

Or not to permit a connection.

>  My question above only applies to
> clients that use the experimental block size extensions.

Indeed.

>> What surprises me is that a release kernel is using experimental
>> NBD extensions; there's no guarantee these won't change. Or does
>> fstrim work some other way?
> 
> No extension in play.  The kernel is obeying NBD_FLAG_SEND_TRIM, which
> is in the normative standard, and unrelated to the INFO/BLOCK_SIZE
> extensions.

My mistake. I was confusing 'WRITE_ZEROES' with 'TRIM'.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh
m+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 
> --
> Mobile security can be enabling, not merely restricting. Employees who
> bring their own devices (BYOD) to work are irked by the imposition of MDM
> restrictions. Mobile Device Manager Plus allows you to control only the
> apps on BYO-devices by containerizing them, leaving personal data untouched!
> https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
> Nbd-general mailing list
> nbd-gene...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nbd-general

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Nbd] question on ioctl NBD_SET_FLAGS

2016-05-04 Thread Alex Bligh

On 20 Apr 2016, at 17:18, Wouter Verhelst <w...@uter.be> wrote:

> Hi Eric,
> 
> On Wed, Apr 20, 2016 at 09:42:02AM -0600, Eric Blake wrote:
> [...]
>> But in 3.9, the overlap bug was still present, and the set of global
>> flags had grown to include NBD_FLAG_NO_ZEROS (commit 038e066), which
>> overlaps with NBD_FLAG_READ_ONLY.  Ouch.  This means that a client
>> talking to a server that advertises NO_ZEROES means that the client will
>> mistakenly tell the kernel to treat EVERY export as read-only, even if
>> the client doesn't respond with NBD_FLAG_C_NO_ZEROES.
>> 
>> 3.10 fixed things; negotiate() now uses uint16_t *flags (instead of u32
>> *), and no longer tries to merge global flags with transmission flags;
>> only the transmission flags are ever passed to the kernel via
>> NBD_SET_FLAGS.  Maybe it's good that there was only 2 weeks between 3.9
>> and 3.10, so hopefully few distros are trying to ship that broken version.
> 
> Well, yeah, since 3.10 was an "oops" release when 3.9 exposed that bug
> (which indeed had existed for a while) and which was reported quite
> quickly on the list. Released versions of nbd which have the bug exist
> though, and trying to have a 3.8 (or below) client talk to a 3.9 (or
> above) server has the same issue.
> 
> I decided that there was no way in which I could fix it, and that "the
> export is readonly" is bad but not a "critical data loss" kind of bug,
> so releasing 3.10 was pretty much the only sane thing I could do (other
> than delaying NO_ZEROES, which might have worked too).

For what it's worth, Ubuntu 14.04 (still under long term
support) ships with nbd-client 3.7 and I've just had a bug
report filed against gonbdserver where all exports go readonly.
   https://github.com/abligh/gonbdserver/issues/4

I've added an option to disable NBD_FLAG_NO_ZEROS on nbd-server
but that's pretty horrible. And I think this will affect all
pre-3.10 clients talking to 3.9 and later servers.

I'm going to find a minimal patch to nbd-client and offer that
to Ubuntu / Debian.

This message is here in part so I have something to point them at
on the mailing list :-)

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests

2016-04-25 Thread Alex Bligh

On 25 Apr 2016, at 20:20, Eric Blake <ebl...@redhat.com> wrote:

>>>} else if (fixedNewstyle) {
>> 
>> So the above is for NewStyle (not fixedNewStyle)?
> 
> The above is for fixedNewStyle when TLS is not negotiated yet; the 'else
> if' is for fixedNewStyle after TLS has been negotiated.  Prior to TLS,
> we have to special-case NBD_OPT_ABORT and actually quit.

OK. fixedNewStyle is defined as a prerequisite for TLS. I'm hoping
nothing in Qemuland ever did non-fixed NewStyle, and assuming that's
the case I would not even support it (certainly it won't play
nicely with all the other stuff you've been doing).

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client

2016-04-25 Thread Alex Bligh
  info->size, info->flags);
> break;
> 
> +case NBD_INFO_BLOCK_SIZE:
> +if (len != sizeof(info->min_block) * 3) {
> +error_setg(errp, "remaining export info len %" PRIu32
> +   " is unexpected size", len);
> +return -1;
> +}
> +if (read_sync(ioc, >min_block, sizeof(info->min_block)) !=
> +sizeof(info->min_block)) {
> +error_setg(errp, "failed to read info minimum block size");
> +return -1;
> +}
> +be32_to_cpus(>min_block);
> +if (!is_power_of_2(info->min_block)) {
> +error_setg(errp, "server minimum block size %" PRId32
> +   "is not a power of two", info->min_block);
> +return -1;
> +}
> +if (read_sync(ioc, >opt_block, sizeof(info->opt_block)) !=
> +sizeof(info->opt_block)) {
> +error_setg(errp, "failed to read info preferred block size");
> +return -1;
> +}
> +be32_to_cpus(>opt_block);
> +if (!is_power_of_2(info->opt_block) ||
> +info->opt_block < info->min_block) {
> +error_setg(errp, "server preferred block size %" PRId32
> +   "is not valid", info->opt_block);
> +return -1;
> +}
> +if (read_sync(ioc, >max_block, sizeof(info->max_block)) !=
> +sizeof(info->max_block)) {
> +error_setg(errp, "failed to read info maximum block size");
> +return -1;
> +}
> +be32_to_cpus(>max_block);
> +TRACE("Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32,
> +  info->min_block, info->opt_block, info->max_block);
> +break;
> +

You should probably check min_block <= opt_block <= max_block here

Also should there be a check that BDRV_SECTOR_SIZE >= min_block?


> default:
> TRACE("ignoring unknown export info %" PRIu16, type);
> if (drop_sync(ioc, len) != len) {
> @@ -710,8 +769,9 @@ fail:
> #ifdef __linux__
> int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
> {
> -unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
> -if (info->size / BDRV_SECTOR_SIZE != sectors) {
> +unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
> +unsigned long sectors = info->size / sector_size;
> +if (info->size / sector_size != sectors) {
> LOG("Export size %" PRId64 " too large for 32-bit kernel", 
> info->size);
>     return -E2BIG;
> }
> @@ -724,18 +784,18 @@ int nbd_init(int fd, QIOChannelSocket *sioc, 
> NbdExportInfo *info)
> return -serrno;
> }
> 
> -TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);
> +TRACE("Setting block size to %lu", sector_size);
> 
> -if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
> +if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {
> int serrno = errno;
> LOG("Failed setting NBD block size");
> return -serrno;
> }
> 
> TRACE("Setting size to %lu block(s)", sectors);
> -if (size % BDRV_SECTOR_SIZE) {
> -TRACE("Ignoring trailing %d bytes of export",
> -  (int) (size % BDRV_SECTOR_SIZE));
> +if (info->size % sector_size) {
> +TRACE("Ignoring trailing %" PRId64 " bytes of export",
> +  info->size % sector_size);
> }
> 
> if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> The generic block code takes care of falling back to the obvious
> write of lots of zeroes if we return -ENOTSUP because the server
> does not have WRITE_ZEROES.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
Reviewed-by: Alex Bligh <a...@alex.org.uk>
> 
> ---
> v3: rebase, tell block layer about our support
> ---
> block/nbd-client.h |  2 ++
> block/nbd-client.c | 34 ++
> block/nbd.c| 24 
> 3 files changed, 60 insertions(+)
> 
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index 0867147..07630ab 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -46,6 +46,8 @@ void nbd_client_close(BlockDriverState *bs);
> int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
>   int nb_sectors);
> int nbd_client_co_flush(BlockDriverState *bs);
> +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +   int nb_sectors, int *flags);
> int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
>  int nb_sectors, QEMUIOVector *qiov, int *flags);
> int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index f20219b..2b6ac27 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -291,6 +291,40 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t 
> sector_num,
> return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
> }
> 
> +int nbd_client_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +   int nb_sectors, int *flags)
> +{
> +ssize_t ret;
> +NbdClientSession *client = nbd_get_client_session(bs);
> +struct nbd_request request = { .type = NBD_CMD_WRITE_ZEROES };
> +struct nbd_reply reply;
> +
> +if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
> +return -ENOTSUP;
> +}
> +
> +if ((*flags & BDRV_REQ_FUA) && (client->info.flags & NBD_FLAG_SEND_FUA)) 
> {
> +*flags &= ~BDRV_REQ_FUA;
> +request.flags |= NBD_CMD_FLAG_FUA;
> +}
> +if (!(*flags & BDRV_REQ_MAY_UNMAP)) {
> +request.flags |= NBD_CMD_FLAG_NO_HOLE;
> +}
> +
> +request.from = sector_num * 512;
> +request.len = nb_sectors * 512;
> +
> +nbd_coroutine_start(client, );
> +ret = nbd_co_send_request(bs, , NULL, 0);
> +if (ret < 0) {
> +reply.error = -ret;
> +} else {
> +nbd_co_receive_reply(client, , , NULL, 0);
> +}
> +nbd_coroutine_end(client, );
> +return -reply.error;
> +}
> +
> int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
>  int nb_sectors, QEMUIOVector *qiov, int *flags)
> {
> diff --git a/block/nbd.c b/block/nbd.c
> index 34db83e..5172039 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -355,6 +355,26 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t 
> sector_num,
> return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
> }
> 
> +static int nbd_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +   int nb_sectors, BdrvRequestFlags orig_flags)
> +{
> +int flags = orig_flags;
> +int ret;
> +
> +ret = nbd_client_co_write_zeroes(bs, sector_num, nb_sectors, );
> +if (ret < 0) {
> +return ret;
> +}
> +
> +/* The flag wasn't sent to the server, so we need to emulate it with an
> + * explicit flush */
> +if (flags & BDRV_REQ_FUA) {
> +ret = nbd_client_co_flush(bs);
> +}
> +
> +return ret;
> +}
> +
> static int nbd_co_writev_flags(BlockDriverState *bs, int64_t sector_num,
>int nb_sectors, QEMUIOVector *qiov, int flags)
> {
> @@ -388,6 +408,7 @@ static int nbd_co_flush(BlockDriverState *bs)
> static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
> {
> bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
> +bs->bl.max_write_zeroes = UINT32_MAX >> BDRV_SECTOR_BITS;
> bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
> }
> 
> @@ -476,6 +497,7 @@ static BlockDriver bdrv_nbd = {
> .bdrv_parse_filename= nbd_parse_filename,
> .bdrv_file_open = nbd_

Re: [Qemu-block] [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> Upstream NBD protocol recently added the ability to efficiently
> write zeroes without having to send the zeroes over the wire,
> along with a flag to control whether the client wants a hole.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>

Reviewed-by: Alex Bligh <a...@alex.org.uk>

> 
> ---
> v3: abandon NBD_CMD_CLOSE extension, rebase to use blk_pwrite_zeroes
> ---
> include/block/nbd.h |  7 +--
> nbd/server.c| 42 --
> 2 files changed, 45 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 05c0e48..1072d9e 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -70,6 +70,7 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_FLAG_SEND_FUA   (1 << 3)/* Send FUA (Force Unit 
> Access) */
> #define NBD_FLAG_ROTATIONAL (1 << 4)/* Use elevator algorithm - 
> rotational media */
> #define NBD_FLAG_SEND_TRIM  (1 << 5)/* Send TRIM (discard) */
> +#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
> 
> /* New-style handshake (global) flags, sent from server to client, and
>control what will happen during handshake phase. */
> @@ -102,7 +103,8 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_INFO_DESCRIPTION2
> 
> /* Request flags, sent from client to server during transmission phase */
> -#define NBD_CMD_FLAG_FUA(1 << 0)
> +#define NBD_CMD_FLAG_FUA(1 << 0) /* 'force unit access' during write 
> */
> +#define NBD_CMD_FLAG_NO_HOLE(1 << 1) /* don't punch hole on zero run */
> 
> /* Supported request types */
> enum {
> @@ -110,7 +112,8 @@ enum {
> NBD_CMD_WRITE = 1,
> NBD_CMD_DISC = 2,
> NBD_CMD_FLUSH = 3,
> -NBD_CMD_TRIM = 4
> +NBD_CMD_TRIM = 4,
> +NBD_CMD_WRITE_ZEROES = 5,
> };
> 
> #define NBD_DEFAULT_PORT  10809
> diff --git a/nbd/server.c b/nbd/server.c
> index 1edb5f3..563afb2 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -689,7 +689,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> char buf[8 + 8 + 8 + 128];
> int rc;
> const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> -  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
> +  NBD_FLAG_SEND_WRITE_ZEROES);
> bool oldStyle;
> size_t len;
> 
> @@ -1199,11 +1200,17 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
> rc = -EINVAL;
> goto out;
> }
> -if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
> LOG("unsupported flags (got 0x%x)", request->flags);
> rc = -EINVAL;
> goto out;
> }
> +if (request->type != NBD_CMD_WRITE_ZEROES &&
> +(request->flags & NBD_CMD_FLAG_NO_HOLE)) {
> +LOG("unexpected flags (got 0x%x)", request->flags);
> +rc = -EINVAL;
> +goto out;
> +}
> 
> rc = 0;
> 
> @@ -1308,6 +1315,37 @@ static void nbd_trip(void *opaque)
> }
> break;
> 
> +case NBD_CMD_WRITE_ZEROES:
> +TRACE("Request type is WRITE_ZEROES");
> +
> +if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
> +TRACE("Server is read-only, return error");
> +reply.error = EROFS;
> +goto error_reply;
> +}
> +
> +TRACE("Writing to device");
> +
> +flags = 0;
> +if (request.flags & NBD_CMD_FLAG_FUA) {
> +flags |= BDRV_REQ_FUA;
> +}
> +if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
> +flags |= BDRV_REQ_MAY_UNMAP;
> +}
> +ret = blk_pwrite_zeroes(exp->blk, request.from + exp->dev_offset,
> +request.len, flags);
> +if (ret < 0) {
> +LOG("writing to file failed");
> +reply.error = -ret;
> +goto error_reply;
> +}
> +
> +if (nbd_co_send_reply(req, , 0) < 0) {
> +goto out;
> +}
> +break;
> +
> case NBD_CMD_DISC:
> /* unreachable, thanks to special case in nbd_co_receive_request() */
> abort();
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH v3 40/44] nbd: Implement NBD_OPT_GO on client

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> NBD_OPT_EXPORT_NAME is lousy: it doesn't have any sane error
> reporting.  Upstream NBD recently added NBD_OPT_GO as the
> improved version of the option that does what we want: it
> reports sane errors on failures (including when a server
> requires TLS but does not have NBD_OPT_GO!), and on success
> it provides at least as much info as NBD_OPT_EXPORT_NAME sends.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> 

Reviewed-by: Alex Bligh <a...@alex.org.uk>

> ---
> v3: revamp to match latest version of NBD protocol
> ---
> nbd/nbd-internal.h |   3 ++
> nbd/client.c   | 120 -
> 2 files changed, 121 insertions(+), 2 deletions(-)
> 
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index c597bb8..1935102 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -55,8 +55,11 @@
>  * https://github.com/yoe/nbd/blob/master/doc/proto.md
>  */
> 
> +/* Size of all NBD_OPT_*, without payload */
> #define NBD_REQUEST_SIZE(4 + 2 + 2 + 8 + 8 + 4)
> +/* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
> #define NBD_REPLY_SIZE  (4 + 4 + 8)
> +
> #define NBD_REQUEST_MAGIC   0x25609513
> #define NBD_REPLY_MAGIC 0x67446698
> #define NBD_OPTS_MAGIC  0x49484156454F5054LL
> diff --git a/nbd/client.c b/nbd/client.c
> index 89fa2c3..dac4f29 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -222,6 +222,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, 
> nbd_opt_reply *reply,
>reply->option);
> break;
> 
> +case NBD_REP_ERR_UNKNOWN:
> +error_setg(errp, "Requested export not available for option %" 
> PRIx32,
> +   reply->option);
> +break;
> +
> case NBD_REP_ERR_SHUTDOWN:
> error_setg(errp, "Server shutting down before option %" PRIx32,
>reply->option);
> @@ -311,6 +316,103 @@ static int nbd_receive_list(QIOChannel *ioc, const char 
> *want, Error **errp)
> }
> 
> 
> +/* Returns -1 if NBD_OPT_GO proves the export @wantname cannot be
> + * used, 0 if NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
> + * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
> + * go (with @size and @flags populated). */
> +static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
> +  NbdExportInfo *info, Error **errp)
> +{
> +nbd_opt_reply reply;
> +uint32_t len;
> +uint16_t type;
> +int error;
> +
> +/* The protocol requires that the server send NBD_INFO_EXPORT with
> + * a non-zero flags (at least NBD_FLAG_HAS_FLAGS must be set); so
> + * flags still 0 is a witness of a broken server. */
> +info->flags = 0;
> +
> +TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
> +if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
> +return -1;
> +}
> +
> +TRACE("Reading export info");
> +while (1) {
> +if (nbd_receive_option_reply(ioc, NBD_OPT_GO, , errp) < 0) {
> +return -1;
> +}
> +error = nbd_handle_reply_err(ioc, , errp);
> +if (error <= 0) {
> +return error;
> +}
> +len = reply.length;
> +
> +if (reply.type == NBD_REP_ACK) {
> +/* Server is done sending info and moved into transmission
> +   phase, but make sure it sent flags */
> +if (len) {
> +error_setg(errp, "server sent invalid NBD_REP_ACK");
> +return -1;
> +}
> +if (!info->flags) {
> +error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
> +return -1;
> +}
> +TRACE("export is good to go");
> +return 1;
> +}
> +if (reply.type != NBD_REP_INFO) {
> +error_setg(errp, "unexpected reply type %" PRIx32 ", expected 
> %x",
> +   reply.type, NBD_REP_INFO);
> +return -1;
> +}
> +if (len < sizeof(type)) {
> +error_setg(errp, "NBD_REP_INFO length %" PRIu32 " is too short",
> +   len);
> +return -1;
> +}
> +if (read_sync(ioc, , sizeof(type)) != sizeof(type)) {
> +error_setg(errp, "failed to read info type");
> +return -1;
> +}
> +l

Re: [Qemu-block] [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests

2016-04-25 Thread Alex Bligh
->error == ESHUTDOWN) {
> +LOG("server shutting down");
> +return -EINVAL;
> +}
> return 0;
> }
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index dadc928..fa6a994 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -39,6 +39,8 @@ static int system_errno_to_nbd_errno(int err)
> case EFBIG:
> case ENOSPC:
> return NBD_ENOSPC;
> +case ESHUTDOWN:
> +return NBD_ESHUTDOWN;
> case EINVAL:
> default:
> return NBD_EINVAL;
> @@ -484,6 +486,10 @@ static int nbd_negotiate_options(NBDClient *client)
> if (ret < 0) {
> return ret;
> }
> +/* Let the client keep trying, unless they asked to quit */
> +if (clientflags == NBD_OPT_ABORT) {

OK that's totally confusing. clientflags is not the client flags. clientflags
is the NBD option ID, which happens to be the two bytes after the NBD OPT magic,
which is the client flags if we were doing oldstyle negotiation, not newstyle
negotiation.

Except:

> +return -EINVAL;
> +}
> break;
> }
> } else if (fixedNewstyle) {

So the above is for NewStyle (not fixedNewStyle)?

In which case more than one option isn't even supported, so what's the
stuff purporting to handle TLS doing there?

Confused ...

> @@ -496,7 +502,15 @@ static int nbd_negotiate_options(NBDClient *client)
> break;
> 
> case NBD_OPT_ABORT:
> -return -EINVAL;
> +    /* NBD spec says we must reply before disconnecting,
> + * but that we must also tolerate guests that don't
> + * wait for our reply. */
> +ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
> + clientflags);
> +if (!ret) {
> +ret = -EINVAL;
> +}
> +return ret;
> 
> case NBD_OPT_EXPORT_NAME:
> return nbd_negotiate_handle_export_name(client, length);
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH v3 31/44] nbd: Share common reply-sending code in server

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> Rather than open-coding NBD_REP_SERVER, reuse the code we
> already have by adding a length parameter.  Additionally,
> the refactoring will make adding NBD_OPT_GO in a later patch
> easier.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
Reviewed-by: Alex Bligh <a...@alex.org.uk>
> 
> ---
> v3: rebase to changes earlier in series
> ---
> nbd/server.c | 48 +++-
> 1 file changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 1d30b6d..4435d37 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -195,12 +195,15 @@ static ssize_t nbd_negotiate_drop_sync(QIOChannel *ioc, 
> size_t size)
> 
> */
> 
> -static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t 
> opt)
> +/* Send a reply header, including length, but no payload.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> +static int nbd_negotiate_send_rep_len(QIOChannel *ioc, uint32_t type,
> +  uint32_t opt, uint32_t len)
> {
> uint64_t magic;
> -uint32_t len;
> 
> -TRACE("Reply opt=%" PRIx32 " type=%" PRIx32, type, opt);
> +TRACE("Reply opt=%" PRIx32 " type=%" PRIx32 " len=%" PRIu32,
> +  type, opt, len);
> 
> magic = cpu_to_be64(NBD_REP_MAGIC);
> if (nbd_negotiate_write(ioc, , sizeof(magic)) != sizeof(magic)) {
> @@ -217,7 +220,7 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, 
> uint32_t type, uint32_t opt)
> LOG("write failed (rep type)");
> return -EINVAL;
> }
> -len = cpu_to_be32(0);
> +len = cpu_to_be32(len);
> if (nbd_negotiate_write(ioc, , sizeof(len)) != sizeof(len)) {
> LOG("write failed (rep data length)");
> return -EINVAL;
> @@ -225,37 +228,32 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, 
> uint32_t type, uint32_t opt)
> return 0;
> }
> 
> +/* Send a reply header with default 0 length.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> +static int nbd_negotiate_send_rep(QIOChannel *ioc, uint32_t type, uint32_t 
> opt)
> +{
> +return nbd_negotiate_send_rep_len(ioc, type, opt, 0);
> +}
> +
> +/* Send an NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
> + * Return -errno to kill connection, 0 to continue negotiation. */
> static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
> {
> -uint64_t magic;
> size_t name_len, desc_len;
> -uint32_t opt, type, len;
> +uint32_t len;
> const char *name = exp->name ? exp->name : "";
> const char *desc = exp->description ? exp->description : "";
> +int rc;
> 
> TRACE("Advertising export name '%s' description '%s'", name, desc);
> name_len = strlen(name);
> desc_len = strlen(desc);
> -magic = cpu_to_be64(NBD_REP_MAGIC);
> -if (nbd_negotiate_write(ioc, , sizeof(magic)) != sizeof(magic)) {
> -LOG("write failed (magic)");
> -return -EINVAL;
> - }
> -opt = cpu_to_be32(NBD_OPT_LIST);
> -if (nbd_negotiate_write(ioc, , sizeof(opt)) != sizeof(opt)) {
> -LOG("write failed (opt)");
> -return -EINVAL;
> -}
> -type = cpu_to_be32(NBD_REP_SERVER);
> -if (nbd_negotiate_write(ioc, , sizeof(type)) != sizeof(type)) {
> -LOG("write failed (reply type)");
> -return -EINVAL;
> -}
> -len = cpu_to_be32(name_len + desc_len + sizeof(len));
> -if (nbd_negotiate_write(ioc, , sizeof(len)) != sizeof(len)) {
> -LOG("write failed (length)");
> -return -EINVAL;
> +len = name_len + desc_len + sizeof(len);
> +rc = nbd_negotiate_send_rep_len(ioc, NBD_REP_SERVER, NBD_OPT_LIST, len);
> +if (rc < 0) {
> +return rc;
> }
> +
> len = cpu_to_be32(name_len);
> if (nbd_negotiate_write(ioc, , sizeof(len)) != sizeof(len)) {
> LOG("write failed (name length)");
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> Rather than open-coding each option request, it's easier to
> have common helper functions do the work.  That in turn requires
> having convenient packed types for handling option requests
> and replies.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> Reviewed-by: Alex Bligh <a...@alex.org.uk>
> 
> ---
> v3: rebase, tweak a debug message
> ---
> include/block/nbd.h |  29 +-
> nbd/nbd-internal.h  |   2 +-
> nbd/client.c| 250 ++--
> 3 files changed, 129 insertions(+), 152 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index f4ae86c..5227ec6 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -26,20 +26,41 @@
> #include "io/channel-socket.h"
> #include "crypto/tlscreds.h"
> 
> +/* Handshake phase structs */
> +
> +struct nbd_option {
> +uint64_t magic; /* NBD_OPTS_MAGIC */
> +uint32_t option; /* NBD_OPT_* */
> +uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_option nbd_option;
> +
> +struct nbd_opt_reply {
> +uint64_t magic; /* NBD_REP_MAGIC */
> +uint32_t option; /* NBD_OPT_* */
> +uint32_t type; /* NBD_REP_* */
> +uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_opt_reply nbd_opt_reply;
> +
> +/* Transmission phase structs */
> +
> struct nbd_request {
> -uint32_t magic;
> -uint16_t flags;
> -uint16_t type;
> +uint32_t magic; /* NBD_REQUEST_MAGIC */
> +uint16_t flags; /* NBD_CMD_FLAG_* */
> +uint16_t type; /* NBD_CMD_* */
> uint64_t handle;
> uint64_t from;
> uint32_t len;
> } QEMU_PACKED;
> +typedef struct nbd_request nbd_request;
> 
> struct nbd_reply {
> -uint32_t magic;
> +uint32_t magic; /* NBD_REPLY_MAGIC */
> uint32_t error;
> uint64_t handle;
> } QEMU_PACKED;
> +typedef struct nbd_reply nbd_reply;
> 
> /* Transmission (export) flags: sent from server to client during handshake,
>but describe what will happen during transmission */
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index d0108a1..95069db 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -61,7 +61,7 @@
> #define NBD_REPLY_MAGIC 0x67446698
> #define NBD_OPTS_MAGIC  0x49484156454F5054LL
> #define NBD_CLIENT_MAGIC0x420281861253LL
> -#define NBD_REP_MAGIC   0x3e889045565a9LL
> +#define NBD_REP_MAGIC   0x0003e889045565a9LL
> 
> #define NBD_SET_SOCK_IO(0xab, 0)
> #define NBD_SET_BLKSIZE _IO(0xab, 1)
> diff --git a/nbd/client.c b/nbd/client.c
> index a9e173a..6626fa8 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -75,64 +75,123 @@ static QTAILQ_HEAD(, NBDExport) exports = 
> QTAILQ_HEAD_INITIALIZER(exports);
> 
> */
> 
> +/* Send an option request. Return 0 if successful, -1 with errp set if
> + * it is impossible to continue. */
> +static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
> +   uint32_t len, const char *data,
> +   Error **errp)
> +{
> +nbd_option req;
> +QEMU_BUILD_BUG_ON(sizeof(req) != 16);
> 
> -/* If type represents success, return 1 without further action.
> - * If type represents an error reply, consume the rest of the packet on ioc.
> - * Then return 0 for unsupported (so the client can fall back to
> - * other approaches), or -1 with errp set for other errors.
> +if (len == -1) {
> +req.length = len = strlen(data);
> +}
> +TRACE("Sending option request %"PRIu32", len %"PRIu32, opt, len);
> +
> +stq_be_p(, NBD_OPTS_MAGIC);
> +stl_be_p(, opt);
> +stl_be_p(, len);
> +
> +if (write_sync(ioc, , sizeof(req)) != sizeof(req)) {
> +error_setg(errp, "Failed to send option request header");
> +return -1;
> +}
> +
> +if (len && write_sync(ioc, (char *) data, len) != len) {
> +error_setg(errp, "Failed to send option request data");
> +return -1;
> +}
> +
> +return 0;
> +}
> +
> +/* Receive the header of an option reply, which should match the given
> + * opt.  Read through the length field, but NOT the length bytes of
> + * payload. Return 0 if successful, -1 with errp set if it is
> + * impossible to continue. */
> +static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
> +nbd_opt_reply *reply, Error **errp)
> +{
> +QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
> +if (read_sync(

Re: [Qemu-block] [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> Current upstream NBD documents that requests have a 16-bit flags,
> followed by a 16-bit type integer; although older versions mentioned
> only a 32-bit field with masking to find flags.  Since the protocol
> is in network order (big-endian over the wire), the ABI is unchanged;
> but dealing with the flags as a separate field rather than masking
> will make it easier to add support for upcoming NBD extensions that
> increase the number of both flags and commands.
> 
> Improve some comments in nbd.h based on the current upstream
> NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
> and touch some nearby code to keep checkpatch.pl happy.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
Reviewed-by: Alex Bligh <a...@alex.org.uk>
> 
> ---
> v3: rebase to other changes earlier in series
> ---
> include/block/nbd.h | 18 --
> nbd/nbd-internal.h  |  4 ++--
> block/nbd-client.c  |  9 +++--
> nbd/client.c| 17 ++---
> nbd/server.c| 51 ++-
> 5 files changed, 53 insertions(+), 46 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 2c753cc..f4ae86c 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anth...@codemonkey.ws>
>  *
>  *  Network Block Device
> @@ -27,7 +28,8 @@
> 
> struct nbd_request {
> uint32_t magic;
> -uint32_t type;
> +uint16_t flags;
> +uint16_t type;
> uint64_t handle;
> uint64_t from;
> uint32_t len;
> @@ -39,6 +41,8 @@ struct nbd_reply {
> uint64_t handle;
> } QEMU_PACKED;
> 
> +/* Transmission (export) flags: sent from server to client during handshake,
> +   but describe what will happen during transmission */
> #define NBD_FLAG_HAS_FLAGS  (1 << 0)/* Flags are there */
> #define NBD_FLAG_READ_ONLY  (1 << 1)/* Device is read-only */
> #define NBD_FLAG_SEND_FLUSH (1 << 2)/* Send FLUSH */
> @@ -46,10 +50,12 @@ struct nbd_reply {
> #define NBD_FLAG_ROTATIONAL (1 << 4)/* Use elevator algorithm - 
> rotational media */
> #define NBD_FLAG_SEND_TRIM  (1 << 5)/* Send TRIM (discard) */
> 
> -/* New-style global flags. */
> +/* New-style handshake (global) flags, sent from server to client, and
> +   control what will happen during handshake phase. */
> #define NBD_FLAG_FIXED_NEWSTYLE (1 << 0)/* Fixed newstyle protocol. */
> 
> -/* New-style client flags. */
> +/* New-style client flags, sent from client to server to control what happens
> +   during handshake phase. */
> #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)/* Fixed newstyle protocol. */
> 
> /* Reply types. */
> @@ -60,10 +66,10 @@ struct nbd_reply {
> #define NBD_REP_ERR_INVALID ((UINT32_C(1) << 31) | 3) /* Invalid length. 
> */
> #define NBD_REP_ERR_TLS_REQD((UINT32_C(1) << 31) | 5) /* TLS required */
> 
> +/* Request flags, sent from client to server during transmission phase */
> +#define NBD_CMD_FLAG_FUA(1 << 0)
> 
> -#define NBD_CMD_MASK_COMMAND 0x
> -#define NBD_CMD_FLAG_FUA (1 << 16)
> -
> +/* Supported request types */
> enum {
> NBD_CMD_READ = 0,
> NBD_CMD_WRITE = 1,
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 035ead4..d0108a1 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -52,10 +52,10 @@
> /* This is all part of the "official" NBD API.
>  *
>  * The most up-to-date documentation is available at:
> - * https://github.com/yoe/nbd/blob/master/doc/proto.txt
> + * https://github.com/yoe/nbd/blob/master/doc/proto.md
>  */
> 
> -#define NBD_REQUEST_SIZE(4 + 4 + 8 + 8 + 4)
> +#define NBD_REQUEST_SIZE(4 + 2 + 2 + 8 + 8 + 4)
> #define NBD_REPLY_SIZE  (4 + 4 + 8)
> #define NBD_REQUEST_MAGIC   0x25609513
> #define NBD_REPLY_MAGIC 0x67446698
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 878e879..285025d 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -1,6 +1,7 @@
> /*
>  * QEMU Block driver for  NBD
>  *
> + * Copyright (C) 2016 Red Hat, Inc.
>  * Copyright (C) 2008 Bull S.A.S.
>  * Author: Laurent Vivier <laurent.viv...@bull.net>
>  *
> @@ -252,7 +253,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t 
> sector_num,
> 
> if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {

Re: [Qemu-block] [Qemu-devel] [PATCH v3 29/44] nbd: Avoid magic number for NBD max name size

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> Declare a constant and use that when determining if an export
> name fits within the constraints we are willing to support.
> 
> Note that upstream NBD recently documented that clients MUST
> support export names of 256 bytes (not including trailing NUL),
> and SHOULD support names up to 4096 bytes.  4096 is a bit big
> (we would lose benefits of stack-allocation of a name array),
> and we already have other limits in place (for example, qcow2
> snapshot names are clamped around 1024).  So for now, just
> stick to the required minimum.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
Reviewed-by: Alex Bligh <a...@alex.org.uk>
> 
> ---
> v3: enlarge the limit, and document choice of new value
> ---
> include/block/nbd.h | 6 ++
> nbd/client.c| 2 +-
> nbd/server.c| 4 ++--
> 3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 3e2d76b..2c753cc 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -76,6 +76,12 @@ enum {
> 
> /* Maximum size of a single READ/WRITE data buffer */
> #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> +/* Maximum size of an export name. The NBD spec requires 256 and
> + * suggests that servers support up to 4096, but we stick to only the
> + * required size so that we can stack-allocate the names, and because
> + * going larger would require an audit of more code to make sure we
> + * aren't overflowing some other buffer. */
> +#define NBD_MAX_NAME_SIZE 256
> 
> ssize_t nbd_wr_syncv(QIOChannel *ioc,
>  struct iovec *iov,
> diff --git a/nbd/client.c b/nbd/client.c
> index 4659df3..b700100 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -210,7 +210,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
> Error **errp)
> error_setg(errp, "incorrect option name length");
> return -1;
> }
> -if (namelen > 255) {
> +if (namelen > NBD_MAX_NAME_SIZE) {
> error_setg(errp, "export name length too long %" PRIu32, namelen);
> return -1;
> }
> diff --git a/nbd/server.c b/nbd/server.c
> index fa05a73..a20bf8a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -296,13 +296,13 @@ static int nbd_negotiate_handle_list(NBDClient *client, 
> uint32_t length)
> static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t 
> length)
> {
> int rc = -EINVAL;
> -char name[256];
> +char name[NBD_MAX_NAME_SIZE + 1];
> 
> /* Client sends:
>     [20 ..  xx]   export name (length bytes)
>  */
> TRACE("Checking length");
> -if (length > 255) {
> +if (length >= sizeof(name)) {
> LOG("Bad length received");
> goto fail;
> }
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH v3 03/44] nbd: Improve server handling of bogus commands

2016-04-25 Thread Alex Bligh
gt; @@ -1077,14 +1105,6 @@ static void nbd_trip(void *opaque)
> goto error_reply;
> }
> command = request.type & NBD_CMD_MASK_COMMAND;
> -if (command != NBD_CMD_DISC && (request.from + request.len) > exp->size) 
> {
> -LOG("From: %" PRIu64 ", Len: %" PRIu32", Size: %" PRIu64
> -", Offset: %" PRIu64 "\n",
> -request.from, request.len,
> -(uint64_t)exp->size, (uint64_t)exp->dev_offset);
> -LOG("requested operation past EOF--bad client?");
> -goto invalid_request;
> -}
> 
> if (client->closing) {
> /*
> @@ -1151,10 +1171,11 @@ static void nbd_trip(void *opaque)
> goto out;
> }
> break;
> +
> case NBD_CMD_DISC:
> -TRACE("Request type is DISCONNECT");
> -errno = 0;
> -goto out;
> +/* unreachable, thanks to special case in nbd_co_receive_request() */
> +abort();
> +
> case NBD_CMD_FLUSH:
> TRACE("Request type is FLUSH");
> 
> @@ -1182,10 +1203,14 @@ static void nbd_trip(void *opaque)
> break;
> default:
> LOG("invalid request type (%" PRIu32 ") received", request.type);
> -invalid_request:
> reply.error = EINVAL;
> error_reply:
> -if (nbd_co_send_reply(req, , 0) < 0) {
> +/* We must disconnect after replying with an error to
> + * NBD_CMD_READ, since we choose not to send bogus filler
> + * data; likewise after NBD_CMD_WRITE if we did not read the
> + * payload. */
> +if (nbd_co_send_reply(req, , 0) < 0 || command == NBD_CMD_READ 
> ||
> +(command == NBD_CMD_WRITE && !req->complete)) {
> goto out;
> }
> break;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH v3 08/44] nbd: Add qemu-nbd -D for human-readable description

2016-04-25 Thread Alex Bligh

On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote:

> The NBD protocol allows servers to advertise a human-readable
> description alongside an export name during NBD_OPT_LIST.  Add
> an option to pass through the user's string to the NBD client.
> 
> Doing this also makes it easier to test commit 200650d4, which
> is the client counterpart of receiving the description.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>

Reviewed-by: Alex Bligh <a...@alex.org.uk>

> ---
> include/block/nbd.h |  1 +
> nbd/nbd-internal.h  |  5 +++--
> nbd/server.c| 34 ++
> qemu-nbd.c  | 12 +++-
> qemu-nbd.texi   |  5 -
> 5 files changed, 45 insertions(+), 12 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 134f117..3e2d76b 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -107,6 +107,7 @@ BlockBackend *nbd_export_get_blockdev(NBDExport *exp);
> 
> NBDExport *nbd_export_find(const char *name);
> void nbd_export_set_name(NBDExport *exp, const char *name);
> +void nbd_export_set_description(NBDExport *exp, const char *description);
> void nbd_export_close_all(void);
> 
> void nbd_client_new(NBDExport *exp,
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 3791535..035ead4 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -103,9 +103,10 @@ static inline ssize_t read_sync(QIOChannel *ioc, void 
> *buffer, size_t size)
> return nbd_wr_syncv(ioc, , 1, 0, size, true);
> }
> 
> -static inline ssize_t write_sync(QIOChannel *ioc, void *buffer, size_t size)
> +static inline ssize_t write_sync(QIOChannel *ioc, const void *buffer,
> + size_t size)
> {
> -struct iovec iov = { .iov_base = buffer, .iov_len = size };
> +struct iovec iov = { .iov_base = (void *) buffer, .iov_len = size };
> 
> return nbd_wr_syncv(ioc, , 1, 0, size, false);
> }
> diff --git a/nbd/server.c b/nbd/server.c
> index 31fc9cf..aa252a4 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -61,6 +61,7 @@ struct NBDExport {
> 
> BlockBackend *blk;
> char *name;
> +char *description;
> off_t dev_offset;
> off_t size;
> uint16_t nbdflags;
> @@ -128,7 +129,8 @@ static ssize_t nbd_negotiate_read(QIOChannel *ioc, void 
> *buffer, size_t size)
> 
> }
> 
> -static ssize_t nbd_negotiate_write(QIOChannel *ioc, void *buffer, size_t 
> size)
> +static ssize_t nbd_negotiate_write(QIOChannel *ioc, const void *buffer,
> +   size_t size)
> {
> ssize_t ret;
> guint watch;
> @@ -224,11 +226,15 @@ static int nbd_negotiate_send_rep(QIOChannel *ioc, 
> uint32_t type, uint32_t opt)
> 
> static int nbd_negotiate_send_rep_list(QIOChannel *ioc, NBDExport *exp)
> {
> -uint64_t magic, name_len;
> +uint64_t magic;
> +size_t name_len, desc_len;
> uint32_t opt, type, len;
> +const char *name = exp->name ? exp->name : "";
> +const char *desc = exp->description ? exp->description : "";
> 
> -TRACE("Advertising export name '%s'", exp->name ? exp->name : "");
> -name_len = strlen(exp->name);
> +TRACE("Advertising export name '%s' description '%s'", name, desc);
> +name_len = strlen(name);
> +desc_len = strlen(desc);
> magic = cpu_to_be64(NBD_REP_MAGIC);
> if (nbd_negotiate_write(ioc, , sizeof(magic)) != sizeof(magic)) {
> LOG("write failed (magic)");
> @@ -244,18 +250,22 @@ static int nbd_negotiate_send_rep_list(QIOChannel *ioc, 
> NBDExport *exp)
> LOG("write failed (reply type)");
> return -EINVAL;
> }
> -len = cpu_to_be32(name_len + sizeof(len));
> +len = cpu_to_be32(name_len + desc_len + sizeof(len));
> if (nbd_negotiate_write(ioc, , sizeof(len)) != sizeof(len)) {
> LOG("write failed (length)");
> return -EINVAL;
> }
> len = cpu_to_be32(name_len);
> if (nbd_negotiate_write(ioc, , sizeof(len)) != sizeof(len)) {
> -LOG("write failed (length)");
> +LOG("write failed (name length)");
> return -EINVAL;
> }
> -if (nbd_negotiate_write(ioc, exp->name, name_len) != name_len) {
> -LOG("write failed (buffer)");
> +if (nbd_negotiate_write(ioc, name, name_len) != name_len) {
> +LOG("write failed (name buffer)");
> +return -EINVAL;
> +}
> +if (nbd_negotiate_write(ioc, desc, desc_len) != desc_len) {
> +LOG("write failed (description buffe

Re: [Qemu-block] [Qemu-devel] [PATCH v3 07/44] nbd: Limit nbdflags to 16 bits

2016-04-25 Thread Alex Bligh
   fixedNewStyle = true;
> @@ -543,17 +541,15 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
> *name, uint32_t *flags,
> goto fail;
> }
> *size = be64_to_cpu(s);
> -TRACE("Size is %" PRIu64, *size);
> 
> -if (read_sync(ioc, , sizeof(exportflags)) !=
> -sizeof(exportflags)) {
> +if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
> error_setg(errp, "Failed to read export flags");
> goto fail;
> }
> -exportflags = be16_to_cpu(exportflags);
> -*flags |= exportflags;
> -TRACE("Export flags are %" PRIx16, exportflags);
> +be16_to_cpus(flags);
> } else if (magic == NBD_CLIENT_MAGIC) {
> +uint32_t oldflags;
> +
> if (name) {
> error_setg(errp, "Server does not support export names");
> goto fail;
> @@ -570,16 +566,22 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
> *name, uint32_t *flags,
> *size = be64_to_cpu(s);
> TRACE("Size is %" PRIu64, *size);
> 
> -if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
> +if (read_sync(ioc, , sizeof(oldflags)) != sizeof(oldflags)) 
> {
> error_setg(errp, "Failed to read export flags");
> goto fail;
> }
> -*flags = be32_to_cpup(flags);
> +be32_to_cpus();
> +if (oldflags & ~0x) {
> +error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
> +goto fail;
> +}
> +*flags = oldflags;
> } else {
> error_setg(errp, "Bad magic received");
> goto fail;
> }
> 
> +TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
> if (read_sync(ioc, , 124) != 124) {
> error_setg(errp, "Failed to read reserved block");
> goto fail;
> @@ -591,7 +593,7 @@ fail:
> }
> 
> #ifdef __linux__
> -int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size)
> +int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
> {
> unsigned long sectors = size / BDRV_SECTOR_SIZE;
> if (size / BDRV_SECTOR_SIZE != sectors) {
> @@ -687,7 +689,7 @@ int nbd_disconnect(int fd)
> }
> 
> #else
> -int nbd_init(int fd, QIOChannelSocket *ioc, uint32_t flags, off_t size)
> +int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
> {
> return -ENOTSUP;
> }
> diff --git a/nbd/server.c b/nbd/server.c
> index 789189d..31fc9cf 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -63,7 +63,7 @@ struct NBDExport {
> char *name;
> off_t dev_offset;
> off_t size;
> -uint32_t nbdflags;
> +uint16_t nbdflags;
> QTAILQ_HEAD(, NBDClient) clients;
> QTAILQ_ENTRY(NBDExport) next;
> 
> @@ -544,8 +544,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> NBDClient *client = data->client;
> char buf[8 + 8 + 8 + 128];
> int rc;
> -const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> - NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> +  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> bool oldStyle;
> 
> /* Old style negotiation header without options
> @@ -575,7 +575,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> 
> oldStyle = client->exp != NULL && !client->tlscreds;
> if (oldStyle) {
> -assert ((client->exp->nbdflags & ~65535) == 0);
> stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
> stq_be_p(buf + 16, client->exp->size);
> stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> @@ -604,7 +603,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> goto fail;
> }
> 
> -assert ((client->exp->nbdflags & ~65535) == 0);
> stq_be_p(buf + 18, client->exp->size);
> stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
> @@ -806,7 +804,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> }
> 
> NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
> -  uint32_t nbdflags, void (*close)(NBDExport *),
> +  uint16_t nbdflags, void (*close)(NBDExport *),
>   Error **errp)
> {
> NBDExport *exp = g_malloc0(sizeof(NBDExport));
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 2c9754e..71bfdeb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -241,7 +241,7 @@ static void *nbd_client_thread(void *arg)
> {
> char *device = arg;
> off_t size;
> -uint32_t nbdflags;
> +uint16_t nbdflags;
> QIOChannelSocket *sioc;
> int fd;
> int ret;
> @@ -455,7 +455,7 @@ int main(int argc, char **argv)
> BlockBackend *blk;
> BlockDriverState *bs;
> off_t dev_offset = 0;
> -uint32_t nbdflags = 0;
> +uint16_t nbdflags = 0;
> bool disconnect = false;
> const char *bindto = "0.0.0.0";
> const char *port = NULL;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH for-2.6] nbd: Don't fail handshake on NBD_OPT_LIST descriptions

2016-04-14 Thread Alex Bligh

On 14 Apr 2016, at 16:26, Eric Blake <ebl...@redhat.com> wrote:

> [adding qemu-block in cc, since Paolo can't send pull request]
> 
> On 04/07/2016 07:09 PM, Eric Blake wrote:
>> The NBD Protocol states that NBD_REP_SERVER may set
>> 'length > sizeof(namelen) + namelen'; in which case the rest
>> of the packet is a UTF-8 description of the export.  While we
>> don't know of any NBD servers that send this description yet,
>> we had better consume the data so we don't choke when we start
>> to talk to such a server.
>> 
>> Also, a (buggy/malicious) server that replies with length <
>> sizeof(namelen) would cause us to block waiting for bytes that
>> the server is not sending, and one that replies with super-huge
>> lengths could cause us to temporarily allocate up to 4G memory.
>> Sanity check things before blindly reading incorrectly.
>> 
>> Signed-off-by: Eric Blake <ebl...@redhat.com>

Reviewed-by: Alex Bligh <a...@alex.org.uk>


>> ---
>> 
>> Yet another case of code introduced in 2.6 that doesn't play
>> nicely with spec-compliant servers...
>> 
>> Hopefully I've squashed them all now?
>> 
>> nbd/client.c | 23 +--
>> 1 file changed, 21 insertions(+), 2 deletions(-)
>> 
>> diff --git a/nbd/client.c b/nbd/client.c
>> index 6777e58..48f2a21 100644
>> --- a/nbd/client.c
>> +++ b/nbd/client.c
>> @@ -192,13 +192,18 @@ static int nbd_receive_list(QIOChannel *ioc, char 
>> **name, Error **errp)
>> return -1;
>> }
>> } else if (type == NBD_REP_SERVER) {
>> +if (len < sizeof(namelen) || len > NBD_MAX_BUFFER_SIZE) {
>> +error_setg(errp, "incorrect option length");
>> +return -1;
>> +}
>> if (read_sync(ioc, , sizeof(namelen)) != sizeof(namelen)) {
>> error_setg(errp, "failed to read option name length");
>> return -1;
>> }
>> namelen = be32_to_cpu(namelen);
>> -if (len != (namelen + sizeof(namelen))) {
>> -error_setg(errp, "incorrect option mame length");
>> +len -= sizeof(namelen);
>> +if (len < namelen) {
>> +error_setg(errp, "incorrect option name length");
>> return -1;
>> }
>> if (namelen > 255) {
>> @@ -214,6 +219,20 @@ static int nbd_receive_list(QIOChannel *ioc, char 
>> **name, Error **errp)
>> return -1;
>> }
>> (*name)[namelen] = '\0';
>> +len -= namelen;
>> +if (len) {
>> +char *buf = g_malloc(len + 1);
>> +if (read_sync(ioc, buf, len) != len) {
>> +error_setg(errp, "failed to read export description");
>> +g_free(*name);
>> +g_free(buf);
>> +    *name = NULL;
>> +return -1;
>> +}
>> +buf[len] = '\0';
>> +TRACE("Ignoring export description: %s", buf);
>> +g_free(buf);
>> +}
>> } else {
>> error_setg(errp, "Unexpected reply type %x expected %x",
>>type, NBD_REP_SERVER);
>> 
> 
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Qemu-devel] [PATCH for-2.6] nbd: Don't kill server on client that doesn't request TLS

2016-04-14 Thread Alex Bligh

On 14 Apr 2016, at 16:25, Eric Blake <ebl...@redhat.com> wrote:

> [adding qemu-block in cc, since Paolo can't send pull request]
> 
> On 04/07/2016 02:29 PM, Eric Blake wrote:
>> Upstream NBD is documenting that servers MAY choose to operate
>> in a conditional mode, where it is up to the client whether to
>> use TLS.  For qemu's case, we want to always be in FORCEDTLS
>> mode, because of the risk of man-in-the-middle attacks, and since
>> we never export more than one device; likewise, the qemu client
>> will ALWAYS send NBD_OPT_STARTTLS as its first option.  But now
>> that SELECTIVETLS servers exist, it is feasible to encounter a
>> (non-qemu) client that does not do NBD_OPT_STARTTLS first, but
>> rather wants to take advantage of the conditional modes it might
>> find elsewhere.
>> 
>> Since we require TLS, we are within our rights to drop connections
>> on any client that doesn't negotiate it right away, or which
>> attempts to negotiate it incorrectly, without violating the intent
>> of the NBD Protocol.  However, it's better to allow the client to
>> continue trying, on the grounds that maybe the client will get the
>> hint to send NBD_OPT_STARTTLS.
>> 
>> Signed-off-by: Eric Blake <ebl...@redhat.com>

Reviewed-by: Alex Bligh <a...@alex.org.uk>

It is worth noting that this change (assuming I've read
it right) in no way means that qemu would be serving resources
without TLS enabled when configured with TLS, or even
providing information about those resources. It's
simply saying "TLS is required" (and not dropping the
connection so the client can ask for TLS) if the client
doesn't ask for TLS first thing. This removes the need
for the client and server to communicate out of band
that TLS is required (as well as conforming to the spec).

Alex

>> ---
>> 
>> My earlier patch was arguably incomplete:
>> https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg01265.html
>> 
>> But as it is already in a pull request, and as this one is
>> a bit more controversial, it's best to keep it as a separate patch.
>> 
>> nbd/server.c | 10 --
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> diff --git a/nbd/server.c b/nbd/server.c
>> index 7843584..2b727f0 100644
>> --- a/nbd/server.c
>> +++ b/nbd/server.c
>> @@ -450,9 +450,12 @@ static int nbd_negotiate_options(NBDClient *client)
>> 
>> default:
>> TRACE("Option 0x%x not permitted before TLS", clientflags);
>> +if (nbd_negotiate_drop_sync(client->ioc, length) != length) 
>> {
>> +return -EIO;
>> +}
>> nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_TLS_REQD,
>>clientflags);
>> -return -EINVAL;
>> +break;
>> }
>> } else if (fixedNewstyle) {
>> switch (clientflags) {
>> @@ -470,6 +473,9 @@ static int nbd_negotiate_options(NBDClient *client)
>> return nbd_negotiate_handle_export_name(client, length);
>> 
>> case NBD_OPT_STARTTLS:
>> +if (nbd_negotiate_drop_sync(client->ioc, length) != length) 
>> {
>> +return -EIO;
>> +}
>> if (client->tlscreds) {
>> TRACE("TLS already enabled");
>> nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_INVALID,
>> @@ -479,7 +485,7 @@ static int nbd_negotiate_options(NBDClient *client)
>> nbd_negotiate_send_rep(client->ioc, NBD_REP_ERR_POLICY,
>>clientflags);
>> }
>> -return -EINVAL;
>> +break;
>> default:
>> TRACE("Unsupported option 0x%x", clientflags);
>> if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
>> 
> 
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE

2016-04-09 Thread Alex Bligh

On 10 Apr 2016, at 00:12, Eric Blake <ebl...@redhat.com> wrote:

> Qemu clients _already_ do the safe actions of waiting for all inflight
> requests to complete, then sending one final NBD_CMD_FLUSH, before
> attempting to send NBD_CMD_DISC.  If I knew how to make qemu guarantee
> that the NBD_CMD_DISC hits the wire (even in TLS mode) rather than being
> dropped early, that seems nicer than having to implement this (although
> I did learn a bit about qemu coroutines in implementing this).

Thanks. As discussed elsewhere, I think it's gnutls_bye() but I'm
more familiar with openssl and would be concerned that gnutls_bye()
might block.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client

2016-04-09 Thread Alex Bligh

On 10 Apr 2016, at 00:17, Eric Blake <ebl...@redhat.com> wrote:

> No, the code is correct.  In both functions, the logic is that if the
> lower-level knows that the server respects FUA, then it clears the flag
> before returning (flags is passed by reference, not value).  Then at
> this higher level, if FUA is still set, the server is too old, so we do
> a fallback flush to get the same semantics for the write in question,
> but at higher cost of a full flush.

OK, thanks.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Qemu-devel] [RFC PATCH 18/18] nbd: Implement NBD_CMD_WRITE_ZEROES on client

2016-04-09 Thread Alex Bligh
en,
> .bdrv_co_readv  = nbd_co_readv,
> +.bdrv_co_write_zeroes   = nbd_co_write_zeroes,
> .bdrv_co_writev = nbd_co_writev,
> .bdrv_co_writev_flags   = nbd_co_writev_flags,
> .supported_write_flags  = BDRV_REQ_FUA,
> @@ -496,6 +517,7 @@ static BlockDriver bdrv_nbd_tcp = {
> .bdrv_parse_filename= nbd_parse_filename,
> .bdrv_file_open = nbd_open,
> .bdrv_co_readv  = nbd_co_readv,
> +.bdrv_co_write_zeroes   = nbd_co_write_zeroes,
> .bdrv_co_writev = nbd_co_writev,
> .bdrv_co_writev_flags   = nbd_co_writev_flags,
> .supported_write_flags  = BDRV_REQ_FUA,
> @@ -516,6 +538,7 @@ static BlockDriver bdrv_nbd_unix = {
> .bdrv_parse_filename= nbd_parse_filename,
> .bdrv_file_open = nbd_open,
> .bdrv_co_readv  = nbd_co_readv,
> +.bdrv_co_write_zeroes   = nbd_co_write_zeroes,
> .bdrv_co_writev = nbd_co_writev,
> .bdrv_co_writev_flags   = nbd_co_writev_flags,
> .supported_write_flags  = BDRV_REQ_FUA,
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [RFC PATCH 17/18] nbd: Implement NBD_CMD_WRITE_ZEROES on server

2016-04-09 Thread Alex Bligh
req->data, request.len / BDRV_SECTOR_SIZE);
> @@ -1243,6 +1254,52 @@ static void nbd_trip(void *opaque)
> goto out;
> }
> break;
> +case NBD_CMD_WRITE_ZEROES:
> +TRACE("Request type is WRITE_ZEROES");
> +
> +if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
> +TRACE("Server is read-only, return error");
> +reply.error = EROFS;
> +goto error_reply;
> +}
> +
> +TRACE("Writing to device");
> +
> +flags = 0;
> +if (request.flags & NBD_CMD_FLAG_FUA) {
> +flags |= BDRV_REQ_FUA;
> +}
> +if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
> +/* FIXME: should this depend on whether the server is set to
> +   look for holes? */
> +flags |= BDRV_REQ_MAY_UNMAP;
> +}
> +ret = blk_write_zeroes(exp->blk,
> +   ((request.from + exp->dev_offset) /
> +BDRV_SECTOR_SIZE),
> +   request.len / BDRV_SECTOR_SIZE,
> +   flags);
> +if (ret < 0) {
> +LOG("writing to file failed");
> +reply.error = -ret;
> +goto error_reply;
> +}
> +
> +/* FIXME: do we need FUA flush here, if we also passed it to
> + * blk_write_zeroes? */

I don't think so. blk_write_zeroes with BDRV_REQ_FUA should
make *that* request access the media, but FLUSH makes ALL PREVIOUS
requests access the media, so you are doing too much work here
I think.

> +if (request.flags & NBD_CMD_FLAG_FUA) {
> +ret = blk_co_flush(exp->blk);
> +if (ret < 0) {
> +LOG("flush failed");
> +reply.error = -ret;
> +goto error_reply;
> +}
> +}
> +
> +if (nbd_co_send_reply(req, , 0) < 0) {
> +goto out;
> +}
> +break;
> case NBD_CMD_DISC:
> TRACE("Request type is DISCONNECT");
> goto out;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH 16/18] nbd: Support NBD_CMD_CLOSE

2016-04-09 Thread Alex Bligh
@@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> char buf[8 + 8 + 8 + 128];
> int rc;
> const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> -  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
> +  NBD_FLAG_SEND_CLOSE);
> bool oldStyle;
> size_t len;
> 
> @@ -1244,7 +1245,21 @@ static void nbd_trip(void *opaque)
> break;
> case NBD_CMD_DISC:
> TRACE("Request type is DISCONNECT");
> -errno = 0;
> +goto out;
> +case NBD_CMD_CLOSE:
> +TRACE("Request type is CLOSE");
> +if (request.flags || request.from || request.len) {
> +LOG("bad parameters, skipping flush");
> +reply.error = EINVAL;
> +} else {
> +ret = blk_co_flush(exp->blk);
> +if (ret < 0) {
> +LOG("flush failed");
> +reply.error = -ret;
> +}
> +}
> +/* Attempt to send reply, but even if it fails, we are done */
> +nbd_co_send_reply(req, , 0);
> goto out;
> case NBD_CMD_FLUSH:
> TRACE("Request type is FLUSH");
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH 13/18] nbd: Support shorter handshake

2016-04-09 Thread Alex Bligh

On 8 Apr 2016, at 23:05, Eric Blake <ebl...@redhat.com> wrote:

> The NBD Protocol allows the server and client to mutually agree
> on a shorter handshake (omit the 124 bytes of reserved 0), via
> the server advertising NBD_FLAG_NO_ZEROES and the client
> acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
> newstyle, whether or not it is fixed newstyle).  It doesn't
> shave much off the wire, but we might as well implement it.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>


Reviewed-by: Alex Bligh <a...@alex.org.uk>

thanks - that was annoying me.

> ---
> include/block/nbd.h |  6 --
> nbd/client.c|  8 +++-
> nbd/server.c| 15 +++
> 3 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 155196e..35c0ea3 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -73,11 +73,13 @@ typedef struct nbd_reply nbd_reply;
> 
> /* New-style handshake (global) flags, sent from server to client, and
>control what will happen during handshake phase. */
> -#define NBD_FLAG_FIXED_NEWSTYLE (1 << 0)/* Fixed newstyle protocol. 
> */
> +#define NBD_FLAG_FIXED_NEWSTYLE   (1 << 0) /* Fixed newstyle protocol. */
> +#define NBD_FLAG_NO_ZEROES(1 << 1) /* End handshake without zeroes. 
> */
> 
> /* New-style client flags, sent from client to server to control what happens
>during handshake phase. */
> -#define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)/* Fixed newstyle protocol. 
> */
> +#define NBD_FLAG_C_FIXED_NEWSTYLE (1 << 0) /* Fixed newstyle protocol. */
> +#define NBD_FLAG_C_NO_ZEROES  (1 << 1) /* End handshake without zeroes. 
> */
> 
> /* Reply types. */
> #define NBD_REP_ACK (1) /* Data sending finished. */
> diff --git a/nbd/client.c b/nbd/client.c
> index d4e37d5..507ddc1 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -409,6 +409,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
> *name, uint32_t *flags,
> char buf[256];
> uint64_t magic, s;
> int rc;
> +bool zeroes = true;
> 
> TRACE("Receiving negotiation tlscreds=%p hostname=%s.",
>   tlscreds, hostname ? hostname : "");
> @@ -475,6 +476,11 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
> *name, uint32_t *flags,
> TRACE("Server supports fixed new style");
> clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
> }
> +if (globalflags & NBD_FLAG_NO_ZEROES) {
> +zeroes = false;
> +TRACE("Server supports no zeroes");
> +clientflags |= NBD_FLAG_C_NO_ZEROES;
> +}
> /* client requested flags */
> clientflags = cpu_to_be32(clientflags);
> if (write_sync(ioc, , sizeof(clientflags)) !=
> @@ -558,7 +564,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
> *name, uint32_t *flags,
> goto fail;
> }
> 
> -if (drop_sync(ioc, 124) != 124) {
> +if (zeroes && drop_sync(ioc, 124) != 124) {
> error_setg(errp, "Failed to read reserved block");
> goto fail;
> }
> diff --git a/nbd/server.c b/nbd/server.c
> index 69724c9..379df8c 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -78,6 +78,7 @@ struct NBDClient {
> int refcount;
> void (*close)(NBDClient *client);
> 
> +bool no_zeroes;
> NBDExport *exp;
> QCryptoTLSCreds *tlscreds;
> char *tlsaclname;
> @@ -396,6 +397,11 @@ static int nbd_negotiate_options(NBDClient *client)
> fixedNewstyle = true;
> flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
> }
> +if (flags & NBD_FLAG_C_NO_ZEROES) {
> +TRACE("Client supports no zeroes at handshake end");
> +client->no_zeroes = true;
> +flags &= ~NBD_FLAG_C_NO_ZEROES;
> +}
> if (flags != 0) {
> TRACE("Unknown client flags 0x%" PRIx32 " received", flags);
> return -EIO;
> @@ -527,6 +533,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
>   NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> bool oldStyle;
> +size_t len;
> 
> /* Old style negotiation header without options
> [ 0 ..   7]   passwd   ("NBDMAGIC")
> @@ -543,7 +550,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> options sent
> [18 ..  25]   size
> [26 ..  27]   export flags
> - 

Re: [Qemu-block] [Qemu-devel] [PATCH 10/18] nbd: Share common option-sending code in client

2016-04-09 Thread Alex Bligh

On 8 Apr 2016, at 23:05, Eric Blake <ebl...@redhat.com> wrote:

> Rather than open-coding each option request, it's easier to
> have common helper functions do the work.  That in turn requires
> having convenient packed types for handling option requests
> and replies.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>

Reviewed-by: Alex Bligh <a...@alex.org.uk>

> ---
> include/block/nbd.h |  29 +-
> nbd/nbd-internal.h  |   2 +-
> nbd/client.c| 250 ++--
> 3 files changed, 129 insertions(+), 152 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 42fd670..155196e 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -26,20 +26,41 @@
> #include "io/channel-socket.h"
> #include "crypto/tlscreds.h"
> 
> +/* Handshake phase structs */
> +
> +struct nbd_option {
> +uint64_t magic; /* NBD_OPTS_MAGIC */
> +uint32_t option; /* NBD_OPT_* */
> +uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_option nbd_option;
> +
> +struct nbd_opt_reply {
> +uint64_t magic; /* NBD_REP_MAGIC */
> +uint32_t option; /* NBD_OPT_* */
> +uint32_t type; /* NBD_REP_* */
> +uint32_t length;
> +} QEMU_PACKED;
> +typedef struct nbd_opt_reply nbd_opt_reply;
> +
> +/* Transmission phase structs */
> +
> struct nbd_request {
> -uint32_t magic;
> -uint16_t flags;
> -uint16_t type;
> +uint32_t magic; /* NBD_REQUEST_MAGIC */
> +uint16_t flags; /* NBD_CMD_FLAG_* */
> +uint16_t type; /* NBD_CMD_* */
> uint64_t handle;
> uint64_t from;
> uint32_t len;
> } QEMU_PACKED;
> +typedef struct nbd_request nbd_request;
> 
> struct nbd_reply {
> -uint32_t magic;
> +uint32_t magic; /* NBD_REPLY_MAGIC */
> uint32_t error;
> uint64_t handle;
> } QEMU_PACKED;
> +typedef struct nbd_reply nbd_reply;
> 
> /* Transmission (export) flags: sent from server to client during handshake,
>but describe what will happen during transmission */
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index b663bf3..b78d249 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -61,7 +61,7 @@
> #define NBD_REPLY_MAGIC 0x67446698
> #define NBD_OPTS_MAGIC  0x49484156454F5054LL
> #define NBD_CLIENT_MAGIC0x420281861253LL
> -#define NBD_REP_MAGIC   0x3e889045565a9LL
> +#define NBD_REP_MAGIC   0x0003e889045565a9LL
> 
> #define NBD_SET_SOCK_IO(0xab, 0)
> #define NBD_SET_BLKSIZE _IO(0xab, 1)
> diff --git a/nbd/client.c b/nbd/client.c
> index 7fd6059..07b8d2e 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -75,64 +75,123 @@ static QTAILQ_HEAD(, NBDExport) exports = 
> QTAILQ_HEAD_INITIALIZER(exports);
> 
> */
> 
> +/* Send an option request. Return 0 if successful, -1 with errp set if
> + * it is impossible to continue. */
> +static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
> +   uint32_t len, const char *data,
> +   Error **errp)
> +{
> +nbd_option req;
> +QEMU_BUILD_BUG_ON(sizeof(req) != 16);
> 
> -/* If type represents success, return 1 without further action.
> - * If type represents an error reply, consume the rest of the packet on ioc.
> - * Then return 0 for unsupported (so the client can fall back to
> - * other approaches), or -1 with errp set for other errors.
> +if (len == -1) {
> +req.length = len = strlen(data);
> +}
> +TRACE("Sending option request %"PRIu32", len %"PRIu32, opt, len);
> +
> +stq_be_p(, NBD_OPTS_MAGIC);
> +stl_be_p(, opt);
> +stl_be_p(, len);
> +
> +if (write_sync(ioc, , sizeof(req)) != sizeof(req)) {
> +error_setg(errp, "Failed to send option request header");
> +return -1;
> +}
> +
> +if (len && write_sync(ioc, (char *) data, len) != len) {
> +error_setg(errp, "Failed to send option request data");
> +return -1;
> +}
> +
> +return 0;
> +}
> +
> +/* Receive the header of an option reply, which should match the given
> + * opt.  Read through the length field, but NOT the length bytes of
> + * payload. Return 0 if successful, -1 with errp set if it is
> + * impossible to continue. */
> +static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
> +nbd_opt_reply *reply, Error **errp)
> +{
> +QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
> +if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
> +

Re: [Qemu-block] [Qemu-devel] [PATCH 08/18] nbd: Limit nbdflags to 16 bits

2016-04-09 Thread Alex Bligh

On 8 Apr 2016, at 23:05, Eric Blake <ebl...@redhat.com> wrote:

> Rather than asserting that nbdflags is within range, just give
> it the correct type to begin with :)  nbdflags corresponds to
> the per-export portion of NBD Protocol "transmission flags", which
> is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>


Reviewed-by: Alex Bligh <a...@alex.org.uk>

> ---
> include/block/nbd.h |  2 +-
> nbd/server.c| 10 --
> qemu-nbd.c  |  2 +-
> 3 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 2c61901..42fd670 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -105,7 +105,7 @@ typedef struct NBDExport NBDExport;
> typedef struct NBDClient NBDClient;
> 
> NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
> -  uint32_t nbdflags, void (*close)(NBDExport *),
> +  uint16_t nbdflags, void (*close)(NBDExport *),
>   Error **errp);
> void nbd_export_close(NBDExport *exp);
> void nbd_export_get(NBDExport *exp);
> diff --git a/nbd/server.c b/nbd/server.c
> index 93c077e..c8666ab 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -63,7 +63,7 @@ struct NBDExport {
> char *name;
> off_t dev_offset;
> off_t size;
> -uint32_t nbdflags;
> +uint16_t nbdflags;
> QTAILQ_HEAD(, NBDClient) clients;
> QTAILQ_ENTRY(NBDExport) next;
> 
> @@ -525,8 +525,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> NBDClient *client = data->client;
> char buf[8 + 8 + 8 + 128];
> int rc;
> -const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> - NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> +const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
> +  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
> bool oldStyle;
> 
> /* Old style negotiation header without options
> @@ -556,7 +556,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> 
> oldStyle = client->exp != NULL && !client->tlscreds;
> if (oldStyle) {
> -assert ((client->exp->nbdflags & ~65535) == 0);
> stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
> stq_be_p(buf + 16, client->exp->size);
> stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> @@ -585,7 +584,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData 
> *data)
> goto fail;
> }
> 
> -assert ((client->exp->nbdflags & ~65535) == 0);
> stq_be_p(buf + 18, client->exp->size);
> stw_be_p(buf + 26, client->exp->nbdflags | myflags);
> if (nbd_negotiate_write(client->ioc, buf + 18, sizeof(buf) - 18) !=
> @@ -807,7 +805,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> }
> 
> NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
> -  uint32_t nbdflags, void (*close)(NBDExport *),
> +  uint16_t nbdflags, void (*close)(NBDExport *),
>   Error **errp)
> {
> NBDExport *exp = g_malloc0(sizeof(NBDExport));
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index c2e4d3f..8880ac3 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -454,7 +454,7 @@ int main(int argc, char **argv)
> BlockBackend *blk;
> BlockDriverState *bs;
> off_t dev_offset = 0;
> -uint32_t nbdflags = 0;
> +uint16_t nbdflags = 0;
> bool disconnect = false;
> const char *bindto = "0.0.0.0";
> const char *port = NULL;
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH 07/18] nbd: Treat flags vs. command type as separate fields

2016-04-09 Thread Alex Bligh

On 8 Apr 2016, at 23:05, Eric Blake <ebl...@redhat.com> wrote:

> Current upstream NBD documents that requests have a 16-bit flags,
> followed by a 16-bit type integer; although older versions mentioned
> only a 32-bit field with masking to find flags.  Since the protocol
> is in network order (big-endian over the wire), the ABI is unchanged;
> but dealing with the flags as a separate field rather than masking
> will make it easier to add support for upcoming NBD extensions that
> increase the number of both flags and commands.
> 
> Improve some comments in nbd.h based on the current upstream
> NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
> and touch some nearby code to keep checkpatch.pl happy.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>


Reviewed-by: Alex Bligh <a...@alex.org.uk>

> ---
> include/block/nbd.h | 18 --
> nbd/nbd-internal.h  |  4 ++--
> block/nbd-client.c  |  9 +++--
> nbd/client.c| 17 ++---
> nbd/server.c| 35 +++
> 5 files changed, 46 insertions(+), 37 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 3f047bf..2c61901 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,4 +1,5 @@
> /*
> + *  Copyright (C) 2016 Red Hat, Inc.
>  *  Copyright (C) 2005  Anthony Liguori <anth...@codemonkey.ws>
>  *
>  *  Network Block Device
> @@ -27,7 +28,8 @@
> 
> struct nbd_request {
> uint32_t magic;
> -uint32_t type;
> +uint16_t flags;
> +uint16_t type;
> uint64_t handle;
> uint64_t from;
> uint32_t len;
> @@ -39,6 +41,8 @@ struct nbd_reply {
> uint64_t handle;
> } QEMU_PACKED;
> 
> +/* Transmission (export) flags: sent from server to client during handshake,
> +   but describe what will happen during transmission */
> #define NBD_FLAG_HAS_FLAGS  (1 << 0)/* Flags are there */
> #define NBD_FLAG_READ_ONLY  (1 << 1)/* Device is read-only */
> #define NBD_FLAG_SEND_FLUSH (1 << 2)/* Send FLUSH */
> @@ -46,10 +50,12 @@ struct nbd_reply {
> #define NBD_FLAG_ROTATIONAL (1 << 4)/* Use elevator algorithm - 
> rotational media */
> #define NBD_FLAG_SEND_TRIM  (1 << 5)/* Send TRIM (discard) */
> 
> -/* New-style global flags. */
> +/* New-style handshake (global) flags, sent from server to client, and
> +   control what will happen during handshake phase. */
> #define NBD_FLAG_FIXED_NEWSTYLE (1 << 0)/* Fixed newstyle protocol. */
> 
> -/* New-style client flags. */
> +/* New-style client flags, sent from client to server to control what happens
> +   during handshake phase. */
> #define NBD_FLAG_C_FIXED_NEWSTYLE   (1 << 0)/* Fixed newstyle protocol. */
> 
> /* Reply types. */
> @@ -60,10 +66,10 @@ struct nbd_reply {
> #define NBD_REP_ERR_INVALID ((UINT32_C(1) << 31) | 3) /* Invalid length. 
> */
> #define NBD_REP_ERR_TLS_REQD((UINT32_C(1) << 31) | 5) /* TLS required */
> 
> +/* Request flags, sent from client to server during transmission phase */
> +#define NBD_CMD_FLAG_FUA(1 << 0)
> 
> -#define NBD_CMD_MASK_COMMAND 0x
> -#define NBD_CMD_FLAG_FUA (1 << 16)
> -
> +/* Supported request types */
> enum {
> NBD_CMD_READ = 0,
> NBD_CMD_WRITE = 1,
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 3791535..b663bf3 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -52,10 +52,10 @@
> /* This is all part of the "official" NBD API.
>  *
>  * The most up-to-date documentation is available at:
> - * https://github.com/yoe/nbd/blob/master/doc/proto.txt
> + * https://github.com/yoe/nbd/blob/master/doc/proto.md
>  */
> 
> -#define NBD_REQUEST_SIZE(4 + 4 + 8 + 8 + 4)
> +#define NBD_REQUEST_SIZE(4 + 2 + 2 + 8 + 8 + 4)
> #define NBD_REPLY_SIZE  (4 + 4 + 8)
> #define NBD_REQUEST_MAGIC   0x25609513
> #define NBD_REPLY_MAGIC 0x67446698
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 878e879..285025d 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -1,6 +1,7 @@
> /*
>  * QEMU Block driver for  NBD
>  *
> + * Copyright (C) 2016 Red Hat, Inc.
>  * Copyright (C) 2008 Bull S.A.S.
>  * Author: Laurent Vivier <laurent.viv...@bull.net>
>  *
> @@ -252,7 +253,7 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t 
> sector_num,
> 
> if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
> *flags &= ~BDRV_REQ_FUA;
> -request.type |= NBD_CMD_F

Re: [Qemu-block] [Qemu-devel] [PATCH 06/18] nbd: Avoid magic number for NBD max name size

2016-04-09 Thread Alex Bligh

On 8 Apr 2016, at 23:05, Eric Blake <ebl...@redhat.com> wrote:

> Declare a constant and use that when determining if an export
> name fits within the constraints we are willing to support.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
> include/block/nbd.h | 2 ++
> nbd/client.c| 2 +-
> nbd/server.c| 4 ++--
> 3 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index b86a976..3f047bf 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -76,6 +76,8 @@ enum {
> 
> /* Maximum size of a single READ/WRITE data buffer */
> #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> +/* Maximum size of an export name */
> +#define NBD_MAX_NAME_SIZE 255

Given the standard is either likely to or does (can't
remember whether that patch is merged) document the
maximum supported export length as 4096, why not change
this to 4096?

Otherwise:

Reviewed-by: Alex Bligh <a...@alex.org.uk>

> 
> ssize_t nbd_wr_syncv(QIOChannel *ioc,
>  struct iovec *iov,
> diff --git a/nbd/client.c b/nbd/client.c
> index c834587..00f9244 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -210,7 +210,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
> Error **errp)
> error_setg(errp, "incorrect option name length");
> return -1;
> }
> -if (namelen > 255) {
> +if (namelen > NBD_MAX_NAME_SIZE) {
> error_setg(errp, "export name length too long %" PRIu32, namelen);
> return -1;
> }
> diff --git a/nbd/server.c b/nbd/server.c
> index a10294e..5414c49 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -285,13 +285,13 @@ static int nbd_negotiate_handle_list(NBDClient *client, 
> uint32_t length)
> static int nbd_negotiate_handle_export_name(NBDClient *client, uint32_t 
> length)
> {
> int rc = -EINVAL;
> -char name[256];
> +char name[NBD_MAX_NAME_SIZE + 1];
> 
> /* Client sends:
> [20 ..  xx]   export name (length bytes)
>  */
> TRACE("Checking length");
> -if (length > 255) {
> +if (length >= sizeof(name)) {
> LOG("Bad length received");
> goto fail;
> }
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh







Re: [Qemu-block] [Qemu-devel] [PATCH] block: initialize qcrypto API at startup

2016-04-06 Thread Alex Bligh

On 6 Apr 2016, at 12:12, Daniel P. Berrange <berra...@redhat.com> wrote:

> Any programs which call the qcrypto APIs should ensure that
> qcrypto_init() has been called before anything else which
> can use crypto. Essentially this means right at the start
> of the main method before initializing anything else.
> 
> This is important because some versions of gnutls/gcrypt
> require explicit initialization before use.
> 
> Signed-off-by: Daniel P. Berrange <berra...@redhat.com>

Reviewed-by: Alex Bligh <a...@alex.org.uk>
Tested-by: Alex Bligh <a...@alex.org.uk>

> ---
> qemu-img.c | 6 ++
> qemu-io.c  | 6 ++
> qemu-nbd.c | 7 +++
> 3 files changed, 19 insertions(+)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 17c5cfd..1697762 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -37,6 +37,7 @@
> #include "block/block_int.h"
> #include "block/blockjob.h"
> #include "block/qapi.h"
> +#include "crypto/init.h"
> #include 
> 
> #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION QEMU_PKGVERSION \
> @@ -3486,6 +3487,11 @@ int main(int argc, char **argv)
> exit(EXIT_FAILURE);
> }
> 
> +if (qcrypto_init(_error) < 0) {
> +error_reportf_err(local_error, "cannot initialize crypto: ");
> +exit(1);
> +}
> +
> module_call_init(MODULE_INIT_QOM);
> bdrv_init();
> if (argc < 2) {
> diff --git a/qemu-io.c b/qemu-io.c
> index 0a738f1..288bba8 100644
> --- a/qemu-io.c
> +++ b/qemu-io.c
> @@ -23,6 +23,7 @@
> #include "sysemu/block-backend.h"
> #include "block/block_int.h"
> #include "trace/control.h"
> +#include "crypto/init.h"
> 
> #define CMD_NOFILE_OK   0x01
> 
> @@ -443,6 +444,11 @@ int main(int argc, char **argv)
> progname = basename(argv[0]);
> qemu_init_exec_dir(argv[0]);
> 
> +if (qcrypto_init(_error) < 0) {
> +error_reportf_err(local_error, "cannot initialize crypto: ");
> +exit(1);
> +}
> +
> module_call_init(MODULE_INIT_QOM);
> qemu_add_opts(_object_opts);
> bdrv_init();
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index ca4a724..fa91ca3 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -31,6 +31,7 @@
> #include "qapi/qmp/qstring.h"
> #include "qom/object_interfaces.h"
> #include "io/channel-socket.h"
> +#include "crypto/init.h"
> 
> #include 
> #include 
> @@ -518,6 +519,12 @@ int main(int argc, char **argv)
>     memset(_sigterm, 0, sizeof(sa_sigterm));
> sa_sigterm.sa_handler = termsig_handler;
> sigaction(SIGTERM, _sigterm, NULL);
> +
> +if (qcrypto_init(_err) < 0) {
> +error_reportf_err(local_err, "cannot initialize crypto: ");
> +exit(1);
> +}
> +
> module_call_init(MODULE_INIT_QOM);
> qemu_add_opts(_object_opts);
> qemu_init_exec_dir(argv[0]);
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh