Re: [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Alex Bligh
Eric,

>>> However, for ease of implementation, a
>>> +server MAY close the connection rather than entering transmission
>>> +phase if, at the end of option haggling, the client has negotiated
>>> +another command that requires a structured reply but did not also
>>> +negotiate Structured Reads.
>> 
>> That's pretty yucky given a reconnect will achieve the same result
>> and you'll end up in an infinite retry loop.
>> 
>> Wouldn't a better route be simply to say that implementing certain
>> commands (server or client sides) requires support of structured
>> replies?
> 
> I think we're in agreement that the only command that (for back-compat
> reasons) can ever send data on a simple reply is CMD_READ.  Therefore,
> if you negotiate any other command that can send data, that command will
> use a structured reply; the mere fact that you negotiated the command
> means that both client and server know about structured replies.
> 
> I guess what I was trying to get at is that if you are using any
> structured replies, then it is a disservice to wire-sniffers if you did
> not also enable structured replies for CMD_READ.

Agree

> Technically, it would
> be feasible to use simple replies for reads while using structured
> replies for the other negotiated commands,

Agree

> but practically, a client
> that does that seems undesirable, which is why I said that a server
> could disconnect rather than talking to such a client.

I would prefer to make that a protocol breach, i.e. the client
MUST NOT do that.

> So taking your sentence at face value, yes there is another solution
> possible: require that any NBD_OPT_* haggling used to negotiate the use
> of any other command with a structured reply MUST fail if the client has
> not first negotiated NBD_OPT_STRUCTURED_READ, but leaves the connection
> open so the client can continue option haggling.  That way, the only way
> to end option haggling with the new command enabled is to also enable
> structured reads.  The burden is then on the client to haggle in the
> correct order, and on the server to track haggling state when deciding
> how to answer the option requests for the new commands.

I think I'm saying something simpler (unless I'm missing something)
which is:

The client MUST NOT propose NBD_OPT_FOO unless it has previously
proposed and the server accepted NBD_OPT_BAR. In this case FOO
is e.g. the thing to find holes, BAR is NBD_OPT_STRUCTURED_READ.

So it's not a question of leaving the connection open or closing it,
it's simply that the client can't propose X unless it's already
negotiated Y. If it does, all bets are off, so of course it can
close the connection. But this makes it explicit that the client
if proposing X should first have successfully negotiated Y.

> I'm a bit reluctant to put ordering requirements on option haggling
> (that option A must be turned on before option B), but then again, the
> SELECT extension requires NBD_OPT_SELECT to be haggled prior to
> NBD_OPT_GO, so there's precedent.

Resisting such a change would resist the possibility in the future
that option X requires option Y. Even if you can get around that
this time, I'll bet my bottom dollar it will come up again.

> I also am thinking of proposing an
> extension for option haggling to communicate minimum/preferred
> alignments and maximum don't-fragment sizing, and that option would have
> to be enabled before OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it
> would change the NBD_REP_SERVER layout in response to those option
> requests), which would be another case where option A affects the
> behavior of option B.

... and as if by magic!

Yeah I was going to suggest a similar option, including a blocksize
alignment, a maximum size for a read/write, and a maximum DF size.

I'm busy writing an 'example' nbd server in golang and this is the
first thing I ran into. The purpose of this BTW is to serve as
an example implementation for structured replies.

>> These multiple error chunks are neat. However, I suspect lazy implementors
>> may just send an error without an offset.
> 
> Any ideas are appreciated on how to word it to suggest that servers
> SHOULD try to give full details; but I think we want the fallback to the
> no-offset case because some situations will not have an offset.

Indeed. Perhaps we can promise them Oreos.

> I was thinking that it's easier on the client if the final chunk always
> serves as a reliable indicator of success (OFFSET_DATA, OFFSET_HOLE,
> NONE) or error (ERROR, ERROR_OFFSET).  But if you think that always
> allowing a concluding NONE, even on an errored reply due to an earlier
> chunk reporting the error, is reasonable, I can relax things along those
> lines.  Or maybe we can state that the concluding chunk on an errored
> reply may be ERROR_OFFSET with 'error' and 'offset' fields set to 0,
> rather than forcing the server to replay one of the earlier offsets.

I think I'd prefer just that the last chunk has the relevant bit set,

Re: [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Eric Blake
On 03/30/2016 12:50 AM, Alex Bligh wrote:
> Eric,
> 
> There's a lot in common between our two proposals now (unsurprisingly).
> You've highlighted the differences in the other mail and I'll
> comment on them there. You may want to steal some of my wording as
> I think there are bits I've got that you haven't (as well as vice versa).
> But I'm inclined to use yours as a base unless you particularly
> like mine.
> 
> Comments inline below.
> 
> Alex
> 
> On 30 Mar 2016, at 00:01, Eric Blake  wrote:
> ...
> 
>> +While the server is permitted to send at most one normal reply (or
>> +else close the connection), a command that uses structured replies
>> +may document that the server is permitted to send mutiple replies,

And just now noticing my typo on 'multiple' :)

>> +all sharing the same handle,
> 
> The thought is fine, but the language is confusing. I think this is
> a single reply, made up of multiple parts (I called them chunks). You've
> called them multiple replies, which I think makes things less clear.
> Also below you've started using my 'chunk' language anyway!

All right, for v3, I will try to go with the wording that every request
has a single reply; but the reply will either be a 'simple reply'
(single message), or a 'structured reply' (which may occupy multiple
messages, where each message is called a 'chunk').

> 
>> by using the `NBD_REPLY_FLAG_DONE`
>> +(bit 0) to delineate the final reply. The server MAY interleave
>> +intermediate replies to one structured command with replies
>> +relating to a different handle.
> 
> Neat.
> 
> The argument against this route is that now there are essentially
> two ways to end a chain of chunks (with and without a NONE chunk)
> which is necessary for the reasons you set out. On balance I like it though.

Yeah; I didn't see any way around that.  Always requiring a NONE chunk
is more network overhead if the server can guarantee that the last data
chunk is error-free; but at the same time, we shouldn't force servers to
guarantee the last data chunk will be error-free.

I don't think it is too much of a burden for a client to receive chunks
in a loop until the FLAG_DONE bit is set, without regards to what chunk
type came last.  And for CMD_READ, we still have a nice delineation: if
the last chunk is NONE, OFFSET_DATA, or OFFSET_HOLE, the command
succeeded; if the last chunk is ERROR or ERROR_OFFSET, the command failed.

> 
>>  However, for ease of implementation, a
>> +server MAY close the connection rather than entering transmission
>> +phase if, at the end of option haggling, the client has negotiated
>> +another command that requires a structured reply but did not also
>> +negotiate Structured Reads.
> 
> That's pretty yucky given a reconnect will achieve the same result
> and you'll end up in an infinite retry loop.
> 
> Wouldn't a better route be simply to say that implementing certain
> commands (server or client sides) requires support of structured
> replies?

I think we're in agreement that the only command that (for back-compat
reasons) can ever send data on a simple reply is CMD_READ.  Therefore,
if you negotiate any other command that can send data, that command will
use a structured reply; the mere fact that you negotiated the command
means that both client and server know about structured replies.

I guess what I was trying to get at is that if you are using any
structured replies, then it is a disservice to wire-sniffers if you did
not also enable structured replies for CMD_READ.  Technically, it would
be feasible to use simple replies for reads while using structured
replies for the other negotiated commands, but practically, a client
that does that seems undesirable, which is why I said that a server
could disconnect rather than talking to such a client.

So taking your sentence at face value, yes there is another solution
possible: require that any NBD_OPT_* haggling used to negotiate the use
of any other command with a structured reply MUST fail if the client has
not first negotiated NBD_OPT_STRUCTURED_READ, but leaves the connection
open so the client can continue option haggling.  That way, the only way
to end option haggling with the new command enabled is to also enable
structured reads.  The burden is then on the client to haggle in the
correct order, and on the server to track haggling state when deciding
how to answer the option requests for the new commands.

I'm a bit reluctant to put ordering requirements on option haggling
(that option A must be turned on before option B), but then again, the
SELECT extension requires NBD_OPT_SELECT to be haggled prior to
NBD_OPT_GO, so there's precedent.  I also am thinking of proposing an
extension for option haggling to communicate minimum/preferred
alignments and maximum don't-fragment sizing, and that option would have
to be enabled before OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it
would change the NBD_REP_SERVER layout in response to those o

Re: [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-29 Thread Alex Bligh
Eric,

There's a lot in common between our two proposals now (unsurprisingly).
You've highlighted the differences in the other mail and I'll
comment on them there. You may want to steal some of my wording as
I think there are bits I've got that you haven't (as well as vice versa).
But I'm inclined to use yours as a base unless you particularly
like mine.

Comments inline below.

Alex

On 30 Mar 2016, at 00:01, Eric Blake  wrote:
...

> +While the server is permitted to send at most one normal reply (or
> +else close the connection), a command that uses structured replies
> +may document that the server is permitted to send mutiple replies,
> +all sharing the same handle,

The thought is fine, but the language is confusing. I think this is
a single reply, made up of multiple parts (I called them chunks). You've
called them multiple replies, which I think makes things less clear.
Also below you've started using my 'chunk' language anyway!

> by using the `NBD_REPLY_FLAG_DONE`
> +(bit 0) to delineate the final reply. The server MAY interleave
> +intermediate replies to one structured command with replies
> +relating to a different handle.

Neat.

The argument against this route is that now there are essentially
two ways to end a chain of chunks (with and without a NONE chunk)
which is necessary for the reasons you set out. On balance I like it though.

> +
> +A server MUST NOT send a data payload in a normal reply if
> +Structured Reads are negotiated.  It is envisioned that all future
> +extension commands that require a data payload in the response
> +will require independent option negotiation, and therefore, the
> +`NBD_CMD_READ` command is the only command that is allowed to use
> +the data payload of a normal reply, and only when Structured Reads
> +were not negotiated.

See other email.

>  However, for ease of implementation, a
> +server MAY close the connection rather than entering transmission
> +phase if, at the end of option haggling, the client has negotiated
> +another command that requires a structured reply but did not also
> +negotiate Structured Reads.

That's pretty yucky given a reconnect will achieve the same result
and you'll end up in an infinite retry loop.

Wouldn't a better route be simply to say that implementing certain
commands (server or client sides) requires support of structured
replies?

> +- `NBD_REPLY_TYPE_NONE` (0)
> +
> +  *length* MUST be 0 (and the payload field omitted).  This type
> +  SHOULD be used only as the final reply (that is, when
> +  `NBD_REPLY_FLAG_DONE` is set), and implies that the overall
> +  client request was successfully completed.

I think this would be clearer as 'SHOULD NOT be used other than as the
final reply'. Because you are also saying (I think) that you need not
have it as the final reply - it's just as good in a non-errored
reply to have NBD_REPLY_FLAG_DONE set on the last data packet (provided
you know it's not going to error before starting to send it).

...

> +The server MAY split the reply into any number of data chunks,
> +using reply types of `NBD_REPLY_TYPE_OFFSET_DATA` or
> +`NBD_REPLY_TYPE_OFFSET_HOLE`; each chunk MUST describe at least
> +one byte, although to minimize overhead, the server SHOULD use
> +chunks no smaller than 512 bytes where possible (the first and

This is a good idea, but rather than 'no smaller than 512 bytes', as
it's a 'SHOULD', could we have 'the server SHOULD use chunks each
an integer multiple of 512 bytes where possible' (you already have
a carve out for the first and last).

...

> +If no error is detected, then the server MUST send enough chunks
> +to cover the bytes requested.  The server MAY set the
> +`NBD_REPLY_FLAG_DONE` on the final data chunk,

In which case it MUST NOT send any further non-data chunks
(e.g. an error chunk or a NONE chunk)

> to minimize
> +traffic, but MUST NOT do so if it would still be possible to
> +detect an error while transmitting the chunk.  If the last data
> +chunk is not the final reply, the server MUST use
> +`NBD_REPLY_TYPE_NONE` as the final reply to indicate success.

or an error chunk to indicate an error, and these final chunk MUST have
NBD_REPLY_FLAG_DONE set on it.

> +If an error is detected, the server MUST send padding bytes to
> +complete the current chunk (if any), MUST report the error with a
> +reply type of either `NBD_REPLY_TYPE_ERROR` or
> +`NBD_REPLY_TYPE_ERROR_OFFSET`, and MAY end the sequence of replies
> +without sending the total number of bytes requested.  If one or
> +more offset errors are reported, the client MAY assume that all
> +data in chunks not including the offset,

"the offset(s)"

> and all data within the
> +affected chunk

"within each affected chunk"

> but prior to the offset,

"prior to the relevant offset"

> is valid; the client MAY
> +NOT assume anythi

Re: [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-29 Thread Eric Blake
On 03/29/2016 05:01 PM, Eric Blake wrote:
> The existing transmission phase protocol is difficult to sniff,
> because correct interpretation of the server stream requires
> context from the client stream (or risks false positives if
> data payloads happen to contain the protocol magic numbers).  It
> also prohibits the ability to do efficient sparse reads, or to
> return a short read where an error is reported without also
> sending length bytes of (bogus) data.
> 

> +* `NBD_CMD_READ`
> +
> +If `NBD_OPT_STRUCTURED_READ` was not negotiated, then a read
> +request MUST always be answered by a single non-structured
> +response, as documented above (using magic 0x67446698
> +`NBD_REPLY_MAGIC`, and containing length bytes of data according
> +to the client's request, although those bytes MAY be invalid if an
> +error is returned, and the connection MUST if an error occurs

MUST be closed

(I hate it when I send patches before 'git commit --amend' on the final
unsaved changes in my editor...)

> +after a header claiming no error).
> +

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-29 Thread Eric Blake
The existing transmission phase protocol is difficult to sniff,
because correct interpretation of the server stream requires
context from the client stream (or risks false positives if
data payloads happen to contain the protocol magic numbers).  It
also prohibits the ability to do efficient sparse reads, or to
return a short read where an error is reported without also
sending length bytes of (bogus) data.

Remedy this by adding a new option request negotiation, which
affects the response of the NBD_CMD_READ command, and sets
forth rules for how future command responses must behave when
they carry a data payload.

This proposal does NOT permit structured replies to anything
other than NBD_CMD_READ, although a future proposal may wish
to make that valid (so that a server could be written that
only returns structured replies).

Signed-off-by: Eric Blake 
---
 doc/proto.md | 260 ++-
 1 file changed, 256 insertions(+), 4 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 3f9ee23..75b1534 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -211,6 +211,14 @@ handle as was sent by the client in the corresponding 
request.  In
 this way, the client can correlate which request is receiving a
 response.

+By default, there is exactly one reply message for each request
+(unless the connection is closed due to an error).  Note that it is
+impossible to tell by reading just the server traffic whether a data
+field will be present.  The experimental `Structured Read` extension
+adds an additional reply type, documents when there will be multiple
+replies to a single request, and creates a context-free server stream;
+see below.
+
 ## Values

 This section describes the value and meaning of constants (other than
@@ -255,6 +263,8 @@ immediately after the global flags field in oldstyle 
negotiation:
 - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
   `NBD_CMD_TRIM` commands

+Clients SHOULD ignore unknown flags.
+
 # Client flags

 This field of 32 bits is sent after initial connection and after
@@ -338,6 +348,10 @@ of the newstyle negotiation.

 Defined by the experimental `SELECT` extension; see below.

+- `NBD_OPT_STRUCTURED_READ` (8)
+
+Defined by the experimental `Structured Read` extension; see below.
+
  Option reply types

 These values are used in the "reply type" field, sent by the server
@@ -430,6 +444,8 @@ valid may depend on negotiation during the handshake phase.
   set to 1 if the client requires "Force Unit Access" mode of
   operation.  MUST NOT be set unless export flags included
   `NBD_FLAG_SEND_FUA`.
+- bit 1, `NBD_CMD_FLAG_DF`; defined by the experimental `Structured
+  Read` extension; see below

  Request types

@@ -451,6 +467,10 @@ The following request types exist:
 signalling no error), the server MUST immediately close the
 connection; it MUST NOT send any further data to the client.

+The experimental `Structured Read` extension changes the set of
+valid replies, in part to allow recovery after a partial read and
+more efficient reads of sparse files; see below.
+
 * `NBD_CMD_WRITE` (1)

 A write request. Length and offset define the location and amount of
@@ -536,13 +556,16 @@ The following error values are defined:
 * `ENOMEM` (12), Cannot allocate memory.
 * `EINVAL` (22), Invalid argument.
 * `ENOSPC` (28), No space left on device.
+* `EOVERFLOW` (75), Value too large.

 The server SHOULD return `ENOSPC` if it receives a write request
 including one or more sectors beyond the size of the device.  It SHOULD
 return `EINVAL` if it receives a read or trim request including one or
 more sectors beyond the size of the device.  It also SHOULD map the
-`EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
-`EPERM` if it receives a write or trim request on a read-only export.
+`EDQUOT` and `EFBIG` errors to `ENOSPC`.  It SHOULD return `EOVERFLOW`
+on a request to send structured read data without fragmentation but
+where the length is too large.  Finally, it SHOULD return `EPERM` if
+it receives a write or trim request on a read-only export.

 The server SHOULD return `EINVAL` if it receives an unknown command.

@@ -579,7 +602,7 @@ To remedy this, a `SELECT` extension is envisioned. This 
extension adds
 two option requests and one error reply type, and extends one existing
 option reply type.

-* `NBD_OPT_SELECT`
+* `NBD_OPT_SELECT` (6)

 The client wishes to select the export with the given name for use
 in the transmission phase, but does not yet want to move to the
@@ -613,7 +636,7 @@ option reply type.
   handle `NBD_REP_ERR_UNSUP`. In this case, they should fall back to
   using `NBD_OPT_EXPORT_NAME`.

-* `NBD_OPT_GO`
+* `NBD_OPT_GO` (7)

 The client wishes to terminate the negotiation phase and progress to
 the transmission phase. Possible replies from the server include:
@@ -635,6 +658,235 @@ option reply type.
   messa