Re: [Qemu-devel] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension

Eric Blake Thu, 24 Mar 2016 15:09:11 -0700

On 03/23/2016 08:16 AM, Denis V. Lunev wrote:
> From: Pavel Borzenkov <pborzen...@virtuozzo.com>
> 
> With the availability of sparse storage formats, it is often needed to
> query status of a particular LBA range and read only those blocks of
> data that are actually present on the block device.
> 
> To provide such information, the patch adds GET_LBA_STATUS extension
> with one new NBD_CMD_GET_LBA_STATUS command.
>


> +* `NBD_CMD_GET_LBA_STATUS` (7)
> +
> +    An LBA range status query request. Length and offset define the range
> +    of interest. The server MUST reply with a reply header, followed
> +    immediately by the following data:
> +
> +      - 32 bits, length of parameter data that follow (unsigned)
> +      - zero or more LBA status descriptors, each having the following
> +        structure:
> +
> +        * 64 bits, offset (unsigned)
> +        * 32 bits, length (unsigned)
> +        * 16 bits, status (unsigned)
> +
> +    unless an error condition has occurred.

To date, only the NBD_CMD_READ command caused the server to send data
after the handle in its reply.  This would be the second command with a
data field in the response, but it is returning a variable amount of
data, not directly tied to the client's length - but at least you did
make it structured so that the client knows how much to read.  However,
your patch is incomplete; you'll need to edit the "Transmission" section
of the document to mention the rules on the server sending data, as the
existing text would now be wrong:

> The server replies with:
> 
> S: 32 bits, 0x67446698, magic (NBD_REPLY_MAGIC)
> S: 32 bits, error
> S: 64 bits, handle
> S: (length bytes of data if the request is of type NBD_CMD_READ)

You may also want to add a rule that for all future extensions, any
command that requires data in the server response (other than the server
response to NBD_CMD_READ) must include a 32-bit length as the first
field of its data payload, so that the server reply is always structured.

Hmm, I still think it would be hard to write a wireshark dissector for
server replies, since there is no indication whether a given reply will
include data or not.  The _client_ knows (well, any well-written client
that uses a different value for 'handle' for every command sent to the
server), because it can read the returned 'handle' field to know what
command it matches to and therefore what data to expect; but a
third-party observer seeing _just_ the server messages has no idea which
server responses should have payload.  Scanning the stream and assuming
that a new reply starts each time NBD_REPLY_MAGIC is encountered is a
close approximation, but hits false positives if the data being returned
for NBD_CMD_READ contains the same magic number as part of its contents.
 Obviously, back-compat says we can't change the response to
NBD_CMD_READ, but that means that a wireshark dissector has to either
maintain context, or hunt through returned data looking for potential
magic numbers and possibly hitting false positives.

That said, maybe we want to allow global flag negotiation prior to
transmission to add a new flag on both server and client side - the
server would advertise that it is capable of a new reply mode, and the
client then has to reply that it wants to use the new reply mode; in
that mode, all server replies starting with magic 0x67446698
(NBD_REPLY_MAGIC) will never have a data payload, and all commands that
cause a reply with payload (both NBD_CMD_READ and the new
NBD_CMD_GET_LBA_STATUS of this message, or whatever name we give it)
will reply with a NEW magic number:

S: 32 bits, XXXX, magic (NBD_REPLY_MAGIC2)
S: 32 bits, error
S: 64 bits, handle
S: 32 bits, length
S: length bytes of data

so that the server's data stream is fully structured without having to
maintain context of the client's requests.  That is, a dissector can now
merely scan for both magic numbers; and on a stream between a client and
server that have negotiated the new mode, the old magic number will
never have a payload, and the new magic number will always be
accompanied with a payload that describes how much data to read to the
boundary of the next reply.

For that matter, right now, NBD_CMD_READ is required to either end the
session or return length bytes of data even when error is non-zero (but
where those bytes MAY be invalid); but by adding a negotiated flag for
structured length replies, it would be possible to allow for short reads
and/or returning an error with 0 bytes of payload but still keeping the
connection to the client open, without having to send wasted bytes over
the wire.

I could write up a negotiation of global flags for structured reply
lengths as an extension proposal, if you think it is worth it.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension

Reply via email to