Eric, See my message on nbd-general today re the necessity (or not) of getting NBD_OPT_BLOCK_SIZE first; it may be just that you can assume 512 is OK.
Otherwise Reviewed-by: Alex Bligh <a...@alex.org.uk> Alex On 23 Apr 2016, at 00:40, Eric Blake <ebl...@redhat.com> wrote: > The upstream NBD Protocol has defined a new extension to allow > the server to advertise block sizes to the client, as well as > a way for the client to inform the server that it intends to > obey block sizes. > > Thanks to a recent fix, our minimum transfer size is always > 1 (the block layer takes care of read-modify-write on our > behalf), although if we wanted down the road, we could > advertise a minimum of 512 based on our usage patterns to a > client that is willing to honor block sizes. Meanwhile, > advertising our absolute maximum transfer size of 32M will > help newer clients avoid EINVAL failures. > > We do not reject clients for using the older NBD_OPT_EXPORT_NAME; > we are no worse off for those clients than we used to be. But > for clients new enough to use NBD_OPT_GO, we require the client > to first send us NBD_OPT_BLOCK_SIZE to prove they know about > sizing restraints, by failing with NBD_REP_ERR_BLOCK_SIZE_REQD. > All existing released qemu clients (whether old-style or new, at > least by the end of this series) already honor our limits, and > will still connect; so at most, this would reject a new non-qemu > client that doesn't intend to obey limits (and that client could > still use NBD_OPT_EXPORT_NAME to bypass our rejection). > > Signed-off-by: Eric Blake <ebl...@redhat.com> > --- > include/block/nbd.h | 2 ++ > nbd/nbd-internal.h | 1 + > nbd/server.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 65 insertions(+) > > diff --git a/include/block/nbd.h b/include/block/nbd.h > index 1072d9e..a5c68df 100644 > --- a/include/block/nbd.h > +++ b/include/block/nbd.h > @@ -96,11 +96,13 @@ typedef struct nbd_reply nbd_reply; > #define NBD_REP_ERR_TLS_REQD NBD_REP_ERR(5) /* TLS required */ > #define NBD_REP_ERR_UNKNOWN NBD_REP_ERR(6) /* Export unknown */ > #define NBD_REP_ERR_SHUTDOWN NBD_REP_ERR(7) /* Server shutting down */ > +#define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8) /* Missing OPT_BLOCK_SIZE > */ > > /* Info types, used during NBD_REP_INFO */ > #define NBD_INFO_EXPORT 0 > #define NBD_INFO_NAME 1 > #define NBD_INFO_DESCRIPTION 2 > +#define NBD_INFO_BLOCK_SIZE 3 > > /* Request flags, sent from client to server during transmission phase */ > #define NBD_CMD_FLAG_FUA (1 << 0) /* 'force unit access' during write > */ > diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h > index 1935102..1354182 100644 > --- a/nbd/nbd-internal.h > +++ b/nbd/nbd-internal.h > @@ -85,6 +85,7 @@ > #define NBD_OPT_STARTTLS (5) > #define NBD_OPT_INFO (6) > #define NBD_OPT_GO (7) > +#define NBD_OPT_BLOCK_SIZE (9) > > /* NBD errors are based on errno numbers, so there is a 1:1 mapping, > * but only a limited set of errno values is specified in the protocol. > diff --git a/nbd/server.c b/nbd/server.c > index 563afb2..86d1e2d 100644 > --- a/nbd/server.c > +++ b/nbd/server.c > @@ -83,6 +83,7 @@ struct NBDClient { > void (*close)(NBDClient *client); > > bool no_zeroes; > + bool block_size; > NBDExport *exp; > QCryptoTLSCreds *tlscreds; > char *tlsaclname; > @@ -346,6 +347,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, > uint32_t length, > uint16_t type; > uint64_t size; > uint16_t flags; > + uint32_t block; > > /* Client sends: > [20 .. xx] export name (length bytes) > @@ -391,6 +393,57 @@ static int nbd_negotiate_handle_info(NBDClient *client, > uint32_t length, > } > > rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt, > + sizeof(type) + 3 * sizeof(block)); > + if (rc < 0) { > + return rc; > + } > + > + type = cpu_to_be16(NBD_INFO_BLOCK_SIZE); > + if (nbd_negotiate_write(client->ioc, &type, sizeof(type)) != > + sizeof(type)) { > + LOG("write failed"); > + return -EIO; > + } > + /* minimum - Always 1, because we use blk_pread(). > + * TODO: Advertise 512 if guest used NBD_OPT_BLOCK_SIZE? */ > + block = cpu_to_be32(1); > + if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) != > + sizeof(block)) { > + LOG("write failed"); > + return -EIO; > + } > + /* preferred - At least 4096, but larger as appropriate. */ > + block = blk_get_opt_transfer_length(exp->blk) * BDRV_SECTOR_SIZE; > + block = cpu_to_be32(MAX(4096, block)); > + if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) != > + sizeof(block)) { > + LOG("write failed"); > + return -EIO; > + } > + /* maximum - At most 32M, but smaller as appropriate. */ > + block = blk_get_max_transfer_length(exp->blk); > + if (block && block < NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE) { > + block *= BDRV_SECTOR_SIZE; > + } else { > + block = NBD_MAX_BUFFER_SIZE; > + } > + block = cpu_to_be32(block); > + if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) != > + sizeof(block)) { > + LOG("write failed"); > + return -EIO; > + } > + > + if (!client->block_size) { > + /* The client is new enough to use NBD_OPT_GO, but forgot to > + * tell us that it plans to obey block sizes. Since we fail > + * hard on oversize requests, it's better to reject such a > + * client up front. */ > + return nbd_negotiate_send_rep(client->ioc, > NBD_REP_ERR_BLOCK_SIZE_REQD, > + opt); > + } > + > + rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt, > sizeof(type) + sizeof(size) + > sizeof(flags)); > if (rc < 0) { > @@ -630,6 +683,15 @@ static int nbd_negotiate_options(NBDClient *client, > uint16_t myflags) > } > break; > > + case NBD_OPT_BLOCK_SIZE: > + client->block_size = true; > + ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, > + clientflags); > + if (ret < 0) { > + return ret; > + } > + break; > + > case NBD_OPT_STARTTLS: > if (nbd_negotiate_drop_sync(client->ioc, length) != length) { > return -EIO; > -- > 2.5.5 > > -- Alex Bligh