03.05.2018 17:49, Eric Blake wrote:
On 05/03/2018 04:17 AM, Vladimir Sementsov-Ogievskiy wrote:
02.05.2018 00:13, Eric Blake wrote:
The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G). Implement the client
side support for these alternate limits, by always requesting
the new information (a compliant server must ignore the
request if it is unknown), and by validating anything reported
by the server before populating the block layer limits.
[1] https://lists.debian.org/nbd/2018/03/msg00048.html
Signed-off-by: Eric Blake <ebl...@redhat.com>
+++ b/nbd/client.c
@@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const
char *wantname,
info->flags = 0;
trace_nbd_opt_go_start(wantname);
- buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
+ buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);
Hmm, what "+1" means?
Doesn't appear to be necessary, but a little slop never hurts. Better
might be struct-ifying things rather than open-coding offsets, or even
using a vectored approach rather than requiring a single buffer. But
if useful, that's refactoring that is independent of this patch, while
this is just demonstrating the bare minimum implementation of the new
feature.
stl_be_p(buf, len);
memcpy(buf + 4, wantname, len);
- /* At most one request, everything else up to server */
- stw_be_p(buf + 4 + len, info->request_sizes);
+ /* Either 0 or 3 requests, everything else up to server */
+ stw_be_p(buf + 4 + len, 3 * info->request_sizes);
if (info->request_sizes) {
stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
+ stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
+ stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
}
error = nbd_send_option_request(ioc, NBD_OPT_GO,
- 4 + len + 2 + 2 *
info->request_sizes,
+ 4 + len + 2 + 3 * 2 *
info->request_sizes,
buf, errp);
magic chunk) Change looks correct. Is it worth introducing a buf_len
variable, to
not retype it?
And increment as we go? Yeah, it might make things more legible.
+ be32_to_cpus(&info->max_trim);
+ if (!info->min_trim || !info->max_trim ||
+ (info->max_trim != UINT32_MAX &&
+ !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
+ error_setg(errp, "server trim sizes %" PRIu32 "/%"
PRIu32
+ " are not valid", info->min_trim,
info->max_trim);
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
Didn't you forget to check that min_trim is a power of two?
Nope. The NBD spec wording specifically uses "The minimum trim size
SHOULD be a power of 2, and MUST be at least as large as the preferred
block size advertised in `NBD_INFO_BLOCK_SIZE`", in part because there
ARE existing hardware iscsi devices that have an advertised
preferred/maximum trim size of exactly 15 megabytes (you can request
smaller trims 1M at a time, and the device then caches things to
eventually report unmapped slices at a 15M boundary). For more
information on this odd hardware, look in the qemu logs for the Dell
Equallogic device, such as commit 3482b9bc.
Ok. Interesting, thank you.
Since the NBD 'min trim' is advisory, and merely telling the client
the ideal alignments to use for a trim most likely to have an effect
(on Equallogic, a 1M trim by itself might not trim anything unless
neighboring areas are also trimmed, while an aligned 15M trim is
immediately observable), an NBD server wrapping such an iscsi device
may prefer to mirror the hardware's preferred alignment of 15M, rather
than inventing a minimum alignment of 1M, in what it advertises in
NBD_INFO_TRIM_SIZE.
And maybe I should tweak the NBD spec addition to call things
"preferred trim/zero" rather than "min trim/zero", if that makes
things any easier to grasp on first read, and better matches the iscsi
concept.
good idea. and it will better correspond to pref_block for
INFO_BLOCK_SIZE, as they are compared with it.
hmm, so, now nbd_opt_go() is full of very-very similar code parts,
which may worth some refactoring now or in future.
Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I
have an idea: why not to make an universal option for it in
protocol? Something like
option INFO_CMD_SIZE:
uint16_t command
uint32_t min_block
uint32_t max_block
and server can send several such options. Most of semantics for these
additional limits are common, so it will simplify both documentation
and realization..
I'll copy this to nbd thread and it is better to discuss it in it.
Yep, I've followed up on that thread, but will post a v2 of this
proposal along those lines.
--
Best regards,
Vladimir