Re: [PATCH v5 0/7] vhost-user-blk: fix the migration issue and enhance qtests
On Thu, Sep 24, 2020 at 07:26:14AM -0400, Michael S. Tsirkin wrote: > On Fri, Sep 11, 2020 at 11:39:42AM +0300, Dima Stepanov wrote: > > v4 -> v5: > > - vhost: check queue state in the vhost_dev_set_log routine > > tests/qtest/vhost-user-test: prepare the tests for adding new > > dev class > > tests/qtest/vhost-user-test: add support for the > > vhost-user-blk device > > tests/qtest/vhost-user-test: add migrate_reconnect test > > Reviewed-by: Raphael Norwitz > > - Update qtest, by merging vhost-user-blk "if" case with the > > virtio-blk case. > > I dropped patches 3-7 since they were stalling on some systems. > Pls work with Peter Maydell (cc'd) to figure it out. Thanks! Peter, can you share any details for the stalling errors with me? > > > > v3 -> v4: > > - vhost: recheck dev state in the vhost_migration_log routine > > Reviewed-by: Raphael Norwitz > > - vhost: check queue state in the vhost_dev_set_log routine > > Use "continue" instead of "break" to handle non-initialized > > virtqueue case. > > > > v2 -> v3: > > - update commit message for the > > "vhost: recheck dev state in the vhost_migration_log routine" commit > > - rename "started" field of the VhostUserBlk structure to > > "started_vu", so there will be no confustion with the VHOST started > > field > > - update vhost-user-test.c to always initialize nq local variable > > (spotted by patchew) > > > > v1 -> v2: > > - add comments to connected/started fields in the header file > > - move the "s->started" logic from the vhost_user_blk_disconnect > > routine to the vhost_user_blk_stop routine > > > > Reference e-mail threads: > > - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html > > - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html > > > > If vhost-user daemon is used as a backend for the vhost device, then we > > should consider a possibility of disconnect at any moment. There was a > > general > > question here: should we consider it as an error or okay state for the > > vhost-user > > devices during migration process? > > I think the disconnect event for the vhost-user devices should not break the > > migration process, because: > > - the device will be in the stopped state, so it will not be changed > > during migration > > - if reconnect will be made the migration log will be reinitialized as > > part of reconnect/init process: > > #0 vhost_log_global_start (listener=0x563989cf7be0) > > at hw/virtio/vhost.c:920 > > #1 0x56398603d8bc in listener_add_address_space > > (listener=0x563989cf7be0, > > as=0x563986ea4340 ) > > at softmmu/memory.c:2664 > > #2 0x56398603dd30 in memory_listener_register > > (listener=0x563989cf7be0, > > as=0x563986ea4340 ) > > at softmmu/memory.c:2740 > > #3 0x563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8, > > opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER, > > busyloop_timeout=0) > > at hw/virtio/vhost.c:1385 > > #4 0x563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990) > > at hw/block/vhost-user-blk.c:315 > > #5 0x563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990, > > event=CHR_EVENT_OPENED) > > at hw/block/vhost-user-blk.c:379 > > The first patch in the patchset fixes this issue by setting vhost device to > > the > > stopped state in the disconnect handler and check it the > > vhost_migration_log() > > routine before returning from the function. > > qtest framework was updated to test vhost-user-blk functionality. The > > vhost-user-blk/vhost-user-blk-tests/migrate_reconnect test was added to > > reproduce > > the original issue found. > > > > Dima Stepanov (7): > > vhost: recheck dev state in the vhost_migration_log routine > > vhost: check queue state in the vhost_dev_set_log routine > > tests/qtest/vhost-user-test: prepare the tests for adding new dev > > class > > tests/qtest/libqos/virtio-blk: add support for vhost-user-blk > > tests/qtest/vhost-user-test: add support for the vhost-user-blk device > > tests/qtest/vhost-user-test: add migrate_reconnect test > > tests/qtest/vhost-user-test: enable the reconnect tests > > > > hw/block/vhost-user-blk.c | 19 ++- > > hw/virtio/vhost.c | 39 - > > include/hw/virtio/vhost-user-blk.h | 10 ++ > > tests/qtest/libqos/virtio-blk.c| 14 +- > > tests/qtest/vhost-user-test.c | 290 > > +++-- > > 5 files changed, 322 insertions(+), 50 deletions(-) > > > > -- > > 2.7.4 >
Re: [PATCH 3/3] nbd: Add 'qemu-nbd -A' to expose allocation depth
25.09.2020 23:32, Eric Blake wrote: Allow the server to expose an additional metacontext to be requested by savvy clients. qemu-nbd adds a new option -A to expose the qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this can also be set via QMP when using nbd-server-add. qemu as client can be hacked into viewing this new context by using the now-misnamed x-dirty-bitmap option when creating an NBD blockdev; may be rename it to x-block-status ? although it is worth noting the decoding of how such context information will appear in 'qemu-img map --output=json': NBD_STATE_DEPTH_UNALLOC => "zero":false, "data":true NBD_STATE_DEPTH_LOCAL => "zero":false, "data":false NBD_STATE_DEPTH_BACKING => "zero":true, "data":true It wouldn't be so simple if we decide to export exact depth number.. -- Best regards, Vladimir
Re: [PATCH 2/3] nbd: Add new qemu:allocation-depth metacontext
26.09.2020 10:33, Richard W.M. Jones wrote: On Fri, Sep 25, 2020 at 03:32:48PM -0500, Eric Blake wrote: +The second is related to exposing the source of various extents within +the image, with a single context named: + +qemu:allocation-depth + +In the allocation depth context, bits 0 and 1 form a tri-state value: + +bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated +bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image +bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a + backing layer From the cover description I imagined it would show the actual depth, ie: top -> backing -> backing -> backing depth: 12 3 (0 = unallocated) I wonder if that is possible? (Perhaps there's something I don't understand here.) That's possible if we want it. Probably the best way is to add *depth parameter to bdrv_is_allocated_above (and better on top of my "[PATCH v7 0/5] fix & merge block_status_above and is_allocated_above") -- Best regards, Vladimir
Re: [PATCH 2/3] nbd: Add new qemu:allocation-depth metacontext
25.09.2020 23:32, Eric Blake wrote: 'qemu-img map' provides a way to determine which extents of an image come from the top layer vs. inherited from a backing chain. This is useful information worth exposing over NBD. There is a proposal to add a QMP command block-dirty-bitmap-populate which can create a dirty bitmap that reflects allocation information, at which point qemu:dirty-bitmap:NAME can expose that information via the creation of a temporary bitmap, but we can shorten the effort by adding a new qemu:allocation-depth context that does the same thing without an intermediate bitmap (this patch does not eliminate the need for that proposal, as it will have other uses as well). For this patch, I just encoded a tri-state value (unallocated, from this layer, from any of the backing layers); we could instead or in addition report an actual depth count per extent, if that proves more useful. Note that this patch does not actually enable any way to request a server to enable this context; that will come in the next patch. Signed-off-by: Eric Blake Looks good to me overall, need to rebase if patch 01 changed (as I propose or in some better way). -- Best regards, Vladimir
Re: [PATCH 1/3] nbd: Simplify meta-context parsing
25.09.2020 23:32, Eric Blake wrote: We had a premature optimization of trying to read as little from the wire as possible while handling NBD_OPT_SET_META_CONTEXT in phases. But in reality, we HAVE to read the entire string from the client before we can get to the next command, and it is easier to just read it all at once than it is to read it in pieces. And once we do that, several functions end up no longer performing I/O, and no longer need to return a value. While simplifying this, take advantage of g_str_has_prefix for less repetition of boilerplate string length computation. Our iotests still pass; I also checked that libnbd's testsuite (which covers more corner cases of odd meta context requests) still passes. Signed-off-by: Eric Blake --- nbd/server.c | 172 ++- 1 file changed, 47 insertions(+), 125 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index 982de67816a7..0d2d7e52058f 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016-2018 Red Hat, Inc. + * Copyright (C) 2016-2020 Red Hat, Inc. * Copyright (C) 2005 Anthony Liguori * * Network Block Device Server Side @@ -792,135 +792,64 @@ static int nbd_negotiate_send_meta_context(NBDClient *client, return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0; } -/* Read strlen(@pattern) bytes, and set @match to true if they match @pattern. - * @match is never set to false. - * - * Return -errno on I/O error, 0 if option was completely handled by - * sending a reply about inconsistent lengths, or 1 on success. - * - * Note: return code = 1 doesn't mean that we've read exactly @pattern. - * It only means that there are no errors. + +/* + * Check @ns with @len bytes, and set @match to true if it matches @pattern, + * or if @len is 0 and the client is performing _LIST_. @match is never set + * to false. */ -static int nbd_meta_pattern(NBDClient *client, const char *pattern, bool *match, -Error **errp) +static void nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern, + const char *ns, uint32_t len, ns changed its meaning, it's not just a namespace, but the whole query. I think, better to rename it. Also, it's unusual to pass length together with nul-terminated string, it seems redundant. And, it's used only to compare with zero, strlen(ns) == 0 or ns[0] == 0 is not slower. Also, errp is unused argument. And it violate Error API recommendation to not create void functions with errp. Also we can use bool return instead of return through pointer. + bool *match, Error **errp) { -int ret; -char *query; -size_t len = strlen(pattern); - -assert(len); - -query = g_malloc(len); -ret = nbd_opt_read(client, query, len, errp); -if (ret <= 0) { -g_free(query); -return ret; -} - -if (strncmp(query, pattern, len) == 0) { +if (len == 0) { +if (client->opt == NBD_OPT_LIST_META_CONTEXT) { +*match = true; +} +trace_nbd_negotiate_meta_query_parse("empty"); +} else if (strcmp(pattern, ns) == 0) { trace_nbd_negotiate_meta_query_parse(pattern); *match = true; } else { trace_nbd_negotiate_meta_query_skip("pattern not matched"); } -g_free(query); - -return 1; -} - -/* - * Read @len bytes, and set @match to true if they match @pattern, or if @len - * is 0 and the client is performing _LIST_. @match is never set to false. - * - * Return -errno on I/O error, 0 if option was completely handled by - * sending a reply about inconsistent lengths, or 1 on success. - * - * Note: return code = 1 doesn't mean that we've read exactly @pattern. - * It only means that there are no errors. - */ -static int nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern, - uint32_t len, bool *match, Error **errp) -{ -if (len == 0) { -if (client->opt == NBD_OPT_LIST_META_CONTEXT) { -*match = true; -} -trace_nbd_negotiate_meta_query_parse("empty"); -return 1; -} - -if (len != strlen(pattern)) { -trace_nbd_negotiate_meta_query_skip("different lengths"); -return nbd_opt_skip(client, len, errp); -} - -return nbd_meta_pattern(client, pattern, match, errp); } /* nbd_meta_base_query * * Handle queries to 'base' namespace. For now, only the base:allocation - * context is available. 'len' is the amount of text remaining to be read from - * the current name, after the 'base:' portion has been stripped. - * - * Return -errno on I/O error, 0 if option was completely handled by - * sending a reply about inconsistent lengths, or 1 on success. + * context is available. @len is the length of @ns, including the 'base:' + * prefix. */ -static int nbd_meta_base_query(NBDClient *
Re: [PATCH 3/3] nbd: Add 'qemu-nbd -A' to expose allocation depth
On Fri, Sep 25, 2020 at 03:32:49PM -0500, Eric Blake wrote: > Allow the server to expose an additional metacontext to be requested > by savvy clients. qemu-nbd adds a new option -A to expose the > qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this > can also be set via QMP when using nbd-server-add. > > qemu as client can be hacked into viewing this new context by using > the now-misnamed x-dirty-bitmap option when creating an NBD blockdev; > although it is worth noting the decoding of how such context > information will appear in 'qemu-img map --output=json': > > NBD_STATE_DEPTH_UNALLOC => "zero":false, "data":true > NBD_STATE_DEPTH_LOCAL => "zero":false, "data":false > NBD_STATE_DEPTH_BACKING => "zero":true, "data":true > > libnbd as client is probably a nicer way to get at the information > without having to decipher such hacks in qemu as client. ;) I've been meaning to add extents information to nbdinfo, or perhaps a new tool ("nbdmap"). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Re: [PATCH 2/3] nbd: Add new qemu:allocation-depth metacontext
On Fri, Sep 25, 2020 at 03:32:48PM -0500, Eric Blake wrote: > +The second is related to exposing the source of various extents within > +the image, with a single context named: > + > +qemu:allocation-depth > + > +In the allocation depth context, bits 0 and 1 form a tri-state value: > + > +bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated > +bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image > +bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a > + backing layer >From the cover description I imagined it would show the actual depth, ie: top -> backing -> backing -> backing depth: 12 3 (0 = unallocated) I wonder if that is possible? (Perhaps there's something I don't understand here.) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org