On Sat, Nov 08, 2025 at 04:59:31PM -0600, Eric Blake wrote:
> The user "John Doe" reported a deadlock when attempting to use
> qemu-storage-daemon to serve both a base file over NBD, and a qcow2
> file with that NBD export as its backing file, from the same process,
> even though it worked just fine when there were two q-s-d processes.
> The bulk of the NBD server code properly uses coroutines to make
> progress in an event-driven manner, but the code for spawning a new
> coroutine at the point when listen(2) detects a new client was
> hard-coded to use the global GMainContext; in other words, the
> callback that triggers nbd_client_new to let the server start the
> negotiation sequence with the client requires the main loop to be
> making progress.  However, the code for bdrv_open of a qcow2 image
> with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
> ensure that the entire qcow2 backing chain is either fully loaded or
> rejected, without any side effects from the main loop causing unwanted
> changes to the disk being loaded (in short, an AioContext represents
> the set of actions that are known to be safe while handling block
> layer I/O, while excluding any other pending actions in the global
> main loop with potentially larger risk of unwanted side effects).
> 
> This creates a classic case of deadlock: the server can't progress to
> the point of accept(2)ing the client to write to the NBD socket
> because the main loop is being starved until the AIO_WAIT_WHILE
> completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
> it is blocked on the client coroutine stuck in a read() of the
> expected magic number from the server side of the socket.
> 
> This patch adds a new API to allow clients to opt in to listening via
> an AioContext rather than a GMainContext.  This will allow NBD to fix
> the deadlock by performing all actions during bdrv_open in the main
> loop AioContext.
> 
> An upcoming patch will then add a unit test (kept separate to make it
> easier to rearrange the series to demonstrate the deadlock without
> this patch).
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> Signed-off-by: Eric Blake <[email protected]>
> 
> ---
> v2: Retitle and add new API rather than changing semantics of
> existing qio_net_listener_set_client_func; use qio accessor rather
> than direct access to the sioc fd and a lower-level aio call
> ---
>  include/io/net-listener.h | 16 ++++++++++++++++
>  io/net-listener.c         | 36 +++++++++++++++++++++++++++++++++---
>  2 files changed, 49 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <[email protected]>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to