On Fri, Jun 28, 2024 at 11:58:37AM GMT, Alexander Ivanov wrote:
> Ping?
> 
> On 6/7/24 17:00, Alexander Ivanov wrote:
> >   static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
> >   {
> >       nbd_client_put(client);
> > +    if (nbd_server == NULL) {
> > +        return;
> > +    }
> >       assert(nbd_server->connections > 0);
> >       nbd_server->connections--;
> >       nbd_update_server_watch(nbd_server);
>

I've spent more time looking at this, and I'm stumped.  Maybe Dan can
help out.  If I understand correctly, here's the race situation we
have:

qemu main loop                coroutine                    client
QMP nbd-server-start
 - calls nbd_server_start
   - assigns nbd_server = g_new0()
   - calls qio_net_listener_open_sync
   - calls nbd_server_update_watch
     - calls qio_net_listener_set_client_func(, nbd_accept, )
     - waiting for client is now in coroutine
 - returns control to QMP main loop
                                                           opens TCP socket
                              - qio notices incoming connection
                                - calls nbd_accept callback
                                - starts NBD handshake
                                  nbd_client_new(, nbd_blockdev_client_closed)
QMP nbd-server-stop
 - calls nbd_server_stop
   - calls nbd_server_free
     - calls qio_net_listener_disconnect()
       - marks qio channel as useless
   - sets nbd_server = NULL
                                - qio sees channel is useless, disconnects
                                                           client deals gracefully
                                                           with dead connection
                                - calls nbd_blockdev_client_closed callback
                                - segfault since nbd_server->connections
                                  derefs NULL


Thread 1 "qemu-kvm" received signal SIGSEGV, Segmentation fault.
0x000055555600af59 in nbd_blockdev_client_closed (client=0x55555810dfc0, 
    ignored=false) at ../blockdev-nbd.c:56
56          assert(nbd_server->connections > 0);
(gdb) p nbd_server
$1 = (NBDServerData *) 0x0
(gdb) bt
#0  0x000055555600af59 in nbd_blockdev_client_closed
    (client=0x55555810dfc0, ignored=false) at ../blockdev-nbd.c:56
#1  0x0000555555ff72f9 in client_close
    (client=0x55555810dfc0, negotiated=false) at ../nbd/server.c:1624
#2  0x0000555555ffbc49 in nbd_co_client_start (opaque=0x55555810dfc0)
    at ../nbd/server.c:3198
#3  0x0000555556220629 in coroutine_trampoline (i0=1488434896, i1=21845)

Worse, the race could go another way: if another QMP nbd-server-start
is called fast enough, before the coroutine finishes the nbd_accept
code from the first connection, that stale coroutine could end up
modifying the second nbd_server object, which may be associated with a
completely different socket.
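
One idea for closing that second hole (untested sketch, not actual
QEMU code; nbd_client_owner() is a hypothetical accessor that does not
exist today): have each client remember which NBDServerData it was
accepted under, so a late callback can detect that its server is gone
or has been replaced, instead of trusting the global:

static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
{
    /* Hypothetical: fetch the server this client was accepted under,
     * before the put below potentially frees the client. */
    NBDServerData *owner = nbd_client_owner(client);

    nbd_client_put(client);
    if (owner != nbd_server) {
        /* Our server already stopped, or was replaced by a newer
         * nbd-server-start; nothing left to update. */
        return;
    }
    assert(nbd_server->connections > 0);
    nbd_server->connections--;
    nbd_update_server_watch(nbd_server);
}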

As far as I can tell, the problem stems from the fact that the attempt
to establish the connection with the client continues in a background
coroutine despite our efforts to honor the QMP nbd-server-stop command.
But I'm not sure of the proper way to inform qio to abandon all
incoming traffic to a given net listener.  Or maybe I should just
change the semantics of QMP nbd-server-stop to fail if there are still
known connections from an nbd_accept() - but I don't want that to wait
indefinitely, so I still want some way to tell the qio channel that the
server is going away and should tear down its sockets as soon as
possible.
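
The closest lever I see in the existing QIONetListener API is
unregistering the accept callback, which at least stops new clients
from racing in (untested sketch; the helper name is mine, and this does
nothing for a handshake coroutine that is already running):

/* Stop accepting further clients: drop the nbd_accept callback, then
 * close the listening sockets.  Both calls are existing qio APIs. */
static void nbd_server_stop_listening(NBDServerData *server)
{
    qio_net_listener_set_client_func(server->listener, NULL, NULL, NULL);
    qio_net_listener_disconnect(server->listener);
}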

As a stopgap, something like this might prevent the SEGV:

diff --git i/blockdev-nbd.c w/blockdev-nbd.c
index 213012435f4..2f5ea094407 100644
--- i/blockdev-nbd.c
+++ w/blockdev-nbd.c
@@ -277,6 +277,12 @@ void qmp_nbd_server_stop(Error **errp)

     blk_exp_close_all_type(BLOCK_EXPORT_TYPE_NBD);

+    if (qio_net_listener_is_connected(nbd_server->listener) ||
+        nbd_server->connections > 0) {
+        error_setg(errp, "NBD server still has connected clients");
+        return;
+    }
+
     nbd_server_free(nbd_server);
     nbd_server = NULL;
 }

but it's not as graceful as I'd like (it would be nicer to have the
nbd-server-stop command wait until it knows all connections are cleaned
up, instead of failing up front).
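
Concretely, that nicer variant might look something like this (untested
sketch; it reuses the existing AIO_WAIT_WHILE() helper, and assumes the
NULL-guard question above is settled so that nbd_blockdev_client_closed
still decrements nbd_server->connections):

void qmp_nbd_server_stop(Error **errp)
{
    if (!nbd_server) {
        error_setg(errp, "NBD server not running");
        return;
    }

    blk_exp_close_all_type(BLOCK_EXPORT_TYPE_NBD);

    /* Refuse new clients, then block the monitor until every
     * remaining client has reached nbd_blockdev_client_closed(). */
    qio_net_listener_set_client_func(nbd_server->listener, NULL, NULL, NULL);
    AIO_WAIT_WHILE(NULL, nbd_server->connections > 0);

    nbd_server_free(nbd_server);
    nbd_server = NULL;
}

Of course, whether that can stall the monitor on a client that never
finishes its handshake is exactly the "wait indefinitely" worry above,
so it would probably still need a timeout or forced teardown.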

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

