On Tue, Nov 11, 2025 at 01:09:24PM -0600, Eric Blake wrote:
> > In the current code, when we unref() the GSource for the socket
> > watch, the destroy-notify does not get called, because the
> > event thread is in the middle of a dispatch callback for the
> > I/O event. When the dispatch callback returns control to the
> > event loop, the GSourceCallback is unrefed, and this triggers
> > the destroy-notify call, which unrefs the listener.
> >
> > The flow looks like this:
> >
> > Thread 1:
> > qio_net_listener_set_client_func(lstnr, f, ...);
> > => foreach sock: socket
> > => object_ref(lstnr)
> > => sock_src = qio_channel_socket_add_watch_source(sock,
> > ...., lstnr, object_unref);
> >
> > Thread 2:
> > poll()
> > => event POLLIN on socket
> > => ref(GSourceCallback)
> > => call dispatch(sock)
> > ...do stuff..
> >
> > Thread 1:
> > qio_net_listener_set_client_func(lstnr, NULL, ...);
> > => foreach sock: socket
> > => g_source_unref(sock_src)
> > unref(lstnr) (the final reference)
> > => finalize(lstnr)
>
> If I understand correctly, _this_ unref(lstnr) is NOT the final
> reference, because of the additional reference still owned by the
> GSource that is still dispatching, so finalize(lstnr) is not reached
> here...
>
> >
> > Thread 2:
> > => return dispatch(sock)
> > => unref(GSourceCallback)
> > => destroy-notify
> > => object_unref
>
> ...but instead here. And that is the desirable property of the
> pre-patch behavior with a per-GSource reference on the listsner object
> - we are guaranteed that listener can't be finalized while there are
> any pending dispatch in flight, even if the caller has unref'd their
> last mention of lstnr, and therefore the dispatch never has a
> use-after-free.
But thinking further, even if it is not an object_unref, but merely a
change of the async callback function, is this the sort of thing where
a mutex is needed to make things safer?
Even without my patch series, we have this scenario:
Thread 1:
qio_net_listener_set_client_func(lstnr, f1, ...);
=> foreach sock: socket
=> object_ref(lstnr)
=> sock_src = qio_channel_socket_add_watch_source(sock, ....,
lstnr, object_unref);
Thread 2:
poll()
=> event POLLIN on socket
=> ref(GSourceCallback) // while lstnr->io_func is f1
...do stuff..
Thread 1:
qio_net_listener_set_client_func(lstnr, f2, ...);
=> foreach sock: socket
=> g_source_unref(sock_src)
=> foreach sock: socket
=> object_ref(lstnr)
=> sock_src = qio_channel_socket_add_watch_source(sock, ....,
lstnr, object_unref);
Thread 2:
=> access lstnr->io_func // now sees f2
=> call f2(sock)
=> return dispatch(sock)
=> unref(GSourceCallback)
=> destroy-notify
=> object_unref
So even though thread 2 noticed a client while f1 was registered,
because we lack a mutex and are not using atomic access primitives,
there is nothing preventing thread 1 from changing the io_func from f1
to f2 before thread 2 finally calls the callback. And if f2 is NULL
(because the caller is deregistering the callback), I could see the
race resulting in qio_net_listener_channel_func() with its pre-patch
code of:
if (listener->io_func) {
listener->io_func(listener, sioc, listener->io_data);
}
resulting in a NULL-pointer deref if thread 1 changed out
listener->io_func after the if but before the dispatch.
Adding a mutex might make QIONetListener slower in the time to grab
the mutex (even if it is uncontended most of the time), but if we have
a proper mutex in place, we could at least guarantee that
listener->io_func, listener->io_data, and listener->io_notify are
changed as a group, rather than sharded if the polling thread is
managing a dispatch with listener as its opaque data at the same time
the user's thread is trying to change the registered callback.
By itself, GSource DOES have a mutex lock around the GMainContext it
is attached to; but our particular use of GSource is operating outside
that locking scheme, so it looks like I've uncovered another latent
problem.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org