On Tue, Nov 11, 2025 at 02:07:50PM -0600, Eric Blake wrote:
> On Tue, Nov 11, 2025 at 01:09:24PM -0600, Eric Blake wrote:
> > > In the current code, when we unref() the GSource for the socket
> > > watch, the destroy-notify does not get called, because the
> > > event thread is in the middle of a dispatch callback for the
> > > I/O event.  When the dispatch callback returns control to the
> > > event loop, the GSourceCallback is unrefed, and this triggers
> > > the destroy-notify call, which unrefs the listener.
> > > 
> > > The flow looks like this:
> > > 
> > >   Thread 1:
> > >        qio_net_listener_set_client_func(lstnr, f, ...);
> > >            => foreach sock: socket
> > >                => object_ref(lstnr)
> > >                => sock_src = qio_channel_socket_add_watch_source(sock, 
> > > ...., lstnr, object_unref);
> > > 
> > >   Thread 2:
> > >        poll()
> > >           => event POLLIN on socket
> > >                => ref(GSourceCallback)
> > >                => call dispatch(sock)
> > >                     ...do stuff..
> > > 
> > >   Thread 1:
> > >        qio_net_listener_set_client_func(lstnr, NULL, ...);
> > >           => foreach sock: socket
> > >                => g_source_unref(sock_src)
> > >        unref(lstnr)  (the final reference)
> > >           => finalize(lstnr)
> > 
> > If I understand correctly, _this_ unref(lstnr) is NOT the final
> > reference, because of the additional reference still owned by the
> > GSource that is still dispatching, so finalize(lstnr) is not reached
> > here...
> > 
> > > 
> > >   Thread 2:
> > >               => return dispatch(sock)
> > >               => unref(GSourceCallback)
> > >                   => destroy-notify
> > >                      => object_unref
> > 
> > ...but instead here.  And that is the desirable property of the
> > pre-patch behavior with a per-GSource reference on the listsner object
> > - we are guaranteed that listener can't be finalized while there are
> > any pending dispatch in flight, even if the caller has unref'd their
> > last mention of lstnr, and therefore the dispatch never has a
> > use-after-free.
> 
> But thinking further, even if it is not an object_unref, but merely a
> change of the async callback function, is this the sort of thing where
> a mutex is needed to make things safer?
> 
> Even without my patch series, we have this scenario:
> 
>   Thread 1:
>        qio_net_listener_set_client_func(lstnr, f1, ...);
>            => foreach sock: socket
>                => object_ref(lstnr)
>                => sock_src = qio_channel_socket_add_watch_source(sock, ...., 
> lstnr, object_unref);
> 
>   Thread 2:
>        poll()
>           => event POLLIN on socket
>                => ref(GSourceCallback) // while lstnr->io_func is f1
>                     ...do stuff..
> 
>   Thread 1:
>        qio_net_listener_set_client_func(lstnr, f2, ...);
>           => foreach sock: socket
>                => g_source_unref(sock_src)
>           => foreach sock: socket
>                => object_ref(lstnr)
>                => sock_src = qio_channel_socket_add_watch_source(sock, ...., 
> lstnr, object_unref);
> 
>   Thread 2:
>                => access lstnr->io_func // now sees f2
>                => call f2(sock)
>                => return dispatch(sock)
>                => unref(GSourceCallback)
>                   => destroy-notify
>                      => object_unref
> 
> So even though thread 2 noticed a client while f1 was registered,
> because we lack a mutex and are not using atomic access primitives,
> there is nothing preventing thread 1 from changing the io_func from f1
> to f2 before thread 2 finally calls the callback.  And if f2 is NULL
> (because the caller is deregistering the callback), I could see the
> race resulting in qio_net_listener_channel_func() with its pre-patch
> code of:
> 
>     if (listener->io_func) {
>         listener->io_func(listener, sioc, listener->io_data);
>     }
> 
> resulting in a NULL-pointer deref if thread 1 changed out
> listener->io_func after the if but before the dispatch.

Hmm, yes, that is a risk I hadn't thought of before.

> Adding a mutex might make QIONetListener slower in the time to grab
> the mutex (even if it is uncontended most of the time), but if we have
> a proper mutex in place, we could at least guarantee that
> listener->io_func, listener->io_data, and listener->io_notify are
> changed as a group, rather than sharded if the polling thread is
> managing a dispatch with listener as its opaque data at the same time
> the user's thread is trying to change the registered callback.

Yeah, a mutex should be ok, as it will be uncontended 99.999% of
the time.

> By itself, GSource DOES have a mutex lock around the GMainContext it
> is attached to; but our particular use of GSource is operating outside
> that locking scheme, so it looks like I've uncovered another latent
> problem.

Agreed.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to