When a server decides to close a client, the virNetServerClientCloseLocked() is called. In here various cleanup steps are taken, but the most important part (from this commit's POV at least) is the way that the socket is closed. Firstly, removal of the socket associated with the client from the event loop is signalized and then the socket is unrefed. The socket is not closed just yet though, because the event loop holds a reference to it. This reference will be freed as soon as the event loop wakes up and starts issuing callbacks (in this case virNetSocketEventFree()).
So far, this is how things usually work. But if the daemon reaches the number of opened files limit, things start to work differently. If the RLIMIT_NOFILE limit is reached and there's a client that wants to connect then the event loop wakes up, sees POLLIN on the socket and calls virNetServerServiceAccept() which in turn calls virNetSocketAccept(). But because of the limit, accept() fails with EMFILE leaving the POLLIN event unhandled. The dispatch then continues to next FDs with events on them. BUT, it will NOT call the socket removal callback (virNetSocketEventFree()) because it has low priority (G_PRIORITY_DEFAULT_IDLE). Per glib's documentation: * Each event source is assigned a priority. The default priority, * %G_PRIORITY_DEFAULT, is 0. Values less than 0 denote higher priorities. * Values greater than 0 denote lower priorities. Events from high priority * sources are always processed before events from lower priority sources. and per g_idle_add() documentation: * Adds a function to be called whenever there are no higher priority * events pending to the default main loop. The function is given the * default idle priority, %G_PRIORITY_DEFAULT_IDLE. Now, because we did not accept() the client we are constantly seeing POLLIN on the main socket and thus the removal of the client socket won't ever happen. The fix is to set at least the same priority as other sources, but since we want to just close an FD, let's give it the highest priority and call it before handling other events. This issue can be easily reproduced, for instance: # ulimit -S -n 40 (tweak this number if needed) # ./src/libvirtd from another terminal: # for ((i=0; i<100; i++)); do virsh list & done; virsh list The last `virsh list` must not get stuck. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2007168 Signed-off-by: Michal Privoznik <mpriv...@redhat.com> --- src/util/vireventglib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/vireventglib.c b/src/util/vireventglib.c index f3e5a344b0..983787932f 100644 --- a/src/util/vireventglib.c +++ b/src/util/vireventglib.c @@ -287,7 +287,7 @@ virEventGLibHandleRemove(int watch) * 'removed' to prevent reuse */ data->removed = TRUE; - g_idle_add(virEventGLibHandleRemoveIdle, data); + g_idle_add_full(G_PRIORITY_HIGH, virEventGLibHandleRemoveIdle, data, NULL); ret = 0; -- 2.32.0