Ludovic Courtès <l...@gnu.org> skribis:

> Long story short: there seems to be a problem with signal delivery.
> Most likely, the initial grace period expiration above when stopping
> nscd is a symptom of shepherd no longer receiving/processing SIGCHLD
> rather than the cause.

Another possibility is lockup: one of the relevant fibers is either gone
or stuck in ‘put-message’ or ‘get-message’.

I did two things:

  b9a37f3 shepherd: Make signal handling fiber an essential task.
  8ae2780 service: Do not attempt to restart transient services.

Commit 8ae2780 fixes a bug whereby ‘herd restart’ could end up
attempting to restart a transient service, which would lock up the
calling fiber: the service’s controlling fiber would first receive the
'terminate message and return, so nobody would be reading further
messages sent on its channel.

Commit b9a37f3 will allow us to ensure that the signal-handling fiber
never exits (and we’ll get a trace in the log if it tries to).

Ludo’.