Hi,

Ludovic Courtès <l...@gnu.org> writes:

> Hi,
>
> Josselin Poiret <d...@jpoiret.xyz> skribis:
>
>> Maxim Cournoyer <maxim.courno...@gmail.com> writes:
>>
>>> This leads me to believe that Shepherd does not block until the process
>>> is actually dead to mark the process as stopped (it just waitpid on the
>>> group pid with WNOHANG), which means it won't block if the child process
>>> hasn't exited yet, if I'm correct.
>
> Correct: the service is marked as stopped as soon as ‘stop’ returns.
>
>>> When we are in the stop slot, we know for sure that the process should
>>> terminate completely, hence it'd make sense to call 'waitpid' *without*
>>> WNOHANG there, to avoid 'herd restart' from starting the service while
>>> its stopped process is not done terminating.
>>>
>>> jamid can take quite some time to terminate cleanly because of the
>>> networking threads in the opendht library that needs to be finalized,
>>> which is probably the reason this problem can be observed here.
>>>
>>> Thoughts?
>>
>> I agree with you, make-kill-destructor should waitpid the processes it's
>> killing. There shouldn't be any issues waitpid'ing before the
>> shepherd's signal handler, since stop actions are run with asyncs
>> disabled. The signal handler will run once but won't get anything
>> because all the processes were already waitpid'd and it uses WNOHANG.
>
> I think we need an extra “stopping” state for services. In general,
> we’ll want to send SIGTERM, wait for some grace period or dead process
> notification, then send SIGKILL, and finally change state to “stopped”.
>
> This is not possible in 0.9 but is something I’d like to have in 0.10¹.

This sounds good. Let's keep this ticket open until this goodness lands,
as a reminder.

Thank you!

Maxim
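For illustration only, the stop sequence described in this thread (send SIGTERM, wait for a grace period or until the process is reaped, then send SIGKILL, and only then report the service as stopped) can be sketched with plain POSIX calls. This is a minimal sketch in Python, not Shepherd's code (Shepherd is written in Guile Scheme); the helper name `stop_with_grace` and its default grace and poll intervals are hypothetical.

```python
import os
import signal
import time


def stop_with_grace(pid, grace=5.0, poll=0.1):
    """Hypothetical helper: SIGTERM, wait up to `grace` seconds, then SIGKILL.

    Returns the raw wait status only after the child has been reaped, so a
    caller (e.g. a restart action) cannot start a new instance while the old
    process is still terminating.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        # With WNOHANG, waitpid returns (0, 0) if the child has not exited yet.
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return status
        time.sleep(poll)
    # Grace period expired: force termination, then block (no WNOHANG)
    # until the process is actually gone.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return status
```

The final `waitpid` call without `WNOHANG` is the blocking wait suggested above for the stop slot. The discussion mentions waiting on the group PID; this sketch handles a single PID to keep it short.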