Hi David,
Thanks for collecting the threads... I've been a bit occupied with
another task.
On 04/17/2014 02:58 PM, Peter Levart wrote:
... I guess I am indeed going in circles at this point. I wonder
though if you'll indulge me a bit longer, and verify my collected
understanding of the requirements of what is being requested here:
1) The process API must reap all child processes it produces that
terminate during the lifetime of the JVM, leaving no zombies
(including processes which have changed process group and/or session)
2) The process API must allow for child processes which are not
managed by it (by not attempting to reap them except as allowed by #3)
3) The process API must somehow be able to "adopt" other child
processes produced by means other than the Process API
4) The process reaper should keep resource consumption to a minimum
(preferably no more than one thread, preferably no more than one extra
FD per process)
5) The process API must provide an explicitly graceful terminate
method in addition to the existing forcible and "unspecified" destroy
methods
6) The process API must provide safeguards to prevent the wrong
process from being signaled (i.e. it would be required to synchronize
process reaping with termination/signaling, PID reuse probabilities
notwithstanding)
I've deliberately left off any mention of direct management of
grandchild processes. I believe it was pretty well established by
Peter Levart that a child is solely the responsibility of its parent.
Martin Buchholz has doubts about it as well. I think Roger Riggs had
some unaddressed disagreement though. For what it's worth, I agree
with Peter on this point, because I think managing grand+children
makes #6 difficult or impossible to satisfy. But the topic, AFAIK,
remains open.
Correct. In the cleanup case we have seen zombies left around, and we
will need to investigate those cases.
If the child process does not clean up after itself, someone still does.
Also I haven't brought up anything from JEP 102 that I haven't already
seen on this thread.
These requirements seem to exclude some techniques brought up on the
thread previously:
- waitid(P_ALL,...)/waitpid(-1,...) (which violates #2, either
directly, or by simply failing in the WNOWAIT|WNOHANG + unmanaged
child process case previously outlined by Peter Levart).
- setpgid() to an all-child process group + waitid(P_PID,...) (which
allows badly behaved processes to cause us to violate #1, and also
prevents automatic propagation of e.g. SIGTERM/SIGINT)
- setpgid() to a per-child process group (same problems, also no
workable reaping solution was found that I saw)
- SIGCHLD + siginfo (very unlikely to work consistently or correctly)
- anything relying on WNOWAIT on Mac OS X and maybe others
I think everyone liked the idea of pluggable implementations.
I didn't see this mentioned on this thread, but it seems to me that we
can have a simple 100% correct implementation on UNIX-likes by
retaining a single thread per child process (today each one has a 32k
stack, maybe it could be even smaller?). Much like the default
polling SelectorProvider for NIO, this could act as a simple fallback
implementation that will always work and be correct.
Yes, it seems clear that this is needed for 100% backward compatibility
(and it should probably be the default, at least to start).
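A minimal sketch of the thread-per-child fallback described above, assuming
a plain java.lang.Process and a small stack-size hint. The class name
ThreadPerChildReaper, the method watch, and the use of CompletableFuture are
illustrative assumptions, not part of any proposed API, and the JVM is free
to round up or ignore the stack-size hint.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical helper: one small daemon thread per child, blocked in waitFor().
final class ThreadPerChildReaper {
    // Stack size hint in bytes (the 32k figure from the thread); the JVM may adjust it.
    private static final long STACK_SIZE_HINT = 32 * 1024;

    private ThreadPerChildReaper() {}

    // Starts a dedicated reaper thread and returns a future for the exit status.
    static CompletableFuture<Integer> watch(Process child) {
        CompletableFuture<Integer> exit = new CompletableFuture<>();
        Thread reaper = new Thread(null, () -> {
            try {
                // Blocks until this one child terminates; no other child is touched.
                exit.complete(child.waitFor());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exit.completeExceptionally(e);
            }
        }, "process-reaper", STACK_SIZE_HINT);
        reaper.setDaemon(true);
        reaper.start();
        return exit;
    }
}
```

Because each thread waits on exactly one known child, unmanaged children are
never reaped by accident, at the cost of one thread per child.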
On proc-enabled systems, using poll or similar on the corresponding
proc files seems like a possible alternative implementation. It would
require one additional FD per child process and only one reaper thread,
and it seems possible to meet all 6 above requirements that way, though
lack of standardization might add risk.
Using a single thread to iterate all child PIDs each time a SIGCHLD is
received (with WNOHANG) would work without consuming more than one
thread and zero FDs total, however it scales poorly with very large
numbers of child processes, and it might be considered a violation of
#2 to use SIGCHLD anyway. Maybe these ideas could be implemented as
an alternative, contingent on -Xrs, or contingent on the previous
handler being SIG_IGN similarly to the suggestion by Martin Buchholz.
I didn't see any other workable implementation alternatives.
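The single-thread sweep described above can be sketched roughly as follows.
Java has no direct waitpid, so Process.isAlive() stands in here for a
non-blocking waitpid(pid, ..., WNOHANG) call, and SIGCHLD delivery is not
modeled at all: the caller is assumed to invoke sweep() on each wakeup. The
names SweepingReaper, manage, sweep, and managedCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-reaper bookkeeping: only explicitly managed children are checked.
final class SweepingReaper {
    private final List<Process> managed = new ArrayList<>();

    // Registers a child that this reaper is responsible for.
    synchronized void manage(Process child) {
        managed.add(child);
    }

    // One pass over every managed child; the cost per wakeup is O(number of children).
    synchronized List<Integer> sweep() {
        List<Integer> exitCodes = new ArrayList<>();
        for (Iterator<Process> it = managed.iterator(); it.hasNext();) {
            Process p = it.next();
            if (!p.isAlive()) {              // non-blocking check, analogous to WNOHANG
                exitCodes.add(p.exitValue());
                it.remove();                 // reaped: stop tracking this child
            }
        }
        return exitCodes;
    }

    synchronized int managedCount() {
        return managed.size();
    }
}
```

The linear scan in sweep() is the scaling concern raised above: every wakeup
visits every managed child, even if only one of them has exited.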
As for API, I had suggested that "adopted" processes have a strict
subset of functions compared to "managed" processes, and thus could be
a supertype of Process. Martin indicated that managing grandchildren
should have a different API altogether. Peter seems to lean towards
exposing the OS capabilities a bit more directly, through child
process ID enumeration (presumably including managed and unmanaged
processes in the same bucket) and an API which operates on any child
process by ID, regardless of its disposition (though I don't know of
any portable API to enumerate child processes; on Linux I believe you
have to use /proc). Peter also suggested that a process reaper be a
primary internal API construct.
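One way to picture the "strict subset" idea is a small supertype for any
child, with the managed case adding what only the reaping parent can know.
This is only a sketch under assumptions: the names ChildHandle, ManagedChild,
ManagedChildAdapter, and terminate are invented for illustration and do not
come from JEP 102 or any prototype.

```java
// Hypothetical supertype: the operations that could apply to an "adopted" child.
interface ChildHandle {
    boolean isAlive();
    void terminate();
}

// Hypothetical subtype: a managed child additionally exposes its exit status.
interface ManagedChild extends ChildHandle {
    int exitValue();
}

// Adapter showing that an existing java.lang.Process could fit the managed role.
final class ManagedChildAdapter implements ManagedChild {
    private final Process process;

    ManagedChildAdapter(Process process) {
        this.process = process;
    }

    @Override
    public boolean isAlive() {
        return process.isAlive();
    }

    @Override
    public void terminate() {
        // Process.destroy() has unspecified forcefulness; a graceful variant is a separate question.
        process.destroy();
    }

    @Override
    public int exitValue() {
        return process.exitValue();
    }
}
```

Code that only needs liveness and termination could accept ChildHandle and
work with both kinds of child, while exit-status consumers would require
ManagedChild.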
I have been working on the premise of a separate API with fewer
functions. The primary function that is difficult, and may need to be
omitted, is getting the exit status of an unmanaged subprocess.
I suspect the API may be limited to knowing if the process is alive and
being able to terminate it.
There may need to be a configuration option, primarily related to the
reaper that does manage every child.
I have prototyped an implementation that works across the 4/5 main OSs
for iterating over processes.
Thanks for the good summary.
Roger
Did I miss anything?