Hi Martin,
As you might have seen in my later reply to Roger, there's still hope on
that front: setpgid() + waitpid(-pgid, ...) might be the answer. I'm
exploring in that direction. Shells are doing it, so why can't the JDK?
It's a little trickier for the Process API, since I imagine that shells
form a group of processes from a pipeline which is known in advance, while
the Process API will have to add processes to the live group dynamically.
So some races will have to be resolved, but I think it's doable.
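
A minimal sketch of the idea in C, not the JDK code: every child is put
into one process group, and the parent then reaps only members of that
group with waitpid(-pgid, ...). The function name spawn_and_reap_group is
hypothetical, and the children just exit instead of exec'ing anything.
Calling setpgid() in both parent and child is the usual shell idiom for
closing the race between fork() and the group assignment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children into a single process group and
 * reap them with waitpid(-pgid, ...). Returns the number of children
 * reaped, or -1 if fork() fails. */
int spawn_and_reap_group(int n)
{
    pid_t pgid = 0;   /* 0 means "use my own pid", so the first child leads */
    int reaped = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child side: join the group before doing anything else. */
            setpgid(0, pgid);
            _exit(i);
        }
        if (i == 0)
            pgid = pid;   /* the first child becomes the group leader */
        /* Parent side: same call, so the group is set whichever side
         * runs first. Errors are ignored because the child may already
         * have joined the group itself. */
        setpgid(pid, pgid);
    }

    /* Reap only members of our group; other children of this process,
     * if any, are left alone. */
    for (;;) {
        int status;
        pid_t pid = waitpid(-pgid, &status, 0);
        if (pid > 0) {
            reaped++;
            continue;
        }
        if (errno == EINTR)
            continue;
        break;   /* ECHILD: no children left in the group */
    }
    return reaped;
}
```

The sketch forks all children up front. Adding a process to a live group
later is where the extra races come in, for example when the group
becomes empty between two spawns.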
Stay tuned.
Regards, Peter
On 04/08/2014 07:48 PM, Martin Buchholz wrote:
Peter, thank you very much for your deep analysis.
TIL, and am horrified, that standard signals on Unix are not queued, not
even if you specify SA_SIGINFO. Providing siginfo turns signals into
proper "messages", each with unique content, and it is unacceptable to
simply drop some (especially when proper queueing seems to be required for
so-called real-time signals), but at least the Linux kernel does so
very deliberately. 45 years later, we are still fighting with
unreliable Unix signals...
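
A small C sketch of the behavior described above, assuming Linux: a
standard signal (SIGUSR1 here, standing in for SIGCHLD) is sent several
times while blocked, and the SA_SIGINFO handler still runs only once after
unblocking. The function name count_coalesced_deliveries is hypothetical.
POSIX leaves it implementation-defined whether standard signals are
delivered more than once in this situation, so other systems may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries;

/* SA_SIGINFO-style handler: each delivery carries a siginfo_t "message". */
static void on_signal(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    (void)info;
    (void)ucontext;
    deliveries++;
}

/* Hypothetical helper: send SIGUSR1 to ourselves `sends` times while it
 * is blocked, then unblock it and report how many times the handler ran.
 * Returns -1 on setup failure. Intended for a single-threaded process. */
int count_coalesced_deliveries(int sends)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;

    assert(sends > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1;

    deliveries = 0;
    for (int i = 0; i < sends; i++)
        kill(getpid(), SIGUSR1);   /* all but the first are merged */

    /* Restoring the mask unblocks SIGUSR1; the pending instance is
     * delivered before sigprocmask() returns. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return (int)deliveries;
}
```

On Linux, count_coalesced_deliveries(5) reports a single delivery, which
is why a SIGCHLD handler cannot simply count signals to count exited
children.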
We can't call waitpid(WAIT_ANY, ...) because we can only wait for
processes owned by the j.l.Process subsystem. We can't override libc
functions like waitpid because the JVM may be a "guest" in some other
process.
I don't know of any public examples, but it is reasonable to add a JVM
to a previously pure native code application, similarly to the way Tcl
or Lua is often used to provide a higher-level, safer programming API
to native code, and some programs at Google use this strategy.
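
To illustrate the waitpid(WAIT_ANY, ...) problem, here is a hedged C
sketch. The function name wait_any_steals_child is hypothetical, and the
"foreign" child stands in for a process started by the host application
rather than by j.l.Process. A blanket wait reaps it, and the host's own
later waitpid() on that pid then fails with ECHILD.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a blanket waitpid(-1, ...) consumed the
 * exit status that the child's real owner later tried to collect, 0 if
 * the owner could still collect it, and -1 on error. Assumes the calling
 * process has no other children. */
int wait_any_steals_child(void)
{
    int status;
    pid_t reaped, again;

    /* A child that belongs to some other part of the application. */
    pid_t foreign = fork();
    if (foreign < 0)
        return -1;
    if (foreign == 0)
        _exit(7);

    /* The "process subsystem" waits for any child (WAIT_ANY is -1). */
    do {
        reaped = waitpid(-1, &status, 0);
    } while (reaped < 0 && errno == EINTR);
    if (reaped != foreign)
        return -1;
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 7);

    /* The real owner now tries to collect its child and finds nothing. */
    errno = 0;
    again = waitpid(foreign, &status, 0);
    return (again == -1 && errno == ECHILD) ? 1 : 0;
}
```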
What problem are we actually trying to solve? The army of reaper
threads is ugly, but the inefficiency is greatly mitigated by the use
of small explicit stack sizes. Redoing the process code is always
risky, as we have already seen in this thread.
Maintaining a single child helper process which spawns all the
(grand)child processes seems reasonable, although it would create a
permanent intermediate entry in the process table (pstree?) which
might confuse some sysadmin scripts. Is it worth it?