Re: ProcessReaper: single thread reaper

Peter Levart Mon, 14 Apr 2014 13:58:33 -0700


On 04/14/2014 07:02 PM, David M. Lloyd wrote:

On 04/14/2014 11:37 AM, Peter Levart wrote:

On 04/14/2014 04:37 PM, roger riggs wrote:

Hi,


Jtreg, for example, needs a reliable way to clean up after tests.
We've had a variety of problems with stray processes left over because
there is neither visibility nor a reliable way to identify and kill them.

Roger


Hi Roger,

If you want to reliably get rid of all descendants then there's only one
way on UNIX:


for (Proc c : enumerateDirectChildrenOfJVM()) {
     getRidOfTreeRootedAt(c);
}

void getRidOfTreeRootedAt(Proc p) {
     // if we're not alive any more, then we can't have children - they are
     // orphans and we can't identify them any more (their parent is "init")
     if (p.isAlive()) {
         // save list of direct children 1st, since they will be re-parented
         // when their parent is gone, preventing enumerating them later...
         List<Proc> children = p.enumerateDirectChildren();
         // try graceful...
         p.terminateGracefully();
         // wait a while
         if (p.isAlive()) p.terminateForcefully();
         // now iterate children
         for (Proc c : children) {
             getRidOfTreeRootedAt(c);
         }
     }
}

I don't think this is a good idea. If a grandchild process exits, and
the parent waits() on it, then by the time we get around to iterating
grandchild processes, the OS may have assigned a new process the old
PID. Zombies are pretty much the only reliable way to ensure that the
process is the one we think it is, and we can only reliably do that
for immediate children AFAICT.

There's already such a race in current implementation of
Process.terminate(). It admittedly only concerns a small window between
process exiting and the reaper thread managing to signal this state to
the other threads wishing to terminate it at the same time, so it could
happen that a KILL/TERM signal is sent to an already deceased PID which
was re-used, but it doesn't happen in practice since PIDs are not
re-used very soon typically.

But I agree, waiting between listing children and sending them signals
increases the chance of hitting a reused PID.


Regards, Peter



- must 1st terminate the parent (hopefully with grace and it will take
care of children) because if you kill a child 1st, a persistent parent
might re-spawn it.
- must enumerate the children before terminating the parent, because
they are re-parented when the parent dies and you can't find them anymore.



So my list of requirements for the new API that I submitted in previous
message:

On 04/14/2014 05:54 PM, Peter Levart wrote:

- enumerate direct children (regardless of which API was used to spawn
them) of JVM
- trigger graceful destruction of any direct child
- non-blocking query for liveness of any direct child
- trigger forcible termination of any direct child and all descendants
in one call
- (optionally: obtain a Process object of any live direct child that
was spawned by Process API)


...must be augmented:

- enumerate direct children (regardless of which API was used to spawn
them) of JVM
- enumerate direct children of any child enumerated by the API
- trigger graceful destruction of any descendant enumerated by the API
- non-blocking query for liveness of any descendant enumerated by the API
- trigger forcible termination of any descendant enumerated by the API
- (optionally: obtain a Process object of any live direct JVM child that
was spawned by Process API)


Regards, Peter



On 4/14/2014 10:31 AM, David M. Lloyd wrote:

Where does the requirement to manage grandchild processes actually
come from?  I'd hate to see the ability to "nicely" terminate
immediate child processes lost just because it was difficult to
implement some grander scheme.

On 04/14/2014 08:49 AM, roger riggs wrote:

Hi Martin,

A new API is needed; overloading the current Process API is not a good
option.
Even within Process a new method will be needed to destroy the
subprocess and all
of its children to maintain backward compatibility.

Are there specific OS features that need to be exposed to applications?

Is the destroy-process-and-all-children abstraction too coarse?

Roger





On 4/11/2014 7:37 PM, Martin Buchholz wrote:

Let's step back again and try to check our goals...

We could try to optimize the one-reaper-thread-per-subprocess thing.
But that is risky, and the cost of what we're doing today is not that
high.

We could try to implement the feature of killing off an entire
subprocess tree.  But historically, any kind of behavior change like
that has been vetoed.  I have tried and failed to make less
incompatible changes.  We would have to add a new API.

The reality is that Java does not give you real access to the
underlying OS, and unless there's a seriously heterodox attempt to
provide OS-specific extensions, people will have to continue to either
write native code or delegate to an OS-savvy subprocess like a perl
script.

On Fri, Apr 11, 2014 at 7:52 AM, Peter Levart wrote:

    On 04/09/2014 07:02 PM, Martin Buchholz wrote:

    On Tue, Apr 8, 2014 at 11:08 PM, Peter Levart wrote:

        Hi Martin,

        As you might have seen in my later reply to Roger, there's
        still hope on that front: setpgid() + waitpid(-pgid, ...) might
        be the answer. I'm exploring in that direction. Shells are
        doing it, so why can't JDK?
        It's a little trickier for Process API, since I imagine that
        shells form a group of processes from a pipeline which is
        known in advance while Process API will have to add processes
        to the live group dynamically. So some races will have to be
        resolved, but I think it's doable.


    This is a clever idea, and it's arguably better to design
    subprocesses so they live in separate process groups (emacs does
    that), but:
    Every time you create a process group, you change the effect of a
    user signal like Ctrl-C, since it's sent to only one group.
    Maybe propagate signals to the subprocess group? It's starting
    to get complicated...


    Hi Martin,

    Yes, shells send Ctrl-C (SIGINT) and other signals initiated by
    terminal to a (foreground) process group. A process group is
    formed from a pipeline of interconnected processes. Each pipeline
    is considered to be a separate "job", hence shells call this
    feature "job-control". Child processes by default inherit process
    group from their parent, so children born with Process API (and
    their children) inherit the process group from the JVM process.
    Considering the intentions of shell job-control, is propagating
    SIGTERM/SIGINT/SIGTSTP/SIGCONT signals to children spawned by
    Process API desirable? If so, then yes, handling those signals in
    JVM and propagating them to current process group that contains
    all children spawned by Process API and their descendants would
    have to be performed by JVM. That problem would certainly have to
    be addressed. But let's first see what I found out about
    sigaction(SIGCHLD, ...), setpgid(pid, pgid), waitpid(-pgid, ...),
    etc...

    waitpid(-pgid, ...) alone seems to not be enough for our task.
    Mainly because a process can re-assign its group and join some
    other group. I don't know if this is a situation that occurs in
    real world, but imagine if we have one live child process in a
    process group pgid1 and no unwaited exited children. If we issue:


        waitpid(-pgid1, &status, 0);

    Then this call blocks, because at the time it was given, there
    were >0 child processes in the pgid1 group and none of them has
    exited yet. Now if this one child process changes its process
    group with:

        setpgid(0, pgid2);

    Then the waitpid call in the parent does not return (maybe this is
    a bug in Linux?) although there are no more live child processes
    in the pgid1 group any more. Even when this child exits, the call
    to waitpid does not return, since this child is not in the group
    we are waiting for when it exits. If all our children "escape" the
    group in such way, the thread doing waiting will never unblock. To
    solve this, we can employ signal handlers. In a signal handler for
    SIGCHLD signal we can invoke:

        waitpid(-pgid1, &status, WNOHANG); // non-blocking call

    ...in loop until it either returns (0) which means that there're
    no more unwaited exited children in the group at the moment or (-1)
    with errno == ECHILD, which means that there're no more children
    in the queried group any more - the group does not exist any more.
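
The non-blocking loop described above could look roughly like this. It is only a sketch, assuming a POSIX system: the function name reap_group and its group_gone out-parameter are invented here for illustration and are not taken from the prototype.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reap every already-exited child in process group
 * `pgid` (which must be > 1) without blocking.
 * Returns the number of children reaped. *group_gone is set to 1 when
 * waitpid() fails with ECHILD, i.e. we have no children left in that
 * group, and to 0 otherwise. */
int reap_group(pid_t pgid, int *group_gone)
{
    int reaped = 0;
    *group_gone = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-pgid, &status, WNOHANG);

        if (pid > 0) {          /* one exited child collected, try again */
            reaped++;
            continue;
        }
        if (pid == 0)           /* children remain, none has exited yet */
            break;
        if (errno == EINTR)     /* interrupted, simply retry */
            continue;
        if (errno == ECHILD)    /* no children in this group any more */
            *group_gone = 1;
        break;
    }
    return reaped;
}
```

waitpid() is async-signal-safe, so a function of this shape can be called from a SIGCHLD handler; a real handler would also save and restore errno around the call and record each exit status instead of discarding it.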

    Since signal handler is invoked with SIGCHLD being masked and
    there is one bit of pending signal state in the kernel, no child
    exit can be "skipped" this way. Unless the child "escapes" by
    changing its group. I don't know of a plausible reason for a
    program to change its process group. If a program executing as
    JVM child wants to become a background daemon it usually behaves
    as follows:

    - fork()s a grand-child and then exit()s (so we get notified via
    signal and successfully waitpid(-pgid, ...) for its exit status)
    - the grand-child then changes its session and group (becomes
    session and group leader), closes file descriptors, etc. The
    responsibility for waiting on the grand-child daemon is
    transferred to the init process (pid=1) since the grand-child
    becomes an orphan (has no parent).

    Ignoring this still unsolved problem of possible ill-behaved child
    program that changes its process group, I started constructing a
    proof-of-concept prototype. What I will do in the prototype is
    start throwing IllegalStateException from the methods of the
    Process API that pertain to such children. I think this is
    reasonable.

    Stay tuned,

    Peter
