Re: RFR 10: 8184808 (process) isAlive should use pid for validity, not /proc/pid

Roger Riggs Thu, 20 Jul 2017 07:28:25 -0700

Hi Thomas,

Thanks for the investigation and links.

The variations across operating systems in the status of exited vs. reaped (zombie) processes have been a problem for quite a while (for portable apps).

The description of waitpid is focused heavily on child processes; this particular case is dealing with non-child processes, so I stayed with using kill(pid, 0) to determine liveness.

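For illustration only (this is not the code in the webrev), a null-signal liveness check could be sketched as below. The names probe_pid, pid_is_alive and the pid_state enum are hypothetical; the "no access means not alive" policy is the one discussed further down in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; not the JDK's native code. */
enum pid_state { PID_EXISTS, PID_NO_ACCESS, PID_GONE, PID_UNKNOWN };

/* Probe a pid with the null signal: no signal is delivered, only the
 * existence and permission checks are performed. PID_EXISTS may include
 * terminated but unreaped processes on some platforms. */
static enum pid_state probe_pid(pid_t pid) {
    if (pid <= 0) {
        return PID_UNKNOWN;   /* 0 and negative values address process groups */
    }
    if (kill(pid, 0) == 0) {
        return PID_EXISTS;
    }
    switch (errno) {
    case EPERM: return PID_NO_ACCESS;  /* exists, but we may not signal it */
    case ESRCH: return PID_GONE;       /* no such process */
    default:    return PID_UNKNOWN;
    }
}

/* Policy assumed from the thread: without access, report "not alive". */
static int pid_is_alive(pid_t pid) {
    return probe_pid(pid) == PID_EXISTS;
}
```

Keeping the errno classification separate from the boolean makes the EPERM case visible to a caller that wants to treat it differently.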

Thanks, Roger


On 7/19/2017 4:20 AM, Thomas Stüfe wrote:

Hi Roger,

On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs wrote:


    Hi Thomas,

    Yes, if there is no access to the pid, then it can't report alive
    or not, so it assumes not.
    If there are access restrictions, they will also apply to the
    waitid/waitpid in the waitForProcessExit0 logic, and the answer
    will be at least consistent (and avoid a possible race
    between /proc/pid/psinfo and kill state).

    Thanks, Roger

Okay, sounds reasonable. Interestingly, while reading up on the semantics of kill(), I found:


http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html

"Existing implementations vary on the result of a kill() with pid indicating an inactive process (a terminated process that has not been waited for by its parent). Some indicate success on such a call (subject to permission checking), while others give an error of [ESRCH]. Since the definition of process lifetime in this volume of IEEE Std 1003.1-2001 covers inactive processes, the [ESRCH] error as described is inappropriate in this case. In particular, this means that an application cannot have a parent process check for termination of a particular child with kill(). (Usually this is done with the null signal; this can be done reliably with waitpid().)"

So, kill() may return success for terminated but not yet reaped processes. I did not know that.

But this does not invalidate your change, does it, if all you want to do is to force one consistent view. At least I did not find any code relying on isAlive returning false for not-yet-reaped processes.

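A small probe along these lines could show what a given platform answers for a terminated but unreaped child. It is only a sketch: probe_unreaped_child and struct probe_result are hypothetical names, and it assumes waitid() with WNOWAIT is available so the child can be observed as terminated without being reaped.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record: raw kill(pid, 0) return values. */
struct probe_result {
    int before_reap;   /* child terminated, not yet waited for */
    int after_reap;    /* child has been waited for */
    int after_errno;   /* errno from the second kill() */
};

/* Returns 0 on success, -1 if fork/waitid/waitpid failed. */
static int probe_unreaped_child(struct probe_result *out) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(0);                      /* child: terminate immediately */
    }

    /* Block until the child has terminated, but leave it unreaped. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0) {
        return -1;
    }
    out->before_reap = kill(pid, 0);   /* implementation-dependent result */

    /* Reap the child; its pid is no longer a live process afterwards. */
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    errno = 0;
    out->after_reap = kill(pid, 0);
    out->after_errno = errno;
    return 0;
}
```

Per the text quoted above, before_reap may be 0 or -1 depending on the implementation; after the waitpid() the second kill() is expected to fail with ESRCH, barring immediate pid reuse.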

Thanks, Thomas


    On 7/18/2017 2:53 PM, Thomas Stüfe wrote:

    Hi Roger,

    I think this may fail if you have no permission to send a signal
    to that process. In that case, kill(2) may yield EPERM and
    isAlive may return false even though the process is alive.

    But then, I am not sure if that could happen in that particular
    scenario, plus it may also mean that you do not have access to
    /proc/pid either. So, I do not know how much of an issue this
    could be.

    Otherwise, the fix seems straightforward.

    Kind Regards, Thomas

    On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs wrote:

        Please review a fix for an intermittent failure in the
        ProcessHandle OnExitTest
        that fails frequently on Solaris.

        ProcessHandle.isAlive is using /proc/pid/psinfo to determine
        if a process is alive and its start time.
        However, it appears that between the process exiting and
        the reaping of its status, the
        psinfo file indicates the process is alive but kill(pid, 0)
        reports that it is not alive.
        Depending on a race, ProcessHandle.onExit may determine
        the process has exited
        but later isAlive may report it is alive.

        To have a consistent view of the process being alive,
        ProcessHandle.isAlive in its native implementation
        should use kill(pid, 0) to definitively determine
        whether the process is alive.

        The original issue[1] will be kept open until it is known
        that it is resolved.

        Webrev:
        http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/

        Issue:
        https://bugs.openjdk.java.net/browse/JDK-8184808

        Thanks, Roger

        [1] https://bugs.openjdk.java.net/browse/JDK-8177932
