Hi Thomas,
Thanks for the investigation and links.
The variations across OSes in how exited versus reaped (zombie)
processes are reported have been a problem for portable apps
for quite a while.
The description of waitpid focuses heavily on child processes; this
particular case deals with non-child processes, so I stayed with
using kill(pid, 0) to determine liveness.
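For illustration only, here is a minimal C sketch of that distinction. The helper names waitpid_rejects and is_alive_kill0 are made up for this mail and are not taken from the JDK sources; the sketch assumes a POSIX system.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 1 if waitpid() rejects the pid with ECHILD, i.e. the pid does
 * not name a child of the calling process that it could wait for. */
static int waitpid_rejects(pid_t pid) {
    int status;
    return waitpid(pid, &status, WNOHANG) == -1 && errno == ECHILD;
}

/* Liveness check that also works for non-children: the null signal
 * (signal 0) delivers nothing and only runs the existence and
 * permission checks. Any failure, including EPERM, is reported as
 * "not alive" here. */
static int is_alive_kill0(pid_t pid) {
    return kill(pid, 0) == 0;
}
```

Calling both helpers with the caller's own pid shows the difference: the process is not its own child, so waitpid() has nothing to wait for, while the null signal still reports it as alive.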
Thanks, Roger
On 7/19/2017 4:20 AM, Thomas Stüfe wrote:
Hi Roger,
On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs <[email protected]> wrote:
Hi Thomas,
Yes, if there is no access to the pid, then it can't tell whether the
process is alive and assumes it is not.
If there are access restrictions, they will also apply to the
waitid/waitpid in the waitForProcessExit0 logic, so the answer will
be at least consistent (and avoid a possible race between
/proc/pid/psinfo and the kill state).
Thanks, Roger
Okay, sounds reasonable. Interestingly, while reading up on the
semantics of kill(), I found:
http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html
"Existing implementations vary on the result of a kill() with pid
indicating an inactive process (a terminated process that has not been
waited for by its parent). Some indicate success on such a call
(subject to permission checking), while others give an error of
[ESRCH]. Since the definition of process lifetime in this volume of
IEEE Std 1003.1-2001 covers inactive processes, the [ESRCH] error as
described is inappropriate in this case. In particular, this means
that an application cannot have a parent process check for termination
of a particular child with kill(). (Usually this is done with the null
signal; this can be done reliably with waitpid().)"
So, kill() may return success for terminated but not yet reaped
processes. I did not know that.
But this does not invalidate your change, does it, if all you want to
do is to force one consistent view. At least I did not find any code
relying on isAlive returning false for not-yet-reaped processes.
Thanks, Thomas
On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
Hi Roger,
I think this may fail if you have no permission to send a signal
to that process. In that case, kill(2) may yield EPERM and
isAlive may return false even though the process is alive.
But then, I am not sure if that could happen in that particular
scenario, plus it may also mean that you do not have access to
/proc/pid either. So, I do not know how much of an issue this
could be.
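To make the concern concrete, here is a rough C sketch of a probe that keeps EPERM apart from ESRCH. The proc_state enum and the probe_pid name are hypothetical and not from the webrev; whether isAlive should treat the EPERM case as alive is a separate decision.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical result type, not taken from the JDK sources. */
typedef enum { PROC_GONE, PROC_ALIVE, PROC_NO_PERMISSION } proc_state;

/* Probe a pid with the null signal: nothing is delivered, only the
 * existence and permission checks are performed. */
static proc_state probe_pid(pid_t pid) {
    if (kill(pid, 0) == 0)
        return PROC_ALIVE;
    if (errno == EPERM)
        return PROC_NO_PERMISSION;  /* the pid exists, but we may not signal it */
    return PROC_GONE;               /* ESRCH: no such process */
}
```

A caller that maps PROC_NO_PERMISSION to "not alive" gets the behaviour described above, where a process that is alive but not signalable is reported as dead.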
Otherwise, the fix seems straightforward.
Kind Regards, Thomas
On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs <[email protected]> wrote:
Please review a fix for an intermittent failure in the
ProcessHandle OnExitTest
that fails frequently on Solaris.
ProcessHandle.isAlive is using /proc/pid/psinfo to determine
if a process is alive and its start time.
However, it appears that between the process exiting and
the reaping of its status, the
psinfo file indicates the process is alive but kill(pid, 0)
reports that it is not alive.
Depending on a race, ProcessHandle.onExit may determine
that the process has exited
but isAlive may later report that it is alive.
To have a consistent view of the process being alive,
ProcessHandle.isAlive in its native implementation
should use kill(pid, 0) to definitively determine whether
the process is alive.
The original issue[1] will be kept open until it is known
that it is resolved.
Webrev:
http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/
Issue:
https://bugs.openjdk.java.net/browse/JDK-8184808
Thanks, Roger
[1] https://bugs.openjdk.java.net/browse/JDK-8177932