On Fri, 12 May 2023 15:24:19 GMT, Volker Simonis <simo...@openjdk.org> wrote:

> Since JDK13, executing commands in a sub-process defaults to the so called 
> `POSIX_SPAWN` launching mechanism (i.e. 
> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by 
> using `posix_spawn(3)` to firstly start a tiny helper program called 
> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the 
> command data from the parent Java process over a Unix pipe and finally 
> executes (i.e. `execvp(3)`) the requested command.
> 
> In cases where the parent process terminates abnormally before `jspawnhelper` 
> has read all the expected data from the pipe, `jspawnhelper` will block 
> indefinitely on the reading end of the pipe. This is especially harmful if 
> the parent process had open sockets, because in that case, the forked 
> `jspawnhelper` process will inherit them and keep all the corresponding ports 
> open effectively preventing other processes to bind to them. Notice that this 
> is not an abstract scenario. We've observed this regularly in production with 
> services which couldn't be restarted after a crash after migrating to JDK 17.
> 
> The fix of the issue is rather trivial. `jspawnhelper` has to close its 
> writing end of the pipe which connects it with the parent Java process 
> *before* starting to read from that pipe such that reads from the pipe can 
> immediately return with EOF if the parent process terminates abnormally.
> 
> Also did some cleanup:
>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to 
> write the command data from the parent process to `jspawnhelper`
>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because 
> that's long gone.

>     1. I think the child should probably close (but without error handling) 
> the _read_ end of the fail pipe too _before_ sending out the liveness ping.
> 
> 
> https://github.com/openjdk/jdk/blob/020a60e6449749ca979c1ace3af7db57f4c53a5f/src/java.base/unix/native/libjava/childproc.c#L335
> 
> Because the same problem could arise if the parent were to terminate. Child 
> writes to the pipe, and theoretically could block on a full pipe since it 
> holds open an fd to the read end. I admit this is highly theoretical since 
> the pipe would have to be full for this to happen, and nobody wrote to the 
> pipe yet, and the aliveness ping is just an integer (4 bytes). But to be more 
> correct, and since the fix is so trivial, the fail pipe read end should be 
> closed in the child process before writing to fail pipe.

I thought about this, but as you said, I think it can not happen as we only 
write at most two times 4 bytes. Also, if we close the read end of the fail 
pipe in the child and the parent crashes (i.e. there's no read end at all), we 
will get a SIGPIPE which we would then have to handle in `jspawnhelper`. In the 
end, I don't think that's worth it for an event which should never happen (at 
least not in this PR).

> 
>     2. I think you don't actually have to hand in the in-pipe-read-end fd 
> number via command line arg, just to have the child to close it. You could 
> just, in the parent, set the fd to FD_CLOEXEC. Since posix_spawn() exec's the 
> spawn helper, this would close the file descriptor for you. Extending this 
> thought, you could do this with all pipe ends you want to cauterize in the 
> child process.

I've deliberately not used `FD_CLOEXEC`  because the file descriptor closing 
code in `childProcess()` is shared by all launch mechanisms. So to make it 
work, I'd had to reset the corresponding file descriptors to `-1` in the child 
anyway.

I therefor felt the current fix is smaller, simpler and easier to understand.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1548154907

Reply via email to