On Mon, 22 May 2023 10:22:16 GMT, Volker Simonis <[email protected]> wrote:
>> Since JDK13, executing commands in a sub-process defaults to the so called
>> `POSIX_SPAWN` launching mechanism (i.e.
>> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by
>> using `posix_spawn(3)` to firstly start a tiny helper program called
>> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the
>> command data from the parent Java process over a Unix pipe and finally
>> executes (i.e. `execvp(3)`) the requested command.
>>
>> In cases where the parent process terminates abnormally before
>> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper`
>> will block indefinitely on the reading end of the pipe. This is especially
>> harmful if the parent process had open sockets, because in that case, the
>> forked `jspawnhelper` process will inherit them and keep all the
>> corresponding ports open effectively preventing other processes to bind to
>> them. Notice that this is not an abstract scenario. We've observed this
>> regularly in production with services which couldn't be restarted after a
>> crash after migrating to JDK 17.
>>
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its
>> writing end of the pipe which connects it with the parent Java process
>> *before* starting to read from that pipe such that reads from the pipe can
>> immediately return with EOF if the parent process terminates abnormally.
>>
>> Also did some cleanup:
>> - in `jspawnhelper.c` correctly chek the result of `readFully()`
>> - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to
>> write the command data from the parent process to `jspawnhelper`
>> - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because
>> that's long gone.
>
> Volker Simonis has updated the pull request with a new target base due to a
> merge or a rebase. The pull request now contains five commits:
>
> - Merge branch 'master' into JDK-8307990
> - Address comments from tstuefe and RogerRiggs
> - REALLY adding the test :)
> - Added test case
> - 8307990: jspawnhelper must close its writing side of a pipe before reading
> from it
> Hey, chiming on this conversation, because I believe it will fix an issue we
> encountered since, upgrading from JDK 11 to 17. In our application we spawn a
> lot of processes over time and noticed that more and more JVM threads get
> stuck in the native `ProcessImpl.forkAndExec`.
>
> Thread Dump Excerpt
> After some investigation we saw that we have `jspawnhelper` processes
> corresponding to the number of the stuck threads. `/proc/*/task/{id}/syscall`
> shows that all of these processes are currently reading from the pipe.
>
> Our theory is that the write calls from the parent thread were interrupted
> resulting in an incomplete write.
>
> https://github.com/openjdk/jdk/blob/69f508a2ac344eb61cef7be985348873b8265171/src/java.base/unix/native/libjava/ProcessImpl_md.c#L556-L559
>
>
> This would cause the `readFully` on the `jspawnhelper` side to wait for the
> remaining data.
> As the number of bytes written is not checked on the writer side it would
> continue and wait for the alive ping from the `jspawnhelper` which results in
> a deadlock.
> I would be interested in a reproducer e.g. using `gdb`, but to be honest I
> wouldn't know where to start.
> But regardless of whether this fixes our issue I think incomplete writes
> should be handled according to specification.
> I think the `restartableWrites` in this PR do that?
> We also mitigated by falling back to `VFORK` as was also mentioned in [this
> comment](https://github.com/openjdk/jdk/pull/13956#issuecomment-1547777042).
>
> Maybe it would be worth it to mention our bug situation in the release notes
> as well? Thank you and please let me know if I got something wrong or there
> is anything I could provide to help!
@mlichtblau thanks a lot for your report. I saw some similar reports from our
internal Maven builds (have to check though if they are exactly the same). Your
analysis of the problem seems sound and I hope the restartable writes will
indeed fix the problem.
I'll definitely add your example to the release notes and I'll try if I can
come up with a test that reproduces your issue.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1558794718