On Mon, 22 May 2023 10:22:16 GMT, Volker Simonis <simo...@openjdk.org> wrote:
>> Since JDK13, executing commands in a sub-process defaults to the so called >> `POSIX_SPAWN` launching mechanism (i.e. >> `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by >> using `posix_spawn(3)` to firstly start a tiny helper program called >> `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the >> command data from the parent Java process over a Unix pipe and finally >> executes (i.e. `execvp(3)`) the requested command. >> >> In cases where the parent process terminates abnormally before >> `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` >> will block indefinitely on the reading end of the pipe. This is especially >> harmful if the parent process had open sockets, because in that case, the >> forked `jspawnhelper` process will inherit them and keep all the >> corresponding ports open effectively preventing other processes to bind to >> them. Notice that this is not an abstract scenario. We've observed this >> regularly in production with services which couldn't be restarted after a >> crash after migrating to JDK 17. >> >> The fix of the issue is rather trivial. `jspawnhelper` has to close its >> writing end of the pipe which connects it with the parent Java process >> *before* starting to read from that pipe such that reads from the pipe can >> immediately return with EOF if the parent process terminates abnormally. >> >> Also did some cleanup: >> - in `jspawnhelper.c` correctly chek the result of `readFully()` >> - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to >> write the command data from the parent process to `jspawnhelper` >> - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because >> that's long gone. > > Volker Simonis has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8307990 > - Address comments from tstuefe and RogerRiggs > - REALLY adding the test :) > - Added test case > - 8307990: jspawnhelper must close its writing side of a pipe before reading > from it > Hey, chiming on this conversation, because I believe it will fix an issue we > encountered since, upgrading from JDK 11 to 17. In our application we spawn a > lot of processes over time and noticed that more and more JVM threads get > stuck in the native `ProcessImpl.forkAndExec`. > > Thread Dump Excerpt > After some investigation we saw that we have `jspawnhelper` processes > corresponding to the number of the stuck threads. `/proc/*/task/{id}/syscall` > shows that all of these processes are currently reading from the pipe. > > Our theory is that the write calls from the parent thread were interrupted > resulting in an incomplete write. > > https://github.com/openjdk/jdk/blob/69f508a2ac344eb61cef7be985348873b8265171/src/java.base/unix/native/libjava/ProcessImpl_md.c#L556-L559 > > > This would cause the `readFully` on the `jspawnhelper` side to wait for the > remaining data. > As the number of bytes written is not checked on the writer side it would > continue and wait for the alive ping from the `jspawnhelper` which results in > a deadlock. > I would be interested in a reproducer e.g. using `gdb`, but to be honest I > wouldn't know where to start. > But regardless of whether this fixes our issue I think incomplete writes > should be handled according to specification. > I think the `restartableWrites` in this PR do that? > We also mitigated by falling back to `VFORK` as was also mentioned in [this > comment](https://github.com/openjdk/jdk/pull/13956#issuecomment-1547777042). > > Maybe it would be worth it to mention our bug situation in the release notes > as well? Thank you and please let me know if I got something wrong or there > is anything I could provide to help! @mlichtblau thanks a lot for your report. I saw some similar reports from our internal Maven builds (have to check though if they are exactly the same). Your analysis of the problem seems sound and I hope the restartable writes will indeed fix the problem. I'll definitely add your example to the release notes and I'll try if I can come up with a test that reproduces your issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1558794718