On Fri, 19 May 2023 15:43:30 GMT, Roger Riggs <rri...@openjdk.org> wrote:
>>> Sorry, but I don't understand this argument. If we do a short read we will >>> work with corrupted ChildStuff and SpawnInfo >>> structures. This can in the extreme case execute arbitrary code (e.g. if >>> ChildStuff.argv is not fully read from the parent). You are >>> basically saying it is better to work on corrupted data rather than >>> reporting an error. >> >> No I am simply pointing out that this has changed more than just the issue >> with close. And maybe a short-read does indicate data "corruption" and maybe >> it should be a fatal error. But I don't know exactly how this might manifest >> so perhaps there are benign short-reads that actually do happen. Regardless >> it might be better to split this part out and focus on the close issue here. > >> > Sorry, but I don't understand this argument. If we do a short read we will >> > work with corrupted ChildStuff and SpawnInfo >> > structures. This can in the extreme case execute arbitrary code (e.g. if >> > ChildStuff.argv is not fully read from the parent). You are >> > basically saying it is better to work on corrupted data rather than >> > reporting an error. >> >> No I am simply pointing out that this has changed more than just the issue >> with close. And maybe a short-read does indicate data "corruption" and maybe >> it should be a fatal error. But I don't know exactly how this might manifest >> so perhaps there are benign short-reads that actually do happen. Regardless >> it might be better to split this part out and focus on the close issue here. > > Given the purpose and implementation of the `readFully` function, I don't see > how it can return anything other than an error or the full requested read > length. @RogerRiggs , @tstuefe sorry for bothering you one more time, but I think @mlichtblau brought up an interesting case yesterday which isn't fully resolved by the current fix. In @mlichtblau example, `Java_java_lang_ProcessImpl_forkAndExec()` got stuck waiting on the ping from `jspawnhelper` which itself blocks in `readFully()` because of a truncated `write()` from the parent. The current fix already replaces `write()` with `restartableWrite()` which handles the case where a `write()` call was interrupted **before** writing a single byte. But there's a second case, namely when `write()` was interrupted after it already wrote some (but not all) of the requested bytes. The `write()` man page states: > Note that a successful `write()` may transfer fewer than `count` bytes. Such > partial writes can occur for various reasons; for example, because .., or > because a blocked `write()` to a socket, pipe, or similar was interrupted > by a signal handler after it had transferred some, but before it had > transferred all of the requested bytes. In the event of a partial write, the > caller can make another `write()` call to transfer the remaining bytes. So in order to safely handle the second case as well, we have to replace `restartableWrite()` with something like `writeFully()` and that's exactly what this new commit does. I've also added a new test case which reproduces the issue by simulating a truncated `write()`. For your convenience I've tried to explain all the additional changes (except for trivial cleanups and renamings in the test) below: Thanks in advance, Volker ##### `jspawnhelper.c`: + // The file descriptor for reporting errors back to our parent we got on the command + // line should be the same like the one in the ChildStuff struct we've just read. + assert(c.fail[1] == fdout); Just trying to be overly cautious. ##### childproc.{c,h} -ssize_t restartableWrite(int fd, const void *buf, size_t count); +ssize_t writeFully(int fd, const void *buf, size_t count); -ssize_t -restartableWrite(int fd, const void *buf, size_t count) ... -} +ssize_t +writeFully(int fd, const void *buf, size_t nbyte) +#ifdef DEBUG ... +#endif ... +} Replace `restartableWrite()` with `writeFully()` which is basically a copy of `readFully()` (with additional testing code for jtreg). This handles both cases, when `write()` is interrupted before having written a single byte and returns EINTR, as well as the case where `write()` already wrote some (but not all) bytes before it was interrupted. - restartableWrite(fail_pipe_fd, &code, sizeof(code)); + if (writeFully(fail_pipe_fd, &code, sizeof(code)) != sizeof(code)) { + goto WhyCantJohnnyExec; + } Treat an incomplete write as error and bail out. - restartableWrite(fail_pipe_fd, &errnum, sizeof(errnum)); + writeFully(fail_pipe_fd, &errnum, sizeof(errnum)); We're already in the error handling code here (i.e. `WhyCantJohnnyExec`) so not much we can do. We'll close the pipe and exit with an error anyway. ##### `ProcessImpl_md.c` - restartableWrite(c->childenv[1], (char *)&magic, sizeof(magic)); // magic number first + if (writeFully(c->childenv[1], (char *)&magic, sizeof(magic)) != sizeof(magic)) { // magic number first + return -1; + } ... - restartableWrite(c->childenv[1], (char *)c, sizeof(*c)); - restartableWrite(c->childenv[1], (char *)&sp, sizeof(sp)); - restartableWrite(c->childenv[1], buf, bufsize); + if (writeFully(c->childenv[1], (char *)c, sizeof(*c)) != sizeof(*c) || + writeFully(c->childenv[1], (char *)&sp, sizeof(sp)) != sizeof(sp) || + writeFully(c->childenv[1], buf, bufsize) != bufsize) { + return -1; + } Make sure we've written all the required data to jspawnhelper or return an error otherwise. The `-1` return value will be handled in `Java_java_lang_ProcessImpl_forkAndExec()` by throwing an `IOException("posix_spawn failed")`. + /* We're done. Let jspwanhelper know he can't expect any more data from us. */ + close(c->childenv[1]); + c->childenv[1] = -1; Close the write end of the pipe to jspawnhelper after we've written all data. This will prevent `jspawnhelper` to block on the reading side if somthing goes wrong (e.g. we've not written enough data). + // Reset errno to protect against bogus error messages + errno = 0; In `Java_java_lang_ProcessImpl_forkAndExec()` we might call `throwIOException(env, errno, ..)` in error situations not caused by a failing system call (e.g. if `startChild()` returned `-1` because of a bad response from `jspawnhelper`). It is therefor better to reset `errno` at the beginning of the method to avoid bogus error messages in the generated `IOExceptions`s. - assert(errnum == CHILD_IS_ALIVE); if (errnum != CHILD_IS_ALIVE) { - /* Should never happen since the first thing the spawn - * helper should do is to send an alive ping to the parent, - * before doing any subsequent work. */ + /* This can happen if the spawn helper encounters an error + * before or during the handshake with the parent. */ throwIOException(env, 0, "Bad code from spawn helper " This assertion isn't useful. We'd rather want to see the generated exception if `errnum != CHILD_IS_ALIVE` instead of crashing the process. Also, the comment is wrong, because `jspawnhelper` does quite some stuff (e.g. allocating, reading from parent) before sending the alive ping and if these actions fail, `jspawnhelper` will send an error code (e.g. `ERR_MALLOC`, `ERR_PIPE`) which is not equal to `CHILD_IS_ALIVE`. ##### JspawnhelperProtocol.java + private static void simulateTruncatedWriteInParent(int stage) throws Exception { ... + } Added a new test to simualte a truncated write in the parent process to jspawnhelper communication. - Optional<String> cmd = ph.info().command(); - if (cmd.isPresent() && cmd.get().endsWith("jspawnhelper")) { - throw new Exception("jspawnhelper still alive after parent Java process terminated"); + try { + // Give jspawnhelper a chance to exit gracefully + ph.onExit().get(TIMEOUT, TimeUnit.SECONDS); + } catch (TimeoutException te) { + Optional<String> cmd = ph.info().command(); + if (cmd.isPresent() && cmd.get().endsWith("jspawnhelper")) { + throw new Exception("jspawnhelper still alive after parent Java process terminated"); + } Make the `simulateCrashInParent()` sub-test more robust by allowing `jspawnhelper` up to `TIMEOUT` seconds to terminate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1561271712