On Fri, 19 May 2023 15:43:30 GMT, Roger Riggs <rri...@openjdk.org> wrote:

>>> Sorry, but I don't understand this argument. If we do a short read we will 
>>> work with corrupted ChildStuff and SpawnInfo 
>>> structures. This can in the extreme case execute arbitrary code (e.g. if 
>>> ChildStuff.argv is not fully read from the parent). You are 
>>> basically saying it is better to work on corrupted data rather than 
>>> reporting an error.
>> 
>> No I am simply pointing out that this has changed more than just the issue 
>> with close. And maybe a short-read does indicate data "corruption" and maybe 
>> it should be a fatal error. But I don't know exactly how this might manifest 
>> so perhaps there are benign short-reads that actually do happen. Regardless 
>> it might be better to split this part out and focus on the close issue here.
>
>> > Sorry, but I don't understand this argument. If we do a short read we will 
>> > work with corrupted ChildStuff and SpawnInfo
>> > structures. This can in the extreme case execute arbitrary code (e.g. if 
>> > ChildStuff.argv is not fully read from the parent). You are
>> > basically saying it is better to work on corrupted data rather than 
>> > reporting an error.
>> 
>> No I am simply pointing out that this has changed more than just the issue 
>> with close. And maybe a short-read does indicate data "corruption" and maybe 
>> it should be a fatal error. But I don't know exactly how this might manifest 
>> so perhaps there are benign short-reads that actually do happen. Regardless 
>> it might be better to split this part out and focus on the close issue here.
> 
> Given the purpose and implementation of the `readFully` function, I don't see 
> how it can return anything other than an error or the full requested read 
> length.

@RogerRiggs , @tstuefe sorry for bothering you one more time, but I think 
@mlichtblau brought up an interesting case yesterday which isn't fully resolved 
by the current fix. In @mlichtblau example, 
`Java_java_lang_ProcessImpl_forkAndExec()` got stuck waiting on the ping from 
`jspawnhelper` which itself blocks in `readFully()` because of a truncated 
`write()` from the parent.

The current fix already replaces `write()` with `restartableWrite()` which 
handles the case where a `write()` call was interrupted **before** writing a 
single byte. But there's a second case, namely when `write()` was interrupted 
after it already wrote some (but not all) of the requested bytes. The `write()` 
man page states:

> Note  that a successful `write()` may transfer fewer than `count` bytes. Such 
> partial writes can occur for various reasons; for example, because .., or 
> because a blocked `write()` to  a  socket, pipe, or similar was interrupted 
> by a signal handler after it had transferred some, but before it had 
> transferred all of the requested bytes.  In the event of a partial write, the 
> caller can make another `write()` call to transfer the remaining bytes.

So in order to safely handle the second case as well, we have to replace 
`restartableWrite()` with something like `writeFully()` and that's exactly what 
this new commit does. I've also added a new test case which reproduces the 
issue by simulating a truncated `write()`.

For your convenience I've tried to explain all the additional changes (except 
for trivial cleanups and renamings in the test) below:

Thanks in advance,
Volker

##### `jspawnhelper.c`:


+    // The file descriptor for reporting errors back to our parent we got on 
the command
+    // line should be the same like the one in the ChildStuff struct we've 
just read.
+    assert(c.fail[1] == fdout);


Just trying to be overly cautious.

##### childproc.{c,h}


-ssize_t restartableWrite(int fd, const void *buf, size_t count);
+ssize_t writeFully(int fd, const void *buf, size_t count);



-ssize_t
-restartableWrite(int fd, const void *buf, size_t count)
...
-}

+ssize_t
+writeFully(int fd, const void *buf, size_t nbyte)
+#ifdef DEBUG
...
+#endif
...
+}


Replace `restartableWrite()` with `writeFully()` which is basically a copy of 
`readFully()` (with additional testing code for jtreg). This handles both 
cases, when `write()` is interrupted before having written a single byte and 
returns EINTR, as well as the case where `write()` already wrote some (but not 
all) bytes before it was interrupted. 


-        restartableWrite(fail_pipe_fd, &code, sizeof(code));
+        if (writeFully(fail_pipe_fd, &code, sizeof(code)) != sizeof(code)) {
+            goto WhyCantJohnnyExec;
+        }


Treat an incomplete write as error and bail out.


-        restartableWrite(fail_pipe_fd, &errnum, sizeof(errnum));
+        writeFully(fail_pipe_fd, &errnum, sizeof(errnum));


We're already in the error handling code here (i.e. `WhyCantJohnnyExec`) so not 
much we can do. We'll close the pipe and exit with an error anyway.

##### `ProcessImpl_md.c`


-    restartableWrite(c->childenv[1], (char *)&magic, sizeof(magic)); // magic 
number first
+    if (writeFully(c->childenv[1], (char *)&magic, sizeof(magic)) != 
sizeof(magic)) { // magic number first
+        return -1;
+    }
...
-    restartableWrite(c->childenv[1], (char *)c, sizeof(*c));
-    restartableWrite(c->childenv[1], (char *)&sp, sizeof(sp));
-    restartableWrite(c->childenv[1], buf, bufsize);
+    if (writeFully(c->childenv[1], (char *)c, sizeof(*c)) != sizeof(*c) ||
+        writeFully(c->childenv[1], (char *)&sp, sizeof(sp)) != sizeof(sp) ||
+        writeFully(c->childenv[1], buf, bufsize) != bufsize) {
+        return -1;
+    }


Make sure we've written all the required data to jspawnhelper or return an 
error otherwise. The `-1` return value will be handled in 
`Java_java_lang_ProcessImpl_forkAndExec()` by throwing an 
`IOException("posix_spawn failed")`.


+    /* We're done. Let jspwanhelper know he can't expect any more data from 
us. */
+    close(c->childenv[1]);
+    c->childenv[1] = -1;


Close the write end of the pipe to jspawnhelper after we've written all data. 
This will prevent `jspawnhelper` to block on the reading side if somthing goes 
wrong (e.g. we've not written enough data).


+    // Reset errno to protect against bogus error messages
+    errno = 0;


In `Java_java_lang_ProcessImpl_forkAndExec()` we might call 
`throwIOException(env, errno, ..)` in error situations not caused by a failing 
system call (e.g. if `startChild()` returned `-1` because of a bad response 
from `jspawnhelper`). It is therefor better to reset `errno` at the beginning 
of the method to avoid bogus error messages in the generated `IOExceptions`s.


-            assert(errnum == CHILD_IS_ALIVE);
             if (errnum != CHILD_IS_ALIVE) {
-                /* Should never happen since the first thing the spawn
-                 * helper should do is to send an alive ping to the parent,
-                 * before doing any subsequent work. */
+                /* This can happen if the spawn helper encounters an error
+                 * before or during the handshake with the parent. */
                 throwIOException(env, 0, "Bad code from spawn helper "


This assertion isn't useful. We'd rather want to see the generated exception if 
`errnum != CHILD_IS_ALIVE` instead of crashing the process. Also, the comment 
is wrong, because `jspawnhelper` does quite some stuff (e.g. allocating, 
reading from parent) before sending the alive ping and if these actions fail, 
`jspawnhelper` will send an error code (e.g. `ERR_MALLOC`, `ERR_PIPE`) which is 
not equal to `CHILD_IS_ALIVE`.

##### JspawnhelperProtocol.java


+    private static void simulateTruncatedWriteInParent(int stage) throws 
Exception {
...
+    }


Added a new test to simualte a truncated write in the parent process to 
jspawnhelper communication.


-            Optional<String> cmd = ph.info().command();
-            if (cmd.isPresent() && cmd.get().endsWith("jspawnhelper")) {
-                throw new Exception("jspawnhelper still alive after parent 
Java process terminated");
+            try {
+                // Give jspawnhelper a chance to exit gracefully
+                ph.onExit().get(TIMEOUT, TimeUnit.SECONDS);
+            } catch (TimeoutException te) {
+                Optional<String> cmd = ph.info().command();
+                if (cmd.isPresent() && cmd.get().endsWith("jspawnhelper")) {
+                    throw new Exception("jspawnhelper still alive after parent 
Java process terminated");
+                }


Make the `simulateCrashInParent()` sub-test more robust by allowing 
`jspawnhelper` up to `TIMEOUT` seconds to terminate.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1561271712

Reply via email to