On Mon, 8 May 2023 13:46:55 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> 
wrote:

> This is a tricky intermittent failure on one of our async file channel test.
> 
> While the logic of the test is a bit hard to follow, I believe the test is 
> supposed to work as follows:
> 
> * in the main thread, we submit an initial async write on a buffer
> * the completion handler of the async op will submit another async write 
> after one completes
> * before/after each write we increment/decrement an atomic counter
> * in the main thread waits 20 secs, to make sure the write buffers are full 
> and that there are some async writes actually awaiting for OS to serve them
> * we then try to close the test arena - this would fail as there are pending 
> OS writes, and we keep the arena locked to prevent JVM crashes
> * then the main test will set the `continueWriting` flag to false (which will 
> cause the handler to stop submitting new write requests).
> * the main thread will then read what's left (so that the buffers will empty 
> and the OS can serve the outstanding writes)
> * when there's no outstanding writes left, the main thread will close the 
> test scope.
> 
> I think, after many hours spent staring the test that, when working as 
> intended, the test logic is correct. Each write is only submitted after the 
> previous one finished, and the test can only end when we see the number of 
> outstanding write to reach 0. For this to happen, we need to execute the 
> handler once when `continueWriting` is set to false (which will cause an 
> asymmetric decrement of the counter from the handler, which will match the 
> asymmetric increment outside the handler, in the main test thread).
> 
> When trying to reduce the timeout which ensures that write buffers are full, 
> I started hitting the same exception as the one described in the bug report. 
> After some digging, I found that the exception was **not** caused (as I 
> thought) by some bad synchronization logic which allowed the main test to 
> close the arena before the handlers were actually finished with it. Instead, 
> the failure is caused by the test assertion which checks that the test arena 
> cannot be closed:
> 
> 
> assertMessage(expectThrows(ISE, () -> drop.close()), "Session is acquired 
> by"); 
> 
> 
> This check is bogus: it assumes that the buffers are indeed full, and that 
> some OS write operation cannot progress. In that case, the underlying arena 
> will be kept alive (as the implementation wants to avoid a JVM crash 
> triggered by an OS write on an already freed region of memory).
> 
> But, if the buffer is not full at this stage, there is nothing keeping the 
> test arena alive: note that the completion handler executes **after** the 
> arena acquire/release in `IOUtils`. So, if all write operations complete 
> normally, `drop::close` will actually succeed!
> 
> At this point it's a coint toss as to whether we'll see a message because the 
> copletion handler tries to allocate on an already closed arena, or if we see 
> a message complaining about `drop::close` not failing as expected.
> 
> While there are other ways to fix this, I think a simple fix is to actually 
> remove the assertion on `drop::close`. Note that if there's a bug in the 
> keepalive logic of the arena, the test would still fail (but with a JVM 
> crash). And, by avoiding a spurious call to `drop::close` we make sure that 
> the test always runs as intended whether or not the OS buffers are full.

This pull request has now been integrated.

Changeset: f92d095e
Author:    Maurizio Cimadamore <mcimadam...@openjdk.org>
URL:       
https://git.openjdk.org/jdk/commit/f92d095e164bd79b452586e49e166d1ba392632f
Stats:     2 lines in 1 file changed: 0 ins; 2 del; 0 mod

8307411: Test java/foreign/channels/TestAsyncSocketChannels.java failed: 
IllegalStateException: Already closed

Reviewed-by: jvernee

-------------

PR: https://git.openjdk.org/jdk/pull/13866

Reply via email to