On Thu, 27 Jun 2024 08:54:16 GMT, Kevin Walls <kev...@openjdk.org> wrote:

> This test has had occasional failures for years, possibly forever.
> A previous update made the test "othervm" which removed some interruptions, 
> but a time accounting problem remains.
> 
> This change adds a simple sleep after observing that the test threads are all 
> sleeping.
> 
> The idea is that threads may not have actually started sleeping when we 
> observe their java.lang.Thread.State as WAITING, they may use some CPU time 
> after that in order to actually get to sleep.

The test waits for the state of test threads to be WAITING.  There are a few 
more lines of test code (which might take 20,000 or 50,0000 ns using simple 
System.nanoTime calls), then we record threadmxbean.getCpuTime() for threads, 
do a sleep, and check these times again.

Usually they are the same, no more cpu time was used (correct!).  But the times 
can be some thousands of nanos different and the test fails.

My theory is that sometimes, a test thread has not really started sleeping yet 
when we read its state as WAITING and record the first time.  The test should 
sleep for a few hundred millis before taking the first time readings.

Commonly, in the failures the time discrepancy is under 100,000 nanos, with an 
example I see of nearly 200,000 nanos and the worst I saw was 15625000 nanos, 
approx 15ms.


When test threads are waiting, they look like this:

"MyThread-6" #38 [16479] prio=5 os_prio=0 cpu=56.05ms elapsed=34.89s 
tid=0x000014f83806c590 nid=16479 in Object.wait()  [0x000014f8dd343000]
   java.lang.Thread.State: WAITING (on object monitor)
Thread: 0x000014f83806c590  [0x405f] State: _at_safepoint _at_poll_safepoint 0
   JavaThread state: _thread_blocked
        at java.lang.Object.wait0(java.base@24-internal/Native Method)
        - waiting on <0x00000000ffa66fc0> (a java.lang.Object)
        at java.lang.Object.wait(java.base@24-internal/Object.java:374)
        at java.lang.Object.wait(java.base@24-internal/Object.java:348)
        at ThreadCpuTimeArray$MyThread.run(ThreadCpuTimeArray.java:263)
        - locked <0x00000000ffa66fc0> (a java.lang.Object)


This is a harmless test change either way.  If the theory is wrong, we will see 
the "out of expected range" failure again.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19924#issuecomment-2194213706

Reply via email to