On Wed, 22 Nov 2023 15:00:29 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:

> In the rewrites made for:
> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump 
> asserts in interleaved ObjectMonitor::deflate_monitor calls`
> 
> I removed the filtering of *owned ObjectMonitors with dead objects*. The 
> reasoning was that you should never have an owned ObjectMonitor with a dead 
> object. I added an assert to check this assumption. It turns out that the 
> assumption was wrong *if* you use JNI to call MonitorEnter and then remove 
> all references to the locked object.
> 
> The provided tests provoke this assert form:
> * the JNI thread detach code
> * thread dumping with locked monitors, and
> * the JVMTI GetOwnedMonitorInfo API.
> 
> While investigating this we've found that the thread detach code becomes more 
> correct when this filter was removed. Previously, the locked monitors never 
> got unlocked because the ObjectMonitor iterator never exposed these monitors 
> to the JNI detach code that unlocks the thread's monitors. That bug caused an 
> ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors 
> unfiltered so that we don't reintroduce the leak.
> 
> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so 
> I'm filtering those in the closure that collects ObjectMonitor. Side note: We 
> have discussions about ways to completely rewrite this by letting each thread 
> have thread-local information about JNI held locks. If we have this we could 
> probably throw away the entire ObjectMonitorDump hashtable, and its walk of 
> the `_in_use_list.`.
> 
> For GetOwnedMonitorInfo it is unclear if we should expose these weird 
> ObjectMonitor. If we do, then the users can detect that a thread holds a lock 
> with a dead object, and the code will return NULL as one of the "owned 
> monitors" returned. I don't think that's a good idea, so I'm filtering out 
> these ObjectMonitor for those calls.
> 
> Test: the written tests with and without the fix. Tier1-Tier3, so far.

Functional fix is simple and fine but quite a lot of commentary on the tests.

Thanks

src/hotspot/share/runtime/vmOperations.cpp line 354:

> 352:       // alive. Filter out monitors with dead objects.
> 353:       return;
> 354:     }

I don't think we need to do this, but even without this filtering I ran a 
number of tests and was unable to demonstrate any problem. The JNI locked 
monitor seems to be "invisible" to the frame that locked it and so the thread 
dump never encounters it. Were you able to provoke a failure here or is this 
defensive programming?

test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 
28:

> 26:  * @test IterateMonitorWithDeadObjectTest
> 27:  * @summary This locks a monitor, GCs the object, and iterate and perform
> 28:  *          various iteration and operations over this monitor.

This doesn't read right with "iterate" and "iteration". Not sure exactly what 
you were trying to say.

test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 
29:

> 27:  * @summary This locks a monitor, GCs the object, and iterate and perform
> 28:  *          various iteration and operations over this monitor.
> 29:  * @requires os.family == "linux"

I know the test this was copied from had this but I'm not sure it is actually a 
necessary restriction - any Posix platform should work. Though maybe perror is 
linux only ...

test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 
31:

> 29:  * @requires os.family == "linux"
> 30:  * @library /testlibrary /test/lib
> 31:  * @build IterateMonitorWithDeadObjectTest

You don't need an explicit `@build` step

test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 
40:

> 38: public class IterateMonitorWithDeadObjectTest {
> 39:     public static native void runTestAndDetachThread();
> 40:     public static native void joinTestThread();

I don't think this form of the test needs to separate out the `pthread_join()`, 
it can just be done in `runTestAndDetachThread` AFAICS. I originally split it 
out to allow the Java code to do the GC while the native thread was sleeping 
prior to detaching.

test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 
57:

> 55:         // - Drop the last reference to the object
> 56:         // - GC to clear the weak reference to the object in the monitor
> 57:         // - Detach the thread - provoke previous bug

It also does a thread dump while the lock is held

test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 
66:

> 64:         // dead object. The thread dumping code didn't tolerate such a 
> monitor,
> 65:         // so run a thread dump and make sure that it doesn't 
> crash/assert.
> 66:         dumpThreadsWithLockedMonitors();

But you've already detached the thread so there is no locked monitor any 
longer. And `runTestAndDetachThread()` also did a thread dump.

test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 
43:

> 41: static jobject create_object(JNIEnv* env) {
> 42:   jclass clazz = (*env)->FindClass(env, "java/lang/Object");
> 43:   if (clazz == 0) die("No class");

The `die` method is for errors with system calls. It won't show useful 
information for JNI calls that leave exceptions pending.

test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 
76:

> 74:   if (dumpAllThreadsMethod == 0) die("No dumpAllThreads method");
> 75: 
> 76:   // The 'lockedMonitors == true' is what triggers the collection of the 
> monitor with the dead object.

"triggers the collection" sounds like a GC interaction but that is not what you 
mean. Suggestion:

// The 'lockedMonitors == true' is what causes the monitor with a dead object 
to be examined.

test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 
94:

> 92: 
> 93:   // Let the GC clear the weak reference to the object.
> 94:   system_gc(env);

AFAIK there is no guarantee that one call to `System.gc()` will suffice to 
clear the weakRef. We tend use a loop with a few iterations in other tests, or 
use a WhiteBox method to achieve it. In my testing I used the finalizer to 
observe that the objects had been finalized but even then, and with a loop, I 
did not always see them collected with G1.

test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 
104:

> 102:   // source of at least two bugs:
> 103:   // - When the object reference in the monitor was made weak, the code
> 104:   //   didn't unlock the monitor, leaving it lingering in the system.

Suggestion:

// - When the object reference in the monitor was cleared, the monitor
//   iterator code would skip it, preventing it from being unlocked when
//   the owner thread detached, leaving it lingering in the system.

the original made it sound to me like the code that cleared the reference (i.e. 
the GC) was expected to do the unlocking.

test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 
107:

> 105:   // - When the monitor iterator API was rewritten the code was changed 
> to
> 106:   //   assert that we didn't have "owned" monitors with dead objects. 
> This
> 107:   //   test provokes that situation and those asserts.

nit: s/those asserts/that assert/

test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java
 line 54:

> 52: 
> 53:     private static void jniMonitorEnterAndLetObjectDie() {
> 54:         // The monitor iterator used GetOwnedMonitorInfo used to

s/iterator used/iterator used by/

test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java
 line 59:

> 57:         // GetOwnedMonitorInfo testing.
> 58:         Object obj = new Object() { public String toString() {return "";} 
> };
> 59:         jniMonitorEnter(obj);

I would add a check for `Thread.holdsLock(obj);` after this just to be sure it 
worked.

test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java
 line 61:

> 59:         jniMonitorEnter(obj);
> 60:         obj = null;
> 61:         System.gc();

Again one gc() is generally not sufficient.

How can this test tell that the object in the monitor was actually cleared? I 
think `monitorinflation` logging may be the only way to tell.

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1745590548
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843318
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843678
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843923
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402844118
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402846787
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402846942
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402845852
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402857972
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402859246
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402847898
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402848741
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402848946
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402851705
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402857110
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402852983

Reply via email to