TypeConverterFactoryMemoryLeakTest.java failed with "AssertionError: Should have GCd a method handle by now" [v2]

Attila Szegedi Sat, 27 Feb 2021 12:32:14 -0800

On Fri, 26 Feb 2021 08:03:26 GMT, Peter Levart <plev...@openjdk.org> wrote:


>> The good test would be trying with all current GCs (Serial, Parallel, G1, 
>> Shenandoah, ZGC):
>> 
>> make run-test TEST=jdk/dynalink TEST_VM_OPTS=-XX:+UseSerialGC
>> 
>> Also, make sure that GH actions are able to run this test. You probably need 
>> to enable the workflow here -- https://github.com/szegedi/jdk/actions -- and 
>> then trigger the workflow manually on your branch (or push another commit). 
>> Then "Checks" tab would show the results.
>
>> > I would like 1st to understand why the MH created in the 
>> > TestLinker.convertToType is actually GCed at all.
>> 
>> The whole original issue was about allowing these objects to be GCd smile.
>> Short story: because all data is scoped to objects created within `makeOne` 
>> method.
> 
> Right, below description helped me understand the "chain of domination - i.e. 
> hard references that keep the BiClassValueRoot (extends ClassValue) 
> reachable". That doesn't explain fully why converters are not GC-ed as soon 
> as the associated BiClassValueRoot is eligible for GC. The secret lies in the 
> implementation of ClassValue...
> 
>> 
>> Longer story: It is GC'd because its reachability is dominated by the 
>> `DynamicLinker` object created on [line 
>> 96](https://github.com/openjdk/jdk/blob/3afec32bdf49703bb3537b36248f0550caae6d87/test/jdk/jdk/dynalink/TypeConverterFactoryMemoryLeakTest.java#L96).
>>  It and everything it dominates (`LinkerServicesImpl` object holding the 
>> `TypeConverterFactory` object holding the `BiClassValue` objects holding the 
>> method handles) become garbage as soon as the method exits.
> 
> That is not entirely true. What becomes garbage is the objects leading to the 
> BiClassValueRoot (extends ClassValue) instance including the BiClassValueRoot 
> instance. But it is not the ClassValue<T> instance that keeps the associated 
> T value(s) reachable. The associated values are strongly reachable from the 
> associated Class<?> instances. The Class class contains an instance field of 
> type ClassValueMap which is responsible for keeping the associated values. 
> This answers my question why the associated converter is GCed only after 12 
> new converters are created for the same source/target type pair. It is not 
> because creating new converters would cause memory pressure and consequently 
> trigger GC (I tried adding explicit System.gc() calls and they did not 
> expedite the GCing of the converter). It is because ClassValue<T> logic 
> expunges associated T values from Class.ClassValueMap only at times when it 
> has to re-arrange the map's internal structures and that eventually happens 
> on new inser
 tions for the same Class instance and different ClassValue instance.
> 
> That doesn't mean type converter caching is flawed in any way. It just means 
> that some converters will be retained even though they are practically not 
> "reachable" any more through the mechanism of caching. That doesn't mean that 
> any ClassLoader leaks are possible either.
> 
>> This works because of the pains I took in `BiClassValue` to ensure we don't 
>> leak any of the infrastructural objects through implicit `this$0` 
>> outer-class references into the `ClassValue`s. That allows the `ClassValue` 
>> objects to be GCd and removed from the classes' class-value mapping. This 
>> ordinarily doesn't happen as most `ClassValue` objects are bound to static 
>> fields, but in `TypeConverterFactory` the `BiClassValue` instances are 
>> specific to the `TypeConverterFactory` instance, so they are expected to be 
>> reclaimable by GC if the converter factory itself goes away.
>> 
>> > And after that, why it is not GCed before 12 invocations to makeOne() are 
>> > made.
>> 
>> Because GC is at liberty to delay recognizing an object is phantom 
>> reachable? I don't think we need to read too much into this? Correct me if I 
>> don't see something.
> 
> As described above, GC has nothing to do with this "delay". It is caused by 
> the way ClassValue keeps associated values reachable from the associated 
> Class instances (that appear semantically as keys, but in fact it's the other 
> way arround - ClassValue instances are weakly reachable keys and Class 
> instances hold the maps).
> 
>> 
>> > What would be more interesting to test is to create a converter between a 
>> > type loaded by a custom (child) ClassLoader and a system type. After that, 
>> > release all hard references to the custom ClassLoader and see if the 
>> > converter gets GC-ed as a result. WDYT?
>> 
>> That does sound like 
>> [TypeConverterFactoryRetentionTests.testSystemLoaderToOtherLoader](https://github.com/openjdk/jdk/blob/3afec32bdf49703bb3537b36248f0550caae6d87/test/jdk/jdk/dynalink/TypeConverterFactoryRetentionTests.java#L118)
>>  True, it tests whether the _class loader_ itself gets released, not the 
>> converter, but the loader would hardly get released while there's a 
>> converter with the class from that loader bound in the parameter signature 
>> of its method handle, wouldn't it?
> 
> Yes, that test is exactly what I was thinking about.
> 
>> 
>> Regardless, if you think there's a valid use case for this additional test, 
>> we can discuss it. I'd rather scope this issue to fixing the flakiness of 
>> the tests, so they don't cause trouble with builds and can be 
>> backported/added to ongoing backports.
>> 
>> Thanks for taking time to look into this!
> 
> I think the tests cover what needs to be covered. I would just iterate the 
> same test over all GC implementations (as Aleksey already suggested) and 
> instead of minimizing the heap size (which could cause problems in some GC 
> implementations as they evolve), insert explicit System.gc() calls in the 
> loops to trigger GC cycles that also process Reference(s). I tried that with 
> all current CG implementations (SerialGC, G1GC, ParallelGC, ZGC and 
> ShenandoahGC) and they all respect System.gc() calls. Even though I increased 
> heap size to 128M, the 1st associated converter always got GCed after 12 new 
> converters were created for the same source/target class pair.

> … It is because ClassValue logic expunges associated T values from 
> Class.ClassValueMap only at times when it has to re-arrange the map's 
> internal structures and that eventually happens on new insertions for the 
> same Class instance and different ClassValue instance.

When I diagnosed the original issue by taking the heap dump and tracking the 
undesirable reachability from GC roots, I fixed it, then wrote the tests, and 
could see that before my changes the tests failed, and after the changes they 
passed, so stuff is no longer retained by class values – good enough for me, 
done! You on the other hand, need to understand _why 12 iterations_. I 
sincerely admire how you're methodically going after the details and need to 
understand the whole picture.

> I think the tests cover what needs to be covered. I would just iterate the 
> same test over all GC implementations (as Aleksey already suggested) and 
> instead of minimizing the heap size (which could cause problems in some GC 
> implementations as they evolve), insert explicit System.gc() calls in the 
> loops to trigger GC cycles that also process Reference(s). I tried that with 
> all current CG implementations (SerialGC, G1GC, ParallelGC, ZGC and 
> ShenandoahGC) and they all respect System.gc() calls. Even though I increased 
> heap size to 128M, the 1st associated converter always got GCed after 12 new 
> converters were created for the same source/target class pair.

Done: added `@run` directives for all GCs, and in fact removed the `-Xmx` as it 
is not relevant; memory leak test succeeds after 12 iterations (11 with 
Shenandoah and ZGC) and the retention tests succeed after 2 iterations (or 1 
with with Shenandoah and ZGC).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2617

Re: RFR: 8261483: jdk/dynalink/TypeConverterFactoryMemoryLeakTest.java failed with "AssertionError: Should have GCd a method handle by now" [v2]

Reply via email to