I don't know much about sling models caching, but it seems storing the cache into the resource resolver and therefore binding it to that lifecycle sounds very reasonable to me. And due to the mentioned support for Closeable, the cache gets discarded automatically with the resolver being closed. No extra handling required, no soft references etc. needed.

Obviously, that sounds problematic with long or forever running resource resolvers as the cache would never be released. However, I guess thats not worse than what we have today. And long running resource resolvers are an anti-pattern anyway.

Regards
Carsten

On 23.01.2024 17:23, Paul Bjorkstrand wrote:
Hi Jörg,

My guess is that you are running up against the problem where the Model is
referencing its adaptable, directly or indirectly. In that situation, the
model would not be collectable because it is referenced more strongly than
by weak reference. The reference path of these might look like this:

Model Cache Holder (the Model Adapter Factory
(strong reference)
Model Cache
(soft reference)
Model
(strong reference)
Resource Resolver
(strong reference, maybe indirectly)
Resource [the adaptable]


The resource is strongly or softly referenced, possibly indirectly, making
it ineligible for collection on normal GC cycles. The resource & resolver,
will not be collected until after the model is collected. Since the model
is not collected until there is memory pressure, that would explain why you
are seeing this (expensive) GC behavior.

When memory pressure occurs, first the models (soft references) are
collected prior to the OOM, which releases both resolver (no longer
referenced via field in the model) and resource.

The quickest fix is to release the resource resolver at the end of the
model’s constructor or @PostConstruct [2]. Alternatively, you can eliminate
the cache=true, provided it does not negatively impact your application
performance.
Another option, though more involved, is that the entire caching impl could
be changed to better handle these kinds of reference loops, by putting the
cache in the adaptable itself (Resolver, Resource, Request, etc). Possible
good candidates of these are [4] or request attributes. [4] seems to be the
overall best candidate, especially since you can hook into it using
Closeable ([5]) improving the cache eviction even more.

As long as the object holding the cache is not referenced strongly outside
that reference loop (cache holder > cache > model > ... > cache holder),
then the loop's objects would be eligible for GC as soon as the cache
holder is eligible.

[1]:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/SoftReference.html
[2]:
https://sling.apache.org/documentation/bundles/models.html#a-note-about-cache-true-and-using-the-self-injector
[3]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/18
[4]:
https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L888
[5]:
https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L610

// Paul


On Tue, Jan 23, 2024 at 6:48 AM Jörg Hoh <jhoh...@googlemail.com.invalid>
wrote:

Hi Sling community,

I want to share a recent experience I had with Sling Models, Sling Model
caching and Garbage Collection problems.

I had a case, where an AEM instance had massive garbage collection
problems, but no memory problems. We saw the regular sawtooth pattern in
the heap consumption, but heavy GC activity (in a stop-the-world manner)
almost constantly. But no OutOfMemory situation, there it's not a memory
leak.

I manually captured a heapdump and found a lot of Sling models being
referenced by the Sling ModelAdapterFactory cache, and rechecking these
model classes in detail I found them to specify "cache=true" in their
@Model annotation.When these statements were removed, the situation looks
completely different, and the garbage collection was normal again.

I don't have a full explanation for this behavior yet. The Sling Models had
a reference to a ResourceResolver (which was properly closed), but I assume
that this reference somehow "disabled" the cleaning of the cache on major
GCs (as its a WeakHashMap), but tied the collection of these models to the
collection of the ResourceResolver objects, which have finalizers
registered. And finalizers are only executed under memory pressure. Having
this connetion might have led to the situation that the SlingModel objects
were not disposed eagerly, but only alongside the finalizers in situation
of high memory pressure in a "stop-the-world" situation.

I try to get some more examples for that behavior; but I am not sure of the
caching of Sling Models as-is is something we should continue to use. This
case was quite hard to crack, and I don't know if/how we can avoid such a
situation by design.

Jörg

--
https://cqdump.joerghoh.de



--
Carsten Ziegeler
Adobe
cziege...@apache.org

Reply via email to