@Joerg: Paul's mail never hit the mailing list. Can you forward it?
> On 30. Jan 2024, at 10:28, Jörg Hoh <[email protected]> wrote:
>
> Paul,
>
> thanks for your mail. I agree with your observation of what's going on. I
> also think that a cache at the ResourceResolver level makes more sense, and
> it's easier to control and remove cached models which cannot be used any
> more.
>
> @Carsten: Regarding the size, we can still use a WeakHashMap as a container
> for these Sling Models (and store the WeakHashMap in the PropertyMap of the
> ResourceResolver), and that should prevent the worst, even in the case of a
> long-running ResourceResolver or many adaptations.
>
> Jörg
>
>
> Am Di., 30. Jan. 2024 um 07:16 Uhr schrieb Carsten Ziegeler <
> [email protected]>:
>
>> I don't know much about Sling Models caching, but storing the cache in the
>> resource resolver and thereby binding it to that lifecycle sounds very
>> reasonable to me. And due to the mentioned support for Closeable, the cache
>> gets discarded automatically when the resolver is closed. No extra handling
>> required, no soft references etc. needed.
>>
>> Obviously, that sounds problematic with long- or forever-running resource
>> resolvers, as the cache would never be released. However, I guess that's
>> not worse than what we have today. And long-running resource resolvers are
>> an anti-pattern anyway.
>>
>> Regards
>> Carsten
>>
>> On 23.01.2024 17:23, Paul Bjorkstrand wrote:
>>> Hi Jörg,
>>>
>>> My guess is that you are running up against the problem where the Model
>>> is referencing its adaptable, directly or indirectly. In that situation,
>>> the model would not be collectable, because it is referenced more strongly
>>> than by a weak reference. The reference path might look like this:
>>>
>>> Model Cache Holder (the ModelAdapterFactory)
>>>   (strong reference)
>>> Model Cache
>>>   (soft reference)
>>> Model
>>>   (strong reference)
>>> Resource Resolver
>>>   (strong reference, maybe indirectly)
>>> Resource [the adaptable]
>>>
>>>
>>> The resource is strongly or softly referenced, possibly indirectly,
>>> making it ineligible for collection on normal GC cycles. The resource and
>>> resolver will not be collected until after the model is collected. Since
>>> the model is not collected until there is memory pressure, that would
>>> explain why you are seeing this (expensive) GC behavior.
>>>
>>> When memory pressure occurs, the models (soft references) are collected
>>> first, prior to the OOM, which releases both the resolver (no longer
>>> referenced via a field in the model) and the resource.
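[Editor's note: the pinning Paul describes can be reproduced with plain JDK reference types; the `Model` and `Resolver` classes below are illustrative stand-ins, not Sling classes. A softly reachable model keeps its strongly referenced resolver alive through normal GC cycles.]

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class SoftPinDemo {
    // Illustrative stand-ins for the real Sling classes.
    static class Resolver { }
    static class Model {
        final Resolver resolver;            // strong field, as in the problem case
        Model(Resolver resolver) { this.resolver = resolver; }
    }

    public static void main(String[] args) {
        Resolver resolver = new Resolver();
        // The model cache holds the model only softly...
        SoftReference<Model> cacheEntry = new SoftReference<>(new Model(resolver));
        // ...and we watch the resolver through a weak probe.
        WeakReference<Resolver> probe = new WeakReference<>(resolver);
        resolver = null;                    // drop our own strong reference

        System.gc();                        // a normal GC cycle does not clear
                                            // soft references while memory is
                                            // plentiful, so the model, and via
                                            // its field the resolver, survive
        System.out.println(cacheEntry.get() != null); // model still cached
        System.out.println(probe.get() != null);      // resolver pinned by model
    }
}
```

Only under memory pressure would the soft reference be cleared, releasing the whole chain at once, which matches the expensive stop-the-world behavior Jörg observed.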
>>>
>>> The quickest fix is to release the resource resolver at the end of the
>>> model's constructor or @PostConstruct [2]. Alternatively, you can
>>> eliminate the cache=true, provided it does not negatively impact your
>>> application's performance.
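[Editor's note: a minimal plain-Java sketch of that quick fix; the `Resolver` API and lifecycle below are mimicked, not the real Sling classes or the real @PostConstruct mechanism. The idea is to copy what the model needs during initialization and then release the resolver, so the cached model no longer pins it.]

```java
public class EagerReleaseModelSketch {
    // Stand-in for a ResourceResolver; only the close() semantics matter here.
    static class Resolver implements AutoCloseable {
        private boolean closed;
        String readPath() { return "/content/example"; }  // hypothetical lookup
        @Override public void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    static class Model {
        private final String path;          // keep only the extracted data...

        Model(Resolver resolver) {
            this.path = resolver.readPath(); // ...read eagerly at construction
            resolver.close();                // then release the resolver; do NOT
                                             // keep a field pointing at it
        }

        String getPath() { return path; }
    }

    public static void main(String[] args) {
        Resolver resolver = new Resolver();
        Model model = new Model(resolver);
        System.out.println(model.getPath());
        System.out.println(resolver.isClosed());
    }
}
```

With no resolver field left in the model, caching the model no longer keeps a resolver (and, through it, the adaptable) alive.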
>>> Another option, though more involved, is to change the entire caching
>>> implementation to better handle these kinds of reference loops, by
>>> putting the cache in the adaptable itself (Resolver, Resource, Request,
>>> etc.). Possible good candidates for this are [4] or request attributes.
>>> [4] seems to be the overall best candidate, especially since you can hook
>>> into it using Closeable ([5]), improving the cache eviction even more.
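[Editor's note: a self-contained sketch of that Closeable hook. The `Resolver` below is a simplified stand-in that mirrors what [5] describes, closing every Closeable stored in its property map when it is itself closed; it is not the real Sling implementation.]

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CloseablePropertyMapSketch {
    // Simplified resolver: on close(), it closes every Closeable value stored
    // in its property map (the behavior [5] documents for the real API).
    static class Resolver implements AutoCloseable {
        final Map<String, Object> propertyMap = new HashMap<>();
        @Override public void close() {
            for (Object value : propertyMap.values()) {
                if (value instanceof Closeable) {
                    try { ((Closeable) value).close(); } catch (IOException ignored) { }
                }
            }
        }
    }

    // A model cache that empties itself when the resolver is closed.
    static class ModelCache implements Closeable {
        final Map<Object, Object> models = new HashMap<>();
        @Override public void close() { models.clear(); }
    }

    public static void main(String[] args) {
        Resolver resolver = new Resolver();
        ModelCache cache = new ModelCache();
        resolver.propertyMap.put("sling.models.cache", cache);
        cache.models.put("resource", "model");

        resolver.close();                        // resolver lifecycle ends...
        System.out.println(cache.models.size()); // ...and the cache is evicted
    }
}
```

Tying eviction to the resolver's close() means no soft references or finalizers are involved at all, which is the property Carsten highlights above.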
>>>
>>> As long as the object holding the cache is not referenced strongly
>>> outside that reference loop (cache holder > cache > model > ... > cache
>>> holder), the loop's objects will be eligible for GC as soon as the cache
>>> holder is eligible.
>>>
>>> [1]: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/SoftReference.html
>>> [2]: https://sling.apache.org/documentation/bundles/models.html#a-note-about-cache-true-and-using-the-self-injector
>>> [3]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/18
>>> [4]: https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L888
>>> [5]: https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L610
>>>
>>> // Paul
>>>
>>>
>>> On Tue, Jan 23, 2024 at 6:48 AM Jörg Hoh <[email protected]>
>>> wrote:
>>>
>>>> Hi Sling community,
>>>>
>>>> I want to share a recent experience I had with Sling Models, Sling Model
>>>> caching and Garbage Collection problems.
>>>>
>>>> I had a case where an AEM instance had massive garbage collection
>>>> problems, but no memory problems. We saw the regular sawtooth pattern in
>>>> the heap consumption, but heavy GC activity (in a stop-the-world manner)
>>>> almost constantly. But no OutOfMemory situation, so it's not a memory
>>>> leak.
>>>>
>>>> I manually captured a heap dump and found a lot of Sling Models being
>>>> referenced by the Sling ModelAdapterFactory cache, and rechecking these
>>>> model classes in detail, I found them to specify "cache=true" in their
>>>> @Model annotation. When these statements were removed, the situation
>>>> looked completely different, and the garbage collection was normal
>>>> again.
>>>>
>>>> I don't have a full explanation for this behavior yet. The Sling Models
>>>> had a reference to a ResourceResolver (which was properly closed), but I
>>>> assume that this reference somehow "disabled" the cleaning of the cache
>>>> on major GCs (as it's a WeakHashMap), and instead tied the collection of
>>>> these models to the collection of the ResourceResolver objects, which
>>>> have finalizers registered. And finalizers are only executed under
>>>> memory pressure. Having this connection might have led to the situation
>>>> that the Sling Model objects were not disposed of eagerly, but only
>>>> alongside the finalizers, in situations of high memory pressure, in a
>>>> "stop-the-world" manner.
>>>>
>>>> I'll try to get some more examples of that behavior, but I am not sure
>>>> whether the caching of Sling Models as-is is something we should
>>>> continue to use. This case was quite hard to crack, and I don't know
>>>> if/how we can avoid such a situation by design.
>>>>
>>>> Jörg
>>>>
>>>> --
>>>> https://cqdump.joerghoh.de
>>>>
>>>
>>
>> --
>> Carsten Ziegeler
>> Adobe
>> [email protected]
>>
>
>
> --
> https://cqdump.joerghoh.de