Re: Sling Model Caching & GC problems

2024-03-30 Thread Paul Bjorkstrand
Oops, in that last line I meant to say "then the original caching behavior
is preserved." instead of "then the original caching implementation is
preserved."

// Paul



Re: Sling Model Caching & GC problems

2024-03-30 Thread Paul Bjorkstrand
I didn't see any (public) activity on this, and I had some time to look at
this over this weekend. Here's the ticket [1] and PR [2] that would move
the cache for Resource/Resolver based models into the ResourceResolver
property map. Interestingly, request adaptables already had a similar type
of cache colocation. I just modified the adapter factory's servlet listener
to call close() on the cache when the request object is destroyed. If the
adaptable somehow is neither a Request, Resource, nor Resolver, then the
original caching implementation is preserved.

This change would also likely close [3] as they are directly related.

Give it a look and do additional testing as needed.

[1]: https://issues.apache.org/jira/browse/SLING-12279
[2]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/50
[3]: https://issues.apache.org/jira/browse/SLING-12259
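The colocation approach described above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the actual PR code: ModelCache, CACHE_KEY and getOrCreateCache are illustrative names, and a plain Map stands in for the resolver's property map.

```java
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ModelCacheSketch {
    static final String CACHE_KEY = "org.example.models.cache"; // hypothetical key

    // A Closeable cache: the resolver's property map closes values that
    // implement Closeable when the resolver itself is closed, so no extra
    // cleanup hook is needed for Resource/Resolver adaptables.
    public static class ModelCache implements Closeable {
        private final Map<Object, Object> models = new HashMap<>();

        public Object computeIfAbsent(Object key, Function<Object, Object> factory) {
            return models.computeIfAbsent(key, factory);
        }

        @Override
        public void close() {
            models.clear(); // drop all cached models together with the resolver
        }
    }

    // In the real code the argument would be the resolver's property map;
    // adaptables that are neither Request, Resource, nor Resolver would fall
    // back to the original caching behavior.
    public static ModelCache getOrCreateCache(Map<String, Object> propertyMap) {
        return (ModelCache) propertyMap.computeIfAbsent(CACHE_KEY, k -> new ModelCache());
    }
}
```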

// Paul


On Tue, Jan 30, 2024 at 6:36 AM Carsten Ziegeler 
wrote:

> Sure, no problem with using a WeakHashMap.
>
> Regards
> Carsten
>
> On 30.01.2024 10:28, Jörg Hoh wrote:
> > Paul,
> >
> > thanks for your mail. I agree to your observation what's going on. I also
> > think that cache at the ResourceResolver level makes more sense, and it's
> > easier to control and remove cached models, which cannot be used any
> more.
> >
> > @Carsten: Regarding the size, we can still use a WeakHashMap as a
> container
> > for these Sling Models (and store the WeakHashMap in the PropertyMap of
> the
> > ResourceResolver), and that should prevent the worst, even in case of a
> > long-running ResourceResolver or many adaptions.
> >
> > Jörg
> >
> >
> > Am Di., 30. Jan. 2024 um 07:16 Uhr schrieb Carsten Ziegeler <
> > cziege...@apache.org>:
> >
> >> I don't know much about sling models caching, but it seems storing the
> >> cache into the resource resolver and therefore binding it to that
> >> lifecycle sounds very reasonable to me. And due to the mentioned support
> >> for Closeable, the cache gets discarded automatically with the resolver
> >> being closed. No extra handling required, no soft references etc.
> needed.
> >>
> >> Obviously, that sounds problematic with long or forever running resource
> >> resolvers as the cache would never be released. However, I guess thats
> >> not worse than what we have today. And long running resource resolvers
> >> are an anti-pattern anyway.
> >>
> >> Regards
> >> Carsten
> >>
> >> On 23.01.2024 17:23, Paul Bjorkstrand wrote:
> >>> Hi Jörg,
> >>>
> >>> My guess is that you are running up against the problem where the Model
> >> is
> >>> referencing its adaptable, directly or indirectly. In that situation,
> the
> >>> model would not be collectable because it is referenced more strongly
> >> than
> >>> by weak reference. The reference path of these might look like this:
> >>>
> >>> Model Cache Holder (the Model Adapter Factory
> >>> (strong reference)
> >>> Model Cache
> >>> (soft reference)
> >>> Model
> >>> (strong reference)
> >>> Resource Resolver
> >>> (strong reference, maybe indirectly)
> >>> Resource [the adaptable]
> >>>
> >>>
> >>> The resource is strongly or softly referenced, possibly indirectly,
> >> making
> >>> it ineligible for collection on normal GC cycles. The resource &
> >> resolver,
> >>> will not be collected until after the model is collected. Since the
> model
> >>> is not collected until there is memory pressure, that would explain why
> >> you
> >>> are seeing this (expensive) GC behavior.
> >>>
> >>> When memory pressure occurs, first the models (soft references) are
> >>> collected prior to the OOM, which releases both resolver (no longer
> >>> referenced via field in the model) and resource.
> >>>
> >>> The quickest fix is to release the resource resolver at the end of the
> >>> model’s constructor or @PostConstruct [2]. Alternatively, you can
> >> eliminate
> >>> the cache=true, provided it does not negatively impact your application
> >>> performance.
> >>> Another option, though more involved, is that the entire caching impl
> >> could
> >>> be changed to better handle these kinds of reference loops, by putting
> >> the
> >>> cache in the adaptable itself (Resolver, Resource, Request, etc).
> >> Possible
> >>> good candidates of these are [4] or request attributes. [4] seems to be
> >> the
> >>> overall best candidate, especially since you can hook into it using
> >>> Closeable ([5]) improving the cache eviction even more.
> >>>
> >>> As long as the object holding the cache is not referenced strongly
> >> outside
> >>> that reference loop (cache holder > cache > model > ... > cache
> holder),
> >>> then the loop's objects would be eligible for GC as soon as the cache
> >>> holder is eligible.
> >>>
> >>> [1]:
> >>>
> >>
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/SoftReference.html
> >>> [2]:
> >>>
> >>
> https://sling.apache.org/documentation/bundles/models.html#a-note-about-cache-true-and-using-the-self-injector
> >>> [3]:
> >> 

Re: Sling Model Caching & GC problems

2024-01-30 Thread Carsten Ziegeler

Sure, no problem with using a WeakHashMap.

Regards
Carsten


Re: Sling Model Caching & GC problems

2024-01-30 Thread Konrad Windszus
Sorry, yes, I see it now.
It seems Gmail has some issues and delivers some mails only with a long delay (I
only saw Carsten’s mail).
Will check the archives next time.
Konrad

> On 30. Jan 2024, at 10:45, Stefan Seifert  
> wrote:
> 
>> @Joerg: Paul's mail never hit the mailing list. Can you forward it?
> 
> must be a problem on your side - it's in the archives:
> https://lists.apache.org/thread/xj6qosnjtgxvdf1lh58zs1bv5lh0y0q6
> 
> stefan



RE: Sling Model Caching & GC problems

2024-01-30 Thread Stefan Seifert
> @Joerg: Paul's mail never hit the mailing list. Can you forward it?

must be a problem on your side - it's in the archives:
https://lists.apache.org/thread/xj6qosnjtgxvdf1lh58zs1bv5lh0y0q6

stefan


Re: Sling Model Caching & GC problems

2024-01-30 Thread Konrad Windszus
@Joerg: Paul's mail never hit the mailing list. Can you forward it?


Re: Sling Model Caching & GC problems

2024-01-30 Thread Jörg Hoh
Paul,

thanks for your mail. I agree with your observation of what's going on. I also
think that a cache at the ResourceResolver level makes more sense, and it's
easier to control and remove cached models which cannot be used any more.

@Carsten: Regarding the size, we can still use a WeakHashMap as a container
for these Sling Models (and store the WeakHashMap in the PropertyMap of the
ResourceResolver), and that should prevent the worst, even in the case of a
long-running ResourceResolver or many adaptations.
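The suggestion above could look roughly like this. A hedged sketch: the key name is made up, and a plain HashMap stands in for the resolver's property map.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheInPropertyMap {
    static final String KEY = "org.example.models.weakCache"; // hypothetical key

    // Keys (the adaptables) are only weakly held by the WeakHashMap, so once
    // an adaptable is unreachable elsewhere, its cache entry can be collected
    // even while a long-running resolver stays open. Caveat: this only helps
    // if the cached model does not itself strongly reference its key.
    @SuppressWarnings("unchecked")
    public static Map<Object, Object> modelCache(Map<String, Object> propertyMap) {
        return (Map<Object, Object>) propertyMap.computeIfAbsent(KEY, k -> new WeakHashMap<>());
    }
}
```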

Jörg



Re: Sling Model Caching & GC problems

2024-01-29 Thread Carsten Ziegeler
I don't know much about Sling Models caching, but storing the
cache in the resource resolver, and therefore binding it to that
lifecycle, sounds very reasonable to me. And due to the mentioned support
for Closeable, the cache gets discarded automatically when the resolver
is closed. No extra handling required, no soft references etc. needed.


Obviously, that sounds problematic with long- or forever-running resource
resolvers, as the cache would never be released. However, I guess that's
not worse than what we have today. And long-running resource resolvers
are an anti-pattern anyway.


Regards
Carsten

On 23.01.2024 17:23, Paul Bjorkstrand wrote:

Hi Jörg,

My guess is that you are running up against the problem where the Model is
referencing its adaptable, directly or indirectly. In that situation, the
model would not be collectable because it is referenced more strongly than
by weak reference. The reference path of these might look like this:

Model Cache Holder (the Model Adapter Factory)
    (strong reference)
Model Cache
    (soft reference)
Model
    (strong reference)
Resource Resolver
    (strong reference, maybe indirectly)
Resource [the adaptable]


The resource is strongly or softly referenced, possibly indirectly, making
it ineligible for collection on normal GC cycles. The resource and resolver
will not be collected until after the model is collected. Since the model
is not collected until there is memory pressure, that would explain why you
are seeing this (expensive) GC behavior.

When memory pressure occurs, the models (soft references) are collected
first, prior to any OOM, which releases both the resolver (no longer
referenced via a field in the model) and the resource.
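The collection-order argument above can be observed with a small stand-alone demo (no Sling involved). Note this only illustrates typical HotSpot behavior with ample free heap; soft-reference clearing policy is JVM-dependent, the only guarantee being that soft references are cleared before an OutOfMemoryError.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class RefDemo {
    // Returns {weakCleared, softStillAlive} after ordinary GC cycles.
    static boolean[] afterGc() {
        Object weakTarget = new Object();
        Object softTarget = new Object();
        WeakReference<Object> weak = new WeakReference<>(weakTarget);
        SoftReference<Object> soft = new SoftReference<>(softTarget);
        weakTarget = null; // drop the only strong references
        softTarget = null;
        // Weak references are cleared on any GC once the referent is
        // otherwise unreachable; prod the collector until that happens.
        for (int i = 0; i < 50 && weak.get() != null; i++) {
            System.gc();
            try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        // With plenty of free heap, the soft reference typically survives
        // the same GC cycles: it is kept until memory pressure.
        return new boolean[] { weak.get() == null, soft.get() != null };
    }
}
```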

The quickest fix is to release the resource resolver at the end of the
model’s constructor or @PostConstruct method [2]. Alternatively, you can
eliminate cache=true, provided it does not negatively impact your
application's performance.
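A plain-Java analog of that quickest fix might look like this. Class and field names are illustrative; in a real Sling Model, init() would be the @PostConstruct method and the adaptable would be an injected Resource or ResourceResolver.

```java
public class TitleModel {
    private Object resource; // stand-in for the injected adaptable
    private String title;

    public TitleModel(Object resource) {
        this.resource = resource;
    }

    // Copy everything needed from the adaptable up front, then drop the
    // reference, so a cached model no longer pins its Resource/Resolver.
    public void init() {
        this.title = "Title of " + resource;
        this.resource = null; // release: model no longer holds the adaptable
    }

    public String getTitle() { return title; }

    boolean holdsAdaptable() { return resource != null; }
}
```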
Another option, though more involved, is changing the entire caching
implementation to better handle these kinds of reference loops, by putting
the cache in the adaptable itself (Resolver, Resource, Request, etc.).
Possible good candidates for this are the resolver's property map [4] or
request attributes. [4] seems to be the overall best candidate, especially
since you can hook into it using Closeable ([5]), improving the cache
eviction even more.

As long as the object holding the cache is not strongly referenced from
outside that reference loop (cache holder > cache > model > ... > cache
holder), the loop's objects become eligible for GC as soon as the cache
holder is eligible.

[1]:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/SoftReference.html
[2]:
https://sling.apache.org/documentation/bundles/models.html#a-note-about-cache-true-and-using-the-self-injector
[3]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/18
[4]:
https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L888
[5]:
https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L610

// Paul


On Tue, Jan 23, 2024 at 6:48 AM Jörg Hoh 
wrote:


Hi Sling community,

I want to share a recent experience I had with Sling Models, Sling Model
caching and Garbage Collection problems.

I had a case where an AEM instance had massive garbage collection
problems, but no memory problems. We saw the regular sawtooth pattern in
the heap consumption, but heavy GC activity (in a stop-the-world manner)
almost constantly. There was no OutOfMemory situation, though, so it's not
a memory leak.

I manually captured a heap dump and found a lot of Sling Models being
referenced by the Sling ModelAdapterFactory cache, and rechecking these
model classes in detail I found them to specify "cache=true" in their
@Model annotation. When these statements were removed, the situation looked
completely different, and the garbage collection was normal again.

I don't have a full explanation for this behavior yet. The Sling Models had
a reference to a ResourceResolver (which was properly closed), but I assume
that this reference somehow "disabled" the cleaning of the cache on major
GCs (as it's a WeakHashMap), but tied the collection of these models to the
collection of the ResourceResolver objects, which have finalizers
registered. And finalizers are only executed under memory pressure. Having
this connection might have led to the situation that the Sling Model objects
were not disposed eagerly, but only alongside the finalizers in situations
of high memory pressure, in a "stop-the-world" manner.
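The observation above matches a well-known WeakHashMap caveat: an entry whose value strongly references its key is never evicted, because the key stays strongly reachable through the map itself. A small stand-alone demo (Adaptable and Model are stand-ins, not Sling classes):

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakMapRetention {
    static class Adaptable {}

    static class Model {
        final Adaptable adaptable; // back-reference that pins the map entry
        Model(Adaptable a) { this.adaptable = a; }
    }

    // Returns {pinned.size(), free.size()} after dropping the local key
    // references and prodding the collector.
    public static int[] demo() {
        Map<Adaptable, Object> pinned = new WeakHashMap<>();
        Map<Adaptable, Object> free = new WeakHashMap<>();
        Adaptable a1 = new Adaptable();
        Adaptable a2 = new Adaptable();
        pinned.put(a1, new Model(a1)); // value strongly references its key
        free.put(a2, "plain value");   // no back-reference
        a1 = null; // drop our strong references to both keys
        a2 = null;
        for (int i = 0; i < 50 && !free.isEmpty(); i++) {
            System.gc();
            try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return new int[] { pinned.size(), free.size() };
    }
}
```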

I will try to get some more examples of that behavior, but I am not sure
the caching of Sling Models as-is is something we should continue to use.
This case was quite hard to crack, and I don't know if/how we can avoid
such a situation by design.

Jörg

--
https://cqdump.joerghoh.de





--
Carsten Ziegeler
Adobe

Re: Sling Model Caching & GC problems

2024-01-23 Thread Paul Bjorkstrand
Hi Jörg,

My guess is that you are running up against the problem where the Model is
referencing its adaptable, directly or indirectly. In that situation, the
model would not be collectable because it is referenced more strongly than
by weak reference. The reference path of these might look like this:

Model Cache Holder (the Model Adapter Factory)
    (strong reference)
Model Cache
    (soft reference)
Model
    (strong reference)
Resource Resolver
    (strong reference, maybe indirectly)
Resource [the adaptable]


The resource is strongly or softly referenced, possibly indirectly, making
it ineligible for collection on normal GC cycles. The resource and resolver
will not be collected until after the model is collected. Since the model
is not collected until there is memory pressure, that would explain why you
are seeing this (expensive) GC behavior.

When memory pressure occurs, the models (soft references) are collected
first, prior to the OOM, which releases both the resolver (no longer
referenced via a field in the model) and the resource.

The quickest fix is to release the resource resolver at the end of the
model’s constructor or in @PostConstruct [2]. Alternatively, you can drop
cache=true, provided that does not negatively impact your application's
performance.
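The "release the adaptable after construction" fix can be sketched like
this. It is a self-contained illustration with stand-in types (not the real
Sling API, and the model/field names are hypothetical): read everything you
need in the post-construct step, then null out the reference to the
adaptable, so the cached model no longer pins the resource or its resolver.

```java
import java.util.Map;

public class ReleaseAdaptableSketch {
    // Stand-in for org.apache.sling.api.resource.Resource
    static class Resource {
        private final Map<String, Object> props;
        Resource(Map<String, Object> props) { this.props = props; }
        Map<String, Object> getValueMap() { return props; }
    }

    // Stand-in for a cache=true Sling model
    static class TitleModel {
        private Resource resource;  // would be @Self-injected in a real model
        private String title;

        TitleModel(Resource resource) { this.resource = resource; init(); }

        // would be annotated @PostConstruct in a real model
        void init() {
            // read everything needed up front...
            title = (String) resource.getValueMap().get("jcr:title");
            // ...then release the adaptable: the model is now safe to cache
            resource = null;
        }

        String getTitle() { return title; }
        boolean holdsAdaptable() { return resource != null; }
    }

    public static void main(String[] args) {
        TitleModel m = new TitleModel(new Resource(Map.of("jcr:title", "Hello")));
        System.out.println(m.getTitle());       // Hello
        System.out.println(m.holdsAdaptable()); // false
    }
}
```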
Another option, though more involved, is that the entire caching
implementation could be changed to better handle these kinds of reference
loops, by putting the cache in the adaptable itself (Resolver, Resource,
Request, etc.). Good candidates for this are the ResourceResolver's
property map [4] or request attributes. [4] seems to be the overall best
candidate, especially since you can hook into it using Closeable ([5]),
improving cache eviction even more.

As long as the object holding the cache is not referenced strongly outside
that reference loop (cache holder > cache > model > ... > cache holder),
the loop's objects would be eligible for GC as soon as the cache holder is
eligible.
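A minimal sketch of that idea, colocating the model cache with the
resolver's property map so the cache dies with the resolver. The Resolver
and ModelCache classes here are stand-ins that mimic the getPropertyMap()
and close-Closeables-on-close contract described in [4] and [5]; the cache
wiring is illustrative, not the actual Sling Models implementation.

```java
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;

public class ResolverScopedCacheSketch {
    // Stand-in for ResourceResolver: on close(), it closes any Closeable
    // stored in its property map, mirroring the contract referenced in [5].
    static class Resolver implements Closeable {
        private final Map<Object, Object> propertyMap = new HashMap<>();
        Map<Object, Object> getPropertyMap() { return propertyMap; }
        public void close() {
            for (Object v : propertyMap.values()) {
                if (v instanceof Closeable) {
                    try { ((Closeable) v).close(); } catch (Exception ignored) { }
                }
            }
        }
    }

    // A model cache that lives in the resolver's property map and is
    // evicted when the resolver closes.
    static class ModelCache extends HashMap<Object, Object> implements Closeable {
        public void close() { clear(); }
    }

    private static final Object CACHE_KEY = new Object();

    static ModelCache cacheFor(Resolver resolver) {
        return (ModelCache) resolver.getPropertyMap()
                .computeIfAbsent(CACHE_KEY, k -> new ModelCache());
    }

    public static void main(String[] args) {
        Resolver resolver = new Resolver();
        cacheFor(resolver).put("model-key", "cached model instance");
        System.out.println(cacheFor(resolver).size()); // 1
        resolver.close();                              // cache evicted with resolver
        System.out.println(cacheFor(resolver).size()); // 0
    }
}
```

With this shape, cached models can never outlive the resolver they were
adapted from, which removes the cross-reference to the global cache holder.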

[1]:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/SoftReference.html
[2]:
https://sling.apache.org/documentation/bundles/models.html#a-note-about-cache-true-and-using-the-self-injector
[3]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/18
[4]:
https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L888
[5]:
https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L610

// Paul


On Tue, Jan 23, 2024 at 6:48 AM Jörg Hoh 
wrote:

> Hi Sling community,
>
> I want to share a recent experience I had with Sling Models, Sling Model
> caching and Garbage Collection problems.
>
> I had a case where an AEM instance had massive garbage collection
> problems, but no memory problems. We saw the regular sawtooth pattern in
> the heap consumption, but heavy GC activity (in a stop-the-world manner)
> almost constantly. But no OutOfMemory situation, so it's not a memory
> leak.
>
> I manually captured a heap dump and found a lot of Sling models being
> referenced by the Sling ModelAdapterFactory cache, and rechecking these
> model classes in detail I found them to specify "cache=true" in their
> @Model annotation. When these statements were removed, the situation looked
> completely different, and the garbage collection was normal again.
>
> I don't have a full explanation for this behavior yet. The Sling Models had
> a reference to a ResourceResolver (which was properly closed), but I assume
> that this reference somehow "disabled" the cleaning of the cache on major
> GCs (as it's a WeakHashMap) and instead tied the collection of these models
> to the collection of the ResourceResolver objects, which have finalizers
> registered. And finalizers are only executed under memory pressure. This
> connection might have led to the situation that the Sling Model objects
> were not disposed eagerly, but only alongside the finalizers, under high
> memory pressure in a "stop-the-world" manner.
>
> I'll try to get some more examples of that behavior, but I am not sure if
> the caching of Sling Models as-is is something we should continue to use.
> This case was quite hard to crack, and I don't know if/how we can avoid
> such a situation by design.
>
> Jörg
>
> --
> https://cqdump.joerghoh.de
>


Sling Model Caching & GC problems

2024-01-23 Thread Jörg Hoh
Hi Sling community,

I want to share a recent experience I had with Sling Models, Sling Model
caching and Garbage Collection problems.

I had a case where an AEM instance had massive garbage collection
problems, but no memory problems. We saw the regular sawtooth pattern in
the heap consumption, but heavy GC activity (in a stop-the-world manner)
almost constantly. But no OutOfMemory situation, so it's not a memory
leak.

I manually captured a heap dump and found a lot of Sling models being
referenced by the Sling ModelAdapterFactory cache, and rechecking these
model classes in detail I found them to specify "cache=true" in their
@Model annotation. When these statements were removed, the situation looked
completely different, and the garbage collection was normal again.

I don't have a full explanation for this behavior yet. The Sling Models had
a reference to a ResourceResolver (which was properly closed), but I assume
that this reference somehow "disabled" the cleaning of the cache on major
GCs (as it's a WeakHashMap) and instead tied the collection of these models
to the collection of the ResourceResolver objects, which have finalizers
registered. And finalizers are only executed under memory pressure. This
connection might have led to the situation that the Sling Model objects
were not disposed eagerly, but only alongside the finalizers, under high
memory pressure in a "stop-the-world" manner.

I'll try to get some more examples of that behavior, but I am not sure if
the caching of Sling Models as-is is something we should continue to use.
This case was quite hard to crack, and I don't know if/how we can avoid
such a situation by design.

Jörg

-- 
https://cqdump.joerghoh.de