Re: Sling Model Caching & GC problems
Oops, in that last line I meant to say "then the original caching behavior is preserved." instead of "then the original caching implementation is preserved."

// Paul
Re: Sling Model Caching & GC problems
I didn't see any (public) activity on this, and I had some time to look at this over the weekend. Here's the ticket [1] and PR [2] that would move the cache for Resource/Resolver-based models into the ResourceResolver property map. Interestingly, request adaptables already had a similar type of cache colocation. I just modified the adapter factory's servlet listener to call close() on the cache when the request object is destroyed. If the adaptable somehow is neither a Request, Resource, nor Resolver, then the original caching implementation is preserved.

This change would also likely close [3], as they are directly related.

Give it a look and do additional testing as needed.

[1]: https://issues.apache.org/jira/browse/SLING-12279
[2]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/50
[3]: https://issues.apache.org/jira/browse/SLING-12259

// Paul
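[Editorial note: the approach described in the PR can be sketched with plain JDK types. All names below (PropertyMapHolder, ModelCache, the cache key) are hypothetical stand-ins, not the actual Sling Models implementation: a per-resolver model cache is stored in the resolver's property map and implements Closeable, so it is emptied when the resolver is closed.]

```java
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the SLING-12279 idea, using plain JDK types.
// A real ResourceResolver exposes a property map and closes any Closeable
// values when it is closed; PropertyMapHolder stands in for that contract.
public class ModelCacheSketch {

    // Stand-in for the resolver's property-map lifecycle contract.
    static class PropertyMapHolder implements Closeable {
        final Map<String, Object> propertyMap = new ConcurrentHashMap<>();

        @Override
        public void close() {
            // Mirror the contract: close all Closeable values, then clear.
            for (Object value : propertyMap.values()) {
                if (value instanceof Closeable) {
                    try { ((Closeable) value).close(); } catch (Exception ignored) { }
                }
            }
            propertyMap.clear();
        }
    }

    // Per-resolver model cache; its lifetime is bound to the resolver.
    static class ModelCache implements Closeable {
        private final Map<Object, Object> models = new HashMap<>();

        Object computeIfAbsent(Object key, Function<Object, Object> factory) {
            return models.computeIfAbsent(key, factory);
        }

        int size() { return models.size(); }

        @Override
        public void close() { models.clear(); }
    }

    static final String CACHE_KEY = "org.example.models.cache"; // hypothetical key

    static ModelCache cacheFor(PropertyMapHolder resolver) {
        return (ModelCache) resolver.propertyMap
                .computeIfAbsent(CACHE_KEY, k -> new ModelCache());
    }

    public static void main(String[] args) {
        PropertyMapHolder resolver = new PropertyMapHolder();
        ModelCache cache = cacheFor(resolver);
        cache.computeIfAbsent("/content/page", path -> "model for " + path);
        System.out.println("cached models: " + cache.size());
        resolver.close(); // closing the resolver discards the cache
        System.out.println("after close: " + cache.size());
    }
}
```

The design point this illustrates: no reference-queue or soft-reference machinery is needed, because cache eviction piggybacks on the resolver's own close().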
Re: Sling Model Caching & GC problems
Sure, no problem with using a WeakHashMap.

Regards
Carsten
Re: Sling Model Caching & GC problems
Sorry, yes, I see it now. It seems Gmail has some issues and delivers some mails only with a long delay (I only saw Carsten's mail). I will check the archives next time.

Konrad
RE: Sling Model Caching & GC problems
> @Joerg: Paul's mail never hit the mailing list. Can you forward it?

It must be a problem on your side - it's in the archives:
https://lists.apache.org/thread/xj6qosnjtgxvdf1lh58zs1bv5lh0y0q6

stefan
Re: Sling Model Caching & GC problems
@Joerg: Paul's mail never hit the mailing list. Can you forward it?
Re: Sling Model Caching & GC problems
Paul,

thanks for your mail. I agree with your observation of what's going on. I also think that a cache at the ResourceResolver level makes more sense; it's easier to control, and cached models which cannot be used any more can be removed.

@Carsten: Regarding the size, we can still use a WeakHashMap as a container for these Sling Models (and store the WeakHashMap in the PropertyMap of the ResourceResolver), and that should prevent the worst, even in the case of a long-running ResourceResolver or many adaptations.

Jörg
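[Editorial note: the WeakHashMap behavior Jörg relies on can be demonstrated with plain JDK code, nothing Sling-specific. Entries become eligible for removal as soon as nothing else strongly references their keys, so a cache keyed by adaptable would shrink as adaptables go out of scope.]

```java
import java.util.Map;
import java.util.WeakHashMap;

// Demonstrates WeakHashMap semantics: keys are held via weak references,
// so an entry can be reclaimed once its key is otherwise unreachable.
public class WeakHashMapDemo {

    public static void main(String[] args) throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<>();

        Object retained = new Object();
        Object shortLived = new Object();
        cache.put(retained, "model A");
        cache.put(shortLived, "model B");
        System.out.println("before GC: " + cache.size()); // 2

        shortLived = null; // drop the only strong reference to the second key

        // GC timing is not guaranteed, so retry rather than assuming a
        // single System.gc() call collects the key immediately.
        for (int i = 0; i < 50 && cache.size() > 1; i++) {
            System.gc();
            Thread.sleep(10);
        }
        System.out.println("after GC: " + cache.size()); // typically 1
        System.out.println("retained entry: " + cache.get(retained));
    }
}
```

Note that the values are still strongly referenced by the map; only the keys are weak. If a cached model (value) strongly references its key (the adaptable), the entry can never be collected, which is the reference-loop caveat Paul raises elsewhere in this thread.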
Re: Sling Model Caching & GC problems
I don't know much about Sling Models caching, but storing the cache in the resource resolver, and therefore binding it to that lifecycle, sounds very reasonable to me. And due to the mentioned support for Closeable, the cache gets discarded automatically when the resolver is closed. No extra handling required, no soft references etc. needed.

Obviously, that sounds problematic with long- or forever-running resource resolvers, as the cache would never be released. However, I guess that's not worse than what we have today. And long-running resource resolvers are an anti-pattern anyway.

Regards
Carsten
Re: Sling Model Caching & GC problems
Hi Jörg,

My guess is that you are running up against the problem where the Model is referencing its adaptable, directly or indirectly. In that situation, the model would not be collectable, because it is referenced more strongly than by a weak reference. The reference path might look like this:

Model Cache Holder (the Model Adapter Factory)
    (strong reference)
Model Cache
    (soft reference)
Model
    (strong reference)
Resource Resolver
    (strong reference, maybe indirectly)
Resource [the adaptable]

The resource is strongly or softly referenced, possibly indirectly, making it ineligible for collection on normal GC cycles. The resource and resolver will not be collected until after the model is collected. Since the model is not collected until there is memory pressure, that would explain why you are seeing this (expensive) GC behavior.

When memory pressure occurs, the models (soft references) are collected first, prior to the OOM, which releases both the resolver (no longer referenced via a field in the model) and the resource.

The quickest fix is to release the resource resolver at the end of the model's constructor or @PostConstruct [2]. Alternatively, you can eliminate the cache=true, provided it does not negatively impact your application's performance.

Another option, though more involved, is that the entire caching implementation could be changed to better handle these kinds of reference loops, by putting the cache in the adaptable itself (Resolver, Resource, Request, etc). Possible good candidates for this are the resolver's property map [4] or request attributes; [4] seems to be the overall best candidate, especially since you can hook into it using Closeable ([5]), improving the cache eviction even more.

As long as the object holding the cache is not referenced strongly outside that reference loop (cache holder > cache > model > ... > cache holder), the loop's objects would be eligible for GC as soon as the cache holder is eligible.
[1]: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/SoftReference.html
[2]: https://sling.apache.org/documentation/bundles/models.html#a-note-about-cache-true-and-using-the-self-injector
[3]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/18
[4]: https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L888
[5]: https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L610

// Paul

On Tue, Jan 23, 2024 at 6:48 AM Jörg Hoh wrote:

> Hi Sling community,
>
> I want to share a recent experience I had with Sling Models, Sling Model
> caching and Garbage Collection problems.
>
> I had a case where an AEM instance had massive garbage collection
> problems, but no memory problems. We saw the regular sawtooth pattern in
> the heap consumption, but heavy GC activity (in a stop-the-world manner)
> almost constantly. But no OutOfMemory situation, so it's not a memory
> leak.
>
> I manually captured a heapdump and found a lot of Sling models being
> referenced by the Sling ModelAdapterFactory cache, and rechecking these
> model classes in detail I found them to specify "cache=true" in their
> @Model annotation. When these statements were removed, the situation
> looked completely different, and the garbage collection was normal again.
>
> I don't have a full explanation for this behavior yet. The Sling Models
> had a reference to a ResourceResolver (which was properly closed), but I
> assume that this reference somehow "disabled" the cleaning of the cache
> on major GCs (as it's a WeakHashMap), but tied the collection of these
> models to the collection of the ResourceResolver objects, which have
> finalizers registered. And finalizers are only executed under memory
> pressure.
> Having this connection might have led to the situation that the
> SlingModel objects were not disposed eagerly, but only alongside the
> finalizers in situations of high memory pressure, in a "stop-the-world"
> manner.
>
> I'll try to get some more examples of that behavior, but I am not sure
> if the caching of Sling Models as-is is something we should continue to
> use. This case was quite hard to crack, and I don't know if/how we can
> avoid such a situation by design.
>
> Jörg
>
> --
> https://cqdump.joerghoh.de
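The Closeable-based eviction Paul suggests can be illustrated without the Sling APIs. In this sketch, every name is hypothetical: a stand-in "resolver" exposes a property map whose Closeable values are closed when the resolver itself is closed (mirroring the contract of ResourceResolver.getPropertyMap(), but not using the real Sling API), so a cache stored there is emptied exactly when the resolver's lifecycle ends.

```java
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class ResolverScopedCache {

    // Hypothetical stand-in for a ResourceResolver with a property map.
    static class Resolver implements Closeable {
        private final Map<String, Object> propertyMap = new HashMap<>();

        Map<String, Object> getPropertyMap() { return propertyMap; }

        @Override
        public void close() {
            // Close any Closeable stored in the property map, then drop everything.
            for (Object v : propertyMap.values()) {
                if (v instanceof Closeable) {
                    try { ((Closeable) v).close(); } catch (Exception ignored) {}
                }
            }
            propertyMap.clear();
        }
    }

    // A model cache living in the resolver's property map; emptied on close.
    static class ModelCache implements Closeable {
        final Map<Object, Object> models = new WeakHashMap<>();

        @Override
        public void close() { models.clear(); }
    }

    static ModelCache cacheFor(Resolver resolver) {
        return (ModelCache) resolver.getPropertyMap()
                .computeIfAbsent("org.example.models.cache", k -> new ModelCache());
    }

    public static void main(String[] args) {
        Resolver resolver = new Resolver();
        ModelCache cache = cacheFor(resolver);
        cache.models.put("some-adaptable", "some-model");
        System.out.println(cache.models.size()); // 1
        resolver.close();                        // cache is discarded with the resolver
        System.out.println(cache.models.size()); // 0
    }
}
```

The point of the design is that no reference to the cache survives the resolver: eviction no longer depends on GC heuristics, soft references, or finalizers, only on close() being called.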
Sling Model Caching & GC problems
Hi Sling community,

I want to share a recent experience I had with Sling Models, Sling Model caching and Garbage Collection problems.

I had a case where an AEM instance had massive garbage collection problems, but no memory problems. We saw the regular sawtooth pattern in the heap consumption, but heavy GC activity (in a stop-the-world manner) almost constantly. But no OutOfMemory situation, so it's not a memory leak.

I manually captured a heapdump and found a lot of Sling models being referenced by the Sling ModelAdapterFactory cache, and rechecking these model classes in detail I found them to specify "cache=true" in their @Model annotation. When these statements were removed, the situation looked completely different, and the garbage collection was normal again.

I don't have a full explanation for this behavior yet. The Sling Models had a reference to a ResourceResolver (which was properly closed), but I assume that this reference somehow "disabled" the cleaning of the cache on major GCs (as it's a WeakHashMap), but tied the collection of these models to the collection of the ResourceResolver objects, which have finalizers registered. And finalizers are only executed under memory pressure. Having this connection might have led to the situation that the SlingModel objects were not disposed eagerly, but only alongside the finalizers in situations of high memory pressure, in a "stop-the-world" manner.

I'll try to get some more examples of that behavior, but I am not sure if the caching of Sling Models as-is is something we should continue to use. This case was quite hard to crack, and I don't know if/how we can avoid such a situation by design.

Jörg

--
https://cqdump.joerghoh.de
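The WeakHashMap behavior described here can be demonstrated with a minimal, self-contained sketch (plain JDK; the Adaptable/Model names are hypothetical stand-ins for the Sling types): when the cached value (the model) strongly references its key (the adaptable), the "weak" key is in fact always strongly reachable through the map itself, so the entry can never be expunged, no matter how often the GC runs.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyPitfall {

    static class Adaptable {}

    static class Model {
        final Adaptable adaptable;   // value -> key back-reference
        Model(Adaptable a) { this.adaptable = a; }
    }

    public static void main(String[] args) throws Exception {
        Map<Adaptable, Model> cache = new WeakHashMap<>();

        Adaptable key = new Adaptable();
        cache.put(key, new Model(key));  // map strongly holds Model, Model strongly holds key
        WeakReference<Adaptable> probe = new WeakReference<>(key);
        key = null;                      // drop our own strong reference to the key

        System.gc();
        Thread.sleep(100);

        // The entry is NOT evicted: the key is still strongly reachable
        // via cache -> Model -> adaptable, defeating the weak-key semantics.
        System.out.println("size=" + cache.size());               // 1
        System.out.println("key alive=" + (probe.get() != null)); // true
    }
}
```

The real ModelAdapterFactory cache wraps the models in soft references rather than holding them directly, which softens the cycle: instead of never being collected, the entries survive until memory pressure, which lines up with the bulk, stop-the-world cleanup observed here.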