[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423749#comment-16423749 ]

ASF subversion and git services commented on SOLR-11882:

Commit 483914b6a4c5aaa163625169066e8c6bb3942566 in lucene-solr's branch refs/heads/branch_7x from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=483914b ]

SOLR-11882: SolrMetric registries retained references to SolrCores when closed.

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: 7.4, master (8.0)
> Attachments: SOLR-11882-7x.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> create-cores.zip, solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
> *Description:*
> Our setup involves using a lot of small cores (possibly a hundred thousand),
> but working only on a few of them at any given time.
> We already followed all recommendations in this guide:
> [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no
> documents inside, the heap consumption went through the roof despite having
> set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we
> have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in the
> org.apache.solr.metrics.SolrMetricManager#registries that is never removed
> until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the
> ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size
> = 512m) and made a report (attached) using Eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager
> should be removed, in the same fashion the reporters for the core are also
> closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully
> unload a transient core when closed due to the cache size.
> *Note:*
> The documentation mentions everywhere that the unused cores will be unloaded,
> but it's misleading as the cores are never fully unloaded.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418820#comment-16418820 ]

Andrzej Bialecki commented on SOLR-11882:
--
Patch for branch_7x that allows us to maintain a back-compat API - this is functionally identical to the patch for master except for the changes in the {{SolrMetricProducer}} interface. The old method is marked here as deprecated, and the new method has a default implementation that calls the old one - so third-party components that implement only the old method will still be called correctly by Solr via the default impl. of the new method.

If there are no objections I'll commit this shortly.

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: 7.4, master (8.0)
> Attachments: SOLR-11882-7x.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> create-cores.zip, solr-dump-full_Leak_Suspects.zip, solr.config.zip
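The back-compat arrangement described in this comment can be sketched roughly as follows. This is only an illustration of the "deprecated old method plus default new method" pattern, not the code from the patch: the interface name {{MetricProducer}}, the parameter lists and the two component classes are hypothetical stand-ins, and the real {{SolrMetricProducer}} signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class BackCompatSketch {

    /** Hypothetical stand-in for a metric producer interface (not Solr's actual signatures). */
    interface MetricProducer {
        /** Old method, kept so that existing third-party implementations still compile. */
        @Deprecated
        void initializeMetrics(String registry, String scope);

        /** New method with an extra tag parameter; by default it delegates to the old one. */
        default void initializeMetrics(String registry, String tag, String scope) {
            initializeMetrics(registry, scope);
        }
    }

    /** A "third-party" component written against the old API only. */
    static final class LegacyComponent implements MetricProducer {
        final List<String> calls = new ArrayList<>();

        @Override
        public void initializeMetrics(String registry, String scope) {
            calls.add(registry + "/" + scope);
        }
    }

    /** A component that has been updated to use the new, tagged method. */
    static final class TaggedComponent implements MetricProducer {
        final List<String> calls = new ArrayList<>();

        @Override
        public void initializeMetrics(String registry, String scope) {
            throw new UnsupportedOperationException("use the tagged variant");
        }

        @Override
        public void initializeMetrics(String registry, String tag, String scope) {
            calls.add(registry + "/" + scope + "@" + tag);
        }
    }

    public static void main(String[] args) {
        LegacyComponent legacy = new LegacyComponent();
        TaggedComponent tagged = new TaggedComponent();
        // The caller always invokes the new method; old implementations are
        // reached through the default implementation.
        MetricProducer[] producers = {legacy, tagged};
        for (MetricProducer p : producers) {
            p.initializeMetrics("solr.core.core0", "core0#1", "handler");
        }
        System.out.println(legacy.calls);
        System.out.println(tagged.calls);
    }
}
```

The caller only ever needs to know about the new method, which is why components implementing just the old one keep working; the price is that such components never see the tag.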
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417100#comment-16417100 ]

ASF subversion and git services commented on SOLR-11882:

Commit 7260d9ce713b5f6378b97e4c64f3045eb62f98bd in lucene-solr's branch refs/heads/master from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7260d9c ]

SOLR-11882: SolrMetric registries retained references to SolrCores when closed.

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: master (8.0)
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416468#comment-16416468 ]

Andrzej Bialecki commented on SOLR-11882:
--
Updated patch. This adds a SolrCore instance identifier (tag) to all gauges in a registry; the gauges carrying a given tag are then matched and removed when that SolrCore is closed.

The size of the patch is partially caused by the change in {{SolrMetricProducer.initializeMetrics(...)}} and the need to pass around the SolrCore instance tag.

All unit tests pass, and the scenario described above also passes, i.e. it produces only 2 strongly referenced SolrCore objects.

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: master (8.0)
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
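The tagging idea in this comment can be illustrated with a small sketch. It assumes a simplified registry keyed by metric name; {{Registry}}, {{TaggedGauge}}, {{Core}} and the tag format are all hypothetical and are not the classes touched by the patch. The point it tries to show is that removal is matched on the instance tag, so closing an old instance does not remove gauges that a newer instance of the same core has since registered under the same names.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class TaggedGaugeSketch {

    /** A gauge together with the tag of the object instance that registered it. */
    static final class TaggedGauge {
        final String tag;
        final Supplier<Object> value;

        TaggedGauge(String tag, Supplier<Object> value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Hypothetical registry: metric name -> tagged gauge. */
    static final class Registry {
        private final Map<String, TaggedGauge> gauges = new ConcurrentHashMap<>();

        void registerGauge(String name, String tag, Supplier<Object> value) {
            gauges.put(name, new TaggedGauge(tag, value)); // replaces any older gauge
        }

        /** Removes every gauge registered under the given tag; returns the number removed. */
        int unregisterGauges(String tag) {
            int removed = 0;
            for (Iterator<TaggedGauge> it = gauges.values().iterator(); it.hasNext();) {
                if (it.next().tag.equals(tag)) {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        }

        int size() {
            return gauges.size();
        }
    }

    /** Stand-in for a core: registers a gauge that captures {@code this}. */
    static final class Core implements AutoCloseable {
        private static final AtomicLong INSTANCES = new AtomicLong();
        final String name;
        final String tag;
        private final Registry registry;

        Core(String name, Registry registry) {
            this.name = name;
            this.tag = name + "#" + INSTANCES.incrementAndGet(); // unique per instance
            this.registry = registry;
            // The lambda holds a strong reference to this instance until it is unregistered.
            registry.registerGauge("CORE.description", tag, () -> describe());
        }

        String describe() {
            return "core " + name + " (" + tag + ")";
        }

        @Override
        public void close() {
            registry.unregisterGauges(tag);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        Core old = new Core("core0", registry);
        Core fresh = new Core("core0", registry); // same name, new instance and tag
        old.close();                              // must not touch the gauge owned by 'fresh'
        System.out.println("gauges after closing old instance: " + registry.size());
        fresh.close();
        System.out.println("gauges after closing new instance: " + registry.size());
    }
}
```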
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416355#comment-16416355 ]

Andrzej Bialecki commented on SOLR-11882:
--
bq. Could we introspect the impl and do the right thing with new impls that take the new param?

That would be exceedingly messy - this method is called in many components and from different contexts (e.g. most but not all mbeans are initialized in SolrCore, but handlers are also initialized in CoreContainer, some components initialize their own sub-components, etc.).

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: master (8.0)
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416335#comment-16416335 ]

Mark Miller commented on SOLR-11882:

bq. definitely not for 7.3.

Could we introspect the impl and do the right thing with new impls that take the new param?

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: master (8.0)
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416320#comment-16416320 ]

Andrzej Bialecki commented on SOLR-11882:
--
[~romseygeek] The current patch is broken (Solr silently loses some metrics from active cores .. oops). I'm preparing a new patch that is conceptually simpler and appears to be working well. However, this new fix requires changing the API of {{SolrMetricProducer}} (a new parameter in the {{initializeMetrics(...)}} method), so I think it's suitable only for 8.0 - definitely not for 7.3.

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: master (8.0)
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415849#comment-16415849 ]

Erick Erickson commented on SOLR-11882:
---
[~romseygeek] This has been around since at least 6.4, I believe. Plus, it's rather obscure.

In the normal course of events, we don't close cores _and leave them closed_. If we re-open a core, the orphan reference is re-assigned. So if I open/close/open/close the same core a zillion times, I only have one SolrCore object.

It manifests itself if someone is using "transient cores". In that case, if the transient core cache is capped at, say, 10 and I have 100 transient cores, after I cycle through them all I'll have 90 "orphan" references. I'll never have more than that though. And never less.

For anyone running "stock" Solr, it won't show up. People will open all the cores at startup and keep them open (even if reopened) and won't have any orphans. I suppose if people are unloading cores it might occur as well, but I think that's rare.

All FYI to evaluate whether you want to put it in 7.3. For people affected it is indeed serious if they have a lot of cores.

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
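The arithmetic in this comment (100 transient cores, cache capped at 10, 90 orphans, never more and never less) can be checked with a toy simulation. This is a sketch under simplifying assumptions: the transient cache is modelled as a plain LRU map, the leaked registry reference as a second map keyed by core name, and cores are visited in a fixed round-robin order. None of the names below come from Solr.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class OrphanCountSketch {

    /** Minimal stand-in for a core instance. */
    static final class Core {
        final String name;
        boolean closed;

        Core(String name) {
            this.name = name;
        }
    }

    /**
     * Cycles through all cores 'passes' times and returns how many closed
     * core instances are still referenced by the (leaky) registry map.
     */
    static int orphansAfterCycling(int totalCores, int cacheSize, int passes) {
        // Stands in for the registry that keeps one reference per core name.
        Map<String, Core> registry = new HashMap<>();
        // Access-ordered LRU map standing in for the transient core cache.
        Map<String, Core> cache = new LinkedHashMap<String, Core>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Core> eldest) {
                if (size() > cacheSize) {
                    eldest.getValue().closed = true; // evicted cores are closed
                    return true;
                }
                return false;
            }
        };
        for (int pass = 0; pass < passes; pass++) {
            for (int i = 0; i < totalCores; i++) {
                String name = "core" + i;
                if (cache.get(name) == null) {
                    Core core = new Core(name);
                    registry.put(name, core); // re-opening re-assigns the old reference
                    cache.put(name, core);
                }
            }
        }
        int orphans = 0;
        for (Core core : registry.values()) {
            if (core.closed) {
                orphans++;
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        System.out.println(orphansAfterCycling(100, 10, 1));
        System.out.println(orphansAfterCycling(100, 10, 5));
    }
}
```

Because each re-open overwrites the previous reference for that core name, the orphan count settles at (total cores - cache size) regardless of how many passes are made, which matches the "never more, never less" observation.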
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415225#comment-16415225 ]

Andrzej Bialecki commented on SOLR-11882:
--
[~romseygeek] the patch appears to be working, but as I indicated above it's probably not the whole story, so let's wait for at least a few Jenkins builds to confirm that it doesn't break anything.

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415218#comment-16415218 ]

Alan Woodward commented on SOLR-11882:
--
This looks like a pretty serious bug. I'm going to build another RC for the 7.3.0 release once SOLR-12141 is in. Do you want to backport this fix as well, or does it need more time to bake?

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414314#comment-16414314 ]

Erick Erickson commented on SOLR-11882:
---
[~ab] LGTM, after I cycle through my tests (and do a GC), my SolrCore count now drops back to (non-transient-cores + transient-queue-size) which is what I'd hoped for.

Thanks!

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch,
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413869#comment-16413869 ]

Andrzej Bialecki commented on SOLR-11882:

Here's the setup that I used to test and verify this issue:

* created {{core0/conf .. core9/conf}} dirs under {{server/solr/}} and copied the {{_default}} configset to each of the conf dirs.
* created in each {{core0 .. core9}} dir a {{core.properties}} file containing a single line: {{transient=true}}
* modified {{server/solr/solr.xml}} to contain {{2}} under {{solr}} element
* ran {{bin/solr start}} and issued a simple query request to each of the cores, to force its loading (and unloading from the small cache)

After attaching a profiler I was able to verify that 10 instances of SolrCore indeed exist, all strongly referenced, and that forcing GC doesn't change this.

I attached a possible patch. It associates each Gauge with the SolrInfoBean that registered it, and then unregisters those gauge instances that correspond to the bean that is being closed (whether it's a SolrCore or another plugin).

There are a few things that I don't like about this patch, though: I used {{WeakReference}} to tell the JVM that it can garbage collect the lambdas as soon as their parent object is unreferenced, and I had to explicitly call unregistration in {{SolrCoreMetricManager.close()}}. Neither of these worked on its own, although I think the unregistration step should have. Only when both were used could I see that the references to old transient cores were indeed being released. So there's likely still some other factor at play here... but at least the patch can be used as a workaround.

> SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level.
Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
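The two ingredients described in the comment above (a weak reference to the owning component, plus explicit unregistration when that component closes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch: {{OwnerTrackingRegistry}}, {{registerGauge}}, {{unregisterGauges}} and the use of {{StringBuilder}} as a stand-in for a SolrCore are all hypothetical names chosen for the example, and no Solr or Dropwizard Metrics classes are used.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a metrics registry; this is not Solr's SolrMetricManager.
class OwnerTrackingRegistry {

    // One registered gauge. The owner is held only weakly, and the reader
    // receives the owner as an argument instead of capturing it, so the
    // registry by itself never keeps a closed component strongly reachable.
    private static final class Entry<T> {
        final WeakReference<T> owner;
        final Function<T, Object> reader;

        Entry(T owner, Function<T, Object> reader) {
            this.owner = new WeakReference<>(owner);
            this.reader = reader;
        }

        Object value() {
            T o = owner.get();
            return o == null ? null : reader.apply(o);
        }

        boolean staleOrOwnedBy(Object candidate) {
            T o = owner.get();
            return o == null || o == candidate;
        }
    }

    private final Map<String, Entry<?>> gauges = new ConcurrentHashMap<>();

    <T> void registerGauge(T owner, String name, Function<T, Object> reader) {
        gauges.put(name, new Entry<>(owner, reader));
    }

    Object read(String name) {
        Entry<?> e = gauges.get(name);
        return e == null ? null : e.value();
    }

    // Explicit unregistration, meant to be called from the owner's close().
    // Also sweeps entries whose owner has already been garbage collected.
    int unregisterGauges(Object owner) {
        int before = gauges.size();
        gauges.values().removeIf(e -> e.staleOrOwnedBy(owner));
        return before - gauges.size();
    }

    int size() {
        return gauges.size();
    }
}

class OwnerTrackingDemo {
    public static void main(String[] args) {
        OwnerTrackingRegistry registry = new OwnerTrackingRegistry();
        StringBuilder core0 = new StringBuilder("core0"); // stands in for a SolrCore
        StringBuilder core1 = new StringBuilder("core1");
        registry.registerGauge(core0, "core0.nameLength", sb -> sb.length());
        registry.registerGauge(core1, "core1.nameLength", sb -> sb.length());
        int removed = registry.unregisterGauges(core0); // core0 is being closed
        System.out.println(removed + " removed, " + registry.size()
            + " left (" + core1 + " is still open)");
    }
}
```

Passing the owner into the reader function, rather than letting a lambda capture it, is one way to make the weak reference meaningful; a gauge lambda that captures its core directly would keep the core reachable regardless of how the registry stores the owner.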
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402474#comment-16402474 ]

Erick Erickson commented on SOLR-11882:

[~ab] Here's a one-line fix that I don't particularly like but thought I'd add to the conversation. This is in SolrCores, almost at the very end of the file:

{{
  @Override
  public void update(Observable o, Object arg) {
    SolrCore core = (SolrCore) arg;
    // delete metrics specific to this core
    container.getMetricManager().removeRegistry(core.getCoreMetricManager().getRegistryName()); // this is the important bit.
    synchronized (modifyLock) {
      pendingCloses.add(core); // Essentially just queue this core up for closing.
      modifyLock.notifyAll(); // Wakes up closer thread too
    }
  }
}}

_Unloading_ a non-transient core doesn't have the same problem, since the line I stole is executed when unloading a core. Reloading a core (as you already pointed out) replaces the old reference with a new one, so that's no problem. Just closing a transient core is where the problem is, so this code is executed when a transient core is on its way to being closed rather than in the close code itself.

What I don't like about it is that it's rather loosely coupled with the close. By that I mean that if there's some other code somewhere that closes a core, _that_ code has to remember to do this too.

Anyway, I'll be happy to test anything else you come up with. It'll take me 10 minutes or so to see what the effects of any changes you want me to try are, at least as far as transient cores go.

> SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level.
Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
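To make the idea in the comment above concrete (drop a core's metrics registry at the moment the transient cache evicts it), here is a small self-contained model. It is only a sketch under assumptions: {{TransientCacheSketch}}, its methods, and the {{"solr.core."}} naming in {{registryName}} are hypothetical and chosen for illustration; the real SolrCores class uses an observer callback and a closer thread rather than a bare LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a transient-core cache; not Solr's SolrCores class.
class TransientCacheSketch {
    // registry name -> per-core metrics (a plain Object stands in for a registry)
    private final Map<String, Object> registries = new ConcurrentHashMap<>();
    private final LinkedHashMap<String, String> loadedCores;

    TransientCacheSketch(final int maxSize) {
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.loadedCores = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() <= maxSize) {
                    return false;
                }
                // Drop the evicted core's registry when it is queued for closing,
                // so the registry map no longer references the closed core.
                registries.remove(registryName(eldest.getKey()));
                return true;
            }
        };
    }

    static String registryName(String coreName) {
        return "solr.core." + coreName; // illustrative naming only
    }

    void load(String coreName) {
        registries.put(registryName(coreName), new Object());
        loadedCores.put(coreName, coreName);
    }

    boolean hasRegistry(String coreName) {
        return registries.containsKey(registryName(coreName));
    }

    int registryCount() {
        return registries.size();
    }
}

class TransientCacheDemo {
    public static void main(String[] args) {
        TransientCacheSketch cache = new TransientCacheSketch(2);
        for (int i = 0; i < 10; i++) {
            cache.load("core" + i); // touch core0 .. core9 in turn
        }
        System.out.println("registries retained: " + cache.registryCount());
    }
}
```

The sketch also shows the coupling concern raised in the comment: the cleanup lives in the eviction path, so any other code path that closes a core would need its own call to remove the registry.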
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399440#comment-16399440 ]

Erick Erickson commented on SOLR-11882:

[~ab] OK, I think the light finally dawned. We're talking about two different cases, and both have to be handled.

1> The transient core case, the one I started with. Here the core is closed and _may_, at some time in the near or far future, be opened again. For this case the patch from 28-Jan is probably almost fine, although there's still a (probably small but unacceptable) chance that a new version of the core would be opened before the closer thread got 'round to closing the old one.

2> Reopening a core, which is the case you're talking about in your comment of 1-Feb.

In <2> there's no problem with cores accumulating due to the reference in the metrics code, since they've already been released by the new assignment.

Does that make sense? And is there a good way other than inspection to test any fixes I make?

Thanks!

> SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348957#comment-16348957 ] ASF subversion and git services commented on SOLR-11882: Commit b586dca89ff0b7c365dcbb3e1e403adf477790b1 in lucene-solr's branch refs/heads/branch_7x from [~ab] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b586dca ] Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when closed" This reverts commit 2feb3e794a03e07fa1eee34188d667f24d357db5. > SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Fix For: 7.3 > > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348956#comment-16348956 ] ASF subversion and git services commented on SOLR-11882: Commit 83696042649e5c7460c47d0ca121c46a58d2fa54 in lucene-solr's branch refs/heads/branch_7x from [~ab] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8369604 ] Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when closed" This reverts commit a729fc83311a2f6426664d098d2a5920e2b62852. > SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Fix For: 7.3 > > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348931#comment-16348931 ] ASF subversion and git services commented on SOLR-11882: Commit b0b963c68e04a249b87d5b3ab70ade52d19d85ee in lucene-solr's branch refs/heads/master from [~ab] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b0b963c ] Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when closed" This reverts commit c724845fabcdbffe15ad78f5335c77cae0900194. > SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Fix For: 7.3 > > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348930#comment-16348930 ] ASF subversion and git services commented on SOLR-11882: Commit 8418081c4ae5bfe752938c1ae6db9cf5063c8e7f in lucene-solr's branch refs/heads/master from [~ab] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8418081 ] Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when closed" This reverts commit f0509c19c16ded1557f8d7168acb0b7faf926ab7. > SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Fix For: 7.3 > > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348848#comment-16348848 ] Erick Erickson commented on SOLR-11882: --- OK, let's revert this. I am traveling today, so feel free if you'd like. I'll get to it this weekend if you don't get to it. > SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Fix For: 7.3 > > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348833#comment-16348833 ]

Andrzej Bialecki commented on SOLR-11882:

It turns out that this fix is wrong... :(

The new section in {{SolrCoreMetricManager.close()}} causes the new instances of gauges to be closed: the new core is registered first (and registers new instances of metrics), and only then is the old one closed, so it closes the new metrics instead of the old ones.

One solution, which is more complicated than I'd like, is to use a subclass of Gauge that has a tag (the same as we do with MetricReporters) and remove instances only when the tag matches the one in the core that is being closed. The alternative is to revert this fix and see if there's something better that we could do here.

> SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Fix For: 7.3 > > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
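The ordering problem described in the comment above, and the tag-based remedy it proposes, can be illustrated with a small model. This is a sketch under assumptions, not the eventual Solr change: {{TaggedGaugeRegistry}}, {{register}}, {{unregisterByTag}} and the tag strings such as {{core0@1}} are hypothetical names invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry in which every gauge remembers which core instance
// registered it. Not Solr's API; the tag plays the role described in the thread.
class TaggedGaugeRegistry {

    private static final class TaggedGauge {
        final String tag;
        final Supplier<Object> supplier;

        TaggedGauge(String tag, Supplier<Object> supplier) {
            this.tag = tag;
            this.supplier = supplier;
        }
    }

    private final Map<String, TaggedGauge> gauges = new ConcurrentHashMap<>();

    // Registering under an existing name replaces the previous gauge,
    // which is what happens when a reloaded core re-registers its metrics.
    void register(String name, String tag, Supplier<Object> supplier) {
        gauges.put(name, new TaggedGauge(tag, supplier));
    }

    // Remove only the gauges whose tag matches the instance being closed.
    int unregisterByTag(String tag) {
        int before = gauges.size();
        gauges.values().removeIf(g -> g.tag.equals(tag));
        return before - gauges.size();
    }

    String tagOf(String name) {
        TaggedGauge g = gauges.get(name);
        return g == null ? null : g.tag;
    }
}

class TaggedGaugeDemo {
    public static void main(String[] args) {
        TaggedGaugeRegistry registry = new TaggedGaugeRegistry();
        registry.register("searcher.numDocs", "core0@1", () -> 10); // old instance
        registry.register("searcher.numDocs", "core0@2", () -> 12); // new instance registers first
        int removed = registry.unregisterByTag("core0@1");          // then the old instance closes
        System.out.println("removed by old close: " + removed
            + ", surviving tag: " + registry.tagOf("searcher.numDocs"));
    }
}
```

Without the tag check, the old instance's close would remove {{searcher.numDocs}} by name and take the new instance's gauge with it, which is the failure reported here.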
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342864#comment-16342864 ] ASF subversion and git services commented on SOLR-11882: Commit 2feb3e794a03e07fa1eee34188d667f24d357db5 in lucene-solr's branch refs/heads/branch_7x from Erick [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2feb3e7 ] SOLR-11882: SolrMetric registries retain references to SolrCores when closed (cherry picked from commit c724845) > SolrMetric registries retain references to SolrCores when closed > > > Key: SOLR-11882 > URL: https://issues.apache.org/jira/browse/SOLR-11882 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics, Server >Affects Versions: 7.1 >Reporter: Eros Taborelli >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, > SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, > solr.config.zip
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342863#comment-16342863 ] ASF subversion and git services commented on SOLR-11882: Commit a729fc83311a2f6426664d098d2a5920e2b62852 in lucene-solr's branch refs/heads/branch_7x from Erick [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a729fc8 ] SOLR-11882: SolrMetric registries retain references to SolrCores when closed (cherry picked from commit f0509c1)

> SolrMetric registries retain references to SolrCores when closed
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: metrics, Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
> *Description:*
> Our setup involves using a lot of small cores (possibly a hundred thousand), but working only on a few of them at any given time.
> We already followed all recommendations in this guide: [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no documents inside, the heap consumption went through the roof despite having set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in org.apache.solr.metrics.SolrMetricManager#registries that is never removed until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size = 512m) and made a report (attached) using Eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager should be removed, in the same way that the reporters for the core are closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully unload a transient core when it is closed due to the cache size.
> *Note:*
> The documentation mentions everywhere that the unused cores will be unloaded, but that is misleading, as the cores are never fully unloaded.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
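The retention described in the issue, and the requested cleanup on close, can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: `MetricManager`, `Core`, and `removeRegistry` are simplified stand-ins written for this illustration, not Solr's actual classes or the committed patch. The point is only that a gauge lambda registered by a core captures that core, so the per-core map entry keeps the core reachable until the entry is removed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class RegistryCleanupSketch {
    /** Hypothetical manager: one registry (metric name -> gauge) per core name. */
    static final class MetricManager {
        private final Map<String, Map<String, Supplier<Object>>> registries =
                new ConcurrentHashMap<>();

        Map<String, Supplier<Object>> registry(String coreName) {
            return registries.computeIfAbsent(coreName, k -> new ConcurrentHashMap<>());
        }

        void removeRegistry(String coreName) {
            registries.remove(coreName);
        }

        int registryCount() {
            return registries.size();
        }
    }

    /** Hypothetical transient core that registers one gauge about itself. */
    static final class Core implements AutoCloseable {
        private final String name;
        private final MetricManager metrics;
        private final byte[] heavyState = new byte[1024];

        Core(String name, MetricManager metrics) {
            this.name = name;
            this.metrics = metrics;
            // The lambda reads a field, so it captures "this": while the
            // registry entry exists, this Core stays strongly reachable.
            metrics.registry(name).put("stateBytes", () -> heavyState.length);
        }

        @Override
        public void close() {
            // Assumed cleanup step: drop the per-core registry when closing.
            metrics.removeRegistry(name);
        }
    }

    public static void main(String[] args) {
        MetricManager manager = new MetricManager();
        for (int i = 0; i < 10; i++) {
            Core core = new Core("core" + i, manager);
            core.close();
        }
        System.out.println("registries left after closing: " + manager.registryCount());
    }
}
```

Without the `removeRegistry` call in `close()`, the map would hold one entry per core ever opened, which matches the growth pattern the reporter saw in the heap dump.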
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342855#comment-16342855 ] ASF subversion and git services commented on SOLR-11882: Commit c724845fabcdbffe15ad78f5335c77cae0900194 in lucene-solr's branch refs/heads/master from Erick [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c724845 ] SOLR-11882: SolrMetric registries retain references to SolrCores when closed
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342856#comment-16342856 ] ASF subversion and git services commented on SOLR-11882: Commit d85a1666a18423eeeda83ca89ce4ab959ce39066 in lucene-solr's branch refs/heads/master from Erick [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d85a166 ] SOLR-11882: SolrMetric registries retain references to SolrCores when closed
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342853#comment-16342853 ] ASF subversion and git services commented on SOLR-11882: Commit f0509c19c16ded1557f8d7168acb0b7faf926ab7 in lucene-solr's branch refs/heads/master from Erick [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f0509c1 ] SOLR-11882: SolrMetric registries retain references to SolrCores when closed
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342852#comment-16342852 ] Erick Erickson commented on SOLR-11882: Patch with CHANGES.txt
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342313#comment-16342313 ] Erick Erickson commented on SOLR-11882:
Any reason _not_ to commit this? Otherwise I'll commit it this weekend, now that I have the right patch up there.
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338504#comment-16338504 ] Erick Erickson commented on SOLR-11882:
Oh, total bother. I put up a second copy of the _same_ hack patch rather than the one you coached me on; I'll put that one up shortly.
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337190#comment-16337190 ] Andrzej Bialecki commented on SOLR-11882:
{quote}[~ab] kindly provided this suggestion, I applied it{quote}
My suggestion was to replace Gauge metrics in the registry inside {{SolrCoreMetricManager.close()}} with their last primitive values (because most of these Gauges are created as lambdas and keep referencing SolrCore, whereas the values they produce don't reference the core). This way we would stop referencing SolrCore but still preserve a snapshot of the gauge values. Something like this:
{code}
// Replace each live gauge with a constant gauge holding its last value,
// so the registry no longer reaches the SolrCore through the lambda.
metricRegistry.getGauges().forEach((k, v) -> {
  Object val = v.getValue();
  metricRegistry.remove(k);
  metricRegistry.register(k, (Gauge<Object>) () -> val);
});
{code}
I'm surprised your patch works, because {{SolrCoreMetricManager.close()}} is called from inside {{SolrCore.close()}}, and calling {{SolrCore.close()}} here again should IMHO lead to a "Too many closes" exception...
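The gauge-snapshot idea from the comment above can be tried out without Solr or a metrics library. The sketch below is only an illustration under stated assumptions: `Gauge`, `Core`, and `snapshotGauges` are hypothetical stand-ins defined here, and a plain map plays the role of the metric registry. It shows that after the swap each gauge returns the value it had at snapshot time and no longer reads from the core.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class GaugeSnapshotSketch {
    /** Hypothetical stand-in for a metrics gauge: supplies one value. */
    interface Gauge<T> {
        T getValue();
    }

    /** Hypothetical stand-in for a core that owns some live state. */
    static final class Core {
        private final List<String> docs = new ArrayList<>();

        void add(String doc) {
            docs.add(doc);
        }

        int numDocs() {
            return docs.size();
        }
    }

    /**
     * Replaces every gauge with a constant gauge holding its last value.
     * The new lambda captures only the value, not the object that produced it.
     */
    static void snapshotGauges(Map<String, Gauge<?>> registry) {
        for (String name : new ArrayList<>(registry.keySet())) {
            Object last = registry.get(name).getValue();
            Gauge<Object> frozen = () -> last;
            registry.put(name, frozen);
        }
    }

    public static void main(String[] args) {
        Map<String, Gauge<?>> registry = new ConcurrentHashMap<>();
        Core core = new Core();
        core.add("doc-1");

        Gauge<Integer> live = core::numDocs; // captures the core
        registry.put("numDocs", live);

        snapshotGauges(registry);            // registry no longer reads from the core
        core.add("doc-2");

        System.out.println("live core: " + core.numDocs());
        System.out.println("snapshot gauge: " + registry.get("numDocs").getValue());
    }
}
```

The trade-off in this approach is that the registry keeps reporting stale values for a closed core; whether that is acceptable depends on how the metrics are consumed.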
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337104#comment-16337104 ] Eros Taborelli commented on SOLR-11882: [~erickerickson] yes, that is what we see.
[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed
[ https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336891#comment-16336891 ] Erick Erickson commented on SOLR-11882:
[~ab] kindly provided this suggestion, and I applied it:
1> It fixes the cores lingering around. As above, I had to stop indexing and force a GC to have the cores drop back to 4 in my test scenario. In "real" situations where you have hundreds or thousands of cores, I'd expect the number of references to peak somewhat above your cache size as some wait around for GC.
2> precommit passes.
3> Tests pass. I had one failure with AutoscalingHistoryHandlerTest, then 3 of 10 runs failed (beasting). However, 2 of 10 failed without this patch, so I don't think it's relevant.
What I have _not_ looked at yet is what happens when metrics are requested for non-resident cores, or whether having the cores come and go accumulates metrics over successive loads of the core.