[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423749#comment-16423749
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit 483914b6a4c5aaa163625169066e8c6bb3942566 in lucene-solr's branch 
refs/heads/branch_7x from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=483914b ]

SOLR-11882: SolrMetric registries retained references to SolrCores when closed.


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: SOLR-11882-7x.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> create-cores.zip, solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>
> *Description:*
> Our setup involves using a lot of small cores (possibly a hundred thousand), 
> but working only on a few of them at any given time.
> We already followed all recommendations in this guide: 
> [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no 
> documents inside, the heap consumption went through the roof despite having 
> set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we 
> have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in the 
> org.apache.solr.metrics.SolrMetricManager#registries that is never removed 
> until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the 
> ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size 
> = 512m) and made a report (attached) using Eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager 
> should be removed, in the same way that the reporters for the core are 
> closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully 
> unload a transient core when it is closed due to the cache size.
> *Note:*
> The documentation mentions everywhere that the unused cores will be unloaded, 
> but it's misleading as the cores are never fully unloaded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-29 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418820#comment-16418820
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

Patch for branch_7x that maintains API back-compatibility. It is 
functionally identical to the patch for master except for the changes in the 
{{SolrMetricProducer}} interface. The old method is marked here as deprecated, 
and the new method has a default implementation that calls the old one, so 
third-party components that implement only the old method will still be 
called correctly by Solr via the default implementation of the new method.
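
The back-compat pattern described above can be sketched roughly as follows. This is a minimal illustration only: {{MetricProducer}}, {{LegacyProducer}} and the String-only parameter lists are hypothetical stand-ins and are not the actual {{SolrMetricProducer}} signatures from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a deprecated method plus a delegating default method; names are hypothetical. */
class BackCompatSketch {

  /** Stand-in for a metrics producer interface (not Solr's real interface). */
  interface MetricProducer {
    /** Old method: kept so that existing third-party implementations still compile. */
    @Deprecated
    void initializeMetrics(String registry, String scope);

    /** New method with an extra tag parameter; by default it delegates to the old one. */
    default void initializeMetrics(String registry, String tag, String scope) {
      initializeMetrics(registry, scope);
    }
  }

  /** A "third-party" component that implements only the old method. */
  static class LegacyProducer implements MetricProducer {
    final List<String> calls = new ArrayList<>();

    @Override
    public void initializeMetrics(String registry, String scope) {
      calls.add(registry + "/" + scope);
    }
  }

  public static void main(String[] args) {
    LegacyProducer legacy = new LegacyProducer();
    MetricProducer producer = legacy;
    // The framework calls only the new method; the default forwards to the old one.
    producer.initializeMetrics("solr.core.core0", "core-instance-1", "searcher");
    System.out.println(legacy.calls);
  }
}
```

Components that do care about the tag would override the three-argument method directly; components that do not are left untouched.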

If there are no objections I'll commit this shortly.




[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417100#comment-16417100
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit 7260d9ce713b5f6378b97e4c64f3045eb62f98bd in lucene-solr's branch 
refs/heads/master from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7260d9c ]

SOLR-11882: SolrMetric registries retained references to SolrCores when closed.


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: master (8.0)
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, 
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-27 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416468#comment-16416468
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

Updated patch. This adds a SolrCore instance identifier (tag) to all gauges in 
a registry, which are then matched and removed when SolrCore is closed.

The size of the patch is partially caused by the change in 
{{SolrMetricProducer.initializeMetrics(...)}} and the need to pass around the 
SolrCore instance tag.
All unit tests pass, and the scenario described above also passes, i.e. it 
produces only 2 strongly referenced SolrCore objects.
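
As a rough illustration of the tagging idea (not the code from the patch), the sketch below stores an owner tag with every gauge and removes all gauges carrying a given tag when the owner is closed. {{TaggedGauge}}, {{Registry}} and {{removeByTag}} are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Sketch of tag-based gauge cleanup; all types here are hypothetical stand-ins. */
class TaggedGaugeSketch {

  /** A gauge that remembers which owner (e.g. a core instance) registered it. */
  static final class TaggedGauge {
    final String tag;
    final Supplier<Object> value;

    TaggedGauge(String tag, Supplier<Object> value) {
      this.tag = tag;
      this.value = value;
    }
  }

  /** Hypothetical registry: metric name -> tagged gauge. */
  static final class Registry {
    final Map<String, TaggedGauge> gauges = new ConcurrentHashMap<>();

    void register(String name, String tag, Supplier<Object> value) {
      gauges.put(name, new TaggedGauge(tag, value));
    }

    /** Removes every gauge registered under the given tag and returns how many were removed. */
    int removeByTag(String tag) {
      int before = gauges.size(); // not atomic with the removal; good enough for a sketch
      gauges.values().removeIf(g -> g.tag.equals(tag));
      return before - gauges.size();
    }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    Object closedCore = new Object(); // stands in for a SolrCore instance
    // The lambda captures closedCore, which is how a gauge can keep a core reachable.
    registry.register("SEARCHER.numDocs", "core0-instance-1", () -> closedCore.hashCode());
    registry.register("CORE.startTime", "core0-instance-1", () -> 0L);
    registry.register("CORE.startTime.other", "core1-instance-1", () -> 1L);

    System.out.println("removed: " + registry.removeByTag("core0-instance-1"));
    System.out.println("remaining: " + registry.gauges.keySet());
  }
}
```

Once the tagged gauges are gone, nothing in the registry captures the closed instance any more, so it becomes eligible for garbage collection.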




[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-27 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416355#comment-16416355
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

bq. Could we introspect the impl and do the right thing with new impls that 
take the new param?
That would be exceedingly messy. This method is called in many components and 
from different contexts (e.g. most but not all mbeans are initialized in 
SolrCore, but handlers are also initialized in CoreContainer, some components 
initialize their own sub-components, etc.).

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: master (8.0)
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, 
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416335#comment-16416335
 ] 

Mark Miller commented on SOLR-11882:


bq. definitely not for 7.3.

Could we introspect the impl and do the right thing with new impls that take 
the new param?




[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-27 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416320#comment-16416320
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

[~romseygeek] The current patch is broken (Solr silently loses some metrics 
from active cores .. oops). I'm preparing a new patch that is conceptually 
simpler and appears to be working well.

However, this new fix requires changing the API of {{SolrMetricProducer}} (new 
parameter in {{initializeMetrics(...)}} method) so I think it's suitable only 
for 8.0 - definitely not for 7.3.




[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-27 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415849#comment-16415849
 ] 

Erick Erickson commented on SOLR-11882:
---

[~romseygeek] This has been around since at least 6.4 I believe. Plus, it's 
rather obscure. In the normal course of events, we don't close cores _and leave 
them closed_.

If we re-open a core, the orphan reference is re-assigned. So if I 
open/close/open/close the same core a zillion times, I only have one SolrCore 
object.

It manifests itself if someone is using "transient cores". In that case, if 
the transient core cache is capped at, say, 10 and I have 100 transient cores, 
after I cycle through them all I'll have 90 "orphan" references. I'll never 
have more than that though. And never less.
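
The arithmetic above can be checked with a small simulation. This is only a model under stated assumptions: an LRU map stands in for the transient core cache, a plain map keyed by core name stands in for the metric registries, and plain objects stand in for SolrCore instances. {{countOrphans}} is a hypothetical helper, not Solr code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the leak: the registry keeps closed cores reachable. */
class OrphanCountSketch {

  /** Returns the number of closed cores still referenced by the (leaky) registry map. */
  static int countOrphans(int totalCores, int cacheSize, int passes) {
    // Registry keyed by core name: a re-opened core overwrites its previous entry.
    Map<String, Object> registry = new HashMap<>();
    // LRU cache of open transient cores; evicting an entry stands in for closing the core.
    Map<String, Object> openCores = new LinkedHashMap<String, Object>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        return size() > cacheSize; // note: nothing removes the entry from the registry
      }
    };
    for (int pass = 0; pass < passes; pass++) {
      for (int i = 0; i < totalCores; i++) {
        String name = "core" + i;
        if (openCores.get(name) != null) {
          continue; // already open, just touched in the LRU order
        }
        Object core = new Object(); // stands in for a freshly opened SolrCore
        openCores.put(name, core);
        registry.put(name, core);
      }
    }
    int orphans = 0;
    for (Map.Entry<String, Object> e : registry.entrySet()) {
      if (openCores.get(e.getKey()) != e.getValue()) {
        orphans++; // referenced by the registry but no longer open
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    System.out.println(countOrphans(100, 10, 1)); // prints 90
    System.out.println(countOrphans(100, 10, 5)); // prints 90: re-opening replaces the old reference
  }
}
```

Because the registry is keyed by name, further passes replace orphans rather than adding to them, which is why the count stays at total cores minus cache size.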

For anyone running "stock" Solr, it won't show up. People will open all the 
cores at startup and keep them open (even if reopened) and won't have any 
orphans.

I suppose if people are unloading cores it might occur as well, but I think 
that's rare.

All FYI to evaluate whether you want to put it in 7.3. For people affected it 
is indeed serious if they have a lot of cores.

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, 
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-27 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415225#comment-16415225
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

[~romseygeek] the patch appears to be working but as I indicated above it's 
probably not the whole story, so let's wait at least for a few jenkins builds 
to confirm that it doesn't break anything.




[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-27 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415218#comment-16415218
 ] 

Alan Woodward commented on SOLR-11882:
--

This looks like a pretty serious bug.  I'm going to build another RC for the 
7.3.0 release once SOLR-12141 is in. Do you want to backport this fix as well, 
or does it need more time to bake?




[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-26 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414314#comment-16414314
 ] 

Erick Erickson commented on SOLR-11882:
---

[~ab] LGTM, after I cycle through my tests (and do a GC), my SolrCore count now 
drops back to (non-transient-cores + transient-queue-size) which is what I'd 
hoped for.

Thanks!




[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-26 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413869#comment-16413869
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

Here's the setup that I used to test and verify this issue:
* created {{core0/conf .. core9/conf}} dirs under {{server/solr/}} and copied 
the {{_default}} configset to each of the conf dirs.
* created in each {{core0 .. core9}} dir a {{core.properties}} file containing 
a single line: {{transient=true}}
* modified {{server/solr/solr.xml}} to contain {{<int name="transientCacheSize">2</int>}} under {{solr}} element
* ran {{bin/solr start}} and issued a simple query request to each of the 
cores, to force its loading (and unloading from the small cache)
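The on-disk part of this setup could be scripted roughly as follows. This is only a sketch: {{TransientCoreSetup}} and its methods are made-up names, not part of Solr, and the configset location ({{server/solr/configsets/_default/conf}}) is an assumption about the local install. Editing {{solr.xml}}, starting Solr and querying the cores are still done by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Solr: lays out N transient cores on disk. */
final class TransientCoreSetup {

  /** Creates core0..core(count-1) under solrHome, each with a conf dir and a core.properties file. */
  static void createCores(Path solrHome, Path configsetConf, int count) throws IOException {
    for (int i = 0; i < count; i++) {
      Path coreDir = solrHome.resolve("core" + i);
      copyTree(configsetConf, coreDir.resolve("conf"));
      // A single line marks the core as transient, as in the steps above.
      Files.write(coreDir.resolve("core.properties"),
          "transient=true\n".getBytes(StandardCharsets.UTF_8));
    }
  }

  /** Recursively copies a directory tree; fails if a target file already exists. */
  private static void copyTree(Path from, Path to) throws IOException {
    try (Stream<Path> paths = Files.walk(from)) {
      paths.forEach(src -> {
        Path dest = to.resolve(from.relativize(src).toString());
        try {
          if (Files.isDirectory(src)) {
            Files.createDirectories(dest);
          } else {
            Files.createDirectories(dest.getParent());
            Files.copy(src, dest);
          }
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumed paths, relative to the Solr install directory.
    Path solrHome = Paths.get("server/solr");
    createCores(solrHome, solrHome.resolve("configsets/_default/conf"), 10);
  }
}
```

Run from the Solr install directory, this would produce {{core0 .. core9}} with the layout described in the first two bullets.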

After attaching a profiler I was able to verify that indeed 10 instances of 
SolrCore exist, all strongly referenced, and forcing GC doesn't affect this.

I attached a possible patch - it associates each Gauge with the SolrInfoBean 
that registered it, and then unregisters those gauge instances that correspond 
to the bean that is being closed (whether it's a SolrCore or another plugin).

There are a few things that I don't like about this patch, though: I used 
{{WeakReference}} to tell the JVM that it can garbage collect the lambdas as 
soon as their parent object is unreferenced, and I had to explicitly call 
unregistration in {{SolrCoreMetricManager.close()}}. Neither of these worked on 
its own, although I think the unregistration step should have - only when both 
were used could I see that the references to old transient cores were indeed 
being released. So there's likely still some other factor at play here... but 
at least the patch can be used as a workaround.
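
To make the two ingredients concrete, here is a minimal, self-contained sketch. It is not the attached patch: {{OwnedGaugeRegistry}}, {{register}} and {{unregisterOwnedBy}} are hypothetical names, and a plain map of suppliers stands in for the real metric registry. Each gauge reaches its owner only through a {{WeakReference}}, and the owner is also what explicit unregistration matches on.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for a metric registry; not Solr's or Dropwizard's API. */
final class OwnedGaugeRegistry {

  /** A gauge that remembers its owner without keeping it strongly reachable. */
  private static final class OwnedGauge<T> implements Supplier<Object> {
    final WeakReference<T> owner;
    // The reader must not capture the owner itself, or the weak reference is pointless.
    final Function<T, Object> reader;

    OwnedGauge(T owner, Function<T, Object> reader) {
      this.owner = new WeakReference<>(owner);
      this.reader = reader;
    }

    @Override
    public Object get() {
      T o = owner.get();
      return o == null ? null : reader.apply(o); // null once the owner has been collected
    }
  }

  private final Map<String, OwnedGauge<?>> gauges = new ConcurrentHashMap<>();

  /** Registers a gauge and associates it with the object that owns it. */
  <T> void register(String name, T owner, Function<T, Object> reader) {
    gauges.put(name, new OwnedGauge<>(owner, reader));
  }

  /** Explicit step on close: drops every gauge registered by the closing owner. */
  int unregisterOwnedBy(Object closing) {
    int before = gauges.size();
    gauges.values().removeIf(g -> g.owner.get() == closing);
    return before - gauges.size(); // exact only without concurrent modification
  }

  Object value(String name) {
    OwnedGauge<?> g = gauges.get(name);
    return g == null ? null : g.get();
  }

  int size() {
    return gauges.size();
  }
}
```

In this toy model either mechanism alone would be enough to release the owner, so it does not explain why both were needed in the real code.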


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402474#comment-16402474
 ] 

Erick Erickson commented on SOLR-11882:
---

[~ab] Here's a one-line fix that I don't particularly like but thought I'd add 
to the conversation:

This is in SolrCores, almost at the very end of the file:
{{
  @Override
  public void update(Observable o, Object arg) {
    SolrCore core = (SolrCore) arg;
    // Delete metrics specific to this core. This is the important bit.
    container.getMetricManager().removeRegistry(core.getCoreMetricManager().getRegistryName());

    synchronized (modifyLock) {
      pendingCloses.add(core); // Essentially just queue this core up for closing.
      modifyLock.notifyAll(); // Wakes up closer thread too
    }
  }
}}

_Unloading_ a non-transient core doesn't have the same problem since the line I 
stole is executed when unloading a core. Reloading a core (as you already 
pointed out) replaces the old reference with a new one so that's no problem.

Just closing a transient core is where the problem is, so this code is executed 
when a transient core is on its way to being closed rather than in the close 
code itself.

What I don't like about it is that it's rather loosely coupled with the close. 
By that I mean that if there's some other code somewhere that closes a core, 
_that_ code has to remember to do this too.
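
The coupling concern can be shown with a toy model. Nothing below is Solr code: {{LooseCouplingDemo}}, its nested {{Core}} class and the {{solr.core.}} registry-name prefix are illustrative assumptions, and a plain map stands in for the metric registries.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model, not Solr code: a registry map plus two ways a core can end up closed. */
final class LooseCouplingDemo {

  static final class Core {
    final String registryName;
    boolean closed;

    Core(String name) {
      this.registryName = "solr.core." + name; // assumed naming, for illustration only
    }

    void close() {
      closed = true; // close() itself knows nothing about metrics
    }
  }

  final Map<String, Object> registries = new ConcurrentHashMap<>();
  final Queue<Core> pendingCloses = new ArrayDeque<>();

  Core open(String name) {
    Core core = new Core(name);
    registries.put(core.registryName, core); // the registry keeps the core reachable
    return core;
  }

  /** Path 1: eviction from the transient cache, mirroring the update() hook above. */
  void onTransientEviction(Core core) {
    registries.remove(core.registryName); // the extra cleanup step
    pendingCloses.add(core);
  }

  /** Stand-in for the closer thread: closes everything that was queued. */
  void drainPendingCloses() {
    Core core;
    while ((core = pendingCloses.poll()) != null) {
      core.close();
    }
  }

  /** Path 2: some other caller closes the core directly and skips the cleanup. */
  void closeDirectly(Core core) {
    core.close();
  }
}
```

A core closed through {{closeDirectly}} stays referenced by {{registries}}, which is exactly the leak the eviction hook avoids.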

Anyway, I'll be happy to test anything else you come up with. It'll take me 10 
minutes or so to see what the effects of any changes you want me to try are, at 
least as far as transient cores go.

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399440#comment-16399440
 ] 

Erick Erickson commented on SOLR-11882:
---

[~ab] OK, I think the light finally dawned. We're talking about two different 
cases and they both have to be handled.

1> transient core case, the one I started with. In this case, the core is 
closed out and _may_, some time in the near or far future, be opened again. In 
this case the patch from 28-Jan is probably almost fine, although there's still 
a (probably small but unacceptable) chance that a new version of the core would 
be opened before the closer thread got 'round to closing the old one.

2> reopening a core, which is the case you're talking about in your comment of 
1-Feb.

In <2> there's no problem with cores accumulating due to the reference in the 
metrics code since they've been released by the new assignment already.

Does that make sense?

And is there a good way other than inspection to test any fixes I make?

Thanks!

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-02-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348957#comment-16348957
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit b586dca89ff0b7c365dcbb3e1e403adf477790b1 in lucene-solr's branch 
refs/heads/branch_7x from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b586dca ]

Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when 
closed"

This reverts commit 2feb3e794a03e07fa1eee34188d667f24d357db5.


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 7.3
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-02-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348956#comment-16348956
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit 83696042649e5c7460c47d0ca121c46a58d2fa54 in lucene-solr's branch 
refs/heads/branch_7x from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8369604 ]

Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when 
closed"

This reverts commit a729fc83311a2f6426664d098d2a5920e2b62852.


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 7.3
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-02-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348931#comment-16348931
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit b0b963c68e04a249b87d5b3ab70ade52d19d85ee in lucene-solr's branch 
refs/heads/master from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b0b963c ]

Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when 
closed"

This reverts commit c724845fabcdbffe15ad78f5335c77cae0900194.


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 7.3
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-02-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348930#comment-16348930
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit 8418081c4ae5bfe752938c1ae6db9cf5063c8e7f in lucene-solr's branch 
refs/heads/master from [~ab]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8418081 ]

Revert "SOLR-11882: SolrMetric registries retain references to SolrCores when 
closed"

This reverts commit f0509c19c16ded1557f8d7168acb0b7faf926ab7.


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 7.3
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-02-01 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348848#comment-16348848
 ] 

Erick Erickson commented on SOLR-11882:
---

OK, let's revert this. I am traveling today, so feel free to do it if you'd 
like. Otherwise I'll get to it this weekend.

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 7.3
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-02-01 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348833#comment-16348833
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

It turns out that this fix is wrong... :(

The new section in {{SolrCoreMetricManager.close()}} causes the new instances 
of gauges to be closed, because the new core is registered first (and registers 
new instances of metrics) and only then is the old one closed - and it closes 
the new metrics instead of the old ones…
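
The ordering problem can be reproduced in a few lines. This is a toy model, not Solr code: {{ReloadOrderingDemo}}, its inner {{Core}} class and the gauge name {{core.instanceId}} are hypothetical, and a plain map keyed by metric name stands in for the shared registry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model, not Solr code: old and new core instances share one name-keyed registry. */
final class ReloadOrderingDemo {

  final Map<String, Supplier<String>> registry = new HashMap<>();

  final class Core {
    final String id;

    Core(String id) {
      this.id = id;
      // Registering under an existing name replaces the previous core's gauge.
      registry.put("core.instanceId", () -> this.id);
    }

    /** The flawed close: removes by name only, whoever registered the gauge. */
    void close() {
      registry.remove("core.instanceId");
    }
  }

  Core open(String id) {
    return new Core(id);
  }

  /** Reload in the order described above: register the new core, then close the old one. */
  Core reload(Core old) {
    Core fresh = new Core(old.id + "-reloaded");
    old.close(); // removes the gauge that 'fresh' has just registered
    return fresh;
  }
}
```

After {{reload}} the registry no longer contains {{core.instanceId}} at all, although the new core is live.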

One solution, which is more complicated than I’d like, is to use a subclass of 
Gauge that has a tag (the same as we do with MetricReporters) and remove 
instances only when the tag matches the one in the core that is being closed. 
The alternative is to revert this fix and see if there’s something better that 
we could do here.
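
A rough sketch of the tag idea, under stated assumptions: {{TaggedGauges}} and its methods are invented for illustration, the tag is an arbitrary per-instance string, and a wrapper object is used here instead of an actual Gauge subclass.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch, not Solr code: gauges carry the tag of the core that registered them. */
final class TaggedGauges {

  /** A gauge plus the tag of the core instance that registered it. */
  static final class TaggedGauge {
    final String tag;
    final Supplier<Object> value;

    TaggedGauge(String tag, Supplier<Object> value) {
      this.tag = tag;
      this.value = value;
    }
  }

  private final Map<String, TaggedGauge> registry = new ConcurrentHashMap<>();

  /** Registers (or replaces) the gauge stored under this name. */
  void register(String name, String tag, Supplier<Object> value) {
    registry.put(name, new TaggedGauge(tag, value));
  }

  /** Removes only the gauges whose tag matches the core that is being closed. */
  void unregisterByTag(String tag) {
    registry.values().removeIf(g -> g.tag.equals(tag));
  }

  /** Returns the tag of the gauge currently registered under this name, or null. */
  String tagOf(String name) {
    TaggedGauge g = registry.get(name);
    return g == null ? null : g.tag;
  }
}
```

With this shape, closing the old core after a reload leaves the new core's gauges alone, because their tag no longer matches.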

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 7.3
>
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342864#comment-16342864
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit 2feb3e794a03e07fa1eee34188d667f24d357db5 in lucene-solr's branch 
refs/heads/branch_7x from Erick
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2feb3e7 ]

SOLR-11882: SolrMetric registries retain references to SolrCores when closed

(cherry picked from commit c724845)


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>
> *Description:*
> Our setup involves using a lot of small cores (possibly hundred thousand), 
> but working only on a few of them at any given time.
> We already followed all recommendations in this guide: 
> [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no 
> documents inside, the heap consumption went through the roof despite having 
> set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we 
> have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in the 
> org.apache.solr.metrics.SolrMetricManager#registries that is never removed 
> until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the 
> ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size 
> = 512m) and made a report (attached) using Eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager 
> should be removed, in the same fashion the reporters for the core are also 
> closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully 
> unload a transient core when closed due to the cache size.
> *Note:*
> The documentation mentions everywhere that the unused cores will be unloaded, 
> but it's misleading as the cores are never fully unloaded.
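
The leak described above can be modelled with a small, self-contained sketch.
Everything in it is hypothetical (Core, TransientCache, the registry map); none
of it is Solr code. It assumes an LRU cache that closes the cores it evicts,
plus a registry whose entries capture each core and are never removed, and it
borrows the cache size of 5 from the smaller-scale reproduction.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of the reported leak; these are not Solr classes. */
final class TransientCoreLeakSketch {

  /** Stand-in for a core: just a name and a closed flag. */
  static final class Core {
    final String name;
    boolean closed;
    Core(String name) { this.name = name; }
  }

  /** LRU cache that closes and evicts the eldest core once maxSize is exceeded. */
  static final class TransientCache extends LinkedHashMap<String, Core> {
    private final int maxSize;

    TransientCache(int maxSize) {
      super(16, 0.75f, true); // access order gives LRU behaviour
      this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Core> eldest) {
      if (size() > maxSize) {
        eldest.getValue().closed = true; // the core is closed on eviction
        return true;
      }
      return false;
    }
  }

  /** Loads coreCount cores and returns how many closed cores the registry still references. */
  static long countRetainedClosedCores(int coreCount, int cacheSize) {
    TransientCache cache = new TransientCache(cacheSize);
    // Assumed shape of the problem: one registry entry per core, never removed.
    Map<String, Supplier<Core>> registry = new HashMap<>();
    for (int i = 0; i < coreCount; i++) {
      Core core = new Core("core" + i);
      cache.put(core.name, core);
      registry.put(core.name, () -> core); // the lambda captures the core
    }
    return registry.values().stream().filter(s -> s.get().closed).count();
  }

  public static void main(String[] args) {
    // 20 cores through a cache of 5: 15 closed cores remain reachable.
    System.out.println(countRetainedClosedCores(20, 5)); // prints 15
  }
}
```

In this model the cache stays bounded, but the number of closed cores that
remain reachable grows with every core that has ever been loaded, which matches
the heap growth in the report.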






[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342863#comment-16342863
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit a729fc83311a2f6426664d098d2a5920e2b62852 in lucene-solr's branch 
refs/heads/branch_7x from Erick
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a729fc8 ]

SOLR-11882: SolrMetric registries retain references to SolrCores when closed

(cherry picked from commit f0509c1)


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342855#comment-16342855
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit c724845fabcdbffe15ad78f5335c77cae0900194 in lucene-solr's branch 
refs/heads/master from Erick
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c724845 ]

SOLR-11882: SolrMetric registries retain references to SolrCores when closed


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342856#comment-16342856
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit d85a1666a18423eeeda83ca89ce4ab959ce39066 in lucene-solr's branch 
refs/heads/master from Erick
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d85a166 ]

SOLR-11882: SolrMetric registries retain references to SolrCores when closed


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342853#comment-16342853
 ] 

ASF subversion and git services commented on SOLR-11882:


Commit f0509c19c16ded1557f8d7168acb0b7faf926ab7 in lucene-solr's branch 
refs/heads/master from Erick
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f0509c1 ]

SOLR-11882: SolrMetric registries retain references to SolrCores when closed


> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-28 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342852#comment-16342852
 ] 

Erick Erickson commented on SOLR-11882:
---

Patch with CHANGES.txt

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-27 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342313#comment-16342313
 ] 

Erick Erickson commented on SOLR-11882:
---

Any reason _not_ to commit this? Otherwise I'll commit it this weekend, 
now that I have the right patch up there.

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> create-cores.zip, solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-24 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338504#comment-16338504
 ] 

Erick Erickson commented on SOLR-11882:
---

Oh, total bother. I put up a second copy of the _same_ hack patch rather than 
the one you coached me on. I'll put that one up shortly.

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, 
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-24 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337190#comment-16337190
 ] 

Andrzej Bialecki  commented on SOLR-11882:
--

{quote}[~ab] kindly provided this suggestion, I applied it
{quote}
My suggestion was to replace Gauge metrics in the registry inside 
{{SolrCoreMetricManager.close()}} with their last primitive values (because 
most of these Gauges are created as lambdas and keep referencing SolrCore, 
whereas values they produce don't reference the core) - this way we would stop 
referencing SolrCore but still preserve a snapshot of gauge values. Something 
like this:
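{code}
// Snapshot each gauge's last value, then re-register it as a constant gauge
// so that the registry no longer references the SolrCore.
metricRegistry.getGauges().forEach((k, v) -> {
  Object val = v.getValue();
  metricRegistry.remove(k);
  metricRegistry.register(k, (Gauge<Object>) () -> val);
});
{code}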
I'm surprised your patch works, because {{SolrCoreMetricManager.close()}} is 
called from inside {{SolrCore.close()}}, and calling {{SolrCore.close()}} here 
again should IMHO lead to "Too many closes" exception...
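
A self-contained sketch of the same idea, for illustration only: the Gauge
interface, Core class and map-based registry below are hypothetical stand-ins
rather than the Solr or Dropwizard Metrics types, and freezeGauges is an
assumed helper name. It shows that once each gauge is replaced by a constant
holding its last value, the registry keeps the snapshot but no longer follows
the core.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins; not Solr or Dropwizard Metrics classes. */
final class GaugeSnapshotSketch {

  /** Minimal gauge: a functional interface with a single value supplier. */
  interface Gauge<T> {
    T getValue();
  }

  /** Stand-in for a heavyweight core that should become collectible once closed. */
  static final class Core {
    private final String name;
    private int numDocs;
    Core(String name, int numDocs) { this.name = name; this.numDocs = numDocs; }
    String getName() { return name; }
    int getNumDocs() { return numDocs; }
    void setNumDocs(int n) { numDocs = n; }
  }

  /** Registers gauges whose lambdas capture the core, which keeps it reachable. */
  static void registerGauges(Map<String, Gauge<?>> registry, Core core) {
    registry.put(core.getName() + ".numDocs", () -> core.getNumDocs());
    registry.put(core.getName() + ".name", () -> core.getName());
  }

  /** Replaces every gauge with a constant gauge holding its last value. */
  static void freezeGauges(Map<String, Gauge<?>> registry) {
    // Iterate over a copy of the keys so the map can be updated in the loop.
    for (String key : new ArrayList<>(registry.keySet())) {
      Object last = registry.get(key).getValue();
      // The new lambda captures only the value, not the core.
      registry.put(key, () -> last);
    }
  }

  public static void main(String[] args) {
    Map<String, Gauge<?>> registry = new ConcurrentHashMap<>();
    Core core = new Core("core1", 42);
    registerGauges(registry, core);
    freezeGauges(registry);
    core.setNumDocs(100); // later changes are not visible through the registry
    System.out.println(registry.get("core1.numDocs").getValue()); // prints 42
  }
}
```

The trade-off in this approach is that metrics for a closed core stay readable
as a frozen snapshot instead of disappearing from the registry.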

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, 
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-24 Thread Eros Taborelli (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337104#comment-16337104
 ] 

Eros Taborelli commented on SOLR-11882:
---

[~erickerickson] yes, that is what we see.

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, 
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>



[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-01-23 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336891#comment-16336891
 ] 

Erick Erickson commented on SOLR-11882:
---

[~ab] kindly provided this suggestion. I applied it, and:

1> it fixes the cores lingering around. As above, I had to stop indexing and 
force a GC to have the cores drop back to 4 in my test scenario. In "real" 
situations where you have hundreds/thousands of cores, I'd expect the number of 
references to peak somewhat above your cache size as some wait around for GC.

2> precommit passes

3> tests pass. I had one failure with AutoscalingHistoryHandlerTest, then 3 of 
10 failed (beasting). However, 2 of 10 failed without this patch so I don't 
think it's relevant.

What I have _not_ looked at yet is what happens when metrics are requested for 
non-resident cores, or whether having the cores come and go accumulates metrics 
over successive loads of the core.

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, create-cores.zip, 
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>