[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402474#comment-16402474
 ] 

Erick Erickson commented on SOLR-11882:
---------------------------------------

[~ab] Here's a one-line fix that I don't particularly like but thought I'd add 
to the conversation:

This is in SolrCores, almost at the very end of the file:
{{
  @Override
  public void update(Observable o, Object arg) {
    SolrCore core = (SolrCore) arg;
    // delete metrics specific to this core
    // this is the important bit.
    container.getMetricManager().removeRegistry(core.getCoreMetricManager().getRegistryName());

    synchronized (modifyLock) {
      pendingCloses.add(core); // Essentially just queue this core up for closing.
      modifyLock.notifyAll(); // Wakes up closer thread too
    }
  }
}}

_Unloading_ a non-transient core doesn't have the same problem since the line I 
stole is executed when unloading a core. Reloading a core (as you already 
pointed out) replaces the old reference with a new one so that's no problem.

Just closing a transient core is where the problem is, so this code is executed 
when a transient core is on its way to being closed rather than in the close 
code itself.

What I don't like about it is that it's rather loosely coupled with the close. 
By that I mean if some other code somewhere closes a core, _that_ code has to 
remember to do this too.
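
To illustrate the coupling concern, here is a minimal sketch of the alternative, 
where the registry cleanup lives in the close path itself so every caller gets 
it. This is not a proposed patch: {{MetricRegistries}}, {{Core}}, and 
{{CloseCouplingSketch}} are hypothetical stand-ins, not Solr classes, and the 
registry naming is made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo driver: two cores closed from two different call sites.
final class CloseCouplingSketch {
  public static void main(String[] args) {
    MetricRegistries metrics = new MetricRegistries();
    Core evicted = new Core("a", metrics); // e.g. closed by a cache-eviction path
    Core other = new Core("b", metrics);   // e.g. closed by some unrelated code

    evicted.close();
    other.close();

    // Neither call site had to remember the cleanup: close() did it.
    System.out.println(metrics.contains(evicted.registryName()));
    System.out.println(metrics.contains(other.registryName()));
  }
}

// Hypothetical stand-in for a metrics manager that maps names to registries.
final class MetricRegistries {
  private final Map<String, Object> registries = new ConcurrentHashMap<>();

  void register(String name, Object registry) {
    registries.put(name, registry);
  }

  void removeRegistry(String name) {
    registries.remove(name);
  }

  boolean contains(String name) {
    return registries.containsKey(name);
  }
}

// Hypothetical stand-in for a core whose registry entry references the core.
final class Core implements AutoCloseable {
  private final String registryName;
  private final MetricRegistries metrics;
  private boolean closed;

  Core(String name, MetricRegistries metrics) {
    this.registryName = "core." + name; // illustrative naming only
    this.metrics = metrics;
    // The map entry keeps this core reachable until it is removed.
    metrics.register(registryName, this);
  }

  String registryName() {
    return registryName;
  }

  @Override
  public void close() {
    if (closed) {
      return; // idempotent: a second close is a no-op
    }
    closed = true;
    // Cleanup is part of close(), so no caller can forget it.
    metrics.removeRegistry(registryName);
  }
}
```

In this shape any code that closes a core also drops the registry entry, at the 
cost of the close path needing access to the metrics manager.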

Anyway, I'll be happy to test anything else you come up with. It'll take me 10 
minutes or so to see what the effects of any changes you want me to try are, at 
least as far as transient cores go.

> SolrMetric registries retain references to SolrCores when closed
> ----------------------------------------------------------------
>
>                 Key: SOLR-11882
>                 URL: https://issues.apache.org/jira/browse/SOLR-11882
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics, Server
>    Affects Versions: 7.1
>            Reporter: Eros Taborelli
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>
> *Description:*
> Our setup involves using a lot of small cores (possibly a hundred thousand), 
> but working only on a few of them at any given time.
> We already followed all recommendations in this guide: 
> [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no 
> documents inside, the heap consumption went through the roof despite having 
> set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we 
> have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in the 
> org.apache.solr.metrics.SolrMetricManager#registries that is never removed 
> until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the 
> ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size 
> = 512m) and made a report (attached) using Eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager 
> should be removed, in the same fashion the reporters for the core are also 
> closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully 
> unload a transient core when closed due to the cache size.
> *Note:*
> The documentation mentions everywhere that the unused cores will be unloaded, 
> but it's misleading as the cores are never fully unloaded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
