[ https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143872#comment-15143872 ]
Gus Heck commented on SOLR-8349:
--------------------------------

Certainly we can punt on enabling this for Lucene-level things to start with, and then tackle that in SOLR-3443 or in a separate Lucene ticket if appropriate. I'm totally fine with that. My motivating use case that got me started on this was a SearchComponent anyway.

Now that I understand it, your idea is interesting. It generally sounds like it would allow a similar memory savings, though I had assumed promoting things to a higher class loader level would be undesirable. Here are some things that occurred to me as I pondered your suggestion...

# My (potentially erroneous) assumption is that the whole reason for the class loader separation is to allow long-term stability when loading/unloading cores, without retaining references to classes and objects that aren't needed anymore. In the extreme, if cores were loaded in the same class loader as the container, this issue would be less complex.
# Putting the entire complexity of the custom Filter/Analyzer etc. into the cache greatly increases the chance of a class loader memory leak. Minimizing the complexity of what's cached puts the programmer in the best position to ensure core-level classes don't become referenced by objects held in the container. Note that a follow-on to this (or perhaps something required for it?) might be to reference count and unload unused keys.
# That said, why would we do it one way for analysis classes and another for components? If your direction is selected for analysis classes, perhaps we should do components that way too?
# If we do that, what's left to be loaded in the core-level loader? The non-increasing set of classes never previously loaded as global, and whatever is not referenced by any component/analysis class, I guess...
# I'm a little concerned about how we will manage to automatically create appropriate keys in the cache.
The same analysis class or component may be configured multiple times, so we need a key that hashes the important configuration parameters to distinguish identical instances from variants. Automatic determination of "important" seems dicey. We could simply be pessimistic and use every configuration parameter we can find, but then we need to either know which fields are representative of configuration parameters (annotation? but then we're modifying Lucene again, drat) or intercept this information as we read the configuration, before we create the instance of the class. Do we have classes that configure complex sub-components and hold those as fields?

In SOLR-3443 I do the following to generate a key for the resource cache:

{code}
md5.update(cs.encode(dictionaryFiles).array());
md5.update(cs.encode(affixFile).array());
md5.update(cs.encode(String.valueOf(ignoreCase)).array());
md5.update(cs.encode(String.valueOf(longestOnly)).array());
// ** SNIP ** //
resourceKey = "org.apache.lucene.analysis.hunspell.dictionary." + configHash;
{code}

> Allow sharing of large in memory data structures across cores
> -------------------------------------------------------------
>
>                 Key: SOLR-8349
>                 URL: https://issues.apache.org/jira/browse/SOLR-8349
>             Project: Solr
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 5.3
>            Reporter: Gus Heck
>         Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large
> dictionary or other in-memory structure. When multiple cores are loaded with
> identical configurations utilizing this large in-memory structure, each core
> holds its own copy in memory. This has been noted in the past and a specific
> case reported in SOLR-3443. This patch provides a generalized capability, and
> if accepted, this capability will then be used to fix SOLR-3443.
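
For illustration only, the "pessimistic" key generation discussed in the comment above (hash every configuration parameter) could look roughly like the sketch below. It is not the SOLR-3443 or SOLR-8349 patch code: the class `ResourceKeys`, the method `keyFor`, and the parameter names used in the usage note are hypothetical, and it assumes the configuration has already been intercepted as a map of non-null strings before the instance is created.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Solr or Lucene. */
final class ResourceKeys {
    private ResourceKeys() {}

    /**
     * Builds a cache key from a prefix plus an MD5 hash of every
     * configuration parameter. Keys and values are assumed to be non-null.
     */
    static String keyFor(String prefix, Map<String, String> config) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        // Sort by parameter name so the order in the config does not affect the key.
        for (Map.Entry<String, String> e : new TreeMap<>(config).entrySet()) {
            update(md5, e.getKey());
            update(md5, e.getValue());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return prefix + hex;
    }

    /** Feeds a length-prefixed UTF-8 string into the digest so field boundaries stay unambiguous. */
    private static void update(MessageDigest md, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        md.update(bytes);
    }
}
```

A call such as `ResourceKeys.keyFor("org.apache.lucene.analysis.hunspell.dictionary.", config)` with hypothetical entries like `dictionary=en_US.dic` and `ignoreCase=true` yields the same key for identically configured instances and a different key when any value changes. Two choices here are additions of the sketch rather than anything taken from the patch: the length prefix keeps `"ab" + "c"` from hashing the same as `"a" + "bc"`, and `getBytes` is used instead of `cs.encode(...).array()` because a `ByteBuffer`'s backing array may be longer than the encoded content.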