[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

Gus Heck (JIRA) Thu, 11 Feb 2016 17:50:48 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143872#comment-15143872
 ]


Gus Heck commented on SOLR-8349:
--------------------------------

Certainly we can punt on enabling this for Lucene-level classes to start with, and 
then tackle that in SOLR-3443 or in a separate Lucene ticket if appropriate. 
I'm totally fine with that. The motivating use case that got me started on this 
was a SearchComponent anyway.

Now that I understand it, your idea is interesting. It sounds like it would 
allow similar memory savings, though I had assumed that promoting things to a 
higher class loader level would be undesirable. Here are some things that 
occurred to me as I pondered your suggestion...

# My (potentially erroneous) assumption is that the whole reason for the class 
loader separation is to allow long-term stability as cores are loaded and 
unloaded, without retaining references to classes and objects that are no 
longer needed. In the extreme, if cores were loaded in the same class loader as 
the container, this issue would be less complex.
# Putting the entire complexity of the custom Filter/Analyzer etc. into the 
cache greatly increases the chance of a class loader memory leak. Minimizing the 
complexity of what's cached puts the programmer in the best position to ensure 
that core-level classes don't become referenced by objects held in the 
container. Note that a follow-on to this (or perhaps something required for 
it?) might be to reference count cache entries and unload unused keys.
# That said, why would we do it one way for analysis classes and another for 
components? If your approach is chosen for analysis classes, perhaps we should 
handle components that way too?
# If we do that, what's left to be loaded in the core-level loader? I guess 
just the non-increasing set of classes never previously loaded globally, plus 
whatever is not referenced by any component/analysis class...
# I'm a little concerned about how we will automatically create appropriate 
cache keys. The same analysis class or component may be configured multiple 
times, so we need a key that hashes the important configuration parameters to 
distinguish identical instances from variants. Automatically determining what is 
"important" seems dicey. We could simply be pessimistic and use every 
configuration parameter we can find, but then we either need to know which 
fields represent configuration parameters (an annotation? but then we're 
modifying Lucene again, drat) or need to intercept this information as we read 
the configuration, before we create the instance of the class. Do we have 
classes that configure complex sub-components and hold those as fields?
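
To make the reference-counting idea concrete, here is a minimal Java sketch. It assumes a hypothetical container-level class, RefCountedResourceCache (not an existing Solr API), that hands out one shared instance per key and evicts it when the last core releases it. To avoid the class loader leak described above, cached values should only be of types loaded by the container (a plain int[] stands in for a large dictionary here), and keys are plain Strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical container-level cache (NOT an existing Solr API): shares one
 * instance of an expensive resource per key and drops it once no core holds it.
 */
public class RefCountedResourceCache<V> {

  // One entry per key: the shared value plus the number of current holders.
  private static final class Entry<T> {
    final T value;
    int refCount;

    Entry(T value) {
      this.value = value;
    }
  }

  private final Map<String, Entry<V>> entries = new HashMap<>();

  /** Returns the shared value for key, building it with loader on first use. */
  public synchronized V acquire(String key, Supplier<? extends V> loader) {
    Entry<V> e = entries.get(key);
    if (e == null) {
      e = new Entry<>(loader.get());
      entries.put(key, e);
    }
    e.refCount++;
    return e.value;
  }

  /** Called when a core unloads; the entry is evicted when the last holder releases it. */
  public synchronized void release(String key) {
    Entry<V> e = entries.get(key);
    if (e == null) {
      throw new IllegalStateException("release without acquire: " + key);
    }
    if (--e.refCount == 0) {
      entries.remove(key);
    }
  }

  /** Number of distinct resources currently held. */
  public synchronized int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    RefCountedResourceCache<int[]> cache = new RefCountedResourceCache<>();
    int[] a = cache.acquire("dict:en_US", () -> new int[1_000_000]); // core 1 builds it
    int[] b = cache.acquire("dict:en_US", () -> new int[1_000_000]); // core 2 reuses it
    System.out.println(a == b);       // true: one copy shared by both cores
    cache.release("dict:en_US");      // core 1 unloads
    System.out.println(cache.size()); // 1: core 2 still holds it
    cache.release("dict:en_US");      // core 2 unloads
    System.out.println(cache.size()); // 0: entry evicted, memory can be reclaimed
  }
}
```

Synchronizing on the whole cache keeps the sketch simple, but it means a slow loader (e.g. parsing a big dictionary) blocks other cores' acquire/release calls; whether that matters would depend on how often cores are loaded concurrently.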

In SOLR-3443 I do the following to generate a key for the resource cache: 

{code}
md5.update(cs.encode(dictionaryFiles).array());
md5.update(cs.encode(affixFile).array());
md5.update(cs.encode(String.valueOf(ignoreCase)).array());
md5.update(cs.encode(String.valueOf(longestOnly)).array());

// ** SNIP ** //

resourceKey = "org.apache.lucene.analysis.hunspell.dictionary." + configHash;
{code}
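
A self-contained variant of the key generation above, offered as a sketch only: ResourceKeys.resourceKey is a hypothetical helper, and passing the configuration as a Map<String, String> is an assumption (it corresponds to the "pessimistic" option of hashing every parameter). Parameters are sorted so that their order in the config doesn't change the key, and each field is length-prefixed so that, e.g., ("ab", "c") and ("a", "bc") hash differently. It also hashes exactly the UTF-8 bytes of each string; the ByteBuffer returned by Charset.encode(...) may have a backing array longer than its limit, so .array() can include unused trailing bytes. Map.of in main requires Java 9+.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: derives a cache key from a class name plus its configuration. */
public class ResourceKeys {

  // Length-prefix each field so concatenations of different fields can't collide.
  private static void updateField(MessageDigest md, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
    md.update(bytes);
  }

  /** Assumes non-null keys and values in config. */
  public static String resourceKey(String className, Map<String, String> config) {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5"); // every Java platform must support MD5
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    // Sort parameters so the key does not depend on configuration order.
    for (Map.Entry<String, String> p : new TreeMap<>(config).entrySet()) {
      updateField(md5, p.getKey());
      updateField(md5, p.getValue());
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
    }
    return className + "." + hex;
  }

  public static void main(String[] args) {
    String cls = "org.apache.lucene.analysis.hunspell.dictionary";
    // Parameter names and values below are illustrative only.
    String k1 = resourceKey(cls, Map.of("dictionary", "en_US.dic", "ignoreCase", "true"));
    String k2 = resourceKey(cls, Map.of("ignoreCase", "true", "dictionary", "en_US.dic"));
    String k3 = resourceKey(cls, Map.of("dictionary", "en_US.dic", "ignoreCase", "false"));
    System.out.println(k1.equals(k2)); // true: parameter order does not matter
    System.out.println(k1.equals(k3)); // false: a different value gives a different key
  }
}
```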

> Allow sharing of large in memory data structures across cores
> -------------------------------------------------------------
>
>                 Key: SOLR-8349
>                 URL: https://issues.apache.org/jira/browse/SOLR-8349
>             Project: Solr
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 5.3
>            Reporter: Gus Heck
>         Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in-memory structure, each core 
> holds its own copy in memory. This has been noted in the past, and a specific 
> case was reported in SOLR-3443. This patch provides a generalized capability, 
> and if accepted, this capability will then be used to fix SOLR-3443.



