Gregg Donovan created SOLR-3514:
-----------------------------------

             Summary: WeakHashMap in FileFloatSource's cache only cleaned by GC
                 Key: SOLR-3514
                 URL: https://issues.apache.org/jira/browse/SOLR-3514
             Project: Solr
          Issue Type: Bug
          Components: search
    Affects Versions: 3.6, 4.0
            Reporter: Gregg Donovan
            Priority: Minor


We've encountered GC spikes at Etsy a decent number of times after adding new
ExternalFileFields. I was always a little confused by this behavior (isn't it
just one big float[]? Why does that cause problems for the GC?), but looking at
the FileFloatSource code a little more carefully, I wonder if this is due to
using a WeakHashMap that is only cleaned by GC or by manual invocation of a
request handler.
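To illustrate the suspected behavior, here is a minimal Java sketch. It uses a hypothetical `FakeReader` in place of `IndexReader`. Closing a key does nothing to a `WeakHashMap`. The entry, including its large `float[]` value, stays until the key is no longer strongly reachable and a GC actually runs. `Reference.reachabilityFence` requires Java 9+.

```java
import java.lang.ref.Reference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheDemo {
    /** Hypothetical stand-in for an IndexReader; only a "closed" flag matters here. */
    public static final class FakeReader {
        boolean closed;
        void close() { closed = true; }
    }

    /** Caches a large float[] per reader, closes the reader, and reports the surviving entry count. */
    public static int entriesAfterClose() {
        Map<FakeReader, float[]> cache = new WeakHashMap<>();
        FakeReader reader = new FakeReader();
        cache.put(reader, new float[1_000_000]);

        reader.close(); // closing the reader does not touch the map
        System.gc();    // the key is still strongly reachable, so the GC cannot clear it

        int size = cache.size();
        // Keep the reader strongly reachable until after the check, as a live searcher would.
        Reference.reachabilityFence(reader);
        return size;
    }

    public static void main(String[] args) {
        System.out.println("entries after close: " + entriesAfterClose()); // 1
    }
}
```

The closed reader's values are released only when something else drops the last strong reference and a collection happens to run. That ties the eviction timing to GC activity rather than to the reader lifecycle.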

FileFloatSource stores a WeakHashMap keyed by {{IndexReader}}. A comment in the
[code|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java?revision=1310219&view=markup#l135]
says that the implementation is modeled after FieldCache. However,
FieldCacheImpl [adds listeners for IndexReader close events and uses those to
purge its caches|http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?revision=1342751&view=markup#l166].
Should we be doing the same in FileFloatSource?
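Here is a minimal sketch of the FieldCache-style alternative. It assumes a hypothetical `ClosableReader` with an `addClosedListener` hook, not Lucene's actual reader API. The cache keys stay weak, but on first load each reader gets a listener that evicts its entry as soon as the reader closes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

public class PurgingFloatCache {
    /** Hypothetical stand-in for IndexReader that notifies listeners when it is closed. */
    public static final class ClosableReader {
        private final List<Runnable> closedListeners = new ArrayList<>();

        public void addClosedListener(Runnable listener) { closedListeners.add(listener); }

        public void close() {
            for (Runnable listener : closedListeners) listener.run();
            closedListeners.clear();
        }
    }

    // Keys stay weak, so a reader that is never closed can still be collected.
    private final Map<ClosableReader, float[]> cache = new WeakHashMap<>();

    /** Returns the values for reader, loading them and registering a purge hook on first use. */
    public synchronized float[] get(ClosableReader reader, Function<ClosableReader, float[]> loader) {
        float[] values = cache.get(reader);
        if (values == null) {
            values = loader.apply(reader);
            cache.put(reader, values);
            // Evict as soon as the reader closes instead of waiting for the GC.
            reader.addClosedListener(() -> purge(reader));
        }
        return values;
    }

    public synchronized void purge(ClosableReader reader) { cache.remove(reader); }

    public synchronized int size() { return cache.size(); }

    public static void main(String[] args) {
        PurgingFloatCache cache = new PurgingFloatCache();
        ClosableReader oldReader = new ClosableReader();
        ClosableReader newReader = new ClosableReader();
        cache.get(oldReader, r -> new float[] {1f, 2f});
        cache.get(newReader, r -> new float[] {3f, 4f});
        oldReader.close(); // e.g. the old searcher is released after a commit
        System.out.println("entries after close: " + cache.size()); // 1
        System.out.println("new reader value: " + cache.get(newReader, r -> new float[] {0f})[0]);
    }
}
```

Eviction is per reader, so the entry loaded for the newly opened reader (typically during warmup) stays cached while the closed reader's float[] is released immediately.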

Attached is a mostly untested patch with a possible implementation. There are
probably better ways to do it (e.g. I don't love using another WeakHashMap),
but I found it tough to hook into the IndexReader lifecycle without (a) relying
on classes other than FileFloatSource, (b) changing the public API of
FileFloatSource, or (c) changing the implementation too much.
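Here is one hypothetical way to handle the registration without touching FileFloatSource's public API. This is an illustrative sketch, not the attached patch. A second weak map, used as a set, remembers which readers already carry a purge hook, so repeated lookups register at most one listener per reader. The names `CloseHookRegistry`, `ensureHook`, and `ClosableReader` are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;

public class CloseHookRegistry {
    /** Hypothetical reader stand-in that notifies listeners when closed. */
    public static final class ClosableReader {
        private final List<Runnable> closedListeners = new ArrayList<>();
        public void addClosedListener(Runnable listener) { closedListeners.add(listener); }
        public int listenerCount() { return closedListeners.size(); }
        public void close() { for (Runnable listener : closedListeners) listener.run(); }
    }

    // The "other" WeakHashMap, used as a set: being weak, it never keeps a reader alive by itself.
    private final Set<ClosableReader> hooked =
        Collections.newSetFromMap(new WeakHashMap<ClosableReader, Boolean>());

    /** Registers onClose on reader at most once; returns true if it was newly registered. */
    public synchronized boolean ensureHook(ClosableReader reader, Runnable onClose) {
        if (!hooked.add(reader)) {
            return false; // this reader already has a purge hook
        }
        reader.addClosedListener(onClose);
        return true;
    }

    public static void main(String[] args) {
        CloseHookRegistry registry = new CloseHookRegistry();
        ClosableReader reader = new ClosableReader();
        registry.ensureHook(reader, () -> System.out.println("purging cache for closed reader"));
        registry.ensureHook(reader, () -> System.out.println("never registered"));
        reader.close(); // prints "purging cache for closed reader" once
    }
}
```

The trade-off matches the one the patch description mentions. The bookkeeping stays private to one class, at the cost of a second weak structure whose stale entries are also only expunged once the GC clears their keys.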

There is a RequestHandler inside FileFloatSource,
[ReloadCacheRequestHandler|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java?revision=1310219&view=markup#l303],
that can be used to clear the cache entirely, but this is sub-optimal for us
for a few reasons:

* It clears the entire cache. ExternalFileFields often take some
non-trivial time to load, and we prefer to load them during SolrCore
warmups. Clearing the entire cache while serving traffic would likely
cause user-facing requests to time out.
* It forces an extra commit, with the consequent cache cycling, etc.

--
This message is automatically generated by JIRA.