[
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481296#comment-13481296
]
Erick Erickson commented on SOLR-1293:
--------------------------------------
Well, I think this JIRA will finally get some action...
Jose:
The availability of any particular feature is best tracked by the JIRA
ticket itself. The "fix version" is usually the earliest _possible_ fix; the
code isn't really in the code line until the resolution is something like
"fixed".
All:
OK, I'm thinking along these lines. I've started implementation, but wanted to
open up the discussion in case I'm going down the wrong path.
Assumption:
1> For installations with multiple thousands of cores, provision has to be made
for some kind of administrative process, probably an RDBMS, that really
maintains this information.
So here's a brief outline of the approach I'm thinking about.
1> Add an additional optional attribute to the <cores> entry in solr.xml,
LRUCacheSize=# (what's a sensible default?).
2> Implement SOLR-1306, allow a data provider to be specified in solr.xml that
gives back core descriptions, something like: <coreDescriptorProvider
class="com.foo.FooDataProvider" [attr="val"]/> (don't quite know what attrs we
want, if any).
3> Add two optional attributes to individual <core> entries
a> sticky="true|false". Default to true. Any cores marked sticky would
never be aged out; essentially, treat them just as cores are treated now.
b> loadOnStartup="true|false", default to true.
4> So the process of getting a core would be something like:
a> Check the normal list, just like now. If a core was found, return it.
b> Check the LRU list. If a core was found, return it.
c> Ask the dataprovider (if defined) for the core descriptor. Create the
core and put it in the LRU list.
d> Remove any core entries over the LRU limit. Any hints on the right cache
to use? There's the Lucene LRUCache, ConcurrentLRUCache, and the LRUHashMap in
Lucene (which I can't find in any of the compiled jars...). I've got to close
the core as it's removed. It _looks_ like I can use ConcurrentLRUCache and
add a listener to close the core when it's removed from the list.
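Pulling points 1-3 together, a solr.xml along these lines is what I have in
mind (attribute names are as proposed above, nothing here is committed; the
provider class and core names are just placeholders):

```xml
<solr persistent="true">
  <!-- Point 1: optional LRU limit on transient cores; default still open -->
  <cores adminPath="/admin/cores" LRUCacheSize="100">
    <!-- Point 2: optional provider that supplies core descriptors on demand -->
    <coreDescriptorProvider class="com.foo.FooDataProvider"/>
    <!-- Point 3a: sticky core, never aged out; loaded at startup as today -->
    <core name="core0" instanceDir="core0" sticky="true" loadOnStartup="true"/>
    <!-- Non-sticky core: created on demand, eligible for LRU eviction -->
    <core name="core1" instanceDir="core1" sticky="false" loadOnStartup="false"/>
  </cores>
</solr>
```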
Processing-wise, in the usual case this would cost an extra check each time a
core was fetched. If <a> above failed, we would have to see if the dataprovider
was defined before returning null. I don't think that's onerous; the rest of
the costs would only be incurred when a dataprovider _did_ exist.
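The lookup order in 4a-4d might look roughly like this. This is a minimal,
single-threaded sketch, not Solr's actual API: the class, the Core and
CoreDescriptorProvider interfaces, and the use of an access-ordered
LinkedHashMap in place of ConcurrentLRUCache are all illustrative assumptions
(real code would need thread safety and a proper eviction listener):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the core-lookup order; names are illustrative only.
public class TransientCoreSketch {

    interface Core { void close(); }

    interface CoreDescriptorProvider {
        Core createCore(String name); // returns null if the core is unknown
    }

    private final Map<String, Core> permanentCores = new LinkedHashMap<>();
    private final CoreDescriptorProvider provider;
    private final Map<String, Core> lruCores;

    TransientCoreSketch(final int lruCacheSize, CoreDescriptorProvider provider) {
        this.provider = provider;
        // accessOrder=true gives LRU iteration order; entries over the
        // limit are closed and dropped, as in step d.
        this.lruCores = new LinkedHashMap<String, Core>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Core> eldest) {
                if (size() > lruCacheSize) {
                    eldest.getValue().close(); // close the core as it's aged out
                    return true;
                }
                return false;
            }
        };
    }

    Core getCore(String name) {
        Core core = permanentCores.get(name);   // a: the normal list
        if (core != null) return core;
        core = lruCores.get(name);              // b: the LRU list
        if (core != null) return core;
        if (provider == null) return null;      // no dataprovider defined
        core = provider.createCore(name);       // c: ask the dataprovider
        if (core != null) lruCores.put(name, core);
        return core;                            // d: eviction happens on put
    }
}
```

Note that the extra cost in the usual case is just the failed LRU-map probe in
step b; the provider is only consulted when both lists miss.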
But one design decision here is along these lines: what to do with persistence
and stickiness? Specifically, if the coreDescriptorProvider gives us a core
from, say, an RDBMS, should we allow that core to be persisted into the
solr.xml file if persist="true" is set in solr.xml? I'm thinking that we
can make this all work with maximum flexibility if we allow the
coreDescriptorProvider to tell us whether we should persist any core currently
loaded....
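One possible shape for that contract is below. Everything here is a
hypothetical sketch, not a committed API: the interface name, the
CoreDescription holder, and the shouldPersist method are all assumptions
about how the provider could own the persistence decision:

```java
// Hypothetical provider contract; names are illustrative only.
public interface CoreDescriptorProviderSketch {

    // Minimal description of a core: its name plus where its config/data live.
    final class CoreDescription {
        final String name;
        final String instanceDir;
        CoreDescription(String name, String instanceDir) {
            this.name = name;
            this.instanceDir = instanceDir;
        }
    }

    // Return the descriptor for a core, or null if this provider
    // doesn't know about it.
    CoreDescription getCoreDescription(String coreName);

    // The provider decides: a core backed by, say, an RDBMS would typically
    // return false so it is never written into solr.xml, even when
    // persist="true" is set.
    boolean shouldPersist(String coreName);
}
```

The point of pushing shouldPersist into the provider is that the container
doesn't have to guess where a core came from; the thing that sourced the
descriptor also decides whether solr.xml should record it.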
Anyway, I'll be fleshing this out over the next little while, anybody want to
weigh in?
Erick
> Support for large no:of cores and faster loading/unloading of cores
> -------------------------------------------------------------------
>
> Key: SOLR-1293
> URL: https://issues.apache.org/jira/browse/SOLR-1293
> Project: Solr
> Issue Type: New Feature
> Components: multicore
> Reporter: Noble Paul
> Fix For: 4.1
>
> Attachments: SOLR-1293.patch
>
>
> Solr, currently, is not very suitable for a large number of homogeneous cores
> where you require fast/frequent loading/unloading of cores. Usually a core
> is required to be loaded just to fire a search query or to index one
> document.
> The requirements of such a system are:
> * Very efficient loading of cores. Solr cannot afford to read, parse, and
> create Schema and SolrConfig objects for each core every time the core has to
> be loaded (SOLR-919, SOLR-920)
> * START/STOP a core. Currently it is only possible to unload a core (SOLR-880)
> * Automatic loading of cores. If a core is present but not loaded and
> a request comes for it, load it automatically before serving the request
> * As there are a large number of cores, they cannot all be kept loaded
> at all times. There has to be an upper limit beyond which we need to unload a
> few cores (probably the least recently used ones)
> * Automatic allotment of dataDir for cores. If the number of cores is too
> high, all the cores' dataDirs cannot live in the same dir. There is an upper
> limit on the number of dirs you can create in a unix dir w/o affecting
> performance
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]