[ 
https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496370#comment-13496370
 ] 

Erick Erickson commented on SOLR-1306:
--------------------------------------

Well, the use case here is explicitly that the core information is kept in a
repository completely outside Solr (and outside ZooKeeper, for that matter).
Managing 100K cores by moving directories around is non-trivial, especially
since there will probably be some system of record for where all the
information lives anyway.

As it stands, this patch doesn't really affect the way Solr works out of the
box. It only comes into play if the people implementing the provider _require_
it (and are willing to take on the complexity).

But let me think about this a bit. Are you suggesting that the whole notion of 
solr.xml be replaced by some kind of crawl/discovery process? Off the top of my 
head, I can imagine a degenerate solr.xml that just lists one or more 
directories. Then the load process consists of crawling those directories 
looking for cores and loading them, possibly with some kind of configuration 
files at the core level. For the case of tens of thousands of cores per
machine, we don't want to put the data in solrconfig.xml or anything like that;
I'm thinking of something much simpler, on the order of a Java .properties
file. I haven't yet thought about how to "find a core" or how that interacts
with shared schemas; first I want to check whether this is along the lines of
what you mean by "getting meta-data closer to the index".
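A minimal sketch of the crawl/discovery idea, assuming each core directory carries a small Java properties file. The marker name "core.properties", the "name" key, and the CoreDiscovery/DiscoveredCore classes are hypothetical, chosen only for illustration; none of this is existing Solr code.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: crawl the directories a "degenerate" solr.xml would list and treat
// any directory holding a small properties file as a core. The marker file name
// "core.properties" and the "name" key are hypothetical, chosen for illustration.
final class CoreDiscovery {
    static final String MARKER = "core.properties"; // hypothetical marker file

    public static void main(String[] args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        for (DiscoveredCore core : discover(roots)) {
            System.out.println(core.name + " -> " + core.instanceDir);
        }
    }

    static final class DiscoveredCore {
        final String name;
        final Path instanceDir;
        final Properties props;

        DiscoveredCore(String name, Path instanceDir, Properties props) {
            this.name = name;
            this.instanceDir = instanceDir;
            this.props = props;
        }
    }

    // Walks each root and returns one entry per marker file, in sorted path order.
    static List<DiscoveredCore> discover(List<Path> roots) throws IOException {
        List<DiscoveredCore> cores = new ArrayList<>();
        for (Path root : roots) {
            List<Path> markers;
            try (Stream<Path> walk = Files.walk(root)) {
                markers = walk
                        .filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().equals(MARKER))
                        .sorted()
                        .collect(Collectors.toList());
            }
            for (Path marker : markers) {
                Properties props = new Properties();
                try (Reader in = Files.newBufferedReader(marker)) {
                    props.load(in);
                }
                Path dir = marker.getParent();
                // Fall back to the directory name when the file does not name the core.
                String name = props.getProperty("name", dir.getFileName().toString());
                cores.add(new DiscoveredCore(name, dir, props));
            }
        }
        return cores;
    }
}
```

Keeping the per-core file this small is what makes a directory move self-describing: whatever lands in a crawled root is, by definition, a core on that server.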

It does make the whole coordination issue a lot easier, though. You no longer 
have the loose coupling between having core information in solr.xml and then 
having to be sure the files/dirs corresponding to what's in solr.xml "just 
happen" to map to what's actually on disk. Moving a core from one place to
another would then consist of:
1> shutting down the servers
2> moving the core directory from one server to another
3> starting up the servers again.

I can imagine doing this a bit differently...
1> copy the core from one server to another
2> issue an unload for the core on the source server
3> issue a create for the core on the dest server
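The copy/unload/create sequence, sketched as the two CoreAdmin HTTP requests it would involve. UNLOAD (with a core parameter) and CREATE (with name and instanceDir parameters) are CoreAdmin actions, and /admin/cores is the CoreAdmin path in the stock Solr 4.x example solr.xml; the host names, paths, and the CoreMove class are made up. The sketch only builds the URLs and does not send them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Sketch: the two CoreAdmin calls behind steps 2> and 3>. Step 1> (copying the
// core directory between machines) happens outside Solr. Host names, ports and
// paths are hypothetical.
final class CoreMove {
    public static void main(String[] args) {
        for (String url : moveRequests("http://source:8983/solr", "http://dest:8983/solr",
                "core1", "/var/solr/cores/core1")) {
            System.out.println(url);
        }
    }

    // URL-encodes a query parameter value.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Returns the requests in the order listed above: UNLOAD on the source, then CREATE on the destination.
    static List<String> moveRequests(String sourceBase, String destBase,
                                     String core, String destInstanceDir) {
        String unload = sourceBase + "/admin/cores?action=UNLOAD&core=" + enc(core);
        String create = destBase + "/admin/cores?action=CREATE&name=" + enc(core)
                + "&instanceDir=" + enc(destInstanceDir);
        return Arrays.asList(unload, create);
    }
}
```

Issuing the CREATE before the UNLOAD would avoid a window in which neither server serves the core, at the cost of briefly having it live on both.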

There'd probably have to be some kind of background loading, but we're already 
talking about parallelizing multicore loads...
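One way the parallel loading could look, assuming a plain fixed-size thread pool from java.util.concurrent. The loader function is a hypothetical placeholder for whatever actually opens a core, and ParallelCoreLoader is an illustrative name; a true background load would hand the futures back to the caller instead of waiting on them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch of parallel core loading on a fixed-size thread pool. The loader
// function is a hypothetical stand-in for whatever actually opens a core.
final class ParallelCoreLoader {
    public static void main(String[] args) throws Exception {
        Map<String, String> loaded = loadAll(Arrays.asList("core1", "core2", "core3"),
                name -> "opened " + name, 2);
        loaded.forEach((name, core) -> System.out.println(name + ": " + core));
    }

    // Submits every load up front, then collects results in the original order.
    static <T> Map<String, T> loadAll(List<String> names, Function<String, T> loader, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<T>> pending = new LinkedHashMap<>();
            for (String name : names) {
                pending.put(name, pool.submit(() -> loader.apply(name)));
            }
            Map<String, T> loaded = new LinkedHashMap<>();
            for (Map.Entry<String, Future<T>> entry : pending.entrySet()) {
                loaded.put(entry.getKey(), entry.getValue().get()); // waits for this core only
            }
            return loaded;
        } finally {
            pool.shutdown(); // accept no new work; already-submitted loads still run
        }
    }
}
```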

From an admin perspective, the poor soul trying to maintain all this could
pretty easily enumerate where all the cores were just by asking each server
for a list of where things are.
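For the admin case, a sketch of turning per-server core lists into a single core-to-server map. It assumes each list has already been fetched, for example with a CoreAdmin STATUS request (a real CoreAdmin action); the server names and the CoreInventory class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: build a cluster-wide "which core lives where" view from per-server core
// lists. Fetching each list is assumed to have happened already; server names are
// hypothetical.
final class CoreInventory {
    public static void main(String[] args) {
        Map<String, List<String>> coresByServer = new LinkedHashMap<>();
        coresByServer.put("solr1", Arrays.asList("core1", "core2"));
        coresByServer.put("solr2", Arrays.asList("core2", "core3"));
        System.out.println(statusUrl("http://solr1:8983/solr"));
        System.out.println(invert(coresByServer));
    }

    // The CoreAdmin request that would list a server's cores.
    static String statusUrl(String serverBase) {
        return serverBase + "/admin/cores?action=STATUS";
    }

    // Inverts server -> cores into core -> servers, sorted by core name.
    // A core reported by more than one server is worth flagging to the admin.
    static Map<String, List<String>> invert(Map<String, List<String>> coresByServer) {
        Map<String, List<String>> serversByCore = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : coresByServer.entrySet()) {
            for (String core : entry.getValue()) {
                serversByCore.computeIfAbsent(core, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return serversByCore;
    }
}
```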

Anyway, is this in the vicinity of "moving the metadata closer to the index"?
                
> Support pluggable persistence/loading of solr.xml details
> ---------------------------------------------------------
>
>                 Key: SOLR-1306
>                 URL: https://issues.apache.org/jira/browse/SOLR-1306
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>            Reporter: Noble Paul
>            Assignee: Erick Erickson
>             Fix For: 4.1
>
>         Attachments: SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch, 
> SOLR-1306.patch
>
>
> Persisting and loading details from one XML file is fine if the number of
> cores is small and fixed. If there are tens of thousands of cores on a single
> box, adding a new core (with persistent=true) becomes very expensive, because
> every core creation has to rewrite this huge XML file.
> Moreover, there is a good chance that the file gets corrupted and all the
> cores become unusable. In that case I would prefer the information to be
> stored in a centralized DB that is backed up/replicated, so that all of it is
> available in one place.
> We may need to refactor CoreContainer to have a pluggable implementation
> which can load/persist the details. The default implementation should
> read from and write to solr.xml, and the class should be pluggable as follows
> in solr.xml:
> {code:xml}
> <solr>
>   <dataProvider class="com.foo.FooDataProvider" attr1="val1" attr2="val2"/>
> </solr>
> {code}
> There will be a new interface (or abstract class) called SolrDataProvider
> which this class must implement.
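A hedged sketch of what the proposed SolrDataProvider contract might look like, with the class attribute of the <dataProvider> element instantiated reflectively and the remaining attributes passed to init. Only the SolrDataProvider name comes from the issue; the method names, CoreInfo, DataProviders and InMemoryDataProvider (a stand-in for a DB-backed provider) are assumptions, and the default solr.xml-backed implementation is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the pluggable provider the issue proposes. The method
// names, CoreInfo and InMemoryDataProvider are illustrative, not Solr API.
final class DataProviders {
    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("attr1", "val1"); // mirrors <dataProvider class="..." attr1="val1"/>
        SolrDataProvider provider = create(InMemoryDataProvider.class.getName(), attrs);
        provider.persistCore(new CoreInfo("core1", "cores/core1"));
        System.out.println(provider.loadCores().size() + " core(s) known");
    }

    // What CoreContainer might do with the class attribute: instantiate reflectively, then init.
    static SolrDataProvider create(String className, Map<String, String> attrs)
            throws ReflectiveOperationException {
        SolrDataProvider provider = Class.forName(className)
                .asSubclass(SolrDataProvider.class)
                .getDeclaredConstructor()
                .newInstance();
        provider.init(attrs);
        return provider;
    }
}

interface SolrDataProvider {
    void init(Map<String, String> attrs); // attributes from the <dataProvider> element
    List<CoreInfo> loadCores();           // read every core definition at startup
    void persistCore(CoreInfo core);      // store one core without rewriting all the others
}

final class CoreInfo {
    final String name;
    final String instanceDir;

    CoreInfo(String name, String instanceDir) {
        this.name = name;
        this.instanceDir = instanceDir;
    }
}

// Stand-in for a DB-backed provider; a real one would talk to the replicated store.
final class InMemoryDataProvider implements SolrDataProvider {
    private final Map<String, CoreInfo> cores = new LinkedHashMap<>();
    private Map<String, String> attrs = Collections.emptyMap();

    @Override
    public void init(Map<String, String> attrs) {
        this.attrs = new HashMap<>(attrs);
    }

    @Override
    public List<CoreInfo> loadCores() {
        return new ArrayList<>(cores.values());
    }

    @Override
    public void persistCore(CoreInfo core) {
        cores.put(core.name, core); // a single-record write, unlike rewriting solr.xml
    }

    String attr(String key) {
        return attrs.get(key);
    }
}
```

The point of the per-core persistCore call is that creating one core touches one record, which is exactly the cost the issue complains about with a single huge solr.xml.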

--
This message is automatically generated by JIRA.