[jira] [Commented] (SOLR-4083) Deprecate specifying individual <core> information in solr.xml. Possibly deprecate solr.xml entirely

Erick Erickson (JIRA) Thu, 15 Nov 2012 09:00:15 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498133#comment-13498133
 ]


Erick Erickson commented on SOLR-4083:
--------------------------------------

MMMMMooooommmmmm! He keeps bugging me <G>...

But maybe you've forced me to get to the core of the issue.

I _think_ the answer is that I don't know if it's acceptable to wait for 
startup until 10-15K directories are enumerated and any associated properties 
files are examined (that's the scale we're talking here). OTOH, I have no good 
intuition that it's _not_ acceptable. I was originally thinking that the core 
information for large, complex installations could be kept in some DB 
somewhere, or even in a special Solr "meta-data" core, and accessed on demand 
since there is presumably a system-of-record for that information, possibly 
externally maintained. So you could have a much faster startup in this case 
than if you had to enumerate a (very) large tree structure.

You're right that in a situation where the (pluggable or not) 
CoreDescriptorProvider walked the directory structure anyway, there's no need 
to provide a way to plug anything in; it'd take the same amount of time either 
way. But what about other ways of storing this?

And I guess that _assuming_ an infrastructure is out there somewhere, I can 
argue that provisioning the core (i.e. getting the directory created, getting 
the minimum directory structure in place etc) has to be done outside of this 
mechanism anyway. Once a site solves that problem, just walking the directory 
tree is sufficient; there's no need for additional coupling of the 
CoreDescriptorProvider to the physical directory structure. Get the structure 
right and you'd automatically have the info a CoreDescriptorProvider would 
return.

And I suppose one could spin off a thread at startup so startup would actually 
be very fast at the expense of (possibly) waiting on your first core get until 
either 1> the directory tree was exhausted or 2> the core was found.
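
A minimal sketch of that background-thread idea, assuming discovery can be 
expressed as an iteration over (core name, directory) pairs. The class and 
method names below are hypothetical and are not Solr APIs; the point is only 
that a lookup blocks until either the core has been found or the walk is 
exhausted.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Solr.
class BackgroundCoreIndexSketch {
    private final Map<String, Path> found = new HashMap<>();
    private boolean exhausted = false;

    // Called by the discovery thread for each core it finds.
    synchronized void register(String name, Path dir) {
        found.put(name, dir);
        notifyAll(); // wake any lookup waiting for this core
    }

    // Called once when the directory tree has been fully walked.
    synchronized void markExhausted() {
        exhausted = true;
        notifyAll(); // wake lookups for cores that do not exist
    }

    // Blocks until the core is found or the walk is exhausted.
    // Returns null if the walk finished without finding the core.
    synchronized Path lookup(String name) throws InterruptedException {
        while (!found.containsKey(name) && !exhausted) {
            wait();
        }
        return found.get(name);
    }

    // Starts discovery in the background so startup can return immediately.
    // In a real system "walk" would be backed by the directory enumeration.
    Thread startDiscovery(Iterable<Map.Entry<String, Path>> walk) {
        Thread t = new Thread(() -> {
            try {
                for (Map.Entry<String, Path> e : walk) {
                    register(e.getKey(), e.getValue());
                }
            } finally {
                markExhausted(); // never leave lookups blocked forever
            }
        }, "core-discovery");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Path> cores = new LinkedHashMap<>();
        cores.put("core1", Paths.get("cores", "core1"));
        cores.put("core2", Paths.get("cores", "core2"));

        BackgroundCoreIndexSketch index = new BackgroundCoreIndexSketch();
        index.startDiscovery(cores.entrySet());
        System.out.println(index.lookup("core2"));   // waits until registered
        System.out.println(index.lookup("missing")); // waits until exhausted
    }
}
```

The cost shows up only on a lookup that arrives before discovery reaches the 
requested core; a request for a nonexistent core waits for the whole walk.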

Let me run an experiment or two to see what this looks like in practice. If it 
takes a minute or two to start up, especially if loading core info is a 
background thread, I may be (finally) forced to agree with you...
                
> Deprecate specifying individual <core> information in solr.xml. Possibly 
> deprecate solr.xml entirely
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4083
>                 URL: https://issues.apache.org/jira/browse/SOLR-4083
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.1, 5.0
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> Spinoff from SOLR-1306. Having a solr.xml file is limiting and possibly 
> unnecessary. We'd gain flexibility by having an "auto-discovery", essentially 
> walking the directories and finding all the cores and just loading them.
> Here's an issue to start the discussion of what that would look like. At this 
> point the way I'm thinking about it depends on SOLR-1306, which depends on 
> SOLR-1028, so the chain is getting kind of long.
> Straw-man proposal:
> 1> system properties can be specified as root paths in the solr tree to start 
> discovery.
> 2> the directory walking process will stop going deep (but not wide) in the 
> directories whenever a solrcore.properties file is encountered. That file can 
> contain any of the properties currently specifiable in a <core> tag. This 
> allows, for instance, re-use of a single solrconfig.xml or schema.xml file 
> across multiple cores. I really don't want to get into having 
> cores-within-cores. While this latter is possible, I don't see any advantage. 
> You _can_ have multiple roots and there's _no_ requirement that the cores be 
> in the directory immediately below that root; they can be arbitrarily deep.
> 3> I'm not quite sure what to do with the various properties in the <cores> 
> tag. Perhaps just require these to be system properties?
> 4> Notice the title. Does it still make sense to specify <3> in solr.xml but 
> ignore the cores stuff? It seems like so little information will be in 
> solr.xml if we take all the <core> tags out that we should just kill it 
> altogether.
> 5> Not quite sure what this means for _where_ the cores live. Is it 
> arbitrary? Anywhere on disk? Why not?
> 6> core swapping/renaming/whatever. Really, this is about how we model 
> persist="true" on solr.xml. It's easy if we keep solr.xml and just remove the 
> individual core entries. Where to put them?
> 7> _if_ we're supposed to persist core admin operations, it seems like we 
> just persist this stuff to the individual solrcore.properties files. Things 
> like whether it's loaded, whether its name has changed (1028 allows lazy 
> loading).
> 8> This still provides the capability of your own custom 
> CoreDescriptorProvider, which you'll have to specify somehow. I'm not quite 
> sure where yet.
> solr.xml is really the bootstrap for the whole shootin' match. Removing it 
> entirely means we have to specify root directories, zk parameters, whatever 
> somehow. What do people think is the best option here? Leave a degenerate 
> solr.xml? Require system properties be set for any of these options? 
> Currently, the options we'll need are anything (actual or proposed) in the 
> <solr> and <cores> tags.
> So, what the first cut at this would be, building on 1306, is a default 
> CoreDescriptorProvider that ignored all the <core> entries in solr.xml, 
> walked the tree and loaded all the cores found. I claim this is a quick thing 
> to PoC assuming SOLR-1306 and I'll try to provide a patch demonstrating it 
> over the weekend.
> But mostly, this is a place to start the discussion about what this would 
> look like rather than have it get lost in SOLR-1306.
> Finally, note that I have no intention of putting any of this into 4.x at 
> least until we cut the 4.1/4.0.1 whatever.
> And, of course, until we fully deprecate solr.xml (5.0?) the current behavior 
> will be the default.
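
The tree walk described in item 2> of the straw-man proposal can be sketched 
roughly as follows. This is an illustration under assumptions, not the patch 
for this issue: the class name is hypothetical, the roots are taken from 
command-line arguments rather than system properties, and the only behavior 
taken from the proposal is that the walk stops going deep (but not wide) once 
a solrcore.properties file is found.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Solr.
class CoreDiscoverySketch {
    static final String MARKER = "solrcore.properties";

    // Returns every directory under the given roots that contains MARKER.
    static List<Path> discoverCores(List<Path> roots) throws IOException {
        final List<Path> coreDirs = new ArrayList<>();
        for (Path root : roots) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    if (Files.isRegularFile(dir.resolve(MARKER))) {
                        coreDirs.add(dir);
                        // Stop going deep (no cores-within-cores) but keep
                        // going wide: sibling directories are still visited.
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return coreDirs;
    }

    public static void main(String[] args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg)); // multiple roots are allowed
        }
        for (Path core : discoverCores(roots)) {
            System.out.println(core);
        }
    }
}
```

Because the visitor returns SKIP_SUBTREE rather than SKIP_SIBLINGS, a core 
directory can sit at any depth below a root, and a solrcore.properties file 
nested inside an already-discovered core is ignored.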

