Re: Solr configuration options

Jan Høydahl Fri, 28 Aug 2020 07:58:30 -0700

I interpret solr.xml as the node-local configuration for a single node.
clusterprops.json is the cluster-wide configuration applying to all nodes.
solrconfig.xml is of course per core, and so on.


solr.in.sh is the per-node environment-variable way of configuring a node; many 
of those variables are picked up in solr.xml (others in bin/solr).

I think it is important to keep a node-local config file which can only be 
modified if you have shell access to that node, since it provides an extra 
layer of security.
And in certain cases a node may need a different configuration from another 
node, e.g. during an upgrade.

I put solr.xml in ZooKeeper. It may have been a mistake, since it may not make 
all that much sense to load solr.xml, which is a node-level file, from ZK. But 
if it uses var substitutions for all node-level stuff, it will still work since 
those vars are pulled from local properties when parsed anyway.
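
To illustrate the substitution point, here is a minimal Python sketch of how a 
shared file with ${name:default} placeholders can still resolve to node-specific 
values. It is not Solr's actual parser; the function name, the example property 
values and the error handling for a missing value are assumptions made for the 
illustration only.

```python
import re

# Matches ${name} or ${name:default}, the placeholder style used in solr.xml.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(text, local_props):
    """Replace each placeholder with the node-local value, else its default.

    Hypothetical helper: raises KeyError when a placeholder has neither a
    local value nor a default (an assumption, not Solr's documented behavior).
    """
    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in local_props:
            return str(local_props[name])
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")

    return _PLACEHOLDER.sub(resolve, text)


# One fragment shared by every node (e.g. stored once in ZK).
SHARED_FRAGMENT = '<int name="zkClientTimeout">${zkClientTimeout:30000}</int>'

# Node A sets the property locally (e.g. via solr.in.sh); node B does not.
node_a = substitute(SHARED_FRAGMENT, {"zkClientTimeout": "15000"})
node_b = substitute(SHARED_FRAGMENT, {})

print(node_a)
print(node_b)
```

The file content is identical everywhere, and only the local properties differ, 
which is why a ZK-hosted solr.xml can still carry node-level settings.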

I’m also somewhat against hijacking clusterprops.json as a general purpose JSON 
config file for the cluster. It was supposed to be for simple properties.
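
By "simple properties" I mean single name/value pairs set through the 
Collections API CLUSTERPROP action. A small Python sketch that only builds the 
request URL (it sends nothing); the host, port and helper name are assumptions, 
and urlScheme is used as an example of such a property:

```python
from urllib.parse import urlencode


def clusterprop_url(base_url, name, value=None):
    """Build a CLUSTERPROP request URL for one name/value pair.

    Hypothetical helper. Leaving out the value is assumed to unset the
    property; check the reference guide for your Solr version.
    """
    params = {"action": "CLUSTERPROP", "name": name}
    if value is not None:
        params["val"] = value
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Assumed local node; adjust the base URL for a real cluster.
url = clusterprop_url("http://localhost:8983/solr", "urlScheme", "https")
print(url)
```

Each call touches exactly one flat key, which is quite different from storing 
nested, general-purpose configuration in the same file.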

Jan

> On 28 Aug 2020, at 14:23, Erick Erickson <[email protected]> wrote:
> 
> solr.xml can also exist in Zookeeper; it doesn’t _have_ to exist locally. You 
> do have to restart to have any changes take effect.
> 
> Long ago in a Solr far away solr.xml was where all the cores were defined. 
> This was before “core discovery” was put in. Since solr.xml had to be there 
> anyway and was read at startup, other global information was added and it’s 
> lived on...
> 
> Then clusterprops.json came along as a place to put, well, cluster-wide 
> properties so having solr.xml too seems awkward. Although if you do have 
> solr.xml locally to each node, you could theoretically have different 
> settings for different Solr instances. Frankly I consider this more of a bug 
> than a feature.
> 
> I know there has been some talk about removing solr.xml entirely, but I’m 
> not sure what the thinking is about what to do instead. Whatever we do needs 
> to accommodate standalone. We could do the same trick we do now, and 
> essentially move all the current options in solr.xml to clusterprops.json (or 
> other ZK node) and read it locally for stand-alone. The API could even be 
> used to change it if it was stored locally.
> 
> That still leaves the chicken-and-egg problem of connecting to ZK in the 
> first place.
> 
>> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg <[email protected]> wrote:
>> 
>> I want to ramp-up/discuss/inventory configuration options in Solr. Here's my 
>> understanding of what exists and what could/should be used depending on the 
>> need. Please correct/complete as needed (or point to documentation I might 
>> have missed).
>> 
>> There are currently 3 sources of general configuration I'm aware of:
>>      • Collection-specific config, bootstrapped by the file solrconfig.xml and 
>>        copied into the initial (_default) and then subsequent Config Sets in 
>>        Zookeeper.
>>      • Cluster-wide config in Zookeeper /clusterprops.json, editable globally 
>>        through an API that interacts with Zookeeper. Not bootstrapped by 
>>        anything (i.e. it does not exist until the user explicitly creates it).
>>      • Node config file solr.xml, deployed with Solr on each node and loaded 
>>        when Solr starts. Changes to this file are per node and require a node 
>>        restart to be taken into account.
>> 
>> The Collection specific config (file solrconfig.xml then in Zookeeper 
>> /configs/<config set name>/solrconfig.xml) allows Solr devs to set 
>> reasonable defaults (the file is part of the Solr distribution). Content can 
>> be changed by users as they create new Config Sets persisted in Zookeeper.
>> 
>> Zookeeper's /clusterprops.json can be edited through the collection admin 
>> API CLUSTERPROP. If users do not set anything there, the file doesn't even 
>> exist in Zookeeper, so Solr devs cannot use it to set a default cluster 
>> config: there's no clusterprops.json file in the Solr distrib like there's 
>> a solrconfig.xml.
>> 
>> File solr.xml is used by Solr devs to set some reasonable default 
>> configuration (parametrized through property files or system properties). 
>> There's no API to change that file; users would have to edit/redeploy the 
>> file on each node and restart the Solr JVM on that node for the new config 
>> to be taken into account.
>> 
>> Based on the above, my vision (or mental model) of what to use depending on 
>> the need:
>> 
>> solrconfig.xml is the only per collection config. IMO it does its job 
>> correctly: Solr devs can set defaults, users tailor the content to what they 
>> need for new config sets. It's the only option for per collection config 
>> anyway.
>> 
>> The real hesitation could be between solr.xml and Zookeeper 
>> /clusterprops.json. What should go where?
>> 
>> For user configs (anything the user does to the Solr cluster AFTER it was 
>> deployed and started), /clusterprops.json seems to be the obvious choice and 
>> offers the right abstractions (global config, no need to worry about 
>> individual nodes, all nodes pick up configs and changes to configs 
>> dynamically).
>> 
>> For configs that need to be available without requiring user intervention or 
>> needed before the connection to ZK is established, there's currently no 
>> other choice than using solr.xml. Such configuration obviously includes 
>> parameters that are needed to connect to ZK (timeouts, credential provider 
>> and hopefully one day an option to either use direct ZK interaction code or 
>> Curator code), but also configuration of general features that should be the 
>> default without requiring users to opt in yet allowing them to easily opt 
>> out by editing solr.xml before deploying to their cluster (in the future, 
>> this could include which Lucene version to load in Solr for example).
>> 
>> To summarize:
>>      • Collection specific config? --> solrconfig.xml
>>      • User provided cluster config once SolrCloud is running? --> ZK 
>> /clusterprops.json
>>      • Solr dev provided cluster config? --> solr.xml
>> 
>> Going forward, some (but only some!) of the config that currently can only 
>> live in solr.xml could be made to go to /clusterprops.json or another ZK 
>> based config file. This would require adding code to create that ZK file 
>> upon initial cluster start (to not force the user to push it) and devise a 
>> mechanism (likely a script, could be tricky though) to update that file in 
>> ZK when a new release of Solr is deployed and a previous version of that 
>> file already exists. Not impossible tasks, but not trivial ones either. 
>> Whatever the needs of such an approach are, it might be easier to keep the 
>> existing solr.xml as a file and allow users to define overrides in Zookeeper 
>> for the configuration parameters from solr.xml that make sense to be 
>> overridden in ZK (obviously ZK credentials or connection timeout do not make 
>> sense in that context, but defining the shard handler implementation class 
>> does since it is likely loaded after a node managed to connect to ZK).
>> 
>> Some config will have to stay in a local Node file system file and only 
>> there no matter what: Zookeeper timeout definition or any node configuration 
>> that is needed before the node connects to Zookeeper.
>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 

