Re: Solr configuration options

Tomás Fernández Löbbe Fri, 28 Aug 2020 09:57:29 -0700

I think of it exactly as Jan described it. solr.xml is the node
configuration; it should usually be the same across the whole cluster, but
not necessarily all the time (e.g. during a deployment they may differ).
Putting it in ZooKeeper is, I believe, a mistake, because then you see a
file up there, but it’s not necessarily what Solr loaded, and there is no
way to know for sure which solr.xml a node started with.
I think some configurations can probably be moved to clusterprops (e.g.
maxBooleanClauses), while others still belong in the node configuration
file we currently have, solr.xml.


On Fri, Aug 28, 2020 at 8:51 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> What I'm really looking for (and my current understanding is that
> solr.xml is the only option) is *a cluster config a Solr dev can set as a
> default*, for example when introducing a new feature, so that the config
> is picked up out of the box in SolrCloud, while still allowing the end
> user to override it if they wish.
>
> But "cluster config" in this context comes *with a caveat*: during a
> rolling upgrade, nodes running new code need the new cluster config, while
> nodes running old code need the previous cluster config... Having a
> per-node solr.xml deployed atomically with the code, as is currently the
> case, has disadvantages, but solves this problem effectively in a very
> simple way. If we were to move to a central cluster config, we'd likely
> need to introduce config versioning or, as Noble suggested elsewhere, only
> write code that's backward compatible (w.r.t. config), deploy that code
> everywhere, and then, once no old code is running, update the cluster
> config. I find this approach complicated from both a dev and an
> operational perspective, with unclear added value.
>
> Ilan
>
> PS. I've stumbled upon the loading of solr.xml from ZooKeeper in the past
> but couldn't find it when I wrote my message, so I thought I had imagined
> it...
>
> It's in SolrDispatchFilter.loadNodeConfig(). It establishes a connection
> to ZK to fetch solr.xml, then closes it.
> It uses the system property waitForZk as the connection timeout (in
> seconds, default 30) and the system property zkHost as the ZooKeeper host.
>
> I believe solr.xml can only end up in ZK through the use of ZkCLI. The
> user is then on their own to manage SolrCloud version upgrades: if a new
> version of SolrCloud ships a new solr.xml, a user who pushed a previous
> version into ZK will not see the update.
> I wonder if putting solr.xml in ZK is a common practice.
>
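To make the PS concrete, here is a minimal Java sketch of the source-selection logic it describes. It is not the actual SolrDispatchFilter.loadNodeConfig() code: the class NodeConfigSource, its methods, and the fallback to the local solr.xml when ZK holds no /solr.xml are assumptions for illustration. Only the property names zkHost and waitForZk (seconds, default 30) come from the message above.

```java
import java.util.Properties;

// Hypothetical sketch; NOT Solr's SolrDispatchFilter.loadNodeConfig().
// Only the property names zkHost and waitForZk come from the thread.
final class NodeConfigSource {
    // waitForZk is expressed in seconds and defaults to 30.
    static final int DEFAULT_WAIT_FOR_ZK_SECONDS = 30;

    // Converts the waitForZk property (seconds) into a timeout in milliseconds.
    static long zkConnectTimeoutMillis(Properties sysProps) {
        String raw = sysProps.getProperty("waitForZk");
        int seconds = (raw == null) ? DEFAULT_WAIT_FOR_ZK_SECONDS : Integer.parseInt(raw.trim());
        return seconds * 1000L;
    }

    // Decides where solr.xml would be read from. zkHasSolrXml stands in for
    // an existence check on the /solr.xml znode; falling back to the local
    // file when it is absent is an assumption of this sketch.
    static String chooseSource(Properties sysProps, boolean zkHasSolrXml) {
        String zkHost = sysProps.getProperty("zkHost");
        boolean cloudMode = zkHost != null && !zkHost.isEmpty();
        if (cloudMode && zkHasSolrXml) {
            return "zk:" + zkHost + "/solr.xml";
        }
        return "local:solr.xml";
    }
}
```

In a running node the Properties argument would be System.getProperties(); passing it in explicitly only keeps the sketch testable.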
> On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl <jan....@cominvent.com> wrote:
>
>> I interpret solr.xml as the node-local configuration for a single node.
>> clusterprops.json is the cluster-wide configuration applying to all nodes.
>> solrconfig.xml is of course per core, etc.
>>
>> solr.in.sh is the per-node ENV-VAR way of configuring a node, and many
>> of those variables are picked up in solr.xml (others in bin/solr).
>>
>> I think it is important to keep a node-local config file that can only
>> be modified if you have shell access to that node; it provides an extra
>> layer of security.
>> And in certain cases a node may need a different configuration from
>> another node, e.g. during an upgrade.
>>
>> I put solr.xml in ZooKeeper. It may have been a mistake, since it may not
>> make all that much sense to load solr.xml, which is a node-level file, from
>> ZK. But if it uses variable substitution for all node-level settings, it
>> will still work, since those variables are resolved from local properties
>> at parse time anyway.
>>
>> I’m also somewhat against hijacking clusterprops.json as a general-purpose
>> JSON config file for the cluster. It was supposed to be for simple
>> properties.
>>
>> Jan
>>
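Jan's point about variable substitution can be illustrated with a simplified Java sketch of the ${name} / ${name:default} placeholder syntax used in solr.xml (for instance a value like ${jetty.port:8983}). NodePropertySubstitution is a hypothetical helper, not Solr's parser, and it ignores nesting and escaping; it only shows that the same solr.xml text, wherever it was fetched from, resolves to node-specific values because the placeholders are filled from the local node's properties.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified ${name} / ${name:default} substitution.
// Not Solr's implementation: no nested placeholders, no escaping.
final class NodePropertySubstitution {
    // Group 1: property name; group 2 (optional): default value after ':'.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    // Replaces every placeholder with the node-local property value, or with
    // the inline default when the property is not set locally.
    static String substitute(String text, Properties local) {
        Matcher m = VAR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when no ':default' is given
            String value = local.getProperty(name, fallback);
            if (value == null) {
                throw new IllegalArgumentException("No value for property " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, substitute("${jetty.port:8983}", props) yields 8983 unless the node's properties define jetty.port. Throwing on an unresolved placeholder with no default is a choice made for this sketch.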
>> > On 28 Aug 2020, at 14:23, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> > solr.xml can also live in ZooKeeper; it doesn’t _have_ to exist
>> locally. You do have to restart for any changes to take effect.
>> >
>> > Long ago, in a Solr far away, solr.xml was where all the cores were
>> defined. This was before “core discovery” was put in. Since solr.xml had to
>> be there anyway and was read at startup, other global information was added
>> to it, and it’s lived on...
>> >
>> > Then clusterprops.json came along as a place to put, well, cluster-wide
>> properties, so having solr.xml too seems awkward. That said, if you do have
>> solr.xml local to each node, you could theoretically have different
>> settings for different Solr instances. Frankly, I consider this more of a
>> bug than a feature.
>> >
>> > I know there has been some talk about removing solr.xml entirely, but
>> I’m not sure what the thinking is about what to do instead. Whatever we do
>> needs to accommodate standalone mode. We could do the same trick we do now,
>> and essentially move all the current options in solr.xml to
>> clusterprops.json (or another ZK node) and read it locally for standalone.
>> The API could even be used to change it if it were stored locally.
>> >
>> > That still leaves the chicken-and-egg problem of connecting to ZK in
>> the first place.
>> >
>> >> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg <ilans...@gmail.com> wrote:
>> >>
>> >> I want to ramp up on, discuss, and inventory configuration options in
>> Solr. Here's my understanding of what exists and what could/should be used
>> depending on the need. Please correct/complete as needed (or point to
>> documentation I might have missed).
>> >>
>> >> There are currently 3 sources of general configuration I'm aware of:
>> >>      • Collection-specific config, bootstrapped from the file
>> solrconfig.xml and copied into the initial (_default) and then subsequent
>> Config Sets in ZooKeeper.
>> >>      • Cluster-wide config in ZooKeeper /clusterprops.json, editable
>> globally through an API that interacts with ZooKeeper. Not bootstrapped by
>> anything (i.e. it does not exist until the user explicitly creates it).
>> >>      • Node config file solr.xml, deployed with Solr on each node and
>> loaded when Solr starts. Changes to this file are per node and require a
>> node restart to take effect.
>> >> The collection-specific config (file solrconfig.xml, then in ZooKeeper
>> /configs/<config set name>/solrconfig.xml) allows Solr devs to set
>> reasonable defaults (the file is part of the Solr distribution). Users can
>> change the content as they create new Config Sets persisted in ZooKeeper.
>> >>
>> >> ZooKeeper's /clusterprops.json can be edited through the Collections
>> admin API action CLUSTERPROP. If users do not set anything there, the file
>> doesn't even exist in ZooKeeper, so Solr devs cannot use it to set a default
>> cluster config: there's no clusterprops.json file in the Solr distribution
>> the way there is a solrconfig.xml.
>> >>
>> >> The file solr.xml is used by Solr devs to set some reasonable default
>> configuration (parameterized through property files or system properties).
>> There's no API to change that file; users have to edit/redeploy the file on
>> each node and restart the Solr JVM on that node for the new config to take
>> effect.
>> >>
>> >> Based on the above, my vision (or mental model) of what to use
>> depending on the need:
>> >>
>> >> solrconfig.xml is the only per-collection config. IMO it does its job
>> correctly: Solr devs can set defaults, and users tailor the content to what
>> they need for new config sets. It's the only option for per-collection
>> config anyway.
>> >>
>> >> The real hesitation could be between solr.xml and ZooKeeper
>> /clusterprops.json. What should go where?
>> >>
>> >> For user configs (anything the user does to the Solr cluster AFTER it
>> was deployed and started), /clusterprops.json seems to be the obvious
>> choice and offers the right abstractions (global config, no need to worry
>> about individual nodes, all nodes pick up configs and changes to configs
>> dynamically).
>> >>
>> >> For configs that need to be available without user intervention, or
>> that are needed before the connection to ZK is established, there's
>> currently no other choice than solr.xml. Such configuration obviously
>> includes parameters that are needed to connect to ZK (timeouts, credential
>> provider and hopefully one day an option to use either direct ZK
>> interaction code or Curator code), but also configuration of general
>> features that should be on by default without requiring users to opt in,
>> while allowing them to easily opt out by editing solr.xml before deploying
>> to their cluster (in the future, this could include which Lucene version
>> to load in Solr, for example).
>> >>
>> >> To summarize:
>> >>      • Collection-specific config? --> solrconfig.xml
>> >>      • User-provided cluster config once SolrCloud is running? --> ZK
>> /clusterprops.json
>> >>      • Cluster config provided by Solr devs? --> solr.xml
>> >>
>> >> Going forward, some (but only some!) of the config that currently can
>> only live in solr.xml could be moved to /clusterprops.json or another
>> ZK-based config file. This would require adding code to create that ZK file
>> upon initial cluster start (so as not to force the user to push it) and
>> devising a mechanism (likely a script, though it could be tricky) to update
>> that file in ZK when a new release of Solr is deployed and a previous
>> version of the file already exists. Not impossible tasks, but not trivial
>> ones either. Whatever such an approach requires, it might be easier to keep
>> the existing solr.xml as a file and allow users to define overrides in
>> ZooKeeper for the solr.xml parameters that make sense to override in ZK
>> (obviously ZK credentials or the connection timeout do not make sense in
>> that context, but defining the shard handler implementation class does,
>> since it is likely loaded after a node has managed to connect to ZK).
>> >>
>> >> Some config will have to stay in a file on the node's local file system,
>> and only there, no matter what: the ZooKeeper timeout definition, or any
>> node configuration that is needed before the node connects to ZooKeeper.
>> >>
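The override idea at the end of the message above can be sketched in Java as a two-layer lookup: solr.xml supplies defaults, a ZK-stored map may override them, and settings needed before the ZK connection exists are never taken from ZK. LayeredNodeConfig, the LOCAL_ONLY set, and the key names used are illustrative assumptions, not an existing Solr API or an agreed design.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "solr.xml defaults + ZK overrides"; not a Solr API.
final class LayeredNodeConfig {
    // Settings needed before the node can talk to ZooKeeper; a ZK-stored
    // value for these could never take effect, so it is ignored.
    // Key names are illustrative only.
    static final Set<String> LOCAL_ONLY =
            Set.of("zkHost", "zkClientTimeout", "zkCredentialsProvider");

    // Returns the effective configuration: start from the solr.xml defaults,
    // then apply ZK overrides only for parameters solr.xml already defines
    // and that are not local-only.
    static Map<String, String> effective(Map<String, String> solrXmlDefaults,
                                         Map<String, String> zkOverrides) {
        Map<String, String> result = new HashMap<>(solrXmlDefaults);
        for (Map.Entry<String, String> e : zkOverrides.entrySet()) {
            String key = e.getKey();
            if (result.containsKey(key) && !LOCAL_ONLY.contains(key)) {
                result.put(key, e.getValue());
            }
        }
        return Collections.unmodifiableMap(result);
    }
}
```

With this rule a ZK override of the shard handler class takes effect while a ZK value for zkClientTimeout is ignored. Only keys already defined in solr.xml can be overridden, so the file remains the single list of known parameters and ZK only changes values; whether unknown ZK keys should instead be rejected loudly is left open here.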
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>