I think of it exactly as Jan described it. solr.xml is the node configuration; it should usually be the same across the whole cluster, but not necessarily all the time (e.g. during a deployment the nodes may differ). Putting it in ZooKeeper is, I believe, a mistake, because then you see a file up there, but it’s not necessarily what Solr loaded, and there is no way to know for sure which solr.xml a node started with. I think some configurations can probably be moved to clusterprops (e.g. maxBooleanClauses), while others still belong in whatever node configuration file we have, which is currently solr.xml.
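A minimal sketch of that split, for illustration only. It assumes a simplified solr.xml and a clusterprops-style JSON document in which maxBooleanClauses has already been made a cluster property; the names effective_config and NODE_ONLY are hypothetical, and this is not how Solr resolves configuration today.

```python
import json
import xml.etree.ElementTree as ET

# Simplified stand-in for a node-local solr.xml (a real file has many more entries).
SOLR_XML = """<solr>
  <int name="maxBooleanClauses">1024</int>
  <solrcloud>
    <int name="zkClientTimeout">30000</int>
  </solrcloud>
</solr>"""

# Hypothetical cluster-wide values, shaped like a clusterprops.json document.
CLUSTER_PROPS = json.loads('{"maxBooleanClauses": 2048}')

# Assumption: settings needed before ZooKeeper is reachable stay node-local.
NODE_ONLY = {"zkClientTimeout"}


def node_defaults(xml_text):
    """Collect every <int name="..."> entry of the node file as a default."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): int(el.text) for el in root.iter("int")}


def effective_config(xml_text, cluster_props):
    """Start from node-local defaults, then apply the allowed cluster values."""
    config = node_defaults(xml_text)
    for key, value in cluster_props.items():
        if key in config and key not in NODE_ONLY:
            config[key] = value
    return config


print(effective_config(SOLR_XML, CLUSTER_PROPS))
```

In this sketch the node file always supplies the ZooKeeper timeout, so a value pushed to the cluster cannot change how a node connects in the first place.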
On Fri, Aug 28, 2020 at 8:51 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> What I'm really looking for (and currently my understanding is that solr.xml is the only option) is *a cluster config a Solr dev can set as a default* when introducing a new feature for example, so that the config is picked up out of the box in SolrCloud, yet allowing the end user to override it if he so wishes.
>
> But "cluster config" in this context comes *with a caveat*: when doing a rolling upgrade, nodes running new code need the new cluster config, nodes running old code need the previous cluster config... Having a per node solr.xml deployed atomically with the code, as is currently the case, has disadvantages, but solves this problem effectively in a very simple way. If we were to move to a central cluster config, we'd likely need to introduce config versioning or, as Noble suggested elsewhere, only write code that's backward compatible (w.r.t. config), deploy that code everywhere, then once no old code is running, update the cluster config. I find this approach complicated from both a dev and an operational perspective, with an unclear added value.
>
> Ilan
>
> PS. I've stumbled upon the loading of solr.xml from Zookeeper in the past but couldn't find it as I wrote my message so I thought I imagined it...
>
> It's in SolrDispatchFilter.loadNodeConfig(). It establishes a connection to ZK for fetching solr.xml then closes it.
> It relies on system property waitForZk as the connection timeout (in seconds, defaults to 30) and system property zkHost as the Zookeeper host.
>
> I believe solr.xml can only end up in ZK through the use of ZkCLI. Then the user is on his own to manage SolrCloud version upgrades: if a new solr.xml is included as part of a new version of SolrCloud, the user having pushed a previous version into ZK will not see the update.
> I wonder if putting solr.xml in ZK is a common practice.
>
> On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl <jan....@cominvent.com> wrote:
>
>> I interpret solr.xml as the node-local configuration for a single node. clusterprops.json is the cluster-wide configuration applying to all nodes. solrconfig.xml is of course per core etc.
>>
>> solr.in.sh is the per-node ENV-VAR way of configuring a node, and many of those are picked up in solr.xml (others in bin/solr).
>>
>> I think it is important to keep a node-local config file which can only be modified if you have shell access to that local node; it provides an extra layer of security. And in certain cases a node may need a different configuration from another node, e.g. during an upgrade.
>>
>> I put solr.xml in zookeeper. It may have been a mistake, since it may not make all that much sense to load solr.xml, which is a node-level file, from ZK. But if it uses var substitutions for all node-level stuff, it will still work since those vars are pulled from local properties when parsed anyway.
>>
>> I’m also somewhat against hijacking clusterprops.json as a general purpose JSON config file for the cluster. It was supposed to be for simple properties.
>>
>> Jan
>>
>> > On 28 Aug 2020, at 14:23, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> > Solr.xml can also exist on Zookeeper, it doesn’t _have_ to exist locally. You do have to restart to have any changes take effect.
>> >
>> > Long ago in a Solr far away solr.xml was where all the cores were defined. This was before “core discovery” was put in. Since solr.xml had to be there anyway and was read at startup, other global information was added and it’s lived on...
>> >
>> > Then clusterprops.json came along as a place to put, well, cluster-wide properties so having solr.xml too seems awkward. Although if you do have solr.xml locally to each node, you could theoretically have different settings for different Solr instances.
>> > Frankly I consider this more of a bug than a feature.
>> >
>> > I know there has been some talk about removing solr.xml entirely, but I’m not sure what the thinking is about what to do instead. Whatever we do needs to accommodate standalone. We could do the same trick we do now, and essentially move all the current options in solr.xml to clusterprops.json (or other ZK node) and read it locally for stand-alone. The API could even be used to change it if it was stored locally.
>> >
>> > That still leaves the chicken-and-egg problem of connecting to ZK in the first place.
>> >
>> >> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg <ilans...@gmail.com> wrote:
>> >>
>> >> I want to ramp-up/discuss/inventory configuration options in Solr. Here's my understanding of what exists and what could/should be used depending on the need. Please correct/complete as needed (or point to documentation I might have missed).
>> >>
>> >> There are currently 3 sources of general configuration I'm aware of:
>> >> • Collection specific config bootstrapped by file solrconfig.xml and copied into the initial (_default) then subsequent Config Sets in Zookeeper.
>> >> • Cluster wide config in Zookeeper /clusterprops.json editable globally through Zookeeper interaction using an API. Not bootstrapped by anything (i.e. does not exist until the user explicitly creates it).
>> >> • Node config file solr.xml deployed with Solr on each node and loaded when Solr starts. Changes to this file are per node and require node restart to be taken into account.
>> >>
>> >> The Collection specific config (file solrconfig.xml then in Zookeeper /configs/<config set name>/solrconfig.xml) allows Solr devs to set reasonable defaults (the file is part of the Solr distribution). Content can be changed by users as they create new Config Sets persisted in Zookeeper.
>> >>
>> >> Zookeeper's /clusterprops.json can be edited through the collection admin API CLUSTERPROP.
>> >> If users do not set anything there, the file doesn't even exist in Zookeeper, therefore Solr devs cannot use it to set a default cluster config: there's no clusterprops.json file in the Solr distribution like there's a solrconfig.xml.
>> >>
>> >> File solr.xml is used by Solr devs to set some reasonable default configuration (parametrized through property files or system properties). There's no API to change that file; users would have to edit/redeploy the file on each node and restart the Solr JVM on that node for the new config to be taken into account.
>> >>
>> >> Based on the above, my vision (or mental model) of what to use depending on the need:
>> >>
>> >> solrconfig.xml is the only per collection config. IMO it does its job correctly: Solr devs can set defaults, users tailor the content to what they need for new config sets. It's the only option for per collection config anyway.
>> >>
>> >> The real hesitation could be between solr.xml and Zookeeper /clusterprops.json. What should go where?
>> >>
>> >> For user configs (anything the user does to the Solr cluster AFTER it was deployed and started), /clusterprops.json seems to be the obvious choice and offers the right abstractions (global config, no need to worry about individual nodes, all nodes pick up configs and changes to configs dynamically).
>> >>
>> >> For configs that need to be available without requiring user intervention or needed before the connection to ZK is established, there's currently no other choice than using solr.xml.
>> >> Such configuration obviously includes parameters that are needed to connect to ZK (timeouts, credential provider and hopefully one day an option to either use direct ZK interaction code or Curator code), but also configuration of general features that should be the default without requiring users to opt in, yet allowing them to easily opt out by editing solr.xml before deploying to their cluster (in the future, this could include which Lucene version to load in Solr for example).
>> >>
>> >> To summarize:
>> >> • Collection specific config? --> solrconfig.xml
>> >> • User provided cluster config once SolrCloud is running? --> ZK /clusterprops.json
>> >> • Solr dev provided cluster config? --> solr.xml
>> >>
>> >> Going forward, some (but only some!) of the config that currently can only live in solr.xml could be made to go to /clusterprops.json or another ZK based config file. This would require adding code to create that ZK file upon initial cluster start (to not force the user to push it) and devising a mechanism (likely a script, could be tricky though) to update that file in ZK when a new release of Solr is deployed and a previous version of that file already exists. Not impossible tasks, but not trivial ones either. Whatever the needs of such an approach are, it might be easier to keep the existing solr.xml as a file and allow users to define overrides in Zookeeper for the configuration parameters from solr.xml that make sense to be overridden in ZK (obviously ZK credentials or connection timeout do not make sense in that context, but defining the shard handler implementation class does since it is likely loaded after a node managed to connect to ZK).
>> >>
>> >> Some config will have to stay in a local Node file system file and only there no matter what: Zookeeper timeout definition or any node configuration that is needed before the node connects to Zookeeper.
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org