Noble,

In order for *clusterprops.json* to replace what's currently done with *solr.xml*, we'd need to introduce a mechanism to make configuration available out of the box (when Zookeeper is still empty). And if *clusterprops.json* is to be used in standalone mode, it must also live somewhere on disk (there is no Zookeeper in standalone mode).
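As a rough sketch of what "out of the box" could look like, the first node up could seed Zookeeper from its bundled local copy. This is only an illustration using the raw ZooKeeper client; the /clusterprops.json path and the surrounding wiring are assumptions, not existing Solr code:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // First node up pushes its bundled clusterprops.json; later nodes (and
    // races between nodes) leave whatever is already in ZK untouched.
    static void seedClusterProps(ZooKeeper zk, Path localDefaults) throws Exception {
        if (zk.exists("/clusterprops.json", false) != null) {
            return; // already primed, never overwrite user edits
        }
        byte[] data = Files.readAllBytes(localDefaults);
        try {
            zk.create("/clusterprops.json", data,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException e) {
            // another node won the race; keep its copy
        }
    }

The hard part, as discussed below, is not this initial seeding but what happens on later upgrades.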
I believe shardHandlerFactoryConfig is in *solr.xml* for all nodes to know which shard handler to use, not for configuring a different one on each node.

Priming an empty Zookeeper with an initial version of *clusterprops.json* on startup is easy (the first node up pushes its local copy). But after this has happened once, if Solr is upgraded with a new default *clusterprops.json*, it is hard (to very hard) to update the Zookeeper version, with a high risk of erasing configuration that the user added or does not want to change.

How about a variant? Keep a local *solr.xml* file with default configs and support overriding of these configs from Zookeeper's *clusterprops.json*. This approach does not have the out-of-the-box issue mentioned above, and practically also solves the "updating defaults" issue: if the user cherry-picked some *solr.xml* configuration values and overrode them in *clusterprops.json*, it is then his responsibility to maintain them there. Configuration values newly introduced in *solr.xml* (due to a new Solr version) are not impacted since they were not overridden.

I believe this approach is not too far from a suggestion you seem to make in <https://github.com/apache/lucene-solr/pull/1684#issuecomment-683488170> to hard-code default configs to get rid of *solr.xml*. The difference is that the hard coding is done in *solr.xml* rather than in some *defaultSolrConfig.java* class. This makes changing default configuration easy, without requiring recompilation, but is otherwise not conceptually different.

Ilan

On Thu, Sep 3, 2020 at 7:05 AM Noble Paul <[email protected]> wrote:
> Let's take a step back and take a look at the history of Solr.
>
> Long ago there was only standalone Solr with a single core. There were 3 files:
>
> * solr.xml: everything required for CoreContainer went here
> * solrconfig.xml: per-core configuration goes here
> * schema.xml: this is not relevant for this discussion
>
> Now we are in the cloud world where everything lives in ZK. This also means there are potentially 1000's of nodes reading configuration from ZK. This is quite a convenient setup. The same configset is being shared by a very large number of nodes and everyone is happy.
>
> But solr.xml still stands out like a sore thumb. We have no idea what it is for. Is it node-specific configuration, or is it something that every single node in the cluster should have in common?
>
> e.g. shardHandlerFactoryConfig.
>
> Does it even make sense to use a separate "shardHandlerFactoryConfig" for each node? Or should we have every node have the same shardHandlerFactoryConfig? It makes no sense to have a different config on each node. Here is an exhaustive list of parameters that we can configure in solr.xml:
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/NodeConfig.java
>
> 99% of these parameters should not be configured on a per-node basis. They should be shared across the cluster. If that is the case, we should always have this XML file stored in ZK and not on every node. So, if this file is in ZK, does it make sense to be able to update it there without reloading all nodes? Yes, totally. Anyone who has 1000's of nodes in a cluster will definitely hesitate to restart their clusters. Editing XML files is extremely hard. Users hate XML. Everyone is familiar with JSON and they love JSON. The entire security framework runs off of security.json. It's extremely easy to manipulate and read.
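A minimal sketch of the solr.xml-defaults-plus-clusterprops.json-overrides variant proposed in the reply at the top: values bundled in the local solr.xml act as defaults, and any key present in ZK's clusterprops.json wins. The method and parameter names are made up for illustration; this is not existing Solr code.

    import java.util.Properties;

    // Defaults ship with the release in solr.xml; user overrides live in ZK's
    // clusterprops.json. Keys absent from ZK keep their shipped defaults, so a
    // new Solr version can introduce new defaults without touching ZK.
    static Properties effectiveNodeConfig(Properties solrXmlDefaults,
                                          Properties clusterPropsOverrides) {
        Properties effective = new Properties();
        effective.putAll(solrXmlDefaults);
        effective.putAll(clusterPropsOverrides);
        return effective;
    }
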
> So we introduced clusterprops.json for some attributes we may change in a live cluster. The advantages were:
>
> * It's easy to manipulate this data. Simple APIs can be provided. These APIs may validate your input. There is a near 1:1 mapping between the API input and the actual config
> * Users can directly edit this JSON using simple tools
> * We can change data live and it's quite easy to reload a specific component
>
> My preference is:
> * Support everything in solr.xml in clusterprops.json. Get rid of XML everywhere
> * This file shall be supported in standalone mode as well. There is no harm
> * Maybe there are a few attributes we do not want to configure cluster-wide. Use a simple node.properties file for that. Nobody likes XML configuration. It's error-prone to edit XML files
>
> --Noble
>
> On Sat, Aug 29, 2020 at 5:32 AM Alexandre Rafalovitch <[email protected]> wrote:
> >
> > This is way above my head, but I wonder if we could dogfood any of this with a future Solr cloud example? At the moment, it sets up 2-4 nodes, 1 collection, any number of shards/replicas. And it does it by directory clone and some magic in bin/solr to ensure logs don't step on each other's toes.
> >
> > If we have an idea of what this should look like and an example we actually ship, we could probably make it much more concrete.
> >
> > Regards,
> >    Alex.
> >
> > On Fri, 28 Aug 2020 at 15:12, Gus Heck <[email protected]> wrote:
> > >
> > > Sure, of course someone has to set up the first one; that should be an initial collaboration with devops, one can never escape that. Mount points can be established in an automated fashion and named by convention. My yearning is to make the devops side of it devops-based (provide machines that look like X, where all the "X things" are attributes familiar to devops people such as CPUs/mounts/RAM/etc.) and the Solr side of it controlled by those who are experts in Solr to the greatest extent possible. So my desire is that Solr-specific stuff go in ZK and machine definitions be controlled by devops. Once the initial setup for type X is done, the Solr guy says to devops "please give me 3 more of type X" (zk locations are a devops thing btw, they might move zk as they see fit) and when they start, the nodes join the cluster. The Solr guy does his thing, twiddles configs to make it hum (within limits, of course, some changes require machine-level changes), occasionally requests reboots, and when he doesn't need the machines he says... you can turn off machines A, B and C now. The Solr guy doesn't care if it's an AMI or docker or that new Flazllebarp thing that devops seem to like for no clear reason other than it's sold to them by TABS (TinyAuspexBananaSoft Inc) who threw it in when they sold them a bunch of other stuff...
> > >
> > > The config is packaged with the code because there's no better way for a lot of software out there. Use of ZK to serve up configuration gives us the opportunity to do better (well, I think it sounds better, YMMV of course).
> > >
> > > -Gus
> > >
> > > On Fri, Aug 28, 2020 at 2:43 PM Tomás Fernández Löbbe <[email protected]> wrote:
> > >>
> > >> As for AMIs, you have to do it at least once, right? Or are you thinking of someone using a pre-existing AMI? I see your point for the case of someone using the official Solr image as-is without any volume mounts, I guess.
> > >> I'm wondering if trying to put node configuration inside ZooKeeper is another thing where we try to solve inside Solr something the industry already solved differently (AMIs and Docker images are exactly about packaging code and config).
> > >>
> > >> On Fri, Aug 28, 2020 at 11:11 AM Gus Heck <[email protected]> wrote:
> > >>>
> > >>> Which means whoever wants to make changes to Solr needs to be able/willing/competent to make AMIs/dockers/etc... and one has to manage versions of those variants as opposed to managing versions of config files.
> > >>>
> > >>> On Fri, Aug 28, 2020 at 1:55 PM Tomás Fernández Löbbe <[email protected]> wrote:
> > >>>>
> > >>>> I think if you are using AMIs (or Docker), you could put the node configuration inside the AMI (or Docker image), as Ilan said, together with the binaries. Say you have a custom top-level handler (Collections, Cores, Info, whatever) which takes some arguments and is configured in solr.xml, and you are doing an upgrade: you probably want your old nodes (running with your old AMI/Docker image with old jars) to keep the old configuration and your new nodes to use the new one.
> > >>>>
> > >>>> On Fri, Aug 28, 2020 at 10:42 AM Gus Heck <[email protected]> wrote:
> > >>>>>
> > >>>>> Putting solr.xml in zookeeper means you can add a node simply by starting Solr pointing to the zookeeper, and ensure a consistent solr.xml for the new node if you've customized it. Since I rarely (never) hit use cases where I need a different per-node solr.xml, I generally advocate putting it in ZK; I'd say heterogeneous node configs are the special case for advanced use here. I'm a fan of a (hypothetical future) world where nodes can be added/removed simply, without need for local configuration. It would be desirable IMHO to have a smooth node add and remove process, and having to install a file into a distribution manually after unpacking it (or having to coordinate variations of config to be pushed to machines) is a minus. If and when autoscaling is happy again, I'd like to be able to start an AMI in AWS pointing at ZK (or similar) and have it join automatically, then receive replicas to absorb load (per whatever autoscaling is specified), and then be able to issue a single command to a node to sunset it, which moves replicas back off of it (again per autoscaling preferences, failing if autoscaling constraints would be violated) and then asks the node to shut down, so that the instance in AWS (or wherever) can be shut down safely. This is a Black Friday, new tenants/lost tenants, or new feature/EOL feature sort of use case.
> > >>>>>
> > >>>>> Thus IMHO all config for cloud should live somewhere in ZK. File system access should not be required to add/remove capacity. If multiple node configurations need to be supported, we should have a nodeTypes directory in ZK (similar to configsets for collections), possibly node-specific configs there, and an env var that can be read to determine the type (with some cluster-level designation of a default node type). I think that would be sufficient to parameterize AMI stuff (or containers) by reading tags into env variables.
> > >>>>>
> > >>>>> As for knowing what a node loaded, we really should be able to emit any config file we've loaded (without reference to disk or ZK). They aren't that big and in most cases don't change that fast, so caching a simple copy as a string in memory (but only if THAT node loaded it) for verification would seem smart.
> > >>>>> Having a file on disk doesn't tell you if Solr loaded with that version, or if it's changed since Solr loaded it, either.
> > >>>>>
> > >>>>> Anyway, that's the pie in my sky...
> > >>>>>
> > >>>>> -Gus
> > >>>>>
> > >>>>> On Fri, Aug 28, 2020 at 11:51 AM Ilan Ginzburg <[email protected]> wrote:
> > >>>>>>
> > >>>>>> What I'm really looking for (and currently my understanding is that solr.xml is the only option) is a cluster config a Solr dev can set as a default when introducing a new feature, for example, so that the config is picked up out of the box in SolrCloud, yet allowing the end user to override it if he so wishes.
> > >>>>>>
> > >>>>>> But "cluster config" in this context comes with a caveat: when doing a rolling upgrade, nodes running new code need the new cluster config, nodes running old code need the previous cluster config... Having a per-node solr.xml deployed atomically with the code, as is currently the case, has disadvantages, but solves this problem effectively in a very simple way. If we were to move to a central cluster config, we'd likely need to introduce config versioning or, as Noble suggested elsewhere, only write code that's backward compatible (w.r.t. config), deploy that code everywhere, then once no old code is running, update the cluster config. I find this approach complicated from both a dev and an operational perspective, with unclear added value.
> > >>>>>>
> > >>>>>> Ilan
> > >>>>>>
> > >>>>>> PS. I've stumbled upon the loading of solr.xml from Zookeeper in the past but couldn't find it as I wrote my message, so I thought I imagined it...
> > >>>>>>
> > >>>>>> It's in SolrDispatchFilter.loadNodeConfig(). It establishes a connection to ZK for fetching solr.xml, then closes it. It relies on the system property waitForZk as the connection timeout (in seconds, defaults to 30) and the system property zkHost as the Zookeeper host.
> > >>>>>>
> > >>>>>> I believe solr.xml can only end up in ZK through the use of ZkCLI. Then the user is on his own to manage SolrCloud version upgrades: if a new solr.xml is included as part of a new version of SolrCloud, a user having pushed a previous version into ZK will not see the update. I wonder if putting solr.xml in ZK is a common practice.
> > >>>>>>
> > >>>>>> On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> I interpret solr.xml as the node-local configuration for a single node. clusterprops.json is the cluster-wide configuration applying to all nodes. solrconfig.xml is of course per core, etc.
> > >>>>>>>
> > >>>>>>> solr.in.sh is the per-node ENV-VAR way of configuring a node, and many of those are picked up in solr.xml (others in bin/solr).
> > >>>>>>>
> > >>>>>>> I think it is important to keep a node-local config file which can only be modified if you have shell access to that local node; it provides an extra layer of security. And in certain cases a node may need a different configuration from another node, e.g. during an upgrade.
> > >>>>>>>
> > >>>>>>> I put solr.xml in zookeeper. It may have been a mistake, since it may not make all that much sense to load solr.xml, which is a node-level file, from ZK. But if it uses var substitutions for all node-level stuff, it will still work, since those vars are pulled from local properties when parsed anyway.
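The var substitution Jan mentions is the ${property:default} syntax used in solr.xml values (e.g. something like ${jetty.port:8983}), resolved against node-local system properties when the file is parsed. A simplified stand-in for that resolution step, for illustration only; this is not Solr's actual substitution code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Resolves ${name} / ${name:default} placeholders against node-local
    // system properties, which is why a solr.xml shared via ZK can still
    // yield per-node values at parse time.
    static String resolve(String raw) {
        Pattern placeholder = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");
        Matcher m = placeholder.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = System.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

For example, resolve("${jetty.port:8983}") would return "8983" unless -Djetty.port is set on that particular node.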
> > >>>>>>> I'm also somewhat against hijacking clusterprops.json as a general-purpose JSON config file for the cluster. It was supposed to be for simple properties.
> > >>>>>>>
> > >>>>>>> Jan
> > >>>>>>>
> > >>>>>>> > On Aug 28, 2020, at 14:23, Erick Erickson <[email protected]> wrote:
> > >>>>>>> >
> > >>>>>>> > solr.xml can also exist in Zookeeper, it doesn't _have_ to exist locally. You do have to restart to have any changes take effect.
> > >>>>>>> >
> > >>>>>>> > Long ago in a Solr far away, solr.xml was where all the cores were defined. This was before "core discovery" was put in. Since solr.xml had to be there anyway and was read at startup, other global information was added and it's lived on...
> > >>>>>>> >
> > >>>>>>> > Then clusterprops.json came along as a place to put, well, cluster-wide properties, so having solr.xml too seems awkward. Although if you do have solr.xml local to each node, you could theoretically have different settings for different Solr instances. Frankly, I consider this more of a bug than a feature.
> > >>>>>>> >
> > >>>>>>> > I know there has been some talk about removing solr.xml entirely, but I'm not sure what the thinking is about what to do instead. Whatever we do needs to accommodate standalone. We could do the same trick we do now, and essentially move all the current options in solr.xml to clusterprops.json (or another ZK node) and read it locally for stand-alone. The API could even be used to change it if it was stored locally.
> > >>>>>>> >
> > >>>>>>> > That still leaves the chicken-and-egg problem of connecting to ZK in the first place.
> > >>>>>>> >
> > >>>>>>> >> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg <[email protected]> wrote:
> > >>>>>>> >>
> > >>>>>>> >> I want to ramp up/discuss/inventory configuration options in Solr. Here's my understanding of what exists and what could/should be used depending on the need. Please correct/complete as needed (or point to documentation I might have missed).
> > >>>>>>> >>
> > >>>>>>> >> There are currently 3 sources of general configuration I'm aware of:
> > >>>>>>> >> • Collection-specific config bootstrapped by file solrconfig.xml and copied into the initial (_default) then subsequent Config Sets in Zookeeper.
> > >>>>>>> >> • Cluster-wide config in Zookeeper /clusterprops.json, editable globally through Zookeeper interaction using an API. Not bootstrapped by anything (i.e. does not exist until the user explicitly creates it).
> > >>>>>>> >> • Node config file solr.xml deployed with Solr on each node and loaded when Solr starts. Changes to this file are per node and require a node restart to be taken into account.
> > >>>>>>> >>
> > >>>>>>> >> The Collection-specific config (file solrconfig.xml, then in Zookeeper /configs/<config set name>/solrconfig.xml) allows Solr devs to set reasonable defaults (the file is part of the Solr distribution). Content can be changed by users as they create new Config Sets persisted in Zookeeper.
> > >>>>>>> >>
> > >>>>>>> >> Zookeeper's /clusterprops.json can be edited through the collection admin API CLUSTERPROP. If users do not set anything there, the file doesn't even exist in Zookeeper, therefore Solr devs cannot use it to set a default cluster config; there's no clusterprops.json file in the Solr distrib like there's a solrconfig.xml.
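For reference, a CLUSTERPROP call looks like the following; setting a property creates /clusterprops.json in ZK if it doesn't exist yet. The host, port and the choice of urlScheme are just an example invocation, not a recommendation:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Sets a cluster-wide property via the Collections API CLUSTERPROP action,
    // which writes it into ZK's /clusterprops.json.
    public class SetClusterProp {
        public static void main(String[] args) throws Exception {
            String url = "http://localhost:8983/solr/admin/collections"
                       + "?action=CLUSTERPROP&name=urlScheme&val=https";
            HttpClient client = HttpClient.newHttpClient();
            HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }
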
> > >>>>>>> >> File solr.xml is used by Solr devs to set some reasonable default configuration (parameterized through property files or system properties). There's no API to change that file; users would have to edit/redeploy the file on each node and restart the Solr JVM on that node for the new config to be taken into account.
> > >>>>>>> >>
> > >>>>>>> >> Based on the above, my vision (or mental model) of what to use depending on the need:
> > >>>>>>> >>
> > >>>>>>> >> solrconfig.xml is the only per-collection config. IMO it does its job correctly: Solr devs can set defaults, users tailor the content to what they need for new config sets. It's the only option for per-collection config anyway.
> > >>>>>>> >>
> > >>>>>>> >> The real hesitation could be between solr.xml and Zookeeper /clusterprops.json. What should go where?
> > >>>>>>> >>
> > >>>>>>> >> For user configs (anything the user does to the Solr cluster AFTER it was deployed and started), /clusterprops.json seems to be the obvious choice and offers the right abstractions (global config, no need to worry about individual nodes, all nodes pick up configs and changes to configs dynamically).
> > >>>>>>> >>
> > >>>>>>> >> For configs that need to be available without requiring user intervention, or that are needed before the connection to ZK is established, there's currently no other choice than using solr.xml. Such configuration obviously includes parameters that are needed to connect to ZK (timeouts, credential provider and hopefully one day an option to either use direct ZK interaction code or Curator code), but also configuration of general features that should be the default without requiring users to opt in, yet allowing them to easily opt out by editing solr.xml before deploying to their cluster (in the future, this could include which Lucene version to load in Solr, for example).
> > >>>>>>> >>
> > >>>>>>> >> To summarize:
> > >>>>>>> >> • Collection-specific config? --> solrconfig.xml
> > >>>>>>> >> • User-provided cluster config once SolrCloud is running? --> ZK /clusterprops.json
> > >>>>>>> >> • Solr dev provided cluster config? --> solr.xml
> > >>>>>>> >>
> > >>>>>>> >> Going forward, some (but only some!) of the config that currently can only live in solr.xml could be made to go to /clusterprops.json or another ZK-based config file. This would require adding code to create that ZK file upon initial cluster start (to not force the user to push it) and devising a mechanism (likely a script, could be tricky though) to update that file in ZK when a new release of Solr is deployed and a previous version of that file already exists. Not impossible tasks, but not trivial ones either. Whatever the needs of such an approach are, it might be easier to keep the existing solr.xml as a file and allow users to define overrides in Zookeeper for the configuration parameters from solr.xml that make sense to be overridden in ZK (obviously ZK credentials or connection timeout do not make sense in that context, but defining the shard handler implementation class does, since it is likely loaded after a node has managed to connect to ZK).
> > >>>>>>> >>
> > >>>>>>> >> Some config will have to stay in a local node file-system file and only there, no matter what: the Zookeeper timeout definition or any node configuration that is needed before the node connects to Zookeeper.
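To illustrate why that last category has to stay node-local: the ZK address and connection timeout are needed before any ZK read is possible. zkHost and waitForZk are the system properties mentioned earlier in this thread; the surrounding code and the localhost:9983 fallback are illustrative only, not Solr's actual startup path.

    import org.apache.zookeeper.ZooKeeper;

    // These values cannot themselves live in ZK: they are required to open the
    // ZK connection in the first place, so they come from node-local config
    // (system properties, solr.in.sh, solr.xml...).
    public class ZkBootstrap {
        public static void main(String[] args) throws Exception {
            String zkHost = System.getProperty("zkHost", "localhost:9983");
            int waitForZkSeconds = Integer.getInteger("waitForZk", 30);
            ZooKeeper zk = new ZooKeeper(zkHost, waitForZkSeconds * 1000,
                event -> { /* react to connection state changes */ });
            // Only from here on can solr.xml (or anything else) be read from ZK.
        }
    }
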
