This is way above my head, but I wonder if we could dogfood any of this with a future Solr cloud example? At the moment, it sets up 2-4 nodes, 1 collection, any number of shards/replicas. And it does it by directory clone and some magic in bin/solr to ensure logs don't step on each other's toes.
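For reference, the shipped cloud example Alex mentions is launched from the control script; a minimal sketch (the command is echoed rather than executed here, and exact flags may vary by Solr version):

```shell
# Sketch of launching the shipped SolrCloud example: "-e cloud" runs the
# interactive example (2-4 cloned nodes, one collection) and "-noprompt"
# accepts the defaults. Printed here instead of run.
CMD="bin/solr start -e cloud -noprompt"
echo "$CMD"
```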
If we have an idea of what this should look like and an example we actually ship, we could probably make it much more concrete. Regards, Alex. On Fri, 28 Aug 2020 at 15:12, Gus Heck <gus.h...@gmail.com> wrote: > > Sure of course someone has to set up the first one, that should be an initial > collaboration with devops one can never escape that. Mount points can be > established in an automated fashion and named by convention. My yearning is > to make the devops side of it devops based (provide machines that look like X > where all the "X things" are attributes familiar to devops people such as > CPUs/mounts/RAM/etc.) and the Solr side of it controlled by those who are > experts in Solr to the greatest extent possible. So my desire is that Solr > specific stuff go in ZK and machine definitions be controlled by devops. Once > the initial setup for type X is done then the solr guy says to devops pls > give me 3 more of type X (zk locations are a devops thing btw, they might > move zk as they see fit) and when they start, the nodes join the cluster. > Solr guy does his thing, twiddles configs to make it hum (within limits, of > course, some changes require machine level changes), occasionally requests > reboots, and when he doesn't need the machines he says... you can turn off > machine A, B and C now. Solr guy doesn't care if it's AMI or docker or that > new Flazllebarp thing that devops seem to like for no clear reason other than > it's sold to them by TABS (TinyAuspexBananaSoft Inc) who threw it in when > they sold them a bunch of other stuff... > > The config is packaged with the code because there's no better way for a lot > of software out there. Use of Zk to serve up configuration gives us the > opportunity to do better (well I think it sounds better YMMV of course). > > -Gus > > On Fri, Aug 28, 2020 at 2:43 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> > wrote: >> >> As for AMIs, you have to do it at least once, right? 
or are you thinking of >> someone using a pre-existing AMI? I see your point for the case of someone >> using the official Solr image as-is without any volume mounts I guess. I'm >> wondering if trying to put node configuration inside ZooKeeper is another >> thing where we try to solve things inside Solr that the industry already >> solved differently (AMIs, Docker images are exactly about packaging code and >> config) >> >> On Fri, Aug 28, 2020 at 11:11 AM Gus Heck <gus.h...@gmail.com> wrote: >>> >>> Which means whoever wants to make changes to solr needs to be >>> able/willing/competent to make AMI/dockers/etc ... and one has to manage >>> versions of those variants as opposed to managing versions of config files. >>> >>> On Fri, Aug 28, 2020 at 1:55 PM Tomás Fernández Löbbe >>> <tomasflo...@gmail.com> wrote: >>>> >>>> I think if you are using AMIs (or Docker), you could put the node >>>> configuration inside the AMI (or Docker image), as Ilan said, together >>>> with the binaries. Say you have a custom top-level handler (Collections, >>>> Cores, Info, whatever), which takes some arguments and is configured in >>>> solr.xml, and you are doing an upgrade: you probably want your old nodes >>>> (running with your old AMI/Docker image with old jars) to keep the old >>>> configuration and your new nodes to use the new. >>>> >>>> On Fri, Aug 28, 2020 at 10:42 AM Gus Heck <gus.h...@gmail.com> wrote: >>>>> >>>>> Putting solr.xml in zookeeper means you can add a node simply by starting >>>>> solr pointing to the zookeeper, and ensure a consistent solr.xml for the >>>>> new node if you've customized it. Since I rarely (never) hit use cases >>>>> where I need a different per-node solr.xml, I generally advocate putting it >>>>> in ZK; I'd say heterogeneous node configs are the special case for >>>>> advanced use here. I'm a fan of a (hypothetical future) world where >>>>> nodes can be added/removed simply without need for local configuration.
>>>>> It would be desirable IMHO to have a smooth node add and remove process >>>>> and having to install a file into a distribution manually after unpacking >>>>> it (or having to coordinate variations of config to be pushed to machines) >>>>> is a minus. If and when autoscaling is happy again I'd like to be able to >>>>> start an AMI in AWS pointing at zk (or similar) and have it join >>>>> automatically, and then receive replicas to absorb load (per whatever >>>>> autoscaling is specified), and then be able to issue a single sunset >>>>> command to a node that moves replicas back off of it (again per >>>>> autoscaling preferences, failing if autoscaling constraints would be >>>>> violated) and then asks the node to shut down so that the instance in AWS >>>>> (or wherever) can be shut down safely. This is a Black Friday, new >>>>> tenants/lost tenants, or new feature/EOL feature sort of use case. >>>>> >>>>> Thus IMHO all config for cloud should live somewhere in ZK. File system >>>>> access should not be required to add/remove capacity. If multiple node >>>>> configurations need to be supported we should have a nodeTypes directory in >>>>> zk (similar to configsets for collections), possibly node-specific >>>>> configs there and an env var that can be read to determine the type (with >>>>> some cluster level designation of a default node type). I think that >>>>> would be sufficient to parameterize AMI stuff (or containers) by reading >>>>> tags into env variables. >>>>> >>>>> As for knowing what a node loaded, we really should be able to emit any >>>>> config file we've loaded (without reference to disk or zk). They aren't >>>>> that big and in most cases don't change that fast, so caching a simple >>>>> copy as a string in memory (but only if THAT node loaded it) for >>>>> verification would seem smart. Having a file on disk doesn't tell you if >>>>> solr loaded with that version or if it's changed since solr loaded it >>>>> either.
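The join-by-pointing-at-ZK flow Gus describes already exists in minimal form: a node started in cloud mode needs only the ZooKeeper ensemble address. A sketch (the zk hosts are placeholders, and the command is echoed rather than run):

```shell
# A new node needs no local cluster config beyond the ZooKeeper address:
# "-c" starts Solr in SolrCloud mode, "-z" points it at the ensemble.
# The hosts below are illustrative placeholders.
ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"
CMD="bin/solr start -c -z $ZK_HOST"
echo "$CMD"
```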
>>>>> >>>>> Anyway, that's the pie in my sky... >>>>> >>>>> -Gus >>>>> >>>>> On Fri, Aug 28, 2020 at 11:51 AM Ilan Ginzburg <ilans...@gmail.com> wrote: >>>>>> >>>>>> What I'm really looking for (and currently my understanding is that >>>>>> solr.xml is the only option) is a cluster config a Solr dev can set as a >>>>>> default when introducing a new feature for example, so that the config >>>>>> is picked out of the box in SolrCloud, yet allowing the end user to >>>>>> override it if he so wishes. >>>>>> >>>>>> But "cluster config" in this context with a caveat: when doing a rolling >>>>>> upgrade, nodes running new code need the new cluster config, nodes >>>>>> running old code need the previous cluster config... Having a per node >>>>>> solr.xml deployed atomically with the code as currently the case has >>>>>> disadvantages, but solves this problem effectively in a very simple way. >>>>>> If we were to move to a central cluster config, we'd likely need to >>>>>> introduce config versioning or as Noble suggested elsewhere, only write >>>>>> code that's backward compatible (w.r.t. config), deploy that code >>>>>> everywhere then once no old code is running, update the cluster config. >>>>>> I find this approach complicated from both dev and operational >>>>>> perspective with an unclear added value. >>>>>> >>>>>> Ilan >>>>>> >>>>>> PS. I've stumbled upon the loading of solr.xml from Zookeeper in the >>>>>> past but couldn't find it as I wrote my message so I thought I imagined >>>>>> it... >>>>>> >>>>>> It's in SolrDispatchFilter.loadNodeConfig(). It establishes a connection >>>>>> to ZK for fetching solr.xml then closes it. >>>>>> It relies on system property waitForZk as the connection timeout (in >>>>>> seconds, defaults to 30) and system property zkHost as the Zookeeper >>>>>> host. >>>>>> >>>>>> I believe solr.xml can only end up in ZK through the use of ZkCLI. 
Then >>>>>> the user is on his own to manage SolrCloud version upgrades: if a new >>>>>> solr.xml is included as part of a new version of SolrCloud, the user >>>>>> having pushed a previous version into ZK will not see the update. >>>>>> I wonder if putting solr.xml in ZK is a common practice. >>>>>> >>>>>> On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl <jan....@cominvent.com> >>>>>> wrote: >>>>>>> >>>>>>> I interpret solr.xml as the node-local configuration for a single node. >>>>>>> clusterprops.json is the cluster-wide configuration applying to all >>>>>>> nodes. >>>>>>> solrconfig.xml is of course per core, etc. >>>>>>> >>>>>>> solr.in.sh is the per-node ENV-VAR way of configuring a node, and many >>>>>>> of those are picked up in solr.xml (others in bin/solr). >>>>>>> >>>>>>> I think it is important to keep a file-local config file which can only >>>>>>> be modified if you have shell access to that local node; it provides an >>>>>>> extra layer of security. >>>>>>> And in certain cases a node may need a different configuration from >>>>>>> another node, e.g. during an upgrade. >>>>>>> >>>>>>> I put solr.xml in zookeeper. It may have been a mistake, since it may >>>>>>> not make all that much sense to load solr.xml, which is a node-level >>>>>>> file, from ZK. But if it uses var substitutions for all node-level >>>>>>> stuff, it will still work since those vars are pulled from local >>>>>>> properties when parsed anyway. >>>>>>> >>>>>>> I’m also somewhat against hijacking clusterprops.json as a general >>>>>>> purpose JSON config file for the cluster. It was supposed to be for >>>>>>> simple properties. >>>>>>> >>>>>>> Jan >>>>>>> >>>>>>> > On 28 Aug 2020, at 14:23, Erick Erickson >>>>>>> > <erickerick...@gmail.com> wrote: >>>>>>> > >>>>>>> > Solr.xml can also exist on Zookeeper, it doesn’t _have_ to exist >>>>>>> > locally. You do have to restart to have any changes take effect.
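The ZkCLI route Ilan refers to looks roughly like this (zkhost and the local path are illustrative; the command is echoed rather than executed):

```shell
# Pushing a local solr.xml into ZooKeeper with ZkCLI's putfile command;
# nodes started against this ensemble can then load it from ZK instead
# of from their local install directory.
ZK_HOST="localhost:2181"
CMD="server/scripts/cloud-scripts/zkcli.sh -zkhost $ZK_HOST -cmd putfile /solr.xml /path/to/solr.xml"
echo "$CMD"
```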
>>>>>>> > > Long ago in a Solr far away solr.xml was where all the cores were >>>>>>> > defined. This was before “core discovery” was put in. Since solr.xml >>>>>>> > had to be there anyway and was read at startup, other global >>>>>>> > information was added and it’s lived on... >>>>>>> > >>>>>>> > Then clusterprops.json came along as a place to put, well, >>>>>>> > cluster-wide properties, so having solr.xml too seems awkward. >>>>>>> > Although if you do have solr.xml local to each node, you could >>>>>>> > theoretically have different settings for different Solr instances. >>>>>>> > Frankly I consider this more of a bug than a feature. >>>>>>> > >>>>>>> > I know there has been some talk about removing solr.xml entirely, >>>>>>> > but I’m not sure what the thinking is about what to do instead. >>>>>>> > Whatever we do needs to accommodate standalone. We could do the same >>>>>>> > trick we do now, and essentially move all the current options in >>>>>>> > solr.xml to clusterprops.json (or another ZK node) and read it locally >>>>>>> > for stand-alone. The API could even be used to change it if it was >>>>>>> > stored locally. >>>>>>> > >>>>>>> > That still leaves the chicken-and-egg problem of connecting to ZK in >>>>>>> > the first place. >>>>>>> > >>>>>>> >> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg <ilans...@gmail.com> >>>>>>> >> wrote: >>>>>>> >> >>>>>>> >> I want to ramp-up/discuss/inventory configuration options in Solr. >>>>>>> >> Here's my understanding of what exists and what could/should be used >>>>>>> >> depending on the need. Please correct/complete as needed (or point >>>>>>> >> to documentation I might have missed). >>>>>>> >> >>>>>>> >> There are currently 3 sources of general configuration I'm aware of: >>>>>>> >> • Collection specific config bootstrapped by file >>>>>>> >> solrconfig.xml and copied into the initial (_default) then >>>>>>> >> subsequent Config Sets in Zookeeper.
>>>>>>> >> • Cluster wide config in Zookeeper /clusterprops.json editable >>>>>>> >> globally through Zookeeper interaction using an API. Not >>>>>>> >> bootstrapped by anything (i.e. does not exist until the user >>>>>>> >> explicitly creates it) >>>>>>> >> • Node config file solr.xml deployed with Solr on each node and >>>>>>> >> loaded when Solr starts. Changes to this file are per node and >>>>>>> >> require node restart to be taken into account. >>>>>>> >> The Collection specific config (file solrconfig.xml then in >>>>>>> >> Zookeeper /configs/<config set name>/solrconfig.xml) allows Solr >>>>>>> >> devs to set reasonable defaults (the file is part of the Solr >>>>>>> >> distribution). Content can be changed by users as they create new >>>>>>> >> Config Sets persisted in Zookeeper. >>>>>>> >> >>>>>>> >> Zookeeper's /clusterprops.json can be edited through the collection >>>>>>> >> admin API CLUSTERPROP. If users do not set anything there, the file >>>>>>> >> doesn't even exist in Zookeeper, therefore Solr devs cannot use it >>>>>>> >> to set a default cluster config; there's no clusterprops.json file >>>>>>> >> in the Solr distrib like there's a solrconfig.xml. >>>>>>> >> >>>>>>> >> File solr.xml is used by Solr devs to set some reasonable default >>>>>>> >> configuration (parametrized through property files or system >>>>>>> >> properties). There's no API to change that file, users would have to >>>>>>> >> edit/redeploy the file on each node and restart the Solr JVM on that >>>>>>> >> node for the new config to be taken into account. >>>>>>> >> >>>>>>> >> Based on the above, my vision (or mental model) of what to use >>>>>>> >> depending on the need: >>>>>>> >> >>>>>>> >> solrconfig.xml is the only per collection config. IMO it does its >>>>>>> >> job correctly: Solr devs can set defaults, users tailor the content >>>>>>> >> to what they need for new config sets. It's the only option for per >>>>>>> >> collection config anyway.
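For concreteness, the CLUSTERPROP call mentioned above is an ordinary Collections API request; a sketch (host/port and the urlScheme property are illustrative, and the request is only printed, not sent):

```shell
# Building a CLUSTERPROP request; setting a property this way creates
# /clusterprops.json in ZooKeeper if it does not exist yet.
SOLR="http://localhost:8983/solr"
REQUEST="$SOLR/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https"
echo "$REQUEST"
# To apply it against a live cluster:  curl "$REQUEST"
```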
>>>>>>> >> >>>>>>> >> The real hesitation could be between solr.xml and Zookeeper >>>>>>> >> /clusterprops.json. What should go where? >>>>>>> >> >>>>>>> >> For user configs (anything the user does to the Solr cluster AFTER >>>>>>> >> it was deployed and started), /clusterprops.json seems to be the >>>>>>> >> obvious choice and offers the right abstractions (global config, no >>>>>>> >> need to worry about individual nodes, all nodes pick up configs and >>>>>>> >> changes to configs dynamically). >>>>>>> >> >>>>>>> >> For configs that need to be available without requiring user >>>>>>> >> intervention or needed before the connection to ZK is established, >>>>>>> >> there's currently no other choice than using solr.xml. Such >>>>>>> >> configuration obviously includes parameters that are needed to >>>>>>> >> connect to ZK (timeouts, credential provider and hopefully one day >>>>>>> >> an option to either use direct ZK interaction code or Curator code), >>>>>>> >> but also configuration of general features that should be the >>>>>>> >> default without requiring users to opt in yet allowing them to >>>>>>> >> easily opt out by editing solr.xml before deploying to their cluster >>>>>>> >> (in the future, this could include which Lucene version to load in >>>>>>> >> Solr for example). >>>>>>> >> >>>>>>> >> To summarize: >>>>>>> >> • Collection specific config? --> solrconfig.xml >>>>>>> >> • User provided cluster config once SolrCloud is running? --> >>>>>>> >> ZK /clusterprops.json >>>>>>> >> • Solr dev provided cluster config? --> solr.xml >>>>>>> >> >>>>>>> >> Going forward, some (but only some!) of the config that currently >>>>>>> >> can only live in solr.xml could be made to go to /clusterprops.json >>>>>>> >> or another ZK based config file.
This would require adding code to >>>>>>> >> create that ZK file upon initial cluster start (to not force the >>>>>>> >> user to push it) and devise a mechanism (likely a script, could be >>>>>>> >> tricky though) to update that file in ZK when a new release of Solr >>>>>>> >> is deployed and a previous version of that file already exists. Not >>>>>>> >> impossible tasks, but not trivial ones either. Whatever the needs of >>>>>>> >> such an approach are, it might be easier to keep the existing >>>>>>> >> solr.xml as a file and allow users to define overrides in Zookeeper >>>>>>> >> for the configuration parameters from solr.xml that make sense to be >>>>>>> >> overridden in ZK (obviously ZK credentials or connection timeout do >>>>>>> >> not make sense in that context, but defining the shard handler >>>>>>> >> implementation class does since it is likely loaded after a node >>>>>>> >> managed to connect to ZK). >>>>>>> >> >>>>>>> >> Some config will have to stay in a local Node file system file and >>>>>>> >> only there no matter what: Zookeeper timeout definition or any node >>>>>>> >> configuration that is needed before the node connects to Zookeeper. 
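The var-substitution mechanism Jan and Ilan allude to is what lets a single solr.xml (even one served from ZK) carry node-local values: the file references system properties with per-node defaults that are resolved when each node parses it. A sketch, with element names following the stock solr.xml but values chosen for illustration:

```xml
<solr>
  <solrcloud>
    <!-- resolved from -Dhost / -Djetty.port on each node at parse time -->
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <!-- ZK connection timeout; falls back to 30000 ms if the property is unset -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>
```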
>>>>> -- >>>>> http://www.needhamsoftware.com (work) >>>>> http://www.the111shift.com (play) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org