On Fri, 4 Sep, 2020, 12:05 am Erick Erickson, <erickerick...@gmail.com> wrote:
> I wish everyone would just use Solr the way I think about it ;)
> https://twitter.com/ichattopadhyaya/status/1210868171814473728
>
> > On Sep 3, 2020, at 2:11 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
> >
> > I can see that some of these configurations should be moved to clusterprops.json, but I don't believe that's the case for all of them. Some are configurations that target the local node (i.e. the sharedLib path), some are needed before connecting to ZooKeeper (zk config). For the configuration of global handlers and components, while in general you do want to see the same conf across all nodes, you may not want changes to be reflected atomically; instead you may rely on a phased upgrade (rolling, blue/green, etc.), where the conf goes together with the binaries being deployed. I also fear that making the configuration of some of these components dynamic means we have to make the code handle them dynamically (i.e. recreate the CollectionsHandler based on a callback from ZooKeeper). This would be rarely used in reality, but all our code would need to be restructured to handle it; I fear this would complicate the code needlessly and might introduce leaks and races of all kinds. If those components can have configuration that should be dynamic (some toggle, threshold, etc.), I'd love to see those as clusterprops, key-value mostly.
> >
> > If we were to put this configuration in clusterprops, would that mean that I can only make config changes via the API? On a new cluster, do I need to start Solr and then make a Collections API call to change the collections handler? Or am I supposed to manually edit the clusterprops file before starting Solr and push it to ZooKeeper (having a file intended for both manual edits and API edits is bad IMO)? Maybe via the CLI, but still, I'd need to do this for every cluster I create (vs. having the solr.xml in my source repository and Docker image, for example).
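[For the key-value cluster properties Tomás mentions, Solr's Collections API already has a CLUSTERPROP action that writes to clusterprops.json in ZooKeeper. A sketch; the host, port, and property value here are illustrative:]

```text
# Set a cluster-wide key/value property; Solr stores it in clusterprops.json in ZooKeeper:
curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https'

# clusterprops.json afterwards:
{"urlScheme":"https"}
```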
> > Also, I lose the ability to have this configuration in my git repo.
> >
> > I'm +1 to keep node configuration local to the node in the filesystem. Currently, that's solr.xml. I've seen comments that XML is difficult to read/write; I think that's personal preference, so while I don't see it that way, I understand lots of people do, and things have been moving to other formats, so I'm open to discussing that as a change.
> >
> > > However, 1, 2, and 3 are not trivial for a large number of Solr nodes, and if they aren't right, diagnosing them can be "challenging"…
> >
> > In my mind, solr.xml goes with your code. Having it up to date means having all your nodes running the same version of your code. As I said, this is the "desired state" of the cluster, but it may not be the case all the time (i.e. during deployments), and that's fine. Depending on how you manage the cluster, you may want to live with different versions for some time (you may have canaries or be doing a blue/green deployment, etc.). Realistically speaking, if you have a 500+ node cluster, you must have a system in place to manage configuration and versions; let's not bend over backwards for a situation that isn't that realistic.
> >
> > Let me give an example of what I fear about making these changes atomic. Let's say I want to start using a new, custom HealthCheckHandler implementation that I have put in a jar (and let's assume the jar is already on all nodes). If I use solr.xml (where one can currently configure this implementation), I can do a phased deployment (yes, this is a restart of all nodes). If the health check handler is buggy and fails requests, the nodes with the new code will never show as healthy, so the deployment will likely stop (i.e. if you are using Kubernetes with probes, those instances will keep restarting; if you use an ASG in AWS you can do the same thing).
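[As a concrete sketch of the solr.xml approach Tomás describes: solr.xml accepts a healthCheckHandler entry for a custom implementation. The class name here is hypothetical:]

```xml
<solr>
  <!-- com.example.MyHealthCheckHandler is a hypothetical custom class; it must
       already be on the classpath (e.g. via sharedLib) on every node. Because
       solr.xml is read at startup, the change rolls out node by node with the
       restart, rather than atomically across the cluster. -->
  <str name="healthCheckHandler">com.example.MyHealthCheckHandler</str>
</solr>
```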
> > If you make it an atomic change instead, bye-bye cluster: all nodes will start reporting unhealthy (Kubernetes and the ASG will kill all those nodes). Good luck making API changes to revert now; there is no node left to respond to those requests. Hopefully you were using some sort of stable storage, because everything ephemeral is gone. Bringing back that cluster is going to be a PITA. I have seen similar things happen.
> >
> > On Thu, Sep 3, 2020 at 9:40 AM Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > bq. Isn't solr.xml a way to hardcode config in a more flexible way than a Java class?
> > >
> > > Yes, and the problem word here is "flexible". For a single-node system that flexibility is desirable. Flexibility comes at the cost of complexity, especially in the SolrCloud case. In this case, not so much Solr code complexity as operational complexity.
> > >
> > > For me this isn't so much a question of functionality as administration/troubleshooting/barrier to entry.
> > >
> > > If:
> > > 1. you can guarantee that every solr.xml file on every node in your entire 500-node cluster is up to date,
> > > 2. or you can guarantee that the solr.xml stored on ZooKeeper is up to date,
> > > 3. and you can guarantee that clusterprops.json in cloud mode interacts properly with whichever solr.xml is read,
> > > 4. then I'd have no problem with solr.xml.
> > >
> > > However, 1, 2, and 3 are not trivial for a large number of Solr nodes, and if they aren't right, diagnosing them can be "challenging"…
> > >
> > > Imagine all the ways that "somehow" the solr.xml file on one or more nodes of a 500-node cluster didn't get updated, and you're trying to track down why query X isn't working as you expect. Some of the time. When you happen to hit conditions X, Y, and Z on a subrequest that goes to the node in question (which won't be all of the time, or even a significant fraction of the time). Do containers matter here? Some glitch in Puppet or similar?
> > > Somebody didn't follow every step in the playbook? It doesn't matter how you got into this situation; tracking it down would be a nightmare.
> > >
> > > Or, for that matter, say you've solved all the distribution concerns and _can_ guarantee 1 and 3. Then somebody pushes a solr.xml to ZK, either intentionally or by mistake ("Oh, I thought I was on the QA system, oops"). Now I get to spend a week tracking down why the guarantee of 1 is still true, it's just not relevant any more.
> > >
> > > To me, it's the same problem that is solved by the blob store for jar files, or by having configsets in ZK. When I want something available to all my Solr instances, I do not want to have to go to every node and verify that the object I copied there is the right one, especially if I'm trying to track down a problem.
> > >
> > > Sure, all my concerns can be solved, but why make it harder than it needs to be? Distributed systems are hard enough already…
> > >
> > > FWIW,
> > > Erick
> > >
> > > > On Sep 3, 2020, at 11:00 AM, Ilan Ginzburg <ilans...@gmail.com> wrote:
> > > >
> > > > Isn't solr.xml a way to hardcode config in a more flexible way than a Java class?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: dev-h...@lucene.apache.org
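[The Kubernetes-probe failure mode Tomás describes assumes a readiness probe pointed at Solr's health endpoint. A minimal sketch; the port and thresholds are illustrative:]

```yaml
# Readiness probe against Solr's health check endpoint. If a buggy custom
# HealthCheckHandler fails requests, restarted pods never become ready, so a
# rolling update stalls on the first pods instead of taking down every node.
readinessProbe:
  httpGet:
    path: /solr/admin/info/health
    port: 8983
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```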