Hi Tomas,

This type of problem can be solved with an alternate strategy. Here's how: register the updated version of the plugin at /healthcheck2 (while the older version continues to serve at /healthcheck). Verify that it works. Once it does, update the /healthcheck endpoint to use the latest version and unregister /healthcheck2. The same can be done with other types of plugins, and it is fully supported by the package manager today.

I apologize for not having documented such capabilities of the package manager in detail, but I shall attempt to do so with examples shortly, so that you can consider adopting such approaches without resorting to solr.xml (which should go away for SolrCloud due to all the issues Erick mentioned).

Regards,
Ishan
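[A minimal sketch of what that phased swap could look like, assuming the new handler ships as a package and is registered through the cluster plugin API. The package name, class name, and exact payload fields are hypothetical and worth double-checking against the package manager / cluster plugins documentation for your Solr version:]

    # assumes the cluster was started with packages enabled (-Denable.packages=true)
    # 1. Install the new plugin version alongside the old one (names are hypothetical).
    bin/solr package install my-healthcheck:2.0.0

    # 2. Register the new version at a temporary path; /healthcheck keeps serving the old one.
    curl -X POST -H 'Content-Type: application/json' \
      http://localhost:8983/api/cluster/plugin -d '{
        "add": {"name": "/healthcheck2", "class": "my-healthcheck:com.example.HealthCheckV2", "version": "2.0.0"}
      }'

    # 3. After verifying /healthcheck2, point /healthcheck at the new class ...
    curl -X POST -H 'Content-Type: application/json' \
      http://localhost:8983/api/cluster/plugin -d '{
        "update": {"name": "/healthcheck", "class": "my-healthcheck:com.example.HealthCheckV2", "version": "2.0.0"}
      }'

    # 4. ... and drop the temporary registration.
    curl -X POST -H 'Content-Type: application/json' \
      http://localhost:8983/api/cluster/plugin -d '{
        "remove": "/healthcheck2"
      }'

The point of the side path is that the old handler keeps answering on /healthcheck until the new one has been verified.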
On Thu, Sep 3, 2020 at 11:41 PM Tomás Fernández Löbbe <[email protected]> wrote:

> I can see that some of these configurations should be moved to clusterprops.json, but I don’t believe this is the case for all of them. Some are configurations that target the local node (i.e. sharedLib path), some are needed before connecting to ZooKeeper (zk config). For configuration of global handlers and components, while in general you do want to see the same conf across all nodes, you may not want the changes to apply atomically and instead rely on a phased upgrade (rolling, blue/green, etc.), where the conf goes together with the binaries that are being deployed. I also fear that making the configuration of some of these components dynamic means we have to make the code handle them dynamically (i.e. recreate the CollectionsHandler based on a callback from ZooKeeper). This would hardly ever be used in reality, but all our code would need to be restructured to handle it; I fear this will complicate the code needlessly, and may introduce leaks and races of all kinds. If those components can have configuration that should be dynamic (some toggle, threshold, etc.), I’d love to see those as clusterprops, key-value mostly.
>
> If we were to put this configuration in clusterprops, would that mean that I’m only able to do config changes via API? On a new cluster, do I need to start Solr, then make a Collections API call to change the collections handler? Or am I supposed to manually change the clusterprops file before starting Solr and push it to ZooKeeper (having a file intended for both manual edits and API edits is bad IMO)? Maybe via the CLI, but still, I’d need to do this for every cluster I create (vs. having the solr.xml in my source repository and Docker image, for example). Also, do I lose the ability to have this configuration in my git repo?
>
> I'm +1 to keep a node configuration local to the node in the filesystem. Currently, that's solr.xml. I've seen comments about XML being difficult to read/write; I think that's personal preference, so while I don't see it that way, I understand lots of people do and things have been moving to other formats, and I'm open to discussing that as a change.
>
>> However, 1, 2, and 3 are not trivial for a large number of Solr nodes, and if they aren’t right, diagnosing them can be “challenging”…
>
> In my mind, solr.xml goes with your code. Having it up to date means having all your nodes running the same version of your code. As I said, this is the "desired state" of the cluster, but it may not be the case all the time (i.e. during deployments), and that's fine. Depending on how you manage the cluster, you may want to live with different versions for some time (you may have canaries or be doing a blue/green deployment, etc.). Realistically speaking, if you have a 500+ node cluster, you must have a system in place to manage configuration and versions; let's not try to bend over backwards for a situation that isn't that realistic.
>
> Let me put an example of things I fear with making these changes atomic. Let's say I want to start using a new, custom HealthCheckHandler implementation that I have put in a jar (and let's assume the jar is already in all nodes).
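[For reference, a rough sketch of how such a custom handler might be declared in solr.xml today; the class name is hypothetical and the exact element name is worth checking against the solr.xml reference for your version:]

    <solr>
      <!-- hypothetical custom implementation replacing the default health check handler -->
      <str name="healthCheckHandler">com.example.HealthCheckV2</str>
    </solr>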
> If I use solr.xml (where one can currently configure this implementation), I can do a phased deployment (yes, this is a restart of all nodes). If the healthcheck handler is buggy and fails requests, the nodes with the new code will never show as healthy, so the deployment will likely stop (i.e. if you are using Kubernetes with probes, those instances will keep restarting; if you use an ASG in AWS you can do the same thing). If you make it an atomic change, bye-bye cluster: all nodes will start reporting unhealthy (Kubernetes and the ASG will kill all those nodes). Good luck doing API changes to revert now, there is no node left to respond to those requests. Hopefully you were using some sort of stable storage, because everything ephemeral is gone. Bringing back that cluster is going to be a PITA. I have seen similar things happen.
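[To make that scenario concrete, this is roughly the kind of Kubernetes readiness probe that would halt such a rollout, assuming Solr's standard health check endpoint at /solr/admin/info/health; the values are illustrative only:]

    # illustrative probe: pods running a broken health check handler never become ready,
    # so a rolling update stalls instead of replacing the whole fleet
    readinessProbe:
      httpGet:
        path: /solr/admin/info/health
        port: 8983
      initialDelaySeconds: 20
      periodSeconds: 10
      failureThreshold: 3

An atomic, cluster-wide config change offers no such backstop: every node starts failing the probe at once.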
> On Thu, Sep 3, 2020 at 9:40 AM Erick Erickson <[email protected]> wrote:
>
>> bq. Isn’t solr.xml a way to hardcode config in a more flexible way than a Java class?
>>
>> Yes, and the problem word here is “flexible”. For a single-node system that flexibility is desirable. Flexibility comes at the cost of complexity, especially in the SolrCloud case. In this case, not so much Solr code complexity as operations complexity.
>>
>> For me this isn’t so much a question of functionality as administration/troubleshooting/barrier to entry.
>>
>> If:
>> 1. you can guarantee that every solr.xml file on every node in your entire 500 node cluster is up to date,
>> 2. or you can guarantee that the solr.xml stored on ZooKeeper is up to date,
>> 3. and you can guarantee that clusterprops.json in cloud mode is interacting properly with whichever solr.xml is read,
>> 4. then I’d have no problem with solr.xml.
>>
>> However, 1, 2, and 3 are not trivial for a large number of Solr nodes, and if they aren’t right, diagnosing them can be “challenging”…
>>
>> Imagine all the ways that “somehow” the solr.xml file on one or more nodes of a 500 node cluster didn’t get updated and you’re trying to track down why query X isn’t working as you expect. Some of the time. When you happen to hit conditions X, Y and Z on a subrequest that goes to the node in question (which won’t be all of the time, or even possibly a significant fraction of the time). Do containers matter here? Some glitch in Puppet or similar? Somebody didn’t follow every step in the process in the playbook? It doesn’t matter how you got into this situation, tracking it down would be a nightmare.
>>
>> Or, for that matter, you’ve solved all the distribution concerns and _can_ guarantee 1 and 3. Then somebody pushes a solr.xml to ZK either intentionally or by mistake (OH, I thought I was on the QA system, oops). Now I get to spend a week tracking down why the guarantee of 1 is still true, it’s just not relevant any more.
>>
>> To me, it’s the same problem that is solved by the blob store for jar files, or by having configsets in ZK. When I want something available to all my Solr instances, I do not want to have to run around to every node and determine that the object I copied there is the right one, especially if I’m trying to track down a problem.
>>
>> Sure, all my concerns can be solved, but why make it harder than it needs to be? Distributed systems are hard enough already…
>>
>> FWIW,
>> Erick
>>
>> > On Sep 3, 2020, at 11:00 AM, Ilan Ginzburg <[email protected]> wrote:
>> >
>> > Isn’t solr.xml a way to hardcode config in a more flexible way than a Java class?
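[As a footnote to the "key-value clusterprops" Tomás mentions above: simple cluster-wide properties can already be set through the Collections API. A minimal illustration using a documented property; arbitrary application-specific toggles would still need explicit support:]

    # set a cluster-wide key-value property (urlScheme is a documented example)
    curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https'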
