Very much agreed. For a long time I've been trying to figure out what the point is of having a replica DOWN state that has to be toggled (DOWN and then UP!) every time a node restarts, especially since we could just combine ACTIVE with `live_nodes` to tell whether a replica is available. It's not even foolproof: kill -9 on a Solr node won't mark that node's replicas DOWN; that doesn't happen until the node comes back up (perversely).
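To make the "ACTIVE plus `live_nodes`" idea concrete, here is a minimal Java sketch. It is illustrative only: `ReplicaAvailability`, its `State` enum and `isAvailable` are hypothetical names, not Solr's API, and node names like `host1:8983_solr` are made up. It assumes each node registers an ephemeral znode under `/live_nodes` that ZooKeeper removes once the node's session expires, so a dead node drops out of the live set without anyone writing DOWN for its replicas.

```java
import java.util.Set;

/**
 * Hypothetical sketch: a replica counts as available only if its last
 * published state is ACTIVE and its hosting node is in the live_nodes set.
 */
final class ReplicaAvailability {

    // Simplified stand-in for the states a replica can publish (hypothetical).
    enum State { ACTIVE, DOWN, RECOVERING, RECOVERY_FAILED }

    // Both signals must agree. The null check comes before contains()
    // because immutable sets from Set.of throw NullPointerException on contains(null).
    static boolean isAvailable(State publishedState, String nodeName, Set<String> liveNodes) {
        return publishedState == State.ACTIVE
                && nodeName != null
                && liveNodes.contains(nodeName);
    }

    public static void main(String[] args) {
        Set<String> liveNodes = Set.of("host1:8983_solr", "host2:8983_solr");

        // Node is live and the replica last published ACTIVE.
        System.out.println(isAvailable(State.ACTIVE, "host1:8983_solr", liveNodes)); // prints true

        // Node was killed with kill -9: once its live_nodes entry is gone,
        // the stale ACTIVE state no longer makes the replica look available.
        System.out.println(isAvailable(State.ACTIVE, "host3:8983_solr", liveNodes)); // prints false

        // Node is live but the replica is still recovering.
        System.out.println(isAvailable(State.RECOVERING, "host2:8983_solr", liveNodes)); // prints false
    }
}
```

Under this model a restart only needs the node to re-register in `live_nodes` and its replicas to republish ACTIVE once they are ready; no DOWN write is needed on the way down.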
What would it take to get to a state where restarting a node requires a minimal amount of ZK work in most cases?

On Sat, Nov 2, 2019 at 5:44 PM Mark Miller <markrmil...@gmail.com> wrote:
> Give me a short bit to follow up and I will lay out my case and proposal.
>
> Everyone is then free to decide that we need to do something drastic, or
> that I'm wrong and we should just continue down the same road. If that's
> the case, a lot of your work will get a lot easier and less impeded by me,
> and we will all still be happier. Win-win.
>
> If we can just hold off on drastic changes for a brief window of a week or
> so, I'll say what I have to say, and you guys can judge and do whatever
> you please.
>
> - mark
>
> On Fri, Nov 1, 2019 at 7:46 PM Mark Miller <markrmil...@gmail.com> wrote:
>
>> Hey all Solr devs,
>>
>> SolrCloud is sick right now. The way low-level ZooKeeper interaction is
>> handled, and the Overseer, are a mixed-up mess when it comes to proper
>> exception handling, with super slow startup and shutdown, and new things
>> added all the time with no concern for performance or proper ordering
>> (which is harder to judge than you think).
>>
>> Our class dependency graph doesn't even work; we just force it. Sort of.
>> If the whole system doesn't block and choke its way to a start slowly
>> enough, lots of things fail.
>>
>> This thing coughs; you toss stuff into the storm and, a good chunk of the
>> time, what you want eventually comes back without causing too much damage.
>>
>> There are so many things that are off or just plain wrong, and the list
>> keeps growing. Maybe no one is following this; if you are, please back me
>> up. This thing will collapse under its own weight.
>>
>> So if you want to add yet another cluster state format or some other
>> optimization on top of this junk heap, you can expect me to push back.
>>
>> We should all be embarrassed by the state of things.
>>
>> I've got some ideas for addressing these problems that I'll share soon,
>> but god, don't keep optimizing a turd in non-backcompat, Overseer-loving
>> ways. That Overseer is an atrocity.
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>