Very much agreed.  I've been trying to figure out for a long time what the
point is of having a replica DOWN state that has to be toggled (DOWN and
then UP!) every time a node restarts, considering that we could just
combine ACTIVE and `live_nodes` to determine whether a replica is
available.  It's not even foolproof, since kill -9 on a Solr node won't mark
all of its replicas DOWN; that doesn't happen until the node comes back up
(perversely).
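
To make the idea concrete, here is a rough sketch of the check I have in
mind. The class and method names are made up for illustration and are not
Solr's actual API; it assumes the replica's last published state and the
current contents of `live_nodes` are already in hand.

```java
import java.util.Set;

// Hypothetical helper for illustration only; not part of Solr.
final class ReplicaAvailability {
    private ReplicaAvailability() {}

    /**
     * Treats a replica as available only when its last published state is
     * ACTIVE and the node hosting it is currently listed in live_nodes.
     * No DOWN state needs to be written for a node that has gone away.
     */
    static boolean isAvailable(String publishedState, String nodeName, Set<String> liveNodes) {
        if (publishedState == null || nodeName == null) {
            return false;
        }
        return "ACTIVE".equals(publishedState) && liveNodes.contains(nodeName);
    }
}
```

With a check like this, the kill -9 case takes care of itself: the replica
still says ACTIVE in the cluster state, but its node has dropped out of
`live_nodes`, so it is treated as unavailable without anyone writing DOWN.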

What would it take to get to a state where restarting a node would require
a minimal amount of ZK work in most cases?

On Sat, Nov 2, 2019 at 5:44 PM Mark Miller <markrmil...@gmail.com> wrote:

> Give me a short bit to follow up and I will lay out my case and proposal.
>
> Everyone is then free to decide that we need to do something drastic or
> that I'm wrong and we should just continue down the same road. If that's
> the case, a lot of your work will get a lot easier and less impeded by me
> and we will still all be happier. Win win.
>
> If we can just not make drastic changes for just a brief week or so
> window, I'll say what I have to say, you guys can judge and do whatever
> you'd please.
>
> - mark
>
> On Fri, Nov 1, 2019 at 7:46 PM Mark Miller <markrmil...@gmail.com> wrote:
>
>> Hey all Solr devs,
>>
>> SolrCloud is sick right now. The way low-level ZooKeeper is handled, the
>> Overseer, is a mix and mess of proper exception handling and super slow
>> startup and shutdown, adding new things all the time with no concern for
>> performance or proper ordering (which is harder to tell than you think).
>>
>> Our class dependency graph doesn't even work - we just force it. Sort of.
>> If the whole system doesn't block and choke its way to a start slowly
>> enough, lots of things fail.
>>
>> This thing coughs up, you toss stuff into the storm, and a good chunk of
>> the time, what you want eventually comes back without causing too much damage.
>>
>> There are so many things that are off or just plain wrong and the list is
>> growing and growing. No one is following this or if you are, please back me
>> up. This thing will collapse under its own weight.
>>
>> So if you want to add yet another state format cluster state or some
>> other optimization on this junk heap, you can expect me to push back.
>>
>> We should all be embarrassed by the state of things.
>>
>> I've got some ideas for addressing them that I'll share soon, but god,
>> don't keep optimizing a turd in non backcompat Overseer loving ways. That
>> Overseer is an atrocity.
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>
>
> --
> - Mark
>
> http://about.me/markrmiller
>
