Not much. Something you can understand. How about tests that finish in
< 10 seconds, fail or not? Good logging and, as a backup, good debug
logging. Docs on how things are designed to work? Tracking of all important
operations and how long they take, with tight cutoffs? Proper response to
interruption 100% of the time? The idea of a cluster start and stop? Of an
initial cluster install to ZK? Drop all legacyCloud support, stateFormat=1
support, maybe a few other things.
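
To make the "how long they take with tight cutoffs" and "proper response to
interruption" points concrete, here is a minimal Java sketch. It is not Solr
code: the class TimedOp and the method runWithCutoff are hypothetical names
invented for illustration, and it assumes the operation can be run on a
worker thread and cancelled by interrupting it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedOp {

    /**
     * Runs op on a worker thread, logs how long it took, and fails it if it
     * exceeds cutoffMs. Hypothetical helper, not a Solr API.
     */
    static <T> T runWithCutoff(String name, long cutoffMs, Callable<T> op)
            throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        Future<T> future = exec.submit(op);
        try {
            return future.get(cutoffMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker instead of waiting on it
            throw new TimeoutException(
                    name + " exceeded cutoff of " + cutoffMs + " ms");
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // keep the interrupt status set
            throw e;
        } catch (ExecutionException e) {
            // Surface the operation's own failure rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            long elapsedMs =
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(name + " took " + elapsedMs + " ms");
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithCutoff("fast-op", 1000, () -> "done"));
        try {
            runWithCutoff("slow-op", 50, () -> {
                Thread.sleep(5000);
                return "never";
            });
        } catch (TimeoutException e) {
            System.out.println("cutoff hit: " + e.getMessage());
        }
    }
}
```

A real version would reuse a shared executor and a real logger rather than
creating a thread per call and printing to stdout. The point is only that
every important operation gets a name, a measured duration, and a hard limit.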

I've got some stuff, I'm gonna pull out as fast as I sensibly can given
many setbacks and too little sleep for a long time.

I'm not here to do all of the lift for everyone, but unless I get sick
in the next week or two or my 10 backup methods and git pushes and backup
branches fail or I just burn the hell out, I have a solid refuge that we
can knock out and then build on with confidence.

- Mark

On Sat, Nov 2, 2019 at 5:52 PM Scott Blum <dragonsi...@gmail.com> wrote:

> Very much agreed.  I've been trying to figure out for a long time what is
> the point in having a replica DOWN state that has to be toggled (DOWN and
> then UP!) every time a node restarts.  Considering that we could just
> combine ACTIVE and `live_nodes` to understand whether a replica is
> available.  It's not even foolproof since kill -9 on a solr node won't mark
> all the replicas DOWN-- that doesn't happen until the node comes back up
> (perversely).
>
> What would it take to get to a state where restarting a node would require
> a minimal amount of ZK work in most cases?
>
> On Sat, Nov 2, 2019 at 5:44 PM Mark Miller <markrmil...@gmail.com> wrote:
>
>> Give me a short bit to follow up and I will lay out my case and proposal.
>>
>> Everyone is then free to decide that we need to do something drastic or
>> that I'm wrong and we should just continue down the same road. If that's
>> the case, a lot of your work will get a lot easier and less impeded by me
>> and we will still all be happier. Win win.
>>
>> If we can just not make drastic changes for just a brief week or so
>> window, I'll say what I have to say, you guys can judge and do whatever
>> you'd please.
>>
>> - mark
>>
>> On Fri, Nov 1, 2019 at 7:46 PM Mark Miller <markrmil...@gmail.com> wrote:
>>
>>> Hey All Solr Devs,
>>>
>>> SolrCloud is sick right now. The way low level Zookeeper is handled,
>>> the Overseer, is a mix and mess of proper exception handling and super slow
>>> startup and shutdown, adding new things all the time with no concern for
>>> performance or proper ordering (which is harder to tell than you think).
>>>
>>> Our class dependency graph doesn't even work - we just force it. Sort
>>> of. If the whole system doesn't block and choke its way to a start slow
>>> enough, lots of things fail.
>>>
>>> This thing coughs up, you toss stuff into the storm, a good chunk of the
>>> time, what you want eventually comes back without causing too much damage.
>>>
>>> There are so many things that are off or just plain wrong and the list is
>>> growing and growing. No one is following this or if you are, please back me
>>> up. This thing will collapse under its own weight.
>>>
>>> So if you want to add yet another state format cluster state or some
>>> other optimization on this junk heap, you can expect me to push back.
>>>
>>> We should all be embarrassed by the state of things.
>>>
>>> I've got some ideas for addressing them that I'll share soon, but god,
>>> don't keep optimizing a turd in non backcompat Overseer loving ways. That
>>> Overseer is an atrocity.
>>>
>>> --
>>> - Mark
>>>
>>> http://about.me/markrmiller
>>>
>>
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>

-- 
- Mark

http://about.me/markrmiller
