Re: SolrCloud is sick.

David Smiley Sat, 02 Nov 2019 21:33:24 -0700

Yeah, we do a bad job of the things you listed, Noble.  :-(   My colleagues
want pointers to internal docs but the sad reality is there aren't any.  You
may notice I'm a stickler in my code reviews for requiring javadocs on all
top level classes.  I think more javadocs and code comments would be very
helpful -- especially for the major classes.  This might help us all and
others a lot more.  For example I think Lucene does a rather fine job of
this for its major classes -- IndexWriter being a good example.


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sat, Nov 2, 2019 at 7:32 PM Noble Paul <noble.p...@gmail.com> wrote:

> Hi,
>
> I believe there is a consensus on what is wrong with the way we have built
> the cluster state and overseer. We need to focus a bit more on the design
> aspect. Design, according to me, has the following elements:
>
> * How does it work?
>
> * What are the performance characteristics? Can it be done more
> efficiently?
>
> * What are the public touch points?
>
> ** Which are the files we store in ZK? Are they expected to be watched
> always?
>
> ** Or are they read on demand?
>
> ** The public APIs. Do they make sense to the user? Can they be further
> simplified? How do they compare to the other APIs in the system?
>
>
> We, as a community, do a bad job in dealing with these. While we focus on
> internal things, these are not discussed until it is too late. We usually
> do coding, tests, code review (sometimes) and commit. This leads to huge
> technical debt.
>
>
> This is not to put blame on one person or a group of people. (I
> occasionally see people discussing design issues upfront; I just hope that
> becomes the norm.)
>
>
> Now, why am I discussing this in this thread?
>
>
> While we agree there are problems, we are trying to solve the problem
> using the same process we used to create these problems. Again, I'm not
> questioning the intent or competence of anyone. Unless we set the process
> right, we are doomed to make the same mistakes again.
>
>
> I wholeheartedly endorse any effort to improve SolrCloud/overseer. At the
> same time I fail to see us leveraging the collective experience of our
> community through meaningful discussion.
>
>
> I hope we don't resort to personal attacks and use this as an opportunity
> to improve our processes.
> Thanks
>
> On Sun, Nov 3, 2019, 9:52 AM Scott Blum <dragonsi...@gmail.com> wrote:
>
>> Very much agreed.  I've been trying to figure out for a long time what
>> the point is in having a replica DOWN state that has to be toggled (DOWN and
>> then UP!) every time a node restarts, considering that we could just
>> combine ACTIVE and `live_nodes` to understand whether a replica is
>> available.  It's not even foolproof since kill -9 on a solr node won't mark
>> all the replicas DOWN -- that doesn't happen until the node comes back up
>> (perversely).
>>
>> What would it take to get to a state where restarting a node would
>> require a minimal amount of ZK work in most cases?
>>
>> On Sat, Nov 2, 2019 at 5:44 PM Mark Miller <markrmil...@gmail.com> wrote:
>>
>>> Give me a short bit to follow up and I will lay out my case and proposal.
>>>
>>> Everyone is then free to decide that we need to do something drastic or
>>> that I'm wrong and we should just continue down the same road. If that's
>>> the case, a lot of your work will get a lot easier and less impeded by me
>>> and we will still all be happier. Win win.
>>>
>>> If we can just not make drastic changes for a brief window of a week or
>>> so, I'll say what I have to say, and you guys can judge and do whatever
>>> you please.
>>>
>>> - mark
>>>
>>> On Fri, Nov 1, 2019 at 7:46 PM Mark Miller <markrmil...@gmail.com>
>>> wrote:
>>>
>>>> Hey All Solr Devs,
>>>>
>>>> SolrCloud is sick right now. The way low-level ZooKeeper is handled,
>>>> the Overseer, is a mix and mess of proper exception handling and super slow
>>>> startup and shutdown, adding new things all the time with no concern for
>>>> performance or proper ordering (which is harder to tell than you think).
>>>>
>>>> Our class dependency graph doesn't even work - we just force it. Sort
>>>> of. If the whole system doesn't block and choke its way to a start slow
>>>> enough, lots of things fail.
>>>>
>>>> This thing coughs up, you toss stuff into the storm, and a good chunk of
>>>> the time, what you want eventually comes back without causing too much damage.
>>>>
>>>> There are so many things that are off or just plain wrong and the list
>>>> is growing and growing. No one is following this or if you are, please back
>>>> me up. This thing will collapse under its own weight.
>>>>
>>>> So if you want to add yet another state format cluster state or some
>>>> other optimization on this junk heap, you can expect me to push back.
>>>>
>>>> We should all be embarrassed by the state of things.
>>>>
>>>> I've got some ideas for addressing them that I'll share soon, but god,
>>>> don't keep optimizing a turd in non-backcompat, Overseer-loving ways. That
>>>> Overseer is an atrocity.
>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://about.me/markrmiller
>>>>
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://about.me/markrmiller
>>>
>>
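
Scott Blum's suggestion in the thread above, deriving replica availability
from ACTIVE plus `live_nodes` instead of toggling a DOWN state, could be
sketched roughly as follows. This is only an illustration of the idea, not
Solr code: the class `ReplicaAvailability`, its nested `Replica` and `State`
types, and the method `isAvailable` are all hypothetical names made up for
this sketch, and `live_nodes` is modeled as a plain set of node names.

```java
import java.util.Set;

// Hypothetical illustration only: these are NOT Solr's real classes or API.
final class ReplicaAvailability {

    // State persisted for a replica. There is deliberately no DOWN value here:
    // in this sketch "down" is inferred from live_nodes, not written to ZK.
    enum State { ACTIVE, RECOVERING }

    // Minimal stand-in for a replica entry in the cluster state.
    static final class Replica {
        final String nodeName; // node hosting the replica
        final State state;     // last state the replica published

        Replica(String nodeName, State state) {
            this.nodeName = nodeName;
            this.state = state;
        }
    }

    // A replica is usable only if it last published ACTIVE and its node
    // is currently present in the live_nodes set.
    static boolean isAvailable(Replica replica, Set<String> liveNodes) {
        return replica.state == State.ACTIVE && liveNodes.contains(replica.nodeName);
    }
}
```

Under this rule, a node killed with kill -9 would drop out of `live_nodes`
(assuming its entry there is an ephemeral ZK node), and its replicas would
stop counting as available without any per-replica state write on restart.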
