Re: SolrCloud is sick.

Mark Miller Sun, 03 Nov 2019 05:36:30 -0800

Personally, I believe the latter so strongly that, if I can’t convince the
others in the raft with me, I’m jumping in and swimming to another raft
after spending my entire adult life here.


Mark

On Sun, Nov 3, 2019 at 7:30 AM Mark Miller <markrmil...@gmail.com> wrote:

> In fact, this will be a fundamental difference that some of us are about
> to split over.
>
> There are those who think they can eventually fix the tests, or the
> system, or the 1000s of bugs we have and keep adding. Those bugs come from
> our current world view of making the tests fit the system rather than the
> system fit the tests, and from the fact that everything is so slow, and so
> full of retries and workarounds, that stupid shit works all over. It's all
> deep. It's ingrained. It has grown over for a decade. It's a project of 60
> modules.
>
> Soon we will split between those who think they are making progress
> across the ocean and those who think we are sitting in shark-infested
> waters waiting to die and, actually, starting to float backwards sometimes.
>
> - Mark
>
> On Sun, Nov 3, 2019 at 7:23 AM Mark Miller <markrmil...@gmail.com> wrote:
>
>> bq.  They would also allow us to do this in an iterative manner without
>> changing everything at once.
>>
>> Sadly, you can't fix this piece by piece :) I dare anyone to try. I
>> encourage, I applaud the effort.
>>
>> The world is your oyster from a good spot - take your pick of how to do
>> things.
>>
>> But from this spot, if anyone thinks we are getting out design change by
>> design change, JIRA by JIRA, I'm so sorry. Let's commiserate over a beer
>> in a couple of years when you give up on that.
>>
>> - Mark
>>
>> On Sun, Nov 3, 2019 at 4:01 AM Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> I cannot say anything about the statements, but maybe it could help to
>>> introduce Solr Improvement Proposals (SIP) similar to Kafka Improvement
>>> Proposals (KIP) or Flink Improvement Proposals (FLIP).
>>>
>>> I think they help facilitate design decisions and refactoring /
>>> redesign decisions. They would also allow us to do this in an
>>> iterative manner without changing everything at once.
>>> The final version could be put in the Solr Git repository in Markdown,
>>> including figures presenting parts of the design.
>>>
>>> However, for developing them I propose a more inclusive approach where
>>> many people (not only core developers) can easily comment and support,
>>> e.g. Google Docs or similar.
>>>
>>> > On 03.11.2019 at 06:39, Noble Paul <noble.p...@gmail.com> wrote:
>>> >
>>> > Solr has to do more than Lucene. A Lucene user is mostly a developer
>>> > who reads javadocs. A Solr user's touch points are
>>> >
>>> > * Public API
>>> > * Ref guide
>>> > * publicly visible files (in ZK as well as file system)
>>> > * What to see/look for in the log files to debug issues
>>> >
>>> > Then we have more nuanced touch points such as the knowledge base of
>>> > what happens internally in the system when 'X' API is invoked or when
>>> > 'Y' behavior is observed in ZK data.
>>> >
>>> > The problem with delaying the review process until code completion is
>>> > that any changes based on review comments will require a massive amount
>>> > of work.
>>> >
>>> > I don't have an answer to how we achieve it. But, I clearly see this
>>> > as a major gap in our development process today.
>>> >
>>> > This discussion may not be relevant in this thread, maybe because no
>>> > behavior is changed at all. We don't know yet.
>>> >
>>> > What I want to believe is Mark is doing the right thing & it's gonna
>>> > help us all in dealing with our operational issues. I don't want to
>>> > interrupt his work with more discussions.
>>> >
>>> > Thank you
>>> >
>>> >
>>> >> On Sun, Nov 3, 2019 at 3:32 PM David Smiley <david.w.smi...@gmail.com> wrote:
>>> >>
>>> >> Yeah, we do a bad job of the things you listed, Noble.  :-(   My
>>> >> colleagues want pointers to internal docs, but the sad reality is there
>>> >> aren't any.  You may notice I'm a stickler in my code reviews for
>>> >> requiring javadocs on all top-level classes.  I think more javadocs and
>>> >> code comments would be very helpful -- especially for the major classes.
>>> >> This might help us all and others a lot more.  For example, I think
>>> >> Lucene does a rather fine job of this for its major classes --
>>> >> IndexWriter being a good example.
>>> >>
>>> >> ~ David Smiley
>>> >> Apache Lucene/Solr Search Developer
>>> >> http://www.linkedin.com/in/davidwsmiley
>>> >>
>>> >>
>>> >>> On Sat, Nov 2, 2019 at 7:32 PM Noble Paul <noble.p...@gmail.com> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I believe there is a consensus on what is wrong with the way we have
>>> built the cluster state and overseer. We need to focus a bit more on the
>>> design aspect. Design, according to me, has the following elements:
>>> >>>
>>> >>> * How does it work?
>>> >>>
>>> >>> * What are the performance characteristics? Can it be done more
>>> efficiently?
>>> >>>
>>> >>> * What are the public touch points?
>>> >>>
>>> >>> ** Which are the files we store in ZK? Are they expected to be
>>> watched always?
>>> >>>
>>> >>> ** Or are they read on demand?
>>> >>>
>>> >>> ** The public APIs. Does it make sense to the user? Can it be
>>> further simplified? How does it compare to the other APIs in the system?
>>> >>>
>>> >>>
>>> >>> We, as a community, do a bad job in dealing with these. While we
>>> focus on internal things, these are not discussed before it is too late. We
>>> usually do coding, tests, code review (sometimes) and commit. This leads to
>>> huge technical debt.
>>> >>>
>>> >>>
>>> >>> This is not to put blame on one person or a group of people. (I
>>> occasionally see people discussing design issues upfront, I just hope that
>>> is the norm.)
>>> >>>
>>> >>>
>>> >>> Now, why am I discussing this in this thread?
>>> >>>
>>> >>>
>>> >>> While we agree there are problems, we are trying to solve the
>>> problem using the same process we used to create these problems. Again, I'm
>>> not questioning the intent or competence of anyone. Unless we set the
>>> process right, we are doomed to make the same mistakes again.
>>> >>>
>>> >>>
>>> >>> I wholeheartedly endorse any effort to improve SolrCloud/overseer.
>>> >>> At the same time, I fail to see us leveraging the collective
>>> >>> experience of our community through meaningful discussion.
>>> >>>
>>> >>>
>>> >>> I hope we don't resort to personal attacks and use this as an
>>> opportunity to improve our processes.
>>> >>> Thanks
>>> >>>
>>> >>> On Sun, Nov 3, 2019, 9:52 AM Scott Blum <dragonsi...@gmail.com> wrote:
>>> >>>>
>>> >>>> Very much agreed.  I've been trying to figure out for a long time
>>> >>>> what the point is of having a replica DOWN state that has to be
>>> >>>> toggled (DOWN and then UP!) every time a node restarts, considering
>>> >>>> that we could just combine ACTIVE and `live_nodes` to understand
>>> >>>> whether a replica is available.  It's not even foolproof, since
>>> >>>> kill -9 on a Solr node won't mark all the replicas DOWN; that doesn't
>>> >>>> happen until the node comes back up (perversely).
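
A minimal sketch of the availability check Scott describes above, assuming a simplified in-memory model rather than Solr's actual classes. The `Replica` type, the `is_available` helper, and the node and replica names are all hypothetical and only illustrate the idea of combining the recorded ACTIVE state with membership in `live_nodes`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Hypothetical, simplified stand-in for a replica entry in cluster state.
    name: str
    node_name: str
    state: str  # last state recorded in cluster state, e.g. "ACTIVE"


def is_available(replica, live_nodes):
    # A replica counts as available only if its recorded state is ACTIVE
    # and the node hosting it is currently listed in live_nodes.
    return replica.state == "ACTIVE" and replica.node_name in live_nodes


replicas = [
    Replica("core_node1", "host1:8983_solr", "ACTIVE"),
    # host2 was killed with kill -9, so the recorded state is still ACTIVE.
    Replica("core_node2", "host2:8983_solr", "ACTIVE"),
    Replica("core_node3", "host3:8983_solr", "RECOVERING"),
]

# host2 is missing here because its node is no longer live.
live_nodes = {"host1:8983_solr", "host3:8983_solr"}

available = [r.name for r in replicas if is_available(r, live_nodes)]
print(available)
```

Under this model the killed node's replica is filtered out by the `live_nodes` check alone, without any DOWN state having been written for it.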
>>> >>>>
>>> >>>> What would it take to get to a state where restarting a node would
>>> require a minimal amount of ZK work in most cases?
>>> >>>>
>>> >>>> On Sat, Nov 2, 2019 at 5:44 PM Mark Miller <markrmil...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Give me a short bit to follow up and I will lay out my case and
>>> proposal.
>>> >>>>>
>>> >>>>> Everyone is then free to decide that we need to do something
>>> drastic or that I'm wrong and we should just continue down the same road.
>>> If that's the case, a lot of your work will get a lot easier and less
>>> impeded by me and we will still all be happier. Win win.
>>> >>>>>
>>> >>>>> If we can just not make drastic changes for just a brief window of
>>> >>>>> a week or so, I'll say what I have to say, and you guys can judge
>>> >>>>> and do whatever you please.
>>> >>>>>
>>> >>>>> - mark
>>> >>>>>
>>> >>>>> On Fri, Nov 1, 2019 at 7:46 PM Mark Miller <markrmil...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>> Hey all Solr devs,
>>> >>>>>>
>>> >>>>>> SolrCloud is sick right now. The way low-level ZooKeeper is
>>> >>>>>> handled, and the Overseer, are a mix and a mess when it comes to
>>> >>>>>> proper exception handling, with super slow startup and shutdown,
>>> >>>>>> and new things added all the time with no concern for performance
>>> >>>>>> or proper ordering (which is harder to tell than you think).
>>> >>>>>>
>>> >>>>>> Our class dependency graph doesn't even work - we just force it.
>>> >>>>>> Sort of. If the whole system doesn't block and choke its way to a
>>> >>>>>> slow enough start, lots of things fail.
>>> >>>>>>
>>> >>>>>> This thing coughs up; you toss stuff into the storm and, a good
>>> >>>>>> chunk of the time, what you want eventually comes back without
>>> >>>>>> causing too much damage.
>>> >>>>>>
>>> >>>>>> There are so many things that are off or just plain wrong, and the
>>> >>>>>> list is growing and growing. No one is following this, or if you
>>> >>>>>> are, please back me up. This thing will collapse under its own
>>> >>>>>> weight.
>>> >>>>>>
>>> >>>>>> So if you want to add yet another state format cluster state or
>>> some other optimization on this junk heap, you can expect me to push back.
>>> >>>>>>
>>> >>>>>> We should all be embarrassed by the state of things.
>>> >>>>>>
>>> >>>>>> I've got some ideas for addressing them that I'll share soon, but
>>> >>>>>> god, don't keep optimizing a turd in non-backcompat, Overseer-loving
>>> >>>>>> ways. That Overseer is an atrocity.
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> - Mark
>>> >>>>>>
>>> >>>>>> http://about.me/markrmiller
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> - Mark
>>> >>>>>
>>> >>>>> http://about.me/markrmiller
>>> >
>>> >
>>> >
>>> > --
>>> > -----------------------------------------------------
>>> > Noble Paul
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>>> >
>>>
>>>
>>>
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>
>
> --
> - Mark
>
> http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller
