[ https://issues.apache.org/jira/browse/SOLR-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154833#comment-15154833 ]
Mark Miller commented on SOLR-8697:
-----------------------------------

Curator has come up before. Personally, I have not wanted to try to mimic what we have or go through a protracted hardening process again. This stuff is all very touchy, and our tests definitely do not catch everything, so a rip and replace at that low level would be both very difficult and sure to introduce a lot of issues.

I think a lot of the problem is that devs like to favor just tossing crap on top of what exists, rather than trying to move the design forward holistically or make it right for what they want to add (examples: OverseerNodePrioritizer and RebalanceLeaders, which also made the election code much more dense). I feel a lot of "let's just make this work".

I can't tell you how surprised I've been that some devs have come and built so much on some of the prototype code I initially laid out. I've always wondered: how do you build so much on this without finding and fixing more core bugs, and seeing other necessary improvements as you go? Not that it doesn't happen, but the scale has historically been way below what I think makes sense. Easy for me to say, I guess.

Anyway, it's great that you have already filed a bunch of issues :) I'd rather focus on some refactoring than on bringing in Curator, though. The implications of that would be pretty large, and we have plenty of other, more pressing issues, I think.

> Fix LeaderElector issues
> ------------------------
>
>                 Key: SOLR-8697
>                 URL: https://issues.apache.org/jira/browse/SOLR-8697
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>              Labels: patch, reliability, solrcloud
>         Attachments: SOLR-8697.patch
>
>
> This patch is still somewhat WIP for a couple of reasons:
> 1) Still debugging test failures.
> 2) This will need more scrutiny from knowledgeable folks!
>
> There are some subtle bugs with the current implementation of LeaderElector,
> best demonstrated by the following test:
> 1) Start up a small single-node SolrCloud. It should become Overseer.
> 2) kill -9 the SolrCloud process and immediately start a new one.
> 3) The new process won't become Overseer. The old process's ZK leader election
> node has not yet disappeared, and the new process fails to set appropriate
> watches.
> NOTE: this is only reproducible if the new node is able to start up and join
> the election quickly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
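To make the failure mode in the reproduction steps concrete, here is a minimal sketch of the key step in the usual ZooKeeper leader-election recipe: each candidate sorts the election children by sequence number, becomes leader only if its own node is first, and otherwise watches the node immediately ahead of it. After a kill -9, the old process's ephemeral node lingers until its session expires, so the new process must watch that stale node rather than give up on setting a watch. This uses plain in-memory lists instead of a real ZooKeeper client. The node-name format (`<prefix>n_<seq>`) and the helpers `sequenceOf` and `nodeToWatch` are hypothetical and are not Solr's actual LeaderElector API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ElectionSketch {

    // Hypothetical helper: parses the sequence number ZooKeeper appends to an
    // ephemeral-sequential node, assuming names of the form "<prefix>n_<seq>".
    static int sequenceOf(String node) {
        return Integer.parseInt(node.substring(node.lastIndexOf("n_") + 2));
    }

    /**
     * Returns empty if {@code self} holds the lowest sequence number (it is the
     * leader); otherwise returns the node immediately ahead of it, which is the
     * one this candidate should set a watch on.
     */
    static Optional<String> nodeToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(ElectionSketch::sequenceOf));
        int idx = sorted.indexOf(self);
        if (idx < 0) {
            throw new IllegalStateException("own election node missing: " + self);
        }
        return idx == 0 ? Optional.empty() : Optional.of(sorted.get(idx - 1));
    }

    public static void main(String[] args) {
        // The killed process's node (seq 1) is still present until its session expires.
        String stale = "old-session-n_0000000001";
        String fresh = "new-session-n_0000000002";

        // The new process is not leader yet, so it must watch the stale node.
        System.out.println(nodeToWatch(List.of(fresh, stale), fresh).orElse("LEADER"));

        // When the stale node goes away, the watch fires and the check is repeated.
        System.out.println(nodeToWatch(List.of(fresh), fresh).orElse("LEADER"));
    }
}
```

Watching only the immediate predecessor, rather than the leader node itself, is the standard way the recipe avoids a herd of watches firing on every leader change; the important property for this bug is that a candidate who is not first always ends up with exactly one watch set.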