[ https://issues.apache.org/jira/browse/SOLR-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172458#comment-15172458 ]
Scott Blum edited comment on SOLR-8696 at 2/29/16 7:42 PM:
-----------------------------------------------------------

[~markrmil...@gmail.com] [~shalinmangar] I think there's a slight problem with the fix as landed (merge artifact?)

{code}
zkStateReader.createClusterStateWatchersAndUpdate();
Stat stat = zkClient.exists(ZkStateReader.LIVE_NODES_ZKNODE, null, true);
if (stat != null && stat.getNumChildren() > 0) {
  zkStateReader.createClusterStateWatchersAndUpdate();
  publishAndWaitForDownStates();
}
{code}

createClusterStateWatchersAndUpdate() shouldn't be called twice, as it sets up duplicate watchers. I actually think we should just have a single call at the top, right after createClusterZkNodes() and right before joining the overseer election, so that if we get elected we have a valid ClusterState to start with.

(For the record, I'm not super happy with the fact that external code needs to worry so much about initializing ZkStateReader exactly once.)

Attached a patch with a suggested fix.

> Start the Overseer before actions that need the overseer on init and when reconnecting after zk expiration and improve init logic.
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8696
>                 URL: https://issues.apache.org/jira/browse/SOLR-8696
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>            Assignee: Mark Miller
>              Labels: patch, performance, solrcloud, startup
>             Fix For: master
>
>         Attachments: SOLR-8696-followup.patch, SOLR-8696.patch, SOLR-8696.patch
>
>
> ZkController.publishAndWaitForDownStates() occurs before overseer election.
> That means if there is currently no overseer, there is ironically no one to
> actually service the down state changes it's waiting on. This particularly
> affects a single-node cluster such as you might run locally for development.
> Additionally, we're doing an unnecessary ZkStateReader forced refresh on all
> Overseer operations. This isn't necessary because ZkStateReader keeps itself
> up to date.
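
To illustrate the duplicate-watcher concern raised in the comment above, here is a minimal, self-contained Java sketch. It does not use the real Solr classes: FakeStateReader is a hypothetical stand-in whose createClusterStateWatchersAndUpdate() registers one more watcher on every call (the behavior the comment describes), and initAsLanded / initProposed are made-up names that only mirror the two call orders being discussed. The other ZkController steps appear only as comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ZkStateReader; NOT the real Solr class.
class FakeStateReader {
    private final List<Runnable> watchers = new ArrayList<>();
    private int callbacks = 0;

    // Assumption taken from the comment: every call registers another watcher.
    void createClusterStateWatchersAndUpdate() {
        watchers.add(() -> callbacks++);
    }

    // Simulates a single cluster-state change notification.
    void fireClusterStateChange() {
        for (Runnable w : watchers) {
            w.run();
        }
    }

    int watcherCount() { return watchers.size(); }
    int callbackCount() { return callbacks; }
}

public class InitOrderSketch {
    // Mirrors the snippet quoted in the comment: a second call inside the branch.
    static FakeStateReader initAsLanded(boolean liveNodesExist) {
        FakeStateReader reader = new FakeStateReader();
        reader.createClusterStateWatchersAndUpdate();
        if (liveNodesExist) {
            reader.createClusterStateWatchersAndUpdate(); // duplicate registration
            // publishAndWaitForDownStates() would run here
        }
        return reader;
    }

    // Mirrors the suggested ordering: one call, before joining the election.
    static FakeStateReader initProposed(boolean liveNodesExist) {
        FakeStateReader reader = new FakeStateReader();
        // createClusterZkNodes() would run here
        reader.createClusterStateWatchersAndUpdate();
        // joining the overseer election would happen here
        if (liveNodesExist) {
            // publishAndWaitForDownStates() would run here
        }
        return reader;
    }

    public static void main(String[] args) {
        FakeStateReader landed = initAsLanded(true);
        landed.fireClusterStateChange();
        System.out.println("as landed: watchers=" + landed.watcherCount()
                + " callbacks=" + landed.callbackCount());

        FakeStateReader proposed = initProposed(true);
        proposed.fireClusterStateChange();
        System.out.println("proposed: watchers=" + proposed.watcherCount()
                + " callbacks=" + proposed.callbackCount());
    }
}
```

In this toy model a single state change is delivered twice under the as-landed order and once under the proposed order. What the attached patch actually changes may differ in detail from this sketch.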