[jira] [Comment Edited] (SOLR-8696) Start the Overseer before actions that need the overseer on init and when reconnecting after zk expiration and improve init logic.

Scott Blum (JIRA) Mon, 29 Feb 2016 11:44:19 -0800

    [ https://issues.apache.org/jira/browse/SOLR-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172458#comment-15172458 ]


Scott Blum edited comment on SOLR-8696 at 2/29/16 7:42 PM:
-----------------------------------------------------------

[~markrmil...@gmail.com] [~shalinmangar]

I think there's a slight problem with the fix as landed (possibly a merge
artifact?):

{code}
      zkStateReader.createClusterStateWatchersAndUpdate();
      Stat stat = zkClient.exists(ZkStateReader.LIVE_NODES_ZKNODE, null, true);
      if (stat != null && stat.getNumChildren() > 0) {
        zkStateReader.createClusterStateWatchersAndUpdate();
        publishAndWaitForDownStates();
      }
{code}

createClusterStateWatchersAndUpdate() shouldn't be called twice, because the 
second call sets up duplicate watchers. I think we should instead make a single 
call at the top, right after createClusterZkNodes() and right before joining the 
overseer election, so that if we get elected we have a valid ClusterState to 
start with.
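
A minimal sketch of the ordering described above, not the attached patch. The 
class InitOrderSketch, its stub methods, the liveNodeCount field, and the name 
joinOverseerElection() are hypothetical stand-ins; only 
createClusterZkNodes(), createClusterStateWatchersAndUpdate() and 
publishAndWaitForDownStates() are names taken from this thread, and here they 
merely record that they were called.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the init sequence; this is not Solr's ZkController. */
final class InitOrderSketch {
  /** Records the order of calls so the sequence can be inspected. */
  final List<String> calls = new ArrayList<>();

  /** Assumed stand-in for the child count of the live-nodes znode. */
  private final int liveNodeCount;

  InitOrderSketch(int liveNodeCount) {
    this.liveNodeCount = liveNodeCount;
  }

  // Stubs: each one only logs its own name.
  void createClusterZkNodes() { calls.add("createClusterZkNodes"); }
  void createClusterStateWatchersAndUpdate() { calls.add("createClusterStateWatchersAndUpdate"); }
  void joinOverseerElection() { calls.add("joinOverseerElection"); } // hypothetical name
  void publishAndWaitForDownStates() { calls.add("publishAndWaitForDownStates"); }

  void init() {
    createClusterZkNodes();
    // Single call, before the election, so a newly elected overseer
    // already has a cluster state to start from.
    createClusterStateWatchersAndUpdate();
    joinOverseerElection();
    if (liveNodeCount > 0) {
      // No second createClusterStateWatchersAndUpdate() call here.
      publishAndWaitForDownStates();
    }
  }

  public static void main(String[] args) {
    InitOrderSketch sketch = new InitOrderSketch(1);
    sketch.init();
    System.out.println(sketch.calls);
  }
}
```

The point of the sketch is only the call order: the watchers are created once, 
regardless of whether any live nodes exist.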

(For the record, I'm not super happy with the fact that external code needs to 
worry so much about initializing ZkStateReader exactly once.)
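
One way callers could be relieved of that concern is an initialize-once guard. 
The sketch below is purely illustrative: OnceOnlyInit and ensureInitialized() 
are hypothetical names, and nothing here claims that ZkStateReader works this 
way or that the attached patch does this.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard that runs a setup action at most once. */
final class OnceOnlyInit {
  private final Runnable setup;
  private boolean initialized = false;

  OnceOnlyInit(Runnable setup) {
    this.setup = setup;
  }

  /**
   * Runs the setup action on the first successful call only.
   * Returns true if this call performed the setup, false if it was already done.
   * If setup throws, the guard stays uninitialized so a later call can retry.
   */
  synchronized boolean ensureInitialized() {
    if (initialized) {
      return false;
    }
    setup.run();
    initialized = true;
    return true;
  }

  public static void main(String[] args) {
    // Counter standing in for "watchers were registered".
    AtomicInteger watcherSetups = new AtomicInteger();
    OnceOnlyInit init = new OnceOnlyInit(watcherSetups::incrementAndGet);
    init.ensureInitialized();
    init.ensureInitialized(); // second call is a no-op
    System.out.println("watcher setups: " + watcherSetups.get());
  }
}
```

With a guard like this, a repeated call would be harmless instead of 
registering duplicate watchers.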

Attached a patch with a suggested fix.


> Start the Overseer before actions that need the overseer on init and when 
> reconnecting after zk expiration and improve init logic.
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8696
>                 URL: https://issues.apache.org/jira/browse/SOLR-8696
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>            Assignee: Mark Miller
>              Labels: patch, performance, solrcloud, startup
>             Fix For: master
>
>         Attachments: SOLR-8696-followup.patch, SOLR-8696.patch, 
> SOLR-8696.patch
>
>
> ZkController.publishAndWaitForDownStates() occurs before overseer election.  
> That means if there is currently no overseer, there is ironically no one to 
> actually service the down state changes it's waiting on.  This particularly 
> affects a single-node cluster such as you might run locally for development.
> Additionally, we're doing an unnecessary ZkStateReader forced refresh on all 
> Overseer operations.  This isn't necessary because ZkStateReader keeps itself 
> up to date.


