[jira] [Created] (SOLR-10277) On 'downnode', lots of wasteful mutations are done to ZK

Joshua Humphries (JIRA) Mon, 13 Mar 2017 14:57:06 -0700

Joshua Humphries created SOLR-10277:
---------------------------------------


             Summary: On 'downnode', lots of wasteful mutations are done to ZK
                 Key: SOLR-10277
                 URL: https://issues.apache.org/jira/browse/SOLR-10277
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 5.5.3
            Reporter: Joshua Humphries


When a node restarts, it submits a single 'downnode' message to the overseer's 
state update queue.

When the overseer processes the message, it performs far more writes to ZK than 
necessary. In our cluster of 48 hosts, the majority of collections have only 1 
shard and 1 replica. So a single node restarting should result in only ~1/40th 
of the collections being updated with new replica states (to indicate that the 
replicas on that node are no longer active).

However, the current logic in NodeMutator#downNode always updates *every* 
collection. As a result, we have to do rolling restarts very slowly to avoid a 
severe outage caused by the overseer doing far too much work for each host that 
is restarted. Subsequent messages, such as those for shards becoming leader, 
cannot be processed until the 'downnode' message is fully processed. So a fast 
rolling restart can cause the overseer queue to grow very large, with nearly 
all shards left leaderless until that backlog is processed.

The fix is a trivial logic change to only add a ZkWriteCommand for collections 
that actually have an impacted replica.
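
A minimal sketch of the idea, not the actual Solr code: the class 
DownNodeSketch, its Replica type, and the map-based cluster state are all 
hypothetical stand-ins for the real ClusterState and ZkWriteCommand types. It 
only illustrates emitting a write for a collection when at least one of its 
replicas lives on the downed node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DownNodeSketch {

    /** Hypothetical, simplified replica: the node hosting it and its state. */
    static final class Replica {
        final String nodeName;
        final String state;

        Replica(String nodeName, String state) {
            this.nodeName = nodeName;
            this.state = state;
        }
    }

    /**
     * Returns the collections that need a ZK write, keyed by collection name.
     * Collections with no replica on the downed node are skipped entirely.
     */
    static Map<String, List<Replica>> downNode(
            Map<String, List<Replica>> clusterState, String downedNode) {
        Map<String, List<Replica>> writes = new LinkedHashMap<>();
        for (Map.Entry<String, List<Replica>> entry : clusterState.entrySet()) {
            boolean changed = false;
            List<Replica> updated = new ArrayList<>();
            for (Replica replica : entry.getValue()) {
                if (replica.nodeName.equals(downedNode)) {
                    // Replica lives on the downed node: mark it down.
                    updated.add(new Replica(replica.nodeName, "down"));
                    changed = true;
                } else {
                    updated.add(replica);
                }
            }
            // The essential change: only record a write if something changed.
            if (changed) {
                writes.put(entry.getKey(), updated);
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        Map<String, List<Replica>> state = new LinkedHashMap<>();
        state.put("c1", Collections.singletonList(new Replica("node1", "active")));
        state.put("c2", Collections.singletonList(new Replica("node2", "active")));
        state.put("c3", Arrays.asList(
                new Replica("node2", "active"), new Replica("node3", "active")));

        // Only c1 has a replica on node1, so only c1 is written.
        System.out.println(downNode(state, "node1").keySet()); // prints [c1]
    }
}
```

With one replica per collection spread across many hosts, the number of writes 
in this sketch scales with the replicas on the downed node rather than with 
the total number of collections.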



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

