Noble Paul created SOLR-9226:
--------------------------------
Summary: Automatically fire FORCELEADER if shard leader is missing
Key: SOLR-9226
URL: https://issues.apache.org/jira/browse/SOLR-9226
Project: Solr
Issue Type: Bug
Reporter: Noble Paul
Assignee: Noble Paul
We have often seen shards lose their leader.
{code}
x:lamp_2016050713_shard2_replica1] o.a.s.c.ZkController Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1044)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1011)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:967)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:906)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:849)
    at org.apache.solr.core.ZkContainer$2.run(ZkContainer.java:183)
{code}
There could be other instances as well.
I recommend the following to heal such clusters:
* Whenever a node finds that the shard has no LEADER, it should fire the
FORCELEADER command
* The FORCELEADER command is currently executed on the node that receives it.
It should be moved to the overseer to ensure that we don't run multiple such
commands in parallel.
* The command should make the best effort to identify a leader and should
assign a leader if at least one node is live in the shard
* When a shard has lost its leader, it is very likely that thousands of such
requests will be fired, and they would clog the work queue. The command should
ensure that duplicate FORCELEADER requests are removed from the work queue
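
The last point could be sketched roughly as follows. This is only an illustration of the idea, not Solr's overseer code: the class `ForceLeaderWorkQueue`, its methods, and the "collection/shard" key format are all hypothetical names made up for this example. When a request is taken off the queue, every other queued request for the same shard is dropped along with it, so the command runs once per shard instead of once per request.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical in-memory stand-in for the overseer work queue.
 * None of these names exist in Solr; they are assumptions for illustration.
 */
final class ForceLeaderWorkQueue {
    // Each entry is a "collection/shard" key; duplicates are allowed on submit.
    private final Deque<String> queue = new ArrayDeque<>();

    /** Any node that notices a missing leader may submit a request. */
    synchronized void submit(String collection, String shard) {
        queue.addLast(collection + "/" + shard);
    }

    /**
     * Takes the oldest request and removes every queued duplicate of it.
     * Returns null when the queue is empty.
     */
    synchronized String takeAndDrainDuplicates() {
        String head = queue.pollFirst();
        if (head != null) {
            queue.removeIf(head::equals);
        }
        return head;
    }

    synchronized int size() {
        return queue.size();
    }
}
```

With this shape, a single consumer thread calling `takeAndDrainDuplicates()` in a loop would also keep the commands from running in parallel, which is the property the second point asks of the overseer.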