[ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187638#comment-16187638 ]
Cao Manh Dat edited comment on SOLR-10285 at 10/2/17 2:38 AM:
--------------------------------------------------------------

Hi [~jhump], your patch looks good to me. About your TODO notes, I did some searching and found that
- ElectionContext is the only place that uses OverseerAction.Leader ( once to unset the leader and once to set the leader ).
- STATE_PROP, used in the second case, is the replica's state, which is not even used in {{SliceMutator.setShardLeader}}

So your concern about "mark the shard as inactive" does not apply, right?

The only problem that can occur during an upgrade is:
1. A replica ( repA ) is currently the leader.
2. The overseer is very busy.
3. repA issues the unset leader operation ( which is delayed because the overseer is very busy ).
4. repA gets stopped in the middle of the election process ( so the set leader operation never gets executed ).
5. repA starts with the new code and sees that it is still the leader ( the unset operation from step 3 has not been executed yet ), so it skips the set leader operation.

I think the above case is very, very rare, and even if it happens, sysadmins must first deal with the Overseer being overwhelmed by the number of operations.

> Reduce state messages when there are leader only shards
> -------------------------------------------------------
>
>                 Key: SOLR-10285
>                 URL: https://issues.apache.org/jira/browse/SOLR-10285
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Varun Thacker
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-10285.patch
>
>
> For shards which have 1 replica ( leader ) we know it doesn't need to recover
> from anyone. We should short-circuit the recovery process in this case.
> The motivation for this is that we will generate fewer state events and be
> able to mark these replicas as active again without them needing to go into
> the 'recovering' state.
> We already short-circuit when you set {{-Dsolrcloud.skip.autorecovery=true}},
> but that sys prop was meant for tests only. Extending this to make sure the
> code short-circuits when the core knows it's the only replica in the shard is
> the motivation of the Jira.
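
For readers skimming the thread, the short-circuit proposed in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions: `RecoveryShortCircuit`, `shouldSkipRecovery`, and the idea of passing core names as plain strings are hypothetical and are not Solr classes, Solr APIs, or the contents of SOLR-10285.patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch only: nothing here is a Solr class or API.
 * It illustrates the decision described in the issue, not the attached patch.
 */
public class RecoveryShortCircuit {

    /**
     * Returns true when the given core is the only replica in its shard,
     * so there is nobody to recover from and recovery can be skipped.
     *
     * @param shardReplicaCoreNames core names of all replicas in the shard (assumed input)
     * @param myCoreName            name of the core that is starting up (assumed input)
     */
    public static boolean shouldSkipRecovery(List<String> shardReplicaCoreNames, String myCoreName) {
        return shardReplicaCoreNames.size() == 1
                && shardReplicaCoreNames.get(0).equals(myCoreName);
    }

    public static void main(String[] args) {
        // Leader-only shard: the single replica can go straight back to active.
        List<String> leaderOnly = Collections.singletonList("core1");
        // Two replicas: the normal recovery path still applies.
        List<String> twoReplicas = Arrays.asList("core1", "core2");

        System.out.println(shouldSkipRecovery(leaderOnly, "core1"));
        System.out.println(shouldSkipRecovery(twoReplicas, "core1"));
    }
}
```

In this sketch the check depends only on the replica count and identity, so it needs no system property such as {{-Dsolrcloud.skip.autorecovery=true}}; how the real code obtains the replica list from cluster state is left out on purpose.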