[jira] [Commented] (SOLR-10285) Reduce state messages when there are leader only shards
[ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189159#comment-16189159 ]

Cao Manh Dat commented on SOLR-10285:
-------------------------------------

Hi [~varunthacker],

I don't see why we would have to wait for the leader message to be processed, given that this ticket skips the leader message. Even if we did send the leader message and wait for it to be processed, we could easily get a false positive when the replica is already the leader and the unset-leader message is still in the queue.

> Reduce state messages when there are leader only shards
> --------------------------------------------------------
>
>                 Key: SOLR-10285
>                 URL: https://issues.apache.org/jira/browse/SOLR-10285
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Varun Thacker
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-10285.patch, SOLR-10285.patch, SOLR-10285.patch
>
>
> For shards which have 1 replica ( leader ) we know it doesn't need to recover
> from anyone. We should short-circuit the recovery process in this case.
> The motivation for this being that we will generate fewer state events and be
> able to mark these replicas as active again without it needing to go into
> 'recovering' state.
> We already short circuit when you set {{-Dsolrcloud.skip.autorecovery=true}}
> but that sys prop was meant for tests only. Extending this to make sure the
> code short-circuits when the core knows it's the only replica in the shard is
> the motivation of the Jira.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
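The short circuit proposed in the issue description above can be sketched as a single predicate. This is only a toy model in Python, not Solr code: the `Replica` type and `should_skip_recovery` function are hypothetical names made up for illustration, and the requirement that the sole replica is also marked leader is an assumption read from the "1 replica ( leader )" wording.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """Hypothetical stand-in for a replica entry in the cluster state."""
    name: str
    is_leader: bool


def should_skip_recovery(shard_replicas: List[Replica], core_name: str) -> bool:
    """Return True when core_name is the only replica of its shard and is the leader.

    A sole replica has nobody to recover from, so in this model it could be
    marked active directly instead of passing through a 'recovering' state.
    """
    if len(shard_replicas) != 1:
        return False
    only = shard_replicas[0]
    return only.name == core_name and only.is_leader
```

With this predicate, a shard that has a second replica always takes the normal recovery path, so the short circuit only affects leader-only shards.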
[jira] [Commented] (SOLR-10285) Reduce state messages when there are leader only shards
[ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188454#comment-16188454 ]

Varun Thacker commented on SOLR-10285:
--------------------------------------

Hi Dat,

Do you think it will be a good idea to wait for the leader message to be processed before we return?
[jira] [Commented] (SOLR-10285) Reduce state messages when there are leader only shards
[ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187638#comment-16187638 ]

Cao Manh Dat commented on SOLR-10285:
-------------------------------------

Hi [~jhump], your patch looks good to me. About your TODO notes, I did some searching and found that:
- ElectionContext is the only place that uses OverseerAction.Leader ( once to unset the leader and once to set the leader ).
- The STATE_PROP used in the second case is the replica's state, which is not even used in {{SliceMutator.setShardLeader}}.

So your concern about "mark the shard as inactive" is not correct, right?

The only problematic case that can occur during an upgrade is:
1. A replica ( repA ) is currently the leader.
2. The overseer is very busy.
3. repA does the unset leader operation ( which is delayed because the overseer is very busy ).
4. repA gets stopped in the middle of the election process ( so the set leader operation never gets executed ).
5. repA starts with the new code, then it sees that it is the leader ( the unset operation in step 3 has not been executed yet ), so it skips the set leader operation.

I think the above case is very, very rare, and even if it happens, sysadmins have to deal with the Overseer being overwhelmed by operations first.
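The upgrade scenario in Dat's steps 1 to 5 can be replayed with a small toy model. This is a sketch in Python, not Solr code: `ToyOverseer`, `restart_with_skip`, and the `"unset_leader"` / `"set_leader"` message names are all hypothetical, and the overseer is assumed to apply queued messages strictly in FIFO order.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class ToyOverseer:
    """Hypothetical model: a leader field plus a FIFO queue of pending messages."""

    def __init__(self, leader: Optional[str]) -> None:
        self.leader = leader  # leader currently recorded in the cluster state
        self.queue: Deque[Tuple[str, Optional[str]]] = deque()

    def offer(self, action: str, replica: Optional[str] = None) -> None:
        """Enqueue a message; nothing changes until drain() is called."""
        self.queue.append((action, replica))

    def drain(self) -> None:
        """Apply every queued message in order."""
        while self.queue:
            action, replica = self.queue.popleft()
            if action == "unset_leader":
                self.leader = None
            elif action == "set_leader":
                self.leader = replica


def restart_with_skip(overseer: ToyOverseer, replica: str) -> bool:
    """Model of the new code path: skip set-leader if state already shows us as leader.

    Returns True when a set-leader message was actually sent.
    """
    if overseer.leader == replica:
        return False
    overseer.offer("set_leader", replica)
    return True


def simulate_stale_leader_race() -> Optional[str]:
    """Replay steps 1-5 and return the leader recorded once the queue is drained."""
    overseer = ToyOverseer(leader="repA")    # step 1: repA is the leader
    overseer.offer("unset_leader")           # steps 2-3: unset is queued but delayed
    # step 4: repA stops here, so no set_leader message is ever queued
    restart_with_skip(overseer, "repA")      # step 5: stale state, set-leader skipped
    overseer.drain()                         # the busy overseer finally catches up
    return overseer.leader
```

In this model the shard ends up with no recorded leader once the delayed unset message is applied, which is the rare outcome described above. If the queue is drained before the restart, `restart_with_skip` does send the set-leader message and repA is recorded as leader again.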
[jira] [Commented] (SOLR-10285) Reduce state messages when there are leader only shards
[ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001160#comment-16001160 ]

Erick Erickson commented on SOLR-10285:
---------------------------------------

Joshua:

"Yonik's law of patches" reads "A half-baked patch with no documentation, no tests and no backwards compatibility is better than no patch at all.".

Please feel free to attach a patch even if it's not complete (even if it doesn't even _compile_!), with appropriate disclaimers. Even if someone picks up this JIRA and decides to use another approach they'll be able to benefit from what they see of your work.

It also is good if you mention that you won't be working on it, that way people won't wait if they want to pick it up.

Best,
Erick
[jira] [Commented] (SOLR-10285) Reduce state messages when there are leader only shards
[ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000967#comment-16000967 ]

Joshua Humphries commented on SOLR-10285:
-----------------------------------------

[~dragonsinth], I'm afraid I don't have a patch. I do have a branch where I made a lot of progress, but I did not finish getting unit tests to pass. The patch for SOLR-10277 ended up being sufficient for our restart-time objectives at the time, so I put it on the back-burner.

This change would certainly reduce the restart time further, quite considerably, in fact, for deployments with a large number of shards that do not have multiple replicas.

I'll dust it off today and try to assess remaining work to get it merge-worthy.
[jira] [Commented] (SOLR-10285) Reduce state messages when there are leader only shards
[ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000303#comment-16000303 ]

Scott Blum commented on SOLR-10285:
-----------------------------------

[~jhump] did you have a patch for this? or did we only discuss it?