[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610637#comment-13610637 ]

Commit Tag Bot commented on SOLR-3939:

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1402362

SOLR-3933: Distributed commits are not guaranteed to be ordered within a request.
SOLR-3939: An empty or just replicated index cannot become the leader of a shard after a leader goes down.
SOLR-3971: A collection that is created with numShards=1 turns into a numShards=2 collection after starting up a second core and not specifying numShards.
SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes.

An empty or just replicated index cannot become the leader of a shard after a leader goes down.

Key: SOLR-3939
URL: https://issues.apache.org/jira/browse/SOLR-3939
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.0-BETA, 4.0
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Critical
Labels: 4.0.1_Candidate
Fix For: 4.1, 5.0
Attachments: cloud2.log, cloud.log, SOLR-3939.patch, SOLR-3939.patch

When a leader core is unloaded using the core admin API, the followers in the shard go into recovery but do not come out. Leader election doesn't take place and the shard goes down. This affects the ability to move a micro-shard from one Solr instance to another Solr instance. The problem does not occur 100% of the time, but it does occur a large percentage of the time.

To set up a test, start up Solr Cloud with a single shard. Add cores to that shard as replicas using core admin. Then unload the leader core using core admin.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
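The reproduction steps in the issue description (add replica cores through core admin, then unload the leader core) might be scripted roughly as follows. This is only a sketch under assumptions: the host names, ports, collection name and core names are hypothetical and do not come from the issue, and the script only builds and prints the CoreAdmin request URLs (CREATE and UNLOAD actions) rather than sending them.

```python
from urllib.parse import urlencode


def core_admin_url(base_url, action, **params):
    """Build a CoreAdmin request URL of the form <base>/admin/cores?action=...&..."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/cores?{query}"


# Hypothetical Solr instances; substitute the hosts of your own test cluster.
LEADER_NODE = "http://localhost:8983/solr"
REPLICA_NODE = "http://localhost:7574/solr"

steps = [
    # Add a core on the second instance as a replica of the single shard.
    core_admin_url(REPLICA_NODE, "CREATE", name="replica1",
                   collection="collection1", shard="shard1"),
    # Unload the leader core (assumed here to be named "collection1").
    core_admin_url(LEADER_NODE, "UNLOAD", core="collection1"),
]

for url in steps:
    print(url)
```

Each printed URL could then be requested with any HTTP client against a test cluster; which core is actually the leader has to be checked in the cluster state first.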
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610642#comment-13610642 ]

Commit Tag Bot commented on SOLR-3939:

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1402361

SOLR-3933: Distributed commits are not guaranteed to be ordered within a request.
SOLR-3939: An empty or just replicated index cannot become the leader of a shard after a leader goes down.
SOLR-3971: A collection that is created with numShards=1 turns into a numShards=2 collection after starting up a second core and not specifying numShards.
SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes.
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610678#comment-13610678 ]

Commit Tag Bot commented on SOLR-3939:

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1397672

SOLR-3939: Consider a sync attempt from leader to replica that fails due to 404 a success.
SOLR-3940: Rejoining the leader election incorrectly triggers the code path for a fresh cluster start rather than fail over.
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485000#comment-13485000 ]

Mark Miller commented on SOLR-3939:

Okay, I'm going to resolve this - we can make a new issue for the case where a replica comes up and is ahead somehow.
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484611#comment-13484611 ]

Mark Miller commented on SOLR-3939:

I've committed my latest work to 4x. Joel - can you do a bit more testing with a recent checkout?
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484652#comment-13484652 ]

Joel Bernstein commented on SOLR-3939:

I ran the Oct 14th test and the leader election worked perfectly. Then I tested shutting down the leader VM instead of unloading the leader core, and this worked fine.

Then I tried a leader with two replicas that had both just been replicated to. When I unloaded the leader, neither replica became leader. But this was the case that was not yet accounted for, I believe.

I can't think of a use case where the second scenario would happen though. The first scenario, though, is critical for migrating micro-shards, so it's great that you committed this.

Thanks for your work on this issue.

Joel
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484683#comment-13484683 ]

Yonik Seeley commented on SOLR-3939:

bq. Isn't that what capturing the starting versions is all about?

For a node starting up, yeah. For a leader syncing to someone else - I don't think it should matter.

bq. but if you want to peer sync from the leader to a replica that is coming back up, if updates are coming in, you are going to force a replication anyway.

If updates were coming in fast enough during the bounce... I guess so.
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483567#comment-13483567 ]

Yonik Seeley commented on SOLR-3939:

Trying to think if this could happen when there are versions too... say that instead of having no versions, we just have old versions from before we did the replication. This may argue for somehow marking the start of a replication in the transaction log and then never retrieving versions older than that.
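The idea of marking the start of a replication in the transaction log could be sketched like this. It is a toy illustration only: `ToyUpdateLog`, `mark_replication_start` and `recent_versions` are hypothetical names invented here, not Solr's update log API, and the real log holds far more than a flat list of version numbers.

```python
# Sentinel object standing in for a "replication started here" log record.
REPLICATION_START = object()


class ToyUpdateLog:
    """Toy transaction log: an append-only list of versions and markers."""

    def __init__(self):
        self.entries = []

    def add(self, version):
        self.entries.append(version)

    def mark_replication_start(self):
        self.entries.append(REPLICATION_START)

    def recent_versions(self, limit=100):
        """Return up to `limit` versions, newest first, ignoring anything
        recorded before the most recent replication marker."""
        start = 0
        for i, entry in enumerate(self.entries):
            if entry is REPLICATION_START:
                start = i + 1
        versions = self.entries[start:]
        return versions[-limit:][::-1]


log = ToyUpdateLog()
for v in (1, 2, 3):
    log.add(v)
log.mark_replication_start()       # index is about to be replaced wholesale
print(log.recent_versions())       # old versions 1, 2, 3 are no longer reported
log.add(1_000_001)
print(log.recent_versions())
```

With the marker in place, a freshly replicated core reports no versions rather than stale ones, which turns the "old versions" case back into the "no versions" case discussed in this issue.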
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483595#comment-13483595 ]

Yonik Seeley commented on SOLR-3939:

Thinking of some scenarios where this could happen:

1. R1,R2 both up and active, add docs 1,2,3
2. bring R2 down
3. add docs 4 through 1million
4. bring R2 up, peersync fails, replication is kicked off
5. R2 finishes replication and becomes active, but its recent versions still list 1,2,3
6. bring R1 down, R2 becomes the leader
7. bring R1 up, it does a peer-sync with R2, which looks like it has really old versions (and succeeds because of that)
8. if the leader (R2) does a peer-sync back with R1, it will fail (not sure of the consequences of this)

Another variation... if there's an update between 6 and 7:

6.5. add doc 1million+1

This will cause recent versions of R2 to be 1,2,3,1000001

It would be good to verify that peersync to the leader will either fail (causing full replication), or pick up the new document.
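The state that this scenario leaves behind can be reproduced with a very small model. This is a sketch under stated assumptions, not Solr code: `Replica` is a hypothetical toy class, a document's version is taken to be its id, and full replication is assumed to copy the index without touching the list of recent versions (which is what step 5 describes). No peer-sync logic is modelled; the sketch only shows what each node would report.

```python
class Replica:
    """Toy replica: an index (set of doc ids) plus a log of applied versions."""

    def __init__(self, name):
        self.name = name
        self.index = set()
        self.update_log = []    # only updates applied directly land here

    def add(self, doc_id):
        self.index.add(doc_id)
        self.update_log.append(doc_id)   # toy assumption: version == doc id

    def replicate_from(self, other):
        # Full index copy; the update log is deliberately left untouched.
        self.index = set(other.index)

    def recent_versions(self, limit=100):
        return self.update_log[-limit:]


r1, r2 = Replica("R1"), Replica("R2")

# Step 1: both replicas receive docs 1, 2, 3.
for doc in (1, 2, 3):
    r1.add(doc)
    r2.add(doc)

# Steps 2-3: R2 is down while docs 4 through 1 million arrive on R1.
for doc in range(4, 1_000_001):
    r1.add(doc)

# Steps 4-5: R2 comes back and falls through to full replication.
r2.replicate_from(r1)
stale = r2.recent_versions()
print(len(r2.index), stale)

# Step 6.5: R2 is now leader and takes one more update.
r2.add(1_000_001)
after = r2.recent_versions()
print(after)
```

R2 ends up holding every document while advertising only a handful of versions, with a gap of almost a million between the old and the new entries; that gap is what any version-based comparison with R1 would have to cope with.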
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483600#comment-13483600 ]

Mark Miller commented on SOLR-3939:

Currently the leader does not peer sync back to a replica coming up because it would have to buffer updates. I think that if a replica is somehow ahead of the leader when coming back, peersync should fail and it should replicate. I think since this is not a common case, that is much simpler than trying to peersync back from the leader to the replica in this case.
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483649#comment-13483649 ]

Yonik Seeley commented on SOLR-3939:

bq. Currently the leader does not peer sync back to a replica coming up because it would have to buffer updates.

peer sync doesn't require buffering updates. AFAIK, we don't do that until we realize we need to replicate?
[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483664#comment-13483664 ]

Mark Miller commented on SOLR-3939:

As far as I remember, if updates are coming in when you try and peer sync, we fail it? Isn't that what capturing the starting versions is all about?

When a leader syncs with his replicas on leader election, we know docs are not coming in, so we don't worry about that starting versions check - but if you want to peer sync from the leader to a replica that is coming back up, if updates are coming in, you are going to force a replication anyway. Since it's already an uncommon case, it doesn't seem worth tackling.

I mention buffering because it seemed you would have to buffer (or block updates) to be able to peer sync when updates are coming in.