[ https://issues.apache.org/jira/browse/SOLR-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123616#comment-15123616 ]
Anshum Gupta commented on SOLR-8619:
------------------------------------
But the user wouldn't get back a usable replica. We could add retries, but still
fail if the problem turns out to be more than a short event. The idea here is
that if a user is expecting traffic (typically the case when a user would want
to add a replica), the response from the ADDREPLICA call should assure them that
a _usable_ replica was added. If that isn't the case, the response should ask
them to retry and also communicate the reason for the error. If we don't do
that, the user would have to check CLUSTERSTATUS to confirm whether the new
replica is actually usable.
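A minimal client-side sketch of the check this comment wants to spare users: after ADDREPLICA, poll CLUSTERSTATUS a bounded number of times and only report success once the new replica is usable. The fetch callable, both helper names and the retry parameters are hypothetical; the response shape assumed here is the Collections API CLUSTERSTATUS JSON (`cluster.collections.<name>.shards.<shard>.replicas.<core_node>`, plus `cluster.live_nodes`).

```python
import time
from typing import Any, Callable, Dict

# Hypothetical: a callable returning the parsed JSON of
# /solr/admin/collections?action=CLUSTERSTATUS&collection=<name>
StatusFetcher = Callable[[], Dict[str, Any]]


def replica_is_usable(status: Dict[str, Any], collection: str,
                      shard: str, core_node: str) -> bool:
    """True if the replica is 'active' and its node is in live_nodes."""
    cluster = status.get("cluster", {})
    replica = (cluster.get("collections", {})
               .get(collection, {})
               .get("shards", {})
               .get(shard, {})
               .get("replicas", {})
               .get(core_node))
    if replica is None:
        return False
    # A recorded 'active' state only counts if the hosting node is live.
    return (replica.get("state") == "active"
            and replica.get("node_name") in cluster.get("live_nodes", []))


def wait_for_usable_replica(fetch: StatusFetcher, collection: str, shard: str,
                            core_node: str, attempts: int = 5,
                            delay_s: float = 1.0) -> bool:
    """Bounded retry: tolerate a short transient state, give up otherwise."""
    for attempt in range(attempts):
        if replica_is_usable(fetch(), collection, shard, core_node):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Bounding the attempts mirrors the "retry, but fail if it's not just a short event" idea. Checking `live_nodes` matters because a replica's recorded state can still read `active` after its node has gone away.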
> A new replica should not become leader when all current replicas are down as
> it leads to data loss
> --------------------------------------------------------------------------------------------------
>
> Key: SOLR-8619
> URL: https://issues.apache.org/jira/browse/SOLR-8619
> Project: Solr
> Issue Type: Bug
> Reporter: Anshum Gupta
>
> Here's what I'm talking about:
> * Start a 2 node solrcloud cluster
> * Create a 1 shard/1 replica collection
> * Add documents
> * Shut down the node that hosts the only active replica of the shard
> * ADDREPLICA for the shard/collection, so Solr would attempt to add a new
> replica on the other node
> * Solr waits for a while before this replica becomes an active leader.
> * Index a few new docs
> * Bring up the old node
> * The old replica comes up with its old index and then syncs so that it only
> contains the docs from the new leader.
> All old documents are lost in this case.
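A toy model of the sequence above, assuming recovery simply replaces the returning replica's index with the leader's. Solr's real recovery (PeerSync or full index replication) is more involved; the class and function names here are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Replica:
    name: str
    docs: Set[int] = field(default_factory=set)

    def sync_from(self, leader: "Replica") -> None:
        # Simplified model: recovery makes the local index match the leader's.
        self.docs = set(leader.docs)


def reproduce() -> Set[int]:
    old = Replica("old", {1, 2, 3})   # docs indexed before the node is shut down
    # The old node is down; ADDREPLICA creates an empty replica that becomes leader.
    new_leader = Replica("new")
    new_leader.docs |= {4, 5}         # docs indexed after the new leader took over
    # The old node comes back and syncs from the new leader.
    old.sync_from(new_leader)
    return old.docs
```

Under this model the returning replica ends up with only the post-failover documents, which is the data loss the report describes.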
> Here are a few things that might work here:
> 1. Reject an ADDREPLICA call if all current replicas for the shard are down.
> Since the new replica cannot sync from anyone, it doesn't make sense
> for this replica to even come up.
> 2. The replica shouldn't become active/leader unless it was either the last
> known leader or active before it went into the recovering state. The only
> exception is when there are no other replicas in the clusterstate.
> This might very well be related to SOLR-8173 but we should add a check to
> ADDREPLICA as well.
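The two proposals can be sketched as pure functions over a simplified view of a shard's replicas. `ReplicaInfo`, its fields and both function names are hypothetical, not Solr's internal API. In particular, the sketch assumes Solr records whether a replica was the last known leader or was active before recovering, which is what proposal 2 would need. It also assumes a shard with no replicas at all may still accept ADDREPLICA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaInfo:
    name: str
    state: str                               # e.g. "active", "down", "recovering"
    live: bool                               # hosting node is in live_nodes
    was_last_leader: bool = False            # hypothetical recorded flag
    was_active_before_recovery: bool = False # hypothetical recorded flag


def check_add_replica(existing: List[ReplicaInfo]) -> None:
    """Proposal 1: reject ADDREPLICA when every current replica is down."""
    # Assumption: an empty shard (no replicas at all) is allowed.
    if existing and not any(r.state == "active" and r.live for r in existing):
        raise ValueError("all replicas of the shard are down; "
                         "the new replica would have nothing to sync from")


def may_become_leader(candidate: ReplicaInfo,
                      others: List[ReplicaInfo]) -> bool:
    """Proposal 2: leadership eligibility for a replica."""
    if not others:  # no other replicas in the clusterstate
        return True
    return candidate.was_last_leader or candidate.was_active_before_recovery
```

In the reproduction above, the only existing replica is down, so `check_add_replica` would reject the call before an empty replica could take over leadership; even without that check, `may_become_leader` would keep the new replica from becoming leader while the old one is still in the clusterstate.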
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)