[
https://issues.apache.org/jira/browse/SOLR-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243069#comment-14243069
]
Shalin Shekhar Mangar commented on SOLR-6837:
---------------------------------------------
My reply to the email:
{quote}
A write in Solr, by default, is only guaranteed to exist in one place, i.e. the
leader. The safety valves that we have to preserve these writes are:
1. The leaderVoteWait time for which leader election is suspended until enough
live replicas are available
2. The two-way peer-sync between leader candidate and other replicas
The other safety valve is on the client side with the "min_rf" parameter
introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while making
the request then Solr will return the number of replicas to which it could
successfully send the update. Then depending on the response you can make a
decision to retry the update at a later time, assuming it is idempotent. This
kind of puts the onus of ensuring consistency on the client side, which is not
ideal but better than nothing. See SOLR-5468 for more discussion on this topic.
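As a rough illustration of the client-side pattern described above, the sketch below retries an idempotent update until the achieved replication factor reaches the requested min_rf. Here `send_update` is a hypothetical callable standing in for whatever client call sends the update with min_rf set and returns the replication factor reported in the response. The function name, retry count, and backoff are assumptions made for this example, not part of Solr or SolrJ.

```python
import time
from typing import Callable


def update_with_min_rf(
    send_update: Callable[[], int],
    min_rf: int = 2,
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> bool:
    """Retry an idempotent update until it reaches at least min_rf replicas.

    send_update is a hypothetical callable: it sends the update with the
    min_rf parameter set and returns the achieved replication factor
    reported in the response.
    """
    for attempt in range(1, max_attempts + 1):
        achieved_rf = send_update()
        if achieved_rf >= min_rf:
            return True
        # Not enough replicas acknowledged the update; wait and retry.
        if attempt < max_attempts:
            time.sleep(backoff_seconds * attempt)
    return False
```

If the function returns False, the update may exist only on the leader, and the caller has to decide whether to retry later or report the failure.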
In your particular example, none of these safeties are invoked because you
started node2 while node1 was down, so node2 goes ahead with leader election
after the wait period. Also, since node1 was down during leader election, peer
sync doesn't happen and node2 becomes the leader.
When node1 comes back online and joins as a replica, it recovers from the
leader using peer-sync (which returns the newest 100 updates) and finds that
there's nothing newer on the leader. However, there are no checks to make sure
that the replica doesn't have a newer update itself which is why you end up
with the inconsistent replica. If there had been a lot of updates on node2 (more
than 100) while node1 was down, peer-sync would not have been applicable, node1
would have done a replication recovery instead, and this inconsistency would
have been resolved.
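The recovery behaviour described above can be modelled with a small sketch. This is a simplified toy model, not Solr's actual implementation: each node is represented as a set of integer update versions, and the function names are invented for this example. Only the window of 100 updates comes from the description above.

```python
from typing import Set, Tuple

PEER_SYNC_WINDOW = 100  # peer-sync returns the newest 100 updates


def recover_replica(leader: Set[int], replica: Set[int]) -> Tuple[str, Set[int]]:
    """Toy model of a replica recovering from the leader.

    Returns the recovery method used and the replica's versions afterwards.
    """
    missing_on_replica = leader - replica
    if len(missing_on_replica) > PEER_SYNC_WINDOW:
        # Too far behind for peer-sync: copy the leader's state wholesale,
        # which also discards anything only the replica had.
        return "replication", set(leader)
    # Peer-sync only pulls what the leader has and the replica lacks.
    # Nothing checks for updates that exist only on the replica.
    return "peer-sync", replica | missing_on_replica


def replica_only_updates(leader: Set[int], replica: Set[int]) -> Set[int]:
    """The missing check: updates the replica has but the leader does not."""
    return replica - leader


# The reported scenario: version 1 is 'A' (on both), version 2 is 'B' (node1 only).
node2_leader = {1}
node1_replica = {1, 2}
method, node1_after = recover_replica(node2_leader, node1_replica)
leftover = replica_only_updates(node2_leader, node1_after)
```

In this model `method` is "peer-sync" and `leftover` still contains version 2, which corresponds to node1 continuing to return 'B' while node2 returns 'A'. With more than 100 leader-only updates, the model falls back to replication and the replica-only update disappears.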
So yeah we have a valid consistency bug such that we have inconsistent replicas
in a steady state. I wonder if the right way is to bump min_rf to a higher
value or peer-sync both ways during replica recovery. I'll need to think more
on this.
{quote}
> Inconsistent replicas when update is successful against leader partitioned
> from all replicas
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-6837
> URL: https://issues.apache.org/jira/browse/SOLR-6837
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.10.2
> Reporter: Shalin Shekhar Mangar
> Labels: difficulty-hard, impact-medium
>
> Refer to the following question on solr-user:
> https://www.marshut.net/kttiuz/inconsistent-doc-value-across-two-nodes-very-simple-test-what-s-the-expected-behavior.html
> {quote}
> Config
> Solr 4.7.2 / Jetty.
> SolrCloud on two nodes, and 3 ZK, all running in localhost.
> single collection: single shard with two replicas.
> Reproducing:
> start node1 9.148.58.114:8983
> start node2 9.148.58.114:8984
> Cluster state: node1 leader. node2 active.
> index value 'A' (id="change me").
> query and expect 'A' -> success
> Stop node2
> Cluster state: node1 leader. node2 gone.
> query and expect 'A' -> success
> Update document value from 'A'->'B'
> query and expect 'B' -> success
> Stop node1
> then
> Start node2.
> Cluster state: node1 gone. node2 down.
> 104510 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms
> wait 3m.
> 184679 [coreZkRegister-1-thread-1] INFO
> org.apache.solr.cloud.ShardLeaderElectionContext I am the new leader:
> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/
> shard1
> Cluster state: node1 gone. node2 leader.
> query and expect 'A' (old value) -> success
> start node1
> Cluster state: node1 active. node2 leader.
> Inconsistency:
> Querying node1 always returns 'B'.
> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
> Querying node2 always returns 'A'.
> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
> {quote}
> In such a case, the final steady state of the system has inconsistent
> replicas.