[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735547#comment-15735547 ]

Cao Manh Dat commented on SOLR-9835:
------------------------------------

Here is the detailed design that Shalin and I have been working on recently.

h2. Overview
The current replication mechanism of SolrCloud is called state machine: replicas 
start in the same initial state, and each input is distributed across all 
replicas, so every replica ends up in the same next state.

But this type of replication has some drawbacks:
- Indexing and commits have to run on all replicas
- Slow recovery: if a replica misses more than N updates during its downtime, 
it (usually) has to download the entire index from its leader.

So we introduce another replication mode for SolrCloud called state transfer, 
which acts like master/slave replication. Basically:
- The leader distributes each update to the other replicas, but only the leader 
applies the update to its IndexWriter; the other replicas just store the update 
in their UpdateLog (as in master/slave replication)
- Replicas frequently poll the latest segments from the leader.

Pros:
- Lightweight indexing, because only the leader runs the updates and commits
- Very fast recovery: if a replica fails, it just has to download the newest 
segments instead of re-downloading the entire index.

Cons:
- The leader can become a hotspot when there are many replicas (this can be 
optimized later)
- Longer turnaround time compared to the current NRT replication

h2. Commit
When we commit, we write the version of the update command into the commit 
data, in addition to the timestamp that is written today.
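A minimal sketch of the extra commit user-data described above. The key names 
here are assumptions (mirroring the style of Lucene commit user-data such as 
{{commitTimeMSec}}); the real patch would hand a map like this to the 
IndexWriter before committing:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the user-data written at commit time. Key names are assumptions,
// not confirmed identifiers from the patch.
public class CommitData {
    static Map<String, String> build(long commitTimeMSec, long maxUpdateVersion) {
        Map<String, String> data = new HashMap<>();
        data.put("commitTimeMSec", Long.toString(commitTimeMSec));        // written today
        data.put("commitCommandVersion", Long.toString(maxUpdateVersion)); // new in this design
        return data;
    }
}
```

Storing the version in the commit point is what lets a replica later compare 
its commit against the leader's without diffing segment files.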

h2. Update
# When a replica receives an update request, it forwards the request to:
#* the correct leader (for add-document and delete-by-id requests)
#* all leaders (for delete-by-query and commit requests)
# The leader assigns a version to the update as it does today, writes it to its 
update log, and applies it to the IndexWriter
# The leader distributes the update to its replicas
# When a replica receives an update, it writes the update to its update log and 
returns success
# The leader returns success
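The asymmetry between steps 2 and 4 can be sketched as a toy model (all class 
and field names are hypothetical, standing in for the real IndexWriter and 
tlog):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two update paths: the leader both indexes and logs,
// while a replica only logs. Versions stand in for whole update commands.
public class UpdatePaths {
    final List<Long> indexWriter = new ArrayList<>(); // stands in for the Lucene index
    final List<Long> updateLog = new ArrayList<>();   // stands in for the tlog
    long versionCounter = 0;

    // Leader path (step 2): assign a version, write to the tlog, apply to the index.
    long leaderAdd() {
        long version = ++versionCounter;
        updateLog.add(version);
        indexWriter.add(version);
        return version;
    }

    // Replica path (step 4): just record the versioned update in the tlog.
    void replicaAdd(long version) {
        updateLog.add(version);
    }
}
```

The replica's index stays untouched until the next segment replication, which 
is exactly why recovery reduces to fetching segments plus replaying the tlog 
tail.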

h2. Periodic Segment Replication
The replica polls the leader for the latest commit point's generation and 
version and compares them against its latest local commit point. If they are 
the same, then replication is successful and nothing needs to be done. If not, 
then:
# The replica downloads from the leader all files added since its local commit 
point
# It installs them into the index, synchronizes on the update log, closes the 
old tlog, creates a new one, and copies over all records from the old tlog 
whose version is greater than the latest commit point's version. This ensures 
that any update which hasn't made it to the index is preserved in the current 
tlog. It might lead to duplication of updates between the previous and the 
current tlogs, but that is okay.
# The replica re-opens its searcher

For example:
Leader has the following tlogs
- TLog1 has versions 1,2,3,commit
- TLog2 has versions 4,5

Before segment replication, the replica has the following tlog:
- TLog1 - 1,2,3,4,commit

After segment replication, the replica will have the following tlogs:
- TLog1 - 1,2,3,4,commit
- TLog2 - 4,5


During this process, the replica does not need to be put into the 'recovery' 
state. It remains 'active' and continues to participate in indexing and 
searches.
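The copy-over in step 2 above can be sketched as follows (class and method 
names are hypothetical; versions stand in for whole tlog records):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of starting a fresh tlog after installing the downloaded commit:
// carry over every record whose version is newer than the commit point's
// version, so no un-indexed update is lost.
public class TlogCopyOver {
    static List<Long> newTlog(List<Long> oldTlog, long commitPointVersion) {
        List<Long> fresh = new ArrayList<>();
        for (long version : oldTlog) {
            if (version > commitPointVersion) {
                // May duplicate a record that stays in the old tlog; that is okay.
                fresh.add(version);
            }
        }
        return fresh;
    }
}
```

Applied to the example: the downloaded commit covers versions up to 3, so from 
the replica's old tlog (1,2,3,4) only version 4 is copied into the new tlog; 
version 5 is then appended as it arrives, giving the TLog2 = 4,5 shown above.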

h2. Replica recovery
# The replica puts itself in the 'recovery' state.
# The replica compares its latest commit point against the leader's
# If its index is the same as the leader's, it performs a 'peersync' with the 
leader but only writes the peersync'd updates to its update log
# If its index is not the same as the leader's, or if peersync fails, the 
replica:
#* Puts its update log into the 'buffering' state
#* Issues a hard commit to the leader
#* Copies the index commit points from the leader that do not exist locally
#* Publishes itself as 'active'
# The 'buffering' state in the steps above ensures that any updates that 
haven't been committed on the leader are still replicated into the replica's 
current transaction log

Compared to the current recovery strategy in Solr, only one change is needed: 
check the index version of the leader against the replica's before attempting 
a peersync.
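The recovery decision above can be sketched as a branch on the commit-point 
comparison (the step labels are hypothetical descriptions, not real Solr 
calls):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the recovery flow: returns the ordered list of steps taken.
// Step labels are illustrative strings, not real RecoveryStrategy methods.
public class RecoverySketch {
    static List<String> recover(long replicaCommitVersion, long leaderCommitVersion,
                                boolean peerSyncSucceeds) {
        List<String> steps = new ArrayList<>();
        steps.add("set state = recovering");
        // The one change to the current strategy: only attempt peersync when
        // the commit points match.
        boolean sameIndex = replicaCommitVersion == leaderCommitVersion;
        if (sameIndex && peerSyncSucceeds) {
            steps.add("peersync: write updates to update log only");
        } else {
            steps.add("buffer updates in update log");
            steps.add("issue hard commit to leader");
            steps.add("copy missing commit points from leader");
        }
        steps.add("publish state = active");
        return steps;
    }
}
```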

h2. Leader Election
When a leader dies, a candidate replica will become the new leader. The leader 
election algorithm remains mostly the same, except that after the "sync" step 
the leader candidate replays its transaction log before publishing itself as 
the leader.
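The extra replay step matters because, under this scheme, a replica's index 
lags its tlog. A toy sketch of the idea (field names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the changed election step: the candidate applies every
// logged-but-unindexed update before it may publish itself as leader.
public class ElectionSketch {
    final List<Long> index = new ArrayList<>(); // updates applied to the index
    final List<Long> tlog = new ArrayList<>();  // updates only recorded in the tlog

    void becomeLeader() {
        // the unchanged "sync" step would run here
        for (long version : tlog) {
            index.add(version); // replay: apply what the old leader never indexed here
        }
        // only now is it safe to publish as leader
    }
}
```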

h2. Collection property to switch replication scheme
A new property called "onlyLeaderIndexes" will be added to the collection. Any 
collection with this property set to true will index only on the elected 
leader; the rest of the replicas will only fetch index segments from the 
leader, as described above. This property must be set at collection creation 
time and defaults to "false". Existing collections cannot be switched to the 
new replication scheme; future work can attempt to fix that.

h2. FAQ

Q: What happens on a soft commit?
A: A soft commit is nothing but a searcher opened via the IndexWriter: it 
flushes a new segment to disk but does not commit it. Since the newly written 
segment is not part of a commit point, it is not replicated at all. 
Effectively, soft commits are not useful in this new design for now; future 
work may attempt to solve this. The same answer applies to searcher re-opens 
triggered by real-time get.


> Create another replication mode for SolrCloud
> ---------------------------------------------
>
>                 Key: SOLR-9835
>                 URL: https://issues.apache.org/jira/browse/SOLR-9835
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)