[ 
https://issues.apache.org/jira/browse/SOLR-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021261#comment-16021261
 ] 

Tomás Fernández Löbbe commented on SOLR-10233:
----------------------------------------------

Saw a couple of {{ChaosMonkeySafeLeaderWithPullReplicasTest}} failures in 
Jenkins. I'll look into those. 

> Add support for different replica types in Solr
> -----------------------------------------------
>
>                 Key: SOLR-10233
>                 URL: https://issues.apache.org/jira/browse/SOLR-10233
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Tomás Fernández Löbbe
>            Assignee: Tomás Fernández Löbbe
>         Attachments: SOLR-10233.patch, SOLR-10233.patch, SOLR-10233.patch, 
> SOLR-10233.patch, SOLR-10233.patch
>
>
> For the majority of the cases, current SolrCloud's  distributed indexing is 
> great. There is a subset of use cases for which the legacy Master/Slave 
> replication may fit better:
> * Don’t require NRT
> * LIR can become an issue, prefer availability of reads vs consistency or NRT
> * High number of searches (requiring many search nodes)
> SOLR-9835 is adding replicas that don’t do indexing, just update their 
> transaction log. This Jira is to extend that idea and provide the following 
> replica types:
> * *Realtime:* Writes updates to transaction log and indexes locally. Replicas 
> of type “realtime” support NRT (soft commits) and RTG. Any _realtime_ replica 
> can become a leader. This is the only type supported in SolrCloud at this 
> time and will be the default.
> * *Append:* Writes to transaction log, but not to index, uses replication. 
> Any _append_ replica can become leader (by first applying all local 
> transaction log elements). If a replica is of type _append_ but is also the 
> leader, it will behave as a _realtime_. This is exactly what SOLR-9835 is 
> proposing (non-live replicas)
> * *Passive:* Doesn’t index or writes to transaction log. Just replicates from 
> _realtime_ or _append_ replicas. Passive replicas can’t become shard leaders 
> (i.e., if there are only passive replicas in the collection at some point, 
> updates will fail same as if there is no leaders, queries continue to work), 
> so they don’t even participate in elections.
> When the leader replica of the shard receives an update, it will distribute 
> it to all _realtime_ and _append_ replicas, the same as it does today. It 
> won't distribute to _passive_ replicas.
> By using a combination of _append_ and _passive_ replicas, one can achieve an 
> equivalent of the legacy Master/Slave architecture in SolrCloud mode with 
> most of its benefits, including high availability of writes. 
> h2. API (v1 style)
> {{/admin/collections?action=CREATE…&*realtimeReplicas=X&appendReplicas=Y&passiveReplicas=Z*}}
> {{/admin/collections?action=ADDREPLICA…&*type=\[realtime/append/passive\]*}}
> * “replicationFactor=” will translate to “realtime=“ for back compatibility
> * if _passive_ > 0, _append_ or _realtime_ need to be >= 1 (can’t be all 
> passives)
> h2. Placement Strategies
> By using replica placement rules, one should be able to dedicate nodes to 
> search-only and write-only workloads. For example:
> {code}
> shard:*,replica:*,type:passive,fleet:slaves
> {code}
> where “type” is a new condition supported by the rule engine, and 
> “fleet:slaves” is a regular tag. Note that rules are only applied when the 
> replicas are created, so a later change in tags won't affect existing 
> replicas. Also, rules are per collection, so each collection could contain 
> it's own different rules.
> Note that on the server side Solr also needs to know how to distribute the 
> shard requests (maybe ShardHandler?) if we want to hit only a subset of 
> replicas (i.e. *passive *replicas only, or similar rules)
> h2. SolrJ
> SolrCloud client could be smart to prefer _passive_ replicas for search 
> requests when available (and if configured to do so). _Passive_ replicas 
> can’t respond RTG requests, so those should go to _realtime_ replicas. 
> h2. Cluster/Collection state
> {code}
> {"gettingstarted":{
>   "replicationFactor":"1",
>   "router":{"name":"compositeId"},
>   "maxShardsPerNode":"2",
>   "autoAddReplicas":"false",
>   "shards":{
>     "shard1":{
>       "range":"80000000-ffffffff",
>       "state":"active",
>       "replicas":{
>         "core_node5":{
>           "core":"gettingstarted_shard1_replica1",
>           "base_url":"http://127.0.0.1:8983/solr";,
>           "node_name":"127.0.0.1:8983_solr",
>           "state":"active",
>           "leader":"true",
>           **"type": "realtime"**},
>         "core_node10":{
>           "core":"gettingstarted_shard1_replica2",
>           "base_url":"http://127.0.0.1:7574/solr";,
>           "node_name":"127.0.0.1:7574_solr",
>           "state":"active",
>           **"type": "passive"**}},
>       }},
>     "shard2":{
>       ...
> {code}
> h2. Back compatibility
> We should be able to support back compatibility by assuming replicas without 
> a “type” property are _realtime_ replicas. 
> h2. Failure Scenarios for passive replicas
> h3. Replica-Leader partition
> In SolrCloud today, in this scenario the replica would be placed in LIR. With 
> _passive_ replicas, replicas may not be able to replicate from some time (and 
> fall behind with the index) but queries can still be served. Once the 
> connection is re-established the replication will continue. 
> h3. Replica ZooKeeper partition
> _Passive_ replica will leave the cluster. “Smart clients” and other replicas 
> (e.g. for distributed search) won’t find it and won’t query on it. Direct 
> search requests to the replica may still succeed. 
> h3. Passive replica dies (or is unreachable)
> Replica won’t be query-able. On restart, replica will recover from the 
> leader, following the same flow as _realtime_ replicas: set state to DOWN, 
> then RECOVERING, and finally ACTIVE. _Passive_ replicas will use a different 
> {{RecoveryStrategy}} implementation, that omits *preparerecovery,* and peer 
> sync attempt, it will jump to replication . If the leader didn't change, or 
> if the other replicas are of type “append”, replication should be 
> incremental. Once the first replication is done, passive replica will declare 
> itself active and start serving traffic.
> h3. Leader dies
> Passive replica won’t be able to replicate. The cluster won’t take updates 
> until a new leader is elected. Once a new leader is elected, updates will be 
> back to normal. Passive replicas will remain active and serving query traffic 
> during the “write outage”. Once the new leader is elected the replication 
> will restart (maybe from a different node)
> h3. Leader ZooKeeper partition
> Same as today. Leader will abandon leadership and a new replica will be 
> elected as leader.
> h2. Q&A
> h3. Can I use a combination of _passive_ + _realtime_?
> You could. The problem is that, since _realtime_ generate their own index, 
> any change of leadership could trigger a full replication from all the 
> _passive_ replicas. The biggest benefits of _append_ replicas is that they 
> share the same index files, which means that even if the leader changes, the 
> number of segments to replicate will remain low. For that reason, using 
> _append_ replicas is recommended when using _passive_.
> h3. Can I use _passive_ + _append_ + _realtime_?
> The issue with mixing _realtime_ replicas with _append_ replicas is that if a 
> different _realtime_ replica becomes the leader, the whole purpose of using 
> _append_ replicas is defeated, since they will all have to replicate the full 
> index. 
> h3. What happens if replication from *passives* fail?
> TBD: In general we want those replicas to continue serving search traffic, 
> but we may want to have a way to say “If can’t replicate after X hours put 
> yourself in recovery” or something similar.
> [~varunthacker] suggested that we include in the response time since the last 
> successful replication, and then the client can choose what to do with the 
> results (in a multi-shard request, this date would be the oldest of all 
> shards).
> h3. Do _passive_ replicas need to replicate from the leader only?
> This is not necessary. _Passive_ replicas can replicate from any _realtime_ 
> or _append_ replicas, although this would add some extra waiting time for the 
> last updates. Replicating from a _realtime_ replica may not be a good idea, 
> see the question “Can I use a combination of _passive_ + _realtime_?”
> h3. What if I need NRT?
> Then you can’t query _append_ or _passive_ replicas. You should use all 
> _realtime_ replicas
> h3. Will new _passive_ replicas start receiving traffic immediately after 
> added?
> _passive_ replicas will have the same states as _realtime_/_append_ replicas, 
> they’ll join the cluster as “DOWN” and be moved to “RECOVERY” until they can 
> replicate from the leader. Then they’ll start the replication process and 
> become “ACTIVE”, at this point they’ll start responding queries. They'll use 
> a different {{RecoveryStrategy}} that skips peer sync and buffering of docs, 
> and just replicates.
> h3. What if a _passive_ replica receives an update?
> This will work the same as today with non-leader replicas, it will just 
> forward the update to the correct leader.
> h3. What is the difference between using active + passive with legacy 
> master/slave?
> These are just some I can think of:
> * You now need ZooKeeper to run in SolrCloud mode
> * High availability for writes, as long as you have more than 1 active replica
> * Shard management by Solr at index time and query time.
> * Full support for Collections and Collections API
> * SolrCloudClient support
> I'd like to get some thoughts on this proposal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to