[ https://issues.apache.org/jira/browse/SOLR-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037781#comment-16037781 ]
Tomás Fernández Löbbe commented on SOLR-10233:
----------------------------------------------

Looking at ChaosMonkey failure: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/19780/


Add support for different replica types in Solr
------------------------------------------------

                 Key: SOLR-10233
                 URL: https://issues.apache.org/jira/browse/SOLR-10233
             Project: Solr
          Issue Type: New Feature
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
            Reporter: Tomás Fernández Löbbe
            Assignee: Tomás Fernández Löbbe
             Fix For: master (7.0)
         Attachments: 11431.consoleText.txt, SOLR-10233.patch, SOLR-10233.patch, SOLR-10233.patch, SOLR-10233.patch, SOLR-10233.patch

For the majority of use cases, SolrCloud's current distributed indexing works great. There is a subset of use cases for which the legacy Master/Slave replication may fit better:
* Don't require NRT
* LIR can become an issue; prefer availability of reads over consistency or NRT
* High number of searches (requiring many search nodes)

SOLR-9835 is adding replicas that don't do indexing, just update their transaction log. This Jira is to extend that idea and provide the following replica types:
* *Realtime:* Writes updates to the transaction log and indexes locally. Replicas of type _realtime_ support NRT (soft commits) and RTG. Any _realtime_ replica can become a leader. This is the only type supported in SolrCloud at this time and will remain the default.
* *Append:* Writes to the transaction log but not to the index; uses replication to get index changes. Any _append_ replica can become leader (by first applying all local transaction log elements). If a replica is of type _append_ but is also the leader, it will behave as a _realtime_ replica. This is exactly what SOLR-9835 is proposing (non-live replicas).
* *Passive:* Doesn't index or write to the transaction log; just replicates from _realtime_ or _append_ replicas. Passive replicas can't become shard leaders (i.e., if at some point only passive replicas remain in the collection, updates will fail just as if there were no leader, while queries continue to work), so they don't even participate in elections.

When the leader replica of the shard receives an update, it will distribute it to all _realtime_ and _append_ replicas, the same as it does today. It won't distribute to _passive_ replicas.

By using a combination of _append_ and _passive_ replicas, one can achieve an equivalent of the legacy Master/Slave architecture in SolrCloud mode with most of its benefits, including high availability of writes.

h2. API (v1 style)
{{/admin/collections?action=CREATE…&*realtimeReplicas=X&appendReplicas=Y&passiveReplicas=Z*}}
{{/admin/collections?action=ADDREPLICA…&*type=\[realtime/append/passive\]*}}
* “replicationFactor=” will translate to “realtime=“ for back compatibility
* if _passive_ > 0, _append_ or _realtime_ need to be >= 1 (can’t be all passives)

(A SolrJ sketch of these parameters follows the Placement Strategies section below.)

h2. Placement Strategies
By using replica placement rules, one should be able to dedicate nodes to search-only and write-only workloads. For example:
{code}
shard:*,replica:*,type:passive,fleet:slaves
{code}
where “type” is a new condition supported by the rule engine, and “fleet:slaves” is a regular tag. Note that rules are only applied when the replicas are created, so a later change in tags won't affect existing replicas. Also, rules are per collection, so each collection can contain its own rules.
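To make the proposed API and placement rule concrete, below is a minimal SolrJ sketch (illustration only, not part of the attached patches). It assumes the parameter names proposed above ({{realtimeReplicas}}, {{appendReplicas}}, {{passiveReplicas}}, {{type}}) and the new {{type}} rule condition, all of which may change; {{fleet:slaves}} is just the example tag from the rule above.
{code}
// Sketch of driving the proposed v1 Collections API from SolrJ.
// The replica-type parameters and the "type" rule condition are proposals
// from this issue, not existing Solr parameters.
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ReplicaTypesApiSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://127.0.0.1:8983/solr").build()) {
      // CREATE: one realtime, two append and two passive replicas per shard,
      // with passive replicas restricted to nodes tagged fleet:slaves.
      ModifiableSolrParams create = new ModifiableSolrParams();
      create.set("action", "CREATE");
      create.set("name", "gettingstarted");
      create.set("numShards", "2");
      create.set("realtimeReplicas", "1");   // proposed parameter
      create.set("appendReplicas", "2");     // proposed parameter
      create.set("passiveReplicas", "2");    // proposed parameter
      create.set("rule", "shard:*,replica:*,type:passive,fleet:slaves"); // proposed "type" condition
      client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/collections", create));

      // ADDREPLICA with an explicit replica type.
      ModifiableSolrParams add = new ModifiableSolrParams();
      add.set("action", "ADDREPLICA");
      add.set("collection", "gettingstarted");
      add.set("shard", "shard1");
      add.set("type", "passive");            // proposed parameter
      client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/collections", add));
    }
  }
}
{code}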
Note that on the server side Solr also needs to know how to distribute the shard requests (maybe in {{ShardHandler}}?) if we want to hit only a subset of replicas (i.e., *passive* replicas only, or similar rules).

h2. SolrJ
The SolrCloud client could be smart enough to prefer _passive_ replicas for search requests when available (and if configured to do so). _Passive_ replicas can't respond to RTG requests, so those should go to _realtime_ replicas. (A rough client-side sketch of this idea appears after the first Q&A entry below.)

h2. Cluster/Collection state
{code}
{"gettingstarted":{
  "replicationFactor":"1",
  "router":{"name":"compositeId"},
  "maxShardsPerNode":"2",
  "autoAddReplicas":"false",
  "shards":{
    "shard1":{
      "range":"80000000-ffffffff",
      "state":"active",
      "replicas":{
        "core_node5":{
          "core":"gettingstarted_shard1_replica1",
          "base_url":"http://127.0.0.1:8983/solr",
          "node_name":"127.0.0.1:8983_solr",
          "state":"active",
          "leader":"true",
          *"type": "realtime"*},
        "core_node10":{
          "core":"gettingstarted_shard1_replica2",
          "base_url":"http://127.0.0.1:7574/solr",
          "node_name":"127.0.0.1:7574_solr",
          "state":"active",
          *"type": "passive"*}}},
    "shard2":{
      ...
{code}

h2. Back compatibility
We should be able to support back compatibility by assuming replicas without a “type” property are _realtime_ replicas.

h2. Failure Scenarios for passive replicas
h3. Replica-Leader partition
In SolrCloud today, in this scenario the replica would be placed in LIR. With _passive_ replicas, a replica may not be able to replicate for some time (and fall behind with the index), but queries can still be served. Once the connection is re-established, replication will continue.

h3. Replica-ZooKeeper partition
The _passive_ replica will leave the cluster. “Smart clients” and other replicas (e.g. for distributed search) won’t find it and won’t query it. Direct search requests to the replica may still succeed.

h3. Passive replica dies (or is unreachable)
The replica won’t be queryable. On restart, the replica will recover from the leader, following the same flow as _realtime_ replicas: set state to DOWN, then RECOVERING, and finally ACTIVE. _Passive_ replicas will use a different {{RecoveryStrategy}} implementation that omits the *preparerecovery* call and the peer sync attempt, and jumps straight to replication. If the leader didn't change, or if the other replicas are of type _append_, replication should be incremental. Once the first replication is done, the passive replica will declare itself active and start serving traffic.

h3. Leader dies
Passive replicas won’t be able to replicate. The cluster won’t take updates until a new leader is elected; once a new leader is elected, updates will be back to normal. Passive replicas will remain active and keep serving query traffic during the “write outage”. Once the new leader is elected, replication will restart (possibly from a different node).

h3. Leader-ZooKeeper partition
Same as today. The leader will abandon leadership and a new replica will be elected as leader.

h2. Q&A
h3. Can I use a combination of _passive_ + _realtime_?
You could. The problem is that, since _realtime_ replicas each generate their own index, any change of leadership could trigger a full replication from all the _passive_ replicas. The biggest benefit of _append_ replicas is that they share the same index files, which means that even if the leader changes, the number of segments to replicate will remain low. For that reason, using _append_ replicas is recommended when using _passive_.
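Tying the SolrJ and Cluster/Collection state sections above together, here is a rough sketch (purely illustrative, not the committed implementation) of how a client could prefer _passive_ replicas for searches by reading the proposed “type” property from the cluster state; the property name and the “passive” value are assumptions from this proposal.
{code}
// Sketch: pick active passive-replica URLs for search requests, falling back
// to all active replicas when no passive replica is available. Assumes the
// CloudSolrClient has already been connected (client.connect()).
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;

public class PassivePreferringClientSketch {
  static List<String> searchUrls(CloudSolrClient client, String collection) {
    DocCollection coll = client.getZkStateReader().getClusterState().getCollection(collection);
    List<String> passive = new ArrayList<>();
    List<String> all = new ArrayList<>();
    for (Replica r : coll.getReplicas()) {
      if (r.getState() != Replica.State.ACTIVE) continue;
      all.add(r.getCoreUrl());
      if ("passive".equals(r.getStr("type"))) {  // proposed replica property
        passive.add(r.getCoreUrl());
      }
    }
    return passive.isEmpty() ? all : passive;
  }
}
{code}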
h3. Can I use _passive_ + _append_ + _realtime_?
The issue with mixing _realtime_ replicas with _append_ replicas is that if a different _realtime_ replica becomes the leader, the whole purpose of using _append_ replicas is defeated, since they will all have to replicate the full index.

h3. What happens if replication from _passive_ replicas fails?
TBD: In general we want those replicas to continue serving search traffic, but we may want a way to say “if you can’t replicate after X hours, put yourself in recovery” or something similar.
[~varunthacker] suggested that we include in the response the time since the last successful replication, and then the client can choose what to do with the results (in a multi-shard request, this date would be the oldest across all shards). A hypothetical client-side sketch of this idea appears at the end of this message.

h3. Do _passive_ replicas need to replicate from the leader only?
This is not necessary. _Passive_ replicas can replicate from any _realtime_ or _append_ replica, although this would add some extra waiting time for the last updates. Replicating from a _realtime_ replica may not be a good idea, though; see the question “Can I use a combination of _passive_ + _realtime_?”

h3. What if I need NRT?
Then you can’t query _append_ or _passive_ replicas. You should use all _realtime_ replicas.

h3. Will new _passive_ replicas start receiving traffic immediately after being added?
_Passive_ replicas will have the same states as _realtime_/_append_ replicas: they’ll join the cluster as “DOWN” and be moved to “RECOVERING” until they can replicate from the leader. Then they’ll start the replication process and, once done, become “ACTIVE”; at that point they’ll start responding to queries. They'll use a different {{RecoveryStrategy}} that skips peer sync and buffering of docs, and just replicates.

h3. What if a _passive_ replica receives an update?
This will work the same as with non-leader replicas today: it will just forward the update to the correct leader.

h3. What is the difference between using active + passive replicas and legacy master/slave?
These are just some differences I can think of:
* You now need ZooKeeper to run in SolrCloud mode
* High availability for writes, as long as you have more than one active replica
* Shard management by Solr at index time and query time
* Full support for Collections and the Collections API
* SolrCloud client ({{CloudSolrClient}}) support

I'd like to get some thoughts on this proposal.
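As a hypothetical illustration of the replication-failure Q&A above: if the response carried the time of the last successful replication, a client could decide for itself whether results from _passive_ replicas are fresh enough. The response key {{lastReplicationTimeMs}} below is invented purely for illustration; no such field exists in Solr today.
{code}
// Sketch: client-side staleness check based on a hypothetical
// "lastReplicationTimeMs" entry in the query response.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StalenessCheckSketch {
  static final long MAX_LAG_MS = 60 * 60 * 1000L; // accept up to 1 hour of replication lag

  static boolean isFreshEnough(SolrClient client, String collection) throws Exception {
    QueryResponse rsp = client.query(collection, new SolrQuery("*:*"));
    // In a multi-shard request this would be the oldest value across all shards.
    Long lastReplication = (Long) rsp.getResponse().get("lastReplicationTimeMs"); // hypothetical key
    if (lastReplication == null) {
      return true; // answered by a realtime/append replica, or feature not present
    }
    return System.currentTimeMillis() - lastReplication <= MAX_LAG_MS;
  }
}
{code}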