[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815930#comment-15815930 ]
Tomás Fernández Löbbe commented on SOLR-9835:
---------------------------------------------

Great idea! I just took a quick look at the patch to understand this better. I have a couple of questions/comments. I know this is a work in progress, so feel free to disregard any of my comments if you are already working on them:

{code}
onlyLeaderIndexes = zkStateReader.getClusterState().getCollection(collection).getLiveReplicas() == 1;
{code}
Maybe add a method to DocCollection like {{isOnlyLeaderIndexes()}} (or choose another name)? I understand why you did this, but this code is repeated many times, so maybe it can be improved for now.

{code}
private Map<String, ReplicateFromLeader> replicateFromLeaders = new HashMap<>();
{code}
Does this need to be synchronized?

{code}
-  private final String masterUrl;
+  private String masterUrl;
{code}
Should {{masterUrl}} now be volatile?

{code}
+  public static boolean waitForInSyncWithLeader(SolrCore core, Replica leaderReplica) throws InterruptedException {
+    if (waitForReplicasInSync == null) return true;
+
+    Pair<Boolean,Integer> pair = parseValue(waitForReplicasInSync);
+    boolean enabled = pair.first();
+    if (!enabled) return true;
+
+    Thread.sleep(1000);
+    HttpSolrClient leaderClient = new HttpSolrClient.Builder(leaderReplica.getCoreUrl()).build();
+    long leaderVersion = -1;
+    String localVersion = null;
+    try {
+      for (int i = 0; i < pair.second(); i++) {
+        if (core.isClosed()) return true;
+        ModifiableSolrParams params = new ModifiableSolrParams();
+        params.set(CommonParams.QT, ReplicationHandler.PATH);
+        params.set(COMMAND, CMD_DETAILS);
+
+        NamedList<Object> response = leaderClient.request(new QueryRequest(params));
+        leaderVersion = (long) ((NamedList)response.get("details")).get("indexVersion");
+
+        localVersion = core.getDeletionPolicy().getLatestCommit().getUserData().get(SolrIndexWriter.COMMIT_TIME_MSEC_KEY);
+        if (localVersion == null && leaderVersion == 0) return true;
+
+        if (localVersion != null && Long.parseLong(localVersion) == leaderVersion) {
+          return true;
+        } else {
+          Thread.sleep(500);
+        }
+      }
+
+    } catch (Exception e) {
+      log.error("Exception when wait for replicas in sync with master");
+    } finally {
+      try {
+        if (leaderClient != null) leaderClient.close();
+      } catch (IOException e) {
+        e.printStackTrace();
+      }
+    }
+
+    return false;
+  }
{code}
In many cases in the tests the leader will change before the replication happens, right? Does it make sense to discover the leader inside of the loop? Also, is there a way to remove that {{Thread.sleep(1000)}} at the beginning? This code will be called very frequently in tests.

> Create another replication mode for SolrCloud
> ---------------------------------------------
>
>                 Key: SOLR-9835
>                 URL: https://issues.apache.org/jira/browse/SOLR-9835
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch,
>                      SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, in which
> replicas start in the same initial state and each input is distributed across
> replicas, so all replicas end up with the same next state.
> But this type of replication has some drawbacks:
> - The commit (which is costly) has to run on all replicas.
> - Slow recovery, because if a replica misses more than N updates during its down
> time, the replica has to download the entire index from its leader.
> So we create another replication mode for SolrCloud called state
> transfer, which acts like master/slave replication. Basically:
> - The leader distributes the update to the other replicas, but only the leader applies
> the update to IW; the other replicas just store the update in UpdateLog (act like
> replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight for indexing, because only the leader runs commits and applies
> updates.
> - Very fast recovery; replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1&liveReplicas=1
> {code}
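
To make the review question about {{waitForInSyncWithLeader}} concrete, here is a minimal sketch of a polling loop that re-resolves the leader on every attempt and has no sleep before the first check. It is not taken from the patch and deliberately avoids Solr classes: {{InSyncWaiter}}, {{LeaderVersionSource}}, {{currentLeaderVersion()}} and the {{maxAttempts}}/{{pauseMillis}} parameters are all hypothetical names chosen for illustration.

```java
import java.util.function.LongSupplier;

/** Illustrative only: none of these names come from the SOLR-9835 patch. */
final class InSyncWaiter {

    /** Hypothetical hook: resolves whoever is leader right now and returns its index version. */
    interface LeaderVersionSource {
        long currentLeaderVersion() throws Exception;
    }

    private InSyncWaiter() {}

    /**
     * Polls until the local version equals the current leader's version.
     * Returns true once they match, false if attempts run out or the thread is interrupted.
     */
    public static boolean waitForInSync(LeaderVersionSource leader,
                                        LongSupplier localVersion,
                                        int maxAttempts,
                                        long pauseMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                // Ask for the leader on every pass, so a leader change is picked up.
                if (leader.currentLeaderVersion() == localVersion.getAsLong()) {
                    return true;
                }
            } catch (Exception e) {
                // Assumption: an unreachable or changing leader just means "not in sync yet".
            }
            if (attempt < maxAttempts - 1) {
                try {
                    // Pause only between attempts; the first check happens immediately.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    return false;
                }
            }
        }
        return false;
    }
}
```

In a test, the {{leader}} argument could wrap a cluster-state lookup of the current leader followed by the replication details request, and {{localVersion}} could read the latest local commit; both are left abstract here because the exact calls depend on the patch.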