[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ramkumar Aiyengar updated SOLR-7932: ------------------------------------ Attachment: SOLR-7932.patch Okay, attached here is a smaller patch which was my biggest concern here with the timestamps to begin with. These two places use timestamps *instead* of generations, and can lead to data loss. The other place where both are used, worst case you will end up doing a full-copy, not a big deal either way (and I understand the case with master-slave in such a situation). *First change:* {code} if (latestVersion == 0L) { if (forceReplication && commit.getGeneration() != 0) { // since we won't get the files for an empty index, // we just clear ours and commit RefCounted<IndexWriter> iw = solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore); try { iw.get().deleteAll(); } finally { iw.decref(); } SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new ModifiableSolrParams()); solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false)); } //there is nothing to be replicated successfulInstall = true; return true; } {code} Proposing replacing {{latestVersion}} with {{latestGeneration}}. Is there a case where one would be 0 and the other not? *Second change:* {code} if (!forceReplication && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) == latestVersion) { //master and slave are already in sync just return LOG.info("Slave in sync with master."); successfulInstall = true; return true; } {code} I am proposing: {code} - if (!forceReplication && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) == latestVersion) { + if (!forceReplication && commit.getGeneration() == latestGeneration) { {code} - If generations are not the same, but timestamps are, no harm done really, yes, you will go through replication unnecessarily, but that's it. - Will master-slave result in the same situation? If you blow away the index, I would expect the generations to not match (would be much lesser). Alternatively, we could OR the two checks here as well (i.e. check both versions and generations here), to be absolutely sure we don't skip replication if we shouldn't. [~ysee...@gmail.com], [~markrmil...@gmail.com], could you clarify how this change would affect this adversely in such a case? > Solr replication relies on timestamps to sync across machines > ------------------------------------------------------------- > > Key: SOLR-7932 > URL: https://issues.apache.org/jira/browse/SOLR-7932 > Project: Solr > Issue Type: Bug > Components: replication (java) > Reporter: Ramkumar Aiyengar > Attachments: SOLR-7932.patch, SOLR-7932.patch > > > Spinning off SOLR-7859, noticed there that wall time recorded as commit data > on a commit to check if replication needs to be done. In IndexFetcher, there > is this code: > {code} > if (!forceReplication && > IndexDeletionPolicyWrapper.getCommitTimestamp(commit) == latestVersion) { > //master and slave are already in sync just return > LOG.info("Slave in sync with master."); > successfulInstall = true; > return true; > } > {code} > It appears as if we are checking wall times across machines to check if we > are in sync, this could go wrong. > Once a decision is made to replicate, we do seem to use generations instead, > except for this place below checks both generations and timestamps to see if > a full copy is needed.. > {code} > // if the generation of master is older than that of the slave , it > means they are not compatible to be copied > // then a new index directory to be created and all the files need to > be copied > boolean isFullCopyNeeded = IndexDeletionPolicyWrapper > .getCommitTimestamp(commit) >= latestVersion > || commit.getGeneration() >= latestGeneration || forceReplication; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org