I'm looking at the peer sync code and I don't quite understand the condition where we report "Our versions are too old." (around line 498 of PeerSync.java in 6.x).
Note that the version in the field is 5.3.1, but the code looks the same in 6.x.

I get that we're testing the overlap between the versions we have and the versions our peer has. But how was the 20% overlap number arrived at? What is it intended to guarantee? And in a case where the requested number of updates is greater than the size of the returned list, is it valid to return true if there is _any_ overlap?

Why do I care? I'm seeing a case in the field where indexing a very large document exceeds the timeout, even though the document indexes successfully on the follower; it just takes a while. The Solr node is up and accepting more updates, etc. No updates have actually been missed AFAICT. So the leader tells the follower to sync because of the timeout. The follower fails the test above and then goes into full sync unnecessarily. Since this is a very large index, this takes a very long time and strains the system, and the problem can cascade.

I'm wondering if this test can be relaxed so that, when the versions list returned from the peer is smaller than requested, it does not fail if there is any overlap. This feels like an incomplete fix, though, because I'm taking it on faith that if the size of the returned list == numRecordsToKeep, this test wouldn't be as likely to be tripped. But there's no guarantee of that, so a special test for this case would just kick the can down the road, I think.

Can we do a different test perhaps (I'm really reaching here into unfamiliar code, so this may be all wet)? Let's say the leader gets a timeout. Rather than do a full peer sync, would it be possible to have the leader ask the follower "Hey, I sent you these versions and you timed out, do you really have them or not?" And if the follower was still processing them, could we skip the peer sync entirely? Assuming we could guarantee that the doc was in the replica's tlog when answering, would that guarantee data integrity?

I can raise a JIRA if any of this makes sense.
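To make the "relaxed test" idea concrete, here is a rough sketch. It is not the PeerSync.java code: the class and method names are hypothetical, the percentile helper is a simplification, and it assumes both version lists are sorted newest-first by absolute value and that the strict test compares our 20% mark against the peer's 80% mark.

```java
import java.util.List;

// Hypothetical sketch only; none of these names come from PeerSync.java.
public class OverlapSketch {

    // Absolute version found at the given fraction of a newest-first list.
    // Assumes a non-empty list and 0 <= frac < 1.
    static long percentile(List<Long> sortedNewestFirst, float frac) {
        int idx = (int) (sortedNewestFirst.size() * frac);
        return Math.abs(sortedNewestFirst.get(idx));
    }

    // Strict test (assumed shape): our newer versions must reach at least
    // as far as the older end of the peer's window.
    static boolean strictOverlapOk(List<Long> ours, List<Long> theirs) {
        if (ours.isEmpty() || theirs.isEmpty()) {
            return false;
        }
        long ourHigh = percentile(ours, 0.2f);
        long theirLow = percentile(theirs, 0.8f);
        return ourHigh >= theirLow;
    }

    // Relaxed test: if the peer returned fewer versions than requested,
    // accept any overlap at all; otherwise fall back to the strict test.
    static boolean relaxedOverlapOk(List<Long> ours, List<Long> theirs, int requested) {
        if (ours.isEmpty() || theirs.isEmpty()) {
            return false;
        }
        if (theirs.size() < requested) {
            long ourNewest = Math.abs(ours.get(0));
            long theirOldest = Math.abs(theirs.get(theirs.size() - 1));
            return ourNewest >= theirOldest;
        }
        return strictOverlapOk(ours, theirs);
    }
}
```

As noted above, this only changes the outcome when the peer's list is shorter than requested, so it does nothing for the case where the list is full.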
Erick