[ 
https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249826#comment-14249826
 ] 

Forest Soup commented on SOLR-6683:
-----------------------------------

I applied the patch for SOLR-6359 to 4.7 and ran some tests. It does not work 
as expected. 
With the config below, recovery still goes into the SnapPuller code even 
though I had only added 800 new docs.
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numRecordsToKeep">10000</int>
      <int name="maxNumLogsToKeep">100</int>
    </updateLog>

After reading the code, it seems that the following lines in 
org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the 
issue:
    if (ourHighThreshold < otherLow) {
      // Small overlap between version windows and ours is older
      // This means that we might miss updates if we attempted to use this method.
      // Since there exists just one replica that is so much newer, we must
      // fail the sync.
      log.info(msg() + " Our versions are too old. ourHighThreshold="+ourHighThreshold + " otherLowThreshold="+otherLow);
      return false;
    }
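
To illustrate how this condition behaves, here is a minimal standalone sketch, 
not the actual PeerSync code. It assumes each threshold is a percentile over a 
node's most recent versions, sorted newest first. The 0.2 and 0.8 fractions, 
the helper names, and the sequential version numbers are assumptions for 
illustration only; real Solr versions are not sequential integers. It compares 
a node at version 1000 against a peer that is 800 updates ahead, for two 
window sizes.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionWindowSketch {

  // Hypothetical helper: entry at the given fraction of a newest-first list.
  static long percentile(List<Long> newestFirst, float frac) {
    int idx = (int) (newestFirst.size() * frac);
    return newestFirst.get(Math.min(idx, newestFirst.size() - 1));
  }

  // Most recent `window` versions out of 1..latest, newest first.
  // Assumption: versions are the sequential integers 1, 2, 3, ...
  static List<Long> recentVersions(long latest, int window) {
    List<Long> versions = new ArrayList<>();
    for (long v = latest; v >= 1 && versions.size() < window; v--) {
      versions.add(v);
    }
    return versions;
  }

  // Mirrors the quoted condition: true means the sync attempt is rejected.
  static boolean tooOld(List<Long> ours, List<Long> theirs) {
    long ourHighThreshold = percentile(ours, 0.2f); // assumed fraction
    long otherLow = percentile(theirs, 0.8f);       // assumed fraction
    return ourHighThreshold < otherLow;
  }

  public static void main(String[] args) {
    long ourLatest = 1000;   // recovering node (assumed)
    long otherLatest = 1800; // good node, 800 updates ahead (assumed)
    for (int window : new int[] {100, 10000}) {
      boolean rejected = tooOld(recentVersions(ourLatest, window),
                                recentVersions(otherLatest, window));
      System.out.println("window=" + window + " rejected=" + rejected);
    }
  }
}
```

Under these assumptions the 100-version window is rejected and the 
10000-version window is not, because the comparison depends on how many 
versions each side actually contributes to its window. The sketch does not 
show why the patched build still falls through to SnapPuller.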

Could you please comment? Thanks!

> Need a configurable parameter to control the doc number between peersync and 
> the snapshot pull recovery
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6683
>                 URL: https://issues.apache.org/jira/browse/SOLR-6683
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 4.7
>         Environment: Redhat Linux 64bit
>            Reporter: Forest Soup
>            Priority: Critical
>              Labels: performance
>
> If there is a gap of more than 100 docs between the recovering node and the 
> good node, Solr will do snap pull recovery instead of peersync.
> Can the 100-doc limit be made configurable? For example, the gap between the 
> good node and the node to recover could be 10000, 1000, or 10 docs.
> With a limit of 100 docs, a regular restart of a Solr node will trigger a 
> full recovery, which has a huge impact on the performance of running systems.
> Thanks!


