justinrsweeney opened a new pull request, #1103:
URL: https://github.com/apache/solr/pull/1103

   https://issues.apache.org/jira/browse/SOLR-16487
   
   # Description
   
   This PR addresses the following inefficiencies in how non-leader 
replicas are handled:
   
   1. The 
[RecoveryStrategy.replicate()](https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java#L219)
 method calls commit on the leader. This happens whenever a 
replica is reloaded. For PULL replicas in particular this isn't necessary, since 
we can just pull down whatever the latest data is and rely on other mechanisms 
to commit on the leader regularly. (As an aside, it seems like forcing a 
commit on the leader might never be necessary, but for this PR I've limited the 
change to PULL replicas.)
   2. When the leader has no data yet (index version is 0), a 
non-leader replica will repeatedly delete and recreate its core due to this 
case in IndexFetcher: 
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L549.
 This can cause unnecessary CPU usage until the leader has data indexed to it.
   3. The polling for replication is fairly simple, but can lead to polling too 
often. For example, suppose you had the following config for commits:
   ```
   <autoCommit>
       <maxTime>15000</maxTime>
       <openSearcher>false</openSearcher>
   </autoCommit>
   
   <autoSoftCommit>
       <maxTime>60000</maxTime>
   </autoSoftCommit>
   ```
   The current logic would set up polling at half of the autoCommit time, so it 
would poll every 7.5 seconds. However, since the hard commit doesn't open a new 
searcher, changes will only be reflected every 60 seconds on the leader. We can 
make this logic a bit smarter, knowing that the replication handler won't 
reflect changes until a new searcher is opened.
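
   As a rough illustration of the arithmetic above, here is a minimal sketch of 
how the interval could be chosen. It is not the code in this PR: the class and 
method names, the `-1` "not configured" sentinel, the `fallbackMs` parameter, 
and the choice to halve the soft commit time as well are all assumptions made 
for the example.

   ```java
   // Hypothetical sketch; none of these names come from Solr itself.
   final class PollIntervalSketch {
       /** Assumed sentinel meaning "this commit type is not configured". */
       static final long DISABLED = -1;

       /** Behaviour described above: poll at half of the autoCommit maxTime. */
       public static long currentIntervalMs(long autoCommitMs) {
           return autoCommitMs / 2;
       }

       /**
        * Smarter variant: base the interval on the commit type that actually
        * opens a searcher. Halving the soft commit time is an assumption.
        */
       public static long proposedIntervalMs(long autoCommitMs, boolean openSearcher,
                                             long autoSoftCommitMs, long fallbackMs) {
           if (autoCommitMs != DISABLED && openSearcher) {
               return autoCommitMs / 2;      // hard commits make changes visible
           }
           if (autoSoftCommitMs != DISABLED) {
               return autoSoftCommitMs / 2;  // only soft commits open a searcher
           }
           return fallbackMs;                // nothing configured: caller's default
       }

       public static void main(String[] args) {
           // Values from the example config: autoCommit 15000 ms with
           // openSearcher=false, autoSoftCommit 60000 ms.
           System.out.println(currentIntervalMs(15000));                       // 7500
           System.out.println(proposedIntervalMs(15000, false, 60000, 3000)); // 30000
       }
   }
   ```

   With the example config, the sketch moves the poll interval from 7.5 seconds 
to 30 seconds, which still polls at least once per soft commit window.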
   
   # Solution
   
   This PR includes a number of small changes to fix the issues above:
   1. Adding a check for whether the current replica is a non-leader replica 
and, if so, skipping the commit call to the leader.
   2. Changing the handling of a leader version of 0 to check whether the 
current version is also 0, and doing nothing in that case. Previously this 
checked the generation, which starts at 1.
   3. Changing the code that sets the polling interval for replication to only 
use the autoCommit time if openSearcher is true; otherwise it uses the soft 
commit time.
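
   The first two changes amount to two small guards. The sketch below shows the 
idea only; it is not the patch itself. The class, enum, and method names are 
hypothetical, and it assumes the commit is skipped for PULL replicas 
specifically, as the description says.

   ```java
   // Hypothetical sketch of the two guards; not taken from the actual patch.
   final class ReplicaGuardsSketch {
       /** Solr's replica types, redeclared here to keep the sketch self-contained. */
       enum ReplicaType { NRT, TLOG, PULL }

       /** Illustrative outcomes of comparing leader and local index versions. */
       enum Action { DO_NOTHING, RESET_LOCAL_CORE, COMPARE_AND_FETCH }

       /** Guard 1: a PULL replica does not need to force a commit on the leader. */
       public static boolean shouldCommitOnLeader(ReplicaType type) {
           return type != ReplicaType.PULL;
       }

       /**
        * Guard 2: when the leader is empty (version 0), act only if the local
        * index actually holds data. Comparing versions rather than generations
        * matters because, per the description, the generation starts at 1, so
        * a generation-based emptiness test never matches on an empty replica.
        */
       public static Action onLeaderVersion(long leaderVersion, long localVersion) {
           if (leaderVersion == 0) {
               return localVersion == 0 ? Action.DO_NOTHING : Action.RESET_LOCAL_CORE;
           }
           return Action.COMPARE_AND_FETCH;
       }

       public static void main(String[] args) {
           System.out.println(shouldCommitOnLeader(ReplicaType.PULL)); // false
           System.out.println(onLeaderVersion(0, 0));                  // DO_NOTHING
       }
   }
   ```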
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

