patsonluk opened a new pull request, #1800:
URL: https://github.com/apache/solr/pull/1800

   https://issues.apache.org/jira/browse/SOLR-16871
   
   
   # Description
   
   While the previous fix #1762 does seem to avoid the race condition, 
unfortunately it still occurred on some test runs, for example 
https://github.com/cowpaths/fullstory-solr/actions/runs/5616774664/job/15219717699
   
   The issue is that even though we synchronize the `addReplica` block, we can 
still run into a race condition if:
   1. Request 1 in Thread 1 hits node 1 and attempts to create the synthetic 
collection.
   2. Request 2 in Thread 2 also hits node 1. Due to the race condition, it might 
attempt to create the synthetic collection too, but that is fine, as we ignore 
the creation exception if the collection already exists.
   3. Request 2 in Thread 2 can proceed and acquire the synchronized block. It 
scans for replicas and might not see any, since Thread 1 might have only created 
the collection's state.json and not yet pushed the replica/shard update. In this 
case Request 2 proceeds to create a replica on node 1.
   4. Request 1 might then continue with the synthetic collection creation and 
create a replica as well, so now we have 2 replicas.
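
   The steps above can be replayed with a minimal Java sketch. Everything in it 
(`SyntheticCollectionRace`, `ClusterState`, `handleRequest`, 
`createSyntheticCollection`) is a hypothetical, heavily simplified model, not 
Solr code. It assumes collection creation consists of two separate updates 
(state.json first, the replica/shard update later) and uses two 
`CountDownLatch`es to force the exact order of steps 1 to 4, so the duplicate 
replica appears on every run instead of only occasionally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class SyntheticCollectionRace {

  // Hypothetical stand-in for the synthetic collection's cluster state.
  static final class ClusterState {
    private boolean collectionExists;
    private final List<String> replicas = new ArrayList<>();

    // First half of creation: state.json exists but lists no replicas yet.
    synchronized void writeStateJson() {
      if (collectionExists) {
        throw new IllegalStateException("collection already exists");
      }
      collectionExists = true;
    }

    synchronized void addReplica(String node) {
      replicas.add(node);
    }

    synchronized int replicaCount() {
      return replicas.size();
    }
  }

  private final ClusterState state = new ClusterState();
  private final Object addReplicaLock = new Object();

  // Creation writes state.json, then later pushes the replica/shard update.
  // 'betweenSteps' lets the replay pause a thread inside that window.
  void createSyntheticCollection(String node, Runnable betweenSteps) {
    state.writeStateJson();
    betweenSteps.run();
    state.addReplica(node);
  }

  // The pattern described above: only the replica check/add is synchronized.
  void handleRequest(String node, Runnable betweenSteps) {
    try {
      createSyntheticCollection(node, betweenSteps);
    } catch (IllegalStateException alreadyExists) {
      // Ignored, like the "collection already exists" error in step 2.
    }
    synchronized (addReplicaLock) {
      // May observe state.json before Thread 1 has added its replica.
      if (state.replicaCount() == 0) {
        state.addReplica(node);
      }
    }
  }

  // Forces the interleaving from steps 1-4; returns the final replica count.
  public static int replayInterleaving() {
    SyntheticCollectionRace race = new SyntheticCollectionRace();
    CountDownLatch stateJsonWritten = new CountDownLatch(1);
    CountDownLatch request2Done = new CountDownLatch(1);

    Thread thread1 = new Thread(() -> race.handleRequest("node1", () -> {
      stateJsonWritten.countDown(); // step 1: state.json written, no replica yet
      awaitQuietly(request2Done);   // stall before pushing the replica update
    }));
    Thread thread2 = new Thread(() -> {
      awaitQuietly(stateJsonWritten);
      race.handleRequest("node1", () -> { }); // steps 2 and 3
      request2Done.countDown();               // lets step 4 proceed
    });
    thread1.start();
    thread2.start();
    joinQuietly(thread1);
    joinQuietly(thread2);
    return race.state.replicaCount();
  }

  static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  static void joinQuietly(Thread thread) {
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("replicas = " + replayInterleaving()); // prints "replicas = 2"
  }
}
```

   In this model the `addReplicaLock` block is correct on its own; the duplicate 
comes from the check in step 3 reading a state that Thread 1 has only half 
written, which is exactly the window the synchronized block does not cover.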
   
   # Solution
   
   Synchronize on a larger block. This is likely fine, since it is still very 
rare for the synthetic collection to not exist.
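
   The idea of a larger block can be sketched in the same simplified Java model. 
The names (`SyntheticCollectionFix`, `ClusterState`, `handleRequest`, 
`runConcurrently`) are hypothetical and this is not the actual patch; it 
assumes the existence check, both halves of collection creation, and the replica 
check all move under one lock, so no request can observe a half-created 
collection. For simplicity it takes the lock on every call and does not model 
how often the real code reaches this block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SyntheticCollectionFix {

  // Hypothetical stand-in for the synthetic collection's cluster state.
  static final class ClusterState {
    private boolean collectionExists;
    private final List<String> replicas = new ArrayList<>();

    synchronized boolean collectionExists() { return collectionExists; }
    synchronized void writeStateJson() { collectionExists = true; }
    synchronized void addReplica(String node) { replicas.add(node); }
    synchronized int replicaCount() { return replicas.size(); }
  }

  private final ClusterState state = new ClusterState();
  private final Object lock = new Object();

  // The larger synchronized block: creation and the replica check share one lock.
  void handleRequest(String node) {
    synchronized (lock) {
      if (!state.collectionExists()) {
        state.writeStateJson();  // first half of creation: state.json
        state.addReplica(node);  // second half: replica/shard update
      }
      // Can no longer see state.json without the creator's replica.
      if (state.replicaCount() == 0) {
        state.addReplica(node);
      }
    }
  }

  // Releases 'requests' concurrent requests against node1 at once; returns replica count.
  public static int runConcurrently(int requests) {
    SyntheticCollectionFix fix = new SyntheticCollectionFix();
    CountDownLatch start = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(requests);
    for (int i = 0; i < requests; i++) {
      pool.execute(() -> {
        try {
          start.await();
          fix.handleRequest("node1");
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return fix.state.replicaCount();
  }

  public static void main(String[] args) {
    System.out.println("replicas = " + runConcurrently(8)); // prints "replicas = 1"
  }
}
```

   The trade-off is coarser locking: concurrent requests now serialize on 
collection creation as well, which the reasoning above accepts because a 
missing synthetic collection should be rare.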
   
   # Tests
   
   Unfortunately, I cannot reliably reproduce the issue locally, even without 
this fix. I ran `./gradlew :solr:core:beast -Ptests.dups=50 --tests 
"org.apache.solr.search.TestCoordinatorRole.testConcurrentAccess" 
-Ptests.jvms=1 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC 
-XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" 
-Ptests.seed=D78D41A5AAFAB451 -Ptests.file.encoding=US-ASCII`; it completed 
around 3 runs successfully before I interrupted it, as it was taking too long.
   
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


