patsonluk opened a new pull request, #1800: URL: https://github.com/apache/solr/pull/1800
https://issues.apache.org/jira/browse/SOLR-16871

# Description

While the previous fix, #1762, seemed to avoid the race condition, it unfortunately still occurred on some test runs, for example https://github.com/cowpaths/fullstory-solr/actions/runs/5616774664/job/15219717699

The issue is that even though we synchronize the `addReplica` block, we can still run into a race condition if:

1. Request 1 in Thread 1 hits node 1 and attempts to create the synthetic collection.
2. Request 2 in Thread 2 also hits node 1 and, because of the race, might also attempt to create the synthetic collection. That part is fine, since we ignore the creation exception when it is an "already exists collection" error.
3. Request 2 in Thread 2 can then proceed and claim the synchronized block. It scans for replicas and might not see any, because Thread 1 might have created only the collection's state.json and not yet pushed the replica/shard update. Request 2 then proceeds to create a replica on node 1.
4. Request 1 continues with the synthetic collection creation and creates a replica as well, so we now have 2 replicas.

# Solution

Synchronize on a larger block. This will likely be fine, as a missing synthetic collection is still a very rare case.

# Tests

Unfortunately, I cannot reliably reproduce the issue locally, even without this fix. I ran `./gradlew :solr:core:beast -Ptests.dups=50 --tests "org.apache.solr.search.TestCoordinatorRole.testConcurrentAccess" -Ptests.jvms=1 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=D78D41A5AAFAB451 -Ptests.file.encoding=US-ASCII` and it completed around 3 runs successfully before I interrupted it, as it was taking too long.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
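The race in steps 1 to 4 and the wider lock proposed under Solution can be illustrated with a minimal Java sketch. It assumes a heavily simplified model: `SyntheticReplicaSketch`, `writeCollectionState`, `pushReplica`, `ensureReplica` and `addReplicaLock` are hypothetical names, not Solr APIs, and an in-memory map stands in for the cluster state (state.json). Collection creation is modeled as two phases: writing the state with no replicas, then pushing the replica. `replayOldInterleaving` replays the bad interleaving deterministically on one thread with the narrow lock, while `ensureReplica` places the existence check, both creation phases and the replica scan inside one synchronized block.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the synthetic-collection setup on one node; not Solr code.
public class SyntheticReplicaSketch {
  // Stand-in for cluster state: collection name -> replica names. Guarded by `this`.
  private final Map<String, List<String>> state = new HashMap<>();
  // Stand-in for the lock that guards the `addReplica` block.
  private final Object addReplicaLock = new Object();

  // Creation phase 1: write the collection's state.json with no replicas yet.
  // Returns false on "collection already exists", which callers ignore.
  synchronized boolean writeCollectionState(String collection) {
    return state.putIfAbsent(collection, new ArrayList<>()) == null;
  }

  // Creation phase 2 (or an explicit addReplica): record a replica on `node`.
  synchronized void pushReplica(String collection, String node) {
    List<String> replicas = state.get(collection);
    replicas.add(node + "_replica" + replicas.size());
  }

  synchronized int replicaCount(String collection) {
    List<String> replicas = state.get(collection);
    return replicas == null ? 0 : replicas.size();
  }

  // Replays steps 1-4 of the bad interleaving on a single thread, with the narrow lock.
  public static int replayOldInterleaving() {
    SyntheticReplicaSketch s = new SyntheticReplicaSketch();
    String c = "synthetic";
    s.writeCollectionState(c);          // 1. Request 1 writes state.json only
    s.writeCollectionState(c);          // 2. Request 2 gets "already exists", ignored
    synchronized (s.addReplicaLock) {   // 3. Request 2 claims the narrow block,
      if (s.replicaCount(c) == 0) {     //    sees no replica yet,
        s.pushReplica(c, "node1");      //    and adds one
      }
    }
    s.pushReplica(c, "node1");          // 4. Request 1 finishes creation: a second replica
    return s.replicaCount(c);
  }

  // Wider lock: existence check, creation and replica placement form one critical section.
  public void ensureReplica(String collection, String node) {
    synchronized (addReplicaLock) {
      if (writeCollectionState(collection)) {
        pushReplica(collection, node);  // creator completes both phases while holding the lock
      } else if (replicaCount(collection) == 0) {
        pushReplica(collection, node);  // collection existed but had no replica yet
      }
    }
  }

  // Runs `threads` concurrent requests against the wider-lock version; returns the replica count.
  public static int runFixedConcurrently(int threads) {
    SyntheticReplicaSketch s = new SyntheticReplicaSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.submit(() -> s.ensureReplica("synthetic", "node1"));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("requests did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return s.replicaCount("synthetic");
  }

  public static void main(String[] args) {
    System.out.println("narrow lock, replayed interleaving: " + replayOldInterleaving()); // 2
    System.out.println("wide lock, 8 concurrent requests: " + runFixedConcurrently(8));   // 1
  }
}
```

In this model, the wider block serializes every first-time creation on the node, and no request can observe the half-created state while scanning for replicas. That cost matches the reasoning under Solution: the missing-collection path is rare, so the extra contention should stay small.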