[ 
https://issues.apache.org/jira/browse/SOLR-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746394#comment-17746394
 ] 

ASF subversion and git services commented on SOLR-16871:
--------------------------------------------------------

Commit 848f1b04165bdb7e84c811b571a65719a0e3bb2c in solr's branch 
refs/heads/branch_9x from patsonluk
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=848f1b04165 ]

SOLR-16871: Synchronize on a larger block to avoid race condition in 
CoordinatorHttpSolrCall init (#1800)

* Synchronize to avoid race condition in CoordinatorHttpSolrCall

* ./gradlew tidy
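
The idea named in the commit message can be illustrated with a minimal sketch. This is not the actual Solr change: the class, the `initLock` field, and `ensureCollection` are hypothetical names, and an in-memory set stands in for cluster state. It only shows the existence check and the creation sharing one synchronized block, so that concurrent requests cannot all see "missing" and all attempt to create.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Solr.
class SyntheticCollectionInitSketch {
  private final Set<String> collections = new HashSet<>(); // stand-in for cluster state
  private final AtomicInteger createCalls = new AtomicInteger();
  private final Object initLock = new Object();

  // The check and the create are both inside the lock, so only the
  // first caller creates; later callers see the collection and skip.
  void ensureCollection(String name) {
    synchronized (initLock) {
      if (!collections.contains(name)) {
        createCalls.incrementAndGet();
        collections.add(name);
      }
    }
  }

  int createCalls() {
    return createCalls.get();
  }

  // Releases `threads` workers at once and reports how many creations happened.
  static int runConcurrently(int threads) {
    SyntheticCollectionInitSketch sketch = new SyntheticCollectionInitSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    try {
      for (int i = 0; i < threads; i++) {
        pool.execute(() -> {
          try {
            start.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
          sketch.ensureCollection("synthetic");
        });
      }
      start.countDown();
      pool.shutdown();
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return sketch.createCalls();
  }

  public static void main(String[] args) {
    System.out.println("create calls: " + runConcurrently(8));
  }
}
```

If the lock covered only the creation and not the check, several threads could pass the check before any of them created the collection, which is the check-then-act pattern the issue describes.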

> Race condition for coordinator node init
> ----------------------------------------
>
>                 Key: SOLR-16871
>                 URL: https://issues.apache.org/jira/browse/SOLR-16871
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Patson Luk
>            Priority: Major
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> From a unit test case [that issues concurrent select queries to coordinator 
> nodes|https://github.com/cowpaths/fullstory-solr/blob/e4226eb8fa2afb01d7615f7faea01f71b144cd58/solr/core/src/test/org/apache/solr/search/TestCoordinatorRole.java#L486],
>  we found three possible race conditions:
> 1. If multiple concurrent requests find that the synthetic collection has not 
> yet been created, they might all attempt to create it. This could trigger a 
> SolrException with `collection already exists`.
> 2. Similarly, if multiple concurrent requests find that there’s no replica of 
> the synthetic collection for the current node (multiple coordinator node 
> scenario), then CoordinatorHttpSolrCall#addReplica could be invoked multiple 
> times. This should not trigger any exception, but it would create multiple 
> replicas for the same node in the synthetic collection.
> 3. The existing logic 
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L102]
>  assumes that if 
> syntheticColl.getReplicas(solrCall.cores.getZkController().getNodeName()) 
> returns a non-empty result, then the subsequent call 
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L112]
>  will return a core. Unfortunately, the first call can return a non-empty 
> list containing a DOWN replica if another request is in the process of 
> creating that replica. In this case, 
> solrCall.getCoreByCollection(syntheticCollectionName, isPreferLeader) calls 
> super.getCoreByCollection 
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L69],
>  which returns null (since the super implementation only returns active 
> replicas). So CoordinatorHttpSolrCall#getCoreByCollection ends up calling 
> CoordinatorHttpSolrCall#getCore, creating infinite recursion that causes a 
> stack overflow.
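
The third item turns on the gap between "a replica exists" and "a replica is active". The sketch below is an assumption-laden illustration, not the code from the commit (which, per the commit message, widens the synchronized block instead). `ActiveReplicaLookupSketch`, `coreIfActive`, and `getCoreBounded` are hypothetical names. It models a lookup that yields a core only for an ACTIVE replica and shows a bounded retry loop as one alternative to retrying through unbounded recursion.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names come from Solr.
class ActiveReplicaLookupSketch {
  enum State { DOWN, ACTIVE }

  // Models the described behaviour: a replica that exists but is DOWN
  // yields null, exactly as if no core were available.
  static String coreIfActive(State state) {
    return state == State.ACTIVE ? "core" : null;
  }

  // Retries a fixed number of times instead of recursing, so a replica
  // that stays DOWN produces an empty result, not a stack overflow.
  static Optional<String> getCoreBounded(Supplier<State> replicaState, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String core = coreIfActive(replicaState.get());
      if (core != null) {
        return Optional.of(core);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    int[] polls = {0};
    // Replica reports DOWN for the first two polls, then ACTIVE.
    Optional<String> core = getCoreBounded(
        () -> polls[0]++ < 2 ? State.DOWN : State.ACTIVE, 5);
    System.out.println(core.isPresent());
  }
}
```

A real implementation would also wait between attempts or watch for a state change; the loop here only makes the termination condition explicit.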


