Patson Luk created SOLR-16871:
---------------------------------

             Summary: Race condition for coordinator node init
                 Key: SOLR-16871
                 URL: https://issues.apache.org/jira/browse/SOLR-16871
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
            Reporter: Patson Luk


From a unit test case [that issues concurrent select queries to coordinator
nodes|https://github.com/cowpaths/fullstory-solr/blob/e4226eb8fa2afb01d7615f7faea01f71b144cd58/solr/core/src/test/org/apache/solr/search/TestCoordinatorRole.java#L486],
it was found that there are 3 possible race conditions:

1. If multiple concurrent requests find that the synthetic collection has not
yet been created, they might all attempt to create it. This could trigger a
SolrException with `collection already exists`.

2. Similarly, if multiple concurrent requests find that there is no replica of
the synthetic collection for the current node (the multiple-coordinator-node
scenario), then CoordinatorHttpSolrCall#addReplica could be invoked multiple
times. This should not trigger any exception, but would create multiple
replicas for the same node in the synthetic collection.

3. The existing logic
[here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L102]
assumes that if
syntheticColl.getReplicas(solrCall.cores.getZkController().getNodeName())
returns a non-empty result, then the subsequent call
[here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L112]
should return a core. Unfortunately, the first call can return a non-empty
list containing a replica that is still DOWN if another request is in the
process of creating that replica. In this case,
solrCall.getCoreByCollection(syntheticCollectionName, isPreferLeader) would
call super.getCoreByCollection
[here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L69],
which would return null (since the super implementation only returns active
replicas). CoordinatorHttpSolrCall#getCoreByCollection would then end up
calling CoordinatorHttpSolrCall#getCore again, introducing an infinite loop
(unbounded recursion) that causes a stack overflow.
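
For issue 1, a minimal sketch of one way to stop concurrent requests on the
same node from all creating the synthetic collection: a per-collection lock
plus a re-check of existence under the lock. FakeCluster,
SyntheticCollectionInitializer and ensureCreated are hypothetical stand-ins,
not Solr APIs; the cluster is simulated in memory and IllegalStateException
stands in for the `collection already exists` SolrException. Because another
coordinator node can still win the race, the sketch also treats "already
exists" as success.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory cluster: createCollection fails if the name exists,
// mimicking the "collection already exists" error from issue 1.
class FakeCluster {
  private final Set<String> collections = ConcurrentHashMap.newKeySet();
  final AtomicInteger createCalls = new AtomicInteger();

  void createCollection(String name) {
    createCalls.incrementAndGet();
    if (!collections.add(name)) {
      throw new IllegalStateException("collection already exists: " + name);
    }
  }

  boolean exists(String name) {
    return collections.contains(name);
  }
}

// Hypothetical helper: serializes creation per collection name within this JVM.
class SyntheticCollectionInitializer {
  private final FakeCluster cluster;
  private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

  SyntheticCollectionInitializer(FakeCluster cluster) {
    this.cluster = cluster;
  }

  void ensureCreated(String name) {
    if (cluster.exists(name)) {
      return; // fast path: no locking once the collection exists
    }
    Object lock = locks.computeIfAbsent(name, k -> new Object());
    synchronized (lock) {
      // Re-check under the lock: another thread may have created it meanwhile.
      if (cluster.exists(name)) {
        return;
      }
      try {
        cluster.createCollection(name);
      } catch (IllegalStateException e) {
        // Another node may have created it first; only rethrow real failures.
        if (!cluster.exists(name)) {
          throw e;
        }
      }
    }
  }
}
```

Under these assumptions, many concurrent ensureCreated calls for the same name
result in a single createCollection call. The lock map only coordinates
threads inside one JVM, which is why the "already exists" fallback is still
needed across coordinator nodes.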

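For issue 2, a sketch that deduplicates concurrent "add a replica for this
node" requests by sharing one in-flight CompletableFuture per
(collection, node) key via ConcurrentHashMap#putIfAbsent, so only the first
request actually adds the replica and the others wait for its result.
ReplicaAdder and ensureReplica are hypothetical names, and the private
addReplica method is an in-memory stand-in for
CoordinatorHttpSolrCall#addReplica, not its real behaviour.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: at most one addReplica per (collection, node) at a time.
class ReplicaAdder {
  private final ConcurrentMap<String, CompletableFuture<String>> inFlight =
      new ConcurrentHashMap<>();
  final AtomicInteger addReplicaCalls = new AtomicInteger();

  // Stand-in for CoordinatorHttpSolrCall#addReplica; returns a fake core name.
  private String addReplica(String collection, String nodeName) {
    addReplicaCalls.incrementAndGet();
    return collection + "_replica_on_" + nodeName;
  }

  String ensureReplica(String collection, String nodeName) {
    String key = collection + "/" + nodeName;
    CompletableFuture<String> fresh = new CompletableFuture<>();
    CompletableFuture<String> existing = inFlight.putIfAbsent(key, fresh);
    if (existing != null) {
      // Another request already added (or is adding) this replica: wait for it.
      return existing.join();
    }
    try {
      fresh.complete(addReplica(collection, nodeName));
    } catch (RuntimeException e) {
      inFlight.remove(key, fresh); // let a later request retry
      fresh.completeExceptionally(e); // wake up waiters with the failure
    }
    return fresh.join();
  }
}
```

Completed entries are never evicted in this sketch, so a real fix would also
have to handle the replica being removed later. It only deduplicates within
one node, which appears to fit issue 2, where each coordinator adds a replica
for itself.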

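For issue 3, a sketch of replacing the recursive getCore call with a bounded
wait: poll the local replica's state a fixed number of times and give up
(return null) instead of calling back into getCore. LocalReplicaState,
CoreResolver, resolveCore and the Supplier-based state lookup are assumptions
for illustration only; in Solr the state would come from the cluster state,
and the attempt count and sleep are arbitrary.

```java
import java.util.function.Supplier;

// Hypothetical replica states, loosely mirroring DOWN vs ACTIVE from issue 3.
enum LocalReplicaState { DOWN, RECOVERING, ACTIVE }

// Hypothetical resolver: waits a bounded time for the local replica to become
// ACTIVE instead of recursing while it is still DOWN.
final class CoreResolver {
  private CoreResolver() {}

  static String resolveCore(Supplier<LocalReplicaState> localReplicaState,
                            String coreName, int maxAttempts, long sleepMillis) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (localReplicaState.get() == LocalReplicaState.ACTIVE) {
        return coreName;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return null;
      }
    }
    // Still not active: let the caller fail the request rather than recurse.
    return null;
  }
}
```

The key design point is that the retry is an explicit loop with a limit, so a
replica that stays DOWN yields a failed request instead of a stack overflow.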

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
