Patson Luk created SOLR-16871: --------------------------------- Summary: Race condition for coordinator node init Key: SOLR-16871 URL: https://issues.apache.org/jira/browse/SOLR-16871 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Reporter: Patson Luk
>From a unit test case [that issue concurrent select queries to coordinator >nodes|https://github.com/cowpaths/fullstory-solr/blob/e4226eb8fa2afb01d7615f7faea01f71b144cd58/solr/core/src/test/org/apache/solr/search/TestCoordinatorRole.java#L486], > it’s found that there could be 3 race condition issues: 1. If multiple concurrent requests find the synthetic collection is not yet created, they might all attempt to create the synthetic collection. This could trigger SolrException on `collection already exists` 2. Similarly, if multiple concurrent requests find there’s no replica of the synthetic collection for current node (multiple coordinator node scenario), then CoordinatorHttpSolrCall#addReplica could be invoked multiple times. This should not trigger any exception, but would create multiple replicas for the same node in the synthetic collection 3. The existing logic [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L102] assumes if syntheticColl.getReplicas(solrCall.cores.getZkController().getNodeName()) returns non empty result, then the following call in [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L112] should return a core. Unfortunately, the first call can return a non empty list but with a DOWN replica if another request is in the progress of creating such replica. In this case, the solrCall.getCoreByCollection(syntheticCollectionName, isPreferLeader) would call super.getCoreByCollection at [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L69] which would return a null (since super impl only returns active replica). So CoordinatorHttpSolrCall#getCoreByCollection would end up calling CoordinatorHttpSolrCall#getCore , introducing an infinite loop and cause stack overflow -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org