[jira] [Created] (CASSANDRA-5456) Large number of bootstrapping nodes cause gossip to stop working

Oleg Kibirev (JIRA) Thu, 11 Apr 2013 12:37:57 -0700

Oleg Kibirev created CASSANDRA-5456:
---------------------------------------


             Summary: Large number of bootstrapping nodes cause gossip to stop working
                 Key: CASSANDRA-5456
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5456
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.1.10
            Reporter: Oleg Kibirev


A long-running section of code in PendingRangeCalculatorService is synchronized
on bootstrapTokens. Gossip needs the same lock, so when a large number of nodes
(hundreds, in our case) are bootstrapping, it blocks waiting for that lock and
stops working. Consequently, the whole cluster becomes non-functional.
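To make the failure mode concrete, here is a minimal, self-contained Java sketch
of the contention. It is not Cassandra code: LockContentionDemo,
observeGossipWhileCalculating and the Integer/String map are hypothetical
stand-ins. One thread holds the monitor of a shared map for the length of a
simulated long calculation. A second thread, standing in for gossip, tries to
update the same map. A CountDownLatch replaces the real computation so the
demonstration is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical reduction of the reported problem; names are illustrative, not Cassandra's.
class LockContentionDemo {
    // Shared map whose monitor both threads synchronize on.
    static final Map<Integer, String> bootstrapTokens = new LinkedHashMap<>();

    // Returns the state the "gossip" thread was observed in while the calculation held the lock.
    static Thread.State observeGossipWhileCalculating() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread calculator = new Thread(() -> {
            synchronized (bootstrapTokens) {
                lockHeld.countDown();
                try {
                    // Stands in for a long pending-range calculation; the monitor stays held.
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        Thread gossip = new Thread(() -> {
            // Needs the same monitor to record a newly bootstrapping node.
            synchronized (bootstrapTokens) {
                bootstrapTokens.put(42, "10.0.0.42");
            }
        });

        calculator.start();
        lockHeld.await();
        gossip.start();

        // Poll (bounded to about 5 s) until the gossip thread is parked on the monitor.
        Thread.State state = gossip.getState();
        for (int i = 0; i < 500 && state != Thread.State.BLOCKED; i++) {
            Thread.sleep(10);
            state = gossip.getState();
        }

        // Let the calculation finish; only then can the gossip update proceed.
        release.countDown();
        calculator.join();
        gossip.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("gossip thread state during calculation: " + observeGossipWhileCalculating());
    }
}
```

The gossip thread sits in Thread.State.BLOCKED until the calculation releases the
monitor. As the report describes, a longer calculation (for example, one over
hundreds of bootstrapping nodes) stretches that stall for every update that
needs the lock.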

I experimented with the following change in PendingRangeCalculatorService.java,
and it resolved the problem in our case. Previously, the synchronized block
enclosed the entire for loop. Now it only covers copying the map.

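synchronized (bootstrapTokens)
{
    // Hold the lock only long enough to take a private copy of the map.
    bootstrapTokens = new LinkedHashMap<Token, InetAddress>(bootstrapTokens);
}

// Iterate over the copy without holding the lock, so gossip is not blocked.
for (Map.Entry<Token, InetAddress> entry : bootstrapTokens.entrySet())
{
    InetAddress endpoint = entry.getValue();

    // Temporarily add the bootstrapping node, record the ranges it would own,
    // then remove it again before considering the next node.
    allLeftMetadata.updateNormalToken(entry.getKey(), endpoint);
    for (Range<Token> range : strategy.getAddressRanges(allLeftMetadata).get(endpoint))
    {
        pendingRanges.put(range, endpoint);
    }
    allLeftMetadata.removeEndpoint(endpoint);
}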
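The change above follows a common snapshot-then-iterate pattern. The
self-contained sketch below is a hypothetical illustration of it, not the
Cassandra implementation. SnapshotIteration, addBootstrapToken and
calculatePending are assumed names, and tokens and endpoints are simplified to
Long and String. The monitor is held only while copying the map, and the
per-entry work runs on the private copy. The sketch copies into a separate
local variable, snapshot, so the shared map reference is never reassigned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the snapshot-then-iterate pattern; names and types are simplified.
class SnapshotIteration {
    private final Map<Long, String> bootstrapTokens = new LinkedHashMap<>();

    // Called by the "gossip" side; needs the same monitor, but only briefly.
    void addBootstrapToken(long token, String endpoint) {
        synchronized (bootstrapTokens) {
            bootstrapTokens.put(token, endpoint);
        }
    }

    // Copies the map under the lock, then does the expensive work without holding it.
    List<String> calculatePending() {
        Map<Long, String> snapshot;
        synchronized (bootstrapTokens) {
            // LinkedHashMap's copy constructor preserves the source map's iteration order.
            snapshot = new LinkedHashMap<>(bootstrapTokens);
        }

        List<String> pending = new ArrayList<>();
        for (Map.Entry<Long, String> entry : snapshot.entrySet()) {
            // Placeholder for the per-endpoint range computation; runs lock-free.
            pending.add(entry.getValue() + "@" + entry.getKey());
        }
        return pending;
    }

    public static void main(String[] args) {
        SnapshotIteration calc = new SnapshotIteration();
        calc.addBootstrapToken(100L, "10.0.0.1");
        calc.addBootstrapToken(200L, "10.0.0.2");
        System.out.println(calc.calculatePending()); // [10.0.0.1@100, 10.0.0.2@200]
    }
}
```

The trade-off is that each calculation works on a point-in-time copy. A node
that starts bootstrapping while the loop runs is only picked up by the next
calculation, and every run pays for a copy proportional to the number of
bootstrapping nodes. In exchange, writers wait only for the copy rather than
for the whole range computation.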
 

--
This message is automatically generated by JIRA.
