Rick Branson created CASSANDRA-6992:
---------------------------------------

             Summary: Bootstrap on vnodes clusters can cause stampeding/storm behavior
                 Key: CASSANDRA-6992
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6992
             Project: Cassandra
          Issue Type: Bug
         Environment: Various vnodes-enabled clusters in EC2, m1.xlarge and hi1.4xlarge, ~3000-8000 tokens.
            Reporter: Rick Branson


I'm assuming this is specific to vnodes clusters because 
SSTableReader#getPositionsForRanges is more expensive to compute with 256x the 
ranges, but I could be wrong. Even on well-provisioned hosts, a bootstrap can 
cause a severe spike in network throughput and CPU utilization from a storm of 
flushes, which hurts long-tail latency pretty badly. On weaker hosts (like 
m1.xlarge with ~500GB of data), it can result in minutes of churn while the 
node gets through StreamOut#createPendingFiles. This *might* be better in 2.0, 
but it's probably still reproducible because the bootstrapping node sends out 
all of its streaming requests at once. 

I'm thinking that the requests could be staggered at the bootstrapping node to 
avoid the simultaneous spike across the whole cluster. I'm not sure how to 
stagger them beyond something very naive like one-at-a-time with a pause. Maybe 
this should also be throttled in StreamOut#createPendingFiles on the 
out-streaming host? Any thoughts?
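
To make the naive one-at-a-time idea concrete, here is a rough sketch. It is 
not Cassandra code: StaggeredDispatcher, dispatch, sendRequest and pauseMillis 
are all hypothetical names, and the real bootstrap path would have to hook 
something like this into wherever the stream requests are issued. It also uses 
Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; nothing here exists in Cassandra.
final class StaggeredDispatcher {

    /**
     * Issues one request per source, strictly in order, sleeping pauseMillis
     * between consecutive sends so the sources do not all start work at once.
     * Returns the number of requests actually sent (fewer than sources.size()
     * only if the calling thread is interrupted during a pause).
     */
    static <T> int dispatch(List<T> sources, Consumer<? super T> sendRequest, long pauseMillis) {
        int sent = 0;
        for (T source : sources) {
            if (sent > 0) {
                try {
                    Thread.sleep(pauseMillis); // the "pause" between requests
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return sent;
                }
            }
            sendRequest.accept(source); // stand-in for sending a stream request
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        // Placeholder addresses; a real caller would pass the source endpoints.
        List<String> log = new ArrayList<>();
        int n = dispatch(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"), log::add, 50L);
        System.out.println(n + " requests sent, in order: " + log);
    }
}
```

A fixed pause is the crudest possible policy. Waiting for each source to 
finish createPendingFiles before asking the next one would adapt better to slow 
hosts, but it needs an acknowledgement that this sketch does not model.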

From the stack dump of one of our weaker nodes that was struggling for a few 
minutes just starting the StreamOut:

"MiscStage:1" daemon prio=10 tid=0x000000000292f000 nid=0x688 runnable [0x00007f7b03df6000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
        at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
        at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:125)
        at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:889)
        at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:790)
        at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:730)
        at org.apache.cassandra.streaming.StreamOut.createPendingFiles(StreamOut.java:172)
        at org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:157)
        at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:148)
        at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:116)
        at org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:44)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
