[ https://issues.apache.org/jira/browse/CASSANDRA-11724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276804#comment-15276804 ]
Jeremy Hanna commented on CASSANDRA-11724:
------------------------------------------

I suppose I should just say: you should set auto_bootstrap=false in your cassandra.yaml, and then you wouldn't need the two-minute intervals, since this is a fresh cluster.

> False Failure Detection in Big Cassandra Cluster
> ------------------------------------------------
>
>                 Key: CASSANDRA-11724
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11724
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jeffrey F. Lukman
>              Labels: gossip, node-failure
>         Attachments: Workload1.jpg, Workload2.jpg, Workload3.jpg, Workload4.jpg, experiment-result.txt
>
>
> We are running some tests on Cassandra v2.2.5 stable in a big cluster. In our
> test setup, each machine has 16 cores and runs 8 Cassandra instances, and we
> test with 32, 64, 128, 256, and 512 instances of Cassandra. We use the default
> number of vnodes for each instance, which is 256. The data and log directories
> are on an in-memory tmpfs file system.
> We run several types of workloads on this Cassandra cluster:
> Workload1: Just start the cluster
> Workload2: Start half of the cluster, wait until it reaches a stable
> condition, and start the other half of the cluster
> Workload3: Start half of the cluster, wait until it reaches a stable
> condition, load some data, and start the other half of the cluster
> Workload4: Start the cluster, wait until it reaches a stable condition,
> load some data, and decommission one node
> For these tests, we measure the total number of false failure detections
> inside the cluster. By false failure detection we mean that, for example,
> instance-1 marks instance-2 as down, but instance-2 is not down. We dug
> deeper into the root cause and found that instance-1 had not received any
> heartbeat from instance-2 for some time because instance-2 was running a long
> computation.
> The graphs of each workload's results are attached.
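
To illustrate the false detection described in the report, here is a minimal sketch of a phi-accrual style failure detector. It is a simplified model under stated assumptions, not Cassandra's actual code: the class name PhiDetector, the 1-second heartbeat interval, and the timings are hypothetical, and the threshold of 8 is assumed to match the default phi_convict_threshold. It shows how a peer that is alive but stalls for long enough gets marked down.

```python
import math
from collections import deque

# Scales the ratio below onto a log10 scale (assumption: exponential
# model of heartbeat inter-arrival times).
PHI_FACTOR = 1.0 / math.log(10.0)


class PhiDetector:
    """Hypothetical, simplified phi-accrual detector for a single peer."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold            # assumed convict threshold
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received from the peer at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: time since last heartbeat relative to the mean interval."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        return PHI_FACTOR * (now - self.last_heartbeat) / mean

    def is_suspected(self, now):
        return self.phi(now) > self.threshold


if __name__ == "__main__":
    detector = PhiDetector()
    # instance-2 sends a heartbeat every second for 10 seconds...
    for t in range(11):
        detector.heartbeat(float(t))
    # ...then stalls in a long computation. It is still alive.
    for now in (12.0, 20.0, 30.0):
        print(now, round(detector.phi(now), 2), detector.is_suspected(now))
```

In this model, the longer the stall relative to the usual heartbeat interval, the higher phi climbs, so a busy but healthy instance eventually crosses the threshold. Once its next heartbeat arrives, phi drops back and the peer is no longer suspected.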
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)