[ https://issues.apache.org/jira/browse/CASSANDRA-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McGuire reassigned CASSANDRA-6977:
---------------------------------------

    Assignee: Ryan McGuire  (was: Russ Hatch)

> attempting to create 10K column families fails with 100 node cluster
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-6977
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6977
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 100 nodes, Ubuntu 12.04.3 LTS, AWS m1.large instances
>            Reporter: Daniel Meyer
>            Assignee: Ryan McGuire
>            Priority: Minor
>         Attachments: 100_nodes_all_data.png, all_data_5_nodes.png,
> keyspace_create.py, logs.tar, tpstats.txt, visualvm_tracer_data.csv
>
>
> During this test we attempt to create a total of 1K keyspaces with 10
> column families each, bringing the total number of column families to 10K.
> With a 5 node cluster this operation completes; with 100 nodes it fails.
> Please see the two charts. In the 5 node case, the time required to create
> each keyspace and its 10 column families increases linearly until the
> number of keyspaces reaches 1K. In the 100 node case, there is a sudden
> increase in latency between 450 and 550 keyspaces. The test ends when the
> test script times out. After the test script times out, it is impossible
> to reconnect to the cluster with the DataStax Python driver because it
> cannot connect to the host:
> cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers',
> {'10.199.5.98': OperationTimedOut()}
> However, the following stress command does work when run from the same
> machine the test script runs on:
> cassandra-stress -d 10.199.5.98 -l 2 -e QUORUM -L3 -b -o INSERT
> It should be noted that this test was initially done with DSE 4.0 and C*
> version 2.0.5.24, and in that case it was not possible to run stress
> against the cluster, even locally on a node, because the host could not be
> found.
> Attached are system logs from one of the nodes, charts showing schema
> creation latency for the 5 and 100 node clusters, VisualVM tracer data for
> CPU, memory, num_threads and GC runs, tpstats output, and the test script.
> The test script was run on an m1.large AWS instance outside of the cluster
> under test.
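
For readers without access to the attachments, the workload described in the issue (1K keyspaces, 10 column families each, with the time to create each keyspace recorded) can be sketched roughly as follows. This is not the attached keyspace_create.py. The keyspace and table names, the table layout, the SimpleStrategy replication settings, and the `execute` callback are all assumptions made for illustration.

```python
import time


def schema_statements(keyspace_index, tables_per_keyspace=10, replication_factor=1):
    """Return the CQL for one keyspace plus its column families.

    Names (ks_N, cf_N), the two-column table layout and the replication
    settings are hypothetical; the issue does not state what the real
    test script used.
    """
    ks = "ks_%d" % keyspace_index
    stmts = [
        "CREATE KEYSPACE %s WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': %d}"
        % (ks, replication_factor)
    ]
    for t in range(tables_per_keyspace):
        stmts.append(
            "CREATE TABLE %s.cf_%d (key text PRIMARY KEY, value text)" % (ks, t)
        )
    return stmts


def create_schema(execute, keyspaces=1000, tables_per_keyspace=10):
    """Send every statement through `execute`; return seconds spent per keyspace."""
    latencies = []
    for k in range(keyspaces):
        start = time.monotonic()
        for stmt in schema_statements(k, tables_per_keyspace):
            execute(stmt)
        latencies.append(time.monotonic() - start)
    return latencies
```

With the DataStax Python driver, `execute` could be the `execute` method of a session obtained from `Cluster(['10.199.5.98']).connect()`. Plotting the returned latencies against the keyspace index would give a chart comparable in shape to the attached ones, although the absolute numbers depend on the assumptions above.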