Stress test
I am trying to use the cassandra stress tool with the user profile=table.yaml arguments specified and do authentication at the same time. If I use the user profile, I get an error "Invalid parameter user=*" when I specify a user and password. Is it not possible to specify a yaml and use authentication?
Stress Test
Hello Folks, Can anybody recommend good documentation on Cassandra stress testing? I have the questions below. 1) Which server is it best to run the test from, the Cassandra server or the application server? 2) I am using the DataStax Java driver; is there good stress-test documentation specific to this driver? 3) How do I analyze the stress test output? Thanks, - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
Stress test inconsistencies
Hi All, I am struggling to make sense of a simple stress test I ran against the latest Cassandra 0.7. My server performs very poorly compared to a desktop and even a notebook. Here is the command I execute - a single threaded insert that runs on the same host as Cassandra does (I am using the new contrib/stress, but the old py_stress produces similar results):

./stress -t 1 -o INSERT -c 30 -n 1 -i 1

On a SUSE Linux server with a 4-core Intel Xeon I get at most 30 inserts a second with 40ms latency. But on a Windows desktop I get an incredible 200-260 inserts a second with 4ms latency!!! Even on the smallest MacBook Pro I get bursts of high throughput - 100+ inserts a second. Could you please help me figure out what is wrong with my server? I tried several servers, actually, with the same results. I would appreciate any help in tracking down the bottleneck. Configuration is the same in all tests, with the server having the advantage of separate physical disks for commitlog and data. Could you also share what numbers you get, or what is reasonable to expect from this test?

Thank you very much,
Oleg

Here is the output for the Linux server, Windows desktop and MacBook Pro, one line per second:

Linux server - Intel Xeon X3330 @ 2.666GHz, 4G RAM, 2G heap

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
19,19,19,0.05947368421052632,1
46,27,27,0.04274074074074074,2
70,24,24,0.04733,3
95,25,25,0.04696,4
119,24,24,0.048208333,5
147,28,28,0.04189285714285714,7
177,30,30,0.03904,8
206,29,29,0.04006896551724138,9
235,29,29,0.03903448275862069,10

Windows desktop: Core2 Duo CPU E6550 @ 2.333GHz, 2G RAM, 1G heap

Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
147,147,147,0.005292517006802721,1
351,204,204,0.0042009803921568625,2
527,176,176,0.006551136363636364,3
718,191,191,0.005617801047120419,4
980,262,262,0.00400763358778626,5
1206,226,226,0.004150442477876107,6
1416,210,210,0.005619047619047619,7
1678,262,262,0.0040038167938931295,8

MacBook Pro: Core2 Duo CPU @ 2.26GHz, 2G RAM, 1G heap

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
0,0,0,NaN,1
7,7,7,0.21185714285714285,2
47,40,40,0.026925,3
171,124,124,0.007967741935483871,4
258,87,87,0.01206896551724138,6
294,36,36,0.022444,7
303,9,9,0.14378,8
307,4,4,0.2455,9
313,6,6,0.128,10
508,195,195,0.007938461538461538,11
792,284,284,0.0035985915492957746,12
882,90,90,0.01219,13
Re: Stress test
The user and password should be in the -mode section, for example:

./cassandra-stress user profile=table.yaml ops\(insert=1\) -mode native cql3 user=** password=**

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsCStress.html

/Jay

On 7/27/17 2:46 PM, Greg Lloyd wrote:
> I am trying to use the cassandra stress tool with the user
> profile=table.yaml arguments specified and do authentication at the same
> time. If I use the user profile I get an error Invalid parameter
> user=* if I specify a user and password.
>
> Is it not possible to specify a yaml and use authentication?
Stress test cassandr
Hi, What is the best way to stress test a Cassandra cluster with the real-life workloads it currently serves? At the moment I am using the cassandra-stress tool, but it generates blob data; the yaml profiles provide the option to use a custom keyspace. What parameter values can be set to test the Cassandra cluster in an extreme environment?
Re: Stress Test
Hi, I found this blog quite helpful: https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/

On 1, I am not sure I understand your question correctly, but I would not start the stress test process on a Cassandra node that will be under test.

On 3, the tool already has an option to generate nice graphs: http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html#graphing

Hope that helps.

On Thu, 6 Sep 2018 at 20:14, rajasekhar kommineni <rajaco...@gmail.com> wrote:
> Hello Folks,
>
> Can anybody recommend good documentation on Cassandra stress testing?
>
> I have the questions below.
>
> 1) Which server is it best to run the test from, the Cassandra server or the application server?
> 2) I am using the DataStax Java driver; is there good stress-test documentation specific to this driver?
> 3) How do I analyze the stress test output?
>
> Thanks,
Re: Stress test inconsistencies
Try using something higher than -t 1, like -t 100.

- Tyler

On Mon, Jan 24, 2011 at 9:38 PM, Oleg Proudnikov wrote:
> Hi All,
>
> I am struggling to make sense of a simple stress test I ran against the latest
> Cassandra 0.7. My server performs very poorly compared to a desktop and even a
> notebook.
>
> Here is the command I execute - a single threaded insert that runs on the same
> host as Cassandra does (I am using new contrib/stress but old py_stress produces
> similar results):
>
> ./stress -t 1 -o INSERT -c 30 -n 1 -i 1
>
> On a SUSE Linux server with a 4-core Intel Xeon I get maximum 30 inserts a
> second with 40ms latency. But on a Windows desktop I get incredible 200-260
> inserts a second with a 4ms latency!!! Even on the smallest MacBook Pro I get
> bursts of high throughput - 100+ inserts a second.
>
> [stress output for all three machines snipped]
Re: Stress test inconsistencies
Tyler Hobbs riptano.com> writes:
> Try using something higher than -t 1, like -t 100. - Tyler

Thank you, Tyler! When I run contrib/stress with a higher thread count, the server does scale to 200 inserts a second with a latency of 200ms. At the same time the Windows desktop scales to 900 inserts a second and a latency of 120ms. There is a huge difference that I am trying to understand and eliminate. In my real-life bulk load I have to stay with a single-threaded client for the POC I am doing. The only option I have is to run several client processes... My real-life load is heavier than what contrib/stress does. It takes several days to bulk load 4 million batch mutations !!! It is really painful :-( Something is just not right...

Oleg
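The "run several client processes" idea above can be sketched in a few lines. This is a hypothetical illustration, not the POC's actual loader: insert_batch stands in for whatever single-threaded insert loop the real client runs, and the key ranges are just contiguous slices.

```python
from multiprocessing import Pool

def key_slices(total_keys, processes):
    """Split [0, total_keys) into one contiguous range per client process."""
    step = total_keys // processes
    return [range(i * step, (i + 1) * step) for i in range(processes)]

def insert_batch(keys):
    # Stand-in for the real single-threaded loader: a real client would issue
    # one batch mutation per key here; we just report how many keys we "wrote".
    return len(keys)

if __name__ == "__main__":
    # Four independent single-threaded clients, each loading its own slice.
    with Pool(4) as pool:
        total = sum(pool.map(insert_batch, key_slices(4_000_000, 4)))
    print(total)
```

Each process keeps the single-threaded behaviour the POC requires, while the cluster sees four concurrent writers.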
Re: Stress test inconsistencies
Oleg, I'm a novice at this, but for what it's worth I can't imagine you can have a _sustained_ 1kHz insertion rate on a single machine which also does some reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem to square with a typical seek time on a hard drive. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Stress-test-inconsistencies-tp5957467p5960182.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Stress test inconsistencies
On Tue, Jan 25, 2011 at 1:23 PM, Oleg Proudnikov wrote: > When I run contrib/stress with a higher thread count, the server does scale > to > 200 inserts a second with latency of 200ms. At the same time Windows > desktop > scales to 900 inserts a second and latency of 120ms. There is a huge > difference > that I am trying to understand and eliminate. > Those are really low numbers, are you still testing with 10k rows? That's not enough, try 1M to give both JVMs enough time to warm up. -Brandon
Re: Stress test inconsistencies
Brandon Williams gmail.com> writes: > > On Tue, Jan 25, 2011 at 1:23 PM, Oleg Proudnikov cloudorange.com> wrote: > > When I run contrib/stress with a higher thread count, the server does scale to > 200 inserts a second with latency of 200ms. At the same time Windows desktop > scales to 900 inserts a second and latency of 120ms. There is a huge > difference > that I am trying to understand and eliminate. > > > Those are really low numbers, are you still testing with 10k rows? That's not enough, try 1M to give both JVMs enough time to warm up. > > > -Brandon > I agree, Brandon, the numbers are very low! The warm up does not seem to make any difference though... There is something that is holding the server back because the CPU is very low. I am trying to understand where this bottleneck is on the Linux server. I do not think it is Cassandra's config as I use the same config on Windows and get much higher numbers as I described. Oleg
Re: Stress test inconsistencies
buddhasystem bnl.gov> writes: > > > Oleg, > > I'm a novice at this, but for what it's worth I can't imagine you can have a > _sustained_ 1kHz insertion rate on a single machine which also does some > reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem > to square with a typical seek time on a hard drive. > > Maxim > Maxim, As I understand during inserts Cassandra should not be constrained by random seek time as it uses sequential writes. I do get high numbers on Windows but there is something that is holding back my Linux server. I am trying to understand what it is. Oleg
Re: Stress test inconsistencies
Look at iostat -x 10 10 while the active part of your test is running. There should be something called svc_t - that should be in the 10ms range, and await should be low. This will tell you if IO is slow, or if IO is not being issued. Also, ensure that you aren't swapping, with something like "swapon -s".

On Tue, Jan 25, 2011 at 3:04 PM, Oleg Proudnikov wrote:
> buddhasystem bnl.gov> writes:
> >
> > Oleg,
> >
> > I'm a novice at this, but for what it's worth I can't imagine you can have a
> > _sustained_ 1kHz insertion rate on a single machine which also does some
> > reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem
> > to square with a typical seek time on a hard drive.
> >
> > Maxim
>
> Maxim,
>
> As I understand during inserts Cassandra should not be constrained by random
> seek time as it uses sequential writes. I do get high numbers on Windows but
> there is something that is holding back my Linux server. I am trying to
> understand what it is.
>
> Oleg
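To make the iostat check above mechanical, here is a small sketch that picks the await and svctm columns out of sysstat-style `iostat -x` output and flags slow IO. The sample text and its numbers are made up for illustration, and the column names vary between iostat versions, so treat the field lookup as an assumption.

```python
# Fabricated sample of sysstat `iostat -x` extended output (not real data).
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 12.50 1.20 35.70 38.40 385.60 11.49 0.85 23.10 9.80 36.20
"""

def parse_iostat(text):
    """Return {device: (await_ms, svctm_ms)} from `iostat -x` output."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    await_i = header.index("await")   # avg time a request spends queued + served
    svctm_i = header.index("svctm")   # avg service time at the device
    return {r[0]: (float(r[await_i]), float(r[svctm_i])) for r in rows}

for dev, (await_ms, svctm_ms) in parse_iostat(SAMPLE).items():
    # Rough thresholds from the advice above: svctm ~10ms is normal for a
    # spinning disk; a much larger await means requests are queuing.
    flag = "SLOW" if svctm_ms > 10 or await_ms > 50 else "ok"
    print(dev, await_ms, svctm_ms, flag)
```

If await is high while svctm stays near the seek time, the disk is saturated with queued requests rather than individually slow ones.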
Re: Stress test inconsistencies
Hi All, I was able to run contrib/stress at a very impressive throughput. Single threaded client was able to pump 2,000 inserts per second with 0.4 ms latency. Multithreaded client was able to pump 7,000 inserts per second with 7ms latency. Thank you very much for your help! Oleg
Re: Stress test inconsistencies
Would you share with us the changes you made, or problems you found? On Wed, Jan 26, 2011 at 10:41 AM, Oleg Proudnikov wrote: > Hi All, > > I was able to run contrib/stress at a very impressive throughput. Single > threaded client was able to pump 2,000 inserts per second with 0.4 ms latency. > Multithreaded client was able to pump 7,000 inserts per second with 7ms > latency. > > Thank you very much for your help! > > Oleg > > >
Re: Stress test inconsistencies
I returned to periodic commit log fsync. Jonathan Shook gmail.com> writes: > > Would you share with us the changes you made, or problems you found? >
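For anyone landing on this thread: the periodic commit log fsync mentioned above is set in cassandra.yaml. A sketch of the relevant fragment, with option names as in the 0.7-era config; the 10000ms period is the usual shipped default, not a value taken from this thread:

```yaml
# fsync the commit log every commitlog_sync_period_in_ms instead of on every
# write ("batch" mode), trading a small durability window for throughput
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
```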
Timeout during stress test
I am running a stress test using hector. In the client logs I see:

me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
	at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:32)
	at me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:256)
	at me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:227)
	at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
	at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
	at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
	at me.prettyprint.cassandra.service.HColumnFamilyImpl.doExecuteSlice(HColumnFamilyImpl.java:227)
	at me.prettyprint.cassandra.service.HColumnFamilyImpl.getColumns(HColumnFamilyImpl.java:139)
	at com.riptano.cassandra.stress.SliceCommand.call(SliceCommand.java:48)
	at com.riptano.cassandra.stress.SliceCommand.call(SliceCommand.java:20)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: TimedOutException()
	at org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:7174)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:540)
	at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:512)
	at me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:236)

But I don't see anything in the cassandra logs.
Re: Stress test cassandr
Have you read through the docs for stress? You can have it use your own queries and data model. http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html On Sun, Nov 26, 2017 at 1:02 AM Akshit Jain wrote: > Hi, > What is the best way to stress test the cassandra cluster with real life > workloads which is being followed currently? > Currently i am using cassandra stress-tool but it generated blob data > /yaml files provides the option to use custom keyspace. > > But what are the different parameters values which can be set to test the > cassandra cluster in extreme environment? > >
Problems with Python Stress Test
Hi guys, I was playing around with the stress.py test this week and noticed a few things.

1) Progress-interval does not always work correctly. I set it to 5 in the example below, but am instead getting varying intervals:

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python stress.py --num-keys=10 --columns=5 --column-size=32 --operation=insert --progress-interval=5 --threads=4 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
6662,1332,1335,0.00307796342135,5
11607,989,988,0.00476862022199,12
20297,1738,1736,0.00273238550807,18
30631,2066,2068,0.00202261635614,24
37291,1332,1331,0.00325975901372,29
47514,2044,2044,0.00193106963725,35
56618,1820,1821,0.00276346638249,41
68652,2406,2406,0.00179436958884,47
77745,1818,1820,0.00220694060007,52
87351,1921,1918,0.00236015612201,58
97167,1963,1963,0.00230505042379,64
10,566,566,0.00223569174853,66

2) The key_rate and op_rate don't seem to be calculated correctly. Also, what is the difference between the interval_key_rate and the interval_op_rate? For example, in the output above, the first row shows 6662 keys inserted in 5 seconds, and 6662 / 5 = 1332, which matches the interval_op_rate. The second row took 7 seconds to update instead of the requested 5. However, the interval_op_rate and interval_key_rate are being calculated based on my requested 5 seconds instead of the actual observed 7 seconds: (11607-6662)/5 = 989, while (11607-6662)/7 = 706. Shouldn't it be basing the calculations off the 7 seconds?

3) If I write x KB to Cassandra with py_stress, the used disk space doesn't grow by x after the test. In the example below I tried to write 500,000 keys * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I checked the amount of disk space used after the test, it had actually grown by 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because the commit log holds a duplicate copy of the data in addition to the SSTables?
Also, notice how the progress interval got thrown off after 40 seconds.

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem                     1K-blocks    Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root   7583436 2515864   4682344  35% /
none                              633244     208    633036   1% /dev
none                              640368       0    640368   0% /dev/shm
none                              640368      56    640312   1% /var/run
none                              640368       0    640368   0% /var/lock
/dev/sda1                         233191   20601    200149  10% /boot

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python stress.py --num-keys=50 --columns=5 --operation=insert --progress-interval=5 --threads=1 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
15562,3112,3112,0.000300011955333,5
31643,3216,3216,0.000290757187504,10
42968,2265,2265,0.000423845265875,15
54071,2220,2220,0.000430288759747,20
66491,2484,2484,0.000382423304897,25
79891,2680,2680,0.000351728307667,30
91758,2373,2373,0.000402696775367,35
102179,2084,2084,0.000461982612291,40
114003,2364,2364,0.000403893998092,46
126509,2501,2501,0.000379724634489,51
138047,2307,2307,0.000414365229356,56
150261,2442,2442,0.000390332772296,61
164019,2751,2751,0.000343320345113,66
175390,2274,2274,0.000421584286756,71
186564,2234,2234,0.000429319251473,76
198292,2345,2345,0.00040838057315,81
210186,2378,2378,0.000400560030882,87
225144,2991,2991,0.000314564943345,92
236474,2266,2266,0.000422214746265,97
249940,2693,2693,0.000349487200297,102
264410,2894,2894,0.00030166366303,107
275429,2203,2203,0.000464002475276,112
286430,2200,2200,0.00043832517821,117
299217,2557,2557,0.000371891478764,122
313800,2916,2916,0.000322412596002,128
325252,2290,2290,0.000417413284343,133
336031,2155,2155,0.000445155976201,138
347257,2245,2245,0.000426658924816,143
357493,2047,2047,0.000472509730556,148
372151,2931,2931,0.000321278794594,153
384655,2500,2500,0.000381667455343,158
395604,2189,2189,0.000439286896144,163
409713,2821,2821,0.000334938358759,168
423162,2689,2689,0.000351835071877,174
434276,,,0.000432009316829,179
444809,2106,2106,0.00045844612893,184
458190,2676,2676,0.000353130326037,189
470852,2532,2532,0.000374360740552,194
481333,2096,2096,0.000462788910416,199
492458,2225,2225,0.000431290422932,204
50,1508,1508,0.000353647808408,207

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem                     1K-blocks    Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root   7583436 2684920   4513288  38% /
none                              633244     208    633036   1% /dev
none                              640368       0    640368   0% /dev/shm
none                              640368      56    640312   1% /var/run
none                              640368       0    640368   0% /var/lock
/dev/sda1                         233191
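The interval-rate complaint in point 2 is easy to check with a few lines: dividing by the observed elapsed time instead of the requested progress interval gives the 706/s figure. A sketch using the numbers from the first run above:

```python
def interval_rate(prev_total, cur_total, prev_elapsed, cur_elapsed):
    """Ops per second over an interval, using the observed elapsed time."""
    return (cur_total - prev_total) // (cur_elapsed - prev_elapsed)

# Second row of the first run: 6662 ops at t=5s, 11607 ops at t=12s.
# stress printed 989 (it divided by the requested 5s interval); the rate
# over the 7 seconds that actually elapsed is 706.
print(interval_rate(6662, 11607, 5, 12))
```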
CF config for Stress Test
I am starting a stress test using hector on a 6-node cluster, with a 4GB heap and 12 cores per machine. In the hector stress readme this is what I got by default:

create keyspace StressKeyspace
    with replication_factor = 3
    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

use StressKeyspace;
drop column family StressStandard;
create column family StressStandard
    with comparator = UTF8Type
    and keys_cached = 1
    and memtable_flush_after = 1440
    and memtable_throughput = 32;

Are these good values? I was thinking of a higher keys_cached, but I am not sure whether it is in bytes or in number of keys. I am also not sure how to tune the memtable values. I have set concurrent_reads to 32 and concurrent_writes to 48. Can someone please help me with good values that I can start this test with? Also, are there any other settings that I should change?

Thanks
Re: Timeout during stress test
I see this occurring often when all cassandra nodes suddenly show a CPU spike. All reads fail for about 2 minutes. GC.log and system.log don't reveal much. The only thing I notice is that when I restart nodes there are tons of files that get deleted. tpstats and cfstats from one of the nodes look like this:

nodetool -h `hostname` tpstats
Pool Name                Active  Pending  Completed
ReadStage                    27       27      21491
RequestResponseStage          0        0     201641
MutationStage                 0        0     236513
ReadRepairStage               0        0       7222
GossipStage                   0        0      31498
AntiEntropyStage              0        0          0
MigrationStage                0        0          0
MemtablePostFlusher           0        0        324
StreamStage                   0        0          0
FlushWriter                   0        0        324
FILEUTILS-DELETE-POOL         0        0       1220
MiscStage                     0        0          0
FlushSorter                   0        0          0
InternalResponseStage         0        0          0
HintedHandoff                 1        3          9

--

Keyspace: StressKeyspace
    Read Count: 21957
    Read Latency: 46.91765058978913 ms.
    Write Count: 222104
    Write Latency: 0.008302124230090408 ms.
    Pending Tasks: 0
    Column Family: StressStandard
    SSTable count: 286
    Space used (live): 377916657941
    Space used (total): 377916657941
    Memtable Columns Count: 362
    Memtable Data Size: 164403613
    Memtable Switch Count: 326
    Read Count: 21958
    Read Latency: 631.464 ms.
    Write Count: 222104
    Write Latency: 0.007 ms.
    Pending Tasks: 0
    Key cache capacity: 100
    Key cache size: 22007
    Key cache hit rate: 0.002453626459907744
    Row cache: disabled
    Compacted row minimum size: 87
    Compacted row maximum size: 5839588
    Compacted row mean size: 552698
Re: Timeout during stress test
TimedOutException means the cluster could not perform the request in rpc_timeout time. The client should retry, as the problem may be transitory. In this case read performance may have slowed down due to the number of sstables, 286. It's hard to tell without knowing what the workload is.

Aaron

On 12 Apr 2011, at 09:56, mcasandra wrote:
> I see this occurring often when all cassandra nodes all of a sudden show CPU
> spike. All reads fail for about 2 mts. GC.log and system.log doesn't reveal
> much.
>
> Only thing I notice is that when I restart nodes there are tons of files
> that get deleted.
>
> [tpstats and cfstats output snipped]
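Since a TimedOutException is often transitory, the client-side retry suggested above can be sketched generically. This is not Hector's actual retry API; TimedOutError and flaky_read below are hypothetical stand-ins for the real exception and the real slice query.

```python
import time

class TimedOutError(Exception):
    """Stand-in for the driver's timeout exception."""

def with_retries(operation, attempts=3, backoff_s=0.1):
    """Run operation(), retrying on timeout with a growing backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TimedOutError:
            if attempt == attempts:
                raise          # give up: the cluster is genuinely overloaded
            time.sleep(backoff_s * attempt)

calls = []
def flaky_read():
    # Simulated slice query that times out twice, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise TimedOutError("cluster overloaded")
    return "row"

print(with_retries(flaky_read))
```

The backoff matters: retrying immediately against an overloaded cluster only adds to the read stage backlog shown in the tpstats above.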
Re: Timeout during stress test
It looks like hector did retry on all the nodes and failed. Does this then mean cassandra is down for clients in this scenario? That would be bad.
Re: Timeout during stress test
It means the cluster is currently overloaded and unable to complete requests in time at the CL specified.

Aaron

On 12 Apr 2011, at 11:18, mcasandra wrote:
> It looks like hector did retry on all the nodes and failed. Does this then
> mean cassandra is down for clients in this scenario? That would be bad.
Re: Timeout during stress test
But I don't understand the reason for the overload. It was a simple read test with 12 threads reading 5 rows. Avg CPU is only 20%, and there are no GC issues that I can see. I would expect cassandra to be able to process more with 6 nodes, 12 cores, 96 GB RAM and a 4 GB heap.
Re: Timeout during stress test
You'll need to provide more information; from the TP stats the read stage could not keep up. If the node is not CPU bound then it is probably IO bound.

What sort of read?
How many columns was it asking for?
How many columns do the rows have?
Was the test asking for different rows?
How many requests per second did it get up to?
What do the io stats look like?
What does nodetool cfhistograms say?

Aaron

On 12 Apr 2011, at 13:02, mcasandra wrote:
> But I don't understand the reason for oveload. It was doing simple read of 12
> threads and reasing 5 rows. Avg CPU only 20%, No GC issues that I see. I
> would expect cassandra to be able to process more with 6 nodes, 12 core, 96
> GB RAM and 4 GB heap.
Re: Timeout during stress test
I notice you have pending hinted handoffs. Look for errors related to that; we have seen occasional corruptions in the hinted handoff sstables. If you are stressing the system to its limits, you may also consider playing more with the number of read/write threads (concurrent_reads/writes), as well as rate limiting the number of requests each node can get (throttle limit). We have seen similar issues when sending a large number of requests to a cluster (read/write threads running out, timeouts, nodes marked as down).

Terje

On Tue, Apr 12, 2011 at 9:56 AM, aaron morton wrote:
> It means the cluster is currently overloaded and unable to complete
> requests in time at the CL specified.
>
> Aaron
>
> On 12 Apr 2011, at 11:18, mcasandra wrote:
>> It looks like hector did retry on all the nodes and failed. Does this then
>> mean cassandra is down for clients in this scenario? That would be bad.
Re: Timeout during stress test
aaron morton wrote:
> You'll need to provide more information, from the TP stats the read stage
> could not keep up. If the node is not CPU bound then it is probably IO
> bound.
>
> What sort of read?
> How many columns was it asking for ?
> How many columns do the rows have ?
> Was the test asking for different rows ?
> How many ops requests per second did it get up to?
> What do the io stats look like ?
> What does nodetool cfhistograms say ?

It's a simple read of 1M rows with one column of avg size 200K. It got around 70 req per sec.

I am not sure how to interpret the iostat output with things happening async in cassandra. Can you give a little description of how to interpret it?

I have posted the output of cfstats. Does cfhistograms provide better info?
Re: Timeout during stress test
Couple of hits here, one from Jonathan and some previous discussions on the user list: http://www.google.co.nz/search?q=cassandra+iostat

Same here for cfhistograms: http://www.google.co.nz/search?q=cassandra+cfhistograms

cfhistograms includes information on the number of sstables read during recent requests. As your initial cfstats showed 286 sstables, I thought it may be useful to see if there was a high number of sstables being accessed per read.

70 requests per second is slow against a 6 node cluster where each node has 12 cores and 96GB of ram. Something is not right.

Aaron

On 12 Apr 2011, at 17:11, mcasandra wrote:
> It's a simple read of 1M rows with one column of avg size 200K. It got around
> 70 req per sec.
>
> I am not sure how to interpret the iostat output with things happening async in
> cassandra. Can you give a little description of how to interpret it?
>
> I have posted the output of cfstats. Does cfhistograms provide better info?
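A back-of-envelope check shows how ~70 req/s can be disk-bound despite the big machines. Every number below is an assumption for illustration (a ~10ms random seek for a spinning disk, a handful of sstables touched per read because of the high sstable count, and the ~200K column size mentioned in the thread), not a measurement from this cluster.

```python
# Assumed figures, not measurements from the thread's cluster.
seek_ms = 10            # typical 7.2k-rpm random seek + rotational latency
sstables_per_read = 4   # uncompacted data spread across several sstables
row_kb = 200            # avg column size reported in the thread

seeks_per_sec = 1000 / seek_ms                   # ~100 random IOs/s per spindle
reads_per_sec = seeks_per_sec / sstables_per_read  # each read costs several seeks
mb_per_sec = reads_per_sec * row_kb / 1024

print(round(reads_per_sec), round(mb_per_sec, 2))
```

Under these assumptions a single spindle tops out around 25 random reads/s, so a small number of data disks serving reads that each touch several sstables lands in the same order of magnitude as the observed 70 req/s; more compaction (fewer sstables per read) or more spindles raises the ceiling.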
Re: Timeout during stress test
[nodetool cfhistograms output: offset buckets running from 17084 up to 25109160; one count column rises from 690 to a peak of ~4157 around offset 88148 and tapers back to 0, with the remaining columns mostly zero]

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6265925.html
Cassandra Stress Test Result Evaluation
I have been using the cassandra-stress tool to evaluate my cassandra cluster for quite some time now. My problem is that I am not able to comprehend the results generated for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
  ID uuid,
  Time timestamp,
  Value double,
  Date timestamp,
  PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;

I have put this information in a custom yaml file and used the parameters n=1, threads=100, with the rest as default options (cl=one, mode=native cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.

A few specifics of the custom yaml file are as follows:

insert:
  partitions: fixed(100)
  select: fixed(1)/2
  batchtype: UNLOGGED

columnspecs:
  - name: Time
    size: fixed(1000)
  - name: ID
    size: uniform(1..100)
  - name: Date
    size: uniform(1..10)
  - name: Value
    size: uniform(-100..100)

My observations so far are as follows (please correct me if I am wrong):

1. With n=1 and time: fixed(1000), the number of rows getting inserted is 10 million. (1*1000=1000)
2. The number of row-keys/partitions is 1 (i.e. n), within which 100 partitions are taken at a time (which means 100*1000 = 10 key-value pairs), out of which 5 key-value pairs are processed at a time (because of select: fixed(1)/2 ~ 50%).

The output message also confirms the same:

Generating batches with [100..100] partitions and [5..5] rows (of [10..10] total rows in the partitions)

The results I get for consecutive runs with the same configuration are:

Run  Total_ops  Op_rate  Partition_rate  Row_Rate  Time
1    56         19       1885            943246    3.0
2    46         46       4648            2325498   1.0
3    27         30       2982            1489870   0.9
4    59         19       1932            966034    3.1
5    100        17       1730            865182    5.8

Now what I need to understand is the following:

1. Which of these metrics is the throughput, i.e. the number of records inserted per second? Is it the Row_rate, Op_rate or Partition_rate? If it's the Row_rate, can I safely conclude that I am able to insert close to 1 million records per second? Any thoughts on what the Op_rate and Partition_rate mean in this case?
2. Why does Total_ops vary so drastically in every run? Does the number of threads have anything to do with this variation? What can I conclude here about the stability of my Cassandra setup?
3. How do I determine the batch size per thread here? In my example, is the batch size 5?

Thanks in advance.
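One way to relate the three rates is simply to divide them per run (a sketch derived from the run table above, not official cassandra-stress documentation): Op_rate ≈ Total_ops / Time, and dividing Row_Rate and Partition_rate by Op_rate recovers how many rows and partitions each op carried:

```python
# Runs copied from the table above:
# (total_ops, op_rate, partition_rate, row_rate, elapsed_seconds)
runs = [
    (56, 19, 1885, 943246, 3.0),
    (46, 46, 4648, 2325498, 1.0),
    (27, 30, 2982, 1489870, 0.9),
    (59, 19, 1932, 966034, 3.1),
    (100, 17, 1730, 865182, 5.8),
]

for total_ops, op_rate, partition_rate, row_rate, elapsed in runs:
    rows_per_op = row_rate / op_rate            # rows carried by one op (batch)
    partitions_per_op = partition_rate / op_rate
    print(f"ops/s={op_rate:3d}  rows/op={rows_per_op:8.0f}  "
          f"partitions/op={partitions_per_op:5.1f}")

# Every run comes out near 50,000 rows/op and ~100 partitions/op,
# consistent with the profile (100 partitions per batch, half of 1000
# clustering rows selected from each). That suggests Op_rate counts
# batches/sec, Partition_rate counts partitions/sec, and Row_Rate
# counts individual rows (records)/sec.
```

The steadiness of these ratios across runs, despite the wildly varying Total_ops, is itself informative: the batch shape is constant; only how many batches complete per second varies.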
Re: Problems with Python Stress Test
On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui wrote: > Hi guys, > > I was playing around with the stress.py test this week and noticed a few > things. > > 1) Progress-interval does not always work correctly. I set it to 5 in the > example below, but am instead getting varying intervals: > Generally indicates that the client machine is being overloaded in my experience. 2) The key_rate and op_rate doesn't seem to be calculated correctly. Also, > what is the difference between the interval_key_rate and the > interval_op_rate? For example in the example above, the first row shows 6662 > keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the > interval_op_rate. > There should be no difference unless you're doing range slices, but IPC timing makes them vary somewhat. 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't > grow by x after the test. In the example below I tried to write 500,000 keys > * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I > checked the amount of disk space used after the test it actually grew by > 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the > commit log got duplicate copies of the data as the SSTables? > Commitlogs could be part of it, you're not factoring in the column names, and then there's index and bloom filter overhead. Use contrib/stress on 0.7 instead. -Brandon
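The disk-space arithmetic from the question, plus Brandon's point that the 32-byte values are only part of what gets written (the breakdown below is illustrative, not Cassandra's actual on-disk format):

```python
# Raw payload as computed in the question:
keys = 500_000
cols_per_key = 5
value_bytes = 32

raw_kib = keys * cols_per_key * value_bytes / 1024
print(f"raw payload: {raw_kib:,.0f} KiB")          # the 78,125 KB figure

# Observed disk growth from the question:
observed_kib = 2_684_920 - 2_515_864
print(f"observed growth: {observed_kib:,} KiB "
      f"({observed_kib / raw_kib:.1f}x raw)")

# Per Brandon: column names, timestamps, index and bloom filter
# overhead, plus the commitlog holding a second copy of the data,
# plausibly account for the ~2.2x blow-up.
```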
Re: Problems with Python Stress Test
Brandon, Thanks for the response. I have also noticed that stress.py's progress interval gets thrown off in low memory situations. What did you mean by "contrib/stress on 0.7 instead". I don't see that dir in the src version of 0.7. - Sameer On Thu, Feb 3, 2011 at 5:22 PM, Brandon Williams wrote: > On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui > wrote: > >> Hi guys, >> >> I was playing around with the stress.py test this week and noticed a few >> things. >> >> 1) Progress-interval does not always work correctly. I set it to 5 in the >> example below, but am instead getting varying intervals: >> > > Generally indicates that the client machine is being overloaded in my > experience. > > 2) The key_rate and op_rate doesn't seem to be calculated correctly. Also, >> what is the difference between the interval_key_rate and the >> interval_op_rate? For example in the example above, the first row shows 6662 >> keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the >> interval_op_rate. >> > > There should be no difference unless you're doing range slices, but IPC > timing makes them vary somewhat. > > 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't >> grow by x after the test. In the example below I tried to write 500,000 keys >> * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I >> checked the amount of disk space used after the test it actually grew by >> 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the >> commit log got duplicate copies of the data as the SSTables? >> > > Commitlogs could be part of it, you're not factoring in the column names, > and then there's index and bloom filter overhead. > > Use contrib/stress on 0.7 instead. > > -Brandon >
Re: Problems with Python Stress Test
On Fri, Feb 4, 2011 at 5:23 PM, Sameer Farooqui wrote: > Brandon, > > Thanks for the response. I have also noticed that stress.py's progress > interval gets thrown off in low memory situations. > > What did you mean by "contrib/stress on 0.7 instead". I don't see that dir > in the src version of 0.7. Looks like it didn't make it in 0.7.0. It will be in 0.7.1, or you can get it from svn. -Brandon
Re: CF config for Stress Test
If you just want to benchmark the cluster it won't matter too much, though I would set keys_cached to 0 and increase memtable throughput to 64 or 128. If you are testing to get a better idea for your app then use similar settings to your app.

keys_cached is the number of keys; for concurrent_readers and concurrent_writers see the comments in conf/cassandra.yaml. I could not find this KS definition in the hector code base so I'm not sure why they chose those values.

Aaron

On 9 Apr 2011, at 11:10, mcasandra wrote:
> I am starting a stress test using hector on a 6 node machine with 4GB heap and 12 cores. In the hector readme this is what I got by default:
>
> create keyspace StressKeyspace
>    with replication_factor = 3
>    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
>
> use StressKeyspace;
> drop column family StressStandard;
> create column family StressStandard
>    with comparator = UTF8Type
>    and keys_cached = 1
>    and memtable_flush_after = 1440
>    and memtable_throughput = 32;
>
> Are these good values? I was thinking of higher keys_cached but not sure if it's in bytes or number of keys.
>
> Also not sure how to tune memtable values.
>
> I have set concurrent_readers to 32 and writers to 48.
>
> Can someone please help me with good values that I can start this test with?
>
> Also, any other suggested values that I need to change?
>
> Thanks
>
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CF-config-for-Stress-Test-tp6255608p6255608.html
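Applied to the column family definition quoted above, Aaron's suggestions would look something like this (a sketch only; 64 is one of the two memtable throughput values he mentions, and the other settings are left at the hector defaults):

```
create column family StressStandard
    with comparator = UTF8Type
    and keys_cached = 0
    and memtable_flush_after = 1440
    and memtable_throughput = 64;
```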
Fwd: Cassandra Stress Test Result Evaluation
I have been using the cassandra-stress tool to evaluate my cassandra cluster for quite some time now. My problem is that I am not able to comprehend the results generated for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
  ID uuid,
  Time timestamp,
  Value double,
  Date timestamp,
  PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;

I have put this information in a custom yaml file and used the parameters n=1, threads=100, with the rest as default options (cl=one, mode=native cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.

A few specifics of the custom yaml file are as follows:

insert:
  partitions: fixed(100)
  select: fixed(1)/2
  batchtype: UNLOGGED

columnspecs:
  - name: Time
    size: fixed(1000)
  - name: ID
    size: uniform(1..100)
  - name: Date
    size: uniform(1..10)
  - name: Value
    size: uniform(-100..100)

My observations so far are as follows (please correct me if I am wrong):

1. With n=1 and time: fixed(1000), the number of rows getting inserted is 10 million. (1*1000=1000)
2. The number of row-keys/partitions is 1 (i.e. n), within which 100 partitions are taken at a time (which means 100*1000 = 10 key-value pairs), out of which 5 key-value pairs are processed at a time (because of select: fixed(1)/2 ~ 50%).

The output message also confirms the same:

Generating batches with [100..100] partitions and [5..5] rows (of [10..10] total rows in the partitions)

The results I get for consecutive runs with the same configuration are:

Run  Total_ops  Op_rate  Partition_rate  Row_Rate  Time
1    56         19       1885            943246    3.0
2    46         46       4648            2325498   1.0
3    27         30       2982            1489870   0.9
4    59         19       1932            966034    3.1
5    100        17       1730            865182    5.8

Now what I need to understand is the following:

1. Which of these metrics is the throughput, i.e. the number of records inserted per second? Is it the Row_rate, Op_rate or Partition_rate? If it's the Row_rate, can I safely conclude that I am able to insert close to 1 million records per second? Any thoughts on what the Op_rate and Partition_rate mean in this case?
2. Why does Total_ops vary so drastically in every run? Does the number of threads have anything to do with this variation? What can I conclude here about the stability of my Cassandra setup?
3. How do I determine the batch size per thread here? In my example, is the batch size 5?

Thanks in advance.

--
Nisha Menon
BTech (CS) Sahrdaya CET, MTech (CS) IIIT Banglore.
Re: Cassandra Stress Test Result Evaluation
Your insert settings look unrealistic since I doubt you would be writing 50k rows at a time. Try to set this to 1 per partition and you should get much more consistent numbers across runs I would think. select: fixed(1)/10 On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon wrote: > I have been using the cassandra-stress tool to evaluate my cassandra cluster > for quite some time now. My problem is that I am not able to comprehend the > results generated for my specific use case. > > My schema looks something like this: > > CREATE TABLE Table_test( > ID uuid, > Time timestamp, > Value double, > Date timestamp, > PRIMARY KEY ((ID,Date), Time) > ) WITH COMPACT STORAGE; > > I have parsed this information in a custom yaml file and used parameters > n=1, threads=100 and the rest are default options (cl=one, mode=native > cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup. > > A few specifics of the custom yaml file are as follows: > > insert: > partitions: fixed(100) > select: fixed(1)/2 > batchtype: UNLOGGED > > columnspecs: > -name: Time > size: fixed(1000) > -name: ID > size: uniform(1..100) > -name: Date > size: uniform(1..10) > -name: Value > size: uniform(-100..100) > > My observations so far are as follows (Please correct me if I am wrong): > > With n=1 and time: fixed(1000), the number of rows getting inserted is > 10 million. (1*1000=1000) > The number of row-keys/partitions is 1(i.e n), within which 100 > partitions are taken at a time (which means 100 *1000 = 10 key-value > pairs) out of which 5 key-value pairs are processed at a time. 
(This is > because of select: fixed(1)/2 ~ 50%) > > The output message also confirms the same: > > Generating batches with [100..100] partitions and [5..5] rows > (of[10..10] total rows in the partitions) > > The results that I get are the following for consecutive runs with the same > configuration as above: > > Run Total_ops Op_rate Partition_rate Row_Rate Time > 1 56 19 1885 943246 3.0 > 2 46 46 4648 2325498 1.0 > 3 27 30 2982 1489870 0.9 > 4 59 19 1932 966034 3.1 > 5 100 17 1730 865182 5.8 > > Now what I need to understand are as follows: > > Which among these metrics is the throughput i.e, No. of records inserted per > second? Is it the Row_rate, Op_rate or Partition_rate? If it’s the Row_rate, > can I safely conclude here that I am able to insert close to 1 million > records per second? Any thoughts on what the Op_rate and Partition_rate mean > in this case? > Why is it that the Total_ops vary so drastically in every run ? Has the > number of threads got anything to do with this variation? What can I > conclude here about the stability of my Cassandra setup? > How do I determine the batch size per thread here? In my example, is the > batch size 5? > > Thanks in advance. -- http://twitter.com/tjake
Stress test using Java-based stress utility
Hi All, I am following this link "http://www.datastax.com/docs/0.7/utilities/stress_java" for a stress test. I am getting this output after running this command (xxx.xxx.xxx.xx = my ip):

contrib/stress/bin/stress -d xxx.xxx.xxx.xx

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
Operation [44] retried 10 times - error inserting key 044 ((UnavailableException))
Operation [49] retried 10 times - error inserting key 049 ((UnavailableException))
Operation [7] retried 10 times - error inserting key 007 ((UnavailableException))
Operation [6] retried 10 times - error inserting key 006 ((UnavailableException))

Any idea why I am getting these?

Thank You
Different Load values after stress test runs....
Hi, we're running some performance tests against some clusters and I'm curious about some of the numbers I see.

I'm running the stress test against two identically configured clusters, but after I run a stress test, I get different Load values across the clusters. The difference between the two clusters is that one uses standard EC2 interfaces, but the other runs on a virtual network. Are these differences indicating something that I should be aware of?

Here is a sample of the kinds of results I'm seeing.

Address        DC   Rack  Status  State   Load       Owns    Token
                                                             12760588759xxx
10.0.0.17      DC1  RAC1  Up      Normal  94 MB      25.00%  0
10.0.0.18      DC1  RAC1  Up      Normal  104.52 MB  25.00%  42535295865xxx
10.0.0.19      DC1  RAC1  Up      Normal  78.58 MB   25.00%  85070591730xxx
10.0.0.20      DC1  RAC1  Up      Normal  78.58 MB   25.00%  12760588759xxx

Address        DC   Rack  Status  State   Load       Owns    Token
                                                             12760588759xxx
10.120.35.52   DC1  RAC1  Up      Normal  103.74 MB  25.00%  0
10.120.6.124   DC1  RAC1  Up      Normal  118.99 MB  25.00%  42535295865xxx
10.127.90.142  DC1  RAC1  Up      Normal  104.26 MB  25.00%  85070591730xxx
10.94.69.237   DC1  RAC1  Up      Normal  75.74 MB   25.00%  12760588759xxx

The first cluster with the vNet (10.0.0.0/28 addresses) consistently shows smaller Load values: a total Load of 355MB vs. 402MB with native EC2 interfaces. Is a total Load value even meaningful? The stress test is the very first thing that's run against the clusters.

[I'm also a little puzzled that these numbers are not uniform within the clusters, but I suspect that's because the stress test is using a key distribution that is Gaussian. I'm not 100% sure of this either since I've seen conflicting documentation. Haven't tried 'random' keys, but I presume that would change them to be uniform.]

Except for these curious Load numbers, things seem to be running just fine. We're getting good fast results: over 10 iterations I'm getting more than 10-12K inserts per sec (default values for the stress test).

Should I expect the Load to be the same across different clusters? What might explain the differences I'm seeing?

Thanks in advance.
CM
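Summing the Load columns from the two ring listings above confirms the totals in the question:

```python
# Load values copied from the two nodetool ring listings above (MB).
vnet_loads_mb = [94.0, 104.52, 78.58, 78.58]     # 10.0.0.x vNet cluster
ec2_loads_mb = [103.74, 118.99, 104.26, 75.74]   # native EC2 interfaces

vnet_total = sum(vnet_loads_mb)
ec2_total = sum(ec2_loads_mb)
print(f"vNet total: {vnet_total:.2f} MB")        # ~355 MB
print(f"EC2 total:  {ec2_total:.2f} MB")         # ~402 MB
print(f"difference: {ec2_total - vnet_total:.2f} MB "
      f"({(ec2_total / vnet_total - 1) * 100:.0f}% more)")
```

One plausible (unconfirmed) explanation for a gap of this size: Load reports live sstable bytes on disk, so two clusters holding the same logical data can show different Loads simply because compaction has progressed differently on each.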
Re: Stress test using Java-based stress utility
Have you checked the logs on the nodes to see if there are any errors?

On 7/21/11 10:43 PM, Nilabja Banerjee wrote:
> Hi All, I am following this link "http://www.datastax.com/docs/0.7/utilities/stress_java" for a stress test. I am getting this output after running this command (xxx.xxx.xxx.xx = my ip):
> contrib/stress/bin/stress -d xxx.xxx.xxx.xx
> Created keyspaces. Sleeping 1s for propagation.
> total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> Operation [44] retried 10 times - error inserting key 044 ((UnavailableException))
> Operation [49] retried 10 times - error inserting key 049 ((UnavailableException))
> Operation [7] retried 10 times - error inserting key 007 ((UnavailableException))
> Operation [6] retried 10 times - error inserting key 006 ((UnavailableException))
> Any idea why I am getting these?
> Thank You

--
Kirk True
Founder, Principal Engineer
Expert Engineering Firepower
Re: Stress test using Java-based stress utility
UnavailableException is raised server side when there are fewer than CL nodes UP when the request starts.

It seems odd to get it in this case because the default replication factor used by the stress test is 1. How many nodes do you have, and have you made any changes to the RF?

Also check the server side logs as Kirk says.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 18:37, Kirk True wrote:
> Have you checked the logs on the nodes to see if there are any errors?
Re: Stress test using Java-based stress utility
Running only one node. I don't think it is coming from the replication factor... I will try to sort this out. Any other suggestions from your side are always helpful. :)

Thank you

On 22 July 2011 14:36, aaron morton wrote:
> UnavailableException is raised server side when there are fewer than CL nodes UP when the request starts.
>
> It seems odd to get it in this case because the default replication factor used by the stress test is 1. How many nodes do you have and have you made any changes to the RF?
>
> Also check the server side logs as Kirk says.
Re: Stress test using Java-based stress utility
What does nodetool ring say? On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee wrote: > Hi All, > > I am following this following link " > http://www.datastax.com/docs/0.7/utilities/stress_java " for a stress test. > I am getting this notification after running this command > > xxx.xxx.xxx.xx= my ip > > contrib/stress/bin/stress -d xxx.xxx.xxx.xx > > Created keyspaces. Sleeping 1s for propagation. > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time > Operation [44] retried 10 times - error inserting key 044 > ((UnavailableException)) > > Operation [49] retried 10 times - error inserting key 049 > ((UnavailableException)) > > Operation [7] retried 10 times - error inserting key 007 > ((UnavailableException)) > > Operation [6] retried 10 times - error inserting key 006 > ((UnavailableException)) > > Any idea why I am getting these things? > > Thank You > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Stress test using Java-based stress utility
Hi, I'd also like to know what this stress tool does and how it is used. Please explain.

On Fri, Jul 22, 2011 at 6:39 PM, Jonathan Ellis wrote:
> What does nodetool ring say?
Re: Stress test using Java-based stress utility
It's in the source distribution under tools/stress; see the instructions in the README file and then look at the command line help (bin/stress --help).

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 19:40, CASSANDRA learner wrote:
> Hi, I'd also like to know what this stress tool does and how it is used. Please explain.
Re: Stress test using Java-based stress utility
Thank you everyone, it is working fine now.

I was watching jconsole behavior... can you tell me where exactly I can find "RecentHitRates"? Under "Tuning for Optimal Caching" they give one example of it here:
http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches

In my jconsole, within the MBeans tab, I am unable to find RecentHitRates. What are the values long[36] and long[90]? From the JConsole attributes, how can I find the performance of Cassandra while stress testing?

Thank You

On 26 July 2011 14:33, aaron morton wrote:
> It's in the source distribution under tools/stress; see the instructions in the README file and then look at the command line help (bin/stress --help).
Re: Stress test using Java-based stress utility
cassandra.db.Caches

On Tue, Jul 26, 2011 at 2:11 AM, Nilabja Banerjee wrote:
> Thank you everyone, it is working fine now.
>
> I was watching jconsole behavior... can you tell me where exactly I can find "RecentHitRates"?
>
> Here they have given one example of it:
> http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches
>
> In my jconsole, within the MBeans tab, I am unable to find RecentHitRates.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Stress test using Java-based stress utility
Thank you Jonathan. :)

On 26 July 2011 20:08, Jonathan Ellis wrote:
> cassandra.db.Caches
Re: Different Load values after stress test runs....
Have you run repair on the nodes? Maybe some data was lost and not repaired yet? Philippe 2011/8/23 Chris Marino > Hi, we're running some performance tests against some clusters and I'm > curious about some of the numbers I see. > > I'm running the stress test against two identically configured clusters, > but after I run a stress test, I get different Load values across the > clusters. > > The difference between the two clusters is that one uses standard EC2 > interfaces, but the other runs on a virtual network. Are these differences > indicating something that I should be aware of? > > Here is a sample of the kinds of results I'm seeing:
>
> Address        DC   Rack  Status  State   Load       Owns     Token
>                                                               12760588759xxx
> 10.0.0.17      DC1  RAC1  Up      Normal  94 MB      25.00%   0
> 10.0.0.18      DC1  RAC1  Up      Normal  104.52 MB  25.00%   42535295865xxx
> 10.0.0.19      DC1  RAC1  Up      Normal  78.58 MB   25.00%   85070591730xxx
> 10.0.0.20      DC1  RAC1  Up      Normal  78.58 MB   25.00%   12760588759xxx
>
> Address        DC   Rack  Status  State   Load       Owns     Token
>                                                               12760588759xxx
> 10.120.35.52   DC1  RAC1  Up      Normal  103.74 MB  25.00%   0
> 10.120.6.124   DC1  RAC1  Up      Normal  118.99 MB  25.00%   42535295865xxx
> 10.127.90.142  DC1  RAC1  Up      Normal  104.26 MB  25.00%   85070591730xxx
> 10.94.69.237   DC1  RAC1  Up      Normal  75.74 MB   25.00%   12760588759xxx
>
> The first cluster with the vNet (10.0.0.0/28 addresses) consistently shows > smaller Load values: a total Load of 355MB vs. 402MB with native EC2 > interfaces. Is a total Load value even meaningful? The stress test is the > very first thing that's run against the clusters. > > [I'm also a little puzzled that these numbers are not uniform within the > clusters, but I suspect that's because the stress test is using a key > distribution that is Gaussian. I'm not 100% sure of this either since I've > seen conflicting documentation. Haven't tried 'random' keys, but I presume > that would change them to be uniform] > > Except for these curious Load numbers, things seem to be running just fine.
> Getting good fast results. Over 10 iterations I'm getting more than 10-12K > inserts per sec. (default values for the stress test). > > Should I expect the Load to be the same across different clusters?? What > might explain the differences I'm seeing??? > > Thanks in advance. > CM >
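On the key-distribution point above: when keys are hashed uniformly (what stress's 'random' mode gives you), each node's share of the data converges to its token range's share, so large per-node Load differences usually come from compaction state, unflushed memtables, or repair rather than from the partitioner. A toy Python illustration of that hashing argument follows; it loosely mimics RandomPartitioner-style MD5 token bucketing on an evenly spaced 4-node ring and is not Cassandra's actual code:

```python
import hashlib

# Bucket keys into 4 equal slices of the 128-bit MD5 token space, loosely
# mimicking how RandomPartitioner spreads row keys across a 4-node ring
# with evenly spaced tokens.
NUM_NODES = 4
RING = 2 ** 128

def node_for(key):
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return token * NUM_NODES // RING

counts = [0] * NUM_NODES
for i in range(100_000):
    counts[node_for("key%d" % i)] += 1

fractions = [c / 100_000 for c in counts]
print(fractions)  # each node ends up with roughly 25% of the keys
```

With a skewed key distribution, by contrast, repeated keys and overwrites can make per-node Load diverge even when token ranges are equal.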
How to stress test collections in Cassandra Stress
Hi, I'm trying to do a stress test on a table with a collection column, but cannot figure out how to do that. I tried:

table_definition: |
  CREATE TABLE list (
      customer_id bigint,
      items list,
      PRIMARY KEY (customer_id));

columnspec:
  - name: customer_id
    size: fixed(64)
    population: norm(0..40M)
  - name: items
    cluster: fixed(40)

When running the benchmark, I get: java.io.IOException: Operation x10 on key(s) [27056313]: Error executing: (NoSuchElementException)
Cassandra 0.6.2 stress test failing due to setKeyspace issue
Can someone direct me how to resolve this issue in Cassandra 0.6.2?

./stress.py -o insert -n 1 -y regular -d ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going

Created keyspaces. Sleeping 1s for propagation.
Traceback (most recent call last):
  File "./stress.py", line 381, in
    benchmark()
  File "./stress.py", line 363, in insert
    threads = self.create_threads('insert')
  File "./stress.py", line 325, in create_threads
    th = OperationFactory.create(type, i, self.opcounts, self.keycounts, self.latencies)
  File "./stress.py", line 310, in create
    return Inserter(i, opcounts, keycounts, latencies)
  File "./stress.py", line 178, in __init__
    self.cclient.set_keyspace('Keyspace1')
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 333, in set_keyspace
    self.recv_set_keyspace()
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 349, in recv_set_keyspace
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace'

Niru
What need to be monitored while running stress test
What are the key things to monitor while running a stress test? There are tons of details in nodetool tpstats/netstats/cfstats. What in particular should I be looking at? Also, I've been looking at iostat, and await goes really high, but cfstats shows low latency in microseconds. Is the latency in cfstats calculated per operation? I am just trying to understand what I need to look at, to make sure I don't overlook important points in the process of evaluating Cassandra. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6255765.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: How to stress test collections in Cassandra Stress
Hi, Collections are not supported in the cassandra-stress tool. I suggest you use JMeter with the Cassandra Java driver, or Spark, to do your stress test with collections. 2017-04-13 16:26 GMT+02:00 eugene miretsky : > Hi, > > I'm trying to do a stress test on a a table with a collection column, but > cannot figure out how to do that. > > I tried > > table_definition: | > CREATE TABLE list ( > customer_id bigint, > items list, > PRIMARY KEY (customer_id)); > > columnspec: > - name: customer_id > size: fixed(64) > population: norm(0..40M) > - name: items > cluster: fixed(40) > > When running the benchmark, I get: java.io.IOException: Operation x10 on > key(s) [27056313]: Error executing: (NoSuchElementException) > > > -- Cordialement; Ahmed ELJAMI
Re: How to stress test collections in Cassandra Stress
unsubscribe On Thu, Apr 13, 2017 at 7:26 AM, eugene miretsky wrote: > Hi, > > I'm trying to do a stress test on a a table with a collection column, but > cannot figure out how to do that. > > I tried > > table_definition: | > CREATE TABLE list ( > customer_id bigint, > items list, > PRIMARY KEY (customer_id)); > > columnspec: > - name: customer_id > size: fixed(64) > population: norm(0..40M) > - name: items > cluster: fixed(40) > > When running the benchmark, I get: java.io.IOException: Operation x10 on > key(s) [27056313]: Error executing: (NoSuchElementException) > > >
Re: How to stress test collections in Cassandra Stress
Hi 'luckiboy'. You have been trying to unsubscribe from the Cassandra dev and user lists lately. To do so, sending "unsubscribe" in a message is not the way to go, as you have probably noticed by now. It just spams people on those lists. As written here http://cassandra.apache.org/community/, you actually have to send an email to both user-unsubscr...@cassandra.apache.org and dev-unsubscr...@cassandra.apache.org. Cheers, --- Alain Rodriguez - @arodream - al...@thelastpickle.com France The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com 2017-04-24 15:08 GMT+02:00 LuckyBoy : > unsubscribe > > On Thu, Apr 13, 2017 at 7:26 AM, eugene miretsky < > eugene.miret...@gmail.com> wrote: > >> Hi, >> >> I'm trying to do a stress test on a a table with a collection column, but >> cannot figure out how to do that. >> >> I tried >> >> table_definition: | >> CREATE TABLE list ( >> customer_id bigint, >> items list, >> PRIMARY KEY (customer_id)); >> >> columnspec: >> - name: customer_id >> size: fixed(64) >> population: norm(0..40M) >> - name: items >> cluster: fixed(40) >> >> When running the benchmark, I get: java.io.IOException: Operation x10 on >> key(s) [27056313]: Error executing: (NoSuchElementException) >> >> >> >
Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue
you're running a 0.7 stress.py against a 0.6 cassandra, that's not going to work On Thu, Jul 1, 2010 at 12:16 PM, maneela a wrote: > Can someone direct me how to resolve this issue in cassandra 0.6.2 version? > > ./stress.py -o insert -n 1 -y regular -d > ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going > > Created keyspaces. Sleeping 1s for propagation. > Traceback (most recent call last): > File "./stress.py", line 381, in > benchmark() > File "./stress.py", line 363, in insert > threads = self.create_threads('insert') > File "./stress.py", line 325, in create_threads > th = OperationFactory.create(type, i, self.opcounts, self.keycounts, > self.latencies) > File "./stress.py", line 310, in create > return Inserter(i, opcounts, keycounts, latencies) > File "./stress.py", line 178, in __init__ > self.cclient.set_keyspace('Keyspace1') > File > "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", > line 333, in set_keyspace > self.recv_set_keyspace() > File > "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", > line 349, in recv_set_keyspace > raise x > thrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace' > > > Niru > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue
Thanks Jonathan --- On Thu, 7/1/10, Jonathan Ellis wrote: From: Jonathan Ellis Subject: Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue To: user@cassandra.apache.org Date: Thursday, July 1, 2010, 3:32 PM you're running a 0.7 stress.py against a 0.6 cassandra, that's not going to work On Thu, Jul 1, 2010 at 12:16 PM, maneela a wrote: Can someone direct me how to resolve this issue in cassandra 0.6.2 version? ./stress.py -o insert -n 1 -y regular -d ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going Created keyspaces. Sleeping 1s for propagation.Traceback (most recent call last): File "./stress.py", line 381, in benchmark() File "./stress.py", line 363, in insert threads = self.create_threads('insert') File "./stress.py", line 325, in create_threads th = OperationFactory.create(type, i, self.opcounts, self.keycounts, self.latencies) File "./stress.py", line 310, in create return Inserter(i, opcounts, keycounts, latencies) File "./stress.py", line 178, in __init__ self.cclient.set_keyspace('Keyspace1') File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 333, in set_keyspace self.recv_set_keyspace() File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 349, in recv_set_keyspace raise xthrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace' Niru -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: What need to be monitored while running stress test
The storage proxy latencies are the primary metric: in particular, the latency histograms show the distribution of query times. On Fri, Apr 8, 2011 at 5:27 PM, mcasandra wrote: > What are the key things to monitor while running a stress test? There is > tons > of details in nodetoll tpstats/netstats/cfstats. What in particular should > I > be looking at? > > Also, I've been looking at iostat and await really goes high but cfstats > shows low latency in microsecs. Is latency in cfstats calculated per > operation? > > I am just trying to understand what I need to look just to make sure I > don't > overlook important points in process of evaluating cassandra. > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6255765.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. >
Re: What need to be monitored while running stress test
What is a storage proxy latency? By query latency, do you mean the one in cfstats and cfhistograms? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6257932.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: What need to be monitored while running stress test
In jconsole, look at the MBean org.apache.cassandra.db.StorageProxy. It shows the latency for read and write operations overall, not just per CF. Aaron On 10 Apr 2011, at 11:37, mcasandra wrote: > What is a storage proxy latency? > > By query latency you mean the one in cfstats and cfhistorgrams? > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6257932.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com.
Cassandra stress test and max vs. average read/write latency.
Has anyone looked much at the maximum latency of Cassandra read/write requests (rather than the average latency and average throughput)? We've been struggling for quite some time trying to figure out why we see occasional read or write response times in the 100s of milliseconds, even on fast machines that normally respond in just a few milliseconds. We've spent a lot of time trying to tune our benchmark and Cassandra configurations to lower these maximum times. There are a lot of things that make it a little better, or a little worse, but we've found it nearly impossible to eliminate these "outliers" completely. A lot of our initial testing was done with a home-grown benchmark written in C++ using the thrift interface. However, now that we've recently upgraded from 0.6.8 to 1.0.3, I've been able to do some testing using the official java "stress" tool. The problem, at least for this purpose, is that the stress tool only reports *average* response times over the measurement intervals. This effectively hides large values if they are infrequent relative to the measurement interval. I've modified the stress test so that it also tracks the maximum latency reported over each measurement interval. Here is an excerpt from a typical result:

$ bin/stress -d XXX -p -e QUORUM -t 4 -i 1 -l 3 -c 1 -n 40

total   interval_op_rate  interval_key_rate  avg_latency            elapsed_time  max(millisec)
5780    5780              5780               6.098615916955017E-4   1             13
13837   8057              8057               5.003102891895247E-4   2             4
22729   8892              8892               4.7199730094466935E-4  3             4
31840   9111              9111               4.6504225661288555E-4  4             1
40925   9085              9085               4.6846450192625206E-4  5             1
49076   8151              8151               5.20054799411E-4       6             100
...
3186625  8886  8886  4.786180508665316E-4   411  10
3195626  9001  9001  4.705032774136207E-4   412  1
3204574  8948  8948  4.710549843540456E-4   414  1
3213524  8950  8950  4.7195530726256986E-4  415  1
3217534  4010  4010  0.0010763092269326683  416  607
3226560  9026  9026  4.695324617770884E-4   417  1
3235425  8865  8865  4.7805978567399887E-4  418  1
3244177  8752  8752  4.848034734917733E-4   419  10
...

My patch adds the final column, which logs the maximum response time over the one-second interval. In most cases the average response time is under 1 msec, and though the maximum might be a bit larger, it's still just a few milliseconds - usually under 10 msec. But sometimes (like interval 416) one of the responses took 607 milliseconds. These numbers aren't too bad if you are supporting an interactive application and don't mind a slightly slower response now and then, as long as the average stays low and throughput stays high. But for other types of applications, these slow responses might be a problem. I'm trying to understand if this is expected or not, and if there is anything we can do about it. I'd also be interested in hearing from folks who have run the stress test against their own Cassandra clusters. If anyone wants to try this, I've included my patch to the stress test below. So far, I've only instrumented the default "insert" operation. (It also adjusts the output to separate fields with spaces instead of commas
I find that easier to read and it caters to gnuplot)

diff -ur stress/src/org/apache/cassandra/stress/operations/Inserter.java /home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/operations/Inserter.java
--- stress/src/org/apache/cassandra/stress/operations/Inserter.java 2011-11-15 02:57:23.0 -0600
+++ /home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/operations/Inserter.java 2011-12-09 08:58:04.0 -0600
@@ -108,6 +108,11 @@
         session.operations.getAndIncrement();
         session.keys.getAndIncrement();
         session.latency.getAndAdd(System.currentTimeMillis() - start);
+
+        long delta = System.currentTimeMillis() - start;
+        if ( delta > session.maxlatency.get() ) {
+            session.maxlatency.set(delta);
+        }
     }
 
     private Map> getSuperColumnsMutationMap(List superColumns)
diff -ur stress/src/org/apache/cassandra/stress/Session.java /home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/Session.java
--- stress/src/org/apache/cassandra/stress/Session.java 2011-11-15 02:57:23.0 -0600
+++ /home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/Session.java 2011-12-09 08:48:45.0 -0600
@@ -53,6 +53,7 @@
     public final AtomicInteger operations;
     public final AtomicInteger keys;
     public final AtomicLong latency;
+    public final AtomicLong maxlatency;
 
     static
     {
@@ -337,6 +338,7 @@
         operations = new AtomicInteger();
         keys = new AtomicInteger();
         latency = new AtomicLong();
+        maxlatency = new AtomicLong();
     }
 
     public int getCardinality()
diff -ur stress/src/org/apache/cassa
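The reporting idea behind the patch, keeping a per-interval maximum next to the running average, is easy to sanity-check outside the stress tool. A small Python sketch with made-up latencies (one 607 ms outlier among 999 fast operations, echoing interval 416 above):

```python
import random

def interval_stats(latencies_ms, interval_len):
    """Yield (avg, max) latency for each fixed-size interval of operations."""
    for i in range(0, len(latencies_ms), interval_len):
        chunk = latencies_ms[i:i + interval_len]
        yield sum(chunk) / len(chunk), max(chunk)

# Mostly-fast operations with one rare outlier, as in the stress output above.
samples = [1.0] * 999 + [607.0]
random.shuffle(samples)

overall_avg = sum(samples) / len(samples)
worst_interval_max = max(mx for _, mx in interval_stats(samples, 100))
print(overall_avg, worst_interval_max)  # the average stays low; the max exposes the outlier
```

The average barely moves (about 1.6 ms here) while the interval maximum surfaces the 607 ms outlier, which is exactly why the extra column is useful. Note, too, that the patch's get()/set() pair on the AtomicLong isn't atomic as a whole; with multiple threads, a compareAndSet loop would avoid occasionally losing a concurrent larger value.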
Re: Cassandra stress test and max vs. average read/write latency.
> I'm trying to understand if this is expected or not, and if there is Without careful tuning, outliers around a couple of hundred ms are definitely expected in general (not *necessarily*, depending on workload) as a result of garbage collection pauses. The impact will be worsened a bit if you are running under high CPU load (or even maxing it out with stress) because post-pause, if you are close to max CPU usage you will take considerably longer to "catch up". Personally, I would just log each response time and feed it to gnuplot or something. It should be pretty obvious whether or not the latencies are due to periodic pauses. If you are concerned with eliminating or reducing outliers, I would: (1) Make sure that when you're benchmarking, that you're putting Cassandra under a reasonable amount of load. Latency benchmarks are usually useless if you're benchmarking against a saturated system. At least, start by achieving your latency goals at 25% or less CPU usage, and then go from there if you want to up it. (2) One can affect GC pauses, but it's non-trivial to eliminate the problem completely. For example, the length of frequent young-gen pauses can typically be decreased by decreasing the size of the young generation, leading to more frequent shorter GC pauses. But that instead causes more promotion into the old generation, which will result in more frequent very long pauses (relative to normal; they would still be infrequent relative to young gen pauses) - IF your workload is such that you are suffering from fragmentation and eventually seeing Cassandra fall back to full compacting GC:s (stop-the-world) for the old generation. I would start by adjusting young gen so that your frequent pauses are at an acceptable level, and then see whether or not you can sustain that in terms of old-gen. 
Start with this in any case: Run Cassandra with -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: Cassandra stress test and max vs. average read/write latency.
Peter, Thanks for your input. Can you tell me more about what we should be looking for in the gc log? We've already got gc logging turned on, and we've already done the plotting to show that in most cases the outliers are happening periodically (with a period of 10s of seconds to a few minutes, depending on load and tuning). I've tried to correlate the times of the outliers with messages either in the system log or the gc log. There seems to be some (but not complete) correlation between the outliers and system log messages about memtable flushing. I cannot find anything in the gc log that seems to be an obvious problem, or that matches up with the times of the outliers. On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote: > > I'm trying to understand if this is expected or not, and if there is > > Without careful tuning, outliers around a couple of hundred ms are > definitely expected in general (not *necessarily*, depending on > workload) as a result of garbage collection pauses. The impact will be > worsened a bit if you are running under high CPU load (or even maxing > it out with stress) because post-pause, if you are close to max CPU > usage you will take considerably longer to "catch up". > > Personally, I would just log each response time and feed it to gnuplot > or something. It should be pretty obvious whether or not the latencies > are due to periodic pauses. > > If you are concerned with eliminating or reducing outliers, I would: > > (1) Make sure that when you're benchmarking, that you're putting > Cassandra under a reasonable amount of load. Latency benchmarks are > usually useless if you're benchmarking against a saturated system. At > least, start by achieving your latency goals at 25% or less CPU usage, > and then go from there if you want to up it. > > (2) One can affect GC pauses, but it's non-trivial to eliminate the > problem completely.
For example, the length of frequent young-gen > pauses can typically be decreased by decreasing the size of the young > generation, leading to more frequent shorter GC pauses. But that > instead causes more promotion into the old generation, which will > result in more frequent very long pauses (relative to normal; they > would still be infrequent relative to young gen pauses) - IF your > workload is such that you are suffering from fragmentation and > eventually seeing Cassandra fall back to full compacting GC:s > (stop-the-world) for the old generation. > > I would start by adjusting young gen so that your frequent pauses are > at an acceptable level, and then see whether or not you can sustain > that in terms of old-gen. > > Start with this in any case: Run Cassandra with -XX:+PrintGC > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps > > -- > / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- Peter Fales Alcatel-Lucent Member of Technical Staff 1960 Lucent Lane Room: 9H-505 Naperville, IL 60566-7033 Email: peter.fa...@alcatel-lucent.com Phone: 630 979 8031
Re: Cassandra stress test and max vs. average read/write latency.
> Thanks for your input. Can you tell me more about what we should be > looking for in the gc log? We've already got the gc logging turned > on and, and we've already done the plotting to show that in most > cases the outliers are happening periodically (with a period of > 10s of seconds to a few minutes, depnding on load and tuning) Are you measuring writes or reads? If writes, https://issues.apache.org/jira/browse/CASSANDRA-1991 is still relevant I think (sorry no progress from my end on that one). Also, I/O scheduling issues can easily cause problems with the commit log latency (on fsync()). Try switching to periodic commit log mode and see if it helps, just to eliminate that (if you're not already in periodic; if so, try upping the interval). For reads, I am generally unaware of much aside from GC and legitimate "jitter" (scheduling/disk I/O etc) that would generate outliers. At least that I can think of off hand... And w.r.t. the GC log - yeah, correlating in time is one thing. Another thing is to confirm what kind of GC pauses you're seeing. Generally you want to be seeing lots of ParNew:s of shorter duration, and those are tweakable by changing the young generation size. The other thing is to make sure CMS is not failing (promotion failure/concurrent mode failure) and falling back to a stop-the-world serial compacting GC of the entire heap. You might also use -XX:+PrintGCApplicationStoppedTime to get a more obvious and greppable report for each pause, regardless of "type"/cause. > I've tried to correlate the times of the outliers with messages either > in the system log or the gc log. There seemms to be some (but not > complete) correlation between the outliers and system log messages about > memtable flushing. I can not find anything in the gc log that > seems to be an obvious problem, or that matches up with the time > times of the outliers.
And these are still the very extreme (500+ ms and such) outliers that you're seeing w/o GC correlation? Off the top of my head, that seems very unexpected (assuming a non-saturated system) and would definitely invite investigation IMO. If you're willing to start iterating with the source code, I'd start bisecting down the call stack and see where it's happening. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
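One mechanical way to do the time correlation discussed above is to pull every stopped-time line out of the GC log and flag the long ones. The sketch below assumes the line format produced by -XX:+PrintGCApplicationStoppedTime on HotSpot JVMs of that era ("Total time for which application threads were stopped: N seconds"); the sample lines are fabricated for illustration, so adjust the regex to your JVM's actual output:

```python
import re

# Matches e.g. "Total time for which application threads were stopped: 0.1234567 seconds"
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds")

def long_pauses(lines, threshold_ms=100.0):
    """Return the pause durations (in ms) that exceed threshold_ms."""
    pauses = [float(m.group(1)) * 1000.0
              for line in lines
              for m in [STOPPED_RE.search(line)] if m]
    return [p for p in pauses if p > threshold_ms]

# Fabricated sample log lines for illustration only:
sample = [
    "Total time for which application threads were stopped: 0.0031234 seconds",
    "Total time for which application threads were stopped: 0.6070000 seconds",
]
print(long_pauses(sample))  # only the long pause is flagged
```

Joining the timestamps of flagged pauses against the timestamps of the latency outliers (or their absence) settles the GC-correlation question more convincingly than eyeballing the raw log.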
Re: Cassandra stress test and max vs. average read/write latency.
Peter, Thanks for your response. I'm looking into some of the ideas in your other recent mail, but I had another followup question on this one... Is there any way to control the CPU load when using the "stress" benchmark? I have some control over that with our home-grown benchmark, but I thought it made sense to use the official benchmark tool as people might more readily believe those results and/or be able to reproduce them. But offhand, I don't see any way to throttle back the load created by the stress test. On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote: > > I'm trying to understand if this is expected or not, and if there is > > Without careful tuning, outliers around a couple of hundred ms are > definitely expected in general (not *necessarily*, depending on > workload) as a result of garbage collection pauses. The impact will be > worsened a bit if you are running under high CPU load (or even maxing > it out with stress) because post-pause, if you are close to max CPU > usage you will take considerably longer to "catch up". > > Personally, I would just log each response time and feed it to gnuplot > or something. It should be pretty obvious whether or not the latencies > are due to periodic pauses. > > If you are concerned with eliminating or reducing outliers, I would: > > (1) Make sure that when you're benchmarking, that you're putting > Cassandra under a reasonable amount of load. Latency benchmarks are > usually useless if you're benchmarking against a saturated system. At > least, start by achieving your latency goals at 25% or less CPU usage, > and then go from there if you want to up it. > > (2) One can affect GC pauses, but it's non-trivial to eliminate the > problem completely. For example, the length of frequent young-gen > pauses can typically be decreased by decreasing the size of the young > generation, leading to more frequent shorter GC pauses.
But that > instead causes more promotion into the old generation, which will > result in more frequent very long pauses (relative to normal; they > would still be infrequent relative to young gen pauses) - IF your > workload is such that you are suffering from fragmentation and > eventually seeing Cassandra fall back to full compacting GC:s > (stop-the-world) for the old generation. > > I would start by adjusting young gen so that your frequent pauses are > at an acceptable level, and then see whether or not you can sustain > that in terms of old-gen. > > Start with this in any case: Run Cassandra with -XX:+PrintGC > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps > > -- > / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- Peter Fales Alcatel-Lucent Member of Technical Staff 1960 Lucent Lane Room: 9H-505 Naperville, IL 60566-7033 Email: peter.fa...@alcatel-lucent.com Phone: 630 979 8031
Re: Cassandra stress test and max vs. average read/write latency.
> Is there any way to control the CPU load when using the "stress" benchmark? > I have some control over that with our home-grown benchmark, but I > thought it made sense to use the official benchmark tool as people might > more readily believe those results and/or be able to reproduce them. But > offhand, I don't see any to throttle back the load created by the > stress test. I'm not aware of one built-in. It would be a useful patch IMO, to allow setting a target rate. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
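A target-rate patch along the lines Peter suggests could be as small as a pacer that sleeps until the next scheduled operation slot. This is a hypothetical sketch, not part of the stress tool; the clock and sleep functions are injected so the pacing logic can be exercised with virtual time (in real use you would pass time.monotonic and time.sleep):

```python
class RatePacer:
    """Paces calls to at most `rate` operations per second."""
    def __init__(self, rate, clock, sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def acquire(self):
        """Block (via the injected sleep) until the next operation may run."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
        # Schedule relative to the later of "now" and the previous slot, so an
        # idle period doesn't bank a burst of catch-up operations.
        self.next_slot = max(now, self.next_slot) + self.interval

# Simulated clock: operations are instantaneous, sleep advances virtual time.
t = [0.0]
pacer = RatePacer(rate=100, clock=lambda: t[0],
                  sleep=lambda s: t.__setitem__(0, t[0] + s))
for _ in range(10):
    pacer.acquire()
print(t[0])  # virtual elapsed time for 10 ops at 100 ops/s, about 0.09 s
```

In a real load generator each worker thread would call acquire() before issuing its operation, with the total target rate divided among the threads.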
Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2
Hi, I am doing a stress test on Datastax Cassandra Community 2.1.2, not using the provided stress test tool but my own stress-test client instead (I wrote some C++ stress-test code). My Cassandra cluster is deployed on Amazon EC2, using the Datastax Community AMI (HVM instances) from the Datastax documentation, and I am not using EBS, just the ephemeral storage by default. The EC2 instance type of the Cassandra servers is m3.xlarge. I use another EC2 instance for my stress test client, which is of type r3.8xlarge. Both the Cassandra server nodes and the stress test client node are in us-east. I test Cassandra clusters made up of 1 node, 2 nodes, and 4 nodes separately. I run the INSERT test and the SELECT test separately, but the performance doesn't increase linearly when new nodes are added, and I also get some weird results. My test results are as follows (I do 1 million operations and I try to get the best QPS while the max latency is no more than 200ms; the latencies are measured from the client side, and QPS is calculated as total_operations/total_time).
INSERT (write):

Node count  Replication factor  QPS    Avg(ms)  Min(ms)  .95(ms)  .99(ms)  .999(ms)  Max(ms)
1           1                   18687  2.08     1.48     2.95     5.74     52.8      205.4
2           1                   20793  3.15     0.84     7.71     41.35    88.7      232.7
2           2                   22498  3.37     0.86     6.04     36.1     221.5     649.3
4           1                   28348  4.38     0.85     8.19     64.51    169.4     251.9
4           3                   28631  5.22     0.87     18.68    68.35    167.2     288

SELECT (read):

Node count  Replication factor  QPS    Avg(ms)  Min(ms)  .95(ms)  .99(ms)  .999(ms)  Max(ms)
1           1                   24498  4.01     1.51     7.6      12.51    31.5      129.6
2           1                   28219  3.38     0.85     9.5      17.71    39.2      152.2
2           2                   35383  4.06     0.87     9.71     21.25    70.3      215.9
4           1                   34648  2.78     0.86     6.07     14.94    30.8      134.6
4           3                   52932  3.45     0.86     10.81    21.05    37.4      189.1

The test data I use is generated randomly, and the schema I use is like this (I use cqlsh to create the columnfamily/table):

CREATE TABLE table(
    id1 varchar,
    ts varchar,
    id2 varchar,
    msg varchar,
    PRIMARY KEY(id1, ts, id2));

So the fields are all strings, and I generate each character of each string randomly, using srand(time(0)) and rand() in C++, so I think my test data should be uniformly distributed across the Cassandra cluster. In my client stress-test code I use the thrift C++ interface, and the basic operations I do are like:

thrift_client.execute_cql3_query("INSERT INTO table WHERE id1=xxx, ts=xxx, id2=xxx, msg=xxx");

and

thrift_client.execute_cql3_query("SELECT FROM table WHERE id1=xxx");

Each data entry I INSERT or SELECT is around 100 characters. On my stress test client, I create several threads to send the read and write requests, each thread having its own thrift client, and at the beginning all the thrift clients connect to the Cassandra servers evenly. For example, I create 160 thrift clients, and each 40 of them connect to one server node in a 4-node cluster. So, 1. Could anyone help me explain my test results? Why does the performance (QPS) get only a small increment when new nodes are added? 2.
I learn from the materials that, Cassandra has better write performance than read. But why in my case the read performance is better? 3. I also use the OpsCenter to monitor the real-time performance of my cluster. But when I get the average QPS above, the operations/s provided by OpsCenter is around 1+ for write peak and 5000+ for read peak. Why is my result inconsistent with that from OpsCenter? 4. Are there any unreasonable things in my test method, such as test data and QPS calculation? Thank you very much, Joy
Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2
Hi Joy,

Are you resetting your data after each test run? I wonder if your tests are actually causing you to fall behind on data grooming tasks such as compaction, so performance suffers for your later tests.

There are *so many* factors which can affect performance that, without reviewing the test methodology in great detail, it's really hard to say whether there are flaws which might uncover an antipattern, cause an atypical number of cache hits or misses, and so forth. You may also be producing GC pressure in the write path.

I *can* say that 28k writes per second looks a little low, but it depends a lot on your network, hardware, and write patterns (e.g., data size). For a little performance test suite I wrote, with parallel batched writes on a 3-node rf=3 test cluster, I got about 86k writes per second.

Also, focusing exclusively on max latency is going to cause you some trouble, especially on magnetic media as you're using. Between ill-timed GC and the inconsistent performance characteristics of magnetic media, your max numbers will often look significantly worse than your p(99) or p(999) numbers.

All this said, one node will often look better than several nodes for certain patterns because it completely eliminates proxy (coordinator) write times: all writes are local writes. It's an over-simple case that doesn't reflect any practical production use of Cassandra, so it's probably not worth even including in your tests. I would recommend starting at 3 nodes rf=3 and comparing against 6 nodes rf=6. Make sure you're staying on top of compaction and aren't seeing garbage collections in the logs (either of those will pollute your results with variability you can't account for with small sample sizes of ~1 million).
If you expect to sustain write volumes like this, you'll find these clusters are sized too small (on that hardware you won't keep up with compaction), and your tests are again testing scenarios you wouldn't actually see in production.

On Sat Dec 06 2014 at 7:09:18 AM kong wrote:
> [original question and result tables quoted in full; trimmed]
Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2
I'm sorry, I meant to say "6 nodes rf=3".

Also, look at performance over sustained periods of time, not burst writing. Run your test for several hours and watch memory and especially compaction stats. See if you can find the data volume you can write while keeping outstanding compaction tasks < 5 (preferably 0 or 1) for sustained periods. Measuring just burst writes will definitely mask real-world conditions, and Cassandra actually absorbs bursted writes really well (which in turn masks performance problems, since by the time your write times suffer from overwhelming a cluster, you're probably already in an insane and difficult-to-recover crisis mode).

On Sun Dec 07 2014 at 8:55:47 AM Eric Stevens wrote:
> [previous reply quoted in full; trimmed]
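Outstanding compaction work can be watched from the command line while a sustained run is in progress; a minimal check, assuming `nodetool` is on the PATH of each server node:

```shell
# Show pending/active compactions on this node; the advice above is to keep
# "pending tasks" below 5 (ideally 0 or 1) for the duration of the run.
nodetool compactionstats

# Thread-pool stats: a growing "Pending" count for MutationStage or
# FlushWriter is another sign the node is falling behind the write load.
nodetool tpstats
```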
Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2
Hi Eric,

Thank you very much for your reply! Do you mean that I should clear my table after each run? Indeed, I can see compaction happen several times during my test, but could only a few compactions affect the performance that much? Also, I can see from OpsCenter that some ParNew GCs happen, but no CMS GCs.

I run my test on an EC2 cluster, so I think the network within it should be high speed. Each Cassandra server has 4 vCPUs, 15 GiB of memory, and 80 GB of SSD storage (the m3.xlarge type).

As for latency, which latency should I care about most, p(99) or p(999)? I want to get the max QPS under a certain latency limit.

I know my testing scenario is not the common case in production; I just want to know how much load my cluster can bear under stress.

So, how did you test your cluster to get 86k writes/sec? How many requests did you send to your cluster, was it also 1 million? Did you also use OpsCenter to monitor the real-time performance? I also wonder why the write and read QPS that OpsCenter provides are much lower than what I calculate. Could you please describe your test deployment in detail?

Thank you very much,
Joy

2014-12-07 23:55 GMT+08:00 Eric Stevens wrote:
> [previous reply quoted in full; trimmed]
Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2
I think your client could use improvements. How many threads do you have running in your test? With a thrift call like that you can only do one request at a time per connection. For example, assuming C* takes 0ms, a 10ms network latency/driver overhead will mean a 20ms RTT and a max throughput of ~50 QPS per thread (the native binary protocol doesn't behave like this). Are you running the client on its own system or shared with a node? How are you load balancing your requests? Source code would help, since there's a lot that can become a bottleneck.

Generally you will see a bit of a dip in latency from N=RF=1 to N=2, RF=2, etc., since there are optimizations on the coordinator node when it doesn't need to send the request to the replicas. The impact of the network overhead decreases in significance as the cluster grows. Typically, latency-wise, RF=N=1 is going to be the fastest possible for smaller loads (i.e., when a client cannot fully saturate a single node).

The main thing to expect is that latency will plateau and remain fairly constant as load/nodes increase, while throughput potential will increase linearly (empirically at least).

You should really attempt it with the native binary protocol + prepared statements; running CQL over thrift is far from optimal. I would recommend using the cassandra-stress tool if you want to stress test Cassandra (and not your code):
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

===
Chris Lohfink

On Sun, Dec 7, 2014 at 9:48 PM, 孔嘉林 wrote:
> [previous reply quoted in full; trimmed]
Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2
Thanks Chris.

I run the client on a separate AWS instance from the Cassandra cluster servers. At the client side, I create 40 or 50 threads for sending requests to each Cassandra node, and I create one thrift client for each of the threads. At the beginning, all the created thrift clients connect to their corresponding Cassandra nodes and stay connected during the whole process (I do not close the transports until the end of the test). So I use very simple load balancing: the same number of thrift clients connect to each node. My source code is here:
https://github.com/kongjialin/Cassandra/blob/master/cassandra_client.cpp
It's very nice of you to help me improve my code.

As I increase the number of threads, the latency gets longer.

I'm using C++, so if I want to use the native binary protocol + prepared statements, is the only way to use the C++ driver?

Thanks very much.

2014-12-08 12:51 GMT+08:00 Chris Lohfink wrote:
> [previous reply quoted in full; trimmed]
Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2
So I would *expect* an increase of ~20k QPS per node with m3.xlarge, so there may be something up with your client (I am not a C++ person, however, but hopefully someone on the list will take notice). Latency does not decrease linearly as you add nodes. What you are likely seeing with latency, since there are so few nodes, is a side effect of an optimization. When you read/write from a table, the node you request acts as the coordinator. If the data exists on the coordinator, and you're using rf=1 or cl=1, it will not have to send the request to another node, just service it locally:

+----------+      +--------------+
|  node0   | -->  |    node1     |
|  client  | <--  | coordinator  |
+----------+      +--------------+

In this case the write latency is dominated by the network between the coordinator and the client. A second case is where the coordinator actually has to send the request to another node:

+----------+      +--------------+      +--------------+
|  node0   | -->  |    node1     | -->  |    node2     |
|  client  | <--  | coordinator  | <--  | data replica |
+----------+      +--------------+      +--------------+

As you add nodes, you increase the probability of hitting this second scenario, where the coordinator has to make an additional network hop. This is possibly why you're seeing an increase (aside from client issues). To get an idea of how latency is affected as you increase nodes, you really need to go higher than 4 nodes (i.e., graph the same rf for 5, 10, 15, 25 nodes; below 5 isn't really the recommended way to run Cassandra anyway), since the latency will approach that of the second scenario (plus some spiky outliers for GCs) and then it should settle down until you overwork the nodes.

May want to give https://github.com/datastax/cpp-driver a go (not a cpp guy, take with a grain of salt). I would still highly recommend using cassandra-stress instead of your own stuff if you want to test Cassandra and not your code.

===
Chris Lohfink

On Mon, Dec 8, 2014 at 4:57 AM, 孔嘉林 wrote:
> [previous reply quoted in full; trimmed]
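For comparison, a run with the stress tool Chris links might look something like this. A sketch only: the profile file name, query name, node address, and thread count are illustrative, and the flags follow the 2.1 stress tool syntax described in the linked blog post:

```shell
# Write 1M rows using a user-defined schema from a YAML profile,
# 160 client threads, against one contact node of the cluster.
cassandra-stress user profile=table.yaml "ops(insert=1)" n=1000000 \
    -rate threads=160 -node 10.0.0.1

# Read back using a query named in the profile's "queries" section
# (here assumed to be called simple1).
cassandra-stress user profile=table.yaml "ops(simple1=1)" n=1000000 \
    -rate threads=160 -node 10.0.0.1
```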