Stress test

2017-07-27 Thread Greg Lloyd
I am trying to use the cassandra stress tool with the user
profile=table.yaml arguments specified and do authentication at the same
time. If I use the user profile I get an error Invalid parameter user=* if
I specify a user and password.

Is it not possible to specify a yaml and use authentication?


Stress Test

2018-09-06 Thread rajasekhar kommineni
Hello Folks,

Can anybody recommend good documentation on Cassandra stress testing?

I have the questions below.

1) Which server is it best to run the test from, the Cassandra server or the
application server?
2) I am using the DataStax Java driver; is there any good documentation for stress
tests specific to this driver?
3) How do I analyze the stress test output?

Thanks,


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Stress test inconsistencies

2011-01-24 Thread Oleg Proudnikov
Hi All,

I am struggling to make sense of a simple stress test I ran against the latest
Cassandra 0.7. My server performs very poorly compared to a desktop and even a
notebook.

Here is the command I execute - a single-threaded insert that runs on the same
host as Cassandra does (I am using the new contrib/stress, but the old py_stress
produces similar results):

./stress -t 1 -o INSERT -c 30 -n 1 -i 1

On a SUSE Linux server with a 4-core Intel Xeon I get at most 30 inserts a
second with 40ms latency. But on a Windows desktop I get an incredible 200-260
inserts a second with 4ms latency! Even on the smallest MacBook Pro I get
bursts of high throughput - 100+ inserts a second.

Could you please help me figure out what is wrong with my server? I tried
several servers, actually, with the same results. I would appreciate any help in
tracking down the bottleneck. Configuration is the same in all tests, with the
server having the advantage of separate physical disks for commitlog and data.

Could you also share with me what numbers you get or what is reasonable to
expect from this test?

Thank you very much,
Oleg


Here is the output for the Linux server, Windows desktop and MacBook Pro, one
line per second:

Linux server - Intel Xeon X3330 @ 2.66 GHz, 4G RAM, 2G heap

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
19,19,19,0.05947368421052632,1
46,27,27,0.04274074074074074,2
70,24,24,0.04733,3
95,25,25,0.04696,4
119,24,24,0.048208333,5
147,28,28,0.04189285714285714,7
177,30,30,0.03904,8
206,29,29,0.04006896551724138,9
235,29,29,0.03903448275862069,10

Windows desktop: Core2 Duo CPU E6550 @ 2.33 GHz, 2G RAM, 1G heap

Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
147,147,147,0.005292517006802721,1
351,204,204,0.0042009803921568625,2
527,176,176,0.006551136363636364,3
718,191,191,0.005617801047120419,4
980,262,262,0.00400763358778626,5
1206,226,226,0.004150442477876107,6
1416,210,210,0.005619047619047619,7
1678,262,262,0.0040038167938931295,8

MacBook Pro: Core2 Duo CPU @ 2.26 GHz, 2G RAM, 1G heap

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
0,0,0,NaN,1
7,7,7,0.21185714285714285,2
47,40,40,0.026925,3
171,124,124,0.007967741935483871,4
258,87,87,0.01206896551724138,6
294,36,36,0.022444,7
303,9,9,0.14378,8
307,4,4,0.2455,9
313,6,6,0.128,10
508,195,195,0.007938461538461538,11
792,284,284,0.0035985915492957746,12
882,90,90,0.01219,13





Re: Stress test

2017-07-27 Thread Jay Zhuang
The user and password should be in the -mode section, for example:

./cassandra-stress user profile=table.yaml ops\(insert=1\) -mode native cql3 user=** password=**

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsCStress.html

/Jay

On 7/27/17 2:46 PM, Greg Lloyd wrote:
> I am trying to use the cassandra stress tool with the user
> profile=table.yaml arguments specified and do authentication at the same
> time. If I use the user profile I get an error Invalid parameter
> user=* if I specify a user and password.
> 
> Is it not possible to specify a yaml and use authentication?




Stress test Cassandra

2017-11-26 Thread Akshit Jain
Hi,
What is the best way to stress test a Cassandra cluster with the real-life
workloads it currently serves?
Currently I am using the cassandra-stress tool, but it generates blob data; the
yaml profiles provide the option to use a custom keyspace.

Also, what parameter values can be set to test the Cassandra cluster under
extreme conditions?


Re: Stress Test

2018-09-09 Thread Swen Moczarski
Hi,
I found this blog quite helpful:
https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/

On 1: not sure if I understand your question correctly, but I would not
start the stress test process on a Cassandra node that will be under test.
On 3: the tool already has an option to generate nice graphs:
http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html#graphing

Hope that helps.
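As a sketch, a graphing run might look like this (the file and title values below are made up; flags as in the linked documentation):

```shell
# Run a write workload and render an HTML report of the results.
# Re-running with a different 'revision' label overlays the runs in
# the same report for comparison.
cassandra-stress write n=1000000 -rate threads=50 \
    -graph file=stress-report.html title=write-test revision=baseline
```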

Am Do., 6. Sep. 2018 um 20:14 Uhr schrieb rajasekhar kommineni <
rajaco...@gmail.com>:

> Hello Folks,
>
> Does any body refer good documentation on Cassandra stress test.
>
> I have below questions.
>
> 1) Which server is good to start the test, Cassandra server or Application
> server.
> 2) I am using Datastax Java driver, is any good documentation for stress
> test specific to this driver.
> 3) How to analyze the stress test output.
>
> Thanks,
>


Re: Stress test inconsistencies

2011-01-25 Thread Tyler Hobbs
Try using something higher than -t 1, like -t 100.

- Tyler
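Using the flags from the original command, a higher-concurrency run might look like this (the thread and key counts are illustrative; raising -n also gives the JVM time to warm up):

```shell
# 100 client threads, 1,000,000 inserts of 30 columns each,
# reporting progress every 1 second.
./stress -t 100 -o INSERT -c 30 -n 1000000 -i 1
```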

On Mon, Jan 24, 2011 at 9:38 PM, Oleg Proudnikov wrote:

> [original message and benchmark output quoted in full; snipped]

Re: Stress test inconsistencies

2011-01-25 Thread Oleg Proudnikov
Tyler Hobbs  riptano.com> writes:

> Try using something higher than -t 1, like -t 100.
>
> - Tyler
>


Thank you, Tyler!

When I run contrib/stress with a higher thread count, the server does scale to
200 inserts a second with a latency of 200ms. At the same time the Windows
desktop scales to 900 inserts a second with a latency of 120ms. There is a huge
difference that I am trying to understand and eliminate.

In my real-life bulk load I have to stay with a single-threaded client for the
POC I am doing. The only option I have is to run several client processes... My
real-life load is heavier than what contrib/stress does. It takes several days
to bulk load 4 million batch mutations! It is really painful :-( Something is
just not right...

Oleg






Re: Stress test inconsistencies

2011-01-25 Thread buddhasystem

Oleg,

I'm a novice at this, but for what it's worth I can't imagine you can have a
_sustained_ 1kHz insertion rate on a single machine which also does some
reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem
to square with a typical seek time on a hard drive.

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Stress-test-inconsistencies-tp5957467p5960182.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Stress test inconsistencies

2011-01-25 Thread Brandon Williams
On Tue, Jan 25, 2011 at 1:23 PM, Oleg Proudnikov wrote:

> When I run contrib/stress with a higher thread count, the server does scale
> to
> 200 inserts a second with latency of 200ms. At the same time Windows
> desktop
> scales to 900 inserts a second and latency of 120ms. There is a huge
> difference
> that I am trying to understand and eliminate.
>

Those are really low numbers, are you still testing with 10k rows?  That's
not enough, try 1M to give both JVMs enough time to warm up.

-Brandon


Re: Stress test inconsistencies

2011-01-25 Thread Oleg Proudnikov
Brandon Williams  gmail.com> writes:

> 
> On Tue, Jan 25, 2011 at 1:23 PM, Oleg Proudnikov  cloudorange.com>
wrote:
> 
> When I run contrib/stress with a higher thread count, the server does scale to
> 200 inserts a second with latency of 200ms. At the same time Windows desktop
> scales to 900 inserts a second and latency of 120ms. There is a huge 
> difference
> that I am trying to understand and eliminate.
> 
> 
> Those are really low numbers, are you still testing with 10k rows?  That's not
enough, try 1M to give both JVMs enough time to warm up.
> 
> 
> -Brandon 
> 

I agree, Brandon, the numbers are very low! The warm-up does not seem to make
any difference though... Something is holding the server back, because CPU
usage is very low. I am trying to understand where this bottleneck is on the
Linux server. I do not think it is Cassandra's config, as I use the same
config on Windows and get much higher numbers, as I described.

Oleg




Re: Stress test inconsistencies

2011-01-25 Thread Oleg Proudnikov
buddhasystem  bnl.gov> writes:

> 
> 
> Oleg,
> 
> I'm a novice at this, but for what it's worth I can't imagine you can have a
> _sustained_ 1kHz insertion rate on a single machine which also does some
> reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem
> to square with a typical seek time on a hard drive.
> 
> Maxim
> 

Maxim,

As I understand it, during inserts Cassandra should not be constrained by random
seek time, as it uses sequential writes. I do get high numbers on Windows, but
something is holding back my Linux server. I am trying to understand what it
is.

Oleg





Re: Stress test inconsistencies

2011-01-25 Thread Anthony John
Look at iostat -x 10 10 while the active part of your test is running. There
should be a column called svc_t - it should be in the 10ms range - and
await should be low.

This will tell you whether IO is slow, or whether IO is not being issued.

Also, ensure that you aren't swapping, with something like "swapon -s"
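The two checks above, spelled out (output formats vary by platform; on Linux the service-time column is svctm rather than svc_t):

```shell
# Extended device statistics: 10-second intervals, 10 reports. Watch
# the service time and await columns for the commitlog and data disks
# while the test is running.
iostat -x 10 10

# List active swap devices; non-zero usage means the node is swapping.
swapon -s
```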

On Tue, Jan 25, 2011 at 3:04 PM, Oleg Proudnikov wrote:

> [previous exchange quoted in full; snipped]


Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
Hi All,

I was able to run contrib/stress at a very impressive throughput. A single-threaded
client was able to pump 2,000 inserts per second with 0.4ms latency. A
multithreaded client was able to pump 7,000 inserts per second with 7ms latency.

Thank you very much for your help!

Oleg




Re: Stress test inconsistencies

2011-01-26 Thread Jonathan Shook
Would you share with us the changes you made, or problems you found?

On Wed, Jan 26, 2011 at 10:41 AM, Oleg Proudnikov  wrote:
> Hi All,
>
> I was able to run contrib/stress at a very impressive throughput. Single
> threaded client was able to pump 2,000 inserts per second with 0.4 ms latency.
> Multithreaded client was able to pump 7,000 inserts per second with 7ms 
> latency.
>
> Thank you very much for your help!
>
> Oleg
>
>
>


Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
I returned to periodic commit log fsync.
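For reference, the cassandra.yaml settings involved look like this (a sketch - the period value shown is illustrative, not necessarily Oleg's actual setting):

```yaml
# cassandra.yaml commit log sync mode. 'batch' fsyncs before
# acknowledging each write, which can throttle a single-threaded
# client to disk-sync speed; 'periodic' acknowledges immediately and
# fsyncs every commitlog_sync_period_in_ms.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
```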


Jonathan Shook  gmail.com> writes:

> 
> Would you share with us the changes you made, or problems you found?
> 




Timeout during stress test

2011-04-11 Thread mcasandra
I am running a stress test using hector. In the client logs I see:

me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:32)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:256)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:227)
at
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
at
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl.doExecuteSlice(HColumnFamilyImpl.java:227)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl.getColumns(HColumnFamilyImpl.java:139)
at
com.riptano.cassandra.stress.SliceCommand.call(SliceCommand.java:48)
at
com.riptano.cassandra.stress.SliceCommand.call(SliceCommand.java:20)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: TimedOutException()
at
org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:7174)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:540)
at
org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:512)
at
me.prettyprint.cassandra.service.HColumnFamilyImpl$1.execute(HColumnFamilyImpl.java:236)


But I don't see anything in cassandra logs.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6262430.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Stress test Cassandra

2017-11-26 Thread Jonathan Haddad
Have you read through the docs for stress? You can have it use your own
queries and data model.

http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html
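As a sketch, a hypothetical user profile for more realistic data might look like this (the keyspace, table, and column specs below are invented for illustration; see the linked docs for the full schema):

```yaml
# stress-profile.yaml -- hypothetical cassandra-stress user profile
keyspace: stresscql
table: eventlog
table_definition: |
  CREATE TABLE eventlog (
    id uuid PRIMARY KEY,
    body text
  )
columnspec:
  - name: body
    size: gaussian(50..200)   # realistic text sizes instead of fixed blobs
queries:
  readone:
    cql: select * from eventlog where id = ?
```

It would be run with something like `cassandra-stress user profile=stress-profile.yaml ops\(insert=3,readone=1\)`.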
On Sun, Nov 26, 2017 at 1:02 AM Akshit Jain  wrote:

> Hi,
> What is the best way to stress test the cassandra cluster with real life
> workloads which is being followed currently?
> Currently i am using cassandra stress-tool but it generated blob data
> /yaml files provides the option to use custom keyspace.
>
> But what are the different parameters values which can be set to test the
> cassandra cluster in extreme environment?
>
>


Problems with Python Stress Test

2011-02-03 Thread Sameer Farooqui
Hi guys,

I was playing around with the stress.py test this week and noticed a few
things.

1) Progress-interval does not always work correctly. I set it to 5 in the
example below, but am instead getting varying intervals:

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python
stress.py --num-keys=10 --columns=5 --column-size=32 --operation=insert
--progress-interval=5 --threads=4 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
6662,1332,1335,0.00307796342135,5
11607,989,988,0.00476862022199,12
20297,1738,1736,0.00273238550807,18
30631,2066,2068,0.00202261635614,24
37291,1332,1331,0.00325975901372,29
47514,2044,2044,0.00193106963725,35
56618,1820,1821,0.00276346638249,41
68652,2406,2406,0.00179436958884,47
77745,1818,1820,0.00220694060007,52
87351,1921,1918,0.00236015612201,58
97167,1963,1963,0.00230505042379,64
10,566,566,0.00223569174853,66


2) The key_rate and op_rate don't seem to be calculated correctly. Also,
what is the difference between interval_key_rate and interval_op_rate? For
example, in the output above, the first row shows 6662 keys inserted in 5
seconds, and 6662 / 5 = 1332, which matches the interval_op_rate.

The second row took 7 seconds to update instead of the requested 5. However,
the interval_op_rate and interval_key_rate are being calculated based on my
requested 5 seconds instead of the actual observed 7 seconds.

(11607-6662)/5=989
(11607-6662)/7 = 706

Shouldn't it be basing the calculations off the 7 seconds?
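Recomputing the second row from the cumulative totals and the actual elapsed-time deltas bears this out (columns as in the output above: column 1 is cumulative ops, column 5 is elapsed seconds):

```shell
# The true interval rate is delta(ops) / delta(seconds):
# (11607 - 6662) / (12 - 5) = 706, not the reported 989.
printf '6662,1332,1335,0.003,5\n11607,989,988,0.004,12\n' |
  awk -F, 'NR > 1 { printf "%d\n", ($1 - prev) / ($5 - prevt) }
           { prev = $1; prevt = $5 }'
```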


3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
grow by x after the test. In the example below I tried to write 500,000 keys
* 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
checked the amount of disk space used after the test, it had actually grown by
2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because the commit log
holds a duplicate copy of the data in addition to the SSTables?

Also, notice how the progress interval got thrown off after 40 seconds.


techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root
   7583436   2515864   4682344  35% /
none633244   208633036   1% /dev
none640368 0640368   0% /dev/shm
none64036856640312   1% /var/run
none640368 0640368   0% /var/lock
/dev/sda1   233191 20601200149  10% /boot

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python
stress.py --num-keys=50 --columns=5 --operation=insert
--progress-interval=5 --threads=1 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
15562,3112,3112,0.000300011955333,5
31643,3216,3216,0.000290757187504,10
42968,2265,2265,0.000423845265875,15
54071,2220,2220,0.000430288759747,20
66491,2484,2484,0.000382423304897,25
79891,2680,2680,0.000351728307667,30
91758,2373,2373,0.000402696775367,35
102179,2084,2084,0.000461982612291,40
114003,2364,2364,0.000403893998092,46
126509,2501,2501,0.000379724634489,51
138047,2307,2307,0.000414365229356,56
150261,2442,2442,0.000390332772296,61
164019,2751,2751,0.000343320345113,66
175390,2274,2274,0.000421584286756,71
186564,2234,2234,0.000429319251473,76
198292,2345,2345,0.00040838057315,81
210186,2378,2378,0.000400560030882,87
225144,2991,2991,0.000314564943345,92
236474,2266,2266,0.000422214746265,97
249940,2693,2693,0.000349487200297,102
264410,2894,2894,0.00030166366303,107
275429,2203,2203,0.000464002475276,112
286430,2200,2200,0.00043832517821,117
299217,2557,2557,0.000371891478764,122
313800,2916,2916,0.000322412596002,128
325252,2290,2290,0.000417413284343,133
336031,2155,2155,0.000445155976201,138
347257,2245,2245,0.000426658924816,143
357493,2047,2047,0.000472509730556,148
372151,2931,2931,0.000321278794594,153
384655,2500,2500,0.000381667455343,158
395604,2189,2189,0.000439286896144,163
409713,2821,2821,0.000334938358759,168
423162,2689,2689,0.000351835071877,174
434276,,,0.000432009316829,179
444809,2106,2106,0.00045844612893,184
458190,2676,2676,0.000353130326037,189
470852,2532,2532,0.000374360740552,194
481333,2096,2096,0.000462788910416,199
492458,2225,2225,0.000431290422932,204
50,1508,1508,0.000353647808408,207


techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root
   7583436   2684920   4513288  38% /
none633244   208633036   1% /dev
none640368 0640368   0% /dev/shm
none64036856640312   1% /var/run
none640368 0640368   0% /var/lock
/dev/sda1   233191  

CF config for Stress Test

2011-04-08 Thread mcasandra
I am starting a stress test using hector on a 6 node cluster, each node with a
4GB heap and 12 cores. In the hector README this is what I got by default:

create keyspace StressKeyspace
with replication_factor = 3
and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

use StressKeyspace;
drop column family StressStandard;
create column family StressStandard
with comparator = UTF8Type
and keys_cached = 1
and memtable_flush_after = 1440
and memtable_throughput = 32;

Are these good values? I was thinking of a higher keys_cached, but am not sure
if it's in bytes or number of keys.

Also, I am not sure how to tune the memtable values.

I have set concurrent_readers to 32 and writers to 48.

Can someone please help me with good values that I can start this test with?

Also, any other suggested values that I need to change?

Thanks
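For reference, in the 0.7-era schema keys_cached is a number of keys when given as an absolute value (a value below 1 is read as a fraction of the keys, not bytes). A hypothetical tweak, with illustrative values:

```
create column family StressStandard
    with comparator = UTF8Type
    and keys_cached = 200000
    and memtable_flush_after = 1440
    and memtable_throughput = 64;
```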

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CF-config-for-Stress-Test-tp6255608p6255608.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread mcasandra
I see this occurring often when all Cassandra nodes suddenly show a CPU
spike. All reads fail for about 2 minutes. GC.log and system.log don't reveal
much.

The only thing I notice is that when I restart nodes there are tons of files
that get deleted. cfstats from one of the nodes looks like this:

nodetool -h `hostname` tpstats
Pool NameActive   Pending  Completed
ReadStage2727  21491
RequestResponseStage  0 0 201641
MutationStage 0 0 236513
ReadRepairStage   0 0   7222
GossipStage   0 0  31498
AntiEntropyStage  0 0  0
MigrationStage0 0  0
MemtablePostFlusher   0 0324
StreamStage   0 0  0
FlushWriter   0 0324
FILEUTILS-DELETE-POOL 0 0   1220
MiscStage 0 0  0
FlushSorter   0 0  0
InternalResponseStage 0 0  0
HintedHandoff 1 3  9

--


Keyspace: StressKeyspace
Read Count: 21957
Read Latency: 46.91765058978913 ms.
Write Count: 222104
Write Latency: 0.008302124230090408 ms.
Pending Tasks: 0
Column Family: StressStandard
SSTable count: 286
Space used (live): 377916657941
Space used (total): 377916657941
Memtable Columns Count: 362
Memtable Data Size: 164403613
Memtable Switch Count: 326
Read Count: 21958
Read Latency: 631.464 ms.
Write Count: 222104
Write Latency: 0.007 ms.
Pending Tasks: 0
Key cache capacity: 100
Key cache size: 22007
Key cache hit rate: 0.002453626459907744
Row cache: disabled
Compacted row minimum size: 87
Compacted row maximum size: 5839588
Compacted row mean size: 552698




--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263087.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread aaron morton
TimedOutException means the cluster could not perform the request within the
rpc_timeout time. The client should retry, as the problem may be transitory.

In this case read performance may have slowed down due to the number of
sstables (286). It's hard to tell without knowing what the workload is.

Aaron

On 12 Apr 2011, at 09:56, mcasandra wrote:

> [tpstats and cfstats output quoted in full; snipped]



Re: Timeout during stress test

2011-04-11 Thread mcasandra
It looks like hector did retry on all the nodes and failed. Does this then
mean cassandra is down for clients in this scenario? That would be bad.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263270.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread aaron morton
It means the cluster is currently overloaded and unable to complete requests in 
time at the CL specified. 

Aaron

On 12 Apr 2011, at 11:18, mcasandra wrote:

> It looks like hector did retry on all the nodes and failed. Does this then
> mean cassandra is down for clients in this scenario? That would be bad.
> 



Re: Timeout during stress test

2011-04-11 Thread mcasandra
But I don't understand the reason for the overload. It was doing a simple read
with 12 threads reading 5 rows. Avg CPU was only 20%, with no GC issues that I
can see. I would expect Cassandra to be able to process more with 6 nodes, 12
cores, 96 GB RAM and a 4 GB heap.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263470.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-11 Thread aaron morton
You'll need to provide more information; from the tpstats, the read stage could
not keep up. If the node is not CPU bound then it is probably IO bound.


What sort of read?
How many columns was it asking for ? 
How many columns do the rows have ?
Was the test asking for different rows ?
How many ops requests per second did it get up to?
What do the io stats look like ? 
What does nodetool cfhistograms say ?

Aaron

On 12 Apr 2011, at 13:02, mcasandra wrote:

> But I don't understand the reason for oveload. It was doing simple read of 12
> threads and reasing 5 rows. Avg CPU only 20%, No GC issues that I see. I
> would expect cassandra to be able to process more with 6 nodes, 12 core, 96
> GB RAM and 4 GB heap.
> 



Re: Timeout during stress test

2011-04-11 Thread Terje Marthinussen
I notice you have pending hinted handoffs?

Look for errors related to that. We have seen occasional corruptions in the
hinted handoff sstables,

If you are stressing the system to its limits, you may also consider
experimenting more with the number of read/write threads (concurrent_reads /
concurrent_writes), as well as rate limiting the number of requests each node
can receive (throttle limit).

We have seen similar issue when sending large number of requests to a
cluster (read/write threads running out, timeouts, nodes marked as down).

Terje


On Tue, Apr 12, 2011 at 9:56 AM, aaron morton wrote:

> It means the cluster is currently overloaded and unable to complete
> requests in time at the CL specified.
>
> Aaron
>
> On 12 Apr 2011, at 11:18, mcasandra wrote:
>
> > It looks like hector did retry on all the nodes and failed. Does this
> then
> > mean cassandra is down for clients in this scenario? That would be bad.
> >
>
>


Re: Timeout during stress test

2011-04-11 Thread mcasandra

aaron morton wrote:
> 
> You'll need to provide more information, from the TP stats the read stage
> could not keep up. If the node is not CPU bound then it is probably IO
> bound. 
> 
> 
> What sort of read?
> How many columns was it asking for ? 
> How many columns do the rows have ?
> Was the test asking for different rows ?
> How many ops requests per second did it get up to?
> What do the io stats look like ? 
> What does nodetool cfhistograms say ?
> 
It's a simple read of 1M rows with one column of avg size 200K. Got around
70 req per sec.

Not sure how to interpret the iostat output with things happening
asynchronously in Cassandra. Can you give a little description of how to
interpret it?

I have posted the output of cfstats. Does cfhistograms provide better info?


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263859.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Timeout during stress test

2011-04-12 Thread aaron morton
Couple of hits here, one from jonathan and some previous discussions on the 
user list http://www.google.co.nz/search?q=cassandra+iostat

Same here for cfhistograms 
http://www.google.co.nz/search?q=cassandra+cfhistograms 
cfhistograms includes information on the number of sstables read during recent 
requests. As your initial cfstats showed 236 sstables I thought it may be 
useful to see if there was a high number of sstables being accessed per read. 

70 requests per second is slow against a 6 node cluster where each node has 12 
cores and 96GB of ram. Something is not right.
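For readers following along: the SSTables column of cfhistograms is a histogram of how many sstables each recent read touched, so a weighted average falls out directly. This helper is illustrative only (the function name and sample numbers are mine, not from the thread):

```python
def avg_sstables_per_read(histogram):
    """histogram: list of (sstables_touched, read_count) pairs, i.e. the
    offset and SSTables columns of nodetool cfhistograms output."""
    total_reads = sum(count for _, count in histogram)
    if total_reads == 0:
        return 0.0
    return sum(touched * count for touched, count in histogram) / total_reads

# Invented example: most reads hit 1 sstable, a small tail hits 8.
hist = [(1, 9000), (2, 800), (8, 200)]
print(avg_sstables_per_read(hist))  # 1.22
```

A rising average here is the symptom aaron is probing for: reads fanning out across many sstables, which compaction should be keeping in check.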

Aaron 
On 12 Apr 2011, at 17:11, mcasandra wrote:

> 
> aaron morton wrote:
>> 
>> You'll need to provide more information, from the TP stats the read stage
>> could not keep up. If the node is not CPU bound then it is probably IO
>> bound. 
>> 
>> 
>> What sort of read?
>> How many columns was it asking for ? 
>> How many columns do the rows have ?
>> Was the test asking for different rows ?
>> How many ops requests per second did it get up to?
>> What do the io stats look like ? 
>> What does nodetool cfhistograms say ?
>> 
> It's a simple read of 1M rows with one column of average size 200K. Got around
> 70 requests per sec.
> 
> Not sure how to interpret the iostat output with things happening async in
> Cassandra. Can you give a little description of how to interpret it?
> 
> I have posted output of cfstats. Does cfhistograms provide better info?
> 
> 
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263859.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.



Re: Timeout during stress test

2011-04-12 Thread mcasandra
0 0 0
17084    690  0      0  0
20501    960  0      0  0
24601   1272  0      0  0
29521   1734  0      0  0
35425   2262  0      0  0
42510   2734  0      0  0
51012   3098  0      0  0
61214   3426  0      0  0
73457   3879  0      0  0
88148   4157  0      0  0
105778  4065  0      0  0
126934  3804  0      0  0
152321  2828  0      0  0
182785  1699  0      0  0
219342   821  0      0  0
263210   300  0 249214  0
315852    88  0 149731  0
379022    12  0      0  0
454826     3  0      0  0
545791     0  0      0  0
654949     0  0      0  0
785939     0  0      0  0
943127     0  0  74915  0
1131752    0  0      0  0
1358102    0  0      0  0
1629722    0  0      0  0
1955666    0  0      0  0
2346799    0  0      0  0
2816159    0  0      0  0
3379391    0  0  22438  0
4055269    0  0      0  0
4866323    0  0      0  0
5839588    0  0   2559  0
7007506    0  0      0  0
8409007    0  0      0  0
10090808   0  0      0  0
12108970   0  0      0  0
14530764   0  0      0  0
17436917   0  0      0  0
20924300   0  0      0  0
25109160   0  0      0  0


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6265925.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Cassandra Stress Test Result Evaluation

2015-03-04 Thread Nisha Menon
I have been using the cassandra-stress tool to evaluate my cassandra
cluster for quite some time now. My problem is that I am not able to
comprehend the results generated for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
  ID uuid,
  Time timestamp,
  Value double,
  Date timestamp,
  PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;

I have parsed this information in a custom yaml file and used parameters
n=10000, threads=100 and the rest are default options (cl=one, mode=native
cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.

A few specifics of the custom yaml file are as follows:

insert:
partitions: fixed(100)
select: fixed(1)/2
batchtype: UNLOGGED

columnspecs:
-name: Time
 size: fixed(1000)
-name: ID
 size: uniform(1..100)
-name: Date
 size: uniform(1..10)
-name: Value
 size: uniform(-100..100)

My observations so far are as follows (Please correct me if I am wrong):

   1. With n=10000 and time: fixed(1000), the number of rows getting
   inserted is 10 million. (10000*1000=10000000)
   2. The number of row-keys/partitions is 10000 (i.e. n), within which 100
   partitions are taken at a time (which means 100*1000 = 100000 key-value
   pairs) out of which 50000 key-value pairs are processed at a time. (This is
   because of select: fixed(1)/2 ~ 50%)

The output message also confirms the same:

Generating batches with [100..100] partitions and [50000..50000] rows
(of [100000..100000] total rows in the partitions)
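The batch arithmetic can be checked directly (an illustrative helper; the function name is mine): with Time: fixed(1000) giving 1000 rows per partition, partitions: fixed(100), and select: fixed(1)/2 taking half the rows, each batch carries 100 × 1000 / 2 = 50,000 rows.

```python
def rows_per_batch(partitions_per_batch, rows_per_partition, num, den):
    """Rows visited per stress op: the selected fraction (num/den)
    of all rows across the partitions in the batch."""
    return partitions_per_batch * rows_per_partition * num // den

# partitions: fixed(100), columnspec Time size fixed(1000), select: fixed(1)/2
print(rows_per_batch(100, 1000, 1, 2))  # 50000
```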

The results that I get are the following for consecutive runs with the same
configuration as above:

Run Total_ops   Op_rate Partition_rate  Row_Rate   Time
1 56   19 1885   943246 3.0
2 46   46 4648  2325498 1.0
3 27   30 2982  1489870 0.9
4 59   19 1932   966034 3.1
5 100  17 1730   865182 5.8

Now what I need to understand are as follows:

   1. Which among these metrics is the throughput i.e, No. of records
   inserted per second? Is it the Row_rate, Op_rate or Partition_rate? If it’s
   the Row_rate, can I safely conclude here that I am able to insert close to
   1 million records per second? Any thoughts on what the Op_rate and
   Partition_rate mean in this case?
   2. Why is it that the Total_ops vary so drastically in every run ? Has
   the number of threads got anything to do with this variation? What can I
   conclude here about the stability of my Cassandra setup?
   3. How do I determine the batch size per thread here? In my example, is
   the batch size 50000?

Thanks in advance.


Re: Problems with Python Stress Test

2011-02-03 Thread Brandon Williams
On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui wrote:

> Hi guys,
>
> I was playing around with the stress.py test this week and noticed a few
> things.
>
> 1) Progress-interval does not always work correctly. I set it to 5 in the
> example below, but am instead getting varying intervals:
>

Generally indicates that the client machine is being overloaded in my
experience.

2) The key_rate and op_rate doesn't seem to be calculated correctly. Also,
> what is the difference between the interval_key_rate and the
> interval_op_rate? For example in the example above, the first row shows 6662
> keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the
> interval_op_rate.
>

There should be no difference unless you're doing range slices, but IPC
timing makes them vary somewhat.

3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
> grow by x after the test. In the example below I tried to write 500,000 keys
> * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
> checked the amount of disk space used after the test it actually grew by
> 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the
> commit log got duplicate copies of the data as the SSTables?
>

Commitlogs could be part of it, you're not factoring in the column names,
and then there's index and bloom filter overhead.

Use contrib/stress on 0.7 instead.

-Brandon
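Brandon's overhead point can be made concrete with the numbers from the question. This is a back-of-the-envelope check only; it deliberately ignores exactly the items he lists (column names, indexes, bloom filters, commitlog copies):

```python
# Raw payload written by the py_stress run described above.
keys = 500_000
value_bytes = 32
columns = 5

payload_kb = keys * value_bytes * columns // 1024
observed_growth_kb = 2_684_920 - 2_515_864

print(payload_kb)                                  # 78125
print(observed_growth_kb)                          # 169056
print(round(observed_growth_kb / payload_kb, 2))   # 2.16
```

So the on-disk footprint is roughly 2x the raw payload here, which is plausible once per-column names, indexes, bloom filters, and the commitlog copy are counted.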


Re: Problems with Python Stress Test

2011-02-04 Thread Sameer Farooqui
Brandon,

Thanks for the response. I have also noticed that stress.py's progress
interval gets thrown off in low memory situations.

What did you mean by "contrib/stress on 0.7 instead".  I don't see that dir
in the src version of 0.7.

- Sameer


On Thu, Feb 3, 2011 at 5:22 PM, Brandon Williams  wrote:

> On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui 
> wrote:
>
>> Hi guys,
>>
>> I was playing around with the stress.py test this week and noticed a few
>> things.
>>
>> 1) Progress-interval does not always work correctly. I set it to 5 in the
>> example below, but am instead getting varying intervals:
>>
>
> Generally indicates that the client machine is being overloaded in my
> experience.
>
> 2) The key_rate and op_rate doesn't seem to be calculated correctly. Also,
>> what is the difference between the interval_key_rate and the
>> interval_op_rate? For example in the example above, the first row shows 6662
>> keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the
>> interval_op_rate.
>>
>
> There should be no difference unless you're doing range slices, but IPC
> timing makes them vary somewhat.
>
> 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
>> grow by x after the test. In the example below I tried to write 500,000 keys
>> * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
>> checked the amount of disk space used after the test it actually grew by
>> 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the
>> commit log got duplicate copies of the data as the SSTables?
>>
>
> Commitlogs could be part of it, you're not factoring in the column names,
> and then there's index and bloom filter overhead.
>
> Use contrib/stress on 0.7 instead.
>
> -Brandon
>


Re: Problems with Python Stress Test

2011-02-05 Thread Brandon Williams
On Fri, Feb 4, 2011 at 5:23 PM, Sameer Farooqui wrote:

> Brandon,
>
> Thanks for the response. I have also noticed that stress.py's progress
> interval gets thrown off in low memory situations.
>
> What did you mean by "contrib/stress on 0.7 instead".  I don't see that dir
> in the src version of 0.7.


Looks like it didn't make it in 0.7.0.  It will be in 0.7.1, or you can get
it from svn.

-Brandon


Re: CF config for Stress Test

2011-04-09 Thread aaron morton
If you just want to benchmark the cluster it won't matter too much, though I 
would set keys_cached to 0 and increase memtable throughput to 64 or 128. If 
you are testing to get a better idea for your app then use similar settings to 
your app. 

keys_cached is the number of keys, not bytes.

For concurrent_readers and concurrent_writers see the comments in 
conf/cassandra.yaml.

I could not find this KS definition in the hector code base so not sure why 
they chose those values. 

Aaron
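Applied to the Hector schema quoted below, aaron's suggestions would look something like this (an illustrative 0.7-era cassandra-cli fragment; only keys_cached and memtable_throughput change from the quoted definition):

```
create column family StressStandard
    with comparator = UTF8Type
    and keys_cached = 0
    and memtable_flush_after = 1440
    and memtable_throughput = 64;
```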
 
On 9 Apr 2011, at 11:10, mcasandra wrote:

> I am starting a stress test using hector on 6 node machine 4GB heap and 12
> core. In hectore readme this is what I got by default:
> 
> create keyspace StressKeyspace
>with replication_factor = 3
>and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
> 
> use StressKeyspace;
> drop column family StressStandard;
> create column family StressStandard
>with comparator = UTF8Type
>and keys_cached = 1
>and memtable_flush_after = 1440
>and memtable_throughput = 32;
> 
> Are these good values? I was thinking of highher keys_cached but not sure if
> it's in bytes or no of keys.
> 
> Also not sure how to tune memtable values.
> 
> I have set concurrent_readers to 32 and writers to 48.
> 
> Can someone please help me with good values that I can start this test with?
> 
> Also, any other suggested values that I need to change?
> 
> Thanks
> 
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CF-config-for-Stress-Test-tp6255608p6255608.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.



Fwd: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Nisha Menon
I have been using the cassandra-stress tool to evaluate my cassandra
cluster for quite some time now. My problem is that I am not able to
comprehend the results generated for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
  ID uuid,
  Time timestamp,
  Value double,
  Date timestamp,
  PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;

I have parsed this information in a custom yaml file and used parameters
n=10000, threads=100 and the rest are default options (cl=one, mode=native
cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.

A few specifics of the custom yaml file are as follows:

insert:
partitions: fixed(100)
select: fixed(1)/2
batchtype: UNLOGGED

columnspecs:
-name: Time
 size: fixed(1000)
-name: ID
 size: uniform(1..100)
-name: Date
 size: uniform(1..10)
-name: Value
 size: uniform(-100..100)

My observations so far are as follows (Please correct me if I am wrong):

   1. With n=10000 and time: fixed(1000), the number of rows getting
   inserted is 10 million. (10000*1000=10000000)
   2. The number of row-keys/partitions is 10000 (i.e. n), within which 100
   partitions are taken at a time (which means 100*1000 = 100000 key-value
   pairs) out of which 50000 key-value pairs are processed at a time. (This is
   because of select: fixed(1)/2 ~ 50%)

The output message also confirms the same:

Generating batches with [100..100] partitions and [50000..50000] rows
(of [100000..100000] total rows in the partitions)

The results that I get are the following for consecutive runs with the same
configuration as above:

Run Total_ops   Op_rate Partition_rate  Row_Rate   Time
1 56   19 1885   943246 3.0
2 46   46 4648  2325498 1.0
3 27   30 2982  1489870 0.9
4 59   19 1932   966034 3.1
5 100  17 1730   865182 5.8

Now what I need to understand are as follows:

   1. Which among these metrics is the throughput i.e, No. of records
   inserted per second? Is it the Row_rate, Op_rate or Partition_rate? If it’s
   the Row_rate, can I safely conclude here that I am able to insert close to
   1 million records per second? Any thoughts on what the Op_rate and
   Partition_rate mean in this case?
   2. Why is it that the Total_ops vary so drastically in every run ? Has
   the number of threads got anything to do with this variation? What can I
   conclude here about the stability of my Cassandra setup?
   3. How do I determine the batch size per thread here? In my example, is
   the batch size 50000?

Thanks in advance.



-- 
Nisha Menon
BTech (CS) Sahrdaya CET,
MTech (CS) IIIT Banglore.


Re: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Jake Luciani
Your insert settings look unrealistic since I doubt you would be
writing 50k rows at a time.  Try to set this to 1 per partition and
you should get much more consistent numbers across runs I would think.
select: fixed(1)/10

On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon  wrote:
> I have been using the cassandra-stress tool to evaluate my cassandra cluster
> for quite some time now. My problem is that I am not able to comprehend the
> results generated for my specific use case.
>
> My schema looks something like this:
>
> CREATE TABLE Table_test(
>   ID uuid,
>   Time timestamp,
>   Value double,
>   Date timestamp,
>   PRIMARY KEY ((ID,Date), Time)
> ) WITH COMPACT STORAGE;
>
> I have parsed this information in a custom yaml file and used parameters
> n=10000, threads=100 and the rest are default options (cl=one, mode=native
> cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.
>
> A few specifics of the custom yaml file are as follows:
>
> insert:
> partitions: fixed(100)
> select: fixed(1)/2
> batchtype: UNLOGGED
>
> columnspecs:
> -name: Time
>  size: fixed(1000)
> -name: ID
>  size: uniform(1..100)
> -name: Date
>  size: uniform(1..10)
> -name: Value
>  size: uniform(-100..100)
>
> My observations so far are as follows (Please correct me if I am wrong):
>
> With n=10000 and time: fixed(1000), the number of rows getting inserted is
> 10 million. (10000*1000=10000000)
> The number of row-keys/partitions is 10000 (i.e. n), within which 100
> partitions are taken at a time (which means 100*1000 = 100000 key-value
> pairs) out of which 50000 key-value pairs are processed at a time. (This is
> because of select: fixed(1)/2 ~ 50%)
>
> The output message also confirms the same:
>
> Generating batches with [100..100] partitions and [50000..50000] rows
> (of [100000..100000] total rows in the partitions)
>
> The results that I get are the following for consecutive runs with the same
> configuration as above:
>
> Run Total_ops   Op_rate Partition_rate  Row_Rate   Time
> 1 56   19 1885   943246 3.0
> 2 46   46 4648  2325498 1.0
> 3 27   30 2982  1489870 0.9
> 4 59   19 1932   966034 3.1
> 5 100  17 1730   865182 5.8
>
> Now what I need to understand are as follows:
>
> Which among these metrics is the throughput i.e, No. of records inserted per
> second? Is it the Row_rate, Op_rate or Partition_rate? If it’s the Row_rate,
> can I safely conclude here that I am able to insert close to 1 million
> records per second? Any thoughts on what the Op_rate and Partition_rate mean
> in this case?
> Why is it that the Total_ops vary so drastically in every run ? Has the
> number of threads got anything to do with this variation? What can I
> conclude here about the stability of my Cassandra setup?
> How do I determine the batch size per thread here? In my example, is the
> batch size 50000?
>
> Thanks in advance.



-- 
http://twitter.com/tjake


Stress test using Java-based stress utility

2011-07-21 Thread Nilabja Banerjee
Hi All,

I am following this link "http://www.datastax.com/docs/0.7/utilities/stress_java"
for a stress test. I am getting this notification after running this command:

xxx.xxx.xxx.xx = my ip

contrib/stress/bin/stress -d xxx.xxx.xxx.xx

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
Operation [44] retried 10 times - error inserting key 044
((UnavailableException))

Operation [49] retried 10 times - error inserting key 049
((UnavailableException))

Operation [7] retried 10 times - error inserting key 007
((UnavailableException))

Operation [6] retried 10 times - error inserting key 006
((UnavailableException))


Any idea why I am getting these things?


Thank You


Different Load values after stress test runs....

2011-08-23 Thread Chris Marino
Hi, we're running some performance tests against some clusters and I'm
curious about some of the numbers I see.

I'm running the stress test against two identically configured clusters, but
after I run at stress test, I get different Load values across the
clusters?

The difference between the two clusters is that one uses standard EC2
interfaces, but the other runs on a virtual network. Are these differences
indicating something that I should be aware of??

Here is a sample of the kinds of results I'm seeing.

Address DC  RackStatus State   LoadOwns
   Token

 12760588759xxx
10.0.0.17   DC1 RAC1Up Normal  94 MB
25.00%  0
10.0.0.18   DC1 RAC1Up Normal  104.52 MB
25.00%  42535295865xxx
10.0.0.19   DC1 RAC1Up Normal  78.58 MB
 25.00%  85070591730xxx
10.0.0.20   DC1 RAC1Up Normal  78.58 MB
 25.00%  12760588759xxx

Address DC  RackStatus State   LoadOwns
   Token

12760588759xxx
10.120.35.52DC1 RAC1Up Normal  103.74 MB
25.00%  0
10.120.6.124DC1 RAC1Up Normal  118.99 MB
25.00%  42535295865xxx
10.127.90.142   DC1 RAC1Up Normal  104.26 MB
25.00%  85070591730xxx
10.94.69.237DC1 RAC1Up Normal  75.74 MB
 25.00%  12760588759xxx

The first cluster with the vNet (10.0.0.0/28 addresses) consistently show
smaller Load values. The total Load of 355MB vs. 402MB with native EC2
interfaces?? Is a total Load value even meaningful?? The stress test is the
very first thing that's run against the clusters.

[I'm also a little puzzled that these numbers are not uniform within the
clusters, but I suspect that's because the stress test is using a key
distribution that is Gaussian.  I'm not 100% sure of this either since I've
seen conflicting documentation. Haven't tried 'random' keys, but I presume
that would change them to be uniform]

Except for these curious Load numbers, things seem to be running just fine.
Getting good fast results. Over 10 iterations I'm getting more than 10-12K
inserts per sec. (default values for the stress test).

Should I expect the Load to be the same across different clusters?? What
might explain the differences I'm seeing???

Thanks in advance.
CM
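The bracketed speculation can be checked in miniature: a Gaussian key distribution re-hits the same hot keys, so fewer distinct rows get written (overwrites instead of new data) and the on-disk Load comes out lower and lumpier than the op count suggests. This is an illustrative sketch only, not the stress tool's actual key generator:

```python
import random

random.seed(42)
N = 100_000

# 'gauss' clusters key ids around the middle of the range (like the old
# py_stress default); 'random' draws them uniformly.
gauss_keys = {int(random.gauss(N / 2, N / 6)) for _ in range(N)}
uniform_keys = {random.randrange(N) for _ in range(N)}

# Gaussian sampling produces noticeably fewer distinct keys for the same
# number of inserts, i.e. less distinct data on disk.
print(len(gauss_keys), len(uniform_keys))
```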


Re: Stress test using Java-based stress utility

2011-07-21 Thread Kirk True

  
  
Have you checked the logs on the nodes to see if there are any errors?

On 7/21/11 10:43 PM, Nilabja Banerjee wrote:
> Hi All,
>
> I am following this link "http://www.datastax.com/docs/0.7/utilities/stress_java"
> for a stress test. I am getting this notification after running this command:
>
> xxx.xxx.xxx.xx = my ip
>
> contrib/stress/bin/stress -d xxx.xxx.xxx.xx
>
> Created keyspaces. Sleeping 1s for propagation.
> total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> Operation [44] retried 10 times - error inserting key 044 ((UnavailableException))
> Operation [49] retried 10 times - error inserting key 049 ((UnavailableException))
> Operation [7] retried 10 times - error inserting key 007 ((UnavailableException))
> Operation [6] retried 10 times - error inserting key 006 ((UnavailableException))
>
> Any idea why I am getting these things?
>
> Thank You

-- 
Kirk True
Founder, Principal Engineer
Expert Engineering Firepower


Re: Stress test using Java-based stress utility

2011-07-22 Thread aaron morton
UnavailableException is raised server side when there is less than CL nodes UP 
when the request starts. 

It seems odd to get it in this case because the default replication factor used 
by stress test is 1. How many nodes do you have and have you made any changes 
to the RF ?
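The rule aaron describes can be stated in one line: the coordinator raises UnavailableException up front if fewer than CL replicas are alive. An illustrative sketch of that check (names are mine, not Cassandra source):

```python
CL = {"ONE": 1, "TWO": 2, "THREE": 3}

def is_available(live_replicas, consistency):
    """True if enough replicas are up to even attempt the request.
    (QUORUM would be rf // 2 + 1; omitted here for brevity.)"""
    return live_replicas >= CL[consistency]

# RF=1 with the single replica down -> UnavailableException even at CL ONE:
print(is_available(0, "ONE"))  # False
print(is_available(1, "ONE"))  # True
```

That is why a single-node setup with default RF=1 seeing this error usually means the node itself is flapping or overloaded, not a consistency misconfiguration.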

Also check the server side logs as Kirk says. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 18:37, Kirk True wrote:

> Have you checked the logs on the nodes to see if there are any errors?
> 
> On 7/21/11 10:43 PM, Nilabja Banerjee wrote:
>> 
>> Hi All,
>> 
>> I am following this following link " 
>> http://www.datastax.com/docs/0.7/utilities/stress_java " for a stress test. 
>> I am getting this notification after running this command 
>> 
>> xxx.xxx.xxx.xx= my ip
>> contrib/stress/bin/stress -d xxx.xxx.xxx.xx
>> 
>> Created keyspaces. Sleeping 1s for propagation.
>> total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
>> Operation [44] retried 10 times - error inserting key 044 
>> ((UnavailableException))
>> 
>> Operation [49] retried 10 times - error inserting key 049 
>> ((UnavailableException))
>> 
>> Operation [7] retried 10 times - error inserting key 007 
>> ((UnavailableException))
>> 
>> Operation [6] retried 10 times - error inserting key 006 
>> ((UnavailableException))
>> 
>> 
>> Any idea why I am getting these things?
>> 
>> 
>> Thank You
>> 
>> 
>> 
>> 
> 
> -- 
> Kirk True 
> Founder, Principal Engineer 
> 
>  
> 
> Expert Engineering Firepower 
> 
> About us:  



Re: Stress test using Java-based stress utility

2011-07-22 Thread Nilabja Banerjee
Running only one node. I don't think it is coming from the replication
factor... I will try to sort this out. Any other suggestions from your
side are always helpful.

:) Thank you



On 22 July 2011 14:36, aaron morton  wrote:

> UnavailableException is raised server side when there is less than CL nodes
> UP when the request starts.
>
> It seems odd to get it in this case because the default replication factor
> used by stress test is 1. How many nodes do you have and have you made any
> changes to the RF ?
>
> Also check the server side logs as Kirk says.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22 Jul 2011, at 18:37, Kirk True wrote:
>
>  Have you checked the logs on the nodes to see if there are any errors?
>
> On 7/21/11 10:43 PM, Nilabja Banerjee wrote:
>
> Hi All,
>
> I am following this following link " *
> http://www.datastax.com/docs/0.7/utilities/stress_java *" for a stress
> test. I am getting this notification after running this command
>
> *xxx.xxx.xxx.xx= my ip*
>
> *contrib/stress/bin/stress -d xxx.xxx.xxx.xx*
>
> *Created keyspaces. Sleeping 1s for propagation.
> total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> Operation [44] retried 10 times - error inserting key 044
> ((UnavailableException))
>
> Operation [49] retried 10 times - error inserting key 049
> ((UnavailableException))
>
> Operation [7] retried 10 times - error inserting key 007
> ((UnavailableException))
>
> Operation [6] retried 10 times - error inserting key 006
> ((UnavailableException))
> *
>
>
> *Any idea why I am getting these things?*
>
>
> *Thank You
> *
>
>
> *
> *
>
>
> --
> Kirk True
> Founder, Principal Engineer
>
>  <http://www.mustardgrain.com/>
>
> *Expert Engineering Firepower*
>
> About us:  <http://www.twitter.com/mustardgraininc>
>  <http://www.linkedin.com/company/mustard-grain-inc.>
>
>
>


Re: Stress test using Java-based stress utility

2011-07-22 Thread Jonathan Ellis
What does nodetool ring say?

On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee
 wrote:
> Hi All,
>
> I am following this following link "
> http://www.datastax.com/docs/0.7/utilities/stress_java " for a stress test.
> I am getting this notification after running this command
>
> xxx.xxx.xxx.xx= my ip
>
> contrib/stress/bin/stress -d xxx.xxx.xxx.xx
>
> Created keyspaces. Sleeping 1s for propagation.
> total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> Operation [44] retried 10 times - error inserting key 044
> ((UnavailableException))
>
> Operation [49] retried 10 times - error inserting key 049
> ((UnavailableException))
>
> Operation [7] retried 10 times - error inserting key 007
> ((UnavailableException))
>
> Operation [6] retried 10 times - error inserting key 006
> ((UnavailableException))
>
> Any idea why I am getting these things?
>
> Thank You
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Stress test using Java-based stress utility

2011-07-26 Thread CASSANDRA learner
Hi,
I too want to know what this stress tool does. What is the usage of this tool?
Please explain.

On Fri, Jul 22, 2011 at 6:39 PM, Jonathan Ellis  wrote:

> What does nodetool ring say?
>
> On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee
>  wrote:
> > Hi All,
> >
> > I am following this following link "
> > http://www.datastax.com/docs/0.7/utilities/stress_java " for a stress
> test.
> > I am getting this notification after running this command
> >
> > xxx.xxx.xxx.xx= my ip
> >
> > contrib/stress/bin/stress -d xxx.xxx.xxx.xx
> >
> > Created keyspaces. Sleeping 1s for propagation.
> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> > Operation [44] retried 10 times - error inserting key 044
> > ((UnavailableException))
> >
> > Operation [49] retried 10 times - error inserting key 049
> > ((UnavailableException))
> >
> > Operation [7] retried 10 times - error inserting key 007
> > ((UnavailableException))
> >
> > Operation [6] retried 10 times - error inserting key 006
> > ((UnavailableException))
> >
> > Any idea why I am getting these things?
> >
> > Thank You
> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Stress test using Java-based stress utility

2011-07-26 Thread aaron morton
It's in the source distribution under tools/stress; see the instructions in the 
README file and then look at the command line help (bin/stress --help). 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 19:40, CASSANDRA learner wrote:

> Hi,,
> I too wanna know what this stress tool do? What is the usage of this tool... 
> Please explain
> 
> On Fri, Jul 22, 2011 at 6:39 PM, Jonathan Ellis  wrote:
> What does nodetool ring say?
> 
> On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee
>  wrote:
> > Hi All,
> >
> > I am following this following link "
> > http://www.datastax.com/docs/0.7/utilities/stress_java " for a stress test.
> > I am getting this notification after running this command
> >
> > xxx.xxx.xxx.xx= my ip
> >
> > contrib/stress/bin/stress -d xxx.xxx.xxx.xx
> >
> > Created keyspaces. Sleeping 1s for propagation.
> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> > Operation [44] retried 10 times - error inserting key 044
> > ((UnavailableException))
> >
> > Operation [49] retried 10 times - error inserting key 049
> > ((UnavailableException))
> >
> > Operation [7] retried 10 times - error inserting key 007
> > ((UnavailableException))
> >
> > Operation [6] retried 10 times - error inserting key 006
> > ((UnavailableException))
> >
> > Any idea why I am getting these things?
> >
> > Thank You
> >
> >
> >
> 
> 
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 



Re: Stress test using Java-based stress utility

2011-07-26 Thread Nilabja Banerjee
Thank you everyone, it is working fine.

I was watching jconsole behavior... can you tell me where exactly I can
find "RecentHitRates"?

Tuning for Optimal Caching: they have given one example here:
http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches

In my jconsole, within MBeans, I am unable to find RecentHitRates.
Also, what is the value of long[36] and long[90]? From the JConsole
attributes, how can I find the performance of Cassandra while stress testing?

Thank You

On 26 July 2011 14:33, aaron morton  wrote:

> It's in the source distribution under tools/stress see the instructions in
> the README file and then look at the command line help (bin/stress --help).
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26 Jul 2011, at 19:40, CASSANDRA learner wrote:
>
> Hi,,
> I too wanna know what this stress tool do? What is the usage of this
> tool... Please explain
>
> On Fri, Jul 22, 2011 at 6:39 PM, Jonathan Ellis  wrote:
>
>> What does nodetool ring say?
>>
>> On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee
>>  wrote:
>> > Hi All,
>> >
>> > I am following this following link "
>> > http://www.datastax.com/docs/0.7/utilities/stress_java " for a stress
>> test.
>> > I am getting this notification after running this command
>> >
>> > xxx.xxx.xxx.xx= my ip
>> >
>> > contrib/stress/bin/stress -d xxx.xxx.xxx.xx
>> >
>> > Created keyspaces. Sleeping 1s for propagation.
>> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
>> > Operation [44] retried 10 times - error inserting key 044
>> > ((UnavailableException))
>> >
>> > Operation [49] retried 10 times - error inserting key 049
>> > ((UnavailableException))
>> >
>> > Operation [7] retried 10 times - error inserting key 007
>> > ((UnavailableException))
>> >
>> > Operation [6] retried 10 times - error inserting key 006
>> > ((UnavailableException))
>> >
>> > Any idea why I am getting these things?
>> >
>> > Thank You
>> >
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>


Re: Stress test using Java-based stress utility

2011-07-26 Thread Jonathan Ellis
cassandra.db.Caches

On Tue, Jul 26, 2011 at 2:11 AM, Nilabja Banerjee
 wrote:
> Thank you every one it is working fine.
>
> I was watching jconsole behavior...can tell me where exactly I can find "
> RecentHitRates" :
>
> Tuning for Optimal Caching:
>
> Here they have given one example of that
> http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches
> RecentHitRates...  In my jconsole within MBean I am unable to find that
> one.
> what is the value of long[36] and long[90].  From Jconsole attributes
> how can I find the  performance of the casssandra while stress testing?
> Thank You
>
>
> On 26 July 2011 14:33, aaron morton  wrote:
>>
>> It's in the source distribution under tools/stress see the instructions in
>> the README file and then look at the command line help (bin/stress --help).
>> Cheers
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 26 Jul 2011, at 19:40, CASSANDRA learner wrote:
>>
>> Hi,,
>> I too wanna know what this stress tool do? What is the usage of this
>> tool... Please explain
>>
>> On Fri, Jul 22, 2011 at 6:39 PM, Jonathan Ellis  wrote:
>>>
>>> What does nodetool ring say?
>>>
>>> On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee
>>>  wrote:
>>> > Hi All,
>>> >
>>> > I am following this link "
>>> > http://www.datastax.com/docs/0.7/utilities/stress_java " for a stress
>>> > test.
>>> > I am getting this notification after running this command
>>> >
>>> > xxx.xxx.xxx.xx= my ip
>>> >
>>> > contrib/stress/bin/stress -d xxx.xxx.xxx.xx
>>> >
>>> > Created keyspaces. Sleeping 1s for propagation.
>>> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
>>> > Operation [44] retried 10 times - error inserting key 044
>>> > ((UnavailableException))
>>> >
>>> > Operation [49] retried 10 times - error inserting key 049
>>> > ((UnavailableException))
>>> >
>>> > Operation [7] retried 10 times - error inserting key 007
>>> > ((UnavailableException))
>>> >
>>> > Operation [6] retried 10 times - error inserting key 006
>>> > ((UnavailableException))
>>> >
>>> > Any idea why I am getting these things?
>>> >
>>> > Thank You
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Stress test using Java-based stress utility

2011-07-26 Thread Nilabja Banerjee
Thank you Jonathan.. :)




On 26 July 2011 20:08, Jonathan Ellis  wrote:

> cassandra.db.Caches
>
> On Tue, Jul 26, 2011 at 2:11 AM, Nilabja Banerjee
>  wrote:
> > Thank you every one it is working fine.
> >
> > I was watching jconsole behavior...can tell me where exactly I can find "
> > RecentHitRates" :
> >
> > Tuning for Optimal Caching:
> >
> > Here they have given one example of that
> >
> http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches
> > RecentHitRates...  In my jconsole within MBean I am unable to find
> that
> > one.
> > what is the value of long[36] and long[90].  From Jconsole attributes
> > how can I find the  performance of the casssandra while stress testing?
> > Thank You
> >
> >
> > On 26 July 2011 14:33, aaron morton  wrote:
> >>
> >> It's in the source distribution under tools/stress see the instructions
> in
> >> the README file and then look at the command line help (bin/stress
> --help).
> >> Cheers
> >> -
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >> On 26 Jul 2011, at 19:40, CASSANDRA learner wrote:
> >>
> >> Hi,,
> >> I too wanna know what this stress tool do? What is the usage of this
> >> tool... Please explain
> >>
> >> On Fri, Jul 22, 2011 at 6:39 PM, Jonathan Ellis 
> wrote:
> >>>
> >>> What does nodetool ring say?
> >>>
> >>> On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee
> >>>  wrote:
> >>> > Hi All,
> >>> >
> >>> > I am following this following link "
> >>> > http://www.datastax.com/docs/0.7/utilities/stress_java " for a
> stress
> >>> > test.
> >>> > I am getting this notification after running this command
> >>> >
> >>> > xxx.xxx.xxx.xx= my ip
> >>> >
> >>> > contrib/stress/bin/stress -d xxx.xxx.xxx.xx
> >>> >
> >>> > Created keyspaces. Sleeping 1s for propagation.
> >>> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> >>> > Operation [44] retried 10 times - error inserting key 044
> >>> > ((UnavailableException))
> >>> >
> >>> > Operation [49] retried 10 times - error inserting key 049
> >>> > ((UnavailableException))
> >>> >
> >>> > Operation [7] retried 10 times - error inserting key 007
> >>> > ((UnavailableException))
> >>> >
> >>> > Operation [6] retried 10 times - error inserting key 006
> >>> > ((UnavailableException))
> >>> >
> >>> > Any idea why I am getting these things?
> >>> >
> >>> > Thank You
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder of DataStax, the source for professional Cassandra support
> >>> http://www.datastax.com
> >>
> >>
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Different Load values after stress test runs....

2011-08-23 Thread Philippe
Have you run repair on the nodes ? Maybe some data was lost and not repaired
yet ?

Philippe

2011/8/23 Chris Marino 

> Hi, we're running some performance tests against some clusters and I'm
> curious about some of the numbers I see.
>
> I'm running the stress test against two identically configured clusters,
> but after I run at stress test, I get different Load values across the
> clusters?
>
> The difference between the two clusters is that one uses standard EC2
> interfaces, but the other runs on a virtual network. Are these differences
> indicating something that I should be aware of??
>
> Here is a sample of the kinds of results I'm seeing.
>
> Address DC  RackStatus State   LoadOwns
>Token
>
>  12760588759xxx
> 10.0.0.17   DC1 RAC1Up Normal  94 MB
> 25.00%  0
> 10.0.0.18   DC1 RAC1Up Normal  104.52 MB
> 25.00%  42535295865xxx
> 10.0.0.19   DC1 RAC1Up Normal  78.58 MB
>  25.00%  85070591730xxx
> 10.0.0.20   DC1 RAC1Up Normal  78.58 MB
>  25.00%  12760588759xxx
>
> Address DC  RackStatus State   LoadOwns
>Token
>
> 12760588759xxx
> 10.120.35.52DC1 RAC1Up Normal  103.74 MB
> 25.00%  0
> 10.120.6.124DC1 RAC1Up Normal  118.99 MB
> 25.00%  42535295865xxx
> 10.127.90.142   DC1 RAC1Up Normal  104.26 MB
> 25.00%  85070591730xxx
> 10.94.69.237DC1 RAC1Up Normal  75.74 MB
>  25.00%  12760588759xxx
>
> The first cluster with the vNet (10.0.0.0/28 addresses) consistently show
> smaller Load values. The total Load of 355MB vs. 402MB with native EC2
> interfaces?? Is a total Load value even meaningful?? The stress test is the
> very first thing that's run against the clusters.
>
> [I'm also a little puzzled that these numbers are not uniform within the
> clusters, but I suspect that's because the stress test is using a key
> distribution that is Gaussian.  I'm not 100% sure of this either since I've
> seen conflicting documentation. Haven't tried 'random' keys, but I presume
> that would change them to be uniform]
>
> Except for these curious Load numbers, things seem to be running just fine.
> Getting good fast results. Over 10 iterations I'm getting more than 10-12K
> inserts per sec. (default values for the stress test).
>
> Should I expect the Load to be the same across different clusters?? What
> might explain the differences I'm seeing???
>
> Thanks in advance.
> CM
>
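A quick way to see why a Gaussian key distribution would leave per-node Load uneven (while "random" keys would even it out) is to sample keys under both distributions and count how many land on each node's quarter of the key space. This sketch is illustrative only; the class name, constants, and seed are made up and are not part of the stress tool:

```java
import java.util.Arrays;
import java.util.Random;

public class KeyDistribution {
    static final int NODES = 4;

    // Count how many sampled keys land on each of NODES equal token ranges.
    public static long[] bucketCounts(int samples, int keyRange, long seed, boolean gaussian) {
        long[] perNode = new long[NODES];
        Random rng = new Random(seed); // fixed seed so the run is repeatable
        for (int i = 0; i < samples; i++) {
            long key;
            if (gaussian) {
                // Bell curve centered on the middle of the key space
                double g = rng.nextGaussian() * (keyRange / 6.0) + keyRange / 2.0;
                key = Math.min(keyRange - 1, Math.max(0, (long) g));
            } else {
                key = rng.nextInt(keyRange); // uniform ("random") keys
            }
            perNode[(int) (key * NODES / keyRange)]++;
        }
        return perNode;
    }

    public static void main(String[] args) {
        // Gaussian: the two middle "nodes" get far more keys than the edges
        System.out.println(Arrays.toString(bucketCounts(100_000, 1_000_000, 42, true)));
        // Uniform: roughly 25,000 keys per node
        System.out.println(Arrays.toString(bucketCounts(100_000, 1_000_000, 42, false)));
    }
}
```

With the Gaussian distribution the middle ranges receive several times the traffic of the edge ranges, which would show up as uneven per-node Load exactly as observed above.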


How to stress test collections in Cassandra Stress

2017-04-13 Thread eugene miretsky
Hi,

I'm trying to do a stress test on a table with a collection column, but
cannot figure out how to do that.

I tried

table_definition: |
  CREATE TABLE list (
customer_id bigint,
items list,
PRIMARY KEY (customer_id));

columnspec:
  - name: customer_id
size: fixed(64)
population: norm(0..40M)
  - name: items
cluster: fixed(40)

When running the benchmark, I get: java.io.IOException: Operation x10 on
key(s) [27056313]: Error executing: (NoSuchElementException)


Cassandra 0.6.2 stress test failing due to setKeyspace issue

2010-07-01 Thread maneela a
Can someone direct me how to resolve this issue in cassandra 0.6.2 version?
./stress.py -o insert -n 1 -y regular -d 
ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going
Created keyspaces.  Sleeping 1s for propagation.
Traceback (most recent call last):
  File "./stress.py", line 381, in <module>
    benchmark()
  File "./stress.py", line 363, in insert
    threads = self.create_threads('insert')
  File "./stress.py", line 325, in create_threads
    th = OperationFactory.create(type, i, self.opcounts, self.keycounts, self.latencies)
  File "./stress.py", line 310, in create
    return Inserter(i, opcounts, keycounts, latencies)
  File "./stress.py", line 178, in __init__
    self.cclient.set_keyspace('Keyspace1')
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 333, in set_keyspace
    self.recv_set_keyspace()
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 349, in recv_set_keyspace
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace'

Niru


  

What need to be monitored while running stress test

2011-04-08 Thread mcasandra
What are the key things to monitor while running a stress test? There are tons
of detail in nodetool tpstats/netstats/cfstats. What in particular should I
be looking at?

Also, I've been looking at iostat and await really goes high but cfstats
shows low latency in microsecs. Is latency in cfstats calculated per
operation?

I am just trying to understand what I need to look at, just to make sure I don't
overlook important points in the process of evaluating Cassandra.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6255765.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How to stress test collections in Cassandra Stress

2017-04-24 Thread Ahmed Eljami
Hi,

Collections are not supported in cassandra-stress tool.


I suggest you use Jmeter with cassandra java driver to do your stress test
with collection or Spark.


2017-04-13 16:26 GMT+02:00 eugene miretsky :

> Hi,
>
> I'm trying to do a stress test on a table with a collection column, but
> cannot figure out how to do that.
>
> I tried
>
> table_definition: |
>   CREATE TABLE list (
> customer_id bigint,
> items list,
> PRIMARY KEY (customer_id));
>
> columnspec:
>   - name: customer_id
> size: fixed(64)
> population: norm(0..40M)
>   - name: items
> cluster: fixed(40)
>
> When running the benchmark, I get: java.io.IOException: Operation x10 on
> key(s) [27056313]: Error executing: (NoSuchElementException)
>
>
>


-- 
Cordialement;

Ahmed ELJAMI


Re: How to stress test collections in Cassandra Stress

2017-04-24 Thread LuckyBoy
unsubscribe

On Thu, Apr 13, 2017 at 7:26 AM, eugene miretsky 
wrote:

> Hi,
>
> I'm trying to do a stress test on a table with a collection column, but
> cannot figure out how to do that.
>
> I tried
>
> table_definition: |
>   CREATE TABLE list (
> customer_id bigint,
> items list,
> PRIMARY KEY (customer_id));
>
> columnspec:
>   - name: customer_id
> size: fixed(64)
> population: norm(0..40M)
>   - name: items
> cluster: fixed(40)
>
> When running the benchmark, I get: java.io.IOException: Operation x10 on
> key(s) [27056313]: Error executing: (NoSuchElementException)
>
>
>


Re: How to stress test collections in Cassandra Stress

2017-04-25 Thread Alain RODRIGUEZ
Hi 'luckiboy'.

You have been trying to unsubscribe from Cassandra dev and user list lately.

To do so, sending "unsubscribe" in a message is not the way to go as you
probably noticed by now. It just spams people on those lists.

As written here http://cassandra.apache.org/community/, you actually have
to send an email to both user-unsubscr...@cassandra.apache.org and
dev-unsubscr...@cassandra.apache.org.

Cheers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-04-24 15:08 GMT+02:00 LuckyBoy :

> unsubscribe
>
> On Thu, Apr 13, 2017 at 7:26 AM, eugene miretsky <
> eugene.miret...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to do a stress test on a table with a collection column, but
>> cannot figure out how to do that.
>>
>> I tried
>>
>> table_definition: |
>>   CREATE TABLE list (
>> customer_id bigint,
>> items list,
>> PRIMARY KEY (customer_id));
>>
>> columnspec:
>>   - name: customer_id
>> size: fixed(64)
>> population: norm(0..40M)
>>   - name: items
>> cluster: fixed(40)
>>
>> When running the benchmark, I get: java.io.IOException: Operation x10 on
>> key(s) [27056313]: Error executing: (NoSuchElementException)
>>
>>
>>
>


Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue

2010-07-01 Thread Jonathan Ellis
you're running a 0.7 stress.py against a 0.6 cassandra, that's not going to
work

On Thu, Jul 1, 2010 at 12:16 PM, maneela a  wrote:

> Can someone direct me how to resolve this issue in cassandra 0.6.2 version?
>
> ./stress.py -o insert -n 1 -y regular -d
> ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going
>
> Created keyspaces.  Sleeping 1s for propagation.
> Traceback (most recent call last):
>   File "./stress.py", line 381, in <module>
> benchmark()
>   File "./stress.py", line 363, in insert
> threads = self.create_threads('insert')
>   File "./stress.py", line 325, in create_threads
> th = OperationFactory.create(type, i, self.opcounts, self.keycounts,
> self.latencies)
>   File "./stress.py", line 310, in create
> return Inserter(i, opcounts, keycounts, latencies)
>   File "./stress.py", line 178, in __init__
> self.cclient.set_keyspace('Keyspace1')
>   File
> "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py",
> line 333, in set_keyspace
> self.recv_set_keyspace()
>   File
> "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py",
> line 349, in recv_set_keyspace
> raise x
> thrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace'
>
>
> Niru
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue

2010-07-01 Thread maneela a
Thanks Jonathan

--- On Thu, 7/1/10, Jonathan Ellis  wrote:

From: Jonathan Ellis 
Subject: Re: Cassandra 0.6.2 stress test failing due to setKeyspace issue
To: user@cassandra.apache.org
Date: Thursday, July 1, 2010, 3:32 PM

you're running a 0.7 stress.py against a 0.6 cassandra, that's not going to work

On Thu, Jul 1, 2010 at 12:16 PM, maneela a  wrote:




Can someone direct me how to resolve this issue in cassandra 0.6.2 version?
./stress.py -o insert -n 1 -y regular -d 
ec2-174-129-65-118.compute-1.amazonaws.com --threads 5 --keep-going


Created keyspaces.  Sleeping 1s for propagation.
Traceback (most recent call last):
  File "./stress.py", line 381, in <module>
    benchmark()
  File "./stress.py", line 363, in insert
    threads = self.create_threads('insert')
  File "./stress.py", line 325, in create_threads
    th = OperationFactory.create(type, i, self.opcounts, self.keycounts, self.latencies)
  File "./stress.py", line 310, in create
    return Inserter(i, opcounts, keycounts, latencies)
  File "./stress.py", line 178, in __init__
    self.cclient.set_keyspace('Keyspace1')
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 333, in set_keyspace
    self.recv_set_keyspace()
  File "/home/ubuntu/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 349, in recv_set_keyspace
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'set_keyspace'



Niru








  


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com






  

Re: What need to be monitored while running stress test

2011-04-09 Thread Stu Hood
The storage proxy latencies are the primary metric: in particular, the
latency histograms show the distribution of query times.


On Fri, Apr 8, 2011 at 5:27 PM, mcasandra  wrote:

> What are the key things to monitor while running a stress test? There are
> tons of detail in nodetool tpstats/netstats/cfstats. What in particular
> should I be looking at?
>
> Also, I've been looking at iostat and await really goes high but cfstats
> shows low latency in microsecs. Is latency in cfstats calculated per
> operation?
>
> I am just trying to understand what I need to look at, just to make sure I
> don't overlook important points in the process of evaluating Cassandra.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6255765.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: What need to be monitored while running stress test

2011-04-09 Thread mcasandra
What is a storage proxy latency?

By query latency you mean the one in cfstats and cfhistorgrams?

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6257932.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What need to be monitored while running stress test

2011-04-09 Thread aaron morton
in jconsole MBean org.apache.cassandra.db.StorageProxy 

It shows the latency for read and write operations, not just per CF 

Aaron

On 10 Apr 2011, at 11:37, mcasandra wrote:

> What is a storage proxy latency?
> 
> By query latency you mean the one in cfstats and cfhistorgrams?
> 
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-need-to-be-monitored-while-running-stress-test-tp6255765p6257932.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.



Cassandra stress test and max vs. average read/write latency.

2011-12-19 Thread Peter Fales

Has anyone looked much at the maximum latency of cassandra read/write
requests?  (rather than the average latency and average throughput)

We've been struggling for quite some time trying to figure out why we
see occasional read or write response times in the 100s of milliseconds
even on fast machines that normally respond in just a few milliseconds.
We've spent a lot of time trying to tune our benchmark
and Cassandra configurations to lower these maximum times.   There are
a lot of things that make it a little better, or a little worse, but we've
found it nearly impossible to eliminate these "outliers" completely.

A lot of our initial testing was done with a home-grown benchmark test
written in C++ and using the thrift interface.   However, now that we've 
recently upgraded from 0.6.8 to 1.0.3, that has allowed me to do some 
testing using the official java "stress" tool.   The problem, at least
for this purpose, is that the stress tool only reports *average*
response times over the measurement intervals.    This effectively
hides large values if they are infrequent relative to the measurement
interval.   I've modified the stress test so that it also tracks the
maximum latency reported over each measurement interval. 

Here is an excerpt from a typical result:

$ bin/stress -d XXX -p  -e QUORUM  -t 4  -i 1  -l 3 -c 1 -n 40
total interval_op_rate interval_key_rate avg_latency elapsed_time max(millisec)
5780 5780 5780 6.098615916955017E-4 1 13
13837 8057 8057 5.003102891895247E-4 2 4
22729 8892 8892 4.7199730094466935E-4 3 4
31840 9111 9111 4.6504225661288555E-4 4 1
40925 9085 9085 4.6846450192625206E-4 5 1
49076 8151 8151 5.20054799411E-4 6 100
...
3186625 8886 8886 4.786180508665316E-4 411 10
3195626 9001 9001 4.705032774136207E-4 412 1
3204574 8948 8948 4.710549843540456E-4 414 1
3213524 8950 8950 4.7195530726256986E-4 415 1
3217534 4010 4010 0.0010763092269326683 416 607
3226560 9026 9026 4.695324617770884E-4 417 1
3235425 8865 8865 4.7805978567399887E-4 418 1
3244177 8752 8752 4.848034734917733E-4 419 10
...


My patch adds the final column which logs the maximum response time 
over the one-second interval.  In most cases the average response time
is under 1 msec, and though the maximum might be a bit larger, it's still
just a few milliseconds - usually under 10 msec.   But sometimes (like
interval 416) one of the response took 607 milliseconds.

These numbers aren't too bad if you are supporting an interactive 
application and don't mind a slightly slower response now and then as
long as the average stays low and throughput stays high.  But for 
other types of applications, these slow responses might be a problem.

I'm trying to understand if this is expected or not, and if there is
anything we can do about.   I'd also be interested in hearing from folks
that have run the stress test against their own Cassandra clusters.   

If anyone wants to try this, I've included  my patch to the stress test 
below.   So far, I've only instrumented the default "insert" operation. 
(It also adjusts the output to separate fields with spaces instead of
commas.  I find that easier to read and it caters to gnuplot.)

diff -ur stress/src/org/apache/cassandra/stress/operations/Inserter.java 
/home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/operations/Inserter.java
--- stress/src/org/apache/cassandra/stress/operations/Inserter.java 
2011-11-15 02:57:23.0 -0600
+++ 
/home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/operations/Inserter.java
  2011-12-09 08:58:04.0 -0600
@@ -108,6 +108,11 @@
 session.operations.getAndIncrement();
 session.keys.getAndIncrement();
 session.latency.getAndAdd(System.currentTimeMillis() - start);
+
+long  delta = System.currentTimeMillis() - start;
+if ( delta > session.maxlatency.get() ) {
+session.maxlatency.set(delta);
+}
 }
 
 private Map> 
getSuperColumnsMutationMap(List superColumns)
diff -ur stress/src/org/apache/cassandra/stress/Session.java 
/home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/Session.java
--- stress/src/org/apache/cassandra/stress/Session.java 2011-11-15 
02:57:23.0 -0600
+++ 
/home/psfales/MME/apache-cassandra-1.0.3-src/tools/stress/src/org/apache/cassandra/stress/Session.java
  2011-12-09 08:48:45.0 -0600
@@ -53,6 +53,7 @@
 public final AtomicInteger operations;
 public final AtomicInteger keys;
 public final AtomicLonglatency;
+public final AtomicLongmaxlatency;
 
 static
 {
@@ -337,6 +338,7 @@
 operations = new AtomicInteger();
 keys = new AtomicInteger();
 latency = new AtomicLong();
+   maxlatency = new AtomicLong();
 }
 
 public int getCardinality()
diff -ur stress/src/org/apache/cassa
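One caveat with the patch above: the check-then-set on maxlatency is not atomic, so with several stress threads two threads can both pass the comparison and the smaller value can be written last, losing a maximum. A hedged sketch of an atomic alternative using a compare-and-set loop (the class and method names here are illustrative, not the stress tool's API):

```java
import java.util.concurrent.atomic.AtomicLong;

public class MaxLatency {
    private final AtomicLong maxLatency = new AtomicLong(Long.MIN_VALUE);

    // Record one observed latency; the CAS loop keeps the maximum correct
    // even when several stress threads report at the same time.
    public void record(long latencyMillis) {
        long cur;
        while (latencyMillis > (cur = maxLatency.get())) {
            if (maxLatency.compareAndSet(cur, latencyMillis)) {
                break; // we installed the new max
            }
            // another thread moved the max first; re-read and retry
        }
    }

    // Return the interval max and reset, so each reported line starts fresh.
    public long getAndReset() {
        return maxLatency.getAndSet(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        MaxLatency m = new MaxLatency();
        m.record(3);
        m.record(607);
        m.record(10);
        System.out.println(m.getAndReset()); // prints 607
    }
}
```

Resetting per interval (rather than only ever growing the value) also matches the per-second reporting in the output above.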

Re: Cassandra stress test and max vs. average read/write latency.

2011-12-19 Thread Peter Schuller
> I'm trying to understand if this is expected or not, and if there is

Without careful tuning, outliers around a couple of hundred ms are
definitely expected in general (not *necessarily*, depending on
workload) as a result of garbage collection pauses. The impact will be
worsened a bit if you are running under high CPU load (or even maxing
it out with stress) because post-pause, if you are close to max CPU
usage you will take considerably longer to "catch up".

Personally, I would just log each response time and feed it to gnuplot
or something. It should be pretty obvious whether or not the latencies
are due to periodic pauses.
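Building on the suggestion to log each response time: besides plotting, computing a few percentiles alongside the average makes the outliers visible directly. A self-contained sketch (class name and sample data are illustrative; the 607 ms value mirrors the interval-416 outlier reported earlier in the thread):

```java
import java.util.Arrays;

public class LatencyReport {
    // Return the value at the given percentile (0..100) of the samples.
    public static long percentile(long[] latencies, double pct) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args) {
        // 99 fast (1 ms) responses plus one 607 ms outlier
        long[] latencies = new long[100];
        Arrays.fill(latencies, 1);
        latencies[57] = 607;

        long sum = Arrays.stream(latencies).sum();
        // The average looks harmless; the max tells the real story.
        System.out.println("avg=" + (sum / latencies.length)
                + " p50=" + percentile(latencies, 50)
                + " p99=" + percentile(latencies, 99)
                + " max=" + percentile(latencies, 100)); // avg=7 p50=1 p99=1 max=607
    }
}
```

The same per-sample log can of course be fed straight to gnuplot to see whether the spikes line up with GC pauses.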

If you are concerned with eliminating or reducing outliers, I would:

(1) Make sure that when you're benchmarking, that you're putting
Cassandra under a reasonable amount of load. Latency benchmarks are
usually useless if you're benchmarking against a saturated system. At
least, start by achieving your latency goals at 25% or less CPU usage,
and then go from there if you want to up it.

(2) One can affect GC pauses, but it's non-trivial to eliminate the
problem completely. For example, the length of frequent young-gen
pauses can typically be decreased by decreasing the size of the young
generation, leading to more frequent shorter GC pauses. But that
instead causes more promotion into the old generation, which will
result in more frequent very long pauses (relative to normal; they
would still be infrequent relative to young gen pauses) - IF your
workload is such that you are suffering from fragmentation and
eventually seeing Cassandra fall back to full compacting GC:s
(stop-the-world) for the old generation.

I would start by adjusting young gen so that your frequent pauses are
at an acceptable level, and then see whether or not you can sustain
that in terms of old-gen.

Start with this in any case: Run Cassandra with -XX:+PrintGC
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Cassandra stress test and max vs. average read/write latency.

2011-12-22 Thread Peter Fales
Peter,

Thanks for your input.  Can you tell me more about what we should be
looking for in the gc log?   We've already got the gc logging turned
on and, and we've already done the plotting to show that in most 
cases the outliers are happening periodically (with a period of 
10s of seconds to a few minutes, depending on load and tuning)

I've tried to correlate the times of the outliers with messages either
in the system log or the gc log.   There seems to be some (but not
complete) correlation between the outliers and system log messages about
memtable flushing.   I cannot find anything in the gc log that
seems to be an obvious problem, or that matches up with the
times of the outliers.


On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
> > I'm trying to understand if this is expected or not, and if there is
> 
> Without careful tuning, outliers around a couple of hundred ms are
> definitely expected in general (not *necessarily*, depending on
> workload) as a result of garbage collection pauses. The impact will be
> worsened a bit if you are running under high CPU load (or even maxing
> it out with stress) because post-pause, if you are close to max CPU
> usage you will take considerably longer to "catch up".
> 
> Personally, I would just log each response time and feed it to gnuplot
> or something. It should be pretty obvious whether or not the latencies
> are due to periodic pauses.
> 
> If you are concerned with eliminating or reducing outliers, I would:
> 
> (1) Make sure that when you're benchmarking, that you're putting
> Cassandra under a reasonable amount of load. Latency benchmarks are
> usually useless if you're benchmarking against a saturated system. At
> least, start by achieving your latency goals at 25% or less CPU usage,
> and then go from there if you want to up it.
> 
> (2) One can affect GC pauses, but it's non-trivial to eliminate the
> problem completely. For example, the length of frequent young-gen
> pauses can typically be decreased by decreasing the size of the young
> generation, leading to more frequent shorter GC pauses. But that
> instead causes more promotion into the old generation, which will
> result in more frequent very long pauses (relative to normal; they
> would still be infrequent relative to young gen pauses) - IF your
> workload is such that you are suffering from fragmentation and
> eventually seeing Cassandra fall back to full compacting GC:s
> (stop-the-world) for the old generation.
> 
> I would start by adjusting young gen so that your frequent pauses are
> at an acceptable level, and then see whether or not you can sustain
> that in terms of old-gen.
> 
> Start with this in any case: Run Cassandra with -XX:+PrintGC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
> 
> -- 
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: Cassandra stress test and max vs. average read/write latency.

2011-12-22 Thread Peter Schuller
> Thanks for your input.  Can you tell me more about what we should be
> looking for in the gc log?   We've already got the gc logging turned
> on and, and we've already done the plotting to show that in most
> cases the outliers are happening periodically (with a period of
> 10s of seconds to a few minutes, depnding on load and tuning)

Are you measuring writes or reads? If writes,
https://issues.apache.org/jira/browse/CASSANDRA-1991 is still relevant
I think (sorry no progress from my end on that one). Also, I/O
scheduling issues can easily cause problems with the commit log
latency (on fsync()). Try switching to periodic commit log mode and
see if it helps, just to eliminate that (if you're not already in
periodic; if so, try upping the interval).

For reads, I am generally unaware of much aside from GC and legitimate
"jitter" (scheduling/disk I/O etc) that would generate outliers. At
least that I can think of off hand...

And w.r.t. the GC log - yeah, correlating in time is one thing.
Another thing is to confirm what kind of GC pauses you're seeing.
Generally you want to be seeing lots of ParNew:s of shorter duration,
and those are tweakable by changing the young generation size. The
other thing is to make sure CMS is not failing (promotion
failure/concurrent mode failure) and falling back to a stop-the-world
serial compacting GC of the entire heap.

You might also use -XX:+PrintApplicationPauseTime (I think, I am
probably not spelling it entirely correctly) to get a more obvious and
greppable report for each pause, regardless of "type"/cause.

> I've tried to correlate the times of the outliers with messages either
> in the system log or the gc log.   There seems to be some (but not
> complete) correlation between the outliers and system log messages about
> memtable flushing.   I cannot find anything in the gc log that
> seems to be an obvious problem, or that matches up with the
> times of the outliers.

And these are still the very extreme (500+ ms and such) outliers that
you're seeing w/o GC correlation? Off the top of my head, that seems
very unexpected (assuming a non-saturated system) and would definitely
invite investigation IMO.

If you're willing to start iterating with the source code I'd start
bisecting down the call stack and see where it's happening .

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Cassandra stress test and max vs. average read/write latency.

2011-12-23 Thread Peter Fales
Peter,

Thanks for your response. I'm looking into some of the ideas in your
other recent mail, but I had another followup question on this one...

Is there any way to control the CPU load when using the "stress" benchmark?
I have some control over that with our home-grown benchmark, but I
thought it made sense to use the official benchmark tool as people might
more readily believe those results and/or be able to reproduce them.  But
offhand, I don't see any way to throttle back the load created by the
stress test.

On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
> > I'm trying to understand if this is expected or not, and if there is
> 
> Without careful tuning, outliers around a couple of hundred ms are
> definitely expected in general (not *necessarily*, depending on
> workload) as a result of garbage collection pauses. The impact will be
> worsened a bit if you are running under high CPU load (or even maxing
> it out with stress) because post-pause, if you are close to max CPU
> usage you will take considerably longer to "catch up".
> 
> Personally, I would just log each response time and feed it to gnuplot
> or something. It should be pretty obvious whether or not the latencies
> are due to periodic pauses.
> 
> If you are concerned with eliminating or reducing outliers, I would:
> 
> (1) Make sure that when you're benchmarking, that you're putting
> Cassandra under a reasonable amount of load. Latency benchmarks are
> usually useless if you're benchmarking against a saturated system. At
> least, start by achieving your latency goals at 25% or less CPU usage,
> and then go from there if you want to up it.
> 
> (2) One can affect GC pauses, but it's non-trivial to eliminate the
> problem completely. For example, the length of frequent young-gen
> pauses can typically be decreased by decreasing the size of the young
> generation, leading to more frequent shorter GC pauses. But that
> instead causes more promotion into the old generation, which will
> result in more frequent very long pauses (relative to normal; they
> would still be infrequent relative to young gen pauses) - IF your
> workload is such that you are suffering from fragmentation and
> eventually seeing Cassandra fall back to full compacting GC:s
> (stop-the-world) for the old generation.
> 
> I would start by adjusting young gen so that your frequent pauses are
> at an acceptable level, and then see whether or not you can sustain
> that in terms of old-gen.
> 
> Start with this in any case: Run Cassandra with -XX:+PrintGC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
> 
> -- 
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: Cassandra stress test and max vs. average read/write latency.

2012-01-02 Thread Peter Schuller
> Is there any way to control the CPU load when using the "stress" benchmark?
> I have some control over that with our home-grown benchmark, but I
> thought it made sense to use the official benchmark tool as people might
> more readily believe those results and/or be able to reproduce them.  But
> offhand, I don't see any way to throttle back the load created by the
> stress test.

I'm not aware of one built-in. It would be a useful patch IMO, to
allow setting a target rate.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-06 Thread kong
Hi,

I am doing a stress test on Datastax Cassandra Community 2.1.2, not using the
provided stress test tool, but my own stress-test client code instead (I
wrote some C++ stress test code). My Cassandra cluster is deployed on Amazon
EC2, using the provided Datastax Community AMI (HVM instances) from the
Datastax documentation, and I am not using EBS, just the ephemeral storage
by default. The Cassandra servers are EC2 instances of type m3.xlarge. I use
another EC2 instance, of type r3.8xlarge, for my stress test client. Both
the Cassandra server nodes and the stress test client node are in us-east. I
test Cassandra clusters made up of 1 node, 2 nodes, and 4 nodes separately.
I run the INSERT test and the SELECT test separately, but throughput does
not increase linearly when new nodes are added, and I get some weird
results. My test results are as follows (I do 1 million operations and try
to find the best QPS while the max latency stays below 200ms; latencies are
measured from the client side, and QPS is calculated as
total_operations/total_time).



INSERT(write):

Nodes  RF  QPS    Avg(ms)  Min(ms)  p95(ms)  p99(ms)  p999(ms)  Max(ms)
1      1   18687  2.08     1.48     2.95     5.74     52.8      205.4
2      1   20793  3.15     0.84     7.71     41.35    88.7      232.7
2      2   22498  3.37     0.86     6.04     36.1     221.5     649.3
4      1   28348  4.38     0.85     8.19     64.51    169.4     251.9
4      3   28631  5.22     0.87     18.68    68.35    167.2     288

SELECT(read):

Nodes  RF  QPS    Avg(ms)  Min(ms)  p95(ms)  p99(ms)  p999(ms)  Max(ms)
1      1   24498  4.01     1.51     7.6      12.51    31.5      129.6
2      1   28219  3.38     0.85     9.5      17.71    39.2      152.2
2      2   35383  4.06     0.87     9.71     21.25    70.3      215.9
4      1   34648  2.78     0.86     6.07     14.94    30.8      134.6
4      3   52932  3.45     0.86     10.81    21.05    37.4      189.1

The test data I use is generated randomly, and the schema I use is like (I
use the cqlsh to create the columnfamily/table):

CREATE TABLE table (
    id1  varchar,
    ts   varchar,
    id2  varchar,
    msg  varchar,
    PRIMARY KEY (id1, ts, id2)
);

So the fields are all strings, and I generate each character of each string
randomly, using srand(time(0)) and rand() in C++, so I think my test data
should be uniformly distributed across the Cassandra cluster. And, in my
client stress test code, I use the thrift C++ interface, and the basic
operations I do are like:

thrift_client.execute_cql3_query("INSERT INTO table (id1, ts, id2, msg)
VALUES (xxx, xxx, xxx, xxx)"); and
thrift_client.execute_cql3_query("SELECT * FROM table WHERE id1=xxx");

Each data entry I INSERT or SELECT is around 100 characters.

On my stress test client, I create several threads to send the read and
write requests, each thread having its own thrift client, and at the
beginning all the thrift clients connect to the Cassandra servers evenly.
For example, when I create 160 thrift clients against a 4-node cluster, 40
of them connect to each server node.

 

So, 

1. Could anyone help me explain my test results? Why does the
performance (QPS) get only a small increment when new nodes are added?

2. I have learned from the materials that Cassandra has better write
performance than read. Why, in my case, is the read performance better?

3. I also use the OpsCenter to monitor the real-time performance of my
cluster. But when I get the average QPS above, the operations/s reported by
OpsCenter are around 1+ for the write peak and 5000+ for the read peak. Why
is my result inconsistent with that from OpsCenter?

4. Are there any unreasonable things in my test method, such as the test
data or the QPS calculation?

 

Thank you very much,

Joy



Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
Hi Joy,

Are you resetting your data after each test run?  I wonder if your tests
are actually causing you to fall behind on data grooming tasks such as
compaction, and so performance suffers for your later tests.

There are *so many* factors which can affect performance that, without
reviewing the test methodology in great detail, it's really hard to say
whether there are flaws, such as an antipattern causing an atypical number
of cache hits or misses, and so forth. You may also be producing GC pressure
in the write path.

I *can* say that 28k writes per second looks just a little low, but it
depends a lot on your network, hardware, and write patterns (e.g., data
size).  For a little performance test suite I wrote, with parallel batched
writes, on a 3-node rf=3 test cluster, I got about 86k writes per
second.

Also, focusing exclusively on max latency is going to cause you some
trouble, especially in the case of magnetic media as you're using.  Between
ill-timed GC and inconsistent performance characteristics from magnetic
media, your max numbers will often look significantly worse than your p(99)
or p(999) numbers.

All this said, one node will often look better than several nodes for
certain patterns because it completely eliminates proxy (coordinator) write
times.  All writes are local writes.  It's an over-simple case that doesn't
reflect any practical production use of Cassandra, so it's probably not
worth even including in your tests.  I would recommend starting at 3 nodes
rf=3, and compare against 6 nodes rf=6.  Make sure you're staying on top of
compaction and aren't seeing garbage collections in the logs (either of
those will be polluting your results with variability you can't account for
with small sample sizes of ~1 million).

If you expect to sustain write volumes like this, you'll find these
clusters are sized too small (on that hardware you won't keep up with
compaction), and your tests are again testing scenarios you wouldn't
actually see in production.

On Sat Dec 06 2014 at 7:09:18 AM kong  wrote:

> [quoted message trimmed]

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
I'm sorry, I meant to say "6 nodes rf=3".

Also look at this performance over sustained periods of time, not burst
writing.  Run your test for several hours and watch memory and especially
compaction stats.  See if you can find the data volume you can write while
keeping outstanding compaction tasks < 5 (preferably 0 or 1) for sustained
periods.  Measuring just burst writes will definitely mask real-world
conditions, and Cassandra actually absorbs bursted writes really well
(which in turn masks performance problems, since by the time your write
times suffer from an overwhelmed cluster, you're probably already in an
insane and difficult-to-recover crisis mode).

On Sun Dec 07 2014 at 8:55:47 AM Eric Stevens  wrote:

> [quoted message trimmed]

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread 孔嘉林
Hi Eric,
Thank you very much for your reply!
Do you mean that I should clear my table after each run? Indeed, I can see
compaction happen several times during my test, but could just a few
compactions affect the performance that much? Also, I can see from the
OpsCenter that some ParNew GCs happen but no CMS GCs.

I run my test on an EC2 cluster, so I think the network within it should be
high speed. Each Cassandra server has 4 CPU units, 15 GiB memory and 80 GB
SSD storage, which is the m3.xlarge type.

As for latency, which latency should I care about most? p(99) or p(999)? I
want to get the max QPS under a certain limited latency.

I know my testing scenario is not the common case in production; I just
want to know how much burden my cluster can bear under stress.

So, how did you test your cluster that can get 86k writes/sec? How many
requests did you send to your cluster? Was it also 1 million? Did you also
use OpsCenter to monitor the real time performance? I also wonder why the
write and read QPS OpsCenter provide are much lower than what I calculate.
Could you please describe in detail about your test deployment?

Thank you very much,
Joy

2014-12-07 23:55 GMT+08:00 Eric Stevens :

> [quoted message trimmed]

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Chris Lohfink
I think your client could use improvements.  How many threads do you have
running in your test?  With a thrift call like that you can only do one
request at a time per connection.  For example, assuming C* takes 0ms, a
10ms network latency/driver overhead will mean a 20ms RTT and a max
throughput of ~50 QPS per thread (the native binary protocol doesn't behave
like this).  Are you running the client on its own system or shared with a
node?  How are you load balancing your requests?  Source code would help,
since there's a lot that can become a bottleneck.

Generally you will see a bit of a dip in latency from N=RF=1 to N=2, RF=2,
etc., since there are optimizations on the coordinator node when it doesn't
need to send the request to the replicas.  The impact of the network
overhead decreases in significance as the cluster grows.  Typically, latency
wise, RF=N=1 is going to be the fastest possible for smaller loads (i.e.
when a client cannot fully saturate a single node).

Main thing to expect is that latency will plateau and remain fairly
constant as load/nodes increase while throughput potential will linearly
(empirically at least) increase.

You should really attempt it with the native binary protocol + prepared
statements; running CQL over thrift is far from optimal.  I would recommend
using the cassandra-stress tool if you want to stress test Cassandra (and
not your code):
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

===
Chris Lohfink

On Sun, Dec 7, 2014 at 9:48 PM, 孔嘉林  wrote:

> [quoted message trimmed]

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread 孔嘉林
Thanks Chris.
I run the client on a separate AWS instance from the Cassandra cluster
servers. At the client side, I create 40 or 50 threads for sending requests
to each Cassandra node, and I create one thrift client for each of the
threads. At the beginning, all the created thrift clients connect to their
corresponding Cassandra nodes and stay connected during the whole process (I
do not close the transports until the end of the test process). So I use
very simple load balancing, since the same number of thrift clients connect
to each node. And my source code is here:
https://github.com/kongjialin/Cassandra/blob/master/cassandra_client.cpp
It's very nice of you to help me improve my code.

As I increase the number of threads, the latency gets longer.

I'm using C++, so if I want to use the native binary protocol + prepared
statements, is the only way to use the C++ driver?
Thanks very much.




2014-12-08 12:51 GMT+08:00 Chris Lohfink :

> [quoted message trimmed]

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread Chris Lohfink
So I would -expect- an increase of ~20k QPS per node with m3.xlarge, so
there may be something up with your client (I am not a C++ person, however,
but hopefully someone on the list will take notice).

Latency does not decrease linearly as you add nodes.  What you are likely
seeing with latency, since there are so few nodes, is a side effect of an
optimization.  When you read/write from a table, the node you request will
act as the coordinator.  If the data exists on the coordinator and you are
using rf=1 or cl=1, it will not have to send the request to another node,
just service it locally:

  +-+ +--+
  |  node0  | +-->|node1 |
  |-| |--|
  |  client | <--+| coordinator  |
  +-+ +--+

In this case the write latency is dominated by the network between
coordinator and client.  A second case is where the coordinator actually
has to send the request to another node:

  +-+ +--+ +---+
  |  node0  | +-->|node1 |+--> |node2  |
  |-| |--| |---|
  |  client | <--+| coordinator  |<---+| data replica  |
  +-+ +--+ +---+

As you add nodes, you increase the probability of hitting this second
scenario, where the coordinator has to make an additional network hop.  This
is possibly why you're seeing an increase (aside from client issues).  To
get an idea of how the latency is affected as you increase nodes, you really
need to go higher than 4 nodes (i.e. graph the same rf for 5, 10, 15, 25
nodes; below 5 isn't really the recommended way to run Cassandra anyway),
since the latency will approach that of the second scenario (plus some
spiky outliers for GCs) and then it should settle down until you overwork
the nodes.

May want to give https://github.com/datastax/cpp-driver a go (not a cpp guy,
take with a grain of salt).  I would still highly recommend using
cassandra-stress instead of your own tooling if you want to test Cassandra
and not your code.

===
Chris Lohfink

On Mon, Dec 8, 2014 at 4:57 AM, 孔嘉林  wrote:

> Thanks Chris.
> I run a *client on a separate* AWS *instance from* the Cassandra cluster
> servers. At the client side, I create 40 or 50 threads for sending requests
> to each Cassandra node. I create one thrift client for each of the threads.
> And at the beginning, all the created thrift clients connect to the
> corresponding Cassandra nodes and keep connecting during the whole
> process(I did not close all the transports until the end of the test
> process). So I use very simple load balancing, since the same number of
> thrift clients connect to each node. And my source code is here:
> https://github.com/kongjialin/Cassandra/blob/master/cassandra_client.cpp It's
> very nice of you to help me improve my code.
>
> As I increase the number of threads, the latency gets longer.
>
> I'm using C++, so if I want to use native binary + prepared statements,
> the only way is to use C++ driver?
> Thanks very much.
>
>
>
>
> 2014-12-08 12:51 GMT+08:00 Chris Lohfink :
>
>> I think your client could use improvements.  How many threads do you have
>> running in your test?  With a thrift call like that you only can do one
>> request at a time per connection.   For example, assuming C* takes 0ms, a
>> 10ms network latency/driver overhead will mean 20ms RTT and a max
>> throughput of ~50 QPS per thread (native binary doesn't behave like this).
>> Are you running client on its own system or shared with a node?  how are
>> you load balancing your requests?  Source code would help since theres a
>> lot that can become a bottleneck.
>>
>> Generally you will see a bit of a dip in latency from N=RF=1 and N=2,
>> RF=2 etc since there are optimizations on the coordinator node when it
>> doesn't need to send the request to the replicas.  The impact of the
>> network overhead decreases in significance as cluster grows.  Typically;
>> latency wise, RF=N=1 is going to be fastest possible for smaller loads (ie
>> when a client cannot fully saturate a single node).
>>
>> Main thing to expect is that latency will plateau and remain fairly
>> constant as load/nodes increase while throughput potential will linearly
>> (empirically at least) increase.
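As a back-of-the-envelope illustration of that linear expectation (the
per-node figure is hypothetical, not a measurement from this thread):

```shell
# If per-node throughput holds roughly steady, cluster throughput
# grows about linearly with node count:
per_node=28000   # hypothetical single-node ops/sec
for n in 1 2 4 8; do
  echo "$n nodes: ~$(( per_node * n )) ops/sec"
done
```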
>>
>> You should really attempt it with the native binary + prepared
>> statements, running cql over thrift is far from optimal.  I would recommend
>> using the cassandra-stress tool if you want to stress test Cassandra (and
>> not your code)
>> http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread Eric Stevens
>>
>> On Sun, Dec 7, 2014 at 9:48 PM, 孔嘉林  wrote:
>>
>>> Hi Eric,
>>> Thank you very much for your reply!
>>> Do you mean that I should clear my table after each run? Indeed, I can
>>> see several compactions during my test, but could just a few compactions
>>> affect the performance that much? Also, I can see from OpsCenter that some
>>> ParNew GCs happen but no CMS GCs.
>>>
>>> I run my test on an EC2 cluster, so the network within it should be fast.
>>> Each Cassandra server is of the m3.xlarge type, with 4 CPU cores, 15 GiB of
>>> memory and 80 GB of SSD storage.
>>>
>>> As for latency, which latency should I care about most? p(99) or p(999)?
>>> I want to get the max QPS under a certain limited latency.
>>>
>>> I know my testing scenario is not a common case in production; I just
>>> want to know how much load my cluster can bear under stress.
>>>
>>> So, how did you test your cluster that can get 86k writes/sec? How many
>>> requests did you send to your cluster? Was it also 1 million? Did you also
>>> use OpsCenter to monitor the real-time performance? I also wonder why the
>>> write and read QPS that OpsCenter reports are much lower than what I calculate.
>>> Could you please describe in detail about your test deployment?
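One way to sanity-check a client-side QPS figure against what OpsCenter shows
(a sketch — the request count and elapsed time below are illustrative, and
OpsCenter's per-node sampling will naturally differ from a client-side
aggregate):

```shell
# Client-side throughput is simply completed requests / wall-clock time.
requests=1000000     # total requests the client completed
elapsed_s=35         # wall-clock duration of the run, in seconds
echo $(( requests / elapsed_s ))   # ops/sec as the client sees it
```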
>>>
>>> Thank you very much,
>>> Joy
>>>
>>> 2014-12-07 23:55 GMT+08:00 Eric Stevens :
>>>
>>>> Hi Joy,
>>>>
>>>> Are you resetting your data after each test run?  I wonder if your
>>>> tests are actually causing you to fall behind on data grooming tasks such
>>>> as compaction, and so performance suffers for your later tests.
>>>>
>>>> There are *so many* factors which can affect performance that, without
>>>> reviewing your test methodology in great detail, it's really hard to say
>>>> whether there are flaws, whether an antipattern is causing an atypical
>>>> number of cache hits or misses, and so forth. You may also be producing GC
>>>> pressure in the write path.
>>>>
>>>> I *can* say that 28k writes per second looks just a little low, but it
>>>> depends a lot on your network, hardware, and write patterns (eg, data
>>>> size).  For a little performance test suite I wrote, with parallel batched
>>>> writes, on a 3 node rf=3 cluster test cluster, I got about 86k writes per
>>>> second.
>>>>
>>>> Also focusing exclusively on max latency is going to cause you some
>>>> troubles especially in the case of magnetic media as you're using.  Between
>>>> ill-timed GC and inconsistent performance characteristics from magnetic
>>>> media, your max numbers will often look significantly worse than your p(99)
>>>> or p(999) numbers.
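For reference, p(99) and p(999) here are nearest-rank percentiles over the
per-request latency samples. A rough sketch of how they relate to the max
(`seq` stands in for real latency data from a stress run):

```shell
# Nearest-rank percentiles from one latency sample (ms) per line.
# `seq 1 1000` stands in for the per-request latencies a run would record.
seq 1 1000 | sort -n | awk '{ v[NR] = $1 }
  END { print "p99="  v[int(NR*99/100)],
              "p999=" v[int(NR*999/1000)],
              "max="  v[NR] }'
```

With a single 1000 ms outlier in otherwise low-latency data, max looks
terrible while p(99) stays representative — which is Eric's point above.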
>>>>
>>>> All this said, one node will often look better than several nodes for
>>>> certain patterns because it completely eliminates proxy (coordinator) write
>>>> times.  All writes are local writes.  It's an over-simple case that doesn't
>>>> reflect any practical production use of Cassandra, so it's probably not
>>>> worth even including in your tests.  I would recommend starting at 3 nodes
>>>> with rf=3 and comparing against 6 nodes with rf=6.  Make sure you're staying on top of
>>>> compaction and aren't seeing garbage collections in the logs (either of
>>>> those will be polluting