Fwd: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Nisha Menon
I have been using the cassandra-stress tool to evaluate my Cassandra
cluster for quite some time now. My problem is that I am not able to
make sense of the results it generates for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
  ID uuid,
  Time timestamp,
  Value double,
  Date timestamp,
  PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;

I have captured this schema in a custom YAML profile and used the
parameters n=10000, threads=100, with the rest left at their defaults
(cl=one, mode native cql3, etc.). The Cassandra cluster is a 3-node CentOS
VM setup.
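
For reference, a run with these options would be launched with something
like the following command (the profile filename here is just a
placeholder; the parentheses are escaped for the shell):

cassandra-stress user profile=table_test.yaml ops\(insert=1\) n=10000 -rate threads=100 -mode native cql3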

A few specifics of the custom yaml file are as follows:

insert:
  partitions: fixed(100)
  select: fixed(1)/2
  batchtype: UNLOGGED

columnspec:
  - name: Time
    size: fixed(1000)
  - name: ID
    size: uniform(1..100)
  - name: Date
    size: uniform(1..10)
  - name: Value
    size: uniform(-100..100)
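
For completeness, this excerpt sits inside a full stress profile; a
minimal sketch of the surrounding keys (the keyspace name here is assumed)
would be:

keyspace: stresscql
table: table_test
table_definition: |
  CREATE TABLE table_test (
    ID uuid,
    Time timestamp,
    Value double,
    Date timestamp,
    PRIMARY KEY ((ID,Date), Time)
  ) WITH COMPACT STORAGE

with the columnspec and insert sections above completing the file.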

My observations so far are as follows (Please correct me if I am wrong):

   1. With n=10000 and Time: fixed(1000), the number of rows getting
   inserted is 10 million. (10000*1000=10000000)
   2. The number of row-keys/partitions is 10000 (i.e. n), within which 100
   partitions are taken at a time (which means 100*1000 = 100000 key-value
   pairs), out of which 50000 key-value pairs are processed at a time. (This
   is because of select: fixed(1)/2 ~ 50%)

The output message also confirms this:

Generating batches with [100..100] partitions and [50000..50000] rows
(of [100000..100000] total rows in the partitions)
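
Spelled out, the arithmetic behind that message is: each batch touches 100
partitions of 1000 rows each, and select: fixed(1)/2 takes 50% of each
partition:

1000 rows/partition * 0.5 = 500 rows selected per partition
500 rows * 100 partitions = 50000 rows written per batch
100 partitions * 1000 rows = 100000 total rows in the batch's partitions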

The results that I get are the following for consecutive runs with the same
configuration as above:

Run  Total_ops  Op_rate  Partition_rate  Row_rate  Time (s)
1    56         19       1885            943246    3.0
2    46         46       4648            2325498   1.0
3    27         30       2982            1489870   0.9
4    59         19       1932            966034    3.1
5    100        17       1730            865182    5.8
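
As a rough consistency check on run 1 (assuming each op is one batch of
100 partitions and 50000 rows, and that Time is in seconds):

Op_rate        ~ Total_ops / Time  = 56 / 3.0   ~ 19 ops/s
Partition_rate ~ Op_rate * 100     = 19 * 100   = 1900    (reported: 1885)
Row_rate       ~ Op_rate * 50000   = 19 * 50000 = 950000  (reported: 943246)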

Now, what I need to understand is the following:

   1. Which among these metrics is the throughput, i.e. the number of
   records inserted per second? Is it the Row_rate, Op_rate or
   Partition_rate? If it is the Row_rate, can I safely conclude that I am
   able to insert close to 1 million records per second? Any thoughts on
   what the Op_rate and Partition_rate mean in this case?
   2. Why does the Total_ops vary so drastically in every run? Does the
   number of threads have anything to do with this variation? What can I
   conclude here about the stability of my Cassandra setup?
   3. How do I determine the batch size per thread here? In my example, is
   the batch size 50000?

Thanks in advance.



-- 
Nisha Menon
BTech (CS) Sahrdaya CET,
MTech (CS) IIIT Bangalore.


Re: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Jake Luciani
Your insert settings look unrealistic, since I doubt you would be writing
50k rows at a time. Try setting this to 1 per partition and I would think
you should get much more consistent numbers across runs:

select: fixed(1)/10
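
In context, keeping the other settings from the original profile, the
insert block would then look something like:

insert:
  partitions: fixed(100)
  select: fixed(1)/10
  batchtype: UNLOGGED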


-- 
http://twitter.com/tjake