Your insert settings look unrealistic: I doubt you would be writing 50k rows
at a time in production. Try setting this to one row per partition and I'd
expect you to get much more consistent numbers across runs, e.g.:
select: fixed(1)/10
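Where that 50k figure comes from, as a quick sketch of the arithmetic (values taken from the profile quoted below):

```python
# Rows written per unlogged batch under the quoted stress profile.
partitions_per_batch = 100  # insert: partitions: fixed(100)
rows_per_partition = 1000   # columnspec: Time has size fixed(1000)
select_ratio = 1 / 2        # insert: select: fixed(1)/2, i.e. 50% of each partition

rows_per_batch = int(partitions_per_batch * rows_per_partition * select_ratio)
print(rows_per_batch)  # 50000 -- hence "50k rows at a time"
```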
On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon nisha.meno...@gmail.com wrote:
I have been using the cassandra-stress tool to evaluate my cassandra cluster
for quite some time now. My problem is that I am not able to comprehend the
results generated for my specific use case.
My schema looks something like this:
CREATE TABLE Table_test (
    ID uuid,
    Time timestamp,
    Value double,
    Date timestamp,
    PRIMARY KEY ((ID, Date), Time)
) WITH COMPACT STORAGE;
I have captured this schema in a custom yaml file and run with the parameters
n=10000 and threads=100, leaving the rest at their defaults (cl=one,
mode=native cql3, etc.). The Cassandra cluster is a 3-node CentOS VM setup.
A few specifics of the custom yaml file are as follows:
insert:
  partitions: fixed(100)
  select: fixed(1)/2
  batchtype: UNLOGGED
columnspec:
  - name: Time
    size: fixed(1000)
  - name: ID
    size: uniform(1..100)
  - name: Date
    size: uniform(1..10)
  - name: Value
    size: uniform(-100..100)
My observations so far are as follows (please correct me if I am wrong):
With n=10000 and time: fixed(1000), the number of rows getting inserted is
10 million (10000 * 1000 = 10,000,000).
The number of row-keys/partitions is 10000 (i.e. n), within which 100
partitions are taken at a time (which means 100 * 1000 = 100,000 key-value
pairs), out of which 50,000 key-value pairs are processed per batch. (This is
because of select: fixed(1)/2, i.e. 50%.)
The output message also confirms the same:
Generating batches with [100..100] partitions and [50000..50000] rows
(of [100000..100000] total rows in the partitions)
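Those totals can be cross-checked with a quick sketch (assuming n = 10000 partitions of 1000 rows each, consistent with the 10 million total):

```python
# Cross-check of the totals described above (inputs assumed from the profile).
n_partitions = 10_000        # n, the number of distinct (ID, Date) row keys
rows_per_partition = 1_000   # Time: size fixed(1000)
partitions_per_batch = 100   # partitions: fixed(100)
select_ratio = 1 / 2         # select: fixed(1)/2

total_rows = n_partitions * rows_per_partition
rows_per_batch = int(partitions_per_batch * rows_per_partition * select_ratio)
print(total_rows)      # 10000000 -- the 10 million rows inserted overall
print(rows_per_batch)  # 50000 rows per batch, of 100000 total in its 100 partitions
```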
The results that I get are the following for consecutive runs with the same
configuration as above:
Run  Total_ops  Op_rate  Partition_rate  Row_rate  Time (s)
1    56         19       1885            943246    3.0
2    46         46       4648            2325498   1.0
3    27         30       2982            1489870   0.9
4    59         19       1932            966034    3.1
5    100        17       1730            865182    5.8
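The reported columns look internally consistent with the profile; a quick sketch of the ratios for run 1, using the printed values above:

```python
# Ratios between the reported metrics for run 1 (values from the table above).
total_ops, op_rate, partition_rate, row_rate, secs = 56, 19, 1885, 943246, 3.0

# Partitions touched per operation: ~100, matching partitions: fixed(100).
print(round(partition_rate / op_rate))   # 99

# Rows written per partition per op: ~500, matching fixed(1)/2 of 1000 rows.
print(round(row_rate / partition_rate))  # 500

# Total_ops is roughly Op_rate * Time.
print(round(op_rate * secs))             # 57
```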
Now, what I need to understand is the following:
Which of these metrics is the throughput, i.e. the number of records inserted
per second? Is it the Row_rate, the Op_rate, or the Partition_rate? If it is
the Row_rate, can I safely conclude that I am able to insert close to 1
million records per second? Any thoughts on what the Op_rate and
Partition_rate mean in this case?
Why does the Total_ops vary so drastically between runs? Does the number of
threads have anything to do with this variation? What can I conclude here
about the stability of my Cassandra setup?
How do I determine the batch size per thread here? In my example, is the
batch size 50000?
Thanks in advance.
--
http://twitter.com/tjake