Fwd: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Nisha Menon
I have been using the cassandra-stress tool to evaluate my Cassandra
cluster for quite some time now. My problem is that I am not able to
comprehend the results generated for my specific use case.

My schema looks something like this:

CREATE TABLE Table_test(
  ID uuid,
  Time timestamp,
  Value double,
  Date timestamp,
  PRIMARY KEY ((ID,Date), Time)
) WITH COMPACT STORAGE;
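For reference, each distinct (ID, Date) pair is one partition here, and Time
is the clustering column within it. A single logical row could be written
roughly like this (a sketch; the keyspace name and values are placeholders,
and it assumes a Cassandra/cqlsh recent enough for uuid() and the -e flag):

cqlsh -e "INSERT INTO my_ks.Table_test (ID, Date, Time, Value)
          VALUES (uuid(), '2015-03-01', '2015-03-01 10:00:00', 42.0);"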

I have parsed this information into a custom yaml file and used the
parameters n=10000, threads=100, with the rest left at the default options
(cl=one, mode=native cql3, etc.). The Cassandra cluster is a 3-node CentOS
VM setup.
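For reference, a 2.1-era stress invocation for such a profile looks roughly
like this (a sketch; the profile file name and node address are assumptions):

cassandra-stress user profile=./table_test.yaml ops\(insert=1\) n=10000 cl=one -mode native cql3 -rate threads=100 -node 192.168.1.101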

A few specifics of the custom yaml file are as follows:

insert:
  partitions: fixed(100)
  select: fixed(1)/2
  batchtype: UNLOGGED

columnspec:
  - name: Time
    size: fixed(1000)
  - name: ID
    size: uniform(1..100)
  - name: Date
    size: uniform(1..10)
  - name: Value
    size: uniform(-100..100)

My observations so far are as follows (please correct me if I am wrong):

   1. With n=10000 and time: fixed(1000), the number of rows getting
   inserted is 10 million. (10000*1000=10000000)
   2. The number of row-keys/partitions is 10000 (i.e. n), within which 100
   partitions are taken at a time (which means 100*1000 = 100000 key-value
   pairs), out of which 50000 key-value pairs are processed at a time. (This
   is because of select: fixed(1)/2 ~ 50%.)

The output message also confirms the same:

Generating batches with [100..100] partitions and [50000..50000] rows
(of [100000..100000] total rows in the partitions)

The results that I get are the following for consecutive runs with the same
configuration as above:

Run  Total_ops  Op_rate  Partition_rate  Row_Rate  Time
1    56         19       1885            943246    3.0
2    46         46       4648            2325498   1.0
3    27         30       2982            1489870   0.9
4    59         19       1932            966034    3.1
5    100        17       1730            865182    5.8
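As a quick cross-check on what each column measures, the rates in the table
are mutually consistent with one op being one batch of 100 partitions with
500 selected rows each:

# Run 1: Op_rate 19 ops/s
#   Partition_rate ~ 19 * 100   = 1900   (reported: 1885)
#   Row_Rate       ~ 1885 * 500 = 942500 (reported: 943246)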

Now, what I need to understand is the following:

   1. Which among these metrics is the throughput, i.e. the number of records
   inserted per second? Is it the Row_rate, Op_rate or Partition_rate? If it's
   the Row_rate, can I safely conclude here that I am able to insert close to
   1 million records per second? Any thoughts on what the Op_rate and
   Partition_rate mean in this case?
   2. Why does the Total_ops vary so drastically in every run? Has the
   number of threads got anything to do with this variation? What can I
   conclude here about the stability of my Cassandra setup?
   3. How do I determine the batch size per thread here? In my example, is
   the batch size 50000?

Thanks in advance.



-- 
Nisha Menon
BTech (CS) Sahrdaya CET,
MTech (CS) IIIT Bangalore.


Re: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Jake Luciani
Your insert settings look unrealistic, since I doubt you would be
writing 50k rows at a time. Try setting this to 1 per partition and
you should get much more consistent numbers across runs, I would think:

select: fixed(1)/100000
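A sketch of the batch arithmetic behind this suggestion (assuming the rest of
the profile above is unchanged):

# current:   100 partitions/batch * 500 selected rows = 50000 rows per unlogged batch
# suggested: 100 partitions/batch * 1 selected row    =   100 rows per unlogged batch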

On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon wrote:
> [quoted message trimmed; see the original above]



-- 
http://twitter.com/tjake


Re: cassandra node jvm stall intermittently

2015-03-09 Thread Robert Coli
On Sat, Mar 7, 2015 at 1:44 AM, Jason Wee wrote:

> hey Ali, 1.0.8
>
> On Sat, Mar 7, 2015 at 5:20 PM, Ali Akhtar wrote:
>
>> What version are you running?
>>
Upgrade your very old version to at least 1.2.x (via 1.1.x) ASAP.
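For what it's worth, each hop usually takes the following shape, one node at
a time (a sketch; service names are assumptions, and the NEWS.txt of each
target version is the authority on required steps):

nodetool drain                 # flush memtables; the node stops accepting writes
sudo service cassandra stop
# install the 1.1.x binaries and reconcile cassandra.yaml changes here
sudo service cassandra start
nodetool upgradesstables       # rewrite sstables in the new on-disk format
# repeat on every node, then make the same pass again for 1.2.x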

=Rob


Pointers on deploying snitch for Multi region cluster

2015-03-09 Thread Jan
Hi Folks,

We are planning to deploy a multi-region C* cluster with nodes on both US
coasts, and need some advice:

a) As I do not have public IP address access, is there an alternative way to
deploy the Ec2MultiRegionSnitch using private IP addresses?
b) Has anyone used the Ec2Snitch with nodes on either coast and connected
multiple VPCs with EC2 instances using IPSec tunnels? Did this work?
c) Has anyone used the GossipingPropertyFileSnitch and got it working
successfully in a multi-region deployment?

Advice/gotchas/input/do's/don'ts much appreciated.

Thanks,
Jan

Re: Best way to alert/monitor "nodetool status" down.

2015-03-09 Thread Jan
You could set up an alert for node down within OpsCenter. OpsCenter also
offers you the option to send an email to a paging system, with reminders.

Jan

On Sunday, March 8, 2015 6:10 AM, Vasileios Vlachos wrote:

We use Nagios for monitoring, and we call the following through NRPE:

#!/bin/bash

# Just for reference:
# Nodetool's output represents "Status" and "State" in this order.
# Status values: U (up), D (down)
# State values: N (normal), L (leaving), J (joining), M (moving)

NODETOOL=$(which nodetool);
NODES_DOWN=$(${NODETOOL} --host localhost status | grep --count -E '^D[A-Z]');

if [[ ${NODES_DOWN} -gt 0 ]]; then
    output="CRITICAL - Nodes down: ${NODES_DOWN}";
    return_code=2;
elif [[ ${NODES_DOWN} -eq 0 ]]; then
    output="OK - Nodes down: ${NODES_DOWN}";
    return_code=0;
else
    output="UNKNOWN - Couldn't retrieve cluster information.";
    return_code=3;
fi

echo "${output}";
exit "${return_code}";
 
I've not used Zabbix, so I'm not sure the exit codes etc. are the same for
you. Also, you may need to modify the regex slightly depending on the
Cassandra version you are using. There must be a way to get this via JMX as
well, which might be easier for you to monitor.
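If it helps, one way to expose the same count to Zabbix is a UserParameter on
the agent, so the item returns a bare number rather than a Nagios exit code
(a sketch; the item key and config path are assumptions):

# as root (or via sudo tee) on each monitored node:
cat >> /etc/zabbix/zabbix_agentd.d/cassandra.conf <<'EOF'
UserParameter=cassandra.nodes_down,nodetool --host localhost status | grep --count -E '^D[A-Z]'
EOF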
 
On 07/03/15 00:37, Kevin Burton wrote:

What's the best way to monitor nodetool status being down? I.e. if a
specific server thinks a node is down (DN).

Does this just use JMX? Is there an API we can call?

We want to tie it into our Zabbix server so we can detect if there is a
failure.

-- 
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
 
-- 
Kind Regards,

Vasileios Vlachos

IT Infrastructure Engineer
MSc Internet & Wireless Computing
BEng Electronics Engineering
Cisco Certified Network Associate (CCNA)

What are the reasons for holding off on 2.1.x at this point?

2015-03-09 Thread Jacob Rhoden

I notice some of the discussion about rolling back and avoiding upgrading. I 
wonder if people can elaborate on their pain points? 

We are in a situation where there are some use cases we wish to implement that 
appear to be much simpler to implement using indexed sets. So it has me 
wondering about what the cons would be of jumping into 2.1.3, instead of having 
to code around the limits of 2.0.x, and then re-write the features once we can 
use 2.1.3. (Ideally we want to get these use cases into prod within the next 4 
weeks)

Thanks,
Jacob

Re: What are the reasons for holding off on 2.1.x at this point?

2015-03-09 Thread graham sanderson
2.1.3 has a few memory leaks/issues and resource management race conditions.

That is horribly vague; however, looking at some of the fixes in 2.1.4, I'd
be tempted to wait on that.

2.1.3 is fine for testing though.

> On Mar 9, 2015, at 6:42 PM, Jacob Rhoden wrote:
>
> [quoted message trimmed; see above]





Re: What are the reasons for holding off on 2.1.x at this point?

2015-03-09 Thread Robert Coli
On Mon, Mar 9, 2015 at 4:42 PM, Jacob Rhoden wrote:

> [quoted message trimmed; see above]

2.1.1 probably has some serious issue that I'm not recalling right now.

2.1.2 is broken and should not be run in production.

2.1.3 appears to have a memory leak in some circumstances. If you are not
in those circumstances, perhaps that is not prohibitive.

As Graham suggested, I would develop against 2.1.x but not run 2.1.x in
production until at least 2.1.4.

=Rob


Re: Pointers on deploying snitch for Multi region cluster

2015-03-09 Thread Robert Coli
On Mon, Mar 9, 2015 at 2:17 PM, Jan wrote:

> c) Has anyone used the GossipingPropertyFileSnitch and got it working
> successfully in a multi-region deployment?

Were I attempting the task you're doing, I'd use GPFS.
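For reference, a minimal GossipingPropertyFileSnitch setup is one properties
file per node plus the snitch setting in cassandra.yaml (a sketch; the
DC/rack names are placeholders):

# as root, on each node:
cat > /etc/cassandra/cassandra-rackdc.properties <<'EOF'
dc=us-east
rack=rack1
EOF
# and in cassandra.yaml on each node:
#   endpoint_snitch: GossipingPropertyFileSnitch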

=Rob


Cassandra Bulk loader, serious issue

2015-03-09 Thread Pranay Agarwal
Hi All,

I used sstableloader to export data from the first Cassandra cluster (RF 3)
to another cluster with RF 1. After all the tables were copied and the
second cluster was working fine, I decided to run node repair on the second
cluster as a regular operation. This repair caused the data size on the
second cluster to go up almost 3 times.

Does this mean the original data was further replicated 3 times, which would
mean each row now exists in 9 replicas?!

Please help. The doc says sstableloader can be used with a different RF.
Where did I go wrong?
http://datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html
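For reference, sstableloader streams the contents of one table directory
into the target cluster, and the number of copies written is governed by the
target keyspace's replication settings rather than the source's (a sketch;
hosts and paths are placeholders):

sstableloader -d 192.168.1.101,192.168.1.102 /var/lib/cassandra/data/my_keyspace/my_table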

-Pranay


how to clear data from disk

2015-03-09 Thread 鄢来琼
Hi ALL,

After dropping a table, I found the data is not removed from disk; I should
have reduced gc_grace_seconds before the drop operation.
I would have to wait for 10 days, but there is not enough disk space.
Could you tell me whether there is a method to clear the data from disk quickly?
Thank you very much!

Peter


Re: how to clear data from disk

2015-03-09 Thread 曹志富
nodetool clearsnapshot
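For context: with auto_snapshot enabled (the default in cassandra.yaml), DROP
TABLE snapshots the data first, so the files stay on disk until the snapshot
is cleared. A sketch, with a placeholder keyspace name:

nodetool clearsnapshot my_keyspace   # remove all snapshots for one keyspace
nodetool clearsnapshot               # or remove snapshots for every keyspace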

--
Ranger Tsao

2015-03-10 10:47 GMT+08:00 鄢来琼:

> [quoted message trimmed; see above]


C* 2.0.9 Compaction Error

2015-03-09 Thread 曹志富
Hi everyone,

I have a 12-node C* 2.0.9 cluster for Titan. I found some errors when doing
compaction; the exception stack:

java.lang.AssertionError: Added column does not sort as the last column
    at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:115)
    at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:116)
    at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:150)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:85)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:154)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)


I found an issue, CASSANDRA-7470, but it's about CQL.

So why this error?


--
Ranger Tsao