Decommissioning a datacenter deletes the data (on decommissioned datacenter)

2014-08-07 Thread srmore
tl;dr: Decommissioning datacenters by running nodetool decommission on a
node deletes the data on the decommissioned node - is this expected ?

I am trying out some tests on my multi-datacenter setup. Somewhere in the
docs I read that decommissioning a node will stream its data to other nodes
but that it still retains its own copy of the data.

I was expecting the same behavior with multiple datacenters. I am using
cassandra 1.2.12. Following are my observations:


Let's say I have a datacenter DC1 which has keyspace keyspace_dc_1, and
another datacenter DC2 which has keyspace keyspace_dc_2. Both already have
some data in them.

I add DC2 to the cluster and update the replication factors on both
keyspaces. Looking at the gossipinfo output, I can see that the schemas are
synced. I then look at the cfstats output and I can see that both keyspaces
are replicated on both datacenters (also on disk, as I can see a non-zero
sstable count).

Now, I decommission DC2:
1) Update the replication factors for the keyspaces.
2) Run nodetool decommission on all the nodes.

I see that I have lost all my keyspaces (and data), both the ones from DC1
and the ones from DC2. This does not seem normal to me; is this expected ?
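
For reference, a rough sketch of the sequence I ran (keyspace names,
datacenter names and replication values below are just placeholders for my
setup):

# 1) In cqlsh, remove DC2 from each keyspace's replication settings, e.g.:
#      ALTER KEYSPACE keyspace_dc_1
#        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
# 2) Then, on every node in DC2:
nodetool decommission
# 3) And check what the surviving nodes report:
nodetool ring
nodetool cfstats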

Thanks,
Sandeep


Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)

2014-08-07 Thread srmore
Hello Rob
Sorry for being ambiguous. By deletes I mean that after running decommission
I can no longer see, in the cfstats output, any keyspaces owned by this node
or replicated by other nodes. I am also seeing the same behavior when I
remove a single node from a cluster (without multiple datacenters).



On Thu, Aug 7, 2014 at 11:43 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 7, 2014 at 8:26 AM, srmore comom...@gmail.com wrote:


 tl;dr: Decommissioning datacenters by running nodetool decommission on a
 node deletes the data on the decommissioned node - is this expected ?


 What does "deletes" mean? What does "lost all my keyspaces (and data)"
 mean?

 =Rob




Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)

2014-08-07 Thread srmore
On Thu, Aug 7, 2014 at 12:27 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 7, 2014 at 10:04 AM, srmore comom...@gmail.com wrote:

 Sorry for being ambiguous. By deletes I mean that after running decommission
 I can no longer see, in the cfstats output, any keyspaces owned by this node
 or replicated by other nodes. I am also seeing the same behavior when I
 remove a single node from a cluster (without multiple datacenters).


 I'm still not fully parsing you, but clusters should never forget schema
 as a result of decommission.

 Is that what you are saying is happening?


Yes, this is what is happening.



 (In fact, even the decommissioned node itself does not forget its schema,
 which I personally consider a bug.)



OK, so I am assuming this is not normal behavior and possibly a bug - is
this correct ?



 =Rob




Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)

2014-08-07 Thread srmore
Thanks for the detailed reply, Ken, this really helps. After reading your
email I realized that I wasn't doing a 'nodetool rebuild'. I was following
the steps mentioned here:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_decomission_dc_t.html

I'll do a test with nodetool rebuild and see what happens.



On Thu, Aug 7, 2014 at 1:27 PM, Ken Hancock ken.hanc...@schange.com wrote:

 My reading is it didn't forget the schema.  It lost the data.


 My reading is decommissioning worked fine.  Possibly when you changed the
 replication on a keyspace to include a second data center, the data didn't
 get replicated.


Probably not, because I could see that the sstables for the other
datacenter's keyspace were created. My understanding could be wrong though.



 When you ADD a datacenter, you need to do a nodetool rebuild to get the
 data streamed to the new data center.  When you alter a keyspace to include
 another datacenter in its replication schema, a nodetool repair is required
 -- was this done?
 http://www.datastax.com/documentation/cql/3.0/cql/cql_using/update_ks_rf_t.html


I missed the 'nodetool rebuild' step, which could be my issue. Yes, I did
run repair.
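
Based on your explanation, the sequence for adding DC2 should presumably
have been something like this (names and replication factors below are
placeholders):

# 1) Alter each keyspace to replicate into the new datacenter:
#      ALTER KEYSPACE keyspace_dc_1 WITH replication =
#        {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
# 2) On every node in the NEW datacenter, stream the existing data from DC1:
nodetool rebuild DC1
# 3) When only a keyspace's replication was changed (no new nodes), run repair:
nodetool repair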



 When you use nodetool decommission, you're effectively deleting the
 partitioning token from the cluster.  The node being decommissioned will
 stream its data to the new owners of its original token range.  This
 streaming in no way should affect any other datacenter because you have not
 changed the tokens or data ownership for any datacenter but the one in
 which you are decommissioning a node.


That was my understanding too, but when I decommission, it does clear out
(remove) all the keyspaces.



 When you eventually decommission the last node in the datacenter, all data
 is gone as there are no tokens in that datacenter to own any data.

 If you had a keyspace that was only replicated within that datacenter,
 that data is gone (though you could probably add nodes back in and
 resurrect it).



The (now outdated) documentation [1] says that data remains on the node
even after decommissioning. So I do not understand why the data would go
away.



 If you had a keyspace where you changed the replication to include another
 datacenter, and that datacenter had never received the data, then it may
 have the schema but would have none of the data (other than new data that
 was written AFTER you changed the replication).


I would expect the repair to fix this, i.e. to stream the old data to the
newly added datacenter. So, does nodetool rebuild help here ?

[1] https://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely







 On Thu, Aug 7, 2014 at 2:11 PM, srmore comom...@gmail.com wrote:




 On Thu, Aug 7, 2014 at 12:27 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Thu, Aug 7, 2014 at 10:04 AM, srmore comom...@gmail.com wrote:

 Sorry for being ambiguous.  By deletes I mean that running
 decommission I can no longer see any keyspaces owned by this node or
 replicated by other nodes using the cfstats command. I am also seeing the
 same behavior when I remove a single node from a cluster (without
 datacenters).


 I'm still not fully parsing you, but clusters should never forget
 schema as a result of decommission.

 Is that what you are saying is happening?


 Yes, this is what is happening.



 (In fact, even the decommissioned node itself does not forget its
 schema, which I personally consider a bug.)



 Ok, so I am assuming this is not a normal behavior and possibly a bug  -
 is this correct ?



 =Rob





 --
 Ken Hancock | System Architect, Advanced Advertising
 SeaChange International
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
 http://www.schange.com/en-US/Company/InvestorRelations.aspx
 Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
 Skype: hancockks | Yahoo IM: hancockks |
 LinkedIn: http://www.linkedin.com/in/kenhancock




Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)

2014-08-07 Thread srmore
I tried using 'nodetool rebuild' after I added the datacenters, same
outcome: after I decommission, my keyspaces are getting wiped out. I don't
understand this.


On Thu, Aug 7, 2014 at 1:54 PM, srmore comom...@gmail.com wrote:


 Thanks for the detailed reply, Ken, this really helps. After reading your
 email I realized that I wasn't doing a 'nodetool rebuild'. I was following
 the steps mentioned here:
 http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_decomission_dc_t.html

 I'll do a test with nodetool rebuild and see what happens.



 On Thu, Aug 7, 2014 at 1:27 PM, Ken Hancock ken.hanc...@schange.com
 wrote:

 My reading is it didn't forget the schema.  It lost the data.


 My reading is decommissioning worked fine.  Possibly when you changed the
 replication on a keyspace to include a second data center, the data didn't
 get replicated.


 Probably not, because I could see that the sstables for the other
 datacenter's keyspace were created. My understanding could be wrong though.



 When you ADD a datacenter, you need to do a nodetool rebuild to get the
 data streamed to the new data center.  When you alter a keyspace to include
 another datacenter in its replication schema, a nodetool repair is required
 -- was this done?
 http://www.datastax.com/documentation/cql/3.0/cql/cql_using/update_ks_rf_t.html


 I missed the 'nodetool rebuild' step, which could be my issue. Yes, I did
 run repair.



 When you use nodetool decommission, you're effectively deleting the
 partitioning token from the cluster.  The node being decommissioned will
 stream its data to the new owners of its original token range.  This
 streaming in no way should affect any other datacenter because you have not
 changed the tokens or data ownership for any datacenter but the one in
 which you are decommissioning a node.


 That was my understanding too, but when I decommission, it does clear out
 (remove) all the keyspaces.



 When you eventually decommission the last node in the datacenter, all data
 is gone as there are no tokens in that datacenter to own any data.

 If you had a keyspace that was only replicated within that datacenter,
 that data is gone (though you could probably add nodes back in and
 resurrect it).



 The (now outdated) documentation [1] says that data remains on the node
 even after decommissioning. So I do not understand why the data would go
 away.



 If you had a keyspace where you changed the replication to include another
 datacenter, and that datacenter had never received the data, then it may
 have the schema but would have none of the data (other than new data that
 was written AFTER you changed the replication).


 I would expect the repair to fix this, i.e. to stream the old data to the
 newly added datacenter. So, does nodetool rebuild help here ?

 [1] https://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely







 On Thu, Aug 7, 2014 at 2:11 PM, srmore comom...@gmail.com wrote:




 On Thu, Aug 7, 2014 at 12:27 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Thu, Aug 7, 2014 at 10:04 AM, srmore comom...@gmail.com wrote:

 Sorry for being ambiguous.  By deletes I mean that running
 decommission I can no longer see any keyspaces owned by this node or
 replicated by other nodes using the cfstats command. I am also seeing the
 same behavior when I remove a single node from a cluster (without
 datacenters).


 I'm still not fully parsing you, but clusters should never forget
 schema as a result of decommission.

 Is that what you are saying is happening?


 Yes, this is what is happening.



 (In fact, even the decommissioned node itself does not forget its
 schema, which I personally consider a bug.)



 Ok, so I am assuming this is not a normal behavior and possibly a bug  -
 is this correct ?



 =Rob





 --
 Ken Hancock | System Architect, Advanced Advertising
 SeaChange International
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
 http://www.schange.com/en-US/Company/InvestorRelations.aspx
 Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
 Skype: hancockks | Yahoo IM: hancockks |
 LinkedIn: http://www.linkedin.com/in/kenhancock






Re: Question 1: JMX binding, Question 2: Logging

2014-02-04 Thread srmore
Hello Kyle,
For your first question, you need to create aliases to localhost, e.g.
127.0.0.2, 127.0.0.3, etc.; this should get you going.
About the logging issue, I think your instance is failing before it gets to
log anything. As a sanity check you can start one instance on its own and
make sure it logs correctly.
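
A minimal sketch of what I mean, assuming Linux and the stock
cassandra-env.sh (the addresses and ports below are just examples):

# add loopback aliases for the extra instances:
ip addr add 127.0.0.2/8 dev lo
ip addr add 127.0.0.3/8 dev lo
# in each instance's cassandra-env.sh give JMX its own port, e.g.:
#   JMX_PORT="7199"   # instance 1
#   JMX_PORT="7200"   # instance 2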

Hope that helps.
Sandeep




On Tue, Feb 4, 2014 at 4:25 PM, Kyle Crumpton (kcrumpto) kcrum...@cisco.com
 wrote:

  Hi all,

  I'm fairly new to Cassandra. I'm deploying it to a PaaS. One thing this
 entails is that it must be able to have more than one instance on a single
 node. I'm running into the problem that JMX binds to 0.0.0.0:7199. My
 question is this: Is there a way to configure this? I have actually found
 the post that said to change the the following

 JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=127.1.246.3" where
 127.1.246.3 is the IP I want to bind to.

 This actually did not change the JMX binding by any means for me. I saw a
 post about a jmx listen address in cassandra.yaml and this also did not
 work.
 Any clarity on whether this is bindable at all? Or if there are plans for
 it?

  Also-

  I have logging turned on. For some reason, though, my Cassandra is not
 actually logging as intended. My log folder is actually empty after each
 (failed) run (due to the port being taken by my other cassandra process).

  Here is an actual copy of my log4j-server.properites file:
 http://fpaste.org/74470/15510941/

  Any idea why this might not be logging?

  Thank you and best regards

  Kyle



Re: Lots of deletions results in death by GC

2014-02-04 Thread srmore
Sorry to hear that, Robert, I ran into a similar issue a while ago. I had an
extremely heavy write and update load; as a result Cassandra (1.2.9) was
constantly flushing to disk and GCing. I tried exactly the same steps you
tried (tuning memtable_flush_writers to 2 and memtable_flush_queue_size to
8), no luck. Almost all of the issues went away when I migrated to 1.2.13;
that release also had some fixes which I badly needed. What version are you
running ? (I tried to look in the thread but couldn't find one, sorry if
this is a repeat question.)

Dropped messages are a sign that Cassandra is under heavy load; that's the
load-shedding mechanism kicking in. I would love to see some sort of
back-pressure implemented.
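
For reference, the knobs I tried and the check I was using, roughly (the
values are just what I tried, not recommendations):

# cassandra.yaml:
#   memtable_flush_writers: 2
#   memtable_flush_queue_size: 8
# then watch whether the flush stage is backing up:
nodetool tpstats | grep -i flush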

-sandeep


On Tue, Feb 4, 2014 at 6:10 PM, Robert Wille rwi...@fold3.com wrote:

 I ran my test again, and Flush Writer's All time blocked increased to 2
 and then shortly thereafter GC went into its death spiral. I doubled
 memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
 tried again.

 This time, the table that always sat with Memtable data size = 0 now
 showed increases in Memtable data size. That was encouraging. It never
 flushed, which isn't too surprising, because that table has relatively few
 rows and they are pretty wide. However, on the fourth table to clean, Flush
 Writer's All time blocked went to 1, and then there were no more
 completed events, and about 10 minutes later GC went into its death spiral.
 I assume that each time Flush Writer completes an event, that means a table
 was flushed. Is that right? Also, I got two dropped mutation messages at
 the same time that Flush Writer's All time blocked incremented.

 I then increased the writers and queue size to 3 and 12, respectively, and
 ran my test again. This time All time blocked remained at 0, but I still
 suffered death by GC.

 I would almost think that this is caused by high load on the server, but
 I've never seen CPU utilization go above about two of my eight available
 cores. If high load triggers this problem, then that is very disconcerting.
 That means that a CPU spike could permanently cripple a node. Okay, not
 permanently, but until a manual flush occurs.

 If anyone has any further thoughts, I'd love to hear them. I'm quite at
 the end of my rope.

 Thanks in advance

 Robert

 From: Nate McCall n...@thelastpickle.com
 Reply-To: user@cassandra.apache.org
 Date: Saturday, February 1, 2014 at 9:25 AM
 To: Cassandra Users user@cassandra.apache.org
 Subject: Re: Lots of deletions results in death by GC

 What's the output of 'nodetool tpstats' while this is happening?
 Specifically is Flush Writer All time blocked increasing? If so, play
 around with turning up memtable_flush_writers and memtable_flush_queue_size
 and see if that helps.


 On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille rwi...@fold3.com wrote:

 A few days ago I posted about an issue I'm having where GC takes a long
 time (20-30 seconds), and it happens repeatedly and basically no work gets
 done. I've done further investigation, and I now believe that I know the
 cause. If I do a lot of deletes, it creates memory pressure until the
 memtables are flushed, but Cassandra doesn't flush them. If I manually
 flush, then life is good again (although that takes a very long time
 because of the GC issue). If I just leave the flushing to Cassandra, then I
 end up with death by GC. I believe that when the memtables are full of
 tombstones, Cassandra doesn't realize how much memory the memtables are
 actually taking up, and so it doesn't proactively flush them in order to
 free up heap.

 As I was deleting records out of one of my tables, I was watching it via
 nodetool cfstats, and I found a very curious thing:

 Memtable cell count: 1285
 Memtable data size, bytes: 0
 Memtable switch count: 56

 As the deletion process was chugging away, the memtable cell count
 increased, as expected, but the data size stayed at 0. No flushing
 occurred.

 Here's the schema for this table:

 CREATE TABLE bdn_index_pub (
 tshard VARCHAR,
 pord INT,
 ord INT,
 hpath VARCHAR,
 page BIGINT,
 PRIMARY KEY (tshard, pord)
 ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

 I have a few tables that I run this cleaning process on, and not all of
 them exhibit this behavior. One of them reported an increasing number of
 bytes, as expected, and it also flushed as expected. Here's the schema for
 that table:


 CREATE TABLE bdn_index_child (
 ptshard VARCHAR,
 ord INT,
 hpath VARCHAR,
 PRIMARY KEY (ptshard, ord)
 ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

 In both cases, I'm deleting the entire record (i.e. specifying just the
 first component of the primary key in the delete statement). Most records
 in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
 

Re: MUTATION messages dropped

2013-12-19 Thread srmore
What version of Cassandra are you running ? I used to see them a lot with
1.2.9; I could correlate the dropped messages with the heap usage almost
every time, so check the logs to see whether you are getting GC pauses. In
this respect 1.2.12 appears to be more stable; moving to 1.2.12 took care of
this for us.
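
Two quick checks, assuming the default packaged log location:

# dropped-message counters, per message type:
nodetool tpstats
# GC pressure as reported by Cassandra itself:
grep GCInspector /var/log/cassandra/system.log | tail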

Thanks,
Sandeep


On Thu, Dec 19, 2013 at 6:12 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Hi all!

 We've had a problem with cassandra recently. We had 2 one-minute periods
 when we got a lot of timeouts on the client side (the only timeouts during
 9 days we are using cassandra in production). In the logs we've found
 corresponding messages saying something about MUTATION messages dropped.

 Now, the official faq [1] says that this is an indicator that the load is
 too high. We've checked our monitoring and found out that 1-minute average
 cpu load had a local peak at the time of the problem, but it was like 0.8
 against 0.2 usual which I guess is nothing for a 2 core virtual machine.
 We've also checked java threads - there was no peak there and their count
 was reasonable ~240-250.

 Can anyone give us a hint - what should we monitor to see this high load
 and what should we tune to make it acceptable?

 Thanks in advance,
 Alexander

 [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages



Re: Write performance with 1.2.12

2013-12-12 Thread srmore
On Wed, Dec 11, 2013 at 10:49 PM, Aaron Morton aa...@thelastpickle.com wrote:

 It is the write latency, read latency is ok. Interestingly the latency is
 low when there is one node. When I join other nodes the latency drops about
 1/3. To be specific, when I start sending traffic to the other nodes the
 latency for all the nodes increases, if I stop traffic to other nodes the
 latency drops again, I checked, this is not node specific it happens to any
 node.

 Is this the local write latency or the cluster wide write request latency
 ?


This is the cluster-wide write latency.
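
To be clear about where that number comes from, this is roughly how I look
at it on the nodes themselves (assuming these nodetool commands are
available on 1.2):

# local, per-column-family write latency on a node:
nodetool cfstats
# coordinator-level (cluster-wide) request latency:
nodetool proxyhistograms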



 What sort of numbers are you seeing ?



I have a custom application that writes data to the Cassandra nodes, so the
numbers might differ from the standard stress test, but they should be good
enough for comparison. With the previous release, 1.0.12, I was getting
around 10K requests/sec and with 1.2.12 I am getting around 6K requests/sec.
Everything else is the same. This is a three-node cluster.

With a single node I get 3K for cassandra 1.0.12 and 1.2.12. So I suspect
there is some network chatter. I have started looking at the sources,
hoping to find something.

-sandeep


 Cheers

 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 12/12/2013, at 3:39 pm, srmore comom...@gmail.com wrote:

 Thanks Aaron


 On Wed, Dec 11, 2013 at 8:15 PM, Aaron Morton aa...@thelastpickle.com wrote:

 Changed memtable_total_space_in_mb to 1024 still no luck.

 Reducing memtable_total_space_in_mb will increase the frequency of
 flushing to disk, which will create more for compaction to do and result in
 increased IO.

 You should return it to the default.


 You are right, had to revert it back to default.



 when I send traffic to one node its performance is 2x more than when I
 send traffic to all the nodes.



 What are you measuring, request latency or local read/write latency ?

 If it’s write latency it’s probably GC, if it’s read is probably IO or
 data model.


 It is the write latency, read latency is ok. Interestingly the latency is
 low when there is one node. When I join other nodes the latency drops about
 1/3. To be specific, when I start sending traffic to the other nodes the
 latency for all the nodes increases, if I stop traffic to other nodes the
 latency drops again, I checked, this is not node specific it happens to any
 node.

 I don't see any GC activity in logs. Tried to control the compaction by
 reducing the number of threads, did not help much.


 Hope that helps.

  -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 7/12/2013, at 8:05 am, srmore comom...@gmail.com wrote:

 Changed memtable_total_space_in_mb to 1024 still no luck.


 On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote:

 Can you set the memtable_total_space_in_mb value? It is defaulting to 1/3
 of the heap, which for an 8 GB heap is about 2.6 GB in capacity.

 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management

 The flushing of 2.6 GB to disk might slow the performance if it happens
 frequently; maybe you have lots of write operations going on.



 On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote:

 You have passed the JVM configurations and not the cassandra
 configurations which is in cassandra.yaml.


 Apologies, was tuning JVM and that's what was in my mind.
 Here are the cassandra settings http://pastebin.com/uN42GgYT



 The spikes are not that significant in our case and we are running the
 cluster with 1.7 gb heap.

 Are these spikes causing any issue at your end?


 There are no big spikes, the overall performance seems to be about 40%
 low.






 On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



 Yes compactions/GC's could spike the CPU, I had similar behavior
 with my setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty
 big machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one
 node its performance is 2x more than when I send traffic to all the 
 nodes.
 We ran 1.0.11 on the same box and we observed a slight dip but not 
 half as
 seen with 1.2.12. In both the cases we were writing

Re: Write performance with 1.2.12

2013-12-12 Thread srmore
On Thu, Dec 12, 2013 at 11:15 AM, J. Ryan Earl o...@jryanearl.us wrote:

 Why did you switch to RandomPartitioner away from Murmur3Partitioner?
  Have you tried with Murmur3?


1. # partitioner: org.apache.cassandra.dht.Murmur3Partitioner
2. partitioner: org.apache.cassandra.dht.RandomPartitioner



Since I am comparing the two versions I am keeping all the settings the
same. I see that Murmur3Partitioner has some performance improvement, but
switching back to RandomPartitioner should not cause performance to tank,
right ? Or am I missing something ?

Also, is there an easier way to update the data from RandomPartitioner to
Murmur3 ? (upgradesstables ?)





 On Fri, Dec 6, 2013 at 10:36 AM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote:

 You have passed the JVM configurations and not the cassandra
 configurations which is in cassandra.yaml.


 Apologies, was tuning JVM and that's what was in my mind.
 Here are the cassandra settings http://pastebin.com/uN42GgYT



 The spikes are not that significant in our case and we are running the
 cluster with 1.7 gb heap.

 Are these spikes causing any issue at your end?


 There are no big spikes, the overall performance seems to be about 40%
 low.






 On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



  Yes compactions/GC's could spike the CPU, I had similar behavior with
 my setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty
 big machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node
 its performance is 2x more than when I send traffic to all the nodes. We
 ran 1.0.11 on the same box and we observed a slight dip but not half as
 seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.
 Changing CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log) and
 sdb is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   66.460.008.950.010.00   24.58

 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda   0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sda1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sda2  0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sdb   0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sdb1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-0  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-1  0.00 0.00  0.00  0.60 0.00 4.80
 8.00 0.005.33   2.67   0.16
 dm-2  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-3  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.249.80   0.13   0.32
 dm-4  0.00 0.00  0.00  6.60 0.0052.80
 8.00 0.011.36   0.55   0.36
 dm-5  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-6  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.29   11.60   0.13   0.32



 I can see I am cpu bound here but couldn't figure out exactly what is
 causing it, is this caused by GC or Compaction ? I am thinking it is
 compaction, I see a lot of context switches and interrupts in my vmstat
 output.

 I don't see GC activity in the logs but see some compaction activity.
 Has anyone seen this ? or know what can be done to free up the CPU.

 Thanks,
 Sandeep










Re: Write performance with 1.2.12

2013-12-11 Thread srmore
Thanks Aaron


On Wed, Dec 11, 2013 at 8:15 PM, Aaron Morton aa...@thelastpickle.com wrote:

 Changed memtable_total_space_in_mb to 1024 still no luck.

 Reducing memtable_total_space_in_mb will increase the frequency of
 flushing to disk, which will create more for compaction to do and result in
 increased IO.

 You should return it to the default.


You are right, had to revert it back to default.



 when I send traffic to one node its performance is 2x more than when I
 send traffic to all the nodes.



 What are you measuring, request latency or local read/write latency ?

 If it’s write latency it’s probably GC, if it’s read is probably IO or
 data model.


It is the write latency, read latency is ok. Interestingly the latency is
low when there is one node. When I join other nodes the latency drops about
1/3. To be specific, when I start sending traffic to the other nodes the
latency for all the nodes increases, if I stop traffic to other nodes the
latency drops again, I checked, this is not node specific it happens to any
node.

I don't see any GC activity in logs. Tried to control the compaction by
reducing the number of threads, did not help much.


 Hope that helps.

 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 7/12/2013, at 8:05 am, srmore comom...@gmail.com wrote:

 Changed memtable_total_space_in_mb to 1024 still no luck.


 On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote:

 Can you set the memtable_total_space_in_mb value? It is defaulting to 1/3
 of the heap, which for an 8 GB heap is about 2.6 GB in capacity.

 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management

 The flushing of 2.6 GB to disk might slow the performance if it happens
 frequently; maybe you have lots of write operations going on.



 On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote:

 You have passed the JVM configurations and not the cassandra
 configurations which is in cassandra.yaml.


 Apologies, was tuning JVM and that's what was in my mind.
 Here are the cassandra settings http://pastebin.com/uN42GgYT



 The spikes are not that significant in our case and we are running the
 cluster with 1.7 gb heap.

 Are these spikes causing any issue at your end?


 There are no big spikes, the overall performance seems to be about 40%
 low.






 On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



 Yes compactions/GC's could spike the CPU, I had similar behavior with
 my setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty
 big machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node
 its performance is 2x more than when I send traffic to all the nodes. We
 ran 1.0.11 on the same box and we observed a slight dip but not half as
 seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.
 Changing CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log)
 and sdb is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   66.460.008.950.010.00   24.58

 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda   0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sda1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sda2  0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sdb   0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sdb1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-0  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-1  0.00 0.00  0.00  0.60 0.00 4.80
 8.00 0.005.33   2.67   0.16
 dm-2  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-3  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.249.80   0.13   0.32
 dm-4  0.00 0.00  0.00  6.60 0.0052.80

Write performance with 1.2.12

2013-12-06 Thread srmore
We have a 3 node cluster running cassandra 1.2.12, they are pretty big
machines 64G ram with 16 cores, cassandra heap is 8G.

The interesting observation is that when I send traffic to one node, its
performance is 2x better than when I send traffic to all the nodes. We ran
1.0.11 on the same box and observed a slight dip, but not half as much as
seen with 1.2.12. In both cases we were writing with LOCAL_QUORUM. Changing
CL to ONE makes a slight improvement but not much.

The read_Repair_chance is 0.1. We see some compactions running.

following is my iostat -x output, sda is the ssd (for commit log) and sdb
is the spinner.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  66.460.008.950.010.00   24.58

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   0.0027.60  0.00  4.40 0.00   256.0058.18
0.012.55   1.32   0.58
sda1  0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00
sda2  0.0027.60  0.00  4.40 0.00   256.0058.18
0.012.55   1.32   0.58
sdb   0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00
sdb1  0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00
dm-0  0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00
dm-1  0.00 0.00  0.00  0.60 0.00 4.80 8.00
0.005.33   2.67   0.16
dm-2  0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00
dm-3  0.00 0.00  0.00 24.80 0.00   198.40 8.00
0.249.80   0.13   0.32
dm-4  0.00 0.00  0.00  6.60 0.0052.80 8.00
0.011.36   0.55   0.36
dm-5  0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00
dm-6  0.00 0.00  0.00 24.80 0.00   198.40 8.00
0.29   11.60   0.13   0.32



I can see I am CPU bound here but couldn't figure out exactly what is
causing it. Is this caused by GC or compaction ? I am thinking it is
compaction; I see a lot of context switches and interrupts in my vmstat
output.

I don't see GC activity in the logs but do see some compaction activity. Has
anyone seen this ? Or does anyone know what can be done to free up the CPU ?
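
For what it's worth, the checks I have been using to try to tell the two
apart:

# is compaction active, and how far behind is it?
nodetool compactionstats
# context switches, interrupts and run queue over time:
vmstat 5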

Thanks,
Sandeep


Re: Write performance with 1.2.12

2013-12-06 Thread srmore
On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


The cassandra configuration is
-Xms8G
-Xmx8G
-Xmn800m
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=4
-XX:MaxTenuringThreshold=2
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly



 Yes compactions/GC's could spike the CPU, I had similar behavior with my
 setup.


Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty big
 machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node its
 performance is 2x more than when I send traffic to all the nodes. We ran
 1.0.11 on the same box and we observed a slight dip but not half as seen
 with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing
 CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log) and sdb
 is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   66.460.008.950.010.00   24.58

 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sda1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sda2  0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sdb   0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sdb1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-0  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-1  0.00 0.00  0.00  0.60 0.00 4.80
 8.00 0.005.33   2.67   0.16
 dm-2  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-3  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.249.80   0.13   0.32
 dm-4  0.00 0.00  0.00  6.60 0.0052.80
 8.00 0.011.36   0.55   0.36
 dm-5  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-6  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.29   11.60   0.13   0.32



 I can see I am cpu bound here but couldn't figure out exactly what is
 causing it, is this caused by GC or Compaction ? I am thinking it is
 compaction, I see a lot of context switches and interrupts in my vmstat
 output.

 I don't see GC activity in the logs but see some compaction activity. Has
 anyone seen this ? or know what can be done to free up the CPU.

 Thanks,
 Sandeep






Re: Write performance with 1.2.12

2013-12-06 Thread srmore
On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote:

 You have passed the JVM configurations and not the cassandra
 configurations which is in cassandra.yaml.


Apologies, was tuning JVM and that's what was in my mind.
Here are the cassandra settings http://pastebin.com/uN42GgYT



 The spikes are not that significant in our case and we are running the
 cluster with 1.7 gb heap.

 Are these spikes causing any issue at your end?


There are no big spikes, the overall performance just seems to be about 40% lower.






 On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



 Yes compactions/GC's could spike the CPU, I had similar behavior with my
 setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty big
 machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node
 its performance is 2x more than when I send traffic to all the nodes. We
 ran 1.0.11 on the same box and we observed a slight dip but not half as
 seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.
 Changing CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log) and
 sdb is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   66.460.008.950.010.00   24.58

 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sda1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sda2  0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sdb   0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sdb1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-0  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-1  0.00 0.00  0.00  0.60 0.00 4.80
 8.00 0.005.33   2.67   0.16
 dm-2  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-3  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.249.80   0.13   0.32
 dm-4  0.00 0.00  0.00  6.60 0.0052.80
 8.00 0.011.36   0.55   0.36
 dm-5  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-6  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.29   11.60   0.13   0.32



 I can see I am cpu bound here but couldn't figure out exactly what is
 causing it, is this caused by GC or Compaction ? I am thinking it is
 compaction, I see a lot of context switches and interrupts in my vmstat
 output.

 I don't see GC activity in the logs but see some compaction activity.
 Has anyone seen this ? or know what can be done to free up the CPU.

 Thanks,
 Sandeep








Re: Write performance with 1.2.12

2013-12-06 Thread srmore
Looks like I am spending some time in GC.

java.lang:type=GarbageCollector,name=ConcurrentMarkSweep

CollectionTime = 51707;
CollectionCount = 103;

java.lang:type=GarbageCollector,name=ParNew

 CollectionTime = 466835;
 CollectionCount = 21315;
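
Roughly the same numbers can also be watched from the shell (the pid below
is a placeholder for the Cassandra process id):

# GC counts and accumulated GC time, sampled every 5 seconds:
jstat -gcutil <cassandra-pid> 5000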


On Fri, Dec 6, 2013 at 9:58 AM, Jason Wee peich...@gmail.com wrote:

 Hi srmore,

 Perhaps you could use jconsole and connect to the JVM using JMX. Then,
 under the MBeans tab, start inspecting the GC metrics.

 /Jason


 On Fri, Dec 6, 2013 at 11:40 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



 Yes compactions/GC's could spike the CPU, I had similar behavior with my
 setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty big
 machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node
 its performance is 2x more than when I send traffic to all the nodes. We
 ran 1.0.11 on the same box and we observed a slight dip but not half as
 seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.
 Changing CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log) and
 sdb is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   66.460.008.950.010.00   24.58

 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sda1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sda2  0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sdb   0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sdb1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-0  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-1  0.00 0.00  0.00  0.60 0.00 4.80
 8.00 0.005.33   2.67   0.16
 dm-2  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-3  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.249.80   0.13   0.32
 dm-4  0.00 0.00  0.00  6.60 0.0052.80
 8.00 0.011.36   0.55   0.36
 dm-5  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-6  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.29   11.60   0.13   0.32



 I can see I am cpu bound here but couldn't figure out exactly what is
 causing it, is this caused by GC or Compaction ? I am thinking it is
 compaction, I see a lot of context switches and interrupts in my vmstat
 output.

 I don't see GC activity in the logs but see some compaction activity.
 Has anyone seen this ? or know what can be done to free up the CPU.

 Thanks,
 Sandeep








Re: Write performance with 1.2.12

2013-12-06 Thread srmore
Not long: Uptime (seconds) : 6828

Token: 56713727820156410577229101238628035242
ID   : c796609a-a050-48df-bf56-bb09091376d9
Gossip active: true
Thrift active: true
Native Transport active: false
Load : 49.71 GB
Generation No: 1386344053
Uptime (seconds) : 6828
Heap Memory (MB) : 2409.71 / 8112.00
Data Center  : DC
Rack : RAC-1
Exceptions   : 0
Key Cache: size 56154704 (bytes), capacity 104857600 (bytes), 27
hits, 155669426 requests, 0.000 recent hit rate, 14400 save period in
seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds


On Fri, Dec 6, 2013 at 11:15 AM, Vicky Kak vicky@gmail.com wrote:

 Since how long the server had been up, hours,days,months?


 On Fri, Dec 6, 2013 at 10:41 PM, srmore comom...@gmail.com wrote:

 Looks like I am spending some time in GC.

 java.lang:type=GarbageCollector,name=ConcurrentMarkSweep

 CollectionTime = 51707;
 CollectionCount = 103;

 java.lang:type=GarbageCollector,name=ParNew

  CollectionTime = 466835;
  CollectionCount = 21315;


 On Fri, Dec 6, 2013 at 9:58 AM, Jason Wee peich...@gmail.com wrote:

 Hi srmore,

 Perhaps you could use jconsole and connect to the JVM using JMX. Then,
 under the MBeans tab, start inspecting the GC metrics.

 /Jason


 On Fri, Dec 6, 2013 at 11:40 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



 Yes compactions/GC's could spike the CPU, I had similar behavior with
 my setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty
 big machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node
 its performance is 2x more than when I send traffic to all the nodes. We
 ran 1.0.11 on the same box and we observed a slight dip but not half as
 seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.
 Changing CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log) and
 sdb is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   66.460.008.950.010.00   24.58

 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda   0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sda1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sda2  0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sdb   0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sdb1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-0  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-1  0.00 0.00  0.00  0.60 0.00 4.80
 8.00 0.005.33   2.67   0.16
 dm-2  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-3  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.249.80   0.13   0.32
 dm-4  0.00 0.00  0.00  6.60 0.0052.80
 8.00 0.011.36   0.55   0.36
 dm-5  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-6  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.29   11.60   0.13   0.32



 I can see I am cpu bound here but couldn't figure out exactly what is
 causing it, is this caused by GC or Compaction ? I am thinking it is
 compaction, I see a lot of context switches and interrupts in my vmstat
 output.

 I don't see GC activity in the logs but see some compaction activity.
 Has anyone seen this ? or know what can be done to free up the CPU.

 Thanks,
 Sandeep










Re: Write performance with 1.2.12

2013-12-06 Thread srmore
Changed memtable_total_space_in_mb to 1024 still no luck.


On Fri, Dec 6, 2013 at 11:05 AM, Vicky Kak vicky@gmail.com wrote:

 Can you set the memtable_total_space_in_mb value? It is defaulting to 1/3
 of the heap, which for an 8 GB heap is about 2.6 GB in capacity.

 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management

 The flushing of 2.6 GB to disk might slow the performance if it happens
 frequently; maybe you have lots of write operations going on.



 On Fri, Dec 6, 2013 at 10:06 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote:

 You have passed the JVM configurations and not the cassandra
 configurations which is in cassandra.yaml.


 Apologies, was tuning JVM and that's what was in my mind.
 Here are the cassandra settings http://pastebin.com/uN42GgYT



 The spikes are not that significant in our case and we are running the
 cluster with 1.7 gb heap.

 Are these spikes causing any issue at your end?


 There are no big spikes, the overall performance seems to be about 40%
 low.






 On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote:




 On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote:

 Hard to say much without knowing about the cassandra configurations.


 The cassandra configuration is
 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=4
 -XX:MaxTenuringThreshold=2
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly



 Yes compactions/GC's could spike the CPU, I had similar behavior with
 my setup.


 Were you able to get around it ?



 -VK


 On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote:

 We have a 3 node cluster running cassandra 1.2.12, they are pretty
 big machines 64G ram with 16 cores, cassandra heap is 8G.

 The interesting observation is that, when I send traffic to one node
 its performance is 2x more than when I send traffic to all the nodes. We
 ran 1.0.11 on the same box and we observed a slight dip but not half as
 seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.
 Changing CL to ONE make a slight improvement but not much.

 The read_Repair_chance is 0.1. We see some compactions running.

 following is my iostat -x output, sda is the ssd (for commit log) and
 sdb is the spinner.

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   66.460.008.950.010.00   24.58

 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda   0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sda1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sda2  0.0027.60  0.00  4.40 0.00   256.00
 58.18 0.012.55   1.32   0.58
 sdb   0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 sdb1  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-0  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-1  0.00 0.00  0.00  0.60 0.00 4.80
 8.00 0.005.33   2.67   0.16
 dm-2  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-3  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.249.80   0.13   0.32
 dm-4  0.00 0.00  0.00  6.60 0.0052.80
 8.00 0.011.36   0.55   0.36
 dm-5  0.00 0.00  0.00  0.00 0.00 0.00
 0.00 0.000.00   0.00   0.00
 dm-6  0.00 0.00  0.00 24.80 0.00   198.40
 8.00 0.29   11.60   0.13   0.32



 I can see I am cpu bound here but couldn't figure out exactly what is
 causing it, is this caused by GC or Compaction ? I am thinking it is
 compaction, I see a lot of context switches and interrupts in my vmstat
 output.

 I don't see GC activity in the logs but see some compaction activity.
 Has anyone seen this ? or know what can be done to free up the CPU.

 Thanks,
 Sandeep










Cassandra high heap utilization under heavy reads and writes.

2013-11-23 Thread srmore
Hello,
We moved to cassandra 1.2.9 from 1.0.11 to take advantage of the off-heap
bloom filters and other improvements.

We see a lot of messages dropped under high load conditions. We noticed
that when we do heavy reads AND writes simultaneously (we read first and
check whether the key exists; if not, we write it), the Cassandra heap grows
dramatically and then gossip marks the node down (as a result of the high
load on the node).


Under a heavy 'reads only' load we don't see this behavior. Has anyone seen
this behavior ? Any suggestions ?

Thanks !


Re: java.io.FileNotFoundException when setting up internode_compression

2013-11-12 Thread srmore
Thanks Christopher !
I don't think glibc is the issue (it did not get that far):
/usr/tmp/snappy-1.0.5-libsnappyjava.so is not there, and permissions look
ok. Are there any special settings (like JVM args) that I should be using ?
I can see libsnappyjava.so in the jar though
(snappy-java-1.0.5.jar\org\xerial\snappy\native\Linux\i386\). One other
thing: I am using RedHat 6. I will try updating glibc and see what happens.
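
The things I am checking, plus one workaround I plan to try (the snappy
tempdir property is my assumption from the snappy-java docs, I have not
verified it here):

# is the extraction directory present and writable for the cassandra user?
ls -ld /usr/tmp && touch /usr/tmp/snappy-write-test
# point snappy-java at a directory that is definitely writable (assumed property):
JVM_OPTS="$JVM_OPTS -Dorg.xerial.snappy.tempdir=/var/tmp"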

Thanks !




On Mon, Nov 11, 2013 at 5:01 PM, Christopher Wirt chris.w...@struq.com wrote:

 I had this the other day when we were accidentally provisioned a centos5
 machine (instead of 6). Think it relates to the version of glibc. Notice it
 wants the native binary .so not the .jar



 So maybe update to a newer version of glibc? Or possibly make sure the .so
 exists at /usr/tmp/snappy-1.0.5-libsnappyjava.so?

 I was lucky and just did an OS reload to centos6.



 Here is someone having a similar issue.


 http://mail-archives.apache.org/mod_mbox/cassandra-commits/201307.mbox/%3CJIRA.12616012.1352862646995.6820.1373083550278@arcas%3E





 *From:* srmore [mailto:comom...@gmail.com]
 *Sent:* 11 November 2013 21:32
 *To:* user@cassandra.apache.org
 *Subject:* java.io.FileNotFoundException when setting up
 internode_compression



 I might be missing something obvious here, for some reason I cannot seem
 to get internode_compression = all to work. I am getting  the following
 exception. I am using cassandra 1.2.9 and have snappy-java-1.0.5.jar in my
 classpath. Google search did not return any useful result, has anyone seen
 this before ?


 java.io.FileNotFoundException: /usr/tmp/snappy-1.0.5-libsnappyjava.so (No
 such file or directory)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.init(FileOutputStream.java:194)
 at java.io.FileOutputStream.init(FileOutputStream.java:145)
 at
 org.xerial.snappy.SnappyLoader.extractLibraryFile(SnappyLoader.java:394)
 at
 org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:468)
 at
 org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318)
 at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
 at org.xerial.snappy.Snappy.clinit(Snappy.java:48)
 at
 org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
 at
 org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
 at
 org.apache.cassandra.io.compress.SnappyCompressor.clinit(SnappyCompressor.java:37)
 at
 org.apache.cassandra.config.CFMetaData.clinit(CFMetaData.java:82)
 at
 org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81)
 at
 org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:471)
 at
 org.apache.cassandra.config.DatabaseDescriptor.clinit(DatabaseDescriptor.java:123)

 Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in
 java.library.path
 at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
 at java.lang.Runtime.loadLibrary0(Runtime.java:823)
 at java.lang.System.loadLibrary(System.java:1028)
 at
 org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
 ... 18 more



Re: A lot of MUTATION and REQUEST_RESPONSE messages dropped

2013-11-11 Thread srmore
The problem was the cross_node_timeout value. I had it set to true and my
NTP clocks were not synchronized; as a result, some of the requests were
dropped.
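
For anyone hitting the same thing, the relevant pieces look roughly like
this (a sketch, not a recommendation; enable cross_node_timeout only once
the clocks really are in sync):

    # cassandra.yaml: with this set to true, replicas use the coordinator's
    # timestamp to work out how long a request has been in flight, so
    # unsynchronized clocks make requests look expired and they get dropped
    cross_node_timeout: false

    # quick sanity check on each node; offsets should be a handful of ms
    ntpq -p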

Thanks,
Sandeep


On Sat, Nov 9, 2013 at 6:02 PM, srmore comom...@gmail.com wrote:

 I recently upgraded to 1.2.9 and I am seeing a lot of REQUEST_RESPONSE and
 MUTATION messages are being dropped.

 This happens when I have multiple nodes in the cluster (about 3 nodes) and
 I send traffic to only one node. I don't think the traffic is that high, it
 is around 400 msg/sec with 100 threads. When I take down other two nodes I
 don't see any errors (at least on the client side) I am using Pelops.

 On the client I get UnavailableException, but the nodes are up. Initially
 I thought I am hitting CASSANDRA-6297 (gossip thread blocking) so I
 changed memtable_flush_writers to 3. Still no luck.

 UnavailableException:
 org.scale7.cassandra.pelops.exceptions.UnavailableException: null at
 org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:61)
 ~[na:na] at

 In the debug log on the cassandra node this is the exception I see

 DEBUG [Thrift:78] 2013-11-09 16:47:28,212 CustomTThreadPoolServer.java
 Thrift transport error occurred during processing of message.
 org.apache.thrift.transport.TTransportException
 at
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
 at
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at
 org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
 at
 org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
 at
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
 at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)

 Could this be because of high load ? with Cassandra 1.0.011 I did not see
 this issue.

 Thanks,
 Sandeep





java.io.FileNotFoundException when setting up internode_compression

2013-11-11 Thread srmore
I might be missing something obvious here, for some reason I cannot seem to
get internode_compression = all to work. I am getting  the following
exception. I am using cassandra 1.2.9 and have snappy-java-1.0.5.jar in my
classpath. Google search did not return any useful result, has anyone seen
this before ?


java.io.FileNotFoundException: /usr/tmp/snappy-1.0.5-libsnappyjava.so (No
such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
at
org.xerial.snappy.SnappyLoader.extractLibraryFile(SnappyLoader.java:394)
at
org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:468)
at
org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
at
org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at
org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at
org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
at
org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:82)
at
org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81)
at
org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:471)
at
org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:123)

Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in
java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1028)
at
org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 18 more


A lot of MUTATION and REQUEST_RESPONSE messages dropped

2013-11-09 Thread srmore
I recently upgraded to 1.2.9 and I am seeing a lot of REQUEST_RESPONSE and
MUTATION messages are being dropped.

This happens when I have multiple nodes in the cluster (about 3 nodes) and
I send traffic to only one node. I don't think the traffic is that high; it
is around 400 msg/sec with 100 threads. When I take down the other two nodes
I don't see any errors (at least on the client side). I am using Pelops.

On the client I get UnavailableException, but the nodes are up. Initially I
thought I was hitting CASSANDRA-6297 (gossip thread blocking) so I changed
memtable_flush_writers to 3. Still no luck.

UnavailableException:
org.scale7.cassandra.pelops.exceptions.UnavailableException: null at
org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:61)
~[na:na] at

In the debug log on the cassandra node this is the exception I see

DEBUG [Thrift:78] 2013-11-09 16:47:28,212 CustomTThreadPoolServer.java
Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)

Could this be because of high load ? With Cassandra 1.0.11 I did not see
this issue.

Thanks,
Sandeep


Re: heap issues - looking for advices on gc tuning

2013-10-30 Thread srmore
We ran into similar heap issues a while ago with 1.0.11. I am not sure
whether you have the luxury of upgrading to at least 1.2.9; we did not.
After a lot of various painful attempts and weeks of testing (just as in
your case) the following settings worked (they did not completely relieve
the heap pressure but helped a lot). We still see some heap issues but at
least it is a bit more stable. Unlike in your case we had very heavy reads
and writes, but it's good to know that this also happens under light load; I
was thinking it was a symptom of heavy load.

-XX:NewSize=1200M
-XX:SurvivorRatio=4
-XX:MaxTenuringThreshold=2
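
In case it helps, a sketch of where these end up, assuming the stock
cassandra-env.sh JVM_OPTS pattern (the stock script already sets
SurvivorRatio and MaxTenuringThreshold, so you would be editing those lines
rather than adding duplicates):

    JVM_OPTS="$JVM_OPTS -XX:NewSize=1200M"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"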


Not sure whether this will help you or not but I think it's worth a try.

-sandeep


On Wed, Oct 30, 2013 at 4:34 AM, Jason Tang ares.t...@gmail.com wrote:

 What's configuration of following parameters
 memtable_flush_queue_size:
 concurrent_compactors:


 2013/10/30 Piavlo lolitus...@gmail.com

 Hi,

 Below I try to give a full picture to the problem I'm facing.

 This is a 12 node cluster, running on ec2 with m2.xlarge instances (17G
 ram , 2 cpus).
 Cassandra version is 1.0.8
 The cluster normally has between 1500 and 3000 reads per second (depending
 on the time of day) and 800 - 1700 writes per second, according to OpsCenter.
 RF=3, no row caches are used.

 Memory relevant configs from cassandra.yaml:
 flush_largest_memtables_at: 0.85
 reduce_cache_sizes_at: 0.90
 reduce_cache_capacity_to: 0.75
 commitlog_total_space_in_mb: 4096

 Relevant JVM options used are:
 -Xms8000M -Xmx8000M -Xmn400M
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -XX:MaxTenuringThreshold=1
 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

 Now what happens is that with these settings, after a cassandra process
 restart the GC works fine at the beginning and the heap usage looks like a
 saw with perfect teeth; eventually the teeth start to diminish until they
 become barely noticeable, and then cassandra starts to spend lots of CPU
 time doing GC. Such a cycle takes about 2 weeks, and then I need to restart
 the cassandra process to improve performance.
 During all this time there are no memory related messages in the cassandra
 system.log, except a GC for ParNew a little above 200ms once in a while.

 Things I've already done trying to reduce this eventual heap pressure:
 1) reducing bloom_filter_fp_chance, resulting in a reduction from ~700MB to
 ~280MB total per node, based on all Filter.db files on the node
 2) reducing key cache sizes, and dropping key caches for CFs which do
 not have many reads
 3) increasing the heap size from 7000M to 8000M
 None of these have really helped; only the increase from 7000M to 8000M
 stretched the cycle until excessive gc from ~9 days to ~14 days.

 I've tried to graph over time the data that is supposed to be in the heap vs
 the actual heap size, by summing up all CFs' bloom filter sizes + all CFs'
 key cache capacities multiplied by the average key size + all CFs' reported
 memtable data sizes (I've overestimated the data size a bit on purpose, to
 be on the safe side).
 Here is a link to a graph showing the last 2 days of metrics for a node
 which could not effectively do GC, and then the cassandra process was
 restarted.
 http://awesomescreenshot.com/0401w5y534
 You can clearly see that before and after the restart, the size of the data
 that is supposed to be in the heap is pretty much the same,
 which makes me think that what I really need is GC tuning.

 Also I suppose that this is not due to the total number of keys each node
 has, which is between 200 and 300 million keys for all CF key estimates
 summed per node.
 The nodes have a data size between 45G and 75G, according to the millions
 of keys. And all nodes start having heavy GC load after about 14 days.
 Also the excessive GC and heap usage are not affected by load, which
 varies depending on the time of day (see read/write rates at the beginning
 of the mail).
 So again based on this, I assume this is not due to a large number of keys
 or too much load on the cluster, but due to a pure GC misconfiguration
 issue.

 Things I remember that I've tried for GC tuning:
 1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help.
 2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
 -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
 -XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1
 - this actually made things worse.
 3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help.

 Also, since it takes about 2 weeks to verify that changing a GC setting did
 not help, the process of trying all the possibilities is painfully slow :)
 I'd highly appreciate any help and hints on the GC tuning.

 tnx
 Alex










Re: Query a datacenter

2013-10-29 Thread srmore
Thanks Rob that helps !


On Fri, Oct 25, 2013 at 7:34 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Oct 25, 2013 at 2:47 PM, srmore comom...@gmail.com wrote:

 I don't know whether this is possible but was just curious, can you query
 for the data in the remote datacenter with a CL.ONE ?


 A coordinator at CL.ONE picks which replica(s) to query based in large
 part on the dynamic snitch. If your remote data center has a lower badness
 score from the perspective of the dynamic snitch, a CL.ONE request might go
 there.

 1.2.11 adds [1] a LOCAL_ONE consistency level which does the opposite of
 what you are asking, restricting CL.ONE from going cross-DC.


 There could be a case where one might not have a QUORUM and would like to
 read the most recent  data which includes the data from the other
 datacenter. AFAIK to reliably read the data from other datacenter we only
 have CL.EACH_QUORUM.


 Using CL.QUORUM requires a QUORUM number of responses, it does not care
 from which data center those responses come.


 Also, is there a way one can control how frequently the data is
 replicated across the datacenters ?


 Data centers don't really exist in this context [2], so your question is
 can one control how frequently data is replicated between replicas and
 the answer is no. All replication always goes to every replica.

 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-6202
 [2] this is slightly glib/reductive/inaccurate, but accurate enough for
 the purposes of this response.
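
As a footnote to the LOCAL_ONE pointer above: once on 1.2.11+ the local-DC
restriction can be exercised per session, e.g. from cqlsh (a sketch only;
keyspace, table and key names are placeholders, and it assumes cqlsh's
CONSISTENCY command accepts the new level):

    CONSISTENCY LOCAL_ONE;
    SELECT * FROM my_keyspace.my_table WHERE key = 'some-key';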



Query a datacenter

2013-10-25 Thread srmore
I don't know whether this is possible but was just curious, can you query
for the data in the remote datacenter with a CL.ONE ?

There could be a case where one might not have a QUORUM and would like to
read the most recent data, which includes the data from the other
datacenter. AFAIK, to reliably read the data from the other datacenter we
only have CL.EACH_QUORUM.


Also, is there a way one can control how frequently the data is replicated
across the datacenters ?

Thanks !


Re: Cassandra Heap Size for data more than 1 TB

2013-10-03 Thread srmore
Thanks Mohit and Michael,
That's what I thought. I have tried all the avenues and will give ParNew a
try. With 1.0.x I have issues when data sizes go up; hopefully that
will not be the case with 1.2.

Just curious, has anyone tried 1.2 with a large data set, around 1 TB ?


Thanks !


On Thu, Oct 3, 2013 at 7:20 AM, Michał Michalski mich...@opera.com wrote:

 I was experimenting with 128 vs. 512 some time ago and I was unable to see
 any difference in terms of performance. I'd probably check 1024 too, but we
 migrated to 1.2 and heap space was not an issue anymore.

 M.

 On 02.10.2013 at 16:32, srmore wrote:

  I changed my index_interval from 128 to 512; does it
 make sense to increase it more than this ?


 On Wed, Oct 2, 2013 at 9:30 AM, cem cayiro...@gmail.com wrote:

  Have a look to index_interval.

 Cem.


 On Wed, Oct 2, 2013 at 2:25 PM, srmore comom...@gmail.com wrote:

  The version of Cassandra I am using is 1.0.11, we are migrating to 1.2.X
 though. We had tuned bloom filters (0.1) and AFAIK making it lower than
 this won't matter.

 Thanks !


 On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

  Which Cassandra version are you on? Essentially heap size is function
 of
 number of keys/metadata. In Cassandra 1.2 lot of the metadata like
 bloom
 filters were moved off heap.


 On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote:

  Does anyone know what would roughly be the heap size for cassandra
 with
 1TB of data ? We started with about 200 G and now on one of the nodes
 we
 are already on 1 TB. We were using 8G of heap and that served us well
 up
 until we reached 700 G where we started seeing failures and nodes
 flipping.

 With 1 TB of data the node refuses to come back due to lack of memory.
 needless to say repairs and compactions takes a lot of time. We upped
 the
 heap from 8 G to 12 G and suddenly everything started moving rapidly
 i.e.
 the repair tasks and the compaction tasks. But soon (in about 9-10
 hrs) we
 started seeing the same symptoms as we were seeing with 8 G.

 So my question is how do I determine what is the optimal size of heap
 for data around 1 TB ?

 Following are some of my JVM settings

 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:NewSize=1200M
 -XX:MaxTenuringThreshold=2
 -XX:SurvivorRatio=4

 Thanks !










Re: Cassandra Heap Size for data more than 1 TB

2013-10-02 Thread srmore
The version of Cassandra I am using is 1.0.11, we are migrating to 1.2.X
though. We had tuned bloom filters (0.1) and AFAIK making it lower than
this won't matter.

Thanks !


On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Which Cassandra version are you on? Essentially heap size is function of
 number of keys/metadata. In Cassandra 1.2 lot of the metadata like bloom
 filters were moved off heap.


 On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote:

 Does anyone know what would roughly be the heap size for cassandra with
 1TB of data ? We started with about 200 G and now on one of the nodes we
 are already on 1 TB. We were using 8G of heap and that served us well up
 until we reached 700 G where we started seeing failures and nodes flipping.

 With 1 TB of data the node refuses to come back due to lack of memory.
 needless to say repairs and compactions takes a lot of time. We upped the
 heap from 8 G to 12 G and suddenly everything started moving rapidly i.e.
 the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we
 started seeing the same symptoms as we were seeing with 8 G.

 So my question is how do I determine what is the optimal size of heap for
 data around 1 TB ?

 Following are some of my JVM settings

 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:NewSize=1200M
 -XX:MaxTenuringThreshold=2
 -XX:SurvivorRatio=4

 Thanks !





Re: Cassandra Heap Size for data more than 1 TB

2013-10-02 Thread srmore
I changed my index_interval from 128 to 512; does it make sense to increase
it more than this ?
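
For reference, the setting lives in cassandra.yaml and, at least on 1.0/1.2
(an assumption based on the index sample being rebuilt when sstables are
opened), it only takes effect after a restart:

    # larger values sample the primary index more sparsely:
    # less heap used, slightly more work per read
    index_interval: 512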


On Wed, Oct 2, 2013 at 9:30 AM, cem cayiro...@gmail.com wrote:

 Have a look to index_interval.

 Cem.


 On Wed, Oct 2, 2013 at 2:25 PM, srmore comom...@gmail.com wrote:

 The version of Cassandra I am using is 1.0.11, we are migrating to 1.2.X
 though. We had tuned bloom filters (0.1) and AFAIK making it lower than
 this won't matter.

 Thanks !


 On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Which Cassandra version are you on? Essentially heap size is function of
 number of keys/metadata. In Cassandra 1.2 lot of the metadata like bloom
 filters were moved off heap.


 On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote:

 Does anyone know what would roughly be the heap size for cassandra with
 1TB of data ? We started with about 200 G and now on one of the nodes we
 are already on 1 TB. We were using 8G of heap and that served us well up
 until we reached 700 G where we started seeing failures and nodes flipping.

 With 1 TB of data the node refuses to come back due to lack of memory.
 needless to say repairs and compactions takes a lot of time. We upped the
 heap from 8 G to 12 G and suddenly everything started moving rapidly i.e.
 the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we
 started seeing the same symptoms as we were seeing with 8 G.

 So my question is how do I determine what is the optimal size of heap
 for data around 1 TB ?

 Following are some of my JVM settings

 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:NewSize=1200M
 -XX:MaxTenuringThreshold=2
 -XX:SurvivorRatio=4

 Thanks !







Re: Cassandra Heap Size for data more than 1 TB

2013-10-02 Thread srmore
Sure. I was testing using high traffic with about 6K - 7K req/sec reads and
writes combined. I added a node and ran repair; at this time the traffic was
stopped and the heap was 8G. I saw a lot of flushing and GC activity and
finally it died saying out of memory. So I gave it more memory (12 G) and
started the nodes. This sped up the compactions and validations for around
12 hours, and now I am back to the flushing and high GC activity, even
though at this point there has been no traffic for more than 24 hours.

Again, thanks for the help !


On Wed, Oct 2, 2013 at 10:19 AM, cem cayiro...@gmail.com wrote:

 I think 512 is fine. Could you tell more about your traffic
 characteristics?

 Cem


 On Wed, Oct 2, 2013 at 4:32 PM, srmore comom...@gmail.com wrote:

  I changed my index_interval from 128 to 512; does
  it make sense to increase it more than this ?


 On Wed, Oct 2, 2013 at 9:30 AM, cem cayiro...@gmail.com wrote:

 Have a look to index_interval.

 Cem.


 On Wed, Oct 2, 2013 at 2:25 PM, srmore comom...@gmail.com wrote:

 The version of Cassandra I am using is 1.0.11, we are migrating to
 1.2.X though. We had tuned bloom filters (0.1) and AFAIK making it lower
 than this won't matter.

 Thanks !


 On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia 
 mohitanch...@gmail.comwrote:

 Which Cassandra version are you on? Essentially heap size is function
 of number of keys/metadata. In Cassandra 1.2 lot of the metadata like 
 bloom
 filters were moved off heap.


 On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote:

 Does anyone know what would roughly be the heap size for cassandra
 with 1TB of data ? We started with about 200 G and now on one of the 
 nodes
 we are already on 1 TB. We were using 8G of heap and that served us well 
 up
 until we reached 700 G where we started seeing failures and nodes 
 flipping.

 With 1 TB of data the node refuses to come back due to lack of
 memory. needless to say repairs and compactions takes a lot of time. We
 upped the heap from 8 G to 12 G and suddenly everything started moving
 rapidly i.e. the repair tasks and the compaction tasks. But soon (in 
 about
 9-10 hrs) we started seeing the same symptoms as we were seeing with 8 G.

 So my question is how do I determine what is the optimal size of heap
 for data around 1 TB ?

 Following are some of my JVM settings

 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:NewSize=1200M
 -XX:MaxTenuringThreshold=2
 -XX:SurvivorRatio=4

 Thanks !









Cassandra Heap Size for data more than 1 TB

2013-10-01 Thread srmore
Does anyone know what would roughly be the heap size for cassandra with 1TB
of data ? We started with about 200 G and now on one of the nodes we are
already on 1 TB. We were using 8G of heap and that served us well up until
we reached 700 G where we started seeing failures and nodes flipping.

With 1 TB of data the node refuses to come back due to lack of memory.
Needless to say, repairs and compactions take a lot of time. We upped the
heap from 8 G to 12 G and suddenly everything started moving rapidly i.e.
the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we
started seeing the same symptoms as we were seeing with 8 G.

So my question is how do I determine what is the optimal size of heap for
data around 1 TB ?

Following are some of my JVM settings

-Xms8G
-Xmx8G
-Xmn800m
-XX:NewSize=1200M
-XX:MaxTenuringThreshold=2
-XX:SurvivorRatio=4

Thanks !


Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-19 Thread srmore
I hit this issue again today and it looks like changing the -Xss option does
not work :(
I am on 1.0.11 (I know it's old, we are upgrading to 1.2.9 right now) and
have about 800-900GB of data. I can see cassandra is spending a lot of time
reading the data files before it quits with a "java.lang.OutOfMemoryError:
unable to create new native thread" error.

My hard and soft limits seem to be ok as well.
Datastax recommends [1]

* soft nofile 32768
* hard nofile 32768


and I have

hard nofile 65536
soft nofile 65536

My ulimit -u output is 515038 (which again should be sufficient)

complete output

ulimit -a
core file size  (blocks, -c)0
data seg size   (kbytes, -d)  unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 515038
max locked memory   (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 515038
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited




Has anyone run into this ?

[1] http://www.datastax.com/docs/1.1/troubleshooting/index
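
For reference, one of the limits that governs thread creation is nproc
(threads count against max user processes), not nofile; a sketch of
/etc/security/limits.conf entries (values are illustrative, 'cassandra'
stands in for whatever user runs the process, and this assumes pam_limits
is applied to that user's sessions):

    cassandra  soft  nofile  65536
    cassandra  hard  nofile  65536
    cassandra  soft  nproc   32768
    cassandra  hard  nproc   32768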

On Wed, Sep 11, 2013 at 8:47 AM, srmore comom...@gmail.com wrote:

 Thanks Viktor,


 - check (cassandra-env.sh) -Xss size, you may need to increase it for your
 JVM;

 This seems to have done the trick !

 Thanks !


 On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:

  For start:

 - check (cassandra-env.sh) -Xss size, you may need to increase it for
 your JVM;

 - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase
 it for your data load/bloom filter/index sizes.

 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer

 *From:* srmore [mailto:comom...@gmail.com]
 *Sent:* Tuesday, September 10, 2013 6:16 AM
 *To:* user@cassandra.apache.org
 *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to
 create new native thread [heur]



 I have a 5 node cluster with a load of around 300GB each. A node went
 down and does not come up. I can see the following exception in the logs.

 ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
 139) Fatal exception in thread Thread[main,5,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at
 java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
 at
 java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:77)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:65)
 at
 org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.<init>(JMXConfigurableThreadPoolExecutor.java:34)
 at
 org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
 at
 org.apache.cassandra.concurrent.StageManager.<clinit>(StageManager.java:42)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)


 The ulimit -u output is
 515042

 Which is far more than what is recommended [1] (10240), and I am skeptical
 about setting it to unlimited as recommended here [2]

 Any pointers as to what could be the issue and how to get the node up.




 [1]
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename

Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-19 Thread srmore
I was too fast on the send button, sorry.
The thing I wanted to add was the

pending signals (-i) 515038

line; that looks odd to me, could that be related ?



On Thu, Sep 19, 2013 at 4:53 PM, srmore comom...@gmail.com wrote:


 I hit this issue again today and looks like changing -Xss option does not
 work :(
 I am on 1.0.11 (I know its old, we are upgrading to 1.2.9 right now) and
 have about 800-900GB of data. I can see cassandra is spending a lot of time
 reading the data files before it quits with  java.lang.OutOfMemoryError:
 unable to create new native thread error.

 My hard and soft limits seems to be ok as well
 Datastax recommends [1]

 * soft nofile 32768
 * hard nofile 32768


 and I have
 hardnofile 65536
 softnofile 65536

 My ulimit -u output is 515038 (which again should be sufficient)

 complete output

 ulimit -a
 core file size  (blocks, -c)0
 data seg size   (kbytes, -d)  unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 515038
 max locked memory   (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files  (-n) 1024
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 515038
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited




 Has anyone run into this ?

 [1] http://www.datastax.com/docs/1.1/troubleshooting/index

 On Wed, Sep 11, 2013 at 8:47 AM, srmore comom...@gmail.com wrote:

 Thanks Viktor,


 - check (cassandra-env.sh) -Xss size, you may need to increase it for
 your JVM;

 This seems to have done the trick !

 Thanks !


 On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:

  For start:

 - check (cassandra-env.sh) -Xss size, you may need to increase it for
 your JVM;

 - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase
 it for your data load/bloom filter/index sizes.

 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer

 *From:* srmore [mailto:comom...@gmail.com]
 *Sent:* Tuesday, September 10, 2013 6:16 AM
 *To:* user@cassandra.apache.org
 *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to
 create new native thread [heur]



 I have a 5 node cluster with a load of around 300GB each. A node went
 down and does not come up. I can see the following exception in the logs.

 ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
 139) Fatal exception in thread Thread[main,5,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at
 java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
 at
 java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:77)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:65)
 at
 org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.<init>(JMXConfigurableThreadPoolExecutor.java:34)
 at
 org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
 at
 org.apache.cassandra.concurrent.StageManager.<clinit>(StageManager.java:42)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)


 The ulimit -u output is
 515042

 Which is far more than what

Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-11 Thread srmore
Thanks Viktor,

- check (cassandra-env.sh) the -Xss size, you may need to increase it for
your JVM;

This seems to have done the trick !
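
For anyone else hitting this: the stack size is appended to JVM_OPTS in
cassandra-env.sh; a sketch (the value is just an example, size it for your
own JVM rather than copying it):

    JVM_OPTS="$JVM_OPTS -Xss256k"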

Thanks !


On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

  For start:

 - check (cassandra-env.sh) -Xss size, you may need to increase it for your
 JVM;

 - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase it
 for your data load/bloom filter/index sizes.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

 *From:* srmore [mailto:comom...@gmail.com]
 *Sent:* Tuesday, September 10, 2013 6:16 AM
 *To:* user@cassandra.apache.org
 *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to
 create new native thread [heur]



 I have a 5 node cluster with a load of around 300GB each. A node went down
 and does not come up. I can see the following exception in the logs.

 ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
 139) Fatal exception in thread Thread[main,5,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at
 java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
 at
 java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:77)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:65)
 at
 org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.<init>(JMXConfigurableThreadPoolExecutor.java:34)
 at
 org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
 at
 org.apache.cassandra.concurrent.StageManager.<clinit>(StageManager.java:42)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)


 The ulimit -u output is
 515042

 Which is far more than what is recommended [1] (10240), and I am skeptical
 about setting it to unlimited as recommended here [2]

 Any pointers as to what could be the issue and how to get the node up.




 [1]
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=install/recommended_settings#cassandra/install/installRecommendSettings.html

 [2]
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201303.mbox/%3CCAPqEvGE474Omea1BFLJ6U_pbAkOwWxk=dwo35_pc-atwb4_...@mail.gmail.com%3E
 

 Thanks !


Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-09 Thread srmore
I have a 5 node cluster with a load of around 300GB each. A node went down
and does not come up. I can see the following exception in the logs.

ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
139) Fatal exception in thread Thread[main,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at
java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
at
java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
at
org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:77)
at
org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:65)
at
org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.<init>(JMXConfigurableThreadPoolExecutor.java:34)
at
org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
at
org.apache.cassandra.concurrent.StageManager.<clinit>(StageManager.java:42)
at
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
at
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)

The ulimit -u output is
515042

Which is far more than what is recommended [1] (10240), and I am skeptical
about setting it to unlimited as recommended here [2]

Any pointers as to what could be the issue and how to get the node up.



[1]
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=install/recommended_settings#cassandra/install/installRecommendSettings.html

[2]
http://mail-archives.apache.org/mod_mbox/cassandra-user/201303.mbox/%3CCAPqEvGE474Omea1BFLJ6U_pbAkOwWxk=dwo35_pc-atwb4_...@mail.gmail.com%3E

Thanks !


Re: Best way to track backups/delays for cross DC replication

2013-09-09 Thread srmore
I would be interested to know that too; it would be great if anyone could
share how they do (or do not) track or monitor cross-datacenter replication.

Thanks !


On Wed, Sep 4, 2013 at 10:13 AM, Anand Somani meatfor...@gmail.com wrote:

 Hi,

 Scenario is a cluster spanning across datacenters and we use Local_quorum
 and want to know when things are not getting replicated across data
 centers. What is the best way to track/alert on that?

 I was planning on using the HintedHandOffManager (JMX)
 = org.apache.cassandra.db:type=HintedHandoffManager countPendingHints. Are
 there other metrics (maybe exposed via nodetool) I should be looking at. At
 this point we are on 1.1.6 cassandra.

 Thanks
 Anand



Distributed lock for cassandra

2013-08-12 Thread srmore
All,
There are some operations that demand the use of a lock and I was wondering
whether Cassandra has a built-in locking mechanism. After hunting the web
for a while it appears that the answer is no, although I found this
outdated wiki page which describes the algorithm:
http://wiki.apache.org/cassandra/Locking - was this implemented ?

It would be great if people on the list could share their experiences / best
practices about locking.
Does anyone use cages (https://code.google.com/p/cages/) ? If yes, it would
be nice if you could share your experiences.

Thanks,
Sandeep


Re: Distributed lock for cassandra

2013-08-12 Thread srmore
On Mon, Aug 12, 2013 at 2:49 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Aug 12, 2013 at 12:31 PM, srmore comom...@gmail.com wrote:

 There are some operations that demand the use lock and I was wondering
 whether Cassandra has a built in locking mechanism. After hunting the web
 for a while it appears that the answer is no, although I found this
 outdated wiki page which describes the algorithm
 http://wiki.apache.org/cassandra/Locking was this implemented ?

 It would be great if people on the list can share their experiences /
 best practices about locking.


 If your application needs a lot of locking, it is probably not ideal for a
 distributed, log structured database with immutable data files.


This was the answer I was afraid of ... Not a lot of locking, but now and
then I do need it. That said, the create-a-username problem described in the
ticket pretty much describes my problem.



 That said, Cassandra 2.0 will support CAS via Paxos. Presumably at a much,
 much lower throughput than the base system.

 https://issues.apache.org/jira/browse/CASSANDRA-5062


Thanks a lot for the pointers, I will look at some of the solutions
described there.
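
For the register-a-username case specifically, the 2.0 CAS support surfaces
in CQL3 as lightweight transactions; a minimal sketch, assuming a simple
users table keyed by username (table and column names are placeholders):

    INSERT INTO users (username, email)
    VALUES ('sandeep', 'sandeep@example.com')
    IF NOT EXISTS;

The insert is applied only if no row with that username already exists,
which is exactly the pattern from the ticket.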



 =Rob




Re: Alternate major compaction

2013-07-11 Thread srmore
Thanks Takenori,
Looks like the tool provides some good info that people can use. It would
be great if you can share it with the community.



On Thu, Jul 11, 2013 at 6:51 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi,

 I think it is a common headache for users running a large Cassandra
 cluster in production.


 Running a major compaction is not the only cause, but more. For example, I
 see two typical scenario.

 1. backup use case
 2. active wide row

 In the case of 1, say a piece of data is removed a year later. This means
 the tombstone on the row is 1 year away from the original row. To remove an
 expired row entirely, a compaction set has to include all the row fragments.
 So, when are the original 1-year-old row and the tombstoned row included
 in the same compaction set? It is likely to take one year.

 In the case of 2, such an active wide row exists in most of the sstable
 files. And it typically contains many expired columns. But none of them
 would be removed entirely, because a compaction set practically never
 includes all the row fragments.


 Btw, there is a very convenient MBean API available:
 CompactionManager's forceUserDefinedCompaction. You can invoke a minor
 compaction on a file set you define. So the question is how to find an
 optimal set of sstable files.

 Then, I wrote a tool that checks garbage and prints out some useful
 information to find such an optimal set.

 Here's a simple log output.

 # /opt/cassandra/bin/checksstablegarbage -e 
 /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db
 [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 
 300(1373504071)]
 ===
 ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, 
 REMAINNING_SSTABLE_FILES
 ===
 hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db
 ---
 TOTAL, 40, 40
 ===

 REMAINNING_SSTABLE_FILES means any other sstable files that contain the
 respective row. So, the following is an optimal set.

 # /opt/cassandra/bin/checksstablegarbage -e 
 /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db 
 /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db
 [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 
 300(1373504131)]
 ===
 ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, 
 REMAINNING_SSTABLE_FILES
 ===
 hello5/100.txt.1373502926003, 223, 0, YES, YES
 ---
 TOTAL, 223, 0
 ===

 This tool relies on SSTableReader and an aggregation iterator as Cassandra
 does in compaction. I was considering to share this with the community. So
 let me know if anyone is interested.

 Ah, note that it is based on 1.0.7. So I will need to check and update for
 newer versions.

 Thanks,
 Takenori


 On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núnez tomas.nu...@groupalia.comwrote:

 Hi

 About a year ago, we did a major compaction in our cassandra cluster (a
 n00b mistake, I know), and since then we've had huge sstables that never
 get compacted, and we were condemned to repeat the major compaction process
 every once in a while (we are using SizeTieredCompaction strategy, and
 we've not yet evaluated LeveledCompaction, because it has its downsides,
 and we've had no time to test all of them in our environment).

 I was trying to find a way to solve this situation (that is, do something
 like a major compaction that writes small sstables, not huge as major
 compaction does), and I couldn't find it in the documentation. I tried
 cleanup and scrub/upgradesstables, but they don't do that (as documentation
 states). Then I tried deleting all data in a node and then bootstrapping it
 (or nodetool rebuild-ing it), hoping that this way the sstables would get
 cleaned from deleted records and updates. But the deleted node just copied
 the sstables from another node as they were, cleaning nothing.

 So I tried a new approach: I switched the sstable compaction strategy
 (SizeTiered to Leveled), forcing the sstables to be rewritten from scratch,
 and then switching it back (Leveled to SizeTiered). It took a while (but so
 do the major compaction process) and it worked, I have smaller sstables,
 and I've regained a lot of disk space.

 I'm happy with the results, but it doesn't seem a orthodox way of
 cleaning the sstables. What do you think, is it something wrong or crazy?
 Is there a different way to achieve the same thing?

 Let's put an example:
 Suppose you have a 

Re: Migrating data from 2 node cluster to a 3 node cluster

2013-07-05 Thread srmore
On Fri, Jul 5, 2013 at 6:08 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Jul 4, 2013 at 10:03 AM, srmore comom...@gmail.com wrote:

 We are planning to move data from a 2 node cluster to a 3 node cluster.
 We are planning to copy the data from the two nodes (snapshot) to the new 2
 nodes and hoping that Cassandra will sync it to the third node. Will this
 work ? are there any other commands to run after we are done migrating,
 like nodetool repair.


 What RF are old and new cluster?


The RF of the old and new clusters is the same, RF=3. Keyspace and schema
info is also the same.



 What are the tokens of old and new nodes?

tokens for old cluster ( 2-node )

node 0 -  0
node 1 -  85070591730234615865843651857942052864
Tokens for new cluster (3-node)
node 0 - 0
node 1 - 56713727820156407428984779325531226112
node 2 - 113427455640312814857969558651062452224



 http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra


Thanks this helps a lot !
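
For the record, the bulk-load route from that article boils down to
something like the following, run once per column family from a machine
that can reach the new cluster (host and paths are placeholders; the
snapshot data has to sit in a keyspace/column_family directory layout for
sstableloader to pick up the names):

    sstableloader -d <new_node_ip> /path/to/snapshots/keyspace_name/cf_name/

followed by a nodetool repair on the new nodes once the load finishes.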



 =Rob



Migrating data from 2 node cluster to a 3 node cluster

2013-07-04 Thread srmore
We are planning to move data from a 2 node cluster to a 3 node cluster. We
are planning to copy the data from the two nodes (snapshot) to the new 2
nodes and hoping that Cassandra will sync it to the third node. Will this
work ? Are there any other commands to run after we are done migrating,
like nodetool repair ?

Thanks all.


Re: Heap is not released and streaming hangs at 0%

2013-06-26 Thread srmore
On Wed, Jun 26, 2013 at 12:16 AM, aaron morton aa...@thelastpickle.comwrote:

 bloom_filter_fp_chance value that was changed from default to 0.1, looked
 at the filters and they are about 2.5G on disk and I have around 8G of heap.
 I will try increasing the value to 0.7 and report my results.

 You need to re-write the sstables on disk using nodetool upgradesstables.
 Otherwise only the new tables will have the 0.1 setting.

 I will try increasing the value to 0.7 and report my results.

 No need to, it will probably be something like Oh no, really, what, how,
 please make it stop :)
 0.7 will mean reads will hit most / all of the SSTables for the CF.


Changing the bloom_filter_fp_chance to 0.7 did seem to correct the problem
in the short run. I do not see the out of heap errors but I am taking a bit
of a performance hit. Planning to run some more tests. Also, my
BloomFilterFalseRatio is 0.8367977262013025, which was the reason behind
bumping bloom_filter_fp_chance.
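
For reference, the attribute only applies to sstables written after the
change, so the rough sequence is a per-CF schema update followed by a
rewrite of the existing files (keyspace/CF names below are placeholders;
the first line is cassandra-cli syntax for a 1.0-era schema, the second is
run from the shell):

    update column family MyCF with bloom_filter_fp_chance = 0.1;
    nodetool -h localhost upgradesstables MyKeyspace MyCF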



 I covered a high row situation in one of my talks at the summit this month,
 the slide deck is here
 http://www.slideshare.net/aaronmorton/cassandra-sf-2013-in-case-of-emergency-break-glass
  and
 the videos will soon be up at Planet Cassandra.


This was/is extremely helpful Aaron, cannot thank you enough for sharing
this with the community, eagerly looking forward for the video.


 Rebuild the sstables, then reduce the index_interval if you still need to
 reduce mem pressure.

 Cheers


 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com


 On 22/06/2013, at 1:17 PM, sankalp kohli kohlisank...@gmail.com wrote:

 I will take a heap dump and see whats in there rather than guessing.


 On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot btal...@aeriagames.comwrote:

 bloom_filter_fp_chance = 0.7 is probably way too large to be effective
 and you'll probably have issues compacting deleted rows and get poor read
 performance with a value that high.  I'd guess that anything larger than
 0.1 might as well be 1.0.

 -Bryan



 On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote:


 On Fri, Jun 21, 2013 at 2:53 AM, aaron morton 
 aa...@thelastpickle.comwrote:

  nodetool -h localhost flush didn't do much good.

 Do you have 100's of millions of rows ?
 If so see recent discussions about reducing the bloom_filter_fp_chance
 and index_sampling.

 Yes, I have 100's of millions of rows.



 If this is an old schema you may be using the very old setting of
 0.000744 which creates a lot of bloom filters.

 bloom_filter_fp_chance value that was changed from default to 0.1,
 looked at the filters and they are about 2.5G on disk and I have around 8G
 of heap.
 I will try increasing the value to 0.7 and report my results.

 It also appears to be a case of hard GC failure (as Rob mentioned) as
 the heap is never released, even after 24+ hours of idle time, the JVM
 needs to be restarted to reclaim the heap.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

 If you want, you can try to force the GC through JConsole:
 Memory > Perform GC.

 It theoretically triggers a full GC and when it will happen depends on
 the JVM

 -Wei

 --
 *From: *Robert Coli rc...@eventbrite.com
 *To: *user@cassandra.apache.org
 *Sent: *Tuesday, June 18, 2013 10:43:13 AM
 *Subject: *Re: Heap is not released and streaming hangs at 0%

 On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:
  But then shouldn't JVM C G it eventually ? I can still see Cassandra
 alive
  and kicking but looks like the heap is locked up even after the
 traffic is
  long stopped.

 No, when GC system fails this hard it is often a permanent failure
 which requires a restart of the JVM.

  nodetool -h localhost flush didn't do much good.

 This adds support to the idea that your heap is too full, and not full
 of memtables.

 You could try nodetool -h localhost invalidatekeycache, but that
 probably will not free enough memory to help you.

 =Rob









Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread srmore
On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.comwrote:

  nodetool -h localhost flush didn't do much good.

 Do you have 100's of millions of rows ?
 If so see recent discussions about reducing the bloom_filter_fp_chance and
 index_sampling.

Yes, I have 100's of millions of rows.



 If this is an old schema you may be using the very old setting of 0.000744
 which creates a lot of bloom filters.

 The bloom_filter_fp_chance value was changed from the default to 0.1; I
looked at the filters and they are about 2.5G on disk, and I have around 8G
of heap. I will try increasing the value to 0.7 and report my results.

It also appears to be a case of hard GC failure (as Rob mentioned), as the
heap is never released; even after 24+ hours of idle time, the JVM needs to
be restarted to reclaim the heap.

Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

 If you want, you can try to force the GC through JConsole: Memory >
 Perform GC.

 It theoretically triggers a full GC and when it will happen depends on the
 JVM

 -Wei

 --
 *From: *Robert Coli rc...@eventbrite.com
 *To: *user@cassandra.apache.org
 *Sent: *Tuesday, June 18, 2013 10:43:13 AM
 *Subject: *Re: Heap is not released and streaming hangs at 0%

 On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:
  But then shouldn't JVM C G it eventually ? I can still see Cassandra
 alive
  and kicking but looks like the heap is locked up even after the traffic
 is
  long stopped.

 No, when GC system fails this hard it is often a permanent failure
 which requires a restart of the JVM.

  nodetool -h localhost flush didn't do much good.

 This adds support to the idea that your heap is too full, and not full
 of memtables.

 You could try nodetool -h localhost invalidatekeycache, but that
 probably will not free enough memory to help you.

 =Rob





Heap is not released and streaming hangs at 0%

2013-06-18 Thread srmore
I see an issue when I run high traffic to the Cassandra nodes: the heap
gets full to about 94% (which is expected) but the thing that confuses me
is that the heap usage never goes down after the traffic is stopped
(at least, it appears so). I kept the nodes up for a day after
stopping the traffic and the logs still tell me

Heap is 0.9430032942657169 full.  You may need to reduce memtable and/or
cache sizes.  Cassandra will now flush up to the two largest memtables to
free up memory.  Adjust flush_largest_memtables_at threshold in
cassandra.yaml if you don't want Cassandra to do this automatically

Things go back to normal when I restart Cassandra.

nodetool netstats tells me the following:

Mode: Normal
Not sending streams

and a bunch of keyspaces streaming from other nodes which sit at 0%, and
it stays this way until I restart Cassandra.

Also I see this at the bottom:

Pool Name                    Active   Pending      Completed
Commands                        n/a         0        8267930
Responses                       n/a         0       15184810

Any ideas as to how I can speed this up and reclaim the heap ?

Thanks !


Re: Heap is not released and streaming hangs at 0%

2013-06-18 Thread srmore
Thanks Rob,
But then shouldn't JVM C G it eventually ? I can still see Cassandra alive
and kicking but looks like the heap is locked up even after the traffic is
long stopped.

nodetool -h localhost flush didn't do much good.
the version I am running is 1.0.12 (I know its due for a upgrade but gotto
work with this for now).



On Tue, Jun 18, 2013 at 12:13 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Jun 18, 2013 at 8:25 AM, srmore comom...@gmail.com wrote:
  I see an issues when I run high traffic to the Cassandra nodes, the heap
  gets full to about 94% (which is expected)

 Which is expected to cause GC failure? ;)

 But seriously, the reason your node is unable to GC is that you have
 filled your heap too fast for it to keep up. The JVM has seized up
 like Joe Namath with vapor lock.

  Any ideas as to how I can speed up this up and reclaim the heap ?

 Don't exhaust the ability of GC to C G. :)

 =Rob
 PS - What version of cassandra? If you nodetool -h localhost flush
 does it help?



Re: Multiple data center performance

2013-06-08 Thread srmore
I am seeing similar behavior; in my case I have 2 nodes in each
datacenter and one node always has high latency (equal to the latency
between the two datacenters). When one of the datacenters is shut down the
latency drops.

I am curious to know whether anyone else has these issues and, if yes, how
you got around them.
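
One knob that looks related, assuming cross-DC read repair is part of the
extra traffic, is the per-column-family read repair chance; a sketch of
turning the global one down in favor of the DC-local one (cassandra-cli
syntax, CF name is a placeholder, and dclocal_read_repair_chance assumes
1.1 or later):

    update column family MyCF with read_repair_chance = 0.0 and dclocal_read_repair_chance = 0.1;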

Thanks !


On Fri, Jun 7, 2013 at 11:49 PM, Daning Wang dan...@netseer.com wrote:

 We have deployed multi-datacenter but got a performance issue. When the
 nodes in the other center are up, the read response time from clients is 4
 or 5 times higher. When we take those nodes down, the response time becomes
 normal (compared to the time before we changed to multi-center).

 We have high volume on the cluster, and the consistency level is one for
 reads. So my understanding is that most of the traffic between data centers
 should be read repair, but it seems that should not create this much delay.

 What could cause the problem? How do we debug this?

 Here is the keyspace,

 [default@dsat] describe dsat;
 Keyspace: dsat:
   Replication Strategy:
 org.apache.cassandra.locator.NetworkTopologyStrategy
   Durable Writes: true
 Options: [dc2:1, dc1:3]
   Column Families:
 ColumnFamily: categorization_cache


 Ring

 Datacenter: dc1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns (effective)  Host ID
   Rack
 UN  xx.xx.xx..111   59.2 GB256 37.5%
 4d6ed8d6-870d-4963-8844-08268607757e  rac1
 DN  xx.xx.xx..121   99.63 GB   256 37.5%
 9d0d56ce-baf6-4440-a233-ad6f1d564602  rac1
 UN  xx.xx.xx..120   66.32 GB   256 37.5%
 0fd912fb-3187-462b-8c8a-7d223751b649  rac1
 UN  xx.xx.xx..118   63.61 GB   256 37.5%
 3c6e6862-ab14-4a8c-9593-49631645349d  rac1
 UN  xx.xx.xx..117   68.16 GB   256 37.5%
 ee6cdf23-d5e4-4998-a2db-f6c0ce41035a  rac1
 UN  xx.xx.xx..116   32.41 GB   256 37.5%
 f783eeef-1c51-4f91-ab7c-a60669816770  rac1
 UN  xx.xx.xx..115   64.24 GB   256 37.5%
 e75105fb-b330-4f40-aa4f-8e6e11838e37  rac1
 UN  xx.xx.xx..112   61.32 GB   256 37.5%
 2547ee54-88dd-4994-a1ad-d9ba367ed11f  rac1
 Datacenter: dc2
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns (effective)  Host ID
   Rack
 DN  xx.xx.xx.199   58.39 GB   256 50.0%
 6954754a-e9df-4b3c-aca7-146b938515d8  rac1
 DN  xx.xx.xx..61  33.79 GB   256 50.0%
 91b8d510-966a-4f2d-a666-d7edbe986a1c  rac1


 Thank you in advance,

 Daning
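
A side note for anyone debugging the same pattern: nodetool can show which
replicas (and therefore which datacenters) are eligible to serve a given key,
and whether the inter-DC traffic is actually streaming or hints rather than
reads. A sketch only; the row key below is just a placeholder:

# Which endpoints own this row key?
nodetool -h localhost getendpoints dsat categorization_cache some_key

# Streams in flight between nodes
nodetool -h localhost netstats

# Thread pool backlog (look at ReadRepairStage and HintedHandoff)
nodetool -h localhost tpstats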




Cassandra optimizations for multi-core machines

2013-06-05 Thread srmore
Hello All,
We are thinking of going with Cassandra on an 8-core machine; are there any
optimizations that can help us here?

I have seen that during the startup stage Cassandra uses only one core; is
there a way we can speed up the startup process?

Thanks !
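For what it is worth, the per-node concurrency knobs live in cassandra.yaml;
the numbers below are only the stock rules of thumb from the default file
(roughly 16 per data disk for reads, 8 per core for writes), not tuned
recommendations for any particular 8-core box:

# cassandra.yaml
concurrent_reads: 32       # roughly 16 * number_of_data_disks
concurrent_writes: 64      # roughly 8 * number_of_cores
concurrent_compactors: 8   # often left at the default (one per core)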


Re: Cassandra performance decreases drastically with increase in data size.

2013-06-03 Thread srmore
Thanks all for the help.
I ran the traffic over the weekend; surprisingly, my heap was doing OK
(around 5.7 GB of 8 GB), but GC activity went nuts and dropped the
throughput. I will probably increase the number of nodes.

The other interesting thing I noticed was that there were some objects with
finalize() methods, which could potentially cause GC issues.


On Fri, May 31, 2013 at 1:47 AM, Aiman Parvaiz ai...@grapheffect.com wrote:

 I believe you should roll out more nodes as a temporary fix to your
 problem; 400 GB on every node means (as correctly mentioned in other mails
 in this thread) you are spending more time on GC. Check out the second
 comment in this link by Aaron Morton: he says that more than 300 GB can be
 problematic. The post is about an older version of Cassandra, but I believe
 the concept still holds true:


 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html

 Thanks

 On May 29, 2013, at 9:32 PM, srmore comom...@gmail.com wrote:

 Hello,
 I am observing that my performance is decreasing drastically as my data
 size grows. I have a 3-node cluster with 64 GB of RAM, and my data size is
 around 400 GB on all the nodes. I also see that when I restart Cassandra
 the performance goes back to normal and then starts decreasing again after
 some time.

 Some hunting landed me on this page
 http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
 about large data sets and explains that it might be because I am going
 through multiple layers of OS cache, but it does not tell me how to tune it.

 So, my question is: are there any optimizations that I can do to handle
 these large datasets?

 And why does my performance go back to normal when I restart Cassandra?

 Thanks !
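
If it helps anyone following this thread, the cheapest way to see which
collector is struggling is to enable the GC logging options that ship commented
out in conf/cassandra-env.sh; the log path below is an assumption:

# conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"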





Consistency level for multi-datacenter setup

2013-06-03 Thread srmore
I am a bit confused about using consistency levels with a multi-datacenter
setup. Following is my setup:

I have 4 nodes, set up as follows:
Node 1 DC 1 - N1DC1
Node 2 DC 1 - N2DC1

Node 1 DC 2 - N1DC2
Node 2 DC 2 - N2DC2

I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one
way).

I am observing that when I use consistency level 2, for some reason the
coordinator node picks nodes from the other datacenter. My understanding was
that Cassandra picks nodes that are close by (from the local datacenter), as
determined by gossip, but it looks like that's not the case.

I found the following comment on the DataStax website:

If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the
same data center as the coordinator node must respond to the client request
in order for the request to succeed.

Does this mean that for a multi-datacenter setup we can only use ONE or
LOCAL_QUORUM if we want to stay in the local datacenter and avoid
cross-datacenter latency?

I am using the GossipingPropertyFileSnitch.

Thanks !
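
As a reference point, the consistency level is chosen per request by the
client, so the same cluster can serve both local and cross-DC reads. A minimal
sketch with cassandra-cli, assuming the 1.x cli accepts LOCAL_QUORUM and using
a hypothetical keyspace and column family (my_keyspace / my_cf):

[default@unknown] use my_keyspace;
[default@my_keyspace] consistencylevel as LOCAL_QUORUM;
[default@my_keyspace] get my_cf['some_key'];

With NetworkTopologyStrategy and LOCAL_QUORUM, only the replicas in the
coordinator's own datacenter need to answer that read.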


Re: Consistency level for multi-datacenter setup

2013-06-03 Thread srmore
With CL=TWO it appears that the coordinator randomly picks a node from the
other datacenter to get the data, i.e. one node in the datacenter
consistently underperforms.



On Mon, Jun 3, 2013 at 3:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 What happens when you use CL=TWO.

 Dean

 From: srmore comom...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Monday, June 3, 2013 2:09 PM
 To: user@cassandra.apache.org
 Subject: Consistency level for multi-datacenter setup

 I am a bit confused about using consistency levels with a multi-datacenter
 setup. Following is my setup:

 I have 4 nodes, set up as follows:
 Node 1 DC 1 - N1DC1
 Node 2 DC 1 - N2DC1

 Node 1 DC 2 - N1DC2
 Node 2 DC 2 - N2DC2

 I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one
 way).

 I am observing that when I use consistency level 2, for some reason the
 coordinator node picks nodes from the other datacenter. My understanding was
 that Cassandra picks nodes that are close by (from the local datacenter), as
 determined by gossip, but it looks like that's not the case.

 I found the following comment on the DataStax website:

 If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the
 same data center as the coordinator node must respond to the client request
 in order for the request to succeed.

 Does this mean that for a multi-datacenter setup we can only use ONE or
 LOCAL_QUORUM if we want to stay in the local datacenter and avoid
 cross-datacenter latency?

 I am using the GossipingPropertyFileSnitch.

 Thanks !




Re: Consistency level for multi-datacenter setup

2013-06-03 Thread srmore
We observed that as well; please let us know what you find out, it would be
extremely helpful. There is also a property you can play with to take care of
slow nodes:
*dynamic_snitch_badness_threshold*.

http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold

Thanks !
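
For reference, the dynamic snitch settings sit together in cassandra.yaml; the
values below are illustrative (roughly the stock ones for the 1.x line), not a
recommendation:

# cassandra.yaml
dynamic_snitch: true
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1   # how much worse a preferred host may score before it is bypassed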


On Mon, Jun 3, 2013 at 3:24 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 Also, we had to put a fix into Cassandra so it removed slow nodes from
 the list of nodes to read from. With that fix our QUORUM (not LOCAL_QUORUM)
 started working again, and it would easily take the other DC nodes out of
 the list of nodes to read from for you as well. I need to circle back with
 my teammate to check whether he got his fix posted to the dev list or not.

 Later,
 Dean

 From: srmore comom...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Monday, June 3, 2013 2:09 PM
 To: user@cassandra.apache.org
 Subject: Consistency level for multi-datacenter setup

 I am a bit confused about using consistency levels with a multi-datacenter
 setup. Following is my setup:

 I have 4 nodes, set up as follows:
 Node 1 DC 1 - N1DC1
 Node 2 DC 1 - N2DC1

 Node 1 DC 2 - N1DC2
 Node 2 DC 2 - N2DC2

 I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one
 way).

 I am observing that when I use consistency level 2, for some reason the
 coordinator node picks nodes from the other datacenter. My understanding was
 that Cassandra picks nodes that are close by (from the local datacenter), as
 determined by gossip, but it looks like that's not the case.

 I found the following comment on the DataStax website:

 If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the
 same data center as the coordinator node must respond to the client request
 in order for the request to succeed.

 Does this mean that for a multi-datacenter setup we can only use ONE or
 LOCAL_QUORUM if we want to stay in the local datacenter and avoid
 cross-datacenter latency?

 I am using the GossipingPropertyFileSnitch.

 Thanks !




Re: Consistency level for multi-datacenter setup

2013-06-03 Thread srmore
Yup, RF is 2 for both the datacenters.


On Mon, Jun 3, 2013 at 3:36 PM, Sylvain Lebresne sylv...@datastax.com wrote:

 What's your replication factor? Do you have RF=2 on both datacenters?


 On Mon, Jun 3, 2013 at 10:09 PM, srmore comom...@gmail.com wrote:

 I am a bit confused about using consistency levels with a multi-datacenter
 setup. Following is my setup:

 I have 4 nodes, set up as follows:
 Node 1 DC 1 - N1DC1
 Node 2 DC 1 - N2DC1

 Node 1 DC 2 - N1DC2
 Node 2 DC 2 - N2DC2

 I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one
 way).

 I am observing that when I use consistency level 2, for some reason the
 coordinator node picks nodes from the other datacenter. My understanding was
 that Cassandra picks nodes that are close by (from the local datacenter), as
 determined by gossip, but it looks like that's not the case.

 I found the following comment on the DataStax website:

 If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the
 same data center as the coordinator node must respond to the client request
 in order for the request to succeed.

 Does this mean that for a multi-datacenter setup we can only use ONE or
 LOCAL_QUORUM if we want to stay in the local datacenter and avoid
 cross-datacenter latency?

 I am using the GossipingPropertyFileSnitch.

 Thanks !





Re: Consistency level for multi-datacenter setup

2013-06-03 Thread srmore
After some more investigation, it does not appear to be a CL issue. Every
time I start up the node in the other datacenter with the 1-sec delay, my
throughput starts degrading, even with CL=ONE and CL=LOCAL_QUORUM.

I will put the logs on DEBUG, investigate more, and report back the
findings.



On Mon, Jun 3, 2013 at 3:37 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 Our badness threshold is currently 0.1 (just checked). Our website used to
 get slow whenever a node was slow, until we rolled our own patch out.

 Dean

 From: srmore comom...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Monday, June 3, 2013 2:31 PM
 To: user@cassandra.apache.org
 Subject: Re: Consistency level for multi-datacenter setup

 We observed that as well; please let us know what you find out, it would be
 extremely helpful. There is also a property you can play with to take care of
 slow nodes:
 dynamic_snitch_badness_threshold.


 http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold

 Thanks !


 On Mon, Jun 3, 2013 at 3:24 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 Also, we had to put a fix into Cassandra so it removed slow nodes from
 the list of nodes to read from. With that fix our QUORUM (not LOCAL_QUORUM)
 started working again, and it would easily take the other DC nodes out of
 the list of nodes to read from for you as well. I need to circle back with
 my teammate to check whether he got his fix posted to the dev list or not.

 Later,
 Dean

 From: srmore comom...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Monday, June 3, 2013 2:09 PM
 To: user@cassandra.apache.org
 Subject: Consistency level for multi-datacenter setup

 I am a bit confused about using consistency levels with a multi-datacenter
 setup. Following is my setup:

 I have 4 nodes, set up as follows:
 Node 1 DC 1 - N1DC1
 Node 2 DC 1 - N2DC1

 Node 1 DC 2 - N1DC2
 Node 2 DC 2 - N2DC2

 I set up a delay between the two datacenters (DC1 and DC2, around 1 sec one
 way).

 I am observing that when I use consistency level 2, for some reason the
 coordinator node picks nodes from the other datacenter. My understanding was
 that Cassandra picks nodes that are close by (from the local datacenter), as
 determined by gossip, but it looks like that's not the case.

 I found the following comment on the DataStax website:

 If using a consistency level of ONE or LOCAL_QUORUM, only the nodes in the
 same data center as the coordinator node must respond to the client request
 in order for the request to succeed.

 Does this mean that for a multi-datacenter setup we can only use ONE or
 LOCAL_QUORUM if we want to stay in the local datacenter and avoid
 cross-datacenter latency?

 I am using the GossipingPropertyFileSnitch.

 Thanks !





Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread srmore
You are right, it looks like I am doing a lot of GC. Is there any
short-term solution for this other than bumping up the heap? Because even if
I increase the heap I will run into the same issue; only the time before
I hit OOM will be lengthened.

It will be a while before we move to the latest and greatest Cassandra.

Thanks !


On Thu, May 30, 2013 at 12:05 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Sounds like you're spending all your time in GC, which you can verify
 by checking what GCInspector and StatusLogger say in the log.

 Fix is increase your heap size or upgrade to 1.2:
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 On Wed, May 29, 2013 at 11:32 PM, srmore comom...@gmail.com wrote:
  Hello,
  I am observing that my performance is decreasing drastically as my data
  size grows. I have a 3-node cluster with 64 GB of RAM, and my data size is
  around 400 GB on all the nodes. I also see that when I restart Cassandra
  the performance goes back to normal and then starts decreasing again after
  some time.

  Some hunting landed me on this page
  http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
  about large data sets and explains that it might be because I am going
  through multiple layers of OS cache, but it does not tell me how to tune it.

  So, my question is: are there any optimizations that I can do to handle
  these large datasets?

  And why does my performance go back to normal when I restart Cassandra?
 
  Thanks !



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced
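
For readers stuck on the same trade-off, the short-term levers are mostly heap
sizing in conf/cassandra-env.sh and index_interval in cassandra.yaml (the
sampled index is one of the larger permanent consumers of heap on big data
sets). The values below are examples only, not recommendations:

# conf/cassandra-env.sh -- set explicitly instead of letting it auto-size
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"

# cassandra.yaml -- a larger index_interval trades a little read CPU for less heap
index_interval: 256    # default is 128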



Cassandra performance decreases drastically with increase in data size.

2013-05-29 Thread srmore
Hello,
I am observing that my performance is drastically decreasing when my data
size grows. I have a 3 node cluster with 64 GB of ram and my data size is
around 400GB on all the nodes. I also see that when I re-start Cassandra
the performance goes back to normal and then again starts decreasing after
some time.

Some hunting landed me to this page
http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
about the large data sets and explains that it might be because I am going
through multiple layers of OS cache, but does not tell me how to tune it.

So, my question is, are there any optimizations that I can do to handle
these large datatasets ?

and why does my performance go back to normal when I restart Cassandra ?

Thanks !


Re: Cannot resolve schema disagreement

2013-05-09 Thread srmore
Thanks Rob !

Tried the steps; that did not work. However, I was able to resolve the
problem by syncing the clocks. The thing that confuses me is that the FAQ
says Before 0.7.6, this can also be caused by cluster system clocks being
substantially out of sync with each other, yet the version I am using is
1.0.12.

This raises an important question: where does Cassandra get its time
information from? And is it required (I know it is highly, highly advisable)
to keep clocks in sync? Any suggestions/best practices on how to keep the
clocks in sync?



/srm


On Thu, May 9, 2013 at 1:58 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, May 8, 2013 at 5:40 PM, srmore comom...@gmail.com wrote:
  After running the commands, I get back to the same issue. I cannot afford
  to lose the data, so I guess this is the only option for me. And
  unfortunately I am using 1.0.12 (cannot upgrade as of now). Any ideas on
  what might be happening, or any pointers, will be greatly appreciated.

 If you can afford downtime on the cluster, the solution to this
 problem with the highest chance of success is :

 1) dump the existing schema from a good node
 2) nodetool drain on all nodes
 3) stop cluster
 4) move schema and migration CF tables out of the way on all nodes
 5) start cluster
 6) re-load schema, being careful to explicitly check for schema
 agreement on all nodes between schema modifying statements

 In many/most cases of schema disagreement, people try the FAQ approach
 and it doesn't work and they end up being forced to do the above
 anyway. In general if you can tolerate the downtime, you should save
 yourself the effort and just do the above process.

 =Rob
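
For step 4 above, on a pre-1.1 node the schema lives in the Schema and
Migrations column families of the system keyspace, so moving it aside means
moving those sstables; a rough sketch that assumes the default data directory
and that the whole cluster is already stopped:

# run on every node while Cassandra is down
mkdir -p /var/lib/cassandra/schema-aside
mv /var/lib/cassandra/data/system/Schema*    /var/lib/cassandra/schema-aside/
mv /var/lib/cassandra/data/system/Migration* /var/lib/cassandra/schema-aside/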



Re: Cannot resolve schema disagreement

2013-05-09 Thread srmore
Thought so.

Thanks Aaron !



On Thu, May 9, 2013 at 6:09 PM, aaron morton aa...@thelastpickle.com wrote:

 This raises an important question: where does Cassandra get its time
 information from?

 http://docs.oracle.com/javase/6/docs/api/java/lang/System.html
 normally milliseconds; not sure if 1.0.12 may use nanoTime(), which is less
 reliable on some VMs.

 And is it required (I know it is highly, highly advisable) to keep
 clocks in sync? Any suggestions/best practices on how to keep the clocks in
 sync?

 http://en.wikipedia.org/wiki/Network_Time_Protocol

 Hope that helps.

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 10/05/2013, at 9:16 AM, srmore comom...@gmail.com wrote:

 Thanks Rob !

 Tried the steps; that did not work. However, I was able to resolve the
 problem by syncing the clocks. The thing that confuses me is that the FAQ
 says Before 0.7.6, this can also be caused by cluster system clocks being
 substantially out of sync with each other, yet the version I am using is
 1.0.12.

 This raises an important question: where does Cassandra get its time
 information from? And is it required (I know it is highly, highly advisable)
 to keep clocks in sync? Any suggestions/best practices on how to keep the
 clocks in sync?



 /srm


 On Thu, May 9, 2013 at 1:58 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, May 8, 2013 at 5:40 PM, srmore comom...@gmail.com wrote:
  After running the commands, I get back to the same issue. I cannot afford
  to lose the data, so I guess this is the only option for me. And
  unfortunately I am using 1.0.12 (cannot upgrade as of now). Any ideas on
  what might be happening, or any pointers, will be greatly appreciated.

 If you can afford downtime on the cluster, the solution to this
 problem with the highest chance of success is :

 1) dump the existing schema from a good node
 2) nodetool drain on all nodes
 3) stop cluster
 4) move schema and migration CF tables out of the way on all nodes
 5) start cluster
 6) re-load schema, being careful to explicitly check for schema
 agreement on all nodes between schema modifying statements

 In many/most cases of schema disagreement, people try the FAQ approach
 and it doesn't work and they end up being forced to do the above
 anyway. In general if you can tolerate the downtime, you should save
 yourself the effort and just do the above process.

 =Rob
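
On the clock question above, a quick spot check across nodes needs nothing
Cassandra-specific, just a working ntpd (assumed here) and coreutils:

# peer status and current offset of the local ntpd
ntpq -p

# compare the UTC wall clock on each node at a glance
date -u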






Cannot resolve schema disagreement

2013-05-08 Thread srmore
Hello,
I have a cluster of 4 nodes and two of them are on a different schema. I
tried to run the commands described in the FAQ section, but no luck
(http://wiki.apache.org/cassandra/FAQ#schema_disagreement).

After running the commands, I get back to the same issue. I cannot afford to
lose the data, so I guess this is the only option for me. And unfortunately
I am using 1.0.12 (cannot upgrade as of now). Any ideas on what might be
happening, or any pointers, will be greatly appreciated.

/srm