Re: cassandra freezes
What are the implications if cassandra can't keep up with the writes? Will the memtables be queued in memory until they are flushed?
On Thu, Feb 25, 2010 at 4:56 PM, Jonathan Ellis jbel...@gmail.com wrote:
Are you swapping? http://spyced.blogspot.com/2010/01/linux-performance-basics.html
otherwise there's something wrong w/ your vm (?), disk i/o doesn't block incoming writes in cassandra
On Thu, Feb 25, 2010 at 8:49 AM, Boris Shulman shulm...@gmail.com wrote:
I don't think it is a GC-related issue. There is no correlation between GC times and the freeze times. Moreover, I don't see any GC activity that lasts for more than 0.03 sec. But there is a correlation with disk flushing operations. I've noticed that the system freezes each time my commit log reaches 1.1G. I have a 1024M memtable size, so I assume this is when the data flushing occurs.
On Thu, Feb 25, 2010 at 4:13 PM, Jonathan Ellis jbel...@gmail.com wrote:
Then you should check GC timing with the -Xverbose:gc option (see http://wiki.apache.org/cassandra/RunningCassandra for how to modify jvm options) to look for a correlation.
On Thu, Feb 25, 2010 at 8:09 AM, Boris Shulman shulm...@gmail.com wrote:
In these tests I perform only write operations, no reads.
On Thu, Feb 25, 2010 at 4:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
The only kind of freeze that makes sense there is that your reads are i/o bound and the extra disk activity is killing you. In that case the fix is to add more RAM, or give less to the JVM so the OS can use more for buffer cache.
On Thu, Feb 25, 2010 at 8:01 AM, Boris Shulman shulm...@gmail.com wrote:
In my case the cassandra node freezes while a memtable flush or compaction operation is performed. How can I optimize the cassandra configuration in order to avoid this behavior? I've tried both a large memtable size (1G) and a small one (128M), but in every case I get some sort of freeze when the data is flushed to disk. Please advise.
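One way to answer the "are you swapping?" question on Linux is to sample the kernel's swap counters before and after a benchmark run. The following is only a minimal sketch, assuming a Linux host whose /proc/vmstat exposes the pswpin and pswpout counters; the function names are made up for illustration and are not part of Cassandra.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pages swapped in, pages swapped out) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    # Missing counters are treated as zero (assumption for this sketch).
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return after[0] - before[0], after[1] - before[1]


def read_counters(path="/proc/vmstat"):
    with open(path) as f:
        return parse_swap_counters(f.read())


if __name__ == "__main__":
    before = read_counters()
    time.sleep(5)  # sample window; in practice, run the write benchmark here
    swapped_in, swapped_out = swap_delta(before, read_counters())
    print("pages swapped in: %d, out: %d" % (swapped_in, swapped_out))
```

If both deltas stay at zero while a freeze occurs, swapping is an unlikely cause and attention can move to GC or request overload.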
Re: cassandra freezes
The only kind of freeze that makes sense there is that your reads are i/o bound and the extra disk activity is killing you. In that case the fix is to add more RAM, or give less to the JVM so the OS can use more for buffer cache.
On Thu, Feb 25, 2010 at 8:01 AM, Boris Shulman shulm...@gmail.com wrote:
In my case the cassandra node freezes while a memtable flush or compaction operation is performed. How can I optimize the cassandra configuration in order to avoid this behavior? I've tried both a large memtable size (1G) and a small one (128M), but in every case I get some sort of freeze when the data is flushed to disk. Please advise.
Re: cassandra freezes
Then you should check GC timing with the -Xverbose:gc option (see http://wiki.apache.org/cassandra/RunningCassandra for how to modify jvm options) to look for a correlation.
On Thu, Feb 25, 2010 at 8:09 AM, Boris Shulman shulm...@gmail.com wrote:
In these tests I perform only write operations, no reads.
On Thu, Feb 25, 2010 at 4:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
The only kind of freeze that makes sense there is that your reads are i/o bound and the extra disk activity is killing you. In that case the fix is to add more RAM, or give less to the JVM so the OS can use more for buffer cache.
On Thu, Feb 25, 2010 at 8:01 AM, Boris Shulman shulm...@gmail.com wrote:
In my case the cassandra node freezes while a memtable flush or compaction operation is performed. How can I optimize the cassandra configuration in order to avoid this behavior? I've tried both a large memtable size (1G) and a small one (128M), but in every case I get some sort of freeze when the data is flushed to disk. Please advise.
Re: cassandra freezes
I don't think it is a GC-related issue. There is no correlation between GC times and the freeze times. Moreover, I don't see any GC activity that lasts for more than 0.03 sec. But there is a correlation with disk flushing operations. I've noticed that the system freezes each time my commit log reaches 1.1G. I have a 1024M memtable size, so I assume this is when the data flushing occurs.
On Thu, Feb 25, 2010 at 4:13 PM, Jonathan Ellis jbel...@gmail.com wrote:
Then you should check GC timing with the -Xverbose:gc option (see http://wiki.apache.org/cassandra/RunningCassandra for how to modify jvm options) to look for a correlation.
On Thu, Feb 25, 2010 at 8:09 AM, Boris Shulman shulm...@gmail.com wrote:
In these tests I perform only write operations, no reads.
On Thu, Feb 25, 2010 at 4:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
The only kind of freeze that makes sense there is that your reads are i/o bound and the extra disk activity is killing you. In that case the fix is to add more RAM, or give less to the JVM so the OS can use more for buffer cache.
On Thu, Feb 25, 2010 at 8:01 AM, Boris Shulman shulm...@gmail.com wrote:
In my case the cassandra node freezes while a memtable flush or compaction operation is performed. How can I optimize the cassandra configuration in order to avoid this behavior? I've tried both a large memtable size (1G) and a small one (128M), but in every case I get some sort of freeze when the data is flushed to disk. Please advise.
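The GC-versus-freeze correlation discussed above can be checked mechanically rather than by eye. The sketch below assumes a HotSpot JVM started with -verbose:gc and -XX:+PrintGCTimeStamps, whose simple log lines look like "95.250: [Full GC 6291456K->1048576K(8388608K), 31.2000000 secs]", and it assumes the leading timestamp marks the start of the collection. Logs produced with more detailed options differ and would need a different pattern. The function names and the freeze-time input are hypothetical.

```python
import re

# uptime in seconds, collection kind, and pause length from a simple GC log line
GC_LINE = re.compile(
    r"^(?P<uptime>\d+\.\d+): \[(?P<kind>Full GC|GC) .*?(?P<pause>\d+\.\d+) secs\]"
)


def parse_pauses(lines):
    """Return a list of (start_uptime, kind, pause_seconds); skip other lines."""
    pauses = []
    for line in lines:
        m = GC_LINE.match(line.strip())
        if m:
            pauses.append(
                (float(m.group("uptime")), m.group("kind"), float(m.group("pause")))
            )
    return pauses


def pauses_overlapping(pauses, freeze_times, min_pause=1.0):
    """Return pauses of at least min_pause seconds that cover a freeze time.

    freeze_times are seconds since JVM start, on the same clock as the log.
    """
    hits = []
    for start, kind, pause in pauses:
        if pause < min_pause:
            continue
        if any(start <= t <= start + pause for t in freeze_times):
            hits.append((start, kind, pause))
    return hits
```

An empty result for every observed freeze would support the claim that GC is not the cause; a hit on a long full collection would point the other way.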
Re: cassandra freezes
On Thu, 25 Feb 2010 08:56:25 -0600 Jonathan Ellis jbel...@gmail.com wrote:
JE Are you swapping?
JE http://spyced.blogspot.com/2010/01/linux-performance-basics.html
JE otherwise there's something wrong w/ your vm (?), disk i/o doesn't
JE block incoming writes in cassandra
If the user has enough memory, can tmpfs (/tmp for example) be used for the data and commitlog to produce results without disk I/O (so it can be determined if disk I/O is the problem)? I've done this with other applications but don't know if it would work with Cassandra.
Ted
Re: cassandra freezes
On Wed, Feb 24, 2010 at 8:46 PM, Santal Li santal...@gmail.com wrote:
BTW: Somebody on my team told me that if the data managed by cassandra is too huge (15x than heap space), it will cause performance issues. Is this true?
It really has more to do with what your hot data set is than with absolute size. Once any system becomes i/o bound because the hot set can't be cached in os buffers, you're going to be in trouble; there's nothing magic about that. :)
-Jonathan
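The point about the hot set and OS buffers comes down to simple arithmetic: whatever RAM the JVM heap does not claim is roughly what the OS has left for caching disk pages. A rough sketch follows, using the 16G RAM / 10G heap figures mentioned elsewhere in this thread; the 1G allowance for everything else on the box is purely an assumption, and the function names are illustrative.

```python
def page_cache_budget_gb(total_ram_gb, jvm_heap_gb, other_overhead_gb=1.0):
    """Approximate RAM left for the OS buffer cache, in GB.

    other_overhead_gb is an assumed allowance for the kernel, JVM
    non-heap memory, and other processes; measure it on a real host.
    """
    return max(0.0, total_ram_gb - jvm_heap_gb - other_overhead_gb)


def hot_set_fits(hot_set_gb, total_ram_gb, jvm_heap_gb, other_overhead_gb=1.0):
    """True if the hot data set could plausibly stay cached by the OS."""
    budget = page_cache_budget_gb(total_ram_gb, jvm_heap_gb, other_overhead_gb)
    return hot_set_gb <= budget
```

With 16G of RAM and a 10G heap this leaves about 5G under the stated assumption, so a hot set much larger than that would be served from disk regardless of how large the total data set is.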
Re: cassandra freezes
I'm still in the experimentation stage so perhaps forgive this hypothetical question/idea. I am planning to load balance by putting haproxy in front of the cassandra cluster. First of all, is that a bad idea? Secondly, if I have high enough replication and # of nodes, is it possible and a good idea to proactively cause GCing to happen? (I.e. take a node out of the haproxy LB pool, somehow cause it to gc, and then put the node back in... repeat at intervals for each node?) Simon Smith
Re: cassandra freezes
haproxy should be fine. normal GCs aren't a problem, you don't need to worry about that. what is a problem is when you shove more requests into cassandra than it can handle, so it tries to GC to get enough memory to handle that, then you shove even more requests, so it GC's again, and it spirals out of control and freezes. https://issues.apache.org/jira/browse/CASSANDRA-685 will address this by not allowing more requests than it can handle.
On Sat, Feb 20, 2010 at 10:22 AM, Simon Smith simongsm...@gmail.com wrote:
I'm still in the experimentation stage so perhaps forgive this hypothetical question/idea. I am planning to load balance by putting haproxy in front of the cassandra cluster. First of all, is that a bad idea? Secondly, if I have high enough replication and # of nodes, is it possible and a good idea to proactively cause GCing to happen? (I.e. take a node out of the haproxy LB pool, somehow cause it to gc, and then put the node back in... repeat at intervals for each node?)
Simon Smith
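The overload spiral described above is usually avoided with some form of back-pressure: refuse or delay new work once a fixed amount is already pending, so memory use stays bounded. The sketch below shows only the general idea with a bounded in-memory queue. It is not how CASSANDRA-685 is implemented, and the class and method names are hypothetical.

```python
import queue


class BoundedStage:
    """Accepts requests only while fewer than max_pending are waiting."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        """Return True if accepted, False if the caller should back off."""
        try:
            self._pending.put_nowait(request)
            return True
        except queue.Full:
            return False

    def drain_one(self):
        """Take the oldest pending request, or None if nothing is waiting."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None
```

A client that gets False back can retry later or try another node, which keeps the rejected request from adding to heap pressure on a node that is already behind.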
Re: cassandra freezes
On Fri, Feb 19, 2010 at 7:40 PM, Santal Li santal...@gmail.com wrote:
I'm seeing almost the same thing as you. When I run write benchmarks, sometimes one Cassandra node freezes, and the other nodes consider it down and then up again after 30+ seconds. I am using 5 nodes, each with 8G of memory for the Java heap. From my investigation, it is caused by the GC thread: I started JConsole and monitored heap usage, and each time GC happened, heap usage dropped from 6G to 1G. Checking the Cassandra log, I found the freezes happened at exactly the same times.
With such a big heap, old generation GCs can definitely take a while. With just a 1.5 gig heap, and with somewhat efficient parallel collection (on a multi-core machine), we had trouble keeping collections below 5 seconds. But this depends a lot on survival ratio -- the less garbage there is (and the more live objects), the slower things are. And the relationship is super-linear too, so processing 6 gig (or whatever part of that is old generation space) can take a long time. It is certainly worth keeping in mind that more memory generally means longer GC collection times.
But Jonathan is probably right in that this alone would not cause the appearance of a freeze -- rather, an overload of GC blocking processing AND an accumulation of new requests sounds more plausible. It is still good to consider both parts of the puzzle: preventing the overflow that can turn a bad situation into a catastrophe, and trying to reduce the impact of GC.
So I think that when using a huge memory (2G), one may need a different GC strategy than the default provided by the Cassandra launch script. Has anyone else met this situation? Can you please provide some guidance?
There are many ways to change GC settings, and specifically to try to reduce the impact of old gen collections (young generation ones are less often problematic, although they can be tuned as well).
Often there is a trade-off between frequency and impact of GC: to simplify, the less often you configure it to occur (for example by increasing the heap), the more impact it usually has when it does occur. Concurrent collectors (like traditional CMS) are good for steady state, and can keep oldgen GC from occurring maybe for hours (doing incremental concurrent partial collections). But they can also lead to GC-from-hell when a full GC must be done (since that is the stop-the-world kind).
There is tons of information on how to deal with GC settings, but unfortunately it is a bit of a black art and very dependent on your specific use case. There being dozens (more than a hundred, I think) of different switches actually makes it trickier, since you also need to learn which ones matter, and in what combinations.
One somewhat counter-intuitive suggestion is to reduce the size of the heap, at least with respect to caching. So mostly try to just keep the live working set in memory, and not do caching inside the Java process. Operating systems are pretty good at caching disk pages; and if the storage engine is out of process (like native BDB), this can significantly reduce GC. In-process caches can be really bad for GC activity, because their contents are potentially long-living, yet relatively transient (that is, neither mostly live nor mostly garbage, making the GC optimizer try in vain to compact things). But once again, this may or may not help, and needs to be experimented with.
Not sure if the above helps, but I hope it gives at least some ideas,
-+ Tatu +-
Re: cassandra freezes
are you using the old deb package? because that had broken gc settings.
On Fri, Feb 19, 2010 at 10:40 PM, Santal Li santal...@gmail.com wrote:
I'm seeing almost the same thing as you. When I run write benchmarks, sometimes one Cassandra node freezes, and the other nodes consider it down and then up again after 30+ seconds. I am using 5 nodes, each with 8G of memory for the Java heap. From my investigation, it is caused by the GC thread: I started JConsole and monitored heap usage, and each time GC happened, heap usage dropped from 6G to 1G. Checking the Cassandra log, I found the freezes happened at exactly the same times.
So I think that when using a huge memory (2G), one may need a different GC strategy than the default provided by the Cassandra launch script. Has anyone else met this situation? Can you please provide some guidance?
Thanks
-Santal
2010/2/17 Tatu Saloranta tsalora...@gmail.com
On Tue, Feb 16, 2010 at 6:25 AM, Boris Shulman shulm...@gmail.com wrote:
Hello, I'm running some benchmarks on 2 cassandra nodes, each running on an 8-core machine with 16G RAM, 10G for the Java heap. I've noticed that during benchmarks with numerous writes cassandra just freezes for several minutes (in those benchmarks I'm writing batches of 10 columns with 1K of data each for every key in a single CF). Usually after performing 50K writes I get a TimeOutException and cassandra just freezes. What configuration changes can I make in order to prevent this? Is it possible that my setup just can't handle the load? How can I calculate the number of cassandra nodes for a desired load?
One thing that can cause seeming lockups is the garbage collector. So enabling GC debug output would be helpful, to see GC activity. Some collectors (CMS specifically) can stop the system for a very long time, up to minutes. This is not necessarily the root cause, but it is easy to rule out.
Beyond this, getting a stack trace during the lockup would make sense. That can pinpoint what threads are doing, or what they are blocked on in case there is a deadlock or heavy contention on some shared resource.
-+ Tatu +-
Re: cassandra freezes
The GC options are as below:
JVM_OPTS= \
        -ea \
        -Xms2G \
        -Xmx8G \
        -XX:SurvivorRatio=8 \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false
Regards
-Santal
2010/2/20 Jonathan Ellis jbel...@gmail.com
are you using the old deb package? because that had broken gc settings.
On Fri, Feb 19, 2010 at 10:40 PM, Santal Li santal...@gmail.com wrote:
I'm seeing almost the same thing as you. When I run write benchmarks, sometimes one Cassandra node freezes, and the other nodes consider it down and then up again after 30+ seconds. I am using 5 nodes, each with 8G of memory for the Java heap. From my investigation, it is caused by the GC thread: I started JConsole and monitored heap usage, and each time GC happened, heap usage dropped from 6G to 1G. Checking the Cassandra log, I found the freezes happened at exactly the same times.
So I think that when using a huge memory (2G), one may need a different GC strategy than the default provided by the Cassandra launch script. Has anyone else met this situation? Can you please provide some guidance?
Thanks
-Santal
2010/2/17 Tatu Saloranta tsalora...@gmail.com
On Tue, Feb 16, 2010 at 6:25 AM, Boris Shulman shulm...@gmail.com wrote:
Hello, I'm running some benchmarks on 2 cassandra nodes, each running on an 8-core machine with 16G RAM, 10G for the Java heap. I've noticed that during benchmarks with numerous writes cassandra just freezes for several minutes (in those benchmarks I'm writing batches of 10 columns with 1K of data each for every key in a single CF). Usually after performing 50K writes I get a TimeOutException and cassandra just freezes. What configuration changes can I make in order to prevent this? Is it possible that my setup just can't handle the load? How can I calculate the number of cassandra nodes for a desired load?
One thing that can cause seeming lockups is the garbage collector. So enabling GC debug output would be helpful, to see GC activity. Some collectors (CMS specifically) can stop the system for a very long time, up to minutes. This is not necessarily the root cause, but it is easy to rule out.
Beyond this, getting a stack trace during the lockup would make sense. That can pinpoint what threads are doing, or what they are blocked on in case there is a deadlock or heavy contention on some shared resource.
-+ Tatu +-
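The option list posted in this message sets -XX:SurvivorRatio twice (8 and 128), which is easy to miss in a long launch script. A small sketch for spotting repeated -XX flags follows; the function names are made up for illustration, and it only understands the -XX:Name=value and -XX:+Name / -XX:-Name forms.

```python
def xx_flag_name(option):
    """Return the name of a -XX: option, or None for any other option."""
    if not option.startswith("-XX:"):
        return None
    body = option[len("-XX:"):]
    if body.startswith(("+", "-")):  # boolean form: -XX:+Name or -XX:-Name
        return body[1:]
    return body.split("=", 1)[0]     # valued form: -XX:Name=value


def repeated_xx_flags(jvm_opts):
    """Map each -XX flag name given more than once to its occurrences, in order."""
    seen = {}
    # Backslashes are shell line continuations in the launch script; drop them.
    for token in jvm_opts.replace("\\", " ").split():
        name = xx_flag_name(token)
        if name is not None:
            seen.setdefault(name, []).append(token)
    return {name: opts for name, opts in seen.items() if len(opts) > 1}
```

HotSpot generally honors the last occurrence of a repeated flag, so here 128 would likely be the effective value; running with -XX:+PrintFlagsFinal is one way to confirm. Combined with -XX:MaxTenuringThreshold=0, that setting promotes objects to the old generation after their first young collection, which may be worth revisiting when old generation pauses are the concern.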
Re: cassandra freezes
On Tue, Feb 16, 2010 at 6:25 AM, Boris Shulman shulm...@gmail.com wrote:
Hello, I'm running some benchmarks on 2 cassandra nodes, each running on an 8-core machine with 16G RAM, 10G for the Java heap. I've noticed that during benchmarks with numerous writes cassandra just freezes for several minutes (in those benchmarks I'm writing batches of 10 columns with 1K of data each for every key in a single CF). Usually after performing 50K writes I get a TimeOutException and cassandra just freezes. What configuration changes can I make in order to prevent this? Is it possible that my setup just can't handle the load? How can I calculate the number of cassandra nodes for a desired load?
One thing that can cause seeming lockups is the garbage collector. So enabling GC debug output would be helpful, to see GC activity. Some collectors (CMS specifically) can stop the system for a very long time, up to minutes. This is not necessarily the root cause, but it is easy to rule out.
Beyond this, getting a stack trace during the lockup would make sense. That can pinpoint what threads are doing, or what they are blocked on in case there is a deadlock or heavy contention on some shared resource.
-+ Tatu +-