Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
First off, I'm curious what hardware (system specs) you're running this on?
Secondly, here are some observations:

* You're not running the newest JDK7; I can tell by your stack size.
  Consider getting the newest.
* Cassandra 2.0.2 has a lot of improvements, consider upgrading. We noticed
  improved heap usage compared to 2.0.0.
* Have you simply tried decreasing the size of your row cache? Tried 256MB?
* Do you have JNA installed? Otherwise you're not getting off-heap storage
  for these caches, which seems likely to be the case here. Check your
  cassandra.log to verify JNA operation (a sketch of the check follows this
  message).
* Your NewGen is too small. See your heap peaks? This is because short-lived
  memory is being put into OldGen, which only gets cleaned up during full
  GC. You should set your NewGen to about 25-30% of your total heap size.
  Many objects are short-lived, and CMS GC is significantly more efficient
  if the shorter-lived objects never get promoted to OldGen; you'll get more
  concurrent, non-blocking GC. If you're not using JNA (per above), the row
  cache and key cache are still on-heap, so you want your NewGen to be at
  least twice as large as the combined size of these caches. You should
  never see those crazy heap spikes; without JNA, your caches are
  essentially overflowing into OldGen.

On Tue, Nov 5, 2013 at 3:04 AM, Jiri Horky ho...@avast.com wrote:

> Hi there,
>
> we are seeing extensive memory allocation leading to quite long and
> frequent GC pauses when using row cache. This is on a cassandra 2.0.0
> cluster with the JNA 4.0 library and the following settings:
>
> key_cache_size_in_mb: 300
> key_cache_save_period: 14400
> row_cache_size_in_mb: 1024
> row_cache_save_period: 14400
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 1
> commitlog_segment_size_in_mb: 32
>
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/data2/cassandra-work/instance-1/cassandra-1383566283-pid1893.hprof
> -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark
>
> We have disabled row cache on one node to see the difference. Please see
> the attached plots from VisualVM; I think that the effect is quite
> visible. I have also taken jmap -histo 10 times, 5 seconds apart, on an
> affected server and plotted the result, attached as well.
>
> I have taken a dump of the application when the heap size was 10GB; most
> of the memory was unreachable, which was expected. The majority was used
> by 55-59M objects of the HeapByteBuffer, byte[] and
> org.apache.cassandra.db.Column classes. I also include a list of inbound
> references to the HeapByteBuffer objects, from which it should be visible
> where they are being allocated. This was acquired using Eclipse MAT.
>
> Here is the comparison of GC times with row cache enabled and disabled:
>
> prg01 - row cache enabled
>   - uptime 20h45m
>   - ConcurrentMarkSweep - 11494686 ms
>   - ParNew - 14690885 ms
>   - time spent in GC: 35%
> prg02 - row cache disabled
>   - uptime 23h45m
>   - ConcurrentMarkSweep - 251 ms
>   - ParNew - 230791 ms
>   - time spent in GC: 0.27%
>
> I would be grateful for any hints. Please let me know if you need any
> further information. For now, we are going to disable the row cache.
>
> Regards
> Jiri Horky
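A quick sketch of the two checks above. The log path and the exact JNA log
wording vary by install and version, so treat both as assumptions:

    # Verify JNA is active at startup (adjust the path to your install).
    # With JNA working, 1.x/2.0-era versions log a line along the lines of
    # "JNA mlockall successful"; without it, a warning that native methods
    # are disabled.
    grep -i jna /var/log/cassandra/system.log

    # NewGen at ~25-30% of the 10G heap from the original post:
    -Xms10G -Xmx10G -Xmn2560M    # ~25% of 10G, versus the current -Xmn1024M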
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
On Thu, Nov 14, 2013 at 10:05 AM, J. Ryan Earl o...@jryanearl.us wrote:

> * Cassandra 2.0.2 has a lot of improvements, consider upgrading. We
>   noticed improved heap usage compared to 2.0.0.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

And especially if you're using Leveled Compaction / LCS:

https://issues.apache.org/jira/browse/CASSANDRA-6284
"Wrong tracking of minLevel in Leveled Compaction Strategy causing serious
performance problems"

tl;dr - don't upgrade to 2.0.2 in production.

=Rob
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
Hi,

On 11/12/2013 05:29 AM, Aaron Morton wrote:

>>> Are you doing large slices or could you have a lot of tombstones on the
>>> rows ?
>> don't really know - how can I monitor that?
> For tombstones, do you do a lot of deletes ? Also, in v2.0.2 cfstats has
> this:
>
> Average live cells per slice (last five minutes): 0.0
> Average tombstones per slice (last five minutes): 0.0
>
> For large slices you need to check your code. e.g. do you do anything
> that reads lots of columns, or very large columns, or lets the user
> select how many columns to read? The
> org.apache.cassandra.db.ArrayBackedSortedColumns in the traceback is used
> during reads (e.g. org.apache.cassandra.db.filter.SliceQueryFilter).

thanks for the explanation, will try to provide some figures (but
unfortunately not from 2.0.2).

>>> You probably want the heap to be 4G to 8G in size; 10G will encounter
>>> longer pauses. Also the size of the new heap may be too big depending
>>> on the number of cores. I would recommend trying 800M.
>> I tried to decrease it first to 384M then to 128M with no change in the
>> behaviour. I don't really care about the extra memory overhead of the
>> cache - to be able to actually point to it with objects - but I don't
>> really see the reason why it should create/delete so many objects so
>> quickly.
> Not sure what you changed to 384M.

Sorry for the confusion. I meant to say that I tried to decrease the row
cache size to 384M and then to 128M, and the GC times did not change at
all (still ~30% of the time).

>>> Shows the heap growing very quickly. This could be due to wide reads or
>>> a high write throughput.
>> Well, both prg01 and prg02 receive the same load, which is about
>> ~150-250 (during peak) read requests per second and 100-160 write
>> requests per second. The only node with the heap growing rapidly and GC
>> kicking in is the one with row cache enabled.
> This sounds like on a row cache miss cassandra is reading the whole row,
> which happens to be a wide row. I would also guess some writes are going
> to the rows and they are getting invalidated out of the row cache. The
> row cache is not great for rows that update frequently and/or wide rows.
>
> How big are the rows ? Use nodetool cfstats and nodetool cfhistograms.

I will get in touch with the developers and take the data from the cf*
commands in a few days (I am out of office for some days). Thanks for the
pointers, will get in touch.

Cheers
Jiri Horky
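For reference, a sketch of the cf* commands Aaron points to; the keyspace
and column-family names here are placeholders. cfhistograms prints the
row-size and column-count distributions that answer the "how big are the
rows" question, and cfstats in 2.0.2 includes the live-cells and
tombstones-per-slice averages quoted above:

    nodetool cfstats MyKeyspace.MyCF        # per-CF stats, incl. tombstones per slice in 2.0.2
    nodetool cfhistograms MyKeyspace MyCF   # row size / column count distributions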
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
>> Are you doing large slices or could you have a lot of tombstones on the
>> rows ?
> don't really know - how can I monitor that?

For tombstones, do you do a lot of deletes ? Also, in v2.0.2 cfstats has
this:

Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

For large slices you need to check your code. e.g. do you do anything that
reads lots of columns, or very large columns, or lets the user select how
many columns to read? The org.apache.cassandra.db.ArrayBackedSortedColumns
in the traceback is used during reads
(e.g. org.apache.cassandra.db.filter.SliceQueryFilter).

>> You probably want the heap to be 4G to 8G in size; 10G will encounter
>> longer pauses. Also the size of the new heap may be too big depending on
>> the number of cores. I would recommend trying 800M.
> I tried to decrease it first to 384M then to 128M with no change in the
> behaviour. I don't really care about the extra memory overhead of the
> cache - to be able to actually point to it with objects - but I don't
> really see the reason why it should create/delete so many objects so
> quickly.

Not sure what you changed to 384M.

>> Shows the heap growing very quickly. This could be due to wide reads or
>> a high write throughput.
> Well, both prg01 and prg02 receive the same load, which is about ~150-250
> (during peak) read requests per second and 100-160 write requests per
> second. The only node with the heap growing rapidly and GC kicking in is
> the one with row cache enabled.

This sounds like on a row cache miss cassandra is reading the whole row,
which happens to be a wide row. I would also guess some writes are going to
the rows and they are getting invalidated out of the row cache. The row
cache is not great for rows that update frequently and/or wide rows.

How big are the rows ? Use nodetool cfstats and nodetool cfhistograms.

Cheers

Aaron Morton
New Zealand
@aaronmorton

Co-Founder
Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 8/11/2013, at 11:47 pm, Jiri Horky ho...@avast.com wrote:

> Hi,
>
> On 11/07/2013 05:18 AM, Aaron Morton wrote:
> [Eclipse MAT inbound-reference table snipped; it is quoted in full in the
> 7/11/2013 message further down the thread.]
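If the cf* output does confirm frequently updated and/or wide rows, one
option in 2.0 is to keep the key cache but turn row caching off for just
the affected table; a sketch, with placeholder names:

    -- in cqlsh:
    ALTER TABLE MyKeyspace.MyCF WITH caching = 'keys_only';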
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
Hi,

On 11/07/2013 05:18 AM, Aaron Morton wrote:

> [Eclipse MAT inbound-reference table snipped; it is quoted in full in the
> 7/11/2013 message further down the thread.]
>
> Are you doing large slices or could you have a lot of tombstones on the
> rows ?
don't really know - how can I monitor that?

>> We have disabled row cache on one node to see the difference. Please see
>> the attached plots from visual VM, I think that the effect is quite
>> visible.
> The default row cache is off the JVM heap, have you changed to the
> ConcurrentLinkedHashCacheProvider ?
Answered by Chris already :) No.

> One way the SerializingCacheProvider could impact GC is if the CF takes a
> lot of writes. The SerializingCacheProvider invalidates the row when it
> is written to and has to read the entire row and serialise it on a cache
> miss.
>
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
>> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
> You probably want the heap to be 4G to 8G in size; 10G will encounter
> longer pauses. Also the size of the new heap may be too big depending on
> the number of cores. I would recommend trying 800M.
I tried to decrease it first to 384M then to 128M with no change in the
behaviour. I don't really care about the extra memory overhead of the
cache - to be able to actually point to it with objects - but I don't
really see the reason why it should create/delete so many objects so
quickly.

>> prg01.visual.vm.png
> Shows the heap growing very quickly. This could be due to wide reads or a
> high write throughput.
Well, both prg01 and prg02 receive the same load, which is about ~150-250
(during peak) read requests per second and 100-160 write requests per
second. The only node with the heap growing rapidly and GC kicking in is
the one with row cache enabled.

> Hope that helps.
Thank you!
Jiri Horky
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
On 11/06/2013 11:18 PM, Aaron Morton wrote:

> The default row cache is off the JVM heap, have you changed to the
> ConcurrentLinkedHashCacheProvider ?

ConcurrentLinkedHashCacheProvider was removed in 2.0.x.
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
Both caches involve several objects per entry (What do we want? Packed
objects. When do we want them? Now!). The configured size is an estimate of
the off-heap values only, not the total size nor the number of entries. An
acceptable size will depend on your data and access patterns. In one case
we had a cluster that at 512mb would go into a GC death spiral despite
plenty of free heap (presumably just due to the number of objects), while
empirically the cluster runs smoothly at 384mb. Your caches appear to be on
the larger side; I suggest trying smaller values and only increasing when
it produces measurable, sustained gains (see the sketch after this
message).

On 11/05/2013 04:04 AM, Jiri Horky wrote:

> [original message snipped; it is quoted in full in J. Ryan Earl's reply
> at the top of the thread.]
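A sketch of trying Chris's smaller values. The 384 figure is his
empirically good size; the 300 just preserves the key-cache size from the
original post. setcachecapacity takes effect immediately on one node but
reverts on restart, so the yaml change is what makes it stick:

    # live, per node (key cache MB, row cache MB):
    nodetool setcachecapacity 300 384

    # persistent, in cassandra.yaml:
    key_cache_size_in_mb: 300
    row_cache_size_in_mb: 384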
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
> Class Name                                                        | Shallow Heap | Retained Heap
> -------------------------------------------------------------------------------------------------
> java.nio.HeapByteBuffer @ 0x7806a0848                             |           48 |            80
> '- name org.apache.cassandra.db.Column @ 0x7806424e8              |           32 |           112
>    |- [338530] java.lang.Object[540217] @ 0x57d62f560 Unreachable |    2,160,888 |     2,160,888
>    |- [338530] java.lang.Object[810325] @ 0x591546540             |    3,241,320 |     7,820,328
>    |  '- elementData java.util.ArrayList @ 0x75e8424c0            |           24 |     7,820,352
>    |     |- list org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18 | 48 | 128
>    |     |  '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>    |     |- this$0, parent java.util.ArrayList$SubList @ 0x5940e0bb8 | 40 | 40
>    |     |  '- this$1 java.util.ArrayList$SubList$1 @ 0x5940e0be0  | 40 | 80
>    |     |     '- currentSlice org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18 | 48 | 128
>    |     |        '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>    |     |           '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>    |     |- columns org.apache.cassandra.db.ArrayBackedSortedColumns @ 0x5b0a33488 | 32 | 56
>    |     |  '- val$cf org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>    |     '- Total: 3 entries                                      |              |
>    |- [338530] java.lang.Object[360145] @ 0x7736ce2f0 Unreachable |    1,440,600 |     1,440,600
>    '- Total: 3 entries                                            |              |

Are you doing large slices or could you have a lot of tombstones on the
rows ?

> We have disabled row cache on one node to see the difference. Please see
> the attached plots from visual VM, I think that the effect is quite
> visible.
The default row cache is off the JVM heap, have you changed to the
ConcurrentLinkedHashCacheProvider ?

One way the SerializingCacheProvider could impact GC is if the CF takes a
lot of writes. The SerializingCacheProvider invalidates the row when it is
written to and has to read the entire row and serialise it on a cache miss.

> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
You probably want the heap to be 4G to 8G in size; 10G will encounter
longer pauses. Also the size of the new heap may be too big depending on
the number of cores. I would recommend trying 800M.

> prg01.visual.vm.png
Shows the heap growing very quickly. This could be due to wide reads or a
high write throughput.

Hope that helps.

Aaron Morton
New Zealand
@aaronmorton

Co-Founder
Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 7/11/2013, at 6:29 am, Chris Burroughs chris.burrou...@gmail.com wrote:

> Both caches involve several objects per entry (What do we want? Packed
> objects. When do we want them? Now!). The size is an estimate of the off
> heap values only and not the total size nor number of entries. An
> acceptable size will depend on your data and access patterns. In one case
> we had a cluster that at 512mb would go into a GC death spiral despite
> plenty of free heap (presumably just due to the number of objects) while
> empirically the cluster runs smoothly at 384mb. Your caches appear on the
> larger side, I suggest trying smaller values and only increase when it
> produces measurable sustained gains.
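Aaron's heap numbers, expressed as settings; a sketch. cassandra-env.sh
normally derives the JVM flags itself, so the usual place to pin them is
there rather than on the command line:

    # in cassandra-env.sh, e.g. an 8G heap with an 800M new generation:
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"
    # which ends up on the JVM command line as: -Xms8G -Xmx8G -Xmn800M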
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
We have the same problem.

2013/11/5 Jiri Horky ho...@avast.com

> [original message snipped; it is quoted in full in J. Ryan Earl's reply
> at the top of the thread.]

--
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
MSc student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG