[ https://issues.apache.org/jira/browse/CASSANDRA-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780049#comment-13780049 ]

Constance Eustace commented on CASSANDRA-6107:
----------------------------------------------

It appears you are using a MAX_CACHE_PREPARED of 100,000, and the 
ConcurrentLinkedHashMap should use that as its eviction limit.
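
As a rough sketch of what that implies (this is not Cassandra's actual code; the 
class and field names are illustrative), a ConcurrentLinkedHashMap capped this 
way bounds the number of cached statements, not the bytes they hold:

    // Sketch only, assuming the default weigher: capacity counts ENTRIES
    // (each entry weighs 1), not bytes, so 100,000 large batch statements
    // can still occupy many gigabytes of heap before anything is evicted.
    import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;

    public class PreparedCacheSketch {
        private static final int MAX_CACHE_PREPARED = 100_000;

        private static final ConcurrentLinkedHashMap<Integer, Object> preparedStatements =
                new ConcurrentLinkedHashMap.Builder<Integer, Object>()
                        .maximumWeightedCapacity(MAX_CACHE_PREPARED)
                        .build();

        public static void cache(int statementId, Object prepared) {
            // Evicts the least-recently-used entry once the entry count exceeds the cap.
            preparedStatements.put(statementId, prepared);
        }
    }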

If the individual cached entries for these 200-statement batches are large (say, 
10 KB or more each; based on the heap dump they appear to consist of one map per 
statement in the batch, so that is easily possible), then the worst case is 
100,000 entries x 100,000 bytes per entry = 10 gigabytes... uh oh.

I think 600,000 updates (3,000 batches of 200 statements each) filled a 4 GB 
heap. I figure roughly 1 GB of that heap is used for 
filters/sstables/memtables/etc., so 3,000 batches consumed about 3 GB of heap, 
i.e. roughly a megabyte per batch.
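
A quick sanity check of both figures, with the caveat that the per-entry size is 
an assumption from the heap dump rather than a measurement:

    // Back-of-envelope check of the heap math above. The per-entry size is an
    // assumed value, not a measured one.
    public class HeapMath {
        public static void main(String[] args) {
            long cacheEntries = 100_000L;          // MAX_CACHE_PREPARED
            long bytesPerEntry = 100_000L;         // assumed size of one cached batch statement
            long worstCase = cacheEntries * bytesPerEntry;
            System.out.println("worst-case cache footprint: "
                    + worstCase / 1_000_000_000L + " GB");        // 10 GB

            long batches = 3_000L;                 // 600,000 updates / 200 statements per batch
            long heapForBatches = 3_000_000_000L;  // ~3 GB of the 4 GB heap attributed to batches
            System.out.println("per-batch cost: "
                    + heapForBatches / batches / 1_000 + " KB");  // ~1000 KB, i.e. ~1 MB per batch
        }
    }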

Can we expose MAX_CACHE_PREPARED as a config parameter?
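
Purely as an illustration of that suggestion (the property name below is 
hypothetical, not an existing Cassandra option), the constant could be made 
overridable along these lines:

    // Hypothetical sketch only; "cassandra.max_prepared_statements" is an
    // invented property name, not an existing Cassandra setting.
    private static final int MAX_CACHE_PREPARED =
            Integer.getInteger("cassandra.max_prepared_statements", 100_000);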
                
> CQL3 Batch statement memory leak
> --------------------------------
>
>                 Key: CASSANDRA-6107
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6107
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API, Core
>         Environment: - CASS version: 1.2.8 or 2.0.1, same issue seen in both
> - Running on OSX MacbookPro
> - Sun JVM 1.7
> - Single local cassandra node
> - both CMS and G1 GC used
> - we are using the cass-JDBC driver to submit our batches
>            Reporter: Constance Eustace
>            Priority: Minor
>
> We are doing large-volume insert/update tests on Cassandra via CQL3. 
> Using a 4 GB heap, after roughly 750,000 updates creating/updating 75,000 row 
> keys, we run out of heap, the usage never dissipates, and we begin getting 
> this infamous error which many people seem to be encountering:
> WARN [ScheduledTasks:1] 2013-09-26 16:17:10,752 GCInspector.java (line 142) 
> Heap is 0.9383457210434385 full.  You may need to reduce memtable and/or 
> cache sizes.  Cassandra will now flush up to the two largest memtables to 
> free up memory.  Adjust flush_largest_memtables_at threshold in 
> cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2013-09-26 16:17:10,753 StorageService.java (line 
> 3614) Unable to reduce heap usage since there are no dirty column families
> 8 GB and 12 GB heaps appear to delay the problem by roughly proportional 
> amounts, about 75,000 - 100,000 row keys per 4 GB. Each run of 50,000 row-key 
> creations sees the heap grow and never shrink again. 
> We have attempted, to no effect:
> - removing all secondary indexes, to see if that alleviates overuse of bloom 
> filters 
> - adjusting compaction throughput parameters
> - adjusting memtable flush thresholds and other parameters 
> By examining heap dumps, it seems apparent that the problem is perpetual 
> retention of CQL3 BATCH statements. We have even tried dropping the keyspaces 
> after the updates, and the CQL3 statements are still visible in the heap dump, 
> even after many CMS GC runs. G1 also showed this issue.
> The 750,000 statements are broken into batches of roughly 200 statements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
