[ https://issues.apache.org/jira/browse/CASSANDRA-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780049#comment-13780049 ]
Constance Eustace commented on CASSANDRA-6107:
----------------------------------------------

It appears you are using a MAX_CACHE_PREPARED of 100,000 entries, and ConcurrentLinkedHashMap uses that count as its eviction bound. If the individual cache entries for 200-statement batches are large (say, 100 KB; based on the heap dump they appear to hold one map per statement in the batch, so that is easily possible), then 100,000 entries x 100 KB per entry = 10 GB... uh oh.

I think 600,000 updates (3,000 batches of 200 statements each) blew a 4 GB heap. I figure 1 GB of that heap goes to filters/sstables/memtables/etc., so the 3,000 batches consumed about 3 GB of heap, i.e. roughly a megabyte per batch.

Can we expose MAX_CACHE_PREPARED as a config parameter? (See the sketches below the quoted issue.)

> CQL3 Batch statement memory leak
> --------------------------------
>
>                 Key: CASSANDRA-6107
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6107
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API, Core
>         Environment: - CASS version: 1.2.8 or 2.0.1, same issue seen in both
>                      - Running on an OS X MacBook Pro
>                      - Sun JVM 1.7
>                      - Single local Cassandra node
>                      - both CMS and G1 GC used
>                      - we are using the cass-JDBC driver to submit our batches
>            Reporter: Constance Eustace
>            Priority: Minor
>
> We are doing large-volume insert/update tests on Cassandra via CQL3.
> Using a 4 GB heap, after roughly 750,000 updates creating/updating 75,000 row keys, we run out of heap; usage never dissipates, and we begin getting this infamous error which many people seem to be encountering:
> WARN [ScheduledTasks:1] 2013-09-26 16:17:10,752 GCInspector.java (line 142) Heap is 0.9383457210434385 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> INFO [ScheduledTasks:1] 2013-09-26 16:17:10,753 StorageService.java (line 3614) Unable to reduce heap usage since there are no dirty column families
> 8 and 12 GB heaps appear to delay the problem by roughly proportionate amounts of 75,000 - 100,000 row keys per 4 GB. Each run of 50,000 row-key creations sees the heap grow and never shrink again.
> We have attempted, to no effect:
> - removing all secondary indexes, to see if that alleviates overuse of bloom filters
> - adjusting parameters for compaction throughput
> - adjusting memtable flush thresholds and other parameters
> By examining heap dumps, it seems apparent that the problem is perpetual retention of CQL3 BATCH statements. We even tried dropping the keyspaces after the updates; the CQL3 statements are still visible in the heap dump after many, many CMS GC runs. G1 showed the same issue.
> The 750,000 statements are broken into batches of roughly 200 statements.
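For illustration, here is a minimal, self-contained Java sketch of the arithmetic above, written against the com.googlecode.concurrentlinkedhashmap Builder API. This is not Cassandra's actual cache code: the class names, the ~100 KB entry size, and the 256 MB byte budget are assumptions for the example. It contrasts a count-bounded cache (no eviction until 100,000 entries, however large each one is) with a size-weighted cache that caps total bytes instead.

{code:java}
import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import com.googlecode.concurrentlinkedhashmap.Weigher;

public class PreparedCacheSketch
{
    // Hypothetical stand-in for a cached prepared batch; the real entries
    // appear (from the heap dump) to hold one map per inner statement.
    static final class Prepared
    {
        final byte[] payload;
        Prepared(int approxBytes) { payload = new byte[approxBytes]; }
    }

    public static void main(String[] args)
    {
        // Count-bounded, as described above: the default weigher counts
        // each entry as 1, so eviction starts only at 100,000 entries,
        // regardless of how big each entry actually is.
        ConcurrentLinkedHashMap<Integer, Prepared> byCount =
            new ConcurrentLinkedHashMap.Builder<Integer, Prepared>()
                .maximumWeightedCapacity(100000)        // 100,000 ENTRIES
                .build();

        // Size-weighted alternative: weigh each entry by its approximate
        // byte footprint and cap the cache at a memory budget instead.
        ConcurrentLinkedHashMap<Integer, Prepared> byBytes =
            new ConcurrentLinkedHashMap.Builder<Integer, Prepared>()
                .maximumWeightedCapacity(256L * 1024 * 1024) // ~256 MB (illustrative)
                .weigher(new Weigher<Prepared>()
                {
                    public int weightOf(Prepared p) { return p.payload.length; }
                })
                .build();

        // At ~100 KB per cached batch, the count-bounded cache can grow
        // toward 100,000 x 100 KB = ~10 GB before evicting anything:
        for (int i = 0; i < 3000; i++)
            byCount.put(i, new Prepared(100 * 1024));   // ~300 MB retained already

        // The weighted cache charges the same entry ~100 KB against its
        // 256 MB budget, so it starts evicting long before the heap pops.
        byBytes.put(0, new Prepared(100 * 1024));
    }
}
{code}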
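And a sketch of the config-parameter idea: read the cap from a system property at startup so operators can tune it without rebuilding. The property name below is hypothetical, not an existing Cassandra flag.

{code:java}
// Hypothetical tuning knob (the property name is an assumption, not an
// existing Cassandra option). Start the JVM with, e.g.:
//   -Dcassandra.max_prepared_statements=5000
// Falls back to the current hard-coded default when the property is unset.
private static final int MAX_CACHE_PREPARED =
        Integer.getInteger("cassandra.max_prepared_statements", 100000);
{code}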