[ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834 ]
Brandon Williams edited comment on CASSANDRA-2868 at 8/9/11 6:43 PM:
---------------------------------------------------------------------

bq. Wouldn't it be worth indicating how many collections have been done since the last log message if it's > 1, since it can (be > 1)?

The only reason I added count tracking was to prevent it from firing when there were no GCs (the API is flaky). I've never actually been able to get > 1 to happen, but we can add it to the logging. (A sketch of the count-tracking idea follows the issue summary below.)

bq. IMO the duration-based thresholds are hard to reason about here, where we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice. The worst case is that > 1 GC inflates the gctime enough that we errantly log when it isn't needed, but to trigger that I imagine you would already have to be under GC pressure.

bq. I think I'd rather have something like the dropped messages logger, where every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise, since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be removed.

I think the logic there is still sound ("Did we just do a CMS? Is the heap still 80% full?") and it seems to work as well as it always has. (A sketch of that check also follows below.)

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.4
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>
> We have memory issues with long-running servers. These have been confirmed by several users on the user list; that's why I'm reporting it.
> The memory consumption of the Cassandra java process increases steadily until it's killed by the OS because of OOM (there is no swap).
> Our server is started with -Xmx3000M and has been running for around 23 days.
> pmap -x shows (values in KB):
> Total SST: 1961616 (mem-mapped data and index files)
> Anon RSS: 6499640
> Total RSS: 8478376
> This shows that > 3G are 'overallocated'.
> We will use BRAF on one of our less important nodes to check whether it is related to mmap and report back.
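For illustration, here is a minimal sketch of the count-tracking approach described above, using the standard GarbageCollectorMXBean API. The class name, threshold, and log output are hypothetical placeholders, not the actual patch; the point is the count delta, which both suppresses the message when no collection happened (the API can return -1 or unchanged values) and makes > 1 collections per interval visible in the logging:

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GCDeltaLogger
{
    // last observed collection counts/times, keyed by collector name
    private final Map<String, Long> lastCounts = new HashMap<String, Long>();
    private final Map<String, Long> lastTimes = new HashMap<String, Long>();

    // hypothetical threshold: ignore polls whose accumulated GC time is under 200ms
    private static final long LOG_THRESHOLD_MS = 200;

    public void logGCResults()
    {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            long count = gc.getCollectionCount();
            long time = gc.getCollectionTime();
            if (count == -1 || time == -1)
                continue; // the API may report these as undefined

            Long prevCount = lastCounts.put(gc.getName(), count);
            Long prevTime = lastTimes.put(gc.getName(), time);
            if (prevCount == null || prevTime == null)
                continue; // first poll: just establish a baseline

            long countDelta = count - prevCount;
            long timeDelta = time - prevTime;

            // count tracking: don't fire at all when no collections actually happened
            if (countDelta <= 0 || timeDelta < LOG_THRESHOLD_MS)
                continue;

            // include the count so that > 1 collections since the last message shows up
            System.out.printf("GC for %s: %d collection(s), %d ms total%n",
                              gc.getName(), countDelta, timeDelta);
        }
    }
}
{code}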
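And a sketch of the "did we just do a CMS? is the heap still 80% full?" check behind the flushLargestMemtables/reduceCacheSizes logic. The 0.80 ratio comes from the comment above; the class name and the commented-out hooks are stand-ins for the real wiring:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapPressureCheck
{
    // "is the heap still 80% full?" from the discussion above
    private static final double FULL_HEAP_RATIO = 0.80;

    // called after observing a completed CMS (old-gen) collection
    public static void afterFullCollection()
    {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        if (max <= 0)
            return; // -1 means the maximum heap size is undefined

        double usedRatio = (double) heap.getUsed() / max;
        if (usedRatio > FULL_HEAP_RATIO)
        {
            System.out.printf("Heap is %.2f full even after GC%n", usedRatio);
            // emergency pressure valve; stand-ins for the real calls:
            // flushLargestMemtables();
            // reduceCacheSizes();
        }
    }
}
{code}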
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira