[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834
 ] 

Brandon Williams edited comment on CASSANDRA-2868 at 8/9/11 6:43 PM:
---------------------------------------------------------------------

bq. Wouldn't it be worth indicating how many collections have been done 
since the last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the API is flaky).  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.
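
For illustration, a minimal sketch of the count-delta idea (class and method 
names here are made up, not the actual GCInspector code): only log when the 
per-collector collection count has actually advanced since the last check, and 
mention the count when it's > 1.

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: track per-collector count/time deltas so we
// skip logging entirely when no collections happened (the MXBean values
// are cumulative and the API can report nothing new on a given check).
public class GcDeltaLogger
{
    private final Map<String, Long> lastCounts = new HashMap<String, Long>();
    private final Map<String, Long> lastTimes = new HashMap<String, Long>();

    public void check(long durationThresholdMs)
    {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            long count = gc.getCollectionCount();
            long time = gc.getCollectionTime();
            Long prevCount = lastCounts.get(gc.getName());
            Long prevTime = lastTimes.get(gc.getName());
            lastCounts.put(gc.getName(), count);
            lastTimes.put(gc.getName(), time);
            if (prevCount == null || prevTime == null)
                continue; // first observation, nothing to compare against

            long countDelta = count - prevCount;
            long timeDelta = time - prevTime;
            if (countDelta <= 0)
                continue; // no collections since last check: don't log

            if (timeDelta > durationThresholdMs)
            {
                // include the count when more than one collection was summarized
                String suffix = countDelta > 1 ? " (" + countDelta + " collections)" : "";
                System.out.println("GC for " + gc.getName() + ": " + timeDelta + " ms" + suffix);
            }
        }
    }
}
{code}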

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is that > 1 GC inflates the gctime enough that we errantly log when 
it's not needed, but I imagine that to trigger that you would already have to 
be in a GC pressure situation.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.
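
For reference, a rough sketch of what that periodic-summary alternative would 
look like (the interval and log wording below are invented, not from any 
patch); the cumulative mbean numbers get logged every N seconds whether or not 
anything interesting happened, which is where the noise concern comes from.

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the "every N seconds, log the mbean summary" idea.
public class PeriodicGcSummary
{
    public static void start(long intervalSeconds)
    {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable()
        {
            public void run()
            {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
                {
                    // cumulative numbers: this fires even when nothing has
                    // changed, which is the noise concern above
                    System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                                       + " collections, " + gc.getCollectionTime() + " ms total");
                }
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }
}
{code}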

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap 
still 80% full?") and it seems to work as well as it always has.
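
Roughly, that check amounts to something like the following (an illustrative 
sketch, not the actual Cassandra code; the 0.80 threshold comes from the 
"still 80% full" wording above and the method call is just a placeholder):

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative check only: right after the concurrent collector has run,
// see whether the heap is still mostly full and, if so, take emergency action.
public class PostCmsHeapCheck
{
    private static final double FULL_THRESHOLD = 0.80; // "still 80% full?"

    public static void maybeRelieveMemoryPressure(boolean justRanConcurrentGc)
    {
        if (!justRanConcurrentGc)
            return; // only meaningful right after a CMS cycle

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double usedRatio = (double) heap.getUsed() / heap.getMax();
        if (usedRatio > FULL_THRESHOLD)
        {
            // in Cassandra this is where flushLargestMemtables() and
            // reduceCacheSizes() come in; just a log line in this sketch
            System.out.println("Heap is " + (int) (usedRatio * 100) + "% full after CMS; "
                               + "flushing largest memtables / reducing cache sizes");
        }
    }
}
{code}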



> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.4
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
> low-load-36-hours-initial-results.png
>
>
> We have memory issues with long-running servers. These have been confirmed by 
> several users on the user list, which is why I'm reporting it.
> The memory consumption of the Cassandra java process increases steadily until 
> it's killed by the OS because of OOM (with no swap).
> Our server is started with -Xmx3000M and has been running for around 23 days.
> pmap -x shows
> Total SST: 1961616 (mem mapped data and index files)
> Anon  RSS: 6499640
> Total RSS: 8478376
> This shows that > 3G are 'overallocated'.
> We will use BRAF on one of our less important nodes to check whether it is 
> related to mmap and report back.
