[ 
https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867328#comment-17867328
 ] 

Brandon Williams commented on CASSANDRA-19785:
----------------------------------------------

I would recommend seeing if this persists on 4.0.13, but I would also be 
looking at anything that might make this deployment different from the norm 
since 4.0.11 has been out for a year and nobody has seen this before.

> Possible memory leak in BTree.FastBuilder 
> ------------------------------------------
>
>                 Key: CASSANDRA-19785
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Paul Chandler
>            Priority: Normal
>             Fix For: 4.0.x
>
>         Attachments: image-2024-07-19-08-44-56-714.png, 
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png, 
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png, 
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png, 
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>
> We are having a problem with the heap growing in size. This is a large 
> cluster of more than 1,000 nodes across a large number of DCs, running 
> version 4.0.11.
>  
> Each node has a 32GB heap. The amount used continues to grow until it 
> reaches 30GB, after which the node struggles with multiple Full GC pauses, 
> as can be seen here:
> !image-2024-07-19-08-44-56-714.png!
> We took two heap dumps on one node a few days after it was restarted. 
> Between the two dumps the heap had grown by 2.7GB.
>  
> 9th July
> !image-2024-07-19-08-45-17-289.png!
> 11th July
> !image-2024-07-19-08-45-33-933.png!
> The growth is mainly in the memory used by FastThreadLocalThread, which 
> increased from 5.92GB to 8.53GB:
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
> Looking deeper, the growing heap is contained within the threads for the 
> MutationStage, Native-transport-Requests, ReadStage etc. We would expect the 
> memory used within these threads to be short-lived, and not to grow as time 
> goes on. We recently increased the size of these thread pools, and that has 
> increased the size of the problem.
>  
> Top memory usage for FastThreadLocalThread
> 9th July
> !image-2024-07-19-08-46-42-979.png!
> 11th July
> !image-2024-07-19-08-46-56-594.png!
> This has led us to investigate whether there could be a memory leak, and we 
> have found the following possible issues with references retained in 
> BTree.FastBuilder objects. The issue appears to stem from the reset() method, 
> which does not properly clear all buffers. We are not really sure how 
> BTree.FastBuilder works, but this is our analysis of where a leak might 
> occur.
>  
> Specifically:
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement Arrays.fill(leaf().buffer, 0, 
> leaf().count, null); does not clear the buffer because the end index is 0. 
> This leaves the buffer with references to potentially large objects, 
> preventing garbage collection and increasing heap usage.
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code, 
> the while loop while (branch != null && branch.inUse) does not execute, 
> resulting in uncleared branch buffers and retained references.
>  
> This is based on the following observations:
>     Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, 
> and as a result, the buffer is not being cleared, leading to high heap 
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>     Remote Debugging: Debugging sessions indicate that the drain() method 
> sets count to 0, and the inUse flag for the parent branch is set to false, 
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
>  
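
The two reset() concerns in the description above can be illustrated with a
minimal Java sketch. ResetSketch, Node, filledNode, retained and resetByCount
are hypothetical names for illustration only; this is not Cassandra's
BTree.FastBuilder code. The one behaviour taken as certain is that
java.util.Arrays.fill over the empty range [0, 0) writes nothing. The sketch
also assumes the buffers still hold references when count is 0 and inUse is
false, which is the reporter's hypothesis rather than an established fact.

```java
import java.util.Arrays;

public class ResetSketch {
    // Hypothetical stand-in for a builder node; NOT the real Cassandra class.
    static final class Node {
        final Object[] buffer = new Object[4];
        int count;
        boolean inUse;
        Node parent;
    }

    // Creates a node whose first `items` slots hold live references.
    static Node filledNode(int items) {
        Node n = new Node();
        for (int i = 0; i < items; i++) n.buffer[i] = new Object();
        n.count = items;
        n.inUse = true;
        return n;
    }

    // Counts the slots that still hold a reference.
    static int retained(Object[] buffer) {
        int live = 0;
        for (Object o : buffer) if (o != null) live++;
        return live;
    }

    // A reset in the style the description quotes: clear [0, count) on the
    // leaf, then walk parent branches only while they are marked inUse.
    static void resetByCount(Node leaf) {
        Arrays.fill(leaf.buffer, 0, leaf.count, null); // no-op when count == 0
        leaf.count = 0;
        Node branch = leaf.parent;
        while (branch != null && branch.inUse) {
            Arrays.fill(branch.buffer, 0, branch.count, null);
            branch.count = 0;
            branch.inUse = false;
            branch = branch.parent;
        }
    }

    public static void main(String[] args) {
        Node leaf = filledNode(3);
        Node branch = filledNode(2);
        leaf.parent = branch;

        // Assumed state before reset(): count already zeroed and inUse
        // already false, while the slots are still populated.
        leaf.count = 0;
        branch.inUse = false;

        resetByCount(leaf);
        System.out.println(retained(leaf.buffer));   // prints 3
        System.out.println(retained(branch.buffer)); // prints 2
    }
}
```

In this sketch both buffers keep their references after the reset. Whether
the real builder ever reaches reset() in that state is the open question in
this ticket.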



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
