[ https://issues.apache.org/jira/browse/CASSANDRA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867328#comment-17867328 ]
Brandon Williams commented on CASSANDRA-19785:
----------------------------------------------

I would recommend seeing if this persists on 4.0.13, but I would also be looking at anything that might make this deployment different from the norm, since 4.0.11 has been out for a year and nobody has seen this before.

> Possible memory leak in BTree.FastBuilder
> ------------------------------------------
>
>                 Key: CASSANDRA-19785
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19785
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Paul Chandler
>            Priority: Normal
>             Fix For: 4.0.x
>
>         Attachments: image-2024-07-19-08-44-56-714.png,
> image-2024-07-19-08-45-17-289.png, image-2024-07-19-08-45-33-933.png,
> image-2024-07-19-08-45-50-383.png, image-2024-07-19-08-46-06-919.png,
> image-2024-07-19-08-46-42-979.png, image-2024-07-19-08-46-56-594.png,
> image-2024-07-19-08-47-19-517.png, image-2024-07-19-08-47-34-582.png
>
>
> We are having a problem with the heap growing in size. This is a large
> cluster (> 1,000 nodes) spread across a large number of DCs, running
> version 4.0.11.
>
> Each node has a 32GB heap, and the amount used continues to grow until it
> reaches 30GB. The node then struggles with multiple Full GC pauses, as can
> be seen here:
> !image-2024-07-19-08-44-56-714.png!
>
> We took 2 heap dumps on one node a few days after it was restarted, and the
> heap had grown by 2.7GB.
>
> 9th July
> !image-2024-07-19-08-45-17-289.png!
> 11th July
> !image-2024-07-19-08-45-33-933.png!
>
> The growth is mainly an increase in memory used by FastThreadLocalThread,
> from 5.92GB to 8.53GB.
> !image-2024-07-19-08-45-50-383.png!
> !image-2024-07-19-08-46-06-919.png!
>
> Looking deeper, the growing heap is contained within the threads for the
> MutationStage, Native-transport-Requests, ReadStage etc. We would expect the
> memory used within these threads to be short lived, and not to grow as time
> goes on. We recently increased the size of these thread pools, and that has
> increased the size of the problem.
>
> Top memory usage for FastThreadLocalThread
> 9th July
> !image-2024-07-19-08-46-42-979.png!
> 11th July
> !image-2024-07-19-08-46-56-594.png!
>
> This has led us to investigate whether there could be a memory leak, and we
> have found the following issues with references retained in
> BTree.FastBuilder objects. The issue appears to stem from the reset() method,
> which does not properly clear all buffers. We are not really sure how
> BTree.FastBuilder works, but this is our analysis of where a leak might
> occur.
>
> Specifically:
>
> Leaf Buffer Not Being Cleared:
> When leaf().count is 0, the statement
> Arrays.fill(leaf().buffer, 0, leaf().count, null); does not clear the buffer
> because the end index is 0. This leaves the buffer holding references to
> potentially large objects, preventing garbage collection and increasing heap
> usage.
>
> Branch inUse Property:
> If the inUse property of the branch is set to false elsewhere in the code,
> the loop while (branch != null && branch.inUse) does not execute, resulting
> in uncleared branch buffers and retained references.
>
> This is based on the following observations:
>
> Heap Dumps: Analysis of heap dumps shows that leaf().count is often 0, and
> as a result the buffer is not being cleared, leading to high heap
> utilization.
> !image-2024-07-19-08-47-19-517.png!
>
> Remote Debugging: Debugging sessions indicate that the drain() method sets
> count to 0 and that the inUse flag for the parent branch is set to false,
> preventing the while loop in reset() from clearing the branch buffers.
> !image-2024-07-19-08-47-34-582.png!
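
The two suspected paths in the quoted description can be reproduced in isolation with a small Java sketch. This is not Cassandra's code: `ResetSketch`, `Node`, `resetAsReported`, `retained` and `drainedLeafWithParent` are hypothetical names made up for illustration, and the `drain()` behaviour is modelled only on what the reporter describes (count set to 0, inUse set to false). The only real API used is `java.util.Arrays.fill(Object[], int, int, Object)`, which is a no-op when the start and end index are equal. The sketch shows that the reported pattern would leave references behind if the builder really reaches that state; it does not confirm that BTree.FastBuilder does.

```java
import java.util.Arrays;

public class ResetSketch {
    /** Hypothetical stand-in for a reusable builder node; NOT Cassandra's actual class. */
    static final class Node {
        final Object[] buffer = new Object[4];
        int count;
        boolean inUse;
        Node parent;

        void add(Object o) {
            buffer[count++] = o;
            inUse = true;
        }

        /** Assumption from the report: drain() zeroes count and inUse but leaves slots populated. */
        void drain() {
            count = 0;
            inUse = false;
        }
    }

    /** Reset as described in the report: clears only [0, count) and visits only in-use parents. */
    static void resetAsReported(Node leaf) {
        Arrays.fill(leaf.buffer, 0, leaf.count, null); // no-op when count == 0
        Node branch = leaf.parent;
        while (branch != null && branch.inUse) {       // skipped when inUse is already false
            Arrays.fill(branch.buffer, 0, branch.count, null);
            branch.inUse = false;
            branch = branch.parent;
        }
    }

    /** Counts buffer slots that still hold a reference. */
    static long retained(Node n) {
        return Arrays.stream(n.buffer).filter(o -> o != null).count();
    }

    /** Builds a leaf with two payloads and a parent with one key, then drains both. */
    static Node drainedLeafWithParent() {
        Node branch = new Node();
        Node leaf = new Node();
        leaf.parent = branch;
        leaf.add(new byte[1024]);
        leaf.add(new byte[1024]);
        branch.add("separator key");
        leaf.drain();
        branch.drain();
        return leaf;
    }

    public static void main(String[] args) {
        Node leaf = drainedLeafWithParent();
        resetAsReported(leaf);
        System.out.println("leaf slots still referenced:   " + retained(leaf));
        System.out.println("branch slots still referenced: " + retained(leaf.parent));
    }
}
```

Running `main` reports 2 retained slots on the leaf and 1 on the branch. In this toy model, clearing the whole array with `Arrays.fill(buffer, null)` would release them, but whether that is an appropriate change for the real builder is a question for the Cassandra developers.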
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org