[ https://issues.apache.org/jira/browse/RATIS-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duong updated RATIS-2141:
-------------------------
    Description:
In 3.1.0, with stateMachineCache enabled, the RaftLogCache entries contain a reference to the original RaftClientRequest. This should not happen: RaftLogCache entries are supposed to refer only to LogEntries with their data truncated, and the RaftLogCache retention policy counts only the size of the entries without data.

This problem impacts Apache Ozone. The references from RaftLogCache entries prevent the original RaftClientRequest (which contains a large data chunk) from being garbage-collected. As a result, Ozone datanodes quickly run out of heap memory.

!heap-dump.png|width=1286,height=141!

!RaftLogCache_entry.png|width=730,height=272!

This is not the case on the latest master branch, only in the 3.1.0 release.

The fix for this issue in 3.1.0 is as simple as [6a141544c567a6325b05e2972cd426cdc14060cb|https://github.com/duongkame/ratis/commit/bcff74af0a5fa4b68af2267ce8dfa01f65a5445c].

  was:
In 3.1.0, with stateMachineCache enabled, the RaftLogCache entries contain a reference to the original RaftClientRequest. This should not happen: RaftLogCache entries are supposed to refer only to LogEntries with their data truncated, and the RaftLogCache retention policy counts only the size of the entries without data.

This problem impacts Apache Ozone. The references from RaftLogCache entries prevent the original RaftClientRequest (which contains a large data chunk) from being garbage-collected. As a result, Ozone datanodes quickly run out of heap memory.

This is not the case on the latest master branch, only in the 3.1.0 release.

The fix for this issue in 3.1.0 is as simple as [6a141544c567a6325b05e2972cd426cdc14060cb|https://github.com/duongkame/ratis/commit/bcff74af0a5fa4b68af2267ce8dfa01f65a5445c].
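To illustrate the mechanism described in this issue, here is a minimal Java sketch. It is not Ratis code: `ClientRequest`, `LogEntry`, and `LogCacheSketch` are hypothetical stand-ins, and the size accounting is deliberately simplified. It only shows how a cache entry whose payload has been truncated can still pin a large buffer if it keeps a back-reference to the original request, while a retention policy that counts entry size without data sees almost nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a client request carrying a large payload. */
final class ClientRequest {
  final byte[] data;

  ClientRequest(int size) {
    this.data = new byte[size];
  }
}

/** Hypothetical stand-in for a cached log entry. */
final class LogEntry {
  final long index;
  final byte[] data;          // empty once the payload has been truncated
  final ClientRequest origin; // null when no back-reference is kept

  private LogEntry(long index, byte[] data, ClientRequest origin) {
    this.index = index;
    this.data = data;
    this.origin = origin;
  }

  /** Leaky variant: payload truncated, but the request is still referenced. */
  static LogEntry truncatedKeepingRequest(long index, ClientRequest request) {
    return new LogEntry(index, new byte[0], request);
  }

  /** Safe variant: payload truncated and no reference to the request. */
  static LogEntry truncatedDetached(long index) {
    return new LogEntry(index, new byte[0], null);
  }

  /** Size as seen by a policy that counts entries without their data. */
  long sizeWithoutData() {
    return Long.BYTES;
  }

  /** Simplified count of the bytes actually reachable from this entry. */
  long retainedBytes() {
    long bytes = Long.BYTES + data.length;
    if (origin != null) {
      bytes += origin.data.length;
    }
    return bytes;
  }
}

final class LogCacheSketch {
  /** Total size the retention policy believes the cache holds. */
  static long accounted(List<LogEntry> cache) {
    long total = 0;
    for (LogEntry e : cache) {
      total += e.sizeWithoutData();
    }
    return total;
  }

  /** Total size the cache really keeps alive on the heap. */
  static long retained(List<LogEntry> cache) {
    long total = 0;
    for (LogEntry e : cache) {
      total += e.retainedBytes();
    }
    return total;
  }

  public static void main(String[] args) {
    List<LogEntry> leaky = new ArrayList<>();
    List<LogEntry> detached = new ArrayList<>();
    for (long i = 0; i < 3; i++) {
      ClientRequest request = new ClientRequest(1 << 20); // 1 MiB payload
      leaky.add(LogEntry.truncatedKeepingRequest(i, request));
      detached.add(LogEntry.truncatedDetached(i));
    }
    System.out.println("leaky:    accounted=" + accounted(leaky)
        + " retained=" + retained(leaky));
    System.out.println("detached: accounted=" + accounted(detached)
        + " retained=" + retained(detached));
  }
}
```

In the leaky variant the accounted size stays tiny while the retained size grows with every request, so a size-based eviction policy would never trigger soon enough. Whether the actual fix drops the reference or copies the entry differently is not stated here; see the linked commit for the real change.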
> Memory leak for stateMachineCache use cases
> -------------------------------------------
>
>                 Key: RATIS-2141
>                 URL: https://issues.apache.org/jira/browse/RATIS-2141
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.1.0
>            Reporter: Duong
>            Priority: Major
>         Attachments: RaftLogCache_entry.png, heap-dump.png
>
>
> In 3.1.0, with stateMachineCache enabled, the RaftLogCache entries contain a
> reference to the original RaftClientRequest. This should not happen:
> RaftLogCache entries are supposed to refer only to LogEntries with their data
> truncated, and the RaftLogCache retention policy counts only the size of the
> entries without data.
> This problem impacts Apache Ozone. The references from RaftLogCache entries
> prevent the original RaftClientRequest (which contains a large data chunk)
> from being garbage-collected. As a result, Ozone datanodes quickly run out of
> heap memory.
> !heap-dump.png|width=1286,height=141!
> !RaftLogCache_entry.png|width=730,height=272!
> This is not the case on the latest master branch, only in the 3.1.0
> release.
> The fix for this issue in 3.1.0 is as simple as
> [6a141544c567a6325b05e2972cd426cdc14060cb|https://github.com/duongkame/ratis/commit/bcff74af0a5fa4b68af2267ce8dfa01f65a5445c].
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)