[ https://issues.apache.org/jira/browse/RATIS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
runzhiwang updated RATIS-845:
-----------------------------
    Description:
*What's the problem?*

As the image shows, there are 1885 instances of RaftServerImpl. Most of them are closed and should have been garbage collected, but they were not. The images show that 1513 RaftServerImpl instances are held by ManagementFactory -> JmxMBeanServer -> HashMap, and 372 are held by Datanode ReportManager Thread -> prometheus -> HashMap. So 1513 RaftServerImpl instances leak in Ratis, and 372 leak in Ozone. When a RaftServerImpl cannot be garbage collected, many related resources cannot be collected either, such as the [DirectByteBuffer|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L150] in SegmentedRaftLogWorker, which results in a 1 GB off-heap memory leak.

h3. *{color:#DE350B}1. 1885 instances of RaftServerImpl{color}*
 !screenshot-4.png!

h3. *{color:#DE350B}2. 1513 RaftServerImpl instances are held by ManagementFactory -> JmxMBeanServer -> HashMap, 372 are held by Datanode ReportManager Thread -> prometheus -> HashMap{color}*
 !screenshot-5.png!

h3. *{color:#DE350B}3. 1513 RaftServerImpl instances are held by ManagementFactory -> JmxMBeanServer -> HashMap{color}*
 !screenshot-6.png!

h3. *{color:#DE350B}4. 372 RaftServerImpl instances are held by Datanode ReportManager Thread -> prometheus -> HashMap{color}*
 !screenshot-7.png!

h3. *{color:#DE350B}5. 2038 DirectByteBuffer instances, 1885 of them held by RaftServerImpl{color}*
 !screenshot-8.png!
 !screenshot-9.png!

h3. *{color:#DE350B}6. 1033 DirectByteBuffer instances are held by ManagementFactory, 802 are held by Datanode ReportManager Thread, total 1885{color}*
 !screenshot-10.png!

> Memory leak of RaftServerImpl
> -----------------------------
>
>                 Key: RATIS-845
>                 URL: https://issues.apache.org/jira/browse/RATIS-845
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: screenshot-10.png, screenshot-2.png, screenshot-3.png,
> screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png,
> screenshot-8.png, screenshot-9.png
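
The ManagementFactory -> MBeanServer -> HashMap retention path described in this issue can be reproduced in miniature. The sketch below is not Ratis code: `FakeRaftServer`, its `closeLeaky` and `closeAndUnregister` methods, and the `sketch.ratis` JMX domain are hypothetical names chosen for illustration. It assumes the leaked objects are reachable because they (or something referencing them) were registered with the platform MBeanServer and never unregistered, and it shows that marking an object as closed does not remove it from the server's registry; only an explicit `unregisterMBean` call does.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class MBeanLeakSketch {
    // Hypothetical JMX domain, used only by this sketch.
    private static final String DOMAIN = "sketch.ratis";

    /** Standard MBean interface: its name must be the implementation class name + "MBean". */
    public interface FakeRaftServerMBean {
        String getState();
    }

    /** Hypothetical stand-in for a server object that registers itself with JMX. */
    static final class FakeRaftServer implements FakeRaftServerMBean {
        private final ObjectName jmxName;
        private volatile String state = "RUNNING";

        FakeRaftServer(String id) {
            try {
                jmxName = new ObjectName(DOMAIN + ":type=FakeRaftServer,id=" + id);
                // From here on the platform MBeanServer holds a strong reference to this object.
                ManagementFactory.getPlatformMBeanServer().registerMBean(this, jmxName);
            } catch (JMException e) {
                throw new IllegalStateException(e);
            }
        }

        @Override
        public String getState() {
            return state;
        }

        /** Leaky close: changes the state but leaves the JMX registration in place. */
        void closeLeaky() {
            state = "CLOSED";
        }

        /** Close that also drops the MBeanServer's reference, making the object collectable. */
        void closeAndUnregister() {
            state = "CLOSED";
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            try {
                if (server.isRegistered(jmxName)) {
                    server.unregisterMBean(jmxName);
                }
            } catch (JMException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Counts how many FakeRaftServer MBeans the platform MBeanServer still holds. */
    static int countRegistered() {
        try {
            ObjectName pattern = new ObjectName(DOMAIN + ":type=FakeRaftServer,*");
            return ManagementFactory.getPlatformMBeanServer().queryNames(pattern, null).size();
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeRaftServer[] servers = new FakeRaftServer[3];
        for (int i = 0; i < servers.length; i++) {
            servers[i] = new FakeRaftServer("s" + i);
            servers[i].closeLeaky();
        }
        System.out.println("still registered after leaky close: " + countRegistered());

        for (FakeRaftServer s : servers) {
            s.closeAndUnregister();
        }
        System.out.println("still registered after unregister: " + countRegistered());
    }
}
```

The same reasoning would apply to the prometheus -> HashMap path on the Ozone side: whatever was added to a long-lived registry has to be removed from it when the server is closed, otherwise the registry keeps the object and everything it references alive.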