[ https://issues.apache.org/jira/browse/RATIS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096244#comment-17096244 ]
Shashikant Banerjee commented on RATIS-845:
-------------------------------------------

[~avijayan], can you please review it?

> Memory leak of RaftServerImpl
> -----------------------------
>
>                 Key: RATIS-845
>                 URL: https://issues.apache.org/jira/browse/RATIS-845
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: screenshot-10.png, screenshot-2.png, screenshot-3.png,
> screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png,
> screenshot-8.png, screenshot-9.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *What's the problem?*
> As the image shows, there are 1885 instances of RaftServerImpl, most of
> which are closed and should have been garbage collected but were not. The
> images show that 1513 RaftServerImpl instances are held by
> ManagementFactory->jmxMBeanServer->HashMap, and 372 are held by
> Datanode ReportManager Thread -> prometheus -> HashMap. So 1513
> RaftServerImpl instances leak in Ratis, and 372 leak in Ozone. When a
> RaftServerImpl cannot be garbage collected, the resources it references
> cannot be collected either, such as the
> [DirectByteBuffer|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L150]
> in SegmentedRaftLogWorker, which results in a 1 GB off-heap memory leak.
> h3. *{color:#DE350B}1. 1885 instances of RaftServerImpl{color}*
> !screenshot-4.png!
> h3. *{color:#DE350B}2. 1513 RaftServerImpl were held by
> ManagementFactory->jmxMBeanServer->HashMap, 372 RaftServerImpl were held by
> Datanode ReportManager Thread -> prometheus -> HashMap{color}*
> !screenshot-5.png!
> h3. *{color:#DE350B}3. 1513 RaftServerImpl were held by
> ManagementFactory->jmxMBeanServer->HashMap{color}*
> !screenshot-6.png!
> h3. *{color:#DE350B}4. 372 RaftServerImpl were held by Datanode ReportManager
> Thread -> prometheus -> HashMap{color}*
> !screenshot-7.png!
> h3. *{color:#DE350B}5. 2038 DirectByteBuffer, and 1885 held by
> RaftServerImpl.{color}*
> !screenshot-8.png!
> !screenshot-9.png!
> h3. *{color:#DE350B}6. 1033 DirectByteBuffer were held by ManagementFactory,
> 802 DirectByteBuffer were held by Datanode ReportManager Thread, total
> 1885.{color}*
> !screenshot-10.png!
> h3. *{color:#DE350B}7. RaftServerImpl is held by
> ManagementFactory->jmxMBeanServer->HashMap because Ratis starts a
> [JmxReporter|https://github.com/apache/incubator-ratis/blob/master/ratis-metrics/src/main/java/org/apache/ratis/metrics/MetricsReporting.java#L47]
> but never stops it.{color}*
> h3. *{color:#DE350B}8. RaftServerImpl is held by Datanode
> ReportManager Thread -> prometheus -> HashMap because Ozone calls the Ratis
> function to
> [register|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java#L189]
> metrics in Prometheus but never unregisters them.{color}*

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
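
For readers following the report, the retention pattern behind points 7 and 8 can be sketched with the standard JMX API alone. This is only an illustration, not the Ratis or Ozone code: `MBeanLeakSketch`, `FakeServer`, `ServerInfoMXBean` and the `sketch` ObjectName domain are hypothetical names, and a plain `registerMBean` call stands in for what a JMX reporter does when it is started. The point is that the platform MBeanServer keeps a strong reference to every registered object, so a closed server stays reachable until something unregisters it.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MBeanLeakSketch {
    /** Hypothetical management interface; not a Ratis type. */
    public interface ServerInfoMXBean {
        String getState();
    }

    /** Hypothetical stand-in for a server object such as RaftServerImpl. */
    public static class FakeServer implements ServerInfoMXBean, AutoCloseable {
        private final ObjectName name;
        private volatile String state = "RUNNING";

        FakeServer(String id) throws Exception {
            this.name = new ObjectName("sketch:type=FakeServer,id=" + id);
            // From here on the platform MBeanServer holds a strong reference
            // to this object, whether or not anyone else still uses it.
            ManagementFactory.getPlatformMBeanServer().registerMBean(this, name);
        }

        @Override
        public String getState() {
            return state;
        }

        ObjectName name() {
            return name;
        }

        @Override
        public void close() throws Exception {
            state = "CLOSED";
            // Without this unregister step, a closed server stays reachable
            // through the MBeanServer and cannot be garbage collected.
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            if (mbs.isRegistered(name)) {
                mbs.unregisterMBean(name);
            }
        }
    }

    static boolean isRegistered(ObjectName name) {
        return ManagementFactory.getPlatformMBeanServer().isRegistered(name);
    }

    public static void main(String[] args) throws Exception {
        FakeServer server = new FakeServer("demo");
        System.out.println("registered after start: " + isRegistered(server.name()));
        server.close();
        System.out.println("registered after close: " + isRegistered(server.name()));
    }
}
```

The same pairing applies to any registry that holds strong references: whatever is registered when a server starts should be unregistered on its close path.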