[ https://issues.apache.org/jira/browse/RATIS-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633276#comment-17633276 ]
Song Ziyang commented on RATIS-1743:
------------------------------------

Seems that there is a reference leak of SegmentedRaftLogWorker. As a workaround, I can manually set the sharedBuffer to null in the close() hook. What do you think? [~adoroszlai] [~szetszwo]

> Memory leak in SegmentedRaftLogWorker due to metrics
> ----------------------------------------------------
>
>                 Key: RATIS-1743
>                 URL: https://issues.apache.org/jira/browse/RATIS-1743
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.0.0, 2.4.1
>            Reporter: Attila Doroszlai
>            Priority: Blocker
>         Attachments: Screenshot from 2022-11-12 22-17-11.png
>
>
> OOME happens in Ozone integration tests.  Currently Xmx=2g, but increasing it
> does not help.
> {code:title=https://github.com/adoroszlai/hadoop-ozone/actions/runs/3450185096/jobs/5761108630#step:5:3155}
> [INFO] Running org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
> Error: java.lang.OutOfMemoryError: Java heap space
> Error: Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 426.774 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
> {code}
> {code}
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.lambda$new$4(SegmentedRaftLogWorker.java:223)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$603/1771708635.get(Unknown Source)
>     at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.write(SegmentedRaftLogOutputStream.java:101)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:568)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:320)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$595/1626598428.run(Unknown Source)
>     at java.lang.Thread.run(Thread.java:750)
> {code}
> Ozone registers JMX reporter (this is not new):
> {code:title=https://github.com/apache/ozone/blob/a13c62b60556cd003ee2149179f72029d9e35756/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/http/RatisDropwizardExports.java#L51-L53}
> MetricRegistries.global()
>     .addReporterRegistration(MetricsReporting.jmxReporter(),
>         MetricsReporting.stopJmxReporter());
> {code}
> Based on the heap dump and test log, {{SegmentedRaftLogWorker}} instances are
> retained by JmxMBeanServer after {{close()}}.
> The problem is probably not new, but its effect is much worse now, because
> {{SegmentedRaftLogWorker}} recently got a shared buffer (RATIS-1717).
> {code:title=config in Ozone}
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> {code}
> See screenshot for GC root.
> CC [~szetszwo], [~William Song]
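
For illustration only, the workaround proposed in the comment (dropping the buffer reference in close()) might look roughly like the sketch below. This is not the Ratis code: the class SharedBufferLeakSketch, the nested Worker, the REGISTRY list and the holdsBuffer() helper are all hypothetical stand-ins, and the static list merely plays the role of a metrics/JMX registry that keeps gauge callbacks alive. It only shows why nulling the field limits the damage: the worker object may still be reachable through the registry, but the large buffer no longer is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SharedBufferLeakSketch {
  /** Hypothetical stand-in for a long-lived metrics/JMX registry. */
  static final List<Supplier<Integer>> REGISTRY = new ArrayList<>();

  /** Hypothetical worker; NOT the real SegmentedRaftLogWorker. */
  static final class Worker implements AutoCloseable {
    private volatile byte[] sharedBuffer;
    private int queueSize;

    Worker(int bufferSize) {
      this.sharedBuffer = new byte[bufferSize];
      // The gauge lambda captures `this`, so the registry keeps the whole
      // worker reachable for as long as the gauge stays registered.
      REGISTRY.add(() -> this.queueSize);
    }

    boolean holdsBuffer() {
      return sharedBuffer != null;
    }

    @Override
    public void close() {
      // Workaround: release the buffer explicitly, so that a worker which is
      // still referenced by the registry no longer pins the buffer memory.
      sharedBuffer = null;
    }
  }

  public static void main(String[] args) {
    Worker worker = new Worker(1024);
    worker.close();
    System.out.println("buffer still held after close: " + worker.holdsBuffer());
  }
}
```

Note that this only reduces the retained size per leaked worker. The worker itself stays reachable until the gauge is unregistered, so removing the metric registration on close() would be the more complete fix.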