[ https://issues.apache.org/jira/browse/KAFKA-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800978#comment-17800978 ]
Luke Chen edited comment on KAFKA-16052 at 12/28/23 12:13 PM:
--------------------------------------------------------------

Sharing my way to detect the leaking threads:
[https://github.com/apache/kafka/pull/15052/files#diff-b8f9f9d1b191457cbdb332a3429f0ad65b50fa4cef5af8562abcfd1f177a2cfeR2441]

In this draft PR, I added a list of "expected thread names" (a whitelist) and look for threads that are not on it. The check runs each time `QuorumTestHarness` runs (beforeAll/afterAll). The CI result [here|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15052/5] shows the unexpected threads that were detected. Usually, I have to trace back through earlier test cases to find out which one leaked the thread, but that at least gives us a clue.

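The whitelist idea described in the comment can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `ThreadLeakCheck`, its method names, and the prefixes in `EXPECTED_PREFIXES` are all hypothetical, and matching by name prefix is an assumption about how one might compare thread names.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper; not part of Kafka or of the PR linked above.
final class ThreadLeakCheck {

    // Assumed allow-list of thread-name prefixes that are fine to see.
    // The real list would depend on the JVM and the test runner in use.
    static final List<String> EXPECTED_PREFIXES = List.of(
        "main", "Reference Handler", "Finalizer", "Signal Dispatcher",
        "Common-Cleaner", "Test worker");

    // Names of all threads that are currently alive in this JVM.
    static Set<String> liveThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(Thread::isAlive)
            .map(Thread::getName)
            .collect(Collectors.toCollection(TreeSet::new));
    }

    // Names that do not start with any of the expected prefixes.
    static Set<String> unexpected(Collection<String> threadNames,
                                  Collection<String> expectedPrefixes) {
        return threadNames.stream()
            .filter(name -> expectedPrefixes.stream().noneMatch(name::startsWith))
            .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

A test harness could call `ThreadLeakCheck.unexpected(ThreadLeakCheck.liveThreadNames(), ThreadLeakCheck.EXPECTED_PREFIXES)` in its before-all and after-all hooks and log or fail on a non-empty result. As the comment notes, a hit only shows that some earlier test leaked the thread, so the offending test still has to be traced back by hand.
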
> OOM in Kafka test suite
> -----------------------
>
> Key: KAFKA-16052
> URL: https://issues.apache.org/jira/browse/KAFKA-16052
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Divij Vaidya
> Priority: Major
> Attachments: Screenshot 2023-12-27 at 14.04.52.png, Screenshot 2023-12-27 at 14.22.21.png, Screenshot 2023-12-27 at 14.45.20.png, Screenshot 2023-12-27 at 15.31.09.png, Screenshot 2023-12-27 at 17.44.09.png, Screenshot 2023-12-28 at 00.13.06.png, Screenshot 2023-12-28 at 00.18.56.png, Screenshot 2023-12-28 at 11.26.03.png, Screenshot 2023-12-28 at 11.26.09.png, newRM.patch
>
>
> *Problem*
> Our test suite frequently fails with OOM errors. The mailing list discussion is here: [https://lists.apache.org/thread/d5js0xpsrsvhgjb10mbzo9cwsy8087x4]
>
> *Setup*
> To find the source of the leaks, I ran the :core:test build target with a single thread (see below for how to do it) and attached a profiler to it. This Jira tracks the list of action items identified from the analysis.
>
> How to run tests using a single thread:
> {code:java}
> diff --git a/build.gradle b/build.gradle
> index f7abbf4f0b..81df03f1ee 100644
> --- a/build.gradle
> +++ b/build.gradle
> @@ -74,9 +74,8 @@ ext {
>      "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"
>    )
>  
> -  maxTestForks = project.hasProperty('maxParallelForks') ? maxParallelForks.toInteger() : Runtime.runtime.availableProcessors()
> -  maxScalacThreads = project.hasProperty('maxScalacThreads') ? maxScalacThreads.toInteger() :
> -      Math.min(Runtime.runtime.availableProcessors(), 8)
> +  maxTestForks = 1
> +  maxScalacThreads = 1
>    userIgnoreFailures = project.hasProperty('ignoreFailures') ? ignoreFailures : false
>  
>    userMaxTestRetries = project.hasProperty('maxTestRetries') ? maxTestRetries.toInteger() : 0
> diff --git a/gradle.properties b/gradle.properties
> index 4880248cac..ee4b6e3bc1 100644
> --- a/gradle.properties
> +++ b/gradle.properties
> @@ -30,4 +30,4 @@ scalaVersion=2.13.12
>  swaggerVersion=2.2.8
>  task=build
>  org.gradle.jvmargs=-Xmx2g -Xss4m -XX:+UseParallelGC
> -org.gradle.parallel=true
> +org.gradle.parallel=false
> {code}
>
> *Result of experiment*
> Heap usage grows from tens of MB at the start to 1.5GB (with spikes of 2GB) as the tests execute. Note that the total number of threads also increases, but it does not correlate with the sharp increases in heap memory usage. The heap dump is available at [https://www.dropbox.com/scl/fi/nwtgc6ir6830xlfy9z9cu/GradleWorkerMain_10311_27_12_2023_13_37_08.hprof.zip?rlkey=ozbdgh5vih4rcynnxbatzk7ln&dl=0]
>
> !Screenshot 2023-12-27 at 14.22.21.png!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)