[ 
https://issues.apache.org/jira/browse/KAFKA-19912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040961#comment-18040961
 ] 

KC.H commented on KAFKA-19912:
------------------------------

h2. How to Reproduce

 
{code:java}
// In org.apache.kafka.clients.consumer.internals.ApplicationEventHandlerTest

@Test
public void testDelayInInitializeResources() throws InterruptedException {
    assertInitializeResourcesError(
        TimeoutException.class,
        () -> {
            long delayMs = initializationTimeoutMs * 2;
            org.apache.kafka.common.utils.Utils.sleep(delayMs);
            return networkClientDelegate;
        }
    );
    // Added for reproduction only: keep the test alive so the leaked thread keeps running.
    TimeUnit.MINUTES.sleep(1000);
}
{code}
h2. Root Cause

The OOM results from a combination of three factors:
 # *Thread Leak (Already Fixed)*: Before the fix, when an exception 
occurred during initialization, the {{ConsumerNetworkThread}} was not 
closed properly and kept looping indefinitely in the {{runOnce()}} 
method.
 # *Mock Object Behavior*: In tests, {{NetworkClientDelegate}} is a mock 
object. In production code {{poll()}} respects the timeout parameter, but 
the mock returns immediately without waiting.
 # *Mockito Invocation Recording*: Mockito records every method invocation 
in its {{invocationContainer}}. Because the infinite loop calls mock 
methods in rapid succession, these records quickly fill up the heap.
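
To illustrate how factors 2 and 3 interact, here is a minimal plain-Java sketch. It does not use Mockito or any Kafka class: {{RecordingMock}} and {{runLoop}} are hypothetical names made up for this example, and the list of strings only stands in for the kind of per-call bookkeeping a recording mock keeps. The loop is bounded here so the example terminates; in the leaked thread it was not.

```java
import java.util.ArrayList;
import java.util.List;

public class InvocationRecordingSketch {

    /** Hypothetical stand-in for a mock: records every call and returns at once. */
    static final class RecordingMock {
        private final List<String> invocations = new ArrayList<>();

        // Unlike a real poll(), the timeout is ignored and nothing blocks,
        // so the caller can invoke this as fast as the CPU allows.
        void poll(long timeoutMs) {
            invocations.add("poll(" + timeoutMs + ")");
        }

        int recordedInvocations() {
            return invocations.size();
        }
    }

    /** Simulates a bounded slice of the loop that the leaked thread ran forever. */
    static int runLoop(RecordingMock mock, int iterations) {
        for (int i = 0; i < iterations; i++) {
            mock.poll(1000L);
        }
        return mock.recordedInvocations();
    }

    public static void main(String[] args) {
        RecordingMock mock = new RecordingMock();
        // One retained record per call: memory grows linearly with iterations.
        int recorded = runLoop(mock, 100_000);
        System.out.println("recorded invocations: " + recorded);
    }
}
```

Each call leaves one retained record, so a loop that never blocks and never ends will exhaust the heap eventually. For mocks that are never verified, Mockito's {{withSettings().stubOnly()}} setting disables invocation recording, which may be worth considering as an extra safeguard.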

> Investigate OOM error on builds due to leaked thread
> ----------------------------------------------------
>
>                 Key: KAFKA-19912
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19912
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, unit tests
>            Reporter: Kirk True
>            Assignee: Kirk True
>            Priority: Major
>         Attachments: heap-dump.png
>
>
> Builds of Kafka began failing in the GitHub Actions environment because of an 
> OOM error. KAFKA-19898 was filed and one of the client unit tests was 
> identified as the culprit. [A fix|https://github.com/apache/kafka/pull/20930] 
> was pushed quickly to resolve the issue. However, since the test is only 
> executed once per run, it's not clear how a small number of dangling threads 
> could have caused the entire build to experience problems.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
