[ https://issues.apache.org/jira/browse/KAFKA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma updated KAFKA-7974:
-------------------------------
    Affects Version/s: 2.2.0
                       2.1.1

> KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7974
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7974
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 2.1.1
>            Reporter: Nicholas Parker
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Version: kafka-clients-2.1.0
> I have some code that creates a KafkaAdminClient instance and then invokes listTopics(). I was seeing the following stacktrace in the logs, after which the KafkaAdminClient instance became unresponsive:
> {code:java}
> ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1':
> java.lang.IllegalStateException: No entry found for connection 0
>     at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
>     at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
>     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
>     at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
>     at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
>     at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
>     at java.lang.Thread.run(Thread.java:748){code}
> From looking at the code I was able to trace a possible cause:
> * NetworkClient.ready() invokes this.initiateConnect(), as seen in the above stacktrace.
> * NetworkClient.initiateConnect() invokes ClusterConnectionStates.connecting(), which internally invokes ClientUtils.resolve() to resolve the host when creating an entry for the connection.
> * If this host lookup fails, an UnknownHostException can be thrown back to NetworkClient.initiateConnect() before the connection entry is created in ClusterConnectionStates. This exception is not logged, so this step is a guess on my part.
> * NetworkClient.initiateConnect() catches the exception and calls ClusterConnectionStates.disconnected(), which throws an IllegalStateException because the failed lookup prevented the entry from ever being created.
> * This IllegalStateException kills the worker thread, and the KafkaAdminClient gets stuck: listTopics() never returns.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
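The failure sequence in the report can be modelled with a small stand-alone Java sketch that does not depend on kafka-clients. `ConnectionStates`, `initiateConnect` and the always-failing `resolve` are hypothetical, simplified stand-ins for ClusterConnectionStates, NetworkClient.initiateConnect() and ClientUtils.resolve(). They assume the ordering the reporter describes: the host is resolved before the connection entry is created, and disconnected() is called from the catch block. A `CompletableFuture` stands in for the listTopics() result that the dead worker thread would otherwise have completed.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ZombieAdminSketch {

    // Hypothetical stand-in for ClusterConnectionStates: one state entry per connection id.
    static final class ConnectionStates {
        private final Map<String, String> states = new HashMap<>();

        // Assumed ordering from the report: resolve first, create the entry only on success.
        void connecting(String id, String host) throws UnknownHostException {
            resolve(host);                    // throws before the entry exists
            states.put(id, "CONNECTING");
        }

        void disconnected(String id) {
            nodeState(id);                    // throws if connecting() never created the entry
            states.put(id, "DISCONNECTED");
        }

        private String nodeState(String id) {
            String state = states.get(id);
            if (state == null)
                throw new IllegalStateException("No entry found for connection " + id);
            return state;
        }

        // Hypothetical resolver that always fails, simulating a failed DNS lookup.
        private static void resolve(String host) throws UnknownHostException {
            throw new UnknownHostException(host);
        }
    }

    // Hypothetical stand-in for NetworkClient.initiateConnect(): on I/O failure, mark the node disconnected.
    static void initiateConnect(ConnectionStates states, String id, String host) {
        try {
            states.connecting(id, host);
        } catch (IOException e) {
            // The original UnknownHostException is never reported; this call throws instead.
            states.disconnected(id);
        }
    }

    public static void main(String[] args) throws Exception {
        ConnectionStates states = new ConnectionStates();
        CompletableFuture<String> listTopics = new CompletableFuture<>();

        // Worker thread: the IllegalStateException kills it before it can complete the future.
        Thread worker = new Thread(() -> {
            initiateConnect(states, "0", "unresolvable.example");
            listTopics.complete("topics");
        }, "kafka-admin-client-thread | adminclient-1");
        worker.setUncaughtExceptionHandler((t, e) ->
                System.out.println("Uncaught exception in thread '" + t.getName() + "': " + e));
        worker.start();
        worker.join();

        // Only a bounded wait lets the caller notice that nothing will ever complete the future.
        try {
            listTopics.get(1, TimeUnit.SECONDS);
            System.out.println("completed");
        } catch (TimeoutException e) {
            System.out.println("listTopics() still pending; worker alive = " + worker.isAlive());
        }
    }
}
```

Running `main()` prints the uncaught IllegalStateException from the worker thread and then reports that the future is still pending while the worker is dead. The sketch terminates only because of the bounded `get(1, TimeUnit.SECONDS)`; a caller that waits without a timeout would block indefinitely, matching the hang reported for listTopics().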