[ https://issues.apache.org/jira/browse/KAFKA-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163590#comment-17163590 ]
Thiago Santos commented on KAFKA-9531:
--------------------------------------

I am experiencing the same problem. When I start a local Kafka cluster with docker-compose, the kafka-connect producer gets stuck in this loop as soon as I stop one of the containers in the Kafka cluster. Is there any update on this issue?

{code:java}
kafka-connect | java.net.UnknownHostException: kafka3
kafka-connect | java.net.UnknownHostException: kafka3
kafka-connect |     at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
kafka-connect |     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
kafka-connect |     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
kafka-connect |     at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:110)
kafka-connect |     at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403)
kafka-connect |     at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363)
kafka-connect |     at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151)
kafka-connect |     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:962)
kafka-connect |     at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:294)
kafka-connect |     at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:350)
kafka-connect |     at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:323)
kafka-connect |     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
kafka-connect |     at java.lang.Thread.run(Thread.java:748)
{code}

> java.net.UnknownHostException loop on VM rolling update using CNAME
> -------------------------------------------------------------------
>
>                 Key: KAFKA-9531
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9531
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, controller, network, producer
>    Affects Versions: 2.4.0
>            Reporter: Rui Abreu
>            Priority: Major
>
> Hello,
>
> My cluster setup is based on VMs behind a DNS CNAME.
> Example: node.internal is a CNAME to either nodeA.internal or nodeB.internal.
> Since kafka-client 1.2.1, it has been observed that Kafka clients sometimes
> get stuck in a loop with the exception below.
> Example after nodeB.internal is replaced with nodeA.internal:
>
> {code:java}
> 2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN] - [Consumer clientId=consumer-6, groupId=consumer.group] Error connecting to node nodeB.internal:9092 (id: 2 rack: null)
> java.net.UnknownHostException: nodeB.internal:9092
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_222]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_222]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_222]
>     at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) ~[stormjar.jar:?]
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) ~[stormjar.jar:?]
>     at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) ~[stormjar.jar:?]
>     at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) ~[stormjar.jar:?]
>     at org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) ~[storm-core-1.1.3.jar:1.1.3]
>     at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) ~[storm-core-1.1.3.jar:1.1.3]
>     at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
> {code}
>
> The time it spends in the loop is arbitrary, but it seems the client
> effectively stops while this is happening.
> This error contrasts with instances where the client is able to recover on
> its own after a few seconds:
> {code:java}
> 2020-02-08T01:15:37.390Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Group coordinator nodeA.internal:9092 (id: 2147483645 rack: null) is unavailable or invalid, will attempt rediscovery
>
> 2020-02-08T01:15:37.885Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Discovered group coordinator nodeB.internal:9092 (id: 2147483646 rack: null)
> 2020-02-08T01:15:37.885Z o.a.k.c.ClusterConnectionStates [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Hostname for node 2147483646 changed from nodeA.internal to nodeB.internal
> {code}
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
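
When narrowing down a loop like the one quoted above, it may help to check from the client host which broker hostnames the JVM can currently resolve. The following is only a minimal diagnostic sketch, not part of the Kafka client: the class name ResolveCheck and the Resolver interface are hypothetical names introduced for illustration, and the default hostnames are the placeholder names from the quoted example. The only real API it relies on is java.net.InetAddress.getAllByName, the call that appears at the top of both stack traces.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of kafka-clients.
final class ResolveCheck {

    // Hypothetical abstraction over the DNS lookup so the check can be
    // exercised without a real network.
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    // Returns, for each hostname in order, whether the resolver produced
    // at least one address. An UnknownHostException is recorded as false.
    static Map<String, Boolean> checkHosts(List<String> hosts, Resolver resolver) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                result.put(host, resolver.resolve(host).length > 0);
            } catch (UnknownHostException e) {
                result.put(host, false);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the node names from the quoted example are used when
        // no hostnames are passed on the command line.
        List<String> hosts = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("nodeA.internal", "nodeB.internal");

        // InetAddress.getAllByName is the lookup seen in the stack traces.
        Map<String, Boolean> status = checkHosts(hosts, InetAddress::getAllByName);
        for (Map.Entry<String, Boolean> entry : status.entrySet()) {
            System.out.println(entry.getKey() + " -> "
                    + (entry.getValue() ? "resolves" : "UnknownHostException"));
        }
    }
}
```

Run it with the hostnames to check as arguments, for example the broker names the client logs as unreachable. The JVM caches lookups (see the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties), so a freshly started process may see a different answer than a long-running client does.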