[ 
https://issues.apache.org/jira/browse/KAFKA-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Abreu updated KAFKA-9531:
-----------------------------
    Component/s: network

> java.net.UnknownHostException loop on VM rolling update using CNAME
> -------------------------------------------------------------------
>
>                 Key: KAFKA-9531
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9531
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, controller, network, producer 
>    Affects Versions: 2.4.0
>            Reporter: Rui Abreu
>            Priority: Major
>
> Hello,
>  
> My cluster setup is based on VMs behind a DNS CNAME.
> Example: node.internal is a CNAME to either nodeA.internal or nodeB.internal.
> Since kafka-client 1.2.1, Kafka clients have sometimes been observed to get 
> stuck in a loop with the exception below.
> Example after nodeB.internal is replaced with nodeA.internal:
>  
> {code:java}
> 2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN] - [Consumer clientId=consumer-6, groupId=consumer.group] Error connecting to node nodeB.internal:9092 (id: 2 rack: null)
> java.net.UnknownHostException: nodeB.internal:9092
>       at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_222]
>       at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_222]
>       at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_222]
>       at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) ~[stormjar.jar:?]
>       at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) ~[stormjar.jar:?]
>       at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) ~[stormjar.jar:?]
>       at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) ~[stormjar.jar:?]
>       at org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) ~[storm-core-1.1.3.jar:1.1.3]
>       at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) ~[storm-core-1.1.3.jar:1.1.3]
>       at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
> {code}
>  
> The time the client spends in this loop varies, but it appears to stop 
> making progress for as long as the loop lasts.
> This contrasts with cases where the client recovers on its own after a few 
> seconds:
> {code:java}
> 2020-02-08T01:15:37.390Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Group coordinator nodeA.internal:9092 (id: 2147483645 rack: null) is unavailable or invalid, will attempt rediscovery
> 2020-02-08T01:15:37.885Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Discovered group coordinator nodeB.internal:9092 (id: 2147483646 rack: null)
> 2020-02-08T01:15:37.885Z o.a.k.c.ClusterConnectionStates [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Hostname for node 2147483646 changed from nodeA.internal to nodeB.internal
> {code}
>    
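
To make the failure mode in the report concrete, here is a minimal Java sketch. It is an illustration only, not the Kafka client's code and not a proposed fix. It assumes the retired node name (nodeB.internal) no longer resolves while the alias (node.internal) still does. The class CnameFallback, the Resolver interface, and resolveWithFallback are hypothetical names introduced for this example. The resolver in main is a stub so the sketch runs without DNS access.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

final class CnameFallback {

    /** Hypothetical lookup abstraction so the DNS call can be stubbed. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /**
     * Tries the node's own hostname first. If that name is unknown
     * (for example, the VM was replaced), resolves the alias instead.
     * Throws UnknownHostException only if both lookups fail.
     */
    static List<InetAddress> resolveWithFallback(Resolver resolver,
                                                 String nodeHost,
                                                 String aliasHost)
            throws UnknownHostException {
        try {
            return Arrays.asList(resolver.resolve(nodeHost));
        } catch (UnknownHostException staleName) {
            // The cached node name is stale; the alias is assumed to
            // point at whichever node is currently alive.
            return Arrays.asList(resolver.resolve(aliasHost));
        }
    }

    public static void main(String[] args) throws Exception {
        // Stub resolver: nodeB.internal is gone, every other name maps
        // to a fixed, made-up address. No real DNS query is issued.
        Resolver stub = host -> {
            if (host.equals("nodeB.internal")) {
                throw new UnknownHostException(host);
            }
            return new InetAddress[] {
                InetAddress.getByAddress(host, new byte[] {10, 0, 0, 1})
            };
        };
        System.out.println(
            resolveWithFallback(stub, "nodeB.internal", "node.internal"));
    }
}
```

Against real DNS, the stub could be swapped for `InetAddress::getAllByName`, which is the call at the top of the stack trace above. Note that the JVM caches lookup results, including failed ones, so retry timing would also depend on the JVM's address cache settings.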



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
