[ https://issues.apache.org/jira/browse/KAFKA-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Monahan updated KAFKA-6839:
---------------------------------
    Description: 
I have a 3-node Kafka cluster set up in AWS that talks to a 3-node ZooKeeper (ZK) cluster behind an ELB. The Kafka instances are given a DNS CNAME record that points to the AWS ELB, which is itself a CNAME record pointing to two A records. When the ELB's CNAME record changes the two A records it points to, and Kafka then tries to reconnect to ZK after losing a session, it uses the old A records rather than the new ones, so the reconnect attempt fails. There appears to be some kind of caching instead of a fresh lookup of the record set in the config file.
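One plausible explanation for this caching, offered as an assumption rather than a confirmed diagnosis, is that the client resolves the connect-string hostname to IP addresses once and then keeps cycling through those addresses on every reconnect. The sketch below contrasts that behavior with re-resolving the name on each attempt. `HostProviderSketch`, the `Resolver` interface, the hostname `zk.example.internal`, and the fake DNS table are all hypothetical names made up for this illustration; they are not ZooKeeper or Kafka APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostProviderSketch {

    /** Hypothetical resolver abstraction so the example runs without real DNS. */
    interface Resolver {
        List<String> resolve(String hostname);
    }

    /** Resolves the hostname once, at construction, and reuses those IPs forever. */
    static final class ResolveOnceProvider {
        private final List<String> addresses;
        private int next = 0;

        ResolveOnceProvider(String hostname, Resolver resolver) {
            this.addresses = new ArrayList<>(resolver.resolve(hostname));
        }

        String nextAddress() {
            String addr = addresses.get(next);
            next = (next + 1) % addresses.size();
            return addr;
        }
    }

    /** Looks the hostname up again on every reconnect attempt. */
    static final class ReResolvingProvider {
        private final String hostname;
        private final Resolver resolver;
        private int next = 0;

        ReResolvingProvider(String hostname, Resolver resolver) {
            this.hostname = hostname;
            this.resolver = resolver;
        }

        String nextAddress() {
            List<String> addresses = resolver.resolve(hostname);
            // floorMod keeps the index valid even if the record set changes size
            return addresses.get(Math.floorMod(next++, addresses.size()));
        }
    }

    public static void main(String[] args) {
        // Fake DNS table standing in for the ELB CNAME (Java 9+ for List.of).
        Map<String, List<String>> dns = new HashMap<>();
        String zkHost = "zk.example.internal"; // hypothetical hostname
        dns.put(zkHost, List.of("10.0.0.1", "10.0.0.2"));
        Resolver resolver = dns::get;

        ResolveOnceProvider once = new ResolveOnceProvider(zkHost, resolver);
        ReResolvingProvider fresh = new ReResolvingProvider(zkHost, resolver);

        // The ELB swaps its A records, as described in the report.
        dns.put(zkHost, List.of("10.0.0.3", "10.0.0.4"));

        System.out.println("resolve-once: " + once.nextAddress()); // stale 10.0.0.1
        System.out.println("re-resolving: " + fresh.nextAddress()); // new 10.0.0.3
    }
}
```

The resolve-once provider keeps handing out addresses that no longer exist behind the ELB, which would match the `NoRouteToHostException` in the log; the re-resolving provider picks up the new A records on its next attempt.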

This is the error message I am seeing in the broker logs.
{code:java}
[2018-04-30 20:09:21,449] INFO Opening socket connection to server ip-10-65-68-244.us-west-2.compute.internal/10.65.68.244:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-04-30 20:09:24,450] WARN Client session timed out, have not heard from server in 3962ms for sessionid 0x263094512190001 (org.apache.zookeeper.ClientCnxn)
[2018-04-30 20:09:24,451] INFO Client session timed out, have not heard from server in 3962ms for sessionid 0x263094512190001, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-04-30 20:09:26,532] INFO Opening socket connection to server ip-10-65-84-102.us-west-2.compute.internal/10.65.84.102:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-04-30 20:09:29,531] WARN Session 0x263094512190001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
{code}
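To check whether a broker is still dialing stale addresses, one could compare the IPs in the log above (10.65.68.244 and 10.65.84.102) against what the ZooKeeper hostname from the broker's `zookeeper.connect` setting resolves to right now. The class below is a hypothetical diagnostic sketch, not part of Kafka or ZooKeeper. It prints the current resolution and the `networkaddress.cache.ttl` security property, which controls how long the JVM caches successful lookups. Because it runs in a fresh JVM, it reflects current DNS rather than the broker's cached view, which is the comparison of interest.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

public class ZkDnsCheck {

    /** Returns every IP address the JVM currently resolves {@code host} to. */
    static List<String> currentAddresses(String host) throws UnknownHostException {
        List<String> ips = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            ips.add(addr.getHostAddress());
        }
        return ips;
    }

    /** True if none of the addresses seen in the broker log are still current. */
    static boolean allStale(List<String> loggedIps, List<String> currentIps) {
        for (String ip : loggedIps) {
            if (currentIps.contains(ip)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Usage: java ZkDnsCheck <zookeeper-hostname>
        String host = args.length > 0 ? args[0] : "localhost";
        // null means the property is unset and the JVM's default caching applies.
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        List<String> current = currentAddresses(host);
        System.out.println(host + " currently resolves to " + current);
        // Addresses the broker tried in the log excerpt from this report.
        List<String> logged = List.of("10.65.68.244", "10.65.84.102");
        System.out.println("all logged addresses stale: " + allStale(logged, current));
    }
}
```

If every logged address is stale, the broker is using an old resolution. Note that shortening the JVM cache TTL alone would not help if the client library itself holds on to the addresses it resolved at startup.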

  was:I have a 3-node Kafka cluster set up in AWS that talks to a 3-node ZooKeeper (ZK) cluster behind an ELB. The Kafka instances are given a DNS CNAME record that points to the AWS ELB, which is itself a CNAME record pointing to two A records. When the ELB's CNAME record changes the two A records it points to, and Kafka then tries to reconnect to ZK after losing a session, it uses the old A records rather than the new ones, so the reconnect attempt fails. There appears to be some kind of caching instead of a fresh lookup of the record set in the config file.


> ZK session retry with cname record
> ----------------------------------
>
>                 Key: KAFKA-6839
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6839
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Tyler Monahan
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
