[ 
https://issues.apache.org/jira/browse/KAFKA-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885113#comment-16885113
 ] 

Sam Weston commented on KAFKA-7931:
-----------------------------------

Good news! I've got to the bottom of it!

The fix is to advertise a DNS name instead of the pod IP address as the 
advertised listener (in my case, the pod's name under the Kubernetes headless 
service). Now I can restart containers as quickly as I like and my Java apps 
don't get upset.

For example:

KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://pulseplatform-dev-kafka-0.pulseplatform-dev-kafka-headless.pulseplatform-dev:9092

where the headless service is called pulseplatform-dev-kafka-headless, my 
namespace is pulseplatform-dev, and the pod is called pulseplatform-dev-kafka-0.
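As a rough sketch that is not part of the original comment: Kubernetes gives each StatefulSet pod behind a headless service a stable DNS name of the form <pod>.<service>.<namespace>. In-cluster resolvers normally expand it through their search domains to <pod>.<service>.<namespace>.svc.<cluster-domain>. The name survives pod restarts, and only the IP it resolves to changes. The class and method names below (AdvertisedListener, stableHostName, plaintextListener) are hypothetical and exist only to show how the listener value above is put together:

```java
/**
 * Hypothetical helper (not from the original comment): builds a PLAINTEXT
 * advertised listener from the stable DNS name Kubernetes assigns to a
 * StatefulSet pod behind a headless service.
 */
final class AdvertisedListener {

    private AdvertisedListener() {}

    // Short form <pod>.<service>.<namespace>; assumes in-cluster clients whose
    // resolver search domains complete it to the fully qualified name.
    static String stableHostName(String podName, String headlessService, String namespace) {
        return podName + "." + headlessService + "." + namespace;
    }

    // Unlike a pod IP, this host name stays the same when the pod restarts.
    static String plaintextListener(String podName, String headlessService,
                                    String namespace, int port) {
        return "PLAINTEXT://" + stableHostName(podName, headlessService, namespace) + ":" + port;
    }

    public static void main(String[] args) {
        String listener = plaintextListener(
                "pulseplatform-dev-kafka-0",
                "pulseplatform-dev-kafka-headless",
                "pulseplatform-dev",
                9092);
        // Prints the same value as the example above.
        System.out.println("KAFKA_ADVERTISED_LISTENERS=" + listener);
    }
}
```

In a real deployment the pod name would normally be injected into the container rather than hard-coded. Every broker pod then advertises its own stable name.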

> Java Client: if all ephemeral brokers fail, client can never reconnect to 
> brokers
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-7931
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7931
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.1.0
>            Reporter: Brian
>            Priority: Critical
>
> Steps to reproduce:
>  * Set up a Kafka cluster in GKE, with the bootstrap server address configured to 
> point to a load balancer that exposes all GKE nodes
>  * Run a producer that emits values into a partition with 3 replicas
>  * Kill every broker in the cluster
>  * Wait for the brokers to restart
> Observed result:
> The Java client cannot find any of the nodes even though they have all 
> recovered. I see messages like "Connection to node 30 (/10.6.0.101:9092) 
> could not be established. Broker may not be available.".
> Note, this is *not* a duplicate of 
> https://issues.apache.org/jira/browse/KAFKA-7890. I'm using the client 
> version that contains the fix for 
> https://issues.apache.org/jira/browse/KAFKA-7890.
> Versions:
> Kafka: Kafka version 2.1.0, using the confluentinc/cp-kafka:5.1.0 Docker image
> Client: trunk from a few days ago (git sha 
> 9f7e6b291309286e3e3c1610e98d978773c9d504), to pull in the fix for KAFKA-7890
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
