[jira] [Comment Edited] (KAFKA-7931) Java Client: if all ephemeral brokers fail, client can never reconnect to brokers

Sam Weston (JIRA) Mon, 15 Jul 2019 04:48:02 -0700


    [ https://issues.apache.org/jira/browse/KAFKA-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885113#comment-16885113 ]


Sam Weston edited comment on KAFKA-7931 at 7/15/19 11:40 AM:
-------------------------------------------------------------

Good news! I've got to the bottom of it!

The fix is to use a DNS name (in my case the one provided by the Kubernetes 
headless service) as the advertised listener instead of the Pod IP address. 
Now I can restart containers as quickly as I like and my Java apps don't get 
upset.

For example:
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://pulseplatform-dev-kafka-0.pulseplatform-dev-kafka-headless.pulseplatform-dev:9092
where the headless service is called pulseplatform-dev-kafka-headless, my 
namespace is pulseplatform-dev and the pod is called pulseplatform-dev-kafka-0.

If you're using the incubator Helm chart, let me know and I'll provide more 
details of my values file.
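
The naming pattern in the example above can be sketched in code. This is only 
an illustration, not something taken from the Helm chart or from Kafka itself: 
the class and method names are hypothetical, and it assumes the short 
<pod>.<service>.<namespace> form resolves inside the cluster, as it does in 
the example above.

```java
// Hypothetical helper: assembles an advertised listener from the stable DNS
// name of a pod behind a Kubernetes headless service, instead of the Pod IP.
final class AdvertisedListener {

    private AdvertisedListener() {
    }

    // Builds <protocol>://<pod>.<headlessService>.<namespace>:<port>
    static String build(String protocol, String pod, String headlessService,
                        String namespace, int port) {
        return protocol + "://" + pod + "." + headlessService + "." + namespace + ":" + port;
    }

    public static void main(String[] args) {
        // Values taken from the example above.
        String listener = build(
                "PLAINTEXT",
                "pulseplatform-dev-kafka-0",
                "pulseplatform-dev-kafka-headless",
                "pulseplatform-dev",
                9092);
        System.out.println("KAFKA_ADVERTISED_LISTENERS=" + listener);
    }
}
```

Because the pod name stays the same across restarts while the Pod IP does not, 
a listener built this way keeps pointing at the same broker after a restart.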


was (Author: cablespaghetti):
Good news! I've got to the bottom of it!

The fix is to use a DNS name as the advertised listener instead of the Pod IP 
address (in my case the Kubernetes headless service). Now I can restart 
containers as quickly as I like and my Java apps don't get upset.

e.g. 
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://pulseplatform-dev-kafka-0.pulseplatform-dev-kafka-headless.pulseplatform-dev:9092
 where the headless service is called pulseplatform-dev-kafka-headless, my 
namespace is pulseplatform-dev and the pod is called pulseplatform-dev-kafka-0

> Java Client: if all ephemeral brokers fail, client can never reconnect to 
> brokers
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-7931
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7931
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.1.0
>            Reporter: Brian
>            Priority: Critical
>
> Steps to reproduce:
>  * Set up a Kafka cluster in GKE, with the bootstrap server address 
> configured to point to a load balancer that exposes all GKE nodes
>  * Run producer that emits values into a partition with 3 replicas
>  * Kill every broker in the cluster
>  * Wait for brokers to restart
> Observed result:
> The java client cannot find any of the nodes even though they have all 
> recovered. I see messages like "Connection to node 30 (/10.6.0.101:9092) 
> could not be established. Broker may not be available.".
> Note, this is *not* a duplicate of 
> https://issues.apache.org/jira/browse/KAFKA-7890. I'm using the client 
> version that contains the fix for 
> https://issues.apache.org/jira/browse/KAFKA-7890.
> Versions:
> Kafka: kafka version 2.1.0, using confluentinc/cp-kafka/5.1.0 docker image
> Client: trunk from a few days ago (git sha 
> 9f7e6b291309286e3e3c1610e98d978773c9d504), to pull in the fix for KAFKA-7890
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
