Francesco vigotti created KAFKA-6129:
----------------------------------------
Summary: kafka issue when exposing through nodeport in kubernetes
Key: KAFKA-6129
URL: https://issues.apache.org/jira/browse/KAFKA-6129
Project: Kafka
Issue Type: Bug
Affects Versions: 0.10.2.1
Environment: kubernetes
Reporter: Francesco vigotti
Priority: Critical
I started writing about this in:
https://issues.apache.org/jira/browse/KAFKA-2729
but I am opening this new issue because I have probably found the cause in my
Kubernetes setup. In my opinion, though, Kubernetes did nothing wrong in its
setup (all other applications, e.g. ZooKeeper, work using the same NodePort
redirection).
Kafka brokers fail silently (randomly, in multi-broker setups), and the
producer reports a misleading error, so I think Kafka should be improved by
providing more robust pre-startup flight checks that identify and report the
current issue.
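To illustrate the kind of pre-startup flight check meant here, the following is
a minimal sketch in Python, not anything that exists in Kafka. It assumes that
the useful thing to verify is whether each advertised host:port can be opened
over TCP from the broker's own host; the function names
`advertised_endpoint_reachable` and `preflight` are hypothetical.

```python
import socket
from typing import List, Tuple


def advertised_endpoint_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def preflight(advertised: List[Tuple[str, int]]) -> List[str]:
    """Return a list of human-readable problems; empty if all endpoints are reachable."""
    problems = []
    for host, port in advertised:
        if not advertised_endpoint_reachable(host, port):
            problems.append(
                f"advertised endpoint {host}:{port} is not reachable from this host"
            )
    return problems
```

A check like this only proves that the TCP port accepts connections; it does
not prove that a Kafka broker is answering on it, so it would be a first
warning rather than a full diagnosis.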
After further investigation following my reply in
https://issues.apache.org/jira/browse/KAFKA-2729, using a minimum-size cluster
(1 zk + 1 kafka-broker), I found the problem.
The problem is related to Kubernetes (I don't know why this issue appeared only
now for me: whether something changed in recent kube-proxy versions, in Kafka
0.10+, or ...).
Anyway, my old Kafka cluster started being under-replicated and returning
various problems.
The problem happens when pods are created in Kubernetes and redirected using a
NodePort service (over a static IP in my case) to expose the Kafka brokers from
the host. When using hostNetwork (so no redirection) everything works. What is
strange is that ZooKeeper works fine with NodePort (which creates a redirection
rule in iptables->nat->prerouting); the only application I have found to have
problems with this Kubernetes configuration is Kafka.
What is weird is that Kafka starts correctly without errors, but on
multi-broker clusters there are random issues, while on a single-broker cluster
the console-producer fails with an infinite loop of:
```
[2017-10-26 09:38:23,281] WARN Error while fetching metadata with correlation id 5 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[2017-10-26 09:38:23,383] WARN Error while fetching metadata with correlation id 6 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[2017-10-26 09:38:23,485] WARN Error while fetching metadata with correlation id 7 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
```
Still, no errors are reported by the broker or ZooKeeper.
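Because this warning loop is the only visible symptom, it can help to count it
in captured producer output. The following is a minimal sketch, assuming the
console-producer output was saved as text; the helper name
`count_metadata_errors` is hypothetical, and the pattern only handles one topic
per warning, as in the log above.

```python
import re
from collections import Counter

# \s+ between words lets the pattern match even if the log line was wrapped.
# Assumption: a single {topic=ERROR} pair per warning, as in the log above.
WARN_RE = re.compile(
    r"Error\s+while\s+fetching\s+metadata\s+with\s+correlation\s+id\s+(\d+)"
    r"\s*:\s*\{([^=}]+)=([A-Z_]+)\}"
)


def count_metadata_errors(log_text: str) -> Counter:
    """Count (topic, error) pairs found in metadata-fetch warnings."""
    return Counter(
        (topic, error) for _correlation_id, topic, error in WARN_RE.findall(log_text)
    )
```

A steadily growing count for the same topic and error, with nothing logged on
the broker side, matches the behaviour described in this report.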
I also want to mention that I came across this discussion:
https://stackoverflow.com/questions/35788697/leader-not-available-kafka-in-console-producer
but the proposed solution for the host pod (allowing self-resolution of the
advertised hostname) didn't work:
```
hostAliases:
- ip: "127.0.0.1"
  hostnames:
  - "---myhosthostname---"
```