Francesco vigotti created KAFKA-6129:
----------------------------------------

             Summary: kafka issue when exposing through nodeport in kubernetes
                 Key: KAFKA-6129
                 URL: https://issues.apache.org/jira/browse/KAFKA-6129
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.10.2.1
         Environment: kubernetes
            Reporter: Francesco vigotti
            Priority: Critical


I originally started writing about this in 
https://issues.apache.org/jira/browse/KAFKA-2729
but I'm opening this new issue because I've probably found the cause in my 
kubernetes setup. In my opinion kubernetes is doing nothing wrong here (every 
other application works through the same nodeport redirection, e.g. zookeeper).
The kafka brokers fail silently (randomly, in multi-broker setups) and the 
producer reports a misleading error, so I think Kafka should be improved to 
provide more robust pre-startup checks that identify and report this 
condition.

After further investigation following my reply in 
https://issues.apache.org/jira/browse/KAFKA-2729, using a minimum-size cluster 
(1 zookeeper + 1 kafka broker), I've narrowed the problem down to the 
kubernetes setup. (I don't know why the issue only appeared for me now, whether 
something changed in recent kube-proxy versions or in kafka 0.10+, or ...)
In any case, my old kafka cluster started becoming under-replicated and 
returning various problems.

The problem happens when the kubernetes pods are exposed from the host through 
a nodeport service (over a static IP in my case). When using hostNetwork (so 
no redirection) everything works. What is strange is that zookeeper works fine 
behind a nodeport (which creates a redirection rule in 
iptables->nat->PREROUTING); kafka is the only application I've found that has 
problems with this kubernetes configuration.
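
To illustrate, this is roughly the kind of Service definition involved; the 
names and ports below are just placeholders, not my actual manifests:

```
# Placeholder names/ports, not the actual manifests from this cluster.
apiVersion: v1
kind: Service
metadata:
  name: kafka-broker
spec:
  type: NodePort
  selector:
    app: kafka
  ports:
    - name: broker
      port: 9092        # service port inside the cluster
      targetPort: 9092  # container port on the kafka pod
      nodePort: 32400   # static port opened on each node (the iptables PREROUTING redirect)
```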
What is also weird is that kafka starts correctly without errors; on 
multi-broker clusters there are random issues, while on a single-broker 
cluster the console producer fails with an infinite loop of:

```
[2017-10-26 09:38:23,281] WARN Error while fetching metadata with correlation id 5 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[2017-10-26 09:38:23,383] WARN Error while fetching metadata with correlation id 6 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[2017-10-26 09:38:23,485] WARN Error while fetching metadata with correlation id 7 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
```
while still no errors are reported by the broker or zookeeper.
I also want to mention that I came across this discussion:
https://stackoverflow.com/questions/35788697/leader-not-available-kafka-in-console-producer
but the solution proposed there for the broker pod (letting it resolve its own 
advertised hostname) didn't work:
```
hostAliases:
  - ip: "127.0.0.1"
    hostnames:
      - "---myhosthostname---"
```
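
For reference, the broker setting that normally has to line up with this kind 
of redirection is advertised.listeners. The sketch below (a ConfigMap carrying 
a properties fragment) only shows the general idea: the names, ports and node 
IP are placeholders rather than my real configuration, and the image-specific 
wiring needed to actually apply the override is left out:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-broker-overrides        # placeholder name
data:
  server-overrides.properties: |
    # Bind on all interfaces inside the pod.
    listeners=PLAINTEXT://0.0.0.0:9092
    # Advertise an address that is reachable from outside the cluster,
    # i.e. the node IP and the NodePort, not the pod IP.
    advertised.listeners=PLAINTEXT://<node-ip>:32400
```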





