[ https://issues.apache.org/jira/browse/KAFKA-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar resolved KAFKA-2553.
------------------------------
    Resolution: Fixed

Similar issues are in KAFKA-2169. Pl reopen if the issue still exists

> Kafka Consumer Hangs after Network Partition
> --------------------------------------------
>
>                 Key: KAFKA-2553
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2553
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.8.1.1
>         Environment: Amazon EC2, Ubuntu 12.04.
>            Reporter: Aaditya Ramesh
>            Assignee: Neha Narkhede
>            Priority: Major
>         Attachments: kafka_bug_report
>
>
> We have a Kafka consumer in an EC2 instance in Ireland that fetches data from
> a kafka cluster in a datacenter in the eastern United States. We sporadically
> encounter strange network partitions where we are unable to ping any machines
> between the two data centers (the ping always times out), but this kind of
> network partition is not too strange for inter-data center connections.
> However, Kafka consumer's connection to Zookeeper never recovers after one of
> these network hiccups and requires a full process restart in order to begin
> consuming from the remote data center after the network has recovered. The
> relevant code in ZookeeperConsumerConnector.scala catches all Throwables and
> does nothing with them, which not only doesn't alert the process, but also
> doesn't display any alerting metrics that we could use to diagnose such a
> hung state. We therefore patched the client code in our codebase to perform a
> System.exit(0) whenever this occurs, since a restart is better than failing
> silently.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
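
For readers following the report, here is a minimal Java sketch of the pattern the reporter describes as missing: instead of catching a Throwable and dropping it, the failure is counted and handed to a callback that decides what to do. This is not the code in ZookeeperConsumerConnector.scala and not the reporter's patch. The class FailFastRunner and its methods are hypothetical names made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Kafka): runs a task and makes any
 * failure visible instead of silently swallowing it.
 */
final class FailFastRunner {
    // Failure counter that a metrics reporter could export for alerting.
    private final AtomicLong failures = new AtomicLong();

    // Callback invoked with every Throwable the task throws.
    private final Consumer<Throwable> onFailure;

    FailFastRunner(Consumer<Throwable> onFailure) {
        this.onFailure = onFailure;
    }

    /** Runs the task; returns true on success, false if it threw. */
    boolean run(Runnable task) {
        try {
            task.run();
            return true;
        } catch (Throwable t) {
            // Record the failure and pass it on rather than ignoring it.
            failures.incrementAndGet();
            onFailure.accept(t);
            return false;
        }
    }

    /** Number of failures seen so far. */
    long failureCount() {
        return failures.get();
    }
}
```

A caller that wants the restart behaviour described in the report could pass a callback such as `t -> { t.printStackTrace(); System.exit(1); }`. The report mentions System.exit(0); a non-zero status is the usual way to signal an abnormal exit to a process supervisor, so the sketch assumes that instead.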