Lei Wang created KAFKA-3971:
-------------------------------

             Summary: Consumers drop from coordinator and cannot reconnect
                 Key: KAFKA-3971
                 URL: https://issues.apache.org/jira/browse/KAFKA-3971
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.9.0.1
         Environment: version 0.9.0.1
            Reporter: Lei Wang

From time to time we create new topics, and all consumers pick up those new topics. When they start consuming from the new topics, we often see that some seemingly random consumers cannot connect to the coordinator. The log is flooded with the following message, tens of thousands of times every second:

{noformat}
16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.
{noformat}

The servers seem to be working fine, and the other consumers are unaffected. From the log, it looks like the consumer retries multiple times every millisecond, and every attempt fails.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
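
To help quantify the retry rate from a log excerpt like the one above, here is a minimal Java sketch that counts "Marking the coordinator ... dead" lines per millisecond timestamp. The class and method names (`CoordinatorLogSketch`, `countPerTimestamp`, `brokerIdFromCoordinatorId`) are hypothetical and not part of Kafka. The broker-id helper rests on an assumption that should be checked against the 0.9.0.1 client source: as far as I can tell, the consumer registers the coordinator connection under the node id `Integer.MAX_VALUE - brokerId`, which would make 2147483645 the coordinator connection to broker 2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoordinatorLogSketch {
    // Matches log lines in the exact format quoted in this report.
    // group(1) = timestamp with millisecond precision, group(2) = coordinator node id.
    private static final Pattern DEAD_LINE = Pattern.compile(
        "^(\\S+ \\S+) INFO \\(AbstractCoordinator\\.java:\\d+\\): "
            + "Marking the coordinator (\\d+) dead\\.$");

    // ASSUMPTION: the client derives the coordinator node id as
    // Integer.MAX_VALUE - brokerId; verify against the client source.
    public static int brokerIdFromCoordinatorId(int coordinatorId) {
        return Integer.MAX_VALUE - coordinatorId;
    }

    // Counts matching lines per timestamp; non-matching lines are ignored.
    public static Map<String, Integer> countPerTimestamp(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = DEAD_LINE.matcher(line.trim());
            if (m.matches()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.",
            "16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.",
            "16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the coordinator 2147483645 dead.");
        // Print how many "dead" messages were logged in each millisecond.
        System.out.println(countPerTimestamp(sample));
        // Print the broker id implied by the coordinator id, under the assumption above.
        System.out.println(brokerIdFromCoordinatorId(2147483645));
    }
}
```

Running this over a full consumer log should show whether the rate really is on the order of ten or more attempts per millisecond, and which broker the failing coordinator connection most likely points at.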