[ https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968291#comment-16968291 ]
Ismael Juma commented on KAFKA-9044: ------------------------------------ If you can share, it would be useful to have access to server, state change and controller logs during the affected period. > Brokers occasionally (randomly?) dropping out of clusters > --------------------------------------------------------- > > Key: KAFKA-9044 > URL: https://issues.apache.org/jira/browse/KAFKA-9044 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 2.3.0, 2.3.1 > Environment: Ubuntu 14.04 > Reporter: Peter Bukowinski > Priority: Major > > I have several cluster running kafka 2.3.1 and this issue has affected all of > them. Because of replication and the size of the clusters (30 brokers), this > bug is not causing any data loss, but it is nevertheless concerning. When a > broker drops out, the log gives no indication that there are any zookeeper > issues (and indeed the zookeepers are healthy when this occurs. Here's > snippet from a broker log when it occurs: > {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 > expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, > dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to > retention time 3600000ms breach (kafka.log.Log)}} > {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, > dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] > for deletion. (kafka.log.Log)}} > {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, > dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}} > {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, > dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}} > {{[2019-10-07 11:03:56,957] INFO Deleted log > /data/3/kl/internal_test-52/00000000000001975332.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-10-07 11:03:56,957] INFO Deleted offset index > /data/3/kl/internal_test-52/00000000000001975332.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-10-07 11:03:56,958] INFO Deleted time index > /data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 1 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 1 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 1 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed > 0 expired offsets in 0 milliseconds. > (kafka.coordinator.group.GroupMetadataManager)}} > {{[2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14] > Shrinking ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553, > endOffset: 2007555). Out of sync replicas: (brokerId: 16, endOffset: 2007553) > (brokerId: 17, endOffset: 2007553). (kafka.cluster.Partition)}} > {{[2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14] > Cached zkVersion [17] not equal to that in zookeeper, skip updating ISR > (kafka.cluster.Partition)}} > The controller log shows the following: > {{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: , > deleted brokers: 14, bounced brokers: , all live brokers: > 1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25 > (kafka.controller.KafkaController)}} > {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting > down (kafka.controller.RequestSendThread)}} > {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown > completed (kafka.controller.RequestSendThread)}} > {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped > (kafka.controller.RequestSendThread)}} > {{[2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback > for 14 (kafka.controller.KafkaController)}} > The clusters use {{zookeeper.session.timeout.ms=45000}}, and > {{zookeeper.connection.timeout.ms=90000}}. > I'm unable to find a cause for this behavior. The only thing I can do to > resolve the issue is to restart the broker. -- This message was sent by Atlassian Jira (v8.3.4#803005)