[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968622#comment-16968622
 ] 

Peter Bukowinski commented on KAFKA-9044:
-----------------------------------------

I parsed the zk transaction log and there is a deletion event that lines up with 
the timestamp from the controller's log.

11/6/19 3:24:46 AM PST session 0x36b4976857ae342 cxid 0x1 zxid 0xa0244538f 
delete '/kafka/brokers/ids/23
11/6/19 3:24:46 AM PST session 0x26b4976aa821783 cxid 0x7 zxid 0xa02445390 
setData 
'/kafka/CruiseControlBrokerList,#32333d31353733303339343836313233,1467596
11/6/19 3:24:46 AM PST session 0x36b49768579d948 cxid 0x1f5 zxid 0xa024454b8 
setData 
'/kafka/brokers/topics/ac/partitions/207/state,#7b22636f6e74726f6c6c65725f65706f6368223a31372c226c6561646572223a31382c2276657273696f6e223a312c226c65616465725f65706f6368223a31332c22697372223a5b31382c312c32335d7d,28
11/6/19 3:24:46 AM PST session 0x16c9250fa50af3e cxid 0x2c6 zxid 0xa024454b9 
setData 
'/kafka/brokers/topics/Fe/partitions/199/state,#7b22636f6e74726f6c6c65725f65706f6368223a31372c226c6561646572223a322c2276657273696f6e223a312c226c65616465725f65706f6368223a31332c22697372223a5b322c312c32335d7d,28
...
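
The values after '#' in the setData entries above are hex-encoded bytes. A minimal Python sketch for decoding them follows. The helper name decode_payload is hypothetical, and treating the number in the CruiseControlBrokerList value as epoch milliseconds is an assumption based on how it lines up with the log timestamps.

```python
import json
from datetime import datetime, timezone


def decode_payload(hex_payload: str) -> str:
    """Decode the hex bytes that follow '#' in a formatted setData entry."""
    return bytes.fromhex(hex_payload.lstrip("#")).decode("utf-8")


# Payload written to /kafka/brokers/topics/ac/partitions/207/state
state_hex = (
    "#7b22636f6e74726f6c6c65725f65706f6368223a31372c226c6561646572223a31382c"
    "2276657273696f6e223a312c226c65616465725f65706f6368223a31332c22697372223a"
    "5b31382c312c32335d7d"
)
isr_state = json.loads(decode_payload(state_hex))

# Payload written to /kafka/CruiseControlBrokerList
cc_entry = decode_payload("#32333d31353733303339343836313233")
broker_id, _, millis = cc_entry.partition("=")

# Assumption: the right-hand side is a Unix timestamp in milliseconds.
ts = datetime.fromtimestamp(int(millis) // 1000, tz=timezone.utc)

print(isr_state)
print(cc_entry, ts.isoformat())
```

The partition state decodes to JSON with controller_epoch 17, leader 18, leader_epoch 13 and isr [18, 1, 23]. The CruiseControlBrokerList value decodes to 23=1573039486123, which under the epoch-milliseconds assumption is 2019-11-06 11:24:46 UTC, i.e. 3:24:46 AM PST, the same second as the delete of /kafka/brokers/ids/23.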


> Brokers occasionally (randomly?) dropping out of clusters
> ---------------------------------------------------------
>
>                 Key: KAFKA-9044
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9044
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.3.0, 2.3.1
>         Environment: Ubuntu 14.04
>            Reporter: Peter Bukowinski
>            Priority: Major
>
> I have several clusters running Kafka 2.3.1 and this issue has affected all of 
> them. Because of replication and the size of the clusters (30 brokers), this 
> bug is not causing any data loss, but it is nevertheless concerning. When a 
> broker drops out, the log gives no indication that there are any ZooKeeper 
> issues (and indeed the ZooKeeper nodes are healthy when this occurs). Here's a 
> snippet from a broker log when it occurs:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 3600000ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/00000000000001975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/00000000000001975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14] 
> Shrinking ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553, 
> endOffset: 2007555). Out of sync replicas: (brokerId: 16, endOffset: 2007553) 
> (brokerId: 17, endOffset: 2007553). (kafka.cluster.Partition)}}
>  {{[2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14] 
> Cached zkVersion [17] not equal to that in zookeeper, skip updating ISR 
> (kafka.cluster.Partition)}}
> The controller log shows the following:
> {{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: , 
> deleted brokers: 14, bounced brokers: , all live brokers: 
> 1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25 
> (kafka.controller.KafkaController)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting 
> down (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown 
> completed (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped 
> (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback 
> for 14 (kafka.controller.KafkaController)}}
> The clusters use {{zookeeper.session.timeout.ms=45000}}, and 
> {{zookeeper.connection.timeout.ms=90000}}.
> I'm unable to find a cause for this behavior. The only thing I can do to 
> resolve the issue is to restart the broker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
