Bryan,

Did you take down some brokers in your cluster while hitting KAFKA-1028? If yes, you may be hitting KAFKA-1647 as well.
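One quick thing to check (a sketch only, and it assumes your log.dirs is /storage/kafka/00/kafka_data, as the segment paths in your log excerpt suggest): the broker persists the high watermark for each partition in a replication-offset-checkpoint file under every log directory, so if [TOPIC,6] is missing from that file after the restart it would line up with the "No checkpointed highwatermark is found" warning you quote.

  # Hypothetical paths -- adjust to your actual log.dirs.
  # The file holds a version line, an entry count, then one
  # "topic partition offset" line per partition.
  cat /storage/kafka/00/kafka_data/replication-offset-checkpoint
  grep "^TOPIC 6 " /storage/kafka/00/kafka_data/replication-offset-checkpoint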
Guozhang

On Mon, Oct 20, 2014 at 1:18 PM, Bryan Baugher <bjb...@gmail.com> wrote:

> Hi everyone,
>
> We run a 3 broker Kafka cluster using 0.8.1.1 with all topics having a
> replication factor of 3, meaning every broker has a replica of every
> partition.
>
> We recently ran into this issue
> (https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss
> within Kafka. We understand why it happened and have plans to try to
> ensure it doesn't happen again.
>
> The strange part was that the broker that was chosen for the unclean
> leader election seemed to drop all of its own data about the partition
> in the process, as our monitoring shows the broker offset was reset to 0
> for a number of partitions.
>
> Following the broker's server logs in chronological order for a
> particular partition that saw data loss, I see this:
>
> 2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6
> with log end offset 528026
>
> 2014-10-16 10:20:18,144 WARN
> kafka.controller.OfflinePartitionLeaderSelector:
> [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6].
> Elect leader 1 from live brokers 1,2. There's potential data loss.
>
> 2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6]
> on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6]
>
> 2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to
> offset 0.
>
> 2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index
> /storage/kafka/00/kafka_data/TOPIC-6/00000000000000528024.index.deleted
>
> 2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from
> log TOPIC-6.
>
> I'm not too worried about this since I'm hoping to move to Kafka 0.8.2
> ASAP, but I was curious if anyone could explain this behavior.
>
> -Bryan

--
-- Guozhang
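For reference, a minimal sketch of the knobs 0.8.2 adds for trading availability for consistency here; the property names are the standard broker/topic and new-producer configs, and the values are only illustrative:

  # server.properties (broker-wide; also settable per topic), 0.8.2+
  unclean.leader.election.enable=false   # never elect a leader from outside the ISR
  min.insync.replicas=2                  # with acks=-1, a write needs 2 in-sync replicas

  # new Java producer (0.8.2+)
  acks=-1                                # wait for the full ISR to acknowledge

With unclean elections disabled the partition simply stays offline until a replica from the old ISR comes back, so an out-of-sync broker can no longer win the election and truncate to offset 0.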