Yes, the cluster was restarted in a rolling fashion, at least to a degree. However, some other events left the brokers rather confused, so the ISR for a number of partitions became empty and a new controller was elected. KAFKA-1647 sounds exactly like the problem I encountered. Thank you.
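For anyone who wants to check their own broker logs for the same pattern, here is a minimal sketch in Python. It assumes the log4j line layout seen in the server log excerpt from my original message; the regular expressions and the function names (`parse`, `find_truncations`) are my own illustration, not anything Kafka ships. It flags any log that was loaded with a non-zero end offset and later truncated to a lower offset.

```python
import re
from datetime import datetime

# Sample lines taken from the server log excerpt in this thread
# (rejoined where the mail client wrapped them).
LOG_LINES = [
    "2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6 with log end offset 528026",
    "2014-10-16 10:20:18,144 WARN kafka.controller.OfflinePartitionLeaderSelector: [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential data loss.",
    "2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6] on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6]",
    "2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to offset 0.",
]

# Assumed layout: "<date> <time>,<millis> <LEVEL> <logger>: <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) ([\w.]+): (.*)$")
LOADED_RE = re.compile(r"Completed load of log (\S+) with log end offset (\d+)")
TRUNC_RE = re.compile(r"Truncating log (\S+) to offset (\d+)")


def parse(line):
    """Split one log line into (timestamp, level, logger, message), or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group(2), m.group(3), m.group(4)


def find_truncations(lines):
    """Return (log, end_offset_at_load, truncated_to) for each log that was
    truncated below the end offset it had when the broker loaded it."""
    events = [e for e in map(parse, lines) if e is not None]
    events.sort(key=lambda e: e[0])  # process in chronological order

    end_offsets = {}  # log name -> end offset reported at load time
    lost = []
    for _ts, _level, _logger, msg in events:
        loaded = LOADED_RE.search(msg)
        if loaded:
            end_offsets[loaded.group(1)] = int(loaded.group(2))
            continue
        trunc = TRUNC_RE.search(msg)
        if trunc:
            log, target = trunc.group(1), int(trunc.group(2))
            before = end_offsets.get(log)
            if before is not None and target < before:
                lost.append((log, before, target))
    return lost


if __name__ == "__main__":
    for log, before, target in find_truncations(LOG_LINES):
        print(f"{log}: loaded with end offset {before}, later truncated to {target}")
```

Sorting by timestamp first matters because lines in the server log are not always written in strict order. The sketch only looks at a single broker's log and does not try to tie the truncation to the leader election itself.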

On Tue, Oct 21, 2014 at 3:28 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Bryan,
>
> Did you take down some brokers in your cluster while hitting KAFKA-1028?
> If yes, you may be hitting KAFKA-1647 also.
>
> Guozhang
>
> On Mon, Oct 20, 2014 at 1:18 PM, Bryan Baugher <bjb...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > We run a 3 Kafka cluster using 0.8.1.1 with all topics having a
> > replication factor of 3 meaning every broker has a replica of every
> > partition.
> >
> > We recently ran into this issue
> > (https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss
> > within Kafka. We understand why it happened and have plans to try to
> > ensure it doesn't happen again.
> >
> > The strange part was that the broker that was chosen for the un-clean
> > leader election seemed to drop all of its own data about the partition
> > in the process as our monitoring shows the broker offset was reset to 0
> > for a number of partitions.
> >
> > Following the broker's server logs in chronological order for a
> > particular partition that saw data loss I see this,
> >
> > 2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6 with log end offset 528026
> >
> > 2014-10-16 10:20:18,144 WARN kafka.controller.OfflinePartitionLeaderSelector: [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential data loss.
> >
> > 2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6] on broker 1: No checkpointed highwatermark is found for partition [TOPIC,6]
> >
> > 2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to offset 0.
> >
> > 2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index /storage/kafka/00/kafka_data/TOPIC-6/00000000000000528024.index.deleted
> >
> > 2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from log TOPIC-6.
> >
> > I'm not too worried about this since I'm hoping to move to Kafka 0.8.2
> > ASAP but I was curious if anyone could explain this behavior.
> >
> > -Bryan
> >
>
> --
> -- Guozhang
>

-- Bryan