[ https://issues.apache.org/jira/browse/KAFKA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264543#comment-15264543 ]
Arun Mathew commented on KAFKA-3643:
------------------------------------

[~gwenshap] This is the issue I talked to you about during Kafka Summit 2016.

> Data Duplication on clean restart of Kafka Broker
> -------------------------------------------------
>
>                 Key: KAFKA-3643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3643
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.1
>            Reporter: Arun Mathew
>
> We observed event duplication when partition leadership was restored from
> the new leader to the preferred leader after the preferred leader restarted.
> Steps to Reproduce
> - Set up a three-broker Kafka cluster (B1, B2, B3).
> - Create a topic with 3 replicas and 1 partition.
> - [B1 is assigned as the (preferred) leader; B2 and B3 are in the ISR]
> - Start sending events with the performance producer, using enough events
>   (say, 4 million) that the run lasts a few minutes and covers the broker
>   restart interval.
> - Set producer batch size = 1.
> - Cleanly shut down the leader broker B1.
> - Event sending continues.
> - B2 is now the new leader and B3 is in the ISR.
> - Restart broker B1 (the preferred leader for partition 0).
> - The replica on B1 catches up and becomes the leader for P-0.
> - Wait for the producer to finish.
> - Use the get offset command to get the event count in the partition. The
>   count is higher than the number of events sent (4M).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
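The last reproduction step compares the partition's stored event count with the number of events sent. The following Python sketch shows that arithmetic. It assumes that the "get offset command" is kafka.tools.GetOffsetShell, run once with --time -2 (earliest offset) and once with --time -1 (latest offset), and that it prints one "topic:partition:offset" line per partition. It also assumes that retention deleted no log segments during the test, that every send was acknowledged, and that no other producer wrote to the topic. Under those assumptions, the stored count is latest minus earliest offset, and any excess over the sent count is duplicates. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical helpers (not part of Kafka): assume GetOffsetShell-style
# output with one "topic:partition:offset" line per partition.


def parse_offsets(output: str) -> Dict[Tuple[str, int], int]:
    """Map (topic, partition) -> offset from 'topic:partition:offset' lines."""
    offsets: Dict[Tuple[str, int], int] = {}
    for line in output.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # rsplit from the right so only the last two ':' separate fields
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def count_duplicates(earliest_out: str, latest_out: str, events_sent: int) -> int:
    """Return stored events minus events sent (positive => duplicates).

    Assumes no segments were deleted by retention, so each partition's
    stored count is simply latest offset - earliest offset.
    """
    earliest = parse_offsets(earliest_out)
    latest = parse_offsets(latest_out)
    stored = sum(latest[tp] - earliest.get(tp, 0) for tp in latest)
    return stored - events_sent
```

With the single-partition topic from the reproduction steps, pass the two GetOffsetShell outputs and the producer's sent count (4M in the report) to count_duplicates. A positive result is the number of duplicated events, given the assumptions above.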