[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410693#comment-15410693 ]
Flavio Junqueira commented on KAFKA-1211:
-----------------------------------------

[~junrao] I was reading point (a) in your answer again, and there is something I don't understand. You say that the follower truncates and then becomes leader. That is fine; I understand it can happen. The bit I don't understand is how it can truncate committed messages. Let's say that we are talking about servers A and B, and min ISR is 2 (the replica set can be larger than 2, but that doesn't really matter for this example):

# A leads initially and B follows A.
# B truncates.
# B becomes leader.

If A leads, then it was previously in the ISR (assuming unclean leader election is disabled), so it contains all committed messages. If B was also part of the previous ISR, then both A and B have all committed messages, and B won't truncate committed messages. The situation you describe can only happen if A or B loses committed messages on its own and not because of the truncation, e.g., if the messages didn't make it from the page cache to disk before a crash. Is my understanding correct?

> Hold the produce request with ack > 1 in purgatory until replicas' HW has larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today, during leader failover, there is a period of weakness when the followers truncate their data before fetching from the new leader, i.e., the number of in-sync replicas is just 1. If the leader also fails during this period, produce requests with ack > 1 that have already been responded to will still be lost. To avoid this scenario, we would prefer to hold the produce request in purgatory until the replicas' HW is larger than the offset, rather than just their end-of-log offsets.
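The failure window in the quoted issue description, and the proposed change to when a produce request leaves purgatory, can be sketched with a toy two-replica model. This is a minimal sketch under stated assumptions, not Kafka's implementation: `Replica`, `fetch_round`, `acked_by_leo`, `acked_by_hw`, and `run` are hypothetical names; the model assumes the follower learns the leader's HW one fetch round late, that a follower truncates its log to its own HW before taking over as leader, and that the leader can read each follower's LEO and HW directly.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Replica:
    """Toy replica: a message log plus the high watermark (HW) this replica knows."""
    name: str
    log: List[str] = field(default_factory=list)
    hw: int = 0  # offsets below hw are believed committed by this replica

    @property
    def leo(self) -> int:
        """Log end offset: the offset the next appended message would receive."""
        return len(self.log)

    def truncate_to_hw(self) -> None:
        """Assumption: drop every message at or above the local HW before taking over."""
        del self.log[self.hw:]


def fetch_round(leader: Replica, follower: Replica) -> None:
    """One toy fetch round in which the follower's HW lags the leader's by a round."""
    previous_leader_hw = leader.hw
    follower.log.extend(leader.log[follower.leo:])  # copy missing messages
    leader.hw = min(leader.leo, follower.leo)        # leader HW = min LEO over the ISR
    follower.hw = min(previous_leader_hw, follower.leo)  # follower sees the older HW


def acked_by_leo(offset: int, followers: List[Replica]) -> bool:
    """Rule described as current: respond once every follower's LEO is past the offset."""
    return all(f.leo > offset for f in followers)


def acked_by_hw(offset: int, followers: List[Replica]) -> bool:
    """Proposed rule: respond only once every follower's HW is past the offset."""
    return all(f.hw > offset for f in followers)


def run(ack_rule: Callable[[int, List[Replica]], bool]) -> Tuple[int, List[str]]:
    """Produce one message, replicate until acknowledged, then fail over to B."""
    a, b = Replica("A"), Replica("B")
    a.log.append("m0")  # produce request for offset 0 lands on leader A
    rounds = 0
    while not ack_rule(0, [b]):
        fetch_round(a, b)
        rounds += 1
    # Leader A fails here; B truncates to its own HW and becomes the new leader.
    b.truncate_to_hw()
    return rounds, b.log


if __name__ == "__main__":
    print(run(acked_by_leo))  # (1, [])
    print(run(acked_by_hw))   # (2, ['m0'])
```

In this toy model, the LEO-based rule acknowledges `m0` after one round, yet B's HW is still 0 at that point, so the truncation removes an acknowledged message. The HW-based rule waits one extra round until B's HW has moved past offset 0, so the same truncation keeps `m0`. The trade-off the sketch makes visible is one extra fetch round of produce latency in exchange for closing that window.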
-- This message was sent by Atlassian JIRA (v6.3.4#6332)