[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406063#comment-15406063 ]
Flavio Junqueira commented on KAFKA-1211:
-----------------------------------------

Thanks for the clarification, [~junrao]. There are a couple of specific points that still aren't entirely clear to me:

# We are trying to preserve the generation when we copy messages to a follower, correct? In step 3.4, when we say that the follower flushes the LGS, we are more specifically trying to replicate the leader's LGS, is that right? What happens if the follower crashes, or the leader changes, between persisting the new LGS and fetching the new messages from the leader? I'm concerned that we could leave the broker's LGS and log in an inconsistent state.
# When we say in step 3.4 that the follower needs to remember the LLG, I suppose this is only during the recovery period; once sync-up is complete, the follower knows that the latest generation is the LLG. During sync-up, besides the question I raise above, it is also not entirely clear whether we need to persist the LLG independently, to avoid a situation in which the follower crashes, comes back, and accepts messages from a different generation.

> Hold the produce request with ack > 1 in purgatory until replicas' HW has
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today, during leader failover, we have a window of weakness while the
> followers truncate their data before fetching from the new leader, i.e., the
> number of in-sync replicas is just 1. If the new leader also fails during this
> time, then produce requests with ack > 1 that have already been responded to
> can still be lost.
> To avoid this scenario, we would prefer to hold the produce request
> in purgatory until the replicas' HW is larger than the produce offset,
> instead of just their log-end offsets.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
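The difference between the two completion conditions can be sketched as follows. This is a hypothetical illustration, not Kafka's actual DelayedProduce code: the class and field names are invented, and "satisfied" here just means the request may leave purgatory and be acknowledged. The point is that a follower's log-end offset (LEO) only says the bytes arrived, while its high watermark (HW) says they are covered by the committed prefix that survives truncation on failover.

```java
import java.util.List;

// Per-replica state as the leader sees it (names are illustrative only).
class ReplicaState {
    final long logEndOffset;   // LEO: offset after the last appended message
    final long highWatermark;  // HW: offset up to which messages are known committed
    ReplicaState(long leo, long hw) { this.logEndOffset = leo; this.highWatermark = hw; }
}

// A produce request with ack > 1 waiting in purgatory.
class DelayedProduceSketch {
    // Offset after the last message this request produced.
    final long requiredOffset;
    DelayedProduceSketch(long requiredOffset) { this.requiredOffset = requiredOffset; }

    // Weaker check (today's behavior in the description): every replica has
    // appended the data, but a follower truncating on failover may still drop it.
    boolean satisfiedByLeo(List<ReplicaState> isr) {
        return isr.stream().allMatch(r -> r.logEndOffset >= requiredOffset);
    }

    // Proposed stronger check: complete only once every replica's HW covers the
    // produce offset, so truncation to the HW cannot lose acknowledged messages.
    boolean satisfiedByHw(List<ReplicaState> isr) {
        return isr.stream().allMatch(r -> r.highWatermark >= requiredOffset);
    }
}
```

With a replica at LEO 10 but HW 5, a request produced at offset 8 passes the LEO check yet fails the HW check, which is exactly the unsafe window the issue describes.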