[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406063#comment-15406063 ]

Flavio Junqueira commented on KAFKA-1211:
-----------------------------------------

Thanks for the clarification, [~junrao]. There are a couple of specific points 
that still aren't entirely clear to me:

# We are trying to preserve the generation when we copy messages to a follower, 
correct? In step 3.4, when we say that the follower flushes the LGS, are we more 
specifically trying to replicate the leader's LGS? What happens if either the 
follower crashes or the leader changes between persisting the new LGS and 
fetching the new messages from the leader? I'm concerned that we would leave the 
LGS and the log of the broker in an inconsistent state (see the sketch after 
this list).
# When we say in step 3.4 that the follower needs to remember the LLG, I suppose 
this is only during the recovery period. Otherwise, once the sync-up has 
completed, the follower knows that the latest generation is the LLG. During 
sync-up, there is the question I'm raising above, but it is also not entirely 
clear whether we need to persist the LLG independently, so that we don't end up 
in a situation in which the follower crashes, comes back, and accepts messages 
from a different generation.
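
To make the ordering concrete, here is a rough sketch of how I read step 3.4. It 
is not Kafka code, and all names (GenerationSnapshot, LeaderClient, flush, 
fetchFrom) are hypothetical; the comments mark the window I'm worried about.

{code:java}
import java.io.IOException;

// Hypothetical stand-ins for the on-disk LGS and the leader fetch path.
interface GenerationSnapshot {
    long latestGeneration();
    void flush(GenerationSnapshot leaderSnapshot) throws IOException; // persist the LGS to disk
}

interface LeaderClient {
    GenerationSnapshot fetchGenerationSnapshot() throws IOException;
    void fetchFrom(long offset) throws IOException; // copy messages into the local log
}

class FollowerSyncUp {
    private final GenerationSnapshot localSnapshot; // the follower's on-disk LGS
    private long latestLeaderGeneration;            // the LLG, held only in memory here
    private final long logEndOffset;

    FollowerSyncUp(GenerationSnapshot localSnapshot, long logEndOffset) {
        this.localSnapshot = localSnapshot;
        this.logEndOffset = logEndOffset;
    }

    void syncUp(LeaderClient leader) throws IOException {
        GenerationSnapshot leaderSnapshot = leader.fetchGenerationSnapshot();
        localSnapshot.flush(leaderSnapshot);                        // 1. persist the leader's LGS
        latestLeaderGeneration = leaderSnapshot.latestGeneration(); // 2. remember the LLG (in memory)
        // A crash or leader change at this point leaves the persisted LGS describing
        // generations whose messages were never copied, so snapshot and log disagree.
        // And since the LLG above is not persisted, a restarted follower could accept
        // messages from a different generation, which is the second question.
        leader.fetchFrom(logEndOffset);                             // 3. only now copy the messages
    }
}
{code}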

> Hold the produce request with ack > 1 in purgatory until replicas' HW has 
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today, during leader failover there is a window of weakness while the 
> followers truncate their data before fetching from the new leader, i.e., the 
> number of in-sync replicas is just 1. If the leader also fails during this 
> window, then produce requests with ack > 1 that have already been responded to 
> can still be lost. To avoid this scenario we would prefer to hold the produce 
> request in purgatory until the replicas' HW is larger than the produce offset, 
> instead of just their end-of-log offsets.
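
For context, a rough sketch (not the actual purgatory code) of the completion 
rule the description asks for: a produce request with ack > 1 is only answered 
once the high watermark has passed the last produced offset, rather than once 
the replicas' log end offsets have. All names below are illustrative.

{code:java}
// Illustrative only; not Kafka's DelayedProduce implementation.
final class DelayedProduceSketch {
    private final long requiredOffset; // last offset written by the produce request
    private final int requiredAcks;    // acks requested by the producer (> 1)

    DelayedProduceSketch(long requiredOffset, int requiredAcks) {
        this.requiredOffset = requiredOffset;
        this.requiredAcks = requiredAcks;
    }

    // Old rule: complete once enough replicas' log end offsets have passed the
    // produce offset. A leader failover while followers are still truncating and
    // re-fetching can then lose an already-acknowledged write.
    boolean completeByLogEndOffset(long[] replicaLogEndOffsets) {
        int caughtUp = 0;
        for (long leo : replicaLogEndOffsets) {
            if (leo > requiredOffset) caughtUp++;
        }
        return caughtUp >= requiredAcks;
    }

    // Proposed rule: complete only once the partition's HW has passed the produce
    // offset, so the write is on every in-sync replica before it is acknowledged.
    boolean completeByHighWatermark(long highWatermark) {
        return highWatermark > requiredOffset;
    }
}
{code}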



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
