[ 
https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406915#comment-15406915
 ] 

Jun Rao commented on KAFKA-1211:
--------------------------------

[~fpj], very good questions.

1. Yes, the idea is for the follower to copy the LGS from the leader. Regarding 
the possibility of ending up in an inconsistent state: we just need to make sure 
the log is consistent with respect to the local leader-generation-checkpoint 
file up to the log end offset. One potential issue with the current proposal is 
when the follower truncates the log and then flushes the checkpoint file. If the 
follower crashes at this point and the truncation hasn't been flushed, we may 
treat some of the messages after the truncation point as being in the wrong 
leader generation. To fix that, we can change the protocol a bit. The basic idea 
is that the follower never flushes the checkpoint ahead of the log. 
Specifically, when the follower gets the LGS from the leader, it stores it in 
memory. After truncation, the follower only flushes the prefix of the LGS whose 
start offsets are up to the log end offset. As the follower starts fetching 
data, every time the fetched messages cross a leader generation boundary 
(according to the cached LGS), the follower adds a new leader generation entry 
to the checkpoint file and flushes it.
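
As a rough illustration of this rule (not the actual broker code; 
LeaderGenerationEntry, flushAfterTruncation, and onMessagesAppended are 
made-up names for this sketch), the follower-side bookkeeping could look 
something like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rule above: the checkpoint file is never
// flushed ahead of the log. Names are illustrative only.
class LeaderGenerationEntry {
    final int generation;
    final long startOffset; // first offset produced under this leader generation

    LeaderGenerationEntry(int generation, long startOffset) {
        this.generation = generation;
        this.startOffset = startOffset;
    }
}

class FollowerCheckpointSketch {
    private final List<LeaderGenerationEntry> cachedLgs = new ArrayList<>(); // LGS from the leader, kept in memory
    private final List<LeaderGenerationEntry> flushed = new ArrayList<>();   // entries written to the checkpoint file
    private int nextToFlush = 0; // index into cachedLgs of the first entry not yet on disk

    // After truncating the log, flush only the prefix of the cached LGS whose
    // entries start before the (new) log end offset.
    void flushAfterTruncation(long logEndOffset) {
        flushed.clear();
        nextToFlush = 0;
        appendEntriesUpTo(logEndOffset);
    }

    // Called after a fetch appends messages; if the new log end offset has
    // crossed a generation boundary in the cached LGS, add that entry to the
    // checkpoint and flush it.
    void onMessagesAppended(long newLogEndOffset) {
        appendEntriesUpTo(newLogEndOffset);
    }

    private void appendEntriesUpTo(long logEndOffset) {
        boolean changed = false;
        while (nextToFlush < cachedLgs.size()
                && cachedLgs.get(nextToFlush).startOffset < logEndOffset) {
            flushed.add(cachedLgs.get(nextToFlush));
            nextToFlush++;
            changed = true;
        }
        if (changed) {
            flushToDisk();
        }
    }

    private void flushToDisk() {
        // placeholder: write `flushed` to the leader-generation-checkpoint file and fsync
    }
}
```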

2. The LLG doesn't have to be persisted and only needs to be cached in memory. 
The idea of the LLG is really to detect any leader generation change since the 
follower issued the RetreiveLeaderGeneration request. Once such a change is 
detected, the follower can handle it properly. If the follower crashes and 
restarts, it can always re-fetch the LLG from the current leader.
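
A minimal sketch of how the cached LLG could be used for that detection 
(LlgTracker and its methods are hypothetical names for this illustration, not 
the real follower code):

```java
// Hypothetical sketch: the follower caches the LLG returned with the
// RetreiveLeaderGeneration response (memory only, never persisted) and
// compares it against the leader generation it observes later.
class LlgTracker {
    private int cachedLlg = -1; // latest leader generation seen from the leader

    void onRetrieveLeaderGenerationResponse(int llg) {
        cachedLlg = llg;
    }

    // Called whenever the follower learns the leader's current generation again.
    void onObservedLeaderGeneration(int observedGeneration) {
        if (observedGeneration != cachedLlg) {
            // The leader generation changed since the LGS was retrieved:
            // redo the truncation/reconciliation against the current leader.
            cachedLlg = observedGeneration;
            reconcileWithLeader();
        }
    }

    private void reconcileWithLeader() {
        // placeholder: re-issue RetreiveLeaderGeneration and repeat the truncation step
    }
}
```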

> Hold the produce request with ack > 1 in purgatory until replicas' HW has 
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today, during leader failover, there is a window of weakness while the 
> followers truncate their data before fetching from the new leader, i.e., the 
> number of in-sync replicas is just 1. If the leader also fails during this 
> window, produce requests with ack > 1 that have already been responded to can 
> still be lost. To avoid this scenario, we would prefer to hold the produce 
> request in purgatory until the replicas' HW is larger than the produce offset, 
> instead of just their log end offsets.
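
Below is a rough sketch of the completion condition the description asks for. 
DelayedProduceSketch, canComplete, and the parameters are illustrative names, 
not the actual purgatory/DelayedProduce code:

```java
// Hypothetical check for completing a delayed produce request with ack > 1:
// complete it only once enough replicas have a high watermark (HW) larger
// than the last offset of the produced messages, rather than only a log end
// offset past it.
class DelayedProduceSketch {
    // requiredOffset: last offset of the messages in the produce request
    // replicaHighWatermarks: current HW of each replica of the partition
    // requiredAcks: number of replicas that must satisfy the condition
    static boolean canComplete(long requiredOffset, long[] replicaHighWatermarks, int requiredAcks) {
        int satisfied = 0;
        for (long hw : replicaHighWatermarks) {
            if (hw > requiredOffset) {
                satisfied++;
            }
        }
        return satisfied >= requiredAcks;
    }
}
```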


