[ 
https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410693#comment-15410693
 ] 

Flavio Junqueira commented on KAFKA-1211:
-----------------------------------------

[~junrao] I was reading point (a) in your answer again, and there is something 
I don't understand. You say that the follower truncates and then becomes leader. 
This is fine, I understand it can happen. The bit I don't understand is how it 
can truncate committed messages. 

Let's say that we are talking about servers A and B, min ISR is 2 (the replica 
set can be larger than 2, but it doesn't really matter for this example):

# A leads initially and B follows A.
# B truncates
# B becomes leader

If A leads, then it means that it was previously in the ISR (assuming unclean 
leader election is disabled) and it contains all committed messages. If B was 
also part of the previous ISR, then B will also have all committed messages, 
and B won't truncate committed messages.

The situation you describe can only happen if either A or B loses committed 
messages on its own and not because of the truncation, e.g., if the messages 
didn't make it from the page cache to disk before a crash.
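
To make the reasoning above concrete, here is a toy model of the two cases (a 
sketch only, not Kafka code). All names in it are hypothetical. It assumes that 
"committed" means "present on every ISR member" and that B truncates to the 
committed offset; it does not model how the HW is propagated to followers.

```python
def committed_prefix(isr_logs):
    """Toy definition: messages present on every ISR member are committed."""
    n = min(len(log) for log in isr_logs)
    return isr_logs[0][:n]


def truncate(log, offset):
    """Keep only the entries below the given offset."""
    return log[:offset]


def loses_committed(new_leader_log, committed):
    """True if the new leader is missing any committed message."""
    return new_leader_log[:len(committed)] != committed


# A leads initially and B follows A; both are in the ISR.
a_log = ["m0", "m1", "m2", "m3"]
b_log = ["m0", "m1", "m2"]
committed = committed_prefix([a_log, b_log])

# Case 1: B truncates (assumed: to the committed offset), then becomes leader.
b_truncated = truncate(b_log, len(committed))
clean_failover_loses = loses_committed(b_truncated, committed)

# Case 2: B crashes before m1 and m2 reach disk, then becomes leader.
b_after_crash = b_log[:1]
crash_failover_loses = loses_committed(b_after_crash, committed)

print(clean_failover_loses, crash_failover_loses)
```

Under these assumptions, only the second case loses committed messages, and 
the loss comes from the crash rather than from the truncation.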

Is my understanding correct?

> Hold the produce request with ack > 1 in purgatory until replicas' HW has 
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today during leader failover we will have a weakness period when the 
> followers truncate their data before fetching from the new leader, i.e., 
> number of in-sync replicas is just 1. If during this time the leader has also 
> failed then produce requests with ack >1 that have already been responded to 
> will still be lost. To avoid this scenario we would prefer to hold the produce 
> request in purgatory until the replicas' HW is larger than the offset instead 
> of just their end-of-log offsets.



