[
https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410693#comment-15410693
]
Flavio Junqueira commented on KAFKA-1211:
-----------------------------------------
[~junrao] I was reading point (a) in your answer again, and there is something
I don't understand. You say that the follower truncates and then becomes leader.
This is fine, I understand it can happen. The bit I don't understand is how it
can truncate committed messages.
Let's say that we are talking about servers A and B, min ISR is 2 (the replica
set can be larger than 2, but it doesn't really matter for this example):
# A leads initially and B follows A.
# B truncates
# B becomes leader
If A leads, then it means that it was previously in the ISR (assuming unclean
leader election is disabled) and it contains all committed messages. If B was
also part of the previous ISR, then both A and B will have all committed
messages, and B won't truncate committed messages.
The situation you describe can only happen if either A or B loses committed
messages on their own and not because of the truncation, e.g., if the messages
didn't make it from the page cache to disk before a crash.
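To make the argument concrete, here is a minimal toy sketch of the two-replica
case. It is not Kafka's replication code: the function names are made up for
illustration, and it assumes that "committed" means "present on every ISR
member", i.e. everything below the smallest log end offset in the ISR. It only
shows that a truncation at or above that offset loses nothing, while a
truncation below it would. Whether B's truncation point can actually fall below
that offset is exactly what I am asking about.

```python
# Toy model only; all names are hypothetical, not Kafka APIs.

def committed_offset(isr_log_end_offsets):
    """Assumption: messages below the smallest ISR log end offset are committed."""
    return min(isr_log_end_offsets.values())

def truncate(log, truncation_point):
    """Keep only the messages whose offset is below truncation_point."""
    return log[:truncation_point]

def lost_committed(log_after_truncation, committed):
    """Count committed messages that are missing from the given log."""
    return max(0, committed - len(log_after_truncation))

log_a = ["m0", "m1", "m2", "m3"]  # A, the initial leader
log_b = ["m0", "m1", "m2"]        # B, a follower that is in the ISR

committed = committed_offset({"A": len(log_a), "B": len(log_b)})

# Case 1: B truncates at the committed offset, so nothing committed is dropped.
b_safe = truncate(log_b, committed)

# Case 2: B truncates below the committed offset, so a committed message is dropped.
b_unsafe = truncate(log_b, committed - 1)

print(lost_committed(b_safe, committed))
print(lost_committed(b_unsafe, committed))
```

In this model, case 1 is what I would expect when both A and B were in the
previous ISR; case 2 is the outcome I don't see how to reach through truncation
alone.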
Is my understanding correct?
> Hold the produce request with ack > 1 in purgatory until replicas' HW has
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-1211
> URL: https://issues.apache.org/jira/browse/KAFKA-1211
> Project: Kafka
> Issue Type: Bug
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
> Fix For: 0.11.0.0
>
>
> Today, during leader failover, there is a window of weakness when the
> followers truncate their data before fetching from the new leader, i.e., the
> number of in-sync replicas is just 1. If the leader also fails during this
> time, then produce requests with ack > 1 that have already been responded to
> will still be lost. To avoid this scenario, we would prefer to hold the
> produce request in purgatory until the replicas' HW is larger than the
> offset, instead of just their end-of-log offsets.