[
https://issues.apache.org/jira/browse/KAFKA-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jiangjie Qin updated KAFKA-2334:
--------------------------------
Fix Version/s: (was: 0.9.0)
0.8.3
Status: Patch Available (was: In Progress)
> Prevent HW from going back during leader failover
> --------------------------------------------------
>
> Key: KAFKA-2334
> URL: https://issues.apache.org/jira/browse/KAFKA-2334
> Project: Kafka
> Issue Type: Bug
> Reporter: Guozhang Wang
> Assignee: Jiangjie Qin
> Fix For: 0.8.3
>
>
> Consider the following scenario:
> 0. Kafka uses a replication factor of 2, with broker B1 as the leader and B2
> as the follower.
> 1. A producer keeps sending to Kafka with ack=-1 (see the producer sketch
> after this list).
> 2. A consumer repeatedly issues ListOffset requests to Kafka.
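> For concreteness, a minimal sketch of the producer side of this setup using
> the 0.8.x Scala producer API; the broker list and topic name are hypothetical:
> {code:scala}
> import java.util.Properties
> import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
>
> val props = new Properties()
> props.put("metadata.broker.list", "b1:9092,b2:9092") // hypothetical brokers
> props.put("serializer.class", "kafka.serializer.StringEncoder")
> // ack=-1: the leader acks only after the full ISR has replicated the message.
> props.put("request.required.acks", "-1")
>
> val producer = new Producer[String, String](new ProducerConfig(props))
> producer.send(new KeyedMessage[String, String]("test-topic", "message"))
> {code}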
> And the following sequence of events:
> 0. B1's current log-end-offset (LEO) is 0 and its HW offset is 0; the same
> holds for B2.
> 1. B1 receives a ProduceRequest of 100 messages, appends them to its local
> log (LEO becomes 100), and holds the request in purgatory.
> 2. B1 receives a FetchRequest starting at offset 0 from follower B2 and
> returns the 100 messages.
> 3. B2 appends the received messages to its local log (LEO becomes 100).
> 4. B1 receives another FetchRequest starting at offset 100 from B2, learns
> that B2's LEO has caught up to 100, hence updates its own HW, satisfies the
> ProduceRequest in purgatory, and sends the FetchResponse with HW 100 back to
> B2 ASYNCHRONOUSLY.
> 5. B1 successfully sends the ProduceResponse to the producer and then fails,
> so the FetchResponse never reaches B2, whose HW remains 0.
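> To make the HW regression mechanical, here is a self-contained simulation of
> the sequence above (a hypothetical Replica model, not Kafka's actual code):
> {code:scala}
> // Minimal model of a replica: log-end-offset and high watermark.
> case class Replica(var leo: Long = 0L, var hw: Long = 0L)
>
> val b1 = Replica() // leader
> val b2 = Replica() // follower
>
> b1.leo = 100                     // step 1: leader appends 100 messages
> b2.leo = 100                     // steps 2-3: follower fetches and appends them
> b1.hw = math.min(b1.leo, b2.leo) // step 4: leader advances its HW to 100
> // Step 5: B1 fails before the FetchResponse carrying hw=100 reaches B2,
> // so b2.hw is never updated and stays 0.
> println(s"HW on B1 before failure: ${b1.hw}") // 100
> println(s"HW on B2 after failover: ${b2.hw}") // 0
> {code}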
> From the consumer's point of view, it could first see a latest offset of 100
> (from B1), then a latest offset of 0 (from B2 after it takes over as leader),
> and then watch the latest offset gradually catch back up to 100.
> This happens because we use the HW to bound both ListOffset responses and
> fetches from ordinary consumers.
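> In other words, the "latest offset" a consumer sees is bounded by the HW of
> whichever broker is currently the leader. Continuing the sketch above, with a
> hypothetical helper rather than Kafka's actual code:
> {code:scala}
> // ListOffset for "latest" answers from the HW, not the LEO, so the visible
> // latest offset follows the current leader's HW.
> def latestOffset(leader: Replica): Long = leader.hw
>
> latestOffset(b1) // 100, while B1 is the leader
> latestOffset(b2) // 0, after B2 takes over on failover
> {code}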
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)