Sriram Subramanian created KAFKA-905:
----------------------------------------
Summary: Logs can have same offsets causing recovery failure
Key: KAFKA-905
URL: https://issues.apache.org/jira/browse/KAFKA-905
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian
Fix For: 0.8
Consider the following scenario, where L is the leader's log, F is the
follower's log, and HW is the high watermark:
L               F
1  m1,m2        1  m1,m2
3  m3,m4        3  m3,m4
5  m5,m6        5  m5,m6
HW = 6          HW = 4
The follower goes down and comes back up. It truncates its log to its HW:
L               F
1  m1,m2        1  m1,m2
3  m3,m4        3  m3,m4
5  m5,m6
HW = 6          HW = 4
Before the follower catches up with the leader, the leader goes down and the
follower becomes the leader. It then receives new messages (the columns below
reflect the new roles):
F               L
1  m1,m2        1  m1,m2
3  m3,m4        3  m3,m4
5  m5,m6        10 m5-m10
HW = 6          HW = 4
The follower fetches from offset 7. Since offset 7 falls within the compressed
message at offset 10 on the leader, the whole message chunk is sent to the
follower:
F               L
1  m1,m2        1  m1,m2
3  m3,m4        3  m3,m4
5  m5,m6        10 m5-m10
10 m5-m10
HW = 4          HW = 10
The follower's log now contains duplicate offsets. On recovery, re-indexing
will fail because of the repeated offsets.
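
To make the failure concrete, here is a minimal Python sketch of the final
state above. It is a toy model, not Kafka's recovery code: the `Chunk` type and
both function names are hypothetical, and a chunk is simply a list of
(offset, message) pairs.

```python
from typing import List, Optional, Tuple

# Toy model (an assumption, not Kafka's on-disk format): a chunk is a list of
# (offset, message) pairs, and a log is a list of chunks.
Chunk = List[Tuple[int, str]]


def deep_offsets(log: List[Chunk]) -> List[int]:
    """Flatten the log into the offsets of its individual messages."""
    return [offset for chunk in log for offset, _ in chunk]


def first_repeated_offset(log: List[Chunk]) -> Optional[int]:
    """Return the first offset that does not exceed its predecessor, if any."""
    previous = None
    for offset in deep_offsets(log):
        if previous is not None and offset <= previous:
            return offset
        previous = offset
    return None


follower_log = [
    [(1, "m1"), (2, "m2")],
    [(3, "m3"), (4, "m4")],
    [(5, "m5"), (6, "m6")],
]
# The leader's compressed chunk covers offsets 5 to 10 and is appended whole,
# even though the follower only asked for offset 7 onward.
leader_chunk = [(n, f"m{n}") for n in range(5, 11)]
corrupted_log = follower_log + [leader_chunk]

print(first_repeated_offset(follower_log))   # no repeat before the fetch
print(first_repeated_offset(corrupted_log))  # offset 5 appears a second time
```

A re-indexing pass that expects strictly increasing offsets would trip at the
same point this check does.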
Possible ways to fix this:
1. The fetcher thread can do deep iteration instead of shallow iteration and
drop the offsets that are less than the log end offset. This would, however,
incur a performance hit.
2. To optimize option 1, we could do the deep iteration until the logical
offset of the fetched message set is greater than the log end offset of the
follower log, and then switch to shallow iteration.
3. On recovery, we just truncate the active segment and refetch the data.
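
Options 1 and 2 could look roughly like the Python sketch below. It is a toy
model under stated assumptions: `drop_logged_messages` is a hypothetical name,
chunks are plain lists of (offset, message) pairs, the log end offset is taken
to be the next offset the follower would write (7 in the scenario), and leader
offsets are assumed to increase strictly across chunks. The second chunk
(m11, m12) is invented for illustration. A real fix would also have to
recompress the trimmed chunk, which is ignored here.

```python
from typing import List, Tuple

Chunk = List[Tuple[int, str]]  # hypothetical: (offset, message) pairs


def drop_logged_messages(chunks: List[Chunk], log_end_offset: int) -> List[Chunk]:
    """Deep-iterate until the first new message, then pass chunks through."""
    kept: List[Chunk] = []
    caught_up = False
    for chunk in chunks:
        if caught_up:
            # Shallow path: later chunks hold only new offsets, keep them whole.
            kept.append(chunk)
            continue
        # Deep path: look inside the chunk and drop offsets already in the log.
        fresh = [(offset, msg) for offset, msg in chunk if offset >= log_end_offset]
        if fresh:
            kept.append(fresh)
            caught_up = True
    return kept


fetched = [
    [(n, f"m{n}") for n in range(5, 11)],  # the chunk at offset 10 from the scenario
    [(11, "m11"), (12, "m12")],            # invented follow-up chunk
]
appendable = drop_logged_messages(fetched, log_end_offset=7)
print([offset for chunk in appendable for offset, _ in chunk])
```

Only the first overlapping chunk pays the deep-iteration cost; everything after
it is appended untouched, which is the point of option 2.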
All three options above are hacky. The right fix is to ensure we never corrupt
the logs. We can tolerate data loss but should not compromise consistency. For
0.8, the simplest fix would be option 3.
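
As a rough illustration of option 3, the Python sketch below empties the active
segment on recovery and reports where refetching should start. The `Segment`
class, the function name, and the segment boundary (a roll at offset 5) are all
assumptions made for the example; the issue does not say where the segments
split. The sketch also assumes the duplicated offsets are confined to the
active segment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical log segment: a base offset plus (offset, message) pairs."""
    base_offset: int
    messages: List[Tuple[int, str]] = field(default_factory=list)


def truncate_active_segment(segments: List[Segment]) -> int:
    """Empty the active (last) segment and return the offset to refetch from."""
    active = segments[-1]
    active.messages.clear()
    return active.base_offset


segments = [
    Segment(1, [(1, "m1"), (2, "m2"), (3, "m3"), (4, "m4")]),
    # Active segment holding the old m5,m6 followed by the whole leader chunk.
    Segment(5, [(5, "m5"), (6, "m6")] + [(n, f"m{n}") for n in range(5, 11)]),
]
refetch_from = truncate_active_segment(segments)
print(refetch_from, segments[-1].messages)
```

The older segment is left alone, and everything from the active segment's base
offset onward is fetched again from the leader, which is where the possible
data loss comes from.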