[
https://issues.apache.org/jira/browse/KAFKA-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Swapnil Ghike updated KAFKA-757:
--------------------------------
Attachment: kafka-757-v1.patch
There are two parts to this patch:
A. Move the sanity check that detects corrupt index files from the OffsetIndex
constructor to the Log constructor, below the recovery logic. After a hard
kill, checking for corrupt index files before the last segment has been
recovered will fail the require() assertion.
B. The following corner case is possible:
1. A broker rolled a new log segment file and an index file of non-zero size,
and was hard killed before any appends to the index file were flushed.
2. When the broker reboots and tries to load existing log segments, it will
encounter this index file that has non-zero size, but has no data.
3. Since the broker was hard killed, it will enter the recovery logic in
Log.loadLogSegments().
4. The recovery logic will try to truncate the index file to the base offset of
the segment, so it will look up indexSlotFor(baseOffset). indexSlotFor() will
return a non-zero value, because relativeOffset(idx, mid) == relOffset == 0.
5. This will set the size of the index file to a non-zero value (which will be
half of its original size which was maxIndexSize * 8).
6. Thus, the require() check for a corrupted index file in the Log constructor
will not pass since we have #entries == size != 0 && lastOffset == baseOffset.
The solution is to modify indexSlotFor() such that it returns -1 for a non-zero
sized index file whose lastOffset is 0 (assuming that setLength() will set
empty bytes to 0), so that the index file is truncated to #entries == size ==
0.
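
To make the corner case in B easier to follow, here is a small toy model in
Python. It is not the Kafka code and not the patch itself. The class
ToyOffsetIndex, its method names, the exact form of the sanity check and the
truncation rule are all assumptions made for illustration, and "lastOffset is
0" is read here as "the last stored relative offset is 0".

```python
class ToyOffsetIndex:
    """Toy stand-in for an offset index; only relative offsets are modelled."""

    def __init__(self, base_offset, relative_offsets):
        self.base_offset = base_offset
        # Every slot present in the file counts as an entry on load, including
        # zero-filled slots that were preallocated but never written.
        self.rel = list(relative_offsets)

    @property
    def entries(self):
        return len(self.rel)

    @property
    def last_offset(self):
        return self.base_offset + (self.rel[-1] if self.rel else 0)

    def is_sane(self):
        # Assumed form of the require() check: a non-empty index must end
        # beyond the base offset of its segment.
        return self.entries == 0 or self.last_offset > self.base_offset

    def slot_for(self, target_offset, treat_zeroed_as_empty):
        rel_offset = target_offset - self.base_offset
        if self.entries == 0:
            return -1
        # Proposed behaviour (B): a non-empty index whose last relative
        # offset is 0 is treated as holding no data.
        if treat_zeroed_as_empty and self.rel[-1] == 0:
            return -1
        if self.rel[0] > rel_offset:
            return -1
        # Binary search for the largest entry <= rel_offset.
        lo, hi = 0, self.entries - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2  # round up so that lo always advances
            found = self.rel[mid]
            if found == rel_offset:
                return mid
            if found < rel_offset:
                lo = mid
            else:
                hi = mid - 1
        return lo

    def truncate_to(self, offset, treat_zeroed_as_empty):
        # Assumed truncation rule: drop the entry for `offset` itself and
        # everything after it.
        slot = self.slot_for(offset, treat_zeroed_as_empty)
        if slot < 0:
            new_entries = 0
        elif self.rel[slot] == offset - self.base_offset:
            new_entries = slot
        else:
            new_entries = slot + 1
        del self.rel[new_entries:]


def recover_empty_segment(slots, with_fix):
    """Model steps 1-6: a zero-filled index truncated to its base offset."""
    index = ToyOffsetIndex(base_offset=100, relative_offsets=[0] * slots)
    index.truncate_to(index.base_offset, with_fix)
    return index.entries, index.is_sane()
```

With 8 zero-filled slots, recover_empty_segment(8, with_fix=False) leaves 4
entries and fails the check, which mirrors steps 5 and 6.
recover_empty_segment(8, with_fix=True) leaves 0 entries and passes.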
Testing done:
1. Unit tests passed.
2. Set the flush interval and index append interval to very low values.
Produce data using the console producer (the index file will have flushed
entries), hard kill the broker, and restart it. Without A the restart should
hit the exception; with A it should succeed. Then stop the broker with Ctrl+C.
3. Clean up the kafka-logs directory, but not the ZooKeeper data. Restart the
broker (to create empty log and index files for the topics created in 2 above);
it will boot up. Hard kill it. Restart the broker again: it should fail without
B and boot successfully with B.
> System Test Hard Failure cases : "Fatal error during KafkaServerStable
> startup" when hard-failed broker is re-started
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-757
> URL: https://issues.apache.org/jira/browse/KAFKA-757
> Project: Kafka
> Issue Type: Bug
> Reporter: John Fung
> Assignee: Swapnil Ghike
> Priority: Blocker
> Labels: 0.8, replication-testing
> Attachments: kafka-757-v1.patch
>
>