[
https://issues.apache.org/jira/browse/KAFKA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808133#comment-13808133
]
Jay Kreps commented on KAFKA-1106:
----------------------------------
Do you have the highwatermark checkpoint file that caused this? Your patch
makes things more tolerant of errors but I guess the question is how we got
into that state...
> HighwaterMarkCheckpoint failure putting broker into a bad state
> ---------------------------------------------------------------
>
> Key: KAFKA-1106
> URL: https://issues.apache.org/jira/browse/KAFKA-1106
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8
> Reporter: David Lao
> Attachments: KAFKA-1106-patch, kafka.log
>
>
> I'm encountering a case where a broker gets stuck because
> HighwaterMarkCheckpoint fails to recover from reading what appear to be
> corrupted ISR entries. Once in this state, leader election can never succeed,
> which stalls the entire cluster.
> Please see the detailed stack trace from the attached log. Perhaps failing
> fast when HighwaterMarkCheckpoint fails to read would force the broker to
> restart and recover.