[ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312603#comment-14312603
 ] 

Manikumar Reddy commented on KAFKA-1758:
----------------------------------------

Attaching a patch that handles NumberFormatException while reading the recovery 
checkpoint file. We still fail on other exceptions, such as IOException. On 
NumberFormatException, we set the last recovery point to zero.
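
For readers without the attachment, here is a minimal sketch of the behavior 
described above. It is not the code in KAFKA-1758.patch. It is written in Java 
for illustration, and the class name, the method names, and the assumed file 
layout (a version line, an entry-count line, then one "topic partition offset" 
entry per line) are all assumptions of this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; every name here is hypothetical.
final class RecoveryCheckpointSketch {

    // Reads recovery points keyed by "topic-partition".
    // Assumed layout: line 1 = version, line 2 = entry count,
    // then one "topic partition offset" entry per line.
    static Map<String, Long> read(Path file) throws IOException {
        // An IOException from the read itself is not caught: startup still fails.
        List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
        if (lines.isEmpty()) {
            // Choice made for this sketch: an empty file records nothing.
            return Collections.emptyMap();
        }
        try {
            Integer.parseInt(lines.get(0)); // version; throws on a line of NUL bytes
            if (lines.size() < 2) {
                throw new IOException("Truncated checkpoint file: " + file);
            }
            int expected = Integer.parseInt(lines.get(1));
            Map<String, Long> points = new HashMap<>();
            for (String line : lines.subList(2, lines.size())) {
                String[] fields = line.split("\\s+");
                if (fields.length != 3) {
                    throw new IOException("Malformed checkpoint entry: " + line);
                }
                int partition = Integer.parseInt(fields[1]);
                points.put(fields[0] + "-" + partition, Long.parseLong(fields[2]));
            }
            if (points.size() != expected) {
                throw new IOException("Expected " + expected + " entries, found " + points.size());
            }
            return points;
        } catch (NumberFormatException e) {
            // Unparseable content: drop every recovery point so that
            // lookups fall back to zero instead of aborting startup.
            return Collections.emptyMap();
        }
    }

    // A partition with no recorded recovery point starts from offset zero.
    static long recoveryPointFor(Map<String, Long> points, String topicPartition) {
        return points.getOrDefault(topicPartition, 0L);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("recovery-point-offset-checkpoint", ".tmp");
        Files.write(file, new byte[64]); // all zero bytes, like the reported file
        Map<String, Long> points = read(file);
        System.out.println(recoveryPointFor(points, "my-topic-0"));
    }
}
```

In this sketch a file of zero bytes yields an empty map, so every partition 
falls back to a recovery point of zero, while an unreadable or structurally 
broken file still raises IOException.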

> corrupt recovery file prevents startup
> --------------------------------------
>
>                 Key: KAFKA-1758
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1758
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>            Reporter: Jason Rosenberg
>            Assignee: Manikumar Reddy
>              Labels: newbie
>             Fix For: 0.9.0
>
>         Attachments: KAFKA-1758.patch
>
>
> Hi,
> We recently had a Kafka node go down suddenly. When it came back up, it 
> apparently had a corrupt recovery file, and refused to start up:
> {code}
> 2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up KafkaServer
> java.lang.NumberFormatException: For input string: 
> "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:481)
>         at java.lang.Integer.parseInt(Integer.java:527)
>         at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
>         at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
>         at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
>         at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
>         at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>         at kafka.log.LogManager.loadLogs(LogManager.scala:105)
>         at kafka.log.LogManager.<init>(LogManager.scala:57)
>         at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
> {code}
> And the app is under a monitor (so it was repeatedly restarting and failing 
> with this error for several minutes before we got to it)…
> We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it 
> then restarted cleanly (but of course re-synced all its data from replicas, 
> so we had no data loss).
> Anyway, I’m wondering if that’s the expected behavior? Or should it instead 
> declare the file corrupt and then proceed automatically to an unclean restart?
> Should this NumberFormatException be handled a bit more gracefully?
> We saved the corrupt file if it’s worth inspecting (although I doubt it will 
> be useful!)….
> The corrupt files appeared to be all zeroes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
