[ https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312603#comment-14312603 ]
Manikumar Reddy commented on KAFKA-1758:
----------------------------------------

Attaching a patch that handles NumberFormatException while reading the recovery checkpoint file. We still fail on other IOExceptions. On a NumberFormatException, we set the last recovery point to zero.

> corrupt recovery file prevents startup
> --------------------------------------
>
>                 Key: KAFKA-1758
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1758
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>            Reporter: Jason Rosenberg
>            Assignee: Manikumar Reddy
>              Labels: newbie
>             Fix For: 0.9.0
>
>         Attachments: KAFKA-1758.patch
>
>
> Hi,
> We recently had a kafka node go down suddenly. When it came back up, it
> apparently had a corrupt recovery file, and refused to start up:
> {code}
> 2014-11-06 08:17:19,299 WARN [main] server.KafkaServer - Error starting up KafkaServer
> java.lang.NumberFormatException: For input string:
> "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:481)
>         at java.lang.Integer.parseInt(Integer.java:527)
>         at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
>         at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
>         at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
>         at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
>         at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>         at kafka.log.LogManager.loadLogs(LogManager.scala:105)
>         at kafka.log.LogManager.<init>(LogManager.scala:57)
>         at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
> {code}
> And the app is under a monitor (so it was repeatedly restarting and failing
> with this error for several minutes before we got to it)…
> We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it
> then restarted cleanly (but of course re-synced all its data from replicas,
> so we had no data loss).
> Anyway, I’m wondering if that’s the expected behavior? Or should it not
> declare it corrupt and then proceed automatically to an unclean restart?
> Should this NumberFormatException be handled a bit more gracefully?
> We saved the corrupt file if it’s worth inspecting (although I doubt it will
> be useful!)….
> The corrupt files appeared to be all zeroes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
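
For readers without the attachment, the behaviour described in the comment (treat a NumberFormatException as a corrupt checkpoint and fall back to recovery point zero, but keep failing on other IOExceptions) might look roughly like the sketch below. This is not the code in KAFKA-1758.patch. The class and method names (`CheckpointSketch`, `parse`, `readOrReset`, `recoveryPoint`) are hypothetical, and the file layout (a version line, an entry-count line, then one `topic partition offset` line per entry) is an assumption made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the described behaviour; not the actual patch.
class CheckpointSketch {

    // Assumed layout: line 0 = version, line 1 = entry count,
    // remaining lines = "topic partition offset".
    // Malformed numbers raise NumberFormatException; structural problems raise IOException.
    static Map<String, Long> parse(List<String> lines) throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        if (lines.isEmpty()) {
            return offsets; // empty file: nothing has been checkpointed
        }
        int version = Integer.parseInt(lines.get(0));
        if (version != 0 || lines.size() < 2) {
            throw new IOException("Unrecognized or truncated checkpoint file");
        }
        int expected = Integer.parseInt(lines.get(1));
        for (String line : lines.subList(2, lines.size())) {
            String[] parts = line.split("\\s+");
            if (parts.length != 3) {
                throw new IOException("Malformed checkpoint line: " + line);
            }
            String key = parts[0] + "-" + Integer.parseInt(parts[1]);
            offsets.put(key, Long.parseLong(parts[2]));
        }
        if (offsets.size() != expected) {
            throw new IOException("Expected " + expected + " entries, found " + offsets.size());
        }
        return offsets;
    }

    // Reads the checkpoint file. A NumberFormatException is treated as corruption and
    // yields an empty map, so every recovery point defaults to 0. IOExceptions propagate.
    static Map<String, Long> readOrReset(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        try {
            return parse(lines);
        } catch (NumberFormatException e) {
            System.err.println("Corrupt checkpoint file " + file + "; resetting recovery points to 0");
            return new HashMap<>();
        }
    }

    // Recovery point for a "topic-partition" key, or 0 if none was checkpointed.
    static long recoveryPoint(Map<String, Long> offsets, String topicPartition) {
        return offsets.getOrDefault(topicPartition, 0L);
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempFile("recovery-point-offset-checkpoint", ".good");
        Files.write(good, "0\n1\nmytopic 0 42\n".getBytes(StandardCharsets.UTF_8));

        // Simulates the all-zero file from the report.
        Path bad = Files.createTempFile("recovery-point-offset-checkpoint", ".bad");
        Files.write(bad, new byte[256]);

        System.out.println(recoveryPoint(readOrReset(good), "mytopic-0"));
        System.out.println(recoveryPoint(readOrReset(bad), "mytopic-0"));
    }
}
```

Resetting to zero trades a longer log recovery on startup for a broker that comes up without manual intervention, which is roughly what moving the file out of the way achieved in the report.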