[ 
https://issues.apache.org/jira/browse/KAFKA-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131252#comment-16131252
 ] 

Jason Gustafson commented on KAFKA-5747:
----------------------------------------

Changed the affected version to 0.11.0.0 only since this producer snapshot file 
did not exist prior to this release.

> Broker crashes on startup when trying to parse empty snapshot files
> -------------------------------------------------------------------
>
>                 Key: KAFKA-5747
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5747
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0
>            Reporter: Lukasz Mierzwa
>            Assignee: Jason Gustafson
>             Fix For: 0.11.0.1
>
>
> A broker server crash can sometime result in empty snapshot files on disk 
> (depending on FS, barrier setting etc), when Kafka tries to parse such files 
> it crashes, gets restarted and crashes again, this happens until you remove 
> empty snapshot files with:
> {noformat}
> find /logs/dir -name \*.snapshot -size 0 -delete
> {noformat}
> Log:
> {noformat}
> Aug 15 22:52:11 localhost kafka[23681]: INFO Recovering unflushed segment 0 
> in log __consumer_offsets-16. (kafka.log.Log)
> Aug 15 22:52:11 localhost kafka[23681]: INFO Loading producer state from 
> offset 1207 for partition __consumer_offsets-16 with message format version 0 
> (kafka.log.Log)
> Aug 15 22:52:11 localhost kafka[23681]: INFO Completed load of log 
> __consumer_offsets-16 with 1 log segments, log start offset 0 and log end 
> offset 1207 in 15 ms (kafka.log.Log)
> Aug 15 22:52:11 localhost kafka[23681]: WARN Found a corrupted index file due 
> to requirement failed: Corrupt index found, index file 
> (/disk/data/kafka/mycluster/mytopic-64/00000000300519800823.index) has 
> non-zero size but the last offset is 300519800823 which is no larger than the 
> base offset 300519800823.}. deleting 
> /disk/data/kafka/mycluster/mytopic-64/00000000300519800823.timeindex, 
> /disk/data/kafka/mycluster/mytopic-64/00000000300519800823.index, and 
> /disk/data/kafka/mycluster/mytopic-64/00000000300519800823.txnindex and 
> rebuilding index... (kafka.log.Log)
> Aug 15 22:52:11 localhost kafka[23681]: INFO Loading producer state from 
> snapshot file 00000000300519800823.snapshot for partition mytopic-64 
> (kafka.log.ProducerStateManager)
> Aug 15 22:52:11 localhost kafka[23681]: ERROR There was an error in one of 
> the threads during logs loading: 
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
> 'version': java.nio.BufferUnderflowException (kafka.log.LogManager)
> Aug 15 22:52:11 localhost kafka[23681]: FATAL [Kafka Server 10139], Fatal 
> error during KafkaServer startup. Prepare to shutdown 
> (kafka.server.KafkaServer)
> Aug 15 22:52:11 localhost kafka[23681]: 
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
> 'version': java.nio.BufferUnderflowException
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> org.apache.kafka.common.protocol.types.Schema.read(Schema.java:76)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.ProducerStateManager$.readSnapshot(ProducerStateManager.scala:289)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.ProducerStateManager.loadFromSnapshot(ProducerStateManager.scala:440)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.ProducerStateManager.truncateAndReload(ProducerStateManager.scala:499)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.Log.recoverSegment(Log.scala:327)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.Log.$anonfun$loadSegmentFiles$3(Log.scala:314)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:789)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:788)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.Log.loadSegmentFiles(Log.scala:272)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.Log.loadSegments(Log.scala:376)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.Log.<init>(Log.scala:179)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.Log$.apply(Log.scala:1581)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:172)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Aug 15 22:52:11 localhost kafka[23681]:         at 
> java.lang.Thread.run(Thread.java:748)
> Aug 15 22:52:11 localhost kafka[23681]: INFO [Kafka Server 10139], shutting 
> down (kafka.server.KafkaServer)
> Aug 15 22:52:11 localhost kafka[23681]: INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread)
> Aug 15 22:52:11 localhost kafka[23681]: INFO Session: 0x15de80e0a37004c 
> closed (org.apache.zookeeper.ZooKeeper)
> Aug 15 22:52:11 localhost kafka[23681]: INFO EventThread shut down for 
> session: 0x15de80e0a37004c (org.apache.zookeeper.ClientCnxn)
> Aug 15 22:52:11 localhost kafka[23681]: INFO [Kafka Server 10139], shut down 
> completed (kafka.server.KafkaServer)
> Aug 15 22:52:11 localhost kafka[23681]: FATAL Exiting Kafka. 
> (kafka.server.KafkaServerStartable)
> Aug 15 22:52:11 localhost kafka[23681]: INFO [Kafka Server 10139], shutting 
> down (kafka.server.KafkaServer)
> Aug 15 22:52:12 localhost systemd[1]: kafka.service: Main process exited, 
> code=exited, status=1/FAILURE
> Aug 15 22:52:12 localhost systemd[1]: kafka.service: Unit entered failed 
> state.
> Aug 15 22:52:12 localhost systemd[1]: kafka.service: Failed with result 
> 'exit-code'.
> {noformat}
> Happens with 0.10 & 0.11



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to