[ https://issues.apache.org/jira/browse/KAFKA-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873734#comment-16873734 ]
songyingshuan commented on KAFKA-8604: -------------------------------------- The exceptions was thrown from here: {code:scala} def delete() { val deletedLog = log.delete() val deletedIndex = index.delete() val deletedTimeIndex = timeIndex.delete() val deletedTxnIndex = txnIndex.delete() if (!deletedLog && log.file.exists) // from here throw new IOException("Delete of log " + log.file.getName + " failed.") if (!deletedIndex && index.file.exists) throw new IOException("Delete of index " + index.file.getName + " failed.") if (!deletedTimeIndex && timeIndex.file.exists) throw new IOException("Delete of time index " + timeIndex.file.getName + " failed.") if (!deletedTxnIndex && txnIndex.file.exists) throw new IOException("Delete of transaction index " + txnIndex.file.getName + " failed.") } {code} it means deletedLog == false && log.file.exists == true,that is the log file to be deleted failed to delete and the file is still in the disk. but we have checked the partition file dir and the file did not exist. So confused !! > kafka log dir was marked as offline because of deleting segments of > __consumer_offsets failed > --------------------------------------------------------------------------------------------- > > Key: KAFKA-8604 > URL: https://issues.apache.org/jira/browse/KAFKA-8604 > Project: Kafka > Issue Type: Bug > Components: log cleaner > Affects Versions: 1.0.1 > Reporter: songyingshuan > Priority: Major > Attachments: error-logs.log > > > We encountered a problem in our product env without any foresight. When kafka > broker trying to clean __consumer_offsets-38 (and only happents to this > partition), the log shows > it failed, and marking the whole disk/log dir offline, and this leads to a > negative impact on some normal partitions (because of the ISR list of those > partitions decrease). > we had to restart the broker server to reuse the disk/dir which was marked as > offline. BUT!! this problem occurs periodically with the same reason so we > have to restart broker periodically. > we read some source code of kafka-1.0.1, but cannot make sure why this > happends. And The cluster status had been good until this problem suddenly > attacked us. > the error log is something like this : > > {code:java} > 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000012855596978.timeindex.deleted > 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while > deleting segments for __consumer_offsets-38 in dir /data6/kafka/data > java.io.IOException: Delete of log 00000000000000000000.log.deleted failed. > at kafka.log.LogSegment.delete(LogSegment.scala:496) > at > kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log.maybeHandleIOException(Log.scala:1669) > at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595) > at > kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599) > at > kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-06-25 00:11:26,265 ERROR kafka.utils.KafkaScheduler: Uncaught exception > in scheduled task 'delete-file' > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting > segments for __consumer_offsets-38 in dir /data6/kafka/data > Caused by: java.io.IOException: Delete of log > 00000000000000000000.log.deleted failed. > at kafka.log.LogSegment.delete(LogSegment.scala:496) > at > kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log.maybeHandleIOException(Log.scala:1669) > at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595) > at > kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599) > at > kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-06-25 00:11:26,268 INFO kafka.server.ReplicaManager: [ReplicaManager > broker=1001] Stopping serving replicas in dir /data6/kafka/data > {code} > and we also find that the cleaning of __consumer_offsets-38 is so frequently > that almost 85% of output log is related. something like this : > > {code:java} > 2019-06-25 20:29:03,222 INFO kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000008457474982.index.deleted > 2019-06-25 20:29:03,222 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000008457474982.timeindex.deleted > 2019-06-25 20:29:03,295 INFO kafka.log.Log: Deleting segment 10514999959 > from log __consumer_offsets-38. 2019-06-25 20:29:03,298 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000010514999959.index.deleted > 2019-06-25 20:29:03,298 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000010514999959.timeindex.deleted > 2019-06-25 20:29:03,380 INFO kafka.log.Log: Deleting segment 11374174390 > from log __consumer_offsets-38. 2019-06-25 20:29:03,382 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000011374174390.index.deleted > 2019-06-25 20:29:03,382 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000011374174390.timeindex.deleted > 2019-06-25 20:29:03,455 INFO kafka.log.Log: Deleting segment 13507582773 > from log __consumer_offsets-38. 2019-06-25 20:29:03,461 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000013507582773.index.deleted > 2019-06-25 20:29:03,461 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000013507582773.timeindex.deleted > 2019-06-25 20:29:03,531 INFO kafka.log.Log: Deleting segment 15655066421 > from log __consumer_offsets-38. 2019-06-25 20:29:03,531 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000015655066421.index.deleted > 2019-06-25 20:29:03,531 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000015655066421.timeindex.deleted > 2019-06-25 20:29:03,606 INFO kafka.log.Log: Deleting segment 17802355567 > from log __consumer_offsets-38. 2019-06-25 20:29:03,606 INFO kafka.log.Log: > Deleting segment 18177617859 from log __consumer_offsets-38. 2019-06-25 > 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000017802355567.index.deleted > 2019-06-25 20:29:03,606 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000017802355567.timeindex.deleted > 2019-06-25 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000018177617859.index.deleted > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)