[ https://issues.apache.org/jira/browse/KAFKA-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885316#comment-16885316 ]
Alex Bertzeletos commented on KAFKA-8604: ----------------------------------------- [~ymxz] So, in your case, setting the log.retention.check.interval.ms=3000 helped as a workaround? > kafka log dir was marked as offline because of deleting segments of > __consumer_offsets failed > --------------------------------------------------------------------------------------------- > > Key: KAFKA-8604 > URL: https://issues.apache.org/jira/browse/KAFKA-8604 > Project: Kafka > Issue Type: Bug > Components: log cleaner > Affects Versions: 1.0.1 > Reporter: songyingshuan > Priority: Major > Attachments: error-logs.log > > > We encountered a problem in our product env without any foresight. When kafka > broker trying to clean __consumer_offsets-38 (and only happents to this > partition), the log shows > it failed, and marking the whole disk/log dir offline, and this leads to a > negative impact on some normal partitions (because of the ISR list of those > partitions decrease). > we had to restart the broker server to reuse the disk/dir which was marked as > offline. BUT!! this problem occurs periodically with the same reason so we > have to restart broker periodically. > we read some source code of kafka-1.0.1, but cannot make sure why this > happends. And The cluster status had been good until this problem suddenly > attacked us. > the error log is something like this : > > {code:java} > 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000012855596978.timeindex.deleted > 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while > deleting segments for __consumer_offsets-38 in dir /data6/kafka/data > java.io.IOException: Delete of log 00000000000000000000.log.deleted failed. > at kafka.log.LogSegment.delete(LogSegment.scala:496) > at > kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log.maybeHandleIOException(Log.scala:1669) > at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595) > at > kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599) > at > kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-06-25 00:11:26,265 ERROR kafka.utils.KafkaScheduler: Uncaught exception > in scheduled task 'delete-file' > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting > segments for __consumer_offsets-38 in dir /data6/kafka/data > Caused by: java.io.IOException: Delete of log > 00000000000000000000.log.deleted failed. > at kafka.log.LogSegment.delete(LogSegment.scala:496) > at > kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596) > at kafka.log.Log.maybeHandleIOException(Log.scala:1669) > at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595) > at > kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599) > at > kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-06-25 00:11:26,268 INFO kafka.server.ReplicaManager: [ReplicaManager > broker=1001] Stopping serving replicas in dir /data6/kafka/data > {code} > and we also find that the cleaning of __consumer_offsets-38 is so frequently > that almost 85% of output log is related. something like this : > > {code:java} > 2019-06-25 20:29:03,222 INFO kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000008457474982.index.deleted > 2019-06-25 20:29:03,222 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000008457474982.timeindex.deleted > 2019-06-25 20:29:03,295 INFO kafka.log.Log: Deleting segment 10514999959 > from log __consumer_offsets-38. 2019-06-25 20:29:03,298 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000010514999959.index.deleted > 2019-06-25 20:29:03,298 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000010514999959.timeindex.deleted > 2019-06-25 20:29:03,380 INFO kafka.log.Log: Deleting segment 11374174390 > from log __consumer_offsets-38. 2019-06-25 20:29:03,382 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000011374174390.index.deleted > 2019-06-25 20:29:03,382 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000011374174390.timeindex.deleted > 2019-06-25 20:29:03,455 INFO kafka.log.Log: Deleting segment 13507582773 > from log __consumer_offsets-38. 2019-06-25 20:29:03,461 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000013507582773.index.deleted > 2019-06-25 20:29:03,461 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000013507582773.timeindex.deleted > 2019-06-25 20:29:03,531 INFO kafka.log.Log: Deleting segment 15655066421 > from log __consumer_offsets-38. 2019-06-25 20:29:03,531 INFO > kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000015655066421.index.deleted > 2019-06-25 20:29:03,531 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000015655066421.timeindex.deleted > 2019-06-25 20:29:03,606 INFO kafka.log.Log: Deleting segment 17802355567 > from log __consumer_offsets-38. 2019-06-25 20:29:03,606 INFO kafka.log.Log: > Deleting segment 18177617859 from log __consumer_offsets-38. 2019-06-25 > 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000017802355567.index.deleted > 2019-06-25 20:29:03,606 INFO kafka.log.TimeIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000017802355567.timeindex.deleted > 2019-06-25 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index > /data6/kafka/data/__consumer_offsets-38/00000000018177617859.index.deleted > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)