[ https://issues.apache.org/jira/browse/KAFKA-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phil Mikhailov updated KAFKA-7074: ---------------------------------- Description: We have one of standard options of deploying Kafka-based EventSourcing microservice - deployment with reset. It includes running {{kafka-reset-tool}} before deploying and running microservice which should reset offsets for input topics and delete intermediate topics to allow microservice rebuild its internal state. We have faced a problem several times (its not 100% reproducible) that {{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing after such deployment. See detailed log: {code} Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] INFO The cleaning for partition reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted Cleaning happended: [2018-06-14 18:25:56,761] INFO Deleting index /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex (kafka.log.TimeIndex) Compaction failed 'cause it can't find file: [2018-06-14 18:25:57,402] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner) kafka.common.KafkaStorageException: Failed to change the timeindex file suffix from to .deleted for log segment 0 at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340) at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350) at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981) at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027) at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022) at scala.collection.immutable.List.foreach(List.scala:381) at kafka.log.Log.replaceSegments(Log.scala:1022) at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426) at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363) at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362) at scala.collection.immutable.List.foreach(List.scala:381) at kafka.log.Cleaner.clean(LogCleaner.scala:362) at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241) at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) Caused by: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409) at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) at java.nio.file.Files.move(Files.java:1395) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711) at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126) ... 14 more Suppressed: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex -> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396) at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) at java.nio.file.Files.move(Files.java:1395) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708) ... 15 more {code} It looks like disabling compaction while deleting log doesn't work. After that compaction stops working and space consumption grows and there is only one solution to this problem is to restart broker. was: We have one of standard options of deploying Kafka-based EventSourcing microservice - deployment with reset. It includes running {{kafka-reset-tool}} before deploying and running microservice which should reset offsets for input topics and delete intermediate topics to allow microservice rebuild its internal state. We have faced a problem several times (its not 100% reproducible) that {{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing after such deployment. See detailed log: {code} Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] INFO The cleaning for partition reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted Cleaning happended: [2018-06-14 18:25:56,761] INFO Deleting index /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex (kafka.log.TimeIndex) Compaction failed 'cause it can't find file: [2018-06-14 18:25:57,402] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner) kafka.common.KafkaStorageException: Failed to change the timeindex file suffix from to .deleted for log segment 0 at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340) at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350) at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981) at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027) at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022) at scala.collection.immutable.List.foreach(List.scala:381) at kafka.log.Log.replaceSegments(Log.scala:1022) at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426) at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363) at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362) at scala.collection.immutable.List.foreach(List.scala:381) at kafka.log.Cleaner.clean(LogCleaner.scala:362) at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241) at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) Caused by: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409) at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) at java.nio.file.Files.move(Files.java:1395) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711) at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126) ... 14 more Suppressed: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex -> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396) at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) at java.nio.file.Files.move(Files.java:1395) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708) ... 15 more {code} It looks like disabling compaction while deleting log doesn't work. > Race condition between Kafka log deletion and compaction > -------------------------------------------------------- > > Key: KAFKA-7074 > URL: https://issues.apache.org/jira/browse/KAFKA-7074 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 0.10.2.1 > Reporter: Phil Mikhailov > Priority: Major > > We have one of standard options of deploying Kafka-based EventSourcing > microservice - deployment with reset. It includes running > {{kafka-reset-tool}} before deploying and running microservice which should > reset offsets for input topics and delete intermediate topics to allow > microservice rebuild its internal state. > We have faced a problem several times (its not 100% reproducible) that > {{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing > after such deployment. See detailed log: > {code} > Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] > INFO The cleaning for partition > reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted > Cleaning happended: [2018-06-14 18:25:56,761] > INFO Deleting index > /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex > (kafka.log.TimeIndex) > Compaction failed 'cause it can't find file: [2018-06-14 18:25:57,402] > ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner) > > kafka.common.KafkaStorageException: Failed to change the timeindex file > suffix from to .deleted for log segment 0 > at > kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340) > at > kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350) > at > kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981) > at > kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027) > at > kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022) > at > scala.collection.immutable.List.foreach(List.scala:381) > at > kafka.log.Log.replaceSegments(Log.scala:1022) > at > kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426) > at > kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363) > at > kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362) > at > scala.collection.immutable.List.foreach(List.scala:381) > at > kafka.log.Cleaner.clean(LogCleaner.scala:362) > at > kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241) > at > kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220) > at > kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) > Caused by: > java.nio.file.NoSuchFileException: > /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409) > at > sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) > at > java.nio.file.Files.move(Files.java:1395) > at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711) > at > kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126) > ... 14 more > Suppressed: > java.nio.file.NoSuchFileException: > /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex > -> > > /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at > sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396) > at > sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) > at > java.nio.file.Files.move(Files.java:1395) > at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708) > ... 15 > more > {code} > It looks like disabling compaction while deleting log doesn't work. > After that compaction stops working and space consumption grows and there is > only one solution to this problem is to restart broker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)