[ https://issues.apache.org/jira/browse/KAFKA-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Mikhailov updated KAFKA-7074:
----------------------------------
    Description:
One of our standard options for deploying a Kafka-based EventSourcing microservice is deployment with reset. It involves running {{kafka-reset-tool}} before deploying and starting the microservice. The tool should reset offsets for input topics and delete intermediate topics so that the microservice can rebuild its internal state.

Several times (it is not 100% reproducible) we have hit a problem where {{LogCleaner}} crashed with {{NoSuchFileException}} during the compaction routine after such a deployment. See the detailed log:
{code}
Cleaning mechanism disables possible compaction:
[2018-06-14 18:25:01,464] INFO The cleaning for partition reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted

Cleaning happened:
[2018-06-14 18:25:56,761] INFO Deleting index /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex (kafka.log.TimeIndex)

Compaction failed because it can't find the file:
[2018-06-14 18:25:57,402] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
kafka.common.KafkaStorageException: Failed to change the timeindex file suffix from to .deleted for log segment 0
    at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
    at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
    at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
    at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at kafka.log.Log.replaceSegments(Log.scala:1022)
    at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at kafka.log.Cleaner.clean(LogCleaner.scala:362)
    at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
    at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
    ... 14 more
    Suppressed: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex -> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
        ... 15 more
{code}
It looks like disabling compaction while deleting the log doesn't work.

  was:
One of our standard options for deploying a Kafka-based EventSourcing microservice is deployment with reset. It involves running {{kafka-reset-tool}} before deploying and starting the microservice. The tool should reset offsets for input topics and delete intermediate topics so that the microservice can rebuild its internal state.

Several times (it is not 100% reproducible) we have hit a problem where {{LogCleaner}} crashed with {{NoSuchFileException}} during the compaction routine after such a deployment. See the detailed log:
{code}
Cleaning mechanism disables possible compaction:
[2018-06-14 18:25:01,464] INFO The cleaning for partition reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted

Cleaning happened:
[2018-06-14 18:25:56,761] INFO Deleting index /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex (kafka.log.TimeIndex)

Compaction failed because it can't find the file:
[2018-06-14 18:25:57,402] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
kafka.common.KafkaStorageException: Failed to change the timeindex file suffix from to .deleted for log segment 0
    at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
    at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
    at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
    at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at kafka.log.Log.replaceSegments(Log.scala:1022)
    at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at kafka.log.Cleaner.clean(LogCleaner.scala:362)
    at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
    at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
    ... 14 more
    Suppressed: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex -> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
        ... 15 more
{code}
It looks like that disabling compaction while deleting the log doesn't work.

> Race condition between Kafka log deletion and compaction
> --------------------------------------------------------
>
>                 Key: KAFKA-7074
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7074
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1
>            Reporter: Phil Mikhailov
>            Priority: Major
>
> One of our standard options for deploying a Kafka-based EventSourcing
> microservice is deployment with reset. It involves running
> {{kafka-reset-tool}} before deploying and starting the microservice. The tool
> should reset offsets for input topics and delete intermediate topics so that
> the microservice can rebuild its internal state.
>
> Several times (it is not 100% reproducible) we have hit a problem where
> {{LogCleaner}} crashed with {{NoSuchFileException}} during the compaction
> routine after such a deployment. See the detailed log:
> {code}
> Cleaning mechanism disables possible compaction:
> [2018-06-14 18:25:01,464] INFO The cleaning for partition reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted
>
> Cleaning happened:
> [2018-06-14 18:25:56,761] INFO Deleting index /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex (kafka.log.TimeIndex)
>
> Compaction failed because it can't find the file:
> [2018-06-14 18:25:57,402] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
> kafka.common.KafkaStorageException: Failed to change the timeindex file suffix from to .deleted for log segment 0
>     at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
>     at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
>     at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
>     at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
>     at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
>     at scala.collection.immutable.List.foreach(List.scala:381)
>     at kafka.log.Log.replaceSegments(Log.scala:1022)
>     at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
>     at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
>     at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
>     at scala.collection.immutable.List.foreach(List.scala:381)
>     at kafka.log.Cleaner.clean(LogCleaner.scala:362)
>     at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
>     at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Caused by: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
>     at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>     at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
>     at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>     at java.nio.file.Files.move(Files.java:1395)
>     at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
>     at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
>     ... 14 more
>     Suppressed: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex -> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
>         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
>         at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>         at java.nio.file.Files.move(Files.java:1395)
>         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
>         ... 15 more
> {code}
> It looks like disabling compaction while deleting the log doesn't work.
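
The interleaving suggested by the timestamps in the log (the index file is deleted at 18:25:56 and the cleaner's rename to .deleted fails at 18:25:57) can be shown in isolation. The following is a minimal Python sketch under stated assumptions, not Kafka code: the function names and the single shared lock are hypothetical stand-ins for the deletion path and the cleaner thread, and the guard only illustrates one possible way of coordinating the two, not how Kafka handles or should handle it.

```python
import os
import tempfile
import threading

# Hypothetical guard shared by both paths; not Kafka's actual mechanism.
SEGMENT_LOCK = threading.Lock()


def delete_index(path):
    """Stand-in for the deletion path: removes the index file outright."""
    with SEGMENT_LOCK:
        if os.path.exists(path):
            os.remove(path)


def mark_deleted_unsafe(path):
    """Stand-in for the cleaner path without coordination.

    Raises FileNotFoundError if the file was already removed.
    """
    os.rename(path, path + ".deleted")


def mark_deleted_guarded(path):
    """Cleaner path that re-checks for the file under the shared lock.

    Returns True if the rename happened, False if the file was already gone.
    """
    with SEGMENT_LOCK:
        if not os.path.exists(path):
            return False
        os.rename(path, path + ".deleted")
        return True


def demo():
    """Replays the order seen in the log: delete first, then rename."""
    with tempfile.TemporaryDirectory() as data_dir:
        index = os.path.join(data_dir, "00000000000000000000.timeindex")
        open(index, "w").close()

        delete_index(index)  # deletion wins the race

        try:
            mark_deleted_unsafe(index)
            unsafe_failed = False
        except FileNotFoundError:
            unsafe_failed = True

        guarded_renamed = mark_deleted_guarded(index)
        return unsafe_failed, guarded_renamed
```

Calling demo() returns a pair: whether the uncoordinated rename raised FileNotFoundError (the Python counterpart of the NoSuchFileException in the trace), and whether the guarded rename still found the file. In this replayed order the first is True and the second is False, so the guarded path skips the missing file instead of crashing.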