[ 
https://issues.apache.org/jira/browse/KAFKA-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Mikhailov updated KAFKA-7074:
----------------------------------
    Description: 
We have one of standard options of deploying Kafka-based EventSourcing 
microservice - deployment with reset. It includes running {{kafka-reset-tool}} 
before deploying and running microservice which should reset offsets for input 
topics and delete intermediate topics to allow microservice rebuild its 
internal state. 
We have faced a problem several times (its not 100% reproducible) that 
{{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing 
after such deployment. See detailed log:
{code}
Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] INFO 
The cleaning for partition 
reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted 
Cleaning happended:                             [2018-06-14 18:25:56,761] INFO 
Deleting index 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 (kafka.log.TimeIndex)
Compaction failed 'cause it can't find file:    [2018-06-14 18:25:57,402] ERROR 
[kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
                                                    
kafka.common.KafkaStorageException: Failed to change the timeindex file suffix 
from  to .deleted for log segment 0
                                                            at 
kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
                                                            at 
kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
                                                            at 
kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Log.replaceSegments(Log.scala:1022)
                                                            at 
kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Cleaner.clean(LogCleaner.scala:362)
                                                            at 
kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
                                                            at 
kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
                                                            at 
kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
                                                    Caused by: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
                                                            at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
                                                            at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
                                                            at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                            at 
java.nio.file.Files.move(Files.java:1395)
                                                            at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
                                                            at 
kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
                                                            ... 14 more
                                                            Suppressed: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 -> 
                                                    
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
                                                                    at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                                    at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                                    at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
                                                                    at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                                    at 
java.nio.file.Files.move(Files.java:1395)
                                                                    at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
                                                                    ... 15 more

{code}

It looks like disabling compaction while deleting log doesn't work.

  was:
We have one of standard options of deploying Kafka-based EventSourcing 
microservice - deployment with reset. It includes running {{kafka-reset-tool}} 
before deploying and running microservice which should reset offsets for input 
topics and delete intermediate topics to allow microservice rebuild its 
internal state. 
We have faced a problem several times (its not 100% reproducible) that 
{{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing 
after such deployment. See detailed log:
{code}
Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] INFO 
The cleaning for partition 
reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted 
Cleaning happended:                             [2018-06-14 18:25:56,761] INFO 
Deleting index 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 (kafka.log.TimeIndex)
Compaction failed 'cause it can't find file:    [2018-06-14 18:25:57,402] ERROR 
[kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
                                                    
kafka.common.KafkaStorageException: Failed to change the timeindex file suffix 
from  to .deleted for log segment 0
                                                            at 
kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
                                                            at 
kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
                                                            at 
kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Log.replaceSegments(Log.scala:1022)
                                                            at 
kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Cleaner.clean(LogCleaner.scala:362)
                                                            at 
kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
                                                            at 
kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
                                                            at 
kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
                                                    Caused by: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
                                                            at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
                                                            at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
                                                            at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                            at 
java.nio.file.Files.move(Files.java:1395)
                                                            at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
                                                            at 
kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
                                                            ... 14 more
                                                            Suppressed: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 -> 
                                                    
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
                                                                    at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                                    at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                                    at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
                                                                    at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                                    at 
java.nio.file.Files.move(Files.java:1395)
                                                                    at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
                                                                    ... 15 more

{code}

It looks like that disabling compaction while deleting log doesn't work.


> Race condition between Kafka log deletion and compaction
> --------------------------------------------------------
>
>                 Key: KAFKA-7074
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7074
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1
>            Reporter: Phil Mikhailov
>            Priority: Major
>
> We have one of standard options of deploying Kafka-based EventSourcing 
> microservice - deployment with reset. It includes running 
> {{kafka-reset-tool}} before deploying and running microservice which should 
> reset offsets for input topics and delete intermediate topics to allow 
> microservice rebuild its internal state. 
> We have faced a problem several times (its not 100% reproducible) that 
> {{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing 
> after such deployment. See detailed log:
> {code}
> Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] 
> INFO The cleaning for partition 
> reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted 
> Cleaning happended:                             [2018-06-14 18:25:56,761] 
> INFO Deleting index 
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
>  (kafka.log.TimeIndex)
> Compaction failed 'cause it can't find file:    [2018-06-14 18:25:57,402] 
> ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
>                                                     
> kafka.common.KafkaStorageException: Failed to change the timeindex file 
> suffix from  to .deleted for log segment 0
>                                                             at 
> kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
>                                                             at 
> kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
>                                                             at 
> kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
>                                                             at 
> kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
>                                                             at 
> kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
>                                                             at 
> scala.collection.immutable.List.foreach(List.scala:381)
>                                                             at 
> kafka.log.Log.replaceSegments(Log.scala:1022)
>                                                             at 
> kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
>                                                             at 
> kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
>                                                             at 
> kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
>                                                             at 
> scala.collection.immutable.List.foreach(List.scala:381)
>                                                             at 
> kafka.log.Cleaner.clean(LogCleaner.scala:362)
>                                                             at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
>                                                             at 
> kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
>                                                             at 
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>                                                     Caused by: 
> java.nio.file.NoSuchFileException: 
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
>                                                             at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>                                                             at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>                                                             at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>                                                             at 
> sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
>                                                             at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>                                                             at 
> java.nio.file.Files.move(Files.java:1395)
>                                                             at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
>                                                             at 
> kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
>                                                             ... 14 more
>                                                             Suppressed: 
> java.nio.file.NoSuchFileException: 
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
>  -> 
>                                                     
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
>                                                                     at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>                                                                     at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>                                                                     at 
> sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
>                                                                     at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>                                                                     at 
> java.nio.file.Files.move(Files.java:1395)
>                                                                     at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
>                                                                     ... 15 
> more
> {code}
> It looks like disabling compaction while deleting log doesn't work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to