[ 
https://issues.apache.org/jira/browse/KAFKA-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Mikhailov updated KAFKA-7074:
----------------------------------
    Description: 
We have one of standard options of deploying Kafka-based EventSourcing 
microservice - deployment with reset. It includes running {{kafka-reset-tool}} 
before deploying and running microservice which should reset offsets for input 
topics and delete intermediate topics to allow microservice rebuild its 
internal state. 
We have faced a problem several times (its not 100% reproducible) that 
{{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing 
after such deployment. See detailed log:
{code}
Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] INFO 
The cleaning for partition 
reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted 
Cleaning happended:                             [2018-06-14 18:25:56,761] INFO 
Deleting index 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 (kafka.log.TimeIndex)
Compaction failed 'cause it can't find file:    [2018-06-14 18:25:57,402] ERROR 
[kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
                                                    
kafka.common.KafkaStorageException: Failed to change the timeindex file suffix 
from  to .deleted for log segment 0
                                                            at 
kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
                                                            at 
kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
                                                            at 
kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Log.replaceSegments(Log.scala:1022)
                                                            at 
kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Cleaner.clean(LogCleaner.scala:362)
                                                            at 
kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
                                                            at 
kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
                                                            at 
kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
                                                    Caused by: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
                                                            at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
                                                            at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
                                                            at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                            at 
java.nio.file.Files.move(Files.java:1395)
                                                            at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
                                                            at 
kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
                                                            ... 14 more
                                                            Suppressed: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 -> 
                                                    
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
                                                                    at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                                    at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                                    at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
                                                                    at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                                    at 
java.nio.file.Files.move(Files.java:1395)
                                                                    at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
                                                                    ... 15 more

{code}

It looks like disabling compaction while deleting log doesn't work.
After that compaction stops working and space consumption grows and there is 
only one solution to this problem is to restart broker.

  was:
We have one of standard options of deploying Kafka-based EventSourcing 
microservice - deployment with reset. It includes running {{kafka-reset-tool}} 
before deploying and running microservice which should reset offsets for input 
topics and delete intermediate topics to allow microservice rebuild its 
internal state. 
We have faced a problem several times (its not 100% reproducible) that 
{{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing 
after such deployment. See detailed log:
{code}
Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] INFO 
The cleaning for partition 
reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted 
Cleaning happended:                             [2018-06-14 18:25:56,761] INFO 
Deleting index 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 (kafka.log.TimeIndex)
Compaction failed 'cause it can't find file:    [2018-06-14 18:25:57,402] ERROR 
[kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
                                                    
kafka.common.KafkaStorageException: Failed to change the timeindex file suffix 
from  to .deleted for log segment 0
                                                            at 
kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
                                                            at 
kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
                                                            at 
kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
                                                            at 
kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Log.replaceSegments(Log.scala:1022)
                                                            at 
kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
                                                            at 
kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
                                                            at 
scala.collection.immutable.List.foreach(List.scala:381)
                                                            at 
kafka.log.Cleaner.clean(LogCleaner.scala:362)
                                                            at 
kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
                                                            at 
kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
                                                            at 
kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
                                                    Caused by: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
                                                            at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                            at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
                                                            at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
                                                            at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                            at 
java.nio.file.Files.move(Files.java:1395)
                                                            at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
                                                            at 
kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
                                                            ... 14 more
                                                            Suppressed: 
java.nio.file.NoSuchFileException: 
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
 -> 
                                                    
/var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
                                                                    at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                                    at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                                    at 
sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
                                                                    at 
sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                                    at 
java.nio.file.Files.move(Files.java:1395)
                                                                    at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
                                                                    ... 15 more

{code}

It looks like disabling compaction while deleting log doesn't work.


> Race condition between Kafka log deletion and compaction
> --------------------------------------------------------
>
>                 Key: KAFKA-7074
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7074
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1
>            Reporter: Phil Mikhailov
>            Priority: Major
>
> We have one of standard options of deploying Kafka-based EventSourcing 
> microservice - deployment with reset. It includes running 
> {{kafka-reset-tool}} before deploying and running microservice which should 
> reset offsets for input topics and delete intermediate topics to allow 
> microservice rebuild its internal state. 
> We have faced a problem several times (its not 100% reproducible) that 
> {{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routing 
> after such deployment. See detailed log:
> {code}
> Cleaning mechanism disable possible compaction: [2018-06-14 18:25:01,464] 
> INFO The cleaning for partition 
> reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted 
> Cleaning happended:                             [2018-06-14 18:25:56,761] 
> INFO Deleting index 
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
>  (kafka.log.TimeIndex)
> Compaction failed 'cause it can't find file:    [2018-06-14 18:25:57,402] 
> ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
>                                                     
> kafka.common.KafkaStorageException: Failed to change the timeindex file 
> suffix from  to .deleted for log segment 0
>                                                             at 
> kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
>                                                             at 
> kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
>                                                             at 
> kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
>                                                             at 
> kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
>                                                             at 
> kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
>                                                             at 
> scala.collection.immutable.List.foreach(List.scala:381)
>                                                             at 
> kafka.log.Log.replaceSegments(Log.scala:1022)
>                                                             at 
> kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
>                                                             at 
> kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
>                                                             at 
> kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
>                                                             at 
> scala.collection.immutable.List.foreach(List.scala:381)
>                                                             at 
> kafka.log.Cleaner.clean(LogCleaner.scala:362)
>                                                             at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
>                                                             at 
> kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
>                                                             at 
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>                                                     Caused by: 
> java.nio.file.NoSuchFileException: 
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
>                                                             at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>                                                             at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>                                                             at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>                                                             at 
> sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
>                                                             at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>                                                             at 
> java.nio.file.Files.move(Files.java:1395)
>                                                             at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
>                                                             at 
> kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
>                                                             ... 14 more
>                                                             Suppressed: 
> java.nio.file.NoSuchFileException: 
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
>  -> 
>                                                     
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
>                                                                     at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>                                                                     at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>                                                                     at 
> sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
>                                                                     at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>                                                                     at 
> java.nio.file.Files.move(Files.java:1395)
>                                                                     at 
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
>                                                                     ... 15 
> more
> {code}
> It looks like disabling compaction while deleting log doesn't work.
> After that compaction stops working and space consumption grows and there is 
> only one solution to this problem is to restart broker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to