[ https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606000#comment-17606000 ]
Jose Armando Garcia Sancio edited comment on KAFKA-14238 at 9/16/22 9:44 PM:
-----------------------------------------------------------------------------

Was able to write a test that fails with the current implementation:
{code:java}
> Task :core:test FAILED
kafka.raft.KafkaMetadataLogTest.testSegmentLessThanLatestSnapshot() failed, log available in /home/jsancio/work/kafka/core/build/reports/testOutput/kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot().test.stdout

KafkaMetadataLogTest > testSegmentNotDeleteWithoutSnapshot() FAILED
    org.opentest4j.AssertionFailedError: latest snapshot offset (1440) must be >= log start offset (20010) ==> expected: <true> but was: <false>
        at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
        at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
        at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
        at kafka.raft.KafkaMetadataLogTest.testSegmentNotDeleteWithoutSnapshot(KafkaMetadataLogTest.scala:921)
{code}

> KRaft replicas can delete segments not included in a snapshot
> -------------------------------------------------------------
>
>                 Key: KAFKA-14238
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14238
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, kraft
>            Reporter: Jose Armando Garcia Sancio
>            Assignee: Jose Armando Garcia Sancio
>            Priority: Blocker
>             Fix For: 3.3.0
>
>
> We see this in the log:
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) due to retention time 604800000ms breach based on the largest record timestamp in the segment {code}
> This then causes {{KafkaRaftClient}} to throw an exception when sending batches to the listener:
> {code:java}
> java.lang.IllegalStateException: Snapshot expected since next offset of org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 is 0, log start offset is 369668 and high-watermark is 547379
>     at org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>     at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>     at org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>     at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>     at org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on-disk state for the cluster metadata partition confirms this:
> {code:java}
> ls __cluster_metadata-0/
> 00000000000000369668.index
> 00000000000000369668.log
> 00000000000000369668.timeindex
> 00000000000000503411.index
> 00000000000000503411.log
> 00000000000000503411.snapshot
> 00000000000000503411.timeindex
> 00000000000000548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Notice that there are no {{checkpoint}} files and the log doesn't have a segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete old segments even if there is no snapshot for them. For KRaft, Kafka should only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
> val props = new Properties()
> props.put(LogConfig.MaxMessageBytesProp, config.maxBatchSizeInBytes.toString)
> props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
> props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
> props.put(LogConfig.FileDeleteDelayMsProp, Int.box(Defaults.FileDeleteDelayMs))
> LogConfig.validateValues(props)
> val defaultLogConfig = LogConfig(props){code}
> Segment deletion code:
> {code:java}
> def deleteOldSegments(): Int = {
>   if (config.delete) {
>     deleteLogStartOffsetBreachedSegments() +
>       deleteRetentionSizeBreachedSegments() +
>       deleteRetentionMsBreachedSegments()
>   } else {
>     deleteLogStartOffsetBreachedSegments()
>   }
> }{code}

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
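The invariant the description calls for (only delete segments that breach the log start offset, and only advance the log start offset to offsets covered by a snapshot) can be illustrated with a small self-contained sketch. This is a hypothetical model, not the actual Kafka implementation: the class, fields, and method names (`MetadataLogSketch`, `advanceLogStartOffset`, `latestSnapshotEndOffset`) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the invariant described above. In KRaft, the log
// start offset may only advance as far as the latest snapshot, and segment
// deletion must be driven solely by the log start offset, never by time- or
// size-based retention as on a regular topic.
class MetadataLogSketch {
    // Base offsets of the log's segments, in ascending order.
    final List<Long> segmentBaseOffsets = new ArrayList<>();
    long logStartOffset;          // never advanced past the latest snapshot
    long latestSnapshotEndOffset; // exclusive end offset of the newest snapshot

    // Advance the log start offset, capped at the latest snapshot's end
    // offset so that no unsnapshotted prefix of the log is ever dropped.
    void advanceLogStartOffset(long target) {
        logStartOffset = Math.min(target, latestSnapshotEndOffset);
    }

    // Delete only segments that lie entirely below the log start offset,
    // i.e. segments whose successor's base offset is <= logStartOffset.
    int deleteOldSegments() {
        int deleted = 0;
        while (segmentBaseOffsets.size() > 1
                && segmentBaseOffsets.get(1) <= logStartOffset) {
            segmentBaseOffsets.remove(0);
            deleted++;
        }
        return deleted;
    }
}
```

Under this model, a retention-time breach alone can never delete a segment: deletion requires the log start offset to have moved past it, and the log start offset in turn cannot move past the latest snapshot, which rules out the `IllegalStateException` shown in the description.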