jolshan commented on a change in pull request #9590: URL: https://github.com/apache/kafka/pull/9590#discussion_r622456807
##########
File path: core/src/test/scala/unit/kafka/log/LogCleanerTest.scala
##########
@@ -984,19 +1003,26 @@ class LogCleanerTest {
     def distinctValuesBySegment = log.logSegments.map(s => s.log.records.asScala.map(record => TestUtils.readString(record.value)).toSet.size).toSeq
-    val disctinctValuesBySegmentBeforeClean = distinctValuesBySegment
+    val distinctValuesBySegmentBeforeClean = distinctValuesBySegment
     assertTrue(distinctValuesBySegment.reverse.tail.forall(_ > N),
       "Test is not effective unless each segment contains duplicates. Increase segment size or decrease number of keys.")
+    log.updateHighWatermark(log.activeSegment.baseOffset)
     cleaner.clean(LogToClean(new TopicPartition("test", 0), log, 0, firstUncleanableOffset))
     val distinctValuesBySegmentAfterClean = distinctValuesBySegment
-    assertTrue(disctinctValuesBySegmentBeforeClean.zip(distinctValuesBySegmentAfterClean)
-      .take(numCleanableSegments).forall { case (before, after) => after < before },
+    // One segment should have been completely deleted, so there will be fewer segments.
+    assertTrue(distinctValuesBySegmentAfterClean.size < distinctValuesBySegmentBeforeClean.size)
+
+    // Drop the first segment from before cleaning since it was removed. Also subtract 1 from numCleanableSegments
+    val normalizedDistinctValuesBySegmentBeforeClean = distinctValuesBySegmentBeforeClean.drop(1)

Review comment:
This test is a little tricky, but I've updated it. It now uses only duplicate keys. It's a little confusing because the first uncleanable offset is not actually the point below which all records are cleaned. The segments cleaned are the full segments below the uncleanable offset (so, in this case, the segments before the uncleanable offset). And even then, one record (the last record in the last segment) will be retained due to how cleaning works.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
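
To make the boundary rule from the review comment concrete, here is a minimal sketch of "only full segments below the first uncleanable offset are cleaned". It is a toy model of the rule as the comment states it, not Kafka's implementation: `Segment`, `fullSegmentsBelow`, and the offsets are hypothetical names and values chosen for illustration, and the sketch does not model the retained last record.

```scala
// Toy model only: Segment and fullSegmentsBelow are hypothetical, not Kafka APIs.
// A segment is described by the first and last offsets it contains.
final case class Segment(baseOffset: Long, lastOffset: Long)

object CleanableSegmentsSketch {
  // Keep only segments whose every offset is strictly below firstUncleanableOffset.
  // A segment that merely contains the offset is not a "full segment below" it.
  def fullSegmentsBelow(segments: Seq[Segment], firstUncleanableOffset: Long): Seq[Segment] =
    segments.filter(_.lastOffset < firstUncleanableOffset)

  // Three example segments covering offsets 0-9, 10-19, and 20-29.
  val segments: Seq[Segment] = Seq(Segment(0L, 9L), Segment(10L, 19L), Segment(20L, 29L))

  // Offset 25 falls inside the third segment, so that segment is left alone
  // even though offsets 20-24 are below the first uncleanable offset.
  val cleanable: Seq[Segment] = fullSegmentsBelow(segments, firstUncleanableOffset = 25L)
}
```

In this model, records at offsets 20 to 24 stay untouched even though they sit below the first uncleanable offset, which is the mismatch the comment describes.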