jolshan commented on a change in pull request #9590:
URL: https://github.com/apache/kafka/pull/9590#discussion_r622456807



##########
File path: core/src/test/scala/unit/kafka/log/LogCleanerTest.scala
##########
@@ -984,19 +1003,26 @@ class LogCleanerTest {
 
     def distinctValuesBySegment = log.logSegments.map(s => s.log.records.asScala.map(record => TestUtils.readString(record.value)).toSet.size).toSeq
 
-    val disctinctValuesBySegmentBeforeClean = distinctValuesBySegment
+    val distinctValuesBySegmentBeforeClean = distinctValuesBySegment
     assertTrue(distinctValuesBySegment.reverse.tail.forall(_ > N),
       "Test is not effective unless each segment contains duplicates. Increase segment size or decrease number of keys.")
 
+    log.updateHighWatermark(log.activeSegment.baseOffset)
     cleaner.clean(LogToClean(new TopicPartition("test", 0), log, 0, firstUncleanableOffset))
 
     val distinctValuesBySegmentAfterClean = distinctValuesBySegment
 
-    assertTrue(disctinctValuesBySegmentBeforeClean.zip(distinctValuesBySegmentAfterClean)
-      .take(numCleanableSegments).forall { case (before, after) => after < before },
+    // One segment should have been completely deleted, so there will be fewer segments.
+    assertTrue(distinctValuesBySegmentAfterClean.size < distinctValuesBySegmentBeforeClean.size)
+
+    // Drop the first segment from before cleaning since it was removed. Also subtract 1 from numCleanableSegments
+    val normalizedDistinctValuesBySegmentBeforeClean = distinctValuesBySegmentBeforeClean.drop(1)

Review comment:
       This test is a little tricky, but I've updated it. Now it only uses duplicate keys. It's a little confusing because the first uncleanable offset is not actually the boundary below which all records are cleaned. Only the full segments below the uncleanable offset are cleaned (i.e., the segments that lie entirely before it). And even then, one record (the last record in the last segment) will be retained due to how cleaning works.
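To illustrate the "full segments only" rule described above, here is a simplified model, not Kafka's `LogCleaner` code. It assumes each segment is identified only by its base offset, that a segment ends where the next one begins, and that the last (active) segment is never cleanable; the object and method names are hypothetical. It does not model the retained last record.

```scala
// Hypothetical, simplified model of which segments get cleaned; not Kafka's API.
object CleanableSegmentsSketch {
  /** Returns the base offsets of segments lying entirely below `firstUncleanableOffset`.
    * `baseOffsets` must be sorted ascending. The last entry is treated as the active
    * segment: its end is unknown, so it is never reported as cleanable. */
  def cleanableSegments(baseOffsets: Seq[Long], firstUncleanableOffset: Long): Seq[Long] =
    baseOffsets.zip(baseOffsets.drop(1)).collect {
      // Segment [base, nextBase) is cleanable only if it ends at or before the boundary.
      case (base, nextBase) if nextBase <= firstUncleanableOffset => base
    }

  def main(args: Array[String]): Unit = {
    val baseOffsets = Seq(0L, 100L, 200L, 300L)
    // 250 falls inside the segment starting at 200, so that segment is skipped
    // even though some of its records sit below the uncleanable offset.
    println(cleanableSegments(baseOffsets, 250L)) // List(0, 100)
  }
}
```

In this model, records at offsets 200 to 249 stay uncleaned even though they are below the uncleanable offset, which is why the test cannot simply compare every segment before that offset.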




-- 
This is an automated message from the Apache Git Service.