chia7712 commented on code in PR #15621: URL: https://github.com/apache/kafka/pull/15621#discussion_r1593241011
########## core/src/test/scala/unit/kafka/log/LogValidatorTest.scala:
##########

```diff
@@ -404,9 +414,7 @@ class LogValidatorTest {
     assertEquals(now + 1, validatingResults.maxTimestampMs,
       s"Max timestamp should be ${now + 1}")
-    val expectedOffsetOfMaxTimestamp = 1
-    assertEquals(expectedOffsetOfMaxTimestamp, validatingResults.offsetOfMaxTimestampMs,
-      s"Offset of max timestamp should be 1")
+    assertEquals(2, validatingResults.shallowOffsetOfMaxTimestamp)
```

Review Comment:
   Yep, that is a typo, but it does not change the test. We pass `NONE` when creating the `LogValidator`, so it runs the `assignOffsetsNonCompressed` path:
   https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/core/src/test/scala/unit/kafka/log/LogValidatorTest.scala#L377

   However, I do observe a potential bug.

   **Context**
   1. The batches in an append can each have a different compression type.
   2. We take the source compression from the last batch only:
      https://github.com/apache/kafka/blob/525b9b1d7682ae2a527ceca83fedca44b1cba11a/core/src/main/scala/kafka/log/UnifiedLog.scala#L1180

   **Potential bug**
   - topic-level compression = GZIP
   - batch_0 = NONE
   - batch_1 = GZIP

   In this case we don't rebuild the records according to the topic-level compression, because the compression of the "last batch" already equals `GZIP`. Hence batch_0 ends up with incorrect compression. This bug does not produce corrupt records, so we could add comments/docs describing the issue. Alternatively, we could fix it by changing `sourceCompression` to a collection of all batches' compression types, and then do the conversion if any of them is mismatched.
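The difference between the current last-batch check and the proposed all-batches check could be sketched as follows. This is only an illustration: the `Compression` trait and the `Seq[Compression]` standing in for the per-batch compression types are hypothetical stand-ins, not Kafka's real `CompressionType`/`MemoryRecords` APIs.

```scala
// Stand-in for Kafka's compression types; illustration only.
sealed trait Compression
case object NoCompression extends Compression
case object Gzip extends Compression

// Current behavior (sketch): sourceCompression is derived from the last batch only.
def sourceCompressionFromLastBatch(batches: Seq[Compression]): Compression =
  batches.last

// Proposed behavior (sketch): rebuild the records if ANY batch's
// compression differs from the topic-level target compression.
def needsConversion(batches: Seq[Compression], target: Compression): Boolean =
  batches.exists(_ != target)

// Scenario from the comment: topic = GZIP, batch_0 = NONE, batch_1 = GZIP.
val batches: Seq[Compression] = Seq(NoCompression, Gzip)
val topicCompression: Compression = Gzip

// The last-batch check concludes no conversion is needed (GZIP == GZIP)...
val lastBatchSaysConvert = sourceCompressionFromLastBatch(batches) != topicCompression
// ...while the all-batches check notices batch_0's mismatch.
val allBatchesSayConvert = needsConversion(batches, topicCompression)

@main def show(): Unit =
  println(s"last-batch check: $lastBatchSaysConvert, all-batches check: $allBatchesSayConvert")
```

In the sketched scenario the last-batch check reports `false` (skip conversion) while the all-batches check reports `true`, which is exactly the gap described above.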