hachikuji commented on a change in pull request #9590: URL: https://github.com/apache/kafka/pull/9590#discussion_r544593141
########## File path: clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java ########## @@ -198,7 +198,7 @@ private static FilterResult filterTo(TopicPartition partition, Iterable<MutableR if (!retainedRecords.isEmpty()) { if (writeOriginalBatch) { batch.writeTo(bufferOutputStream); - filterResult.updateRetainedBatchMetadata(batch, retainedRecords.size(), false); + filterResult.updateRetainedBatchMetadata(batch, retainedRecords.get(0).offset(), retainedRecords.size(), false); Review comment: I think we're on the right track really. We don't have to implement exactly what the jira describes. One thing to keep in mind is that the first record in the log might be a control record which a consumer won't return to applications anyway. Also note that the log start offset has implications for replication. Only the records above the log start offset will be replicated to a new replica, but all of the retained records after cleaning should be replicated. Hence we already have some constraints which make keeping the log start offset aligned with the first returnable record difficult. I would just say that we define and document a reasonable invariant. I think it should be clear that we need to ensure the segment base offset is less than or equal to the base offset of the first batch contained in it. The invariant I would suggest is that the log start offset should be less than or equal to the end offset of the first batch that can be returned from `Fetch`. For the case of log cleaning, the simplest thing to do is keep the log start offset aligned with the base offset of the first batch. In the case of `DeleteRecords`, the log start offset will point to some offset within the first batch. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org