hachikuji commented on a change in pull request #9590:
URL: https://github.com/apache/kafka/pull/9590#discussion_r544593141



##########
File path: 
clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java
##########
@@ -198,7 +198,7 @@ private static FilterResult filterTo(TopicPartition 
partition, Iterable<MutableR
             if (!retainedRecords.isEmpty()) {
                 if (writeOriginalBatch) {
                     batch.writeTo(bufferOutputStream);
-                    filterResult.updateRetainedBatchMetadata(batch, 
retainedRecords.size(), false);
+                    filterResult.updateRetainedBatchMetadata(batch, 
retainedRecords.get(0).offset(), retainedRecords.size(), false);

Review comment:
       I think we're on the right track really. We don't have to implement 
exactly what the jira describes. One thing to keep in mind is that the first 
record in the log might be a control record which a consumer won't return to 
applications anyway. Also note that the log start offset also has implications 
for replication. Only the records above the log start offset will be replicated 
to a new replica, but all of the retained records after cleaning should be 
replicated. Hence we already have some constraints which make keeping the log 
start offset aligned with the first returnable record difficult.
   
   I would just say that we define and document a reasonable invariant. I think 
it should be clear that we need to ensure the segment base offset is less than 
or equal to the base offset of the first batch contained in it. The invariant I 
would suggest is that the log start offset should be less than or equal to the 
end offset of the first batch that can be returned from `Fetch`. For the case 
of log cleaning, the simplest thing to do is keep the log start offset aligned 
with the base offset of the first batch. In the case of `DeleteRecords`, the 
log start offset will point to some offset within the first batch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to