chia7712 commented on code in PR #18012:
URL: https://github.com/apache/kafka/pull/18012#discussion_r1945549851
##########
storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java:
##########
@@ -232,38 +232,38 @@ private boolean canConvertToRelativeOffset(long offset) throws IOException {
      * It is assumed this method is being called from within a lock, it is not thread-safe otherwise.
      *
      * @param largestOffset The last offset in the message set
-     * @param largestTimestampMs The largest timestamp in the message set.
-     * @param shallowOffsetOfMaxTimestamp The last offset of earliest batch with max timestamp in the messages to append.
-     * @param records The log entries to append.
+     * @param records The log entries to append.
      * @throws LogSegmentOffsetOverflowException if the largest offset causes index offset overflow
      */
     public void append(long largestOffset,
-                       long largestTimestampMs,
-                       long shallowOffsetOfMaxTimestamp,
                        MemoryRecords records) throws IOException {
         if (records.sizeInBytes() > 0) {
-            LOGGER.trace("Inserting {} bytes at end offset {} at position {} with largest timestamp {} at offset {}",
-                    records.sizeInBytes(), largestOffset, log.sizeInBytes(), largestTimestampMs, shallowOffsetOfMaxTimestamp);
+            LOGGER.trace("Inserting {} bytes at end offset {} at position {}",
+                    records.sizeInBytes(), largestOffset, log.sizeInBytes());
             int physicalPosition = log.sizeInBytes();
-            if (physicalPosition == 0)
-                rollingBasedTimestamp = OptionalLong.of(largestTimestampMs);
             ensureOffsetInRange(largestOffset);
             // append the messages
             long appendedBytes = log.append(records);
             LOGGER.trace("Appended {} to {} at end offset {}", appendedBytes, log.file(), largestOffset);
-            // Update the in memory max timestamp and corresponding offset.
-            if (largestTimestampMs > maxTimestampSoFar()) {
-                maxTimestampAndOffsetSoFar = new TimestampOffset(largestTimestampMs, shallowOffsetOfMaxTimestamp);
-            }
-            // append an entry to the index (if needed)
-            if (bytesSinceLastIndexEntry > indexIntervalBytes) {
-                offsetIndex().append(largestOffset, physicalPosition);
Review Comment:
There is another issue in the scenario where a new follower is synchronizing from another replica. If `records` consists of multiple batches, `physicalPosition` reflects the position of the first batch, while `largestOffset` belongs to the last batch. The resulting index entry is therefore inaccurate: `(offset_of_last_batch, position_of_first_batch)`. This is easy to reproduce, and our dump log tool already prints warnings for this inconsistency:
```
Mismatches in :/home/chia7712/ikea-0-follower/00000000000000000000.index
Index offset: 2062680, log offset: 2061012
Index offset: 2060984, log offset: 2058974
Index offset: 2058940, log offset: 2056966
Index offset: 2056936, log offset: 2054979
Index offset: 2054950, log offset: 2052962
Index offset: 2052931, log offset: 2050959
Index offset: 2050923, log offset: 2048924
Index offset: 2048895, log offset: 2046967
```
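For reference, the report above comes from the index sanity check of the dump log tool; a command along these lines (the file path is just my local reproduction) produces it:
```
bin/kafka-dump-log.sh --files /home/chia7712/ikea-0-follower/00000000000000000000.index
```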
This drawback is analogous to having fewer index entries, which can increase fetch times: a lookup that resolves to such an entry starts reading earlier in the segment than necessary and has to scan the extra bytes. I'm happy that @FrankYang0529 fixes "two" issues with this PR.
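To make the cost concrete, here is a toy lookup over made-up index entries; the real `OffsetIndex` does a binary search over memory-mapped `(offset, position)` pairs, so this is only a sketch of the effect:
```java
import java.util.List;

// Toy offset-index lookup with made-up entries; not the real OffsetIndex API.
public class IndexLookupSketch {
    record Entry(long offset, int position) {}

    // Position of the last entry whose offset is <= target; the fetch then
    // scans the log forward from that position.
    static int lookup(List<Entry> index, long target) {
        int pos = 0;
        for (Entry e : index)
            if (e.offset() <= target)
                pos = e.position();
        return pos;
    }

    public static void main(String[] args) {
        long target = 299;  // hypothetical offset inside the LAST appended batch
        // Buggy entry: last batch's offset paired with the FIRST batch's position.
        System.out.println(lookup(List.of(new Entry(299, 0)), target));     // 0
        // Accurate entry: same offset, position of the batch that contains it.
        System.out.println(lookup(List.of(new Entry(299, 8192)), target));  // 8192
    }
}
```
With the buggy entry, the fetch for the target offset starts at position 0 and scans roughly 8 KB more than it has to.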
However, I'm wondering if we should also address this issue on the active 3.x branches. We could calculate the accurate position by using the size of the last batch. For example:
```java
int position = records.lastBatch()
        .map(b -> log.sizeInBytes() - b.sizeInBytes())
        .orElse(physicalPosition);
offsetIndex().append(largestOffset, position);
```
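The arithmetic works because this code runs after `log.append(records)`, so `log.sizeInBytes()` points just past the last batch; subtracting that batch's size lands on its first byte, i.e. the batch that actually contains `largestOffset`. A standalone sketch with made-up batch sizes (it only mimics the size bookkeeping, not the real `LogSegment`/`MemoryRecords` API):
```java
// Mimics only the size bookkeeping of the append path; sizes are made up.
public class LastBatchPositionSketch {
    public static void main(String[] args) {
        // Batches contained in one multi-batch `records`, e.g. what a
        // follower appends from a single fetch response.
        int[] batchSizes = {4096, 4096, 2048};

        int physicalPosition = 0;      // segment size before the append
        int segmentSize = physicalPosition;
        for (int size : batchSizes)
            segmentSize += size;       // log.append(records) grows the file

        // Buggy: largestOffset (in the LAST batch) paired with the FIRST batch's position.
        System.out.println("first batch position: " + physicalPosition);  // 0
        // Accurate: first byte of the last batch.
        System.out.println("last batch position:  "
                + (segmentSize - batchSizes[batchSizes.length - 1]));     // 8192
    }
}
```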