Recently I have been trying to understand Kafka's fetch offset mechanism by reading the code, but there are a few points I still do not understand.

*What I believe a log segment contains*

A log segment consists of a list of record batches, with the base offset as its key. Let's take an example and list the segments.

*Segment 1*, base offset 50

List of record batches with (start offset, last offset):
1. (50,56) RB1
2. (57,62) RB2
3. (65,92) RB3

*Offset Index (base offset (50), relative offset (0), position (234))*

*Segment 2*, base offset 93

List of record batches with (start offset, last offset):
1. (93,98) RB1
2. (99,102) RB2
3. (103,105) RB3

*Process of fetching data with offset >= targetOffset*

Let's say targetOffset = 60.

1. We first find the segment whose base offset is the largest one less than or equal to the target offset (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L396). In the above case this returns *Segment 1*.
2. We read the segment <https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L421>.
3. Then we look up the offset index to find the largest offset less than or equal to targetOffset. The index lookup is executed in translateOffset. Code line <https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L394>. It returns a mapping that contains an offset and a position, i.e. (50, 234).
4. Using the start position *234* and targetOffset 60, we execute the function searchForOffsetWithSize <https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileRecords.java#L316>, which returns the batch whose last offset >= targetOffset.
5. According to the code, it will return record batch 2 of Segment 1, i.e. RB2 (57,62), because 62 >= 60.
6. We return this LogOffsetPosition <https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileRecords.java#L320> (62, batch position, batch size).

*My questions*

1. Batch position and batch size correspond to *Segment 1 RB2*.
The position of RB2 starts at offset 57, so why are we sending the last offset (62) with the batch position?

2. In the code, after fetching the LogOffsetPosition <https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L434>, I have not seen any usage of the batch's last offset value that is returned, but I do see usage of the position value, which points to offset 57.

3. According to the algorithm, we send log data starting from the position of offset 57 instead of offset 60. Does this not breach the contract that we send log data with offset >= targetOffset?

Can anyone help me identify the gap in my understanding?

Thanks and Regards
Arpit Goyal
8861094754
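
P.S. To make the walkthrough above concrete, here is a minimal sketch of the lookup as I understand it, written as a simplified in-memory model in Java. It is not Kafka's code: the class FetchLookupSketch, the records Batch, Segment and Found, and the method lookup are hypothetical names, and every byte position and size except the index position 234 is invented for illustration. I have also assumed an index entry (93 -> 0) for Segment 2, which the example does not give.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FetchLookupSketch {
    // Hypothetical model of a record batch: offset range plus byte position and size in the segment file.
    record Batch(long baseOffset, long lastOffset, int position, int size) {}

    // Hypothetical model of a segment: sparse offset index (offset -> byte position) and batches in file order.
    record Segment(TreeMap<Long, Integer> index, List<Batch> batches) {}

    // Hypothetical stand-in for the (offset, position, size) triple of step 6.
    record Found(long offset, int position, int size) {}

    // Segments keyed by base offset. Only position 234 comes from the example; the rest is invented.
    static final TreeMap<Long, Segment> SEGMENTS = new TreeMap<>();

    static {
        SEGMENTS.put(50L, new Segment(singleEntryIndex(50, 234), List.of(
                new Batch(50, 56, 234, 100),    // RB1
                new Batch(57, 62, 334, 100),    // RB2
                new Batch(65, 92, 434, 400)))); // RB3
        SEGMENTS.put(93L, new Segment(singleEntryIndex(93, 0), List.of(
                new Batch(93, 98, 0, 100),
                new Batch(99, 102, 100, 100),
                new Batch(103, 105, 200, 100))));
    }

    // Builds an offset index holding one (offset -> position) entry.
    static TreeMap<Long, Integer> singleEntryIndex(long offset, int position) {
        TreeMap<Long, Integer> index = new TreeMap<>();
        index.put(offset, position);
        return index;
    }

    static Found lookup(long targetOffset) {
        // Step 1: segment with the largest base offset <= targetOffset.
        Map.Entry<Long, Segment> segmentEntry = SEGMENTS.floorEntry(targetOffset);
        if (segmentEntry == null) {
            return null; // targetOffset lies before the first segment
        }
        Segment segment = segmentEntry.getValue();

        // Step 3: index entry with the largest offset <= targetOffset gives the start position.
        Map.Entry<Long, Integer> indexEntry = segment.index().floorEntry(targetOffset);
        int startPosition = (indexEntry == null) ? 0 : indexEntry.getValue();

        // Steps 4-5: scan forward from the start position for the first batch whose last offset >= targetOffset.
        for (Batch batch : segment.batches()) {
            if (batch.position() < startPosition) {
                continue; // batch lies before the indexed position
            }
            if (batch.lastOffset() >= targetOffset) {
                // Step 6: last offset of the batch, but position and size of the whole batch.
                return new Found(batch.lastOffset(), batch.position(), batch.size());
            }
        }
        return null; // no batch in this segment reaches targetOffset
    }

    public static void main(String[] args) {
        System.out.println(lookup(60)); // prints Found[offset=62, position=334, size=100]
    }
}
```

In this model, lookup(60) returns offset 62 together with position 334, which is where RB2 (and therefore offset 57) begins. That is the mismatch my questions 1 and 3 are about.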