Recently I have been trying to understand Kafka's fetch offset mechanism
by reading the code, but there are a few points I still cannot figure
out.

*What I believe a log segment contains*

A log segment consists of a list of record batches, and each segment is
keyed by its base offset. Let's take an example with two segments.
*Segment 1*
Base offset: 50
Record batches (start offset, last offset):
1. (50, 56) RB1
2. (57, 62) RB2
3. (65, 92) RB3
*Offset index entry (base offset 50, relative offset 0, position 234)*

*Segment 2*
Base offset: 93
Record batches (start offset, last offset):
1. (93, 98) RB1
2. (99, 102) RB2
3. (103, 105) RB3

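Because segments are keyed by base offset, the layout above maps onto a sorted map. Finding the segment for a target offset (step 1 below) then becomes a floor lookup. This is a minimal Java sketch under assumptions. The `Batch` and `Segment` records and the `TreeMap` are hypothetical stand-ins, not Kafka's actual `LocalLog`/`LogSegment` code. Records need Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SegmentLayoutSketch {
    // Hypothetical batch model: first and last offset contained in the batch.
    record Batch(long baseOffset, long lastOffset) {}

    // Hypothetical segment model: base offset plus its batches in file order.
    record Segment(long baseOffset, List<Batch> batches) {}

    // The two example segments, keyed by base offset.
    static NavigableMap<Long, Segment> exampleLog() {
        NavigableMap<Long, Segment> segments = new TreeMap<>();
        segments.put(50L, new Segment(50, List.of(
            new Batch(50, 56), new Batch(57, 62), new Batch(65, 92))));
        segments.put(93L, new Segment(93, List.of(
            new Batch(93, 98), new Batch(99, 102), new Batch(103, 105))));
        return segments;
    }

    // Segment with the largest base offset <= target, or null when the
    // target is below the first segment's base offset.
    static Segment segmentFor(NavigableMap<Long, Segment> segments, long target) {
        Map.Entry<Long, Segment> e = segments.floorEntry(target);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, Segment> log = exampleLog();
        System.out.println(segmentFor(log, 60).baseOffset());  // 50
        System.out.println(segmentFor(log, 100).baseOffset()); // 93
    }
}
```

With targetOffset = 60, the floor lookup picks the segment with base offset 50, which is *Segment 1*.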

*Process of fetching data with offset >= targetOffset*
Let's say targetOffset = 60.

1. We first find the segment with the largest base offset that is less
than or equal to the target offset (
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L396).
In the above case this returns *Segment 1*.

2. We then read from that segment
<https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L421>
3. Inside translateOffset, we look up the offset index to find the
largest indexed offset less than or equal to targetOffset. Code line
<https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L394>.
This returns a mapping containing an offset and a position, i.e. (50, 234).
4. Using the start position *234* and targetOffset 60, we call
searchForOffsetWithSize
<https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileRecords.java#L316>,
which scans the batches from that position and returns the first batch
whose last offset is >= targetOffset.
5. According to the code, it returns record batch 2 of Segment 1,
i.e. RB2 (57, 62), because 62 >= 60.
6. We return this LogOffsetPosition
<https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileRecords.java#L320>
(62, batch position, batch size).
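Steps 3 to 6 can be sketched in the same simplified way. This is a model under assumptions, not Kafka's implementation:

- `IndexEntry`, `Batch`, `OffsetPosition` and the helper methods are hypothetical.
- The index lookup is a plain binary search.
- The byte positions 334 and 434 and the 100-byte batch sizes are made up, since the example only gives position 234.

The scan follows the behaviour described in step 4. It walks the batches from the index position and returns the last offset, position and size of the first batch whose last offset is >= the target.

```java
import java.util.List;

public class TranslateOffsetSketch {
    // Hypothetical batch model: offsets [baseOffset, lastOffset], stored at
    // byte 'position' in the segment file and 'size' bytes long.
    record Batch(long baseOffset, long lastOffset, int position, int size) {}

    // Hypothetical offset index entry: absolute offset -> byte position.
    record IndexEntry(long offset, int position) {}

    // Models the (offset, position, size) triple of a LogOffsetPosition.
    record OffsetPosition(long offset, int position, int size) {}

    // Step 3: largest indexed offset <= target, via binary search over
    // entries sorted by offset; null if every entry is larger.
    static IndexEntry lookupIndex(List<IndexEntry> index, long target) {
        int lo = 0, hi = index.size() - 1;
        IndexEntry best = null;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            IndexEntry e = index.get(mid);
            if (e.offset() <= target) {
                best = e;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }

    // Steps 4-6: starting at the index position, return the first batch
    // whose LAST offset is >= target; null if the segment has none.
    static OffsetPosition searchForOffsetWithSize(List<Batch> batches, long target, int startPosition) {
        for (Batch b : batches) {
            if (b.position() < startPosition) continue; // before the index hint
            if (b.lastOffset() >= target) {
                return new OffsetPosition(b.lastOffset(), b.position(), b.size());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Segment 1 from the example; positions 334/434 and sizes are hypothetical.
        List<Batch> segment1 = List.of(
            new Batch(50, 56, 234, 100),
            new Batch(57, 62, 334, 100),
            new Batch(65, 92, 434, 100));
        List<IndexEntry> index = List.of(new IndexEntry(50, 234));

        long target = 60;
        IndexEntry hint = lookupIndex(index, target);
        int start = (hint == null) ? 0 : hint.position();
        OffsetPosition result = searchForOffsetWithSize(segment1, target, start);
        System.out.println(result); // OffsetPosition[offset=62, position=334, size=100]

        // The returned byte position is where RB2 begins; its first offset is 57.
        Batch atPosition = segment1.stream()
            .filter(b -> b.position() == result.position())
            .findFirst()
            .orElseThrow();
        System.out.println(atPosition.baseOffset()); // 57
    }
}
```

In this model the offset field carries 62 (RB2's last offset). The position field points at the first byte of RB2, whose first offset is 57.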

*My questions*
1. The batch position and batch size correspond to *Segment 1 RB2*. RB2
starts at offset 57, so why do we return its last offset (62) rather than
57 in the LogOffsetPosition?
2. In the code that runs after the LogOffsetPosition is fetched
<https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L434>
I have not seen the returned last offset of the batch used anywhere, but I
do see the position value being used, and it points to the start of the
batch, i.e. offset 57.
3. According to the algorithm, we send log data starting at offset 57
instead of offset 60. Doesn't this break the contract of returning only
log data with offset >= targetOffset?

Can anyone help me identify the gap in my understanding?


Thanks and Regards
Arpit Goyal
8861094754
