Thanks for your reply!

I may have misused the term "normalization". What I want to refer to is:

appendInfo.setLastOffset(offset.value - 1)

which underneath updates the base offset field (in record batch) but not
the offset delta of each record.
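
To make this concrete, here is a minimal Rust sketch of what I mean. The
`Record` and `RecordBatch` types and the `assign_offsets` function are
hypothetical names for illustration only, not the broker's actual code. The
sketch assumes the log hands out consecutive offsets starting at the current
log end offset: on append only the batch-level base offset is rewritten, and
each record's absolute offset is derived as base offset + offset delta.

```rust
/// Hypothetical, simplified record: only the per-record offset delta is kept.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub offset_delta: i32,
}

/// Hypothetical, simplified record batch: a few header fields plus records.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub base_offset: i64,
    pub last_offset_delta: i32,
    pub records: Vec<Record>,
}

impl RecordBatch {
    /// Absolute offset of every record: base offset + offset delta.
    pub fn record_offsets(&self) -> Vec<i64> {
        self.records
            .iter()
            .map(|r| self.base_offset + i64::from(r.offset_delta))
            .collect()
    }

    /// Absolute offset of the last record in the batch.
    pub fn last_offset(&self) -> i64 {
        self.base_offset + i64::from(self.last_offset_delta)
    }
}

/// Assigns offsets to a batch on append (assumption: offsets are handed out
/// consecutively from `log_end_offset`). Only `base_offset` is rewritten;
/// the per-record deltas are left untouched.
/// Returns the new log end offset.
pub fn assign_offsets(batch: &mut RecordBatch, log_end_offset: i64) -> i64 {
    batch.base_offset = log_end_offset;
    batch.last_offset() + 1
}
```

For a three-record batch appended at log end offset 100, this sets the base
offset to 100 and the records resolve to offsets 100, 101 and 102, while the
deltas stay 0, 1 and 2.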

Best,
tison.


Justine Olshan <jols...@confluent.io.invalid> wrote on Tue, Aug 8, 2023 at 00:43:

> The sequence summary looks right to me.
> For log normalization, are you referring to compaction? The segment's first
> and last offsets might change, but a batch keeps its offsets when
> compaction occurs.
>
> Hope that helps.
> Justine
>
> On Mon, Aug 7, 2023 at 8:59 AM Matthias J. Sax <mj...@apache.org> wrote:
>
> > > but the base offset may change during log normalizing.
> >
> > Not sure what you mean by "normalization" but offsets are immutable, so
> > they don't change. (To be fair, I am not an expert on brokers, so I am
> > not sure how this works in detail when log compaction kicks in.)
> >
> > > This field is given by the producer and the broker should only read it.
> >
> > Sounds right. The point is that the broker has an "expected"
> > value for it, and if the provided value does not match the expected one,
> > the write is rejected to begin with.
> >
> >
> > -Matthias
> >
> > On 8/7/23 6:35 AM, tison wrote:
> > > Hi Matthias and Justine,
> > >
> > > Thanks for your reply!
> > >
> > > I can summarize the answer as -
> > >
> > > Record offset = base offset + offset delta. This field is calculated by
> > > the broker and the delta won't change but the base offset may change
> > > during log normalizing.
> > > Record sequence = base sequence + (offset) delta. This field is given by
> > > the producer and the broker should only read it.
> > >
> > > Is it correct?
> > >
> > > I implemented the base offset manipulation following this
> > > understanding at [1].
> > >
> > > Best,
> > > tison.
> > >
> > > [1]
> > > https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394
> > >
> > >
> > > Justine Olshan <jols...@confluent.io.invalid> wrote on Wed, Aug 2, 2023 at 04:19:
> > >
> > >> For what it's worth -- the sequence number is not calculated
> > >> "baseOffset/baseSequence + offset delta" but rather by monotonically
> > >> increasing for a given epoch. If the epoch is bumped, we reset back to
> > >> zero.
> > >> This may mean that the offset and sequence may match, but do not strictly
> > >> need to be the same. The sequence number will also always come from the
> > >> client and be in the produce records sent to the Kafka broker.
> > >>
> > >> As for offsets, there is some code in the log layer that maintains the log
> > >> end offset and assigns offsets to the records. The produce handling on the
> > >> leader should typically assign the offset.
> > >> I believe you can find that code here:
> > >>
> > >> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
> > >>
> > >> Justine
> > >>
> > >> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax <mj...@apache.org> wrote:
> > >>
> > >>> The _offset_ is the position of the record in the partition.
> > >>>
> > >>> The _sequence number_ is a unique ID that allows the broker to de-duplicate
> > >>> messages. It requires the producer to implement the idempotency protocol
> > >>> (part of Kafka transactions); thus, sequence numbers are optional, and as
> > >>> long as you don't want to support idempotent writes, you don't need to
> > >>> worry about them. (If you want to dig into the details, check out KIP-98,
> > >>> which is the original KIP about Kafka TX.)
> > >>>
> > >>> HTH,
> > >>>     -Matthias
> > >>>
> > >>> On 8/1/23 2:19 AM, tison wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> > >>>> models its concepts and how the core business logic works.
> > >>>>
> > >>>> While implementing the codec for Records[2], I saw a pair of fields,
> > >>>> "sequence" and "offset". Both of them are calculated as
> > >>>> baseOffset/baseSequence + offset delta. So I'm a bit confused about how
> > >>>> to deal with them properly - what's the logical difference between these
> > >>>> two concepts?
> > >>>>
> > >>>> Also, to understand how the core business logic works, I wrote a simple
> > >>>> server based on my codec library, and observed that the server may need
> > >>>> to update the offset for produced records. How does Kafka set the correct
> > >>>> offset for each produced record? And how does Kafka maintain the
> > >>>> calculation for offset and sequence during these modifications?
> > >>>>
> > >>>> I'd appreciate it if anyone can answer the question or give some
> > >>>> insights :D
> > >>>>
> > >>>> Best,
> > >>>> tison.
> > >>>>
> > >>>> [1] https://github.com/tisonkun/kafka-api
> > >>>> [2] https://kafka.apache.org/documentation/#messageformat
> > >>>>
> > >>>
> > >>
> > >
> >
>
