Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-09 Thread tison
Thanks for your reply! I probably shouldn't have used the word "normalization". What I was referring to is appendInfo.setLastOffset(offset.value - 1), which under the hood updates the base offset field (in the record batch) but not the offset delta of each record. Best, tison. On Tue, Aug 8, 2023 at 00:43, Justine Olshan wrote: > The
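A minimal sketch of the behaviour tison describes: when the broker assigns offsets, only the batch's base offset is rewritten, and each record's offset delta stays as the producer wrote it. RecordBatch and assign_offsets are hypothetical names for illustration. The set_last_offset arithmetic (base offset = last offset - last offset delta) is my understanding of how Kafka's v2 batch format stores this, not a copy of the broker code.

```rust
/// Hypothetical model of the v2 record-batch fields involved in offset assignment.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub base_offset: i64,
    pub last_offset_delta: i32,
    /// Per-record offset deltas as written by the producer (0, 1, 2, ...).
    pub offset_deltas: Vec<i32>,
}

impl RecordBatch {
    /// Only the base offset is rewritten; the per-record deltas are untouched.
    pub fn set_last_offset(&mut self, last_offset: i64) {
        self.base_offset = last_offset - self.last_offset_delta as i64;
    }

    /// Absolute offsets are always derived as base offset + delta.
    pub fn record_offsets(&self) -> Vec<i64> {
        self.offset_deltas
            .iter()
            .map(|d| self.base_offset + *d as i64)
            .collect()
    }
}

/// Hypothetical broker-side step: assign offsets starting at `next_offset`
/// and return the next offset to hand out.
pub fn assign_offsets(batch: &mut RecordBatch, next_offset: i64) -> i64 {
    let count = batch.last_offset_delta as i64 + 1;
    // The counter advances past the whole batch, then the last offset is
    // "counter - 1", in the spirit of appendInfo.setLastOffset(offset.value - 1).
    let advanced = next_offset + count;
    batch.set_last_offset(advanced - 1);
    advanced
}
```

For example, a three-record batch appended at offset 100 ends up with base offset 100 and record offsets 100, 101, 102, while its deltas remain 0, 1, 2.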

Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-07 Thread Justine Olshan
The sequence summary looks right to me. For log normalization, are you referring to compaction? The segment's first and last offsets might change, but a batch keeps its offsets when compaction occurs. Hope that helps. Justine On Mon, Aug 7, 2023 at 8:59 AM Matthias J. Sax wrote: > > but the
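A toy illustration of the compaction point: keeping only the latest record per key changes where the log starts, but every surviving record keeps the offset it was assigned, which leaves gaps. Record and compact are hypothetical. Real log cleaning works on batches and segments, handles tombstones, and skips the active segment, and none of that is modeled here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub offset: i64,
    pub key: String,
    pub value: String,
}

/// Toy key-based compaction: keep only the latest record for each key.
/// Surviving records are cloned as-is, so their offsets never change.
pub fn compact(log: &[Record]) -> Vec<Record> {
    // Remember the offset of the latest record seen for each key.
    let mut latest: HashMap<&str, i64> = HashMap::new();
    for r in log {
        latest.insert(r.key.as_str(), r.offset);
    }
    log.iter()
        .filter(|r| latest.get(r.key.as_str()) == Some(&r.offset))
        .cloned()
        .collect()
}
```

With records at offsets 0 (key a), 1 (b), 2 (a), 3 (c), the compacted log holds offsets 1, 2, 3. The first offset moves from 0 to 1, but no surviving record is renumbered.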

Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-07 Thread Matthias J. Sax
but the base offset may change during log normalizing. Not sure what you mean by "normalization", but offsets are immutable, so they don't change. (To be fair, I am not an expert on brokers, so I'm not sure how this works in detail when log compaction kicks in.) This field is given by the

Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-07 Thread tison
Hi Matthias and Justine, Thanks for your replies! I can summarize the answer as: - Record offset = base offset + offset delta. This field is calculated by the broker; the delta won't change, but the base offset may change during log normalizing. - Record sequence = base sequence + (offset) delta.
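The summary can be written as two small functions. This is a sketch under assumptions: NO_SEQUENCE = -1 is the header sentinel for a non-idempotent producer, and sequences wrap from i32::MAX back to 0. Both reflect my understanding of the v2 format and of the Java client's DefaultRecordBatch.incrementSequence, not a verified port. The base-sequence-plus-delta rule only describes how a record's sequence is derived inside one batch. How the producer picks the base sequence is a separate question.

```rust
/// Header value used when the producer is not idempotent (assumed sentinel).
pub const NO_SEQUENCE: i32 = -1;

/// Absolute offset of a record: batch base offset + the record's offset delta.
pub fn record_offset(base_offset: i64, offset_delta: i32) -> i64 {
    base_offset + offset_delta as i64
}

/// Sequence of a record inside a batch: base sequence + offset delta,
/// or NO_SEQUENCE if the batch carries no sequence at all.
pub fn record_sequence(base_sequence: i32, offset_delta: i32) -> i32 {
    if base_sequence == NO_SEQUENCE {
        return NO_SEQUENCE;
    }
    increment_sequence(base_sequence, offset_delta)
}

/// Adds `increment` to `sequence`, wrapping to 0 after i32::MAX.
pub fn increment_sequence(sequence: i32, increment: i32) -> i32 {
    if sequence > i32::MAX - increment {
        // Overflow: continue counting from 0.
        increment - (i32::MAX - sequence) - 1
    } else {
        sequence + increment
    }
}
```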

Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-01 Thread Justine Olshan
For what it's worth, the sequence number is not calculated as "baseOffset/baseSequence + offset delta"; rather, it increases monotonically for a given producer epoch. If the epoch is bumped, we reset it back to zero. This may mean that the offset and sequence match, but they do not strictly need to be the
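A rough producer-side sketch of what Justine describes. Sequence numbers count up per partition within one producer epoch, and an epoch bump starts every partition over at zero. SequenceTracker, next_base_sequence, bump_epoch, and keying partitions by a name string are all hypothetical. The wrap to 0 after i32::MAX is an assumption.

```rust
use std::collections::HashMap;

/// Adds `count` to `sequence`, wrapping to 0 after i32::MAX (assumed behaviour).
pub fn wrapping_add_sequence(sequence: i32, count: i32) -> i32 {
    ((sequence as i64 + count as i64) % (i32::MAX as i64 + 1)) as i32
}

/// Hypothetical producer state: current epoch plus next sequence per partition.
pub struct SequenceTracker {
    pub producer_epoch: i16,
    next_sequence: HashMap<String, i32>,
}

impl SequenceTracker {
    pub fn new(producer_epoch: i16) -> Self {
        SequenceTracker { producer_epoch, next_sequence: HashMap::new() }
    }

    /// Reserves `record_count` sequence numbers for a new batch on `partition`
    /// and returns the batch's base sequence.
    pub fn next_base_sequence(&mut self, partition: &str, record_count: i32) -> i32 {
        let next = self.next_sequence.entry(partition.to_string()).or_insert(0);
        let base = *next;
        *next = wrapping_add_sequence(base, record_count);
        base
    }

    /// A new epoch means every partition's sequence restarts at zero.
    pub fn bump_epoch(&mut self) {
        self.producer_epoch += 1;
        self.next_sequence.clear();
    }
}
```

This also shows why offset and sequence only coincide by accident. The sequence counts this producer's records in this epoch, while the offset counts every record in the partition from every producer.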

Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-01 Thread Matthias J. Sax
The _offset_ is the position of the record in the partition. The _sequence number_ is a unique ID that allows the broker to de-duplicate messages. It requires the producer to implement the idempotency protocol (part of Kafka transactions); thus, sequence numbers are optional and as long as you
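To make the de-duplication role concrete, here is a heavily simplified, hypothetical broker-side check for one producer (id + epoch) on one partition. It compares an incoming batch's first sequence with the last sequence already appended. check_sequence and SeqCheck are invented for illustration. As I understand it, the real broker keeps more state (metadata for several recent batches) and handles more cases, including wrap-around, which this sketch only covers for the expected next value.

```rust
/// Header value used when the producer is not idempotent (assumed sentinel).
pub const NO_SEQUENCE: i32 = -1;

#[derive(Debug, PartialEq)]
pub enum SeqCheck {
    Accept,
    Duplicate,
    OutOfOrder,
    NotIdempotent,
}

/// `last_seq` is the last sequence appended for this producer/partition,
/// or None if nothing has been appended yet in the current epoch.
pub fn check_sequence(last_seq: Option<i32>, first_seq: i32) -> SeqCheck {
    if first_seq == NO_SEQUENCE {
        // No sequence in the batch: nothing to de-duplicate against.
        return SeqCheck::NotIdempotent;
    }
    match last_seq {
        None if first_seq == 0 => SeqCheck::Accept,
        None => SeqCheck::OutOfOrder,
        Some(last) => {
            let expected = if last == i32::MAX { 0 } else { last + 1 };
            if first_seq == expected {
                SeqCheck::Accept
            } else if first_seq <= last {
                // Already appended: most likely a producer retry.
                SeqCheck::Duplicate
            } else {
                // A gap: some earlier batch is missing.
                SeqCheck::OutOfOrder
            }
        }
    }
}
```

Note that the offset plays no part in this check. A retried batch carries the same sequence numbers, while the offsets are only assigned by the broker once the batch is accepted.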

[QUESTION] What is the difference between sequence and offset for a Record?

2023-08-01 Thread tison
Hi, I'm writing a Kafka API codec library in Rust[1] to understand how Kafka models its concepts and how the core business logic works. While implementing the codec for Records[2], I saw a pair of fields, "sequence" and "offset". Both of them are calculated by baseOffset/baseSequence + offset
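For context on where the "twin" fields live, here is a sketch that decodes the fixed part of a v2 record batch header, where both baseOffset and baseSequence sit. Each record then carries only a varint offsetDelta. The field order and the 61-byte big-endian layout follow my reading of Kafka's message-format documentation. RecordBatchHeader, Cursor, and parse_header are hypothetical names, not the API of tison's library.

```rust
/// Fixed-size part of a v2 record batch header (61 bytes, big-endian).
#[derive(Debug, PartialEq)]
pub struct RecordBatchHeader {
    pub base_offset: i64,
    pub batch_length: i32,
    pub partition_leader_epoch: i32,
    pub magic: i8,
    pub crc: u32,
    pub attributes: i16,
    pub last_offset_delta: i32,
    pub base_timestamp: i64,
    pub max_timestamp: i64,
    pub producer_id: i64,
    pub producer_epoch: i16,
    pub base_sequence: i32,
    pub records_count: i32,
}

pub const HEADER_LEN: usize = 61;

/// Minimal cursor that hands out fixed-size chunks of a byte slice.
struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn take<const N: usize>(&mut self) -> [u8; N] {
        let mut out = [0u8; N];
        out.copy_from_slice(&self.buf[self.pos..self.pos + N]);
        self.pos += N;
        out
    }
}

/// Returns None if the buffer is shorter than the fixed header.
pub fn parse_header(buf: &[u8]) -> Option<RecordBatchHeader> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let mut c = Cursor { buf, pos: 0 };
    // Struct fields are evaluated in the order written, which matches the wire order.
    Some(RecordBatchHeader {
        base_offset: i64::from_be_bytes(c.take()),
        batch_length: i32::from_be_bytes(c.take()),
        partition_leader_epoch: i32::from_be_bytes(c.take()),
        magic: i8::from_be_bytes(c.take()),
        crc: u32::from_be_bytes(c.take()),
        attributes: i16::from_be_bytes(c.take()),
        last_offset_delta: i32::from_be_bytes(c.take()),
        base_timestamp: i64::from_be_bytes(c.take()),
        max_timestamp: i64::from_be_bytes(c.take()),
        producer_id: i64::from_be_bytes(c.take()),
        producer_epoch: i16::from_be_bytes(c.take()),
        base_sequence: i32::from_be_bytes(c.take()),
        records_count: i32::from_be_bytes(c.take()),
    })
}
```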