The _offset_ is the position of the record in the partition.

The _sequence number_ is a unique ID that allows the broker to de-duplicate messages. It requires the producer to implement the idempotency protocol (part of Kafka transactions); thus, sequence numbers are optional, and as long as you don't need to support idempotent writes, you don't need to worry about them. (If you want to dig into the details, check out KIP-98, which is the original KIP about Kafka transactions.)
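
To make the difference concrete, here is a rough Rust sketch of how the two values could be derived for a record inside a batch. The struct and function names (`BatchHeader`, `record_offset`, `record_sequence`, `assign_offsets`) are made up for illustration and are not taken from Kafka or from the kafka-api crate. It assumes the v2 record batch format, where a base sequence of -1 means "no sequence", sequence numbers wrap around to 0 after the largest 32-bit value, and the broker assigns the base offset when it appends the batch to the log.

```rust
/// Hypothetical, simplified view of a record batch header.
/// Only the fields needed for this sketch are included.
pub struct BatchHeader {
    pub base_offset: i64,
    pub base_sequence: i32,
    pub last_offset_delta: i32,
}

/// Marker for batches written by a non-idempotent producer.
pub const NO_SEQUENCE: i32 = -1;

/// Position of a record in the partition: base offset plus the record's delta.
pub fn record_offset(header: &BatchHeader, offset_delta: i32) -> i64 {
    header.base_offset + i64::from(offset_delta)
}

/// Sequence number of a record, or None if the producer is not idempotent.
/// Sequences live in 0..=i32::MAX and wrap back to 0 (assumption stated above).
pub fn record_sequence(header: &BatchHeader, offset_delta: i32) -> Option<i32> {
    if header.base_sequence == NO_SEQUENCE {
        return None;
    }
    let modulus = i64::from(i32::MAX) + 1;
    let seq = (i64::from(header.base_sequence) + i64::from(offset_delta)) % modulus;
    Some(seq as i32)
}

/// Broker-side sketch: stamp the batch with the current log end offset and
/// return the next log end offset. The sequence fields are left untouched,
/// because they are chosen by the producer, not by the broker.
pub fn assign_offsets(header: &mut BatchHeader, log_end_offset: i64) -> i64 {
    header.base_offset = log_end_offset;
    log_end_offset + i64::from(header.last_offset_delta) + 1
}
```

In this sketch, only `base_offset` changes when the broker appends a batch; the per-record deltas and the sequence numbers stay exactly as the producer wrote them, so both values can still be recomputed from the header afterwards.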

HTH,
  -Matthias

On 8/1/23 2:19 AM, tison wrote:
Hi,

I'm writing a Kafka API Rust codec library[1] to understand how Kafka
models its concepts and how the core business logic works.

While implementing the codec for Records[2], I saw a pair of fields,
"sequence" and "offset". Both of them are calculated as
baseOffset/baseSequence + offset delta. I'm a bit confused about how to deal
with them properly: what's the logical difference between these two
concepts?

Also, to understand how the core business logic works, I wrote a simple
server based on my codec library, and observed that the server may need to
update the offset for produced records. How does Kafka set the correct offset
for each produced record? And how does Kafka maintain the calculation of
offset and sequence during these modifications?

I'd appreciate it if anyone could answer these questions or share some insights :D

Best,
tison.

[1] https://github.com/tisonkun/kafka-api
[2] https://kafka.apache.org/documentation/#messageformat
