Re: Contiguous Offsets on non-compacted topics

2018-01-24 Thread Cody Koeninger
Can anyone clarify what (other than the known cases of compaction or transactions) could be causing non-contiguous offsets? That sounds like a potential defect, given that I ran billions of messages a day through the Kafka 0.8.x series for years without seeing that. On Tue, Jan 23, 2018 at 3:35 PM,

Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Hi Matthias and Guozhang, Given that information, I think I’m going to try out the following in our data lake persisters (spark-streaming-kafka): https://issues.apache.org/jira/browse/SPARK-17147 Skipping one message out of 10+ billion a day

Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Guozhang Wang
Hello Justin, There are actually multiple reasons that can cause non-contiguous offsets, or "holes", in the Kafka partition logs: 1. Compaction, which you already know about. 2. When transactions are turned on, some offsets are taken by the "transaction marker" messages, which will not be exposed

Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Matthias J. Sax
In general, offsets should be consecutive. However, this is not an "official guarantee", and you should not build applications that rely on consecutive offsets. Also note that with Kafka 0.11 and transactions, commit/abort markers require an offset in the partition, and thus having "offset gaps" is normal
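
To illustrate the advice above, here is a minimal sketch of gap-tolerant offset tracking. It is not taken from Kafka or Spark: the function name find_offset_gaps and the sample offsets are hypothetical, and it assumes only that the offsets observed for one partition arrive in increasing order, possibly with redelivered duplicates. Instead of asserting that each offset equals the previous one plus one, it records any hole and continues from the offset it actually saw.

```python
from typing import Iterable, List, Tuple


def find_offset_gaps(offsets: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges for one partition.

    Hypothetical helper: `offsets` is the sequence of offsets a consumer
    observed, assumed to be in increasing order apart from redeliveries.
    """
    gaps: List[Tuple[int, int]] = []
    expected = None  # next offset we would see if offsets were consecutive
    for offset in offsets:
        if expected is not None and offset > expected:
            # A hole: record it instead of treating it as a fatal error.
            gaps.append((expected, offset - 1))
        if expected is None or offset >= expected:
            # Advance from the offset actually seen, not from a counter.
            expected = offset + 1
        # offset < expected means a redelivered record; ignore it here.
    return gaps


if __name__ == "__main__":
    # Made-up offsets for illustration only.
    print(find_offset_gaps([10, 11, 13, 14, 17]))
```

A consumer written this way can log or count the holes for monitoring while still making progress, which fits the case where a gap is caused by a transaction marker rather than by lost data.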

Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Greetings, We’ve seen a strange situation wherein the topic is not compacted but the offset numbers inside the partition (#93) are not contiguous. This only happens once a day, though, on a topic with billions of messages per day. next offset = 1786997223 next offset = 1786997224 next offset