Can anyone clarify what (other than the known cases of compaction or transactions) could be causing non-contiguous offsets?
That sounds like a potential defect, given that I ran billions of messages a day through the Kafka 0.8.x series for years without seeing that.

On Tue, Jan 23, 2018 at 3:35 PM, Justin Miller <justin.mil...@protectwise.com> wrote:

> Hi Matthias and Guozhang,
>
> Given that information, I think I’m going to try out the following in our
> data lake persisters (spark-streaming-kafka):
> https://issues.apache.org/jira/browse/SPARK-17147
>
> Skipping one message out of 10+ billion a day won’t be the end of the world
> for this topic, and it’ll save me from having to manually restart the process.
> :)
>
> These topics aren’t compacted, and we’re still only on 0.10 (switched to 0.10
> today), but we were able to reproduce the issue when we restarted the Kafka
> brokers while migrating from the 0.9.0.0 message format to 0.10.2.
>
> Thanks,
> Justin
>
>> On Jan 23, 2018, at 2:31 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>> Hello Justin,
>>
>> There are actually multiple reasons that can cause non-contiguous offsets,
>> or "holes" in the Kafka partition logs:
>>
>> 1. Compaction, which you already know about.
>> 2. When transactions are turned on, some offsets are taken by the
>> "transaction marker" messages, which are not exposed by the consumer
>> since they are only used internally. So from the reader's point of view
>> there are holes in the offsets.
>>
>> Guozhang
>>
>> On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <justin.mil...@protectwise.com> wrote:
>>
>>> Greetings,
>>>
>>> We’ve seen a strange situation wherein the topic is not compacted but the
>>> offset numbers inside the partition (#93) are not contiguous. This only
>>> happens once a day, though, on a topic with billions of messages per day.
>>>
>>> next offset = 1786997223
>>> next offset = 1786997224
>>> next offset = 1786997226
>>> next offset = 1786997227
>>> next offset = 1786997228
>>>
>>> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
>>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets
>>>
>>> Specifically: “In Kafka 0.8, each message is assigned a monotonically
>>> increasing, contiguous sequence number per partition, starting with 1.”
>>>
>>> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>>>
>>> Thanks!
>>> Justin
>>
>> --
>> -- Guozhang
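The offsets in Justin's report skip exactly one value (1786997225). A consumer can flag such holes by comparing each consumed offset with the previous one from the same partition. Below is a minimal sketch of that check, assuming the offsets come from ONE partition in the order they were received; `find_offset_gaps` is a hypothetical helper for illustration, not part of any Kafka client API.

```python
from typing import Iterable, List, Optional, Tuple


def find_offset_gaps(offsets: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between consecutive offsets.

    Hypothetical helper: assumes `offsets` belong to a single partition and are
    listed in consumption order, so they should be strictly increasing.
    """
    gaps: List[Tuple[int, int]] = []
    previous: Optional[int] = None
    for offset in offsets:
        if previous is not None:
            if offset <= previous:
                # Out-of-order or repeated offsets are a different problem; fail loudly.
                raise ValueError(f"offsets not increasing: {previous} -> {offset}")
            if offset != previous + 1:
                # Every offset strictly between `previous` and `offset` was never delivered.
                gaps.append((previous + 1, offset - 1))
        previous = offset
    return gaps


if __name__ == "__main__":
    # Offsets taken from the report in this thread.
    observed = [1786997223, 1786997224, 1786997226, 1786997227, 1786997228]
    for first, last in find_offset_gaps(observed):
        print(f"missing offsets {first}..{last}")
```

Keep in mind that this check cannot tell a defect from a legitimate hole: offsets removed by compaction or occupied by transaction markers are reported as gaps too, so a hit only means "investigate", not "data was lost".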