Can anyone clarify what (other than the known cases of compaction or
transactions) could be causing non-contiguous offsets?

That sounds like a potential defect, given that I ran billions of
messages a day through the Kafka 0.8.x series for years without seeing
that.
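
To rule out anything on the consumer side, one quick check is to scan the
offsets you actually consume per partition and log any gaps. A minimal
sketch (plain Python; the function name and the sample data are mine, the
offsets are the ones reported further down this thread):

```python
def find_offset_gaps(offsets):
    """Return (previous, next) pairs wherever consecutive consumed
    offsets within a single partition are not contiguous."""
    return [(prev, cur)
            for prev, cur in zip(offsets, offsets[1:])
            if cur != prev + 1]

# The offsets reported for partition 93 below; 1786997225 is missing.
observed = [1786997223, 1786997224, 1786997226, 1786997227, 1786997228]
print(find_offset_gaps(observed))  # [(1786997224, 1786997226)]
```

In a real consumer loop you would feed this the offsets of each fetched
batch per TopicPartition rather than a static list.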

On Tue, Jan 23, 2018 at 3:35 PM, Justin Miller
<justin.mil...@protectwise.com> wrote:
> Hi Matthias and Guozhang,
>
> Given that information, I think I’m going to try out the following in our 
> data lake persisters (spark-streaming-kafka):
> https://issues.apache.org/jira/browse/SPARK-17147 
> <https://issues.apache.org/jira/browse/SPARK-17147>
>
> Skipping one message out of 10+ billion a day won’t be the end of the world 
> for this topic and it’ll save me from having to manually restart the process. 
> :)
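>
> It looks like the switch that work adds (assuming I'm reading the patch
> right) is:
>
> ```properties
> spark.streaming.kafka.allowNonConsecutiveOffsets=true
> ```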
>
> These topics aren’t compacted, and we’re still only on 0.10 (switched to 0.10 
> today), but we were able to reproduce the issue when we restarted the Kafka 
> brokers migrating from 0.9.0.0 message format to 0.10.2.
>
> Thanks,
> Justin
>
>> On Jan 23, 2018, at 2:31 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>> Hello Justin,
>>
>> There are actually multiple reasons that can cause non-contiguous offsets,
>> or "holes", in the Kafka partition logs:
>>
>> 1. Compaction, which you knew about already.
>> 2. When transactions are turned on, some offsets are actually taken by
>> the "transaction marker" messages, which will not be exposed to the
>> consumer since they are only used internally. So from the reader's point
>> of view there are holes in the offsets.
>>
>>
>>
>> Guozhang
>>
>>
>>
>>
>> On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <
>> justin.mil...@protectwise.com> wrote:
>>
>>> Greetings,
>>>
>>> We’ve seen a strange situation wherein the topic is not compacted but the
>>> offset numbers inside the partition (#93) are not contiguous. This only
>>> happens once a day though, on a topic with billions of messages per day.
>>>
>>> next offset = 1786997223
>>> next offset = 1786997224
>>> next offset = 1786997226
>>> next offset = 1786997227
>>> next offset = 1786997228
>>>
>>> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
>>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets <
>>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets>
>>>
>>> Specifically: “In Kafka 0.8, each message is assigned a monotonically
>>> increasing, contiguous sequence number per partition, starting with 1.”
>>>
>>> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>>>
>>> Thanks!
>>> Justin
>>
>>
>>
>>
>> --
>> -- Guozhang
>
