Re: Contiguous Offsets on non-compacted topics

2018-01-24 Thread Cody Koeninger
Can anyone clarify what (other than the known cases of compaction or
transactions) could be causing non-contiguous offsets?

That sounds like a potential defect, given that I ran billions of
messages a day through the Kafka 0.8.x series for years without seeing
that.

On Tue, Jan 23, 2018 at 3:35 PM, Justin Miller
 wrote:
> Hi Matthias and Guozhang,
>
> Given that information, I think I’m going to try out the following in our 
> data lake persisters (spark-streaming-kafka):
> https://issues.apache.org/jira/browse/SPARK-17147 
> 
>
> Skipping one message out of 10+ billion a day won’t be the end of the world 
> for this topic and it’ll save me from having to manually restart the process. 
> :)
>
> These topics aren’t compacted, and we’re still only on 0.10 (switched to 0.10 
> today), but we were able to reproduce the issue when we restarted the Kafka 
> brokers migrating from 0.9.0.0 message format to 0.10.2.
>
> Thanks,
> Justin
>
>> On Jan 23, 2018, at 2:31 PM, Guozhang Wang  wrote:
>>
>> Hello Justin,
>>
>> There are actually multiple reasons that can cause non-contiguous offsets,
>> or "holes", in the Kafka partition logs:
>>
>> 1. Compaction, which you already know about.
>> 2. When transactions are turned on, some offsets are taken by the
>> "transaction marker" messages, which are not exposed by the consumer
>> since they are only used internally. So from the reader's point of view
>> there are holes in the offsets.
>>
>>
>>
>> Guozhang
>>
>>
>>
>>
>> On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <
>> justin.mil...@protectwise.com> wrote:
>>
>>> Greetings,
>>>
>>> We’ve seen a strange situation wherein the topic is not compacted but the
>>> offset numbers inside the partition (#93) are not contiguous. This happens
>>> only once a day, though, on a topic with billions of messages per day.
>>>
>>> next offset = 1786997223
>>> next offset = 1786997224
>>> next offset = 1786997226
>>> next offset = 1786997227
>>> next offset = 1786997228
>>>
>>> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
>>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets
>>>
>>> Specifically: “In Kafka 0.8, each message is assigned a monotonically
>>> increasing, contiguous sequence number per partition,starting with 1.”
>>>
>>> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>>>
>>> Thanks!
>>> Justin
>>
>>
>>
>>
>> --
>> -- Guozhang
>


Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Hi Matthias and Guozhang,

Given that information, I think I’m going to try out the following in our data 
lake persisters (spark-streaming-kafka): 
https://issues.apache.org/jira/browse/SPARK-17147 


Skipping one message out of 10+ billion a day won’t be the end of the world for 
this topic and it’ll save me from having to manually restart the process. :)

These topics aren’t compacted, and we’re still only on 0.10 (switched to 0.10 
today), but we were able to reproduce the issue when we restarted the Kafka 
brokers migrating from 0.9.0.0 message format to 0.10.2.

Thanks,
Justin

> On Jan 23, 2018, at 2:31 PM, Guozhang Wang  wrote:
> 
> Hello Justin,
> 
> There are actually multiple reasons that can cause non-contiguous offsets,
> or "holes", in the Kafka partition logs:
> 
> 1. Compaction, which you already know about.
> 2. When transactions are turned on, some offsets are taken by the
> "transaction marker" messages, which are not exposed by the consumer
> since they are only used internally. So from the reader's point of view
> there are holes in the offsets.
> 
> 
> 
> Guozhang
> 
> 
> 
> 
> On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <
> justin.mil...@protectwise.com> wrote:
> 
>> Greetings,
>> 
>> We’ve seen a strange situation wherein the topic is not compacted but the
>> offset numbers inside the partition (#93) are not contiguous. This happens
>> only once a day, though, on a topic with billions of messages per day.
>> 
>> next offset = 1786997223
>> next offset = 1786997224
>> next offset = 1786997226
>> next offset = 1786997227
>> next offset = 1786997228
>> 
>> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets
>> 
>> Specifically: “In Kafka 0.8, each message is assigned a monotonically
>> increasing, contiguous sequence number per partition,starting with 1.”
>> 
>> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>> 
>> Thanks!
>> Justin
> 
> 
> 
> 
> -- 
> -- Guozhang



Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Guozhang Wang
Hello Justin,

There are actually multiple reasons that can cause non-contiguous offsets,
or "holes", in the Kafka partition logs:

1. Compaction, which you already know about.
2. When transactions are turned on, some offsets are taken by the
"transaction marker" messages, which are not exposed by the consumer
since they are only used internally. So from the reader's point of view
there are holes in the offsets.
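
To illustrate point 2, here is a toy model in Python. It is only a sketch
and not Kafka's actual log format: the list `log`, the entry kinds
"data", "commit_marker" and "abort_marker", and the helper
`visible_offsets` are all made up for illustration. Every entry in the
partition takes one offset, but only data entries are handed to the
reader.

```python
def visible_offsets(log):
    """Return the offsets a reader would see in this toy model.

    Each entry in `log` occupies one offset (its list index). Marker
    entries are kept in the log but never handed to the reader.
    """
    return [offset for offset, kind in enumerate(log) if kind == "data"]


# Hypothetical partition contents: markers sit between data records.
log = ["data", "data", "commit_marker", "data", "data", "abort_marker", "data"]

print(visible_offsets(log))  # [0, 1, 3, 4, 6]
```

Offsets 2 and 5 exist in the log but are never returned, so the reader
sees holes even though nothing was lost.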



Guozhang




On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <
justin.mil...@protectwise.com> wrote:

> Greetings,
>
> We’ve seen a strange situation wherein the topic is not compacted but the
> offset numbers inside the partition (#93) are not contiguous. This happens
> only once a day, though, on a topic with billions of messages per day.
>
> next offset = 1786997223
> next offset = 1786997224
> next offset = 1786997226
> next offset = 1786997227
> next offset = 1786997228
>
> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets
>
> Specifically: “In Kafka 0.8, each message is assigned a monotonically
> increasing, contiguous sequence number per partition,starting with 1.”
>
> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>
> Thanks!
> Justin




-- 
-- Guozhang


Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Matthias J. Sax
In general, offsets should be consecutive. However, this is not an
"official guarantee", and you should not build applications that rely on
consecutive offsets.

Also note that with Kafka 0.11 and transactions, commit/abort markers
each require an offset in the partition, and thus having "offset gaps" is
normal in this case.

I'm not sure at the moment why you have an "offset gap", as your 0.9 log
format does not support transactions.
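
As a rough sketch of what "not relying on consecutive offsets" can look
like on the reading side (plain Python, no Kafka client involved; the
function `process_batch` and the (offset, value) tuples are hypothetical
names for illustration): the next position is always derived from the
last record actually seen, and records are counted by counting, never by
subtracting offsets.

```python
def process_batch(records, position):
    """Consume (offset, value) pairs without assuming consecutive offsets.

    `position` is the next offset we expect to read. Returns a tuple
    (new_position, processed_count, missing_offsets).
    """
    processed = 0
    missing = 0
    for offset, _value in records:
        if offset < position:
            continue  # already handled earlier; ignore the duplicate
        # Any jump past the expected position is recorded, not treated
        # as an error.
        missing += offset - position
        processed += 1
        position = offset + 1  # resume right after the record we saw
    return position, processed, missing


batch = [(10, "a"), (11, "b"), (13, "c")]
print(process_batch(batch, 10))  # (14, 3, 1)
```

Three records are processed even though the offsets span four
positions, and the reader simply carries on from offset 14.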


-Matthias


On 1/23/18 9:52 AM, Justin Miller wrote:
> Greetings, 
> 
> We’ve seen a strange situation wherein the topic is not compacted but the 
> offset numbers inside the partition (#93) are not contiguous. This happens 
> only once a day, though, on a topic with billions of messages per day.
> 
> next offset = 1786997223
> next offset = 1786997224
> next offset = 1786997226
> next offset = 1786997227
> next offset = 1786997228
> 
> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:   
> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets 
> 
> 
> Specifically: “In Kafka 0.8, each message is assigned a monotonically 
> increasing, contiguous sequence number per partition,starting with 1.”
> 
> We’re on Kafka 1.0 with logs at version 0.9.0.0.
> 
> Thanks!
> Justin
> 





Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Greetings, 

We’ve seen a strange situation wherein the topic is not compacted but the 
offset numbers inside the partition (#93) are not contiguous. This happens 
only once a day, though, on a topic with billions of messages per day.

next offset = 1786997223
next offset = 1786997224
next offset = 1786997226
next offset = 1786997227
next offset = 1786997228
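
For reference, a gap like the one above can be spotted with a few lines
of Python. This is only a minimal sketch, assuming the offsets have
already been collected into a list in the order they were read; the
helper name `find_offset_gaps` is made up for illustration.

```python
def find_offset_gaps(offsets):
    """Return (previous, current) pairs where current != previous + 1."""
    gaps = []
    previous = None
    for current in offsets:
        if previous is not None and current != previous + 1:
            gaps.append((previous, current))
        previous = current
    return gaps


# The offsets from the log output above.
observed = [1786997223, 1786997224, 1786997226, 1786997227, 1786997228]

print(find_offset_gaps(observed))  # [(1786997224, 1786997226)]
```

Here the only gap is the missing offset 1786997225.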

I was wondering if this still holds with Kafka 0.10, 0.11, 1.0: 
http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets 


Specifically: “In Kafka 0.8, each message is assigned a monotonically 
increasing, contiguous sequence number per partition,starting with 1.”

We’re on Kafka 1.0 with logs at version 0.9.0.0.

Thanks!
Justin