Re: Offsets getting lost if no messages sent for a long time

2016-08-23 Thread Gerard Klijs
I don't know the answer to the second question. As for the first: if you
don't use (many) auto-generated IDs for the consumer group, you should be OK,
since it's a compacted topic after all. You might want to check that
compaction is on. We set offsets.retention.minutes to a week without a problem.
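
A toy model may help show why compaction keeps the offsets topic small when the set of groups is limited. It is only a sketch, not Kafka's actual log cleaner: the class and method names are hypothetical, and it assumes that each committed offset is keyed by group, topic and partition, so that a later commit for the same key supersedes the earlier ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a compacted log (hypothetical; not Kafka's log cleaner). */
final class CompactedLogSketch {
    /** One offset commit: the key identifies group/topic/partition. */
    static final class Commit {
        final String key;
        final long offset;

        Commit(String key, long offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    private final List<Commit> log = new ArrayList<>();

    /** Appends a commit record, as every offset commit would. */
    void append(String group, String topic, int partition, long offset) {
        log.add(new Commit(group + "/" + topic + "/" + partition, offset));
    }

    /** Keeps only the most recent commit per key, as compaction eventually would. */
    Map<String, Long> compact() {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Commit c : log) {
            latest.put(c.key, c.offset); // later commits overwrite earlier ones
        }
        return latest;
    }

    /** Number of records appended before compaction. */
    int rawSize() {
        return log.size();
    }
}
```

In this model the compacted size depends on the number of distinct keys, not on how long entries are retained, which is also why many auto-generated group IDs would make it grow.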

Re: Offsets getting lost if no messages sent for a long time

2016-08-23 Thread Michael Freeman
Might be easier to handle duplicate messages as opposed to handling long 
periods of time without messages.

Michael
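
One way to read this suggestion: let the consumer re-read old messages when its offsets are lost, and make the processing tolerate repeats. The sketch below assumes each message carries a unique ID, which the thread does not state; the class and its methods are hypothetical, and the in-memory set stands in for whatever durable store a real application would need to survive restarts.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical handler that skips messages it has already processed. */
final class DeduplicatingHandlerSketch {
    private final Set<String> seenIds = new HashSet<>();
    private int processed = 0;

    /**
     * Processes a message once per ID.
     * Returns true if the message was processed, false if it was a duplicate.
     */
    boolean handle(String messageId, String payload) {
        if (!seenIds.add(messageId)) {
            return false; // ID already seen: skip the duplicate
        }
        processed++; // real processing of the payload would go here
        return true;
    }

    /** Number of distinct messages processed so far. */
    int processedCount() {
        return processed;
    }
}
```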

RE: Offsets getting lost if no messages sent for a long time

2016-08-22 Thread Misra, Rahul
Hi,

Can anybody provide any guidance on the following:

1. Given a limited set of groups and consumers, will increasing
'offsets.retention.minutes' to a high value (say 30 days) cause the
__consumer_offsets topic to bloat unnecessarily, or will compaction ensure that
the entries for each key remain limited? The latter would mean that a high
'offsets.retention.minutes' value is not a problem, and it is the option I
would prefer.

2. If the consumer calls commitSync() with the latest, already committed
offsets (no messages having been received for a long time since that commit),
will it write a new entry to the __consumer_offsets topic and ensure that the
offsets are retained even with a small 'offsets.retention.minutes'? In our
application the dry period (the period without a new message) is not well
defined in advance.


Regards,
Rahul Misra
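
The second question is not answered in this thread. Purely as a sketch, and assuming that a repeated commit of unchanged offsets is written as a fresh entry (unconfirmed here), an idle consumer loop could re-commit on a timer. The helper below is hypothetical and only decides when a commit is due; in a real loop its result would gate a call to the consumer's commitSync().

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: tells an idle consumer when to re-commit its offsets. */
final class IdleRecommitTimer {
    private final Duration interval;
    private Instant lastCommit;

    /** interval should be comfortably shorter than offsets.retention.minutes. */
    IdleRecommitTimer(Duration interval, Instant start) {
        this.interval = interval;
        this.lastCommit = start;
    }

    /** True once at least one full interval has passed since the last commit. */
    boolean shouldCommit(Instant now) {
        return !now.isBefore(lastCommit.plus(interval));
    }

    /** Call after a successful commit, whether or not messages were received. */
    void markCommitted(Instant now) {
        lastCommit = now;
    }
}
```

Inside the poll loop this would be used as: if shouldCommit(now) is true, commit and then call markCommitted(now).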


RE: Offsets getting lost if no messages sent for a long time

2016-08-20 Thread Misra, Rahul
Hi Ian,

Thanks for the quick response. Your answer clears things up.
I have some follow-up questions though:

1. Given a limited set of groups and consumers, will increasing
'offsets.retention.minutes' to a high value (say 30 days) cause the
__consumer_offsets topic to bloat unnecessarily, or will compaction ensure that
the entries for each key remain limited? The latter would mean that a high
'offsets.retention.minutes' value is not a problem, and it is the option I
would prefer.

2. If the consumer calls commitSync() with the latest, already committed
offsets (no messages having been received for a long time since that commit),
will it write a new entry to the __consumer_offsets topic and ensure that the
offsets are retained even with a small 'offsets.retention.minutes'? In our
application the dry period (the period without a new message) is not well
defined in advance.


Regards,
Rahul Misra





This email message and any attachments are intended solely for the use of the 
addressee. If you are not the intended recipient, you are prohibited from 
reading, disclosing, reproducing, distributing, disseminating or otherwise 
using this transmission. If you have received this message in error, please 
promptly notify the sender by reply email and immediately delete this message 
from your system. This message and any attachments may contain information that 
is confidential, privileged or exempt from disclosure. Delivery of this message 
to any person other than the intended recipient is not intended to waive any 
right or privilege. Message transmission is not guaranteed to be secure or free 
of software viruses. 
***


Re: Offsets getting lost if no messages sent for a long time

2016-08-20 Thread Ian Wrigley
Since nothing was written to the __consumer_offsets topic for this group for
more than the configured retention period (offsets.retention.minutes, by
default 1440 minutes, or one day), the offset info was removed. The retention
period is measured from when the last offset was written, not from the last
time a Consumer looked at a topic.

You can increase the value of offsets.retention.minutes to ensure that offset 
info isn’t cleaned out before more messages are written to a topic and read by 
the Consumer (and hence the Consumer updates its offset info in 
__consumer_offsets).

Ian.

---
Ian Wrigley
Director, Education Services
Confluent, Inc
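
The rule described above can be sketched as a simple time comparison. This is an illustration under stated assumptions, not broker code: the class and method names are hypothetical, and the only values taken from the thread are the 1440-minute default and the fact that the clock starts at the last offset write.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration of retention measured from the last commit. */
final class OffsetExpirySketch {
    /** Default quoted in the thread: 1440 minutes (one day). */
    static final Duration DEFAULT_RETENTION = Duration.ofMinutes(1440);

    /**
     * True if the committed offset is older than the retention period.
     * Polling without committing does not change lastCommit.
     */
    static boolean isExpired(Instant lastCommit, Duration retention, Instant now) {
        return now.isAfter(lastCommit.plus(retention));
    }
}
```

With a last commit two days in the past, the default retention marks the offset as expired, while a one-week retention does not.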

> On Aug 20, 2016, at 11:36 AM, Misra, Rahul  wrote:
> 
> Hi,
> 
> I have observed the following scenario (the consumer here has
> 'enable.auto.commit=false', and offsets are committed using commitSync()
> whenever messages are received):
> 
> 1.  Start a consumer (with a specific group.id) and send some messages to
> its subscribed topic.
> 
> 2.  The consumer consumes the messages, and the group+consumer has an
> entry in the __consumer_offsets topic with the latest offsets for this
> group and consumer.
> 
> 3.  Let the consumer keep polling the topic in a while loop, but don't send
> any more messages to the topic for a long time (longer than one day). The
> default duration for which a group's entry is retained in the offsets topic
> is 1 day.
> 
> 4.  Now stop the consumer (there is no other consumer for this group).
> 
> 5.  Send some more messages to the topic.
> 
> 6.  Start the consumer (with the same group and consumer id as earlier).
> 
> 7.  The consumer does not pick up the new messages sent in step 5, as it
> has lost the committed offsets and starts with the 'latest' offsets.
> 
> Is this expected behavior? Or do I have something wrong in the
> configuration?
> Is there a way to ensure that the offsets are retained even if there are no
> messages flowing in?
> I had assumed that if the consumer keeps polling the Kafka topic, its
> offsets would be retained even if no messages are received by the
> corresponding topic.
> 
> Regards,
> Rahul
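
For reference, the consumer settings implied by this scenario can be sketched as plain properties. The broker address and the helper itself are hypothetical, and a real consumer would also need key and value deserializers. The 'auto.offset.reset' setting is what decides where the consumer starts once its committed offsets are gone: 'latest' matches the behaviour in step 7, while 'earliest' would re-read the partition from the beginning instead.

```java
import java.util.Properties;

/** Hypothetical builder for the consumer settings described in the scenario. */
final class ScenarioConsumerConfigSketch {
    static Properties build(String groupId, String offsetReset) {
        Properties props = new Properties();
        // Placeholder address; not taken from the thread.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", groupId);
        // Offsets are committed manually with commitSync(), as in the scenario.
        props.setProperty("enable.auto.commit", "false");
        // Applies only when no committed offset exists for the group.
        props.setProperty("auto.offset.reset", offsetReset);
        return props;
    }
}
```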