Re: understanding OffsetOutOfRangeException's....

2014-01-12 Thread Jason Rosenberg
Not sure, but I'll try (it's a bit difficult to create a test-case, because
it requires a good bit of integration testing, etc.).

Jason


On Sat, Jan 11, 2014 at 12:06 AM, Jun Rao  wrote:

> Do you think you can reproduce this easily?
>
> Thanks,
>
> Jun
>
>


Re: understanding OffsetOutOfRangeException's....

2014-01-10 Thread Jun Rao
Do you think you can reproduce this easily?

Thanks,

Jun


On Fri, Jan 10, 2014 at 11:33 AM, Jason Rosenberg  wrote:

> well, not currently, as we don't have multiple partitions for the
> topics.but yes, I understand that would help too
>
> but, we are using this multiple consumers within a process approach in
> general with much success so far..just was curious about this ERROR I
> was seeing :)
>
>
> On Fri, Jan 10, 2014 at 11:06 AM, Jun Rao  wrote:
>
> > Could you increase parallelism on the consumers?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg 
> wrote:
> >
> > > The consumption rate is a little better after the refactoring.  The
> main
> > > issue though, was that we had a mismatch between large and small
> topics.
> >  A
> > > large topic can lag, and adversely affect consumption of other topics,
> so
> > > this is an attempt to isolate topic filtering, and better balance the
> > > consumers for the different topics.
> > >
> > > So, it's definitely working on that score.
> > >
> > > The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
> > > doing that before and after the refactor (and after we started also
> > seeing
> > > the ERROR logging).  But consumption of all other topics is working
> > better
> > > now (almost no lag at all).
> > >
> > > I'm also setting the client.id for each consumer in the process, so
> > that I
> > > can see the individual metrics per consumer.
> > >
> > > Jason
> > >
> > >
> > > On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao  wrote:
> > >
> > > > Does the consumption rate in the client (msg/sec) change
> significantly
> > > > after the refactoring?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > Yes, it's happening continuously, at the moment (although I'm
> > expecting
> > > > the
> > > > > consumer to catch up soon)
> > > > >
> > > > > It seemed to start happening after I refactored the consumer app to
> > use
> > > > > multiple consumer connectors in the same process (each one has a
> > > separate
> > > > > topic filter, so should be no overlap between them).  All using the
> > > same
> > > > > consumer group.
> > > > >
> > > > > Could it be a thread safety issue in the ZookeeperConsumerConnector
> > > > (seems
> > > > > unlikely).
> > > > >
> > > > > Jason
> > > > >
> > > > >
> > > > > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> > > > >
> > > > > > Normally, if the consumer can't keep up, you should just see the
> > > > > > OffsetOutOfRangeException warning. The offset mismatch error
> should
> > > > never
> > > > > > happen. It could be that OffsetOutOfRangeException exposed a bug.
> > Do
> > > > you
> > > > > > think you can reproduce this easily?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg <
> j...@squareup.com>
> > > > > wrote:
> > > > > >
> > > > > > > Jun,
> > > > > > >
> > > > > > > I'm not sure I understand your question, wrt produced data?
> > > > > > >
> > > > > > > But yes, in general, I believe the consumer is not keeping up
> > with
> > > > the
> > > > > > > broker's deleting the data.  So it's trying to fetch the next
> > batch
> > > > of
> > > > > > > data, but it's last offset is no longer there, etc.  So that's
> > the
> > > > > reason
> > > > > > > for the WARN message, in the fetcher thread.
> > > > > > >
> > > > > > > I'm just not sure I understand then why we don't always see the
> > > > > > > ConsumerIterator error also, because won't there always be
> > missing
> > > > data
> > > > > > > detected there?  Why sometimes and not always?  What's the
> > > > difference?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao 
> > wrote:
> > > > > > >
> > > > > > > > The WARN and ERROR may not be completely correlated. Could it
> > be
> > > > that
> > > > > > the
> > > > > > > > consumer is slow and couldn't keep up with the produced data?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <
> > > j...@squareup.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > So, sometimes I just get the WARN from the
> > > ConsumerFetcherThread
> > > > > (as
> > > > > > > > > previously noted, above), e.g.:
> > > > > > > > >
> > > > > > > > > 2014-01-08 02:31:47,394  WARN
> > > > > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > > > > consumer.ConsumerFetcherThread -
> > > > > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > > > > 16163904970
> > > > > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > > > > 16175326044
> > > > > > > > >
> > > > > > > > > More recently, I see these in the following log line (not
> > sure
> > > > why
> > > > > I

Re: understanding OffsetOutOfRangeException's....

2014-01-10 Thread Jason Rosenberg
well, not currently, as we don't have multiple partitions for the
topics....but yes, I understand that would help too

but, we are using this multiple-consumers-within-a-process approach in
general with much success so far....just was curious about this ERROR I
was seeing :)
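
For reference, with the 0.8 high-level consumer the parallelism Jun is
suggesting is bounded by the partition count: you can ask a connector for
several streams, but only as many streams as there are partitions actually
receive data.  A minimal sketch of requesting more streams (the ZooKeeper
address, group name, and topic regex are placeholders, not values from this
thread):

import java.util.List;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class ParallelismSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder
        props.put("group.id", "mygroup");              // placeholder

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Request 4 streams; if the matched topics have only 1 partition,
        // only 1 of these streams will actually be assigned messages.
        List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreamsByFilter(new Whitelist("mytopic.*"), 4);

        System.out.println("streams created: " + streams.size());
    }
}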


On Fri, Jan 10, 2014 at 11:06 AM, Jun Rao  wrote:

> Could you increase parallelism on the consumers?
>
> Thanks,
>
> Jun
>
>
> On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg  wrote:
>
> > The consumption rate is a little better after the refactoring.  The main
> > issue though, was that we had a mismatch between large and small topics.
>  A
> > large topic can lag, and adversely affect consumption of other topics, so
> > this is an attempt to isolate topic filtering, and better balance the
> > consumers for the different topics.
> >
> > So, it's definitely working on that score.
> >
> > The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
> > doing that before and after the refactor (and after we started also
> seeing
> > the ERROR logging).  But consumption of all other topics is working
> better
> > now (almost no lag at all).
> >
> > I'm also setting the client.id for each consumer in the process, so
> that I
> > can see the individual metrics per consumer.
> >
> > Jason
> >
> >
> > On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao  wrote:
> >
> > > Does the consumption rate in the client (msg/sec) change significantly
> > > after the refactoring?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg 
> > wrote:
> > >
> > > > Yes, it's happening continuously, at the moment (although I'm
> expecting
> > > the
> > > > consumer to catch up soon)
> > > >
> > > > It seemed to start happening after I refactored the consumer app to
> use
> > > > multiple consumer connectors in the same process (each one has a
> > separate
> > > > topic filter, so should be no overlap between them).  All using the
> > same
> > > > consumer group.
> > > >
> > > > Could it be a thread safety issue in the ZookeeperConsumerConnector
> > > (seems
> > > > unlikely).
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> > > >
> > > > > Normally, if the consumer can't keep up, you should just see the
> > > > > OffsetOutOfRangeException warning. The offset mismatch error should
> > > never
> > > > > happen. It could be that OffsetOutOfRangeException exposed a bug.
> Do
> > > you
> > > > > think you can reproduce this easily?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> > > > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > I'm not sure I understand your question, wrt produced data?
> > > > > >
> > > > > > But yes, in general, I believe the consumer is not keeping up
> with
> > > the
> > > > > > broker's deleting the data.  So it's trying to fetch the next
> batch
> > > of
> > > > > > data, but it's last offset is no longer there, etc.  So that's
> the
> > > > reason
> > > > > > for the WARN message, in the fetcher thread.
> > > > > >
> > > > > > I'm just not sure I understand then why we don't always see the
> > > > > > ConsumerIterator error also, because won't there always be
> missing
> > > data
> > > > > > detected there?  Why sometimes and not always?  What's the
> > > difference?
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao 
> wrote:
> > > > > >
> > > > > > > The WARN and ERROR may not be completely correlated. Could it
> be
> > > that
> > > > > the
> > > > > > > consumer is slow and couldn't keep up with the produced data?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <
> > j...@squareup.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > So, sometimes I just get the WARN from the
> > ConsumerFetcherThread
> > > > (as
> > > > > > > > previously noted, above), e.g.:
> > > > > > > >
> > > > > > > > 2014-01-08 02:31:47,394  WARN
> > > > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > > > consumer.ConsumerFetcherThread -
> > > > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > > > 16163904970
> > > > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > > > 16175326044
> > > > > > > >
> > > > > > > > More recently, I see these in the following log line (not
> sure
> > > why
> > > > I
> > > > > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > > > > >
> > > > > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > > > > consumer.ConsumerIterator - consumed offset: 16163904970
> > doesn't
> > > > > match
> > > > > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > > > > 16175330598:
> > > > > > > > consumed offset = 16163904970;
> > > > > > > >  Consumer may lose data
> > > > > 

Re: understanding OffsetOutOfRangeException's....

2014-01-10 Thread Jun Rao
Could you increase parallelism on the consumers?

Thanks,

Jun


On Thu, Jan 9, 2014 at 1:22 PM, Jason Rosenberg  wrote:

> The consumption rate is a little better after the refactoring.  The main
> issue though, was that we had a mismatch between large and small topics.  A
> large topic can lag, and adversely affect consumption of other topics, so
> this is an attempt to isolate topic filtering, and better balance the
> consumers for the different topics.
>
> So, it's definitely working on that score.
>
> The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
> doing that before and after the refactor (and after we started also seeing
> the ERROR logging).  But consumption of all other topics is working better
> now (almost no lag at all).
>
> I'm also setting the client.id for each consumer in the process, so that I
> can see the individual metrics per consumer.
>
> Jason
>
>
> On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao  wrote:
>
> > Does the consumption rate in the client (msg/sec) change significantly
> > after the refactoring?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg 
> wrote:
> >
> > > Yes, it's happening continuously, at the moment (although I'm expecting
> > the
> > > consumer to catch up soon)
> > >
> > > It seemed to start happening after I refactored the consumer app to use
> > > multiple consumer connectors in the same process (each one has a
> separate
> > > topic filter, so should be no overlap between them).  All using the
> same
> > > consumer group.
> > >
> > > Could it be a thread safety issue in the ZookeeperConsumerConnector
> > (seems
> > > unlikely).
> > >
> > > Jason
> > >
> > >
> > > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> > >
> > > > Normally, if the consumer can't keep up, you should just see the
> > > > OffsetOutOfRangeException warning. The offset mismatch error should
> > never
> > > > happen. It could be that OffsetOutOfRangeException exposed a bug. Do
> > you
> > > > think you can reproduce this easily?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > I'm not sure I understand your question, wrt produced data?
> > > > >
> > > > > But yes, in general, I believe the consumer is not keeping up with
> > the
> > > > > broker's deleting the data.  So it's trying to fetch the next batch
> > of
> > > > > data, but it's last offset is no longer there, etc.  So that's the
> > > reason
> > > > > for the WARN message, in the fetcher thread.
> > > > >
> > > > > I'm just not sure I understand then why we don't always see the
> > > > > ConsumerIterator error also, because won't there always be missing
> > data
> > > > > detected there?  Why sometimes and not always?  What's the
> > difference?
> > > > >
> > > > > Jason
> > > > >
> > > > >
> > > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
> > > > >
> > > > > > The WARN and ERROR may not be completely correlated. Could it be
> > that
> > > > the
> > > > > > consumer is slow and couldn't keep up with the produced data?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <
> j...@squareup.com>
> > > > > wrote:
> > > > > >
> > > > > > > So, sometimes I just get the WARN from the
> ConsumerFetcherThread
> > > (as
> > > > > > > previously noted, above), e.g.:
> > > > > > >
> > > > > > > 2014-01-08 02:31:47,394  WARN
> > > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > > consumer.ConsumerFetcherThread -
> > > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > > 16163904970
> > > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > > 16175326044
> > > > > > >
> > > > > > > More recently, I see these in the following log line (not sure
> > why
> > > I
> > > > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > > > >
> > > > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > > > consumer.ConsumerIterator - consumed offset: 16163904970
> doesn't
> > > > match
> > > > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > > > 16175330598:
> > > > > > > consumed offset = 16163904970;
> > > > > > >  Consumer may lose data
> > > > > > >
> > > > > > > Why would I not see this second ERROR everytime there's a
> > > > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > > > >
> > > > > > > Should I only be concerned about possible lost data if I see
> the
> > > > > > > second ERROR log line?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <
> > j...@squareup.com
> > > >
> > > > > > wrote:
> > > > > > > > But I assume this would not be normally you'd want to log
> > (every
> > > > > > > > incoming producer request?).  Maybe just for debugging?  Or
> is
> > it
> > > > > only
> > 

Re: understanding OffsetOutOfRangeException's....

2014-01-09 Thread Jason Rosenberg
The consumption rate is a little better after the refactoring.  The main
issue, though, was that we had a mismatch between large and small topics.  A
large topic can lag, and adversely affect consumption of other topics, so
this is an attempt to isolate topic filtering, and better balance the
consumers for the different topics.

So, it's definitely working on that score.

The topic that was lagging (and getting OffsetOutOfRangeExceptions) was
doing that before and after the refactor (and after we started also seeing
the ERROR logging).  But consumption of all other topics is working better
now (almost no lag at all).

I'm also setting the client.id for each consumer in the process, so that I
can see the individual metrics per consumer.
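
Roughly, the client.id-per-connector setup looks like this, so each
connector's consumer metrics (mbeans) show up under their own client id; the
id strings and connection settings below are placeholders, not the actual
values used:

import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class ClientIdSketch {
    // One connector per topic filter, each tagged with its own client.id so
    // the per-client consumer metrics can be told apart.
    static ConsumerConnector connectorFor(String clientId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder
        props.put("group.id", "mygroup");              // placeholder: same group for all
        props.put("client.id", clientId);              // names this consumer's metrics
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        ConsumerConnector smallTopics = connectorFor("small-topics-consumer");
        ConsumerConnector largeTopics = connectorFor("large-topics-consumer");
        // ... create message streams on each, as described above ...
        smallTopics.shutdown();
        largeTopics.shutdown();
    }
}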

Jason


On Thu, Jan 9, 2014 at 1:00 PM, Jun Rao  wrote:

> Does the consumption rate in the client (msg/sec) change significantly
> after the refactoring?
>
> Thanks,
>
> Jun
>
>
> On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg  wrote:
>
> > Yes, it's happening continuously, at the moment (although I'm expecting
> the
> > consumer to catch up soon)
> >
> > It seemed to start happening after I refactored the consumer app to use
> > multiple consumer connectors in the same process (each one has a separate
> > topic filter, so should be no overlap between them).  All using the same
> > consumer group.
> >
> > Could it be a thread safety issue in the ZookeeperConsumerConnector
> (seems
> > unlikely).
> >
> > Jason
> >
> >
> > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> >
> > > Normally, if the consumer can't keep up, you should just see the
> > > OffsetOutOfRangeException warning. The offset mismatch error should
> never
> > > happen. It could be that OffsetOutOfRangeException exposed a bug. Do
> you
> > > think you can reproduce this easily?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> > wrote:
> > >
> > > > Jun,
> > > >
> > > > I'm not sure I understand your question, wrt produced data?
> > > >
> > > > But yes, in general, I believe the consumer is not keeping up with
> the
> > > > broker's deleting the data.  So it's trying to fetch the next batch
> of
> > > > data, but it's last offset is no longer there, etc.  So that's the
> > reason
> > > > for the WARN message, in the fetcher thread.
> > > >
> > > > I'm just not sure I understand then why we don't always see the
> > > > ConsumerIterator error also, because won't there always be missing
> data
> > > > detected there?  Why sometimes and not always?  What's the
> difference?
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
> > > >
> > > > > The WARN and ERROR may not be completely correlated. Could it be
> that
> > > the
> > > > > consumer is slow and couldn't keep up with the produced data?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg 
> > > > wrote:
> > > > >
> > > > > > So, sometimes I just get the WARN from the ConsumerFetcherThread
> > (as
> > > > > > previously noted, above), e.g.:
> > > > > >
> > > > > > 2014-01-08 02:31:47,394  WARN
> > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > consumer.ConsumerFetcherThread -
> > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > 16163904970
> > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > 16175326044
> > > > > >
> > > > > > More recently, I see these in the following log line (not sure
> why
> > I
> > > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > > >
> > > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't
> > > match
> > > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > > 16175330598:
> > > > > > consumed offset = 16163904970;
> > > > > >  Consumer may lose data
> > > > > >
> > > > > > Why would I not see this second ERROR everytime there's a
> > > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > > >
> > > > > > Should I only be concerned about possible lost data if I see the
> > > > > > second ERROR log line?
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <
> j...@squareup.com
> > >
> > > > > wrote:
> > > > > > > But I assume this would not be normally you'd want to log
> (every
> > > > > > > incoming producer request?).  Maybe just for debugging?  Or is
> it
> > > > only
> > > > > > > for consumer fetch requests?
> > > > > > >
> > > > > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <
> > > wangg...@gmail.com>
> > > > > > wrote:
> > > > > > >> TRACE is lower than INFO so INFO level request logging would
> > also
> > > be
> > > > > > >> recorded.
> > > > > > >>
> > > > > > >> You can check for "Completed XXX request" in the log files to
> > > check
> > > > > the
> > > > > > >> request info with the

Re: understanding OffsetOutOfRangeException's....

2014-01-09 Thread Jun Rao
Does the consumption rate in the client (msg/sec) change significantly
after the refactoring?

Thanks,

Jun


On Wed, Jan 8, 2014 at 10:44 AM, Jason Rosenberg  wrote:

> Yes, it's happening continuously, at the moment (although I'm expecting the
> consumer to catch up soon)
>
> It seemed to start happening after I refactored the consumer app to use
> multiple consumer connectors in the same process (each one has a separate
> topic filter, so should be no overlap between them).  All using the same
> consumer group.
>
> Could it be a thread safety issue in the ZookeeperConsumerConnector (seems
> unlikely).
>
> Jason
>
>
> On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
>
> > Normally, if the consumer can't keep up, you should just see the
> > OffsetOutOfRangeException warning. The offset mismatch error should never
> > happen. It could be that OffsetOutOfRangeException exposed a bug. Do you
> > think you can reproduce this easily?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> wrote:
> >
> > > Jun,
> > >
> > > I'm not sure I understand your question, wrt produced data?
> > >
> > > But yes, in general, I believe the consumer is not keeping up with the
> > > broker's deleting the data.  So it's trying to fetch the next batch of
> > > data, but it's last offset is no longer there, etc.  So that's the
> reason
> > > for the WARN message, in the fetcher thread.
> > >
> > > I'm just not sure I understand then why we don't always see the
> > > ConsumerIterator error also, because won't there always be missing data
> > > detected there?  Why sometimes and not always?  What's the difference?
> > >
> > > Jason
> > >
> > >
> > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
> > >
> > > > The WARN and ERROR may not be completely correlated. Could it be that
> > the
> > > > consumer is slow and couldn't keep up with the produced data?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > So, sometimes I just get the WARN from the ConsumerFetcherThread
> (as
> > > > > previously noted, above), e.g.:
> > > > >
> > > > > 2014-01-08 02:31:47,394  WARN
> > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > consumer.ConsumerFetcherThread -
> > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> 16163904970
> > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > 16175326044
> > > > >
> > > > > More recently, I see these in the following log line (not sure why
> I
> > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > >
> > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't
> > match
> > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > 16175330598:
> > > > > consumed offset = 16163904970;
> > > > >  Consumer may lose data
> > > > >
> > > > > Why would I not see this second ERROR everytime there's a
> > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > >
> > > > > Should I only be concerned about possible lost data if I see the
> > > > > second ERROR log line?
> > > > >
> > > > > Jason
> > > > >
> > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg  >
> > > > wrote:
> > > > > > But I assume this would not be normally you'd want to log (every
> > > > > > incoming producer request?).  Maybe just for debugging?  Or is it
> > > only
> > > > > > for consumer fetch requests?
> > > > > >
> > > > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <
> > wangg...@gmail.com>
> > > > > wrote:
> > > > > >> TRACE is lower than INFO so INFO level request logging would
> also
> > be
> > > > > >> recorded.
> > > > > >>
> > > > > >> You can check for "Completed XXX request" in the log files to
> > check
> > > > the
> > > > > >> request info with the correlation id.
> > > > > >>
> > > > > >> Guozhang
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg <
> > j...@squareup.com
> > > >
> > > > > wrote:
> > > > > >>
> > > > > >>> Hmmm, it looks like I'm enabling all logging at INFO, and the
> > > request
> > > > > >>> logging is only done at TRACE (why is that?).
> > > > > >>>
> > > > > >>> I suppose one wouldn't normally want to see request logs, so by
> > > > > default,
> > > > > >>> they aren't enabled?
> > > > > >>>
> > > > > >>>
> > > > > >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao 
> > > wrote:
> > > > > >>>
> > > > > >>> > Did you enable request log? It logs the ip of every request.
> > > > > >>> >
> > > > > >>> > Thanks,
> > > > > >>> >
> > > > > >>> > Jun
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg <
> > > j...@squareup.com
> > > > >
> > > > > >>> wrote:
> > > > > >>> >
> > > > > >>> > > Hi Guozhang,
> > > > > >>> > >
> > > > > >>> > > I'm not sure I understand your first answer.  I don't see
> > > > anything
> > > > > >>>

Re: understanding OffsetOutOfRangeException's....

2014-01-08 Thread Jason Rosenberg
Joe,

I'm creating separate connectors, and creating separate streams, with a
separate thread pool to process them, for each connector.

This appears to be working well (e.g. each connector seems to be correctly
processing data).
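
Concretely, the separate-streams-plus-thread-pool-per-connector arrangement
described above looks roughly like this; the filters, stream counts, and pool
sizing are made-up values for illustration:

import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class PerConnectorPoolSketch {
    static void startConnector(String filterRegex, int numStreams) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder
        props.put("group.id", "mygroup");              // placeholder: one shared group

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreamsByFilter(new Whitelist(filterRegex), numStreams);

        // A thread pool dedicated to this connector, one thread per stream.
        ExecutorService pool = Executors.newFixedThreadPool(streams.size());
        for (final KafkaStream<byte[], byte[]> stream : streams) {
            pool.submit(new Runnable() {
                public void run() {
                    for (MessageAndMetadata<byte[], byte[]> msg : stream) {
                        // process msg.message() here
                    }
                }
            });
        }
    }

    public static void main(String[] args) {
        startConnector("small-topic-.*", 2); // placeholder filters
        startConnector("large-topic-.*", 2);
    }
}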

The only difference is the extra ERROR log message I'm seeing on the
ConsumerIterator.

Jason


On Wed, Jan 8, 2014 at 1:59 PM, Joe Stein  wrote:

> thinking/typing out loud here, not sure if this is the problem but could be,
> so figure I throw it out there
>
> the ZookeeperConsumerConnector has a messageStreamCreated atomic boolean
> that stops message streams (by filter) from being created more than once on
> the same connector...
>
> do you have separate instantiated objects of your ConsumerConnector (so
> calling ConsumerConnector.create for each one), or do you have one
> ConsumerConnector.create on which you call createMessageStreams multiple
> times (which would cause a RuntimeException, so probably not, but maybe
> your code is swallowing that)... if the latter, that could explain why it is
> lagging behind unexpectedly, since really only one ConsumerConnector is
> running
>
> again, you might not be doing this but figure I throw it out there in case
> you were.
>
> /***
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop 
> /
>
>
> On Wed, Jan 8, 2014 at 1:44 PM, Jason Rosenberg  wrote:
>
> > Yes, it's happening continuously, at the moment (although I'm expecting
> the
> > consumer to catch up soon)
> >
> > It seemed to start happening after I refactored the consumer app to use
> > multiple consumer connectors in the same process (each one has a separate
> > topic filter, so should be no overlap between them).  All using the same
> > consumer group.
> >
> > Could it be a thread safety issue in the ZookeeperConsumerConnector
> (seems
> > unlikely).
> >
> > Jason
> >
> >
> > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
> >
> > > Normally, if the consumer can't keep up, you should just see the
> > > OffsetOutOfRangeException warning. The offset mismatch error should
> never
> > > happen. It could be that OffsetOutOfRangeException exposed a bug. Do
> you
> > > think you can reproduce this easily?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> > wrote:
> > >
> > > > Jun,
> > > >
> > > > I'm not sure I understand your question, wrt produced data?
> > > >
> > > > But yes, in general, I believe the consumer is not keeping up with
> the
> > > > broker's deleting the data.  So it's trying to fetch the next batch
> of
> > > > data, but it's last offset is no longer there, etc.  So that's the
> > reason
> > > > for the WARN message, in the fetcher thread.
> > > >
> > > > I'm just not sure I understand then why we don't always see the
> > > > ConsumerIterator error also, because won't there always be missing
> data
> > > > detected there?  Why sometimes and not always?  What's the
> difference?
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
> > > >
> > > > > The WARN and ERROR may not be completely correlated. Could it be
> that
> > > the
> > > > > consumer is slow and couldn't keep up with the produced data?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg 
> > > > wrote:
> > > > >
> > > > > > So, sometimes I just get the WARN from the ConsumerFetcherThread
> > (as
> > > > > > previously noted, above), e.g.:
> > > > > >
> > > > > > 2014-01-08 02:31:47,394  WARN
> > > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > > consumer.ConsumerFetcherThread -
> > > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> > 16163904970
> > > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > > 16175326044
> > > > > >
> > > > > > More recently, I see these in the following log line (not sure
> why
> > I
> > > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > > >
> > > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't
> > > match
> > > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > > 16175330598:
> > > > > > consumed offset = 16163904970;
> > > > > >  Consumer may lose data
> > > > > >
> > > > > > Why would I not see this second ERROR everytime there's a
> > > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > > >
> > > > > > Should I only be concerned about possible lost data if I see the
> > > > > > second ERROR log line?
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <
> j...@squareup.com
> > >
> > > > > wrote:
> > > > > > > But I assume this would not be

Re: understanding OffsetOutOfRangeException's....

2014-01-08 Thread Joe Stein
thinking/typing out loud here, not sure if this is the problem but could be,
so figure I throw it out there

the ZookeeperConsumerConnector has a messageStreamCreated atomic boolean
that stops message streams (by filter) from being created more than once on
the same connector...

do you have separate instantiated objects of your ConsumerConnector (so
calling ConsumerConnector.create for each one), or do you have one
ConsumerConnector.create on which you call createMessageStreams multiple
times (which would cause a RuntimeException, so probably not, but maybe
your code is swallowing that)... if the latter, that could explain why it is
lagging behind unexpectedly, since really only one ConsumerConnector is
running

again, you might not be doing this but figure I throw it out there in case
you were.
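
In code, the distinction being drawn here is roughly the following; the
filters and connection settings are made-up placeholders, and the exception
behavior is described approximately:

import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class OneConnectorPerFilterSketch {
    static ConsumerConnector newConnector(Properties props) {
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder
        props.put("group.id", "mygroup");              // placeholder

        // Fine: a separate connector instance for each topic filter.
        ConsumerConnector a = newConnector(props);
        a.createMessageStreamsByFilter(new Whitelist("topics-a-.*"), 1);

        ConsumerConnector b = newConnector(props);
        b.createMessageStreamsByFilter(new Whitelist("topics-b-.*"), 1);

        // Not fine: a second createMessageStreamsByFilter on the SAME connector
        // trips the messageStreamCreated guard and throws a RuntimeException.
        // a.createMessageStreamsByFilter(new Whitelist("topics-b-.*"), 1);
    }
}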

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/


On Wed, Jan 8, 2014 at 1:44 PM, Jason Rosenberg  wrote:

> Yes, it's happening continuously, at the moment (although I'm expecting the
> consumer to catch up soon)
>
> It seemed to start happening after I refactored the consumer app to use
> multiple consumer connectors in the same process (each one has a separate
> topic filter, so should be no overlap between them).  All using the same
> consumer group.
>
> Could it be a thread safety issue in the ZookeeperConsumerConnector (seems
> unlikely).
>
> Jason
>
>
> On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:
>
> > Normally, if the consumer can't keep up, you should just see the
> > OffsetOutOfRangeException warning. The offset mismatch error should never
> > happen. It could be that OffsetOutOfRangeException exposed a bug. Do you
> > think you can reproduce this easily?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg 
> wrote:
> >
> > > Jun,
> > >
> > > I'm not sure I understand your question, wrt produced data?
> > >
> > > But yes, in general, I believe the consumer is not keeping up with the
> > > broker's deleting the data.  So it's trying to fetch the next batch of
> > > data, but it's last offset is no longer there, etc.  So that's the
> reason
> > > for the WARN message, in the fetcher thread.
> > >
> > > I'm just not sure I understand then why we don't always see the
> > > ConsumerIterator error also, because won't there always be missing data
> > > detected there?  Why sometimes and not always?  What's the difference?
> > >
> > > Jason
> > >
> > >
> > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
> > >
> > > > The WARN and ERROR may not be completely correlated. Could it be that
> > the
> > > > consumer is slow and couldn't keep up with the produced data?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > So, sometimes I just get the WARN from the ConsumerFetcherThread
> (as
> > > > > previously noted, above), e.g.:
> > > > >
> > > > > 2014-01-08 02:31:47,394  WARN
> > [ConsumerFetcherThread-myconsumerapp-11]
> > > > > consumer.ConsumerFetcherThread -
> > > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset
> 16163904970
> > > > > for partition [mypartition,0] out of range; reset offset to
> > > > > 16175326044
> > > > >
> > > > > More recently, I see these in the following log line (not sure why
> I
> > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > >
> > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't
> > match
> > > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> > 16175330598:
> > > > > consumed offset = 16163904970;
> > > > >  Consumer may lose data
> > > > >
> > > > > Why would I not see this second ERROR everytime there's a
> > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > >
> > > > > Should I only be concerned about possible lost data if I see the
> > > > > second ERROR log line?
> > > > >
> > > > > Jason
> > > > >
> > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg  >
> > > > wrote:
> > > > > > But I assume this would not be normally you'd want to log (every
> > > > > > incoming producer request?).  Maybe just for debugging?  Or is it
> > > only
> > > > > > for consumer fetch requests?
> > > > > >
> > > > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <
> > wangg...@gmail.com>
> > > > > wrote:
> > > > > >> TRACE is lower than INFO so INFO level request logging would
> also
> > be
> > > > > >> recorded.
> > > > > >>
> > > > > >> You can check for "Completed XXX request" in the log files to
> > check
> > > > the
> > > > > >> request info with the correlation id.
> > > > > >>
> > > > > >> Guozhang
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Ro

Re: understanding OffsetOutOfRangeException's....

2014-01-08 Thread Jason Rosenberg
Yes, it's happening continuously, at the moment (although I'm expecting the
consumer to catch up soon)

It seemed to start happening after I refactored the consumer app to use
multiple consumer connectors in the same process (each one has a separate
topic filter, so should be no overlap between them).  All using the same
consumer group.

Could it be a thread safety issue in the ZookeeperConsumerConnector (seems
unlikely)?

Jason


On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao  wrote:

> Normally, if the consumer can't keep up, you should just see the
> OffsetOutOfRangeException warning. The offset mismatch error should never
> happen. It could be that OffsetOutOfRangeException exposed a bug. Do you
> think you can reproduce this easily?
>
> Thanks,
>
> Jun
>
>
> On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg  wrote:
>
> > Jun,
> >
> > I'm not sure I understand your question, wrt produced data?
> >
> > But yes, in general, I believe the consumer is not keeping up with the
> > broker's deleting the data.  So it's trying to fetch the next batch of
> > data, but it's last offset is no longer there, etc.  So that's the reason
> > for the WARN message, in the fetcher thread.
> >
> > I'm just not sure I understand then why we don't always see the
> > ConsumerIterator error also, because won't there always be missing data
> > detected there?  Why sometimes and not always?  What's the difference?
> >
> > Jason
> >
> >
> > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
> >
> > > The WARN and ERROR may not be completely correlated. Could it be that
> the
> > > consumer is slow and couldn't keep up with the produced data?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg 
> > wrote:
> > >
> > > > So, sometimes I just get the WARN from the ConsumerFetcherThread (as
> > > > previously noted, above), e.g.:
> > > >
> > > > 2014-01-08 02:31:47,394  WARN
> [ConsumerFetcherThread-myconsumerapp-11]
> > > > consumer.ConsumerFetcherThread -
> > > > [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
> > > > for partition [mypartition,0] out of range; reset offset to
> > > > 16175326044
> > > >
> > > > More recently, I see these in the following log line (not sure why I
> > > > didn't see it previously), coming from the ConsumerIterator:
> > > >
> > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't
> match
> > > > fetch offset: 16175326044 for mytopic:0: fetched offset =
> 16175330598:
> > > > consumed offset = 16163904970;
> > > >  Consumer may lose data
> > > >
> > > > Why would I not see this second ERROR everytime there's a
> > > > corresponding WARN on the FetcherThread for an offset reset?
> > > >
> > > > Should I only be concerned about possible lost data if I see the
> > > > second ERROR log line?
> > > >
> > > > Jason
> > > >
> > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg 
> > > wrote:
> > > > > But I assume this would not be normally you'd want to log (every
> > > > > incoming producer request?).  Maybe just for debugging?  Or is it
> > only
> > > > > for consumer fetch requests?
> > > > >
> > > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <
> wangg...@gmail.com>
> > > > wrote:
> > > > >> TRACE is lower than INFO so INFO level request logging would also
> be
> > > > >> recorded.
> > > > >>
> > > > >> You can check for "Completed XXX request" in the log files to
> check
> > > the
> > > > >> request info with the correlation id.
> > > > >>
> > > > >> Guozhang
> > > > >>
> > > > >>
> > > > >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg <
> j...@squareup.com
> > >
> > > > wrote:
> > > > >>
> > > > >>> Hmmm, it looks like I'm enabling all logging at INFO, and the
> > request
> > > > >>> logging is only done at TRACE (why is that?).
> > > > >>>
> > > > >>> I suppose one wouldn't normally want to see request logs, so by
> > > > default,
> > > > >>> they aren't enabled?
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao 
> > wrote:
> > > > >>>
> > > > >>> > Did you enable request log? It logs the ip of every request.
> > > > >>> >
> > > > >>> > Thanks,
> > > > >>> >
> > > > >>> > Jun
> > > > >>> >
> > > > >>> >
> > > > >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg <
> > j...@squareup.com
> > > >
> > > > >>> wrote:
> > > > >>> >
> > > > >>> > > Hi Guozhang,
> > > > >>> > >
> > > > >>> > > I'm not sure I understand your first answer.  I don't see
> > > anything
> > > > >>> > > regarding the correlation id, elsewhere in the broker
> > > logs.They
> > > > >>> only
> > > > >>> > > show up in those ERROR messages
> > > > >>> > >
> > > > >>> > > I do see correlation id's in clients, but not on the
> > broker.
> > > > >>> > >
> > > > >>> > > Jason
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang <
> > > wangg...@gmail.com
> > > > >
> > > > >>> > wrote:
> > > > >>> > >
> > > 

Re: understanding OffsetOutOfRangeException's....

2014-01-07 Thread Jun Rao
Normally, if the consumer can't keep up, you should just see the
OffsetOutOfRangeException warning. The offset mismatch error should never
happen. It could be that OffsetOutOfRangeException exposed a bug. Do you
think you can reproduce this easily?

Thanks,

Jun
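
For what it's worth, which offset the fetcher resets to when it hits an
OffsetOutOfRangeException is governed by the consumer's auto.offset.reset
setting; a small sketch of the relevant config (0.8-era option values, the
other settings are placeholders):

import java.util.Properties;

import kafka.consumer.ConsumerConfig;

public class OffsetResetConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181"); // placeholder
        props.put("group.id", "mygroup");              // placeholder

        // "smallest": reset to the oldest offset still retained on the broker.
        // "largest":  reset to the newest offset, skipping the backlog entirely.
        // Either way, the fetcher logs the "out of range; reset offset to ..."
        // WARN quoted later in this thread.
        props.put("auto.offset.reset", "smallest");

        ConsumerConfig config = new ConsumerConfig(props);
        System.out.println("auto.offset.reset = " + config.autoOffsetReset());
    }
}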


On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg  wrote:

> Jun,
>
> I'm not sure I understand your question, wrt produced data?
>
> But yes, in general, I believe the consumer is not keeping up with the
> broker's deleting the data.  So it's trying to fetch the next batch of
> data, but it's last offset is no longer there, etc.  So that's the reason
> for the WARN message, in the fetcher thread.
>
> I'm just not sure I understand then why we don't always see the
> ConsumerIterator error also, because won't there always be missing data
> detected there?  Why sometimes and not always?  What's the difference?
>
> Jason
>
>
> On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:
>
> > The WARN and ERROR may not be completely correlated. Could it be that the
> > consumer is slow and couldn't keep up with the produced data?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg 
> wrote:
> >
> > > So, sometimes I just get the WARN from the ConsumerFetcherThread (as
> > > previously noted, above), e.g.:
> > >
> > > 2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
> > > consumer.ConsumerFetcherThread -
> > > [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
> > > for partition [mypartition,0] out of range; reset offset to
> > > 16175326044
> > >
> > > More recently, I see these in the following log line (not sure why I
> > > didn't see it previously), coming from the ConsumerIterator:
> > >
> > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match
> > > fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598:
> > > consumed offset = 16163904970;
> > >  Consumer may lose data
> > >
> > > Why would I not see this second ERROR everytime there's a
> > > corresponding WARN on the FetcherThread for an offset reset?
> > >
> > > Should I only be concerned about possible lost data if I see the
> > > second ERROR log line?
> > >
> > > Jason
> > >
> > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg 
> > wrote:
> > > > But I assume this would not be normally you'd want to log (every
> > > > incoming producer request?).  Maybe just for debugging?  Or is it
> only
> > > > for consumer fetch requests?
> > > >
> > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang 
> > > wrote:
> > > >> TRACE is lower than INFO so INFO level request logging would also be
> > > >> recorded.
> > > >>
> > > >> You can check for "Completed XXX request" in the log files to check
> > the
> > > >> request info with the correlation id.
> > > >>
> > > >> Guozhang
> > > >>
> > > >>
> > > >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg  >
> > > wrote:
> > > >>
> > > >>> Hmmm, it looks like I'm enabling all logging at INFO, and the
> request
> > > >>> logging is only done at TRACE (why is that?).
> > > >>>
> > > >>> I suppose one wouldn't normally want to see request logs, so by
> > > default,
> > > >>> they aren't enabled?
> > > >>>
> > > >>>
> > > >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao 
> wrote:
> > > >>>
> > > >>> > Did you enable request log? It logs the ip of every request.
> > > >>> >
> > > >>> > Thanks,
> > > >>> >
> > > >>> > Jun
> > > >>> >
> > > >>> >
> > > >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg <
> j...@squareup.com
> > >
> > > >>> wrote:
> > > >>> >
> > > >>> > > Hi Guozhang,
> > > >>> > >
> > > >>> > > I'm not sure I understand your first answer.  I don't see
> > anything
> > > >>> > > regarding the correlation id, elsewhere in the broker
> > logs.They
> > > >>> only
> > > >>> > > show up in those ERROR messages
> > > >>> > >
> > > >>> > > I do see correlation id's in clients, but not on the
> broker.
> > > >>> > >
> > > >>> > > Jason
> > > >>> > >
> > > >>> > >
> > > >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang <
> > wangg...@gmail.com
> > > >
> > > >>> > wrote:
> > > >>> > >
> > > >>> > > > Jason,
> > > >>> > > >
> > > >>> > > > You can search the correlation id in the public access log on
> > the
> > > >>> > servers
> > > >>> > > > to get the consumer information.
> > > >>> > > >
> > > >>> > > > As for logging, I agree that we should use the same level on
> > both
> > > >>> > sides.
> > > >>> > > > Could you file a jira for this?
> > > >>> > > >
> > > >>> > > > Guozhang
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg <
> > > j...@squareup.com>
> > > >>> > > wrote:
> > > >>> > > >
> > > >>> > > > > In our broker logs, we occasionally see errors like this:
> > > >>> > > > >
> > > >>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
> > > >>> > > server.KafkaApis
> > > >>> > > > -
> > > >>> > > > > [KafkaApi-45] E

Re: understanding OffsetOutOfRangeException's....

2014-01-07 Thread Jason Rosenberg
Jun,

I'm not sure I understand your question, wrt produced data?

But yes, in general, I believe the consumer is not keeping up with the
broker's deleting the data.  So it's trying to fetch the next batch of
data, but its last offset is no longer there, etc.  So that's the reason
for the WARN message, in the fetcher thread.
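
To make that gap concrete: the offsets in the WARN quoted below imply the
consumer skipped on the order of 11.4 million messages between where it was
and where it resumed (simple arithmetic on the logged values; the class name
is just for illustration):

public class OffsetGapSketch {
    public static void main(String[] args) {
        long consumedOffset = 16163904970L; // "Current offset ... out of range" (WARN)
        long resetOffset    = 16175326044L; // "reset offset to ..." (WARN)

        // Messages skipped between the last consumed offset and the reset point.
        System.out.println(resetOffset - consumedOffset); // prints 11421074
    }
}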

I'm just not sure I understand then why we don't always see the
ConsumerIterator error also, because won't there always be missing data
detected there?  Why sometimes and not always?  What's the difference?

Jason


On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao  wrote:

> The WARN and ERROR may not be completely correlated. Could it be that the
> consumer is slow and couldn't keep up with the produced data?
>
> Thanks,
>
> Jun
>
>
> On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg  wrote:
>
> > So, sometimes I just get the WARN from the ConsumerFetcherThread (as
> > previously noted, above), e.g.:
> >
> > 2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
> > consumer.ConsumerFetcherThread -
> > [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
> > for partition [mypartition,0] out of range; reset offset to
> > 16175326044
> >
> > More recently, I see these in the following log line (not sure why I
> > didn't see it previously), coming from the ConsumerIterator:
> >
> > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> > consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match
> > fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598:
> > consumed offset = 16163904970;
> >  Consumer may lose data
> >
> > Why would I not see this second ERROR everytime there's a
> > corresponding WARN on the FetcherThread for an offset reset?
> >
> > Should I only be concerned about possible lost data if I see the
> > second ERROR log line?
> >
> > Jason
> >
> > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg 
> wrote:
> > > But I assume this would not be normally you'd want to log (every
> > > incoming producer request?).  Maybe just for debugging?  Or is it only
> > > for consumer fetch requests?
> > >
> > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang 
> > wrote:
> > >> TRACE is lower than INFO so INFO level request logging would also be
> > >> recorded.
> > >>
> > >> You can check for "Completed XXX request" in the log files to check
> the
> > >> request info with the correlation id.
> > >>
> > >> Guozhang
> > >>
> > >>
> > >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg 
> > wrote:
> > >>
> > >>> Hmmm, it looks like I'm enabling all logging at INFO, and the request
> > >>> logging is only done at TRACE (why is that?).
> > >>>
> > >>> I suppose one wouldn't normally want to see request logs, so by
> > default,
> > >>> they aren't enabled?
> > >>>
> > >>>
> > >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao  wrote:
> > >>>
> > >>> > Did you enable request log? It logs the ip of every request.
> > >>> >
> > >>> > Thanks,
> > >>> >
> > >>> > Jun
> > >>> >
> > >>> >
> > >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg  >
> > >>> wrote:
> > >>> >
> > >>> > > Hi Guozhang,
> > >>> > >
> > >>> > > I'm not sure I understand your first answer.  I don't see
> anything
> > >>> > > regarding the correlation id, elsewhere in the broker
> logs.They
> > >>> only
> > >>> > > show up in those ERROR messages
> > >>> > >
> > >>> > > I do see correlation id's in clients, but not on the broker.
> > >>> > >
> > >>> > > Jason
> > >>> > >
> > >>> > >
> > >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang <
> wangg...@gmail.com
> > >
> > >>> > wrote:
> > >>> > >
> > >>> > > > Jason,
> > >>> > > >
> > >>> > > > You can search the correlation id in the public access log on
> the
> > >>> > servers
> > >>> > > > to get the consumer information.
> > >>> > > >
> > >>> > > > As for logging, I agree that we should use the same level on
> both
> > >>> > sides.
> > >>> > > > Could you file a jira for this?
> > >>> > > >
> > >>> > > > Guozhang
> > >>> > > >
> > >>> > > >
> > >>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg <
> > j...@squareup.com>
> > >>> > > wrote:
> > >>> > > >
> > >>> > > > > In our broker logs, we occasionally see errors like this:
> > >>> > > > >
> > >>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
> > >>> > > server.KafkaApis
> > >>> > > > -
> > >>> > > > > [KafkaApi-45] Error when processing fetch request for
> partition
> > >>> > > > [mytopic,0]
> > >>> > > > > offset 204243601 from consumer with correlation id 130341
> > >>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset
> > >>> 204243601
> > >>> > > but
> > >>> > > > we
> > >>> > > > > only have log segments in the range 204343397 to 207423640.
> > >>> > > > >
> > >>> > > > > I assume this means there's a consumer that has fallen behind
> > >>> > consuming
> > >>> > > > > messages, and the log retention policy has removed messages
> > before
> > >>> > they
> > >>> > > > > could be consumed by the consumer.
> > >>> > 

Re: understanding OffsetOutOfRangeException's....

2014-01-07 Thread Jun Rao
The WARN and ERROR may not be completely correlated. Could it be that the
consumer is slow and couldn't keep up with the produced data?

Thanks,

Jun


On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg  wrote:

> So, sometimes I just get the WARN from the ConsumerFetcherThread (as
> previously noted, above), e.g.:
>
> 2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
> consumer.ConsumerFetcherThread -
> [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
> for partition [mypartition,0] out of range; reset offset to
> 16175326044
>
> More recently, I see these in the following log line (not sure why I
> didn't see it previously), coming from the ConsumerIterator:
>
> 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match
> fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598:
> consumed offset = 16163904970;
>  Consumer may lose data
>
> Why would I not see this second ERROR everytime there's a
> corresponding WARN on the FetcherThread for an offset reset?
>
> Should I only be concerned about possible lost data if I see the
> second ERROR log line?
>
> Jason
>
> On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg  wrote:
> > But I assume this would not be normally you'd want to log (every
> > incoming producer request?).  Maybe just for debugging?  Or is it only
> > for consumer fetch requests?
> >
> > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang 
> wrote:
> >> TRACE is lower than INFO so INFO level request logging would also be
> >> recorded.
> >>
> >> You can check for "Completed XXX request" in the log files to check the
> >> request info with the correlation id.
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg 
> wrote:
> >>
> >>> Hmmm, it looks like I'm enabling all logging at INFO, and the request
> >>> logging is only done at TRACE (why is that?).
> >>>
> >>> I suppose one wouldn't normally want to see request logs, so by
> default,
> >>> they aren't enabled?
> >>>
> >>>
> >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao  wrote:
> >>>
> >>> > Did you enable request log? It logs the ip of every request.
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Jun
> >>> >
> >>> >
> >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg 
> >>> wrote:
> >>> >
> >>> > > Hi Guozhang,
> >>> > >
> >>> > > I'm not sure I understand your first answer.  I don't see anything
> >>> > > regarding the correlation id, elsewhere in the broker logs.They
> >>> only
> >>> > > show up in those ERROR messages
> >>> > >
> >>> > > I do see correlation id's in clients, but not on the broker.
> >>> > >
> >>> > > Jason
> >>> > >
> >>> > >
> >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang  >
> >>> > wrote:
> >>> > >
> >>> > > > Jason,
> >>> > > >
> >>> > > > You can search the correlation id in the public access log on the
> >>> > servers
> >>> > > > to get the consumer information.
> >>> > > >
> >>> > > > As for logging, I agree that we should use the same level on both
> >>> > sides.
> >>> > > > Could you file a jira for this?
> >>> > > >
> >>> > > > Guozhang
> >>> > > >
> >>> > > >
> >>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg <
> j...@squareup.com>
> >>> > > wrote:
> >>> > > >
> >>> > > > > In our broker logs, we occasionally see errors like this:
> >>> > > > >
> >>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
> >>> > > server.KafkaApis
> >>> > > > -
> >>> > > > > [KafkaApi-45] Error when processing fetch request for partition
> >>> > > > [mytopic,0]
> >>> > > > > offset 204243601 from consumer with correlation id 130341
> >>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset
> >>> 204243601
> >>> > > but
> >>> > > > we
> >>> > > > > only have log segments in the range 204343397 to 207423640.
> >>> > > > >
> >>> > > > > I assume this means there's a consumer that has fallen behind
> >>> > consuming
> >>> > > > > messages, and the log retention policy has removed messages
> before
> >>> > they
> >>> > > > > could be consumed by the consumer.
> >>> > > > >
> >>> > > > > However, I'm not 100% which consumer it is, and it looks like
> the
> >>> > only
> >>> > > > info
> >>> > > > > we have is the correlation id of the consumer, e.g.:
> >>> > > > >
> >>> > > > > "from consumer with correlation id 130341"
> >>> > > > >
> >>> > > > > Is there a way to know which consumer this refers to?  It seems
> >>> there
> >>> > > are
> >>> > > > > far more correlation id's than there are consumers.  Would it
> be
> >>> > > possible
> >>> > > > > to provide a bit more descriptive error message here, so we can
> >>> > > > immediately
> >>> > > > > know which consumer is falling behind?
> >>> > > > >
> >>> > > > > We do see a corresponding entry in the consumer logs too:
> >>> > > > >
> >>> > > > > 2013-12-23 05:02:08,797  WARN
> >>> > > > >
> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
> >>> > > > > consumer.Consumer

Re: understanding OffsetOutOfRangeException's....

2014-01-07 Thread Jason Rosenberg
I've filed https://issues.apache.org/jira/browse/KAFKA-1200, to address the
inconsistent log-level issue.


On Tue, Jan 7, 2014 at 9:47 PM, Jason Rosenberg  wrote:

> So, sometimes I just get the WARN from the ConsumerFetcherThread (as
> previously noted, above), e.g.:
>
> 2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
> consumer.ConsumerFetcherThread -
> [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
> for partition [mypartition,0] out of range; reset offset to
> 16175326044
>
> More recently, I see these in the following log line (not sure why I
> didn't see it previously), coming from the ConsumerIterator:
>
> 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
> consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match
> fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598:
> consumed offset = 16163904970;
>  Consumer may lose data
>
> Why would I not see this second ERROR everytime there's a
> corresponding WARN on the FetcherThread for an offset reset?
>
> Should I only be concerned about possible lost data if I see the
> second ERROR log line?
>
> Jason
>
> On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg  wrote:
> > But I assume this would not be normally you'd want to log (every
> > incoming producer request?).  Maybe just for debugging?  Or is it only
> > for consumer fetch requests?
> >
> > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang 
> wrote:
> >> TRACE is lower than INFO so INFO level request logging would also be
> >> recorded.
> >>
> >> You can check for "Completed XXX request" in the log files to check the
> >> request info with the correlation id.
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg 
> wrote:
> >>
> >>> Hmmm, it looks like I'm enabling all logging at INFO, and the request
> >>> logging is only done at TRACE (why is that?).
> >>>
> >>> I suppose one wouldn't normally want to see request logs, so by
> default,
> >>> they aren't enabled?
> >>>
> >>>
> >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao  wrote:
> >>>
> >>> > Did you enable request log? It logs the ip of every request.
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Jun
> >>> >
> >>> >
> >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg 
> >>> wrote:
> >>> >
> >>> > > Hi Guozhang,
> >>> > >
> >>> > > I'm not sure I understand your first answer.  I don't see anything
> >>> > > regarding the correlation id, elsewhere in the broker logs.They
> >>> only
> >>> > > show up in those ERROR messages
> >>> > >
> >>> > > I do see correlation id's in clients, but not on the broker.
> >>> > >
> >>> > > Jason
> >>> > >
> >>> > >
> >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang  >
> >>> > wrote:
> >>> > >
> >>> > > > Jason,
> >>> > > >
> >>> > > > You can search the correlation id in the public access log on the
> >>> > servers
> >>> > > > to get the consumer information.
> >>> > > >
> >>> > > > As for logging, I agree that we should use the same level on both
> >>> > sides.
> >>> > > > Could you file a jira for this?
> >>> > > >
> >>> > > > Guozhang
> >>> > > >
> >>> > > >
> >>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg <
> j...@squareup.com>
> >>> > > wrote:
> >>> > > >
> >>> > > > > In our broker logs, we occasionally see errors like this:
> >>> > > > >
> >>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
> >>> > > server.KafkaApis
> >>> > > > -
> >>> > > > > [KafkaApi-45] Error when processing fetch request for partition
> >>> > > > [mytopic,0]
> >>> > > > > offset 204243601 from consumer with correlation id 130341
> >>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset
> >>> 204243601
> >>> > > but
> >>> > > > we
> >>> > > > > only have log segments in the range 204343397 to 207423640.
> >>> > > > >
> >>> > > > > I assume this means there's a consumer that has fallen behind
> >>> > consuming
> >>> > > > > messages, and the log retention policy has removed messages
> before
> >>> > they
> >>> > > > > could be consumed by the consumer.
> >>> > > > >
> >>> > > > > However, I'm not 100% which consumer it is, and it looks like
> the
> >>> > only
> >>> > > > info
> >>> > > > > we have is the correlation id of the consumer, e.g.:
> >>> > > > >
> >>> > > > > "from consumer with correlation id 130341"
> >>> > > > >
> >>> > > > > Is there a way to know which consumer this refers to?  It seems
> >>> there
> >>> > > are
> >>> > > > > far more correlation id's than there are consumers.  Would it
> be
> >>> > > possible
> >>> > > > > to provide a bit more descriptive error message here, so we can
> >>> > > > immediately
> >>> > > > > know which consumer is falling behind?
> >>> > > > >
> >>> > > > > We do see a corresponding entry in the consumer logs too:
> >>> > > > >
> >>> > > > > 2013-12-23 05:02:08,797  WARN
> >>> > > > >
> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
> >>> > > > > consumer.ConsumerFetcherThread -
> >>> > > > >
> >>> [Consumer

Re: understanding OffsetOutOfRangeException's....

2014-01-07 Thread Jason Rosenberg
So, sometimes I just get the WARN from the ConsumerFetcherThread (as
previously noted above), e.g.:

2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
consumer.ConsumerFetcherThread -
[ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
for partition [mypartition,0] out of range; reset offset to
16175326044

More recently, I also see the following log line (not sure why I didn't
see it previously), coming from the ConsumerIterator:

2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match
fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598:
consumed offset = 16163904970;
 Consumer may lose data

Why would I not see this second ERROR every time there's a
corresponding WARN on the FetcherThread for an offset reset?

Should I only be concerned about possible lost data if I see the
second ERROR log line?
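
For context, the "reset offset to ..." line in the WARN above is governed by
the consumer's auto.offset.reset setting, which is what the high-level
consumer falls back on when the broker returns OffsetOutOfRangeException. A
minimal sketch of the relevant 0.8 consumer properties follows; the group id,
client id and zookeeper address are placeholders, not our actual values:

  group.id=myconsumergroup
  zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
  client.id=myconsumerapp-11
  # on OffsetOutOfRangeException: "smallest" rewinds to the oldest offset
  # still on the broker, "largest" (the default) jumps to the newest
  auto.offset.reset=smallest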

Jason

On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg  wrote:
> But I assume this would not be normally you'd want to log (every
> incoming producer request?).  Maybe just for debugging?  Or is it only
> for consumer fetch requests?
>
> On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang  wrote:
>> TRACE is lower than INFO so INFO level request logging would also be
>> recorded.
>>
>> You can check for "Completed XXX request" in the log files to check the
>> request info with the correlation id.
>>
>> Guozhang
>>
>>
>> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg  wrote:
>>
>>> Hmmm, it looks like I'm enabling all logging at INFO, and the request
>>> logging is only done at TRACE (why is that?).
>>>
>>> I suppose one wouldn't normally want to see request logs, so by default,
>>> they aren't enabled?
>>>
>>>
>>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao  wrote:
>>>
>>> > Did you enable request log? It logs the ip of every request.
>>> >
>>> > Thanks,
>>> >
>>> > Jun
>>> >
>>> >
>>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg 
>>> wrote:
>>> >
>>> > > Hi Guozhang,
>>> > >
>>> > > I'm not sure I understand your first answer.  I don't see anything
>>> > > regarding the correlation id, elsewhere in the broker logs.They
>>> only
>>> > > show up in those ERROR messages
>>> > >
>>> > > I do see correlation id's in clients, but not on the broker.
>>> > >
>>> > > Jason
>>> > >
>>> > >
>>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang 
>>> > wrote:
>>> > >
>>> > > > Jason,
>>> > > >
>>> > > > You can search the correlation id in the public access log on the
>>> > servers
>>> > > > to get the consumer information.
>>> > > >
>>> > > > As for logging, I agree that we should use the same level on both
>>> > sides.
>>> > > > Could you file a jira for this?
>>> > > >
>>> > > > Guozhang
>>> > > >
>>> > > >
>>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg 
>>> > > wrote:
>>> > > >
>>> > > > > In our broker logs, we occasionally see errors like this:
>>> > > > >
>>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
>>> > > server.KafkaApis
>>> > > > -
>>> > > > > [KafkaApi-45] Error when processing fetch request for partition
>>> > > > [mytopic,0]
>>> > > > > offset 204243601 from consumer with correlation id 130341
>>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset
>>> 204243601
>>> > > but
>>> > > > we
>>> > > > > only have log segments in the range 204343397 to 207423640.
>>> > > > >
>>> > > > > I assume this means there's a consumer that has fallen behind
>>> > consuming
>>> > > > > messages, and the log retention policy has removed messages before
>>> > they
>>> > > > > could be consumed by the consumer.
>>> > > > >
>>> > > > > However, I'm not 100% which consumer it is, and it looks like the
>>> > only
>>> > > > info
>>> > > > > we have is the correlation id of the consumer, e.g.:
>>> > > > >
>>> > > > > "from consumer with correlation id 130341"
>>> > > > >
>>> > > > > Is there a way to know which consumer this refers to?  It seems
>>> there
>>> > > are
>>> > > > > far more correlation id's than there are consumers.  Would it be
>>> > > possible
>>> > > > > to provide a bit more descriptive error message here, so we can
>>> > > > immediately
>>> > > > > know which consumer is falling behind?
>>> > > > >
>>> > > > > We do see a corresponding entry in the consumer logs too:
>>> > > > >
>>> > > > > 2013-12-23 05:02:08,797  WARN
>>> > > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
>>> > > > > consumer.ConsumerFetcherThread -
>>> > > > >
>>> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
>>> > > > > Current offset 204243601 for partition [mytopic,0] out of range;
>>> > reset
>>> > > > > offset to 204343397
>>> > > > >
>>> > > > > But it would be nice to be able to also use the broker log to
>>> quickly
>>> > > > find
>>> > > > > consumers with issues.
>>> > > > >
>>> > > > > Also, I'm not sure, why is logging the event as an ERROR in the
>>> > broker,
>>> > > > but
>>> > > > > a WARN in 

Re: understanding OffsetOutOfRangeException's....

2013-12-24 Thread Jason Rosenberg
But I assume this is not something you'd normally want to log (every
incoming producer request?).  Maybe just for debugging?  Or is it only
for consumer fetch requests?
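
For reference, a sketch of how the broker request log is typically enabled,
assuming the stock config/log4j.properties layout (kafka.request.logger is the
standard logger name; the file path and pattern below are illustrative only):

  log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender
  log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH
  log4j.appender.requestAppender.File=/var/log/kafka/kafka-request.log
  log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout
  log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
  # TRACE on this logger produces the per-request "Completed ..." lines
  log4j.logger.kafka.request.logger=TRACE, requestAppender
  log4j.additivity.kafka.request.logger=false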

On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang  wrote:
> TRACE is lower than INFO so INFO level request logging would also be
> recorded.
>
> You can check for "Completed XXX request" in the log files to check the
> request info with the correlation id.
>
> Guozhang
>
>
> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg  wrote:
>
>> Hmmm, it looks like I'm enabling all logging at INFO, and the request
>> logging is only done at TRACE (why is that?).
>>
>> I suppose one wouldn't normally want to see request logs, so by default,
>> they aren't enabled?
>>
>>
>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao  wrote:
>>
>> > Did you enable request log? It logs the ip of every request.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg 
>> wrote:
>> >
>> > > Hi Guozhang,
>> > >
>> > > I'm not sure I understand your first answer.  I don't see anything
>> > > regarding the correlation id, elsewhere in the broker logs.They
>> only
>> > > show up in those ERROR messages
>> > >
>> > > I do see correlation id's in clients, but not on the broker.
>> > >
>> > > Jason
>> > >
>> > >
>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang 
>> > wrote:
>> > >
>> > > > Jason,
>> > > >
>> > > > You can search the correlation id in the public access log on the
>> > servers
>> > > > to get the consumer information.
>> > > >
>> > > > As for logging, I agree that we should use the same level on both
>> > sides.
>> > > > Could you file a jira for this?
>> > > >
>> > > > Guozhang
>> > > >
>> > > >
>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg 
>> > > wrote:
>> > > >
>> > > > > In our broker logs, we occasionally see errors like this:
>> > > > >
>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
>> > > server.KafkaApis
>> > > > -
>> > > > > [KafkaApi-45] Error when processing fetch request for partition
>> > > > [mytopic,0]
>> > > > > offset 204243601 from consumer with correlation id 130341
>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset
>> 204243601
>> > > but
>> > > > we
>> > > > > only have log segments in the range 204343397 to 207423640.
>> > > > >
>> > > > > I assume this means there's a consumer that has fallen behind
>> > consuming
>> > > > > messages, and the log retention policy has removed messages before
>> > they
>> > > > > could be consumed by the consumer.
>> > > > >
>> > > > > However, I'm not 100% which consumer it is, and it looks like the
>> > only
>> > > > info
>> > > > > we have is the correlation id of the consumer, e.g.:
>> > > > >
>> > > > > "from consumer with correlation id 130341"
>> > > > >
>> > > > > Is there a way to know which consumer this refers to?  It seems
>> there
>> > > are
>> > > > > far more correlation id's than there are consumers.  Would it be
>> > > possible
>> > > > > to provide a bit more descriptive error message here, so we can
>> > > > immediately
>> > > > > know which consumer is falling behind?
>> > > > >
>> > > > > We do see a corresponding entry in the consumer logs too:
>> > > > >
>> > > > > 2013-12-23 05:02:08,797  WARN
>> > > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
>> > > > > consumer.ConsumerFetcherThread -
>> > > > >
>> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
>> > > > > Current offset 204243601 for partition [mytopic,0] out of range;
>> > reset
>> > > > > offset to 204343397
>> > > > >
>> > > > > But it would be nice to be able to also use the broker log to
>> quickly
>> > > > find
>> > > > > consumers with issues.
>> > > > >
>> > > > > Also, I'm not sure, why is logging the event as an ERROR in the
>> > broker,
>> > > > but
>> > > > > a WARN in the consumer?
>> > > > >
>> > > > > Jason
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -- Guozhang
>> > > >
>> > >
>> >
>>
>
>
>
> --
> -- Guozhang


Re: understanding OffsetOutOfRangeException's....

2013-12-24 Thread Guozhang Wang
TRACE is lower than INFO, so INFO-level request logging would also be
recorded.

You can check for "Completed XXX request" in the log files to check the
request info with the correlation id.

Guozhang


On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg  wrote:

> Hmmm, it looks like I'm enabling all logging at INFO, and the request
> logging is only done at TRACE (why is that?).
>
> I suppose one wouldn't normally want to see request logs, so by default,
> they aren't enabled?
>
>
> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao  wrote:
>
> > Did you enable request log? It logs the ip of every request.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg 
> wrote:
> >
> > > Hi Guozhang,
> > >
> > > I'm not sure I understand your first answer.  I don't see anything
> > > regarding the correlation id, elsewhere in the broker logs.They
> only
> > > show up in those ERROR messages
> > >
> > > I do see correlation id's in clients, but not on the broker.
> > >
> > > Jason
> > >
> > >
> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang 
> > wrote:
> > >
> > > > Jason,
> > > >
> > > > You can search the correlation id in the public access log on the
> > servers
> > > > to get the consumer information.
> > > >
> > > > As for logging, I agree that we should use the same level on both
> > sides.
> > > > Could you file a jira for this?
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg 
> > > wrote:
> > > >
> > > > > In our broker logs, we occasionally see errors like this:
> > > > >
> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
> > > server.KafkaApis
> > > > -
> > > > > [KafkaApi-45] Error when processing fetch request for partition
> > > > [mytopic,0]
> > > > > offset 204243601 from consumer with correlation id 130341
> > > > > kafka.common.OffsetOutOfRangeException: Request for offset
> 204243601
> > > but
> > > > we
> > > > > only have log segments in the range 204343397 to 207423640.
> > > > >
> > > > > I assume this means there's a consumer that has fallen behind
> > consuming
> > > > > messages, and the log retention policy has removed messages before
> > they
> > > > > could be consumed by the consumer.
> > > > >
> > > > > However, I'm not 100% which consumer it is, and it looks like the
> > only
> > > > info
> > > > > we have is the correlation id of the consumer, e.g.:
> > > > >
> > > > > "from consumer with correlation id 130341"
> > > > >
> > > > > Is there a way to know which consumer this refers to?  It seems
> there
> > > are
> > > > > far more correlation id's than there are consumers.  Would it be
> > > possible
> > > > > to provide a bit more descriptive error message here, so we can
> > > > immediately
> > > > > know which consumer is falling behind?
> > > > >
> > > > > We do see a corresponding entry in the consumer logs too:
> > > > >
> > > > > 2013-12-23 05:02:08,797  WARN
> > > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
> > > > > consumer.ConsumerFetcherThread -
> > > > >
> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
> > > > > Current offset 204243601 for partition [mytopic,0] out of range;
> > reset
> > > > > offset to 204343397
> > > > >
> > > > > But it would be nice to be able to also use the broker log to
> quickly
> > > > find
> > > > > consumers with issues.
> > > > >
> > > > > Also, I'm not sure, why is logging the event as an ERROR in the
> > broker,
> > > > but
> > > > > a WARN in the consumer?
> > > > >
> > > > > Jason
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>



-- 
-- Guozhang


Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jason Rosenberg
Hmmm, it looks like I'm enabling all logging at INFO, and the request
logging is only done at TRACE (why is that?).

I suppose one wouldn't normally want to see request logs, so by default,
they aren't enabled?


On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao  wrote:

> Did you enable request log? It logs the ip of every request.
>
> Thanks,
>
> Jun
>
>
> On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg  wrote:
>
> > Hi Guozhang,
> >
> > I'm not sure I understand your first answer.  I don't see anything
> > regarding the correlation id, elsewhere in the broker logs.They only
> > show up in those ERROR messages
> >
> > I do see correlation id's in clients, but not on the broker.
> >
> > Jason
> >
> >
> > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang 
> wrote:
> >
> > > Jason,
> > >
> > > You can search the correlation id in the public access log on the
> servers
> > > to get the consumer information.
> > >
> > > As for logging, I agree that we should use the same level on both
> sides.
> > > Could you file a jira for this?
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg 
> > wrote:
> > >
> > > > In our broker logs, we occasionally see errors like this:
> > > >
> > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
> > server.KafkaApis
> > > -
> > > > [KafkaApi-45] Error when processing fetch request for partition
> > > [mytopic,0]
> > > > offset 204243601 from consumer with correlation id 130341
> > > > kafka.common.OffsetOutOfRangeException: Request for offset 204243601
> > but
> > > we
> > > > only have log segments in the range 204343397 to 207423640.
> > > >
> > > > I assume this means there's a consumer that has fallen behind
> consuming
> > > > messages, and the log retention policy has removed messages before
> they
> > > > could be consumed by the consumer.
> > > >
> > > > However, I'm not 100% which consumer it is, and it looks like the
> only
> > > info
> > > > we have is the correlation id of the consumer, e.g.:
> > > >
> > > > "from consumer with correlation id 130341"
> > > >
> > > > Is there a way to know which consumer this refers to?  It seems there
> > are
> > > > far more correlation id's than there are consumers.  Would it be
> > possible
> > > > to provide a bit more descriptive error message here, so we can
> > > immediately
> > > > know which consumer is falling behind?
> > > >
> > > > We do see a corresponding entry in the consumer logs too:
> > > >
> > > > 2013-12-23 05:02:08,797  WARN
> > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
> > > > consumer.ConsumerFetcherThread -
> > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
> > > > Current offset 204243601 for partition [mytopic,0] out of range;
> reset
> > > > offset to 204343397
> > > >
> > > > But it would be nice to be able to also use the broker log to quickly
> > > find
> > > > consumers with issues.
> > > >
> > > > Also, I'm not sure, why is logging the event as an ERROR in the
> broker,
> > > but
> > > > a WARN in the consumer?
> > > >
> > > > Jason
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jun Rao
Did you enable the request log? It logs the IP of every request.

Thanks,

Jun


On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg  wrote:

> Hi Guozhang,
>
> I'm not sure I understand your first answer.  I don't see anything
> regarding the correlation id, elsewhere in the broker logs.They only
> show up in those ERROR messages
>
> I do see correlation id's in clients, but not on the broker.
>
> Jason
>
>
> On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang  wrote:
>
> > Jason,
> >
> > You can search the correlation id in the public access log on the servers
> > to get the consumer information.
> >
> > As for logging, I agree that we should use the same level on both sides.
> > Could you file a jira for this?
> >
> > Guozhang
> >
> >
> > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg 
> wrote:
> >
> > > In our broker logs, we occasionally see errors like this:
> > >
> > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2]
> server.KafkaApis
> > -
> > > [KafkaApi-45] Error when processing fetch request for partition
> > [mytopic,0]
> > > offset 204243601 from consumer with correlation id 130341
> > > kafka.common.OffsetOutOfRangeException: Request for offset 204243601
> but
> > we
> > > only have log segments in the range 204343397 to 207423640.
> > >
> > > I assume this means there's a consumer that has fallen behind consuming
> > > messages, and the log retention policy has removed messages before they
> > > could be consumed by the consumer.
> > >
> > > However, I'm not 100% which consumer it is, and it looks like the only
> > info
> > > we have is the correlation id of the consumer, e.g.:
> > >
> > > "from consumer with correlation id 130341"
> > >
> > > Is there a way to know which consumer this refers to?  It seems there
> are
> > > far more correlation id's than there are consumers.  Would it be
> possible
> > > to provide a bit more descriptive error message here, so we can
> > immediately
> > > know which consumer is falling behind?
> > >
> > > We do see a corresponding entry in the consumer logs too:
> > >
> > > 2013-12-23 05:02:08,797  WARN
> > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
> > > consumer.ConsumerFetcherThread -
> > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
> > > Current offset 204243601 for partition [mytopic,0] out of range; reset
> > > offset to 204343397
> > >
> > > But it would be nice to be able to also use the broker log to quickly
> > find
> > > consumers with issues.
> > >
> > > Also, I'm not sure, why is logging the event as an ERROR in the broker,
> > but
> > > a WARN in the consumer?
> > >
> > > Jason
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jason Rosenberg
Hi Guozhang,

I'm not sure I understand your first answer.  I don't see anything
regarding the correlation id elsewhere in the broker logs.  They only
show up in those ERROR messages.

I do see correlation id's in clients, but not on the broker.

Jason


On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang  wrote:

> Jason,
>
> You can search the correlation id in the public access log on the servers
> to get the consumer information.
>
> As for logging, I agree that we should use the same level on both sides.
> Could you file a jira for this?
>
> Guozhang
>
>
> On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg  wrote:
>
> > In our broker logs, we occasionally see errors like this:
> >
> > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] server.KafkaApis
> -
> > [KafkaApi-45] Error when processing fetch request for partition
> [mytopic,0]
> > offset 204243601 from consumer with correlation id 130341
> > kafka.common.OffsetOutOfRangeException: Request for offset 204243601 but
> we
> > only have log segments in the range 204343397 to 207423640.
> >
> > I assume this means there's a consumer that has fallen behind consuming
> > messages, and the log retention policy has removed messages before they
> > could be consumed by the consumer.
> >
> > However, I'm not 100% which consumer it is, and it looks like the only
> info
> > we have is the correlation id of the consumer, e.g.:
> >
> > "from consumer with correlation id 130341"
> >
> > Is there a way to know which consumer this refers to?  It seems there are
> > far more correlation id's than there are consumers.  Would it be possible
> > to provide a bit more descriptive error message here, so we can
> immediately
> > know which consumer is falling behind?
> >
> > We do see a corresponding entry in the consumer logs too:
> >
> > 2013-12-23 05:02:08,797  WARN
> > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
> > consumer.ConsumerFetcherThread -
> > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
> > Current offset 204243601 for partition [mytopic,0] out of range; reset
> > offset to 204343397
> >
> > But it would be nice to be able to also use the broker log to quickly
> find
> > consumers with issues.
> >
> > Also, I'm not sure, why is logging the event as an ERROR in the broker,
> but
> > a WARN in the consumer?
> >
> > Jason
> >
>
>
>
> --
> -- Guozhang
>


Re: understanding OffsetOutOfRangeException's....

2013-12-23 Thread Guozhang Wang
Jason,

You can search for the correlation id in the public access log on the servers
to get the consumer information.

As for logging, I agree that we should use the same level on both sides.
Could you file a jira for this?

Guozhang


On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg  wrote:

> In our broker logs, we occasionally see errors like this:
>
> 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] server.KafkaApis -
> [KafkaApi-45] Error when processing fetch request for partition [mytopic,0]
> offset 204243601 from consumer with correlation id 130341
> kafka.common.OffsetOutOfRangeException: Request for offset 204243601 but we
> only have log segments in the range 204343397 to 207423640.
>
> I assume this means there's a consumer that has fallen behind consuming
> messages, and the log retention policy has removed messages before they
> could be consumed by the consumer.
>
> However, I'm not 100% which consumer it is, and it looks like the only info
> we have is the correlation id of the consumer, e.g.:
>
> "from consumer with correlation id 130341"
>
> Is there a way to know which consumer this refers to?  It seems there are
> far more correlation id's than there are consumers.  Would it be possible
> to provide a bit more descriptive error message here, so we can immediately
> know which consumer is falling behind?
>
> We do see a corresponding entry in the consumer logs too:
>
> 2013-12-23 05:02:08,797  WARN
> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
> consumer.ConsumerFetcherThread -
> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
> Current offset 204243601 for partition [mytopic,0] out of range; reset
> offset to 204343397
>
> But it would be nice to be able to also use the broker log to quickly find
> consumers with issues.
>
> Also, I'm not sure, why is logging the event as an ERROR in the broker, but
> a WARN in the consumer?
>
> Jason
>



-- 
-- Guozhang


understanding OffsetOutOfRangeException's....

2013-12-23 Thread Jason Rosenberg
In our broker logs, we occasionally see errors like this:

2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] server.KafkaApis -
[KafkaApi-45] Error when processing fetch request for partition [mytopic,0]
offset 204243601 from consumer with correlation id 130341
kafka.common.OffsetOutOfRangeException: Request for offset 204243601 but we
only have log segments in the range 204343397 to 207423640.

I assume this means there's a consumer that has fallen behind consuming
messages, and the log retention policy has removed messages before they
could be consumed by the consumer.
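
(For reference, the "log segments in the range ..." bound in that exception is
determined by the broker's retention and segment settings; a sketch of the
relevant server.properties entries is below, with example values rather than
recommendations:)

  # delete segments older than 7 days...
  log.retention.hours=168
  # ...or once a partition exceeds ~100 GB, whichever limit is hit first
  log.retention.bytes=107374182400
  # roll a new segment at ~1 GB; retention only ever deletes whole segments
  log.segment.bytes=1073741824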

However, I'm not 100% sure which consumer it is, and it looks like the only info
we have is the correlation id of the consumer, e.g.:

"from consumer with correlation id 130341"

Is there a way to know which consumer this refers to?  It seems there are
far more correlation id's than there are consumers.  Would it be possible
to provide a bit more descriptive error message here, so we can immediately
know which consumer is falling behind?

We do see a corresponding entry in the consumer logs too:

2013-12-23 05:02:08,797  WARN
[ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
consumer.ConsumerFetcherThread -
[ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
Current offset 204243601 for partition [mytopic,0] out of range; reset
offset to 204343397

But it would be nice to be able to also use the broker log to quickly find
consumers with issues.

Also, I'm not sure why this event is logged as an ERROR in the broker but
as a WARN in the consumer?

Jason